So here I am, apologizing again. Have I mentioned that I hate computers?
We had a large data center outage. It lasted 17 hours. It meant that pretty much all Crossref services were unavailable - our main website, our content registration system, our reports, our APIs. 17 hours was a long time for us - but it was also an inconvenient time for numerous members, service providers, integrators, and users. We apologise for this.
In my blog post on October 6th, I promised an update on what caused the outage and what we are doing to avoid it happening again. This is that update.
Crossref hosts its services in a hybrid environment. Our original services are all hosted in a data center in Massachusetts, but we host new services with a cloud provider. We also have a few R&D systems hosted with Hetzner.
We know an organization our size has no business running its own data center, and we have been slowly moving services out of the data center and into the cloud.
On October 6 at ~14:00 UTC, our data centre outside of Boston, MA went down. This affected most of our network services- even ones not hosted in the data centre. The problem was that both of our primary and backup network connections went down at the same time. We’re not sure why yet. We are consulting with our network provider. It took us 2 hours to get our systems back online.