DOIs and matching regular expressions

We regularly see developers using regular expressions to validate or scrape for DOIs. For modern Crossref DOIs the regular expression is short


For the 74.9M DOIs we have seen, this matches 74.4M of them. If you need to use only one pattern, then use this one.
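The excerpt above does not reproduce the pattern itself, so here is a rough sketch of how such a check might look in Python. The expression used is an assumption: it is the form commonly quoted for modern Crossref DOIs, not something taken from the text above, and the helper name `looks_like_modern_doi` is made up for illustration.

```python
import re

# Assumption: the excerpt omits the actual pattern. This is the form commonly
# quoted for modern Crossref DOIs: "10.", a 4-9 digit registrant code, a slash,
# then a suffix drawn from a limited character set (case-insensitive).
MODERN_DOI = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def looks_like_modern_doi(candidate: str) -> bool:
    """Return True if the whole string has the shape of a modern DOI."""
    return MODERN_DOI.match(candidate.strip()) is not None


if __name__ == "__main__":
    # 10.5555 is used here purely as a sample prefix.
    for sample in ["10.5555/12345678", "10.1000/xyz<123>", "doi:10.5555/12345678"]:
        print(sample, looks_like_modern_doi(sample))
```

A single pattern like this will reject some older DOIs with unusual suffix characters, which fits the coverage figures quoted above: most, but not all, of the DOIs seen.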

Rehashing PIDs without stabbing myself in the eyeball

Anybody who knows me or reads this blog is probably aware that I don’t exactly hold back when discussing problems with the DOI system. But just occasionally I find myself actually defending the thing… About once a year somebody suggests that we could replace existing persistent citation identifiers (e.g. DOIs) with some new technology that would fix some of the weaknesses of the current systems. Usually said person is unhappy that current systems like

January 2015 DOI Outage: Followup Report


On January 20th, 2015 the main DOI HTTP proxy at experienced a partial, rolling global outage. The system was never completely down, but for at least part of the subsequent 48 hours, up to 50% of DOI resolution traffic was effectively broken. This was true for almost all DOI registration agencies, including Crossref, DataCite and mEDRA.

At the time we kept people updated on what we knew via Twitter, mailing lists and our technical blog at CrossTech. We also promised that, once we’d done a thorough investigation, we’d report back. Well, we haven’t finished investigating all the implications of the outage. There are substantial technical and governance issues to investigate. But last week we provided a preliminary report to the Crossref board on the basic technical issues, and we thought we’d share that publicly now.

Crossref’s DOI Event Tracker Pilot


Crossref’s “DOI Event Tracker Pilot”: 11 million+ DOIs & 64 million+ events. You can play with it at:

Tracking DOI Events

So have you been wondering what we’ve been doing since we posted about the experiments we were conducting using PLOS’s open source ALM code? A lot, it turns out. About a week after our post, we were contacted by a group of our members from OASPA who expressed an interest in working with the system. Apparently they were all about to conduct similar experiments using the ALM code, and they thought that it might be more efficient and interesting if they did so together using our installation. Yippee. Publishers working together. That’s what we’re all about.

Problems with on January 20th 2015- what we know.

Hell’s teeth.

So today (January 20th, 2015) the DOI HTTP resolver at started to fail intermittently around the world. The domain is managed by CNRI on behalf of the International DOI Foundation. This means that the problem affected all DOI registration agencies including Crossref, DataCite, mEDRA etc. This also means that more popularly known end-user services like FigShare and Zenodo were affected. The problem has been fixed, but the fix will take some time to propagate throughout the DNS system. You can monitor the progress here:

Now for the embarrassing stuff…

♫ Researchers just wanna have funds ♫

Summary: You can use a new Crossref API to query all sorts of interesting things about who funded the research behind the content Crossref members publish.

Background: Back in May 2013 we launched Crossref’s FundRef service. It can be summarized like this: Crossref keeps and manages a canonical list of Funder Names (ephemeral) and associated identifiers (persistent). We encourage our members (or anybody, really; the list is available under a CC-Zero license waiver) to use this list for collecting information on who funded the research behind the content that our members publish.
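To give a feel for what querying funder information might look like, here is a small sketch that only builds request URLs. The excerpt does not name the API's address or routes, so the base URL `https://api.crossref.org`, the `/funders` and `/funders/{id}/works` routes, and the `query` and `rows` parameters are assumptions based on Crossref's public REST API; the helper names and the sample funder identifier are illustrative.

```python
from urllib.parse import quote, urlencode

# Assumption: the excerpt does not give the API address; this is the public
# Crossref REST API root.
API_ROOT = "https://api.crossref.org"


def funder_search_url(name: str, rows: int = 5) -> str:
    """Build a URL that searches the funder list by name."""
    return f"{API_ROOT}/funders?" + urlencode({"query": name, "rows": rows})


def funder_works_url(funder_id: str, rows: int = 5) -> str:
    """Build a URL listing works that acknowledge a given funder identifier."""
    return f"{API_ROOT}/funders/{quote(funder_id, safe='')}/works?" + urlencode({"rows": rows})


if __name__ == "__main__":
    print(funder_search_url("National Science Foundation"))
    # The identifier below is a sample value used for illustration.
    print(funder_works_url("100000001"))
```

Fetching either URL with any HTTP client should return JSON, but the response shape is not described in the excerpt, so it is left out here.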

DOIs unambiguously and persistently identify published, trustworthy, citable online scholarly literature. Right?

The South Park movie, “Bigger, Longer & Uncut” has a DOI: a) So does the pornographic movie, “Young Sex Crazed Nurses”: b) And the following DOI points to a fake article on a “Google-Based Alien Detector”: c) And the following DOI refers to an infamous fake article on literary theory: d) This scholarly article discusses the entirely fictitious Australian “Drop Bear”: e)

DataCite supporting content negotiation

In April, Crossref started supporting content negotiation for its DOIs. At the time I cheekily called out DataCite to start supporting content negotiation as well. Edward Zukowski (DataCite’s resident propeller-head) took up the challenge with gusto and, as of September 22nd, DataCite has also been supporting content negotiation for its DOIs. This means that one million more DOIs are now linked-data friendly. Congratulations to Ed and the rest of the team at DataCite. We hope this is a trend.

Content Negotiation for Crossref DOIs

So does anybody remember the posting DOIs and Linked Data: Some Concrete Proposals? Well, we went with option “D.” From now on, DOIs, expressed as HTTP URIs, can be used with content negotiation. Let’s get straight to the point. If you have curl installed, you can start playing with content negotiation and Crossref DOIs right away:

curl -D - -L -H "Accept: application/rdf+xml" ""
curl -D - -L -H "Accept: text/turtle" "http://dx.
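For anyone who would rather script this than use curl, here is a minimal Python sketch of the same idea: ask for a DOI's HTTP URI while sending an Accept header that names the representation you want. The resolver host `https://doi.org/` and the sample DOI are assumptions, since the URLs in the excerpt above are cut off, and `build_conneg_request` is a made-up helper name.

```python
import urllib.request
from urllib.parse import quote

# Assumption: the excerpt's URLs are truncated, so the resolver host is a guess.
RESOLVER = "https://doi.org/"


def build_conneg_request(doi: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Prepare a GET request for a DOI that asks for a specific media type."""
    url = RESOLVER + quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": media_type})


if __name__ == "__main__":
    # Sample DOI used for illustration only.
    req = build_conneg_request("10.5555/12345678", "application/rdf+xml")
    print(req.full_url, req.get_header("Accept"))
    # Sending it is left to the reader. urlopen follows redirects by default,
    # much like curl's -L flag:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.headers.get("Content-Type"))
```

The request is built separately from being sent so the header can be inspected without touching the network.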



admin – 2010 August 03

In Identifiers, PDF, XMP, InChI

Just a quick heads-up to say that we’ve had a go at incorporating InChIs and ontology terms into our PDFs with XMP. There isn’t a lot of room in an XMP packet so we’ve had to be a bit particular about what we include. InChIs: the bigger the molecule the longer the InChI, so we’ve standardized on the fixed-length InChIKey. This doesn’t mean anything on its own, so we’ve gone the Semantic Web route of including an InChI resolver HTTP URI.