Warnings, Caveats and Weasel Words

Most of the experiments linked to here are running on R&D equipment in a non-production environment. They may disappear without warning and/or perform erratically. If one of them isn’t working for some reason, come back later and try again.

Resolving Citations (we don’t need no stinkin’ parser)


If you are reading this, you may be faced with the following problem: you have a collection of free-form citations which you have copied from a scholarly article, and you want to import them into a bibliographic management tool (or other database). In short, you would like to turn something like this:

Carberry, J 2008, “Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory.” Journal of Psychoceramics, vol. 5, no. 11, pp. 1-3.

Into something more like this:

@article{Carberry_2008, title={Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory}, volume={5}, url={http://dx.doi.org/10.5555/12345678}, DOI={10.5555/12345678}, number={11}, journal={Journal of Psychoceramics}, publisher={Society of Psychoceramics}, author={Carberry, Josiah}, year={2008}, month={Aug}, pages={1-3}}

Or even this:

TY - JOUR
JO - Journal of Psychoceramics
AU - Josiah Carberry
SN - 0264-3561
TI - Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory
SP - 1
EP - 3
VL - 5
PB - Society of Psychoceramics
PY - 2008

The traditional approach to this is often “We’ll start by trying to parse the citation into its component parts.” Indeed, there are a number of tools that try to do this:

Which is cool, but parsing citations is very difficult, particularly with obscure and/or terse citation styles.

But there is another way!

Instead of trying to parse the citation, just search for the record in a database that already has the citation parsed. The Crossref REST API is remarkably good for this. For example:

https://api.crossref.org/works?query.bibliographic=Carberry%2C+Josiah.+%E2%80%9CToward+a+Unified+Theory+of+High-Energy+Metaphysics%3A+Silly+String+Theory.%E2%80%9D+Journal+of+Psychoceramics+5.11+%282008%29%3A+1-3.#

Gives you the following result:

{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2017,10,26]],"date-time":"2017-10-26T06:16:09Z","timestamp":1508998569281},"reference-count":6,"publisher":"CrossRef Test Account","issue":"11","license":[{"URL":"http:\/\/psychoceramicsproprietrylicenseV1.com","start":{"date-parts":[[2011,11,21]],"date-time":"2011-11-21T00:00:00Z","timestamp":1321833600000},"delay-in-days":1195,"content-version":"tdm"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CHE-1152342"]},{"DOI":"10.13039\/100006151","name":"Basic Energy Sciences","doi-asserted-by":"publisher","award":["DE-SC0001091"]}],"content-domain":{"domain":["psychoceramics.labs.crossref.org"],"crossmark-restriction":true},"short-container-title":["Journal of Psychoceramics"],"published-print":{"date-parts":[[2008,8,14]]},"DOI":"10.5555\/12345678","type":"journal-article","created":{"date-parts":[[2011,11,9]],"date-time":"2011-11-09T14:42:05Z","timestamp":1320849725000},"page":"1-3","update-policy":"http:\/\/dx.doi.org\/10.5555\/crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory"],"prefix":"10.5555","volume":"5","clinical-trial-number":[{"clinical-trial-number":"isrctn12345","registry":"10.18810\/isrctn"}],"author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-1825-0097","authenticated-orcid":true,"given":"Josiah","family":"Carberry","affiliation":[]}],"member":"7822","published-online":{"date-parts":[[2008,8,13]]},"container-title":["Journal of Psychoceramics"],"original-title":[],"deposited":{"date-parts":[[2016,1,20]],"date-time":"2016-01-20T15:44:56Z","timestamp":1453304696000},"score":1.0,"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,8,13]]},"references-count":6,"URL":"http:\/\/dx.doi.org\/10.5555\/12345678","relation":{"references":[{"id-type":"doi","id":"10.5284\/1000389","asserted-by":"object"}]},"ISSN":["0264-3561"],"issn-type":[{"value":"0264-3561","type":"electronic"}],"assertion":[{"value":"http:\/\/orcid.org\/0000-0002-1825-0097","URL":"http:\/\/orcid.org\/0000-0002-1825-0097","order":0,"name":"orcid","label":"ORCID","group":{"name":"identifiers","label":"Identifiers"}},{"value":"2012-07-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-08-29","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-09-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
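If you would rather do this from a script than paste URLs into a browser, the query above can be sketched in a few lines of Python. This is a minimal sketch, not an official client: the function names are made up for illustration, and limiting the search to one result with rows=1 is our own choice. The result shown above is a single work, whereas a search against /works normally wraps its matches in a "work-list" with an "items" array, so the sketch copes with both shapes.

```python
from urllib.parse import urlencode

API_WORKS = "https://api.crossref.org/works"


def build_query_url(citation, rows=1):
    """Build a /works URL that searches bibliographic fields only.

    `rows` limits how many candidate matches come back; 1 is an
    assumption that suits "give me the single best guess".
    """
    params = {"query.bibliographic": citation, "rows": rows}
    # urlencode percent-encodes the citation and turns spaces into '+'.
    return f"{API_WORKS}?{urlencode(params)}"


def extract_doi(response):
    """Return the DOI of the top match from a decoded JSON response.

    Handles a single "work" message (as shown above) and a
    "work-list" message whose matches live under message.items.
    Returns None when there is nothing to extract.
    """
    message = response.get("message", {})
    if response.get("message-type") == "work-list":
        items = message.get("items", [])
        message = items[0] if items else {}
    return message.get("DOI")
```

Feed the URL to whatever HTTP client you like, decode the JSON body, and hand the resulting dict to extract_doi.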

That’s already pretty cool. But if you extract the DOI from the above and use DOI content negotiation to query the DOI like this:

$ curl -LH "Accept: application/x-bibtex" http://dx.doi.org/10.5555/12345678

You get the following result in BibTeX:

@article{Carberry_2008, title={Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory}, volume={5}, url={http://dx.doi.org/10.5555/12345678}, DOI={10.5555/12345678}, number={11}, journal={Journal of Psychoceramics}, publisher={Society of Psychoceramics}, author={Carberry, Josiah}, year={2008}, month={Aug}, pages={1-3}}
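The same content negotiation request can be made without curl. The following is a rough Python equivalent using only the standard library; the function names and the 30-second timeout are assumptions for the sake of the example. The important part is the Accept header, which tells the resolver which format you want back.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Same resolver as the curl example above.
RESOLVER = "http://dx.doi.org/"


def negotiation_request(doi, accept="application/x-bibtex"):
    """Build (but do not send) a content negotiation request for a DOI."""
    # Keep the '/' between prefix and suffix; escape anything unusual.
    url = RESOLVER + quote(doi, safe="/")
    return Request(url, headers={"Accept": accept})


def fetch_citation(doi, accept="application/x-bibtex", timeout=30):
    """Send the request and return the formatted citation as text.

    urlopen follows redirects by default, much like curl's -L flag.
    """
    with urlopen(negotiation_request(doi, accept), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch_citation("10.5555/12345678"))
```

Swapping the accept argument for another supported media type should give you other formats; for RIS we believe the type is application/x-research-info-systems, but check the content negotiation documentation before relying on it.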

Yay!

There, that wasn’t too hard, was it?

OK, what is the catch?

Well… using the Crossref REST API has a number of limitations that you should be aware of:

  • Crossref metadata contains more than just bibliographic metadata. You need to use query.bibliographic if you want to restrict your query to just bibliographic information. Otherwise you may get false positives.
  • The API will almost always match *something*. You need to look at the score in order to determine the likelihood that you’ve got a correct match.
  • It only works on content listed in Crossref’s database. Still, this is a lot of content.
  • The metadata in Crossref’s database can sometimes be… spotty*
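Because the API will almost always match something, it is worth guarding against weak matches in code. Here is one way that check might look. Treat it as a sketch: best_match is a hypothetical helper, and the cutoff of 60.0 is an arbitrary placeholder, not a recommended value. Scores are relative, so you will need to tune the threshold against citations whose correct DOI you already know.

```python
def best_match(items, min_score=60.0):
    """Return the highest-scoring item, or None if no item is convincing.

    `items` is a list of work records as decoded from the API response,
    each expected to carry a numeric "score". `min_score` is a
    placeholder cutoff to be tuned against your own data.
    """
    if not items:
        return None
    top = max(items, key=lambda item: item.get("score", 0.0))
    if top.get("score", 0.0) < min_score:
        # The API returned something, but not something we trust.
        return None
    return top


candidates = [
    {"DOI": "10.5555/12345678", "score": 95.2},
    {"DOI": "10.5555/87654321", "score": 31.7},
]
match = best_match(candidates)
print(match["DOI"] if match else "no confident match")
```

Returning None for low scores lets the calling code route those citations to a human (or to a parser) instead of silently importing the wrong record.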

But using the API also has a big benefit: you get fewer false negatives. If you have a typo or incomplete metadata, it will do a much better job than a strict citation parser or OpenURL query.

In short, the Crossref REST API is very good at resolving citations. We encourage you to try it and let us know how it works for you.

Note that if you are having trouble getting hold of free-form citations to begin with, you may want to use the Cermine tool for extracting citations from PDFs.

(*unmitigated bilge)

Page owner: Geoffrey Bilder   |   Last updated 2017-November-29