Crossref Labs is happy to announce the first public release of “pdf-extract”, an open source set of tools and libraries for extracting citation references (and, eventually, other semantic metadata) from PDFs. We first demonstrated this tool to Crossref members at our annual meeting last year. See the pdf-extract labs page for a detailed introduction to this new set of tools.
If you are unable to download and install the tool, you can play with an experimental web interface called “Extracto.” Be warned, Extracto is running on a very feeble server using an erratic and slow internet connection. The only guarantee that we can make about using it is that it will repeatedly fall over and annoy you. The weasel has spoken.
PHD Comics has posted its Valentine’s Day Reading list. Without DOIs! So in order to preserve the scholarly citation record, we’ve resolved those that have DOIs….
Title: The St. Valentine’s Day Frontal Passage
Citation: Sassen, K, 1980, ‘The St. Valentine’s Day Frontal Passage’, Bulletin of the American Meteorological Society, vol. 61, no. 2, p. 122.
Crossref DOI: http://dx.doi.org/10.1175/1520-0477(1980)0612.0.CO;2
Title: SUICIDE AND HOMICIDE ON ST. VALENTINE’S DAY
Citation: LESTER, D, 1990, ‘SUICIDE AND HOMICIDE ON ST.
Today two new content types were added to dx.doi.org resolution for Crossref DOIs. These allow anyone to retrieve DOI bibliographic metadata as formatted bibliographic entries. To perform the formatting we’re using the Citation Style Language (CSL) processor citeproc-js, which supports a shed load of citation styles and locales.
In fact, all the styles and locales found in the CSL repositories, including many common styles such as bibtex, apa, ieee, harvard, vancouver and chicago are supported.
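As a rough sketch of how a client might ask for a formatted entry, the helper below builds an Accept header value that carries a CSL style and a locale. The excerpt above does not name the new content types, so the media type `text/x-bibliography` and its `style` and `locale` parameters are an assumption taken from Crossref's later content negotiation documentation; the function name is purely illustrative.

```python
# Assumption: the media type and parameter names below are not given in
# the post; they follow Crossref's later content negotiation docs.
BIBLIOGRAPHY_TYPE = "text/x-bibliography"


def bibliography_accept(style: str = "apa", locale: str = "en-US") -> str:
    """Build an Accept header value requesting a formatted citation."""
    for label, value in (("style", style), ("locale", locale)):
        # Reject empty values and characters that would break the header.
        if not value or any(ch in value for ch in ";,= \t\r\n"):
            raise ValueError(f"invalid {label}: {value!r}")
    return f"{BIBLIOGRAPHY_TYPE}; style={style}; locale={locale}"


if __name__ == "__main__":
    # Print the header a client would send for an IEEE-style entry.
    print("Accept:", bibliography_accept("ieee"))
```

The resulting value can be passed to any HTTP client as the `Accept` header when resolving a DOI URL.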
We’ve been asked a few times if it is possible to determine whether or not a particular domain name belongs to a Crossref member. To address this we’re launching another small service that performs something like a “reverse look-up” of URLs and domain names to DOIs and Crossref member status.
The service provides an API that will attempt to reverse look-up a URL to a DOI and return the membership status (member or non-member) of the root domain of the URL.
In April, Crossref began supporting content negotiation for its DOIs. At the time I cheekily called out DataCite to start supporting content negotiation as well.
Edward Zukowski (DataCite’s resident propeller-head) took up the challenge with gusto and, as of September 22nd, DataCite has also been supporting content negotiation for its DOIs. This means that one million more DOIs are now linked-data friendly. Congratulations to Ed and the rest of the team at DataCite.
We hope this is a trend.
Today I’m announcing a small web API that wraps a family name database here at Crossref R&D. The database, built from Crossref’s metadata, lists all unique family names that appear as contributors to articles, books, datasets and so on that are known to Crossref. As such, the database likely accounts for the majority of family names represented in the scholarly record.
The web API comes with two services: a family name detector that will pick out potential family names from chunks of text and a family name autocompletion system.
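To make the two services concrete, here is a toy, in-memory sketch of what a detector and an autocompleter do. It is not the Crossref implementation and it does not call the web API: the name list, the function names and the tokenising rule are all hypothetical stand-ins for the real database.

```python
import bisect
import re

# Hypothetical stand-in for the family name database.
FAMILY_NAMES = ["Garcia", "Nguyen", "Okafor", "Smith", "Smithson", "Smythe"]

# Sorted (case-folded key, display name) pairs allow binary search by prefix.
_INDEX = sorted((name.casefold(), name) for name in FAMILY_NAMES)
_KEYS = [key for key, _ in _INDEX]
_KNOWN = set(_KEYS)

# A word is a run of letters, optionally joined by hyphens or apostrophes.
_WORD = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def autocomplete(prefix: str, limit: int = 10) -> list:
    """Return up to `limit` known names that start with `prefix`."""
    key = prefix.casefold()
    if not key:
        return []
    matches = []
    # Jump to the first key >= prefix, then walk while the prefix matches.
    for candidate, name in _INDEX[bisect.bisect_left(_KEYS, key):]:
        if not candidate.startswith(key) or len(matches) >= limit:
            break
        matches.append(name)
    return matches


def detect(text: str) -> list:
    """Return the words in `text` that appear in the name list."""
    return [w for w in _WORD.findall(text) if w.casefold() in _KNOWN]


if __name__ == "__main__":
    print(autocomplete("smi"))
    print(detect("Reviewed by J. Smith and A. Nguyen in 2011"))
```

A real detector would need more than a dictionary lookup, since many family names are also ordinary words; this sketch only shows the shape of the two operations.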
So does anybody remember the post “DOIs and Linked Data: Some Concrete Proposals”?
Well, we went with option “D.”
From now on, DOIs, expressed as HTTP URIs, can be used with content-negotiation.
Let’s get straight to the point. If you have curl installed, you can start playing with content-negotiation and Crossref DOIs right away:
curl -D - -L -H "Accept: application/rdf+xml" "http://dx.doi.org/10.1126/science.1157784"
curl -D - -L -H "Accept: text/turtle" "http://dx.
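For readers without curl, the same kind of request can be sketched with Python's standard library. The DOI URL and media types are the ones from the curl commands above; the function names are illustrative. `urllib` follows redirects by default, which plays the role of curl's `-L` flag.

```python
import urllib.request
from typing import Tuple

# DOI URL taken from the first curl command above.
DOI_URL = "http://dx.doi.org/10.1126/science.1157784"


def negotiated_request(url: str, media_type: str) -> urllib.request.Request:
    """Build a GET request that asks for `media_type` via the Accept header."""
    return urllib.request.Request(url, headers={"Accept": media_type})


def fetch(url: str, media_type: str, timeout: float = 30.0) -> Tuple[str, str]:
    """Send the request and return (Content-Type, body text)."""
    request = negotiated_request(url, media_type)
    # urlopen follows redirects automatically, like `curl -L`.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        body = response.read().decode("utf-8", errors="replace")
    return content_type, body


if __name__ == "__main__":
    # Requires network access; prints the negotiated type and a preview.
    content_type, body = fetch(DOI_URL, "text/turtle")
    print(content_type)
    print(body[:500])
```

Swapping `"text/turtle"` for `"application/rdf+xml"` mirrors the first curl command.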
Announcements regarding Crossref system status or changes are posted in an Announcements forum on our support portal (http://support.crossref.org). We recommend that someone from your organization monitor this forum to stay informed about Crossref system status, schema changes, or other issues affecting deposits and queries. Subscribe to this forum via RSS feed (http://support.crossref.org/forums/147622-announcements/posts.rss) or select the ‘Subscribe’ option in the forum to subscribe by email.
The TWG Discussion forum replaces the TWG mailing list and can be accessed by members of the Crossref community who log in to our support portal.
While working on an internal project, we developed “pdfstamp”, a command-line tool that allows one to easily apply linked images to PDFs. We thought some in our community might find it useful and have released it on GitHub. Some more PDF-related tools will follow soon.
Just a quick heads-up to say that we’ve had a go at incorporating InChIs and ontology terms into our PDFs with XMP. There isn’t a lot of room in an XMP packet so we’ve had to be a bit particular about what we include.
InChIs: the bigger the molecule the longer the InChI, so we’ve standardized on the fixed-length InChIKey. This doesn’t mean anything on its own, so we’ve gone the Semantic Web route of including an InChI resolver HTTP URI.
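A small sketch of that idea: check that a string has the fixed InChIKey shape (27 characters: 14 letters, a hyphen, 10 letters, a hyphen, 1 letter) and append it to a resolver base URI. The post does not say which resolver is used, so `RESOLVER_BASE` is a placeholder and the function name is hypothetical.

```python
import re

# Standard InChIKey layout: 14 + 10 + 1 uppercase letters, hyphen-separated.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# Placeholder only: the resolver used in the PDFs is not named in the post.
RESOLVER_BASE = "http://resolver.example.org/inchikey/"


def resolver_uri(inchikey: str, base: str = RESOLVER_BASE) -> str:
    """Return an HTTP URI for `inchikey`, rejecting malformed keys."""
    if not INCHIKEY_RE.fullmatch(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return base + inchikey


if __name__ == "__main__":
    # InChIKey for ethanol.
    print(resolver_uri("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))
```

Because the key has a fixed length whatever the size of the molecule, the resulting URI takes a predictable amount of space in the XMP packet.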