
Content Negotiation for CrossRef DOIs

So, does anybody remember the post "DOIs and Linked Data: Some Concrete Proposals"?

Well, we went with option "D."

From now on, DOIs expressed as HTTP URIs can be used with content negotiation.

Let's get straight to the point. If you have curl installed, you can start playing with content negotiation and CrossRef DOIs right away. In these commands, -D - prints the response headers, -L follows the redirect, and -H sets the Accept header:

curl -D - -L -H "Accept: application/rdf+xml" "http://dx.doi.org/10.1126/science.1157784"

curl -D - -L -H "Accept: text/turtle" "http://dx.doi.org/10.1126/science.1157784"

curl -D - -L -H "Accept: application/atom+xml" "http://dx.doi.org/10.1126/science.1157784"

Or if you are already using CrossRef's "unixref" format:

curl -D - -L -H "Accept: application/unixref+xml" "http://dx.doi.org/10.1126/science.1157784" 
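If you would rather script these requests than run curl, the following minimal Python sketch sends the same requests using only the standard library's urllib.request.

- The FORMATS table restates the media types from the curl examples. Its short keys are our own labels.
- doi_request and fetch are illustrative helper names. They are not part of any CrossRef API.

```python
import urllib.request

# Media types from the curl examples; the short keys are illustrative labels.
FORMATS = {
    "rdf": "application/rdf+xml",
    "turtle": "text/turtle",
    "atom": "application/atom+xml",
    "unixref": "application/unixref+xml",
}


def doi_request(doi: str, fmt: str) -> urllib.request.Request:
    """Build a GET for the DOI's HTTP URI with an Accept header (like curl -H)."""
    return urllib.request.Request(
        "http://dx.doi.org/" + doi,
        headers={"Accept": FORMATS[fmt]},
    )


def fetch(doi: str, fmt: str) -> tuple[str, str, str]:
    """Send the request and return (final URL, Content-Type, body).

    urlopen follows redirects by default (like curl -L) and carries the
    Accept header over to the redirected request.
    """
    with urllib.request.urlopen(doi_request(doi, fmt), timeout=30) as resp:
        content_type = resp.headers.get("Content-Type", "")
        body = resp.read().decode("utf-8", errors="replace")
        return resp.geturl(), content_type, body
```

For example, fetch("10.1126/science.1157784", "turtle") should return three things, assuming the resolver behaves as this post describes:

- the URL the DOI finally redirected to;
- the Content-Type the server chose;
- the Turtle body.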

This works with over 46 million CrossRef DOIs as of today, but the beauty of the setup is that, from now on, any DOI registration agency can enable content negotiation for its constituencies as well. DataCite, we're looking at you ;-)

It also means that as registration agency members (CrossRef publishers, for instance) start providing more complete and richer representations of their content, we can simply redirect content-negotiated requests to them.
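On the server side, the redirect decision comes down to picking one representation from the client's Accept header. The Python below is a simplified sketch, not a description of how the DOI proxy actually works. It makes these assumptions:

- It parses q-values but ignores the HTTP specificity rules for overlapping media ranges.
- It uses a hypothetical member_services table that maps media types to URL templates a publisher might register.

```python
from typing import Optional

# Types the agency can always serve itself (those in the curl examples).
AGENCY_TYPES = [
    "application/rdf+xml",
    "text/turtle",
    "application/atom+xml",
    "application/unixref+xml",
]


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    # sorted() is stable, so ranges with equal q keep the client's order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """True if a range such as */*, text/* or text/turtle covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(header: str, offered: list[str]) -> Optional[str]:
    """Return the offered type the client prefers most, or None."""
    for media_range, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for candidate in offered:
            if matches(media_range, candidate):
                return candidate
    return None


def route(
    doi: str, accept: str, member_services: dict[str, str]
) -> tuple[Optional[str], Optional[str]]:
    """Hypothetical routing: (chosen type, member URL, or None if the agency answers)."""
    offered = list(member_services) + [t for t in AGENCY_TYPES if t not in member_services]
    chosen = negotiate(accept, offered)
    if chosen is None:
        return None, None  # a real server might answer 406 Not Acceptable
    template = member_services.get(chosen)
    return chosen, (template.format(doi=doi) if template else None)
```

Listing member-registered types first in `offered` means that a `*/*` request goes to the member's representation whenever one exists. Whether that is the right policy is a judgment call.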

We expect that this development will round out CrossRef's efforts to support standard APIs, including OpenURL and OAI-PMH, and we look forward to seeing DOIs increasingly used in linked data applications.

Finally, CrossRef would just like to thank the IDF and CNRI for their hard work on this as well as Tony Hammond and Leigh Dodds for their valuable advice and persistent goading.







Comments

In a word: awesome!

Is the unixref option suitable for production use? I also notice it doesn't have a "pid", unlike the current version.

How about a JSONP variant that can be called from the browser?

This is potentially great.

If DataCite DID participate in this system, would content-negotiated requests to dx.doi.org for DataCite-registered DOIs return data in the same formats, using the same vocabularies?

Your CrossRef examples return atom+xml using PRISM and DC, used in particular ways: you have to know that dc:isPartOf is going to be the ISSN. The rdf+xml example seems to use DC and OWL, again used in particular ways. And the turtle example, despite being RDF, uses yet different vocabularies. It looks like DCterms and PRISM again, and again used in particular ways.

So suppose I'm writing software to use this, and I want it to work regardless of whether the DOI came from CrossRef or DataCite. My software not only needs to know whether it's getting atom+xml, rdf+xml, or turtle back. It also needs to know which vocabularies will be used inside these wrapper formats. In particular, it needs to know the semantic choices about how those vocabularies are used. Using dc:isPartOf for an ISSN is one example, a convention that isn't really obvious from dc:isPartOf on its own.

Is there, or will there be, DOI documentation on vocabulary choices? Client writers need something they can consult, and metadata providers (whether CrossRef, DataCite, or someone else) need something they will adhere to. Otherwise, this is only marginally better than the previous status quo, where every registration agency provided its own metadata lookup (or didn't) using its own custom semantics.

Thanks to Geoffrey and colleagues for this very positive development. I'm pleased to say that DataCite will be working towards the same content-negotiation solution as CrossRef.

Great news! Have you guys considered using the Vary: Accept response header so this plays nice with downstream caches?

One other thing I was wondering about is whether the URIs for people that you include in the RDF (which is really exciting) will eventually resolve. For example: http://id.crossref.org/contributor/a-h-renear-1z0zrfd0bp2b7 from your example above...

First, apologies to everyone above who sent in comments. Apparently our MT comment system went berserk and flagged everything as spam. I was wondering why everything was so quiet...

So give me a bit of time and I will try to address the various questions that have come up.

@Ed Summers:

Responses now contain a Vary header.
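Here is why the Vary header matters. The same DOI URI can redirect to different places depending on Accept. A shared cache that keys only on the URL could therefore hand a Turtle response to a client that asked for RDF/XML.

The toy Python cache below shows the effect by folding the request headers named in Vary into its lookup key. It is a sketch, not a standards-complete HTTP cache: it ignores `Vary: *`, freshness and everything else a real cache handles.

```python
from typing import Optional


class VaryAwareCache:
    """Toy cache whose lookup key includes the request headers named in Vary."""

    def __init__(self) -> None:
        self._vary: dict[str, tuple[str, ...]] = {}  # url -> header names from Vary
        self._store: dict[tuple[str, ...], str] = {}  # lookup key -> cached body

    def _key(self, url: str, request_headers: dict[str, str]) -> tuple[str, ...]:
        # Header names are case-insensitive, so compare them lower-cased.
        lowered = {k.lower(): v for k, v in request_headers.items()}
        names = self._vary.get(url, ())
        return (url,) + tuple(lowered.get(name, "") for name in names)

    def put(self, url: str, request_headers: dict[str, str],
            response_headers: dict[str, str], body: str) -> None:
        vary = response_headers.get("Vary", "")
        self._vary[url] = tuple(h.strip().lower() for h in vary.split(",") if h.strip())
        self._store[self._key(url, request_headers)] = body

    def get(self, url: str, request_headers: dict[str, str]) -> Optional[str]:
        return self._store.get(self._key(url, request_headers))
```

With `Vary: Accept`, a request for application/rdf+xml misses the cached Turtle entry. Without the Vary header, the cache happily serves the Turtle body to the RDF/XML client, which is exactly the downstream-cache problem raised above.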

@Fergus Gallagher:

We will return JSON conversions of rdf+xml for the content type "application/rdf+json". Not what you're asking for, but I thought I'd mention it anyway.
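This sketch assumes that the application/rdf+json output follows the RDF/JSON layout documented in the W3C note "RDF 1.1 JSON Alternate Serialization". That is our assumption, not something stated here. Under it, the subject → predicate → list-of-objects map can be flattened into triples with a few lines of Python. The sample document, including its title and ISSN values, is made up for illustration.

```python
import json
from typing import Iterator


def rdf_json_triples(document: str) -> Iterator[tuple[str, str, str, str]]:
    """Yield (subject, predicate, object value, object type) from RDF/JSON text.

    Assumes the RDF/JSON layout {subject: {predicate: [{"type": ..., "value": ...}]}},
    where "type" is "uri", "literal" or "bnode".
    """
    data = json.loads(document)
    for subject, predicates in data.items():
        for predicate, objects in predicates.items():
            for obj in objects:
                yield subject, predicate, obj["value"], obj["type"]


# A made-up sample in the assumed layout (not real CrossRef output).
SAMPLE = json.dumps({
    "http://dx.doi.org/10.1126/science.1157784": {
        "http://purl.org/dc/terms/title": [
            {"type": "literal", "value": "Example title"}
        ],
        "http://purl.org/dc/terms/isPartOf": [
            {"type": "uri", "value": "urn:issn:1234-5678"}
        ],
    }
})
```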

@Ed Summers

The plan is to link authors once we have ORCIDs to link to.

@Jonathan Rochkind

The differences you detect between the atom and rdf representations are simply attempts to target each representation to what its particular community tends to expect. In short, most consumers of atom would probably have a meltdown if they encountered bibo; hence the use of the more common dc/prism.

As for establishing common practice in the use of bibo, it seems that this will emerge from usage, open discussion, etc. Our experience is that trying to do this a priori rarely works. Having said that, we have obviously been coordinating with Tony Hammond and NPG on this as well.

In short, we've deliberately chosen the "publish first, then tidy" approach. We expect we will tweak things, but hope we've got the basics right enough that any future adjustments won't be too painful for early users of the data. One of the advantages of RDF is that it allows for this kind of flexibility. Clearly, as a consumer of it, one has to engineer systems to be tolerant as well.
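This sketch takes up both the call for tolerant consumers and Jonathan's dc:isPartOf example. It is a hedged Python example that looks for a journal ISSN under several candidate predicates instead of betting on one. The predicate list is our assumption about what a client might encounter, including the PRISM namespace versions. It is not a documented CrossRef or DataCite vocabulary. Input is a plain list of (subject, predicate, object) strings.

```python
import re
from typing import Iterable, Optional

# Loose ISSN shape: four digits, hyphen, three digits, digit or X (no check-digit test).
ISSN_PATTERN = re.compile(r"\b(\d{4}-\d{3}[\dXx])\b")

# Assumed, non-exhaustive predicates that might carry an ISSN, in priority order.
ISSN_PREDICATES = (
    "http://prismstandard.org/namespaces/basic/2.1/issn",
    "http://prismstandard.org/namespaces/basic/2.0/issn",
    "http://purl.org/ontology/bibo/issn",
    "http://purl.org/dc/terms/isPartOf",  # e.g. an object like "urn:issn:..."
)


def find_issn(triples: Iterable[tuple[str, str, str]], subject: str) -> Optional[str]:
    """Return the first ISSN-looking value for subject, trying predicates in order."""
    by_predicate: dict[str, list[str]] = {}
    for s, p, o in triples:
        if s == subject:
            by_predicate.setdefault(p, []).append(o)
    for predicate in ISSN_PREDICATES:
        for value in by_predicate.get(predicate, []):
            match = ISSN_PATTERN.search(value)
            if match:
                return match.group(1).upper()
    return None
```

Trying predicates in a fixed priority order keeps the result deterministic when a record carries the ISSN under more than one vocabulary.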

Thanks for the follow-up comments, Geoffrey. One more thing came up in a discussion I had with Alf Eaton today: might it be possible to make the redirect URL available in the RDF?

For example:

<http://dx.doi.org/10.1126/science.1157784> foaf:page <http://www.sciencemag.org/cgi/doi/10.1126/science.1157784> .

Just a thought...

@Ed Summers

I realize that we are being inconsistent here, in that we return the URL in the unixref but not in the RDF. But the truth is, I'd rather we not return it in the unixref either.

The issue is this: no matter how many times we tell people about the dangers of caching the URLs, they tend to do it anyway. Then, when the link breaks because the URL has changed and they haven't re-polled it, people interpret it as "the DOI breaking."
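On the consumer side, one way to sidestep the stale-URL problem is to store the DOI's HTTP URI and let the resolver redirect at request time, rather than saving whichever landing page it redirected to last time. Below is a minimal Python sketch of building that persistent link.

- doi_link is a hypothetical helper name.
- The percent-encoding policy is our own assumption: everything except "/" and the unreserved characters is encoded, so characters such as "#" or "?" inside a DOI do not break the URL.

```python
from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"  # the resolver used throughout this post


def doi_link(doi: str) -> str:
    """Return the persistent HTTP URI to store for a DOI (hypothetical helper).

    Store this, not the landing-page URL it redirects to: the landing page
    may move, and the DOI's redirect is what gets updated when it does.
    """
    # quote() keeps letters, digits, "_.-~" and "/" as-is and
    # percent-encodes everything else, e.g. "#" becomes "%23".
    return RESOLVER + quote(doi.strip(), safe="/")
```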

And then they make fun of CrossRef.

And then we cry.

Why would anybody want to do that to us?

But seriously, if you have a good case for including it again, we're willing to reconsider. What are you wanting to do?

@gbilder I certainly don't want to make you cry :-) It just seemed like a key piece of information that isn't being included. I'm not really trying to do anything with the service at the moment.

Oh, and thanks for fixing the Accept header parsing! Several RDF tools now seem to be able to conneg fine.
