Since last month’s threads (here, here, here and here) talking about the issues involved in making the DOI a first-class identifier for linked data applications, I’ve had the chance to actually sit down with some of the threads’ participants (Tony Hammond, Leigh Dodds, Norman Paskin) and we’ve been able to sketch out some possible scenarios for migrating the DOI into a linked data world.
I think that several of us were struck by how little actually needs to be done in order to fully address virtually all of the concerns that the linked data community has expressed about DOIs. Not only that, but in some of these scenarios we would put ourselves in a position to semantically enable over 40 million DOIs with what amounts to the flick of a switch.
Given the huge interest in linked data on the part of researchers and CrossRef members, it seems like it would be a fantastic boon to both the IDF (International DOI Foundation) and CrossRef if we were able to do something quickly here.
Anyway, the following are notes outlining several concrete proposals for addressing the limitations of DOIs as identifiers in linked data applications. They range in complexity and effort involved, with the simplest scenario providing minimal (yet functional) LD capabilities for the members of just one RA (Registration Agency), CrossRef, and the most complex providing per-RA and per-RA-member configurability on how DOIs would behave for LD applications.
We’d appreciate comments, questions, suggestions, corrections, etc.
A: Simplest Scenario
What would need to be done?
- CrossRef implements a linked data service. For example, hosted at rdf.crossref.org.
- CrossRef recommends that any member publisher who wants to add rudimentary linked data capabilities to their site insert a few link elements into their landing pages. So, for instance, for the article with the DOI 10.5555/1234567 in the Journal of Psychoceramics, the publisher would put the following in the landing page for the article:
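A minimal sketch of how a publisher might generate such link elements for a landing page. The host rdf.crossref.org comes from the proposal and the `/metadata/<doi>.<ext>` path follows the example URL quoted in the comments further down; the `rel="alternate"` choice, the media types and the file extensions are assumptions made for illustration, not something the proposal fixes.

```python
from html import escape

# Host taken from the proposal; the path pattern is an assumption.
BASE = "http://rdf.crossref.org/metadata"

# Assumed media types and file extensions, for illustration only.
REPRESENTATIONS = {
    "application/rdf+xml": "rdf",
    "application/json": "json",
}


def link_elements(doi):
    """Build one <link rel="alternate"> element per metadata representation."""
    elements = []
    for mime, ext in REPRESENTATIONS.items():
        href = f"{BASE}/{doi}.{ext}"
        elements.append(
            f'<link rel="alternate" type="{escape(mime)}" href="{escape(href)}" />'
        )
    return elements


if __name__ == "__main__":
    for element in link_elements("10.5555/1234567"):
        print(element)
```

A publisher that already renders landing pages from templates would only need to emit these lines inside the page's head element.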
In the above snippet the HTML version of the document is the publisher’s existing landing page.
How it would work
- A sem-web-enabled browser would query dx.doi.org/10.5555/1234567 and get a normal 302 redirect to the publisher’s landing page.
- The sem-web-enabled browser would sniff the page for the link elements and retrieve the representations it wanted from rdf.crossref.org.
- The returned document would contain an appropriate representation of the metadata that the publisher has deposited with CrossRef. It would also assert that:
Alternatively, the publisher could implement their own linked data support on their own domain using whatever appropriate method they want. So, for instance, a larger publisher could support content negotiation at their site and return different/enhanced metadata, etc.
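The "sniffing" step in the flow above could look roughly like the following on the client side. This is only a sketch: it assumes the publisher marks the alternate representations with `rel="alternate"` and a `type` attribute, which the proposal does not prescribe, and it works on an HTML string rather than doing any network requests.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> elements, keyed by media type."""

    def __init__(self):
        super().__init__()
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "alternate" in rels and attr.get("type") and attr.get("href"):
            self.alternates[attr["type"].lower()] = attr["href"]


def find_alternate(landing_page_html, wanted_type):
    """Return the URL advertised for wanted_type, or None if the page has none."""
    finder = AlternateLinkFinder()
    finder.feed(landing_page_html)
    return finder.alternates.get(wanted_type.lower())


# Hypothetical landing page used only to exercise the function.
SAMPLE_PAGE = """
<html><head>
<title>Journal of Psychoceramics</title>
<link rel="alternate" type="application/rdf+xml"
      href="http://rdf.crossref.org/metadata/10.5555/1234567.rdf" />
</head><body>Article text</body></html>
"""

if __name__ == "__main__":
    print(find_alternate(SAMPLE_PAGE, "application/rdf+xml"))
```

A client that finds no matching link simply falls back to treating the landing page as an ordinary HTML document.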
Pros
- Doesn’t require changes at DOI/Handle levels
- Is easy for publishers to opt in or opt out
- Requires minimal development on the part of CrossRef.
Cons
- Only applies to CrossRef DOIs.
- It depends on publishers taking action. Might be a long time before publishers add the needed links to their landing pages or support content negotiation.
- DOI system is still not strictly LD compliant: it returns 302 redirects, and naive sem-web browsers might ‘stop’ after getting a 302. It should ideally use 303s, content negotiation, etc.
- Doesn’t work for DOIs that currently bypass landing pages and which go directly to content.
B: Simple + IDF Global Semantic Compliance
What would need to be done?
- Same as “Simplest Scenario”
- IDF globally changes dx.doi.org to return 303 redirect
How would it work?
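As a rough illustration of the one change this scenario asks of the proxy, here is a sketch of a lookup that answers with 303 See Other instead of 302 Found. In linked data practice a 303 tells the client that the DOI URI names a thing and that the redirect target is a document about it. The lookup table and the publisher URL are hypothetical stand-ins for the real Handle resolution.

```python
from http import HTTPStatus

# Hypothetical DOI-to-landing-page table standing in for the Handle lookup.
LANDING_PAGES = {
    "10.5555/1234567": "http://publisher.example/psychoceramics/1234567",
}


def proxy_response(doi, status=HTTPStatus.SEE_OTHER):
    """Return (status code, headers) the proxy would send for a DOI lookup."""
    target = LANDING_PAGES.get(doi)
    if target is None:
        return int(HTTPStatus.NOT_FOUND), {}
    return int(status), {"Location": target}


if __name__ == "__main__":
    # Proposed behaviour (303) next to the current behaviour (302).
    print(proxy_response("10.5555/1234567"))
    print(proxy_response("10.5555/1234567", HTTPStatus.FOUND))
```

Everything after the redirect stays exactly as in the simplest scenario; only the status code differs.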
Pros
- All DOIs conform to expectations for LD identifiers
- Easy for publishers to opt in or opt out
- Requires minimal development on part of CrossRef
- Requires minimal work (?) on part of IDF
Cons
- Requires global change on part of IDF. Global change might conflict with requirements of other RAs.
- It depends on publishers taking action. Might be a long time before publishers add needed links to their landing pages or support content negotiation.
- Doesn’t work for DOIs that currently bypass landing pages (e.g. OECD spreadsheets, UICR datasets, etc.)
C: Simple + IDF Global Semantic Compliance + RA CN Intercept
What would need to be done?
- Same as “B: Simple + IDF Global Semantic Compliance” Scenario
- IDF changes dx.doi.org to redirect content-negotiated dx.doi.org queries to RA-controlled resolver depending on the preferences of the RA.
- RA implements DOI resolver (e.g. dx.crossref.org) that supports content negotiation. RA allows its members to specify to the RA that they want either:
  - RA to forward all requests to the member’s site.
  - RA to “intercept” content-negotiated requests for non-HTML representations and direct them appropriately (e.g. return appropriate representation from rdf.crossref.org)
How would it work?
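One way to picture the RA-side "intercept" decision is the sketch below. It assumes a simplified reading of the Accept header (highest q-value wins, ties go to the first listed type), a hypothetical registry of member preferences keyed by DOI prefix, and a made-up path on rdf.crossref.org; none of these details are settled by the proposal.

```python
HTML_TYPES = {"text/html", "application/xhtml+xml", "*/*"}

# Hypothetical member preferences keyed by DOI prefix: "forward" or "intercept".
MEMBER_PREFERENCE = {"10.5555": "intercept"}


def preferred_type(accept_header):
    """Return the media type with the highest q-value (first one wins ties)."""
    best, best_q = "*/*", -1.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if q > best_q:
            best, best_q = media, q
    return best


def route(doi, accept_header, landing_page):
    """Decide where the RA resolver sends a request for this DOI."""
    media = preferred_type(accept_header)
    prefix = doi.split("/", 1)[0]
    if media in HTML_TYPES or MEMBER_PREFERENCE.get(prefix, "forward") == "forward":
        return landing_page
    # Assumed URL pattern for the RA-hosted metadata service.
    return f"http://rdf.crossref.org/metadata/{doi}"
```

Because the decision never touches the member's site, it would also cover DOIs that resolve straight to content rather than to a landing page.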
Pros
- All DOIs conform to expectations for LD identifiers
- Allows RA to potentially LD-enable its members very quickly.
- Easy for RA members to opt in or opt out
- Requires minimal development on part of CrossRef
- Would even work for DOIs that bypass landing pages
Cons
- Requires global change on part of IDF. Global change might conflict with requirements of other RAs.
- Requires change to add decision logic implementation on part of IDF.
- Requires development of RA resolvers that implement per-member resolution logic (note: this would probably actually be done at DOI level)
D: Simple + IDF Selective Semantic Compliance + RA CN Intercept
What would need to be done?
- Same as Simplest Scenario
- IDF changes dx.doi.org to return either 302 or 303 redirect depending on the preferences of the RA.
- IDF changes dx.doi.org to redirect content-negotiated dx.doi.org queries to RA-controlled resolver depending on the preferences of the RA.
- RA implements DOI resolver (e.g. dx.crossref.org) that supports content negotiation. RA allows its members to specify to the RA that they want either:
  - RA to forward all requests to the member’s site.
  - RA to “intercept” content-negotiated requests for non-HTML representations and direct them appropriately (e.g. return appropriate representation from rdf.crossref.org)
How would it work?
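The per-RA decision logic on the IDF side might be sketched as follows. The preference record, the prefix-to-RA table and the default values are all assumptions; dx.crossref.org is the example resolver named in the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RAPreferences:
    """Hypothetical per-RA settings held by the IDF proxy."""
    redirect_status: int = 302              # RAs that have not opted in keep 302
    conneg_resolver: Optional[str] = None   # where content-negotiated requests go


# Hypothetical mapping from DOI prefix to the owning RA's preferences.
RA_BY_PREFIX = {
    "10.5555": RAPreferences(redirect_status=303,
                             conneg_resolver="http://dx.crossref.org"),
}
DEFAULT_PREFERENCES = RAPreferences()


def idf_redirect(doi, landing_page, content_negotiated):
    """Return (status, location) for a request arriving at dx.doi.org."""
    prefix = doi.split("/", 1)[0]
    prefs = RA_BY_PREFIX.get(prefix, DEFAULT_PREFERENCES)
    if content_negotiated and prefs.conneg_resolver:
        return prefs.redirect_status, f"{prefs.conneg_resolver}/{doi}"
    return prefs.redirect_status, landing_page
```

An RA that changes nothing keeps today's behaviour, which is why only some DOIs end up conforming to LD expectations in this scenario.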
Pros
- Allows RA to potentially LD-enable its members very quickly.
- Easy for RA members to opt in or opt out
- Requires minimal development on part of CrossRef
- Would even work for DOIs that bypass landing pages
Cons
- Only some DOIs conform to expectations for LD identifiers
- Requires change to add decision logic implementation on part of IDF.
- Requires development of RA resolvers that implement per-member resolution logic (note: this would probably actually be done at DOI level)

Thanks very much Geoffrey (and collaborators!) for putting this together! I think this does a better job considering the higher-level concerns than e.g. my post DOIs, URIs and Cool Resolution did a month ago.
I’d like to point out that what has not been made explicitly clear is how the content-type to Handle TYPE mapping (“decision logic” or “resolution logic” in the above) happens. I think there is an opportunity, perhaps in the implementation of the proposed rdf.crossref.org or a generic rdf.doi.org, for a standard TYPE mapping. RA members could, in the Handle record, associate type-specific URIs with a standard set of Handle TYPEs and have those automagically resolved.
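To make the idea of a standard TYPE mapping a little more concrete, here is a sketch of how a resolver could pick a URI from a Handle record by media type. `URL` is an existing Handle type; the other TYPE names, the record layout and the example URIs are invented for illustration and do not reflect any agreed standard.

```python
# Hypothetical media-type -> Handle TYPE mapping; only "URL" is an existing type.
TYPE_FOR_MEDIA = {
    "text/html": "URL",
    "application/rdf+xml": "URL.RDFXML",
    "application/json": "URL.JSON",
}


def resolve_by_type(handle_record, media_type):
    """Pick the value whose TYPE matches the media type, else the plain URL."""
    wanted = TYPE_FOR_MEDIA.get(media_type, "URL")
    for value in handle_record:
        if value["type"] == wanted:
            return value["data"]
    for value in handle_record:
        if value["type"] == "URL":
            return value["data"]
    return None


# Made-up Handle record for DOI 10.5555/1234567.
EXAMPLE_RECORD = [
    {"index": 1, "type": "URL", "data": "http://publisher.example/1234567"},
    {"index": 2, "type": "URL.RDFXML",
     "data": "http://rdf.crossref.org/metadata/10.5555/1234567.rdf"},
]
```

A member that registers only the plain URL value gets the fallback behaviour, which is one possible shape for the low-threshold option.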
How DOI conneg is implemented will probably be at the RA level, if not at the sub-RA (member) level, so having a way for members to choose their content-negotiation-savvy resolvers is a good idea, but there should be a low-threshold-to-entry option.
Finally, I respectfully submit that “Scenario 1” might only be “simple” for CrossRef.org and the IDF, since embedding the suggested RDFa will only scale for publishers who can generate and embed that code systematically; this is implied by your comment, “requires publishers to take action…”
Regarding scenario A, why not end the URIs with the DOI and put the format before the DOI? So, instead of:
http://rdf.crossref.org/metadata/10.5555/1234567.json
use
http://rdf.crossref.org/metadata/json/10.5555/1234567
I have seen rather weird DOIs, most certainly including dots. The latter would be slightly easier to ‘read’.
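A small sketch of why the position of the format matters when DOIs themselves contain dots. The two parsers correspond to the two URL styles above; the DOI 10.5555/report.v2 is made up, and the suffix-style parser assumes a server that would also accept a URL with no extension.

```python
PREFIX = "/metadata/"


def parse_suffix_style(path):
    """Parse /metadata/<doi>.<format> by splitting at the last dot."""
    rest = path[len(PREFIX):]
    doi, _, fmt = rest.rpartition(".")
    return doi, fmt


def parse_prefix_style(path):
    """Parse /metadata/<format>/<doi> by splitting at the first slash."""
    rest = path[len(PREFIX):]
    fmt, _, doi = rest.partition("/")
    return doi, fmt


if __name__ == "__main__":
    # Both styles agree on a simple DOI.
    print(parse_suffix_style("/metadata/10.5555/1234567.json"))
    print(parse_prefix_style("/metadata/json/10.5555/1234567"))
    # With no extension, the suffix style misreads part of the DOI as a format.
    print(parse_suffix_style("/metadata/10.5555/report.v2"))
    print(parse_prefix_style("/metadata/json/10.5555/report.v2"))
```

With the format in front, everything after it can be treated as the DOI without any guessing about where the DOI ends.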
Hi John:
The proposal here on the table considers the simplest changes that can be made to the current web interface to provide a Linked Data experience for web users of DOI. This is becoming an ever more pressing issue for content publishers. There is a wave of interest. When heads of state start talking about Linked Data then we are talking about the mainstream.
While it is certainly true that the handle backend architecture (e.g. TYPEs) could be leveraged to provide a more integrated systems solution, it does not seem likely that this will result in any quick wins. We have already had more than a decade to explore, define and implement something handle-wise and still nothing. It is all very disappointing.
So, blue skies is over.
As Geoff notes in his title, these are “concrete proposals” that can be effected with a minimum of system changes. The only change requested at the handle system level is for the proxy server to return an HTTP status code 303 instead of a 302, which accords better with current HTTP practice. The original proxy was created before HTTP 1.1 (RFC 2616) was defined back in June 1999, before the Architecture of the World Wide Web was published, before TAG issue httpRange-14 was resolved, before the Linked Data vision was articulated. In short, we have been operating with a classic HTTP implementation and it may now be time to make that minimal upgrade step so that a more coherent story can be implemented for DOI used over an HTTP interface.
Let’s take these first steps. Any handle rearchitecturing can follow in its own time.
Tony
Great to see that CrossRef and IDF, with the help of CNRI, have implemented Linked Data friendly DOIs! See: CrossRef and International DOI Foundation Collaborate on Linked-Data-Friendly DOIs. (20 April 2011)
See also: Content Negotiation for CrossRef DOIs