February 09, 2010

DOI: What Do We Got?

doi-what-do-we-got.png

Following the JISC seminar last week on persistent identifiers (#jiscpid on Twitter) there was some discussion about DOI and its role within a Linked Data context. John Erickson has responded with a very thoughtful post, DOIs, URIs and Cool Resolution, which ably summarizes the current problem with DOI: the way DOI is implemented by the handle HTTP proxy may not have kept pace with actual HTTP developments. (For example, John notes that the proxy is not capable of dealing with 'Accept' headers.) He has proposed a solution, and the post has attracted several comments.

I just wanted to offer here the above diagram in an attempt to corral some of the various facets relating to DOI that I am aware of. I realize that this may seem like an open invitation to flame on - and this is a very preliminary draft - but ... be kind!

So, this may be totally off the wall but it represents my best understanding of DOI as used by CrossRef.

I have distinguished three main contexts:

  1. Generic Data - A generalized information context where an object is identified with a DOI, an identifier system that is currently being ratified through the ISO process. This is the raw DOI number. (This definitely is not a first class object on the Web as it has no URI.)
  2. Web Data - An online information context (here I use the term 'Web' in its widest sense) where resources are identified by URI (not necessarily an HTTP URI). Here DOI is represented under two URI schemes: 'doi:' (unregistered but preferred by CrossRef), and 'info:' (registered and available for general URI use). Also it has a presence on the Web via an HTTP proxy (dx.doi.org) URL where it is used as a slug to create a permalink (as listed at 'A'). A simple HTTP redirect is used (with status code 302) to turn this permalink into the publisher response page http://example/1. (Note that typically a second redirect will occur on the publisher platform, here shown by the redirect to http://example/2.)
  3. Linked Data - An online information context where resources are identified by HTTP URI and conform to Linked Data principles. Now this is where a tension arises between the common publisher perspective and the strict semantic viewpoint. Implicit in the general Web context given above was the notion that the permalink ('A') was somehow related to the abstract object and that the redirection service applied to it associated the abstract resource with concrete representations of the object.
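
As a rough illustration of how the three identifier forms in the list above relate to the raw DOI, here is a minimal Python sketch that strips the 'doi:', 'info:doi/' and dx.doi.org prefixes. The function name raw_doi is hypothetical, and the sketch assumes no percent-encoding has been applied to the DOI.

```python
# Prefixes for the three forms discussed above: HTTP proxy URL,
# 'info:' URI and 'doi:' URI.
PREFIXES = ("http://dx.doi.org/", "info:doi/", "doi:")


def raw_doi(identifier: str) -> str:
    """Return the raw DOI from any of the three forms (hypothetical helper).

    Assumes the DOI part is not percent-encoded; an identifier with no
    known prefix is returned unchanged.
    """
    for prefix in PREFIXES:
        if identifier.lower().startswith(prefix):
            return identifier[len(prefix):]
    return identifier


for form in ("doi:10.1038/346822a0",
             "info:doi/10.1038/346822a0",
             "http://dx.doi.org/10.1038/346822a0"):
    print(form, "->", raw_doi(form))
```

All three forms reduce to the same raw DOI, which is the sense in which they are different handles on one identifier.
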
So how do we relate the DOI HTTP URI with the abstract ('work') identifier listed at 'D' in the diagram?

Well the Architecture of the World Wide Web recognizes two distinct classes of resources: Information Resources (IR) and Non-Information Resources (NR). (Note: Only the term 'information resource' is used in AWWW.) IR are those that can be directly retrieved using HTTP, whereas NR are not directly retrievable but have an associated description which is retrievable and is itself a proxy for the real world object.

So either the HTTP URI denotes an IR (as listed at 'B') and is resolved (through HTTP status code '302 Found') to a default representation, which is the view that the Linked Data community would currently have of DOI. But this is at odds with the CrossRef position, which regards DOI as identifying the abstract work. Alternatively, to fit the CrossRef model of DOI better, the HTTP URI would denote an NR (as listed at 'A') which would be resolved (through HTTP status code '303 See Other') to an associated description - a publisher response page.
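
The two readings can be put side by side in a small Python sketch. This is only a restatement of the argument above, not a description of how the proxy or any client behaves; the names READINGS and interpret_redirect are hypothetical.

```python
# The two readings of a redirect from the DOI proxy, as described above.
# Any other status code is left unspecified in this sketch.
READINGS = {
    302: ("information resource (IR)", "default representation"),
    303: ("non-information resource (NR)", "associated description"),
}


def interpret_redirect(status: int) -> str:
    """Describe what the DOI HTTP URI would denote under each reading."""
    kind, target = READINGS.get(status, ("unspecified", "unspecified"))
    return f"{status}: URI denotes {kind}; Location gives {target}"


print(interpret_redirect(302))
print(interpret_redirect(303))
```
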

There will be those self-appointed URI czars who will bemoan the fact of there being multiple URIs. But frankly there is nothing inherently wrong with that. Just as in the real world there are many languages so in the online world there are multiple contexts and histories. We can attempt to make some sense of this by making use of the well-known semantic properties owl:sameAs and ore:similarTo and declare (as also shown in the diagram) the following assertions:


info:doi/D owl:sameAs doi:D .

http://dx.doi.org/D ore:similarTo info:doi/D .

http://dx.doi.org/D ore:similarTo doi:D .


Note that ore:similarTo (stemming from the OAI-ORE work) is a weaker kind of relationship than owl:sameAs (which comes from OWL) and may be appropriate in this usage.
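
For a concrete DOI, the three assertions can be generated mechanically. The following Python sketch is one way to do that; the function name doi_assertions is hypothetical, the output uses the same informal notation as above rather than strict Turtle or N-Triples, and no percent-encoding of the DOI is attempted.

```python
def doi_assertions(doi: str) -> list:
    """Return the three (subject, predicate, object) assertions for a DOI."""
    info_uri = f"info:doi/{doi}"
    doi_uri = f"doi:{doi}"
    http_uri = f"http://dx.doi.org/{doi}"
    return [
        (info_uri, "owl:sameAs", doi_uri),
        (http_uri, "ore:similarTo", info_uri),
        (http_uri, "ore:similarTo", doi_uri),
    ]


for subject, predicate, obj in doi_assertions("10.1038/346822a0"):
    print(subject, predicate, obj, ".")
```
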

In sum, scenario 'A' is what we have currently implemented, scenario 'B' is what might be commonly perceived as being implemented, and scenario 'C' may be a more correct semantic position.

Your comments (and not unkind comments, please;) are more than welcome.

December 13, 2009

A Christmas Reading List... with DOIs

Was outraged (outraged, I tell you) that one of my favorite online comics, PhD, didn't include DOIs in their recent bibliography of Christmas-related citations. So I've compiled them below.

We care about these things so that you don't have to. Bet you will sleep better at night knowing this.

Or perhaps not...

A Christmas Reading List... with DOIs.

Title:  Christmas Disease
Citation:  Biggs, R, Douglas, A, Macfarlane, R, Dacie, J, Pitney, W, Merskey, C & O'Brien, J, 1952, 'Christmas Disease', BMJ, vol. 2, no. 4799, pp. 1378-1382.
CrossRef DOI:  http://dx.doi.org/10.1136/bmj.2.4799.1378

Title:  More Than a Labor of Love: Gender Roles and Christmas Gift Shopping
Citation:  Fischer, E & Arnold, S, 1990, 'More Than a Labor of Love: Gender Roles and Christmas Gift Shopping', Journal of Consumer Research, vol. 17, no. 3, p. 333.
CrossRef DOI:  http://dx.doi.org/10.1086/208561

Title:  Looking at Christmas trees in the nucleolus
Citation:  Scheer, U, Xia, B, Merkert, H & Weisenberger, D, 1997, 'Looking at Christmas trees in the nucleolus', Chromosoma, vol. 105, no. 7-8, pp. 470-480.
CrossRef DOI:  http://dx.doi.org/10.1007/s004120050209

Title:  The Vela glitch of Christmas 1988
Citation:  McCulloch, P, Hamilton, P, McConnell, D & King, E, 1990, 'The Vela glitch of Christmas 1988', Nature, vol. 346, no. 6287, pp. 822-824.
CrossRef DOI:  http://dx.doi.org/10.1038/346822a0

Title:  Cardiac Mortality Is Higher Around Christmas and New Year's Than at Any Other Time: The Holidays as a Risk Factor for Death
Citation:  Phillips, D, 2004, 'Cardiac Mortality Is Higher Around Christmas and New Year's Than at Any Other Time: The Holidays as a Risk Factor for Death', Circulation, vol. 110, no. 25, pp. 3781-3788.
CrossRef DOI:  http://dx.doi.org/10.1161/01.CIR.0000151424.02045.F7

Title:  Red Crabs in Rain Forest, Christmas Island: Biotic Resistance to Invasion by an Exotic Snail
Citation:  Lake, P & O'Dowd, D, 1991, 'Red Crabs in Rain Forest, Christmas Island: Biotic Resistance to Invasion by an Exotic Snail', Oikos, vol. 62, no. 1, p. 25.
CrossRef DOI:  http://dx.doi.org/10.2307/3545442

Title:  The Carvedilol Hibernation Reversible Ischaemia Trial, Marker of Success (CHRISTMAS) study Methodology of a randomised, placebo controlled, multicentre study of carvedilol in hibernation and heart failure
Citation:  Pennell, D, 2000, 'The Carvedilol Hibernation Reversible Ischaemia Trial, Marker of Success (CHRISTMAS) study Methodology of a randomised, placebo controlled, multicentre study of carvedilol in hibernation and heart failure', International Journal of Cardiology, vol. 72, no. 3, pp. 265-274.
CrossRef DOI:  http://dx.doi.org/10.1016/S0167-5273(99)00198-9

December 09, 2009

Add CrossRef metadata to PDFs using XMP

In order to encourage publishers and other content producers to embed metadata into their PDFs, we have released an experimental tool called "pdfmark". This open source tool allows you to add XMP metadata to a PDF. What's really cool is that if you give the tool a CrossRef DOI, it will look up the metadata in CrossRef and then apply said metadata to the PDF. More detail can be found on the pdfmark page on the CrossRef Labs site. The usual weasel words and excuses about "experiments" apply.

December 08, 2009

QR Codes and DOIs

Inspired by Google's recent promotion of QR Codes, I thought it might be fun to experiment with encoding a CrossRef DOI and a bit of metadata into one of the critters. I've put a short write-up of the experiment on the CrossRef Labs site, which includes a demonstration of how you can generate a QR Code for any given CrossRef DOI.

Put them on postcards and send them to your friends for the holidays. Tattoo them on your pets. The possibilities are endless.

November 24, 2009

got SEARCH if you want it!

[See this link if you're short on time: facets search client. Only tested on Firefox at this point. Caveat: At time of writing the CrossRef Metadata Search was being very slow but was still functional. Previously it was just slow.]

Following on from Geoff's announcement last month of a prototype CrossRef Metadata OpenSearch on labs.crossref.org, I wanted to show what typical OpenSearch responses might look like in a more mature implementation.

I have taken the liberty of modelling these on the response formats that we are already providing in our nature.com OpenSearch service which in turn are based on the draft syndication formats that I blogged here earlier.

I am therefore returning ATOM, JSON, JSONP and RSS responses from these four OpenSearch URL templates:

  • http://nurture.nature.com/cgi-bin/opensearch?db=crossref&out=atom&q={searchTerms}
  • http://nurture.nature.com/cgi-bin/opensearch?db=crossref&out=json&q={searchTerms}
  • http://nurture.nature.com/cgi-bin/opensearch?db=crossref&out=jsonp&q={searchTerms}
  • http://nurture.nature.com/cgi-bin/opensearch?db=crossref&out=rss&q={searchTerms}
as this OpenSearch description file details. Note that the URL templates include no indexing or pagination parameters as the CrossRef prototype does not currently support these features.
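
To make the use of the templates concrete, here is a minimal Python sketch that fills the {searchTerms} placeholder for each of the four formats. The helper name search_url is hypothetical, and the choice of quote_plus for encoding the search terms is an assumption about what the service accepts.

```python
from urllib.parse import quote_plus

# The four URL templates listed above, keyed by output format.
TEMPLATES = {
    fmt: "http://nurture.nature.com/cgi-bin/opensearch"
         f"?db=crossref&out={fmt}&q={{searchTerms}}"
    for fmt in ("atom", "json", "jsonp", "rss")
}


def search_url(fmt: str, terms: str) -> str:
    """Fill the {searchTerms} placeholder with URL-encoded search terms."""
    return TEMPLATES[fmt].replace("{searchTerms}", quote_plus(terms))


print(search_url("atom", "apple"))
print(search_url("json", "apple"))
```
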

An example query ('apple') returning an ATOM feed from a CrossRef Metadata OpenSearch would be the following:

And the same query returning a JSON version of that ATOM feed would look as follows:

By the way, this is just for demonstration purposes and there are still issues to be resolved including character encoding.

This interface uses the existing CrossRef OpenSearch response format and parses the COinS objects embedded in that response to provide a more standard OpenSearch syndication result set format. The prototype implementation also has some bugs which I needed to work around. (I will forward on details of these.) And there is also a more fundamental issue of response time from the experimental search server.
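
For readers unfamiliar with COinS, the parsing step can be sketched roughly as follows in Python. This is not the code behind the interface described here: the CoinsParser class and the sample span are made up for illustration, relying only on the COinS convention of a span with class "Z3988" whose title attribute carries URL-encoded key/value pairs.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class CoinsParser(HTMLParser):
    """Collect COinS records (spans with class 'Z3988') from HTML."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "span" and "Z3988" in classes:
            # The title attribute holds URL-encoded key/value pairs.
            self.records.append(parse_qs(attributes.get("title") or ""))


# Hypothetical sample span, built from a DOI listed elsewhere on this blog.
sample = ('<span class="Z3988" title="ctx_ver=Z39.88-2004'
          '&amp;rft_id=info%3Adoi%2F10.1038%2F346822a0'
          '&amp;rft.atitle=The+Vela+glitch+of+Christmas+1988"></span>')

parser = CoinsParser()
parser.feed(sample)
record = parser.records[0]
print(record["rft_id"][0])
print(record["rft.atitle"][0])
```

Each record is a dictionary of lists, since a key may repeat within one COinS object.
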

But still this should give some idea of what a CrossRef Metadata OpenSearch service could look like.

To show this all in action I've worked up one of my demo OpenSearch clients for nature.com OpenSearch which displays a faceted search response for a CrossRef search. For good measure this also includes an OpenSearch interface for PubMed, and the search client allows for simple selection between three journals databases: nature.com, CrossRef and PubMed.

Of course, with a reasonably uniform set of search result formats such as presented here it then becomes a simple exercise to reuse these search responses in additional search clients.

As can be anticipated it would be very straightforward to carry this over into a single metasearch service which could run across these multiple databases.
