
June 30, 2008

The Thing About DOI

With the Library of Congress announcing LCCN Permalinks some time back (Feb. '08), and NLM also introducing simplified web links for its PubMed identifier (Mar. '08), one might be forgiven for wondering what the essential difference is, from a purely web point of view, between a DOI name and these (and other) seemingly like-minded identifiers. Both of these identifiers can be accessed through very simple URL structures:


And the DOI itself can be resolved using an equally simple URL structure:

So, why does DOI not just present itself as a simple database number which is accessed through a simple web link and have done with it, e.g. a page for the object named by the DOI "10.1000/1" is retrieved from the DOI proxy server at http://dx.doi.org/?

Essentially, the typical DOI link presents an elementary web-based URL which performs a useful redirect service. What is different about this and, say, a PURL, which offers a similar redirect service? What's the big deal?


Well, the thing about DOI is that it is built upon a directory service - the Handle System - and can be accessed either through native directory calls or, more likely, through standard web interfaces. From a web point of view we are usually interested in the latter. Unlike a simple lookup and/or redirect service, which has a fixed entry point on the Web, the DOI can be serviced at any DOI service access point on the Internet. There are potentially multiple entry points, which can be hosted by different organizations with separate IP addresses and/or DNS names.
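To make the "multiple entry points" idea concrete, here is a minimal sketch that builds a resolvable link for the same DOI name at more than one access point. The first base URL is the DOI proxy mentioned above; the second (hdl.handle.net) is assumed here purely as an example of another handle proxy, and the function name `resolver_urls` is made up for illustration.

```python
from urllib.parse import quote

# Entry points: the first is the DOI proxy named in the post; the second is
# an assumption, included only as an example of another handle proxy.
ENTRY_POINTS = [
    "http://dx.doi.org/",
    "http://hdl.handle.net/",
]


def resolver_urls(doi, entry_points=ENTRY_POINTS):
    """Return one link per entry point for the given DOI name."""
    # Percent-encode characters that are unsafe in a URL path, but keep "/"
    # because it separates the DOI prefix from the suffix.
    encoded = quote(doi, safe="/")
    return [base + encoded for base in entry_points]


if __name__ == "__main__":
    for url in resolver_urls("10.1000/1"):
        print(url)
```

The identifier stays the same throughout; only the host that services it changes.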

For example, the DOI proxy (described here) is just one instance of such a service. Others could equally exist. And, in fact, they do. The following handle web services will also take the DOI and do the business:

With handle we have in essence a redirect to a redirect. Or in the case of a web service, a redirect (from HTTP to HDL) to a redirect (from HDL to HDL) to a redirect (from HDL to HTTP). That is, switch down from the web interface to the native handle layer, route the call from this local handle server (via the global handle server) to the DOI handle server, fetch the URL stored with the DOI and switch back to the Web at that location.
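The three hops can be sketched as a toy, in-memory model. This is not how a real handle server is implemented or queried: the table contents, the server name and the landing-page URL below are all invented for illustration, and the lookups are plain dictionary reads rather than network calls.

```python
# Toy model of the three hops. Every table entry is made up for illustration.

# Global registry: handle prefix -> name of the server responsible for it.
GLOBAL_REGISTRY = {"10.1000": "doi-handle-server"}

# Per-server storage: handle -> URL stored with that handle.
HANDLE_SERVERS = {
    "doi-handle-server": {"10.1000/1": "http://example.org/landing/1"},
}


def resolve(proxy_url):
    """Follow a proxy link down to the handle layer and back up to a web URL."""
    trace = []

    # Hop 1 (HTTP -> HDL): drop the scheme and host to recover the handle.
    handle = proxy_url.split("/", 3)[3]
    trace.append(("http->hdl", handle))

    # Hop 2 (HDL -> HDL): ask the global registry which server owns the prefix.
    prefix = handle.split("/", 1)[0]
    server = GLOBAL_REGISTRY[prefix]
    trace.append(("hdl->hdl", server))

    # Hop 3 (HDL -> HTTP): fetch the URL stored with the handle.
    target = HANDLE_SERVERS[server][handle]
    trace.append(("hdl->http", target))

    return target, trace


if __name__ == "__main__":
    target, trace = resolve("http://dx.doi.org/10.1000/1")
    for step, value in trace:
        print(step, value)
```

Note that nothing in `resolve` depends on which host appeared in the incoming link, which is the point: the web entry point is interchangeable.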


But there's more. The standard URL redirect is just one example of a DOI service; multiple services can also be provided for the DOI. Currently the DOI travels light and is bound to the minimum of useful data, essentially just the URL for a splash page in the case of many CrossRef DOIs. But it could also carry pointers to structured information or to relationships with other objects.
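One way to picture a DOI that carries more than a single URL is as a record of typed values, with each service picking out the values it understands. The sketch below assumes a record made of (index, type, data) entries; the type names other than "URL", and all of the data, are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandleValue:
    index: int  # position of the value within the record
    type: str   # what kind of value this is
    data: str   # the value itself


# A made-up record for one DOI. Only the "URL" entry reflects what the post
# says DOIs typically carry today; the other two types are hypothetical.
RECORD = [
    HandleValue(1, "URL", "http://example.org/landing/1"),
    HandleValue(2, "METADATA_URL", "http://example.org/meta/1.xml"),
    HandleValue(3, "RELATED_DOI", "10.1000/2"),
]


def values_of_type(record, wanted):
    """Return the data of every value in the record with the given type."""
    return [value.data for value in record if value.type == wanted]


if __name__ == "__main__":
    # The familiar redirect service only needs the URL value...
    print(values_of_type(RECORD, "URL"))
    # ...while other services could read the remaining values.
    print(values_of_type(RECORD, "RELATED_DOI"))
```

The redirect service then becomes just one reader of the record among several.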

As yet, the DOI is a fledgling in terms of realizing its true potential as a seasoned actor that can play out many roles - assume many guises. A queen bee, in effect, with a hive of worker bees servicing it. It is not joined at the hip with any particular web service as might be commonly understood with the current simple redirect service. It offers much more.

It is true, however, that both for reasons of link persistence and in order to maintain link ranking with search crawlers, a preferred web entry point is via the DOI proxy. It just doesn't have to be that way - that's all. Hard linking is something we are beginning to unlearn, and instead we are taking our first steps towards embracing service-mediated links such as OpenURL and DOI can both offer.

June 20, 2008

Handle System Workshop

charlemagne.jpg

I was invited to speak at the Handle System Workshop which was run back to back with an IDF Open Meeting earlier this week in Brussels and hosted at the Office for Official Publications of the European Union. (Location was in the Charlemagne Building, at left in image, within the rather impressive meeting room Jean Durieux, at right.)

My talk ('A Distributed Metadata Architecture') was focussed on how OpenHandle and XMP could be leveraged to manage dispersed media assets. (The OpenHandle work makes the Handle and DOI systems more readily accessible to applications.)

Other speakers were Norman Paskin (IDF), Gordon Dunsire (Centre for Digital Library Research, University of Strathclyde), Brian Green (Editeur), Jill Cousins (European Digital Library Foundation), Jan Brase (TIB, Germany), Larry Lannom (CNRI), Ed Pentz (CrossRef), Nigel Ward (Link Affiliates), and Dan Broeder (CLARIN/MPG).

The agendas for the two meetings are posted here (DOI) and here (Handle).

June 12, 2008

PubMed Central Links to Publisher Full Text

A CrossRef Member Briefing is available that explains how PubMed Central (PMC) links to publisher full text, how PMC uses DOIs and how PMC should be using DOIs. The briefing is entitled "Linking to Publisher Full Text from PubMed Central" (PDF 85k).

CrossRef considers it very important that PMC uses DOIs as the main means to link to the publisher version of record for an article, and we are recommending that publishers try to convince PMC to use DOIs in an automated way. Almost all of the PMC articles contain DOIs but they aren't linked. This seems like a waste considering that publishers have invested a lot in CrossRef and DOIs as unique identifiers and persistent links.

This issue will be of interest to anyone who publishes journal articles that are the result of NIH funding and fall under the NIH Public Access Policy.

June 04, 2008

Robots: One Standard Fits All

Interesting post from Yahoo! Search's Director of Product Management, Priyank Garg, "One Standard Fits All: Robots Exclusion Protocol for Yahoo!, Google and Microsoft". Interesting also for what it doesn't talk about. No mention here of ACAP.