
October 10, 2011

DataCite supporting content negotiation

In April CrossRef launched content negotiation support for its DOIs. At the time I cheekily called out DataCite to start supporting content negotiation as well.

Edward Zukowski (DataCite's resident propeller-head) took up the challenge with gusto and, as of September 22nd, DataCite has also been supporting content negotiation for its DOIs. This means that one million more DOIs are now linked-data friendly. Congratulations to Ed and the rest of the team at DataCite.

We hope this is a trend. Back in June Knowledge Exchange organized a seminar on Persistent Object Identifiers. One of the outcomes of the meeting was the "Den Haag Manifesto", a document outlining five relatively simple steps that different persistent identifier systems could take in order to increase interoperability. Most of these steps involved adopting linked data principles, including support for content negotiation. We look forward to hearing about other persistent identifiers adopting these principles over the next year.

Having said that, this time I will refrain from calling out anybody specifically...


April 19, 2011

Content Negotiation for CrossRef DOIs

So does anybody remember the posting DOIs and Linked Data: Some Concrete Proposals?

Well, we went with option "D."

From now on, DOIs, expressed as HTTP URIs, can be used with content-negotiation.

Let's get straight to the point. If you have curl installed, you can start playing with content-negotiation and CrossRef DOIs right away:

curl -D - -L -H "Accept: application/rdf+xml" "http://dx.doi.org/10.1126/science.1157784"

curl -D - -L -H "Accept: text/turtle" "http://dx.doi.org/10.1126/science.1157784"

curl -D - -L -H "Accept: application/atom+xml" "http://dx.doi.org/10.1126/science.1157784"

Or if you are already using CrossRef's "unixref" format:

curl -D - -L -H "Accept: application/unixref+xml" "http://dx.doi.org/10.1126/science.1157784"
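
For anyone who would rather script this than type curl commands, the same request can be sketched in Python using only the standard library. This is a minimal sketch rather than anything CrossRef ships: the helper names build_request and fetch_metadata are made up for illustration, while the resolver URL and media types are the ones from the curl examples above.

```python
import urllib.request

RESOLVER = "http://dx.doi.org/"  # resolver used in the curl examples above


def build_request(doi, media_type):
    """Build a GET request for a DOI that asks for one specific representation."""
    return urllib.request.Request(RESOLVER + doi, headers={"Accept": media_type})


def fetch_metadata(doi, media_type="text/turtle"):
    """Send the request and return (content type, body).

    urllib follows the redirects on its own, much as curl does with -L.
    """
    request = build_request(doi, media_type)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read().decode("utf-8", "replace")
        return response.headers.get("Content-Type"), body
```

Calling fetch_metadata("10.1126/science.1157784") should hand back the Turtle representation, assuming the resolver is reachable and still honors the Accept header.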

This will work with over 46 million CrossRef DOIs as of today, but the beauty of the setup is that, from now on, any DOI registration agency can enable content negotiation for its constituencies as well. DataCite, we're looking at you ;-).

It also means that, as registration agency members (CrossRef publishers, for instance) start providing more complete and richer representations of their content, we can simply redirect content-negotiated requests directly to them.

We expect that this development will round out CrossRef's efforts to support standard APIs, including OpenURL and OAI-PMH, and we look forward to seeing DOIs increasingly used in linked data applications.

Finally, CrossRef would just like to thank the IDF and CNRI for their hard work on this as well as Tony Hammond and Leigh Dodds for their valuable advice and persistent goading.







March 25, 2010

DOIs and Linked Data: Some Concrete Proposals

Since last month's threads (here, here, here and here) talking about the issues involved in making the DOI a first-class identifier for linked data applications, I've had the chance to actually sit down with some of the threads' participants (Tony Hammond, Leigh Dodds, Norman Paskin) and we've been able to sketch out some possible scenarios for migrating the DOI into a linked data world.

I think that several of us were struck by how little actually needs to be done in order to fully address virtually all of the concerns that the linked data community has expressed about DOIs. Not only that, but in some of these scenarios we would put ourselves in a position to be able to semantically enable over 40 million DOIs with what amounts to the flick of a switch.

Given the huge interest in linked data on the part of researchers and CrossRef members, it seems like it would be a fantastic boon to both the IDF (International DOI Foundation) and CrossRef if we were able to do something quickly here.

Anyway, the following are notes outlining several concrete proposals for addressing the limitations of DOIs as identifiers in linked data (LD) applications. They range in complexity and effort involved, with the simplest scenario providing minimal (yet functional) LD capabilities for just one registration agency's (RA's) members (CrossRef's) and the most complex providing per-RA and per-RA-member configurability on how DOIs would behave for LD applications.

We'd appreciate comments, questions, suggestions, corrections, etc.

A: Simplest Scenario

What would need to be done?

  1. CrossRef implements a linked data service. For example, hosted at rdf.crossref.org.
  2. CrossRef recommends that any member publisher who wants to add rudimentary linked data capabilities to their site simply insert some link elements into their landing pages. So, for instance, for the article with the DOI 10.5555/1234567 in the Journal of Psychoceramics, the publisher would put the following in the landing page for the article:
    <link rel="primarytopic" href="http://doi.crossref.org/10.5555/1234567" />
    <link rel="alternate" type="application/rdf+xml" href="http://rdf.crossref.org/metadata/10.5555/1234567.rdf" title="RDF/XML version of this document"/>
    <link rel="alternate" type="text/html" href="http://www.journalofpsychoceramics.org/10.5555/1234567.html" title="HTML version of this document"/>
    <link rel="alternate" type="application/json" href="http://rdf.crossref.org/metadata/10.5555/1234567.json" title="RDF/JSON version of this document"/>
    <link rel="alternate" type="text/turtle" href="http://rdf.crossref.org/metadata/10.5555/1234567.ttl" title="Turtle version of this document"/>

In the above snippet the HTML version of the document is the publisher's existing landing page.

How it would work

  1. A sem-web-enabled browser would query dx.doi.org/10.5555/1234567 and get a normal 302 redirect to the publisher's landing page. 
  2. The sem-web-enabled browser would sniff the page for the link elements and retrieve the representations it wanted from rdf.crossref.org
  3. The returned document would contain an appropriate representation of the metadata that the publisher has deposited with CrossRef. It would also assert that:

doi.crossref.org/10.5555/1234567 owl:sameAs dx.doi.org/10.5555/1234567 .
dx.doi.org/10.5555/1234567 owl:sameAs info:doi/10.5555/1234567 .
info:doi/10.5555/1234567 owl:sameAs doi:10.5555/1234567 .
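
The "sniffing" in step 2 could look roughly like the sketch below, which uses Python's standard html.parser module. It is only an illustration of the idea: the class and function names are hypothetical, the preference order is an arbitrary choice, and a real client would also have to fetch the landing page first.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect <link rel="alternate"> elements as a {media type: href} mapping."""

    def __init__(self):
        super().__init__()
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> tags also arrive here via handle_startendtag.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "alternate" in rel_tokens and attr.get("type") and attr.get("href"):
            # Keep the first href seen for each media type.
            self.alternates.setdefault(attr["type"], attr["href"])


def find_alternates(html):
    """Return every alternate representation advertised in a landing page."""
    parser = AlternateLinkParser()
    parser.feed(html)
    return parser.alternates


def pick_representation(html, preferred=("text/turtle", "application/rdf+xml")):
    """Return the href of the first preferred media type on offer, else None."""
    alternates = find_alternates(html)
    for media_type in preferred:
        if media_type in alternates:
            return alternates[media_type]
    return None
```

Fed the landing-page snippet from scenario A, pick_representation would return the rdf.crossref.org Turtle URL, which the client could then retrieve.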

Alternatively, the publisher could implement their own linked data support on their own domain using whatever appropriate method they want. So, for instance, a larger publisher could support content negotiation at their site and return different/enhanced metadata, etc.

Pros

  1. Doesn't require changes at DOI/Handle levels
  2. Is easy for publisher to opt-in or opt-out
  3. Requires minimal development on the part of CrossRef.

Cons

  1. Only applies to CrossRef DOIs.
  2. It depends on publishers taking action. Might be a long time before publishers add the needed links to their landing pages or support content negotiation.
  3. DOI system is still not strictly LD compliant (e.g. it is returning 302 redirects. Naive sem-web browsers might 'stop' after getting a 302. Should ideally use 303s, content negotiation, etc.)
  4. Doesn't work for DOIs that currently bypass landing pages and which go directly to content.

B: Simple + IDF Global Semantic Compliance

What would need to be done?

  1. Same as "Simplest Scenario"
  2. IDF globally changes dx.doi.org to return 303 redirect

How would it work?

Same as Simplest Scenario, except that, because the sem-web-enabled browser has been told, via the 303, that it is dealing with a NIR (non-information resource), it would presumably be more likely to continue.
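
To make the difference concrete, here is a toy sketch of how a client might react to the two kinds of redirect. The function name and the "naive" flag are invented for illustration, and the behavior of the naive client is simply the worry raised in scenario A's cons, not a description of any particular browser.

```python
def next_step(status, naive=False):
    """Return "follow", "use" or "stop" for a given HTTP status code.

    A naive client treats only a 303 as an invitation to go looking for a
    description of the thing identified; a more tolerant client follows any
    redirect.
    """
    if status == 303:
        return "follow"
    if status in (301, 302, 307):
        return "stop" if naive else "follow"
    if 200 <= status < 300:
        return "use"
    return "stop"
```

Under this reading, switching dx.doi.org from 302 to 303 changes the outcome only for the naive client, which is the case scenario B is meant to cover.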

Pros

  1. All DOIs conform to expectations for LD identifiers
  2. Easy for publisher to opt-in or opt-out
  3. Requires minimal development on part of CrossRef
  4. Requires minimal work (?) on part of IDF

Cons

  1. Requires global change on part of IDF. Global change might conflict with requirements of other RAs.
  2. It depends on publishers taking action. Might be a long time before publishers add needed links to their landing pages or support content negotiation.
  3. Doesn't work for DOIs that currently bypass landing pages (e.g. OECD spreadsheets, IUCr datasets, etc.)

C: Simple + IDF Global Semantic Compliance + RA CN Intercept

What would need to be done?

  1. Same as "B: Simple + IDF Global Semantic Compliance" Scenario
  2. IDF changes dx.doi.org to redirect content-negotiated dx.doi.org queries to RA-controlled resolver depending on the preferences of the RA.
  3. RA implements DOI resolver (e.g. dx.crossref.org) that supports content negotiation. RA allows its members to specify to the RA that they want either:
    1. RA to forward all requests to the member's site.
    2. RA to "intercept" content-negotiations for non-HTML representations and direct them appropriately (e.g. return appropriate representation from rdf.crossref.org)

How would it work?



Pros

  1. All DOIs conform to expectations for LD identifiers
  2. Allows RA to potentially LD-enable its members very quickly.
  3. Easy for RA members to opt-in or opt-out
  4. Requires minimal development on part of CrossRef
  5. Would even work for DOIs that bypass landing pages

Cons

  1. Requires global change on part of IDF. Global change might conflict with requirements of other RAs.
  2. Requires change to add decision logic implementation on part of IDF. 
  3. Requires development of RA resolvers that implement per-member resolution logic (note: this would probably actually be done at DOI level)

D: Simple + IDF Selective Semantic Compliance + RA CN Intercept

What would need to be done?

  1. Same as Simplest Scenario
  2. IDF changes dx.doi.org to return either 302 or 303 redirect depending on the preferences of the RA.
  3. IDF changes dx.doi.org to redirect content-negotiated dx.doi.org queries to RA-controlled resolver depending on the preferences of the RA.
  4. RA implements DOI resolver (e.g. dx.crossref.org) that supports content negotiation. RA allows its members to specify to the RA that they want either:
    1. RA to forward all requests to the member's site.
    2. RA to "intercept" content-negotiations for non-HTML representations and direct them appropriately (e.g. return appropriate representation from rdf.crossref.org)
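
The per-RA and per-member decision logic in the steps above might be sketched as follows. Everything here is an assumption made for illustration: the MemberPrefs record, the resolve function and the extension table are hypothetical, the target URLs reuse the rdf.crossref.org pattern from scenario A, and the Accept handling is deliberately crude (it looks only at the first media type and ignores q-values).

```python
from dataclasses import dataclass

# Media types the RA can serve itself, mapped to the file extensions
# used in the rdf.crossref.org examples from scenario A.
EXTENSIONS = {
    "application/rdf+xml": "rdf",
    "text/turtle": "ttl",
    "application/json": "json",
}


@dataclass
class MemberPrefs:
    landing_page: str  # where the member wants ordinary requests sent
    intercept: bool    # True: RA answers non-HTML requests; False: forward everything


def resolve(doi, accept, prefs, ra_wants_303=True):
    """Return (status, location) for one content-negotiated DOI request."""
    # The RA chooses whether its DOIs redirect with a 303 or a plain 302.
    status = 303 if ra_wants_303 else 302
    # Simplification: consider only the first media type in the Accept header.
    media_type = accept.split(",")[0].split(";")[0].strip().lower()
    if prefs.intercept and media_type in EXTENSIONS:
        extension = EXTENSIONS[media_type]
        return status, f"http://rdf.crossref.org/metadata/{doi}.{extension}"
    return status, prefs.landing_page
```

A member that opts out of interception keeps receiving every request at its own site, which leaves it free to do its own content negotiation there.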

How would it work?



Pros

  1. Allows RA to potentially LD-enable its members very quickly.
  2. Easy for RA members to opt-in or opt-out
  3. Requires minimal development on part of CrossRef
  4. Would even work for DOIs that bypass landing pages

Cons

  1. Only some DOIs conform to expectations for LD identifiers
  2. Requires change to add decision logic implementation on part of IDF. 
  3. Requires development of RA resolvers that implement per-member resolution logic (note: this would probably actually be done at DOI level)

December 13, 2009

A Christmas Reading List... with DOIs

Was outraged (outraged, I tell you) that one of my favorite online comics, PhD, didn't include DOIs in their recent bibliography of Christmas-related citations. So I've compiled them below.

We care about these things so that you don't have to. Bet you will sleep better at night knowing this.

Or perhaps not...

A Christmas Reading List... with DOIs.

Title:  Christmas Disease
Citation:  Biggs, R, Douglas, A, Macfarlane, R, Dacie, J, Pitney, W, Merskey, C & O'Brien, J, 1952, 'Christmas Disease', BMJ, vol. 2, no. 4799, pp. 1378-1382.
CrossRef DOI:  http://dx.doi.org/10.1136/bmj.2.4799.1378

Title:  More Than a Labor of Love: Gender Roles and Christmas Gift Shopping
Citation:  Fischer, E & Arnold, S, 1990, 'More Than a Labor of Love: Gender Roles and Christmas Gift Shopping', Journal of Consumer Research, vol. 17, no. 3, p. 333.
CrossRef DOI:  http://dx.doi.org/10.1086/208561

Title:  Looking at Christmas trees in the nucleolus
Citation:  Scheer, U, Xia, B, Merkert, H & Weisenberger, D, 1997, 'Looking at Christmas trees in the nucleolus', Chromosoma, vol. 105, no. 7-8, pp. 470-480.
CrossRef DOI:  http://dx.doi.org/10.1007/s004120050209

Title:  The Vela glitch of Christmas 1988
Citation:  McCulloch, P, Hamilton, P, McConnell, D & King, E, 1990, 'The Vela glitch of Christmas 1988', Nature, vol. 346, no. 6287, pp. 822-824.
CrossRef DOI:  http://dx.doi.org/10.1038/346822a0

Title:  Cardiac Mortality Is Higher Around Christmas and New Year's Than at Any Other Time: The Holidays as a Risk Factor for Death
Citation:  Phillips, D, 2004, 'Cardiac Mortality Is Higher Around Christmas and New Year's Than at Any Other Time: The Holidays as a Risk Factor for Death', Circulation, vol. 110, no. 25, pp. 3781-3788.
CrossRef DOI:  http://dx.doi.org/10.1161/01.CIR.0000151424.02045.F7

Title:  Red Crabs in Rain Forest, Christmas Island: Biotic Resistance to Invasion by an Exotic Snail
Citation:  Lake, P & O'Dowd, D, 1991, 'Red Crabs in Rain Forest, Christmas Island: Biotic Resistance to Invasion by an Exotic Snail', Oikos, vol. 62, no. 1, p. 25.
CrossRef DOI:  http://dx.doi.org/10.2307/3545442

Title:  The Carvedilol Hibernation Reversible Ischaemia Trial, Marker of Success (CHRISTMAS) study Methodology of a randomised, placebo controlled, multicentre study of carvedilol in hibernation and heart failure
Citation:  Pennell, D, 2000, 'The Carvedilol Hibernation Reversible Ischaemia Trial, Marker of Success (CHRISTMAS) study Methodology of a randomised, placebo controlled, multicentre study of carvedilol in hibernation and heart failure', International Journal of Cardiology, vol. 72, no. 3, pp. 265-274.
CrossRef DOI:  http://dx.doi.org/10.1016/S0167-5273(99)00198-9

December 8, 2009

QR Codes and DOIs

Inspired by Google's recent promotion of QR Codes, I thought it might be fun to experiment with encoding a CrossRef DOI and a bit of metadata into one of the critters. I've put a short write-up of the experiment on the CrossRef Labs site, which includes a demonstration of how you can generate a QR Code for any given CrossRef DOI.

Put them on postcards and send them to your friends for the holidays. Tattoo them on your pets. The possibilities are endless.

March 20, 2009

Citation Typing Ontology

I was happy to read David Shotton's recent Learned Publishing article, Semantic Publishing: The Coming Revolution in scientific journal publishing, and see that he and his team have drafted a Citation Typing Ontology.*

Anybody who has seen me speak at conferences knows that I often like to proselytize about the concept of the "typed link", a notion that hypertext pioneer Randy Trigg discussed extensively in his 1983 Ph.D. thesis. Basically, Trigg points out something that should be fairly obvious: a citation (i.e. "a link") is not always a "vote" in favor of the thing being cited.

In fact, there are all sorts of reasons that an author might want to cite something. They might be elaborating on the item cited, they might be critiquing the item cited, they might even be trying to refute the item cited. (For an exhaustive and entertaining survey of the use and abuse of citations in the humanities, Anthony Grafton's The Footnote: A Curious History is a rich source of examples.)

Unfortunately, the naive assumption that a citation is tantamount to a vote of confidence has become enshrined in everything from the way in which we measure scholarly reputation, to the way in which we fund universities and the way in which search engines rank their results. The distorting effect of this assumption is profound. If nothing else, it leads to a perverse situation in which people will often discuss books, articles, and blog postings that they disagree with without actually citing the relevant content, just so that they can avoid inadvertently conferring "whuffie" on the item being discussed. This can't be right.
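
A toy example shows how much information the "vote" model throws away. The link types and paper names below are invented for illustration; they are not the vocabulary of any particular ontology.

```python
from collections import Counter

# Each citation is (citing item, cited item, link type).
# The types are hypothetical labels, chosen only to illustrate the idea.
CITATIONS = [
    ("paper-B", "paper-A", "elaborates"),
    ("paper-C", "paper-A", "critiques"),
    ("paper-D", "paper-A", "refutes"),
]


def raw_count(citations, target):
    """The naive measure: every citation of the target counts as one vote."""
    return sum(1 for _, cited, _ in citations if cited == target)


def counts_by_type(citations, target):
    """The typed measure: the same citations, broken down by why they were made."""
    return Counter(kind for _, cited, kind in citations if cited == target)
```

Here paper-A scores three "votes" under the naive measure, even though two of its three citations come from authors who are arguing with it.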

Having said that, there has been a half-hearted attempt to introduce a gross level of link typology with the introduction of the "nofollow" link attribute, an initiative started by Google in order to try to address the increasing problem of "Spamdexing". But this is a pretty ham-fisted form of link typing, particularly in the way it is implemented by the Wikipedia, where CrossRef DOI links to formally published scholarly literature have a "nofollow" attribute attached to them but, inexplicably, items with a PMID are not so hobbled (view the HTML source of this page, for example). Essentially, this means that the Wikipedia is a black hole of reputation. That is, it absorbs reputation (through links to the Wikipedia), but it doesn't let reputation back out again. Hell, I feel dirty for even linking to it here ;-).

Anyway, scholarly publishers should certainly read Shotton's article because it is full of good and practical ideas about what can be done with today's technology in order to help us move beyond the "digital incunabula" that the industry is currently churning out. The sample semantic article that Shotton's team created is inspirational, and I particularly encourage people to look at the source file for the ontology-enhanced bibliography, which reveals just how much more useful metadata can be associated with the humble citation.

And now I wonder whether CiteULike, Connotea, 2Collab or Zotero will consider adding support for the Citation Typing Ontology into their respective services?


* Disclosure:

a) I am on the editorial board of Learned Publishing
b) CrossRef has consulted with David Shotton on the subject of semantically enhancing journal articles

March 11, 2009

Researcher Identification Primer

Discussions around "contributor IDs" (aka "Author ID", "Researcher ID", etc.) seem to be becoming quite popular. In the interview that I pointed to in my last post, I mentioned that CrossRef has been talking with a group of researchers who were very interested in creating some sort of authenticated contributor ID as a mechanism for controlling who gets trusted access to sensitive genome-wide aggregate genotype data.

Well, I'm delighted to say that said group of researchers (at the GEN2PHEN project) have created a "Researcher Identification Primer" website in which they outline the many use-cases and issues around creating a mechanism for unambiguously identifying and/or authenticating researchers. This looks like a great resource and I expect it will serve as a useful focus for further discussion around the issue.

December 3, 2008

Ubiquity commands for CrossRef services

So the other day Noel O'Boyle made me feel guilty when he pinged me and asked about the possibility of using one of the CrossRef APIs for creating a Ubiquity extension. You see, I had played with the idea myself and had not gotten around to doing much about it. This seemed inexcusable, particularly given how easy it is to build such extensions using the API we developed for the WordPress and Movable Type plugins that we announced earlier in the year. So I dug up my half-finished code, cleaned it up a bit and have posted the results.

Note that the back-end that supports the plugins has been moved to more stable machines and the index is now being automatically updated with journal and conference proceeding deposits (sorry, no books yet).

Also note that we are hoping that others will look at the code for the WordPress, Movable Type and Ubiquity plugins and create more such extensions. If you do, please let us know about them at citation-plugin@crossref.org.

July 21, 2008

CrossTech By Numbers

CrossTech is two years old (less one month) and we have now seen some 145 posts. Breaking the posts down by poster we arrive at the following chart:
[Chart: crosstech.png, CrossTech posts broken down by poster]
Note this is not any real attempt at vainglory, more a simple excuse to play with the wonderful Google Chart API. Also, above I've taken the liberty of putting up an image (.png), although the chart could have been generated on the fly from this link (or tinyurl here).

What is of interest in the chart is that approximately 3/4 of the posts are by CrossRef members (TH, EN, RK) and 1/4 by CrossRef staff (EP, GB, AT, CK). Certainly CrossRef staffers are doing their bit for this blog. There are also way too many posts from me. It would be really interesting to see some others' views or observations per the CrossTech logo legend ("..., collaboration, ...").

I guess the real impediment is that one needs to request an account before posting. (Certainly there's no reason for any member to be shy about requesting an account and posting.) Note that I haven't considered the number of commentators on the blog, which is larger than the number of posters. Also, a number of CrossRef members are very active with their own blogs. Those blogs with a tech focus could (should?) be scooped up by a Planet-style aggregator if there were sufficient interest in maintaining a publishing technology hub.

One can only hope that the numbers will continue to grow (by direct posts or by aggregations) and that there will be a wider info share over the next couple of years.