February 13, 2010

Is FRBR the OSI for Web Architecture?

(This post is just a repost of a comment to Geoff's last entry made because it's already rather long, because it contains one original thought - FRBR as OSI - and because, well, it didn't really want to wait for moderation.)

Hi Geoff:

First off, there is no question but that CrossRef was established to take on the reference linking challenge for scholarly literature. (Hell, it's there, as you point out, in the organization name - PILA - as well as in the application name - CrossRef.)

But one should also remember that DOI as it was sold at the time was promising so much more. I disagree with you that the participants back then were as wholly innocent of the FRBR terms as you might suggest. Certainly there were ample presentations on DOI that sought to elucidate those relationships.

No matter. FRBR is a useful reference model to clarify some of these concepts. But not one that we are overly concerned with at this time. Nor even whether DOI maps one to one onto a given FRBR layer. What we are more concerned with on a pragmatic level is how DOI maps onto the Web architecture and especially how it plays along with Linked Data concepts.

(Aside: A propos FRBR we might be in danger of repeating the OSI mistake for standardizing the network layer model. Ultimately that was maintained as a reference model but dropped as a concrete model in favour of the TCP/IP stack. Could be that FRBR is our OSI and Linked Data is our TCP/IP stack? That is, we might have to settle on the coarser data model in order to get a coherent story out the door where all can agree.)

You say:

"we need a mechanism to distinguish between when we are getting the thing pointed to by the CrossRef DOI (the PDF , HTML, etc.) as opposed to "something about the thing" (e.g. the landing page, metadata record, etc.)"
But that is exactly what we were chasing up in the earlier posts (both my DOI: What Do We Got? and John Erickson's DOIs, URIs and Cool Resolution). You want to distinguish between a thing and a description about a thing. And Web architecture does just that: it distinguishes between Information Resources (i.e. the things) and Non-Information Resources (i.e. descriptions of the things).

Now is this something that CrossRef can truly distinguish and make apparent in its service architecture? If we retain the notion of landing page we are already essentially saying that a CrossRef HTTP URI identifies a description of the resource, i.e. a Non-Information Resource, or Other Resource, and that is properly indicated within the architecture by returning a "303 See Other" status code.

I think that's all we're saying at the moment as a first step.

Web architecture wants to know if the DOI HTTP URI is a thing or description of a thing. I say the latter. You seem to suggest in your comment the latter too. I wonder if we could get a vote on that.

And btw, I am not suggesting that CrossRef needs to dive into the business of "tracking compound documents in their entirety". Far from it. Let's just get a common resource architecture agreed publicly and then we can build on that.

This observation I received in a private email is something I fully support:

"The real problem is what doi http uri identify on the web. Everything flows from the answer to that Q."
Tony


February 11, 2010

Does a CrossRef DOI identify a "work?"

Tony's recent thread on making DOIs play nicely in a linked data world has raised an issue I've meant to discuss here for some time: a lot of the thread is predicated on the idea that CrossRef DOIs are applied at the abstract "work" level. Indeed, that is what it currently says in our guidelines. Unfortunately, this is a case where theory, practice and documentation all diverge.

When the CrossRef linking system was developed it was focused primarily on facilitating persistent linking amongst journals and conference proceedings. The system was quickly adapted to handle books and more recently to handle working papers, technical reports, standards and “components”, a catchall term used to refer to everything from individual article images to database records.

In practice the content outside of the core journals and conference proceedings has accounted for relatively low volume. However, we expect that over the next few years this will change and that books and databases will increasingly drive the future growth in CrossRef’s citation linking services. Interestingly, these content types all share characteristics that make them substantially different from the journals and conference proceedings that we have hitherto focused on.

Both books and databases introduce new challenges to the technology and policies of our citation linking service. The challenges revolve around two areas:

  • Structure: Both books and databases can have complex structures and the publishers of this content are likely to require granular identification of these content substructures along with a mechanism for documenting the relationship between these substructures (e.g. this section is part of this chapter which is part of this monograph which is part of this series)
  • Versioning: Unlike typical journals and conference proceedings, books and database records sometimes change over time.


When confronted with the issues of structure and versioning publishers are often tempted to take shortcuts and decide to simply assign DOIs at the highest level structure and to the “work” instead of a particular “manifestation” or version of that work. Indeed, section 5.5 of CrossRef's DOI Name Information and Guidelines recommends this. But this approach could have a negative impact on the integrity of the scholarly citation record that CrossRef is attempting to maintain.

Fundamentally, CrossRef DOIs are aimed at providing a persistent online citation infrastructure for scholarly and professional publishers. Consequently, decisions about where to apply CrossRef DOIs should be guided by common expectations about the way in which citations work. Citations are typically used to credit ideas or provide evidence. A reader follows a citation in order to obtain more detail or to verify that an author is accurately representing the item cited. A rule of thumb is that a reader has a reasonable expectation that when they follow a citation, they will be taken to what the author saw when creating the citation. Any divergent behavior could result in the reader concluding that the author was misrepresenting the item cited. A further implication of this is that any changes to content that are likely to affect the crediting or interpretation of the content should result in that changed content getting a new CrossRef DOI.

Typically, this means that CrossRef DOIs should probably be assigned at the expression level and different expressions should be assigned different CrossRef DOIs. This is because assigning a CrossRef DOI at the higher "work" level is generally not granular enough to guarantee that a reader following the citation will see what the author saw when creating the citation. For example, one translation of a work might be substantially different from another translation of the same work. Similarly a draft version of a work might be substantially different from the final published version of the work. In each case, resolving a citation to a different expression of the work than the expression that was originally cited might result in the reader interpreting the content differently than the citing author.

In general, different "equivalent manifestations" of the same work can safely be assigned the same CrossRef DOI. So, for instance, the HTML formatted version of an article and the PDF formatted version of an article can almost always be assigned the same CrossRef DOI. Any differences between the two are unlikely to affect the crediting of, or reader's interpretation of, the work. But sometimes it is even possible that different manifestations of an expression will differ enough to merit different CrossRef DOIs. For instance, a semantically enhanced version of an article might require new crediting (e.g. the parties responsible for adding the semantic information) and the resulting semantic enhancement may conceivably alter the reader's interpretation of the article.

Unfortunately, there is no hard and fast rule about where and when to assign new CrossRef DOIs. Instead there is only a guideline, namely:

"Assign new CrossRef DOIs to content in a way that will ensure that a reader following the citation will see something as close to what the original author cited as is possible."

The implications of this for publishers are important, especially when they are assigning DOIs to protean content types. For instance, it may mean that:

  • Book publishers should be expected to keep old editions of books available for link resolution purposes.
  • Publishers of content that can change rapidly (e.g. by the second) should provide facilities for creating frozen, archived snapshots of content for citation purposes.
  • All publishers of protean content should issue guidelines instructing researchers on when it is appropriate to cite a work, manifestation or version.

CrossRef needs to actively consider these issues as publishers start assigning CrossRef DOIs to more dynamic types of content. Minimally, we should be able to provide publishers with recommendations on how to make dynamic content citable. We may even want to consider enshrining certain types of behavior in our terms and conditions so as to ensure the future integrity of the scholarly citation record.

In short, we need to update our guidelines.

February 10, 2010

The Response Page

(Update - 2010.02.10: I just saw that I posted here on this same topic over a year ago. Oh well, I guess this is a perennial.)

I am opening a new entry to pick up one point that John Erickson made in his last comment to the previous entry:

"I am suggesting that one "baby step" might be to introduce (e.g.) RDFa coding standards for embedding the doi:D syntax."
Yea!

It might be worth consulting the latest CrossRef "DOI Name Information and Guidelines" (PDF) to see what that has to say about this. Section 6.3 - The response page has these two specific requirements for publishers:

  1. When metadata and DOIs are deposited with CrossRef, the publisher must have active response pages in place so that they can resolve incoming links.
  2. A minimal response page must contain a full bibliographic citation displayed to the user. A response page without bibliographic information should never be presented to a user.
What is truly shocking about these requirements is that these are purely user focussed. There is no mention whatsoever of machines. One might have thought that with the Linked Data gospel in full swing there would at least be a nod to machine-readable metadata. But there's none. I'm not saying that there should be any requirement, or even any recommendation. But a mention might have been useful to chivvy us all along.

I agree with John that publishers could be encouraged (or even just reminded) that machine-readable metadata could be made available through various mechanisms: HTML META tags (such as we currently provide at Nature - and as blogged here earlier), COinS objects, RDF/XML comments, or best of all RDFa markup as John mentions.
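To make the machine's side of this concrete, here is a minimal sketch of how a client might pull metadata out of the META tags on a response page, using only the Python standard library. The sample page and the tag names (`dc.identifier`, `dc.title`) are illustrative assumptions and not a statement of what any publisher actually emits; the DOI and title are borrowed from the reading list elsewhere on this blog.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Only keep tags that carry both a name and a content value.
        if name and content:
            self.meta.setdefault(name.lower(), []).append(content)


def extract_meta(html_text):
    """Return a dict mapping lower-cased meta names to lists of values."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.meta


# Hypothetical response page; the tag names are assumptions for illustration.
SAMPLE_PAGE = """
<html><head>
<title>The Vela glitch of Christmas 1988</title>
<meta name="dc.identifier" content="doi:10.1038/346822a0">
<meta name="dc.title" content="The Vela glitch of Christmas 1988">
</head><body>Full bibliographic citation for the human reader.</body></html>
"""

metadata = extract_meta(SAMPLE_PAGE)
```

A page like this still satisfies the user-facing requirements quoted above, while giving a machine something to work with.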

The Web is getting semantic. It's about time that CrossRef members joined the wave. And it would be helpful if CrossRef were there to help us with some new guidelines too!

February 09, 2010

DOI: What Do We Got?

[Diagram: doi-what-do-we-got.png]

Following the JISC seminar last week on persistent identifiers (#jiscpid on Twitter) there was some discussion about DOI and its role within a Linked Data context. John Erickson has responded with a very thoughtful post, DOIs, URIs and Cool Resolution, which ably summarizes the current problem with DOI: the way the DOI is implemented by the handle HTTP proxy may not have kept pace with actual HTTP developments. (For example, John notes that the proxy is not capable of dealing with 'Accept' headers.) He has proposed a solution, and the post has attracted several comments.
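For anyone unfamiliar with the 'Accept' header point, the following is a rough sketch of the request a Linked Data client would like to be able to make. It only builds the request and sends nothing over the network. The helper name and the default media type are my own assumptions for illustration, and per John's note the proxy would not act on the header anyway.

```python
import urllib.parse
import urllib.request

DOI_PROXY = "http://dx.doi.org/"


def build_negotiated_request(doi, media_type="application/rdf+xml"):
    """Build (but do not send) a GET request for a DOI with an Accept header.

    The media type is an assumed example of what a client might ask for.
    """
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    url = DOI_PROXY + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": media_type})


request = build_negotiated_request("10.1038/346822a0")
```

A proxy that honoured content negotiation could use that header to choose between sending a browser to the response page and sending a machine to a metadata record.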

I just wanted to offer here the above diagram in an attempt to corral some of the various facets relating to DOI that I am aware of. I realize that this may seem like an open invitation to flame on - and this is a very preliminary draft - but ... be kind!

So, this may be totally off the wall but it represents my best understanding of DOI as used by CrossRef.

I have distinguished three main contexts:

  1. Generic Data - A generalized information context where an object is identified with a DOI, an identifier system that is currently being ratified through the ISO process. This is the raw DOI number. (This definitely is not a first class object on the Web as it has no URI.)
  2. Web Data - An online information context (here I use the term 'Web' in its widest sense) where resources are identified by URI (not necessarily an HTTP URI). Here DOI is represented under two URI schemes: 'doi:' (unregistered but preferred by CrossRef), and 'info:' (registered and available for general URI use). Also it has a presence on the Web via an HTTP proxy (dx.doi.org) URL where it is used as a slug to create a permalink (as listed at 'A'). A simple HTTP redirect is used (with status code 302) to turn this permalink into the publisher response page http://example/1. (Note that typically a second redirect will occur on the publisher platform, here shown by the redirect to http://example/2.)
  3. Linked Data - An online information context where resources are identified by HTTP URI and conform to Linked Data principles. Now this is where a tension arises between the common publisher perspective and the strict semantic viewpoint. Implicit in the general Web context given above was the notion that the permalink ('A') was somehow related to the abstract object and the redirection service applied to it associated the abstract resource with concrete representations of the object.
So how do we relate the DOI HTTP URI with the abstract ('work') identifier listed at 'D' in the diagram?

Well the Architecture of the World Wide Web recognizes two distinct classes of resources: Information Resources (IR) and Non-Information Resources (NR). (Note: Only the term 'information resource' is used in AWWW.) IR are those that can be directly retrieved using HTTP, whereas NR are not directly retrievable but have an associated description which is retrievable and is itself a proxy for the real world object.

Either the HTTP URI denotes an IR (as listed at 'B') and is resolved (through HTTP status code '302 Found') to a default representation, which is the view that the Linked Data community would currently have of DOI. But this is at odds with the CrossRef position, which regards DOI as identifying the abstract work. Or, to better fit the CrossRef model of DOI, the HTTP URI would denote an NR (as listed at 'A') which would be resolved (through HTTP status code '303 See Other') to an associated description - a publisher response page.
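The two readings can be put side by side in a small sketch. It takes the sequence of HTTP status codes seen when dereferencing a URI and reports what, on the reading given in this post, a client would conclude about the resource. The function and constant names are hypothetical, and the rule is deliberately simplified: strictly speaking a 303 only tells a client that the URI could identify any kind of resource, not that it must be an NR.

```python
INFORMATION_RESOURCE = "information resource"
NON_INFORMATION_RESOURCE = "non-information resource"
UNKNOWN = "unknown"


def classify(status_chain):
    """Classify the resource denoted by a URI from its chain of status codes.

    status_chain lists the codes in the order received, e.g. [302, 302, 200]
    for a proxy redirect, a publisher redirect and a final page.
    """
    if not status_chain:
        raise ValueError("need at least one status code")
    # A 303 on the first hop says: what you asked about is described
    # elsewhere. This post reads that as a non-information resource.
    if status_chain[0] == 303:
        return NON_INFORMATION_RESOURCE
    # Any other chain that ends in a 2xx delivers a representation, so a
    # client will take the URI to denote an information resource.
    if 200 <= status_chain[-1] < 300:
        return INFORMATION_RESOURCE
    return UNKNOWN


current_behaviour = classify([302, 302, 200])
with_see_other = classify([303, 200])
```

So the same response page can be served in both cases; what changes is only the status code on the first hop, and with it what the DOI HTTP URI is taken to identify.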

There will be those self-appointed URI czars who will bemoan the fact of there being multiple URIs. But frankly there is nothing inherently wrong with that. Just as in the real world there are many languages so in the online world there are multiple contexts and histories. We can attempt to make some sense of this by making use of the well-known semantic properties owl:sameAs and ore:similarTo and declare (as also shown in the diagram) the following assertions:


info:doi/D owl:sameAs doi:D .

http://dx.doi.org/D ore:similarTo info:doi/D .

http://dx.doi.org/D ore:similarTo doi:D .


Note that ore:similarTo (stemming from the OAI-ORE work) is a weaker kind of relationship than owl:sameAs (which comes from OWL) and may be appropriate in this usage.
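As a sketch only, here is how the three URI forms and the three assertions above could be generated for a given DOI and written out as N-Triples with the prefixes expanded. The function names are my own, and no escaping is attempted for DOIs containing characters that are not allowed in an IRI, so treat it as illustration and not as a serializer.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
ORE_SIMILAR_TO = "http://www.openarchives.org/ore/terms/similarTo"


def doi_uris(doi):
    """Return the three URI forms of a raw DOI discussed in this post."""
    return {
        "doi": "doi:" + doi,
        "info": "info:doi/" + doi,
        "http": "http://dx.doi.org/" + doi,
    }


def assertions(doi):
    """Return the three assertions above as N-Triples lines."""
    uris = doi_uris(doi)
    triples = [
        (uris["info"], OWL_SAME_AS, uris["doi"]),
        (uris["http"], ORE_SIMILAR_TO, uris["info"]),
        (uris["http"], ORE_SIMILAR_TO, uris["doi"]),
    ]
    return ["<%s> <%s> <%s> ." % triple for triple in triples]


lines = assertions("10.1038/346822a0")
```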

In sum, scenario 'A' is what we have currently implemented, scenario 'B' is what might be commonly perceived as being implemented, and scenario 'C' may be a more correct semantic position.

Your comments (and not unkind comments, please;) are more than welcome.

December 13, 2009

A Christmas Reading List... with DOIs

Was outraged (outraged, I tell you) that one of my favorite online comics, PhD, didn't include DOIs in their recent bibliography of Christmas-related citations. So I've compiled them below.

We care about these things so that you don't have to. Bet you will sleep better at night knowing this.

Or perhaps not...

A Christmas Reading List... with DOIs.

Title:  Christmas Disease
Citation:  Biggs, R, Douglas, A, Macfarlane, R, Dacie, J, Pitney, W, Merskey, C & O'Brien, J, 1952, 'Christmas Disease', BMJ, vol. 2, no. 4799, pp. 1378-1382.
CrossRef DOI:  http://dx.doi.org/10.1136/bmj.2.4799.1378

Title:  More Than a Labor of Love: Gender Roles and Christmas Gift Shopping
Citation:  Fischer, E & Arnold, S, 1990, 'More Than a Labor of Love: Gender Roles and Christmas Gift Shopping', Journal of Consumer Research, vol. 17, no. 3, p. 333.
CrossRef DOI:  http://dx.doi.org/10.1086/208561

Title:  Looking at Christmas trees in the nucleolus
Citation:  Scheer, U, Xia, B, Merkert, H & Weisenberger, D, 1997, 'Looking at Christmas trees in the nucleolus', Chromosoma, vol. 105, no. 7-8, pp. 470-480.
CrossRef DOI:  http://dx.doi.org/10.1007/s004120050209

Title:  The Vela glitch of Christmas 1988
Citation:  McCulloch, P, Hamilton, P, McConnell, D & King, E, 1990, 'The Vela glitch of Christmas 1988', Nature, vol. 346, no. 6287, pp. 822-824.
CrossRef DOI:  http://dx.doi.org/10.1038/346822a0

Title:  Cardiac Mortality Is Higher Around Christmas and New Year's Than at Any Other Time: The Holidays as a Risk Factor for Death
Citation:  Phillips, D, 2004, 'Cardiac Mortality Is Higher Around Christmas and New Year's Than at Any Other Time: The Holidays as a Risk Factor for Death', Circulation, vol. 110, no. 25, pp. 3781-3788.
CrossRef DOI:  http://dx.doi.org/10.1161/01.CIR.0000151424.02045.F7

Title:  Red Crabs in Rain Forest, Christmas Island: Biotic Resistance to Invasion by an Exotic Snail
Citation:  Lake, P & O'Dowd, D, 1991, 'Red Crabs in Rain Forest, Christmas Island: Biotic Resistance to Invasion by an Exotic Snail', Oikos, vol. 62, no. 1, p. 25.
CrossRef DOI:  http://dx.doi.org/10.2307/3545442

Title:  The Carvedilol Hibernation Reversible Ischaemia Trial, Marker of Success (CHRISTMAS) study Methodology of a randomised, placebo controlled, multicentre study of carvedilol in hibernation and heart failure
Citation:  Pennell, D, 2000, 'The Carvedilol Hibernation Reversible Ischaemia Trial, Marker of Success (CHRISTMAS) study Methodology of a randomised, placebo controlled, multicentre study of carvedilol in hibernation and heart failure', International Journal of Cardiology, vol. 72, no. 3, pp. 265-274.
CrossRef DOI:  http://dx.doi.org/10.1016/S0167-5273(99)00198-9
