DOI: What Do We Got?

(Click image for full size graphic.)
Following the JISC seminar last week on persistent identifiers (#jiscpid on Twitter) there was some discussion about DOI and its role within a Linked Data context. John Erickson has responded with a very thoughtful post, DOIs, URIs and Cool Resolution, which ably summarizes the current problem with DOI: the way DOI is implemented by the Handle HTTP proxy may not have kept pace with actual HTTP developments. (For example, John notes that the proxy is not capable of dealing with 'Accept' headers.) He has proposed a solution, and the post has attracted several comments.
I just wanted to offer here the above diagram in an attempt to corral some of the various facets relating to DOI that I am aware of. I realize that this may seem like an open invitation to flame on - and this is a very preliminary draft - but ... be kind!
So, this may be totally off the wall but it represents my best understanding of DOI as used by CrossRef.
I have distinguished three main contexts:
- Generic Data - A generalized information context where an object is identified with a DOI, an identifier system that is currently being ratified through the ISO process. This is the raw DOI number. (This definitely is not a first class object on the Web as it has no URI.)
- Web Data - An online information context (here I use the term 'Web' in its widest sense) where resources are identified by URI (not necessarily an HTTP URI). Here DOI is represented under two URI schemes: 'doi:' (unregistered but preferred by CrossRef), and 'info:' (registered and available for general URI use). Also it has a presence on the Web via an HTTP proxy (dx.doi.org) URL where it is used as a slug to create a permalink (as listed at 'A'). A simple HTTP redirect is used (with status code 302) to turn this permalink into the publisher response page http://example/1. (Note that typically a second redirect will occur on the publisher platform, here shown by the redirect to http://example/2.)
- Linked Data - An online information context where resources are identified by HTTP URI and conform to Linked Data principles. Now this is where a tension arises between the common publisher perspective and the strict semantic viewpoint. Implicit in the general Web context given above was the notion that the permalink ('A') was somehow related to the abstract object and that the redirection service applied to it associated the abstract resource with concrete representations of the object.
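As a rough illustration of the contexts listed above, the following Python sketch spells out one DOI in each of the forms mentioned: the raw name, the 'doi:' and 'info:' URIs, and the dx.doi.org permalink. The function name doi_forms and the choice to percent-encode everything except "/" are my own assumptions for illustration, not anything specified by CrossRef or the IDF, and the real encoding rules for each scheme may differ in detail.

```python
from urllib.parse import quote


def doi_forms(doi: str) -> dict:
    """Return the spellings of one DOI discussed in the three contexts above.

    Hypothetical helper: the encoding rule used here is a simplification.
    """
    # Percent-encode characters that are unsafe in a URI path; keep "/" as-is.
    encoded = quote(doi, safe="/")
    return {
        "raw": doi,                                # Generic Data: the bare DOI
        "doi_uri": "doi:" + encoded,               # unregistered 'doi:' scheme
        "info_uri": "info:doi/" + encoded,         # registered 'info:' scheme
        "http_uri": "http://dx.doi.org/" + encoded,  # HTTP proxy permalink
    }


forms = doi_forms("10.1109/MIC.2009.93")
for name, value in forms.items():
    print(f"{name:9} {value}")
```

The point of the sketch is only that all four strings are derived mechanically from the same DOI; what each of them identifies is the open question.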
Well the Architecture of the World Wide Web recognizes two distinct classes of resources: Information Resources (IR) and Non-Information Resources (NR). (Note: Only the term 'information resource' is used in AWWW.) IR are those that can be directly retrieved using HTTP, whereas NR are not directly retrievable but have an associated description which is retrievable and is itself a proxy for the real world object.
So, on one reading, the HTTP URI denotes an IR (as listed at 'B') and is resolved (through HTTP status code '302 Found') to a default representation, which is the view that the Linked Data community would currently have of DOI. But this is at odds with the CrossRef position, which regards DOI as identifying the abstract work. Alternatively, to fit the CrossRef model of DOI better, the HTTP URI would denote an NR (as listed at 'A') which would be resolved (through HTTP status code '303 See Other') to an associated description - a publisher response page.
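To make the two readings concrete, here is a small Python sketch of how a Linked Data client might classify the resource named by a URI from the status codes it sees while dereferencing it. The function name classify and its labels are hypothetical, and the rule is a simplification of the reasoning in this post rather than a statement of what any particular client does.

```python
from typing import List


def classify(chain: List[int]) -> str:
    """Classify the starting URI from the HTTP status codes seen, in order.

    Simplified rule assumed for illustration:
    - a first response of 303 See Other marks a non-information resource
      whose description lives elsewhere;
    - otherwise, a chain ending in 2xx is read as an information resource;
    - anything else tells us nothing.
    """
    if not chain:
        return "unknown"
    if chain[0] == 303:
        return "non-information resource (NR)"
    if 200 <= chain[-1] < 300:
        return "information resource (IR)"
    return "unknown"


# Today's behaviour: 302 at the proxy, a second redirect at the publisher, then a page.
print(classify([302, 302, 200]))
# The alternative: 303 at the proxy, then the publisher response page.
print(classify([303, 200]))
```

Under this rule the only thing separating the two readings is the first status code returned by the proxy.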
There will be those self-appointed URI czars who will bemoan the fact of there being multiple URIs. But frankly there is nothing inherently wrong with that. Just as in the real world there are many languages so in the online world there are multiple contexts and histories. We can attempt to make some sense of this by making use of the well-known semantic properties owl:sameAs and ore:similarTo and declare (as also shown in the diagram) the following assertions:
info:doi/D owl:sameAs doi:D .
http://dx.doi.org/D ore:similarTo info:doi/D .
http://dx.doi.org/D ore:similarTo doi:D .
Note that ore:similarTo (stemming from the OAI-ORE work) is a weaker kind of relationship than owl:sameAs (which comes from OWL) and may be appropriate in this usage.
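For anyone who wants to try these assertions in a triple store, here is a minimal Python sketch that writes them out as N-Triples for a concrete DOI. The helper names are my own, the DOI is just an example, and the property URIs are the usual expansions of the owl: and ore: prefixes. Whether ore:similarTo is the right property is, as noted, still open.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
ORE_SIMILAR_TO = "http://www.openarchives.org/ore/terms/similarTo"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement whose three terms are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def doi_assertions(doi: str) -> list:
    """Return the three assertions from the post for a given DOI (hypothetical helper)."""
    info_uri = "info:doi/" + doi
    doi_uri = "doi:" + doi
    http_uri = "http://dx.doi.org/" + doi
    return [
        triple(info_uri, OWL_SAME_AS, doi_uri),
        triple(http_uri, ORE_SIMILAR_TO, info_uri),
        triple(http_uri, ORE_SIMILAR_TO, doi_uri),
    ]


lines = doi_assertions("10.1109/MIC.2009.93")
print("\n".join(lines))
```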
In sum, scenario 'A' is what we have currently implemented, scenario 'B' is what might be commonly perceived as being implemented, and scenario 'C' may be a more correct semantic position.
Your comments (and not unkind comments, please;) are more than welcome.

Comments
Nice summary...
I'm not overly familiar with the semantics of ore:similarTo but it seems to me that to fully bring the DOI into the world of Linked Data you need an http URI about which you can say owl:sameAs doi:D.
In the Twitter #jiscpid discussion, that's what I meant by suggesting use of something like http://doi.net/D, though thinking about it now I wonder if something like http://purl.org/doi/D might be better. Then you could say
http://purl.org/doi/D -- owl:sameAs --> doi:D
Posted by: Andy Powell | February 9, 2010 10:52 AM
Makes sense, this is helpful.
Except, I don't believe "doi:D" is a legal URI, is it? I think that's a convention of its own, but not a URI. The "info:doi:" one is the actual legal URI version of this. And if "doi:D" ain't a legal URI at all, you can't use it in RDF owl:sameAs etc, can you?
Posted by: Jonathan Rochkind | February 9, 2010 11:49 AM
And I'd add... Andy, what's wrong with the "info:doi:D" URI used to say owl:sameAs? Or if you insist on resolvability, perhaps the http://dx.doi.org one, although I'm not sure DOI/CrossRef has made an organizational commitment to persistence of that one. But is there really a need to create yet another form of URI with purl for this stuff, when we already have both "info:doi:" and "http://dx.doi.org/"?
I'm still not sure you can say ANY of these are owl:sameAs "doi:D" -- doesn't the object of an RDF predicate need to be a URI? And "doi:D" isn't, is it?
Posted by: Jonathan Rochkind | February 9, 2010 11:52 AM
Thanks for your post, Tony!
Andy, I'm not sure I get the whole point of what you are suggesting w.r.t. owl:sameAs, at least from a technical standpoint.
Today we can with confidence make assertions like...
http://hdl.handle.net/D --> owl:sameAs --> doi:D
...and...
http://dx.doi.org/D --> owl:sameAs --> doi:D
...because (technically) these HTTP URIs represent doi:D in the HTTP universe (aka the Real World).
The discussion we've been having over at my blog has been focused on steps at the Handle System proxy level that could be taken to ensure that the HTTP URIs based on DOIs --- every DOI has one! --- behave themselves according to the expectations of the Web of Data (etc) ecosystem. Mostly, this is about content negotiation...
Posted by: John S. Erickson, Ph.D. | February 9, 2010 11:54 AM
Hi Tony,
Nice article. I can't speak for the Linked Data community, but the tweak you suggest of having the CrossRef proxy return a 303 status code is something I've wanted to see for some time. It's a minor change but aligns the approaches quite nicely.
What I don't understand is your suggestion of using ore:similarTo between the different forms of the URI. I think we should be using owl:sameAs to make it explicit that the different forms are identifying the same abstract work.
Cheers,
L.
Posted by: Leigh Dodds | February 10, 2010 03:00 AM
Hi Jonathan:
Your question regarding whether "doi:" is a "legal URI" scheme conflates the separate concerns of URI registration ("legal") with URI conformance ("valid").
The "doi:" URI scheme is not registered but it most certainly is a URI scheme which is in use by a wide community and is endorsed by CrossRef.
There is history to this - you may or may not know. An attempt at registering "doi:" was made a number of years ago with the last Internet-Draft ("-04") being ultimately rejected. Suffice to say the registration procedures for URI schemes were very much in flux at that time. The idea since then has been to hold off a further URI registration until DOI was standardized through ISO. (DOI is currently a draft ISO standard ISO/DIS 26324 and is expected to become a full standard later this year.) Meantime though a separate URI scheme "info:" was successfully registered and DOI registered under that as an "info:" namespace.
Throughout CrossRef has recognized "doi:" as a valid URI scheme and its reference guidelines have recommended this form be used in citation formats.
Relevant documents to consult are RFC 3986 (for string conformance) and RFC 4395 (for registration). It is fair to say that the W3C Recommendation "Architecture of the World Wide Web" does go so far as to say that "Unregistered URI schemes SHOULD NOT be used". You should note though that URI governance proceeds through the IETF and not the W3C.
Given the history, usage patterns, and ongoing standardization processes I think it is reasonable to continue to use "doi:" as the canonical form for a DOI reference on the Web.
Tony
Posted by: Tony Hammond | February 10, 2010 04:02 AM
Hi Andy:
Have two questions for you:
1. Why should the HTTP URI be owl:sameAs in order to bring DOI over into the LD world? Is that because otherwise (e.g. ore:similarTo) we risk only bringing half (or even less) of the DOI over? You're surely not suggesting that owl:sameAs is a privileged property?
2. Putting the property aside for the moment, why would we not just use http://dx.doi.org/D as the canonical HTTP URI ... as long as we can make it clear that this is not an "Information Resource", but what I earlier called a "Non-Information Resource"?
(Aside: I believe that would better be termed "Other Resource" by virtue of it being in a "see other" relationship with an "Information Resource". The term "Non-Information Resource" is so negative.)
Tony
Posted by: Tony Hammond | February 10, 2010 04:19 AM
Tony and John,
My comments about the need for a new http URI form of the DOI are based solely on the assumption (based in turn on comments made in previous discussions by Tony) that doi:D and http://dx.doi.org/D do NOT identify the same resource.
If doi:D and http://dx.doi.org/D do indeed identify the same 'non-information' resource (Tony, I agree that this is a horrible term) then the assertion
doi:D -- owl:sameAs --> http://dx.doi.org/D
can be made and I am a happy bunny!
However, if that is the case, then I'm confused as to why Tony didn't say that in the original post, preferring instead to use only the weaker ore:similarTo relation?
Tony,
my point about wanting "to bring DOI over into the LD world" is this... We want the thing (the work) identified by the DOI (i.e. by doi:D) to be addressable as part of LD - we want to be able to make LD assertions about it. LD says you must use http URIs to do that. We therefore need an http URI about which we can make the assertion:
doi:D -- owl:sameAs --> http://dx.doi.org/D
(As per the above) you did not make that assertion in your original post and I'm now confused as to why you did not. Without such an assertion, the things identified by doi:D are not fully part of the LD world.
Does that make sense?
As I say, if the assertion
doi:D -- owl:sameAs --> http://dx.doi.org/D
can be made, then my suggestion about using doi.net or purl.org/doi is unnecessary.
Posted by: Andy Powell | February 10, 2010 05:50 AM
Hi Leigh:
I believe that the main question you asked is pretty much the same as Andy's.
But first to deal with the 303. You say that "It's a minor change but aligns the approaches quite nicely." Indeed. But there are two sides to DOI: a) an HTTP front end, and b) a handle back end. What we are principally concerned with here is the a) side and normalizing HTTP behaviour. However, separately we also need to address the flip side b) and how to implement that change. One of the challenges here, I believe, is how to disentangle DOI's HTTP behaviour from any vanilla handle's HTTP behaviour. I posted this message to the handle-info list, see this extract:
So, it might be a small change but there are also considerations for handle.
As for use of owl:sameAs I think the critical first step is to recognize that the DOI HTTP URI identifies an "Other Resource" (i.e. a "Non-Information Resource"). Once we have that explicitly marked out - and 303 would seem to be the correct HTTP response - then we can see about aligning this HTTP URI with the DOI URI. I would be less averse to asserting that this is indeed an owl:sameAs relationship as long as we consider the HTTP URI as a URI alias and that our primary identifier is the DOI unencumbered by any domain name/file path baggage. (See previous comments about DOI URI legitimacy.)
And with 303 in place, we can then begin to consider a mechanism for adding in machine readable descriptions. (In fact, support for HTTP conneg could have implications for DOI multiple resolution.)
Tony
Posted by: Tony Hammond | February 10, 2010 05:55 AM
Tony,
I broadly agree with your comments about the 'legality' of the doi URI. However, continuing to promote this unregistered URI as the "canonical form for a DOI reference on the Web" carries with it a risk. It is only a small risk but it is a risk. You are relying on a social convention within a relatively small part of the overall Internet community for something that is promoted largely on the basis of longevity and persistence. That seems like an odd approach to me.
Posted by: Andy Powell | February 10, 2010 06:00 AM
I agree with Andy's point, that advancing the "doi" URI (e.g. doi:D) carries risks and causes cognitive dissonance. On the one hand, it promotes the DOI "brand" in a compact way; a listing of e.g. doi:10.1109/MIC.2009.93 at a publisher site is compact, distinguishes it from any-old URI, and provides at least an aura of "actionability," whereas the current practice (see Wiley, IEEE, ACM...) of listing the DOI as Digital Object Identifier: 10.1109/MIC.2009.93 --- sometimes with a link to explanation; sometimes with a proxy link; sometimes neither --- seems to reduce it to "yet another (dumb) identifier."
On the other hand, the risk in socializing the unregistered doi:D form is that it confuses doi: with something that actually has meaning in the popular infrastructure[1]. This may be a relatively minor point that is first reconciled with de facto semantics (e.g. doi:D hard-linked to http://dx.doi.org/D, which is already done); and in some future world, semantically resolving, perhaps aided by wrapping the doi:D reference with some RDFa on the publisher page.[2]
[1] At the risk of further provocation, I'm asserting that existing browser plug-ins that replace doi: encodings with http://dx.doi.org and hdl: with http://hdl.handle.net don't make doi: and hdl: part of the infrastructure. They do illustrate the many possibilities if this was so, however!
[2] At the risk of arguing with [1], it would be an interesting exercise if one was to implement this sort of semantic resolution in the browser, based first on RDFa. Does anyone know of projects that have done this? This would be different than a simple doi: translator plug-in in that it would document the semantics explicitly...
Posted by: John S. Erickson, Ph.D. | February 10, 2010 08:21 AM
@Andy
I am not advocating that DOI continue ad perpetuam to use an unregistered URI scheme. Far from it. As we have earlier demonstrated it was our clear intention to register the "doi:" URI scheme. I think what happened back then (and I am only speaking for myself) was that we got caught out by the combination of a certain bullish resistance to new URI scheme registrations (this coming at a time when the URL/URI debate was still rife) which was coupled with an insufficient application for a new URI scheme (our own read being that we had not gotten a coherent story for DOI dereference).
My understanding anyway is that a new URI scheme registration will be sought after the expected ISO ratification. However, I am unclear as to the particulars as there seems to be no open forum for discussing DOI matters.
@John
I have no particular sympathy for maintaining compatibility with any earlier (or ongoing) experiments to map "doi:" style URIs to actionable "http:" based links.
In fact I am rather against the very notion of DOI somehow being an "actionable" identifier. The notion of "actionability" presumes a context. Are we talking about DOI as a handle - in which case we are looking at mechanisms for resolving handles? Or are we talking about DOI as an HTTP packaged URI - in which case we are talking about proxy servers and the like? The DOI name in itself IMO has no latent "actionability". It needs a springboard.
I am suggesting that it's time for DOI to take some first baby steps and to assume some semantic responsibilities.
Tony
Posted by: Tony Hammond | February 10, 2010 08:55 AM
Tony says,
This question was "between-the-lines" in my comment. First, let us agree that there is some latent semantic when a page or an aggregation asserts the reference doi:D. By "latent," we mean the agent must discover how to resolve the reference; currently with the DOI there are no "standards," other than (in affected browsers) to use a plug-in to attempt a proxy-based resolution. But this is only one interpretation, based on the specific hard-coding of the plug-in and ignorant of context.
I am suggesting that one "baby step" might be to introduce (e.g.) RDFa coding standards for embedding the doi:D syntax. This would minimally provide linking at the (e.g.) linked data level, but also could provide clues for browsers that can interpret encoding --- including owl:sameAs linking to http://dx.doi.org/D version.
In other words, if there is a risk that doi:D might not be "understood," use the tools to make it understood in all the ways that the creator of the current context intended.
Posted by: John S. Erickson, Ph.D. | February 10, 2010 09:27 AM
(I made this comment earlier but it seems to have got lost).
@John - you say "Today we can with confidence make assertions like ...
http://dx.doi.org/D --> owl:sameAs --> doi:D"
I'm not convinced! You'll note that Tony explicitly did *not* make that assertion - rather, he used the much weaker ore:similarTo instead.
If doi:D and http://dx.doi.org/D identify different resources (even if they are both non-information resources) then we have a problem because the resource identified by doi:D (the 'work') is not part of the LD world.
Posted by: Andy Powell | February 10, 2010 10:41 AM
Hi Tony,
Thanks for the response. To me it boils down to something very simple:
If the HTTP form of the DOI identifies a Non-Information Resource (an abstract work) then is that the same abstract work as identified by, e.g. doi:XXXX?
If it is, and I don't see how it could be anything else, then owl:sameAs is the valid relation to use. If we know there is a strong relationship between these identifiers we should publish it.
This doesn't meet your requirement that one or other is a primary identifier, because owl:sameAs explicitly defines them as equivalent.
Cheers,
L.
Posted by: Leigh Dodds | February 10, 2010 10:54 AM
(Sorry Tony, my previous comment definitely seems to have been lost).
To answer your "2 questions"...
We want to bring the resource identified by doi:D (the work) into the world of LD. Agreed?
To do that, we need an http URI about which we can assert http://.../D -- owl:sameAs doi:D. Agreed?
If we can make that assertion about http://dx.doi.org/D then there is no problem and I agree with John.
If we can't make that assertion about http://dx.doi.org/D (either because it identifies an information resource or because it identifies a non-information resource that is different from doi:D) then we need to use a different http URI.
There is some advantage in moving to a different http URI anyway, because it removes some of the current confusion about what http://dx.doi.org/D identifies.
However, there is a legacy issue (as you mentioned during the meeting).
Posted by: Andy Powell | February 10, 2010 11:02 AM
@Andy, @Leigh
I agree (I think) that owl:sameAs could be used. We would need some general agreement on that. (The other player in this space is OAI-ORE and just where might that fit in. That too has a need for a Non-Information Resource URI for the Aggregation. This is still an area for discussion.)
My reason for not using owl:sameAs in the diagram was just in order to be deliberately tentative and to keep my main focus on establishing the need for a 303 status code and making sure that the HTTP URI was clearly marked as a Non-Information Resource.
Tony
Posted by: Tony Hammond | February 10, 2010 11:10 AM
@andy --- I concede that my example assertion...
http://dx.doi.org/D --> owl:sameAs --> doi:D
...is too blunt; maybe for some inference problems it might be "correct," but for others it really isn't.
Perhaps what we're really after is a way to embed a little context with the doi:D. What if we had a (small) family of assertions that more precisely expressed the real meaning or intent that we're chasing; for example, imagine foo:hasHttpUri which would be one directional and has very limited meaning; owl:sameAs isn't and is relatively unconstrained.
Then, doi:D could be embedded via RDFa in a page and "properly" interpreted by a client (possibly a plugin, but not specific to DOI or HDL):
Posted by: John S. Erickson, Ph.D. | February 10, 2010 11:33 AM
@john firstly, your embedded XML has got trashed so it's hard to see exactly what you are proposing.
Ignoring that, I have a slight worry... when you say "but not specific to DOI or HDL" that might be true in a strict technical sense but I'm not totally sure that it is true more generally.
Here's my concern... is there any significant driver for what you are proposing outside of the use of DOIs (and Handles)?
If the answer is "no", then your proposal is (currently) specific to the DOI (in a non-technical sense). That may not be a problem? But it may be... at least in terms of widespread uptake/understanding.
??
Posted by: Andy Powell | February 10, 2010 12:14 PM
In thinking about this over the course of the day, I fear we may have re-invented XLink... Thoughts?
Posted by: John S. Erickson, Ph.D. | February 10, 2010 03:17 PM
Huh. It seems like a big mistake to me to encourage people to use a URI scheme that is not registered. Makes it hard to figure out what's going on for outsiders -- I've been developing software that uses DOIs for a couple years, and it never occurred to me that "doi:" was a URI scheme that my software should recognize.
At the very least, I hope you have it clearly documented somewhere that your community is using an unregistered "doi" scheme, and document the syntax and semantics of that scheme somewhere.
As it is... what if you eventually apply to get a doi scheme registered again, and are rejected again? Now you're stuck using a scheme that will never be registered? Just seems like a bad idea, defeats the purpose of considering it a URI at all.
If you don't want to use the info:doi scheme (just prefer fewer chars?), and don't want to or can't register doi.... I think it would make a lot more sense to consider "doi:" not a URI scheme, but a non-URI standard 'abbreviation' for the equivalent info:doi:* actual URI.... and to use the actual info:doi:* in any RDF or RDF-like assertions.
If the point of RDF is inter-operability with systems that just know general RDF but don't need to know your specific domain... using something that matches the syntax of a URI but isn't actually a registered scheme just seems like adding in domain-specific complexity that's really completely unnecessary.
Posted by: Jonathan Rochkind | February 10, 2010 04:25 PM
I agree it makes sense to think of a DOI URI as identifying a non-information resource, the abstract work.
And I agree there is a place for resolvers without built-in resolving. Although this seems to be somewhat heretical in the RDF/linked data community, so you might have trouble engaging those folks in a story involving a URI without inherent resolvability.
But if it makes sense to use a not-inherently-resolvable URI... I mean you've already got "info:doi:". I'm honestly suspicious that you're going to have much luck convincing the powers that be to create a new "doi:" scheme. Half of them are going to have religious objections to a URI without built in resolvability in the first place (and that half will say "what do you need a new scheme for? Just use http:"), and the other half will think "we set up info: specifically to be a scheme for specific sub-schemes like DOI, which in fact already HAS an info: sub-scheme, what do they need their own doi: scheme for?"
I don't think there's necessarily anything wrong with using the "doi:D" form in places like narrative text that do not syntactically _require_ a DOI. But I really think you ought to consider it a non-URI short form of the actual "info:doi:" URI, and use the info:doi: one in any place syntactically requiring a URI (like an RDF assertion).
Posted by: Jonathan Rochkind | February 10, 2010 04:34 PM
Bah, sorry, the above comment is somewhat indecipherable due to wrong words.
Paragraph 1 should read "And I agree there is a place for IDENTIFIERS without built-in resolving."
Last paragraph should read "anything wrong with using the "doi:D" form in places like narrative text that do not syntactically _require_ a URI. "
Posted by: Jonathan Rochkind | February 10, 2010 04:36 PM
Have been travelling the past few days, so sorry to be late to the discussion.
First- a lot of this conversation is predicated on the idea that CrossRef DOIs are applied at the abstract work level. Indeed, that is what it currently says in our DOI Name information and Guidelines. Unfortunately, this is a case where theory, practice and documentation all diverge.
In practice, DOIs are not exclusively assigned at the work level. They are also assigned at the expression and manifestation level. Indeed, they are increasingly being assigned to identify subcomponents and versions of content as well. When you think about it, this variation in practice makes sense. CrossRef DOIs are fundamentally about supporting online citation and you cannot adequately support accurate citation at the work level. I will post something separate explaining the thinking behind this, but you can already see some of this change in thinking in our recent guidelines for assigning DOIs to books (http://www.crossref.org/06members/best_practices_for_books.html).
Second, another chunk of this conversation talks about CrossRef's current recommendation that DOIs be displayed using the doi:D form. At our most recent board meeting we decided that we will change this soon. Instead we will be recommending that, for display purposes in web applications, CrossRef DOIs should be shown in HTTP URI form. Note that these recommendations are focused on the display of CrossRef DOIs, but I suspect that we should make a similar recommendation for recording DOIs for machine consumption in web applications. Here it would be worth pointing out that the IDF has recently (last week) made it clearer in its FAQ that the HTTP URI form of the DOI is recommended for web applications (http://www.doi.org/faq.html#23).
Third- I have deliberately (and, perhaps ponderously) used the phrase "CrossRef DOIs" in the above passages. I've done this because DOIs from other registration agencies might be assigned for different purposes (e.g. supply chain management, licensing, etc.) and may behave differently.
More thoughts later as I try to grok the above thread. Still Jet-lagged.
Posted by: Geoffrey Bilder | February 11, 2010 07:30 AM
Hi Geoff:
I very much appreciate your comments. (It is interesting also to learn from you by way of a blog comment that CrossRef and IDF are both now changing policy and recommending use of the HTTP URI form for the DOI.)
As you will have gathered this post is all about what those HTTP URIs actually mean. What do they identify? Within a linked data application there is a significance in the way that HTTP URIs are processed. Is the current resolution setup (per dx.doi.org) correct and, if so, what does that indeed tell us about the entities that the HTTP URI identifies?
Tony
Posted by: Tony Hammond | February 11, 2010 07:49 AM
I don't think IDF has actually changed any policy. They have just clarified something that was frequently misunderstood.
As for the CrossRef change, the decision was made at Tuesday's board meeting. The change is actually part of a larger review of our current DOI display guidelines and will not be finalised or publicised for a few months yet. Still, thought it relevant to this thread so seemed important to preview.
Posted by: Geoffrey Bilder | February 11, 2010 08:12 AM
Regarding the owl:sameAs versus ore:similarTo issue, I would like to point at slides 53 and 54 of the OAI-ORE overview slides.
What those attempt to depict is the so-called "appropriate copy problem". That is, multiple copies of the thing with the same DOI are available from different publishers/portals, meaning available at different HTTP URIs. Note that, in the slides, DOI-1 is the info:doi/... version of the DOI, whereas A-1 and A-2 are intended to be HTTP URIs of ORE Aggregations associated with the DOI-identified thing. One of them could, for example, be http://dx.doi.org/...
I think this scenario suggests that owl:sameAs is probably not kosher in these cases. It also shows how two ORE graphs, each of which describe a DOI-Aggregation (exposed by different publishers/portals) merge in the info:doi/... node. Which is something one would want, I think.
Posted by: Herbert Van de Sompel | February 11, 2010 11:03 AM
Geoff,
thanks for the update. I'm looking at the wording of the answer to question 23 of the IDF FAQ, which says:
"For web applications, the DOI name may be expressed as a HTTP URI. The method for doing so is simply to prepend the DOI with http://dx.doi.org."
Can I take that to imply that:
doi:D -- owl:sameAs --> http://dx.doi.org/D
? That would be a big step forward. It would still leave us with the problem of whether these things identify information resources or non-information resources, but at least we'd know categorically that they identify the same thing.
Posted by: Andy Powell | February 12, 2010 12:30 PM
I think it might be useful to refer to Les Carr's recent post, How Repositories Can Contribute Linked Data, which summarizes his team's work on the JISC dotAC project. In particular, note his point about how they are using content negotiation to return the most appropriate representation of the content in the repository (including content in text/n3 and application/rdf+xml format).
Posted by: John S. Erickson, Ph.D. | February 12, 2010 01:16 PM