Is FRBR the OSI for Web Architecture?

(This post is just a repost of a comment on Geoff’s last entry. I am reposting it because it is already rather long, because it contains one original thought – FRBR as OSI – and because, well, it didn’t really want to wait for moderation.)
Hi Geoff:
First off, there is no question but that CrossRef was established to take on the reference linking challenge for scholarly literature. (Hell, it’s there, as you point out, in the organization name – PILA – as well as in the application name – CrossRef.)
But one should also remember that DOI, as it was sold at the time, promised so much more. I disagree with you that the participants back then were as wholly innocent of the FRBR terms as you might suggest. Certainly there were ample presentations on DOI that sought to elucidate those relationships.
No matter. FRBR is a useful reference model for clarifying some of these concepts, but not one we need be overly concerned with at this time; nor need we worry whether DOI maps one-to-one onto a given FRBR layer. What we are more concerned with on a pragmatic level is how DOI maps onto the Web architecture, and especially how it plays with Linked Data concepts.
(Aside: apropos FRBR, we might be in danger of repeating the OSI mistake in standardizing the network layer model. Ultimately OSI was maintained as a reference model but dropped as a concrete model in favour of the TCP/IP stack. Could be that FRBR is our OSI and Linked Data is our TCP/IP stack? That is, we might have to settle on the coarser data model in order to get a coherent story out the door where all can agree.)
You say:

“we need a mechanism to distinguish between when we are getting the thing pointed to by the CrossRef DOI (the PDF, HTML, etc.) as opposed to ‘something about the thing’ (e.g. the landing page, metadata record, etc.)”

But that is exactly what we were chasing up in the earlier posts (both my “DOI: What Do We Got?” and John Erickson’s “DOIs, URIs and Cool Resolution”). You want to distinguish between a thing and a description of a thing. And Web architecture does just that: it distinguishes between Information Resources (i.e. the things) and Non-Information Resources (i.e. descriptions of the things).
Now, is this something that CrossRef can truly distinguish and make apparent in its service architecture? If we retain the notion of a landing page, we are already essentially saying that a CrossRef HTTP URI identifies a description of the resource, i.e. a Non-Information Resource, or Other Resource, and that is properly indicated within the architecture by returning a “303 See Other” status code.
I think that’s all we’re saying at the moment as a first step.
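To make the 303 idea concrete, here is a minimal Python sketch of the decision a resolver would make if a DOI HTTP URI is taken to identify an Other (Non-Information) Resource. The registry, the DOI 10.9999/example.123 and the publisher URL are all hypothetical, and this is not a description of how dx.doi.org is actually implemented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Response:
    """Stand-in for an HTTP response: a status code and an optional Location."""
    status: int
    location: Optional[str] = None


# Hypothetical registry mapping a DOI to its publisher landing page.
LANDING_PAGES: Dict[str, str] = {
    "10.9999/example.123": "https://publisher.example/articles/123",
}


def resolve(doi: str) -> Response:
    """Answer a request for a DOI HTTP URI.

    The URI is treated as identifying an Other (Non-Information) Resource,
    so instead of serving content directly the resolver replies
    303 See Other and points at a document that describes it.
    """
    target = LANDING_PAGES.get(doi)
    if target is None:
        return Response(status=404)  # unknown DOI
    return Response(status=303, location=target)
```

Calling resolve("10.9999/example.123") yields a 303 with the landing page as its Location. A plain 302 would redirect the browser just as well, but would say nothing about what the original URI identifies; the choice of 303 is what carries the semantics.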
Web architecture wants to know whether the DOI HTTP URI identifies a thing or a description of a thing. I say the latter. You too seem to suggest the latter in your comment. I wonder if we could get a vote on that.
And btw, I am not suggesting that CrossRef needs to dive into the business of “tracking compound documents in their entirety”. Far from it. Let’s just get a common resource architecture agreed publicly and then we can build on that.
This observation I received in a private email is something I fully support:

“The real problem is what doi http uri identify on the web. Everything flows from the answer to that Q.”

Tony

10 thoughts on “Is FRBR the OSI for Web Architecture?”

  1. Jonathan Rochkind

    When someone uses a DOI to cite a reference, they are not intending to cite a specific web document, are they? They do not mean to cite a particular HTML (or PDF, but not both) document on a particular platform. They mean to cite “the article”. At first, without examining it closely, some of us suggested that this was ‘the work’; but, as someone somewhere pointed out, if the article exists in several languages or versions, they probably mean to cite a particular version in a particular language.
    So the HTTP URI of a DOI may correspond to a particular web document. But since http://dx.doi.org/ can easily CHANGE what document it points to, it’s not actually an identifier with an organizational commitment to persistence if you view it as representing a _particular_ document. The publisher can change what _particular_ web document it points to whenever they want (just metadata, or the actual text, in various formats, on various platforms).
    I hope these propositions make it clear why this stuff is actually worth considering a bit; it’s not quite enough to say “well, of course an http DOI URI represents a web document, it’s the web.” If you do that, the thing represented by the http DOI URI is not the thing an author means to cite when using a DOI (does this mean they shouldn’t use an http DOI URI to cite?), and it’s also an identifier without an organizational commitment to persistence, it slips and slides at the publisher’s whim as far as what _web document_ it represents.
    I’d suggest this also shows how, heretical as it is to some in the linked data community, using a not-automatically-resolvable URI can make things _clearer_.
    I’d suggest that the info:doi:D URI indeed represents a non-information-resource, a particular expression of the article. And authors should be encouraged to use this one (or the “doi:D” form, which I’d consider just a shortcut for registered “info:doi:D”) when they mean to cite an article. The various http URIs and what they represent… well, perhaps it becomes less important then, but if you insist that http://dx.doi.org/D represents a web document and an information resource, then you have to realize that _which_ web document/information resource it represents will change over time, which isn’t very good identifier practice. Probably better to say that that one too represents a non-information-resource expression, which will give you a 303 redirect to a URI representing a _particular_ web document (and the end-point of that 303 redirect will indeed change over time, which is fine).
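    The three spellings in play can be related with a small Python sketch. Note that the comment writes info:doi:D, whereas the info URI scheme as registered in RFC 4452 uses a slash (info:doi/D); the sketch follows the RFC syntax and treats doi:D as a shortcut, as suggested above. It only manipulates strings and resolves nothing, which is rather the point: the info form carries identity without any built-in resolution behaviour. Extracting the bare DOI is also the step one would take before handing the name to the Handle system. The function names are hypothetical.

```python
from typing import Optional

# Prefixes for the three spellings discussed above. The registered info URI
# syntax (RFC 4452) uses a slash: info:doi/<DOI>.
INFO_PREFIX = "info:doi/"
DOI_PREFIX = "doi:"
HTTP_PREFIX = "http://dx.doi.org/"


def extract_doi(uri: str) -> Optional[str]:
    """Return the bare DOI carried by any of the three URI forms, else None."""
    for prefix in (INFO_PREFIX, DOI_PREFIX, HTTP_PREFIX):
        if uri.startswith(prefix):
            return uri[len(prefix):]
    return None


def to_info_uri(uri: str) -> Optional[str]:
    """Normalize any recognized form to the not-automatically-resolvable info URI."""
    doi = extract_doi(uri)
    return None if doi is None else INFO_PREFIX + doi
```

    For example, to_info_uri("http://dx.doi.org/10.9999/example.123") and to_info_uri("doi:10.9999/example.123") both give "info:doi/10.9999/example.123", a name that no registrar's redirect policy can alter.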

  2. Tony Hammond

    Nicely reasoned, Jonathan. As you say: “it slips and slides”. But the CrossRef requirement for providing a landing page (or response page) with metadata does not. And that is a description of the document.
    So, if dx.doi.org could just issue a ‘303’ then maybe CrossRef could indeed go the ball.
    Of course, there is also the “small” matter of who maintains dx.doi.org. If this service is available to a federation of DOI registration agents with their own separate agendas, then we might be slip-sliding back to the start.
    I’ve said it before but will say it again: DOI risks meaning nothing if it can only provide a blind redirect service with no associated semantics. I am hoping that common sense can prevail.

  3. Geoffrey Bilder

    The OSI-FRBR comparison is a good one, and I think that (at great length) we are agreeing that we shouldn’t be overly concerned about whether the DOI maps to a particular FRBR layer. This was basically my original point. I was concerned that people were citing the DOI guidelines and assuming that DOI=Work when it doesn’t. And again, my point about “thing” vs “about a thing” was basically agreement, albeit noting that just because a landing page was “about a thing”, it wasn’t necessarily “about a work”.
    BTW, while we are talking about landing pages, I don’t think we can assume that current CrossRef DOIs always point at landing pages. See for instance:
    http://dx.doi.org/10.1107/S0108767308022149/zm5045sup1.pdf
    or
    http://dx.doi.org/doi:10.4997/JRCPE.2009.407
    Again, I think we agree that the problem at the moment is that we don’t know when a publisher is linking to a landing page and when they are not. We need to use LD principles to make this explicit.
    In summary, I hope I am not being cast as anti-Linked Data here. The irony would be too much, given that I’ve spent a good deal of my time at CrossRef preaching about it.

  4. Geoffrey Bilder

    @jonathan I’m not sure why you say that:

    it’s also an identifier without an organizational commitment to persistence, it slips and slides at the publisher’s whim as far as what _web document_ it represents.

    The entire point of publishers joining CrossRef is that they are making an organizational commitment to update their DOI links and associated metadata appropriately. In fact, we’ve just recently updated our terms & conditions to strengthen this commitment, making it clear that a member’s obligation to maintain CrossRef metadata and links survives even if they leave CrossRef, and that CrossRef reserves the right to redirect a former member’s links appropriately should they not live up to this obligation (our so-called “Hotel California clause”). See the “PILA Membership Agreement changes” section here:
    http://www.crossref.org/10quarterly/quarterly.html
    Now, is this a ‘guarantee’? No, but it is about as much of a commitment as you can get out of any organization. Between that and the IDF’s organizational commitment to maintain dx.doi.org, I think we are doing pretty well.
    @tony I am interested to hear you express concern that dx.doi.org might have to service “a federation of DOI registration agents with their own separate agendas”. Perhaps you could elaborate on your concern here?

  5. Tony Hammond

    Geoff: I’m glad to see that you have a positive read on the OSI/FRBR comparison. I think it’s quite helpful too.
    But before we succumb to a general outbreak of agreements and nodding, I have to pick up on two points – one small (but bad), the other contentious:

    1. The second URL you cite has an embedded string “doi:”. It just so happens that the dx.doi.org resolver will silently drop this, and this “feature” was added as a user “convenience”. This is absolutely appalling because it confuses identifiers with service behaviours, the more so since we are currently talking about HTTP URIs as *identifiers*. And now we seem to have carelessly spawned a bastard alias. We really must take steps not to show such identifiers again. (Note that Andy Powell earlier picked up one of these queerly designed HTTP URIs. Not!)
    2. The other point – more contentious – is that I would maintain that the link direct to the PDF is still a description of the document, in that it does not represent the document in its entirety but only one facet of it. (I use the word “facet” to distinguish it from “representation”, which has a more restrictive meaning.) I am thus of the opinion that whether the DOI resolves to a landing page, to an HTML full text, or to a PDF full text, these are all facets of the underlying object and thus can be treated as “Other Resources” and served by a 303. That is a view that I am sure some will dislike. But unfortunately the opposite view, which maintains that the PDF or HTML full text *is* the object identified by the DOI, will ultimately lead to the disintegration of any semantics to be associated with DOI. :)
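    A small Python sketch shows why silently stripping “doi:” spawns an alias: under lenient lookup, two distinct HTTP URI paths end up naming the same DOI. Both functions are hypothetical and only model the behaviour described in point 1; they are not the resolver’s actual code, and the DOI is made up.

```python
LENIENT_PREFIX = "doi:"


def lenient_key(path: str) -> str:
    """Model the 'convenience' behaviour described above: silently drop a
    leading 'doi:' from the request path before looking the DOI up."""
    if path.startswith(LENIENT_PREFIX):
        return path[len(LENIENT_PREFIX):]
    return path


def strict_key(path: str) -> str:
    """Refuse the alias instead, so each DOI has exactly one HTTP URI spelling."""
    if path.startswith(LENIENT_PREFIX):
        raise ValueError(f"not a canonical DOI path: {path!r}")
    return path


# Two different HTTP URIs, one DOI: this is the alias.
aliased = lenient_key("doi:10.9999/example.123") == lenient_key("10.9999/example.123")
```

    Here aliased is True, which is exactly the problem for anyone treating HTTP URIs as identifiers: consumers comparing URIs as strings will see two resources where there is one. The strict variant trades user convenience for a single canonical form.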

  6. Tony Hammond

    @geoff
    Just saw your second post, which asked for clarification on my DX concern. The point is just this: if RA A maintains DOIs that have one set of semantics and chooses to serve a 303, and RA B maintains DOIs with an alternate set of semantics and chooses to serve a 307, and both RAs are using the same global resolver, then who lucks out? Do we get 303s, 307s, or will it depend on the DOI, or will we just default back to 302s? Does anybody really have a clue what is happening here?
    I guess, bluntly, the question on DX is: is DX a dedicated CrossRef DOI resolver?
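    The dilemma can be stated as a lookup. In this Python sketch, which is an assumption about how a shared resolver could behave rather than a description of dx.doi.org, the agency is assumed to be determined by the DOI prefix, and the status code a client sees depends on that agency’s policy. The prefixes, agency names and policies are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical mapping of DOI prefixes to registration agencies, and of each
# agency to the redirect status it has chosen. Real assignments differ.
PREFIX_TO_RA: Dict[str, str] = {"10.9001": "RA-A", "10.9002": "RA-B"}
RA_POLICY: Dict[str, int] = {"RA-A": 303, "RA-B": 307}
DEFAULT_STATUS = 302  # the blind redirect a shared resolver might fall back to


def redirect_status(doi: str) -> Tuple[str, int]:
    """Return (agency, status) for a DOI, as a shared global resolver might."""
    prefix = doi.split("/", 1)[0]
    ra = PREFIX_TO_RA.get(prefix, "unknown")
    return ra, RA_POLICY.get(ra, DEFAULT_STATUS)
```

    Under this model, "10.9001/a" gets a 303, "10.9002/b" a 307, and anything unmapped a 302: the same host speaks several semantic dialects, and a client cannot tell from the URI alone which one it will get.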

  7. Jonathan Rochkind

    Geoff: There is organizational commitment to persistence of a DOI, and of resolution through the DOI/Handle system.
    It is not clear to me whether there is an organizational commitment specifically to the http://dx.doi.org URI or not (by whom?). But I’m no expert in what’s up with DOI, it confuses me sometimes.
    Per that confusion, does an http://dx.doi.org URI ALWAYS point to a description, rather than to full text? I don’t know the answer!
    But to me it’s confusing to say that an HTTP DOI URI represents a web document describing a publication, rather than the publication itself, precisely because most people think of a DOI as representing an article, and they are likely to think of a URI encapsulating a DOI the same way.
    But if an HTTP DOI URI instead represents a particular web document, one which may change and which isn’t an article but just a metadata page for an article, does that mean that someone who uses an HTTP DOI URI intending to cite an article is not actually doing what they think, but is instead just citing a web document describing an article? Even if they INTEND to do that, the fact that the http://dx.doi.org URI can point to Web Document A one day and Web Document B the next… seems problematic.
    The more I think about it, the more sense it makes to say that the http://dx.doi.org URI represents the (non-web document) Publication itself, and will 303 redirect you to a URI representing a web document (such as a summary description).
    Under that model, you don’t need to worry about a URI representing one thing one day and another thing the next. The http://dx.doi.org URI always represents the publication the DOI was assigned to. Where it redirects you with a 303 may change from one day to the next, no problem. And under that model, if someone uses the http://dx.doi.org URI intending to reference or cite an article (not a web document description), then that’s quite right.
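    This model separates the persistent name from the moving pointer. A minimal Python sketch, using a hypothetical target history and made-up platform URLs: the identifier being resolved never changes, and only the Location in the 303 response does.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical history of where one DOI's HTTP URI has pointed over time.
# Each entry: (date the target took effect, target URL), in ascending order.
TARGET_HISTORY: List[Tuple[date, str]] = [
    (date(2008, 1, 1), "https://old-platform.example/article/123"),
    (date(2009, 6, 1), "https://new-platform.example/a/123"),
]


def see_other(on: date) -> Tuple[int, str]:
    """The DOI URI always names the publication itself; only the 303 target
    moves. Return the (status, Location) pair in force on the given date."""
    current = None
    for effective, url in TARGET_HISTORY:
        if effective <= on:
            current = url  # latest target that had taken effect by `on`
    if current is None:
        raise LookupError("no target registered yet")
    return 303, current
```

    Asked in March 2008 the sketch answers with the old platform, and asked in 2010 with the new one, yet a citation using the DOI URI is equally valid on both dates because it never named either platform page.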

  8. Jonathan Rochkind

    And yeah, also I’m with Tony on his last comment there.
    I guess dx.doi.org is for CrossRef DOIs only? So even if you have a DOI and want to make an HTTP URI for it, you’ve first got to figure out which registrar it ‘belongs’ to?
    I think this may be a reason to prefer the not-inherently-resolvable info:doi:D form (or the equivalent doi:D form). It doesn’t matter which registrar ‘owns’ it; the URI is clear. Nor does it matter, as Tony worries, if different registrars decide to do different things for resolution, because there is no built-in resolution (except extracting the handle from it and using the actual Handle system). And it’s clearer that info:doi:D or doi:D represents a non-web-resource (the Publication itself), precisely because it doesn’t “inherently” resolve to any particular web document, and it does so without relying on any particular registrar to do any particular thing.

  9. Chris Shillum

    In response to Jonathan’s last post, dx.doi.org is the proxy for the entire DOI system, and will resolve any DOI regardless of RA. There is currently a proposal to create a “branded” resolver specific to CrossRef DOIs (e.g. something like doi.crossref.org); however, the discussion on this topic has thus far been limited to branding and human-readability concerns. I think that CrossRef needs to consider very carefully the implications of this change for machine-to-machine interaction and linked data, and I would be interested in any thoughts on this.

  10. Andy Powell

    Re: Could be that FRBR is our OSI and Linked Data is our TCP/IP stack? That is, we might have to settle on the coarser data model in order to get a coherent story out the door where all can agree.
    I’m not sure that I totally buy that analogy since the use of Linked Data doesn’t preclude the need for domain modelling (of which FRBR is one possible example). I’m not sure that anyone anticipates that the ‘information resource’ / ‘non-information resource’ level is (on its own) sufficient for real-world applications of Linked Data.
    Re: And Web architecture does just that: it distinguishes between Information Resources (i.e. the things) and Non-Information Resources (i.e. descriptions of the things).
    Surely you’ve got these the wrong way round? (Although in the case of, say, a born-digital document, your ‘thing’ (as opposed to your ‘description of the thing’) may also be an information resource, assuming that some DOIs identify web documents.)
