(Update – 2007.09.15: Clean forgot to add in the rdf: namespace to the examples for xmp:Identifier in this post. I’ve now added in that namespace to the markup fragments listed. Also added in a comment here which shows the example in RDF/XML for those who may prefer that over RDF/N3.)
So, as a preliminary to reviewing how a fuller metadata description of a CrossRef resource may best be fitted into an XMP packet for embedding into a PDF, let’s just consider how a DOI can be embedded into XMP. And since it’s so much clearer to read let’s just conduct this analysis using RDF/N3. (Life is too short to be spent reading RDF/XML or C++ code. :~)
(And further to Chris Shillum’s comment here on my earlier post Metadata in PDF: 2. Use Cases where he notes that Elsevier are looking to upgrade their markup of DOI in PDF to use XMP, I’m really hoping that Elsevier may have something to bring to the party and share with us. A consensus rendering of DOI within XMP is going to be of benefit to all.)
Within an XMP packet our first idea might be to include the DOI using the Dublin Core (DC) schema element dc:identifier in minimalist fashion:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:identifier "10.1038/nrg2158" .
This simply says that the current document (denoted by the empty URI “<>”) has a string property "10.1038/nrg2158" which is of type identifier from the dc (or Dublin Core) schema which is identified by the URI <http://purl.org/dc/elements/1.1/>.
Now, since this is just a DOI and the wider public cannot be expected to know about DOIs, it would surely be better to present the DOI in URI form (doi:) as
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:identifier "doi:10.1038/nrg2158" .
or, using a registered URI form (info:) as
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:identifier "info:doi/10.1038/nrg2158" .
Aside: This shows up a limitation of XMP where the DC schema property value for dc:identifier is fixed as type Text. The natural way to express the above in RDF/N3 would be as:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:identifier <info:doi/10.1038/nrg2158> .
which says that the value is a URI (type URI in XMP terms), not a string (type Text in XMP terms). We either have to flout the XMP specification or else live with this restriction. We’ll opt for the latter for now.
But, the XMP Spec deprecates the use of dc:identifier since the context is not specific. (Note that that’s what was just discussed above. The limitation is built into XMP which builds on RDF but does not fully endorse the RDF world view.) Instead the XMP Spec recommends using xmp:Identifier since the context can be set using a qualified property as:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
    a rdf:Bag;
    rdf:_1 [ xmpidq:Scheme "DOI"; rdf:value "10.1038/nrg2158" ]
] .
This says the string "10.1038/nrg2158" belongs to the scheme "DOI".
Here we have used the scheme “DOI” and, as noted above, for wider recognition it would be better to employ one of the URI forms, e.g.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
    a rdf:Bag;
    rdf:_1 [ xmpidq:Scheme "URI"; rdf:value "doi:10.1038/nrg2158" ]
] .
This says the string "doi:10.1038/nrg2158" belongs to the scheme "URI".
But this is the unregistered URI form (doi:), so should we be using instead the registered form (info:)? Well, turns out that this construct for xmp:Identifier is an rdf:Bag so we can include more than one term. How about using this construct then:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
    a rdf:Bag;
    rdf:_1 [ xmpidq:Scheme "URI"; rdf:value "info:doi/10.1038/nrg2158" ];
    rdf:_2 [ xmpidq:Scheme "URI"; rdf:value "doi:10.1038/nrg2158" ]
] .
Now we’ve got both forms, which is fair enough since these are equivalent. In RDF terms we can make the statement that:
<doi:10.1038/nrg2158> owl:sameAs <info:doi/10.1038/nrg2158> .
which asserts that the two URIs are equivalent and that they reference the same resource.
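To make that equivalence concrete outside of RDF, here is a minimal Python sketch that reduces the doi: and info:doi/ forms to the native DOI string and compares them. The function names `bare_doi` and `same_doi` are made up for this illustration, the check for a leading "10." is only a rough sanity test, and the comparison is an exact string match (no case folding or other normalisation is attempted). The HTTP proxy form is deliberately not accepted, since it names a different resource.

```python
# Hypothetical helpers: neither name comes from XMP, DOI or any library.
URI_PREFIXES = ("info:doi/", "doi:")


def bare_doi(identifier: str) -> str:
    """Strip a doi: or info:doi/ wrapper and return the native DOI string."""
    for prefix in URI_PREFIXES:
        if identifier.startswith(prefix):
            identifier = identifier[len(prefix):]
            break
    # Rough sanity check only: a native DOI starts with "10." and
    # separates its prefix from its suffix with a slash.
    if not identifier.startswith("10.") or "/" not in identifier:
        raise ValueError(f"not a recognised DOI form: {identifier!r}")
    return identifier


def same_doi(a: str, b: str) -> bool:
    """True if both identifiers wrap the same native DOI string."""
    return bare_doi(a) == bare_doi(b)
```

With this, `same_doi("doi:10.1038/nrg2158", "info:doi/10.1038/nrg2158")` holds, while passing "http://dx.doi.org/10.1038/nrg2158" raises a ValueError rather than being silently treated as the same thing.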
So, what if we want to include a native DOI without the URI garb? We can easily do that:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
    a rdf:Bag;
    rdf:_1 [ xmpidq:Scheme "URI"; rdf:value "info:doi/10.1038/nrg2158" ];
    rdf:_2 [ xmpidq:Scheme "URI"; rdf:value "doi:10.1038/nrg2158" ];
    rdf:_3 [ xmpidq:Scheme "DOI"; rdf:value "10.1038/nrg2158" ]
] .
OK, that takes care of the XMP direction to use xmp:Identifier. But, while dc:identifier is deprecated by XMP, we note that back in the real world folks will be looking at the DC elements, which is the schema with the greatest purchase. So, why not also add in a dc:identifier element such as would typically be used for DOI in citations? How about this:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
    a rdf:Bag;
    rdf:_1 [ xmpidq:Scheme "URI"; rdf:value "info:doi/10.1038/nrg2158" ];
    rdf:_2 [ xmpidq:Scheme "URI"; rdf:value "doi:10.1038/nrg2158" ];
    rdf:_3 [ xmpidq:Scheme "DOI"; rdf:value "10.1038/nrg2158" ]
];
    dc:identifier "doi:10.1038/nrg2158" .
Right, so we’ve taken care of the identifiers. But maybe there’s something missing? There’s no link to the DOI proxy. For widest applicability we should not assume prior knowledge of the DOI system. Perhaps we could include this link using the property dc:relation? Seems feasible, though I would really like to get some feedback on this. Any ideas?
So here, then, is a fairly full and complete expression of DOI within the XMP packet.
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
    a rdf:Bag;
    rdf:_1 [ xmpidq:Scheme "URI"; rdf:value "info:doi/10.1038/nrg2158" ];
    rdf:_2 [ xmpidq:Scheme "URI"; rdf:value "doi:10.1038/nrg2158" ];
    rdf:_3 [ xmpidq:Scheme "DOI"; rdf:value "10.1038/nrg2158" ]
];
    dc:identifier "doi:10.1038/nrg2158";
    dc:relation "http://dx.doi.org/10.1038/nrg2158" .
Ta-da!
(Of course, this is all premised on having freedom in writing out the XMP packet. If one is dependent on commercial applications to write out the packet then things may be different. Actually, they will be very different. They may not even be workable.)
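Assuming one does have that freedom, here is a rough Python sketch of writing the xmp:Identifier bag out as RDF/XML using only the standard library. The function name `identifier_bag_xml` is invented for this illustration. The sketch emits just the rdf:RDF element with the three qualified identifiers; it leaves out the xpacket wrapper and the dc: properties, and it makes no claim to match what any particular tool would serialise.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XMP = "http://ns.adobe.com/xap/1.0/"
XMPIDQ = "http://ns.adobe.com/xmp/Identifier/qual/1.0/"

# Ask ElementTree to use the familiar prefixes when serialising.
for prefix, uri in (("rdf", RDF), ("xmp", XMP), ("xmpidq", XMPIDQ)):
    ET.register_namespace(prefix, uri)


def identifier_bag_xml(doi: str) -> str:
    """Return an rdf:RDF fragment holding xmp:Identifier for a native DOI.

    Hypothetical helper: `doi` is the bare string, e.g. "10.1038/nrg2158".
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    # rdf:about="" plays the role of the empty URI <> in the N3 examples.
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    ident = ET.SubElement(desc, f"{{{XMP}}}Identifier")
    bag = ET.SubElement(ident, f"{{{RDF}}}Bag")
    entries = (
        ("URI", "info:doi/" + doi),
        ("URI", "doi:" + doi),
        ("DOI", doi),
    )
    for scheme, value in entries:
        # Each bag item is a blank node carrying the value and its qualifier.
        item = ET.SubElement(
            bag, f"{{{RDF}}}li", {f"{{{RDF}}}parseType": "Resource"}
        )
        ET.SubElement(item, f"{{{XMPIDQ}}}Scheme").text = scheme
        ET.SubElement(item, f"{{{RDF}}}value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `identifier_bag_xml("10.1038/nrg2158")` returns the fragment as a string, which would still need to be wrapped in a proper XMP packet before embedding.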
Feedback would be very welcome.

Hi Tony,
A couple of quick comments:
(i) On the use of the dc:identifier property in your fourth N3 example (the “aside” – which I appreciate is tangential to what follows in your post)
@prefix dc: .
> dc:identifier .
says not that the resource being described is identified by the URI info:doi/10.1038/nrg2158 (which I think is what you want to say), but rather that the resource is identified by a second resource, itself denoted by the URI info:doi/10.1038/nrg2158.
To say that the resource being described is identified by the URI info:doi/10.1038/nrg2158 I think the N3 should read something like
@prefix dc: .
> dc:identifier “info:doi/10.1038/nrg2158″^^ .
(ii) In the final example, I guess it’s hard to say that the suggested use of the dc:relation property is “wrong”, but all that triple says is that the resource being described (i.e. the resource identified by the plain literal "doi:10.1038/nrg2158") is related in some unspecified way to the plain literal "http://dx.doi.org/10.1038/nrg2158" (to the literal, not to the resource denoted by that URI).
This may be an obvious question from a relative newcomer to the Semantic Web, but when you say…
…where exactly is that statement made? And wouldn’t we have to make a statement about all of the forms of the identifier you mentioned in this post (including the dc.identifier one) in order to reduce the proliferation of equivalent identifiers? Do we risk a proliferation of identifiers that are equivalent when someone gets the equivalence markup wrong?
Oops, I didn’t escape the < and > characters in the first part of my comment above.
So trying again:
(i) On the use of the dc:identifier property in your fourth N3 example (the “aside” – which I appreciate is tangential to what follows in your post)
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:identifier <info:doi/10.1038/nrg2158> .
says not that the resource being described is identified by the URI info:doi/10.1038/nrg2158 (which I think is what you want to say), but rather that the resource is identified by a second resource, itself denoted by the URI info:doi/10.1038/nrg2158.
To say that the resource being described is identified by the URI info:doi/10.1038/nrg2158 I think the N3 should read something like
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:identifier "info:doi/10.1038/nrg2158"^^<http://www.w3.org/2001/XMLSchema#anyURI> .
(I think the second part above on the use of dc:relation reads OK).
Hi Peter:
Re the equivalence assertion. That would be made as and when it appears. (Doesn’t exist objectively until stated.)
I think such equivalence statements only need to be stated as needed. If one is happy working in the context of doi:10…, then fine. Likewise if one is working in the context of info:doi/10…, then also fine. One only needs to consider equivalence when crossing between one URI and another. There is no requirement that every instance of a URI be stated to be equivalent to all its other identities.
I wouldn’t worry too much about proliferation, or about having unique identifiers. Somewhat like the Perl adage: TMTOWTDI.
Tony
My question is more along the lines of “where is the assertion made?” This may be too low-level for your discussion, but I was wondering if the RDF triples asserting equivalence are in the XMP or somewhere else. Maybe an OAI-ORE document carries the assertion?
Hi Pete:
Nopes. I’ve screwed up my eyes and looked out into the far-off distance. And no, I don’t think I agree with you on your point #1. What I have asserted in that statement is that *this* resource (i.e. the current resource) has such an identifier. (And the current resource is identified with an empty URI which defaults to the in-context base URI.) I don’t see where the notion of any *second* resource creeps in. Sure there are different URIs but that has no bearing on the resources that are referenced. (That is, one cannot infer *anything* on the basis of a URI alone.)
If you want to get real picky then, yes, they are different. The current resource is a representation of the work which is identified by the DOI and is a metadata description. It is embodied as a standalone description or embedded within a media file. The representation is a manifestation with a particular instance URI. The work on the other hand is identified with an unchanging, invariant URI – the DOI.
(Actually, we have used two URI forms – one registered, the other as yet unregistered – to represent the DOI in a URI context. These can be asserted to be equivalent. The HTTP form for the DOI proxy is very definitely a different resource being a web service. In short,
doi: == info:doi/
doi: != http://dx.doi.org/
)
Re your second point, this is a (serious) limitation of XMP, where values for the term “identifier” in the schema “dc” are restricted to be “Text” rather than “URI”. As indicated in a comment on my following post, the only practicable alternatives with respect to XMP are to ignore it, change it, flout it, or embrace it. Which would you suggest?
Cheers,
Tony
Peter:
Re my earlier comment, the assertion is only made when it needs to be made. It is only when dealing with both URI forms at the same time that one might stop to say: “Oh, by the way, (the resource referenced by) URI A is the same as (the resource referenced by) URI B.” Having said that, the XMP description I gave makes clear that the two identifiers are equivalent by listing them both as equally-weighted values in the rdf:Bag which is the value of xmp:Identifier.
And yes, I do think that a complete OAI-ORE description might inventory both forms.
Tony
OK, so let’s assume there are some folks who would prefer to see the RDF/XML for the example above. Well, here it is, via cwm, which parses the RDF/N3 and outputs the RDF/XML: