
September 30, 2007

Authors in Context?

On the subject of author IDs (a subject CrossRef is interested in, and on which it held a meeting earlier this year, as blogged about here), this post by Karen Coyle, "Name authority control, aka name identification", may be worth a read. She starts off with this:

"Libraries do something they call "name authority control". For most people in IT, this would be called "assigning unique identifiers to names." Identifying authors is considered one of the essential aspects of library cataloging, and it isn't done in any other bibliographic environment, as far as I know."
and concludes thus:
"Perhaps the days of looking at lists of authors' names is over. Maybe users need to see a cloud of authors connected to topic areas in which they have published, or related to books titles or institutional affiliations. In this time of author abundance, names are not meaningful without some context."

September 25, 2007

XMP-Ville

Been so busy looking into the technical details of XMP that I almost forgot to check out the current landscape. Luckily I chanced on these articles by Ron Roszkiewicz for The Seybold Report (and apologies for lifting the title of this post from his last). The articles about XMP are well worth reading and chart the painful progress made to date.

Having earlier characterized XMP as an "underachieving teenager", Roszkiewicz is now cautiously optimistic that IDEAlliance's XMP Open initiative (an initiative to advance XMP as an open industry specification) will help with outreach and foster adoption of this fledgling technology.


There has been some activity here. Following on from an industry open day event last year, there have been two metadata summits earlier this year co-sponsored by Adobe Systems and IDEAlliance. Promising bestirrings. (And there has also been the recent public airing of the PRISM 2.0 draft, with its support for XMP, which was reviewed at the PRISM WG F2F last week for publication as a standard.) But generally XMP-Ville is rather sleepy at this time. There's not much by way of news on the XMP Open website. At least there is promise, if no promises.

Back to the articles. The really interesting thing of note (to me at any rate) in Roszkiewicz's review of the last summit is the almost total absence of any mention of the Web. It is as if XMP users (both consumers and providers) would be content to play within the walled garden of the CS3 product portfolio. I don't get that. The Web changes everything.

Although XMP maps its native data model to RDF (and RDF is an inherently open technology allowing arbitrary schemas to be mixed at will), XMP betrays its application roots by seeming to want to impose some kind of veto on the schemas to be used. Or rather, how they are to be used. It also seems to be all fussed up by centralized notions such as a cross-mapping schema registry. (As if that were part of its remit.)

As Roszkiewicz notes:

"The consortia [IDEAlliance and the stakeholders] will have ownership responsibility for name space registry, cross-map definition and support, standards group outreach and coordination, compliance certification and logo and the “XMP Open” brand."

And elsewhere:

"So while the standard for XMP might be defined, the data that will be fed into files is not, for want of an IDEAlliance-like standards management body to filter and rationalize the many [schema] into a few."

And then more worryingly, this:

"That schema should be managed by a government agency such as the Library of Congress which could manage the dictionaries and schema, certify them, register the namespace and provide a centralized location to distribute them."

Well, I don't see why this matters to the core technology of XMP, which is just a specification for sneaking an XML document into arbitrary media files. And the use of RDF/XML would seem to be a further indication that XMP is meant to be independent of the schemas used. The combination of RDF and XML technologies should allow XMP to present itself as a framework or "platform" for metadata exchange and to get out of the way of whatever is actually carried in the XMP packets. App neutrality, if you will.

Again, the notion of the Web as just an alternate channel is apparent in the third of the articles, where Roszkiewicz talks about the Device Central tool, which allows a user of a CS3 product to "Save for Web or Devices...". This article talks about the clumsy handling of metadata in such device saves, whereby the packet may be abbreviated (and metadata terms dropped) when printing to small-footprint devices. Not a feature to be retained for too long, I would hope.

So, where are we currently with XMP? According to Roszkiewicz:

"As the developer of a suite of applications that relies on XMP as the vehicle for managing metadata, Adobe has too much invested in its development to allow any substantive changes by outsiders. So “open” primarily will mean open to suggestions, with an official channel in place to process them."

And as to that channel?

"As the principal conduit to Adobe for changes to XMP, IDEAlliance will act as a gateway and support organization to the user community - a role for which it is well-suited. ... As a sponsor-supported, not-for-profit organization, IDEAlliance can serve as a credible buffer for Adobe to the user community and synchronize and standardize third-party development efforts."

And goes on:

"The principal unanswered questions at this point are: Will the stakeholders represent all of the key industries; will Adobe provide timely support for considering user input and updating the XMP Toolkit; and can Adobe, IDEAlliance and IDEAlliance workgroups manage all of the responsibilities that will fall upon them when the deal is struck. The hand-over doesn’t seem to have taken place yet, and we are still examining the scope and feasibility of the proposal."

It seems to me that Adobe is the party girl, IDEAlliance is the special guest, and CrossRef publishers are the neighbourly gatecrashers who want to play with the toys. And not perhaps too nicely neither. I just hope that the toys aren't taken away from us. They're too much fun.

Ironic, really, that we're on the outside of this, since scholarly publishers have a very clear-cut grasp of what to do with metadata and a ready application in citation linking. XMP is worth it.

September 20, 2007

The Name's The Thing

I'm always curious about names and where they come from and what they mean. Hence, my interest was aroused by the constant references to "XAP" in XMP. As the XMP Specification (Sept. 2005) says:

"NOTE: The string “XAP” or “xap” appears in some namespaces, keywords, and related names in this document and in stored XMP data. It reflects an early internal code name for XMP; the names have been preserved for compatibility purposes."

Actually, it occurs in most of the core namespaces: XAP, rather than XMP.


An earlier XMP Specification from 2001 (v. 1.5 - and see here for an earlier post of mine about XMP's missing version numbers, and here about Adobe's lack of archiving for XMP specifications) says almost the same thing:

"NOTE: Many namespaces, keywords, and related names in this document are prefaced with the string “XAP”, which was an early internal code name for XMP metadata. Because the Acrobat 5.0 product shipped using those names and keywords, they were retained for compatibility purposes."

So, there's no indication in either of these specifications as to what the original name signified.

But then I turned up this issue in the Adobe Developer Knowledgebase:

"Known Issue: The metadate framework name was changed from XAP to XMP
 
Summary
XAP (Extensible Authoring Publishing) was an early internal code name for XMP (Extensible Metadata Platform).
 
Issue
Why are many namespaces, keywords, data structures, and related names in the documents and XMP toolkit code prefaced with the string "XAP" rather than "XMP"?
 
Solution
XAP (Extensible Authoring and Publishing) was an early internal code name for XMP (Extensible Metadata Platform) metadata. Because Acrobat 5.0 used those names, they were retained for compatibility purposes. XMP is the formal name used the framework specification."


Aha! Now it's all clear. And now I'm also wondering whether this original name still reflects Adobe's thinking on the purpose of XMP: that it be primarily an authoring utility rather than a workflow utility. That is, is Adobe's XMP geared more to individual users of Adobe's Creative Suite products entering metadata by hand as part of the authoring act, than to batch entry within an automated publishing workflow? The emphasis that Adobe put on Custom File Info panels for their CS products would seem to support the view that Adobe see XMP as an interactive authoring device for adding metadata. But what about the publishers and their workflows? The SDK is a rather poor effort at garnering any widespread support for XMP within the publishing industry.

September 19, 2007

ACAP - Any chance of success?

ACAP has released some documents outlining the use cases they will be testing and some proposed changes to the Robots Exclusion Protocol (REP) - both robots.txt and META tags. There are some very practical proposals here to improve search engine indexing. However, the only search engine publicly participating in the project is http://www.exalead.com/ (which according to Alexa attracted 0.0043% of global internet visits over the last three months). The main docs are "ACAP pilot Summary use cases being tested", "ACAP Technical Framework - Robots Exclusion Protocol - strawman proposals Part 1", "ACAP Technical Framework - Robots Exclusion Protocol - strawman proposals Part 2", "ACAP Technical Framework - Usage Definitions - draft for pilot testing".
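ACAP's proposals are framed as extensions to the existing Robots Exclusion Protocol, so it may help to see what a well-behaved crawler already does with plain robots.txt rules. The sketch below uses Python's standard `urllib.robotparser`; the rules, the host `publisher.example` and the crawler names `ExampleBot` and `OtherBot` are hypothetical, and nothing here implements any ACAP directive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: plain REP only, no ACAP extensions.
rules = """
User-agent: ExampleBot
Disallow: /subscribers/

User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse() also marks the rules as read

# A crawler consults the parsed rules before fetching each URL.
for url in ("http://publisher.example/free/article1.html",
            "http://publisher.example/subscribers/article2.html"):
    print(url, parser.can_fetch("ExampleBot", url))

# Crawlers not named explicitly fall through to the "*" record.
print(parser.can_fetch("OtherBot", "http://publisher.example/free/article1.html"))
```

This is the whole vocabulary a crawler needs today: an agent name, a path prefix and allow or disallow. ACAP's usage definitions would sit on top of that, which is exactly why engines have to choose to recognize them.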

What would cause other search engines to recognize the ACAP protocols rather than ignore them? A lot of publishers implementing this and requiring search engines to recognize it to index content could put pressure on the engines. Maybe.

Style Guides Recommend DOI strings

A couple of recent posts, from Scott Memorial Library at Jefferson University and IFST at the University of Delaware, note that the AMA and APA style guides now recommend using a DOI, if one is assigned, in a journal article citation.

A citation in the APA style with a DOI would be:

Conley, D., Pfeiffera, K. M., & Velez, M. (2007). Explaining sibling differences in achievement and behavioral outcomes: The importance of within- and between-family factors. Social Science Research, 36(3), 1087-1104. doi:10.1016/j.ssresearch.2006.09.002

In the AMA style a reference would be:

Kitajima TS, Kawashima SA, Watanabe Y. The conserved kinetochore protein shugoshin protects centromeric cohesion during meiosis. Nature. 2004;427(6974):510-517. doi:10.1038/nature02312

This is great news. I haven't looked at the full style guides, so I can't say whether they give any information about linking DOIs via http://dx.doi.org/.

Information on the APA Style Guide is available at http://apastyle.apa.org/, with specific information on electronic references, URLs and DOIs, and here is the AMA information.

This raises the existential question of a DOI as a URI. Is

Conley, D., Pfeiffera, K. M., & Velez, M. (2007). Explaining sibling differences in achievement and behavioral outcomes: The importance of within- and between-family factors. Social Science Research, 36(3), 1087-1104. doi:10.1016/j.ssresearch.2006.09.002 http://dx.doi.org/10.1016/j.ssresearch.2006.09.002

unnecessary or redundant?
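One argument for calling the http://dx.doi.org/ form redundant is that software can derive it mechanically from the doi: string. A minimal Python sketch, assuming the DOI is introduced by a literal "doi:" as in the APA and AMA examples above; the helper name `doi_link` is hypothetical, and the regex is a simplification that will not cope with every legal DOI suffix.

```python
import re
from typing import Optional

# Matches "doi:" followed by a DOI ("10." + registrant code + "/" + suffix).
DOI_IN_TEXT = re.compile(r"doi:\s*(10\.\d{4,}/\S+)", re.IGNORECASE)

def doi_link(citation: str) -> Optional[str]:
    """Return a dx.doi.org proxy link for the first DOI found in a citation."""
    match = DOI_IN_TEXT.search(citation)
    if match is None:
        return None
    # Assumption: trailing punctuation belongs to the sentence, not the DOI.
    doi = match.group(1).rstrip(".,;")
    return "http://dx.doi.org/" + doi

print(doi_link("Kitajima TS, Kawashima SA, Watanabe Y. Nature. "
               "2004;427(6974):510-517. doi:10.1038/nature02312"))
```

The counter-argument is that a human reader, or a naive link checker, cannot do this conversion, so printing both forms costs little.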

September 15, 2007

Chapter 9 - The Closed Book

Hadn't really noticed before but was fairly gobsmacked by this notice I just saw in the DOI® Handbook:

**Please note that Chapter 9, Operating Procedures is for Registration Agency personnel only.**

DOI® Handbook
doi:10.1000/182
http://www.doi.org/hb.html

And, indeed, the Handbook's TOC only reconfirms this:

9 Operating procedures*

*The RA password is required for viewing Chapter 9.

9.1 Registering a DOI name with associated metadata
9.2 Prefix assignment
9.3 Transferring DOI names from one Registrant to another
9.4 Handle System® policies and procedures
9.4.1 Overview
9.4.2 Policies and Procedures
9.4.3 Requirements for Administrators of Resolution Services
9.4.4 Protocols and Interfaces
9.5 DOI® System error messages

That's spooky. A book with a hidden chapter. I really don't like that at all, especially in a book aiming to provide general information and guidance. Seems to me that if that information needs to be kept private to RAs then it has no business rubbing shoulders with public information. I would suggest that the material be opened up or else moved out. Makes me feel so second class.

Custom Panel for CC

Creative Commons now have a custom panel for adding CC licenses using Adobe apps - see here.

Interesting on two counts:

  • Machine readable licenses
  • XMP metadata

But I still think that batch solutions for adding XMP metadata are really required for publishing workflows. And ideally there should be support for adding arbitrary XMP packets if we're going to have truly rich metadata. I rather fear the constraints that custom panels place upon the publisher.

September 13, 2007

Last Orders Please!

Public comment period on the PRISM 2.0 draft ends Saturday (Sept. 15) ahead of next week's WG meeting to review feedback and finalize the spec.

(I put in some comments about XMP already. Hope they got that.)

September 11, 2007

The Second Wave

You might have been wondering why I've been banging on about XMP here. Why the emphasis on one vendor technology on a blog focussed on an industry linking solution? Well, this post is an attempt to answer that.

Four years ago we at Nature Publishing Group, along with a select few early adopters, started up our RSS news feeds. We chose RSS 1.0 as our platform, which allowed us to embed a rich set of metadata terms using multiple schemas, especially Dublin Core and PRISM. We evangelized this a good deal at the time, publishing articles on XML.com (Jul. '03) and in D-Lib Magazine (Dec. '04) as well as speaking about it at various meetings and blogging about it. Since then many more publishers have come on board and now provide RSS routinely, many of them choosing to enrich their feeds with metadata.

Well, RSS can be seen in hindsight as being the First Wave of projecting a web presence beyond the content platform using standard markup formats. With this embedded metadata a publisher can expand their web footprint and allow users to link back to their content server.

Now, XMP with its potential for embedding metadata in rich media can be seen as a Second Wave. Media assets distributed over the network can now carry along their own metadata and identity which can be leveraged by third-party applications to provide interesting new functionalities and link-back capability. Again a projection of web presence.


XMP has much in common with RSS 1.0. They are both profiles of RDF/XML. They are both flawed in certain respects because of self-imposed limitations. But they both build on a robust and open data model for the web (RDF) and are reasonably open; at the least they are extensible. One (RSS 1.0) was defined in an open process by committee; the other is an open (i.e. published) specification provided by a vendor.

From our point of view both specifications are sufficiently advanced to be immediately useful. I'm not sure how one could interact with the further development of either specification. RSS 1.0 is essentially frozen with Atom being posed as a successor technology, although Atom does not conform to the RDF model. (The upshot is that an RSS 1.0 feed can be consumed completely by an RDF-aware application, while an Atom feed would need to be pre-processed before any RDF "goodness" could be gleaned from it.) By contrast, XMP is a vendor-defined technology and alive, if not perhaps kicking. I am unaware of any process to formally contribute to the XMP development apart from shouting from the terraces. None the less, both technologies are usable as is.

It is curious that no consistent packaging (and delivery) of metadata has yet been achieved with HTML, the original web interface. The HTML <title> and <meta> elements are employed by publishers with various degrees of consistency. There are also RDF islands that can be embedded within HTML comments (as used e.g. by CC licenses). And then there are COinS objects. But it's all a bit of a mish-mash to date. Certainly, I don't recall seeing any guidelines from CrossRef as to how machine readable metadata (even markup for the DOI itself) may be embedded within HTML pages, rather than on HTML pages for human readers.
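For comparison, a COinS object is just an HTML span with class "Z3988" whose title attribute holds an OpenURL ContextObject in key/value (KEV) form. A Python sketch, assuming a journal article identified by its DOI; the function name `coins_span` and the article and journal titles in the usage line are placeholders, not real metadata.

```python
from html import escape
from urllib.parse import urlencode

def coins_span(doi: str, atitle: str, jtitle: str) -> str:
    """Build a COinS span carrying an OpenURL ContextObject in KEV format."""
    kev = urlencode([
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft_id", "info:doi/" + doi),   # the DOI in its info: URI form
        ("rft.genre", "article"),
        ("rft.atitle", atitle),
        ("rft.jtitle", jtitle),
    ])
    # The KEV string sits inside an HTML attribute, so '&' must become '&amp;'.
    return '<span class="Z3988" title="%s"></span>' % escape(kev)

print(coins_span("10.1038/nrg2158", "Example article title", "Example journal"))
```

Note that the metadata is only machine readable by convention: a page renders the span as nothing at all, which is part of why COinS adds to the mish-mash rather than resolving it.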

This lack of uniform metadata deployment for HTML pages could be something to do with context. With RSS and XMP we are dealing with remote objects, whereas with HTML we are generally accessing this directly on the content server and so have a semantic context. It could be though that metadata delivery from HTML pages will finally be more uniformly available with the further development of standards such as microformats and especially RDFa, GRDDL, etc. It is also interesting to note that an XMP packet could just as easily be embedded within the HTML page, and if this technology were to be adopted more widely for embedding in other media assets then why not consider the same technology for ordinary web pages?

I can't help feeling though that XMP has a lot of promise and is very timely. There are only three real obstacles: creating XMP packets, writing them and reading them. To my mind, once one has a good grasp of XMP then creating the packets can be done with common tools. The same, more or less, for reading the packets. I have shown earlier that this is readily achievable. The only major block is writing the packets into media files although there is support for create/write (if patchy) by open source libraries, as well as there being support (perhaps limited) from products for create/write. But, anyway, it's certainly do-able.
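To illustrate that reading needs only common tools: once a packet has been pulled out of a file it is ordinary RDF/XML, and a standard XML parser will do. A minimal Python sketch; the sample packet is hand-written for illustration and the helper `simple_dc_values` is hypothetical. It only handles simple properties written as child elements, not the attribute shorthand or the array values (rdf:Bag, rdf:Seq, rdf:Alt) that XMP also uses.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Hand-written sample packet body (without the xpacket wrapper).
PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:identifier>doi:10.1038/nrg2158</dc:identifier>
   <dc:format>application/pdf</dc:format>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

def simple_dc_values(packet: str) -> dict:
    """Collect text-valued dc: properties from every rdf:Description."""
    root = ET.fromstring(packet)
    values = {}
    for desc in root.iter("{%s}Description" % RDF):
        for child in desc:
            # Skip structured values: they have child elements, not text.
            if child.tag.startswith("{%s}" % DC) and len(child) == 0:
                local = child.tag.split("}", 1)[1]
                values["dc:" + local] = (child.text or "").strip()
    return values

print(simple_dc_values(PACKET))
```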

Marking up DOI

(Update - 2007.09.15: Clean forgot to add in the rdf: namespace to the examples for xmp:Identifier in this post. I've now added in that namespace to the markup fragments listed. Also added in a comment here which shows the example in RDF/XML for those who may prefer that over RDF/N3.)

So, as a preliminary to reviewing how a fuller metadata description of a CrossRef resource may best be fitted into an XMP packet for embedding into a PDF, let's just consider how a DOI can be embedded into XMP. And since it's so much clearer to read let's just conduct this analysis using RDF/N3. (Life is too short to be spent reading RDF/XML or C++ code. :~)

(And further to Chris Shillum's comment here on my earlier post Metadata in PDF: 2. Use Cases where he notes that Elsevier are looking to upgrade their markup of DOI in PDF to use XMP, I'm really hoping that Elsevier may have something to bring to the party and share with us. A consensus rendering of DOI within XMP is going to be of benefit to all.)


Within an XMP packet our first idea might be to include the DOI using the Dublin Core (DC) schema element dc:identifier in minimalist fashion:


@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:identifier "10.1038/nrg2158" .

This simply says that the current document (denoted by the empty URI "<>") has a string property "10.1038/nrg2158" which is of type identifier from the dc (or Dublin Core) schema which is identified by the URI <http://purl.org/dc/elements/1.1/>.

Now, since this is just a DOI and the wider public cannot be expected to know about DOIs, it would surely be better to present the DOI in URI form (doi:) as


@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:identifier "doi:10.1038/nrg2158" .

or, using a registered URI form (info:) as

@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:identifier "info:doi/10.1038/nrg2158" .

Aside: This shows up a limitation of XMP where the DC schema property value for dc:identifier is fixed as type Text. The natural way to express the above in RDF/N3 would be as:


@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:identifier <info:doi/10.1038/nrg2158> .

which says that the value is a URI (type URI in XMP terms), not a string (type Text in XMP terms). We either have to flout the XMP specification or else live with this restriction. We'll opt for the latter for now.

But, the XMP Spec deprecates the use of dc:identifier since the context is not specific. (Note that that's what was just discussed above. The limitation is built into XMP which builds on RDF but does not fully endorse the RDF world view.) Instead the XMP Spec recommends using xmp:Identifier since the context can be set using a qualified property as:


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
a rdf:Bag;
rdf:_1 [
xmpidq:Scheme "DOI";
rdf:value "10.1038/nrg2158" ] ] .

This says the string "10.1038/nrg2158" belongs to the scheme "DOI".

Here we have used the scheme "DOI" and, as noted above, for wider recognition it would be better to employ one of the URI forms, e.g.


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
a rdf:Bag;
rdf:_1 [
xmpidq:Scheme "URI";
rdf:value "doi:10.1038/nrg2158" ] ] .

This says the string "doi:10.1038/nrg2158" belongs to the scheme "URI".

But this is the unregistered URI form (doi:), so should we be using instead the registered form (info:)? Well, turns out that this construct for xmp:Identifier is an rdf:Bag so we can include more than one term. How about using this construct then:


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
a rdf:Bag;
rdf:_1 [
xmpidq:Scheme "URI";
rdf:value "info:doi/10.1038/nrg2158" ];
rdf:_2 [
xmpidq:Scheme "URI";
rdf:value "doi:10.1038/nrg2158" ] ] .

Now we've got both forms, which is fair enough since these are equivalent. In RDF terms we can make the statement that:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
<doi:10.1038/nrg2158> owl:sameAs <info:doi/10.1038/nrg2158> .

which asserts that the two URIs are equivalent and that they reference the same resource.
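Software without an OWL reasoner can get much the same effect by reducing each rendering to the bare DOI before comparing. A minimal Python sketch; `bare_doi` and `same_doi` are hypothetical helpers, and only the three renderings used in this post are recognized.

```python
from typing import Optional

# The two URI renderings used in the xmp:Identifier bag above.
URI_PREFIXES = ("info:doi/", "doi:")

def bare_doi(identifier: str) -> Optional[str]:
    """Strip a doi: or info:doi/ prefix; return None if no DOI remains."""
    for prefix in URI_PREFIXES:
        if identifier.startswith(prefix):
            identifier = identifier[len(prefix):]
            break
    # Every DOI begins with the "10." directory indicator.
    return identifier if identifier.startswith("10.") else None

def same_doi(a: str, b: str) -> bool:
    """True if a and b are renderings of the same DOI."""
    da, db = bare_doi(a), bare_doi(b)
    # DOI names are case-insensitive, so compare upper-cased forms.
    return da is not None and db is not None and da.upper() == db.upper()

print(same_doi("doi:10.1038/nrg2158", "info:doi/10.1038/nrg2158"))
```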

So, what if we want to include a native DOI without the URI garb? We can easily do that:


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
a rdf:Bag;
rdf:_1 [
xmpidq:Scheme "URI";
rdf:value "info:doi/10.1038/nrg2158" ];
rdf:_2 [
xmpidq:Scheme "URI";
rdf:value "doi:10.1038/nrg2158" ];
rdf:_3 [
xmpidq:Scheme "DOI";
rdf:value "10.1038/nrg2158" ] ] .

OK, that takes care of the XMP direction to use xmp:Identifier. But, while dc:identifier is deprecated by XMP, back in the real world folks will be looking at the DC elements, the schema with the greatest purchase. So why not also add in a dc:identifier element such as would typically be used for DOI in citations? How about this:


@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
a rdf:Bag;
rdf:_1 [
xmpidq:Scheme "URI";
rdf:value "info:doi/10.1038/nrg2158" ];
rdf:_2 [
xmpidq:Scheme "URI";
rdf:value "doi:10.1038/nrg2158" ];
rdf:_3 [
xmpidq:Scheme "DOI";
rdf:value "10.1038/nrg2158" ] ];
dc:identifier "doi:10.1038/nrg2158" .

Right, so we've taken care of the identifiers. But maybe there's something missing? There's no link to the DOI proxy. For widest applicability we should not assume prior knowledge of the DOI system. Perhaps we could include this link using the property dc:relation? Seems feasible, though I would really like to get some feedback on this. Any ideas?

So here, then, is a fairly full and complete expression of DOI within the XMP packet.


@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<> xmp:Identifier [
a rdf:Bag;
rdf:_1 [
xmpidq:Scheme "URI";
rdf:value "info:doi/10.1038/nrg2158" ];
rdf:_2 [
xmpidq:Scheme "URI";
rdf:value "doi:10.1038/nrg2158" ];
rdf:_3 [
xmpidq:Scheme "DOI";
rdf:value "10.1038/nrg2158" ] ];
dc:identifier "doi:10.1038/nrg2158";
dc:relation "http://dx.doi.org/10.1038/nrg2158" .

Ta-da!

(Of course, this is all premised on having freedom in writing out the XMP packet. If one is dependent on commercial applications to write out the packet then things may be different. Actually, they will be very different. They may not even be workable.)
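For those who prefer RDF/XML (the serialization XMP actually embeds), here is a Python sketch that writes the final N3 example out with the standard xml.etree.ElementTree module. The helper name `doi_description` is hypothetical, the result is only the rdf:RDF element (the x:xmpmeta wrapper and xpacket header would still be needed), and rdf:parseType="Resource" is one of several valid RDF/XML spellings for the qualified bag items. It follows the N3 in writing dc:relation as a plain string; as I read the XMP Specification, its DC schema declares dc:relation as a bag of Text, so a strictly conforming packet would wrap the value in an rdf:Bag.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
XMP = "http://ns.adobe.com/xap/1.0/"
XMPIDQ = "http://ns.adobe.com/xmp/Identifier/qual/1.0/"

# Use the familiar prefixes when serializing.
for prefix, uri in (("rdf", RDF), ("dc", DC), ("xmp", XMP), ("xmpidq", XMPIDQ)):
    ET.register_namespace(prefix, uri)

def doi_description(doi: str) -> ET.Element:
    """Build an rdf:RDF element equivalent to the final N3 example."""
    rdf = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    ident = ET.SubElement(desc, f"{{{XMP}}}Identifier")
    bag = ET.SubElement(ident, f"{{{RDF}}}Bag")
    for scheme, value in (("URI", "info:doi/" + doi),
                          ("URI", "doi:" + doi),
                          ("DOI", doi)):
        # rdf:li expands to rdf:_1, rdf:_2, ...; each item is a qualified value.
        li = ET.SubElement(bag, f"{{{RDF}}}li", {f"{{{RDF}}}parseType": "Resource"})
        ET.SubElement(li, f"{{{RDF}}}value").text = value
        ET.SubElement(li, f"{{{XMPIDQ}}}Scheme").text = scheme
    ET.SubElement(desc, f"{{{DC}}}identifier").text = "doi:" + doi
    ET.SubElement(desc, f"{{{DC}}}relation").text = "http://dx.doi.org/" + doi
    return rdf

print(ET.tostring(doi_description("10.1038/nrg2158"), encoding="unicode"))
```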

Feedback would be very welcome.

September 10, 2007

XMP - Some Other Gripes

Following on from the missing XMP Specification version number discussed in the previous post, here are some miscellaneous gripes I've got with XMP (an otherwise very promising technology). I would be more than happy to be proved wrong on any of these points.

1. XMP version history and archive

There doesn't appear to be any XMP version history or archive hosted by Adobe as far as I can tell.

2. Unpublished schemas

Also there is nothing published - outside the XMP Spec itself - on the core schemas used by XMP. There's nothing to be gleaned from the namespace URIs used. The Adobe namespaces, e.g.


http://ns.adobe.com/xap/1.0/ (listed in XMP Spec)
 
http://ns.adobe.com/pdfx/1.3/ (not listed in XMP Spec)

seem to all resolve to this page

http://www.adobe.com/products/xmp/.

So, that can leave us with undocumented terms (e.g. 'xmpMM:Manifest' used by Adobe InDesign CS2 4.0.5) from documented schemas and also undocumented schemas (e.g. 'pdfx').

3. UUID

Note also that many Adobe apps do not use the URN syntax for 'uuid:'. The XMP Spec also has this to say:

"There is no formal standard for URIs that are based on an abstract UUID. The following proposal may be relevant:


http://www.ietf.org/internet-drafts/draft-mealling-uuid-urn-01.txt"


(See: 3 XMP Storage Model / Serializing XMP / rdf:Description elements / rdf:about attribute)

I guess the XMP Spec (Sept. '05) had just been bedded down more or less when the URN namespace for 'uuid:' was published as RFC 4122 in July '05.
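For what it's worth, converting the Adobe-style identifiers to the RFC 4122 form is mechanical. A Python sketch using the standard uuid module, tried on the xapMM:InstanceID value from the XMP Spec dump further down this page; the helper name `to_urn` is hypothetical.

```python
import uuid

def to_urn(xmp_uuid: str) -> str:
    """Rewrite an Adobe-style 'uuid:...' ID as an RFC 4122 'urn:uuid:...' URN."""
    prefix = "uuid:"
    if not xmp_uuid.startswith(prefix):
        raise ValueError("not a 'uuid:' identifier: %r" % xmp_uuid)
    # uuid.UUID validates the hex digits; .urn renders them in lower case.
    return uuid.UUID(xmp_uuid[len(prefix):]).urn

print(to_urn("uuid:99b91701-a78b-4652-84e5-6bccaeb7534e"))
```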

4. RDF/XML serialization

The biggie.

XMP schemas specify fixed property value types in RDF/XML, i.e. they specify a fixed profile of RDF/XML instead of generic RDF/XML. I have commented on this recently on the semantic-web list, as have Bruce D'Arcus (here, speaking about OpenDocument) and Mike Linksvayer (here, speaking for CC).

This profiling of RDF/XML leads to real problems. For example, Adobe have defined a Dublin Core (DC) schema which lists the property value types that DC values can assume in an XMP serialization. Meantime, the PRISM 2.0 draft spec defines an incompatible mapping of DC terms to XMP property values. Since both schemas would make use of the same DC namespace (though PRISM haven't actually specified a DC namespace for use with XMP but do use elsewhere the regular DC namespace) this isn't going to work. I did supply some feedback on this to the PRISM WG but have heard nothing back from them. So, PRISM XMP looks uncertain at this time. Which, for us, must be a shame.

W5M0MpCehiHzreSzNTczkc9d

What on earth can this string mean: 'W5M0MpCehiHzreSzNTczkc9d'? This occurs in the XMP packet header:

<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>

Well, the XMP Specification (September 2005), which is available here, has this text:

"The required id attribute must follow begin. For all packets defined by this version of the syntax, the value of id is the following string: W5M0MpCehiHzreSzNTczkc9d"


(See: 3 XMP Storage Model / XMP Packet Wrapper / Header / Attribute: id)

OK, so it's no big deal to cut and paste that string, it's just mighty curious why this cryptic key is needed in an open specification, especially since (contrary to what might be implied by the text) it doesn't seem to vary with version. (Or hasn't yet, at any rate - more below.)
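Whatever the reason for it, a fixed id does make a handy marker when scanning a file's raw bytes for packets without understanding the file format. A naive Python sketch of such a scan; the helper `find_packets` is hypothetical, it assumes the begin and id attributes appear in the order shown above, and it skips any packet carrying a different id.

```python
import re

XPACKET_ID = b"W5M0MpCehiHzreSzNTczkc9d"

# Header, body and trailer of an XMP packet wrapper. The begin attribute
# may hold a UTF-8 byte-order mark; the id value is captured and checked.
PACKET_RE = re.compile(
    rb"<\?xpacket\s+begin=['\"][^'\"]*['\"]\s+id=['\"](?P<id>[^'\"]*)['\"][^?]*\?>"
    rb"(?P<body>.*?)"
    rb"<\?xpacket\s+end=['\"][rw]['\"]\s*\?>",
    re.DOTALL,
)

def find_packets(data: bytes) -> list:
    """Return the body of every packet whose header carries the fixed id."""
    return [m.group("body") for m in PACKET_RE.finditer(data)
            if m.group("id") == XPACKET_ID]

sample = (b"...binary junk... <?xpacket begin='\xef\xbb\xbf' id='W5M0MpCehiHzreSzNTczkc9d'?>"
          b"<x:xmpmeta xmlns:x='adobe:ns:meta/'/>\n<?xpacket end='w'?> ...more junk...")
print(find_packets(sample))
```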

Right, so now we get down to it. Just what is the version number of the current XMP Specification anyways? I couldn't for the life of me find one. (Note that I am talking about the XMP Specification itself and not the XMP Toolkit which is versioned at 4.1.1.) I am assuming that I have the latest version, else I really don't know where else to look. This link


http://www.adobe.com/products/xmp/

leads me to

http://www.adobe.com/devnet/xmp/

which leads me to

http://www.adobe.com/devnet/xmp/pdfs/xmp_specification.pdf

which by the way is also the same version that ships with the SDK.

I do know that there was a Version 1.5 published on September 14, 2001. (You can see that this is a fairly slow-changing technology: the published spec is from 2 years back, and an earlier version (the earlier version?) is from 6 years back.) Note that this version has a version number (1.5) but still uses the same XMP packet header 'id' attribute.

No good, by the way, peeking inside the XMP of the XMP Spec either. Here's a dump (using the DumpMainXMP utility with the SDK):


% xmpd xmp_spec.xmp
 
 
// -----------------------------------
// Dumping main XMP for xmp_spec.xmp :
 
File info : format = " ", handler flags = 00000260
Packet info : offset = 0, length = 4051
 
Initial XMP from xmp_spec.xmp
Dumping XMPMeta object "" (0x0)
 
http://ns.adobe.com/pdf/1.3/ pdf: (0x80000000 : schema)
pdf:Producer = "Acrobat Distiller 7.0 (Windows)"
pdf:Copyright = "2005 Adobe Systems Inc."
pdf:Keywords = "XMP metadata schema XML RDF"
 
http://ns.adobe.com/xap/1.0/ xap: (0x80000000 : schema)
xap:CreateDate = "2005-09-23T15:19:07Z"
xap:ModifyDate = "2005-09-23T15:19:07Z"
xap:CreatorTool = "FrameMaker 7.1"
 
http://purl.org/dc/elements/1.1/ dc: (0x80000000 : schema)
dc:description (0x1E00 : isLangAlt isAlt isOrdered isArray)
[1] = "XMP metadata specification" (0x50 : hasLang hasQual)
? xml:lang = "x-default" (0x20 : isQual)
dc:creator (0x600 : isOrdered isArray)
[1] = "Adobe Developer Technologies"
dc:title (0x1E00 : isLangAlt isAlt isOrdered isArray)
[1] = "Extensible Metadata Platform (XMP) Specification" (0x50 : hasLang hasQual)
? xml:lang = "x-default" (0x20 : isQual)
dc:format = "application/pdf"
 
http://ns.adobe.com/pdfx/1.3/ pdfx: (0x80000000 : schema)
pdfx:Copyright = "2005 Adobe Systems Inc."
 
http://ns.adobe.com/xap/1.0/mm/ xapMM: (0x80000000 : schema)
xapMM:InstanceID = "uuid:99b91701-a78b-4652-84e5-6bccaeb7534e"
xapMM:DocumentID = "uuid:374ea24b-3931-4b83-944d-5b9daa42277e"

or in more readable form (courtesy of 'cwm'):

% xmp2n3q docs/XMP-Specification.pdf
#Processed by Id: cwm.py,v 1.164 2004/10/28 17:41:59 timbl Exp
# using base file:/Users/tony/Sources/Build/XMP-SDK/
 
# Notation3 generation by
# notation3.py,v 1.166 2004/10/28 17:41:59 timbl Exp
 
# Base was: file:/Users/tony/Sources/Build/XMP-SDK/
 
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix pdf: <http://ns.adobe.com/pdf/1.3/> .
@prefix pdfx: <http://ns.adobe.com/pdfx/1.3/> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpMM: <http://ns.adobe.com/xap/1.0/mm/> .
 
<> pdf:Copyright "2005 Adobe Systems Inc.";
pdf:Keywords "XMP metadata schema XML RDF";
pdf:Producer "Acrobat Distiller 7.0 (Windows)";
pdfx:Copyright "2005 Adobe Systems Inc.";
xmp:CreateDate "2005-09-23T15:19:07Z";
xmp:CreatorTool "FrameMaker 7.1";
xmp:ModifyDate "2005-09-23T15:19:07Z";
xmpMM:DocumentID "uuid:374ea24b-3931-4b83-944d-5b9daa42277e";
xmpMM:InstanceID "uuid:99b91701-a78b-4652-84e5-6bccaeb7534e";
dc:creator [
a rdf:Seq;
rdf:_1 "Adobe Developer Technologies" ];
dc:description [
a rdf:Alt;
rdf:_1 "XMP metadata specification"@x-default ];
dc:format "application/pdf";
dc:title [
a rdf:Alt;
rdf:_1 "Extensible Metadata Platform (XMP) Specification"@x-default ] .
 
#ENDS

So, just what then is the version number of the XMP Specification which the id string 'W5M0MpCehiHzreSzNTczkc9d' is marking?

September 07, 2007

connecting things: bioGUID, iSpiders and DOI

David Shorthouse and Rod Page have developed some great tools for linking references by tying together a number of services and using the CrossRef OpenURL interface amongst other things. See David's post - Gimme That Scientific Paper Part III and Rod's post on OpenURL and using ParaTools - "OpenURL and Spiders".

Unfortunately our planned changes to the CrossRef OpenURL interface (the 100 queries per day limit in particular) caused some concern for David ("CrossRef Takes a Step Back") - but make sure you read the comments to see my response!

We have decided to drop the 100-queries-per-day limit for the OpenURL interface, and there will be no charges for non-commercial use of the interface (see http://www.crossref.org/requestaccount/).

We want to encourage innovative uses of CrossRef services and disseminate DOIs as effectively as possible so we appreciate feedback and encourage the type of development David and Rod are doing. It will be interesting to see if what they are doing has wider applicability. Maybe CrossRef could host a webpage to point to tools like this and encourage more development?
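For readers who want to try this kind of lookup themselves, an OpenURL 0.1-style query is simply a URL carrying citation keys (aulast, title, volume, issue, spage, date) in its query string. A Python sketch; the helper name `openurl_query` is hypothetical, and the base URL and the account parameter "pid" are assumptions to check against CrossRef's own documentation, not a description of the interface.

```python
from urllib.parse import urlencode

# Assumption: endpoint and the account parameter "pid" should be checked
# against CrossRef's current OpenURL documentation.
OPENURL_BASE = "http://www.crossref.org/openurl/"

def openurl_query(pid: str, **citation: str) -> str:
    """Build an OpenURL 0.1-style lookup URL from citation metadata."""
    # Sorting keeps the query string stable, which makes URLs easy to compare.
    params = [("pid", pid)] + sorted(citation.items())
    return OPENURL_BASE + "?" + urlencode(params)

# Citation keys taken from the AMA example earlier on this page.
url = openurl_query("user@example.org", aulast="Kitajima", title="Nature",
                    volume="427", issue="6974", spage="510", date="2004")
print(url)
```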