October 17, 2007

Hybrid

So, back on the old XMP tack. The simple vision from the XMP spec is that XMP packets are embedded in media files and transported along with them - and as such are relatively self-contained units, see Fig 1.

Fig. 1 - Media files with fully encapsulated descriptions.

But this is too simple. Some preliminary considerations lead us to see why we might want to reference additional (i.e. external) sources of metadata from the original packet:

PDFs
PDFs are tightly structured and as such it can be difficult to write a new packet, or to update an existing packet. One solution proposed earlier is to embed a minimal packet which could then reference a more complete description in a standalone packet. (And in turn this standalone packet could reference additional sources of metadata.)

Images
While it is considerably simpler to write a packet into web-delivery image formats (e.g. JPEG, GIF, PNG), it is likely that only metadata pertinent to the image itself will be embedded. Also of interest is the work from which the image is derived, which is most likely to be presented externally to the image as a standalone document. (And in turn this standalone packet could reference additional sources of metadata.)

(Continues)

Continue reading "Hybrid" »

October 13, 2007

I Want My XMP

Now, assuming XMP is a good idea - and I think on balance it is (as blogged here earlier), why are we not seeing any metadata published in scholarly media files? The only drawbacks that occur to me are:

  1. Hard to write - it's too damn difficult, no tools support, etc.
  2. Hard to model - the rigid, "simple" XMP data model both complicates and constrains the RDF data model

Well, I don't really believe that 1) is too difficult to overcome. A little focus and ingenuity should do the trick. I do, however, think 2) is just a crazy straitjacket that Adobe is forcing us all to wear but if we have to live with that then so be it. Better in Bedlam than without. (RSS 1.0 wasn't so much better but allowed us to do some useful things. And that came from the RDF community itself.) We could argue this till the cows come home but I don't see any chance of any change any time soon.

(Continues)

Continue reading "I Want My XMP" »

Metadata - For the Record

Interesting post here from Gunar Penikis of Adobe entitled "Permanent Metadata" (Oct. '04). He talks about the issues of embedding metadata in media and comes up with this:

"It may be the case that metadata in the file evolves to become a "cache of convenience" with the authoritative information living on a web service. The web service model is designed to provide the authentication and permissions needed. The link between the two provided by unique IDs. In fact, unique IDs are already created by Adobe applications and stored in the XMP - that is what the XMP Media Management properties are all about."

An intriguing idea. Of course, Gunar's (and Adobe's) preoccupations with metadata revolve mainly around document workflow whereas, at least as things stand currently, scholarly publisher concerns are mainly with the dissemination of media in final form. Hence some differences in thinking:

Subject
As just noted, Adobe are more interested in workflow than in work. Scholarly articles are rich in descriptive metadata about the work itself and have a well-developed citation model. Academic interest is in the intellectual content rather than the vehicle used to carry and preserve that content - the file format.

Unique IDs
Workflow IDs are UUIDs which identify specific instances and expressions, but do not identify the abstract work. UUIDs provide a unique identifier but there is no central registry for such identifiers, hence they cannot be "looked up". CrossRef publishers should be concerned to associate closely the DOI for the underlying work with a given media file. That's the identifier that this community is actively promoting.
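
To make the contrast concrete, here is a minimal Python sketch. It makes several assumptions: the "uuid:" prefix is just one common way of writing such IDs, the function names are made up for illustration, the DOI shown is a placeholder and not a real article, and https://doi.org/ is assumed as the resolver. The point is only that a UUID can be minted anywhere but has nowhere to be looked up, while a DOI maps directly onto a resolvable address.

```python
import uuid
from urllib.parse import quote


def instance_id() -> str:
    """Mint a fresh workflow-style ID. It is unique, but no registry knows about it."""
    return f"uuid:{uuid.uuid4()}"


def doi_to_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Turn a DOI into a URL that the (assumed) resolver can look up."""
    # Keep the prefix/suffix slash, percent-encode anything awkward in the suffix.
    return resolver + quote(doi, safe="/")


placeholder_doi = "10.5555/12345678"  # placeholder, not a real article
print(instance_id())
print(doi_to_url(placeholder_doi))
```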

Read/Write
Because of the focus on workflow, the XMP specification recommends that XMP packets be "writeable", that is that they be marked as "writeable" and that they include padding whitespace which can accommodate updates without changing packet size. Publishers distributing final form documents are more likely to want to distribute "read-only" metadata which is authoritative and which describes the work, rather than the document format and workflow. Of course, this should not preclude additional sources of metadata which may be added "by reference" rather than "by value". That is, a pointer to a web page (or service) may be sufficient to relate additional publisher terms and user annotations instead of embedding them directly in the file for various reasons: a) file integrity, b) limiting growth of file size, c) term authority, d) dynamic production (in forward time), and e) multiple sources.
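
The writeable/read-only distinction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function name wrap_packet and the 2048-byte reservation are made up here, the packet body is a stub, and the begin value is assumed to be the U+FEFF character. The trailer's end attribute carries 'w' or 'r', and a writeable packet is padded with whitespace so that a later update can be written in place without changing the packet size.

```python
XPACKET_ID = "W5M0MpCehiHzreSzNTczkc9d"


def wrap_packet(body: str, writeable: bool, total_size=None) -> bytes:
    """Wrap body in an xpacket header/trailer, padded to total_size bytes if given."""
    flag = "w" if writeable else "r"
    head = f"<?xpacket begin='\ufeff' id='{XPACKET_ID}'?>\n".encode("utf-8")
    tail = f"<?xpacket end='{flag}'?>".encode("utf-8")
    payload = body.encode("utf-8") + b"\n"

    pad_len = 0
    if total_size is not None:
        pad_len = total_size - len(head) - len(payload) - len(tail)
        if pad_len < 0:
            raise ValueError("body does not fit in the reserved packet size")

    # Whitespace padding, with a newline roughly every 100 characters.
    padding = (b" " * 99 + b"\n") * (pad_len // 100) + b" " * (pad_len % 100)
    return head + payload + padding + tail


# A writeable packet keeps its size across an update ...
v1 = wrap_packet("<x:xmpmeta>v1</x:xmpmeta>", writeable=True, total_size=2048)
v2 = wrap_packet("<x:xmpmeta>v2, a bit longer</x:xmpmeta>", writeable=True, total_size=2048)
# ... while a read-only packet needs no padding at all.
final = wrap_packet("<x:xmpmeta>final</x:xmpmeta>", writeable=False)
```

For a publisher distributing final form documents, the read-only variant is the natural fit: no padding, and a trailer that tells other tools not to update the packet in place.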

September 25, 2007

XMP-Ville

Been so busy looking into the technical details of XMP that I almost forgot to check out the current landscape. Luckily I chanced on these articles by Ron Roszkiewicz for The Seybold Report (and apologies for lifting the title of this post from his last). The articles about XMP are well worth reading and chart the painful progress made to date:

From the earlier characterization of XMP as "underachieving teenager" Roszkiewicz is cautiously optimistic that IDEAlliance's XMP Open initiative (an initiative to advance XMP as an open industry specification) will help outreach and foster adoption of this fledgling technology.

(Continues.)

Continue reading "XMP-Ville" »

September 20, 2007

The Name's The Thing

I'm always curious about names and where they come from and what they mean. Hence, my interest was aroused with the constant references to "XAP" in XMP. As the XMP Specification (Sept. 2005) says:

"NOTE: The string “XAP” or “xap” appears in some namespaces, keywords, and related names in this document and in stored XMP data. It reflects an early internal code name for XMP; the names have been preserved for compatibility purposes."

Actually, it occurs in most of the core namespaces: XAP, rather than XMP.

(Continues.)

Continue reading "The Name's The Thing" »

September 11, 2007

Marking up DOI

(Update - 2007.09.15: Clean forgot to add in the rdf: namespace to the examples for xmp:Identifier in this post. I've now added in that namespace to the markup fragments listed. Also added in a comment here which shows the example in RDF/XML for those who may prefer that over RDF/N3.)

So, as a preliminary to reviewing how a fuller metadata description of a CrossRef resource may best be fitted into an XMP packet for embedding into a PDF, let's just consider how a DOI can be embedded into XMP. And since it's so much clearer to read let's just conduct this analysis using RDF/N3. (Life is too short to be spent reading RDF/XML or C++ code. :~)

(And further to Chris Shillum's comment here on my earlier post Metadata in PDF: 2. Use Cases where he notes that Elsevier are looking to upgrade their markup of DOI in PDF to use XMP, I'm really hoping that Elsevier may have something to bring to the party and share with us. A consensus rendering of DOI within XMP is going to be of benefit to all.)

(Continues.)

Continue reading "Marking up DOI" »

September 10, 2007

XMP - Some Other Gripes

Following on from the missing XMP Specification version number discussed in the previous post here, listed below are some miscellaneous gripes I've got with XMP (which is otherwise a very promising technology). I would be more than happy to be proved wrong on any of these points.

Continue reading "XMP - Some Other Gripes" »

W5M0MpCehiHzreSzNTczkc9d

What on earth can this string mean: 'W5M0MpCehiHzreSzNTczkc9d'? This occurs in the XMP packet header:

<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>

Well, the XMP Specification (September 2005), which is available here, has this text:

"The required id attribute must follow begin. For all packets defined by this version of the syntax, the value of id is the following string: W5M0MpCehiHzreSzNTczkc9d"


(See: 3 XMP Storage Model / XMP Packet Wrapper / Header / Attribute: id)

OK, so it's no big deal to cut and paste that string, it's just mighty curious why this cryptic key is needed in an open specification, especially since (contrary to what might be implied by the text) it doesn't seem to vary with version. (Or hasn't yet, at any rate - more below.)
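
One practical use of a fixed string like this is as a marker that lets a tool find packets by scanning raw bytes, without understanding the enclosing file format. A rough Python sketch of that idea follows; the function name and the sample bytes are made up for illustration, and the pattern assumes an 8-bit encoding such as UTF-8 (packets in UTF-16 or UTF-32 would need a different scan).

```python
import re

XPACKET_ID = b"W5M0MpCehiHzreSzNTczkc9d"

# Matches the header processing instruction with either quote style.
HEADER_RE = re.compile(
    rb"<\?xpacket\s+begin=(['\"])(.*?)\1\s+id=(['\"])(.*?)\3.*?\?>",
    re.DOTALL,
)


def find_packet_headers(data: bytes):
    """Return (offset, id) pairs for every xpacket header found in data."""
    return [(m.start(), m.group(4)) for m in HEADER_RE.finditer(data)]


# Made-up sample: some leading bytes followed by a packet header.
sample = (
    b"...leading bytes of some media file..."
    b"<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?><x:xmpmeta/>"
)

for offset, packet_id in find_packet_headers(sample):
    status = "expected id" if packet_id == XPACKET_ID else "unexpected id"
    print(offset, packet_id.decode("ascii"), status)
```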

Continue reading "W5M0MpCehiHzreSzNTczkc9d" »