What on earth can this string mean: ‘W5M0MpCehiHzreSzNTczkc9d’? This occurs in the XMP packet header:
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
Well, the XMP Specification (September 2005), which is available as a PDF, has this text:
“The required id attribute must follow begin. For all packets defined by this version of the syntax, the value of id is the following string: W5M0MpCehiHzreSzNTczkc9d”
(See: 3 XMP Storage Model / XMP Packet Wrapper / Header / Attribute: id)
OK, so it’s no big deal to cut and paste that string, it’s just mighty curious why this cryptic key is needed in an open specification, especially since (contrary to what might be implied by the text) it doesn’t seem to vary with version. (Or hasn’t yet, at any rate - more below.)
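Since the id never seems to change, it at least makes a handy marker for finding packets. Here is a minimal sketch in Python of scanning a file's bytes for XMP packet headers and pulling out the id value. The function name `find_xmp_headers` and the regular expression are my own illustration, not anything from the spec. The pattern assumes the begin attribute comes first, that either quote style may be used, and that the begin value is either empty or a byte-order mark.

```python
import re

# The fixed id value quoted from the XMP Specification (September 2005).
XMP_PACKET_ID = b"W5M0MpCehiHzreSzNTczkc9d"

# Assumed header shape: <?xpacket begin=... id=... ?> with matching quotes.
# The begin value is treated as an opaque run of non-quote bytes.
HEADER_RE = re.compile(
    rb"<\?xpacket\s+begin=(['\"])(?P<begin>[^'\"]*)\1"
    rb"\s+id=(['\"])(?P<id>[^'\"]*)\3[^?]*\?>"
)


def find_xmp_headers(data: bytes):
    """Return a list of (byte offset, id value) for each header found."""
    return [(m.start(), m.group("id")) for m in HEADER_RE.finditer(data)]
```

Reading a file with `open(path, "rb").read()` and passing the bytes in gives the offsets of any packet headers. Comparing each returned id against `XMP_PACKET_ID` is then a quick check on whether the cryptic key really is the same everywhere.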
Following on from the missing XMP Specification version number discussed in the previous post, here are some miscellaneous gripes I’ve got with XMP (which is otherwise a very promising technology). I would be more than happy to be proved wrong on any of these points.
Ed Pentz – 2007 September 07
David Shorthouse and Rod Page have developed some great tools for linking references by tying together a number of services and using the Crossref OpenURL interface, amongst other things. See David’s post, “Gimme That Scientific Paper Part III”, and Rod’s post on OpenURL and using ParaTools, “OpenURL and Spiders”.
Unfortunately, our planned changes to the Crossref OpenURL interface (the 100 queries per day limit in particular) caused some concern for David (“Crossref Takes a Step Back”), but make sure you read the comments to see my response!
We decided to drop the 100 per day query limit for the OpenURL interface and there will be no charges for non-commercial use of the interface - https://apps.crossref.org/requestaccount/
thammond – 2007 August 28
Boy, was I ever so wrong! Contrary to what I said in yesterday’s post, the new PRISM 2.0 spec does support XMP value type mappings for its terms. See the table below which lists the PRISM basic vocabulary terms and the XMP value types.
Many thanks to Dianne Kennedy and the rest of the PRISM Working Group for having added this support to PRISM 2.0.
thammond – 2007 August 27
(Update - 2007.08.28: I inadvertently missed out the term names in the last example of XMP as RDF/N3 with QNames and have now added these in. Also - a biggie - I said that PRISM had no XMP schema defined. This is actually wrong and as I blogged here today, the new PRISM 2.0 spec does indeed have a mapping of PRISM terms to XMP value types. Should actually have read the spec instead of just blogging about it earlier here. :~)
Having previously stooped to an extremely crass hack for pulling out a document information dictionary from PDFs (for which no apologies are sufficient, but it does often work), I feel I should make some kind of amends and mention the wonderful ExifTool by Phil Harvey for reading and writing metadata in media files. This is both a Perl library and a command-line application (so it’s cross-platform; a Windows .exe and Mac OS .dmg are also provided). Besides handling EXIF tags in image files, this veritable Swiss Army knife of metadata inspectors can also read PDFs for the information dictionary and the document XMP packet. Moreover, intriguingly, it can dump the raw (document) XMP packet.
I’m still experimenting with it. There’s quite a number of features to explore. But some preliminary finds are listed below.
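For anyone wanting to script this, here is a rough sketch in Python of driving the ExifTool command line. The helper names are hypothetical, and the flags are as I understand them from the ExifTool documentation (`-xmp -b` to extract the XMP block in raw form, `-a -G1` to list all tags with their group names), so check them against your installed version. It assumes an `exiftool` executable is on the PATH.

```python
import subprocess
from typing import List


def xmp_dump_command(path: str, exiftool: str = "exiftool") -> List[str]:
    """Build the command that writes the raw XMP packet to stdout."""
    # -xmp selects the XMP block; -b outputs it in binary (raw) form.
    return [exiftool, "-xmp", "-b", path]


def tag_list_command(path: str, exiftool: str = "exiftool") -> List[str]:
    """Build the command that lists all tags, prefixed by group name."""
    # -a keeps duplicate tags; -G1 prints the specific group of each tag.
    return [exiftool, "-a", "-G1", path]


def dump_raw_xmp(path: str) -> bytes:
    """Run ExifTool and return the raw XMP packet bytes (may be empty)."""
    result = subprocess.run(
        xmp_dump_command(path), capture_output=True, check=True
    )
    return result.stdout
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to print the command and try it by hand first, which is how I would start with any new file type.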
thammond – 2007 August 23
thammond – 2007 August 22
So, following up on my recent posts here on Metadata in PDFs (Strategies, Use Cases, Deployment), I finally came across PDF/A and PDF/X, two ISO-standardized subsets of PDF: the former (ISO 19005-1:2005) for archiving and the latter (ISO 15929:2002, ISO 15930-1:2001, etc.) for prepress digital data exchange.
Both formats share some common ground, such as minimizing surprises between producer and consumer and keeping things open and predictable. But my interest here is specifically in metadata and in seeing what guidance these standards might provide us. Not surprisingly, metadata is a key issue for PDF/A, less so for PDF/X. I’ll discuss PDF/X briefly, but the bulk of this post is focussed on PDF/A. See below.
thammond – 2007 August 08
The first thing to note is that this demo (the Acrobat plugin) is an application. And that comes with its own baggage, i.e. this is a Windows-only plugin and is targeted at Acrobat Reader 8. On a wider view, the application merely bridges an identifier embedded in the media file and the handle record filed against that identifier, and delivers some relevant functionality. The data (or metadata) declared in the PDF and in the associated handle, if rich enough and structured openly, can also be used by other applications. I think this is a key point worth bearing in mind: the demo, besides showing off new functionalities, is also demonstrating how data (or metadata) can be embedded at the respective endpoints (PDF, handle).
Some initial observations follow below.
So, assuming we know the form of the metadata we wish to add to our PDFs (or else to comply with, if there is already a set of guidelines or some industry initiative in effect), how can we realize this? And, on the flip side, how can we make it easier for consumers to extract the metadata we have embedded in our PDFs?
Below are some considerations on deploying metadata in PDFs and consumer access.