Blog

Hybrid

thammond – 2007 October 17

In XMP

So, back on the old XMP tack. The simple vision from the XMP spec is that XMP packets are embedded in media files and transported along with them, and as such are relatively self-contained units; see Fig. 1.

Fig. 1 - Media files with fully encapsulated descriptions.

But this is too simple. Some preliminary considerations lead us to see why we might want to reference additional (i.e. external) sources of metadata from the original packet:

PDFs
PDFs are tightly structured and as such it can be difficult to write a new packet, or to update an existing packet. One solution proposed earlier is to embed a minimal packet which could then reference a more complete description in a standalone packet. (And in turn this standalone packet could reference additional sources of metadata.)
Images
While it is considerably simpler to write packets into web-delivery image formats (e.g. JPEG, GIF, PNG), only metadata pertinent to the image itself is likely to be embedded. Also of interest is the work from which the image is derived, which is most likely to be described externally to the image as a standalone document. (And in turn this standalone packet could reference additional sources of metadata.)
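The minimal-packet idea can be sketched concretely. Below is a toy XMP packet held in a Python string: it carries only an identifier plus a pointer to a fuller external description, then parses itself back out with the standard library. Note the assumptions: the use of xmpMM:DerivedFrom as a plain URI attribute (real XMP models it as a structured resource reference), and all identifiers and URLs, are illustrative only.

```python
# Minimal sketch of an embedded XMP packet that points to a fuller
# external description. xmpMM:DerivedFrom is real XMP Media Management
# vocabulary, but treating it as a simple URI attribute pointing at a
# standalone metadata document is an assumption for illustration.
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

minimal_packet = """<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
  <rdf:Description rdf:about=""
      dc:identifier="doi:10.1000/example"
      xmpMM:DerivedFrom="https://example.org/metadata/record.xmp"/>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""

# Strip the xpacket processing instructions before parsing the XML body.
body = minimal_packet.split("?>", 1)[1].rsplit("<?xpacket", 1)[0]
root = ET.fromstring(body)
desc = root.find(f".//{{{RDF}}}Description")
print(desc.get("{http://ns.adobe.com/xap/1.0/mm/}DerivedFrom"))
```

A consumer that finds the embedded packet can follow the external reference to the standalone description, which may in turn reference further sources.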

(Continues)

NLM Blog Citation Guidelines

I’ve just returned from the Frankfurt Book Fair and noticed that there are some recent recommendations in The NLM Style Guide for Authors, Editors and Publishers concerning citing blogs.

Which reminds me of an issue that has periodically been raised here at Crossref: should we be doing something to try to provide a service for reliably citing more ephemeral content such as blogs, wikis, etc.?

OpenDocument Adds RDF

thammond – 2007 October 14

In Metadata

Bruce D’Arcus left a comment here in which he linked to a post of his: “OpenDocument’s New Metadata System”. Not everybody reads comments so I’m repeating it here. His post is worth reading on two counts. He talks about the new metadata functionality for OpenDocument 1.2, which uses generic RDF. As he says: “Unlike Microsoft’s custom schema support, we provide this through the standard model of RDF. What this means is that implementors can provide a generic metadata API in their applications, based on an open standard, most likely just using off-the-shelf code libraries.”
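To see why a generic triple model yields a generic metadata API, here is a deliberately tiny pure-Python triple store, a sketch of the idea rather than OpenDocument 1.2’s actual API: one add function and one pattern-match function cover any vocabulary, Dublin Core or otherwise.

```python
# Toy illustration of the RDF triple model: every assertion is just a
# (subject, predicate, object) tuple, so one store and one query
# function serve any metadata vocabulary. Sketch only; not the
# OpenDocument implementation.
triples = set()

def add(s, p, o):
    triples.add((s, p, o))

def query(s=None, p=None, o=None):
    """Return triples matching the pattern; None is a wildcard."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

doc = "urn:example:document"
add(doc, "http://purl.org/dc/elements/1.1/title", "Draft report")
add(doc, "http://purl.org/dc/elements/1.1/creator", "B. D'Arcus")

# The same query mechanism works for any property in any vocabulary.
print(query(p="http://purl.org/dc/elements/1.1/creator"))
```

Off-the-shelf RDF libraries provide exactly this pattern-matching interface (plus serialization and a proper graph model), which is the point Bruce is making.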

I Want My XMP

thammond – 2007 October 13

In XMP

Now, assuming XMP is a good idea - and I think on balance it is (as blogged earlier), why are we not seeing any metadata published in scholarly media files? The only drawbacks that occur to me are:

  1. Hard to write - it’s too damn difficult, no tools support, etc.

  2. Hard to model - the rigid, “simple” XMP data model both complicates and constrains the RDF data model

Well, I don’t really believe that 1) is too difficult to overcome. A little focus and ingenuity should do the trick. I do, however, think 2) is just a crazy straitjacket that Adobe is forcing us all to wear but if we have to live with that then so be it. Better in Bedlam than without. (RSS 1.0 wasn’t so much better but allowed us to do some useful things. And that came from the RDF community itself.) We could argue this till the cows come home but I don’t see any chance of any change any time soon.
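For a flavour of the “simple” model the spec prescribes, here is the XMP-style serialization of an ordered author list (dc:creator as an rdf:Seq), parsed with the standard library; the author names are of course made up.

```python
# XMP admits only a restricted RDF subset: rdf:Description elements
# with simple values, structures, and rdf:Bag/Seq/Alt arrays. This is
# the shape it forces on an ordered author list.
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

xmp = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:dc="http://purl.org/dc/elements/1.1/">
 <rdf:Description rdf:about="">
  <dc:creator>
   <rdf:Seq>
    <rdf:li>First Author</rdf:li>
    <rdf:li>Second Author</rdf:li>
   </rdf:Seq>
  </dc:creator>
 </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(xmp)
# Collect the rdf:li items in document (i.e. Seq) order.
creators = [li.text for li in root.iter(RDF + "li")]
print(creators)
```

Constrained, yes, but as with RSS 1.0 it is still enough to do useful things with.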

(Continues)

Metadata - For the Record

thammond – 2007 October 13

In XMP

Interesting post here from Gunar Penikis of Adobe entitled “Permanent Metadata” (Oct. ’04). He talks about the issues of embedding metadata in media and comes up with this: “It may be the case that metadata in the file evolves to become a ‘cache of convenience’ with the authoritative information living on a web service. The web service model is designed to provide the authentication and permissions needed. The link between the two provided by unique IDs.”
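The “cache of convenience” model is easy to sketch: treat the embedded packet as a local cache keyed by a unique ID, and prefer the record from the authoritative service when it can be reached. Everything below (the lookup table standing in for the web service, the field names, the ID) is an illustrative assumption, not a real API.

```python
# Sketch of embedded metadata as a "cache of convenience": the file's
# packet carries a unique ID plus possibly stale values; the
# authoritative record lives behind a web service keyed by that ID.
embedded_cache = {
    "id": "uuid:example-9f1b",   # unique ID linking file to service
    "title": "Old title (stale)",
}

def authoritative_lookup(unique_id):
    """Stand-in for a call to the authoritative web service."""
    records = {"uuid:example-9f1b": {"title": "Corrected title"}}
    return records.get(unique_id)

def resolve(cache):
    record = authoritative_lookup(cache["id"])
    # Fall back to the embedded cache when the service has no record
    # for this ID (or, in a real system, is unreachable).
    return record if record is not None else cache

print(resolve(embedded_cache)["title"])
```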

DataNet

thammond – 2007 October 12

In Data

Last week, my colleague Ian Mulvany posted on Nascent an entry about the NSF’s recent call for proposals on DataNet (aka “A Sustainable Digital Data Preservation and Access Network”). Peter Brantley of the DLF has set up a public group, DataNet, on Nature Network, where all are welcome to join the discussion on what the NSF is effectively framing as the challenge of dealing with “big data”. As Ian notes in a mail to me:

OTMI Applied - Means More Search Hits

thammond – 2007 October 09

In OTMI

Following up on previous posts on OTMI (the proposal from NPG for scholarly publishers to syndicate their full text to drive text-mining applications), Fabien Campagne from Cornell, a long-time OTMI supporter, has created an OTMI-driven search engine (based on his Twease work). This may be the first publicly accessible OTMI-based service. It currently contains only NPG content from the OTMI archive online - some two years’ worth of Nature and four other titles.

Mars Bar

thammond – 2007 October 08

In PDF

Just noticed that there is now (as of last month) a blog for Mars (“Mars: Comments on PDF, Acrobat, XML, and the Mars file format”). See this from the initial post: “The Mars Project at Adobe is aimed at creating an XML representation for PDF documents. We use a component-based model for representing different aspects of the document and we use the Universal Container Format (a Zip-based packaging format) to hold the pieces.”
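Since UCF is Zip-based, an ordinary Zip library is enough to assemble or inspect a container. The sketch below uses Python’s zipfile with invented member names; the mimetype value and the layout are assumptions for illustration, not the actual Mars structure.

```python
# UCF, the container Mars uses, is Zip-based packaging, so the standard
# zipfile module can build and read one. Member names and the mimetype
# value here are invented for illustration.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Like ODF/EPUB containers, a UCF file carries a "mimetype" entry
    # stored uncompressed; ZipInfo defaults to ZIP_STORED.
    zf.writestr(zipfile.ZipInfo("mimetype"), b"application/vnd.example.mars")
    zf.writestr("document.xml", "<document/>")
    zf.writestr("pages/page1.xml", "<page/>")

# Reading it back is just as simple: the components are plain members.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
print(names)
```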

Scholarly DC

thammond – 2007 October 05

In Metadata

This was just sent out to the DC-GENERAL mailing list about the new DCMI Community for Scholarly Communications. As Julie Allinson says: “The aim of the group is to provide a central place for individuals and organisations to exchange information, knowledge and general discussion on issues relating to using Dublin Core for describing items of ‘scholarly communications’, be they research papers, conference presentations, images, data objects. With digital repositories of scholarly materials increasingly being established across the world, this group would like to offer a home for exploring the metadata issues faced.”

The Names Project

thammond – 2007 October 05

In ORCID

Was reminded to blog about this after reading Lorcan’s post on the Names Project being run by JISC. From the blurb: “The project is going to scope the requirements of UK institutional and subject repositories for a service that will reliably and uniquely identify names of individuals and institutions. It will then go on to develop a prototype service which will test the various processes involved. This will include determining the data format, setting up an appropriate database, mapping data from different sources, populating the database with records and testing the use of the data.”