
Machine Readable: Are We There Yet?

The guidelines for CrossRef publishers ("DOI Name Information and Guidelines" - PDF, 210K) have this to say in "Sect. 6.3 The response page" about the response page for a DOI:

"A minimal response page must contain a full bibliographic citation displayed to the user. A response page without bibliographic information should never be presented to a user."
which would seem to be all fine and dandy. But if that user is a machine (or an agent acting for a user), they'll likely be out of luck, as the metadata in the bibliographic citation is generally targeted at human readers.

So here's a quick and dirty implementation of what a machine readable page could look like using RDFa. (The demo uses Jeni Tennison's wonderful rdfQuery plugin which I blogged about earlier.)

Clicking the DOI link below will bring up in a sub-window a bibliographic citation of the kind found on a typical DOI response page. If you now click the "Read Me" link you should see an alert message which presents the bibliographic metadata as a complete RDF document (in a simple N3, or Notation3, format). This document is assembled on the fly by rdfQuery using the RDFa markup embedded in the page.

doi:10.1038/nature05634 (Click for demo)

See the "View Source" link to list the actual XHTML markup and the RDFa properties which have been added. And note also that some of the properties are partially "hidden" to the human reader, e.g. a publication date is given in year form only whereas the machine record has the date in full, and some of the properties are fully "hidden": print and electronic ISSNs, issue number, ending page, etc.
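
To make the "hidden" properties concrete, here is a minimal sketch in Python of how an agent might pull literal values out of RDFa-style markup. It is not the rdfQuery code used in the demo, and it is not a conforming RDFa processor: it only handles the `about`, `property` and `content` attributes. The snippet, the DOI `10.9999/example`, the property names and every value are placeholders assumed for illustration, not the markup of the actual demo page.

```python
from html.parser import HTMLParser

# Hypothetical citation markup. The DOI, property names and values are
# placeholders for illustration, not taken from the demo page.
SNIPPET = """
<div about="doi:10.9999/example">
  <span property="dc:title">Example Article Title</span>,
  <span property="prism:publicationName">Example Journal</span>
  (<span property="dc:date" content="2000-01-31">2000</span>)
  <span property="prism:issn" content="0000-0000"></span>
</div>
"""

# Elements that never have a closing tag, so they get no stack frame.
VOID_TAGS = {"meta", "link", "br", "img", "hr", "input"}


class RDFaLiteralExtractor(HTMLParser):
    """Collects (subject, property, literal) triples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one frame per currently open element
        self.triples = []  # (subject, property, literal)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The subject is inherited from the enclosing element unless
        # this element sets its own with `about`.
        inherited = self.stack[-1]["subject"] if self.stack else ""
        frame = {
            "subject": attrs.get("about", inherited),
            "property": attrs.get("property"),
            "content": attrs.get("content"),
            "text": [],
        }
        if tag in VOID_TAGS:
            self._emit(frame)
        else:
            self.stack.append(frame)

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for frame in self.stack:
            frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        self._emit(self.stack.pop())

    def _emit(self, frame):
        if frame["property"] is None:
            return
        # A `content` attribute overrides the text shown to the human reader.
        value = frame["content"]
        if value is None:
            value = "".join(frame["text"]).strip()
        self.triples.append((frame["subject"], frame["property"], value))


def extract(html):
    parser = RDFaLiteralExtractor()
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for s, p, o in extract(SNIPPET):
        print(f'<{s}> {p} "{o}" .')
```

In this sketch the date is partially hidden (the reader sees only the year while the machine record carries the full `content` value), and the ISSN is fully hidden because its element has no visible text at all.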


So, what's new about this? There are already various means of adding metadata to pages using e.g. metadata tags (see here for an earlier post on this), or COinS objects, or even RDF/XML in comment sections. All of these have their uses but are still just early attempts at automation. What makes RDFa new and compelling is twofold: publishers can embed machine readable metadata that pretty much off-the-shelf tools can read back as a complete RDF description, and the markup sits unobtrusively within the content, in its proper context.

Note that there are some similarities here between embedding an XMP packet (which includes metadata) into an arbitrary binary object, e.g. a PDF file, and embedding RDF into a section of a web page – or perhaps "draping" the RDF over the document markup would be a better term – so that the metadata travels along with the actual content.

By the way, the RDFa can be processed to yield valid RDF, as the demo shows. You can also see this by running the web page through the RDFa Distiller. (You just need to cut and paste the link of the demo page given above into the Distiller form box.) This will produce RDF in various serializations (N3, XML, Triples) from the RDFa.

So, is there any reason left not to have machine readable metadata at the end of the DOI? Are we there yet?

Comments

Machine readable metadata in RDFa at the end would make DOIs very interesting indeed.

I am currently thinking about stable identifiers for journal articles when I export RDF from Zotero. DOIs are already quite nice for that objective: they are dereferenceable (if prefixed with http://dx.doi.org/), stable & "clean", and solve the httpRange-14 issue through a 302-redirect (best would be 303, but hey, Dublin Core does it also). If there is also RDFa at the end, it gets even better...

(Except of course for the costs of registering; hopefully purlz will fill in for scholarly documents that lack DOIs.)

M

This is nice, Tony. And, I very much agree with the need for machine readable data to be made available for these DOI-identified objects. I thought one could consider the ORE Aggregation Model as an attractive foundation for expressing such machine readable data, since most of those DOI-identified objects are effective aggregations of several resources, many of which are semantically inter-related, and/or are related to yet other resources (e.g. cited resources). And, as a bonus, our dear friend Pete Johnston actually even wrote up an RDFa serialization for ORE Resource Maps. See http://www.openarchives.org/ore/1.0/rdfa

Nice to see RDFa picked up by Nature! And really interesting use of JavaScript to show the RDF content!

Of course, I'd very much like to see much more metadata available on this page, such as peeks into the content... maybe linking up to DBPedia...

Interesting material!

This has great potential, I think. These DOI front pages could give some interesting insight into what the paper is about too, e.g. by linking up with DBPedia...
