Blog

Poorboy Metadata Hack

thammond – 2009 January 06

In Metadata

I was playing around recently and ran across this little metadata hack. At first, I thought somebody was doing something new. But no, nothing so forward apparently. (Heh! 🙂) I was attempting to grab the response headers from an HTTP request on an article page and was using the Perl LWP library by default. For some reason, metadata elements were being spewed out as response headers - at least from some of the sites I tested.
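One plausible explanation, offered as an assumption rather than something verified here: LWP can parse the HTML head of a response and fold its meta elements into the header set, with “http-equiv” values appearing under their own names and named elements under an “X-Meta-” prefix. The minimal Python sketch below imitates that folding on a made-up page. The helper name “meta_as_headers” and the sample markup are hypothetical, and this is not LWP’s actual code.

```python
from html.parser import HTMLParser


class MetaHeaderParser(HTMLParser):
    """Collect <meta> elements found inside <head> as (name, value) pairs."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._in_head = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head = True
        elif tag == "meta" and self._in_head:
            a = dict(attrs)
            content = a.get("content")
            if content is None:
                return
            if a.get("http-equiv"):
                # http-equiv elements are reported under their own name
                self.headers.append((a["http-equiv"], content))
            elif a.get("name"):
                # named elements get a prefix (assumed to mirror LWP's behaviour)
                self.headers.append(("X-Meta-" + a["name"], content))

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False


def meta_as_headers(html):
    """Return header-like pairs for the meta elements in the document head."""
    parser = MetaHeaderParser()
    parser.feed(html)
    return parser.headers


# Hypothetical article page, for illustration only
SAMPLE = """<html><head>
<title>Example</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="dc.title" content="An Example Article" />
</head><body><meta name="ignored" content="x" /></body></html>"""

if __name__ == "__main__":
    for name, value in meta_as_headers(SAMPLE):
        print("%s: %s" % (name, value))
```

Only elements inside the head are picked up, which is why the stray meta element in the body of the sample is skipped.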

And the DOI is …

thammond – 2008 December 22

In Metadata

Once structured metadata is added to a file, retrieving a given metadata element is usually a doddle. For example, for PDFs with embedded XMP one can use Phil Harvey’s excellent Exiftool utility. Exiftool is a Perl library and application, which I’ve blogged about here earlier, and is available as a ‘.zip’ file for Windows (no Perl required) or a ‘.dmg’ for MacOS. Note that Phil has maintained it actively over the last five years.
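For a feel of what such a retrieval involves, here is a minimal Python sketch of a different, much cruder technique than Exiftool: scanning the raw bytes for an XMP packet wrapper and pulling one element out with a regular expression. It assumes the packet is stored uncompressed and that the property is written in element form (not as an attribute). The function names and the sample bytes, including the DOI “10.1000/example”, are hypothetical.

```python
import re

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def find_xmp_packet(data):
    """Return the first XMP packet body found in the raw bytes, or None."""
    match = PACKET_RE.search(data)
    if match is None:
        return None
    return match.group(1).decode("utf-8", errors="replace")


def xmp_element(packet, qname):
    """Return the text of the first <qname>...</qname> element, or None."""
    name = re.escape(qname)
    match = re.search(r"<%s>\s*(.*?)\s*</%s>" % (name, name), packet, re.DOTALL)
    return match.group(1) if match else None


# Made-up stand-in for the bytes of a PDF carrying an XMP packet
SAMPLE = (
    b"%PDF-1.4\n"
    b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    b"<x:xmpmeta><prism:doi>10.1000/example</prism:doi></x:xmpmeta>"
    b'<?xpacket end="w"?>\n%%EOF'
)

if __name__ == "__main__":
    packet = find_xmp_packet(SAMPLE)
    print(xmp_element(packet, "prism:doi"))
```

A real tool parses the file structure and the RDF properly, so this is only useful as a quick check that a packet is present at all.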

Machine Readable: Are We There Yet?

thammond – 2008 November 19

In Metadata

The guidelines for Crossref publishers (“DOI Name Information and Guidelines” - PDF, 210K) have this to say in “Sect. 6.3 The response page” regarding the response page for a DOI:

“A minimal response page must contain a full bibliographic citation displayed to the user. A response page without bibliographic information should never be presented to a user.”

which would seem to be all fine and dandy. But if that user is a machine (or an agent acting for a user) they’ll likely be out of luck as the metadata in the bibliographic citation is generally targeted at human users.

So here’s a quick and dirty implementation of what a machine readable page could look like using RDFa. (The demo uses Jeni Tennison’s wonderful rdfQuery plugin which I blogged about earlier.)

Clicking the DOI link below will bring up in a sub-window a bibliographic citation which might be found in a typical DOI response page. If you now click the “Read Me” link you should see an alert message which presents the bibliographic metadata as a complete RDF document (in a simple N3 – or Notation3 – format). This document is assembled on the fly by rdfQuery using the RDFa markup embedded in the page.

doi:10.1038/nature05634 (Click for demo)

See the “View Source” link to list the actual XHTML markup and the RDFa properties which have been added. And note also that some of the properties are partially “hidden” to the human reader, e.g. a publication date is given in year form only whereas the machine record has the date in full, and some of the properties are fully “hidden”: print and electronic ISSNs, issue number, ending page, etc.
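To make the idea concrete without the demo to hand, here is a minimal Python sketch of the same kind of extraction: it walks a fragment of RDFa-style markup, prefers a “content” attribute over the visible text (which is how a full date can sit behind a year-only display), and prints N3-like statements. This is not rdfQuery and not a conforming RDFa processor. The class and function names, the sample markup and its values are all hypothetical, properties with nested child elements are not handled, and prefix declarations are left out of the output.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (subject, property, value) triples from simple RDFa markup."""

    def __init__(self):
        super().__init__()
        self.subject = None
        self.triples = []
        self._pending = None  # property whose element text is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("about"):
            self.subject = a["about"]
        prop = a.get("property")
        if not prop:
            return
        if a.get("content") is not None:
            # the machine-readable value wins over whatever is displayed
            self.triples.append((self.subject, prop, a["content"]))
        else:
            self._pending = prop
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            value = "".join(self._text).strip()
            self.triples.append((self.subject, self._pending, value))
            self._pending = None


def to_n3(triples):
    """Render triples as N3-style statements with plain literal objects."""
    lines = []
    for subject, prop, value in triples:
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('<%s> %s "%s" .' % (subject, prop, literal))
    return "\n".join(lines)


# Hypothetical citation markup, for illustration only
SAMPLE = """<div about="http://example.org/article/1">
  <span property="dc:title">An Example Article</span>
  (<span property="dc:date" content="2008-11-19">2008</span>)
  <span property="prism:endingPage" content="42"></span>
</div>"""

if __name__ == "__main__":
    collector = PropertyCollector()
    collector.feed(SAMPLE)
    print(to_n3(collector.triples))
```

In the sample, the reader sees only “2008” while the extracted record carries the full date, and the ending page is not displayed at all.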

(Continues below.)

rdfQuery

thammond – 2008 November 17

In Metadata

Whaddya know? I was just on the point of blogging about the real nice demo given by Jeni Tennison at last week’s SWIG UK meeting at HP Labs in Bristol of rdfQuery (an RDF plugin for jQuery - the zip file is here). And there today on her blog I see that she has a full writeup on rdfQuery, so I’ll defer to the expert. :~) All I can really add to that is that rdfQuery is a pretty darn cool way to add and manipulate RDFa using jQuery.

PRISM 2.1

thammond – 2008 October 24

In Metadata

Yesterday a new PRISM spec (v2.1) was released for public comment. (Comment period lasts up to Dec. 3, ’08.) Changes are listed in pages 8 and 9 of the Introduction document. Some highlights:

  • New PRISM Usage Rights namespace
    • Accordingly usage of prism:copyright, prism:embargoDate, and prism:expirationDate no longer recommended
  • New element prism:isbn introduced for book serials

An updated mod_prism RSS 1.0 module is available which lists all versions of PRISM specs including the forthcoming v2.

Metadata Matters

thammond – 2008 July 21

In Metadata

Andy Powell has published on Slideshare this talk about metadata - see his eFoundations post for notes. It’s 130 slides long and aims “to cover a broad sweep of history from library cataloguing, thru the Dublin Core, Web search engines, IEEE LOM, the Semantic Web, arXiv, institutional repositories and more.” Don’t be fooled by the length though. This is a flip through and is a readily accessible overview on the importance of metadata.

PRISM Press Release

thammond – 2008 July 09

In Metadata

The PRISM metadata standards group issued a press release yesterday which covered three points:

  • PRISM Cookbook

The Cookbook provides “a set of practical implementation steps for a chosen set of use cases and provides insights into more sophisticated PRISM capabilities. While PRISM has 3 profiles, the cookbook only addresses the most commonly used profile #1, the well-formed XML profile. All recipes begin with a basic description of the business purpose it fulfills, followed by ingredients (typically a set of PRISM metadata fields or elements), and, closes with a step-by-step implementation method with sample XMLs and illustrative images.”

Exposing Public Data: Options

thammond – 2008 July 01

In Metadata

This is a follow-on to an earlier post which set out the lie of the land as regards DOI services and data for DOIs registered with Crossref. That post differentiated between a native DOI resolution through a public DOI service which acts upon the “associated values held in the DOI resolution record” (per ISO CD 26324) and other related DOI protected and/or private services which merely use the DOI as a key into a non-public database offering.

Following the service architecture outlined in that post, options for exposing public data appear as follows:

  1. Private Service
    1. Publisher hosted – Publisher private service
  2. Protected Service
    1. Crossref hosted – Industry protected service
    2. Crossref routed – Publisher private service
  3. Public Service
    1. Handle System (DOI handle) – Global public service (native DOI service)
    2. Handle System (DOI ‘buddy’ handle) – Publisher public service

(Continues below.)

Metadata Reuse Policies

thammond – 2008 May 20

In Metadata

Following on from yesterday’s post about making metadata available on our Web pages, I wanted to ask here about “metadata reuse policies”. Does anybody have a clue as to what might constitute a best practice in this area? I’m specifically interested in license terms, rather than how those terms would be encoded or carried. Increasingly we are finding more channels to distribute metadata (RSS, HTML, OAI-PMH, etc.) but don’t yet have any clear statement for our customers as to how they might reuse that data.

Nature’s Metadata for Web Pages

thammond – 2008 May 19

In Metadata

Well, we may not be the first but wanted anyway to report that Nature has now embedded metadata (HTML meta tags) into all its newly published pages including full text, abstracts and landing pages (all bar four titles which are currently being worked on). Metadata coverage extends back through the Nature archives (and depth of coverage varies depending on title). This conforms to the W3C’s Guideline 13.2 in the Web Content Accessibility Guidelines 1.0 which exhorts content publishers to “provide metadata to add semantic information to pages and sites”.

Metadata is provided in both DC and PRISM formats as well as in Google’s own bespoke metadata format. This generally follows the DCMI recommendation “Expressing Dublin Core metadata using HTML/XHTML meta and link elements”, and the earlier RFC 2731 “Encoding Dublin Core Metadata in HTML”. (Note that the schema name is normalized to lowercase.) Some notes:

  • The DOI is included in the “dc.identifier” term in URI form, which is the Crossref recommendation for citing a DOI.

  • We could consider adding also “prism.doi” for disclosing the native DOI form. This requires the PRISM namespace declaration to be bumped to v2.0. We might consider synchronizing this change with our RSS feeds which are currently pegged at v1.2, although note that the RSS module mod_prism currently applies only to PRISM v1.2.

  • We could then also add in a “prism.url” term to link back (through the DOI proxy server) to the content site. The namespace issue listed above still holds.

  • The “citation_” terms are not anchored in any published namespace, which does make this term set problematic in application reuse. It would be useful to be able to reference a namespace (e.g. “rel="schema.gs" href="..."”) for these terms and to cite them as e.g. “gs.citation_title”.

The HTML metadata sets from an example landing page are presented below.
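As a rough stand-in for the kind of markup the notes above describe, and not Nature’s actual markup, the following minimal Python sketch renders schema declarations and meta elements with the schema name lowercased. Everything in the sample is an assumption made for illustration: the DOI “10.1000/example”, the “doi:” spelling of the URI form, the article title, and the helper names.

```python
from html import escape


def normalize(name):
    """Lowercase only the schema prefix, e.g. 'DC.title' -> 'dc.title'."""
    prefix, dot, term = name.partition(".")
    return prefix.lower() + dot + term if dot else name


def schema_links(schemas):
    """Render <link rel="schema.xx"> declarations for (prefix, uri) pairs."""
    return [
        '<link rel="schema.%s" href="%s" />' % (escape(prefix.lower()), escape(uri))
        for prefix, uri in schemas
    ]


def meta_tags(pairs):
    """Render (name, content) pairs as XHTML meta elements."""
    return [
        '<meta name="%s" content="%s" />' % (escape(normalize(name)), escape(content))
        for name, content in pairs
    ]


# Hypothetical record; only the Dublin Core namespace is declared because
# the "citation_" terms have no published namespace to point at.
SCHEMAS = [("DC", "http://purl.org/dc/elements/1.1/")]
RECORD = [
    ("DC.title", "An Example Article"),
    ("DC.identifier", "doi:10.1000/example"),  # URI form (spelling assumed)
    ("citation_title", "An Example Article"),
]

if __name__ == "__main__":
    print("\n".join(schema_links(SCHEMAS) + meta_tags(RECORD)))
```

Adding “prism.doi” or “prism.url” would mean one more entry in each list, with the PRISM namespace declared alongside the Dublin Core one.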
