I was invited to speak at the Handle System Workshop, which was run back to back with an IDF Open Meeting earlier this week in Brussels and hosted at the Office for Official Publications of the European Union. (The location was the Charlemagne Building, at left in the image, and the rather impressive Jean Durieux meeting room, at right.)
My talk (‘A Distributed Metadata Architecture’) focussed on how OpenHandle and XMP could be leveraged to manage dispersed media assets.
A Crossref Member Briefing is available that explains how PubMed Central (PMC) links to publisher full text, how PMC uses DOIs and how PMC should be using DOIs. The briefing is entitled “Linking to Publisher Full Text from PubMed Central” (PDF 85k).
Crossref considers it very important that PMC uses DOIs as the main means of linking to the publisher's version of record for an article, and we are recommending that publishers try to convince PMC to use DOIs in an automated way.
Interesting post from Yahoo! Search’s Director of Product Management, Priyank Garg, “One Standard Fits All: Robots Exclusion Protocol for Yahoo!, Google and Microsoft”. Interesting also for what it doesn’t talk about. No mention here of ACAP.
As the range of public services (e.g. RSS) offered by publishers has matured, a question arises: how can publishers expose their public data so that users may discover it? In particular, with the DOI there is now a persistent link infrastructure in place for accessing primary content. How can publishers leverage that infrastructure to advantage?
Anyway, I offer this figure as my view of the current lie of the land as regards DOI services and data.
For infotainment only (and because it makes for a pretty printing): a glimpse into the dark world of the DOI. Here, the handle contents for doi:10.1038/nature06930 are exposed as a standard OpenHandle ‘Hello World’ document. Browser image courtesy of Processing.js and Firefox 3 RC1.
So, why is it just so difficult to reference OpenURL?
Apart from the standard itself (hardly intended for human consumption - see the abstract page here and the PDF, and don’t even think of looking at those links - they weren’t meant to be cited!), it seems that the best reference is the Wikipedia page. There is the OpenURL Registry page at http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListSets but this is just a workshop: not much there beyond the registered OpenURL items.
So, the big guns have decided that XRI is out. In a message from the TAG yesterday, variously noted as being “categorical” (Andy Powell, eFoundations) and a “proclamation” (Edd Dumbill, XML.com), the co-chairs (Tim Berners-Lee and Stuart Williams) had this to say:
“We are not satisfied that XRIs provide functionality not readily available from http: URIs. Accordingly the TAG recommends against taking the XRI specifications forward, or supporting the use of XRIs as identifiers in other specifications.”
Following on from yesterday’s post about making metadata available on our Web pages, I wanted to ask here about “metadata reuse policies”. Does anybody have a clue as to what might constitute a best practice in this area? I’m specifically interested in license terms, rather than how those terms would be encoded or carried. Increasingly we are finding more channels to distribute metadata (RSS, HTML, OAI-PMH, etc.) but don’t yet have any clear statement for our customers as to how they might reuse that data.
Well, we may not be the first, but we wanted anyway to report that Nature has now embedded metadata (HTML meta tags) into all its newly published pages, including full text, abstracts and landing pages (all bar four titles, which are currently being worked on). Metadata coverage extends back through the Nature archives, with depth of coverage varying by title. This conforms to the W3C’s Guideline 13.2 in the Web Content Accessibility Guidelines 1.0, which exhorts content publishers to “provide metadata to add semantic information to pages and sites”.
Metadata is provided in both DC and PRISM formats, as well as in Google’s own bespoke metadata format. This generally follows the DCMI recommendation “Expressing Dublin Core metadata using HTML/XHTML meta and link elements” and the earlier RFC 2731, “Encoding Dublin Core Metadata in HTML”. (Note that the schema name is normalized to lowercase.) Some notes:
- The DOI is included in the “dc.identifier” term in URI form, which is the Crossref recommendation for citing DOIs.
- We could consider also adding “prism.doi” for disclosing the native DOI form. This requires the PRISM namespace declaration to be bumped to v2.0. We might consider synchronizing this change with our RSS feeds, which are currently pegged at v1.2, although note that the RSS module mod_prism currently applies only to PRISM v1.2.
- We could then also add a “prism.url” term to link back (through the DOI proxy server) to the content site. The namespace issue noted above still holds.
The HTML metadata sets from an example landing page are presented below.
- The “citation_” terms are not anchored in any published namespace, which makes this term set problematic for application reuse. It would be useful to be able to reference a namespace (e.g. “rel="schema.gs" href="..."”) for these terms and to cite them accordingly.
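To make the above concrete, here is a sketch of what such embedded metadata might look like in the head of a landing page. All of the values (title, publisher, DOI) are invented for illustration, and the optional “prism.doi” / “prism.url” terms shown are the possible additions discussed above, not terms currently emitted:

```html
<!-- Hypothetical example only: DC and PRISM metadata embedded via HTML
     meta tags. All values are invented for illustration. -->
<head>
  <title>An Example Article Title</title>
  <!-- Schema declarations (schema names normalized to lowercase) -->
  <link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
  <link rel="schema.prism" href="http://prismstandard.org/namespaces/basic/2.0/" />
  <!-- Dublin Core terms -->
  <meta name="dc.title" content="An Example Article Title" />
  <meta name="dc.publisher" content="Example Publishing Group" />
  <!-- DOI in URI form, per the Crossref citing recommendation -->
  <meta name="dc.identifier" content="http://dx.doi.org/10.1234/example.2008.001" />
  <!-- Possible PRISM additions (would require the v2.0 namespace above) -->
  <meta name="prism.doi" content="10.1234/example.2008.001" />
  <meta name="prism.url" content="http://dx.doi.org/10.1234/example.2008.001" />
</head>
```

Note how the same DOI appears twice: once in URI form under “dc.identifier” for citation, and once in native form under “prism.doi” for disclosure.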
Further to my previous post “NIH Mandate and PMCIDs” we’ve been looking into linking to articles on publishers’ sites from PubMed Central (PMC). There are a couple of ways this happens currently (see details below) but these are complicated and will lead to broken links and more difficulty for PMC and publishers in managing the links. Crossref is going to be putting together a briefing note for its members on this soon.
The main issue we are raising with PMC, and that we will encourage publishers to raise too, is: why doesn’t PMC just automatically link DOIs? Most of the articles in PMC have DOIs, so this would require very little effort from PMC and no effort from publishers, and would give readers a persistent link to the publisher’s version of an article.
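A minimal sketch of why this is so little effort: given an article’s DOI, the persistent publisher link is just the DOI appended to the DOI proxy server address. The function name here is my own illustration, not PMC’s actual code:

```python
# Hypothetical sketch: turning a DOI into a persistent publisher link
# via the DOI proxy server. Illustrative only, not PMC's pipeline.

DOI_PROXY = "http://dx.doi.org/"  # the DOI proxy server

def publisher_link(doi: str) -> str:
    """Return a persistent link to the publisher's version of record."""
    doi = doi.strip()
    # Tolerate the 'doi:' display prefix, e.g. 'doi:10.1038/nature06930'
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return DOI_PROXY + doi

print(publisher_link("doi:10.1038/nature06930"))
# http://dx.doi.org/10.1038/nature06930
```

Because the proxy server resolves to whatever URL the publisher has currently registered, the link keeps working even when the publisher’s site is reorganized, which is exactly the breakage problem with PMC’s current linking methods.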