I was invited to speak at the Handle System Workshop which was run back to back with an IDF Open Meeting earlier this week in Brussels and hosted at the Office for Official Publications of the European Union. (Location was in the Charlemagne Building, at left in image, within the rather impressive meeting room Jean Durieux, at right.)
My talk (“A Distributed Metadata Architecture”) was focussed on how OpenHandle and XMP could be leveraged to manage dispersed media assets.
A Crossref Member Briefing is available that explains how PubMed Central (PMC) links to publisher full text, how PMC uses DOIs and how PMC should be using DOIs. The briefing is entitled “Linking to Publisher Full Text from PubMed Central” (PDF 85k).
Crossref considers it very important that PMC uses DOIs as the main means to link to the publisher version of record for an article, and we are recommending that publishers try to convince PMC to use DOIs in an automated way.
Interesting post from Yahoo! Search’s Director of Product Management, Priyank Garg, “One Standard Fits All: Robots Exclusion Protocol for Yahoo!, Google and Microsoft”. Interesting also for what it doesn’t talk about. No mention here of ACAP.
As the range of public services (e.g. RSS) offered by publishers has matured, this gives rise to the question: how can they expose their public data so that a user may discover them? In particular, with DOI there is now in place a persistent link infrastructure for accessing primary content. How can publishers leverage that infrastructure to advantage?
Anyway, I offer this figure as to how I see the current lie of the land as regards DOI services and data.
For infotainment only (and because it’s a pretty printing). Glimpse into the dark world of DOI. Here, the handle contents for doi:10.1038/nature06930 exposed as a standard OpenHandle “Hello World” document. Browser image courtesy of Processing.js and Firefox 3 RC1.
So, why is it just so difficult to reference OpenURL?
Apart from the standard itself (hardly intended for human consumption - see abstract page here and PDF and don’t even think to look at those links - they weren’t meant to be cited!), seems that the best reference is to the Wikipedia page. There is the OpenURL Registry page at http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListSets but this is just a workshop. Not much there beyond the OpenURL registered items.
So, the big guns have decided that XRI is out. In a message from the TAG yesterday, variously noted as being “categorical” (Andy Powell, eFoundations) and a “proclamation” (Edd Dumbill, XML.com), the co-chairs (Tim Berners-Lee and Stuart Williams) had this to say:
“We are not satisfied that XRIs provide functionality not readily available from http: URIs. Accordingly the TAG recommends against taking the XRI specifications forward, or supporting the use of XRIs as identifiers in other specifications.”
Following on from yesterday’s post about making metadata available on our Web pages, I wanted to ask here about “metadata reuse policies”. Does anybody have a clue as to what might constitute a best practice in this area? I’m specifically interested in license terms, rather than how those terms would be encoded or carried. Increasingly we are finding more channels to distribute metadata (RSS, HTML, OAI-PMH, etc.) but don’t yet have any clear statement for our customers as to how they might reuse that data.
Well, we may not be the first but wanted anyway to report that Nature has now embedded metadata (HTML meta tags) into all its newly published pages including full text, abstracts and landing pages (all bar four titles which are currently being worked on). Metadata coverage extends back through the Nature archives (and depth of coverage varies depending on title). This conforms to the W3C’s Guideline 13.2 in the Web Content Accessibility Guidelines 1.0 which exhorts content publishers to “provide metadata to add semantic information to pages and sites”.
Metadata is provided in both DC and PRISM formats as well as in Google’s own bespoke metadata format. This generally follows the DCMI recommendation “Expressing Dublin Core metadata using HTML/XHTML meta and link elements”, and the earlier RFC 2731 “Encoding Dublin Core Metadata in HTML”. (Note that schema name is normalized to lowercase.) Some notes:
- The DOI is included in the “dc.identifier” term in URI form, which is the Crossref recommendation for citing DOI.
- We could consider adding also “prism.doi” for disclosing the native DOI form. This requires the PRISM namespace declaration to be bumped to v2.0. We might consider synchronizing this change with our RSS feeds which are currently pegged at v1.2, although note that the RSS module mod_prism currently applies only to PRISM v1.2.
- We could then also add in a “prism.url” term to link back (through the DOI proxy server) to the content site. The namespace issue listed above still holds.
- The “citation_” terms are not anchored in any published namespace, which does make this term set problematic in application reuse. It would be useful to be able to reference a namespace (e.g. “rel="schema.gs" href="..."”) for these terms and to cite them as e.g. “gs.citation_title”.
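As a rough illustration of the notes above, here is a minimal Python sketch that emits the schema links and the three DOI-bearing terms discussed. It is not the code behind Nature's pages. The function name, the "doi:" prefix taken as the URI form, the proxy address and the PRISM 2.0 namespace URI are all assumptions made for this example.

```python
from html import escape

# Assumed values for illustration; not taken from any published page.
DOI_PROXY = "https://doi.org/"
SCHEMAS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}


def doi_meta_tags(doi):
    """Build schema links plus the three DOI-bearing meta elements."""
    terms = [
        ("dc.identifier", "doi:" + doi),   # URI form (assumed "doi:" scheme)
        ("prism.doi", doi),                # native DOI form
        ("prism.url", DOI_PROXY + doi),    # link back through the proxy
    ]
    links = ['<link rel="schema.%s" href="%s" />' % (prefix, escape(uri))
             for prefix, uri in SCHEMAS.items()]
    metas = ['<meta name="%s" content="%s" />' % (name, escape(value))
             for name, value in terms]
    return links + metas


if __name__ == "__main__":
    print("\n".join(doi_meta_tags("10.1038/nature06930")))
```

The values are only HTML-escaped here. A production version would probably also percent-encode the DOI inside the “prism.url” value.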
The HTML metadata sets from an example landing page are presented below.
Further to my previous post “NIH Mandate and PMCIDs” we’ve been looking into linking to articles on publishers’ sites from PubMed Central (PMC). There are a couple of ways this happens currently (see details below) but these are complicated and will lead to broken links and more difficulty for PMC and publishers in managing the links. Crossref is going to be putting together a briefing note for its members on this soon.
The main issue we are raising with PMC, and that we will encourage publishers to raise too, is why doesn’t PMC just automatically link DOIs? Most of the articles in PMC have DOIs so this would require very little effort from PMC and no effort from publishers and would give readers a persistent link to the publisher’s version of an article.
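To show how little work automatic linking involves, here is a minimal Python sketch that turns a DOI into a link through the DOI proxy. It is an illustration only, not a description of how PMC works. The function name and the proxy address are assumptions, and the validity check is deliberately crude.

```python
from urllib.parse import quote

DOI_PROXY = "https://doi.org/"  # assumed proxy address


def doi_to_url(doi):
    """Turn a DOI (with or without a 'doi:' prefix) into a proxy link."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Crude sanity check: DOIs start with "10." and have a prefix/suffix split.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError("not a DOI: %r" % doi)
    # Percent-encode characters that are unsafe in a URL path; keep "/".
    return DOI_PROXY + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1038/nature06930"))
```

Because the link is built from the DOI alone, nothing publisher-specific has to be stored or maintained on the linking side.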