Exposing Public Data: Options


thammond – 2008 July 01

In Metadata

This is a follow-on to an earlier post which set out the lie of the land as regards DOI services and data for DOIs registered with Crossref. That post differentiated between native DOI resolution through a public DOI service, which acts upon the “associated values held in the DOI resolution record” (per ISO CD 26324), and other related DOI services, protected and/or private, which merely use the DOI as a key into a non-public database offering.

Following the service architecture outlined in that post, options for exposing public data appear as follows:

  1. Private Service
    1. Publisher hosted – Publisher private service
  2. Protected Service
    1. Crossref hosted – Industry protected service
    2. Crossref routed – Publisher private service
  3. Public Service
    1. Handle System (DOI handle) – Global public service (native DOI service)
    2. Handle System (DOI ‘buddy’ handle) – Publisher public service



Metadata Reuse Policies


thammond – 2008 May 20

In Metadata

Following on from yesterday’s post about making metadata available on our Web pages, I wanted to ask here about “metadata reuse policies”. Does anybody have a clue as to what might constitute a best practice in this area? I’m specifically interested in license terms, rather than how those terms would be encoded or carried. Increasingly we are finding more channels to distribute metadata (RSS, HTML, OAI-PMH, etc.) but don’t yet have any clear statement for our customers as to how they might reuse that data.

Nature’s Metadata for Web Pages


thammond – 2008 May 19

In Metadata

Well, we may not be the first, but we wanted to report anyway that Nature has now embedded metadata (HTML meta tags) into all its newly published pages, including full text, abstracts and landing pages (all bar four titles, which are currently being worked on). Metadata coverage extends back through the Nature archives (the depth of coverage varies by title). This conforms to Checkpoint 13.2 of the W3C’s Web Content Accessibility Guidelines 1.0, which exhorts content publishers to “provide metadata to add semantic information to pages and sites”.

Metadata is provided in both DC and PRISM formats, as well as in Google’s own bespoke metadata format. This generally follows the DCMI recommendation “Expressing Dublin Core metadata using HTML/XHTML meta and link elements” and the earlier RFC 2731, “Encoding Dublin Core Metadata in HTML”. (Note that the schema name is normalized to lowercase.) Some notes:

  • The DOI is included in the “dc.identifier” term in URI form, which is the Crossref recommendation for citing DOIs.
  • We could also consider adding “prism.doi” to disclose the native DOI form. This requires the PRISM namespace declaration to be bumped to v2.0. We might consider synchronizing this change with our RSS feeds, which are currently pegged at v1.2, although note that the RSS module mod_prism currently applies only to PRISM v1.2.
  • We could then also add a “prism.url” term to link back (through the DOI proxy server) to the content site. The namespace issue noted above still applies.
  • The “citation_” terms are not anchored in any published namespace, which makes this term set problematic for reuse by applications. It would be useful to be able to reference a namespace (e.g. “rel="" href="..."”) for these terms and to cite them as, e.g., “gs.citation_title”.

The HTML metadata sets from an example landing page are presented below.
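As a rough illustration only (not Nature’s actual markup), the notes above could be turned into a small generator for the head of a page. This is a minimal Python sketch under stated assumptions: the record fields, the placeholder DOI `10.1038/example`, the `doi:` URI form used for `dc.identifier`, the `http://dx.doi.org/` proxy base and the PRISM 2.0 namespace URI are all assumptions to check against the relevant specs, and the function name `head_metadata` is hypothetical.

```python
from html import escape

# Schema prefixes (lowercased, as noted above) mapped to namespace URIs.
# DC elements 1.1 is the standard DCMI namespace; the PRISM 2.0 "basic"
# URI is given to the best of our knowledge (assumption: verify it).
SCHEMAS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}

# Assumed DOI proxy base used for the "prism.url" link-back.
DOI_PROXY = "http://dx.doi.org/"


def head_metadata(record):
    """Return <link>/<meta> lines for a hypothetical {"doi", "title"} record."""
    doi = record["doi"]
    title = record["title"]
    pairs = [
        ("dc.title", title),
        ("dc.identifier", "doi:" + doi),      # DOI in URI form (assumed "doi:" form)
        ("prism.doi", doi),                   # native DOI form; needs PRISM 2.0
        ("prism.url", DOI_PROXY + doi),       # link back via the DOI proxy
        ("citation_title", title),            # Google's term: no namespace to declare
    ]
    # schema.* links let a consumer expand "dc.title" into a full property URI.
    lines = [
        f'<link rel="schema.{prefix}" href="{escape(uri)}" />'
        for prefix, uri in SCHEMAS.items()
    ]
    # html.escape quotes &, <, >, " and ' so values are safe inside attributes.
    lines += [
        f'<meta name="{name}" content="{escape(value)}" />'
        for name, value in pairs
    ]
    return "\n".join(lines)
```

The two `schema.*` link elements are what allow a consumer to expand `dc.identifier` or `prism.doi` into full property URIs; the un-namespaced `citation_title` has no such anchor, which is the reuse problem raised in the last bullet.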

Word Add-in for Scholarly Authoring and Publishing

Last week Pablo Fernicola sent me an email announcing that Microsoft have finally released a beta of their Word plug-in for marking up manuscripts with the NLM DTD. I say “finally” because we’ve known this was on the way and have been pretty excited to see it. We once even hoped that MS might be able to show the plug-in at the ALPSP session on the NLM DTD, but we couldn’t quite manage it.



thammond – 2008 February 22

In Metadata

The new PRISM spec (v2.0) was published this week; see the press release. (Downloads are available here.) This is a significant development, as it adds support for XMP profiles to complement the existing XML and RDF/XML profiles. And, as PRISM is one of the major vocabularies used by publishers, I would urge you all to take a look at it and to consider upgrading your applications to use it.

Crossref Citation Plugin (for WordPress)

OK, after a number of delays due to everything from indexing slowness to router problems, I’m happy to say that the first public beta of our WordPress citation plugin is available for download via SourceForge. A Movable Type version is in the works.

And congratulations to Trey at OpenHelix, who became laudably impatient, found the SourceForge entry for the plugin back on February 8th and seems to have been testing it ever since. He has a nice description of how it works (along with screenshots), so I won’t repeat the effort here.

Having said that, I do include the text of the README after the jump. Please have a look at it before you install, because it might save you some mystification.

DC in (X)HTML Meta/Links


thammond – 2007 November 06

In Metadata

This message, posted yesterday to the dc-general list (extract follows), may be of interest: “Public Comment on encoding specifications for Dublin Core metadata in HTML and XHTML, 2007-11-05. Public Comment is being held from 5 November through 3 December 2007 on the DCMI Proposed Recommendation, ‘Expressing Dublin Core metadata using HTML/XHTML meta and link elements’ «» by Pete Johnston and Andy Powell. Interested members of the public are invited to post comments to the DC-ARCHITECTURE mailing list «http://www.
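On the consuming side, the convention in that DCMI document pairs a `<link rel="schema.PREFIX" href="NAMESPACE">` element with `<meta name="PREFIX.term">` elements, so a property URI can be recovered by concatenating the namespace and the term. Below is a minimal Python sketch using only the standard-library `html.parser`; the class name `SchemaMetaParser` is hypothetical, and matching prefixes case-insensitively is a simplifying assumption of this sketch rather than something taken from the recommendation.

```python
from html.parser import HTMLParser


class SchemaMetaParser(HTMLParser):
    """Collect <meta> name/content pairs and resolve prefixes declared with
    <link rel="schema.PREFIX" href="NAMESPACE">."""

    def __init__(self):
        super().__init__()
        self.schemas = {}  # lowercased prefix -> namespace URI
        self.metas = []    # (name, content) pairs in document order

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags via HTMLParser's default
        # handle_startendtag; attribute values arrive already unescaped.
        a = dict(attrs)
        if tag == "link":
            rel = a.get("rel") or ""
            if rel.lower().startswith("schema.") and a.get("href"):
                self.schemas[rel[len("schema."):].lower()] = a["href"]
        elif tag == "meta" and a.get("name") and a.get("content") is not None:
            self.metas.append((a["name"], a["content"]))

    def resolved(self):
        """Return (property URI or None, original name, value) tuples."""
        out = []
        for name, value in self.metas:
            prefix, dot, term = name.partition(".")
            ns = self.schemas.get(prefix.lower()) if dot else None
            out.append((ns + term if ns else None, name, value))
        return out
```

A name such as `citation_title`, with no declared prefix, comes back unresolved (`None`), which is exactly the gap a namespace declaration would close.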

OpenDocument Adds RDF


thammond – 2007 October 14

In Metadata

Bruce D’Arcus left a comment here in which he linked to a post of his, “OpenDocument’s New Metadata System”. Not everybody reads comments, so I’m repeating it here. His post is worth reading on two counts: he talks about the new metadata functionality for OpenDocument 1.2, which uses generic RDF. As he says: “Unlike Microsoft’s custom schema support, we provide this through the standard model of RDF. What this means is that implementors can provide a generic metadata API in their applications, based on an open standard, most likely just using off-the-shelf code libraries.
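To make the point about generic RDF concrete: any vocabulary can be expressed as subject, predicate, object triples without a custom schema. The sketch below (Python, standard library only) serializes Dublin Core statements as N-Triples. It is not OpenDocument’s own metadata API; the helper names and the subject `urn:example:doc1` are hypothetical, and a real application would more likely use an off-the-shelf RDF library, as the quote suggests.

```python
DC = "http://purl.org/dc/elements/1.1/"


def iri(uri):
    """Wrap an absolute IRI in angle brackets (no validation in this sketch)."""
    return f"<{uri}>"


def literal(text):
    """Quote a plain string literal, escaping per the N-Triples string rules."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def describe(subject, properties, namespace=DC):
    """Turn a {term: value} dict into triples about one subject."""
    return [
        (iri(subject), iri(namespace + term), literal(value))
        for term, value in properties.items()
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```

Swapping `namespace` for another vocabulary’s URI is all it takes to describe the same subject with different terms, which is the generality that a fixed custom schema lacks.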

Scholarly DC


thammond – 2007 October 05

In Metadata

This was just sent out to the DC-GENERAL mailing list about the new DCMI Community for Scholarly Communications. As Julie Allinson says: “The aim of the group is to provide a central place for individuals and organisations to exchange information, knowledge and general discussion on issues relating to using Dublin Core for describing items of ‘scholarly communications’, be they research papers, conference presentations, images, data objects. With digital repositories of scholarly materials increasingly being established across the world, this group would like to offer a home for exploring the metadata issues faced.

Custom Panel for CC


thammond – 2007 September 15

In Metadata

Creative Commons now have a custom panel for adding CC licenses from within Adobe applications (see here). This is interesting on two counts:

  • Machine-readable licenses
  • XMP metadata

But I still think that batch solutions for adding XMP metadata are really required for publishing workflows. And ideally there should be support for adding arbitrary XMP packets if we’re going to have truly rich metadata. I rather fear the constraints that custom panels place upon the publisher.
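A batch approach could be as simple as generating one XMP packet per item from existing metadata records. The following is a minimal Python sketch, assuming the Creative Commons (`cc:`), XMP Rights (`xmpRights:`) and PRISM 2.0 namespace URIs shown are the right ones (check them against the CC, Adobe and PRISM documentation). The function names, DOIs and license URL are hypothetical, and embedding the resulting packets into PDF or image files is left to an XMP-aware tool.

```python
from xml.sax.saxutils import escape, quoteattr

XPACKET_ID = "W5M0MpCehiHzreSzNTczkc9d"  # fixed id used in XMP packet wrappers

NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmpRights": "http://ns.adobe.com/xap/1.0/rights/",
    "cc": "http://creativecommons.org/ns#",
}
# Assumed PRISM 2.0 namespace, used here for an "arbitrary" extra property.
PRISM = {"prism": "http://prismstandard.org/namespaces/basic/2.0/"}


def xmp_packet(license_url, rights_text, extra=(), extra_ns=None):
    """Build a minimal XMP packet declaring a CC license.

    extra: iterable of (qualified_name, text) simple properties; each
    prefix must be declared in NAMESPACES or extra_ns.
    """
    ns = {**NAMESPACES, **(extra_ns or {})}
    decls = " ".join(f"xmlns:{p}={quoteattr(u)}" for p, u in ns.items() if p != "rdf")
    props = [
        "<xmpRights:Marked>True</xmpRights:Marked>",
        f"<cc:license rdf:resource={quoteattr(license_url)}/>",
        "<dc:rights><rdf:Alt>"
        f'<rdf:li xml:lang="x-default">{escape(rights_text)}</rdf:li>'
        "</rdf:Alt></dc:rights>",
    ]
    props += [f"<{q}>{escape(v)}</{q}>" for q, v in extra]
    body = "\n".join("   " + p for p in props)
    return (
        f'<?xpacket begin="\ufeff" id="{XPACKET_ID}"?>\n'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        f' <rdf:RDF xmlns:rdf={quoteattr(ns["rdf"])}>\n'
        f'  <rdf:Description rdf:about="" {decls}>\n'
        f"{body}\n"
        "  </rdf:Description>\n"
        " </rdf:RDF>\n"
        "</x:xmpmeta>\n"
        '<?xpacket end="w"?>'
    )


def batch_packets(dois, license_url, rights_text):
    """Build one packet per DOI, adding prism:doi as an extra property."""
    return {
        doi: xmp_packet(license_url, rights_text,
                        extra=[("prism:doi", doi)], extra_ns=PRISM)
        for doi in dois
    }
```

The `extra` and `extra_ns` parameters are the hook for arbitrary properties, so the packet is not limited to whatever fields a custom panel happens to expose.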