
March 30, 2007

Citing Data Sets

This D-Lib paper by Altman and King looks interesting: "A Proposed Standard for the Scholarly Citation of Quantitative Data". (And thanks to Herbert Van de Sompel for drawing attention to the paper.) The gist of it (Sect. 3) is:

"We propose that citations to numerical data include, at a minimum, six required components. The first three components are traditional, directly paralleling print documents. ... Thus, we add three components using modern technology, each of which is designed to persist even when the technology changes: a unique global identifier, a universal numeric fingerprint, and a bridge service. They are also designed to take advantage of the digital form of quantitative data.

An example of a complete citation, using this minimal version of the proposed standards, is as follows:

Micah Altman; Karin MacDonald; Michael P. McDonald, 2005, "Computer Use in Redistricting",
hdl:1902.1/AMXGCNKCLU UNF:3:J0PkMygLPfIyT1E/8xO/EA==
http://id.thedata.org/hdl%3A1902.1%2FAMXGCNKCLU

"

So the abbreviated citation (author, date, title, unique ID) is supplemented by a UNF which fingerprints the data. UNFs appear to be a sort of super MD5: they provide a signature of the data content that is independent of how the data is serialized to a filestore.

"Thus, we add as the fifth component a Universal Numeric Fingerprint or UNF. The UNF is a short, fixed-length string of numbers and characters that summarize all the content in the data set, such that a change in any part of the data would produce a completely different UNF. A UNF works by first translating the data into a canonical form with fixed degrees of numerical precision and then applies a cryptographic hash function to produce the short string. The advantage of canonicalization is that UNFs (but not raw hash functions) are format-independent: they keep the same value even if the data set is moved between software programs, file storage systems, compression schemes, operating systems, or hardware platforms. ...

Finally, since most web browsers do not currently recognize global unique identifiers directly (i.e., without typing them into a web form), we add as the sixth and final component of the citation standard a bridge service, which is designed to make this task easier in the medium term."

Certainly looks promising. I'm not sure if there are any other contenders in this arena.
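To make the "canonicalize, then hash" idea concrete, here is a minimal sketch. It is not the UNF algorithm from the paper: the rounding rule, the `NA` marker, the separators, the choice of SHA-256 and the truncation to 16 bytes are all assumptions made purely for illustration. The `bridge_url` helper simply percent-encodes an identifier onto the bridge-service address shown in the example citation.

```python
import base64
import hashlib
from urllib.parse import quote


def canonical_value(value, digits=7):
    """Render one cell in a fixed, software-independent text form."""
    if value is None:
        return "NA"  # hypothetical marker for missing data
    if isinstance(value, (int, float)):
        # Scientific notation with a fixed number of significant digits,
        # so 1, 1.0 and 1.00000000001 all canonicalize identically.
        return format(float(value), f".{digits - 1}e")
    return str(value)


def fingerprint(rows, digits=7):
    """Hash the canonical form of a table given as a list of rows of cells."""
    h = hashlib.sha256()  # assumption: any cryptographic hash would do here
    for row in rows:
        for cell in row:
            h.update(canonical_value(cell, digits).encode("utf-8"))
            h.update(b"\x00")  # cell separator
        h.update(b"\n")  # row separator
    # Truncate and base64-encode to get a short, fixed-length string.
    return base64.b64encode(h.digest()[:16]).decode("ascii")


def bridge_url(identifier, base="http://id.thedata.org/"):
    """Percent-encode a global identifier onto a bridge-service base URL."""
    return base + quote(identifier, safe="")


if __name__ == "__main__":
    print(fingerprint([[1, 2.5], ["a", None]]))
    print(bridge_url("hdl:1902.1/AMXGCNKCLU"))
```

The point of the sketch is that two copies of the same table which differ only in how a program happened to store the numbers (integer versus float, trailing noise beyond the chosen precision) hash to the same string, while a real change in a value does not.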

March 29, 2007

CrossRef Forward Linking Webinar

The next CrossRef Forward Linking Webinar takes place on Monday, April 30th, 2007 at 12:00pm.

Registration is now available: http://www.crossref.org/10meetings/2007_fl_webinar_res.html

Agenda is coming soon.

Markup for DOIs

Following up on his earlier post (which was also blogged to CrossTech here), Leigh Dodds is now proposing the possibility of using machine-readable auto-discovery type links for DOIs of the form

<link rel="bookmark" title="DOI" href="http://dx.doi.org/10.1000/1"/>

These LINK tags are placed in the document HEAD section and could be used by crawlers and agents to recognize the work represented by the current document. This sounds like a great idea and we'd like to hear feedback on it.
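As a rough illustration of how a crawler or agent might pick up such a link, here is a small sketch using only the Python standard library. The matching rules (a `rel` containing `bookmark`, a `title` of `DOI`, and an `href` on the dx.doi.org proxy) are my reading of the proposal rather than anything agreed, and the class and function names are made up for the example.

```python
from html.parser import HTMLParser

DOI_PROXY = "http://dx.doi.org/"


class DOILinkFinder(HTMLParser):
    """Collect DOIs advertised through <link rel="bookmark" title="DOI">."""

    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> tags are also routed through this method.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        title = (attributes.get("title") or "").upper()
        href = attributes.get("href") or ""
        if "bookmark" in rel_values and title == "DOI" and href.startswith(DOI_PROXY):
            self.dois.append(href[len(DOI_PROXY):])


def find_dois(html_text):
    """Return every DOI found in auto-discovery LINK tags of a page."""
    finder = DOILinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.dois


if __name__ == "__main__":
    page = (
        "<html><head><title>Example</title>"
        '<link rel="bookmark" title="DOI" href="http://dx.doi.org/10.1000/1"/>'
        "</head><body></body></html>"
    )
    print(find_dois(page))
```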

Concurrently at Nature we have also been considering how best to mark up in a machine-readable way DOIs appearing within a document page BODY. Current thinking is to do something along the following lines:

<a href="http://dx.doi.org/10.1038/nprot.2007.43">
<abbr title="Digital Object Identifier">doi</abbr>:
<abbr class="uri" id="doi" title="info:doi/10.1038/nprot.2007.43">10.1038/nprot.2007.43</abbr>
</a>

which allows the DOI to be presented in the preferred CrossRef citation format (doi:10.1038/nprot.2007.43), to be hyperlinked to the handle proxy server (http://dx.doi.org/10.1038/nprot.2007.43), and to refer to a validly registered URI form for the DOI (info:doi/10.1038/nprot.2007.43). Again, we would be very interested to hear any opinions on this proposal for inline DOI markup as well as on Leigh's proposal for document-level DOI markup.
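For anyone wanting to try this out, the three forms and the markup above can be generated from a bare DOI along these lines. This is only a sketch: the function names are hypothetical, and it assumes a DOI whose characters need no percent-encoding in a URL (DOIs with unusual characters would need more care).

```python
from html import escape


def doi_forms(doi):
    """Return the three renderings of a DOI used in the inline markup."""
    return {
        "citation": "doi:" + doi,              # CrossRef citation format
        "proxy_url": "http://dx.doi.org/" + doi,  # handle proxy link
        "info_uri": "info:doi/" + doi,         # registered URI form
    }


def inline_doi_markup(doi):
    """Build the inline <a>/<abbr> markup for one DOI."""
    forms = doi_forms(doi)
    return (
        '<a href="{proxy}">\n'
        '<abbr title="Digital Object Identifier">doi</abbr>:\n'
        '<abbr class="uri" id="doi" title="{info}">{text}</abbr>\n'
        "</a>"
    ).format(
        proxy=escape(forms["proxy_url"], quote=True),
        info=escape(forms["info_uri"], quote=True),
        text=escape(doi),
    )


if __name__ == "__main__":
    print(inline_doi_markup("10.1038/nprot.2007.43"))
```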

(Oh, and btw many congrats to Leigh on his recent promotion to CTO, Ingenta.)

Publishing 2.0

XML:UK is holding a one-day conference entitled “Publishing 2.0” at Bletchley Park on Wednesday 25th April 2007. Bletchley Park was the location of the United Kingdom's main codebreaking establishment during the Second World War and is now a museum (and has a train station!). The event will examine some of the more cutting-edge applications of XML technology to publishing. With keynotes by Sean McGrath and Kate Warlock and a series of must-see presentations, this will be the place to be on the last Wednesday in April.

March 23, 2007

Welcome to "Otmi-discuss"

Just a quick note to mention that we've now set up a new mailing list otmi-discuss@crossref.org for public discussion of OTMI - the Open Text Mining Interface proposed by Nature. See the list information page here for details on subscribing to the list and on accessing the mail archives.

And many thanks to the CrossRef folks for hosting this for us!

March 22, 2007

XMP Capabilities Extended

This post on Adobe's Creative Solutions PR blog may be worth a gander:

"This new update, the Adobe XMP 4.1, provides new libraries for developers to read, write and update XMP in popular image, document and video file formats including: JPEG, PSD, TIFF, AVI, WAV, MPEG, MP3, MOV, INDD, PS, EPS and PNG. In addition, the rewritten XMP 4.1 libraries have been optimized into two major components, the XMP Core and the XMP Files.

The XMP Core enables the parsing, manipulating and serializing of XMP data, and the XMP Files enables the reading, rewriting, and injecting serialized XMP into the multiple file formats. The XMP Files can be thought of as a "file I/O" component for reading and writing the metadata that is manipulated by the XMP Core component.

Supported development environments for Adobe’s XMP 4.1 are: XCode 2.3 for Macintosh universal binaries, Visual Studio 2005 (VC8) for Windows, and Eclipse 3.x on any available platform. The XMP Core is available as C++ and Java sources with project files for the Macintosh, Windows and Linux platform. A Java version of XMP Files is under consideration for a future update."

And now I've just read that last sentence again: "A Java version of XMP Files is under consideration for a future update." So, how hard do they really want to make the uptake of XMP? I'm surprised that full Java support is still only under consideration, and that nothing at all is on offer for glue languages such as Perl, Python, or Ruby.

Which leads to the question: is anybody here using XMP, and do you have any successes to relate or lessons for the rest of us?
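In the absence of official bindings, one crude workaround from a glue language is to scan a file for the embedded XMP packet and hand it to an ordinary XML parser. The sketch below does just that in Python. It is not the Adobe toolkit and makes several assumptions: that the packet is stored uncompressed, is UTF-8 encoded, and is wrapped in an `x:xmpmeta` element. It only reads metadata; writing it back safely is exactly the part the XMP Files component is meant to handle.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the packet is uncompressed UTF-8 wrapped in <x:xmpmeta>.
XMP_PACKET = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)

DC = "{http://purl.org/dc/elements/1.1/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def extract_xmp(data):
    """Return the first XMP packet found in raw file bytes, or None."""
    match = XMP_PACKET.search(data)
    return match.group(0).decode("utf-8") if match else None


def xmp_title(packet):
    """Return the first dc:title value in an XMP packet, or None."""
    root = ET.fromstring(packet)
    for title in root.iter(DC + "title"):
        for item in title.iter(RDF + "li"):
            return item.text
    return None


if __name__ == "__main__":
    # A made-up packet standing in for one embedded in an image file.
    sample = (
        b"...binary image data..."
        b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
        b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
        b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
        b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">Sample</rdf:li>'
        b"</rdf:Alt></dc:title>"
        b"</rdf:Description></rdf:RDF></x:xmpmeta>"
        b"...more binary data..."
    )
    print(xmp_title(extract_xmp(sample)))
```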

March 21, 2007

SIIA Executive FaceTime Webcast Series

We thought that this program might interest our CrossTech bloggers.

Howard Ratner, Chief Technology Officer, Executive Vice-President at Nature Publishing Group is on the agenda.

More information is available at: http://www.siia.net/content/events_face.asp.

You may register for this event at: http://www.siia.net/events/prereg.asp?eventid=709

SIIA Executive FaceTime Webcast Series

Howard Ratner, EVP/CTO, Nature Publishing Group
Wednesday, March 28, 2007
12:00PM – 1:30PM EST

The SIIA is pleased to announce that Howard Ratner of Nature Publishing Group will be our guest for the upcoming Executive FaceTime. This live webcast series features one-on-one conversations between leading industry executives and host Hal Espo. Participation is encouraged: the web audience is invited to submit questions, which are posed through the host. Past guests include Tad Smith, CEO of Reed Business Information and L. Gordon Crovitz, EVP of Dow Jones & Company. Registration is free to SIIA members and non-members alike; to participate, you must register at http://www.siia.net/events/prereg.asp?eventid=709 by the end of the day on Tuesday, March 27th.

Howard Ratner
Howard Ratner is Chief Technology Officer, Executive Vice-President, for the Nature Publishing Group. Based in New York, Howard is in charge of NY operations and has global responsibilities for Production and Manufacturing, Web Development, Web Services, Content Services, and Information Technology across all NPG products. Howard's prior positions include Director, Electronic Publishing & Production for Springer-Verlag New York, as well as the North American Manager for LINK, and a member of the production staff at John Wiley & Sons. He also serves on the CrossRef board, PubMed Central, CORDS and LOCKSS advisory committees, and is a former chair for both the AAP/PSP DOI subcommittee and the DOI-X project.

Hal Espo
Hal Espo is President of Contextual Connections, LLC, a NYC-based consultancy which focuses exclusively on the digital services arena, including digital content, distribution, and applications. Hal has more than 25 years' experience as an operating executive as well as a business and product development professional in the electronic information industry. He served as Chief Operating Officer of Index Stock Imagery, Inc., a web-based commercial stock photography and illustration vendor, and previously was the Chief Operating Officer at CORSEARCH, Inc., a trademark research firm serving Fortune 500 companies and law firms.


March 20, 2007

Agile Descriptions

Apologies for blogging yet another of my posts to Nascent, this time on Agile Descriptions - a talk I gave the week before last to the LC Future of Bibliographic Control WG. (Don't worry - I shan't be making a habit of this.) But certain aspects of the talk (PowerPoint is here) may be of interest to this readership, in particular the slides on microformats and how these are tentatively being deployed on Nature Network, and also a detailed anatomy of OTMI files.

March 15, 2007

New-Look Web Feeds from Nature

I just posted this entry on Nascent, Nature's Web Publishing blog, about Nature's new look for web feeds. It essentially boils down to our using the RSS 1.0 'mod_content' module to add a rich content description for human consumption, complementing our long-standing commitment to machine-readable descriptions. We are thus able to deliver full citation details in our RSS feeds as XHTML in CDATA sections for humans and as DC/PRISM properties for machines, the whole encoded in our feed format of choice - RSS 1.0. Note also that we declared our intention to publish parallel feeds in Atom, which again will carry both human- and machine-readable citations. Further details on the RSS 1.0/Atom paired feeds will be posted here in the near future.

Perhaps of special note: we have added the DOI to our descriptions in standard CrossRef citation format and linked it to the DX resolver.
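To show how a consumer might read both layers from one item, here is a small sketch using the Python standard library. The feed fragment is made up for the example (the title and journal name are placeholders, and the DOI is borrowed from the markup post above); the particular properties used, `dc:identifier` and `prism:publicationName`, and the PRISM namespace version are assumptions and may not match what the real feeds carry.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/1.2/basic/",  # assumed version
}

# Hypothetical feed fragment, for illustration only.
SAMPLE_FEED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:content="http://purl.org/rss/1.0/modules/content/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">
  <item rdf:about="http://dx.doi.org/10.1038/nprot.2007.43">
    <title>Example article</title>
    <link>http://dx.doi.org/10.1038/nprot.2007.43</link>
    <content:encoded><![CDATA[<p>Example article.
      <a href="http://dx.doi.org/10.1038/nprot.2007.43">doi:10.1038/nprot.2007.43</a></p>]]></content:encoded>
    <dc:identifier>doi:10.1038/nprot.2007.43</dc:identifier>
    <prism:publicationName>Example Journal</prism:publicationName>
  </item>
</rdf:RDF>
""".strip()


def parse_items(feed_xml):
    """Return the human- and machine-readable parts of each RSS 1.0 item."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("rss:item", NS):
        items.append({
            "title": item.findtext("rss:title", namespaces=NS),
            "html": item.findtext("content:encoded", namespaces=NS),  # for humans
            "doi": item.findtext("dc:identifier", namespaces=NS),     # for machines
            "journal": item.findtext("prism:publicationName", namespaces=NS),
        })
    return items


if __name__ == "__main__":
    for entry in parse_items(SAMPLE_FEED):
        print(entry["doi"], "-", entry["journal"])
```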

March 08, 2007

Indexing URLs

Leigh Dodds proposes in this post some solutions to persistent linking using web crawlers and social bookmarking.

"When I use del.icio.us, CiteULike, or Connotea or other social bookmarking service, I end up bookmarking the URL of the site I'm currently using. Its this specific URL that goes into their database and associated with user-assigned tags, etc.
...
A more generally applicable approach to addressing this issue, one that is not specific to academic publishing, would be to include, in each article page, embedded metadata that indicates the preferred bookmark link. The DOI could again be pressed into service as the preferred bookmarking link."

He's inviting feedback. I'd certainly like to hear what others may think of these suggestions.

March 02, 2007

Open Content

In light of my earlier post on OTMI, the mail copied below from Sebastian Hammer at Index Data about open content may be of interest. They are looking to compile a listing of web sources of open content - see this page for further details.

(Via XML4lib and other lists.)

"Hi All,

(apologies for any cross-posting)

At Index Data, we have long felt that there were really interesting
sources of open content out there that was not being utilized as well as
it could be because it was hidden away in websites. We're a software
company specializing in information retrieval applications, so
eventually we asked ourselves, 'what could we all do with this stuff if
it were exposed using our favorite open standards'.

We thought it was worth finding out, so we have set up processes to
regularly retrieve indexes of major open content resources, and make
them available using SRU and Z39.50. We've started with the Open Content
Alliance and Project Gutenberg (two quite different approaches to
producing free eBooks), Wikipedia, the Open Directory Project, and
OAIster. More is on the way.

Connection information and more details are available at
http://www.indexdata.com/opencontent/ .

The kind of metadata you can get from these sources varies. The Open
Content Alliance captures MARC records along with the scanned books,
which makes for excellent metadata. Many of the others produce some
variation of DublinCore. Our service, through either Z39.50 or SRU/W,
exposes both MARC (or MARCXML) and DublinCore in XML for all sources.

We've created a new mailing list to help inform people of changes to the
services, new resources available, etc. Signup at
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/oclist/ .

We sincerely hope you will find these resources exciting and useful.
Feel free to get in touch if you have questions or input.

--Sebastian

--
Sebastian Hammer, Index Data
quinn@indexdata.com www.indexdata.com
Ph: (603) 209-6853 Fax: (866) 383-4485"
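For readers who haven't used SRU, a search is just an HTTP GET with a handful of standard parameters and a CQL query. The sketch below builds such a request URL. The base URL is a placeholder (the actual connection details are on the Index Data page mentioned in the mail), and the schema names `dc` and `marcxml` and the `dc.title` index are assumptions about what a given server accepts.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, schema="dc", max_records=10, start=1):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": max_records,
        "recordSchema": schema,  # assumed names, e.g. "dc" or "marcxml"
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder endpoint; substitute the real one from the provider.
    print(sru_search_url("http://sru.example.org/gutenberg",
                         'dc.title = "moby dick"'))
```

The response is an XML document of records in the requested schema, so the same request with a different `recordSchema` value is all it should take to switch between Dublin Core and MARCXML, server permitting.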

Sir TimBL's Testimony

Just in case anybody may not have seen this, here's the testimony of Sir Tim Berners-Lee yesterday before a House of Representatives Subcommittee on Telecommunications and the Internet. Required reading.

(Via this post yesterday in the Save the Internet blog.)

OTMI - An Update

We've just posted an update about OTMI (the Open Text Mining Interface) on our Web Publishing blog Nascent. This post details the following changes:

The OTMI content repository currently provides two years' worth of full text across five of our titles:

See the wiki for draft technical specs and for a sample script to generate the OTMI files. And feel free to add to the wiki on existing pages or create new pages as required.

We're very much looking forward to any feedback you may have on what we consider to be a very exciting new initiative for scholarly publishers.

eprintweb.org

IOP has created an instance of the arXiv repository called eprintweb.org at http://www.eprintweb.org/. What's the difference from arXiv? From the eprintweb.org site - "We have focused on your experience as a user, and have addressed issues of navigation, searching, personalization and presentation, in order to enhance that experience. We have also introduced reference linking across the entire content, and enhanced searching on all key fields, including institutional address."

The site looks very good and it's interesting to see a publisher developing a service directly engaging with a repository.

Some interesting points to note: There are DOI links to published articles - http://www.eprintweb.org/S/article/astro-ph/0603001 - which IOP gets from CrossRef. References in the pre-prints are also linked - http://www.eprintweb.org/S/article/astro-ph/0603001/refs

CrossRef will soon be making available an author/title-only query for repositories to use to find DOIs for published papers when the preprint doesn't have the full citation. Many authors don't go back to their preprints to update the reference to the published version, but the new CrossRef query will enable the repositories to do this automatically.