February 23, 2007

"Spinning Around"

There's a great exposition of FRBR (the Functional Requirements for Bibliographic Records model "work -> expression -> manifestation -> item") in this post from The FRBR Blog on De Revolutionibus as described in The Book Nobody Read: Chasing the Revolutions of Nicolaus Copernicus by Owen Gingerich. See post for the background and here (103 KB PNG) for a map of the FRBR relationships.

(Yes, and a twinkly star in the title too. ;~)

February 20, 2007

Kay Sera Sera

Not specifically publishing-related, but here is a fun rant interview with Alan Kay titled The PC Must Be Revamped—Now.

My favorite bit...

"...in the last few years I've been asking computer scientists and programmers whether they've ever typed E-N-G-E-L-B-A-R-T into Google-and none of them have. I don't think you could find a physicist who has not gone back and tried to find out what Newton actually did. It's unimaginable. Yet the computing profession acts as if there isn't anything to learn from the past, so most people haven't gone back and referenced what Engelbart thought."

February 19, 2007

At Last! URIs for InChI

The info registry has now added the InChI namespace (see registry entry here), which means that chemical compounds identified by InChIs (IUPAC's International Chemical Identifiers) are expressible in URI form and thus amenable to the many Web-based description technologies that use URIs to identify objects, e.g. XLink, RDF, etc. As an example, the InChI identifier for naphthalene is

InChI=1/C10H8/c1-2-6-10-8-4-3-7-9(10)5-1/h1-8H

and can now be legitimately expressed in URI form as

info:inchi/InChI=1/C10H8/c1-2-6-10-8-4-3-7-9(10)5-1/h1-8H

The info URI scheme exists to help legacy namespaces get a leg up onto the Web. Registered namespaces include PubMed identifiers, DOIs, handles, ADS bibcodes, etc. Increasingly we'll be expecting to see identifiers (both new and old) represented in a common form - URI.
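
As a rough illustration of the mapping shown above, here is a minimal Python sketch. The function name `inchi_to_info_uri` is hypothetical, and the sketch assumes the identifier can be appended to the namespace as-is, which holds for the naphthalene example. It makes no attempt at percent-encoding, which other identifiers may need.

```python
def inchi_to_info_uri(inchi: str) -> str:
    """Prefix an InChI string with the info:inchi/ namespace.

    Assumption: the identifier needs no escaping, as in the
    naphthalene example; percent-encoding is not handled here.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("expected an identifier beginning with 'InChI='")
    return "info:inchi/" + inchi


naphthalene = "InChI=1/C10H8/c1-2-6-10-8-4-3-7-9(10)5-1/h1-8H"
print(inchi_to_info_uri(naphthalene))
```

For the naphthalene identifier this reproduces the URI form given above.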

Stick this in your pipe...

Rob Cornelius has a practical little demo of using Yahoo! pipes against some Ingenta feeds.

Like Tony, I keep experiencing speed/stability problems while accessing pipes so I haven't yet become a crack-pipes-head.

"We're sorry..."

Update: All apologies to Google. Apparently this was a problem at our end which our IT folks are currently investigating. (And I thought it was just me. :)

Just managed to get this page:

"Google Error

We're sorry...

... but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can't process your request right now.

We'll restore your access as quickly as possible, so try again soon. In the meantime, if you suspect that your computer or network has been infected, you might want to run a virus checker or spyware remover to make sure that your systems are free of viruses and other spurious software.

We apologize for the inconvenience, and hope we'll see you again on Google.
To continue searching, please type the characters you see below:"

And my search request?

ark

(Actual query is here as argument to the continue parameter.)

Was hoping to find results related to the ARK Persistent Identifier Scheme. Maybe I missed something but I'm not impressed.

February 17, 2007

OpenURL Podcast

Jon Udell interviews Dan Chudnov about OpenURL, see his blog entry: "A conversation with Dan Chudnov about OpenURL, context-sensitive linking, and digital archiving". The podcast of the interview is available here.

Interesting to see these kinds of subjects beginning to be covered by a respected technology writer like Jon. As he says in his post:

"I have ventured into this confusing landscape because I think that the issues that libraries and academic publishers are wrestling with — persistent long-term storage, permanent URLs, reliable citation indexing and analysis — are ones that will matter to many businesses and individuals. As we project our corporate, professional, and personal identities onto the web, we’ll start to see that the long-term stability of those projections is valuable and worth paying for."

February 15, 2007

OpenDocument 1.1 is OASIS Standard

From the OASIS Press Release:

"Boston, MA, USA; 13 February 2007 -- OASIS, the international standards consortium, today announced that its members have approved version 1.1 of the Open Document Format for Office Applications (OpenDocument) as an OASIS Standard, a status that signifies the highest level of ratification."

February 14, 2007

CrossRef Author ID meeting

February 5, 2007, Washington DC

CrossRef invited a number of people to attend an information gathering session on the topic of Author IDs. The purpose of the meeting was to determine:

  • whether there is an industry need for a central or federated contributor id registry;
  • whether CrossRef should have a role in creating such a registry;
  • how to proceed in a way that builds upon existing systems and standards.

In attendance:

Jeff Baer, CSA
Judith Barnsby, IOPP
Geoff Bilder, CrossRef
Amy Brand, CrossRef
David Brown, British Library
Richard Cave, PLoS (remote)
Bill Carden, ScholarOne
Gregg Gordon, SSRN
Gerry Grenier, IEEE
Michael Healy, BISG (remote)
Helen Henderson, Ringgold
Thomas Hickey, OCLC (remote)
Terry Hulburt, IOPP
Tim Ingoldsby, AIP
Ruth Jones, British Library
Marl Land, Parity
Dave Martinson, ACS
Georgios Papadopoulos, Atypon (with two colleagues)
Jim Pringle, Thomson
Chris Rosin, Parity
Tim Ryan, Wiley
Philippa Scoones, Blackwell
Chris Shillum, Elsevier
Neil Smalheiser, UIC (remote)
Barbara Tillett, LoC
Vetle Torvik, UIC (remote)
Charles Trowbridge, ACS
Amanda Ward, Nature (remote)
Stu Weibel, OCLC (remote)
David Williamson, LoC

Notes

Amy Brand opened the meeting and welcomed attendees. She said the goal of the meeting was really nothing more than to launch a discussion on the topic of author identifiers and to hear participants' views and experiences on unique identifiers for individuals -- be they authors, contributors, or otherwise. We went around the table and everyone introduced themselves. Amy then introduced Geoff Bilder as moderator of the meeting.

Geoffrey Bilder said that CrossRef's members had indicated that they would like CrossRef to explore whether it could play a role in creating an author identification system. The members feel that an "author DOI" scheme would help them with production and editorial issues. They also recognize that such a scheme could fuel numerous downstream applications. Geoff apologized for sounding like Rumsfeld and said, we know that there is a lot that we don't know, but we don't know exactly what we don't know. We have just started this project and we wanted to get some feedback from various groups concerned with scholarly publishing in order to understand what people would like to see in regards to author identification schemes and what initiatives/efforts we need to be aware of. He commented that the currently assembled group failed to include the open web community, and their input would be important too as this project develops.

The meeting then turned to short project summaries from others.

Project Summaries

Jim Pringle gave a short PPT presentation (attached) and reported that Thomson first started creating its own author ids in 2000, in relation to the launch of its Highly Cited service. The focus for Thomson in this area has been on author disambiguation. Jim said that the focus for CrossRef in this area would be a system that could respond to the question "who are you and what have you written"; he also raised concern about matters of author privacy.

Michael Healy then discussed the International Standard Party Identifier (ISPI). ISO TC 46/SC 9 is developing ISPI as a new international identification system for the parties (persons and corporate bodies) involved in the creation and production of content entities. Work on the ISPI project began in August 2006 when the New Work Item proposal was approved by the member bodies of ISO TC 46/SC 9. The first meeting of the ISPI project group was held at CISAC's offices in Paris on September 12, 2006.

This project has strong representation from the library sector, with RROs, booksellers, and the music and film/TV industries represented as well. Mr. René Lloret Linares from CISAC (International Confederation of Societies of Authors and Composers) chairs the group; until now CISAC has been using a proprietary id scheme and would like to move to an open standard to identify all contributors and creators. Michael was asked whether membership in the project group was open, and he replied that anyone can attend meetings as an observer but that voting is restricted to those nominated by their own national standards organization.

Chris Shillum then asked the group to think about developed use cases for the publishing industry, and how they differ from potential ISPI applications.

Helen Henderson reported on the Journals Supply Chain project, a pilot that aims to discover whether the creation of a standard, commonly used identifier for institutions (customer ids) will be beneficial to parties involved in the journal supply chain. The pilot models interactions between each party -- library, publisher, agent. 35 publishers are participating thus far. Helen said there is a clear need for sub-institutional level ids and pointed out the value of associating author and institutional ids. On the topic of institutions, Tim Ingoldsby pointed out that both academic and corporate institutions are important.

Chris Rosin said Parity is working on author merger and disambiguation as core use cases of author ids for its publisher clients. In particular, they have developed automated merging of instances into profiles, proceeding with a conservative bias on what constitutes a match/merge. Parity is also looking at applying author CVs onto profiles. This will require contributors to participate, and they will need to make it as easy as possible for contributors. Chris said that authentication, trust, and privacy are key considerations; even collecting public information in one place raises privacy issues. [slides]


Judith Barnsby pointed out that the UK has stronger data protection rules than the US, re privacy.

Discussion among the group at this point in the meeting resulted in identifying two different areas in author id assignment -- (1) ongoing assignment, (2) retroactive assignment. Geoff said this distinction was useful for CrossRef, who could more easily address ongoing assignment via publishers working directly with authors.

Neil Smalheiser, a neuroscientist at UIC, reported on the Arrowsmith Project, a statistical model based on multiple features of the Medline database. The goal of the model is to predict the probability that any two papers are written by the same person. The project's "Authority" tool weighs criteria such as researcher affiliation, co-author names, journal title, and medical subject headings to identify the papers most likely written by a target author. For details: arrowsmith.psych.uic.edu/arrowsmith_uic/index.html

David Williamson of LoC said he was working on name authority files, using ONIX metadata. Barbara Tillett of LoC spoke about authority files and related efforts in the library world, which uses the control number, one type of unique id. She reported that IFLA (International Federation of Library Associations) has a group working on how to share authority numbers, which has actually been in discussion since the 1970s; there is to be an IFLA-IPA meeting in April 2007. The library community is eager to share what it knows and what it has developed thus far. Barbara suggested that use of the Dublin Core format here may be the best way to go. Different communities will no doubt need different ids. What is needed in the library community is an international, multi-lingual solution, based on Unicode, connecting regional authority files. Publishers will want to take advantage of library authority files for retrospective identifications.

Thomas Hickey of OCLC mentioned the WorldCat Identity service, which summarizes information for 20 million authors searchable in WorldCat.

Gerry Grenier reported that IEEE was about to implement its own author disambiguation and id system, and he offered that this metadata could be fed into a CrossRef system.

Different participants had different views on whether the goal here should be a "light and non-centralized" (or federated) approach, a centralized registry with one place to link authors across all publishers, or a hybrid -- a centralized source to hand out unique ids, with publisher data distributed. There could also be a network of registration agencies working in a federated system.

Different participants also had different views on CrossRef's role. Several publishers at the meeting supported CrossRef's role, especially in the STM space, whereas some parties raised concern about whether CrossRef was an appropriate choice for a system that will need to be "available everywhere to everybody", and others reiterated the importance of giving the academic community a voice in the development of such a service.

Discussion then turned to use cases -- the question being, what problems would having an author id help you solve in your organization?

USE CASES ARTICULATED AT MEETING:


  • for RROs, known use case is to facilitate distribution of monies owed to authors;

  • for booksellers, disambiguation in search;

  • to understand the provenance of documents;

  • search -- to find works for a particular person;

  • self presentation -- how can I effectively present myself and my work to the world?;

  • cross-walks -- associating various life sciences ids, such as PubChem;

  • identity of society members;

  • identity of research funding institutions;

  • disambiguation and attribution;

  • linking authors and institutions;

  • for enhancing peer review system -- need unique ids to share information with various departments;

  • to better know the value of our authors -- for activities such as peer review, tracking stats on authors, article downloads, and individualized or personalized services;

  • with a central registry, author only has one place they have to update their information;

  • authors will want the information to be portable when they move from one institution to another -- "where is Jeff Smith now?" is one such question;

  • to associate connected authors with one another;

  • to aggregate info on where (what institution) research is being done on a particular topic;

  • privacy can be enhanced with author DOIs;

  • sharing info from library to library;

  • cluster all the works of a particular person for search purposes;

  • stats about authors -- "how many times has this author tried and been rejected from Nature?" for instance.


NEXT STEPS: Please watch the CrossTech blog for ongoing discussion.

February 8, 2007

Remixing RSS

Niall Kennedy has a post about the newly released Yahoo! Pipes. As he says:

"Yahoo! Pipes lets any Yahoo! registered user enter a set of data inputs and filter their results. You might splice a feed of your latest bookmarks on del.icio.us with the latest posts from your blog and your latest photographs posted to Flickr."

He also warns about possible implications for web publishers:

"Yahoo! Pipes makes it easy to remove advertising from feeds or otherwise reformat your content."

Note: As yet, I have not been able to access the site. Interested to learn if anybody else has and what their experiences have been.

RSS Validator in the Spotlight

Sam Ruby responds to Brian Kelly's post about the RSS Validator and its treatment of RSS 1.0, or rather, RSS 1.0 modules. As Ruby notes:

"There is no question that RSS 1.0 is widely deployed. RSS 1.0 has a minimal core. The validation for that core is pretty solid."

Not sure if I'd seen that RSS comparison table before, but it is reassuring. (Oh, and see the really simple case off to the right. ;)

Good point, anyway, about contributing test cases. I guess we should really submit a PRISM test case. And yes, the Validator is somewhat buggy as some recent testing confirms. On which more later.

Microsoft to Support OpenID

Kim Cameron, Microsoft's Identity Czar and member of the Identity Gang, comments on Microsoft's announcement that they will support OpenID. Another sign that federated identity schemes are gaining traction and that OpenID is likely to emerge as a standard that publishers are going to want to grapple with soon.

This follows Doc Searls's comments on the notion of "Creator Relationship Management" where he speculates that the techniques being used in federated identity schemes and the Creative Commons can be combined to create a new "silo-free" value chain amongst creators, producers and distributors.

February 5, 2007

SearchULike

Nelson Minar has a short post on Google's Search History 'feature' and how it can be used to enhance your search experience. I guess that should be SearchULike.

What's My Link?

Simon Willison has a great piece here about disambiguating URLs. Best practice on creating and publishing URLs is obviously something of interest to any publisher. See this excerpt from Simon's post:

"Here's a random example, plucked from today's del.icio.us popular. convinceme.net is a new online debating site (tag clouds, gradient fills, rounded corners). It's listed in del.icio.us a total of four times!

* http://www.convinceme.net/ has 36 saves
* http://www.convinceme.net/index.php has 148 saves
* http://convinceme.net/ has 211 saves
* http://convinceme.net/index.php has 38 saves

Combined that's 433 saves; much more impressive, and more likely to end up at the top of a social sharing sites."
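
To make the arithmetic in Simon's example concrete, here is a small Python sketch that folds the four variants into one key and sums their saves. The helper name `canonical_key` and its two rules (drop a leading `www.`, treat `index.php` as the directory index) are assumptions chosen to fit this one example; they are not safe normalization rules for URLs in general.

```python
from collections import Counter
from urllib.parse import urlsplit


def canonical_key(url: str) -> str:
    """Collapse the URL variants from the example onto one key.

    Assumptions (valid for this example only): 'www.' is an alias
    for the bare host, and '/index.php' serves the same page as '/'.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path.endswith("/index.php"):
        path = path[: -len("index.php")]
    return host + path


# Save counts as quoted in the excerpt above.
saves = {
    "http://www.convinceme.net/": 36,
    "http://www.convinceme.net/index.php": 148,
    "http://convinceme.net/": 211,
    "http://convinceme.net/index.php": 38,
}

totals = Counter()
for url, count in saves.items():
    totals[canonical_key(url)] += count

print(dict(totals))  # → {'convinceme.net/': 433}
```

A publisher would normally avoid the problem at source, for instance by redirecting the variants to a single preferred URL, rather than cleaning up after the fact like this.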

February 2, 2007

comments and trackbacks

Due to spam, comments and trackbacks on the blog have been turned off since last week. Comments can be moderated so they have now been turned back on. Glad to see postings picking up.

Hooray!

Somebody is both reading and recommending this blog - see Lorcan's post here. Just my opinion, but it would be really good to see more librarians following this in order to arrive at better consensus.

February 1, 2007

RSC launches semantic enrichment of journal articles

The RSC has gone live today with the results of Project Prospect, introducing semantic enrichment of journal articles across all our titles. I'm pretty sure we're the first primary research publisher to do anything of this scope.

We’re identifying chemical compounds and providing synonyms, InChIs (IUPAC's Chemical Identifier), downloadable CML (Chemical Markup Language), SMILES strings and 2D images for these compounds. In terms of subject area we're marking up terms from the IUPAC Gold Book, and also Open Biomedical Ontology terms from the Gene, Cell, and Sequence Ontologies. All this stuff is currently available from an enhanced HTML view, with the additional information and links to related articles accessed via highlights in the article and popups.

The mark-up tools have been developed together with UK academics based at the Unilever Centre of Molecular Informatics and the Computing Laboratory at Cambridge University.

At launch we have about 100 articles from our 2007 publications, with the enhanced views currently free-to-air. Feel free to take a look.