February 5, 2007, Washington DC
CrossRef invited a number of people to attend an information-gathering session on the topic of Author IDs. The purpose of the meeting was to determine:
- whether there is an industry need for a central or federated contributor id registry;
- whether CrossRef should have a role in creating such a registry;
- how to proceed in a way that builds upon existing systems and standards.
ATTENDEES:
Jeff Baer, CSA
Judith Barnsby, IOPP
Geoff Bilder, CrossRef
Amy Brand, CrossRef
David Brown, British Library
Richard Cave, PLoS (remote)
Bill Carden, ScholarOne
Gregg Gordon, SSRN
Gerry Grenier, IEEE
Michael Healy, BISG (remote)
Helen Henderson, Ringgold
Thomas Hickey, OCLC (remote)
Terry Hulburt, IOPP
Tim Ingoldsby, AIP
Ruth Jones, British Library
Marl Land, Parity
Dave Martinson, ACS
Georgios Papadopoulos, Atypon (with two colleagues)
Jim Pringle, Thomson
Chris Rosin, Parity
Tim Ryan, Wiley
Philippa Scoones, Blackwell
Chris Shillum, Elsevier
Neil Smalheiser, UIC (remote)
Barbara Tillett, LoC
Vetle Torvik, UIC (remote)
Charles Trowbridge, ACS
Amanda Ward, Nature (remote)
Stu Weibel, OCLC (remote)
David Williamson, LoC
Amy Brand opened the meeting and welcomed attendees. She said the goal of the meeting was simply to launch a discussion on the topic of author identifiers and to hear participants’ views and experiences regarding unique identifiers for individuals, be they authors, contributors, or otherwise. We went around the table and everyone introduced themselves. Amy then introduced Geoff Bilder as moderator of the meeting.
Geoffrey Bilder said that CrossRef’s members had indicated that they would like CrossRef to explore whether it could play a role in creating an author identification system. The members feel that an “author DOI” scheme would help them with production and editorial issues. They also recognize that such a scheme could fuel numerous downstream applications. Geoff apologized for sounding like Rumsfeld and said that we know there is a lot we don’t know, but we don’t know exactly what we don’t know. We have just started this project and wanted feedback from various groups concerned with scholarly publishing, in order to understand what people would like to see with regard to author identification schemes and which initiatives and efforts we need to be aware of. He commented that the assembled group did not include the open web community, whose input would also be important as this project develops.
The meeting then turned to short project summaries from others.
Jim Pringle gave a short PPT presentation (attached) and reported that Thomson first started creating its own author ids in 2000, in connection with the launch of its Highly Cited service. The focus for Thomson in this area has been on author disambiguation. Jim said that the focus for CrossRef in this area would be a system that could respond to the question “who are you and what have you written”; he also raised concerns about author privacy.
Michael Healy then discussed the International Standard Party Identifier (ISPI). ISO TC 46/SC 9 is developing ISPI as a new international identification system for the parties (persons and corporate bodies) involved in the creation and production of content entities. Work on the ISPI project began in August 2006 when the New Work Item proposal was approved by the member bodies of ISO TC 46/SC 9. The first meeting of the ISPI project group was held at CISAC’s offices in Paris on September 12, 2006.
This project has strong representation from the library sector, with RROs (reproduction rights organizations), booksellers, and the music and film/TV industries represented as well. Mr. René Lloret Linares from CISAC (International Confederation of Societies of Authors and Composers) chairs the group; until now CISAC has been using a proprietary id scheme and would like to move to an open standard to identify all contributors and creators. Michael was asked whether membership in the project group was open, and he replied that anyone can attend meetings as an observer but that voting is restricted to those nominated by their own national standards organization.
Chris Shillum then asked the group to think about developed use cases for the publishing industry, and how they differ from potential ISPI applications.
Helen Henderson reported on the Journals Supply Chain project, a pilot that aims to discover whether the creation of a standard, commonly used identifier for institutions (customer ids) will be beneficial to parties involved in the journal supply chain. The pilot models interactions between each party: library, publisher, and agent. Thirty-five publishers are participating thus far. Helen also said there is a clear need for sub-institutional level ids and pointed out the value of associating author and institutional ids. On the topic of institutions, Tim Ingoldsby pointed out that both academic and corporate institutions are important.
Chris Rosin said Parity is working on author merger and disambiguation as core use cases of author ids for its publisher clients. In particular, they have developed automated merging of instances into profiles, proceeding with a conservative bias on what constitutes a match/merge. Parity is also looking at mapping author CVs onto profiles. This will require contributors to participate, so the process will need to be as easy as possible for them. Chris said that authentication, trust, and privacy are key considerations; even collecting public information in one place raises privacy issues. [slides]
Judith Barnsby pointed out that the UK has stronger data protection rules than the US with respect to privacy.
Discussion among the group at this point in the meeting identified two different areas in author id assignment: (1) ongoing assignment and (2) retroactive assignment. Geoff said this distinction was useful for CrossRef, which could more easily address ongoing assignment via publishers working directly with authors.
Neil Smalheiser, a neuroscientist at UIC, reported on the Arrowsmith Project, a statistical model based on multiple features of the Medline database. The goal of the model is to predict the probability that any two papers are written by the same person. The project’s “Authority” tool weighs criteria such as researcher affiliation, co-author names, journal title, and medical subject headings to identify the papers most likely written by a target author. For details: arrowsmith.psych.uic.edu/arrowsmith_uic/index.html
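As a rough illustration of the kind of pairwise model described above, the following Python sketch combines the four criteria mentioned (affiliation, co-author names, journal title, and medical subject headings) into a single probability that two papers share an author. It is not the Arrowsmith “Authority” model: the `Paper` record, the feature definitions, the weights, and the bias are all hypothetical values chosen for illustration, and a real model would estimate them from data.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Minimal record for one paper; the field names are illustrative."""
    author_name: str
    affiliation: str
    journal: str
    coauthors: set = field(default_factory=set)
    mesh_headings: set = field(default_factory=set)


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical weights, for illustration only (not taken from any real model).
WEIGHTS = {"affiliation": 1.5, "coauthors": 3.0, "journal": 0.8, "mesh": 2.0}
BIAS = -2.5  # negative bias: with no shared evidence, lean toward "different people"


def same_author_probability(p, q):
    """Combine pairwise similarity features with a logistic function."""
    features = {
        "affiliation": 1.0 if p.affiliation.lower() == q.affiliation.lower() else 0.0,
        "coauthors": jaccard(p.coauthors, q.coauthors),
        "journal": 1.0 if p.journal.lower() == q.journal.lower() else 0.0,
        "mesh": jaccard(p.mesh_headings, q.mesh_headings),
    }
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


# Three made-up papers, all credited to a "J Smith".
paper_a = Paper("J Smith", "UIC", "J Neurosci",
                {"Lee K", "Patel R"}, {"Neurons", "Synapses"})
paper_b = Paper("J Smith", "UIC", "J Neurosci",
                {"Lee K"}, {"Neurons", "Synapses", "Memory"})
paper_c = Paper("J Smith", "Other University", "J Org Chem",
                {"Garcia M"}, {"Catalysis"})

p_ab = same_author_probability(paper_a, paper_b)  # much shared evidence
p_ac = same_author_probability(paper_a, paper_c)  # only the name matches
```

With these made-up inputs, `p_ab` comes out well above `p_ac`, because the first pair shares an affiliation, a journal, a co-author, and subject headings, while the second pair shares only the author name.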
David Williamson of LoC said he was working on name authority files, using ONIX metadata. Barbara Tillett of LoC spoke about authority files and related efforts in the library world, which uses the control number, one type of unique id. She reported that IFLA (International Federation of Library Associations) has a group working on how to share authority numbers, a topic that has been under discussion since the 1970s; there is to be an IFLA-IPA meeting in April 2007. The library community is eager to share what it knows and what it has developed thus far. Barbara suggested that use of the Dublin Core format may be the best way to go. Different communities will no doubt need different ids. What is needed in the library community is an international, multilingual solution, based on Unicode, connecting regional authority files. Publishers will want to take advantage of library authority files for retrospective identifications.
Thomas Hickey of OCLC mentioned the WorldCat Identity service, which summarizes information for 20 million authors searchable in WorldCat.
Gerry Grenier reported that IEEE was about to implement its own author disambiguation and id system, and he offered that this metadata could be fed into a CrossRef system.
Participants had different views on whether the goal should be a “light and non-centralized” (or federated) approach, a centralized registry with one place to link authors across all publishers, or a hybrid in which a centralized source hands out unique ids while publisher data remains distributed. There could also be a network of registration agencies working in a federated system.
Participants also had different views on CrossRef’s role. Several publishers at the meeting supported a role for CrossRef, especially in the STM space. Some parties raised concern about whether CrossRef was an appropriate choice for a system that will need to be “available everywhere to everybody”, and others reiterated the importance of giving the academic community a voice in the development of such a service.
Discussion then turned to use cases, the question being: what problems would having an author id help you solve in your organization?
USE CASES ARTICULATED AT MEETING:
- for RROs, a known use case is to facilitate distribution of monies owed to authors;
- for booksellers, disambiguation in search;
- to understand the provenance of documents;
- search, to find works by a particular person;
- self-presentation (how can I effectively present myself and my work to the world?);
- cross-walks — associating various life sciences ids, such as PubChem;
- identity of society members;
- identity of research funding institutions;
- disambiguation and attribution;
- linking authors and institutions;
- for enhancing the peer review system, where unique ids are needed to share information with various departments;
- to better know the value of our authors — for activities such as peer review, tracking stats on authors, article downloads, and individualized or personalized services;
- with a central registry, an author has only one place to update their information;
- authors will want the information to be portable when they move from one institution to another (“where is Jeff Smith now?” is one such question);
- to associate connected authors with one another;
- to aggregate info on where (what institution) research is being done on a particular topic;
- privacy can be enhanced with author DOIs;
- sharing info from library to library;
- cluster all the works of a particular person for search purposes;
- stats about authors — “how many times has this author tried and been rejected from Nature?” for instance.
NEXT STEPS: Please watch the CrossTech blog for ongoing discussion.