An interview about "Author IDs"
Over the past few months there seems to have been a sharp upturn in general interest around implementing an "author identifier" system for the scholarly community. This, in turn, has meant that more people have been getting in touch with us about our nascent "Contributor ID" project. The other day, after seeing my comments in the above thread, Martin Fenner asked if he could interview me about the issue of author identifiers for his blog on Nature Network, Gobbledygook. I agreed, and he has since posted the interview.
I warn you ahead of time, I did ramble on a bit and the interview is long. There is a lot of stuff at the beginning about the DOI and it might seem off-topic, but I do think that there is a lot that we can learn from our DOI experiences which would apply to any author identifier. Just be thankful I didn't start talking about the privacy issues that will inevitably arise from any author identifier system. If I had, the interview would have probably gone on for another six pages ;-).
Anyway, as most of our membership knows, we have a pilot project underway to explore what it would take to launch a "CrossRef Contributor ID" system. We still haven't concluded whether it makes sense for us to do it, but one thing is clear from our recent discussions: if we don't do it, somebody else almost certainly will.

Comments
I agree with the OpenID angle some were suggesting in that thread you link to. A centralized system is doomed to fail.
Posted by: bdarcus | February 19, 2009 12:29 AM
Geoffrey, thanks again for the interview. It wasn't too long, the topic is just complicated. And I'm looking forward to hearing what you think about privacy.
Posted by: Martin Fenner | February 19, 2009 04:29 PM
Hi Geoffrey, enjoyed the interview - the rambling was good! Was wondering, do you have any specific references for "State-of-the-art mechanisms for automatic disambiguation of authors from a defined corpus can be 96-97% accurate"?
Posted by: Duncan Hull | February 20, 2009 02:54 AM
Martin,
A teaser on some privacy issues...
Most people take it as a given that it would be wonderful to be able to instantly and unambiguously determine precisely who wrote what, who is an expert in what topic, etc. That is, until you point out that this facility would be available to those with a less-than-salubrious agenda as well. Imagine how delighted some would be to be able to instantly find out things like:
- All the researchers in Texas who do work involving stem cells.
- All the researchers in Oxford who seem to do work involving animal testing.
Then suddenly things can get scary.
My general feeling is that, in conversations about author identifiers, the words "open" and "privacy" are thrown around without anybody spending too much time reflecting on how the two desiderata might sometimes conflict with each other.
Note carefully that, in pointing this out, I am *not* saying that I think the solution is a "closed" or "proprietary" system.
--------
Posted by: gbilder | February 21, 2009 02:24 AM
Duncan,
Try
On, B., Lee, D., Kang, J., and Mitra, P. 2005. Comparative study of name disambiguation problem using a scalable blocking-based framework. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (Denver, CO, USA, June 07 - 11, 2005). JCDL '05. ACM Press, New York, NY, 344-353. http://dx.doi.org/10.1145/1065385.1065463
Note that the range has been confirmed (anecdotally, I'm afraid) in my interviews with publishers and other players in the industry who have been doing work on this. In fact, one recent interviewee claims 99% accuracy, so I should revise what I say about this a bit, though I will note that even at 99% accuracy the number of errors one would have to deal with (on an industry-wide scale) is formidable.
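To put rough numbers on that last point, here is a minimal back-of-the-envelope sketch. The corpus size of 10 million author-to-paper attributions is purely a hypothetical figure chosen for illustration, not a number from the interviews or from the paper cited above; only the accuracy rates (96%, 97%, 99%) come from this discussion. The function name `expected_errors` is likewise just an illustrative helper.

```python
def expected_errors(attributions: int, accuracy: float) -> int:
    """Expected number of wrong author attributions at a given accuracy rate."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # Errors are simply the share of attributions the algorithm gets wrong.
    return round(attributions * (1.0 - accuracy))


# ASSUMPTION: hypothetical corpus size, for illustration only.
HYPOTHETICAL_ATTRIBUTIONS = 10_000_000

for rate in (0.96, 0.97, 0.99):
    errors = expected_errors(HYPOTHETICAL_ATTRIBUTIONS, rate)
    print(f"{rate:.0%} accurate: about {errors:,} attributions to correct")
```

Under that assumed corpus size, even 99% accuracy would leave 100,000 attributions for somebody to find and correct, and the count grows in proportion to the size of the corpus.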
One thing I should also note is that author disambiguation seems to fall into the "more data beats better algorithms" category of problem. That is, somebody with bibliographic metadata, citation data, abstracts, fulltext and manuscript tracking registration information is likely to do a lot better than somebody who is just working with bibliographic metadata and abstracts.
Posted by: gbilder | February 21, 2009 01:43 PM