2016 upcoming events – we’re out and about!

Check out the events below that Crossref will attend or present at in 2016. We have been busy over the past few months, and we have more planned for the rest of the year. If we will be at a place near you, please come see us (and support these organizations and events)!

Upcoming Events
SHARE Community Meeting – July 11-14 – Charlottesville VA, USA
Crossref Outreach Day – July 19-21 – Seoul, South Korea
CASE 2016 Conference – July 20-22 – Seoul, South Korea
ACSE Annual Meeting 2016 – August 10-11 – Dubai, UAE
Vivo 2016 Conference – August 17-19 – Denver CO, USA
SciDataCon – September 11-17 – Denver CO, USA
ALPSP – September 14-16 – London, UK
OASPA – September 21-22 – Arlington VA, USA
3:AM Conference – September 26-28 – Bucharest, Romania
ORCID Outreach Conference – October 5-6 – Washington DC, USA
Frankfurt Book Fair – October 19-23 – Frankfurt, Germany (Hall 4.2, Stand #4.2 M 85)
**Crossref Annual Community Meeting #Crossref16 – November 1-2 – London, UK**
PIDapalooza – November 9-10 – Reykjavik, Iceland
OpenCon 2016 – November 12-14 – Washington DC, USA
STM Digital Publishing Conference – December 6-8 – London, UK

The Crossref outreach team will host a number of outreach events around the globe. Updates about events are shared through social media so please connect with us via @CrossrefOrg.
 

DOI-like strings and fake DOIs

TL;DR

Crossref discourages our members from using DOI-like strings or fake DOIs.

Details

Recently we have seen quite a bit of debate around the use of so-called “fake-DOIs.” We have also been quoted as saying that we discourage the use of “fake DOIs” or “DOI-like strings”. This post outlines some of the cases in which we’ve seen fake DOIs used and why we recommend against doing so.

Using DOI-like strings as internal identifiers

Some of our members use DOI-like strings as internal identifiers for their manuscript tracking systems. These only get registered as real DOIs with Crossref once an article is published. This seems relatively harmless, except that, frequently, the unregistered DOI-like strings for unpublished content (e.g. manuscripts that are under review or were rejected) ‘escape’ into the public as well. People attempting to use these DOI-like strings get understandably confused and angry when they don’t resolve or otherwise work as DOIs. After years of experiencing the frustration that these DOI-like things cause, we have taken to recommending that our members not use DOI-like strings as their internal identifiers.

Using DOI-like strings in access control compliance applications

We’ve also had members use DOI-like strings as the basis for systems that they use to detect and block tools designed to bypass the member’s access control system and bulk-download content. The methods employed by our members have fallen into two broad categories:

  • Spider (or robot) traps.
  • Proxy bait.

Spider traps

A “spider trap” is essentially a tripwire that allows a site owner to detect when a spider/robot is crawling their site to download content. The technique involves embedding a special trigger URL in a public page on a web site. The URL is embedded such that a normal user should not be able to see it or follow it, but an automated bot (aka “spider”) will detect it and follow it. The theory is that when one of these trap URLs is followed, the website owner can then conclude that the IP address from which it was followed harbours a bot and take action. Usually the action is to inform the organisation from which the bot is connecting and to ask them to block it. But sometimes triggering a spider trap has resulted in the IP address associated with it being instantly cut off. This, in turn, can affect an entire university’s access to said member’s content.

When a spider/bot trap includes a DOI-like string, we have seen some particularly pernicious problems, because the trap can trip up legitimate tools and activities as well. For example, a bibliographic management browser plugin might automatically extract DOIs and retrieve metadata on pages visited by a researcher. If the plugin were to pick up one of these spider-trap DOI-like strings, it might inadvertently trigger the researcher being blocked or, worse, the researcher’s entire university being blocked. In the past, this has even been a problem for Crossref itself. We periodically run tools to test DOI resolution and to ensure that our members are properly displaying DOIs, CrossMarks, and metadata as per their member obligations. We’ve occasionally been blocked when we ran across the spider traps as well.
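
To make this failure mode concrete, here is a minimal Python sketch of how a naive metadata tool might harvest DOI-like strings from a page. The page markup, the `extract_doi_like` helper, the pattern and the suffix `hypothetical-trap-0001` are all invented for illustration; real plugins and real traps will differ.

```python
import re

# Loose pattern for anything shaped like a DOI: "10.", a numeric prefix,
# "/", then a suffix. An illustrative assumption, not an official grammar.
DOI_LIKE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

# Invented page: one visible link plus one link hidden from human readers.
PAGE = """
<p>See <a href="http://doi.org/10.5555/12345678">the article</a>.</p>
<a href="http://doi.org/10.5555/hypothetical-trap-0001" style="display:none">x</a>
"""


def extract_doi_like(html):
    """Return every DOI-like string in the markup, in order, without duplicates."""
    found = []
    for match in DOI_LIKE.findall(html):
        if match not in found:
            found.append(match)
    return found


print(extract_doi_like(PAGE))
```

Because the tool reads raw markup, it picks up the hidden link along with the visible one. If it then requested metadata for everything it found, it would follow the trap on the researcher’s behalf.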

Proxy bait

Using proxy bait is similar to using a spider trap, but it has an important difference. It does not involve embedding specially crafted DOI-like strings on the member’s website itself. The DOI-like strings are instead fed directly to tools designed to subvert the member’s access control systems. These tools, in turn, use proxies on a subscriber’s network to retrieve the “bait” DOI-like string. When the member sees one of these special DOI-like strings being requested from a particular institution, they then know that said institution’s network harbours a proxy. In theory this technique never exposes the DOI-like strings to the public and automated tools should not be able to stumble upon them. However, recently one of our members had some of these DOI-like strings “escape” into the public and at least one of them was indexed by Google. The problem was compounded because people clicking on these DOI-like strings sometimes ended up having their university’s IP address banned from the member’s web site. As you can imagine, there has been a lot of gnashing of teeth. We are convinced, in this case, that the member was doing their best to make sure the DOI-like strings never entered the public. But they did nonetheless. We think this just underscores how hard it is to ensure DOI-like strings remain private and why we recommend our members not use them.
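
The weakness can be sketched in a few lines of Python. This is not any member’s actual system: the bait string, the log format and the `flag_ips` helper are hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
# Hypothetical bait string that a member has fed to a bypass tool.
BAIT = frozenset({"10.5555/hypothetical-bait-0001"})


def flag_ips(request_log, bait=BAIT):
    """Return the sorted client IPs that requested any bait string."""
    return sorted({ip for ip, doi in request_log if doi in bait})


# Invented log of (client IP, requested DOI-like string) pairs.
LOG = [
    ("192.0.2.10", "10.5555/12345678"),                  # ordinary request
    ("198.51.100.7", "10.5555/hypothetical-bait-0001"),  # proxy fetching the bait
    ("203.0.113.5", "10.5555/hypothetical-bait-0001"),   # person who clicked a leaked link
]

print(flag_ips(LOG))
```

The check only sees which string was requested and from where, so once a bait string leaks, a person who clicks it is flagged exactly like the proxy.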

Pedantry and terminology

Notice that we have not used the phrase “fake DOI” yet. This is because, internally, at least, we have distinguished between “DOI-like strings” and “fake DOIs.” The terminology might be daft, but it is what we’ve used in the past and some of our members at least will be familiar with it. We don’t expect anybody outside of Crossref to know this.

To us, the following is not a DOI:

10.5454/JPSv1i220161014

It is simply a string of alphanumeric characters that copies the DOI syntax. We call such strings “DOI-like strings.” It is not registered with any DOI registration agency and one cannot look up metadata for it. If you try to “resolve” it, you will simply get an error. Here, you can try it. Don’t worry: clicking on it will not disable access for your university.

http://doi.org/10.5454/JPSv1i220161014

The following is what we have sometimes called a “fake DOI”:

10.5555/12345678

It is registered with Crossref, resolves to a fake article in a fake journal called The Journal of Psychoceramics (the study of Cracked Pots) run by a fictitious author (Josiah Carberry) who has a fake ORCID (http://orcid.org/0000-0002-1825-0097) but who is affiliated with a real university (Brown University).

Again, you can try it.

http://doi.org/10.5555/12345678

And you can even look up metadata for it.

http://api.crossref.org/works/10.5555/12345678
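
The difference between the two examples can be sketched in Python. Both strings pass a syntax check, so only a lookup can tell them apart. The pattern and the helper names are illustrative assumptions, and `has_crossref_metadata` assumes that the API answers HTTP 404 for a string it has no record of; it needs network access and is not called here.

```python
import re
import urllib.error
import urllib.parse
import urllib.request

# Illustrative pattern for DOI syntax; an assumption, not an official grammar.
DOI_SYNTAX = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def looks_like_doi(text):
    """True if the string has the shape of a DOI. Says nothing about registration."""
    return DOI_SYNTAX.fullmatch(text) is not None


def works_url(doi):
    """Build the metadata URL in the same form as the link above."""
    return "http://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")


def has_crossref_metadata(doi, timeout=10):
    """True if the lookup succeeds, False on HTTP 404 (assumed to mean no record)."""
    try:
        with urllib.request.urlopen(works_url(doi), timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise


# Both strings have DOI shape; syntax alone cannot tell them apart.
for doi in ("10.5454/JPSv1i220161014", "10.5555/12345678"):
    print(doi, looks_like_doi(doi), works_url(doi))
```

A tool that wants to avoid chasing DOI-like strings has to make this kind of lookup, which is part of why unregistered strings that look like DOIs cause trouble.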

Our dirty little secret is that this “fake DOI” was registered and is controlled by Crossref.

Why does this exist? Aren’t we subverting the scholarly record? Isn’t this awful? Aren’t we at the very least hypocrites? And how does a real university feel about having this fake author and journal associated with them?

Well, the DOI uses a prefix that we use for testing. It follows a long tradition of test identifiers starting with “5”. Fake phone numbers in the US start with “555”. Many credit card companies reserve fake numbers starting with “5”. For example, Mastercard’s are “5555555555554444” and “5105105105105100.”

We have created this fake DOI, the fake journal and the fake ORCID so that we can test our systems and demonstrate interoperable features and tools. The fake author, Josiah Carberry, is a long-running joke at Brown University. He even has a Wikipedia entry. There are also a lot of other DOIs under the test prefix “5555.”

We acknowledge that the term “fake DOI” might not be the best in this case, but it is a term we’ve used internally at least and it is worth distinguishing it from the case of DOI-like strings mentioned above.

But back to the important stuff….

As far as we know, none of our members has ever registered a “fake DOI” (as defined above) in order to detect and prevent the circumvention of their access control systems. If they had, we would consider it much more serious than the mere creation of DOI-like strings. The information associated with registered DOIs becomes part of the persistent scholarly citation record. Many, many third-party systems and tools make use of our API and metadata, including bibliographic management tools, TDM tools, CRIS systems, altmetrics services, etc. It would be a very bad thing if people started to worry that the legitimate use of registered DOIs could inadvertently block them from accessing content. Crossref DOIs are designed to encourage discovery and access, not block it.

And again, we have absolutely no evidence that any of our members has registered fake DOIs.

But just in case, we will continue to discourage our members from using DOI-like strings and/or registering fake DOIs.

This has been a public service announcement from the identifier dweebs at Crossref.

Image Credits

Unless otherwise noted, included images purchased from The Noun Project

Hello preprints, what’s your story?

The role of preprints

Crossref provides infrastructure services and therefore we support scholarly communications as it evolves over time. Today, preprints are increasingly discussed as a valuable part of the research story (beyond physics, math, and a small set of sub-disciplines). Preprints might play a positive role in catalyzing research discovery, establishing priority of discoveries and ideas, facilitating career advancement, and improving the culture of communication within the scholarly community.

As we shared in an earlier blog post last month, members will be able to register Crossref DOIs for preprints later this year. We will connect the full history of a research work, and ensure the citation record is clear and up-to-date. As we build out this new content type, we’d love to hear how the research community envisions what preprints will do.

What’s your story, preprint?

So that we can develop a service that supports the whole host of potential uses for all stakeholders, we ask the entire research community to contribute preprint user stories. User stories are concrete descriptions of a specific need, typically used in technology development: As a [x], I want to [y] so that I can [z]. User stories take the “end-user’s” perspective as they focus on a discrete result and its value. They are essential when implementing solutions that must meet a wide range of needs, across a diverse set of constituents. For example:

As an author, I want to share results before my paper is submitted to a journal so that I can get rapid feedback on it and make improvements before publication.

As a researcher who is part of a tenure and promotion committee or funder review panel, I want to know the reach of early results published from the candidate so that I can more quickly track the impact of results, rather than relying only on journal articles that take much longer to publish.

As a journal publisher, I want to know whether a preprint exists for a manuscript submitted to me so that I can decide whether I will accept the submission based on my editorial policy.

We aim to assemble a full catalog that cuts across research disciplines and stakeholder groups. We want to hear from you: researchers, publishers, funding agencies, scholarly societies, academic institutions, technology providers, other infrastructure providers, etc.

Tell us your story here

To ensure that your needs are included, please send us your user stories via this user story “deposit” form. They will be added to the full registry of contributions from the community, which we hope will serve as a key resource for all those developing preprints into a core part of scholarly communications (e.g., ASAPbio, etc.).

Our memories of #SSP2016

Last week a bunch of Crossref’s staff traveled to the 2016 Society for Scholarly Publishing Annual Meeting in Vancouver, BC. After we returned en masse, all nine of us put our heads together to share some of our personal memories of the event.

Crossref’s Rosa and Susan at the Fun Walk/Run sponsored by High Wire. 5K before breakfast!

On Cybersecurity and the Scholarly World: “The session described the many and complicated security threats that IT systems face and how threat detection and defense is a constantly ongoing activity. Certainly system administrators are challenged with the technology issues that build firewalls, block intrusions and divert disruptive activity. But perhaps even more important are the social issues that must be managed to develop an informed user community that is immune to the less technical but probably more effective hacks like phishing for user passwords.”

HTTPS and Wikipedia

This is a joint blog post with Dario Taraborelli, coming from WikiCite 2016.

In 2014 we were taking our first steps along the path that would lead us to Crossref Event Data. At that time I started looking into the DOI resolution logs to see if we could get any interesting information out of them. This project, which became Chronograph, showed which domains were driving traffic to Crossref DOIs.

You can read about the latest results from this analysis in the “Where do DOI Clicks Come From” blog post.

Having this data tells us, amongst other things:

  • where people are using DOIs in unexpected places
  • where people are using DOIs in unexpected ways
  • where we knew people were using DOIs but the links are more popular than we realised
