Where do DOI clicks come from?

As part of our Event Data work, we’ve been investigating where DOI resolutions come from. A resolution could be someone clicking a DOI hyperlink, a search engine spider gathering data, or a publisher’s system performing its duties. Our server logs record every time a DOI is resolved and, if it was resolved by someone using a web browser, which website they were on when they clicked the DOI. This is called a referral.
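
As a concrete illustration, here is a minimal sketch of how a referring domain might be extracted from a resolver log line. The log format shown is invented for illustration and is not our actual log format.

```python
# Minimal sketch: extract the referring domain from a resolver log line.
# The log format below is hypothetical, not our real log format.
from urllib.parse import urlparse

sample_lines = [
    '2015-06-01T12:00:00Z GET /10.5555/12345678 referer="https://en.wikipedia.org/wiki/Example"',
    '2015-06-01T12:00:01Z GET /10.5555/87654321 referer="-"',  # no referrer sent
]

def referring_domain(line):
    """Return the hostname of the referring page, or None if unknown."""
    _, _, tail = line.partition('referer="')
    referer = tail.split('"', 1)[0]
    if referer in ("", "-"):
        return None
    return urlparse(referer).hostname

for line in sample_lines:
    print(referring_domain(line))  # en.wikipedia.org, then None
```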

This information is interesting because it shows not only where DOI hyperlinks are found across the web, but also when they are actually followed. This data allows us a glimpse into scholarly citation beyond references in traditional literature.

Last year Crossref Labs announced Chronograph, an experimental system for browsing some of this data. We’re working toward a new version, but in the meantime I’d like to share the results for 2015 and some of 2016. We have filtered out domains that belong to Crossref member publishers to highlight citations beyond traditional publications.

Top 10 DOI referrals from websites in 2015

This chart shows the top 10 non-primary-publisher domains referring DOI resolutions each month. Because the top 10 can differ from month to month, the total number of domains shown can exceed 10. Subdomains are combined, which means that, for example, the wikipedia.org entry covers all Wikipedia languages (a code sketch of this grouping follows the list below). The chart covers all of 2015 and the first two months of 2016.

[Chart: top 10 referring non-primary-publisher domains per month]

The top 10 referring domains for the period:

  1. webofknowledge.com
  2. baidu.com
  3. serialssolutions.com
  4. scopus.com
  5. exlibrisgroup.com
  6. wikipedia.org
  7. google.com
  8. uni-trier.de
  9. ebsco.com
  10. google.co.uk
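
As promised above, here is a minimal sketch of the subdomain grouping used for this ranking. Collapsing a hostname to its last two labels is a naive stand-in for proper public-suffix handling (a real implementation would use the Public Suffix List, e.g. via the tldextract package, so that google.co.uk doesn’t collapse to co.uk), and the sample data is made up.

```python
# Sketch: combine subdomains and rank referring domains per month.
# The two-label collapse is naive; production code should consult the
# Public Suffix List (e.g. the tldextract package). Data is made up.
from collections import Counter, defaultdict

referrals = [
    ("2015-01", "en.wikipedia.org"),
    ("2015-01", "de.wikipedia.org"),
    ("2015-01", "www.baidu.com"),
    ("2015-02", "scholar.google.com"),
]

top_per_month = defaultdict(Counter)
for month, host in referrals:
    domain = ".".join(host.split(".")[-2:])  # en.wikipedia.org -> wikipedia.org
    top_per_month[month][domain] += 1

for month, counts in sorted(top_per_month.items()):
    print(month, counts.most_common(10))
# 2015-01 [('wikipedia.org', 2), ('baidu.com', 1)]
# 2015-02 [('google.com', 1)]
```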

It’s not surprising to see some of these domains here: serialssolutions.com and exlibrisgroup.com, for example, are effectively proxies for link resolvers, and Baidu and Google are incredibly popular search engines that would show up anywhere. But it is exciting to see Wikipedia ranked amongst these. For more detail, look out for the new Chronograph.

HTTP vs HTTPS in 2015

We’ve also seen a steady increase in HTTPS referral traffic, i.e. people clicking on DOIs from sites that are using HTTPS. While it is still dwarfed by HTTP, there was a steady uptick throughout 2015.

This chart shows HTTP vs HTTPS referrals per day, which makes the weekly spikes visible. It doesn’t include resolutions where we don’t know the referrer.

[Chart: HTTP vs HTTPS DOI referrals per day]
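
A per-day tally like the one charted above could be computed along these lines; this is a sketch over made-up sample data, and resolutions with no referrer are skipped, as in the chart.

```python
# Sketch: count HTTP vs HTTPS referrals per day.
# (date, referrer) pairs are invented sample data; entries with no
# referrer are excluded, matching the chart above.
from collections import Counter
from urllib.parse import urlparse

referrals = [
    ("2015-06-01", "https://en.wikipedia.org/wiki/Digital_object_identifier"),
    ("2015-06-01", "http://www.baidu.com/s?wd=doi"),
    ("2015-06-02", None),  # unknown referrer: skipped
]

per_day = Counter()
for day, referer in referrals:
    if not referer:
        continue
    scheme = urlparse(referer).scheme
    if scheme in ("http", "https"):
        per_day[(day, scheme)] += 1

print(per_day)
# Counter({('2015-06-01', 'https'): 1, ('2015-06-01', 'http'): 1})
```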

Increasing numbers of people are moving to HTTPS for reasons of security, privacy and protection from tampering. Google has announced plans to take HTTPS into account when ranking search results. Wikipedia has moved exclusively to HTTPS, and I’ll be telling the story of how Crossref and Wikipedia collaborated in an upcoming blog post.

Chronograph

Another version of Chronograph will be available soon. It will contain full data for all non-primary-publisher referring domains. Stay tuned!

Clinical trial data and articles linked for the first time

It’s here. After years of hard work and with a huge cast of characters involved, I am delighted to announce that you will now be able to instantly link to all published articles related to an individual clinical trial through the CrossMark dialogue box. Linked Clinical Trials are here!

In practice, this means that anyone reading an article will be able to pull a list of both clinical trials relating to that article and all other articles related to those clinical trials – be it the protocol, statistical analysis plan, results articles or others – all at the click of a button.

[Image: the Linked Clinical Trials interface]

Now I’m sure you’ll agree that this sounds nifty. It’s definitely a ‘nice-to-have’. But why was it worth all the effort? Well, simply put: “to move a mountain, you begin by carrying away the small stones”.

Science communication in its current form is an anachronism, or at the very least somewhat redundant.

You may have read about the ‘crisis in reproducibility’. Good science, at its heart, should be testable, falsifiable and reproducible, but an historical over-emphasis on results has led to a huge number of problems that seriously undermine the integrity of the scientific literature.

Issues such as publication bias, selective reporting of outcomes and analyses, hypothesising after the results are known (HARKing) and p-hacking are widespread, and can seriously distort the literature base (unless anyone seriously considers Nicolas Cage to be causally related to people drowning in swimming pools).

This is, of course, nothing new. Calls for the prospective registration of clinical trials date back to the 1980s, and registration is now increasingly commonplace, in recognition that the quality of research lies in the questions it asks and the methods it uses, not the results observed.

[Chart: uptake of trial registration year-on-year since 2000]

Building on this, a number of journals and funders – starting with BioMed Central’s Trials over 10 years ago – have also pushed for the prospective publication of a study’s protocol and, more recently, statistical analysis plan. The idea that null and non-confirmatory results have value and should be published has also gained increasing support.

Over the last ten years, there has been a general trend towards increasing transparency. So what is the problem? Well, to borrow an analogy from Jeremy Grimshaw, co-Editor-in-Chief of Trials – we’ve gone from Miró to Pollock.

Although a results paper may reference a published study protocol, there is nothing to link that results paper to subsequently published articles, and no link from the protocol itself forward to the results article.

A single clinical trial can result in multiple publications: the study protocol and traditional results paper or papers, as well as commentaries, secondary analyses and, eventually, systematic reviews, among others, many published in different journals, years apart. This situation is further complicated by an ever-growing body of literature.

Researchers need access to all of these articles if they are to reliably evaluate bias or selective reporting in a piece of research, but – as any systematic reviewer can tell you – actually finding them all is like looking for a needle in a haystack. When you don’t know how many needles there are. With the haystack still growing.

That’s where we come in. The advent of trial registration means that there is a unique identifier associated with every clinical trial at the study level, rather than the article level. Building on this, the Linked Clinical Trials project set out to connect all articles relating to an individual trial using its trial registration number (TRN).

By adapting the existing CrossMark standard, we have captured additional metadata about an article, namely the TRN and the trial registry, with this information then associated with the article’s DOI on publication. This means that you will be able to pull all articles related to an individual clinical trial from the CrossMark dialogue box on any relevant article.
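
For the technically curious, here is a minimal sketch of reading deposited trial numbers back out of our REST API for a given article. The DOI is a placeholder, and the “clinical-trial-number” field name is our assumption about how the deposited metadata is exposed, so treat this as illustrative rather than definitive.

```python
# Sketch: read trial registration numbers (TRNs) for an article from the
# Crossref REST API. The DOI is a placeholder; the "clinical-trial-number"
# field name is an assumption about how deposits are exposed.
import json
from urllib.request import urlopen

doi = "10.5555/12345678"  # placeholder DOI
with urlopen(f"https://api.crossref.org/works/{doi}") as resp:
    work = json.load(resp)["message"]

for trial in work.get("clinical-trial-number", []):
    print(trial.get("clinical-trial-number"), trial.get("registry"))
```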

Members will soon be able to assign Crossref DOIs to preprints

TL;DR

By August 2016, Crossref will enable its members to assign Crossref DOIs to preprints. Preprint DOIs will be assigned by the Crossref member responsible for the preprint, and that DOI will be different from the DOI assigned by the publisher to the accepted manuscript and version of record. Crossref’s display guidelines, tools and APIs will be modified to enable researchers to easily identify and link to the best available version (BAV) of a document. We are doing this to support the changing publishing models of our members and to clarify the scholarly citation record.

Background

Why is this news? Well, to understand that you need to know a little Crossref history.

(cue music and fade to sepia) 


When Crossref was founded, one of its major goals was to clarify the scholarly record by uniquely identifying formally published scholarly content on the web so that it could be cited precisely. At the time, our members had two primary concerns:

  • That a Crossref DOI should point to one intellectually discrete scholarly document. That is, they did not want one Crossref DOI to be assigned to two documents that appeared largely similar, but which might vary in intellectually significant ways.
  • That two DOIs should not point to the same intellectually discrete document. They wanted it to be easy for all to tell when the same discrete intellectual content was cited.

As such, when Crossref was founded, we developed a complex set of rules that were colloquially known by our members as Crossref’s rules “prohibiting the assignment of DOIs to duplicative content.”

(cue music, show wavy lines, return to color)

Well… as we gained experience in assigning DOIs, many of these rules were amended or discarded as it became apparent that they didn’t actually support common scholarly citation practice and/or otherwise muddied the scholarly citation record.

For example, sometimes a document will be re-published in a special issue or an anthology. Before the advent of the DOI, it was common citation practice to always cite a document in the context in which it was read. The context of the document could, after all, affect the interpretation or crediting of the work. But it would be impossible to support this common citation practice if we were to assign the same Crossref DOI to the article in both its original context and its re-published form. Our current recommendation in these situations is to assign separate DOIs to content that is republished in another context.

Another example occurs when a particular copy of an otherwise identical document has been annotated. For example, though the Handbook to the Birds of Australia by John Gould has its own Crossref DOI (http://doi.org/10.5962/bhl.title.8367), another copy of the same book was hand-annotated by Charles Darwin and has its own, different Crossref DOI (http://doi.org/10.5962/bhl.title.50403). Historians of science quite reasonably may want to refer to and cite that particular annotated copy of this historic document.

[So much for not assigning two separate Crossref DOIs to identical documents.]

Finally, we should note a far more common practice in our industry. Our members often make content available online with a Crossref DOI before they consider it to be formally published. This practice goes by a number of names including “publish ahead of print,” “article in progress,” “article in press,” “online ahead of print,” “online first”, etc.

But in each case, the process is the same: the publisher assigns a Crossref DOI to the document soon after it has been accepted for publication, and this same Crossref DOI is carried over to the finally published article. Again, this practice just reflects that the “intellectual” content of the accepted manuscript should not change between the point of acceptance and the point of publication, so for the purposes of citation they are largely interchangeable.

[So much for not assigning one Crossref DOI to two versions of the same document.]

Now, in the above cases it helps to clarify the scholarly record to also specify that the respective Crossref DOIs of the original and the “duplicative” work are related, and we encourage our members to make these connections explicit when they can. Nonetheless, it is paramount in both cases to allow the “duplicative” works to be cited precisely and independently.

Which brings us back to preprints.

The case for preprints

First we should define what we mean by preprints, because even this commonly used term means different things to different communities. We have historically considered preprints to be any version of a manuscript that is intended for publication but that has not yet been submitted to a publisher for formal review. Note that this definition does not include “accepted manuscripts”, which, as we noted above, often already have Crossref DOIs assigned to them soon after acceptance.

Crossref members originally worried that, by assigning DOIs to preprints, we would end up muddying the scholarly record. They worried that the very presence of a Crossref DOI would be interpreted to mean that the content to which it had been applied had gone through a formal publishing process. And unlike the case with “accepted manuscripts”, the difference between the intellectual content of a preprint and that of the final published version can sometimes be substantial. At the time, it seemed that the scholarly record would be clarified by prohibiting the assignment of DOIs to preprints.

But again, changes in the scholarly communication landscape have led us to, as the youngsters say, pivot.

A Koan

When is a preprint a preprint?

Crossref has always been catholic in its definition of “publisher.” Many of our members do not consider “publishing” to be their primary mission. The OECD and World Bank are two obvious cases here. But our membership also includes government departments, universities and archives. These members have traditionally assigned Crossref DOIs to things like internal reports, grey literature, working papers, etc. This activity was clearly within the original rules set out by Crossref. And this is where our koan comes into play: “when is a preprint a preprint?”

It is often difficult to predict when something might eventually be formally published. How do you know a priori that a working paper will never be submitted for publication? After all, everything could potentially be submitted for publication. (Sometimes it seems everything is.)

This is the dilemma faced by a few of our members. For example, Cold Spring Harbor Laboratory, which runs bioRxiv, has been a Crossref member since 2000 and has assigned over 35,000 Crossref DOIs. They have been assiduous in trying to stick to Crossref’s rules about preprints. Furthermore, they have taken equal care to ensure that preprints in bioRxiv are labeled as such and linked to the final publication (via a Crossref journal DOI) when it is available. This takes a lot of work.

But often bioRxiv simply has no way of telling when the authors of a working paper or report might suddenly decide to submit their work for publication. So they have found themselves occasionally and inadvertently violating Crossref’s rules on preprints because they had no way of predicting when something would magically transform from being an innocuous working paper into a fraught preprint.

It is a testament to bioRxiv that they have persevered. We have other members who face the same problem. They have not given up. They have not gone elsewhere for their DOIs.

Which brings us to our next point.

Not All DOIs

Have you noticed how often we use the phrase “Crossref DOIs?” Were you wondering if this was an annoying affectation or an example of a marketing department gone mad? It’s neither. It is an essential distinction that we make because Crossref is just one of several DOI registration agencies. Although all DOIs are “compatible” in the minimal sense that you can “resolve” them to a location on the web, that does not mean that all DOIs work identically. Different DOI registration agencies have different constituencies, different services, different governance models and different rules covering what their members can assign their respective DOIs to.

This was not the case when Crossref was founded and our rules were first drafted. At the time, Crossref was the only registration agency and, as such, the rule prohibiting the assignment of Crossref DOIs to preprints kinda worked. But it was unworkable in the longer term.

Quite naturally, new DOI registration agencies have been established for different communities with different primary use-cases. While Crossref could have a rule prohibiting the assignment of Crossref DOIs to preprints, there was nothing stopping another registration agency from allowing (indeed, encouraging) its members to assign DOIs to preprints.

So the simple fact is that DOIs could be assigned to preprints regardless of Crossref’s old rules. By continuing to prohibit the practice at Crossref, we were just making life more difficult for some of our existing members.

And it has become clear that the situation would only get worse as more of our members started to roll out new publishing and business models.

Business model neutral 

Crossref has always been business model neutral. We need to adapt and change to support our members’ business models, not the other way around.

A number of our members are starting to adopt publishing workflows that are more fluid and public than established publishing models. These new workflows make much of the submission and review process open, which in turn often blurs the historically hard distinctions between a draft manuscript, a preprint, a revised proof, an accepted manuscript, the “final” published version, and subsequent corrections and updates. Whereas in classic publishing models a document went through a series of discrete state changes (some in public, many in private), new publishing workflows treat document versions as a continuum, most of which are made available publicly and which consequently may be cited at almost any point in the publishing process.

In short, Crossref’s members increasingly need the flexibility to assign DOIs at different points in the publishing lifecycle. Rather than enforce rules that enshrine an existing publishing or business model, we need to work with our members to establish and adopt new DOI assignment practices that support evolving publishing models whilst maintaining a clear citation record, and that let researchers easily identify the best available version (BAV) of a document or research object.

So you see, not all of our motivations for this change in policy are opportunistic or prosaic. Underneath our gruff and flinty exterior is a soft, idealistic center. There are principles at work here as well.

What next

So this isn’t just a matter of changing our rules and display guidelines. We also have to make some schema changes, and adjust our services and APIs to clearly distinguish between preprints and accepted manuscripts/versions of record. Additionally, we will be building tools to make it much easier for our members to link preprints to the final published article (and vice versa). Finally, we need to update our documentation to help our members take advantage of the new functionality. We expect that everything will be in place by the end of August 2016, at which point you will see another announcement from us.

Crossref Brand update: new names, logos, guidelines, + video

It can be a pain when companies rebrand, as it usually requires coordinated updating of wording and logos on websites, handouts, and slides. Never mind changing habits and remembering to use the new names verbally in presentations.

Why bother?

As our infrastructure and services expanded, we sometimes branded services with no reference to Crossref. As explained in our post The Logo Has Landed last November, this led to confusion, and it was neither scalable nor sustainable.

With a cohesive approach to naming and branding, the change to (some) new names and logos should benefit everyone. Our aim is to stem confusion and be in a much better position to provide clear messages and useful resources, so that people don’t have to try hard to understand what Crossref enables them to do.

So while it may be a bit of a pain short-term, it will be worth it!

What are the new names?

As a handy reference, here is a slide-shaped image giving an overview of our services with their new names:

[Image: overview of brand name changes, April 2016]

It’s a lowercase ‘r’ in Crossref 

That’s right, you’ve spent fifteen years learning to capitalize the second R in CrossRef, and now we’re asking you to lowercase it! Please say hello to, and start to embrace, the more natural and contemporary Crossref.

Reference logos from our new CDN via assets.crossref.org

I’m hoping we can count on our community to update logos and names on your end, keeping consistent with the new brand guidelines. And I hope we can make it as easy as possible to do so:

  1. The Content Delivery Network (CDN) at assets.crossref.org allows you to reference logos using a snippet of code. Please do not copy or download the logos.
  2. This set of brand guidelines for members (pdf) should help give background, and we’ll add to it as we create more templates and other resources.

We also have a new website in development which will put support and resources front and center of the user experience. More on that in the next month or two.

If you use the snippets of code provided via our new CDN at assets.crossref.org, these kinds of manual updates should never be a problem in the future if the logo changes again (no plans anytime soon!).

Of course, we don’t expect people to update to the new logos and names immediately; there is always a period of transition. Please let us know if we can help you update your sites and materials in the coming weeks.

Also, check out the launch video, which presents five key Crossref brand messages:

Crossref Event Data: early preview now available


Test out the early preview of Event Data while we continue to develop it. Share your thoughts. And be warned: we may break a few eggs from time to time!

[Image: Chicken by anbileru adaleru from the Noun Project]

Want to discover which research works are being shared, liked and commented on? What about the number of times a scholarly item is referenced? Starting today, you can whet your appetite with an early preview of the forthcoming Crossref Event Data service. We invite you to start exploring the activity of DOIs as they permeate and interact with the world after publication.
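
If you’d like to poke at the data programmatically, a query might look roughly like the sketch below. The endpoint shape and field names are assumptions modelled on the Event Data query API, and the DOI and contact address are placeholders; this early preview may differ, so check the current documentation.

```python
# Sketch: list events recorded for one DOI via the Crossref Event Data API.
# The endpoint and parameter names (/v1/events?obj-id=...) are assumptions
# based on the query API; the DOI and mailto address are placeholders.
import json
from urllib.request import urlopen

doi = "10.5555/12345678"
url = (f"https://api.eventdata.crossref.org/v1/events"
       f"?obj-id={doi}&mailto=you@example.org")

with urlopen(url) as resp:
    events = json.load(resp)["message"]["events"]

for event in events:
    print(event["source_id"], event["relation_type_id"], event["subj_id"])
```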


What is there 80 million of?

As of this week, there are 80,000,000 scholarly items registered with Crossref!

By the way, we update these interesting Crossref stats regularly and you can search the metadata.

The 80 millionth scholarly item is [drumroll…] Management Approaches in Beihagi History from the journal Oman Chapter of Arabian Journal of Business and Management Review, published by Al Manhal in the United Arab Emirates.

Dr Norman Paskin

[Photo: Dr Norman Paskin]

It was with great sadness and shock that I learned that Dr Norman Paskin had passed away unexpectedly on the 27th of March. This is a big loss to the DOI, Crossref and digital information communities. Norman was the driving force behind the DOI System and was a key supporter and ally of Crossref from the start. Norman founded the International DOI Foundation in 1998 and ran it successfully until the end of 2015, when he moved to a strategic role as an Independent Board Member.

The Wikipedia Library: A Partnership of Wikipedia and Publishers to Enhance Research and Discovery

Back in 2014, Geoffrey Bilder blogged about the kick-off of an initiative between Crossref and Wikimedia to better integrate scholarly literature into the world’s largest knowledge space, Wikipedia. Since then, Crossref has been working to coordinate activities with Wikimedia: Joe Wass has worked with them to create a live stream of content being cited in Wikipedia, and we’re including Wikipedia in Event Data, a new service to launch later this year. In that time, we’ve also seen Wikipedia’s importance grow in terms of the volume of DOI referrals.

[Photo: Alex Stinson, Project Manager for the Wikipedia Library, and our guest blogger. This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license (source: Myleen Hollero Photography)]

How can we keep this momentum going and continue to improve the way we link Wikipedia articles with the formal literature? We invited Alex Stinson, a project manager at The Wikipedia Library (and one of our first guest bloggers), to explain more.