Events got the better of us

Publisher metadata is one side of the story surrounding research outputs, but conversations, connections and activities that build further around scholarly research, takes place all over the web. We built Event Data to capture, record and make available these ‘Events’ –– providing open, transparent, and traceable information about the provenance and context of every Event. Events are comments, links, shares, bookmarks, references, etc.

What’s your (citations’) style?

Bibliographic references in scientific papers are the end result of a process typically composed of: finding the right document to cite, obtaining its metadata, and formatting the metadata using a specific citation style. This end result, however, does not preserve the information about the citation style used to generate it. Can the citation style be somehow guessed from the reference string only? TL;DR I built an automatic citation style classifier.

What if I told you that bibliographic references can be structured?

Last year I spent several weeks studying how to automatically match unstructured references to DOIs (you can read about these experiments in my previous blog posts). But what about references that are not in the form of an unstructured string, but rather a structured collection of metadata fields? Are we matching them, and how? Let’s find out.

Underreporting of matched references in Crossref metadata

Geoffrey Bilder

Geoffrey Bilder – 2019 February 05

In APIsCitationMetadata


About 11% of available references in records in our OAI-PMH & REST API don’t have DOIs when they should. We have deployed a fix, but it is running on billions of records, and so we don’t expect it to be complete until mid-April.

Note that the Cited-by API that our members use appears to be unaffected by this problem.

Reference matching: for real this time

In my previous blog post, Matchmaker, matchmaker, make me a match, I compared four approaches for reference matching. The comparison was done using a dataset composed of automatically-generated reference strings. Now it’s time for the matching algorithms to face the real enemy: the unstructured reference strings deposited with Crossref by some members. Are the matching algorithms ready for this challenge? Which algorithm will prove worthy of becoming the guardian of the mighty citation network? Buckle up and enjoy our second matching battle!

Data Citation: what and how for publishers

We’ve mentioned why data citation is important to the research community. Now it’s time to roll up our sleeves and get into the ‘how’. This part is important, as citing data in a standard way helps those citations be recognised, tracked, and used in a host of different services.

Matchmaker, matchmaker, make me a match

Matching (or resolving) bibliographic references to target records in the collection is a crucial algorithm in the Crossref ecosystem. Automatic reference matching lets us discover citation relations in large document collections, calculate citation counts, H-indexes, impact factors, etc. At Crossref, we currently use a matching approach based on reference string parsing. Some time ago we realized there is a much simpler approach. And now it is finally battle time: which of the two approaches is better?

What does the sample say?

At Crossref Labs, we often come across interesting research questions and try to answer them by analyzing our data. Depending on the nature of the experiment, processing over 100M records might be time-consuming or even impossible. In those dark moments we turn to sampling and statistical tools. But what can we infer from only a sample of the data?

Why Data Citation matters to publishers and data repositories

A couple of weeks ago we shared with you that data citation is here, and that you can start doing data citation today. But why would you want to? There are always so many priorities, why should this be at the top of the list?

Data citation: let’s do this

Data citation is seen as one of the most important ways to establish data as a first-class scientific output. At Crossref and DataCite we are seeing growth in journal articles and other content types citing data, and datasets making the link the other way. Our organizations are committed to working together to help realize the data citation community’s ambition, so we’re embarking on a dedicated effort to get things moving.
RSS Feed