Event Data: A Plan of Action
Event Data uncovers links between Crossref-registered DOIs and the diverse places where they are mentioned across the internet. Whereas a citation links one research article to another, events are a way to create links to locations such as news articles, data sets, Wikipedia entries, and social media mentions. We have been collecting events for several years and make them openly available via an API for anyone to access, along with open logs of how we found each event. Some organizations are already using Event Data, and we are keen for more to come on board.
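As an illustration of what querying that API can look like, here is a minimal Python sketch. It assumes the v1 query endpoint `https://api.eventdata.crossref.org/v1/events`, the query parameters `mailto`, `rows`, `obj-id` and `source`, and a JSON response that lists events under `message.events`, each carrying a `source_id`. These details should be checked against the current Event Data documentation. The helper names `build_events_url` and `count_by_source` are hypothetical, the sample response is made up, and the sketch deliberately leaves out the HTTP request itself.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlencode

# Assumed base URL of the Event Data v1 query API; verify against current docs.
EVENTS_ENDPOINT = "https://api.eventdata.crossref.org/v1/events"


def build_events_url(mailto: str, obj_id: Optional[str] = None,
                     source: Optional[str] = None, rows: int = 100) -> str:
    """Build a query URL for events (hypothetical helper; parameter names assumed)."""
    params = {"mailto": mailto, "rows": rows}
    if obj_id is not None:
        params["obj-id"] = obj_id  # the DOI that events should point to
    if source is not None:
        params["source"] = source  # restrict to one event source
    return f"{EVENTS_ENDPOINT}?{urlencode(params)}"


def count_by_source(response: dict) -> Counter:
    """Tally the events in a decoded event-list response by their source_id."""
    events = response.get("message", {}).get("events", [])
    return Counter(event.get("source_id", "unknown") for event in events)


if __name__ == "__main__":
    print(build_events_url("you@example.org", obj_id="10.5555/12345678"))
    # Hypothetical, hand-written response body used only for illustration.
    sample = {"message": {"events": [
        {"source_id": "wikipedia"},
        {"source_id": "newsfeed"},
        {"source_id": "wikipedia"},
    ]}}
    print(count_by_source(sample))
```

Keeping URL construction separate from response handling means the parsing can be tested without network access; the decoded body of a real response (for example via `json.loads`) can be passed straight to `count_by_source`.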
Last year we gave an update on Event Data with apologies for being so quiet and a promise of more information at a later date. It’s been some time, so here goes…
I joined Crossref in the middle of last year as a Product Manager and was tasked with looking into Event Data. The first thing I found was a large amount of enthusiasm for Event Data, both within Crossref and further afield. Gathering information beyond the metadata deposited by our members is a popular idea, and it creates valuable connections between DOIs and a range of other sources. Interest spans the spectrum of academic research, publishing, bibliometrics, and beyond.
At the same time, I found a project with a very solid, well-built code base but unstable performance. After Event Data was put into production in 2018, we didn’t provide it with sufficient support. This, coupled with staff changes and other competing priorities, has meant that Event Data hasn’t had the opportunity to live up to early expectations.
To address these issues, we have embarked on a plan to make the server infrastructure more robust, improve monitoring, and make sure that the future of Event Data makes the best use of the resources we have without over-stretching. It means working with the community to determine the most essential aspects of Event Data, and providing support where it’s needed.
The steps below are not necessarily sequential and some depend on the completion of work in other parts of Crossref, but they outline the priorities we have for Event Data in 2021.
Since we put our original Event Data infrastructure in place, the amount of incoming data has grown, and at an ever-increasing rate. In 2017 we were creating 2 million new events per month; that number is now over 20 million. We have known for some time that we needed to refresh the infrastructure but didn’t have the resources to move forward: now we do.
In the first part of the plan we will renew the server infrastructure that underpins Event Data. Maybe not a headline-grabbing move, but the aim is to reduce downtime and pull in missing data. By improving our monitoring and shortening the response time when things go wrong, we will be able to ensure that events are added on a regular basis and that the API can reliably handle requests.
We’ve taken the first steps in this direction by upgrading our API infrastructure and making some other tweaks to improve performance. There is still work to do, but we’ve already seen a significant improvement, with uptime of over 99.99% in December.
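For a sense of scale, an uptime figure can be turned into a downtime budget with simple arithmetic. December has 31 days, or 44,640 minutes, so 99.99% uptime allows at most about 4.5 minutes of downtime across the whole month, compared with roughly 45 minutes at 99.9%. The Python sketch below performs that conversion; the function name is hypothetical, and it says nothing about how Crossref actually measures uptime.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(uptime: float, days: int) -> float:
    """Return the maximum downtime, in minutes, allowed over `days` days
    at the given uptime fraction (e.g. 0.9999 for 99.99%)."""
    if not 0.0 <= uptime <= 1.0:
        raise ValueError("uptime must be a fraction between 0 and 1")
    return days * MINUTES_PER_DAY * (1.0 - uptime)


if __name__ == "__main__":
    # December has 31 days, i.e. 44,640 minutes.
    for uptime in (0.999, 0.9999):
        budget = downtime_budget_minutes(uptime, days=31)
        print(f"{uptime:.2%} uptime -> at most {budget:.2f} minutes of downtime")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why moving from 99.9% to 99.99% is a meaningful step for a service that is queried continuously.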
The second component of the plan is to review performance and data quality. We will evaluate the event sources, update artefacts (such as the lists of publisher landing pages and news websites), and review performance reporting. This will give us a better understanding of Event Data in its current form: if the stability component is about improving what comes in and goes out, this part will give us increased confidence in what Event Data already contains.
While the two steps above are being carried out, we will revisit the applications of Event Data and talk to organizations that currently use it or have expressed an interest. These conversations will feed into future development in which we will evaluate new sources and other ways to optimize the service.
Central to the roadmap will be continued support for the data citation endpoint in Scholix format, which we run in close collaboration with DataCite. Additionally, we will add new data from relationships between Crossref works, for example where a preprint is matched to a journal article, or where there are corrections, retractions, or translations of works.
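To make the idea of a Scholix link concrete: each link connects a source object to a target object (for example an article and a data set) through a typed relationship, and the core Scholix relationship types come in inverse pairs. The Python sketch below models a heavily simplified, Scholix-style link with a hypothetical `ScholixStyleLink` class. Real Scholix records carry further fields (such as the link provider and link publication date) and should be modelled from the Scholix specification rather than from this sketch. The DOIs are placeholders.

```python
from dataclasses import dataclass

# Core Scholix relationship types mapped to their inverses.
INVERSE_RELATIONSHIP = {
    "References": "IsReferencedBy",
    "IsReferencedBy": "References",
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
    "IsRelatedTo": "IsRelatedTo",
}


@dataclass(frozen=True)
class ScholixStyleLink:
    """Simplified, hypothetical link record; real Scholix records have more fields."""
    source_doi: str
    source_type: str   # e.g. "literature" or "dataset"
    relationship: str  # one of the keys in INVERSE_RELATIONSHIP
    target_doi: str
    target_type: str

    def inverse(self) -> "ScholixStyleLink":
        """Return the same link as seen from the target's side."""
        return ScholixStyleLink(
            source_doi=self.target_doi,
            source_type=self.target_type,
            relationship=INVERSE_RELATIONSHIP[self.relationship],
            target_doi=self.source_doi,
            target_type=self.source_type,
        )


if __name__ == "__main__":
    # An article citing a data set (placeholder DOIs).
    link = ScholixStyleLink("10.5555/article-1", "literature", "References",
                            "10.5555/dataset-1", "dataset")
    print(link.inverse())
```

Storing one direction of each link and deriving the other keeps the article-to-data and data-to-article views consistent with each other.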
We expect to continue supporting the current sources of events, and where organizations have either a strong interest in a particular source or a database of events that they can send directly, we are keen to build collaborations. Event Data, like everything that Crossref does, is a community-based effort.
Staying in touch
To join the conversation about Event Data and keep informed, head over to our Community pages. You can also check out our GitLab pages. At the end of last year we updated the Education pages, where you can learn more about Event Data.