Crossref holds metadata for approximately 150 million scholarly artifacts. These range from peer reviewed journal articles through to scholarly books through to scientific blog posts. In fact, amid such heterogeneity, the only singular factor that unites such items is that they have been assigned a document object identifier (DOI); a unique identification string that can be used to resolve to a resource pertaining to said metadata (often, but not always, a copy of the work identified by the metadata).
We’re equally sad and proud to report that Rachael Lammey is moving on in her career to the very lucky team at 67Bricks. Her last day at Crossref is today, Friday 16th February. Which is too soon for us, but very exciting for her!
It’s hard to overstate Rachael’s impact on Crossref’s growth and success in her 12 years here. She started as a Product Manager where she developed that role into a broad and central function, and soon moved into the newly-formed community team as International Outreach Manager where she grew important programs such as Sponsors, Ambassadors, a series of ‘LIVE’ events around the world, and she went on to manage her own team and establish some of the most important strategic relationships that Crossref now feels fortunate to have.
Great news to share: our Executive Director, Ed Pentz, has been selected as the 2024 recipient of the Miles Conrad Award from the USA’s National Information Standards Organization (NISO). The award is testament to an individual’s lifetime contribution to the information community, and we couldn’t be more delighted that Ed was voted to be this year’s well-deserved recipient.
During the NISO Plus conference this week in Baltimore, USA, Ed accepted his award and delivered the 2024 Miles Conrad lecture, reflecting on how far open scholarly infrastructure has come, and the part he has played in this at Crossref and through numerous other collaborative initiatives.
Metadata about research objects and the relationships between them form the basis of the scholarly record: rich metadata has the potential to provide a richer context for scholarly output, and in particular, can provide trust signals to indicate integrity. Information on who authored a research work, who funded it, which other research works it cites, and whether it was updated, can act as signals of trustworthiness. Crossref provides foundational infrastructure to connect and preserve these records, but the creation of these records is an ongoing and complex community effort.
Have you attended any of our annual meeting sessions this year? Ah, yes – there were many in this conference-style event. I, as many of my colleagues, attended them all because it is so great to connect with our global community, and hear your thoughts on the developments at Crossref, and the stories you share.
Let me offer some highlights from the event and a reflection on some emergent themes of the day. You can browse the recordings and slides archived on our Annual Meeting page.
Ginny Hendricks opened the meeting by reminding everyone about the research nexus vision, and the work that’s underway to bring us closer to it. Ginny went on to highlight progress in metadata and relationships being registered by our members, and mentioned members that have particularly rich metadata records – with the special joint recognition for learned societies of South Korea. Participation statistics can be reviewed in our Labs Member Metadata Metrics Tables.
Since 2018 we’ve seen a 512% increase in the number of abstracts included in the metadata; with Wiley’s recent addition of millions of abstracts to their records largely contributing to this change. On the relationships side, in the same period, we’ve noted a staggering 3004% growth in preprint-to-article links, and we’re pleased to report a growing number of funding relationships being made available thanks to more and more funders registering Crossref DOIs for grants.
For those who couldn’t join us at such an early hour, Ed Penz included some of these highlights in his own strategic update later in the day. However, he focused on our activity and plans towards fulfilling our four strategic goals:
To contribute to an environment where the community identifies and co-creates solutions for broad benefit
To be a sustainable source of complete, open, and global scholarly metadata and relationships
To be publicly accountable to the Principles of Open Scholarly Infrastructure (POSI) practices of sustainability, insurance, and governance
To foster a strong team—because reliable infrastructure needs committed people who contribute to and realise the vision, and thrive doing it
Making data citations available at scale: The Global Open Data Citation Corpus by Iratxe Puebla;
“Who Cares?” Defining Citation Style in Scholarly Journals by Vincas Grigas and Pavla Vizváry;
DOI registration for scholarly blogs by Martin Fenner;
Enhancing Research Connections through Metadata: A Case Study with AGU and CHORUS by Tara Packer, Kristina Vrouwenvelder, Shelley Stall;
Index Crossref, Integrity, Professional And Institutional Development by Engjellushe Zenelaj;
Brazilian retractions in the Retraction Watch Database - RWDB by Edilson Damasio; and
Now that you’ve published, what do you do with Metadata? - by Joann Fogleson.
In addition to these updates, we’ve heard from:
Izabela Szyprowska (OP, European Commission), Nikolaos Mitrakis (RTD, European Commission), and Paola Mazzucchi (mEDRA) talked about the process and rationale of implementing Crossref DOIs for grants at the European Commission; and
Amanda French from ROR/Crossref about the new ‘ROR / Open Funder Registry overlap’ tool.
We also assembled a diverse panel and invited the community to discuss “What we still need to build a robust Research Nexus?” The discussion ranged from how different parts of our community currently use existing metadata, to how we can come together to make improvements, especially in the area of standards and equitability, and touched on metadata priorities. I’ll highlight some of the threads below, but it’s certainly worth engaging with the full recording of the discussion, and offering your own perspective on the Community Forum, commenting below.
Having participated in the whole day of talks, I found that a few themes emerged as popular in the community: data citations, making it easier to register metadata, making better use of metadata, retractions, and equity of participation in the research nexus.
With the advances in the Crossref API relationships endpoint, Martyn Rittman demonstrated how we’re now providing more comprehensive support for data citations. You can follow his demonstration in the Collab Notebook he used for the demo and shared for your perusal. He also mentioned that the developments in this feature of our API will soon replace the current service provided via the Events API. Feel free to connect with Martin on the community forum and comment with questions and suggestions.
As mentioned above, DataCite’s Iratxe Puebla mentioned the Make Data Count initiative and the leaky pipeline of data citations we’ve got at the moment in the scholarly literature, obscuring the true picture of data reuse. This prevents the community from recognising and incentivising data creation and reuse appropriately. One way of addressing this is the Global Open Data Citation Corpus. Crossref and DataCite collaborate closely in connecting and making that data available.
Linking datasets, as well as software, was reported as part of the AGU and CHORUS initiative in Enhancing Research Connections through Metadata.
Data sharing and citing is as much a culture as a technology problem. As Iratxe Puebla admitted, there are many norms and processes for capturing and sharing that information,and DataCite is interested to hear about different use cases. As highlighting data’s relationship with works is a growing interest for our community, hopefully more understanding and perhaps even commonality can be built soon.
Making it easier to register metadata
As part of the Demonstrations session, we’ve seen two developments to support members with registering their metadata more easily.
Crossref’s Lena Stoll shared plans for the new version of the Crossref Registration Form, the helper tool for manual registration of metadata, which translates the submission into XML, for inclusion in the Crossref database. At the moment, the form only accepts grant registrations, but it will be bolstered before the end of the year to include journal articles then other record types in time.
Erik Hanson from PKP demonstrated the latest OJS version, commenting on specific changes made in the new version in response to the key pain points reported by users of the previous release.
In addition, we’ve heard of two independent projects by Martin Fenner and Esha Data to enable metadata registration and Crossref DOIs for scholarly blogs.
Making better use of metadata
Supported by the beginner’s demo of our REST API by Luis Montilla, there were many voices about opportunities for making good use of Crossref’s open metadata.
Nikolaos Mitrakis of the European Commission talked about the implementation of Crossref IDs for grants as a step towards tracing and connecting the grants with not just academic but also societal outcomes of the awards, and the plans for using those in the evaluation and steering of their funding programmes.
Joann Fogleson of the American Society of Civil Engineers gave a buzzy metaphor of publishers’ role in their work with metadata being comparable with that of a pollinator – collecting the metadata at one end, then registering, displaying and making it available to different services, in order to enable a reacher scholarly environment for discovery.
Many of the major themes have found their way to the discussion of what is still needed to build a robust network of connections between scholarly objects, institutions and individuals. One of the ways Ludo Waltman of CWTS, Leiden University, intends to use our open metadata is as part of the upcoming open-source version of the Laiden rankings and he invited the community to contribute and help optimise this project to provide an alternative to closed and selective databases.
Panellists also spoke of new opportunities in the light of data mining and machine learning. Ran Dang, Atlantis Press, as a publisher shared a concern about the standard of metadata across cultures and disciplines, and the need to digitise past publications – which can then help better leverage multi-lingual scholarship. Matt Buys of DataCite, pointed out to the Global Data Citation Corpus they are developing, which leverages a SciBERT model to pull out data citations, which is brought together with Crossref/DataCite citation metadata.
Opening the data is essential to enabling its wider use, and here Ludo gave the example of the fantastic outcome for references metadata, which has been made open by default for the entire corpus of Crossref-registred works. He hopes that this can inspire us to make similar progress in other areas.
A little on a tangent with regards to metadata use, yet speaking of excellent examples of the community making progress together, Ginny pointed out ROR, how this is becoming a new standard for solving a longstanding problem of standardising affiliations metadata.
Perhaps not entirely surprising, given the recent acquisition of the Retraction Watch database by Crossref and making the data openly available, retractions featured in a few different talks at the meeting. First, Lena Stoll and Martin Eve from Crossref, shared how that data can be accessed – that is as the csv file from https://api.labs.crossref.org/data/retractionwatch?[your-email@here](add your email as indicated), and the Crossref Labs API also displays information about retractions in the /works/ route when metadata is available. There are plans for incorporating this information with our REST API in the future.
Ed and Ginny have shown stats for increases in retraction metadata registered in Crossmark but commented on limited participation in Crossmark overall. Recording retraction information in this way is still important, alongside the Retraction Watch data, this allows for multiple assertions of that information, and increases confidence in its accuracy. We’re preparing to consult with the community at large about the future direction of the Crossmark service, to make it easier to implement and more useful for the readers.
Finally, Edilson Damasio from State University of Maringá-UEM, Brazil, and a long-time Crossref Ambassador, presented the analysis of Brazilian records in the Retraction Watch data, and he promises further analysis to come, comparing the situation across geographies.
Equity of participation in the research nexus
Amanda Bartell opened the research nexus discussion with a reminder of what that vision entails and pointing out commonality of goals in the community – “Like others, Crossref has a vision of a rich and reusable open network of relationships connecting research organisations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society. We call this interconnected network the Research Nexus, but others in the community have different names for it, such as knowledge graph or PID graph.”
The richness of this network depends upon the participation of all those who produce and publish scholarship, so naturally the topic of equality emerged in that discussion. In addition to Ran Dang’s concern for multilingualism and digitisation of past publications from all parts of the world, Mercury Shitindo of St Paul’s University, Kenya talked of the need for more education, training and accessible resources for her community, to be able to participate more effectively in this ecosystem. She can see that affiliations and citations are of priority there, as these enable transparency and facilitate collaborations. Matt Buys of DataCite echoed her point, talking about the importance of the role of contributors “It’s important not to lose sight of people and places – to recognise the importance of contributor roles in the PID-graph”.
Earlier in the day, we mentioned the launch of our Global Equitable Membership, or GEM programme. Since January, 110 new organisations from eligible countries have joined Crossref fee-free. Ginny was quick to admit that the need for a fee-waiver programme like this stems from the regular fees schedule not being in tune with our global membership, and she mentioned the upcoming fees review.
Financial barriers are often what get attention, yet reducing barriers to participation with technology is equally important for building a robust research nexus. With the planned changes to our registration form, we’ll make it easier to register works for those who don’t regularly use XML.
Johanssen Obanda took time to show the examples of community activity and events organised by our global network of Ambassadors, and to thank all our advocates and partners for their tireless work. They are also helping tackle barriers, supporting our members to actively participate in the research nexus with their metadata, and help enable the community to make good use of the network of relationships that data denotes.
Showcasing our “One member one vote” truth, the Board election was the focal point of the annual meeting, as always. We closed the ballot and announced the results, with seven members selected to join the Board in 2024.
The event went very smoothly overall. Talks were delivered efficiently, the panellists shared diverse perspectives and we elected our new Board members. Huge thanks to Rosa Clark, our Communications and Events Manager, who orchestrated the event and has been a constant behind-the-scenes presence supervising the entire show. I’m grateful to all colleagues at Crossref, who helped make it an enjoyable experience and an informative event for our community. Finally – it wouldn’t be a real meeting without the active participation of the speakers and panellists, who shared their metadata stories, and even joined us for some relaxed unplugged chats.