In the scholarly communications environment, the evolution of a journal article can be traced by the relationships it has with its preprints. Those preprint–journal article relationships are an important component of the research nexus. Some of those relationships are provided by Crossref members (including publishers, universities, research groups, funders, etc.) when they deposit metadata with Crossref, but we know that a significant number of them are missing. To fill this gap, we developed a new automated strategy for discovering relationships between preprints and journal articles and applied it to all the preprints in the Crossref database. We made the resulting dataset, containing both publisher-asserted and automatically discovered relationships, publicly available for anyone to analyse.
The second half of 2023 brought with itself a couple of big life changes for me: not only did I move to the Netherlands from India, I also started a new and exciting job at Crossref as the newest Community Engagement Manager. In this role, I am a part of the Community Engagement and Communications team, and my key responsibility is to engage with the global community of scholarly editors, publishers, and editorial organisations to develop sustained programs that help editors to leverage rich metadata.
STM, DataCite, and Crossref are pleased to announce an updated joint statement on research data.
In 2012, DataCite and STM drafted an initial joint statement on the linkability and citability of research data. With nearly 10 million data citations tracked, thousands of repositories adopting data citation best practices, thousands of journals adopting data policies, data availability statements and establishing persistent links between articles and datasets, and the introduction of data policies by an increasing number of funders, there has been significant progress since.
Have you attended any of our annual meeting sessions this year? Ah, yes – there were many in this conference-style event. I, as many of my colleagues, attended them all because it is so great to connect with our global community, and hear your thoughts on the developments at Crossref, and the stories you share.
Let me offer some highlights from the event and a reflection on some emergent themes of the day.
We’ve been spending some time speaking to the community about our role in research integrity, and particularly the integrity of the scholarly record. In this blog, we’ll be sharing what we’ve discovered, and what we’ve been up to in this area.
We’ve discussed in our previous posts in the “Integrity of the Scholarly Record (ISR)” series that the infrastructure Crossref builds and operates (together with our partners and integrators) captures and preserves the scholarly record, making it openly available for humans and machines through metadata and relationships about all research activity. This Research Nexus makes it easier and faster for everyone involved in research performance, management, and communications to understand information in context and make decisions about the trustworthiness of organizations and their published research outputs. Conversely, it can make it harder for parties to pass off information as trustworthy when the information doesn’t include that context.
The community needs open scholarly infrastructure that can adapt to the changes in scholarly research and communications, and we’ve been changing and adapting already by building on the concept of the scholarly record with our vision:
Like others, we envision a rich and reusable open network of relationships connecting research organizations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society.
We don’t assess the quality of the work that our members register, and we keep the barriers to membership deliberately low to ensure that we are capturing as much of the scholarly record as possible and encouraging best practice. We are careful to talk about Crossref’s specific role being with the Integrity of the Scholarly Record (ISR), and not the broader area of ‘research integrity’ (i.e. the integrity of the research process or content itself).
But there are many challenges and threats to research integrity and the integrity of the scholarly record, and there are tradeoffs with keeping the barriers to membership low. With that in mind, we have been dedicating more time to speaking with the community to explore what part we are and should in future play to help the community assess and improve trustworthiness in the scholarly record. We also want to work out where we can make use of our neutral, central role to convene different groups in scholarly communications to work together on these challenges.
A revealing afternoon in Frankfurt
Our starting point was a roundtable discussion in Frankfurt in October 2022. We organized it to coincide with the Frankfurt Book Fair, but the invited participants were from a wider spectrum than just publishers. The 40 invited participants represented editors, funders, research integrity professionals at publishers, representatives of ministries of science, and other partner organizations such as OASPA, COPE, STEM and DOAJ.
This half-day session enabled us to sense-check our thinking with the community and get input into whether our position is the best one for their needs.
Ed Pentz introduced the session by reminding participants that integrity is key to Crossref’s mission and is the basis of the shared Research Nexus vision. Amanda (that’s me) talked through our current membership processes, recent membership trends, and why wider participation is key and also the sort of questions the community comes to Crossref to solve (eg title ownership disputes). And finally, Ginny Hendricks talked through the specific services and metadata that Crossref has already developed to support the community as signals of trustworthiness, and introduced some new activities and ideas.
Participants then split into small groups representing a mix of communities, and we asked them to discuss three key questions:
Is Crossref’s role what you expected? What surprised you? What are we missing?
Are you aware of Crossref services? What are the barriers to more uptake? What are the challenges and opportunities?
What more could Crossref or its members do?
After discussion, each small group fed back to the room, and we followed up with a whole group discussion, before ending the day with a post-it note exercise for what Crossref should start doing, stop doing, and continue doing.
Here’s what we learned.
The importance of whole community involvement in research integrity and ISR
The need for all parts of the community to come together to solve the problems of research integrity came through loud and clear - there is no single group that can solve this problem on its own.
Publishers expressed frustration that responsibility for research integrity has been placed seemingly solely in their hands when institutions and funders can “unwittingly incentivise bad behaviour”. But it was clear that funders are just as concerned with research integrity issues, with many having made a dedicated trip for the roundtable. There were comments that bringing publishers and funders together around these issues was a rare but important opportunity, and there were calls for this to be an annual event. Both funders and publishers called for more involvement from and inclusion of research institutions in the discussion.
The group agreed that Crossref’s main focus should continue to be capturing and sharing the scholarly record, and that metadata and relationships are key for attribution, evidence, and provenance. One participant commented that “you can’t make open science work unless the metadata is complete” and that this would only happen with efforts throughout the community. Accurate and complete metadata needs to be:
pushed for by funders and institutions (through advocacy and policy)
provided by the authors and other contributors
collated, curated, and registered by the publishers and repositories
collected, matched, (sometimes cleansed), and distributed by Crossref.
(and we would add “prioritised by all who want to support open infrastructure over commercial alternatives”)
Interestingly, this echoes the ‘metadata personas’ output of the Metadata 20/20 initiative which defined roles in the community’s collective metadata effort:
Metadata Creators: providing descriptive information (metadata) about research and scholarly objects.
Metadata Curators: classifying, normalising, and standardising this descriptive information to increase its value as a resource.
Metadata Custodians: storing and maintaining this descriptive information and making it available for consumers.
Metadata Consumers: knowingly or unknowingly using the descriptive information to find, discover, connect, cite, and assess research objects.
Importance of whole-publisher involvement
A few participants, particularly those in editorial or integrity roles at publishing organisations, had not previously made the connection that metadata could be important signals of integrity. This highlighted a key problem - working with Crossref is seen by publishers as a technical/production workflow issue, and so knowledge of the benefits of metadata can be siloed within those teams. Crossref needs to reach out to editorial and research integrity teams to explain that good metadata isn’t just an end in itself and reinforce the impact it has on research integrity. This buy-in from across publisher organisations is vital.
We’re currently recruiting a Community Engagement Manager with editorial or research integrity experience to dedicate time to this area, to advocate for richer metadata within the editorial community, and progress this important conversation.
Agreement of the importance of metadata but an acknowledgment that this brings extra cost
Most participants agreed that rich metadata and relationships provide a core tool in establishing and protecting integrity. But they also acknowledged that collecting and registering more metadata often comes with an extra cost - whether that’s from system changes or just extra staff time. This is particularly true where publishers are working with third-party platforms and suppliers where there may be additional costs for adding fields and functionality to collect more metadata and register it with Crossref. Where knowledge of metadata is siloed in technical and production teams, and the wider benefits aren’t acknowledged, it can be hard to get internal buy-in for these extra costs and efforts.
The Frankfurt group also pointed out that the benefits of more comprehensive metadata (and what this means for ISR) are spread across the research ecosystem, but it is the publisher that usually bears the costs.
Need to define which metadata elements are trust signals and make it easier for the community to provide and access them
Through the course of the discussion, various elements were determined to be important to capture as “trust signals” and to identify relationships such as for retractions, conferences, reviewers, data, and when Crossref membership has been revoked for cause. We need to spend time identifying and prioritising these so that our members can do the same.
We need to make it easier for smaller, less technically-resourced members to provide this metadata, both through our tools and our documentation, as “doing this work can be very geeky and the documentation isn’t easy to understand as a layperson”.
There was also a discussion about where the metadata comes from - should community members be able to contribute metadata and assertions to other members’ records? If the provenance is captured then yes.
Once the metadata is captured, there remain challenges for users in where to start with the 145 million Crossref records. The groups asked Crossref to make it easier for community members to understand and use these records to make informed decisions, including by creating and sharing sample queries, libraries, and case studies.
The importance of retractions/corrections information
There was a lot of discussion about retractions and their importance as trust indicators. The group was surprised by how few retractions are currently registered with Crossref through Crossmark (12k). There was a lot of discussion around why Crossmark isn’t currently being adopted, and interest in taking this forward.
This needs to be a focus for Crossref, to encourage members to register retractions, corrections, and updates, and to make it easier for smaller publishers. There are new and emerging publishers who really want tools to help them demonstrate the legitimacy of their research, and an easy way for them to record corrections and retractions is key.
Collecting the information is just the start - cascading retraction information throughout the research ecosystem is the main goal, and Crossref plays a central role here. As noted in the Information Quality Lab’s project Reducing the inadvertent spread of retracted science: Shaping a research and implementation agenda, “Many retracted papers are not marked as retracted on publisher and aggregator sites, and retracted articles may still be found in readers’ PDF libraries, including in reference management systems such as Zotero, EndNote, and Mendeley”.
It’s particularly important that this information is fed back to funders and institutions, and the group discussed having push notifications to these audiences for retractions. Some funders even employ staff members whose main purpose is to identify retractions.
It was pointed out that there may be good sources of retraction information (such as Retraction Watch) that Crossref could incorporate and match in our metadata.
Gaps in ‘ownership’, and Crossref’s role
The group discussed the many gaps in ownership for elements of research integrity, and some groups wondered if Crossref should actually change our approach and take on more responsibility for vetting content. However, after discussion, the group mostly agreed that this would mean a change of mission (and more staff) for Crossref and potentially limit global participation, thus making the metadata corpus less useful. Crossref should provide the widest possible metadata in an easy-to-consume format, and “other organisations can provide the verification layer”.
It was acknowledged that it would be easy for Crossref to get overwhelmed, so we ended the day by discussing not only what we should start doing, but also what we should stop doing. Unsurprisingly, there was a lot more to continue or start doing than stop doing!
Many noted that Crossref is well-positioned to convene horizontal multi-stakeholder discussions to start to find solutions.
We also know that there are other industry initiatives aimed at supporting this work. The STM Association’s work on an Integrity Hub is gathering pace and aims to provide, among other things ‘a cloud-based environment for publishers to check submitted articles for research integrity issues’.
What happened next? Turns out, it really is all about relationships…
Since this meeting in Frankfurt last October, we’ve been focusing on relationships - thinking about how we capture them in our metadata, and working in partnership with other organisations to bolster our support for ISR.
The rest of this blog post highlights some of the activities underway:
Increasing participation in Crossref
In January 2023, we launched our new GEM Program, which offers relief from fees for members in the least economically-advantaged countries in the world. By opening up participation even further, we aim to extend the corpus of open metadata, giving opportunities for more connections, more context, and more relationships.
Supporting members in meeting best practices
ISR blog 2 explained more about how we help new members become “good Crossref citizens” with automated onboarding emails, extensive documentation, events and webinars, and help from our support team, Ambassadors, and other members in our Community Forum.
We’ve recently joined forces with COPE, DOAJ, and OASPA to create a new online public forum for organisations interested in adopting best practices in scholarly publishing. At the Publishers Learning And Community Exchange or The PLACE, new scholarly publishers can access information from multiple agencies in one place, ask questions of the experts, and join conversations with each other. Do take a look!
Being clearer on the impact of better metadata
As discussed earlier, better metadata can sometimes bring extra costs, and it’s helpful to understand the impact of this investment. We know from our ongoing outreach work that it’s difficult for our members to keep hearing that Crossref needs more and better metadata. They ask us for resources and increasingly want to see hard evidence of benefits to them. We recently showcased the journey of the American Society for Microbiology which went from ‘zero to hero’ in terms of metadata participation and completeness in Crossref. They describe their efforts to increase their registered metadata over the last few years, and note a significant increase in their average monthly successful DOI resolutions from ~390,000 in 2015 to an average of ~3.7 million in 2022. They found that “the more metadata we push out into the ecosystem, the more it appears to be used… Remembering that your publishing program benefits as much as everyone else’s when you deposit more metadata can help refine your short-term and long-term priorities.”
We know we sound like a broken record sometimes, but now other members can take it from ASM!
Encouraging better metadata and more relationships and identifying ’trust signals'
We’re trying to make it easier for members to accurately register key metadata fields, with the launch of our new grants registration form which will be extended to journals and other record types soon. This includes a ROR lookup - adding this unique identifier for research organisations gives even better context for the metadata.
“We want to be a sustainable source of complete, open, and global scholarly metadata and relationships. We are working towards this vision of a ‘Research Nexus’ by demonstrating the value of richer and connected open metadata, incentivising people to meet best practices, while making it easier to do so.”
… with item number one under projects ‘in focus’, being: “Adoption activities to focus on top metadata adoption priorities, which are:
We’re continuing to talk with the community to work out which metadata elements are most useful as trust signals, and we’re trying to prioritise some of the schema changes required to capture new elements. If you haven’t already, please respond to Patricia Feeney’s metadata priorities survey.
Thinking about retractions and corrections
We’ve been closely involved with the NISO CREC working group, and they should be making the initial draft recommendations public soon - watch this space!
Making it easier to view and compare metadata and expand the relationships
Our Participation Reports provide a visualisation of the metadata that’s available via our free REST API. There’s a separate Participation Report for each member, and it shows what percentage of that member’s content includes nine key metadata elements. It’s an important tool to help those in the community understand our metadata more easily.
We have been working on a new version of Participation Reports, allowing more comparison between members, and extra metadata elements to communicate trustworthiness, including whether each member has thought about the long-term preservation of their content, and whether it has been added to a repository. There is a test version to look at in our Labs sandbox. Do take a look and provide feedback.
We’re continuing to work with funders through our growing funder membership, the Funder Advisory Group and other groups, including the Open Research Funders Group, the HRA, Altum, Europe PMC, and the ORCID Funder Interest Group. And we’re continuing to build the important relationships between funding and outputs (see Dominika Tkaczyk’s recent report) and engage with this key audience for research integrity.