Blog on Crossref

Innovation in scientific publishing and its implications for Crossref DOI registration practices - MetaROR’s approach

Ludo Waltman — Tue, 03 Feb 2026 00:00:00 +0000

A couple of months ago, Ludo Waltman and André Brasil raised some questions about good practices for Crossref DOI registration, asking for input from the scholarly communication community. In this post, Ludo and André reflect on the input received and discuss the approach to DOI registration that the MetaROR publish-review-curate platform is going to take.

Practices for assigning DOIs and structuring the associated metadata are not merely technical details. They shape how scholarly outputs are discovered, cited, evaluated, indexed, and preserved over time. As new models of publishing emerge, especially those that decouple dissemination from evaluation, these infrastructural choices increasingly influence what counts as a scholarly object, as well as how credit and accountability mechanisms are organized.

As editors of MetaROR (MetaResearch Open Review), a platform launched in 2024 and operating under the publish-review-curate model, we are interested in good practices for Crossref DOI registration in the context of innovative approaches to scientific publishing. In the earlier blog post, we invited members of the broader scholarly communication community to share their perspective on the following two questions:

1. For each article on the MetaROR platform, there is a corresponding article on a preprint server. Is it acceptable to have two Crossref DOIs, one registered by the preprint server and one registered by the MetaROR platform, for essentially the same article?
2. If Crossref DOIs are registered for articles on the MetaROR platform, should the articles be assigned the type ‘journal-article’ or the type ‘preprint’ in their Crossref metadata, or something else entirely?

We were pleasantly surprised by the level of interest in these two questions. We received about 15 responses from colleagues in the scholarly communication community. Some colleagues posted a reply at the bottom of our blog post. Others responded on social media (Bluesky, LinkedIn) or shared their perspective by email.

Below, we reflect on the responses received and outline the approach to Crossref DOI registration that MetaROR is going to take.

DOI registration for articles on the MetaROR platform

Colleagues offered mixed opinions on the question of whether articles on the MetaROR platform should have their own DOI, in addition to the DOI these articles have on the preprint server on which they were originally published. Some colleagues argued there is no good reason for registering DOIs for articles on the MetaROR platform and suggested this may cause confusion. One colleague reasoned that “if we want peer review to be something more ongoing and evolve beyond a single point in time judgment”, our approach should be to “better map the connections between events” rather than registering a new DOI each time an article has been peer-reviewed.

However, other colleagues expressed support for registering DOIs for articles on the MetaROR platform. One colleague pointed out that this “allows the user to reference the exact artefact they have consulted”. This colleague also reminded us that in the past “people were worried about having a different DOI for a preprint and another for a VoR (version of record)”, while nowadays this is a generally accepted practice. Another colleague emphasized the value of decentralization and suggested that we “let a thousand DOIs bloom”. Authors of an article peer-reviewed by MetaROR argued in favor of “an overarching DOI for the full package (preprint, reviews, author response and link to updated preprint)”, which in their view would make MetaROR’s “process more coherent”.

Having considered the various arguments in favor of or against registering DOIs for articles on the MetaROR platform, we feel the arguments in favor are more compelling. Our perspective is that an article on the MetaROR platform differs in a meaningful way from the corresponding article on a preprint server, since the article on the MetaROR platform has been enriched with an evaluation by peer reviewers and editors. MetaROR provides a carefully curated package that includes not only the article itself, but also review reports and an editorial assessment. In our view, this justifies registering DOIs for articles on the MetaROR platform. We also see DOI registration for articles on the MetaROR platform as a way to promote appropriate recognition for authors of articles peer-reviewed by MetaROR, similar to the way authors get recognition for articles published in traditional peer-reviewed journals.

Of course, when an article has multiple versions, each with its own DOI, it is important to establish a link between the different DOIs, indicating that the DOIs are associated with the same work. This is as important for articles published first on a preprint server and then on a platform such as MetaROR as it is for articles published first on a preprint server and then in a peer-reviewed journal. In practice, we establish these links by registering relationships between DOIs in the associated metadata. In this way, we ensure that indexing services, discovery systems, and research analytics tools are able to recognize that the DOIs refer to different manifestations of the same work rather than independent outputs.
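To illustrate how a downstream tool might use such relationships, here is a minimal Python sketch that groups DOIs into works by following same-work links. It rests on several assumptions: the record shape (`DOI`, `relation`, `id-type`, `id`) is modeled on what the Crossref REST API returns, the DOIs under the `10.5555` prefix are made up for illustration, and the set of relation types treated as same-work links is our own choice for the sketch rather than a description of MetaROR's actual metadata.

```python
from collections import defaultdict

# Relation types treated as "same work" links. This selection is an
# assumption made for the sketch, not a documented MetaROR convention.
SAME_WORK_RELATIONS = {"is-preprint-of", "has-preprint", "is-version-of", "has-version"}


def group_by_work(records):
    """Group DOIs into clusters connected by same-work relations (union-find)."""
    parent = {}

    def find(doi):
        parent.setdefault(doi, doi)
        while parent[doi] != doi:
            parent[doi] = parent[parent[doi]]  # path halving
            doi = parent[doi]
        return doi

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for record in records:
        source = record["DOI"].lower()  # DOIs are case-insensitive
        find(source)  # make sure isolated records are registered too
        for rel_type, targets in record.get("relation", {}).items():
            if rel_type not in SAME_WORK_RELATIONS:
                continue  # e.g. reviews point at a work but are not versions of it
            for target in targets:
                if target.get("id-type") == "doi":
                    union(source, target["id"].lower())

    clusters = defaultdict(set)
    for doi in list(parent):
        clusters[find(doi)].add(doi)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical records: a preprint, the peer-reviewed article linked to it,
# a review report, and an unrelated article.
records = [
    {"DOI": "10.5555/preprint.v1", "type": "posted-content"},
    {"DOI": "10.5555/metaror.v1", "type": "journal-article",
     "relation": {"has-preprint": [{"id-type": "doi", "id": "10.5555/preprint.v1"}]}},
    {"DOI": "10.5555/review.1", "type": "peer-review",
     "relation": {"is-review-of": [{"id-type": "doi", "id": "10.5555/metaror.v1"}]}},
    {"DOI": "10.5555/unrelated", "type": "journal-article"},
]

for cluster in group_by_work(records):
    print(cluster)
```

In this sketch the preprint and the peer-reviewed article end up in one cluster, while the review report and the unrelated article each stay on their own. A tool that ignored the relationship metadata would instead count the preprint and the peer-reviewed article as two independent outputs.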

Record type for articles on the MetaROR platform

Our second question is about the record type to be used when registering a Crossref DOI for an article on the MetaROR platform. Many colleagues who provided input on this question argued there is a need for a new Crossref record type for ‘reviewed preprints’.

We feel the idea of such a new record type is interesting and its pros and cons deserve further consideration. However, any solution that requires changes in Crossref’s metadata schema will take time to realize, while for MetaROR we need a solution in the short term. At the moment, the most obvious options for MetaROR therefore seem to be to use either the record type ‘journal-article’ or the record type ‘preprint’ (which is in fact a subtype of the record type ‘posted-content’).

The use of the record type ‘preprint’ seems somewhat problematic to us, because preprints are typically understood to be articles that have not yet been formally peer-reviewed. In a way, articles on the MetaROR platform are the opposite of this, since these articles have undergone formal peer review. An article on the MetaROR platform is part of a package that also includes review reports and an editorial assessment. Such a package provides readers with a more informed understanding of an article than what they get from reading only the article itself. For this reason, we do not consider the record type ‘preprint’ to be suitable for articles on the MetaROR platform.

Instead of the record type ‘preprint’, we have decided to use the record type ‘journal-article’ for articles on the MetaROR platform. The record type ‘journal-article’ is intended for articles published in journals. To be clear, MetaROR considers itself a ‘platform’, not a ‘journal’. However, the distinction between ‘platforms’ and ‘journals’ is not very well defined and the choice of terminology therefore involves a certain degree of arbitrariness. Moreover, articles on the MetaROR platform have been formally evaluated, and in that sense they resemble articles in traditional peer-reviewed journals. Although the nature of the evaluation is different (i.e., MetaROR provides a narrative assessment, while traditional journals provide a ‘stamp of approval’), we feel the resemblance justifies the use of the record type ‘journal-article’. We also hope that the use of this record type will help to ensure that articles evaluated by publish-review-curate (PRC) platforms are treated similarly to articles evaluated by traditional journals, advancing beyond more conservative ways of dealing with articles on PRC platforms.

There is a precedent for using the Crossref record type ‘journal-article’ for articles evaluated by PRC platforms. For over a decade, this approach has been used by platforms operated by F1000, such as F1000Research, Gates Open Research, Open Research Europe, and Wellcome Open Research. The approach we are taking at MetaROR is similar to the approach taken by these platforms. At the same time, our approach is different from the approach of eLife, another prominent PRC platform. eLife uses the record type ‘preprint’ for all versions of an article on its platform except for the version that the authors consider to be final and that they choose to designate as the ‘version of record’. This version has the record type ‘journal-article’.
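One practical consequence of this choice is that a tool keying on record type alone will label a MetaROR article as a journal article. The following minimal Python sketch shows such a classification, assuming records shaped like those returned by the Crossref REST API, where ‘preprint’ appears as a `subtype` of the type `posted-content`. The function name and the labels it returns are hypothetical and chosen only for illustration.

```python
def describe_record_type(work):
    """Return a coarse, human-readable label for a Crossref-style work record."""
    record_type = work.get("type")
    subtype = work.get("subtype")
    if record_type == "posted-content":
        # 'preprint' is one subtype of the broader 'posted-content' type.
        return "preprint" if subtype == "preprint" else "posted content (other)"
    if record_type == "journal-article":
        return "journal article"
    if record_type == "peer-review":
        return "peer review"
    return "other"


# Hypothetical records for illustration only.
print(describe_record_type({"type": "posted-content", "subtype": "preprint"}))
print(describe_record_type({"type": "journal-article"}))
print(describe_record_type({"type": "peer-review"}))
```

A classifier like this cannot tell a peer-reviewed article on a PRC platform apart from an article in a traditional journal, nor a reviewed preprint from an unreviewed one, which is one way of seeing why several respondents asked for a dedicated record type.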

Summary of MetaROR’s approach to Crossref DOI registration

Figure 1 summarizes MetaROR’s approach to Crossref DOI registration. The figure considers the situation in which an article went through two rounds of peer review by MetaROR. Both rounds of peer review involved two reviewers. After two rounds of peer review by MetaROR, the article was published in a journal. We emphasize that journal publication is optional in MetaROR’s PRC approach. It is included in Figure 1 for the sake of completeness.

Figure 1: MetaROR’s approach to Crossref DOI registration

Each element in Figure 1 represents an item that has its own Crossref DOI. The shape of an element indicates the Crossref record type of an item (‘preprint’, ‘journal-article’, ‘peer-review’). MetaROR is responsible for the blue elements in the figure. The gray elements are the responsibility of other actors, either a preprint server or a journal. Arrows represent relationships between items. These relationships are captured in the Crossref metadata of the various items.

Figure 1 shows how MetaROR treats articles, review reports, editorial assessments, and author responses as first-class research objects. Each object has its own DOI, while the objects are linked through structured metadata. Assigning DOIs to review reports, editorial assessments, and author responses is central to our commitment to transparency, recognition, and reuse of evaluative contributions.

We note that Figure 1 assumes each version of an article on a preprint server has its own DOI. This is indeed how DOI registration is handled by many preprint servers, such as the OSF servers (e.g., MetaArXiv, PsyArXiv, SocArXiv), ChemRxiv, Research Square, and Preprints.org. However, some preprint servers use a single DOI for all versions of an article. This is the case for bioRxiv and medRxiv and also for arXiv, which registers DOIs with DataCite rather than Crossref. In the future, we hope these preprint servers will also adopt versioned DOIs.

Outlook

Over the past 25 years, practices for registering DOIs and associated metadata have evolved along with broader developments in the scholarly communication landscape. Inevitably, DOI registration practices will always lag behind the most recent developments in scholarly communication. From this point of view, the lack of agreement on good practices for DOI registration in the context of PRC platforms is not surprising. This lack of agreement can in fact be seen as part of a larger discussion about the pros and cons of different infrastructural approaches for handling ‘preprint review metadata’, including for instance the COAR Notify approach and the DocMaps approach.

MetaROR’s approach to DOI registration demonstrates both the power and richness of Crossref’s metadata schema and its limitations. As discussed above, several colleagues who responded to our earlier blog post consider the lack of a record type for ‘reviewed preprints’ to be a significant limitation. With the growing interest in PRC models for scientific publishing, there appears to be a need to systematically evaluate possible improvements that can be made to Crossref’s metadata schema to offer better support for new approaches to scientific publishing.

We see this not only as a technical challenge but also as an issue of infrastructure governance. We therefore invite further dialogue between DOI registration agencies, other metadata infrastructures, preprint servers, PRC platforms, and indexing services to explore pathways for improving metadata standards, whether through new record types, extended relationship vocabularies, or shared best practices. We hope our experiences with MetaROR will contribute to the collective effort needed to ensure that emerging models of scholarly communication are represented accurately, transparently, and responsibly in the scholarly record.

Crossref note: This discussion chimes with related plans for extending our schemas: more granular vocabulary for items within journal articles, preprints, reviews, and others; clearer relationship types; and support for the forthcoming NISO JAV recommendations. Our Preprint Advisory Group will discuss the topic this year, and our Metadata Advisory Group has both ‘journal article type vocab’ and ‘relationships’ on its radar for 2026. We look forward to engaging further on this topic as we work towards more flexible schemas in support of the Research Nexus.

A spotlight on our community in Indonesia

Susan Collins — Wed, 28 Jan 2026 00:00:00 +0000

A translation in Bahasa Indonesia follows the English text.

As Crossref celebrated its 25th anniversary last year, we are highlighting some of the most active and engaged regions in our global community.

Over the past 25 years, the makeup of Crossref membership has evolved significantly; founded by a handful of large publishers, we now have more than 24,000 members representing 165 countries. Nearly two-thirds of them self-identify as universities, libraries, government agencies, foundations, scholar publishers, and research institutions.

Indonesia is by far the most dynamically growing region in the Crossref community. Each year since 2017, more new members have joined from Indonesia than from any other country. There are now over 4,400 members based in Indonesia who have registered the metadata for more than 2.6 million works, connecting their research to the global community.

Indonesia also happens to be the largest user of OJS globally, with close to 20,000 journals publishing on the platform. Most journals are published by universities, research institutions, and government agencies.

There is a strong emphasis on publishing as part of completing a university degree. The Ministry of National Education policy requires all students to publish their research before graduation. To provide opportunities and accessible platforms for publication, Indonesian universities and faculties have established journals to help their students meet these requirements for graduation.

Most journals in Indonesia are indexed in SINTA (Science and Technology Index), which is managed by the Ministry of Higher Education, Science, and Technology (MoHEST). The aim of SINTA is to improve journal quality, facilitate assessment, and increase the competitiveness of Indonesian journals. The use of DOIs is a requirement for indexing on the platform.

Members know the value of persistent identifiers for their content, but many also realise the value of Crossref’s commitment to open metadata and the open scholarly record. Being a member of Crossref means being part of a larger community. While DOIs may be required for national indexing, organisations have various reasons for becoming Crossref members. One of the most important factors is to increase the global visibility of their content and, therefore, increase the impact of their publications.

“We feel like we’re part of the Crossref community because we don’t just use your service; we contribute to it. By providing DOIs and metadata, we’re helping to build the open scholarly record that benefits everyone. Being a part of the Crossref network is more than just being a member—it’s about a shared vision. We see ourselves as active contributors. Every time we register a DOI and provide metadata, we add a new link to the global chain of knowledge. This helps ensure our research can be easily found, cited, and connected to other works, which benefits everyone.” — Nita Nurdiana, Universitas PGRI Palembang

We have very dedicated ambassadors based in Indonesia who advocate for Crossref’s mission: Fauji Nurdin ST. Mudo and Zulidyana Rusnalasari. Each has been instrumental in organising in-person events and webinars for members, as well as in representing Crossref at events throughout the region.

In October, as part of our 25th Anniversary celebration, the ambassadors, with the support of our Sponsor Relawan Jurnal Indonesia (RJI), held a satellite event in Medan, which brought together participants from universities, publishers, government agencies, research institutes, non-governmental organisations, libraries, and museums. It provided a forum for dialogue around key topics in scholarly publishing.

Crossref 25th Anniversary Satellite Event, Medan, October 2025

The majority of members in Indonesia work through one of our regional sponsors. Sponsors provide support to smaller organisations that often face financial, technical, and language barriers, making membership challenging. Their knowledge of the unique needs of their local publishing community and extensive networks help organisations learn more about Crossref in a more accessible way.

Our first sponsor in Indonesia, Relawan Jurnal Indonesia (RJI), joined in 2017; we now have eight sponsors that together support over 3,900 members in Indonesia.

Our sponsors are also key partners in helping us engage with the community, facilitating webinars and supporting our in-person meetings. In August 2024, in collaboration with RJI, we held a two-day in-person event in Jakarta, attended by over 100 members and joined by our sponsors and ambassadors. Along with discussions on the fundamentals of Crossref and the role of quality metadata, we heard from Ahmad Saefudin Surapermana, a sub-coordinator from ISSN Indonesia. Because so many members in Indonesia use the OJS publishing platform, colleagues from the Public Knowledge Project (PKP) joined us for a session on OJS plugins and an upgrade workshop for OJS system administrators. We continue to receive feedback from members that more regular in-person and online events should be held to facilitate connections and share developments.

Crossref Jakarta, August 2024

While interest in Crossref among this community is ever-growing, there are still pain points for Indonesian members. Though many join through a Sponsor, some report challenges with metadata deposits, errors, and submission failures, and others struggle to navigate the documentation when technical issues arise. Some members have noted that our metadata requirements can be complex and that they struggle to achieve metadata completeness in their records. These concerns can be particularly challenging for institutions with limited resources.

To provide additional support, we developed a series of webinars in Bahasa Indonesia, covering topics such as using our Participation Reports to assess metadata completeness and workshops on best practices for using OJS. These webinars have been some of the most attended by our members. The strong interest reflects the value these sessions bring to our community, and we continue to receive requests for additional training opportunities. In total, we welcomed 1,044 registrants and 501 attendees across our webinars last year. This level of participation highlights the importance of ongoing training and the enthusiasm of our members to engage, learn, and grow together.

Despite some challenges, many members feel there is significant value in being a Crossref member. Including their metadata in Crossref enhances the visibility and accessibility of their journals globally. By providing the infrastructure of persistent identifiers and open metadata, Crossref ensures that scholarly outputs are discoverable, connected, and part of a global research record.

“Crossref’s vision of creating open, connected scholarly infrastructure directly supports our university’s core mission of advancing knowledge and research impact. As an academic institution, we rely on Crossref’s DOI system to ensure our faculty publications and institutional repository content remain permanently accessible and properly cited. This infrastructure is essential for maximizing the visibility and impact of our research output, which directly contributes to our university’s reputation and ranking. Additionally, Crossref’s commitment to open scholarly communication aligns with our values of making knowledge freely accessible, supporting our open access initiatives and helping us demonstrate research impact to funding bodies and stakeholders. The persistent linking system also supports our students and researchers in conducting reliable literature reviews and building upon existing scholarship with confidence that their citations will remain valid over time.” — Member from STIS Darul Falah, Indonesia

Ratna Galuh Manika Trisista, from Universitas Islam Jakarta, also illustrated how joining Crossref and stewarding rich metadata support the development of Indonesian journals in her presentation, “Our Metadata Story: Improving Citation Visibility through Reference Linking”, at the Crossref2025 Annual Meeting.

As membership growth in Indonesia continues, we look forward to building relationships within the community, supported by our ambassadors, sponsors, and members’ contributions.

Much of the information in this report comes from a survey sent to our members, sponsors, and ambassadors in Indonesia. We appreciate all the feedback, comments, and suggestions we received, and we look forward to continuing our collaborations and increasing our engagement with the community.

Translation in Bahasa Indonesia

Tahun lalu Crossref merayakan usia ke-25, dan momen ini menjadi kesempatan istimewa untuk menyoroti wilayah-wilayah yang paling aktif dan berperan penting dalam komunitas global Crossref. Salah satunya adalah Indonesia.

Dalam perjalanan 25 tahun tersebut, keanggotaan Crossref telah berkembang pesat. Yang awalnya hanya digagas oleh beberapa penerbit besar, kini Crossref menaungi lebih dari 24.000 anggota dari 165 negara. Menariknya, hampir dua pertiga anggota Crossref saat ini berasal dari perguruan tinggi, perpustakaan, lembaga pemerintah, yayasan, penerbit ilmiah, serta institusi riset, menunjukkan semakin kuatnya peran komunitas akademik dalam ekosistem publikasi global.

Indonesia menjadi wilayah dengan pertumbuhan komunitas paling dinamis di Crossref. Sejak tahun 2017, Indonesia secara konsisten mencatat jumlah anggota baru terbanyak setiap tahunnya. Saat ini, lebih dari 4.400 anggota Crossref berbasis di Indonesia telah mendaftarkan metadata untuk lebih dari 2,6 juta karya ilmiah. Kontribusi ini tidak hanya memperkuat visibilitas riset nasional, tetapi juga menghubungkan pengetahuan yang dihasilkan di Indonesia dengan komunitas ilmiah global.

Pertumbuhan ini tentu tidak terjadi begitu saja. Ia lahir dari kerja kolektif para pengelola jurnal, penerbit perguruan tinggi, editor, dan komunitas akademik di Indonesia yang terus belajar, beradaptasi, dan saling berbagi praktik baik dalam tata kelola publikasi ilmiah. Semakin banyak institusi yang menyadari pentingnya metadata yang berkualitas, transparansi dalam publikasi, serta keterhubungan riset melalui DOI sebagai fondasi visibilitas dan keberlanjutan ilmu pengetahuan.

Di berbagai forum, pelatihan, dan pendampingan komunitas, semangat kolaborasi ini terus tumbuh. Komunitas Crossref di Indonesia tidak hanya berkembang secara kuantitas, tetapi juga menunjukkan peningkatan kualitas dalam pengelolaan metadata, kepatuhan terhadap standar internasional, serta komitmen terhadap praktik publikasi ilmiah yang etis dan terbuka. Inilah yang menjadikan Indonesia bukan sekadar pengguna, melainkan kontributor aktif dalam ekosistem pengetahuan global.

Indonesia juga dikenal sebagai pengguna Open Journal Systems (OJS) terbesar di dunia, dengan hampir 20.000 jurnal yang dikelola dan diterbitkan melalui platform ini. Sebagian besar jurnal tersebut diterbitkan oleh perguruan tinggi, lembaga riset, dan instansi pemerintah, yang menunjukkan kuatnya peran institusi akademik dan publik dalam ekosistem publikasi ilmiah nasional.

Budaya publikasi ilmiah di Indonesia sangat erat kaitannya dengan dunia pendidikan tinggi. Kebijakan Kementerian Pendidikan Tinggi, Sains, dan Teknologi mewajibkan mahasiswa untuk mempublikasikan hasil penelitiannya sebagai salah satu syarat kelulusan. Untuk menjawab kebutuhan tersebut sekaligus menyediakan ruang publikasi yang inklusif dan mudah diakses, banyak universitas dan fakultas di Indonesia membentuk serta mengelola jurnal ilmiah mereka sendiri sebagai wadah bagi karya mahasiswa.

Sebagian besar jurnal di Indonesia terindeks dalam SINTA (Science and Technology Index) yang dikelola oleh Kementerian Pendidikan Tinggi, Sains, dan Teknologi (MoHEST). SINTA bertujuan untuk meningkatkan kualitas jurnal, memfasilitasi proses penilaian, serta mendorong daya saing jurnal ilmiah Indonesia. Dalam konteks ini, penggunaan DOI menjadi salah satu persyaratan penting agar jurnal dapat terindeks di platform tersebut.

Para anggota Crossref di Indonesia memahami pentingnya persistent identifiers untuk memastikan keberlanjutan dan keterlacakan karya ilmiah mereka. Namun, semakin banyak pula yang menyadari nilai lebih dari komitmen Crossref terhadap metadata terbuka dan rekam jejak ilmiah yang terbuka. Menjadi anggota Crossref bukan sekadar memenuhi kewajiban teknis, melainkan juga menjadi bagian dari komunitas global yang lebih besar. Meski DOI dibutuhkan untuk kepentingan pengindeksan nasional, banyak organisasi memilih bergabung dengan Crossref demi meningkatkan visibilitas global konten mereka—dan pada akhirnya, memperluas dampak dari publikasi yang dihasilkan.

“Kami merasa menjadi bagian dari komunitas Crossref karena kami tidak hanya menggunakan layanannya, tetapi juga berkontribusi di dalamnya. Melalui pendaftaran DOI dan penyediaan metadata, kami ikut membangun rekam jejak keilmuan terbuka yang bermanfaat bagi semua. Menjadi bagian dari jejaring Crossref bukan sekadar status keanggotaan—ini adalah tentang visi bersama. Kami melihat diri kami sebagai kontributor aktif. Setiap kali mendaftarkan DOI dan metadata, kami menambahkan satu mata rantai baru dalam jejaring pengetahuan global. Hal ini memastikan riset kami dapat ditemukan, disitasi, dan terhubung dengan karya lain, sehingga memberi manfaat bagi semua pihak.” — Nita Nurdiana, Universitas PGRI Palembang

Semangat kontribusi ini juga diperkuat oleh peran para ambassador Crossref di Indonesia yang dengan penuh dedikasi mengadvokasi misi Crossref. Fauji Nurdin ST. Mudo dan Zulidyana Rusnalasari telah menjadi penggerak penting dalam penyelenggaraan berbagai kegiatan, mulai dari acara luring hingga webinar untuk para anggota, sekaligus mewakili Crossref dalam beragam forum di berbagai wilayah Indonesia.

Pada bulan Oktober lalu, sebagai bagian dari perayaan ulang tahun ke-25 Crossref, para ambassador ini—dengan dukungan sponsor dari Relawan Jurnal Indonesia (RJI)—menyelenggarakan sebuah acara satelit di Medan. Kegiatan ini mempertemukan peserta dari perguruan tinggi, penerbit, instansi pemerintah, lembaga riset, organisasi non-pemerintah, perpustakaan, hingga museum. Acara tersebut menjadi ruang dialog yang hidup untuk membahas isu-isu kunci dalam dunia publikasi ilmiah dan memperkuat jejaring kolaborasi lintas sektor.

Crossref 25th Anniversary Satellite Event, Medan, October 2025

Sebagian besar anggota Crossref di Indonesia bergabung dan beraktivitas melalui sponsor regional. Para sponsor ini berperan penting dalam mendampingi organisasi-organisasi kecil yang kerap menghadapi berbagai tantangan—mulai dari keterbatasan finansial, kendala teknis, hingga hambatan bahasa—yang membuat proses keanggotaan menjadi tidak selalu mudah. Dengan pemahaman yang kuat terhadap kebutuhan khas komunitas penerbitan lokal serta jejaring yang luas, para sponsor membantu organisasi mengenal dan memanfaatkan Crossref dengan cara yang lebih ramah dan mudah diakses.

Sponsor pertama Crossref di Indonesia, Relawan Jurnal Indonesia (RJI), bergabung pada tahun 2017. Hingga kini, Indonesia telah memiliki delapan sponsor yang secara kolektif mendukung lebih dari 3.900 anggota di seluruh Indonesia. Peran ini menjadikan para sponsor sebagai tulang punggung pertumbuhan dan keberlanjutan komunitas Crossref di tanah air.

Lebih dari sekadar pendamping teknis, para sponsor juga menjadi mitra strategis dalam membangun keterlibatan komunitas—mulai dari memfasilitasi webinar hingga mendukung pertemuan luring. Pada Agustus 2024, misalnya, Crossref bekerja sama dengan RJI menyelenggarakan acara luring selama dua hari di Jakarta, yang dihadiri oleh lebih dari 100 anggota. Selain diskusi mengenai dasar-dasar Crossref dan pentingnya metadata berkualitas, kegiatan ini juga menghadirkan Ahmad Saefudin Surapermana dari ISSN Indonesia, serta para sponsor dan ambassador Crossref. Mengingat banyaknya anggota di Indonesia yang menggunakan platform OJS, rekan-rekan dari Public Knowledge Project (PKP) turut bergabung untuk memberikan sesi khusus tentang plugin OJS serta lokakarya peningkatan versi bagi para administrator sistem OJS. Hingga kini, Crossref terus menerima masukan dari para anggota bahwa kegiatan luring dan daring yang lebih rutin sangat dibutuhkan—tidak hanya untuk memperkuat jejaring, tetapi juga untuk berbagi perkembangan terbaru dalam dunia publikasi ilmiah.

Crossref Jakarta, August 2024

Seiring dengan meningkatnya minat komunitas ini terhadap Crossref, masih terdapat sejumlah tantangan (pain points) yang dirasakan oleh anggota di Indonesia. Meskipun banyak yang bergabung melalui sponsor, sebagian anggota melaporkan kendala dalam proses deposit metadata, munculnya error, hingga kegagalan pengiriman data. Ada pula yang merasa kesulitan menavigasi dokumentasi teknis ketika menghadapi permasalahan sistem. Beberapa anggota juga menilai bahwa persyaratan metadata Crossref cukup kompleks, sehingga mereka mengalami tantangan dalam mencapai kelengkapan metadata pada rekaman mereka. Kondisi ini tentu menjadi lebih berat bagi institusi dengan sumber daya yang terbatas.

To provide additional support, Crossref then developed a series of webinars in Bahasa Indonesia covering practical topics such as using Participation Reports to assess metadata completeness, as well as best-practice workshops on using OJS. These webinars were among our best-attended activities. The strong interest reflects the value these sessions bring to our community, and Crossref continues to receive requests for additional training. In total, we welcomed 1,044 registrants and 501 attendees to webinars throughout 2025. This level of participation underscores the importance of ongoing training and our members’ enthusiasm to engage, learn, and grow together.

Despite these challenges, many members still see strategic value in Crossref membership. Including journal metadata in Crossref significantly increases the visibility and accessibility of Indonesian journals at the global level. Through the infrastructure of persistent identifiers and open metadata that Crossref provides, scholarly outputs become easier to find, are linked to one another, and are recorded as part of the global research record.

“Crossref’s vision of building open and interconnected scholarly infrastructure strongly supports our university’s core mission of advancing knowledge and research impact. As an academic institution, we rely on Crossref’s DOI system to ensure that our faculty publications and institutional repository content remain permanently accessible and properly cited. This infrastructure is essential for maximizing the visibility and impact of our research outputs, which contributes directly to the university’s reputation and rankings. In addition, Crossref’s commitment to open scholarly communication aligns with our values of opening access to knowledge as widely as possible, supporting open access initiatives, and helping us demonstrate research impact to funding bodies and stakeholders. This sustained system of linking also supports our students and researchers in conducting reliable literature reviews, confident that the citations they use will remain valid in the long term.”
– Member from STIS Darul Falah, Indonesia

A similar experience was shared by Ratna Galuh Manika Trisista of Universitas Islam Jakarta, who described how participating in Crossref and managing rich metadata can support the development of Indonesian journals. She covered this in her presentation, “Our Metadata Story: Improving Citation Visibility through Reference Linking,” at the Crossref Annual Meeting 2025. As Crossref membership in Indonesia continues to grow, we look forward to strengthening our relationships with the community, with the support of ambassadors, sponsors, and the active contributions of the members themselves.

Most of the information in this report comes from a survey sent to Crossref members, sponsors, and ambassadors in Indonesia. We greatly appreciate all the feedback, comments, and suggestions we received, and we hope to continue collaborating and deepening our engagement with the community in the future.

Insights from a roundtable on author affiliation metadata

Amanda French — Thu, 22 Jan 2026 00:00:00 +0000

It’s been said that Americans are unusual in tending to ask “Where do you work?” as an initial question upon introduction to a new acquaintance, indicating a perhaps unhealthy preoccupation with work as identity. But in the context of published research, “What is this author’s affiliation?” is a question of global importance that goes beyond just wanting to know the name – and perhaps prestige level – of the place a researcher works.

When collected, used, and analyzed at scale, data about author affiliations can provide intriguing insights about international collaboration trends, signal trust and lack of trust in particular research institutions, generate business intelligence for publishers, help universities track the work their researchers do, help funders demonstrate the impact of their funding, and much more.

In November we partnered with OA Switchboard to organize a roundtable on author affiliation metadata for the Crossref community, service and infrastructure providers, production vendors, data scientists, researchers, and librarians. We aimed to bring together scholarly information professionals with many diverse perspectives; ultimately, participants from more than 40 organizations joined the roundtable to share their experiences and their thoughts.

In focusing on a single type of metadata, we hoped to focus our discussions, as well. Similarly, in October the Barcelona Declaration on Open Research Information organized a roundtable on “Moving Funding Metadata Forward” in which it became clear that “improving the quality and coverage of funding metadata was on the agenda of many organisations and there was a strong interest in collaborating on practical next steps.”

While many of the issues and solutions discussed at both roundtables are similar, in the course of the author affiliation metadata roundtable we identified some unique challenges as well as benefits related to this particular flavor of information. In this blog post, I’ll share these insights.

Insights from presenters

I opened the roundtable with a brief introduction and a working definition of affiliation metadata: names and/or identifiers such as Research Organization Registry (ROR) IDs for organizations where research was conducted or with which authors and contributors are associated, usually officially, as in their place of employment.

Next, to create a shared context for discussion, we heard four presentations on the current state of author affiliation metadata, its importance, and Crossref’s ongoing initiative to enhance it automatically.

Nees Jan van Eck of Leiden University’s Center for Science and Technology Studies (CWTS) shared observations on the state of author affiliations from a preprint titled “Crossref as a source of open bibliographic metadata” that presents the findings of an analysis performed annually since 2021. Nees’s key points:

Crossref is a foundational data source for bibliographic metadata.
Affiliation metadata is available for only 1 out of 3 journal articles in Crossref for the period 2023-2024.
There is considerable variation in the extent to which Crossref members deposit affiliation metadata.
Downstream sources try to fill gaps using suboptimal approaches, leading to missing, inaccurate, and inconsistent linking of publications to institutions.
Publications lacking affiliation metadata in Crossref are less visible in bibliometric applications, analyses, studies, and tools (such as the open edition of the Leiden Ranking of over 2800 universities).

Next, Yvonne Campfens of OA Switchboard reiterated the desirability of the Crossref community providing complete and accurate author affiliation metadata at the source. Yvonne called upon publishers to “Integrate metadata creation in your systems and workflows before publication and relay it throughout the editorial, production, and publication processes.”

Yvonne pointed out that in the context of managing Open Access agreements, publishers ought to keep in mind that providing good affiliation metadata improves customer satisfaction, since institutions and consortia need to have that information in order to connect research to the correct organization. In closing, Yvonne featured best practices from OA Switchboard’s Data Quality Challenge:

eLife captures affiliations at submission with “author select,” ensuring that ROR IDs are introduced early and verified before publication, coupled with a quality assurance process during proofing. (See also our piece on Metadata Excellence Award winner eLife.)
EMS Press captures metadata via manuscript extraction as early as at submission, building on globally valid identifiers whenever possible (ROR IDs, DOIs, ORCIDs).
Pensoft Publishers uses AI-assisted metadata extraction with human review and in-house metadata validation.
Beilstein-Institut performs post-acceptance metadata quality assurance through automation and expert review.
The Royal Society embeds metadata in OA payment and agreement workflows.
American Chemical Society (ACS) has a multi-method persistent identifier matching strategy with near-complete coverage.
The American Society for Microbiology (ASM) combines AI-powered submission tools with editorial oversight via expert manual checks. (See also our piece on Metadata Excellence Award winner ASM.)
Rockefeller University Press (RUP) maintains ROR IDs across the full publishing workflow with “author select” at submission through metadata deposits upon publication. (See also the ROR case study on RUP.)

Adam Day of Clear Skies Ltd began his talk by wryly framing the first and second rules of data science as contradictory: “Never fix data: always use sources that produce high-quality data in the first place,” but also “Get good at fixing data, because you will have to.” Adam went on to demonstrate the central role author affiliation metadata plays in research integrity investigations, displaying anonymized data for institutions with a high number of alerts. In conclusion, Adam reiterated the importance of author affiliation metadata to research integrity efforts:

Data analysis is critical to research integrity.
Quality data helps enormously by giving oversight, saving time, and assisting investigations.

Lastly, our own Director of Technology Dominika Tkaczyk gave an account of our plans to enrich author affiliation metadata by matching organization name text strings to ROR IDs as part of our metadata matching initiative. A strategy for performing such matching has already been developed and tested and an open dataset of results made available. Tests on a set of 3,000 affiliations sampled from our metadata show that the strategy can be expected to match 95 million ROR IDs to organization names with 97.35% precision, an astronomical increase over the less than 1 million ROR IDs deposited in Crossref records to date.

Dominika concluded the presentation portion of the session by reiterating that our planned enrichment of author affiliation metadata

Will use flexible and transparent matching strategies (and open code),
Will welcome community participation in developing new strategies, and
Will be available in the REST API.

Automatic matching of organization names to ROR IDs in author affiliations cannot solve the problem of missing organization names, of course, but it represents a huge leap forward in addressing metadata quality issues.
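For metadata users who want to check how much affiliation information a set of records carries, the idea can be sketched in a few lines of Python. This is a minimal illustration and not Crossref's matching code. It assumes each record is a dictionary shaped roughly like a REST API work, with an `author` list whose entries hold an `affiliation` list of objects containing a `name` and, optionally, an `id` list whose items have `id-type` set to `ROR`. Those field names, the DOIs, and the ROR ID in the sample data are assumptions made for the example.

```python
from typing import Any

Work = dict[str, Any]


def ror_ids(work: Work) -> set[str]:
    """Collect ROR IDs attached to any author affiliation in one work."""
    found: set[str] = set()
    for author in work.get("author", []):
        for affiliation in author.get("affiliation", []):
            for identifier in affiliation.get("id", []):
                id_type = (identifier.get("id-type") or "").upper()
                if id_type == "ROR" and identifier.get("id"):
                    found.add(identifier["id"])
    return found


def has_affiliation(work: Work) -> bool:
    """True if at least one author has an affiliation name or identifier."""
    for author in work.get("author", []):
        for affiliation in author.get("affiliation", []):
            if (affiliation.get("name") or "").strip() or affiliation.get("id"):
                return True
    return False


def coverage(works: list[Work]) -> dict[str, float]:
    """Share of works with any affiliation and share with at least one ROR ID."""
    total = len(works)
    if total == 0:
        return {"with_affiliation": 0.0, "with_ror_id": 0.0}
    return {
        "with_affiliation": sum(has_affiliation(w) for w in works) / total,
        "with_ror_id": sum(bool(ror_ids(w)) for w in works) / total,
    }


# Hypothetical sample records, for illustration only.
SAMPLE_WORKS: list[Work] = [
    {
        "DOI": "10.5555/example-1",
        "author": [{
            "family": "A",
            "affiliation": [{
                "name": "Example University",
                "id": [{"id": "https://ror.org/00example", "id-type": "ROR"}],
            }],
        }],
    },
    {
        "DOI": "10.5555/example-2",
        "author": [{"family": "B", "affiliation": [{"name": "Example Institute"}]}],
    },
    {"DOI": "10.5555/example-3", "author": [{"family": "C", "affiliation": []}]},
]

if __name__ == "__main__":
    print(coverage(SAMPLE_WORKS))
```

In this sample, two of the three records carry an affiliation name but only one carries a ROR ID, which is the kind of gap that automatic matching is meant to narrow.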

All of our speakers’ presentations are available on Zenodo at https://doi.org/10.13003/661591chqlyw.

Insights into challenges

In the next stage of the event, participants broke into six breakout groups to identify factors contributing to incomplete or inaccurate affiliation metadata. Participants were pre-assigned to groups randomly by role to ensure a variety of perspectives in every discussion.

At least two participants, it should be noted, pointed out that it would be helpful to agree on a definition of “complete” and “accurate” affiliation metadata, which in itself is a challenge, and one we did not address in this roundtable. For instance, practices have recently trended away from defining a complete author affiliation in open metadata as including an institutional address, although many internal databases might include such information separately.

Even without such definitions, however, all six groups were able to identify several general areas for attention, and one participant provided a particularly helpful categorization of these areas that is largely reused here.

Inherent data complexity

Research organizations have names in different languages, abbreviations, and many other name variants.
Research organizations have frequent name changes, mergers, and rebranding.
Research organizations have different degrees, levels, and complexity of hierarchical granularity, and authors, publishers, and software systems are often misaligned as to which level in an organization’s structure is appropriate to use in a particular instance.
Research organizations often lack official policies on how affiliations should be written, leading to hundreds of variations for a single institution.

Author practices

Corresponding authors often submit information for all co-authors, which can lead to inaccuracies.
Many authors have multiple profiles across multiple submission systems, which can introduce errors.
Authors may have “octopus affiliations,” claiming affiliations with many institutions that are difficult to verify.
Authors may fail to update affiliations when changing institutions between manuscript acceptance and publication.
Authors may demonstrate “apathy” when repeatedly filling out submission forms, sometimes providing incomplete, inconsistent, or incorrect information.
On occasion, authors might even provide false or purchased affiliations, which of course is a significant research integrity concern.

Technical barriers

Many manuscript tracking and peer review systems, especially legacy systems, lack structured fields for affiliations or don’t support open organization identifiers like ROR.
Some systems limit authors to a single affiliation, despite many researchers having multiple institutional connections.
Some systems only collect affiliation information for the corresponding author.
Some systems link affiliations to user accounts instead of to publications.
Different systems use competing identifier registries, including proprietary identifier registries, creating interoperability challenges.

Publisher practices

Even when publishers improve current metadata collection practices, historical data correction is resource-intensive and often not prioritized.
Publishers collect affiliation information at submission but don’t ensure that it is maintained throughout all stages of the publication process and deposited in metadata.
Some publishers are unaware of the importance of author affiliation metadata or do not prioritize its improvement.
Some publishers deliberately choose not to deposit affiliation metadata to Crossref, viewing it as value-added information they’ve invested in curating.

Insights into solutions

Naturally, we didn’t rest at identifying challenges: after a break, we gathered in the same groups to brainstorm approaches to improving author affiliation metadata.

Adopt collective approaches

Collective action, where corrections and improvements made by various stakeholders flow back into shared systems, has historically worked for proprietary systems and could be even more powerful with open infrastructure.
Since those who do not provide metadata “upstream” will inevitably have it provided for them “downstream” by multiple separate entities using multifarious methods, provenance metadata indicating who asserted author affiliations and how (whether automatically or with the author’s or editor’s input) would help metadata users assess trust levels.
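One way to picture the provenance idea is as a small record that travels with each affiliation assertion. The sketch below is purely illustrative: the class, its field names, and the default trust ordering are hypothetical and were not proposed at the roundtable. It only shows how a metadata user might choose among competing assertions once the source and method are recorded.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AffiliationAssertion:
    """One claim that an author is affiliated with an organization."""
    organization_id: str  # e.g. a ROR ID
    asserted_by: str      # who made the claim, e.g. "publisher"
    method: str           # how it was made, e.g. "automatic-matching"


# Hypothetical ordering, most trusted first. Each user would set their own.
DEFAULT_TRUST_ORDER = ("author", "publisher", "automatic")


def preferred_assertion(
    assertions: Sequence[AffiliationAssertion],
    trust_order: Sequence[str] = DEFAULT_TRUST_ORDER,
) -> Optional[AffiliationAssertion]:
    """Return the assertion from the most trusted source, or None if empty."""
    if not assertions:
        return None

    def rank(assertion: AffiliationAssertion) -> int:
        # Sources missing from trust_order sort after all known sources.
        if assertion.asserted_by in trust_order:
            return list(trust_order).index(assertion.asserted_by)
        return len(trust_order)

    return min(assertions, key=rank)


if __name__ == "__main__":
    claims = [
        AffiliationAssertion("https://ror.org/00example", "automatic", "automatic-matching"),
        AffiliationAssertion("https://ror.org/00example", "publisher", "author-select"),
    ]
    print(preferred_assertion(claims))
```

Keeping the ordering as a parameter reflects the point made above: trust is assessed by the metadata user, so the record should carry the provenance and leave the judgment to them.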

Engage authors and institutions

Reach out to authors and institutions to educate them on the need for more consistent affiliation reporting, especially in terms of language, name format, and degree of hierarchical granularity.
Demonstrate the benefit to institutions of maintaining accurate records in registries like ROR, including abbreviations and name variants.
Publishers and/or software systems should allow authors to review (though not necessarily edit) affiliation information during the proofing process to verify accuracy. Authors should not, however, need to know, see, or use ROR IDs.

Improve the tech

Publishers would welcome submission systems that incorporate structured fields for author affiliations with well-designed auto-suggestions linked to ROR or other organization identifiers.
Making affiliation data mandatory at submission could significantly improve capture rates, although it would be important to ensure that independent researchers can use these systems as well.
Enable collection of affiliations for all authors, not just the corresponding author.
Pull in verified affiliation information from ORCID.
Increasingly, intelligent matching systems can be implemented to reduce author burden and perhaps also increase accuracy and completeness of metadata.
Better crosswalks between different organization identifier systems would make it vastly easier for publishers to maintain better metadata. Since open registries cannot include proprietary information, proprietary registries should provide their customers with crosswalks to all standard open identifiers.

Encourage publisher best practices

Publishers can use already-available tools to help assess and improve the quality of both new and legacy author affiliation metadata.
Share the benefits of improved author affiliation metadata for internal and external analytics, customer satisfaction, and research integrity.
Identify best practices in collecting and structuring author affiliation metadata.
Understand that the entire research ecosystem would benefit from publishers sharing collected affiliation data with Crossref.

It’s worth mentioning that these solutions are heterogeneous: not all strategies can be implemented by any one actor nor even by any one sector of our profession. Clearly, collaborative action is necessary for substantive change.

Moving forward

The affiliations metadata roundtable represented an important step in addressing affiliation metadata challenges in a productive and collaborative way. If there was a consensus, it was that while perfect completeness and accuracy of author affiliation metadata may not be achievable (or even definable), incremental improvements can substantially enhance the quality and availability of affiliation metadata for the entire scholarly information community.

Here at Crossref, we intend to use the insights from this roundtable to inform our support of the Crossref community, including publishers, service providers, and metadata users. We welcome your comments, questions, and suggestions on this issue! Share your thoughts with Amanda French at alfrench@crossref.org.

References

van Eck, N. J., & Waltman, L. (2025). Crossref as a source of open bibliographic metadata (No. smxe5_v2). MetaArXiv. https://doi.org/10.31222/osf.io/smxe5_v2
Tkaczyk, D. (2025). Crossref relationships involving research organisations [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.15254993
French, A., van Eck, N. J., Campfens, Y., Day, A., & Tkaczyk, D. (2026, January 19). Affiliations Metadata Roundtable 2025—All Presentations. https://doi.org/10.13003/661591chqlyw

Participating organizations


Africa PID Alliance / TCC Africa	Frontiers Media SA
American Association for Cancer Research (AACR)	Iowa State
American Chemical Society (ACS)	Kriyadocs / Exeter Premedia Services
American Physical Society (APS)	MDPI
American Society for Microbiology (ASM)	Noyam Publishers
Aptara	OpenAIRE / OpenOrgs
Australian Research Data Commons (ARDC)	Optica Publishing Group
Atypon	ORCID
Beilstein-Institut	Oxford University Press
California Digital Library (CDL)	Public Knowledge Project (PKP)
Cambridge University Press	Public Library of Science (PLOS)
Carnegie Mellon University	River Valley Technologies
CHORUS	Rockefeller University Press
Clarivate / Web of Science	SAGE Publications
Copernicus GmbH	Barcelona Declaration on Open Research Information
Curtin University / Curtin Open Knowledge Initiative (COKI)	Silverchair / ScholarOne
De Gruyter Brill	Springer Science & Business
Digital Science / Figshare	TNQTech
Digital Science / Symplectic Elements	University of Laval
eLife	University of Chicago Press
Elsevier BV	University of Split
Enago

The GEM program - Year Three and program expansion for 2026

Susan Collins — Wed, 14 Jan 2026 00:00:00 +0000

As Crossref membership continues to grow, finding ways to help organisations participate is an important part of our mission. Although Crossref membership is open to all organisations that produce scholarly and professional materials, cost and technical challenges can be barriers to joining for many.

Our Global Equitable Membership (GEM) Program aims to provide greater membership equity and accessibility to organisations in the world’s least economically advantaged countries. Eligible members pay no membership or record registration fees. Eligibility for the program is based on a member’s country. Seeing its effectiveness in increasing participation in the research nexus from previously underrepresented regions, this year we are expanding the GEM program to include 18 new countries.

Overview of the first 3 years of GEM

The program began in January 2023 with 214 existing members. By the end of 2025, we had 628 organisations under the GEM program. Of these, 535 are independent members, and 89 members work through one of our sponsors. To date, GEM program members have contributed approximately 334,000 works to the Research Nexus.

Global equitable membership	2023	2024	2025
New members joining	129	127	151
Total member count	327	458	628

Total number of Crossref GEM members by country until the end of 2025:

GEM country – alphabetically	Total no. of members	GEM country – alphabetically	Total no. of members
Afghanistan	29	Malawi	2
Bangladesh	167	Maldives	4
Benin	6	Mali	4
Bhutan	6	Marshall Islands	0
Burkina Faso	7	Mauritania	1
Burundi	3	Micronesia	0
Cambodia	14	Mozambique	2
Central African Republic	1	Myanmar	3
Chad	0	Nepal	60
Comoros	1	Nicaragua	2
Congo, Democratic Republic	24	Niger	0
Côte d’Ivoire	3	Rwanda	9
Djibouti	0	Samoa	0
Eritrea	0	São Tomé and Principe	0
Ethiopia	17	Senegal	7
Gambia	0	Sierra Leone	2
Ghana	38	Solomon Islands	0
Guinea	0	Somalia	10
Guinea-Bissau	0	South Sudan	0
Guyana	3	Sri Lanka	31
Haiti	2	Sudan	14
Honduras	3	Tajikistan	8
Kiribati	0	Tanzania, United Republic of	28
Kosovo	9	Togo	1
Kyrgyz Republic	27	Tonga	0
Lao, People’s Democratic Rep.	5	Tuvalu	0
Lesotho	0	Uganda	23
Liberia	1	Vanuatu	0
Madagascar	5	Yemen	37
		Zambia	8

Membership Density in GEM Program Countries until the end of 2025

Program expansion in 2026

Starting on 1st of January 2026, we’re excited to invite organisations from Angola, Belize, Cabo Verde, Cameroon, Republic of Congo, Dominica, Eswatini, Fiji, Grenada, Kenya, Nigeria, Pakistan, Papua New Guinea, Saint Lucia, Saint Vincent and the Grenadines, Suriname, Timor Leste, and Uzbekistan to join Crossref and register their content and metadata with us without membership or record registration fees. There are 711 existing Crossref members based in these countries who are now eligible for the program, bringing the overall number of GEM members to 1339 across 77 countries (that’s close to 5% of all Crossref members).

In creating our eligibility list, we refer to existing sources. For the first three years of the program, our list was predominantly based on the World Bank’s International Development Association (IDA) classification. In 2026, we leveraged additional sources to curate our list, resulting in the inclusion of 18 new countries in the program. Following community feedback, we now refer to the IDA, the IDA Blend List, and the United Nations Least Developed Countries list. In our choices, we also keep abreast of the global situation and conversations about supporting equitability in scholarly publishing, and in the future we may consider other factors too.

We will review our lists and the eligibility criteria annually and note any changes on our website. Members whose country moves on or off the GEM Program will be notified of any upcoming fees (or the removal of fees) with adequate time to plan and budget accordingly. Although the GEM program reduces financial barriers, many small organisations may still need administrative, technical, and language support provided by our Sponsors, and we will continue working with suitable organisations to make participation in Crossref easier.

Reduction of Grant DOI registration fees: a boost for the Research Nexus

Ginny Hendricks — Thu, 08 Jan 2026 00:00:00 +0000

We are pleased to announce that—effective 1st January 2026—we have made two changes to grant record registration fees that aim to accelerate adoption of Crossref’s Grant Linking System (GLS) and provide a two-year window of opportunity to increase the number and availability of open persistent grant identifiers and boost the matching of relationships with research objects.

Launched in 2019 with close input from several funders and other infrastructure organisations, the GLS primarily offers the ability to create and steward Crossref Grant DOIs, along with several benefits such as dedicated grant/award metadata like funding type, value, contributors, and projects, as well as hosted landing pages, tools to create and update metadata, and of course both member-asserted and Crossref-automatic matching of relationships within the global corpus of 180 million other research objects. Essentially, we need to identify what research objects are produced as a result of the award, and these objects could be articles, preprints, data, code, blogs, posters, and more.

This connected network is what we call the Research Nexus, essential for exploring research activity in general, as well as evaluating reach and return on funding and other support like use of facilities/equipment.

A fee reduction and a two-year fee waiver pilot

Following a review by our Membership & Fees Committee, the Board met in December and passed two related motions:

Current-Year (CY) grant registration fee has been cut in half to match other record types: The board approved the adjustment of the Current-Year (CY) grant registration fee down from $2.00 to $1.00 USD, effective 1st January 2026.
Back-Year (BY) grant registration fee is waived through 2027: The board approved a time-limited fee waiver as a pilot for Back-Year (BY) grant registration fees, bringing that per-record fee down from $0.30 to $0.00 for 2026 and 2027.
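To make the effect of the two motions concrete, here is a small worked calculation. It uses only the per-record fees stated above. The record counts are hypothetical, and the sketch ignores membership fees and any other charges.

```python
from decimal import Decimal

# Per-record grant registration fees in USD, as stated in the motions above.
FEES_BEFORE = {"CY": Decimal("2.00"), "BY": Decimal("0.30")}
FEES_PILOT = {"CY": Decimal("1.00"), "BY": Decimal("0.00")}  # 2026 and 2027


def registration_cost(current_year: int, back_year: int,
                      fees: dict[str, Decimal]) -> Decimal:
    """Total registration cost for a batch of Current-Year and Back-Year grants."""
    return current_year * fees["CY"] + back_year * fees["BY"]


if __name__ == "__main__":
    # Hypothetical funder: 500 current-year grants and 10,000 historical grants.
    before = registration_cost(500, 10_000, FEES_BEFORE)
    pilot = registration_cost(500, 10_000, FEES_PILOT)
    print(f"Before: ${before}, during the pilot: ${pilot}")
```

For this hypothetical funder, registration would have cost $4,000.00 under the previous fees and costs $500.00 during the pilot, with the entire saving on historical records coming from the BY waiver.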

We aim to boost registration of Back-Year (BY) records and accelerate the growth of the Research Nexus with millions more grant<->output matches. During the two-year pilot, the Membership & Fees Committee, together with our fee project work (which started in 2023 and has already brought in other fee reductions), will consider further adjustments to BY registration fees for the benefit of members beyond just funders and beyond just grants.

All Board motions are publicly available and we encourage questions from the community about our governance processes and the decisions on our members’ behalf; email us via feedback@crossref.org anytime, or post on the forum.

Supercharging the Grant Linking System

Leading up to the GLS launch in 2019, we worked with a group of funders and metadata experts to inform the design and implementation of the new service, including a funder governance and fees working group. That was seven years ago. Our Funder Advisory Group now includes nearly 100 funding community representatives, and the GLS has grown to almost 50 funder members that have registered more than 185,000 open grant metadata records. But they are mostly research councils and agencies or charities from Europe and North America, and we know that for a truly comprehensive and interconnected Research Nexus, more needs to be done to include organisations from all parts of the world. The other key driver is simply to boost more metadata connections; the more grant metadata we gather, the better we can match it to all kinds of research outputs, and this metadata directly feeds thousands of services available in our community, from Dimensions and Scopus to OA.Report and OpenAlex, as well as funders’ own analytics tools. See our recent report about the latest dataset and of course use api.crossref.org directly.

Relatedly, we just added a new Grant DOI field to our schema for all record types, to give our members a precise and accurate way of capturing funding metadata for all research outputs. With the new lower CY registration fee and a pilot waiver of BY fees for grant records, we hope to boost the creation of more Grant DOIs by more funders from more parts of the world—so that others also see and can build on the momentum and reuse the data in their own tools and services. All actors need to play their role, and Crossref’s part is in running the global linking infrastructure at scale, connecting research objects and making them openly available while ensuring that the barriers for the registration, use and reuse of metadata remain as low as possible.

We feel we’re at a tipping point that only needs a small nudge to truly scale the Grant Linking System.

By waiving BY fees entirely for two years, we’re hoping to see members fill in historical data and create more comprehensive grant<->outcome connections. There is often a long period of time between funding being awarded and the resulting research objects being generated and communicated. That is why historical grant metadata is so important; we think that there will be many funding outcome relationships and insights just waiting to be uncovered!

Why give funders a fee break and not others?

We’re not ruling out this kind of fee incentive in future for other members and other object types, but that needs more analysis (which we plan to do) and right now, the relatively small number of grant records, combined with a growing need for this kind of metadata, means the changes are small enough to have almost no impact on Crossref’s healthy financial position.

This decision is consistent with the goals of our Resourcing Crossref for Future Sustainability (RCFS) to review our fees to make sure they are equitable and clear, while ensuring Crossref retains a sustainable business model. Our fees can encourage or discourage the community to participate in Crossref. The RCFS project has also resulted in the creation of a lower membership fee tier for the very lowest-resourced members, and the tidying up of things like outlier volume discounts.

The BY fee waiver is positioned as a pilot to allow us to measure its impact over the next two years and feed into the Membership & Fees Committee and RCFS project. We will evaluate the pilot results (i.e. does it indeed supercharge funding metadata connections and adoption?) and consider additional adjustments to other BY registration fees and whether such fee incentives might be extended to other members.

We encourage all funders to take advantage of these reduced rates to contribute to the Research Nexus and help us build a more complete picture of the relationship between research funding and outcomes.

Take a look at the recent case studies from early GLS adopters FWF (Austria), NWO (The Netherlands), FCCN|FCT (Portugal), and Wellcome/EuropePMC, reach out to them or us with any questions, or peruse the GLS community forum!

The best way of acknowledging research funding in the metadata: Crossref Grant ID

Patricia Feeney — Tue, 06 Jan 2026 00:00:00 +0000

We are very pleased to kick off the New Year with another important schema update and the news that a Grant DOI field is now supported for all record types. This means that Crossref members can explicitly include the Crossref Grant IDs as part of their DOI metadata records for publications and any other output type, accurately linking research outputs to the funding that made them possible, all through metadata. We hope that our members will leverage this to respond to recent calls for stronger funding transparency and best practices for reporting funding sources in research outputs.

Funding information is very important for the research community. As explored by some key European funder representatives, providing mechanisms to clearly link funding with its outputs is essential for the community to have a full picture of the research endeavour.

“When funders systematically register grants with persistent identifiers and make this information openly available, they create a foundation that publishers and infrastructure providers such as repositories can reliably build upon when depositing output metadata.”

– Hans de Jonge, Katharina Rieck and Zoé Ancion

Up until now, if a Crossref member wanted to include a Crossref Grant ID to unambiguously identify the output funding source, they would need to use other available fields, such as for an award number. While it was an important step towards increasing transparency and is heavily used for reporting and impact assessment, being an unstructured field, it was prone to errors, and of course, funders’ internal award identifiers are not unique, persistent, or necessarily open. This limited our ability to create unambiguous relationships with the Crossref Grant DOIs registered by our now ~50 funder members. As the new field becomes increasingly populated by our members, this rich metadata will pave the way for capturing and representing the funding relationships in a more accurate and complete way and fulfilling one of our commitments at the recent funding metadata workshop with the Barcelona Declaration.

The Crossref Grant ID field in the schema is a clear signal of the growing demand for these persistent Grant IDs (Crossref DOIs), and the relationships these help us create. Those connections can in turn enable streamlined reporting for the grantees, as well as compliance tracking and programme evaluation for funders.

As part of our work to enable the research nexus, Crossref has been proactively identifying funding information and prototyping metadata enrichment processes through matching projects, ensuring that as many relationships as possible are established and made discoverable. With this schema update, we aim to lower barriers and encourage more members to register output-funding relationships at source. This will facilitate the links that make the research nexus a connected, interoperable, and important source of information that supports a transparent and trustworthy research process.

We encourage all Crossref members to start incorporating Grant DOIs when available into your metadata submissions. By taking advantage of this new field, you’ll help build a more complete and transparent record of research funding, making it easier for the community to understand and trace the impact of funded research.

When collecting funding information for your publication, please consider asking the authors for the Grant DOI (Crossref Grant ID) as well as the funder’s details (such as their name and identifier). Here’s how the U.S. Department of Energy, Office of Scientific and Technical Information’s (OSTI-DOE) grant https://doi.org/10.46936/aps-182101/60010611 can be included in the metadata for related works, from datasets, to preprints, conference proceedings, journal articles, and more:

<assertion name="fundgroup">
 <assertion name="ror">https://ror.org/04qxsr837</assertion>
 <assertion name="grant_doi">10.46936/aps-182101/60010611</assertion>
</assertion>

Similarly, the grant https://doi.org/10.3030/732489, from the European Union programme H2020-EU.2.1.1. - INDUSTRIAL LEADERSHIP, would be represented in a related work’s metadata as follows:

<assertion name="fundgroup">
 <assertion name="funder_name">H2020 LEIT Information and Communication Technologies
  <assertion name="funder_identifier">10.13039/100010669</assertion>
 </assertion>
 <assertion name="grant_doi">10.3030/732489</assertion>
</assertion>
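
For members who assemble their deposit XML programmatically, the two snippets above can be generated from a handful of values. The following is a minimal Python sketch rather than an official Crossref tool: the function names `bare_doi` and `build_fundgroup` are hypothetical, and the element and attribute names simply mirror the snippets above. A real deposit may need the namespace prefix and surrounding elements described in the funding data documentation.

```python
import xml.etree.ElementTree as ET


def bare_doi(doi_or_url):
    """Strip a leading https://doi.org/ so the value matches the snippets above.

    Assumption: authors may supply the Grant DOI as a full URL.
    """
    prefix = "https://doi.org/"
    value = doi_or_url.strip()
    return value[len(prefix):] if value.startswith(prefix) else value


def build_fundgroup(grant_doi, ror_id=None, funder_name=None, funder_identifier=None):
    """Build one fundgroup assertion (hypothetical helper, not a Crossref API)."""
    group = ET.Element("assertion", {"name": "fundgroup"})
    if ror_id:
        ET.SubElement(group, "assertion", {"name": "ror"}).text = ror_id
    if funder_name:
        name_el = ET.SubElement(group, "assertion", {"name": "funder_name"})
        name_el.text = funder_name
        if funder_identifier:
            # Nested inside funder_name, as in the second snippet above.
            id_el = ET.SubElement(name_el, "assertion", {"name": "funder_identifier"})
            id_el.text = funder_identifier
    ET.SubElement(group, "assertion", {"name": "grant_doi"}).text = bare_doi(grant_doi)
    return group


if __name__ == "__main__":
    group = build_fundgroup(
        "https://doi.org/10.46936/aps-182101/60010611",
        ror_id="https://ror.org/04qxsr837",
    )
    # Prints the assertion on a single line, without indentation.
    print(ET.tostring(group, encoding="unicode"))
```

The sketch always emits the grant DOI and treats the funder details as optional, which matches the two examples: one identifies the funder by ROR ID, the other by name and funder identifier.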

For more technical documentation and implementation guidance, please visit our funding data documentation. If you have questions or need support integrating Grant IDs into your workflow, our support team is here to help!

Highlights of a very busy year: our 2025 annual report

Ginny Hendricks — Thu, 18 Dec 2025 00:00:00 +0000

As we finish celebrating our 25th anniversary, we can look back on a truly transformational year, defined by the successful delivery of several long-planned, foundational projects—as well as updates to our teams, services, and fees—that position Crossref for success over the next quarter century as essential open scholarly infrastructure. In our update at the end of 2024, we highlighted that we had restructured our leadership team and paused some projects. The changes made in 2024 positioned us for a year of getting things done in 2025. We launched cross-functional programs, modernised our systems, strengthened connections with our growing global community, and streamlined a bunch of technical and business operations while continuing to grow our staff, members, content, relationships, and community connections.

Read on for the highlights of a very busy year, grouped around our four strategic themes.

Strategic theme 1: Contribute to an environment where the community identifies and co-creates solutions for broad benefit

Enhanced tools and services

In October, we released an enhanced Participation Reports dashboard that shows metadata coverage across all 180 million records and provides individual member organisations with actionable gap reports to guide them to improve metadata completeness. The new tool provides more complete coverage of all members and resource types, now including funders and grants, with up to 11 best-practice metadata elements publicly tracked.

We launched support for journal articles in the New Metadata Manager record registration form (initially only for grants), which includes built-in reference and relationships deposit capabilities. In the New Metadata Manager, it’s now also possible to search for previously registered DOIs to edit your metadata records. In the coming years, we are planning to expand the new Metadata Manager to support all the many different content types that you can register with Crossref DOIs.

After a long break between regular updates, we have fixed our release process for the Open Funder Registry and just released v.1.63. With the updated process, we’re now able to resume more frequent updates to the registry (while of course still working towards the transition to ROR for funders).

Throughout 2025, we conducted a website information architecture review to improve the information we provide to our members and the wider community. Based on the recommendations from this review, we will be renewing our website and documentation in 2026.

Deprecations and modernisation

‘Old’ Metadata Manager is to be retired at the end of 2025, with users transitioning to the ‘New’ version or to our other helper tools for registering and updating DOIs. All users have been contacted during 2025 and received training on how to use the New Metadata Manager.

We also announced the deprecation of Co-access, which will end in 2026, bringing an end to the service that allowed duplicate DOIs for book content. Users of Co-access have been informed and are in the process of transitioning to multiple resolution.

Together with Turnitin and our members, we are working to transition all subscribers to our Similarity Check service to the new version, iThenticate 2.0. We are happy to report that all platforms with integrations with us transitioned to 2.0 during 2025, and we will continue working with our members to get everyone transitioned during 2026.

Eating our own DOI dogfood

In June this year, we were particularly pleased to finally support the registration of DOIs for our own content, this very blog, through partnering with Rogue Scholar. Blogs are a growing format for scholarly discourse, and our own blog is no different: it’s the main way that we share guidelines and best practices, as well as news and stories from the scholarly community. With a Crossref DOI for every blog post going back to 2006, we’re setting ourselves up to ensure better future preservation of the discussion and information about Crossref.

Community connections

We delivered 29 metadata health-check webinars over the course of the year, in French, Indonesian, Spanish, and English, reaching 2,166 participants with practical advice on identifying gaps in journal metadata using Participation Reports.

Crossref Accra took place in March as our first in-person event in a GEM country. We also held similar events in Ecuador and Türkiye with Crossref Quito in September and Crossref Ankara in November. At these three events, we welcomed key figures from each country’s library, government, publishing, and academic communities and we learned so much about the thriving communities there, and also that even more dedicated workshops on the specifics of metadata quality improvements would be appreciated.

Our metadata sprint in Madrid in April brought together community members to tackle specific problems collaboratively, with teams exploring coding, documentation, translation, and research using our open metadata. We’re already planning our next sprint in São Paulo for March 2026, and it will be held in three languages: Portuguese, Spanish, and English.

A strategic goal for Crossref is to grow research funders’ adoption of the Grant Linking System. This year, we produced the first in a series of interviews with funder members, highlighting how and why Crossref DOIs are helping FWF (Austria), NWO (Netherlands), FCCN|FCT (Portugal), and Wellcome assess the reach and return of their research support. We also welcomed more funders, including Fonds de recherche du Québec (Canada) and Independent Research Fund Denmark as part of their national research platform NORA; we look forward to reporting next year on their experiences and outcomes, and on those of other funders as they work towards Crossref Grant DOI adoption.

We continued working closely with PKP and renewed our partnership to help drive a better experience for OJS users registering metadata with Crossref. We also delivered a proportion of the metadata health-checks together to maximise the learning opportunities for our members using OJS, and we joined the PKP Sprint in Oslo to help make improvements to OJS and OMP.

Crossref staff members serve on almost 50 committees, boards, and other community bodies alongside our own direct work. These span the areas of research integrity, metascience, metadata and PID standards, open science policy or monitoring, development of new models (such as Diamond OA), editorial production, library and institutional publishing, and citation and other metadata analyses. We also work with other DOI Registration Agencies and support the sustainability of the DOI Foundation with an additional annual subsidy. Many DOI RAs are also Crossref Sponsors so that their members can access our unique reference matching service. While we often might advise, we also learn a huge amount from collaborating with the numerous systems and initiatives that make up the wider research community.

Our involvement with developing the Barcelona Declaration on Open Research Information led us to become the fiscal host and to participate in most of the working groups on open metadata. Of particular note this year was the Funding Metadata Working Group round table about moving forward the state of funding metadata, which we co-hosted with Barcelona Declaration colleagues and three funding bodies: NWO (Netherlands), FWF (Austria), and ANR (France). There, we heard from publishers and their vendors about challenges and how to overcome them to increase the quantity and quality of available open funding metadata.

All our community engagement activities have been enthusiastically supported and enriched by our indispensable Ambassadors and our group of now 130 Sponsors, organisations that help thousands of Crossref members with local language and technical support and lower cost access to our membership.

Strategic theme 2: A sustainable source of complete, open, and global scholarly metadata and relationships

Schema developments

The grant schema version 0.2.0 was released in January, adding support for ROR identifiers to identify funders and new funding types in our taxonomy, including APC, BPC, and infrastructure. All of these funding types can be specified in the metadata of our grant-giving members alongside the existing types, such as use of facilities or salary/training awards.

Version 5.4 of our publications schema was released in March, marking our first update in many years and a great opportunity to learn how to do this and make the process more efficient. This release introduced typed references to denote the type of object referenced (dataset, blog, software, etc.), preprint status indicators, and version numbering.

Just last week, we also added a dedicated field for grant DOIs to our publications schema. This means it’s now possible to indicate in an article’s metadata which grant(s) funded the research using the persistent identifier. This is an essential step toward better alignment between grant funding and research, enriching the Research Nexus.

We also launched our new Metadata Advisory Group and they have already devised sub-working groups in the three focus topic areas:

Multilingual metadata
Subjects and keywords
Relationships

Public data file

We released the 2025 public data file in March, containing metadata for (at the time) over 165 million research outputs from more than 22,000 organisations.

Inaugural Metadata Awards

In May, we launched the first-ever Metadata Awards to recognise members demonstrating excellence in metadata completeness and enrichment. Winners included Noyam Publishers (Ghana), GigaScience Press (Hong Kong), eLife (UK), American Society for Microbiology (USA), Universidad La Salle Arequipa (Peru), and Instituto Geologico y Minero de España (Spain). The awards will be held biennially going forward.

Metadata Matching project

In April, we launched the metadata matching project with the aim of building a more complete picture of the research nexus over time by automatically identifying missing relationships between entities across the scholarly record. The project’s goal is to modernise Crossref’s enrichment workflows by rebuilding them using modern software development and data science practices.

We are in the throes of developing a consolidated matching workflow that will eventually replace all existing production matching processes, with results exposed through the REST API. All new matching strategies will be rigorously evaluated, and the resulting data will be accompanied by clear provenance information. This project covers six matching tasks:

bibliographic reference matching
funder name matching
preprint matching
affiliation matching
grant matching
title matching

In the meantime, while work continues on integrating matching results into the REST API, we’ve been releasing standalone matching datasets for separate download and analysis. These include relationships between preprints and journal articles, relationships involving research organisations, and relationships between grants and research outputs.

Data infrastructure and Research Nexus participation dashboard

Staying on the data science front, we’ve established an internal data environment that combines all relevant data sources (scholarly metadata, logs and usage data, and external datasets) in their raw forms into a single place. This environment is supported by a suite of modern tools and data processing techniques, enabling data science experiments and analytics pipelines to run effectively at scale.

Building on this foundation, we plan to develop a series of dashboards to monitor the state of the scholarly record over time. These dashboards will feature both work-level and member-level statistics (for example, how many works of a given type have been registered, or how many members are registering grant IDs) as well as more detailed insights at the relationship level (for example, how many bibliographic references have been automatically matched, or how many times ROR IDs are included in funder assertions). Some of these statistics are already available in a public spreadsheet for now, pending the dashboard.

Retraction Watch integration

In 2023, Crossref acquired the Retraction Watch database to make it open data. Initially, this was done through sharing simple CSV files, but this year we have set up a pipeline to feed this information into our REST API, which means that Retraction Watch data is now fully available through the REST API, integrated with Crossref member-supplied retraction and correction metadata. This is the first example of Crossref integrating third-party metadata, and we’re learning a lot about how to best incorporate other datasets in future.

Metadata API and services improvements

From 1 December 2025, we revised rate limits for the REST API to ensure system stability whilst maintaining free access to metadata for everyone. Changes were made to the rate limits for our ‘public’ and ‘polite’ APIs, while the limits for our Metadata Plus users stayed the same. We continue to make all metadata openly available to the whole community.
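
For readers who call the REST API from scripts, here is a minimal Python sketch of two habits that tend to fit well with rate limits: identifying yourself with a contact address, and backing off when a request is refused. It assumes, based on the public REST API documentation rather than anything stated in this post, that requests carrying a `mailto` parameter are treated as ‘polite’ and that rate-limited requests receive an HTTP 429 response. The function names are hypothetical, no specific limits are assumed, and no network call is made.

```python
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org"


def works_url(query, rows=5, mailto=None):
    """Build a /works search URL, adding a contact address when one is given."""
    params = {"query": query, "rows": rows}
    if mailto:
        # Assumption: a mailto parameter identifies the caller as 'polite'.
        params["mailto"] = mailto
    return f"{API_BASE}/works?{urlencode(params)}"


def backoff_delays(retries, base=1.0, cap=60.0):
    """Seconds to wait before each retry after a refused (e.g. 429) request.

    Plain exponential backoff with a ceiling; the numbers are illustrative
    and are not Crossref's published limits.
    """
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


if __name__ == "__main__":
    print(works_url("grant doi", rows=2, mailto="name@example.org"))
    print(backoff_delays(4))
```

A client would fetch the URL with its HTTP library of choice and, on a refused request, sleep for the next delay in the list before trying again.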

We also improved how information from our content system feeds into the REST API. A tool we call ‘pusher’—because it pushes information from the content system to the REST API—was rebuilt so that we now have a more reliable transfer of information between our two systems.

Alongside these technical improvements, we’ve also worked to better understand how our paid options are used and to streamline the service offering. We’ll share more about this year’s Metadata Plus consultation soon. And based on feedback, we have already retired the ‘Query Affiliate’ service, under which a handful of organisations still paid us a fee to access our XML API even though no credentials have been required for some time.

Strategic theme 3: Manage Crossref openly and sustainably, modernising and making transparent all operations so that we are accountable to the communities that govern us

Infrastructure modernisation

One of our biggest projects of 2025—if not the biggest—was the move from our data centre into the cloud (AWS). For 25 years, Crossref had been running a physical data centre in Massachusetts, USA, but as part of modernising our systems, it was high time to move everything into the cloud. The move to AWS took several months, but we successfully completed this move to the cloud in July this year. We’re spending these last weeks of 2025 fully decommissioning our data centre, which means that we are removing all the equipment we had there and locking the door for the last time.

A part of the move to AWS included moving onto an open-source database solution, PostgreSQL. This reduced our reliance on closed, costly licensed solutions, while also aligning with our POSI commitment to open-source. Running our entire system in AWS provides a more stable, modern approach to our infrastructure, but it also is expensive. We expect to spend about 2 million USD on AWS fees next year, with the majority of this cost coming from REST API usage. Some of the improvements described above will help us manage those costs and better observe traffic patterns.

Our new cloud infrastructure is a bittersweet milestone: while we are happy to not have to rely on a physical presence to support a 24/7 global infrastructure, we also say a sad farewell to our much-loved and long-suffering Sys Admin, Tim Pickard, who has been with Crossref since 2002, and has contributed significantly and unwaveringly to keeping our system up and running in the data centre. Tim will be leaving Crossref at the end of the year; we’re grateful to Tim for all his years of dedication, and we will greatly miss his impressive Hawaiian shirt game on our all-staff calls.

After 25 years, it was also time to get serious about modernising our core content system, because even though it serves our community well, an older system with legacy code is a constant risk and frustration. We’ve therefore embarked on a multi-year modernisation project where we are replacing our old code piece by piece. We no longer want to have one big content system (a monolith), but are planning to identify different pieces of functionality and rebuild these as separate services (a modular, flexible, and robust approach). This year, we already managed to reconstruct some smaller pieces (for example, the ‘pusher’ mentioned above), and next year we will tackle larger projects, such as Metadata Matching and Authentication.

We continue to prioritise open, timely communication for planned or unplanned service interruptions and encourage everyone to monitor our status page at status.crossref.org. We’ll further hone our incident response processes in 2026, including openly posting incident reviews, and we’ll also centre system maintenance and documentation clarity in everything we do.

RCFS Projects

The Resourcing Crossref for Future Sustainability projects (RCFS) and the work of our Membership & Fees Committee resulted in deciding not to change some things (such as the basis for annual membership fees), but to change three things about our fees, as reported in July:

A new lower membership fee tier of 200 USD for members with annual revenues/expenses of under 1,000 USD; so far, this includes around 3,000 members. See below for more info.
A removal of volume discounts to reduce complexity in our billing code; they were little used, and those who did use these were fine with the loss of the discount.
A removal of the rule that only publishers of a title could register peer review reports (including comments and annotations) at the lower 0.25 USD fee for the first review; this lower fee is now available to any member to register any reviews of any other members’ works.

A new late-breaking addition to these fee decisions is the reduction of fees for members registering Grant IDs. As of January 1st 2026, there will be no fee for back-year (BY) grant registration, to encourage the faster registration of older grants, which are more likely to have research outputs to be matched. This will be a two-year pilot to trial how a reduced fee incentivises adoption and boosts metadata connections, and could be extended to other record types as we monitor its success and sustainability. In addition, the 2 USD fee per current-year (CY) grant record is being reduced to 1 USD, in line with the next-nearest fee; this is a permanent change for the foreseeable future. More on this change in January.

Membership growth, efficiencies, and accessibility

Crossref now serves 23,600 members across 164 countries, with continued growth particularly in Asia and Latin America. We’ve continued our ongoing member onboarding activities to support new members joining the community. We see around 230 new members join each month, and have welcomed 2,700 this year so far. We recently reported on how the shape of membership has evolved over our 25 years of operation.

From January 2026, we’re introducing a new lower membership fee tier of 200 USD for organisations with annual revenue or expenses of 1,000 USD or less, making membership more accessible to low-resourced organisations. Already, over 3,000 members have been eligible to move into or join under that fee, and the idea is to monitor how this affects Crossref’s financial sustainability and potentially adjust the 200 USD annual fee down again in future years.

From 1 January 2026, the GEM program, which offers fee-free membership and content registration for all members from certain countries, will expand to include 19 additional countries, further reducing financial barriers to participation in the scholarly record, so we expect several hundred further members to join the existing 600 organisations in this category.

As our membership base continues to grow, the Membership and Finance teams are constantly exploring ways to make shared processes more efficient. A key component in this work has been the efforts to automate several tasks within both teams to help us manage the additional work caused by our growth and allow our teams to focus more on providing the best quality service we can.

In March, the board voted to update membership terms and bylaws to clarify processes for suspending and revoking membership, and to be more explicit about member practices that preserve the integrity of the scholarly record. A short-term Member Practices Working Group will be meeting in the first half of 2026 to draft these.

Our membership team continues to support our members, sponsors, service providers, metadata users and the wider community by email and through our community forum. The membership team includes staff members who focus on member support, and staff members who focus on technical support. During 2025 so far, we’ve received 36.8k member enquiries through our support system, a 17% increase from last year. This includes 22.6k enquiries related to general membership and 13k technical support enquiries. We’ve received 3.8k membership applications, and welcomed 2.7k new members.

Growth by the numbers

Crossref continues its steady revenue growth in 2025 due to the expansion of our membership base. With the addition of new members and the general growth of Crossref comes an increase in the transaction-based tasks our Finance team handles.

So far in 2025 we have issued 14,833 invoices, a 9% increase on last year. We’ve seen an 11% increase in the number of payments received and applied, and a 12% increase in the number of credit and debit memos applied compared with the same period last year. We have also seen a 42% increase in the number of billing-related tickets, totalling 20,723. A large segment of these tickets are related to fee updates associated with the new 200 USD membership tier.

Not all transactional work in Finance has increased as steadily. Alongside an 8% increase in revenue, we have seen a 14% increase in operating expenses, yet through the strategic consolidation of vendors and use of financial tools, we have seen only a 1% increase in Accounts Payable invoices processed.

Organisational sustainability

Finance-wise, we’re doing well. We’re projecting to finish this year with revenue of 14,200,000 USD and expect revenue next year of 14,500,000 USD. We’re budgeting 2% growth in overall revenue, accounting for some of the changes to fees that will reduce our earnings on membership dues, but anticipating continued growth of content registration revenue.

Revenue and expenses trends

About 67% of our expenses come from personnel costs, and the other 33% include non-personnel costs like AWS, travel, legal fees, etc. As we continue to build out the team, we have ten new positions planned for the next year (recruitment for many of these is already underway or done). With additional staff roles and AWS expenses, we’re expecting expense growth of 16%. We post our financial statements and Form 990 filings on the financials page on our website.

Revenue per member size (by tier)

As the chart above shows, we still see ‘the long tail’ of members in the smallest category (275 USD) contributing more revenue than those in the largest category (50,000 USD), at 5.8 million USD versus 5 million USD.

Another aspect of sustainability is our impact on the world around us. And this year we were able to publish a second report on Crossref’s carbon footprint, having monitored and controlled for several carbon-heavy activities, primarily staff travel. Our reported emissions went up 40% from 2023 to 2024, due to more travel given our growth in staff and members, better recording our emissions (for example, with hotel stays), and including travel that we support for our partners, ambassadors and board members. In terms of travel spending, we are still well below 2019 when we were smaller, demonstrating that we are following through on not going back to the pre-pandemic norm.

We were one of the first open infrastructure organisations to adopt the POSI Principles and now have a few years’ experience in trying to meet them. Together with other adopters, we proposed updates and additions to the principles, based on real-world practice, and gathered a lot of community comment, resulting in the group publishing POSI v2 in October. We conduct a self-assessment every other year and we’ll be involving all our staff in the next self-assessment, due later in 2026.

Open governance through board election and annual meeting

We continued our commitment to being member-led and community-driven. This year’s anniversary Annual Meeting in October brought together members to discuss strategy, metadata developments, and hear the results of their voting in our board election. It comprised two half-days of online conferencing and several in-person satellite meetings spread across five continents, gathering close to 500 members of our community. It was a platform to reflect together on the past quarter of a century of building community infrastructure and connections underpinning the progress of scholarship, and to share plans for the future.

Each member has one vote, and together they elected the following organisations to serve a three-year term alongside the rest of the board:

Tier 1 candidates (electing one seat):

Rebecca Wambua, Distance, Open and e-Learning Practitioners’ Association of Kenya

Tier 2 candidates (electing four seats):

Damian Bird, CABI
Rose L’Huillier, Elsevier*
Anjalie Nawaratne, Springer Nature*
Nick Lindsay, The MIT Press*

*returning board member

Congratulations to the remaining and incoming board members as the new term starts in January 2026. Have a look at all the outputs from our Annual Meeting.

Strategic theme 4: Foster a strong team—because reliable infrastructure needs committed people who contribute to and realise the vision, and thrive doing it

Team structure

We reorganised the team heading into 2025 because we had ambitious goals that required a more structured, collaborative approach. We reorganised the work around three strategic, mission-driven areas of focus described above. This was our first full year with the cross-functional program groups in place, and the activities reported here make it evident that our team members, both existing and new, are firing on all cylinders.

New staff and new roles

We welcomed eight new team members in 2025. In February, we welcomed our new Director of Programs & Services, Helena Cousijn, and a new member of the Technical Support team, Arley Soto. In March, we welcomed our new Community Manager for funders, Rocío Gaudioso Pedraza. In April, we launched our new Data Science team by welcoming Jason Portenoy and Alex Bédard-Vallée. In November, we welcomed our new DevOps Engineer, Thelma Laryea, and our new Program Technical Lead for the OSO program, Bharath Govindarajan. In December, we welcomed another member of the Technical Support team, Natali Giorgobiani.

We also had team members step up into new roles. Dominika Tkaczyk completed the new leadership team by taking on the Director of Technology role, Paul Davis has started his new role as Product Manager, and Michelle Cancel has taken on the Head of Human Resources role. And there’s more to come! As next year begins, two team members will step into Program Technical Lead roles: Carlos del Ojo Elias for the CRN program and Patrick Vale for the CCT program. Together with the Program Technical Lead for the OSO program and the Head of Infrastructure Services, these roles will complete the new structure of the technology team. This structure is more closely aligned with how our work is organised and will enable stronger coordination both within and across cross-functional programs.

Supporting a thriving global culture

As our team grows in different aspects within our new org structure to meet the needs of the community, we remain committed to supporting a thriving culture through training, conducting regular temperature checks, and organising our annual staff retreat. This year, we continued our work on psychological safety and introduced workshops on giving and receiving feedback and on consensus building. We were able to put some of this training into practice at our in-person all-staff event in Split, Croatia, where we all came together to build our roadmap.

We are ending the year with 51 staff in 14 countries and look forward to diversifying and evolving even further as a team in 2026. We’re currently hiring in UX, Communications, and Membership, and do keep an eye on our jobs page for forthcoming opportunities in Software, DevOps, Metadata, and Operations!

Thank you to our community of members, partners, board, ambassadors, sponsors, metadata users, service providers, integrators—and of course our team—for making 2025 such a productive year. Together, we’re building a richer, more connected research ecosystem for the benefit of society. We can’t wait to continue the work together in 2026.

Twenty-five years of Crossref: reflections from the 2025 annual meeting and board election

Rosa Morais Clark — Wed, 17 Dec 2025 00:00:00 +0000

Crossref turned twenty-five this year, and our 2025 Annual Meeting became more than a celebration—it was a shared moment to reflect on how far open scholarly infrastructure has come and where we, as a community, are heading next.

Over two days in October, hundreds of participants joined online and in local satellite meetings in Madrid, Nairobi, Medan, Bogotá, Washington D.C., and London––a reminder that our community spans the globe. The meetings offered updates, community highlights, and a look at what’s ahead for our shared metadata network––including plans to connect funders, platforms, and AI tools across the global research ecosystem.

Ed Pentz opened with thanks and perspective. He reflected on how it all began: twelve members with one shared goal, to make research easier to find and verify. Twenty-five years later, the same goal underpins 174 million open metadata records, 1.9 billion citation links, and roughly 1.3 billion DOI resolutions each month. What started as reference linking is now a global network of relationships among people, institutions, and research outputs. Ed also reaffirmed the Principles of Open Scholarly Infrastructure (POSI) as the foundation of our operations and our collaborations with other community-governed infrastructures.

“Each number represents shared effort, trust, and long-term commitment,” Ed reminded us. “Open infrastructure works because people keep showing up.”

Crossref’s purpose as per the Certificate of Incorporation.

Following Ed’s talk, we showed a video timeline, ‘25 years of Crossref’, tracing milestones from the first DOIs to today’s connected Research Nexus.

Crossref 25th anniversary timeline

Shared perspectives from the community

We featured perspectives from organizations that have built key scholarly infrastructure alongside Crossref over the years. A shared message ran through their talks: open infrastructure only works when it’s interoperable, community-led, and practical for the people who use it.

Urooj Nizami (PKP) described PKP and Crossref as “independent and interdependent,” using the archipelago metaphor to show how open software and shared metadata services connect local publishing to a global network.

Todd Carpenter (NISO) emphasized that standards are both a social and a technical contract, noting how persistent identifiers and reliable metadata underpin a broader knowledge graph, and why provenance and linking matter even more as AI systems remix content.

Abel Packer (SciELO) highlighted Latin America’s strong DOI coverage while pointing out where multilingual versions and preprint–article–data links still break visibility, arguing for metadata that connects versions rather than splitting them.

Soichi Kubota (J-STAGE/JST) showed how Crossref services (from citation linking, Cited-by, metadata, to Similarity Check) anchor Japan’s national platform and how deeper cooperation (e.g., Crossmark) will support richer, more reliable metadata.

Leena Shah (DOAJ) outlined DOAJ’s open index, renewed POSI commitment, and hands-on collaboration with Crossref—from the MoU and PLACE to help-desk coordination, gap analyses, and plans to boost DOAJ records via Crossref’s API and open references.

Susan Murray (AJOL) spoke of capacity building: 900+ journals across 40 countries benefit from AJOL’s support in registering identifiers and metadata, and AJOL’s long-standing partnership with Crossref makes it possible for journals with limited resources to take part.

These voices echoed a common call: Build bridges, not silos.

Governance and election results

Leading off the formal annual meeting, Lisa Schiff, Chair of the Crossref Board, looked back on our 25th anniversary as one marked by progress and problem-solving. She talked about moving all our systems to the cloud—a big step that makes the organization’s work faster and more reliable. She also spoke about ongoing efforts to maintain the research record’s trustworthiness, including adding Retraction Watch data and updating member terms. Lisa noted new ways we are making membership more accessible, like the lower $200 tier and the expansion of the GEM program.

Lucy Ofiesh brought it back to the role of the members themselves, reminding everyone that Crossref’s success still rests with them. The annual meeting is when members directly influence Crossref’s direction, and each vote helps shape how we move forward together.

We extend our thanks to the Board members whose terms have concluded, and we congratulate the newly elected members who will carry the work forward.

Five directors were elected: Rebecca Wambua (Distance, Open and e-Learning Practitioners’ Association of Kenya), Damian Bird (CABI), Rose L’Huillier (Elsevier), Anjalie Nawaratne (Springer Nature), and Nick Lindsay (MIT Press).

We also thank the 2025 Nominating Committee for their thoughtful work guiding this year’s process and slate selection.

The Board plays an important role in making sure our governance remains community-led, transparent, and accountable. The volunteer members bring experience from research funders, publishers, and libraries, giving a balance of perspectives that help steer our long-term strategy and sustainability.

Tools in practice

Then our attention turned to the tools that many members use every day. Patrick Vale walked participants through updates to Participation Reports and the Record Registration Form, both designed to make working with metadata simpler.

Updated Participation Report for Universidad La Salle Arequipa (Peru), showing metadata element coverage percentages.

Participation Reports, first launched in 2018, have now been completely rebuilt as version 1.2. The refreshed interface runs on a new technology stack, supports more content types, and offers a new “download gap report” feature that generates a CSV list of records missing key fields, so members can identify and fix gaps directly.
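To give a feel for what a gap report of this kind contains, here is a minimal sketch in Python. It is not the Participation Reports implementation: the list of "key fields", the function names, and the sample records are all assumptions made for illustration, with record keys loosely modelled on the JSON that the Crossref REST API returns for a work.

```python
import csv
import io

# Assumed set of "key fields"; the real report may check a different list.
KEY_FIELDS = ["abstract", "reference", "license", "funder"]


def find_gaps(records, key_fields=KEY_FIELDS):
    """Return one row per record that is missing at least one key field."""
    rows = []
    for record in records:
        # A field counts as missing when it is absent or empty.
        missing = [field for field in key_fields if not record.get(field)]
        if missing:
            rows.append({"DOI": record.get("DOI", ""), "missing": ";".join(missing)})
    return rows


def gaps_to_csv(rows):
    """Serialise gap rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["DOI", "missing"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample records, for illustration only.
sample = [
    {"DOI": "10.5555/example.1", "abstract": "<p>...</p>", "reference": [{}],
     "license": [{}], "funder": [{}]},
    {"DOI": "10.5555/example.2", "abstract": "", "reference": [], "license": [{}]},
]

report = gaps_to_csv(find_gaps(sample))
print(report)
```

Only the second sample record appears in the output, listed as missing its abstract, references, and funder information.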

Patrick then demonstrated improvements to the Record Registration Form, now streamlined for creating as well as editing records. The form includes real-time validation, auto-fill options for journals previously used, and the ability to edit existing records directly. Members can now easily add abstracts, funding data, licenses, and affiliations linked to ORCID and ROR—all within one place.

In the final demonstration, Luis Montilla shared a “short research story”. He showed how anyone can explore Crossref metadata to uncover global participation patterns, turning what might seem like a mass of disconnected records into something meaningful once you start asking questions. He also shared a workflow that automatically retrieves data, enriches it with country and regional information, and then visualises member contributions and metadata coverage.

Luis also demonstrated an interactive notebook that lets users explore participation trends through radar charts and other visuals—illustrating how open data can help the community understand and improve the completeness of the scholarly record.

Crossref then & now

Amanda Bartell walked through how the community has changed over 25 years.

The membership has broadened dramatically: universities and scholar-led groups now form the largest share, and more organizations in Asia and Latin America have joined (with big growth in Indonesia and Brazil). Most members are small: 98% qualify for the lowest fee tier, and 57% participate via a Sponsor. In support of including members from smaller economies, Crossref launched a GEM programme, which will be expanding to 19 new countries in 2026.

She expanded her presentation later with a blog post to share insights about the changes in the Crossref global community.

With our growing membership, the needs of the community are evolving too, including expectations about Crossref’s role in preserving the integrity of the scholarly record.

“Our role in preserving the integrity of the scholarly record is focused on enriching the metadata to provide fuller and better trust signals while keeping barriers to participation low.” —Amanda Bartell, Crossref

In response to the growing membership across the globe, we launched our Ambassadors program in 2018. Johanssen Obanda highlighted the activities of what is now 50 volunteers across 38 countries. Ambassadors act as local contacts—running training sessions, organizing events, translating materials, and providing feedback from their regions. Over the past year, they’ve led 41 activities reaching around 1,200 people. Many also contribute to GEM outreach, metadata health checks, and regional events—often in local languages.

Roadmap highlights

Helena Cousijn outlined progress across three programs: Co-creation and Community Trends, Contributing to the Research Nexus, and Open and Sustainable Operations. Along with the progress already showcased on Participation Reports and the new Record Registration Form, the Co-creation and Community Trends program involves working in partnership with others on DSpace integration and on consolidating OJS plug-ins. Piloting AI detection tools is also under consideration for the near future. The Contributing to the Research Nexus program carried out a consultation with Metadata Plus subscribers and is developing a new data citations endpoint for the Crossref REST API. This team is also developing further matching services, in the first instance looking to match funder metadata to ROR IDs.

Finally, Helena discussed the recent accomplishment of the Open and Sustainable Operations program: the migration of our database from the data centre to the cloud with Amazon Web Services. Other projects in this program involve revamping resolution reports, rebuilding the Crossref authentication system, and launching a new metadata schema.

Resourcing Crossref for Future Sustainability (RCFS)

The RCFS program is focused on equity, simplicity, and revenue balance. Kora shared recent developments and next steps. A new $200 membership tier (for organizations with ≤$1,000 in publishing revenue/expenses) takes effect on January 1, 2026; more than 3,000 members have already moved into it. We will keep “publishing revenue/expenses” as the sizing basis for publishers, while funder sizing is still under review. Volume discounts for content registration end on January 1, 2026. Backfile discounts for theses/dissertations and conference proceedings are under review. Peer-review fees are normalized at $0.25 for the first review of a work, with subsequent reviews (same member, same work) free of charge.

Behind the scenes: metadata, data science

Patricia Feeney reviewed recent and upcoming changes to our metadata schemas. Earlier this year, we began accepting ROR IDs as funder identifiers and released schema 5.4, which added versioning across all record types, a new status field for preprints, and a way to label citation types (like data sets, software, or blog posts).

Coming soon, Crossref will add grant DOIs to funding metadata and release schema 5.5, which supports the CRediT contributor vocabulary and allows multiple contributor roles. A new grant schema will follow, including support for beneficiaries, project identifiers (like RAiD), and repeatable roles. Looking ahead to 2026, we plan to overhaul how names and organizations are modeled, add richer funding and data-availability statements, and expand abstract and multilingual metadata support. A new Metadata Advisory Group has also been formed to guide work on multilingual fields, subjects, keywords, and relationship modeling.

Finally, Patricia announced plans to deprecate older schemas—a gradual, multi-year process—to simplify and modernize our metadata structure. She highlighted the importance of stronger relationships, richer records, and practical improvements that make metadata more useful across the community. That focus on connection carried directly into the next session about building through data science.

Data science at Crossref

Dominika Tkaczyk introduced the new data science team, formed a few months ago as part of the technology group. The team was created because of the growing scale and complexity of the data Crossref manages, driven by the expanding scholarly community. Their role is to use data science to assess, improve, and enrich scholarly metadata.

Their work falls into two areas: data analysis and insights—to help Crossref understand the scholarly record and guide decisions—and data services and workflows—to apply data science in building and maintaining production systems. Examples include studying overlap between scholarly databases and improving metadata quality. The session then focused on two projects: creating an internal data processing environment and developing metadata matching services.

Alex Bédard-Vallée described the team’s first project: building a data lake to bring together fragmented data from different systems. Previously, data were split across silos like the REST API, internal logs, and production databases. The data lake already enables analyses that were previously impractical, such as tracking how many members include reference metadata in deposits. It will also power new dashboards, monitoring tools, and other data-driven initiatives that support the integrity of the scholarly record.

Jason Portenoy then outlined the metadata matching project, which links pieces of information (like citations, funder names, or affiliations) to their identifiers such as DOIs or ROR IDs. He gave examples including reference-to-DOI, funder-to-ROR ID, affiliation-to-ROR ID, grant-to-DOI, and preprint-to-published-article matching.

He explained that much metadata is already deposited by members but large gaps remain. For example, among more than a billion citation links, about 843 million already include DOIs, while another 718 million references can’t yet be matched. The goal is to close these gaps to build a more complete and connected scholarly record—the “research nexus.”
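To make the idea of matching concrete, the following Python sketch links a free-text organization name to an identifier by normalising strings and scoring their similarity. This is a deliberately simple stand-in, not the matching strategy Crossref uses: the registry entries, the placeholder identifiers, the 0.85 threshold, and the function names are all assumptions, and the similarity measure is the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical registry: names mapped to placeholder identifiers (not real ROR IDs).
REGISTRY = {
    "Example Research Council": "ROR-PLACEHOLDER-1",
    "Sample Science Foundation": "ROR-PLACEHOLDER-2",
}


def normalise(name):
    """Lower-case the name and collapse punctuation and whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def match_name(name, registry=REGISTRY, threshold=0.85):
    """Return (identifier, score) for the best candidate.

    The identifier is None when the best score falls below the threshold,
    so that uncertain matches are left unlinked rather than guessed.
    """
    best_id, best_score = None, 0.0
    for candidate, identifier in registry.items():
        score = SequenceMatcher(None, normalise(name), normalise(candidate)).ratio()
        if score > best_score:
            best_id, best_score = identifier, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


print(match_name("example  research council."))
print(match_name("Unrelated Institute of Things"))
```

The threshold captures the central trade-off in any matching service: set it too low and records get linked to the wrong identifier, set it too high and valid links are missed.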

Community highlights

Martyn Rittman, Program Lead, and Kora each opened the community highlights over the two days by noting that everyone presenting is sharing how they use metadata and contribute to the broader ecosystem.

“Crossref does not exist without our members and the broader community, people who provide metadata and people who use the metadata. That’s why we’re here.” ~ Martyn Rittman

Antoine Drouin (Fonds de Recherche du Québec) shared that FRQ joined Crossref earlier this year and created 22,000+ grant and scholarship DOIs, linking grants to outputs and improving interoperability with ORCID, ROR, and Crossref grant IDs.

Agon Memeti (University of Tetova) shared findings of his analysis of abstract metadata coverage across 2024 articles from 13 university journals.

Charlie Rapple (Kudos) presented a Crossref-supported study on how researchers engage with the UN SDGs and described Kudos’ work explaining research for wider audiences. A survey of ~4,500 researchers showed strong awareness, regional differences in SDG priorities, and some targeted budgets for promotion, alongside challenges in publishing SDG-focused local research in prestige venues.

Pia Kretschmar (SCOAP3) outlined integrating Crossref metadata into new SCOAP³ open science elements in Phase 4; SCOAP³ funds OA publishing in high-energy physics and has covered 78,000+ articles. Publishers are scored on elements such as metadata provision to Crossref, identifiers, and links to datasets/software; completeness was checked via the Crossref API, results varied, and evaluation continues next year.

Barbara Rivera (Barcelona Declaration) introduced the Declaration, its four commitments, and its community of 125 signatories and 52 supporters, including Crossref. Working groups are executing a joint roadmap, with recent actions such as a funding-metadata roundtable and upcoming surveys on metadata frameworks and repository workflows.

Hans de Jonge (Dutch Research Council, NWO) presented his and Bianca Kramer’s recent study of metadata completeness in Crossref among publishers using different manuscript submission systems (a preprint that had not yet been peer reviewed as of 23 October 2025). They compared six metadata types across major publishers and found that differences had more to do with workflow choices, customization, and policy than with the system itself.

Audrey Kenni (Pan African Medical Journal) shared PAMJ’s journey with Crossref to increased visibility.

Nurul Ain Mohd Noor (UMT Press, Malaysia) described UMT Press’s evolution since 2003, rebranding in 2007 and joining Crossref in 2020. Nurul explained how registering their metadata with Crossref increased citation visibility and indexing across databases.

Achal Agrawal (PostPub) introduced PostPub’s dashboard providing retraction statistics by country and institution, supported by a Catalyst Grant from Digital Science, and shared their journey through disambiguation challenges.

Ratna Galuh Manika Trisista (Universitas Islam Jakarta) presented how enabling reference linking transformed her law journal’s citation visibility.

Closing reflections

We closed the meeting with a panel discussion on the Research Nexus in the real world: what is the impact and potential of open scholarly metadata? The panelists were Ginny Hendricks (Crossref), Dominika Tkaczyk (Crossref), Bianca Kramer (Barcelona Declaration on Open Research Information), David Oliva Uribe (UNESCO), Amber Osman (XploreOpen), Mariángela Nápoli (CONICET-IICE UBA-FFYL; Crossref), and Kazuhiro Hayashi (National Institute of Science and Technology Policy; Science Council of Japan). They offered a diversity of perspectives, which we’ll share in an upcoming blog.

You can also learn more about the in-person satellite events across the world from their organisers on our Community Forum.

You will find outputs from #Crossref2025 on our website, which you can cite as ‘#Crossref2025 Annual Meeting and Board Election, 22-23 October 2025, retrieved [date], https://doi.org/10.13003/431937misogo’.

Wellcome and Europe PMC: supporting Open Research through open metadata

Rocío Gaudioso Pedraza — Mon, 15 Dec 2025 00:00:00 +0000

In my latest conversations with research funders, I talked with Hannah Hope, Open Research Lead at Wellcome, and Melissa Harrison, Team Leader of Literature Services at Europe PMC. Wellcome and Europe PMC are working together to realise the potential of funding metadata and the Crossref Grant Linking System for, among other things, programmatic grantee reporting. In this blog, we explore how this partnership works and how the Crossref Grant Linking System is supporting Wellcome in realising their Open Science vision.

What motivated you to join Crossref?

Hannah: The motivation for Crossref Grant IDs is to be able to disaggregate research outputs between funders. Funders’ grant identifiers come in a range of formats, funders might change them over time, and there are also similarities between funders’ names, which is a challenge. Permanent identifiers, in this case, Crossref Grant IDs, are an opportunity to avoid some of the confusion if we were able to implement them throughout the research ecosystem.

This is also being discussed in different contexts. For example, within the Barcelona Declaration working groups, funders and other stakeholders are exploring the diverse motivations that exist to implement changes in our workflows, as well as the challenges that funding metadata and persistent grant IDs can help solve.

The way Wellcome implemented the Grant Linking System is a bit unique, given that it partnered with Europe PMC for the technical implementation and metadata registration with Crossref. Can you tell us more about how it works?

Hannah: The collaboration between Wellcome and Europe PMC in the implementation of Crossref’s Grant Linking System started because they already had the grants landing page feature ready and available to us.

There was an initial hope that other funders of Europe PMC, which also have these grant landing pages, could then leverage that same system to make Crossref grant IDs more broadly available to the research community, but I am not sure if that has materialised yet.

Melissa: Currently we are supporting Wellcome’s implementation of Crossref grant IDs, but the infrastructure remains available to other Europe PMC funders should they decide to take advantage of it. We already have funding metadata for Europe PMC funders because it is a requirement for grantees to select their grant identifier when submitting their accepted manuscripts for indexing and archiving. As we already have that metadata, naturally we can pull it together and send it to Crossref, along with the link to the Europe PMC grant landing pages!

An additional benefit of partnering with Europe PMC is the comprehensive metadata we deliver to Crossref with the grant IDs. For example, we have invested in supplementing affiliation data with ROR IDs and we deliver to Crossref all the data we have that matches their schema for grant data.

How is Wellcome leveraging the funding metadata and Crossref grant IDs that are being shared and registered with Crossref?

Hannah: We are discussing internally how we can better socialise the Crossref grant DOIs among the grantees, either via our grant management system or through Europe PMC. One place where the Crossref grant DOIs are being used and shared is through our publishing platform, Wellcome Open Research. The Crossref grant DOI is included in the publication metadata, ensuring that the research output is linked to the funding via the open metadata registered.

However, as we use Europe PMC as our repository for funded written research outputs, these outputs are aggregated alongside the grant records, which include the Crossref grant DOI, facilitated by Europe PMC APIs. So we have the means to link the two things together.

Melissa: There are currently some UX and technical blockers to fully integrating Crossref grant IDs within the Europe PMC grant system, and these limit the utility of the IDs. For example, you can’t search for a specific grant in the Europe PMC grant finder using a Crossref grant ID. We are partnering with Crossref to solve these challenges and offer users more functionality in this space next year.

Hannah: Beyond eLife and Wellcome Open Research, I am not sure which publishers use Crossref grant DOIs in their workflows.

Rocío: That’s an interesting question, as we aren’t seeing a massive flow of Crossref grant IDs in the works’ metadata records just yet. We are exploring with publishers and their service providers how to make this business-as-usual, and in the meantime, we are running a series of matching projects to ensure that, when possible, we make those connections ourselves to enrich the metadata with funding information. We already insert reciprocal relationships where one record asserts a link with another (in this case, where either a grant Finances a work or a work isFinancedBy a grant record, Crossref adds in the reverse). Improving and enriching these relationships directly in the metadata makes sure that metadata provided by funders can make its way to the research outputs that originate from the grant.
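The reciprocal-relationship idea can be sketched in a few lines of Python. This is an illustration only and not Crossref's internal implementation: the relationship names follow the wording above (written here in lower camel case as an assumption), and the pairing table, function name, and identifiers are hypothetical.

```python
# Pairs of relationship types that are each other's reverse (assumed spelling).
INVERSE = {"finances": "isFinancedBy", "isFinancedBy": "finances"}


def add_reciprocals(assertions):
    """Given (subject, relation, object) triples, return them with reverse links added."""
    completed = set(assertions)
    for subject, relation, obj in assertions:
        inverse = INVERSE.get(relation)
        if inverse is not None:
            # If a grant finances a work, the work is financed by the grant.
            completed.add((obj, inverse, subject))
    return completed


# Made-up identifiers for a grant record and a work record.
grant = "10.5555/grant.example"
work = "10.5555/work.example"

links = add_reciprocals([(grant, "finances", work)])
for link in sorted(links):
    print(link)
```

Because the result is a set, running the function again on its own output adds nothing new, so the link is complete whichever side asserts it first.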

Wellcome is streamlining the way of asking grantees to report on their publications, facilitated by Europe PMC. Can you tell us a bit more about how this will work and what role metadata will play?

Hannah: We will stop asking researchers to report their publications directly to us as part of progress and end-of-grant reporting. We believe there is sufficient open metadata with high-quality tagging in the ecosystem for us to collect written research outputs programmatically from this public data. Under our new system, we will be directing researchers to look at their grant record within Europe PMC and make sure that their written research outputs are properly linked there; otherwise, we won’t see them. We are trying to leverage open data, existing infrastructure, and a route that enables us to improve the completeness of open metadata.
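As a rough sketch of what collecting outputs programmatically from public data could look like, the Python snippet below builds a query URL for the Europe PMC RESTful search service, filtering by grant identifier. It is not a description of Wellcome's actual reporting pipeline: the function name and the grant identifier are made up, and the endpoint, the `GRANT_ID` and `GRANT_AGENCY` search fields, and the parameters should be checked against the current Europe PMC documentation before use.

```python
from urllib.parse import urlencode

# Base URL of the Europe PMC RESTful search service.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def grant_outputs_url(grant_id, agency=None, page_size=100):
    """Build a search URL for publications linked to a grant identifier."""
    query = f'GRANT_ID:"{grant_id}"'
    if agency:
        # Optionally narrow by funder, since grant identifiers can clash across funders.
        query += f' AND GRANT_AGENCY:"{agency}"'
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{SEARCH_URL}?{urlencode(params)}"


# "000000/Z/00/Z" is a made-up grant identifier used only for illustration.
url = grant_outputs_url("000000/Z/00/Z")
print(url)
```

The snippet only constructs the URL and makes no network call. Fetching and paging through the results is left to whichever HTTP client the reader prefers.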

There aren’t many mechanisms that enable our researchers to add assertions to funding and research output records retrospectively, and Europe PMC offers us that opportunity, and that is really critical for us. Rather than collecting information in our own system, we can contribute to enhancing the global corpus of knowledge and the quality of open metadata more broadly. Since correcting metadata at source isn’t easy, Europe PMC presents us with an opportunity to contribute to that system.

Melissa: We are thinking broadly about this problem; many institutions curate their research information in spreadsheets or closed CRIS systems and struggle to make it publicly available. We are thinking about how Europe PMC can be leveraged to be a public home for that data. EMBL-EBI hosts Europe PMC and utilises it as the institutional repository, so we have started a pilot project to add ROR IDs for affiliations to EMBL-authored publications within Europe PMC. This is manually curated, high-quality metadata that would otherwise be lost from the public ecosystem.

If you look into the future, what would your hopes be for the GLS and greater transparency in funding metadata in general? What do you think that we could achieve collectively as a community?

Melissa: It would be amazing (!) if everybody, from funders to publishers, to institutions and authors, would coalesce around the Crossref Grant Linking System, and add to metadata exchange workflows – you would potentially have a very clean and clear picture of where the money is going, what the outputs are, and how they relate.

Currently, even with the Open Funder Registry, there is ambiguity around funder names - for example, different geographical national funders sharing the same exact name as their counterpart in another country - so even with the best will in the world, funder institutions could be misidentified in systems and assigned the wrong identifier. The Crossref Grant Linking System facilitates complete disambiguation because grants are associated with the issuing funder’s correct identifier, ensuring traceability of outputs and funding and enabling more precise, cleaner metadata.

Hannah: I think that is a bit of the Holy Grail and, in reality, it’s a bit messy; there isn’t just one system! We need to be able to move past the chicken and egg discussion, where we talk about the use of different identifiers, with sometimes competing priorities. For me, the real challenge for the metadata community is how we enrich metadata, correct errors, and develop greater interoperability between PID systems, so that multiple parties can contribute towards the creation of a greater whole record, rather than relying on a single owner of the record to provide all the information. If we could all, funders included, connect information from individual partners to create a unified record at the end of it, we could have better records and probably save time by distributing the workload.

What would you say to colleagues in other funders about investing in open metadata?

We all need information from other partners in the ecosystem and investing in our own internal system will not give us the same return as collectively investing in opening up that information wherever possible.

——-

We are very grateful to Hannah Hope and Melissa Harrison for their perspectives on open funding metadata and the role of the community in ensuring a complete and comprehensive Research Nexus.

Some things are big because they are small – the new fee tier for Crossref members takes effect

Kornelia Korzec — Thu, 11 Dec 2025 00:00:00 +0000

Haz clic aquí para ver la versión en español

In January 2026, our new annual membership fee tier takes effect. The new tier is US$200 for member organisations that operate on publishing revenue or expenses (whichever is higher) of up to US$1,000 annually. We announced the Board’s decision making this possible in July and, as you can infer from Amanda’s latest blog, this is the first such change to the annual membership fee tiers in close to 20 years!

The new fee tier resulted from the consultation process and fees review undertaken as part of the Resourcing Crossref for Future Sustainability program, carried out with the help of our Membership and Fees Committee (made up of representatives from member organisations and community partners). The program is ongoing, and the new fee tier, intended to make Crossref membership more accessible, is one of the first changes it helped us determine.

When our membership renewal invoices are sent out in January 2026, the new fee tier will apply to 3,194 of our existing members, who will receive annual membership invoices 27% lower than previously. Surveys preceding the introduction of the new fee tier have shown that it might be applicable to between 30% and 60% of the organisations in what used to be our lowest fee tier (US$275 fee for organisations with publishing revenue or expenses of up to US$1 million).
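The figures above can be checked with a little arithmetic. The Python sketch below covers only the two lowest tiers mentioned in this post. The function names are our own, and tiers above US$1 million are deliberately left out.

```python
def percent_reduction(old_fee, new_fee):
    """Percentage by which new_fee is lower than old_fee."""
    return (old_fee - new_fee) / old_fee * 100


def lowest_tier_fee(revenue_usd, expenses_usd):
    """Annual fee (US$) for the two lowest tiers described in this post.

    Sizing uses publishing revenue or expenses, whichever is higher.
    Returns None above US$1 million, where higher tiers (not covered here) apply.
    """
    basis = max(revenue_usd, expenses_usd)
    if basis <= 1_000:
        return 200
    if basis <= 1_000_000:
        return 275
    return None


print(round(percent_reduction(275, 200)))  # 27
print(lowest_tier_fee(0, 800))             # 200
print(lowest_tier_fee(50_000, 20_000))     # 275
```

Moving from US$275 to US$200 is a reduction of just over 27%, which matches the figure quoted for the renewal invoices.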

We received positive feedback from members affected by the change.

“We are very grateful for the new lowest membership fee tier. The Crossref fee is indeed a significant expense for our organisation, but we accept it given its importance. This new fee structure will make it easier for us to cover the cost.” – said Marina Pérez, Análisis Filosófico.

“This initiative by Crossref to reduce membership fees is a welcome step toward achieving a truly global and connected research ecosystem. This will undoubtedly help our journal’s mission in fostering inclusive, open, and accessible publishing.” – said Dev Roychowdhury, Journal of Psychological Experience.

Following the feedback provided in the consultations, and after a number of prompts in the months since the original announcement, our Membership Team gathered the information necessary to transition 3,194 members into the new fee tier. That’s 14.5% of all Crossref members. Please note that in the graph below the number of members in the $200 tier is higher because of a recent influx of new members who didn’t need to transition. In addition, “$0” denotes all our sponsored members, who don’t pay membership fees to us, and those included in the GEM program.

Any members out there who think their organisation should be moved to the new lowest membership fee tier and haven’t already informed us – please contact us as soon as possible, before the end of the year, so we can make the change before invoices are raised in January.

We know from speaking with our community (and thank you SO MUCH for everyone’s feedback in surveys and discussions!) that this change makes participation in Crossref more accessible to smaller organisations communicating research. This will result in a continued flow of new records and associated metadata into the research nexus, helping us to make it easier to find and assess research, achieve greater transparency in the scientific process, and continue building trust in its outputs.

We’re not done reviewing our fees, and we don’t think the new fee tier addresses all the needs of the growing and evolving scholarly community. We continue working with Sponsors and Ambassadors, and we have upcoming changes to the Global Equitable Membership program to facilitate participation by all types and sizes of organisations sharing research.

Versión en español

Algunas cosas son grandes porque son pequeñas: la nueva tarifa para los miembros de Crossref entra en vigencia

En enero de 2026 entrará en vigencia nuestra nueva tarifa anual. Será de 200 dólares americanos (US$) para las organizaciones miembro que operen con ingresos o gastos editoriales (el que sea mayor) de hasta 1000 US$ al año. Anunciamos en julio la decisión de la Junta Directiva que lo hizo posible y, como se puede inferir del último blog de Amanda, este es el primer cambio en las tarifas anuales de membresía en casi 20 años.

Esta nueva tarifa fue resultado de consultas y revisiones de tarifas que hicimos y que hacen parte del programa de financiación para la sostenibilidad a futuro de Crossref y que fue elaborada con la ayuda del comité de membresía y tarifas (compuesto por miembros representantes y aliados de la comunidad). El programa sigue en curso y la nueva tarifa, pensada para hacer más accesible la membresía de Crossref, es uno de los primeros cambios que nos ayudó a definir.

Cuando se envíen las facturas de renovación de membresía en enero de 2026, la nueva tarifa se aplicará a 3.194 de nuestros miembros actuales, quienes notarán que esta será un 27 % más económica que en otros años. Por otro lado, queremos que tengan en cuenta que las encuestas realizadas antes de la introducción de la nueva tarifa demostraron que esta podría ser aplicable a entre el 30 y el 60 % de las organizaciones que anteriormente se encontraban en nuestro nivel de tarifa más bajo (275 US$ para organizaciones con ingresos o gastos de publicación de hasta 1 millón de US$). Ya hemos recibido retroalimentación positiva de miembros que han sido beneficiados con el cambio:

“Estamos agradecidos por la nueva tarifa más baja. El costo de Crossref es, sin duda, un gasto significativo para nuestra organización, pero lo aceptamos dada su importancia. Esta nueva estructura de tarifa hará que cubrir el costo sea más fácil.”, dijo Mariana Pérez, de Análisis Filosófico.

“La iniciativa de Crossref de reducir las tarifas de membresía es un paso bienvenido hacia el logro de un ecosistema de investigación verdaderamente global y conectado. Sin duda, esto va a ayudar en la misión de nuestra revista de fomentar una publicación inclusiva, abierta y accesible.”, dijo Dev Roychowdhury, del Journal of Psychological Experience.

Following the feedback provided in the consultations and a series of prompts in the months after the original announcement, our membership team gathered the information needed to move 3,194 members to the new fee tier, which represents 14.5% of all Crossref members. The chart below shows a larger number of members in the $200 tier because of the recent influx of new members who did not need to transition. In addition, “$0” denotes all our sponsored members, who do not pay membership fees, and those included in the Global Equitable Membership (GEM) program.

Note: if you believe your organisation should move to this new membership fee tier and you have not yet let us know, please contact us before the end of the year so that we can make the change before invoices are issued in January.

That said, from the conversations we have with our community (and THANK YOU for all your feedback in surveys and discussions), we know that this change makes participation in Crossref more accessible for small organisations that communicate research. We are confident that this will promote a continued flow of new records and associated metadata that add to the Research Nexus, helping us make research easier to find and assess, achieve greater transparency in the scientific process, and keep building trust in its results.

We have not finished reviewing our fees, and we do not believe that this new tier addresses all the needs of a growing and evolving scholarly community. We continue to work with our sponsors and ambassadors, and we have upcoming changes to the GEM program to make participation easier for organisations of all types and sizes that share research.

Spanish translation by: Nicolás Mejía Torres

It's Time: Planning for Metadata Schema Deprecation

Patricia Feeney — Wed, 10 Dec 2025 00:00:00 +0000

It has been 18 (!) years since Crossref last deprecated a metadata schema. In that time, we’ve released numerous schema versions, some major updates, and some interim releases that never saw wide adoption. Now, with 27 different schemas to support, we believe it’s time to streamline and move forward.

Starting next year, we plan to begin the process of deprecating lightly-used schemas, with the understanding that this will be a multi-year effort involving careful planning and plenty of communication.

Which schemas will be deprecated?

There are two types of schema used to register content metadata records: a full metadata input schema, which follows the pattern crossrefX.X.X.xsd, and a resource schema, which follows the pattern doi_resourcesX.X.X.xsd. Resource schemas are used to append metadata, such as references or funding data, to an existing metadata record.

I’ve categorized our schemas by usage levels to help prioritize the deprecation process:

Light usage (planned for initial deprecation):

crossref4.3.1.xsd
crossref4.3.2.xsd
crossref4.8.1.xsd
doi_resources4.3.2.xsd
doi_resources4.3.4.xsd
doi_resources4.3.5.xsd
doi_resources4.4.2.xsd

Moderate usage:

crossref4.3.3.xsd
crossref4.3.4.xsd
crossref4.3.5.xsd

High usage:

crossref4.3.0.xsd
crossref4.3.6.xsd
crossref4.3.7.xsd
crossref4.4.0.xsd
crossref4.4.1.xsd
crossref4.4.2.xsd
doi_resources4.3.0.xsd
doi_resources4.3.6.xsd

We currently support 5 versions of our grants-specific schema and will be working with our funder members to move to new versions of that schema over time - this will follow a different timeline and process as there are fewer schemas to navigate.

If you don’t know which version you’re currently using, now would be a good time to check. Many of our members are still using 4.3.0, the earliest supported version.
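One way to run that check is sketched below in Python. This is a minimal sketch, not a Crossref tool: it assumes your deposit XML names its schema file in an `xsi:schemaLocation` attribute on the root element, which may not hold for every workflow. The function names and the sample XML are hypothetical; the usage tiers are copied from the lists above.

```python
import re
import xml.etree.ElementTree as ET

# Key under which ElementTree exposes the xsi:schemaLocation attribute.
XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"

# Usage tiers as listed in this post.
USAGE_TIERS = {
    "light": {
        "crossref4.3.1.xsd", "crossref4.3.2.xsd", "crossref4.8.1.xsd",
        "doi_resources4.3.2.xsd", "doi_resources4.3.4.xsd",
        "doi_resources4.3.5.xsd", "doi_resources4.4.2.xsd",
    },
    "moderate": {"crossref4.3.3.xsd", "crossref4.3.4.xsd", "crossref4.3.5.xsd"},
    "high": {
        "crossref4.3.0.xsd", "crossref4.3.6.xsd", "crossref4.3.7.xsd",
        "crossref4.4.0.xsd", "crossref4.4.1.xsd", "crossref4.4.2.xsd",
        "doi_resources4.3.0.xsd", "doi_resources4.3.6.xsd",
    },
}

# Matches file names such as crossref4.3.1.xsd or doi_resources4.3.6.xsd.
XSD_NAME = re.compile(r"(?:crossref|doi_resources)\d+\.\d+\.\d+\.xsd")


def declared_schema(xml_text):
    """Return the schema file name declared on the root element, or None."""
    root = ET.fromstring(xml_text)
    match = XSD_NAME.search(root.get(XSI_SCHEMA_LOCATION, ""))
    return match.group(0) if match else None


def usage_tier(xsd_name):
    """Map a schema file name to its usage tier from the lists above."""
    for tier, names in USAGE_TIERS.items():
        if xsd_name in names:
            return tier
    return "not listed"


# Hypothetical, heavily trimmed deposit root element, for illustration only.
SAMPLE = (
    '<doi_batch version="4.3.1" '
    'xmlns="http://www.crossref.org/schema/4.3.1" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.crossref.org/schema/4.3.1 '
    'http://www.crossref.org/schemas/crossref4.3.1.xsd"/>'
)

schema = declared_schema(SAMPLE)
print(schema, "->", usage_tier(schema))
```

If the function returns `None`, the file does not declare a schema file in this way, and the version will need to be confirmed with whoever maintains the deposit workflow.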

Why deprecate now?

Supporting 27 schemas is unsustainable. Each schema version we maintain adds complexity to our systems and makes it harder to implement improvements that benefit everyone.
Existing schemas need modernization. Some fundamental elements, like names and titles, need to be modeled differently to fully capture variations in language and usage patterns across different cultures and contexts. We also have too many bespoke record types. Consolidating these will create a simpler, more coherent structure. We may retain certain specialized structures for journal articles and books, but overall, simplification will benefit everyone.

Most importantly:

Our current requirements are too minimal. For most record types, we only require a title and publication year. While this low barrier has made registration accessible, it hasn’t served metadata quality well. We know you can do better, and we’d like to ask for more to improve the richness and utility of Crossref metadata.

What happens next?

This won’t be an abrupt change. We would like to deprecate the schemas flagged ‘light usage’ by the end of 2026 and will be reaching out to impacted members early next year. For the other schemas, we’re planning a multi-year effort with clear communication at every stage. We’ll provide ample notice before any schema is deprecated, along with migration guidance and support.

With the exception of recent changes to affiliation metadata, we’ve primarily been building on existing schema structures. This means upgrading should be straightforward for most users. As mentioned, we’ll be judiciously making some breaking changes to names, titles, and requirements, and would like to consolidate schemas as we move forward.

Our goal is to create a more robust, modern metadata framework that better serves the scholarly community while reducing the maintenance burden that comes with supporting decades of schema versions. Stay tuned for more details on timelines and migration paths. In the meantime, if you’re unsure which schema version you’re using, we encourage you to check your current implementation.

Metadata in editorial workflows

Madhura Amdekar — Wed, 03 Dec 2025 00:00:00 +0000

Background

Scholarly metadata, deposited by thousands of our members and made openly available, can act as “trust signals” for publications. It provides information that helps others in the community to verify and assess the integrity of the work. Despite having a central responsibility for ensuring the integrity of the work that they publish, editorial teams tend not to be fully aware of the value of metadata for the integrity of the scholarly record. How can we change that?

Thousands of publishers and institutions from all over the world, big and small, are Crossref members, providing us with rich metadata for their publications. Our discussions with the community on this topic have shown that appreciation of the benefits and value of metadata usually remains confined to the technical or production teams, which interact closely with Crossref.

Although editors may interact with some aspects of metadata when they screen manuscripts that come their way, it is not evident whether they see metadata as useful for signalling trust. In the last couple of years, we have been specifically engaging with editors, meeting them, speaking to them, and writing for them on this topic. As next steps in this effort, we are now keen to engage with the diverse editorial community to understand where metadata fits in their workflows, and to identify opportunities for providing visibility to the importance of rich metadata.

To get a better grasp on this subject, I asked Christine Ferguson to share her rich experience across many editorial roles with me and to help paint a better picture of the mutual gaps in understanding when it comes to publication metadata. Here’s what we discovered about the different editorial roles and some ideas for how Crossref might better engage with editors.

We know that

Our members come in all shapes and sizes, and that is also reflected in the diversity of editorial functions that may exist within their organisations. Some of our publishing members have editorial staff whose role is to screen submissions, which includes checking that the manuscripts are formatted correctly and contain all the required information, e.g. on ethics approvals or the authors’ ORCIDs (Open Researcher and Contributor ID). They then pass these manuscripts on to an external or academic editor, who is usually a subject matter expert and is responsible for the editorial oversight of the content, to manage the rest of the peer review process, such as assessing the novelty and scope of the work, inviting and securing reviewers, and making a final decision on the manuscript. Academic editors make up the vast majority of the editorial community, variously serving as editors-in-chief, section editors, and members of editorial boards. They usually volunteer their time as editors while having another primary job function.

Other publishers may have in-house editors who are subject matter experts themselves and manage the peer review process. Manuscripts can come to these editors after initial checks have been performed on them or the editors may also perform these checks, following which selected manuscripts undergo the peer review process.

Production editors assume responsibility for the manuscripts that are accepted. Their role is to make the manuscript production- and publication-ready, often liaising with the authors to finalise the formatting, and finally to assign it to an issue.

Then there are editorial roles that may be a combination of one or more of the above. The size and operational structure of an organisation may determine how editorial and other responsibilities are delegated within the organisation. For some of our medium or smaller members, it may be that the same individual or team is responsible for one or more tasks related to assessing the scientific content of the manuscript, managing the peer review process, as well as being in charge of the post-production workflows such as registering metadata with Crossref.

There are also emerging publishing workflows involving solicited peer-reviews of preprints or other types of works, which sometimes retain a form of editorial oversight.

In summary, editorial roles and responsibilities may vary quite a lot across our member organisations, and we have limited clarity about how they are structured within each one.

All of these different flavors of editors also interact with metadata at various stages in their workflows. For example, the titles of manuscripts, the names of authors, whether they have ORCIDs and what is reflected in their ORCID records, and the abstracts may be used to assess the novelty and integrity of the work under consideration. The names of authors, especially if they are not known personally to the editor, can be verified in part by an ORCID check, which helps confirm that the individuals exist, that they are affiliated with the organisations as claimed, and that they have the relevant expertise to write or contribute to the manuscript, and which makes it possible to find what they have written previously on the subject. Checking whether all or some of the authors (e.g. the corresponding author) have provided their ORCIDs, or whether the link to the repository where the dataset has been deposited resolves correctly, is usually part of the pre-screening or post-acceptance checklists. As our recent metadata awardee ASM has highlighted, having this metadata can be hugely beneficial during the peer-review management process, such as for identifying conflicts of interest, ensuring data policy compliance, and even carrying out systematic analyses.
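As a small illustration of the kind of automated check that can sit behind an ORCID pre-screening step, the sketch below validates the check digit of an ORCID iD using the ISO 7064 MOD 11-2 algorithm that ORCID documents for its identifiers. It is a syntactic check only: it confirms that an iD is well formed, not that the person exists or holds the claimed affiliation, which still means consulting the ORCID record itself. The function names are illustrative and not part of any Crossref or ORCID library.

```python
import re

# Four hyphen-separated groups of four characters; the last may be "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True if the iD has the expected shape and a matching check digit."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


# 0000-0002-1825-0097 is the iD of ORCID's fictitious example researcher.
print(is_well_formed_orcid("0000-0002-1825-0097"))
print(is_well_formed_orcid("0000-0002-1825-0098"))
```

A check like this catches typos at submission time, before an editor spends time looking up a record that cannot exist.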

We’d like to know more about…

whether all editors interact with metadata in their workflows, and whether they are sufficiently informed about the power of rich metadata. It is evident that there is a lot of diversity in editorial roles and functions. Editors, whether they are mostly concerned with scientific content or with the manuscript peer-review process, are closely connected to the researcher community and the latest research topics and trends. By virtue of this, they are in an excellent position to ascertain the important metadata elements most relevant in their scholarly community. If we have a better understanding of how editors are using metadata in their workflows, we’d be able to identify specific opportunities for engaging with this key community to create greater recognition of the role of metadata in preserving the integrity of the scholarly record.

What we have in mind is to engage systematically with editorial community members and understand from them how, where, and which metadata they are using in their workflows. We’d like to do so by talking to editors who represent different Crossref members, perhaps in small groups, where participants will be able to share which metadata elements they interact with. We’d also like to share with them information about the use of metadata for research integrity. We’d like to understand whether they have been leveraging metadata in this context and the relevance of this information for them. Via this exercise, we hope to pick out some commonalities about the use of metadata in editorial workflows. Ultimately, we’d like to use this information to create resources that can be used for educating editors (and ultimately the researchers who submit their work for publication) about the importance of metadata, especially in signalling trust and preserving the integrity of the scholarly record.

Crossref members over the years: a journey through space and time

Amanda Bartell — Wed, 26 Nov 2025 00:00:00 +0000

Crossref was created back in 2000 by 12 forward-thinking scholarly publishers from North America and Europe, and by 2002, these members had registered 4 million DOI records. At the time of writing, we have over 23,600 members in 164 different countries. Half of our members are based in Asia, and 35% are universities or scholar-led. These members have registered over 176 million open metadata records with DOIs (as of today). What a difference 25 years makes!

In our 25th anniversary year, I thought it would be time to take a look at how we got here. And so—hold tight—we’re going to go on an adventure through space and time¹, stopping every 5 years through Crossref history to check in on our members. And we’re going to see some really interesting changes over the years.

2005

Let’s go back twenty years to 2005. Crossref has been running for five years, and at this point, we have just 318 members from 31 countries, with 18 million DOI records already registered. These members and the Crossref infrastructure are supported by five Crossref employees based in just two countries—the US and the UK.

In 2005, the majority of our members are based in North America, Northern Europe and Western Europe, and they are mostly publishers or societies. Our sponsor program doesn’t yet exist, so all members pay a membership fee directly to Crossref. Our membership fee structure is the same as it is today—we have tiered membership fees so our members can contribute to our infrastructure based on their capacity to pay. At this point, half of our members are eligible for our lowest fee tier.

2005 at a glance

318 members from 31 countries.
18 million DOI records registered.
Supported by five Crossref employees based in two countries - the US and the UK.
The majority (89%) are based in North America or Northern & Western Europe.
Half are eligible for our lowest fee tier.
Mostly societies (40%) and publishers (33%).

2010

Let’s move on by five years to 2010. By this stage, Crossref membership has grown to 1,101 members from 69 countries, and these members have now registered 44 million DOI records. They are now supported by 14 Crossref employees, still all located in either the US or the UK.

We’re starting to see some changes in where our members are based. You’ll remember that back in 2005, 89% of Crossref members were based in North America, Northern Europe or Western Europe. By 2010, that percentage has dropped to 63%, and we’re seeing the number of members based in Asia starting to grow. In 2005, only 4% of our members were based in Asia, but by 2010, 18% of our members are based there, with 93 members in the Republic of Korea alone.

By 2010, the percentage of members who are eligible for our lowest fee tier has grown to 78%, so we are seeing smaller and less well-funded organisations starting to join. The types of organisations joining haven’t changed significantly: members are still mostly societies and publishers. However, we are starting to see universities and scholar-led organisations beginning to join.

2010 at a glance

1,101 members from 69 countries.
44 million DOI records registered.
Supported by 14 Crossref employees based in two countries - the US and the UK.
Growth of members based in Asia (18%).
Smaller, less well-funded organisations starting to join - 78% eligible for our lowest fee tier.
Still mostly societies (37%) and publishers (28%), but universities and scholar-led members starting to emerge (23%).

2015

Jumping ahead another five years to 2015, we see Crossref membership has grown to over 3,000 members from 93 countries, with registered DOI records exceeding 77 million. These members and the Crossref infrastructure are supported by 28 employees, still all based in the US and UK.

Membership in Asia has now really taken off, and Asian organisations now account for 38% of all Crossref members. We also see membership in Latin America emerging, representing 12% of our membership. We have members from 12 different countries in Latin America in 2015, but the most significant number are from Brazil, with 274 members.

Our formal Sponsor program started to emerge from 2012 onwards. Our Sponsor program supports members who are otherwise eligible for our lowest fee tier and provides financial, technical and language support to organisations that would otherwise face barriers to membership. By 2015, we have 26 sponsors in 14 countries, and 20% of all members are working with us through a Sponsor. This is one of the drivers behind smaller, less well-funded members joining Crossref. We really see a leap here in 2015 with over 90% of members now eligible for our lowest fee tier.

Around 2015, we also begin to see an interesting shift in the types of organisations that are becoming members. Increasingly, our new members are university-based, and that type of member organisation has overtaken the publisher group in number for the first time. However, societies still make up the largest number of members.

2015 at a glance

3,134 members from 93 countries.
77 million DOI records.
Supported by 28 Crossref employees based in two countries - US and UK.
Growth in Asia (38%) and members in Latin America (12%) starting to emerge.
Leap in smaller, less well-funded members - 92% eligible for the lowest fee tier.
Sponsor program emerges - 26 sponsors in 14 countries.
Rise of university and scholar-led members (29%) - overtaking publishers (21%). Societies (31%) are still the largest group.

2020

Can you believe we’re already in 2020? Crossref now has almost 12,000 members in 133 countries, with registered DOI records totalling over 120 million! These members and the Crossref infrastructure are now supported by 43 employees across five countries, with Ireland, Germany, and France added to our staff locations.

Almost half of our members are based in Asia at this time, driven by growth from Indonesia, where we have 1,681 members in 2020. Our sponsor program now contains 77 sponsors across 32 countries, including our first sponsor in North Africa.

We can now really see how membership is weighted towards smaller, less well-funded organisations: 97% of members are eligible for the lowest fee tier, and 57% choose to work with a sponsor.

By 2020, we also see a fundamental change in the types of organisations that are Crossref members. Societies no longer account for the largest share of our members, with both universities and publishers overtaking them. In 2016, we updated our schema to enable members to register records for preprints (and connect them to an article where relevant). By 2020, 65 members are registering preprints, and many preprint repositories have already become members.

2020 at a glance

11,976 members from 133 countries.
120 million DOI records.
Supported by 43 Crossref employees in five countries - France, Germany, Ireland, the UK, and the US.
46% of members based in Asia.
77 sponsors in 32 countries, first sponsor in N Africa.
Membership heavily weighted to smaller, less well-funded organisations - 97% eligible for the lowest fee tier and 57% working through a sponsor.
Universities and scholar-led are now the largest group (37%), followed by publishers (29%) and societies (24%).

2025

And so we find ourselves back in the present day.

With such steady growth, it’s pretty easy to predict almost exactly how many members we will have by 31st December 2025. By year-end, we would expect to have 23,800 members in 164 countries, with registered DOI records totalling around 177 million. With recent hiring, these members and our infrastructure will be supported by 52 Crossref employees in 14 different countries.

Member organisations are now a real mix, with museums, hospitals, botanic gardens, banks, and many more joining. The largest proportion remains university-based or scholar-led organisations (35%), but interestingly, we see the percentage who consider themselves to be societies starting to fall (19%) and publishers starting to grow again (29%).

And we see the arrival of a new type of member - since the launch of the Grant Linking System in 2019, we now see Research Funders joining Crossref in order to register identifiers for individual grants. These grant identifiers can then be included in the metadata for published content to uniquely identify the funding source, providing context and trust signals for the content, and fleshing out the Research Nexus. We currently have 45 funders who have registered over 175,000 grant records.

By 2025 we have 129 sponsors in 51 countries - including our first sponsors in East and West Africa who joined in 2024 and 2025 respectively. Half of all members are now based in Asia. 98% of members are now eligible for our lowest fee tier and 57% are working with us through a sponsor.

In 2023, we launched our Global Equitable Membership (GEM) program, which offers relief from any membership and content registration fees for organisations in the least economically advantaged countries in the world. We use the World Bank’s International Development Association (IDA) list as our data source for countries to include in the program. When we launched the program, 187 existing members moved under the program. Since the program’s focus is to enable participation for those who would otherwise find Crossref unaffordable, we are happy that we now have 583 organisational members in the GEM Program, showing the growth in participation from lower-income nations. Most members in the GEM Program are based in Southern Asia (48%) and Sub-Saharan Africa (33%).

November 2025 at a glance

23,622 members in 164 countries.
175 million DOI records.
Supported by 52 Crossref employees from 14 countries - Armenia, Austria, Canada, Ecuador, Germany, Ghana, Hong Kong, Ireland, Kenya, the Netherlands, Nigeria, Spain, the UK, and the US.
51% of members are based in Asia.
129 sponsors in 51 countries - first sponsors in East and West Africa.
98% of members are eligible for the lowest fee tier, and 57% working through a sponsor.
Real mix of organisation types - universities and scholar-led (35%), publishers (29%), societies (19%), but also research funders, museums, pharmaceutical companies, news agencies, and more!

Changes over the years

Here is some of that data over time, depicted in charts.

2026 and beyond

As you can see from our adventure through space and time, the types of organisations that work with Crossref have changed significantly over the years as the scholarly communications world has evolved. Our members now tend to be university-based research-performing organisations or scholar-led journals, based in Asia, and with low or zero publishing revenues (and volumes).

To meet our mission of a truly global and connected research ecosystem, it is essential to ensure that participation in Crossref and all our services and metadata is accessible to everyone involved in documenting scholarly progress.

We want to ensure that access to the Crossref infrastructure is equitable, so we are making two key changes in 2026: we’re extending eligibility for the GEM Program (more to follow), and we are introducing a new, lower-fee tier as an outcome of the RCFS projects (more here).

We’re excited to see how our members will change as we head into our next 25 years—we hope you’ll continue with us on our journey and welcome all kinds of new members to the expansive and vibrant Crossref community.

¹ Technically, this is only an adventure through time. At the time of writing, we have no members based in space. Unless you count the European Space Agency, NASA, et al.

Crossref at the Frankfurt Book Fair 2025

Helena Cousijn — Wed, 19 Nov 2025 00:00:00 +0000

The Frankfurt Book Fair is the largest book fair in the world, and therefore a key event on our calendar. Held annually in Frankfurt, Germany, the 77th Frankfurt Book Fair (October 15–19, 2025) saw 118,000 trade visitors and 120,000 private visitors from 131 countries. The Crossref booth was located, as usual, in Hall 4.0 where all the stands with information about academic publishing can be found. Four Crossref colleagues attended the Book Fair this year, and in this blog post, you can read more about their meetings, experiences, and plans.

First timer fun at the Frankfurt Fair - Helena

Even though I’ve been working in scholarly comms for over 10 years, I’d never had a chance to visit the Frankfurt Book Fair. I was therefore really excited to have an opportunity to attend this year, and it didn’t disappoint! I arrived on Monday, October 13, in time for the STM dinner, which proved a great opportunity to meet with Crossref members and collaborators. On Tuesday, I attended the STM conference with the exciting theme of ‘The role of publishers in science diplomacy’. I think my favorite part of the day was the last panel, where the panelists realised that even though they represent different groups, in the end, they all have the same goals and are all working towards better science and dissemination.

On Wednesday, it was time to head over to our booth, where we prepared for the interesting conversations ahead. My meetings were mainly focused on collaborations in the area of research integrity, as Crossref plans to run pilots with potential partners next year. In-person meetings at the fair were a good opportunity to discuss in more detail which kinds of integrity checks could be useful to our members. I also had several meetings with organizations functioning as Service Providers –– depositing content on behalf of members –– who are eagerly awaiting the launch of our renewed Service Providers program next year. In these conversations, we shared our thinking about requirements for Crossref Service Providers and got input from organizations with experience serving our member community. Overall, it was a great opportunity to see members and collaborators in person, and I’ve already put the 2026 Frankfurt Book Fair in my calendar!

An exciting comeback - Maryna

If last year I was a debutante at the Frankfurt Book Fair, 2025 marked an exciting comeback. It’s always a pleasure to spend time chatting with people you usually only meet through email or Zoom. Working remotely as part of a global team is something I truly value about Crossref, but it also makes those in-person moments even more special. You get to solve issues that have been sitting on your to-do list over lunch, brainstorm ideas while walking to the venue, get immediate advice in a meeting, and, of course, talk about dogs over dinner.

Frankfurt was busy but well organised. Our booth was lively with a mix of planned and spontaneous meetings. It was nice to reconnect with members and sponsors I’ve worked with over the years. We even gave an early look at the new Participation Reports before the official release (what a thrill!). There were good conversations about deprecating co-access, the importance of title transfers, and how we can keep improving the member experience. One highlight: I spoke with a prospective member about our membership model and fee structure, and they joined the following week! Their account is already active, with a prefix assigned, which was great to see.

Another key topic was the importance of ROR IDs. I talked with several publishers about how they could be implemented across other systems. At one point, I spotted an issue with unregistered DOIs and was able to fix it on the spot by finalising a title transfer—we’d had permission but never received the formal request—so it was satisfying to close that loop in real time.

As a relatively small team serving a global membership of more than 23,000 and growing, we can’t meet with every member face-to-face to answer every question. Our team works hard to respond to all queries by email, but it’s undeniably faster and more productive in person. That’s why we keep returning to the Frankfurt Book Fair year after year, and you can definitely count on seeing us again next year!

Third time at bat - Luis

The Frankfurt Book Fair is always an incredible opportunity to connect with our community. We come prepared with highlights of the year, plans for developments and upcoming releases, and remind the members we meet to participate and vote in the annual elections. But most of what we learn happens during the informal moments––meetings, drop-ins, and chats over coffee and tea––where people discuss what they’re working on, trends, and interests of the scholarly and publishing community.

This year, one of those conversations was with someone working with groups from Egypt and the UAE who are developing tools around our metadata. They wanted to talk through REST API use, recent Crossref updates, and how retraction metadata could fit into their systems. Another person opened their participation report with us and was surprised to see their metadata showing 0% despite the team believing they were sending complete metadata, which led to a discussion about getting their internal workflows running again.

Booth days always fly by, but they’re deeply informative and insightful for teams that participate in person, as we can “cross-check” (pun intended) how our different support mechanisms help the community and how well we’re delivering our communications. There is a good mix of problem-solving and catching-up; often, we see members who prepare a list of questions because they find it easier to sit and navigate through them with our support or membership colleagues. Sometimes it’s about refreshing their understanding of what Crossref is and what we do, especially during team changes. We also spoke with a publisher preparing to adopt Crossmark. They wanted to check they were handling updates and relationships correctly, and mentioned that increasing transparency is becoming a priority for them. Someone else, working closely with a repository, asked about using the REST API or Metadata Plus to enrich their records.

A few visitors simply needed clarity––one was pleased to learn they could register reports and datasets after being told otherwise. Another visitor who registers a small number of book DOIs each year asked whether the Web Deposit Form was still the best fit. We walked through the Record Registration Form together, and its new editing features helped them plan for upcoming changes.

Personally, I enjoy seeing the cultural and organisational diversity of existing and potential Crossref members, ambassadors, sponsors, allies and colleagues from all over the world at our booth. If you have the opportunity to attend the Book Fair next year, please visit our booth and say hello!

This year’s Frankfurt Veteran - Paul

I think this is my 5th (?) Frankfurt book fair, and each year I come away thinking how much I appreciate the opportunity to speak with our members face to face, and I get to see and hear the impact that Crossref has, which is always such a pleasure.

This year, there were only four of us in attendance, and it felt busier than ever. We had a lot of pre-booked meetings at our wonderfully designed booth again (thanks to the amazing work of our colleague Rosa) but we also had lots of ad-hoc meetings, where members came up to say “hello”, “thank you” or ask about that really knotty, niche problem that they have, which they are not sure how to explain over email. From a technical support perspective, this is great, as we can go through these issues and get a resolution––or a solid background––without the delay and confusion of long email threads. I also worked with a member who got their IT department to send over a file there and then for us to work through and try to navigate a difficult question regarding reference matching and whether the simple text query form worked using an API, which others could use. These were just two examples of many in which it was much easier to sit down and work through issues directly at the fair.

So I would always say that if you are at the Frankfurt Book Fair and you have one of these issues, it is a great opportunity to come by, say hello and work through it with us. We will send out a reminder before the fair in 2026 to get any meetings booked, or you can just come and find us at the fair.

A highlight for me this year was also showing some of our members our new Participation Report. It’s had a visual update as well as some new functionality: you can download a gap report that lists DOI numbers of records that are missing the metadata element you choose, making it easier to identify and update missing metadata. I always like attending the Frankfurt Book Fair and so might be there next year. It’s an important opportunity for all Crossref colleagues to engage and meet our members––many for the first time.

Next year

Feeling inspired after all the great meetings and conversations we had this year, we immediately started planning for next year! We’ll definitely be in Frankfurt in 2026, where you can find our team at the Crossref booth. We’re also planning to organize another roundtable on the Monday before the fair, so put October 5-9, 2026, in your calendars and stay tuned for more details.

The sunset is on the horizon for Metadata Manager. What's next?

Lena Stoll — Thu, 06 Nov 2025 00:00:00 +0000

TL;DR. Metadata Manager will be retired at the end of 2025. Over the past four years, we have been developing a new helper tool to replace it, and that tool has now reached a stage of maturity that means we will be able to switch off Metadata Manager by the end of the year.

How did we get here?

In 2021, we said that we would be retiring the deprecated Metadata Manager as soon as we could offer members a suitable replacement for registering their journal content. So this news has been a long time coming - Metadata Manager has been very challenging for us to support, and we have found it impossible to develop additional features. However, we did not want to take the final step of switching off the interface until we were able to offer a suitable replacement for members who rely on manual helper tools to register their journal content.

That replacement, our new record registration form, has now been used by many members for over a year to register their journal content. The feedback so far has been positive, and we have been able to add functionality to the tool at a pace that we are happy with.

In July 2025, we contacted those members who are still using Metadata Manager to let them know that the tool will no longer be available after December 2025. So if you are affected by this news, you were probably already aware of it. But we wanted to go into a little more detail on the sunsetting of Metadata Manager, why we are doing it, and what’s next for Crossref’s content registration helper tools.

What has happened since 2021?

We have been developing the record registration form ever since that announcement in 2021. It began its life as a helper tool for registering grant records, but we knew we wanted to expand it to cover journal articles and other record types as soon as we could.

To see whether the concept behind the grants form could be applied to journal content, we first built an initial prototype and tested it with a number of Crossref ambassadors and volunteers. We wanted to ensure that the tool was intuitive to use, and to understand what functionality it would need to support for it to be truly useful to our members. Following some iteration on the invaluable feedback we received from our testers, we finally released the tool to production in September 2024 and began encouraging members to use it for their real-life article deposits.

We have been continuously adding new functionality since then, from additional fields for registering richer metadata to a feature that allows members to edit their articles’ metadata without having to re-enter everything into the form.

Now, about two months from the target date for retiring Metadata Manager, the record registration form is used by members to register about 200 articles per day, while Metadata Manager still sees about double that volume of submissions. So we have some way left to go.

Why is now the right time to retire Metadata Manager?

2025 has been a year of addressing technical debt for Crossref. My colleague Sara wrote about this co-ordinated push towards modernising our system in her post about our cloud migration in the summer.

Having the long-awaited replacement for Metadata Manager in place will allow us to free up the resources that have been tied up for years by troubleshooting Metadata Manager, in terms of both technology and user support, so that we can focus on projects and initiatives that align with our longer-term strategy.

How will we avoid the new tool developing the same problems as Metadata Manager?

As stated above, Metadata Manager has caused us many issues and headaches in different ways - but we have also learned a lot from dealing with these problems. As Bryan Vickery wrote in 2020, Metadata Manager is “not flexible enough to easily add other record types, like books/book chapters, or to include any changes we may make to our input schema.” To address this, we built the record registration form in a schema-driven way, which makes it adaptable to any future schema changes. It also means that we can spin up prototypes of new forms for additional record types quite quickly.

So while Metadata Manager was custom-built in a way that could only ever work for journal content, the record registration form already supports two record types and will support more in future. This is key for our goal of building a complete research nexus, which extends far beyond journal content, and even beyond “content” as such (did someone say grants?).

What happens next?

Metadata Manager will no longer be available from January 2026.
Starting next year, if you attempt to access Metadata Manager at https://www.crossref.org/metadatamanager/, you will be redirected to a deprecation note on https://www.crossref.org/deprecated/ which will link out to the new tool.

What options do I have for registering my journal content going forward?

If your organisation still uses Metadata Manager to register metadata for your journal articles, now is a good time to begin familiarising yourself with the alternatives available to you from 2026 forward - these include, but are not limited to, the new record registration form.

If your journal has an ISSN

We recommend you begin using the record registration form as soon as possible. Simply go to https://manage.crossref.org/records and sign in with your Crossref account credentials to register a journal article. You can also see a list of all the journal article records you have previously registered using our manual helper tools at https://manage.crossref.org/records/edit and edit their metadata using the form.

To help you make the switch from Metadata Manager, we will be hosting an interactive webinar on 13 November about how to transition to the new tool. Register here or look out for the recording, which will be shared in our events archive.

If your journal does not have an ISSN

The record registration form currently only supports ISSNs as journal identifiers. Title-level and volume/issue-level DOIs, which are at the core of how Metadata Manager handles journal metadata, have been the cause of some of the problems we have had over the years with that particular tool. Also, Crossref DOIs have always been intended primarily as citation identifiers, and entire journals/volumes/issues are very rarely cited. For that reason, we built the record registration form so that it doesn’t support registering or using journal-level DOIs.

With that being said, if you do not (yet) have an ISSN for your journal for whatever reason, you can use our web deposit form to register your articles with a journal DOI. If you do obtain an ISSN for your title later on, you can then simply begin using the record registration form from that point onward.

How will the new tool continue to be developed?

We will continue to work with our members and community to develop additional functionalities for the journal article form. Currently we are working on allowing relationships metadata to be registered using the form.

Ultimately, the goal is for the record registration form to become the one-stop shop for members who manually register and update their metadata. To this end, we are working on expanding the tool to cover additional record types - we have recently developed a prototype for registering books and chapters, and we will be looking to test this in the coming months with volunteers who are currently registering their book metadata via other avenues such as the web deposit form.

If you would like to support these efforts, or you have begun using the new tool and would like to share your feedback, come join the discussion in our community forum.

References

Bowman, S. (2021). Next steps for Content Registration. Crossref. https://doi.org/10.64000/30vzx-r5x16
Bowman, S. (2025). We’ve migrated to the cloud; we hope you didn’t notice (but maybe you did). Crossref. https://doi.org/10.64000/wd6rx-vpq73
Vale, P. (2022). Forming new relationships: Contributing to Open source. Crossref. https://doi.org/10.64000/cvq2e-q8t24

Announcing changes to REST API rate limits

Martyn Rittman — Wed, 05 Nov 2025 00:00:00 +0000

Our REST API makes all of the metadata we hold publicly available. It receives the majority of our API traffic, with around 1 billion hits per month. It’s one of the key ways that we fulfil our mission to make research objects easy to find, cite, link, assess, and reuse. From 1 December 2025, we will be revising the rate limits for the public and polite pools of the REST API to ensure that we can maintain a stable and reliable system, and that metadata is freely available to everyone.

We haven’t changed the rate limits since the REST API was launched in 2013. In the past five years, the number of requests to the REST API has tripled and the number of metadata records has increased by half, from 120 million to around 180 million. This means an increase in the resources needed to run it, and we’ve seen periods of instability where we haven’t been able to keep the API available for all users. We have decided that it is the right time to revisit rate limits to check that they’re in line with what our technology can provide and what our community needs. As a result, we will apply the following for the public and polite pools:

Public pool:

Request type	Rate limit (requests per second)	Concurrency limit (simultaneous requests)
Single record	5	1
List of records (queries, filters, etc.)	1	1

Polite pool:

Request type	Rate limit (requests per second)	Concurrency limit (simultaneous requests)
Single DOI record	10	3
List of records (queries, filters, etc.)	3	3

The rate limit is the total number of requests that can be made per second. The concurrency limit is how many requests can be running at the same time. This means that for longer-running requests you may need to wait for previous requests to finish before you can make a new one.
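To make the difference between the two limits concrete, here is a minimal client-side sketch in Python. It is an illustration only, not Crossref code: the `Throttle` class and the `LIMITS` table are hypothetical names, the numbers are copied from the tables above, and spacing request starts evenly is just one assumed way of staying under a per-second limit (how the server counts its one-second windows is not described here).

```python
import threading
import time

# Limits copied from the tables above: (requests per second, concurrent requests).
LIMITS = {
    ("public", "single"): (5, 1),
    ("public", "list"): (1, 1),
    ("polite", "single"): (10, 3),
    ("polite", "list"): (3, 3),
}


class Throttle:
    """Hypothetical client-side guard: spaces out request starts (rate limit)
    and caps the number of requests in flight (concurrency limit)."""

    def __init__(self, pool, request_type, clock=time.monotonic, sleep=time.sleep):
        rate, concurrency = LIMITS[(pool, request_type)]
        self.min_interval = 1.0 / rate          # seconds between request starts
        self._slots = threading.Semaphore(concurrency)
        self._lock = threading.Lock()
        self._next_start = 0.0                  # earliest time the next request may start
        self._clock = clock
        self._sleep = sleep

    def __enter__(self):
        self._slots.acquire()                   # wait until a concurrency slot is free
        with self._lock:
            now = self._clock()
            wait = max(0.0, self._next_start - now)
            # Reserve the next start time so that starts are evenly spaced.
            self._next_start = max(now, self._next_start) + self.min_interval
        if wait:
            self._sleep(wait)
        return self

    def __exit__(self, *exc_info):
        self._slots.release()                   # the request has finished
        return False
```

A single `Throttle` would be created once per pool and request type, shared between threads, and wrapped around each request with `with throttle: ...`. A long-running request holds its slot until it finishes, which is why a new request may have to wait even when the per-second rate has not been reached.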

Here are some examples of single-record requests:

The second case here will be directed to the polite pool because an email is included using the ‘mailto’ parameter. And here are examples of requests that return lists of records:

The second and third examples here will use the polite pool.
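As a rough illustration of how the two distinctions above combine (single record versus list of records, public versus polite pool), the following Python sketch classifies a request URL. The `classify` function and the URLs are hypothetical illustrations; the sketch assumes that a single-record request has the form `/works/{doi}` and only looks at the `mailto` query parameter, which is the mechanism mentioned in this post.

```python
from urllib.parse import parse_qs, urlsplit


def classify(url):
    """Return (pool, request_type) for a REST API URL, under the
    assumptions stated above. Illustrative only."""
    parts = urlsplit(url)
    # A 'mailto' query parameter places the request in the polite pool.
    pool = "polite" if "mailto" in parse_qs(parts.query) else "public"
    segments = [s for s in parts.path.split("/") if s]
    # Assumed shape of a single-record request: /works/{doi}; DOIs start with "10.".
    is_single = (
        len(segments) >= 2
        and segments[0] == "works"
        and segments[1].startswith("10.")
    )
    return pool, "single" if is_single else "list"


# Hypothetical example URLs (10.5555/12345678 is used here as a placeholder DOI).
EXAMPLES = [
    "https://api.crossref.org/works/10.5555/12345678",
    "https://api.crossref.org/works/10.5555/12345678?mailto=name@example.org",
    "https://api.crossref.org/works?query=metadata&rows=5",
    "https://api.crossref.org/works?filter=type:journal-article&mailto=name@example.org",
]

if __name__ == "__main__":
    for example in EXAMPLES:
        print(classify(example), example)
```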

Our guiding principle in making these changes is to keep all of the metadata available to everyone, all of the time. These changes to rate limits won’t restrict current users from accessing the metadata they want to retrieve, but they will make it easier for us to maintain the system now and in the future.

Which use cases do we support?

Our metadata has a broad range of applications. If you’re someone who uses the REST API, we’re glad that you are part of our community! Our mission includes making it easier to find, reuse, and assess scholarly research outputs. By using metadata, you’re helping us to fulfil that goal.

The main uses of the REST API fit into several categories. The new rate limits will continue to support these, among many others:

I have some metadata, what is the DOI?
I have a DOI, what is its metadata?
I want all of the metadata, just give me everything.
Research on a specific topic or subset of metadata, often refreshing the results every few weeks or months.

Rate limits can encourage responsible usage. The majority of API users make requests at a low rate and will not need to make any changes. However, a few send spikes of large numbers of requests in a short space of time, sometimes making it difficult for others to access the service. These spikes can be smoothed out by lower rate limits. Complex requests that search across large numbers of items put more pressure on our systems than requests for a single content item, so we have decided to set different rate limits for different types of request.

Who will be affected?

We estimate that the changes might affect around 40 users per week across the public and polite pools, and this is only for some of their requests. In all of the cases we’ve seen, the rate of requests could be slowed down and users would still be able to get the same results. In other words, the aim of these changes is to make the load on the API more predictable, not to reduce the total number of requests or amount of metadata transferred. No changes are being made to the Metadata Plus service or other APIs, such as the XML API and OAI-PMH endpoint.

Do I need to change how I use the API?

If you’re reading this, thank you! It’s clear that you want to be a considerate user of our services. Almost all users can continue to use the REST API in exactly the same way; you won’t need to change anything. Here is some general advice that will help you make the most of the service and ensure that you won’t encounter issues.

Use a mailto parameter. This gives you access to the polite pool, which has higher rate limits, and means we can get in touch with you if needed. We’ll only use your address to contact you about your API requests.
Check the HTTP response status for your requests. This is always good practice and can help you identify malformed requests and notice when you reach rate limits.
Cache results to avoid repeatedly making the same requests. Most records don’t change on a regular basis. How often you update the cache will depend on what you are interested in, but most metadata fields rarely change.
If you are making a very high volume of requests or have very complex analysis to carry out, consider downloading the public data file which is made available once a year and contains all of our metadata. You can update it with recent additions using the REST API.
If you are relying on our metadata in a production service, Metadata Plus can provide more stability, support, and access to monthly snapshots of our entire database.

We have more tips and tricks for the REST API in our documentation. If you have questions, please join the conversation on our Community Forum.

Celebrating Noyam Journals’ Metadata Award

Johanssen Obanda — Tue, 04 Nov 2025 00:00:00 +0000

Noyam Journals, based in Accra, Ghana, was recently recognised for the completeness of its metadata through the Crossref Metadata Award, part of our 25th anniversary celebrations. Noyam was one of six publishers worldwide to receive the award and stood out as a leader among members of our Global Equitable Membership (GEM) Program.

The GEM Program supports publishers and organisations in low- and middle-income countries to participate in the global scholarly community by reducing barriers to membership and services.

Earlier this year, at our Crossref Accra event, representatives from Noyam spoke about how registering metadata with Crossref has expanded their readership worldwide. They also encouraged other publishers and institutions in Africa to utilise Crossref’s infrastructure to enhance the visibility and impact of their work.

Following their award, we spoke with Naa Kai Amanor-Mfoafo from Noyam Journals about their approach to metadata quality. She shares her reflections in the Q&A below.

What motivates your team to work towards high-quality metadata?

Our commitment towards high-quality metadata stems from our organisational goal to promote the dissemination of usable knowledge by publishing innovative and high-quality research content. Over the last five years, registering our metadata with Crossref has strengthened authors’ trust as their institutions can verify quality through tools like Crossmark. For instance, many institutions use the Crossmark feature on our published articles to access the latest information about a scholarly article, including updates, corrections, or retractions.

Do you have a strategy for complete metadata?

We prioritise inclusion of ORCID IDs, Abstracts, and References as these increase visibility of our articles. We also include Affiliations, Licenses, and Crossmark, and we use Similarity Check to help ensure research integrity.

As part of our team structure, we have a dedicated staff member responsible for ensuring that every article is assigned a Crossref DOI on the same day it is published online. Our in-house system supports this process, allowing us to capture and register all the key metadata efficiently.

What impact of good metadata can you see on your organisation?

Good metadata has made a real difference for our organisation. It has helped increase the visibility and discoverability of our journal articles, making it easier for researchers and readers around the world to find and cite our work. We’ve noticed more engagement with our publications since improving our metadata, which encourages us to keep strengthening the quality of the information we register.

Have you encountered any challenges in curating or improving your metadata, and how did you address those?

One major challenge we’ve faced is discovering errors in previously uploaded metadata, and we haven’t yet established a systematic process for correcting them. We’re currently working to improve our workflow to help ensure the correctness of our metadata and to follow Crossref’s recommended best practices.

Have your efforts around metadata led to real benefits for your community?

Our authors appreciate the fact that their ORCID profiles are automatically updated with their published articles once they are assigned DOIs from Crossref. They are, of course, also enjoying increased visibility of our published articles globally.

Looking ahead, how are you planning to build on your metadata quality?

We need to stay informed about developments at Crossref. Once in a while, we visit the Crossref website or participate in a webinar to stay informed. For example, a few months ago, we got to know that a new record registration form had been initiated for metadata uploads through the documentation section on the Crossref website.

We advise others who are new to Crossref to focus on consistency. Ensure your organisational system includes staff dedicated to keeping your metadata up to date. Secondly, feel free to seek technical support from the Crossref team when the need arises.

New tool to report on completeness of open research information globally

Kornelia Korzec — Tue, 21 Oct 2025 00:00:00 +0000

Wednesday 22nd October 2025—Crossref, the open scholarly infrastructure nonprofit, today releases an enhanced dashboard showing metadata coverage and individual organisations’ contributions to documenting the process and outputs of scientific research in the open. The tool helps research-performing, funding, and publishing organisations identify gaps in open research information, and provides supporting evidence for movements like the Barcelona Declaration for Open Research Information, which encourages more substantial commitment to stewarding and enriching the scholarly record through open metadata.

Crossref’s Participation Reports now offer expanded features and provide full coverage of all members and all resource types registered with Crossref DOIs (Digital Object Identifiers)—over 175 million records representing a significant share of global research production from organisations in 164 countries. Each of Crossref’s 23,000 members has a dashboard to visualise their metadata contributions, display coverage of key information for scholarly works, and get actionable feedback via a gap report that specifies records that need enrichment, all helping to make more transparent the work that goes into creating and curating the scholarly record.

For any Crossref member—whether journal publisher, research funder, university, or museum—coverage of up to 11 key elements is public and visible to everyone, including: references, abstracts, ORCID iDs, affiliation strings, ROR IDs, Open Funder Registry IDs, funding award numbers, text-mining URLs, licence URLs, Similarity Check URLs (for text-based plagiarism checking) and the presence of a Crossmark policy, indicating the organisation’s commitment to declare corrections and retractions. These metadata elements provide greater context and visibility for research objects such as journal articles and preprints, grants and awards, books and book chapters, standards, datasets, conference papers and various ‘other’ content such as scholarly blogs, images, and even physical museum artefacts.

Mochammad Tanzil Multazam, Library Director of Universitas Muhammadiyah Sidoarjo, and Secretary of the Supervisory Board of Relawan Jurnals, says, “As a sponsoring organisation for several thousand small publishers across Indonesia, we support Crossref members to register complete metadata for their works. Despite time and resource constraints, this new actionable open report on key metadata elements will help drive improvements in the information they share for their publications. This has wide-reaching implications for the visibility of that research and trust among the community, and therefore has the potential to support Indonesian scholarship in the global context.”

Lena Stoll, Program Lead at Crossref, explains, “We are happy to have extended participation reports to cover more diverse record types, including grants, datasets, dissertations, and more, and to make it easier for our members to act on their ongoing improvements to enrich their records and build towards the vision of an open and more complete Research Nexus.”

Ludo Waltman, Scientific Director and Professor of Quantitative Science Studies at the Centre for Science and Technology Studies (CWTS) at Leiden University, comments, “As a representative of the researcher and metascience communities, this data is of great importance for us to analyse the trends and effects of global research activity. Crossref is one of the main driving forces in open infrastructure, and its commitment to supporting metadata completeness through this open reporting dashboard is a significant step for the open research information movement.”

Access Crossref Participation Reports and search for any Crossref member organisation.

Participation report for a typical Crossref member, Universidad La Salle Arequipa in Peru

About Crossref

Crossref runs an open infrastructure to link research objects, entities, and actions, creating a lasting and reusable scholarly record that underpins open science. Together with their 23,000 members in 164 countries, Crossref drives metadata exchange and supports nearly 2 billion monthly API queries, facilitating global research communication, for the benefit of society.

Integrating grant metadata for seamless research interconnectivity at FCCN|FCT

Rocío Gaudioso Pedraza — Wed, 15 Oct 2025 00:00:00 +0000

Click here for the version in Portuguese

Welcome back to our series of case studies of research funders using the Grant Linking System. In this interview, I talk with Cátia Laranjeira, PTCRIS Program Manager at FCCN|FCT, Portugal’s main public funding agency, about the agency’s approach to metadata, persistent identifiers, Open Science and Open Infrastructure. With a holistic approach to managing, producing and accessing information on science, FCCN|FCT’s decision to implement the Grant Linking System within their processes was not simply a technical upgrade, but a coordinated effort to continue building a strong culture of openness. With the mantra “register once, reuse always”, FCCN|FCT’s effort to embrace open funding metadata was only logical.

Could you introduce your organisation?

We are FCCN, the digital services of the FCT, the Foundation for Science and Technology, which is the main public funding agency in Portugal. FCT supports research and innovation in Portugal through multiple funding instruments targeting researchers, projects, institutions and international partnerships. FCCN is focused on providing digital services to the scientific and academic community in Portugal.

I am the manager of a program called PTCRIS, part of the FCCN, within the ‘Scientific Knowledge’ pillar of the unit. PTCRIS is a broad program, whose main goal is to fulfill the mantra ‘register once, reuse always’. We aim to develop an integrated ecosystem of scientific information, so all the projects we run have this main goal and that’s what we work towards. We develop infrastructure and added-value services, such as the scientific curriculum vitae management platform and an indicator system that exposes information of all the funding that supports research and innovation in Portugal.

What motivated you to join Crossref?

We had already adopted ORCID and we also developed a national PID, connected to the citizen card, in addition to ORCID iDs. In 2015 we adopted the ISNI and we also had DOIs for research outputs. So we were clearly missing one piece, which was metadata for funding. At the same time we started developing a national infrastructure on science and technology funding, to have an aggregated and holistic view of the funding that is distributed in Portugal.

Before that, the information was scattered across different databases and websites from many different funders, so we organised and aggregated this information into a platform called SciPROJ, which brings together all the information on scientific funding in one place, with quick and flexible access. But we didn’t have persistent identifiers for grants. This was at the same time that Crossref started to build the Grant Linking System, so we were actually one of the first organisations to join. In 2023 we ran a pilot in which we registered 6000 grants, and we have been registering funding metadata ever since.

Can you tell us about your experience using the Grant Linking System?

The beginning of the pilot was the most critical stage of the process; some effort was needed to map our data models to the Crossref grant metadata schema. FCCN wasn’t in a bad position to do this since we already had all that information in a registry and it was well organised; we just had to map it to make sure that the information we had could be shared following the Crossref metadata schema and best practices. It has been two years since the pilot, which puts us in phase 2 of the implementation of the system. During the pilot we concentrated on registering both historical and current grants’ metadata; in the current phase, we are focusing on current grants’ metadata.

What do you find useful about registering grant metadata with Crossref?

Although this is the very beginning of this journey, we envision a world where we have the ability to link grants to any other object and entity that comprises the ecosystem: people that execute that funding, projects, institutions, outputs. Outputs are something particularly important to us, like for many other funders, because we want to be able to monitor the impact of our funding and that is something that is always at the back of our mind.

We are actually developing more and more services that aim to show how these links can be very useful to retrieve information from the system. For example, we are developing an indicator system that is focusing on the funding but also on the outputs and the links between the two. We are also monitoring OA trends, to see how FCT funding is contributing to Open Science initiatives.

Additionally, our OA policy was recently launched but we currently don’t have any system that allows us to track policy compliance. We are working towards that, but to achieve this it is absolutely fundamental that grants are linked to the outputs through metadata.

What are your hopes for the GLS and greater transparency in funding metadata in general?

Our hope is for the interconnectivity and interoperability of entities and objects, which is something that the field of scientific information management has always wanted to achieve but that is very difficult to do. There have been attempts in the past to achieve this using information from the acknowledgement sections of publications, but this is fairly inefficient and there needs to be more structure to it. A critical piece of this puzzle would be to influence publishers and manuscript submission platforms to facilitate the systematic sharing of grant IDs and grant metadata by design. I think this is something that is still missing and that I would like to see happening soon.

Has anything surprised you while implementing the Grant Linking System?

Something that surprised us was that researchers, who in general are not that concerned about PIDs, would proactively ask us what the Crossref grant ID for their award was! It was very refreshing to see that we didn’t need to do any advertising to socialize Crossref grant IDs among our grant holders. I think that tells you about the high level of awareness there is within our community of the importance of the Crossref grant ID, using it and putting it in the acknowledgment section of their publications.

Based on your experience, what would be your advice for colleagues from other research funders?

I would say go for it! The more the merrier! This is like any other similar information system – it only works if there are enough people using it, registering grant metadata that facilitates the links between objects.

It is a very easy process to get into. Once you map the metadata schema to your own data it’s not a technically difficult thing to do. For us it’s an automated process that runs very smoothly, from grant registration to communicating this information to grant holders. We can see this in action in these examples: a grantee publishing an article that acknowledges their funding through Crossref grant IDs, or the funding received being acknowledged on the website of a Research Center.

If you could change something about the GLS or how the grant metadata you register is used, what would it be?

I would love to have access to a visualization of grant metadata that shows how many outputs each grant is linked to and how grants relate to other objects and entities. That would really give us a clearer understanding of the impact that our funding is having. We’d also love to see better integration between Crossref and ORCID for grants, just like it works for publications. Ideally, when a grant is registered and linked to a researcher, they’d be notified and could easily add it to their ORCID record. This would allow the information to flow seamlessly into their national CV via PTCRISsync, ensuring consistency and reducing manual work.

We are grateful to Cátia Laranjeira and FCT|FCCN for sharing their perspective and long-standing experience in this space. Their experience highlights the role that funding metadata plays in an interconnected and complete research and funding ecosystem.

Version in Portuguese

Translation by Edilson Damasio

Integração de metadados de financiamento pela FCCN|FCT para reforçar a interoperabilidade da informação sobre a atividade científica

Bem-vindo(a) de volta à nossa série de estudos de caso sobre instituições financiadoras de investigação que utilizam o Grant Linking System. Nesta entrevista, conversamos com Cátia Laranjeira, gestora do programa PTCRIS na FCCN|FCT, a principal agência pública de financiamento à ciência em Portugal, sobre a abordagem da instituição aos metadados, identificadores persistentes, Ciência Aberta e Infraestruturas Abertas.

Com uma abordagem holística à gestão, produção e acesso à informação científica, a decisão da FCCN|FCT de integrar o Grant Linking System nos seus processos não representou apenas uma evolução técnica, mas sim um esforço coordenado para consolidar uma forte cultura de abertura. Sob o lema “registar uma vez, reutilizar sempre”, a adoção de metadados abertos de financiamento pela FCCN|FCT foi um passo natural e coerente com essa visão.

Poderia apresentar a sua organização?

A FCCN é a unidade de serviços digitais da FCT — Fundação para a Ciência e a Tecnologia, a principal agência pública de financiamento à ciência em Portugal. A FCT apoia a investigação e a inovação através de diversos instrumentos de financiamento dirigidos a investigadores, projetos, instituições e parcerias internacionais. A FCCN dedica-se a disponibilizar serviços digitais à comunidade científica e académica portuguesa.

Na FCCN|FCT, sou gestora do PTCRIS, um programa integrado no pilar do Conhecimento Científico. O PTCRIS é um programa abrangente que tem como objetivo central concretizar o princípio “registar uma vez, reutilizar sempre”. Trabalhamos para desenvolver um ecossistema integrado de informação científica, e todos os projetos que conduzimos convergem nesse propósito. Desenvolvemos infraestruturas e serviços de valor acrescentado, como a plataforma de gestão do currículo científico CIÊNCIAVITAE e um sistema de indicadores que disponibiliza informação sobre todos os financiamentos que apoiam a investigação e a inovação em Portugal.

O que motivou a adesão à Crossref?

A FCCN tinha já adotado o ORCID e desenvolvido um identificador nacional persistente (PID), ligado ao cartão de cidadão, como complemento aos ORCIDs. Em 2015, adotámos o ISNI e também tínhamos DOIs para a produção científica. Ficava claramente em falta um elemento: os metadados de financiamento.

Ao mesmo tempo, iniciámos o desenvolvimento de uma infraestrutura nacional de financiamentos de ciência e tecnologia, com o objetivo de ter uma visão agregada e holística do financiamento que suporta a investigação e inovação em Portugal.

Antes disso, a informação estava dispersa por diferentes bases de dados e websites de múltiplos financiadores. Organizámos e agregámos esta informação numa plataforma chamada SciPROJ, que reúne toda a informação sobre financiamentos científicos num único local, com acesso rápido e flexível. No entanto, ainda não existiam identificadores persistentes para os financiamentos, coincidindo com o momento em que a Crossref começou a desenvolver o Grant Linking System. Fomos, assim, uma das primeiras organizações a aderir. Em 2023, realizámos um piloto com 6.000 financiamentos registados, e desde então temos vindo a registar continuamente os metadados de financiamento.

Pode falar-nos sobre a sua experiência com o Grant Linking System?

A FCCN iniciou a utilização do Grant Linking System com um piloto, que constituiu a fase mais crítica do processo. Foi necessário algum esforço para mapear os nossos modelos de dados para o esquema de metadados de financiamentos da Crossref. A FCCN estava, no entanto, bem posicionada para isso, uma vez que já dispunha de toda a informação num registo organizado; o passo necessário foi apenas assegurar que esta informação pudesse ser partilhada de acordo com o esquema de metadados da Crossref e as melhores práticas.

Já passaram dois anos desde o piloto, o que nos coloca na fase 2 de implementação do sistema. Durante o piloto, focámo-nos no registo de metadados de financiamentos históricos e atuais; na fase atual, estamos focados no registo de metadados de financiamentos atuais.

O que considera útil no registo de metadados de financiamento na Crossref?

Embora este seja ainda o início deste percurso, a FCCN idealiza um ecossistema em que seja possível ligar financiamentos a qualquer outro objeto ou entidade do sistema científico — projetos, pessoas que executam esses financiamentos, instituições onde são executados e produções científicas que dele resultam. Estes últimos são particularmente importantes para nós, como para muitos outros financiadores, pois queremos monitorizar o impacto do financiamento — uma preocupação que está sempre presente no nosso trabalho.

Estamos, de facto, a desenvolver serviços que demonstram o valor dessas ligações para a recuperação de informação no sistema. Um exemplo é o sistema de indicadores em desenvolvimento, que se centra nos financiamentos, nas produções científicas e nas relações entre ambos. Estamos também a acompanhar as tendências de Ciência Aberta, para perceber de que forma o financiamento da FCT está a contribuir para as iniciativas de Open Science.

Além disso, a política de Acesso Aberto da FCT foi recentemente lançada, mas ainda não dispomos de um sistema que permita monitorizar a conformidade com essa política. Estamos a trabalhar nesse sentido, mas para o concretizar é absolutamente essencial que consigamos associar inequivocamente os financiamentos às produções científicas através de metadados.

Quais são as suas expectativas para o GLS e para uma maior transparência dos metadados de financiamento em geral?

A interconectividade e interoperabilidade entre entidades e objetos é algo que a área da gestão de informação científica sempre procurou alcançar — embora seja um objetivo difícil de concretizar. No passado, houve várias tentativas nesse sentido, recorrendo à informação presente nas secções de agradecimentos das publicações, mas esse método revelou-se pouco eficiente e carece de uma estrutura mais sistemática.

Uma peça essencial deste puzzle seria influenciar as editoras e as plataformas de submissão de manuscritos a facilitarem a partilha sistemática de identificadores e metadados de financiamento. Este é um elemento que ainda falta concretizar, mas que gostaríamos de ver implementado em breve.

Algo o surpreendeu durante a implementação do Grant Linking System?

Algo que nos surpreendeu durante a implementação do Grant Linking System foi a reação dos investigadores. Normalmente, os investigadores não demonstram grande preocupação com identificadores persistentes (PIDs), mas, neste caso, começaram a procurar ativamente o identificador Crossref do seu financiamento! Foi muito positivo perceber que não foi necessário fazer qualquer esforço de divulgação para promover o uso dos Grant IDs da Crossref entre os beneficiários dos financiamentos. Isso mostra o nível de consciência existente na comunidade científica sobre a importância destes identificadores — usá-los e incluí-los na secção de agradecimentos das publicações.

Com base na sua experiência, qual seria o seu conselho para colegas de outros financiadores de investigação?

Com base na nossa experiência, o conselho para outros financiadores seria simples: avancem! Quanto mais, melhor! Este tipo de sistema de informação só é verdadeiramente eficaz quando há muitas entidades a utilizá-lo, a registar metadados de financiamento e a criar ligações entre objetos.

É também um processo simples de implementar. Uma vez feito o mapeamento entre o esquema de metadados e os dados internos da instituição, não há grandes desafios técnicos. No nosso caso, o processo é totalmente automatizado e flui de forma eficiente, desde o registo do financiamento até à comunicação dessa informação aos beneficiários. É possível ver isso em prática em vários exemplos — desde artigos que reconhecem o financiamento através dos Grant IDs da Crossref até ao reconhecimento do apoio financeiro nos sites dos centros de investigação.

Se pudesse alterar algo no GLS ou na forma como os metadados dos subsídios que regista são utilizados, o que seria?

Se pudéssemos mudar algo no Grant Linking System ou na forma como os metadados de financiamento são utilizados, gostaríamos de ter acesso a uma visualização interativa que mostrasse quantas produções científicas estão ligadas a cada financiamento e como esses se relacionam com outras entidades e objetos. Isso permitiria compreender de forma muito mais clara o impacto real dos financiamentos.

Gostaríamos também de ver uma melhor integração entre a Crossref e o ORCID no que respeita aos financiamentos — tal como já acontece com as publicações. Idealmente, quando um financiamento fosse registado e associado a um investigador, este seria notificado e poderia adicioná-lo facilmente ao seu registo ORCID. Assim, a informação fluiria automaticamente para o currículo nacional via PTCRISsync, garantindo consistência e reduzindo o trabalho manual.

Agradecemos à Cátia Laranjeira e à FCT|FCCN por partilharem a sua perspetiva e longa experiência neste domínio. A sua experiência destaca o papel que os metadados de financiamento desempenham num ecossistema de investigação e financiamento interligado e completo.

Enhancing repository integration with Crossref

Johanssen Obanda — Mon, 13 Oct 2025 00:00:00 +0000

Repositories are home to a wide range of scholarly content; they often archive theses, dissertations, preprints, datasets, and other valuable outputs. These records are an important part of the research ecosystem. But to truly serve their purpose, they need to be connected to each other, to the broader scholarly record, and to the people behind the research. Metadata is what makes that possible. Enhancing metadata is a way to tell a fuller, more accurate story of research. It helps surface relationships between works, people, funders, and institutions, and allows us as a community to build and use a more connected, more useful network of knowledge - what Crossref calls the ‘Research Nexus’.

The challenge many repositories face is that metadata can be incomplete, inconsistent, or disconnected. Think of references without DOIs, authors without ORCID iDs, or research outputs that aren’t linked to funding. To address this, Crossref provides a range of services that repositories can use to improve the quality and interoperability of their metadata. Our REST API, which is openly and publicly accessible, allows repositories to retrieve structured metadata, such as DOIs, references, abstracts, contributors, ORCID iDs, and funder information, that can be used to enrich and update their local records. Repositories that are Crossref members can also use the Cited-by service and reference linking to show how works are being cited and to interconnect related content. The Grant Linking System (GLS) makes it possible to indicate clearly which research outputs are linked to specific grants, and funding bodies themselves are connected using the Open Funder Registry and ROR, adding another layer of context. With Crossmark, repositories can flag updates, corrections, or retractions to ensure transparency and trust in the scholarly content they host.

Enriching repository metadata using Crossref is a practical and empowering step toward making your records more discoverable, complete, and connected. The process is simple, and you don’t need to be a developer to get started. Repositories can query the Crossref REST API using a DOI or basic metadata like a title or author name, and receive structured, reliable information. This can include full author lists, ORCID iDs, reference lists, funding data, and licensing terms. You can then match and merge this data into your repository records. Adding Crossref DOIs to your metadata enables persistent linking, helping users trace research outputs back to their stewards. It also helps create rich relationships between articles, datasets, software, grants, and other research objects. All of this supports the FAIR principles and contributes to a more connected and reusable scholarly record. And because Crossref’s infrastructure is open, any repository can access and use this metadata to improve the quality, visibility, and long-term value of their collections.

Steps to enrich repository metadata with Crossref:

Query the REST API using DOIs or basic metadata (visit our API learning hub to learn how to use the Crossref API)
Retrieve structured metadata like authors, ORCID iDs, funders, affiliations, ROR IDs, licenses, grants, and references
Map and merge with your local records
Display persistent links to all kinds of research objects using Crossref DOIs
Support FAIR by including open, structured, and complete metadata
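
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the endpoint and the `query.bibliographic`, `query.author`, `rows`, and `mailto` parameters belong to the public Crossref REST API, but the function names and the shape of the local repository record (`authors`, `funders`, `licenses`, `reference_dois`) are assumptions made for this example.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_ROOT = "https://api.crossref.org"


def works_url_for_doi(doi: str, mailto: str = "") -> str:
    """Build the URL that looks up a single work by DOI."""
    url = f"{API_ROOT}/works/{quote(doi, safe='/')}"
    if mailto:  # identifying yourself is optional but considered polite
        url += "?" + urlencode({"mailto": mailto})
    return url


def works_search_url(title: str, author: str = "", rows: int = 3, mailto: str = "") -> str:
    """Build a search URL for records that have no DOI yet."""
    params = {"query.bibliographic": title, "rows": rows}
    if author:
        params["query.author"] = author
    if mailto:
        params["mailto"] = mailto
    return f"{API_ROOT}/works?{urlencode(params)}"


def fetch_work(doi: str, mailto: str = "") -> dict:
    """Retrieve the metadata record ('message') for one DOI. Needs network access."""
    with urlopen(works_url_for_doi(doi, mailto), timeout=30) as response:
        return json.load(response)["message"]


def enrich_record(local: dict, work: dict) -> dict:
    """Return a copy of a local record with empty fields filled from a Crossref work.

    The keys of the local record are hypothetical; adapt them to your repository.
    """
    authors = [
        {
            "name": " ".join(p for p in (a.get("given"), a.get("family")) if p),
            "orcid": a.get("ORCID"),
        }
        for a in work.get("author", [])
    ]
    funders = [
        {"name": f.get("name"), "funder_doi": f.get("DOI"), "awards": f.get("award", [])}
        for f in work.get("funder", [])
    ]
    from_crossref = {
        "doi": work.get("DOI"),
        "authors": authors,
        "funders": funders,
        "licenses": [lic["URL"] for lic in work.get("license", []) if "URL" in lic],
        "reference_dois": [ref["DOI"] for ref in work.get("reference", []) if "DOI" in ref],
    }
    merged = dict(local)
    for key, value in from_crossref.items():
        # Local values win; Crossref only fills gaps.
        if value and not merged.get(key):
            merged[key] = value
    return merged
```

A repository would typically call `fetch_work` for each record that already has a DOI, fall back to `works_search_url` for those that do not, and review search matches before merging. Letting local values take precedence is a design choice for this sketch; some repositories may prefer the opposite.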

Across the repository community, several institutions are already integrating Crossref metadata in meaningful ways to enrich their records and improve discoverability. DSpace users can enrich their deposits by using the platform’s “Live Import” feature, which allows them to pull in Crossref metadata, such as titles, authors, and DOIs, directly into items during the submission process. A deeper integration between DSpace and Crossref is currently in development. HAL in France uses the Crossref API to complete and standardise references, making its content more consistent and connected (hal.archives-ouvertes.fr). SciELO, a key open access platform in Latin America, leverages Crossref DOI links and citation metadata to strengthen the visibility of its journals (scielo.org). In Canada, the University of Saskatchewan’s eCommons repository queries the Crossref API to enhance metadata accuracy and link records to the broader scholarly graph (ecommons.usask.ca). The Apollo repository at the University of Cambridge uses Crossref to connect theses and articles to their published versions, creating a clearer picture of research outcomes (repository.cam.ac.uk). Zenodo, hosted by CERN, draws on Crossref metadata to link deposited datasets and software with related publications, supporting transparency and reuse (zenodo.org).

These examples show how even modest integrations with Crossref can lead to substantial gains in metadata quality, interoperability, and global discoverability. Altogether, these activities and organisations are enhancing the Research Nexus, enriching a scholarly graph for the benefit of all.

Want to learn more? You can explore the presentation slides (PDF) from Open Repositories 2025, which cover the Crossref API and its capabilities, how repositories can use it to query and enrich metadata, the benefits for repository managers, researchers, and funders, as well as recent updates to our metadata schema.

Piecing together the Research Nexus: uncovering relationships with open funding metadata

Rocío Gaudioso Pedraza — Wed, 01 Oct 2025 00:00:00 +0000

The Crossref Grant Linking System (GLS) has been facilitating the registration, sharing and re-use of open funding metadata for six years now, and we have reached some important milestones recently! What started as an interest in identifying funders through the Open Funder Registry evolved into a more nuanced and comprehensive way to share and re-use open funding data systematically. That’s how, in collaboration with the funding community, the Crossref Grant Linking System was developed. Open funding metadata is fundamental for the transparency and integrity of the research endeavour, so we are happy to see it included in the Research Nexus.

As emphasised recently by Hans de Jonge from NWO, funding metadata’s value is in the transparency of the relationships it enables. The system is powered by the collective action of the research community, including research funders, that registers open metadata with Crossref, making these relationships possible. With close to 180,000 grant records in our corpus, we wanted to know how far they reach and what story they tell.

In March 2022, we developed an approach for linking grants to research outputs and analysed how many such relationships could be established. Now we’re able to present the latest dataset of relationships between grants and research outputs, both those deposited by Crossref members and those discovered by an automated matching strategy. It includes data deposited up to the end of July 2025.

This work is part of our ongoing Metadata Matching project.

What exactly is in this new open dataset of grant<>output relationships?

The dataset contains 250,163 total funding relationships between grants and research outputs.
We welcomed a number of funders, such as the Dutch Research Council and Fonds de recherche du Québec, which together registered almost 27,000 grants in the past year.
It’s clear that the more grant metadata is registered, the more funding relationships we can uncover.
The percentage of relationships that are registered explicitly by Crossref members providing grant IDs in funding information has grown from less than 0.1% in 2023 to 1% (modest numbers but amazing growth!).

The methodology

We created a dataset of relationships between grants and research outputs by analysing their metadata in several ways. A relationship is included in the dataset if at least one of the following conditions is met:

A relationship was explicitly deposited by a Crossref member through a finances or isFinancedBy relationship: 488 (0.2%) relationships
The research output contains the grant DOI within the award number in the funding metadata: 2,003 (0.8%) relationships
The award numbers in the grant and the research output are similar, and the associated funding organisations are either the same, or one is the sub-organisation of the other: 247,672 (99%) relationships
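
To make the three conditions concrete, here is a minimal sketch in Python. It is not the code from our repository: the flattened record structure, the `parent_of` lookup for funder hierarchy, and above all the definition of "similar" award numbers (equal after lower-casing and removing everything except letters and digits) are simplifying assumptions chosen for illustration.

```python
import re


def normalise(award: str) -> str:
    """Assumed similarity rule: compare award numbers on letters and digits only."""
    return re.sub(r"[^0-9a-z]", "", award.lower())


def funders_related(a: str, b: str, parent_of: dict) -> bool:
    """True if the funders are the same, or one is the direct parent of the other."""
    return a == b or parent_of.get(a) == b or parent_of.get(b) == a


def match_reason(grant: dict, output: dict, parent_of: dict):
    """Return which of the three conditions links a grant to an output, or None."""
    grant_doi = grant["doi"].lower()
    output_doi = output["doi"].lower()

    # Condition 1: a member deposited the relationship explicitly, on either side.
    for rel in output.get("relations", []):
        if rel["type"] == "isFinancedBy" and rel["doi"].lower() == grant_doi:
            return "explicit"
    for rel in grant.get("relations", []):
        if rel["type"] == "finances" and rel["doi"].lower() == output_doi:
            return "explicit"

    # Condition 2: the grant DOI appears inside an award number of the output.
    for funding in output.get("funding", []):
        if any(grant_doi in award.lower() for award in funding.get("awards", [])):
            return "grant-doi-in-award"

    # Condition 3: similar award numbers and the same or a related funder.
    target = normalise(grant.get("award_number", ""))
    for funding in output.get("funding", []):
        if not funders_related(funding["funder_id"], grant["funder_id"], parent_of):
            continue
        if target and any(normalise(a) == target for a in funding.get("awards", [])):
            return "award-number-match"

    return None
```

The checks run from the most explicit evidence to the most inferred, so each relationship is attributed to the strongest condition it satisfies. A production matcher would need a more careful notion of similarity than the normalisation assumed here.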

The dataset includes data deposited until the end of July 2025 and contains 250,163 total relationships.

The code used to generate the dataset is available in our GitLab repository.

The results

As you can see in the graph below, the number of grant–research output relationships continues to grow as the number of grant records Crossref members register with us increases.

Figure 1: Cumulative totals of grants, linked grants, research outputs, and grant–research output relationships from 2019 to 2025. Stepwise increases correspond to the addition of major funder datasets, including Wellcome (2020), OSTI (2021), JST (2022), the European Union (2022), the Austrian Science Fund (2023), and the Fonds de recherche du Québec (2025).

Looking at the numbers broken down by grant registrant, we can see that the more grants are registered, the more relationships can be uncovered. The table below shows funders that have at least 1,000 grants registered in total and for which at least 10% of registered grants are linked to research outputs. For each funder it gives the number of relationships, linked research outputs, grants, and linked grants, sorted by the percentage of linked grants, and compares these with the data from the 2023 analysis (where available) to show how the uptake of open funding metadata is evolving.

Funder	Relationships (2023)	Relationships (2025)	Linked research outputs (2023)	Linked research outputs (2025)	Grants (2023)	Grants (2025)	Linked grants (2023)	Linked grants (2025)	Percentage of linked grants (2023)	Percentage of linked grants (2025)
European Union	86,979	128,572	78,576	114,491	39,703	53,473	14,860	21,402	37.4%	40%
Japan Science and Technology Agency	19,549	30,728	16,265	25,003	9,923	11,866	2,609	3,900	26.3%	32.9%
Wellcome	34,254	45,596	25,720	33,783	17,547	19,929	5,238	6,206	29.9%	31.1%
American Cancer Society	50	604	49	586	380	1,162	34	277	8.9%	23.8%
American Heart Association (AHA)	40	1,040	38	935	598	2,764	30	621	5%	22.5%
Fundacao para a Ciencia e a Tecnologia	0	27,915	0	15,681	5	17,422	0	3,793	–	21.8%
Austrian Science Fund (FWF)	–	10,387	–	7,459	–	19,576	–	2,712	–	13.9%

Table 1: Comparison between data from 2023-07-31 and 2025-07-31 of a number of Crossref members registering grants. It shows the number of relationships, grants, linked grants and linked research outputs, sorted by the percentage of linked grants.
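
The last column of Table 1 and the selection rule for the table can be reproduced from the other columns. The short Python sketch below does this for the 2025 figures. The numbers are copied from the table, while the function names and default thresholds are simply an illustration of the criteria described above.

```python
# (funder, grants registered, linked grants), taken from the 2025 columns of Table 1
ROWS_2025 = [
    ("European Union", 53_473, 21_402),
    ("Japan Science and Technology Agency", 11_866, 3_900),
    ("Wellcome", 19_929, 6_206),
    ("American Cancer Society", 1_162, 277),
    ("American Heart Association (AHA)", 2_764, 621),
    ("Fundacao para a Ciencia e a Tecnologia", 17_422, 3_793),
    ("Austrian Science Fund (FWF)", 19_576, 2_712),
]


def linked_share(grants: int, linked: int) -> float:
    """Percentage of registered grants that are linked to at least one output."""
    return 100 * linked / grants


def qualifying(rows, min_grants: int = 1_000, min_share: float = 10.0):
    """Apply the table's inclusion rule and sort by percentage of linked grants."""
    kept = [
        (name, round(linked_share(grants, linked), 1))
        for name, grants, linked in rows
        if grants >= min_grants and linked_share(grants, linked) >= min_share
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


for name, share in qualifying(ROWS_2025):
    print(f"{name}: {share}%")
```

Running this lists the seven funders in the same order as the table, from the European Union (40.0%) down to the Austrian Science Fund (13.9%).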

We encourage funders to join as members once they have determined the means of effective implementation of the GLS within their processes. By further analysing metadata of matched outputs, funders have the opportunity to monitor compliance with their policies and learn more about the impact of their programs.

Following through on funders’ Open Science commitments

The relationships showcased above and in the recent analysis are powered by open funding metadata. Open funding metadata plays a central role in building a transparent, accountable and high-integrity research environment by making visible the connections between the funding, grantees, research outputs, and their impact. Funders’ openness mandates and Open Science commitments emphasize the importance of traceability in the research process, so ensuring that the support given, whether financial or otherwise, can be systematically recorded and shared is instrumental. Openness is also part of the strategic plans of institutions such as the International Science Council, which has explicitly called for greater transparency in funding as a way to strengthen trust in science and counter misinformation. At the same time, initiatives such as the Barcelona Declaration on Open Research Information underscore the benefits of open, reusable funding metadata for monitoring, evaluation and assessment of research and researchers.

Crossref’s Grant Linking System offers funders a way to demonstrate a commitment to openness, modeling the standards they expect of the research community they support, while creating a more robust, trustworthy and collaborative research ecosystem.

Economy of scale: unlocking relationships with Crossref

Crossref houses millions of records, from the ubiquitous research articles and preprints, to books, peer review records, technical reports, datasets – you name it. Our members not only register, but also regularly update their metadata as new or corrected information becomes available. Our matching workflows allow us to make visible the hidden relationships and complete and improve the metadata records by adding new and reciprocal assertions.

This analysis shows the unique value of registering funding metadata with Crossref and adding an essential piece to the Research Nexus puzzle. The relationship metadata connects the funding that underpins the research process and contextualises scattered data points, acting as an anchor that links publications, people, and other research outputs. This is made possible by the impressive number of records continuously being registered by more than 23,000 member organisations, and by the increasing availability of funding information in the system as more research funders join in and register their grant metadata with us.

Next steps

As we welcome more and more funders to the GLS, we, collectively, continue to complete the Research Nexus, record by record, field by field. The more awards we have in our corpus, the more relationships we’ll uncover, so we’ll keep running these analyses periodically to make sure we don’t miss them.

But it is not all on us. We are working towards a vision where Crossref Grant IDs are business as usual – where funders register their awards, grantees are aware of them and share them with publishers, and those publishers share them back with us when registering their content – closing the loop organically. We continue working on making this easier. In the upcoming works schema update, a specific Crossref Grant ID field will be added in the funding information, alongside Award ID (for an internal identifier).

Crucially, as adoption among funders gains momentum and thousands of Crossref Grant IDs become available in the system, we are working with all members to draw their attention to the importance of funding metadata, so that more works include this information and, consequently, the percentage of relationships asserted by Crossref members can grow.

This matching analysis is just one example of what we do to enrich metadata to highlight relationships among works, individuals, institutions, and actions. Earlier this year, we launched the Metadata Matching project, which is a major effort to rebuild our matching workflows using modern software development and data science practices. As part of the project, we plan to expose additional matched relationships between grants and research outputs in our REST API, alongside those deposited by our members. We’ll keep you updated as we go along!

Innovation in scientific publishing and its implications for Crossref DOI registration practices - Request for input

Ludo Waltman — Thu, 25 Sep 2025 00:00:00 +0000

Lots of exciting innovations are being made in scientific publishing, often raising fundamental questions about established publishing practices. In this guest post, Ludo Waltman and André Brasil discuss the recently launched MetaROR publish-review-curate platform and the questions it raises about good practices for Crossref DOI registration in this emerging landscape.

Digital Object Identifiers (DOIs) are unique identifiers commonly assigned to research outputs such as journal articles, preprints, peer review reports, and datasets. The DOI of a research output allows the output to be identified online in a persistent way, even when the underlying publishing infrastructure changes (e.g., a journal moving from one publisher to another).

There are several DOI registration agencies. Most of the larger scientific publishers work with Crossref, and so do many preprint servers, and therefore our focus in this post is on Crossref. Crossref also keeps track of metadata associated with research outputs, such as the title, authors, and publication date of an output, and it makes this metadata openly available via APIs for all kinds of services to ingest and reuse. Because indexing, discovery, and evaluation tools rely heavily on this metadata, content registration practices and metadata design choices can have major effects on the visibility and findability of research outputs and on analytics used to monitor and assess research outputs and their contributors.

For the most common types of research outputs, such as journal articles and preprints, a broad consensus has emerged over the past decades on good practices for DOI registration. Under this consensus, journal articles are assigned the record type ‘article’ in their Crossref metadata. Likewise, many preprint servers register DOIs for preprints at Crossref, with the record type ‘preprint’ in the metadata. (The arXiv preprint server is an exception; it registers DOIs for preprints with DataCite rather than Crossref.)

For innovative new publication platforms, however, good practices for DOI registration are less clear. The approaches to scientific publishing offered by these platforms often do not fit neatly into established ways of working. For instance, for some of these platforms, the traditional distinction between peer-reviewed articles published in scientific journals and non-peer-reviewed articles posted on preprint servers is no longer applicable. This raises fundamental questions about suitable DOI registration practices for new approaches to scientific publishing.

MetaROR

The MetaROR (MetaResearch Open Review) platform, launched in November 2024 by the Research on Research Institute (RoRI) and the Association for Interdisciplinary Meta-Research and Open Science (AIMOS), offers an example of the challenge of developing appropriate DOI registration practices for new publishing models.

Inspired by similar initiatives such as eLife, MetaROR adopts the so-called publish-review-curate model. Authors first publish their article on a preprint server and then submit it to MetaROR. MetaROR then organizes an open peer review process for the article. Review reports are published on the MetaROR platform, along with a copy of the preprinted article and an editorial assessment. Rather than a simple binary decision (accept vs. reject), an editorial assessment is a short one-paragraph statement summarizing the strengths and weaknesses of an article. Each review report and each editorial assessment has its own DOI registered at Crossref. In this way, review reports are treated as first-class research outputs that can, for instance, be indexed in scientific literature databases and can be cited in other research outputs.

For an article submitted to MetaROR, the publication of the review reports, the editorial assessment, and a copy of the article itself concludes MetaROR’s publish-review-curate process. The authors of the article may revise their work in light of the feedback received, and MetaROR may review the revised article. However, there is no requirement that revisions must be made. The primary aim of the review reports and the editorial assessment published on the MetaROR platform is to offer context for readers of the article, helping readers understand the strengths and weaknesses of the article.

Crossref DOI registration

Registration of DOIs for open peer review reports is increasingly common. By registering Crossref DOIs for review reports and editorial assessments, MetaROR enables reviewers and editors to be recognized for their contributions. But what about recognition for authors?

A crucial element in MetaROR’s philosophy is that authors of articles peer-reviewed by MetaROR deserve to be recognized in a similar way to authors of articles published in traditional peer-reviewed journals. One way to promote appropriate recognition for authors of articles peer-reviewed by MetaROR is to ensure that articles on the MetaROR platform, just like articles in peer-reviewed journals, have their own DOI. While this may seem straightforward to arrange, it actually raises two non-trivial questions about good practices for Crossref DOI registration:

For each article on the MetaROR platform, there is a corresponding article on a preprint server. Is it acceptable to have two Crossref DOIs, one registered by the preprint server and one registered by the MetaROR platform, for essentially the same article?
If Crossref DOIs are registered for articles on the MetaROR platform, should the articles be assigned the type ‘article’ or the type ‘preprint’ in their Crossref metadata, or something else entirely?

On the first question, it could be argued that having two Crossref DOIs for the same article is problematic and that MetaROR, therefore, should not register DOIs for articles on its platform. Alternatively, one could argue that an article on the MetaROR platform differs in a meaningful way from the corresponding article on a preprint server, since the article on the MetaROR platform has been enriched with peer review reports and an editorial assessment, similar to the way an article in a peer-reviewed journal may be seen as an enriched version of the corresponding article on a preprint server. This line of reasoning would justify registering DOIs for articles on the MetaROR platform.

On the second question, the argument could be made that articles on the MetaROR platform should be assigned the type ‘preprint’ in their Crossref metadata, since the type ‘article’ is intended for articles in journals and MetaROR does not consider itself to be a journal (in fact, MetaROR works with partner journals to enable articles peer-reviewed by MetaROR to be published in journals) and does not certify articles in the way journals do (i.e., MetaROR does not make accept/reject decisions). On the other hand, one could argue that articles on the MetaROR platform should be assigned the type ‘article’, since the peer-reviewed nature of articles in journals is typically seen as the key factor distinguishing these articles from articles on preprint servers. Articles on the MetaROR platform have been peer-reviewed, and in that sense, they resemble articles in journals. A third line of reasoning could be that neither the ‘preprint’ nor the ‘article’ type is fully appropriate for articles on the MetaROR platform and, consequently, that there is a need for a new Crossref record type.

What is your take?

The MetaROR team, in consultation with Crossref, will need to decide how to deal with the two questions discussed in this blog post. After some preliminary conversations between the MetaROR team and Crossref, we decided to share these questions more widely to solicit input from the broader community. We invite you to share your thoughts on the two questions, either by posting a comment on this blog post or by reaching out to us on social media or by email. Community perspectives will help shape good practices not only for MetaROR but also for other publish-review-curate initiatives facing similar questions. We look forward to hearing from you!

Ludo Waltman and André Brasil are members of the editorial team of MetaROR. Ludo and André are grateful to Ginny Hendricks at Crossref for valuable discussions about the issues raised in this blog post.

Crossref and PKP enter new partnership phase to support richer and more inclusive metadata

Kornelia Korzec — Mon, 22 Sep 2025 00:00:00 +0000

Crossref and the Public Knowledge Project (PKP) have been working closely together for many years, sharing resources and supporting our overlapping communities of organisations involved in communicating research. Now we’re delighted to share that we have agreed on a new set of objectives for our partnership, centred on further development of the tools that our shared community relies upon, as well as building capacity to enable richer metadata registration for organisations using the Open Journal Systems (OJS).

Crossref is working towards the vision of a rich and open network underpinning global scholarship, making relationships between works, people, institutions, and actions visible, thanks to the thread of metadata – the research nexus. This vision depends upon participation of research communication organisations coming from all parts of the world, disciplines, and languages. Working with PKP towards making tools for metadata registration more comprehensive, accessible, and easier to use is a big step towards supporting our community to participate in the research nexus.

The renewed partnership has three main goals:

Developments to improve experience and support metadata registration workflows in OJS, bringing relevant functionalities together under the Crossref plug-in, and developing an OMP Crossref plug-in.
Joint community engagement in support of transitioning OJS users to the future Long-Term Support (LTS) version of OJS, which will enable richer metadata registration.
Creation of a PKP School self-paced training course for system administrators.

Crossref and PKP have a rich history of collaboration, including previous investment in tools development in 2020, which resulted in some vital improvements to Crossref metadata management in OJS and a more streamlined experience for Crossref members on the platform, as well as many collaborative community events and training.

We know that thousands of Crossref members use OJS to register their metadata. Many are based in resource-constrained institutions, so the training provided by Crossref and PKP will be key to building their capacity to participate in the research nexus. With OJS 3.5 empowering organisations to register richer metadata, we look forward to opening up more opportunities for members to enhance their participation.

“At PKP, we’re excited to deepen our longstanding collaboration with Crossref, supporting our global community in amplifying the visibility and impact of their research through streamlined integration for robust metadata management. By working together on both technological innovation and capacity-building initiatives, we anticipate even greater outcomes that will strengthen open scholarship throughout the duration of this partnership and well into the future.” – said Kevin Stranack, PKP Director of Operations.

About Crossref

Crossref runs an open infrastructure to link research objects, entities, and actions, creating a lasting and reusable scholarly record that underpins open science. Together with their 23,000 members in 164 countries, Crossref drives metadata exchange and supports nearly 2 billion monthly API queries, facilitating global research communication, for the benefit of society.

About PKP

Public Knowledge Project (PKP) seeks to improve the scholarly and public quality, reach, and diversity of academic research through the research, development, implementation, and support of innovative open source software to support scholarly publishing and communication.

Raising the standard: GigaScience Press on metadata and discoverability

Scott Edmunds — Wed, 17 Sep 2025 00:00:00 +0000

To mark Crossref’s 25th anniversary, we launched our first Metadata Awards to highlight members with the best metadata practices. GigaScience Press, based in Hong Kong, was the leader among small publishers, defined as organisations with less than USD 1 million in publishing revenue or expenses. We spoke with Scott Edmunds, Ph.D., Editor-in-Chief at GigaScience Press, about how discoverability drives their high metadata standards.

What motivates your organisation/team to work towards high-quality metadata? What objectives does it support for your organisation?

Our objective is to communicate science openly and collaboratively, without barriers, to solve problems in a data- and evidence-driven manner through Open Science publishing. High-quality metadata helps us address these objectives by improving the discoverability, transparency, and provenance of the work we publish. It is an integral part of the FAIR principles and UNESCO Open Science Recommendation, playing a role in increasing the accessibility of research for both humans and machines. As one of the authors of the FAIR principles paper and an advisor of the Make Data Count project, I’ve also personally been very conscious to practice what I preach.

Do you have a strategy for complete metadata? Which elements did you prioritise? What workflows, tools, or collaborations helped you get there?

We’ve been privileged to work with our technical partners at River Valley Technologies, and the novel XML-first publishing platform they have developed has made it particularly easy to integrate and collect persistent identifiers and other metadata, embedding it into the resulting rich-XML. As Open Access advocates, licensing and machine readability were early focuses when launching our journals. We ensured that we provided a text and data mining portal, allowing bulk downloads of our content to encourage reuse. Many specific metadata elements are highlighted by the FAIR principles and UNESCO Open Science recommendations, and so these have also helped guide what should be prioritised. If there’s one specific tool to mention, we’ve been big fans of the Crossref participation reports, as this has helped highlight what is missing and what we need to improve upon.

How have you integrated these into your metadata processes?

The participation reports, in particular, have been useful for this, and by regularly checking them, we’ve managed to spot when processes have broken, for example. When you’ve added new fields to the reports like ROR IDs (Research Organization Registry), this has also motivated us to prioritise integrating these, so having a curated list of metadata fields like this definitely helps users focus on what should be the most important. River Valley Technologies has been very responsive to this type of feedback, and being able to see the participation report data in real-time has helped drive them to fix and update our metadata. So I thank them for being so patient and quick to respond to our very demanding standards.

What impact of good metadata can you see for your organisation?

From an Editorial side, our technical partners at River Valley Technologies have found having this metadata information available very useful in the Research Integrity tools they have developed and integrated into our publication platform. Things like ORCID IDs, RORs, and other identifiers are very useful for tracking provenance and increasing trust.

From a business side, putting the effort into collecting rich metadata has paid off in the long run by making it easier to integrate our publishing data into new platforms. It has made it easier and quicker to integrate and track our data via OA Switchboard, for example. It also helps us more easily mirror and list our content in indexes like PMC, Scopus, Web of Science, and others.

Have you encountered any challenges in curating or improving your metadata?

One of the main metadata areas that currently lets us down is funding and registries, and that is because our publishing model is so affordable. The automated production processes from RVT’s novel publishing platform have allowed us to publish very cost-effectively (the APC of GigaByte is $535). We’ve also received sponsorship from the WHO to publish a series of public health papers, particularly supporting authors from the Global South who may not have sources of funding listed in these registries. Because of this, we’ve published numerous papers from independent researchers, students, and self-financed projects that may not have funding IDs or grant numbers. We’d like to push to get “unfunded” counted as a metadata field to address this.

Have your efforts regarding metadata yielded tangible benefits for your community? Is this something your editors, authors, or readers are aware of and appreciate? If so, why?

We’d like to think our authors find this useful, but we’ve not had any specific feedback on this. Our readers, both human and machine, should hopefully appreciate finding our work more easily, and from a purely selfish perspective, should get us higher access and citations. This is difficult to measure, but as evidence nerds, we have attempted to conduct RCTs examining this for Data Citations. One anecdote I can give is about the author who told us they pasted their paper into ChatGPT and asked it which was the best journal for their work, and it suggested our journal. I’d like to think that putting in this effort in making our papers more machine-readable and comprehensible pays off at times like this to make the discoverability and visibility of our journals greater.

Looking ahead, how are you planning to build on your metadata quality? Are there new elements or practices you’re exploring? And what advice would you give to others just starting to strengthen their metadata?

We still need to update older content with RORs, and improve the metadata for the datasets linked to our papers. To do this, we’ve had interns working to improve our DataCite metadata.

We encourage others to think about metadata issues when setting up their workflows. While it may seem like additional work, it will be increasingly important to future-proof and get journals ready for our increasingly AI-centric age. And as we show here, you can more easily carry out important tasks like getting your content more quickly and widely indexed and disseminated.

Strong metadata ties open science, integrity, and discoverability together. GigaScience Press shows how consistent identifiers, machine-readable formats, and continuous checks deliver real benefits. As discovery becomes more AI-assisted, the priority is clear: keep metadata complete, open, and usable.

Now, a few words from Scott.

Metadata Awards video - Gigascience

Meet the candidates and cast your vote in our 2025 Board elections

Lucy Ofiesh — Tue, 16 Sep 2025 00:00:00 +0000

On behalf of the Nominating Committee, I’m pleased to share the slate of candidates for the 2025 board election.

Each year we do an open call for board interest. This year, the Nominating Committee received 51 submissions from members worldwide to fill five open board seats.

We have four large member seats and one small member seat open for election in 2025. We maintain a balanced board of 8 large member seats and 8 small member seats. Size is determined based on the organization’s membership tier (small members fall in the $0-$1,650 tiers and large members in the $3,900 - $50,000 tiers).

We were pleased to see the diversity in candidates, with applicants from 19 countries. The committee was keen to prepare a diverse slate of organization types, individual skills and perspectives, and global representation.

Tier 1, Small member seats (electing one candidate)

Rebecca Wambua, Distance, Open and e-Learning Practitioners’ Association of Kenya
Oscar Donde, Pan Africa Science Journal
Nwachukwu Egbunike, Pan-Atlantic University Press

Tier 2, Large member seats (electing four candidates)

Damian Bird, CABI
Rose L’Huillier, Elsevier
John Sivo, IEEE
Nick Lindsay, The MIT Press
Anjalie Nawaratne, Springer Nature

Please read the candidates’ statements

Every member has a vote

If your organisation is a voting member in good standing as of September 5th, 2025, you are eligible to vote.

The voting contact for your organisation will receive a ballot from eBallot, a third party election platform. You should receive your ballot by Wednesday, September 17th, and you will have until 12:00 UTC on October 22nd to submit your ballot.

The election results will be announced at Crossref2025, our annual online meeting on October 22nd, 2025.

Special thanks to the committee: James Phillpotts of Oxford University Press, Wendy Patterson of Beilstein Institut, Abiodun Falodun of University of Benin, Amanda Ward of Taylor & Francis, and Chaerul Umam of the National Library of Indonesia for the time they dedicated to reviewing the expressions of interest and participating in committee meetings.

If you have any questions about our election process, please contact me

Happy voting!

A second look at Crossref's carbon footprint - the 2024 report

Ed Pentz — Mon, 15 Sep 2025 00:00:00 +0000

In 2022, we wrote a blog post, “Rethinking staff travel, meetings, and events,” outlining our new approach to staff travel, meetings, and events, with the goal of not going back to ‘normal’ after the pandemic. We said that in the future we would report on our efforts to balance in-person and online events, support work-life balance for staff, and track our carbon emissions. In December 2024, we wrote a blog post, “Summary of the environmental impact of Crossref,” that gave an overview of 2023 and provided the first report on our carbon emissions. Our report on 2023 only just made it into 2024, so we are happy to report on 2024 a little sooner in the year.

On the positive side, there are a few things:

Our spending on travel and meetings (a proxy for emissions) in 2024 was 56% of what it was in 2019, keeping below the target of not more than 60% of our 2019 spend
We were better at tracking hotel nights in 2024 compared to 2023
We managed to balance in-person, regional, and online meetings to engage with our global community while still not having returned to the pre-pandemic “normal”

In practice, our approach means thinking carefully about how to make the most of each trip. For example, when organising our Crossref Jakarta event, we travelled via Singapore and used the opportunity to meet with members there. Once in Jakarta, we combined our two-day event with an OJS workshop with colleagues from PKP, and another event with Universitas Indonesia. Similarly, when our colleague travelled by train to a conference in Amsterdam, they combined it with a day of visits to members in the area. These kinds of combinations reduce the need for separate trips and maximise the value of in-person travel.

Some of the less positive things were:

As our membership continues to grow globally and we expand our staff (which are both great things in themselves), our emissions have also increased. Not only do we have more staff, but some staff travelled more in 2024 than in 2023. We’ll keep a close eye on this to avoid ever-increasing travel.
Taking a train instead of flying can take longer, which clashes with our desire for staff to be away from home for as little time as possible.
It is difficult to find reliable data for some calculations - for example, we have decided not to try to calculate the impact of our Zoom use because there is no reliable way to do this.
We don’t have good options for offsetting our emissions, and it’s unclear whether we would want to do this even if they were available.

There is also the issue of whether it is worth it, or possible, to collect certain data, or whether it would change what we do. An example is Zoom. The estimate for the emissions from Zoom meetings in 2024 was 100 kg (that’s kilograms, not tonnes), but the calculations were made using a tool from 2020 that made many assumptions and estimates. We have no way of verifying whether the tool we used is accurate, so we decided not to update our previous calculation. In any case, we aren’t going to ration or reduce our teleconferencing, since it’s an essential tool, and especially if we want to fly less, have fewer in-person meetings, and operate effectively as a distributed organisation in multiple countries with no offices.

In summary, our total reported carbon emissions increased 40% from 105 tCO2e in 2023 to 147 tCO2e in 2024 (see below for the details). The positive aspect of this is that the increase is partly due to our improved ability to track our travel and hotel stays. The more concerning side of this is that we are travelling more. This enables us to engage with our growing community. We are still thinking strategically about our travel and meetings, following the approach outlined in our 2022 blog post. However, we need to carefully consider air travel in 2026, as it is our largest source of emissions (93%).

Total travel and carbon spending

Year	Travel and meetings spend	Percentage of 2019 spend	Total carbon emissions	Total hotel nights covered
2019 actuals	$585,482	100%	did not record	did not record
2020 actuals	$91,700	16%	did not record	did not record
2021 actuals	$19,066	3%	did not record	did not record
2022 actuals	$74,416	13%	did not record	did not record
2023 actuals	$305,737	52%	105 tCO2e	did not record
2024 actuals	$327,939	56%	147 tCO2e	415
2025 budget	$417,767 (reforecast)	71%	68 tCO2e (YTD)	256 (YTD)
2026 budget	$439,817	75%	TBD	TBD

In 2024, we met the target of keeping our travel expenses below 60% of our 2019 level. In 2025, we will exceed this. There are a number of reasons for this. We have more staff, more members, inflation has been high, and we are subsidising a lot more travel for others, such as our ambassadors, speakers, and collaborators at local events, and some board members (since 2019, we reduced from three to one in-person board meeting per year). This aligns with our goals of inclusivity for Crossref meetings, but we have to recognise there is a trade-off. The cost of travel, particularly airfare, has increased since 2019. Using US Bureau of Labor Statistics data from 2019 to 2025, the inflation multiplier for a dollar is 1.26. Adjusted for inflation, the 2019 comparison figure for 2025 spending is therefore about $737,000, and forecasted 2025 spending is about 57% of this. While we use cost as a proxy for travel volume, now that we’re better at tracking actual carbon emissions, we can try to set targets of keeping under a certain carbon tonne equivalent total instead of (only) a financial target.
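The inflation adjustment described above can be reproduced with a few lines of arithmetic. This is only a sketch using the rounded figures quoted in this post (the 2019 actuals, the 2025 reforecast, and the 1.26 multiplier); the function and variable names are illustrative and not part of any Crossref tooling.

```python
# Figures quoted in this post; names are illustrative assumptions.
SPEND_2019 = 585_482         # 2019 actual travel and meetings spend (USD)
FORECAST_2025 = 417_767      # 2025 reforecast (USD)
INFLATION_MULTIPLIER = 1.26  # rounded 2019 -> 2025 multiplier


def inflation_adjusted_share(baseline, multiplier, spend):
    """Return the inflation-adjusted baseline and spend as a fraction of it."""
    adjusted_baseline = baseline * multiplier
    return adjusted_baseline, spend / adjusted_baseline


adjusted, share = inflation_adjusted_share(
    SPEND_2019, INFLATION_MULTIPLIER, FORECAST_2025
)
nominal_share = FORECAST_2025 / SPEND_2019

print(f"Adjusted 2019 baseline: ${adjusted:,.0f}")
print(f"2025 forecast vs adjusted baseline: {share:.0%}")
print(f"2025 forecast vs nominal 2019 spend: {nominal_share:.0%}")
```

With these rounded inputs, the forecast is above the 60% target in nominal terms but a little under 60% of the inflation-adjusted baseline.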

Total Carbon Emissions for 2024

Our total reported carbon emissions increased 40% from 105 tCO2e in 2023 to 147 tCO2e in 2024. In 2023, we didn’t report on the estimated emissions from hotel stays, but for 2024, we have. We recorded 415 hotel nights in 2024 for 4 tCO2e using an average of Europe/US hotel per night emissions estimates (Circular Ecology). The most carbon-intensive activity was flying. There were about 215 flights in 2024, accounting for emissions of 138 tCO2e - 93% of our total. Crossref staff and community members we covered took 88 train journeys with carbon emissions of 0.47 tCO2e - so the more travel by train, the better, but this isn’t always possible or feasible. We haven’t included estimates of the impact of home working (Crossref is fully distributed), but we have an initial estimate below and will look to improve this analysis for the 2025 analysis and going forward.

Estimate of carbon footprint for distributed staff

Crossref is fully distributed with staff in 11 countries. We used Claude from Anthropic to calculate the emissions from home working for our staff in 2024 and asked for sources to be cited. It provided some approaches for how to go about the calculations but the results were not reliable - for our 46 staff in 10 countries (this is for 2024 - we now have 49 staff in 11 countries) estimates ranged from 5 tCO2e to 28 tCO2e depending on various assumptions such as whether to account for the grid intensity of the countries where staff are based (Our World in Data has grid intensity figures) and what estimate is used for the amount of energy an employee working from home uses each day. Circular Ecology uses UK DEFRA figures to come up with 2.67 kgCO2e/day for home working. So a simple calculation of 46 staff working 230 days per year arrives at the 28 tCO2e amount. This is much less than the equivalent figure for office-based work, which is 70 tCO2e. A number of things aren’t factored into these calculations: staff with green energy tariffs, staff with solar panels and home batteries, or other renewable energy sources, and the different needs for heating and air conditioning in different countries.
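The simple home-working calculation above is just staff count times working days times a per-day factor. A minimal sketch, using only the figures quoted in this post and deliberately ignoring grid intensity, green tariffs, and heating or cooling differences (the names are illustrative):

```python
# Inputs quoted in this post; names are illustrative assumptions.
KG_CO2E_PER_HOME_WORKING_DAY = 2.67  # Circular Ecology figure based on UK DEFRA
STAFF_2024 = 46
WORKING_DAYS_PER_YEAR = 230


def home_working_emissions_tonnes(staff, days_per_year, kg_per_day):
    """Flat estimate in tCO2e: one per-day factor applied to every staff member."""
    return staff * days_per_year * kg_per_day / 1000  # kg -> tonnes


estimate = home_working_emissions_tonnes(
    STAFF_2024, WORKING_DAYS_PER_YEAR, KG_CO2E_PER_HOME_WORKING_DAY
)
print(f"Estimated home-working emissions: {estimate:.1f} tCO2e")
```

Because the same factor is applied to everyone, the result sits at the top of the 5 to 28 tCO2e range; weighting by country grid intensity would lower it for staff on cleaner grids.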

We decided not to include these figures in our overall emissions for 2024, but we are looking at a more reliable way to estimate this for 2025. However, we need to consider what we would do with the information and whether we would, or could, do anything to reduce this.

Hosting services

We use AWS for hosting our REST APIs, Metadata Search, and the website. In 2024, our main metadata registry was in a data centre in Massachusetts, which is not included in our calculations. In July 2025, we transitioned fully to AWS, so from 2025 onwards, our emissions from AWS will be higher and will encompass our entire system.

In 2023, Amazon reports Crossref’s carbon emissions were 0.216 tCO2e compared with 0.266 tCO2e in 2022. In 2024, emissions were 0.132 tCO2e.

Compared to travel, the footprint from AWS is minimal.

Online meetings

As a distributed, remote-first organisation, Crossref is a heavy Zoom user: it’s essential for staff and for engaging with our community. However, Zoom doesn’t provide tools or estimates of the carbon impact of Zoom meetings. We used a tool last year to provide an estimate, but we aren’t confident it’s accurate or meaningful. The tool was built in 2020 and made a lot of assumptions and guesstimates.

Tools we used

To calculate emissions for flights and train journeys, we chose to use Carbon Calculator. For hotel stays and home working estimates, we used Circular Ecology. For AWS, we used the Customer Carbon Footprint Tool (CCFT) provided by AWS.

Offsetting

We don’t offset our emissions from travel or other operations and don’t have plans to do this. Offsetting emissions is problematic in a number of different ways, so we don’t feel confident in doing it.

In conclusion

In general, it feels good to have had a few years of tracking this, learning more, finding the right tools, and trying to stick to a target to limit our increases. While of course there are always reasons for the target to increase—as we grow and are able to subsidise others beyond our staff more—we remain committed to not just monitoring our carbon spend but also maintaining it at a reasonable level and finding ways to limit and mitigate our impact on the environment. This kind of sustainability isn’t included in the POSI Principles for open scholarly infrastructures, but we’d love to see other similar organisations share their tips and measurements so that, as a community, we can learn how to do even better.

Deprecating co-access: Crossref plans and timelines

Isaac Farley — Thu, 11 Sep 2025 00:00:00 +0000

To date, there are about 100 Crossref members who have made use of our co-access service for one or more of their books. The service was designed to be a last-resort measure when multiple parties - book publishers, aggregators, and other members - had rights to register book content. Unfortunately, the service allowed members to register multiple DOIs for shared books and book chapters, thereby violating our own core tenet of one DOI per content item. We should not have created a service that violated that tenet, resulting in duplicate DOIs. As we are able to offer an alternative in the form of the multiple resolution service, it is time to switch co-access off. Among other benefits for the publisher and the authors, the creation of a single DOI for each item, regardless of where it might be hosted, will result in more accurate citation counts and usage statistics. We’re retiring co-access at the end of 2026.

An idiom to start

There’s an idiom used in technology circles called ‘eating your own dog food.’ It’s used to describe an organization that tests or uses its own products in the real world. I’m no developer and only have a handful of years of exposure to this phrase, but I’ve always wanted to work it into one of my blog posts. The visceral reactions I have observed when it’s been used on internal calls are just too tempting. That, and I think it applies to our own rollout of and missteps with a service we call co-access. The decision to enable co-access reflected the priorities of that period, but we can now improve on it with an upgraded multiple resolution service. That rickety footing for co-access doomed it from the start. Now’s the time to face the music and swallow our own kibble.

Always meant as a last-resort measure, co-access allows multiple Crossref members to register metadata for shared book and book chapter content. Thus, use of co-access results in multiple, duplicate DOIs registered for the same book content. There are well over 500,000 DOIs in co-access within our corpus today. At least half of those are duplicates (more on this below).

This is far from ideal and has adverse consequences for the integrity of the scholarly record and the community. As we are able to offer an alternative in the form of the multiple resolution service, it is time to switch co-access off.

Among other benefits for the publisher and the authors, the creation of a single DOI for each item, regardless of where it might be hosted, will result in more accurate citation counts and usage statistics.

Duplicate DOIs

We frequently receive questions from members, metadata users, and others in the community, like this one, asking us what we are doing to combat the very real problem of registration and propagation of duplicate DOIs. We do take measures to prevent the registration of duplicate DOIs, including flagging registration of potential duplicate records to our members using what we call conflicts and conflict reports. As you might expect, this has been a sensitive topic for us, because we have one glaring service, yes, co-access, that has been actively exacerbating the issue of duplicate DOIs.

So, while we have been actively trying to counter the rise of duplicate DOIs, co-access enabled duplicate registrations of book DOIs. For every prefix that we configured for the service, we knew we were contributing to the problem (our members noticed too). As I said above, co-access allows multiple members to register their own DOI for shared book content. That means that book content in co-access has at least two DOIs registered. In some cases, there is book content with five or more registered DOIs for a single book. That’s a great many duplicates that this service is responsible for.

Replacing co-access

We plan to replace co-access with an existing tool, multiple resolution, which allows for more than one resolution URL to be registered to a single DOI. A user resolving the DOI is presented with an interim page, allowing them to choose from the various content sources registered with this DOI. We’ve made some progress toward making multiple resolution simpler for members to implement, but we still have more to do.

We’re aware that the technical steps involved in adopting multiple resolution might present a barrier to implementation for some of our members. To help with the transition, we are working on a basic tool (currently in beta) that simplifies the process. We will make it available to members between now and the middle of 2026.

Our timeline

We are not going to make these changes tomorrow. We’re going to give members who have been using co-access time to adjust. Right now, we trigger co-access when a secondary DOI is registered by a secondary registrant (member) that: 1) is already in a co-access group within our system with the DOI prefix that registered the original DOI, 2) has at least one shared ISBN with the metadata of that original DOI, and 3) has a title (in the title element of the book or chapter XML) that exactly matches the title of the original DOI. We’re going to stop triggering co-access for book and book chapter registrations starting 2026 July 1. No new DOIs will be placed in co-access starting then.
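The three conditions that currently trigger co-access can be read as a simple conjunction. The sketch below is only an illustration of that logic, not Crossref's actual implementation: the `BookRecord` type, the `co_access_groups` structure, and all DOIs, prefixes, and ISBNs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookRecord:
    """Hypothetical, simplified view of a registered book or chapter."""
    doi: str
    prefix: str
    title: str
    isbns: frozenset


def triggers_co_access(original, secondary, co_access_groups):
    """Return True only if all three conditions described above hold.

    co_access_groups is assumed to be an iterable of sets of DOI prefixes
    that have been configured to share book content.
    """
    # 1) Both prefixes are in the same co-access group.
    same_group = any(
        original.prefix in group and secondary.prefix in group
        for group in co_access_groups
    )
    # 2) At least one ISBN is shared between the two records.
    shared_isbn = bool(original.isbns & secondary.isbns)
    # 3) The titles match exactly (no normalisation is applied).
    exact_title = original.title == secondary.title
    return same_group and shared_isbn and exact_title


groups = [{"10.1111", "10.2222"}]
first = BookRecord("10.1111/book", "10.1111", "A Shared Book",
                   frozenset({"9780000000001"}))
second = BookRecord("10.2222/book", "10.2222", "A Shared Book",
                    frozenset({"9780000000001", "9780000000002"}))
retitled = BookRecord("10.2222/book-b", "10.2222", "A shared book",
                      frozenset({"9780000000001"}))

print(triggers_co_access(first, second, groups))    # exact title match
print(triggers_co_access(first, retitled, groups))  # title differs in case
```

The exact-match requirement on titles means that even a small difference, such as capitalisation, keeps a record out of co-access in this sketch.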

From there, there will be six months to clean up records already in co-access. One definitive DOI should be selected by the parties in a co-access group; the DOIs that will no longer be maintained for those books and book chapters should be aliased to the primary (definitive) DOI that will be maintained going forward. The primary DOI should be the DOI used on all landing pages for that book (or, book chapter).

In January 2027, if co-access DOIs have not been aliased to one another, we will force alias the DOIs in the record to the DOI registered by the organization identified as the publisher in the metadata records already in our system. At any point in this timeline, our team will be happy to help with the registration of secondary URLs in order to move books from co-access to multiple resolution. As a result, we will encourage members, end users, and the broader community to move back to using a single, definitive source of truth for these books and book chapters.

What will registration of books and book chapters look like post-co-access?

Coordinated. We expect that our members and their publishing partners will define the single DOI for each book and book chapter well upstream of Crossref, so all entities and their systems will use that one definitive DOI.

As for the registration process and our system, the first member to register the book (and its ISBNs) will establish the DOI for that book and its chapters. Subsequent attempts to register the same content with a duplicate book-level DOI will fail. Multiple DOIs for the same book or book chapter should be avoided starting 2026 July 1, as we will no longer be able to place books and book chapters into co-access.

We believe this will result in increased cited-by and usage metrics for that single DOI, and a cleaner, more accurate scholarly record.

We’d love to hear your reaction to this news in our Community Forum.

Celebrating one year of Crossref Grant IDs at NWO

Hans de Jonge — Tue, 09 Sep 2025 00:00:00 +0000

This month marks one year since the Dutch Research Council (NWO) introduced grant IDs—an important milestone in our journey toward more transparent and trackable research funding. We created over 1,600 Crossref Grant IDs with associated metadata. We are beginning to see them appear in publications. These early examples show the enormous potential Grant IDs have. They also highlight that publishers could extend their efforts to improve the quality of funding metadata of publications.

The promise of grant linking

For decades, funders have struggled with a seemingly simple challenge: tracking the research outputs that arise from their funding. The traditional approach—requiring grantees to cite their grants in acknowledgement sections of their papers—has all kinds of problems. Authors make many errors in providing this information, and even when funding organizations and schemes are cited correctly, there is no guarantee that a grant number is globally unique and not already in use by another funding council in the world.

To address these issues, and in collaboration with the research funding community, Crossref introduced the Grant Linking System (GLS) six years ago. The system allows funding organizations to assign globally unique and persistent identifiers to their grants and, more importantly, to connect these grants with the outputs arising from them. The vision is straightforward: authors include Grant IDs (which are Crossref DOIs) in the funding acknowledgements of their research articles. Publishers either take these IDs from the acknowledgement or proactively ask authors for these IDs in their submission system. Next, when a publisher registers their publication with Crossref, it includes the grant identifier in the metadata of that publication, creating an unambiguous link between the publication and the grants from which the research was funded.

This last step—including the Grant ID in the metadata of the article when registering the publication with Crossref—is a crucial part of the system as it enables anyone to automatically retrieve all publications arising from a given grant over time via the Crossref API. Funding organizations interested in tracking the impact of their funding could then stop asking their grantees to manually report on the outputs of their funding, as most still do today. Instead, this information would become open data that funding organizations harvest directly themselves, reducing administrative burden on researchers while enhancing the ability to track the impact of their funding.
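To make the retrieval step concrete, here is a minimal sketch of how such a lookup might be built against the Crossref REST API. It assumes that the Grant ID was deposited as the award number of the publication and that the API's `award.number` filter matches it; whether a given Grant ID is found this way depends on how the publisher deposited it. The function names are illustrative, and no network request is made.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def works_by_award_url(award: str, rows: int = 100) -> str:
    """Build a /works query URL filtering on a given award identifier."""
    # Assumption: the Grant ID is stored as the award number of the work.
    params = {"filter": f"award.number:{award}", "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def dois_from_response(payload: dict) -> list[str]:
    """Pull the work DOIs out of a decoded /works response body."""
    # Work lists are returned under message.items in the JSON response.
    return [item["DOI"] for item in payload.get("message", {}).get("items", [])]
```

A funder could fetch the URL returned by `works_by_award_url("10.61686/YDRHT18202")` with any HTTP client, decode the JSON, and pass it to `dois_from_response`.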

As Robert Kiley, former head of Open Research at the Wellcome Trust, which piloted the GLS in 2018, put it: “…if every funder were to adopt such a system and expose their grant metadata in a consistent, machine-readable way, it would facilitate the development of applications to help funders get a greatly enhanced picture of the global funding landscape, which in turn would inform strategic planning and resource allocation.”

NWO’s implementation journey

NWO joined Crossref’s Grant Linking System in 2024. It reflects our broader commitment to open science and aligns with our Persistent Identifier Strategy published in 2021, and our support for the Barcelona Declaration on Open Research Information. Since August 2024, all new grants awarded from July 2024 onward receive a Crossref Grant ID that persistently resolves to the information about the grant on our website, displaying all basic award information including project titles, summaries, grantee names, and affiliations. NWO is one of the 44 funding organizations worldwide that have introduced Crossref Grant IDs for their collective 111 funding programs. Other organizations include the European Commission, OSTI-DOE, the Wellcome Trust, Moore Foundation, Fonds de Recherche du Québec, CSIRO, Japan Science and Technology Agency, and the Austrian Science Fund.

Although it took time, implementation at NWO generally proceeded smoothly. Over the course of a year, we’ve registered over 1,500 grant records without experiencing difficulties or complaints from researchers. On the contrary, after we announced the introduction of Grant IDs, some researchers expressed disappointment with our decision, taken for practical reasons, to register DOIs only for new grants instead of the entire historical record. This shows that researchers understand the importance of persistent identifiers. Already, a year after its introduction, we are seeing the first NWO Grant IDs appearing in publications, showing that researchers are taking the extra step to look up their Crossref Grant ID and include it in their articles, as we are asking them to do. However, publishers don’t always manage to handle these identifiers in the way we expect them to.

Linking grants to publications in real life

One of the first publications to include an NWO Grant ID is a paper by Weile et al., published by the American Physical Society (APS) in the journal Physical Review X. On the left, we see the funding information provided by the authors, as included in the acknowledgement section of the published article. Funding by NWO from its Talent Scheme VIDI is identified with a Grant ID https://doi.org/10.61686/YDRHT18202.

On the right, we see how APS has included this information in the metadata of the publication: NWO is identified with its Funder ID and the grant with the Grant ID, forging an unambiguous link between funding and publication, initially between this particular grant and this particular publication, but potentially in the future between this grant and all other outputs arising from it. This works only so long as all publishers include this information in the metadata of their publications; we need to encourage more publishers and other Crossref members (e.g., preprint services, repositories, blog platforms) to follow the APS example.

Where publishers fall short

There are big differences among publishers in their ability to include funding metadata. Many have been including funder IDs in the metadata for more than a decade, but some are still struggling to do that. Most have yet to start including Crossref Grant IDs as well.

Let’s demonstrate that in an example. On the left, we see the acknowledgements section of a paper by Van Zundert et al. in the journal Small, published by Wiley. The authors acknowledge a host of funding organizations and grants, including NWO with Grant ID https://doi.org/10.61686/LVZRW92421. On the right, we see that the publisher has correctly included NWO in the metadata as the funder with our Funder ID, but there’s no reference to our Grant ID, instead mentioning an award number, which seems to refer to a Marie Sklodowska-Curie grant for the same research with their internal award identifier.

Likewise, a publication by Criscuolo et al. in Physics of Life Reviews (a journal published by Elsevier) correctly identified NWO using our Funder ID, but omitted our Grant ID in the metadata, despite its clear inclusion by the author in the acknowledgements (left). Apparently, this persistent link and its open metadata are being dropped from the infrastructure at the crucial moment when the article record could be connecting up with the grant record, making it easy and open for us all to track and report on the connection.
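The kind of gap shown in these two examples could, in principle, be detected automatically. The sketch below is only an illustration: it assumes the work metadata follows the Crossref REST API shape, with a `funder` list whose entries carry an `award` list, and that a Grant ID would appear there either as a bare DOI or as a `https://doi.org/` URL. The regular expression is a loose, unofficial DOI pattern, and the function names are made up for the example.

```python
import re

# Loose DOI pattern for illustration only; not an official DOI grammar.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def grant_ids_in_text(text: str, grant_prefix: str) -> set[str]:
    """Find DOIs under a funder's grant prefix in free text."""
    found = set()
    for match in DOI_RE.findall(text):
        # Drop trailing punctuation picked up from the sentence, then lowercase.
        doi = match.rstrip(".,;)").lower()
        if doi.startswith(grant_prefix + "/"):
            found.add(doi)
    return found


def grant_ids_missing_from_metadata(ack_text: str, work: dict,
                                    grant_prefix: str) -> set[str]:
    """Grant IDs the authors acknowledged that the metadata does not carry."""
    deposited = set()
    for funder in work.get("funder", []):
        for award in funder.get("award", []):
            # Assumption: Grant IDs are deposited as award values.
            deposited.add(award.lower().removeprefix("https://doi.org/"))
    return grant_ids_in_text(ack_text, grant_prefix) - deposited
```

For the NWO examples above, the prefix would be `10.61686`; a non-empty result signals that an acknowledged Grant ID did not make it into the deposited funding metadata.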

Several publishers do not seem to register funding data at all, despite the opportunity existing for almost 15 years, and sometimes even when comprehensive funding information is provided by authors.

The broader implications

It has been known for some time that publishers struggle with registering complete, high-quality funding metadata for their publications. They sometimes blame authors for not providing the required information or making errors in reporting their funding. Or they call on funders to identify their funding more precisely by introducing persistent Grant IDs for their grants. While these are legitimate issues, and it’s true that more funders could also do this, the examples presented here suggest this narrative is incomplete—when authors provide clear, standardized funding information using persistent identifiers, many publishers still fail to capture it accurately.

The Grant Linking System is still relatively new in terms of open infrastructure and open metadata development, and adoption from funders is still in the tens rather than the tens of thousands, with publishers being more accustomed to creating and providing millions of open metadata records for their publications. Most participating funders, like us, have only started registering grants in the past couple of years. Now that Crossref Grant IDs are becoming more widespread, and with publishers’ experience in creating open metadata, we would love to see publishers prioritise collecting and including Grant IDs in their Crossref metadata. By updating their production practices, they would be supporting the community at large in reaping the benefits of open grant metadata.

To address these challenges, we are organizing a roundtable session under the Barcelona Declaration in October to discuss concrete solutions for these issues. We invite publishers who are interested in participating to contact us. This follows a 2023 workshop where many publishers were very open in discussing the challenges and working towards improving the process together with funders.

Looking ahead

The introduction of Crossref Grant IDs represents just the first step in a longer journey toward more open research information for NWO. We are happy to see how quickly researchers are adopting the system by including Crossref Grant IDs in their work. For Grant IDs to truly become a Grant Linking System and fulfil their promise, however, publishers must act on the need to collect and process funding information in their publishing workflows, just as they do for other joint efforts, such as ORCID iDs for contributors. The information is there: authors are providing it in the acknowledgement sections of their articles (and would probably do so too if asked directly in a submission form). The question now is: can we encourage more publishers to take up the request to capture and transmit this information accurately and register it with Crossref?

We’re hopeful. This first year has demonstrated the enormous potential of Crossref Grant IDs in action for NWO. We call on publishers to do their bit in ensuring this vital infrastructure reaches its full potential for the research community.

An eLife filled with possibility thanks to great metadata

Frederick Atherden — Thu, 28 Aug 2025 00:00:00 +0000

eLife recently won a Crossref Metadata Award for the completeness of its metadata, showing itself as the clear leader among our medium-sized members. In this post, the eLife team answers our questions about how and why they produce such high-quality open metadata. For eLife, the work of creating and sharing excellent metadata aligns with their mission to foster open science and supports their preprint-centred publication model, but it also lays the groundwork for all kinds of exciting potential uses.

Having complete and rich metadata puts you in the best position to fulfil future, as-yet-undetermined requirements.

– Fred Atherden, eLife

What motivates your organisation/team to work towards high-quality metadata? What objectives does it support for your organisation?

eLife is a mission-driven organisation tasked by its founders to help scientists accelerate discovery and encourage responsible behaviours in science. As such, we’re passionate about open science and metadata, and we’re vocal advocates of the benefits these provide to academic communities and beyond.

Given Crossref’s position as a hub at the centre of scholarly communication, providing Crossref with complete metadata furthers our mission. It facilitates the discovery and reuse of research and enables linkage to key but often overlooked outputs such as datasets and software. As signatories of DORA and supporters of the Barcelona Declaration, we are keenly aware of the wider context: these efforts enable research assessment and policy decisions to be derived from open and transparent information, moving beyond closed systems that have proliferated the damaging use of anachronistic metrics.

Do you have a strategy for complete metadata? Which elements did you prioritise? What workflows, tools, or collaborations helped you get there?

There are plenty of existing guidelines that provide a great skeleton to follow. For example, we follow FAIR data and FORCE11 software citation principles, which ensure the capture of metadata for supporting datasets and software packages. There’s not any one particular element that we’ve prioritised, although we’re keen to ensure we follow best practices while also exploring the bleeding edge.

We’ve collaborated with and relied on the advice of many organisations over the years, including (but not limited to) Crossref, Research Organization Registry (ROR), JATS4R, FORCE11, Software Heritage, openRxiv, and our production vendors Exeter Premedia.

We’ve developed our own open source Crossref metadata generation library. Keeping this process in-house has proven really fruitful. It allows us to quickly and continuously improve upon the metadata we provide.

And we have a data team that has created a centralised data hub, serving as a really useful authoritative resource that can be queried, instead of always making use of disparate systems.

How have you integrated these into your metadata processes?

At submission, we collect ROR IDs for (a subset of) affiliations, and structured data for funding, datasets, and other information. Our publication model is centred around preprints, so it’s necessary to capture related information such as the preprint DOI, preprint posted date, the version that pertains to each specific revision (and so on). Without this information, we could not post public reviews to the correct preprint version on the preprint server, or indeed ensure the article we publish is the correct iteration of that work.

The systems that enable the publication of eLife Reviewed preprints are dependent on DocMaps, a framework for a machine-readable representation of the processes involved in the creation of a document. These are provided by our Data Hub and enable us to capture structured information about the peer review process and accompanying metadata for each article.

Our proofing system for journal articles only permits login via ORCID authentication, and we don’t capture unauthenticated ORCID IDs that have been copied or keyed (see ‘What’s So Special About Signing In?’). It also makes use of both the Crossref API and the PubMed Central API to ensure we have persistent identifiers where possible for references. We have an in-house content validator, which uses ROR’s API to ensure we have ROR IDs for affiliations and funders where possible. We use Software Heritage to archive author-generated code, and include their persistent ID (SWHID) in software references.

All our published content is captured as JATS XML (the industry standard format for journal articles), which our metadata generation library uses as its input.

What impact of good metadata can you see for your organisation? Is it supporting the business and/or editorial side of your work?

Persistent identifiers are very useful for reporting. Creating a report that, for example, includes publication volumes from a particular institution is trivial when content is enriched with persistent identifiers. It’s more complex when all you have are messy author-supplied strings of text. They’re also useful for content validation. For example, when we have a persistent ID and a method to retrieve the related metadata, we can confirm that the information we’ve been provided is complete and correct.
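As a small illustration of why such a report becomes trivial, the sketch below counts articles per institution by ROR ID rather than by affiliation string. The article structure (an `affiliations` list with optional `ror` keys) and the function name are assumptions for the example, not eLife's data model.

```python
from collections import Counter


def publications_per_institution(articles: list[dict]) -> Counter:
    """Count articles per ROR ID, ignoring how the affiliation was written."""
    counts = Counter()
    for article in articles:
        # A set ensures an institution is counted once per article,
        # even when several authors share it under different spellings.
        ror_ids = {
            aff["ror"] for aff in article.get("affiliations", []) if aff.get("ror")
        }
        counts.update(ror_ids)
    return counts
```

Affiliations without a ROR ID are simply skipped here; with text strings alone, the same report would first need the variant spellings to be reconciled.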

There are, of course, many other benefits, some of which are “unknown unknowns.” Having complete and rich metadata puts you in the best position to fulfil future, as-yet-undetermined requirements.

Have you encountered any challenges in curating or improving your metadata? If so, what were they, and how did you address those?

In 2024, we started introducing persistent grant IDs for our content. While we updated our submission system to collect these from authors, it’s apparent that many authors aren’t aware when/if these have been registered by funders, and they still provide us with the (internal) grant numbers instead.

Our workaround was to pull grant data from Crossref and then replace the grant numbers with the persistent IDs when we’re confident of a match. Since the grant number registered at Crossref might not exactly match the grant number the authors have given us, potential matches are confirmed by a team member or our production vendors. Since many organisations do a great job of creating informative landing pages (for example, EuropePMC for Wellcome funding), this is feasible, but we’re investigating ways we can make this less manual while remaining careful that we don’t introduce false positives.
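A workaround of this kind might look roughly like the sketch below. It is not eLife's code: the normalisation rule, the `registered` mapping from funder-registered grant numbers to grant DOIs, and the "exact"/"partial" labels are all assumptions chosen to illustrate the idea that only candidates are produced and a person still confirms them.

```python
import re


def normalise(grant_number: str) -> str:
    """Lowercase and strip everything except letters and digits."""
    return re.sub(r"[^0-9a-z]", "", grant_number.lower())


def candidate_matches(author_number: str,
                      registered: dict[str, str]) -> list[tuple[str, str]]:
    """Return (grant DOI, match kind) candidates for human confirmation.

    `registered` maps a funder-registered grant number to its grant DOI
    (hypothetical structure for this sketch).
    """
    key = normalise(author_number)
    if not key:
        return []
    candidates = []
    for number, doi in registered.items():
        reg = normalise(number)
        if not reg:
            continue
        if reg == key:
            candidates.append((doi, "exact"))
        elif key in reg or reg in key:
            # Partial overlaps are weaker evidence and need closer review.
            candidates.append((doi, "partial"))
    return candidates
```

Keeping the match kind alongside each candidate lets reviewers prioritise exact matches and treat partial ones with more caution, which is one way to limit false positives.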

Have your efforts around metadata led to real benefits for your community? Is this something your editors, authors, or readers are aware of and appreciate? If so, why?

Yes, I think this is something that is becoming increasingly visible. Authors are very mindful of the benefits that good metadata can bring for discoverability and promotion. And much is lost without the increased interoperability it brings, both for publishers themselves but also the wider ecosystem. For example, we’ve had some great feedback from numerous organisations that appreciate that the outputs we publish directly link to the preprints they are based on.

In recent years, there’s been an increased focus on research integrity, and this is likely to remain the case. Metadata has an obvious and key role in providing trust and transparency, whether that’s through the presence of trust markers like ORCID IDs or through the inclusion of complete post-publication metadata such as correction, retraction, or withdrawal information.

Looking ahead, how are you planning to build on your metadata quality? Are there new elements or practices you’re exploring? And what advice would you give to others just starting to strengthen their metadata?

Several years ago, we introduced a “publish, review, curate” model of publishing, where we publish ‘Reviewed preprints’ following each stage of review. We don’t collect the same level of structured information from authors at submission for these as we do for Versions of Record. This presents a challenge for retrieving and disseminating complete metadata for Reviewed preprints. We aim to start moving this forward so that comprehensive metadata is available at earlier stages of the publication process. For example, we recently started depositing (some) funding metadata for these.

We’re also keen to explore the ways in which we can make our eLife Assessments more discoverable. Our Editors use a common vocabulary to describe the significance of the findings and strength of evidence in a paper. Other publishers moving beyond accept/reject publication models use different rubrics and taxonomies, so having one restrictive field in a schema for the entire corpus of research won’t cut it. Nevertheless, making these terms more discoverable and interoperable would be preferable.

We’ve found the integration of public APIs/data within systems (such as ROR’s, Crossref’s, PubMed’s, and OpenAlex’s) to be really helpful in validating the correctness and completeness of content/metadata. The effort in adding these integrations will pay dividends in the future.

Time to enjoy Fred’s acceptance video.


Improving visibility through metadata: a view from Editorial CSIC

Nacho Pérez Alcalde — Thu, 14 Aug 2025 00:00:00 +0000


We spoke with Nacho Pérez Alcalde, Technical Deputy Director of Editorial CSIC, the publisher behind ‘Boletín Geológico y Minero’, winner of the Crossref Metadata Award in the Metadata Enrichment category. A Crossref member since 2008, Editorial CSIC publishes 41 Diamond open access journals and plays an essential role in disseminating scientific knowledge internationally. We explore what this award has meant for Editorial CSIC and what plans they have to keep improving the quality and use of their metadata.

‘Boletín Geológico y Minero’ received the first Crossref award for metadata enrichment because, in just two years, its metadata coverage rose from 1% to 40%. What motivated such a large improvement in this journal’s metadata?

Editorial CSIC publishes 41 scientific journals, all of which are included in the major indexing services. These are prestigious journals that have offered high-quality, peer-reviewed content for many years. Today, however, it is no longer enough for a scientific journal to offer quality content; it is also necessary to offer high-quality metadata for those publications. Something that not many years ago we saw as a value-added service has become indispensable.

In an electronic, internet-based working environment, metadata is key to disseminating content and to identifying journals, authors, publishing institutions, funding bodies… For a publisher it is fundamental to be able to transmit that information following technical procedures and standardised protocols that guarantee its compatibility with the machines that harvest, store, and distribute data, which in turn favours the visibility and discoverability of our journals.

Do you follow a strategy? How do you decide which elements to prioritise?

We have been working with metadata for years and we periodically review and expand the number of elements we convert into metadata. We always give priority to what is already a clearly identified standard (for example, ORCID) and also to metadata aligned with the editorial policies we consider a priority (for example, the CC BY licence we apply).

The workflow starts with the editor identifying the data to be collected and how it will be requested. Once all of this is integrated into the journal’s submission policy, the collaboration of authors is essential, as they are the ones who provide the data, which is later reviewed by a technical editor specialising in metadata (a different person from the copy editor). Finally, it is essential to have a tool that automates the transfer of metadata, and here having specialised technical staff is very important. We work with the OJS platform; I spent years depositing metadata with Crossref using the XML files we generated, one by one. With an average of 1,000 articles published per year, the creation of the OJS Crossref XML export plugin for automated deposit from the platform was a great help to us, because it lightens the workload considerably, ensures greater reliability, and lets us spend our time improving other things.

It also gives us greater flexibility when revising our data policies; for example, it allowed us to carry out a bulk deposit to update all our references and correct recurring errors.

How have you integrated this into your metadata strategy?

The Crossref Metadata Enrichment Award was given specifically to the journal Boletín Geológico y Minero for the great improvement in its metadata in recent years. This journal used to be published by another institution, and when Editorial CSIC took it over we applied the same standards we have been using across the rest of our journals for years. We are therefore especially proud, because we see this award as an endorsement of a metadata policy that we have been developing for years and that has enabled a significant improvement for this journal in a relatively short time.

The collaboration of the journal’s scientific editors was key to this: we first explain which data should be requested from authors, why, and for what purpose, and we then confirm that it has been incorporated into the articles and implement it in the OJS platform, before depositing it with Crossref and also feeding it into other metadata distribution channels.

In terms of impact, how do you see good metadata coverage affecting your organisation? Does it benefit your editorial work in any way, or any other aspect of your activity?

Beyond its obvious benefits, such as boosting the visibility of our publications and helping us manage controlled, high-quality information, metadata should ultimately help us position ourselves as a professional group. Our essential role is to publish peer-reviewed, high-quality scientific content and convey it to the scientific community and, increasingly, to society as a whole. Today, however, we should also aspire to be recognised as data providers. And that, in “the age of data”, is saying a lot. We must be able to extract the metadata that authors supply in our publications (keywords, affiliations, bibliographies…), but we must also be able to generate other metadata ourselves and to transmit and disseminate it.

Scientific journals must continue to have an editor who handles copyediting and proofreading, but they must also have a metadata editor: someone who knows what FundRef is and knows where and how to enter data in the platform to ensure it is preserved and transferred correctly and efficiently.

I would therefore like to take this opportunity to champion the role of the publisher as a generator and provider of data. We publishers are the source of the data; there are agents such as libraries and indexers that harvest, archive, transmit, and process it to, for example, generate new content or services, but only we have the capacity to generate it.

Have you encountered difficulties in improving and managing your metadata?

Authors sometimes complain that we ask them for a lot of data. For example, the use of ORCID is mandatory in our publications, and many authors, especially from outside Europe, have complained because they do not know what it is or what it is for or, for personal reasons, do not want to register that personal identifier. Those are respectable reasons, of course, but for us the need to identify each author correctly takes priority, and we believe ORCID helps with that.

Another common problem is that many authors, when citing a funding source, use the name of the funding body but sometimes do not give it in full, or omit the acronym or, worse, give the name but not the code of the institution or project. Authors are used to writing with “human” readers in mind, not the machines that will later process all that information. Our role, as metadata editors, is to explain to them, in an instructive way, why it is important to provide those codes, and to ask for them if we see they have not been included in the manuscript.

And what about your community: has it benefited from your efforts to have complete, high-quality metadata? Are authors, editors, or readers aware of these efforts, and do they value them?

For the technical editor it is easier to appreciate: we know how the environment works, how important platform interoperability is, how quickly and widely a piece of data can spread, and how important it is for it to be correct from the outset, because afterwards it can be very, very difficult to correct and control. We are also aware of its potential impact, because we know how information systems feed one another and share information, information that we generate.

Scientific editors, authors, and readers tend to value it less and are not always aware of its relevance, although one cannot generalise. In fact, although I think everyone should have at least a basic notion of how it works, I think authors are already quite overloaded with all the requirements we set for submitting their manuscripts, without us also asking them for specific training in metadata. That is (among other things) what we editors are for: to tell them which data to provide and how.

Nevertheless, nowadays everyone is familiar with what data is and what can be done with it; we all consume a wide variety of products over the internet and have at least some notion of what metadata, personal data, algorithms… are. Years ago it was much harder to explain this, but today anyone understands it easily, all the more so in a scientific and technological field like that of our publications.

Looking to the future, do you have any plans to keep building on what you have already created? Are there elements you want to continue implementing or practices you want to incorporate into your way of working?

At Editorial CSIC, ever since we began publishing in electronic format and distributing our electronic journals online, almost 20 years ago now, we have always tried to innovate in designs, management platforms, file formats… In concrete terms, we have extended the mandatory use of ORCID and DOIs to contributions that are not strictly scientific articles (until now our reviews, obituaries, and similar texts did not have them), and we are considering implementing ROR identifiers for research organisations.

What advice would you give to organisations that are starting to improve the quality of their metadata?

To publishers who are starting to strengthen their metadata, I would venture to suggest something that seems logical and simple but that I think is not always done: plan, calmly and in detail, an editorial data policy based on identifying and selecting the data you consider a priority; then implement protocols for requesting that data from authors and integrating it into your editorial platforms; and, finally, configure those platforms correctly to ensure the data is exported properly.

Metadata depends on a chain in which various people with different profiles work. You need resources to secure that chain, and you need to bear in mind that it is not enough to ask authors for the data: you have to follow the data’s journey from its origin as far as you can, and that does not end when we deposit it with Crossref. We can additionally deposit it elsewhere, we can give it other outlets and, moreover, we must return to it if we detect a systematic error that we can correct.

The Metadata Excellence Awards were presented in May 2025, in the context of Crossref’s annual meeting with its community. Here is the award acceptance video from the journal Boletín Geológico y Minero, published by Editorial CSIC.

And now enjoy this acceptance video.

Version in English

Improving visibility through metadata: a look from CSIC Editorial

We spoke with Nacho Pérez Alcalde, Technical Deputy Director of Editorial CSIC, the publisher behind ‘Boletín Geológico y Minero’, recipient of the Crossref Metadata Award in the Metadata Enrichment category. A Crossref member since 2008, Editorial CSIC publishes 41 Diamond Open Access journals and plays a key role in scholarly communication at the international level. We explore what this award has meant for Editorial CSIC and what plans they have for the future to continue improving the quality and use of their metadata.

What motivates your team to work towards high-quality metadata? What objectives does it support for your organisation?

Editorial CSIC publishes 41 scientific journals, all of which are included in major indexing databases. These are prestigious journals that have offered high-quality, peer-reviewed content for many years. However, today, it is no longer enough for a scientific journal to provide quality content alone; it is now also essential to deliver high-quality metadata associated with those publications. What just a few years ago was considered a value-added service has now become indispensable.

In an electronic and internet-based working environment, metadata is key to content dissemination and to the identification of journals, authors, publishing institutions, and funding organizations. For a publisher, it is crucial to be able to transmit this information through technical procedures and standardised protocols to ensure compatibility with the systems that harvest, store, and distribute data, enhancing the visibility and discoverability of our journals.

Do you have a strategy for complete metadata?

We’ve been working with metadata for years and, periodically, we review and expand the number of elements we convert into metadata. We always prioritise what is already a clearly established standard (for example, ORCID), as well as metadata aligned with editorial policies we consider a priority (such as the CC BY license we apply).

The workflow begins with the editor identifying the data to be collected and how it will be requested. Once this is integrated into the journal’s submission guidelines, the collaboration of authors becomes essential, as they are the ones who provide the data. In a later phase, the data is reviewed by a technical editor specialising in metadata (different from the content reviewer). Finally, it’s crucial to have a tool that enables the automated transfer of metadata, and here, having specialised technical staff is very important.

We work with the OJS platform; I spent years depositing metadata in Crossref using XML files that we generated manually, one by one. With an average of 1,000 articles published per year, the creation of the Crossref XML export module in OJS for automated deposit from the platform was a huge help for us – it significantly lightened the workload, ensured greater reliability, and allowed us to focus our time on improving other aspects.

It also gives us more flexibility when reviewing our data policies. For example, it allowed us to carry out a bulk deposit to update all our references in order to correct a recurring error.

How have you integrated these into your metadata processes?

The Crossref Metadata Enrichment Award was specifically granted to the journal Boletín Geológico y Minero for having shown significant improvement in its metadata in recent years. This journal was previously published by another institution, and when Editorial CSIC took over, we applied the same standards we have been using for our other journals for many years. We are especially proud of this because we see the award as recognition of a metadata policy we’ve been developing over the years, one that has led to significant improvements for this journal in a relatively short time.

The collaboration of the journal’s scientific leadership was key to achieving this. We first explained which data should be requested from authors, why, and for what purpose. Then we ensured that the data was being properly integrated into the articles and implemented it within the OJS platform. From there, we proceeded with depositing the metadata in Crossref and also integrating it into other metadata dissemination channels.

What impact of good metadata can you see for your organisation? Is it supporting the business and/or editorial side of your work?

Beyond the obvious benefits of good metadata, such as increasing the visibility of our publications and contributing to the management of controlled, high-quality information, it should ultimately help us position ourselves as a professional group. Our essential role is to publish peer-reviewed, high-quality scientific content and deliver it to the scientific community and, increasingly, to society at large. However, today, we should also aim to be recognised as data providers. And that, in the “age of data,” is a significant shift. We must be able to extract the metadata that authors supply in our publications (keywords, affiliations, bibliographies…). We also need to generate other metadata ourselves, and transmit and disseminate it effectively. Scientific journals must still have editors who perform copy editing and proofreading, but they must also have metadata editors, people who understand what FundRef is, and know where and how to input data into the platform to ensure it is preserved and transferred correctly and efficiently.

That’s why I want to take this opportunity to highlight the role of the editor as a generator and provider of data. Editors are the source of data. There are other actors, like libraries and indexers, who harvest, archive, transmit, and process that data to, for example, create new content or services. But only we have the capacity to generate it.

Have you encountered any challenges in curating or improving your metadata?

Sometimes authors complain about being asked for too much information. For example, the use of ORCID is mandatory in our publications, and many authors, especially those from non-European regions, have complained because they don’t know what it is or what it’s for, or – for personal reasons – they don’t want to register for a personal identifier. These reasons are, of course, valid and understandable, but for us, the priority is to correctly identify each author, and we believe ORCID helps achieve that. Another common issue is that when authors cite a funding source, they often include the name of the funding body, but sometimes don’t write it in full, or they omit the acronym, or worse – they include the name but not the institution or project code. Authors are used to writing with “human” readers in mind, not the machines that will later process all that information. Our role, as metadata editors, involves educating them about the importance of providing these codes and requesting them when we see they’ve been left out of the manuscript.

Have your efforts around metadata led to real benefits for your community? Is this something your editors, authors, or readers are aware of and appreciate? If so, why?

For the technical editor, it’s easier to assess the value of metadata. We understand how the ecosystem works, how important platform interoperability is, how quickly and widely data can be transmitted, and how crucial it is for data to be correct from the very beginning. Once it’s out there, it can be very, very difficult to correct or control. We’re also aware of its potential impact because we know how information systems feed off each other and share information – information that we generate.

Scientific editors, authors, and readers tend to value it less and aren’t always aware of its importance, though of course there are exceptions. While I believe everyone should at least have a basic understanding of how it works, I also think authors are already overwhelmed with all the requirements we ask of them when submitting manuscripts. Editors are here to guide them on what data to provide and how to provide it.

That said, today, everyone is at least somewhat familiar with what data is and what can be done with it. We all consume a wide variety of digital content online and have at least a basic idea of what metadata, personal data, and algorithms are. A few years ago, explaining all this was much more difficult, but nowadays, it’s much easier for people to grasp, especially within the scientific and technological environment in which we publish.

Looking ahead, how are you planning to build on your metadata quality? Are there new elements or practices you’re exploring? And what advice would you give to others just starting to strengthen their metadata?

At Editorial CSIC, ever since we began publishing in electronic format and distributing our journals online, almost 20 years ago, we have consistently sought to innovate in design, management platforms, and file formats. Speaking of specific actions, we have extended the mandatory use of ORCID and DOI to contributions that are not strictly scientific articles (until now, our book reviews, obituaries, and similar texts didn’t have them), and we are currently considering the implementation of ROR identifiers for research organizations.

Do you have any advice for organisations that are making an effort to improve the quality of their metadata?

For editors who are just beginning to strengthen their metadata, I would suggest something that seems logical and simple, but is not always put into practice: take the time to calmly and thoroughly plan a data policy. This should be based on identifying and selecting which data elements are most important, then implementing protocols to request them from authors and integrate them into editorial platforms, and finally, configuring those platforms correctly to ensure proper export.

Metadata involves a chain of tasks carried out by people with different profiles. You need to have resources to strengthen that chain. It’s good to remember that it’s not enough to simply ask authors for data – you have to follow the data along its entire path from the source as far as possible. That journey doesn’t end when we deposit it in Crossref: we can also deposit it in other repositories, find additional ways to disseminate it, and we must revisit it if we detect any recurring errors that can be corrected.

And now enjoy this acceptance video.

We’ve migrated to the cloud; we hope you didn’t notice (but maybe you did)

Sara Bowman — Tue, 12 Aug 2025 00:00:00 +0000

TLDR: We’ve successfully moved the main Crossref systems to the cloud! We still have more to do: several bugs have been identified and fixed, and a few are still being worked on. However, it’s a step in the right direction and a significant milestone. Although it is a much larger financial investment, it addresses several risks and limitations and shores up the Crossref infrastructure for the future.

Some background

We have been doing a lot of thinking, planning, and working on paying down our technical debt and modernising our systems. It’s not fun and flashy work, but it is vital for sustaining our infrastructure, meeting the demand on existing services, and developing new services.

Just about a year ago, we completed phase one, migrating our main database from Oracle to PostgreSQL, an open-source database. This move brought us more in line with our commitment to the POSI principles, reduced our dependencies on costly private licenses, and opened up the possibility to use and offer additional and more contemporary features. With the transition to PostgreSQL we made upgrades to the operating system, the database software, and the underlying hardware, resulting in significant improvements to the overall throughput and capacity of the deposit system. Previously, we typically maintained a queue of more than 10,000 deposits waiting to be processed; now, the queue holds fewer than 100 deposits on average. Consequently, the average latency – the elapsed time from submission to deposit – has reduced from hours to seconds.

During phase one, a total of 35 new servers were created, and for the first time, the entire system configuration was defined through infrastructure-as-code, enabling the infrastructure to be recreated as necessary. This effort not only enabled the migration but also established a solid foundation for our cloud migration strategy, as the code was leveraged to configure our infrastructure on AWS. Additionally, it serves as a critical component of our disaster recovery planning.

Most importantly, phase one set us up for phase two and our next migration: moving the system into the cloud.

Why we moved to the cloud

We had been running most of our services in a physical data centre near Boston, MA, USA (there were a few exceptions: the REST API and our test system (test.crossref.org) were already in the cloud, as was the Crossref website). We’d been planning to move to the cloud for, ahem, quite some time, but as always, competing priorities and limited resources thwarted us, and the data centre was mainly serving us well.

But… with staff across 12 countries, and increased global use of our system, operating our own hardware in a physical data centre was becoming increasingly challenging and risky, not to mention, frustrating.

Moving to the cloud has solved several pain points for us:

Physical access to the data centre was required for various tasks (e.g., hardware upgrades, troubleshooting, general maintenance), but as Crossref grew as an organisation and became more distributed, we had fewer staff in the area. Hosting services in the cloud means staff around the world can access our servers remotely from anywhere (and we can leave the hardware upgrades to our vendor).
Scalability in the data centre required installing new hardware or upgrading connections, which also meant a good amount of time. In the cloud, we can scale up almost instantly.
We can maintain copies of our databases and services in distributed places, providing insurance against natural or other disasters. Upgrades now don’t involve buying physical hardware and installing it; it’s a much quicker and more straightforward process.

Moving from a physical data centre to the cloud also has some trade-offs. For instance, the cost will be approximately five times higher than running the system in the data centre; initial data suggest the annual cost could reach 2,000,000 USD. We aim to optimise and control this cost going forward.

What we did

The size of the undertaking was partly due to leaving it so long; technical debt has accumulated over many years of running the system in the data centre.

The whole plan was hugely detailed, but we can distil it to a few bullets:

We conducted an analysis of components, considered risks and sequencing, and created a test plan and timeline, including comms.
While most of the drive and work was on the shoulders of two infrastructure services colleagues, our software engineers were heavily involved too, and we had weekly check-ins with a cross-team group to review progress, reassess risks, and adjust timelines as we got closer to the migration date (or decided to move it once or twice).
We first created the deposit system in the cloud.
We then created other parts of our services that aren’t in the deposit system code base, but run alongside it, such as reports, querying, and other tools. We replicated our databases (of which there are several, in a few different flavours - PostgreSQL, MySQL).
We gave 14 days’ notice to our members, via email, and kept this maintenance notice up to date.
We commenced the migration on 8th July, which involved taking the whole system down and rejecting deposits for up to 24 hours.
Along the way, we scripted the creation of CS and the other services using Terraform and Ansible, so that going forward, bringing up a whole new instance of CS (should we need to) won’t be a manual process.
We moved the DNS to point at our new system in the cloud, rather than the data centre. We brought the system back up on 9th July, after 14 hours of downtime, and watched the first few deposits come in, while testing thoroughly.
Alongside the technical team, the membership and support team was at the ready to work through the testing in the new live production environment.

The message we sent to members, Metadata Plus subscribers, and key integrators like PKP and Turnitin, listed which services would be down and described what changes they might see, such as:

The system timezone shifted from EST to UTC (Coordinated Universal Time), which would be noticeable in the timestamps reported back to members after metadata deposits.
Our IP address became dynamic and is no longer static. If members had hardcoded our previous static IP address to connect to our services, that would no longer work.
We previously allowed connections using the HTTP/1.0 protocol, but now require HTTP/1.1. Likewise, we previously allowed TLS version 1.1, but now require at least version 1.2. Older ciphers will not work. A list of accepted ciphers can be found on this page for “ELBSecurityPolicy-TLS13-1-2-2021-06”.
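
For integrators checking their own clients against the changes listed above, here is a minimal Python sketch; it is not Crossref's code. It assumes the old timestamps were literal EST (UTC−5). If they actually followed daylight saving, the summer offset would be UTC−4. The function names and the sample timestamp are hypothetical.

```python
import ssl
from datetime import datetime, timedelta, timezone

# Assumption: "EST" in the notice means a fixed UTC-5 offset.
EST = timezone(timedelta(hours=-5), "EST")


def est_to_utc(naive_local, source_tz=EST):
    """Interpret a naive pre-migration timestamp as EST and convert it to UTC."""
    return naive_local.replace(tzinfo=source_tz).astimezone(timezone.utc)


def make_tls_context():
    """Build a client TLS context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


# Hypothetical timestamp taken from an old deposit report.
print(est_to_utc(datetime(2025, 1, 15, 9, 30)).isoformat())
print(make_tls_context().minimum_version.name)
```

A context like this can be passed to `http.client.HTTPSConnection(host, context=...)`, which speaks HTTP/1.1 by default. Connecting by hostname rather than by a stored IP address also avoids the dynamic-IP issue.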

How it went and what’s next

We still have more to do, with both expected and unexpected issues arising from the migration. There are a couple of functions that still route through the data centre, configuration changes to wrangle, and processes to iron out, so we’ll be keeping the data centre open for another couple of months.

Those were the known issues…

…we also uncovered a few bugs along the way, and we’ve been reporting those (and our progress toward fixing them) on our status page. See history.

A few diligent members also alerted us to problems they were having. In some cases, we could tell why, and in many cases, their systems needed to be upgraded to work with ours. Thanks go to mEDRA, Spandidos Publications, and Stichting SciPost who helped us identify gaps that resulted in configuration improvements and lessons learned (that we then shared with other members).

There were three issues that we were contacted about more than others:

Delayed delivery of notification emails, which is partly due to the volume of backlogged notification emails in the system.
- Mostly solved: We have repaired delivery of notification emails for all metadata deposits and are working on a fix for the delivery of messages associated with very large queries.
A small percentage of registered records not being indexed in the REST API - this can cause downstream issues for a number of other services (e.g., Crossref metadata search - search.crossref.org, Participation Reports, ORCID auto-update, and for external services that make use of the metadata from our REST API).
- Mostly solved: All records in July are now indexed in the REST API, although we have new reports of a few records missing in the last week, which we are actively investigating.
Delayed delivery of July’s resolution reports.
- Solved - not only has July’s resolution report run completed, but we also completed August’s ahead of schedule.
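
For members who want to confirm that a registered record has reached the REST API (the second issue above), a rough check can be sketched in Python. This is an illustration under stated assumptions, not an official tool. It uses the public `https://api.crossref.org/works/{doi}` route and treats an HTTP 404 as "not indexed yet". The DOI and the `fake_opener` stand-in are hypothetical and keep the demonstration offline.

```python
import io
import json
import urllib.error
import urllib.parse
import urllib.request

API_ROOT = "https://api.crossref.org/works/"


def is_indexed(doi, opener=urllib.request.urlopen):
    """Return True if the REST API serves a record for this DOI.

    Assumption: an HTTP 404 means "not indexed (yet)"; any other HTTP
    error is re-raised so that outages are not mistaken for missing records.
    """
    url = API_ROOT + urllib.parse.quote(doi, safe="/")
    try:
        with opener(url) as response:
            payload = json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise
    return payload.get("status") == "ok"


def fake_opener(url):
    """Offline stand-in for urlopen, used only to demonstrate the call."""
    return io.BytesIO(b'{"status": "ok", "message-type": "work"}')


# Hypothetical DOI; drop the opener argument to query the live API instead.
print(is_indexed("10.5555/12345678", opener=fake_opener))
```

Injecting the opener keeps the function testable without network access. A real check would call `is_indexed(doi)` some time after the deposit has been confirmed.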

This migration was a significant effort, and 2025’s top priority project for the Open and Sustainable Operations (OSO) program team. Overall, we’re happy with our progress toward making Crossref infrastructure more robust, reliable, and future-proof. And judging by the messages of support we received, you are too! Onwards to the next infrastructure project… check out our roadmap to see what’s up next.

References

‘Infrastructure as code’ (2025) Wikipedia, 12 August. Available at: https://en.wikipedia.org/wiki/Infrastructure_as_code (Accessed: 12 August 2025).
‘The programs approach: our experiences during the first quarter of 2025’ (2025) Crossref. Available at: https://doi.org/10.64000/4s2ee-wkr84 (Accessed: 12 August 2025).

From storage closet to metadata champions: ASM's journey toward a smarter scholarly infrastructure

David Haber — Mon, 04 Aug 2025 00:00:00 +0000

The American Society for Microbiology (ASM) has earned recognition in Crossref’s Participation Reports for its exceptional metadata coverage among large publishing members, an achievement built on intentional change, technical investment, and collaborative work. In this Q&A, the ASM team shares what that journey looked like, the challenges they’ve tackled, and how centering metadata has helped them better connect research with the global scientific community.

A key lesson we learned is that meaningful progress doesn’t require perfection from day one. Start small, find manageable wins, refine as you go, and build a shared understanding across all your teams.

– David Haber, ASM

Since we first featured your metadata efforts in 2022, what developments or improvements have you made—and how does this new recognition reflect the journey so far?

Once we completed our initial metadata cleanup of our backfile and made sure that we were producing good, clean, and consistent Crossref metadata (no small feat), we realized that each new policy, process, or even style change should be viewed through a metadata capture lens. By looking at our publishing goals through that lens, we are better able to see the right time and method to help enrich and “grow” both our article metadata breadth and depth. Much of the metadata work is invisible or an afterthought. But the recognition of ASM’s coverage in the participation reports has affirmed that our change in perspective — shifting from viewing Crossref metadata as something produced as an afterthought to centering our processes around the creation of that metadata — has put us on the right path.

Have any of your goals around metadata changed or grown since then? What feels different about your work now compared to when you were first featured?

When we first started on our various metadata cleanup projects, it felt like there were just a few of us, arguing, agreeing, and arguing some more about obscure tagging structures and proper XML modeling in a closet, literally… My office actually was an old storage closet, and my pre-pandemic whiteboard still has that ghostly blue haze of angle brackets scribbled with dry-erase markers.

Since then, our goals have shifted significantly. Early on, we just wanted all our content mapped to DOIs; then we thought, “Oh wait. Let’s include as many abstracts as possible. And references. If we have the data, let’s send it.” Now that we have a strong metadata foundation, we can think proactively about what to capture and transmit, how we want to prioritize our efforts, and how to make research we publish more discoverable to those who need it.

Looking back, were there any changes in internal collaboration or external partnerships that influenced your progress?

Over the past three to four years, we have made some significant changes to our partnerships. We migrated to a new online platform (Atypon), a new production partner (Kriyadocs), a new submission platform (Chronoshub), and a new billing system (RLSC). Each of these partnerships allowed us to evaluate how we were capturing metadata, when that capture occurred, and how best to improve the QC process to ensure accuracy and quality. These partnerships accelerated all our efforts to improve hidden metadata and finally brought them out of the storage closet into the light.

Have you adopted any new tools, standards, or technologies since your last blog?

Our production software (Kriyadocs) has centered metadata capture as a core function. We have processes and procedures that match all affiliations to Ringgold and ROR IDs. We have invested heavily in partnerships with organizations like Chronoshub to utilize natural language processing, automating the identification of authors and affiliations, so that users no longer have to fill out tedious forms. We embraced ORCID and strongly encourage all authors to register for one if they don’t already have it. We have also adopted the CRediT taxonomy as a contributor framework and have built processes to make it easy for authors to stay within that taxonomy.

Have you encountered any challenges in curating or improving your metadata? If so – what were they and how did you address those?

The core problem (from our perspective) has always been the difference between author profile information and what is actually submitted in manuscripts. Auto-extraction of manuscript data into submission forms is one small step toward unifying author identity with manuscript data. One of our biggest pain points now is reconciling the chaotic data on author affiliations in manuscripts with institutional identifiers. Over the next year, this will be one of our main initiatives.

The capture of ORCID IDs has improved our ability to match papers to editors and identify hidden conflicts of interest. ORCID IDs have also helped us expand our reviewer pool, as they enable us to better disambiguate individuals with similar names.

Because we now capture CRediT roles in a controlled manner (rather than as loose text in the acknowledgments section), we are better able to identify when authors are contributing equally and how authors determine author order in the byline when this occurs. This analysis was undertaken by one of our Editors-in-Chief to study gender bias when authors contributed equally to a work. Now that we capture CRediT roles as structured data, we can build on his research.

In the last two years, we have also begun capturing Data Availability Statements and Ethics Statements in unique metadata fields (rather than as unstructured text in the body of an article or in the acknowledgments sections) because some of our editors are curious about open data policy compliance and whether there is higher uptake of open science initiatives in certain microbiology fields.

RC: These are very interesting and quite profound results, especially for integrity and equality in the publishing process! Good to see how useful you find this information as we’re approaching our schema updates to include contributor roles, among other things. I see that editors are already on board and taking advantage of high quality metadata. Are authors more engaged with metadata now than before?

Our authors likely are engaged too, though we have tried to build author metadata QC into our proofing and typesetting process in such a way that they wouldn’t even notice.

What challenges have you encountered while sustaining or scaling your metadata work?

In the realm of metadata, there are two standard solutions: 1) hire vendors to clean data at the end (the throw-people-at-the-problem philosophy); or 2) trust a black-box technical solution. The problem with the first method is that it is inefficient and can become expensive. The issue with the second is that, in my experience, most technical solutions have an 80% success rate. That may be acceptable for certain types of data, but it can fail spectacularly at the worst possible moment.

For example, let’s say you find a technical solution that parses affiliation data in such a way as to assign a PID. Great, wonderful. Let’s say your parser is the best natural language processor in the world and makes matches 90% of the time (if you have one that does this, I’m all ears). You announce that you are including these IDs. Everyone cheers. It is great, right? Now, imagine you want to use those IDs to identify subscribing institutions to offer discounts or fee-less publishing for authors. You also want to use those IDs to send alerts to institutional admins of publishing activity. In both situations, achieving 90% accuracy simply won’t work. What we’ve learned is that black-box technology and ‘throw people at it’ philosophies cannot work alone. Metadata curation must be a collaborative effort among authors, publishers, funders, and institutions, where the information grows throughout the research process.
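
To make the arithmetic behind that concern concrete, here is a small Python sketch. The 90% figure comes from the example above. The corpus size and the number of affiliations per article are hypothetical, and the calculation assumes each match succeeds or fails independently, which real data will not strictly do.

```python
def expected_mismatches(total_affiliations, accuracy):
    """Expected number of affiliations matched to the wrong (or no) identifier."""
    return total_affiliations * (1.0 - accuracy)


def share_fully_correct(affiliations_per_article, accuracy):
    """Probability that every affiliation on one article is matched correctly,
    assuming independent matches."""
    return accuracy ** affiliations_per_article


# Hypothetical corpus: 10,000 affiliations, five per article.
print(round(expected_mismatches(10_000, 0.90)))   # 1000
print(round(share_fully_correct(5, 0.90), 3))     # 0.59
```

Under these assumptions, roughly one affiliation in ten is wrong and only about 59% of five-affiliation articles are entirely correct, which illustrates why discounts or institutional alerts built on those IDs would misfire regularly.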

What’s next? Are you exploring any new metadata elements or areas (e.g., funding data, peer review metadata, preprints)?

Over the next year, we will focus on capturing CRediT identifiers and passing them to Crossref, along with institutional PIDs (ROR, Ringgold, and ISNI). We are also exploring various ways to capture peer reviewer activity and contributions, which will inevitably lead us down new and interesting paths.

Here’s the thing about metadata that I wish I’d known when I started: it’s not a project with a finish line. It’s more like tending a garden that keeps growing in unexpected directions. Every time you think you’ve got it figured out, someone invents a new identifier, or your authors start doing something creative with their affiliations, or a funder changes their requirements, and suddenly you’re back to the drawing board.

But here’s what I’ve also learned from our journey out of that metaphorical (and literal) storage closet: the best metadata work happens when you start thinking of it as infrastructure. Good metadata is like good plumbing; when it’s working, nobody notices it, but when it’s not, everything backs up and gets messy fast.

If you’re just starting this journey, my advice is this: don’t try to boil the ocean (gosh, I still need to remember that one). Pick one thing. Perhaps it could be ORCID IDs or institutional identifiers. Do it really, really well. Then build on that success. And please, for the love of all that is holy, invest in good partnerships. We couldn’t have done any of this without partners who understood that metadata isn’t just data entry; it’s the connective tissue of scholarly communication.

Of course, even with the best partners and aligned teams, there will still be moments when you’ll sit dumbfounded in front of a screen where an author’s affiliation that was listed as “Bloomberg School of Public Health” matched to the identifier linked to the “Escuela Nacional de Sanidad.” On those days, just remember: at least you’re not still working in a storage closet with a haunted whiteboard.

Good metadata is more than just a technical specification, and it’s not just for those XML wonks and nerds. It’s a service to science, and its core mission is to help us understand the world around us.

– David Haber, ASM

ASM’s story is a reminder that building a strong metadata infrastructure isn’t just about meeting technical requirements—it’s about aligning people, tools, and values around the idea that clean, connected, and consistent metadata is foundational to open and discoverable research. Whether you’re starting small or overhauling major systems, their experience shows what’s possible when you treat metadata not as a checkbox, but as a core part of scholarly publishing.

Thank you, David, for taking the time to share your insights. Again, congratulations!

Changing fees to increase equity and reduce complexity

Amanda Bartell — Mon, 28 Jul 2025 00:00:00 +0000

The Crossref Board recently approved three recommendations for changes to our fees: introduction of a new lowest membership fee tier, removal of volume discounts for record registration, and normalisation of registration fees for peer reviews. The changes will be applied from January 2026.

This is the first outcome of the Resourcing Crossref for Future Sustainability (RCFS) program, launched in 2023, as a comprehensive effort to review all aspects of Crossref revenue and how we’re adapting to growth and the diversification of our membership. The program aims to make fees more equitable, simplify our complex fee schedule, and rebalance revenue sources.

Following two rounds of member surveys, feedback gathered from the community in polls, open discussions, and emails, the Membership and Fees (M&F) Committee (made up of 30+ representatives from members, service providers, sponsors, and community partners) discussed evidence and made the first round of recommendations to the Board this month. We’re very thankful for their time spent reviewing data and sharing their experiences to get to this point.

GOAL 1: More equitable fees

Our membership has changed over the years - members now tend to be less well-resourced, more likely to be based in Asia or Latin America, and more likely to be much smaller operations, some of which may not even be organisations but volunteer groups. We are seeing more universities join as members, and fewer members now consider themselves publishers first and foremost. With our mission of creating a complete global research nexus, this growing diversity is excellent news.

While new member growth is steady (2.3k members per year), over half join via a Sponsor (which makes membership more accessible both financially and technically), and close to 300 members have their membership revoked each year due to unpaid invoices, indicating that the current fee may be a barrier to participation for some.

Area of focus: Define a new basis for sizing and tiering members for their capacity to pay

Our annual membership fees are currently tiered according to the publishing revenue or expenses (whichever is higher) of each member. This enables each member to contribute to the community infrastructure according to their capacity to pay.

One of the first areas under consideration throughout 2024 was an option to change the basis of our membership fees from the publishing revenue (or expenses) of each organisation to its overall organisational revenue (or expenses) instead.

Through surveys, discussions with the M&F committee, and at the Crossref 2024 Annual Meeting, we received strong feedback, particularly from those based at institutions and/or following a diamond open-access model, that making this change would put Crossref beyond their reach.

It became clear therefore that we should NOT change the basis for sizing and tiering members.

Instead, we will maintain the current basis for sizing and tiering members by considering their publishing revenue or expenses, whichever is higher. For non-publisher members, we advise taking ‘publishing’ to mean ‘producing’, so taking their cost of producing the works being registered with us, whether that is data, software, imagery, physical objects, etc.

Area of focus: Evaluate the USD 275 annual membership fee tier and propose a more equitable pricing structure, which might entail breaking this down into two or more different tiers.

We also looked into making our fees more equitable. It has long been recognised that our lowest fee tier (an annual fee of USD 275 for all members with publishing revenue up to USD 1 million) covers a huge diversity of organisations operating within a range of financial contexts: over 95% of our non-sponsored members are in this category, and it is the category that almost all new members join. Throughout the project, we ran various surveys with our members to learn more about the makeup of this group and the factors affecting its capacity to pay.

From January 2026, we will create a new annual membership tier for members whose publishing revenue/expenses (whichever is higher) is equal to or lower than USD 1,000 per year.

Based on survey data, we expect 30-60% of the members currently in the USD 275 tier to move to this new category. This new membership fee tier will be set at USD 200 in 2026, which is 27% lower than the current USD 275 membership fee. We will monitor the uptake in this category, with a view to identifying necessary adjustments in future years. As a result of the new tier, we expect a decrease in revenue of between USD 174k (if 30% of current lowest tier members move into the new tier) and USD 348k (if 60% of those members move into the new tier).
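The figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in this post. The implied size of the current USD 275 tier is our own back-calculation from those numbers, not a published figure, and the variable and function names are purely illustrative.

```python
# Figures quoted in the post.
CURRENT_FEE_USD = 275
NEW_FEE_USD = 200
LOW_ESTIMATE_USD = 174_000   # projected revenue decrease if 30% of members move
HIGH_ESTIMATE_USD = 348_000  # projected revenue decrease if 60% of members move

# Each member who moves to the new tier pays this much less per year.
saving_per_member = CURRENT_FEE_USD - NEW_FEE_USD

# Relative reduction of the fee, in percent.
reduction_pct = 100 * saving_per_member / CURRENT_FEE_USD


def implied_tier_size(revenue_drop_usd: float, share_moving: float) -> float:
    """Back out the tier size implied by a projected revenue drop.

    This is an inference from the quoted figures, not a published number.
    """
    return revenue_drop_usd / (saving_per_member * share_moving)


low = implied_tier_size(LOW_ESTIMATE_USD, 0.30)
high = implied_tier_size(HIGH_ESTIMATE_USD, 0.60)

print(f"Saving per member: USD {saving_per_member}")
print(f"Fee reduction: {reduction_pct:.1f}%")
print(f"Implied tier size from each estimate: {low:.0f} and {high:.0f}")
```

Both estimates imply the same tier size, which suggests the two revenue figures were derived from a single member count.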

Our Membership team will reach out to help qualifying members change to the new tier well before January 2026. If your publishing revenue or expenses are equal to or lower than USD 1,000 per year, look out for our email in the next couple of weeks to help you transition to the lower USD 200 tier.
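As a rough illustration of how the two lowest tiers relate, here is a minimal sketch. It assumes the USD 275 fee stays unchanged for members above the new threshold and treats "up to USD 1 million" as inclusive. The function name is hypothetical, and the higher tiers are left out because this post does not list them.

```python
from typing import Optional


def annual_fee_usd(publishing_revenue: float, publishing_expenses: float) -> Optional[int]:
    """Return the annual membership fee for the two lowest tiers from January 2026.

    The sizing basis is publishing revenue or expenses, whichever is higher.
    Returns None for members above USD 1 million, whose tiers are not covered here.
    """
    basis = max(publishing_revenue, publishing_expenses)
    if basis <= 1_000:
        return 200  # new lowest tier
    if basis <= 1_000_000:
        return 275  # existing lowest tier (assumed unchanged)
    return None  # higher tiers: see the full fee schedule


# A volunteer-run journal with no revenue and USD 800 in yearly costs.
print(annual_fee_usd(0, 800))
```

Using the higher of the two amounts means a journal with no income but real production costs is still sized by those costs.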

GOAL 2: Simplify complex fees

Area of focus: Address and adjust volume discounts for Content Registration

We currently offer volume discounts for several of our record types. These are calculated at the end of each quarter.

In order to reduce the complexity of our pricing, we will eliminate all volume discounts.

They are underused, accessible only to a small percentage of members, and the financial impact of making the change is small. These discounts contribute to complexity in our billing process and block our ability to offer members a running total or provide leaving members with a timely final invoice.

Having consulted with affected organisations, we’re reassured that the change will not adversely affect their ability to register their works with us. We appreciate their understanding of the overall positive impact of this change for Crossref and their support for our sustainability.

Area of focus: Reduce complexity in peer review fees

Finally, prompted by feedback from our members, we looked into normalising fees for peer review registration. We currently have two sets of fees for peer reviews based on whether the review is registered by the owner of the item being reviewed. There is a charge for the first review for a specific article, and a different charge for subsequent reviews for the same article by the same member. This charge for the subsequent reviews also varies depending on who registered the review. Very few members register peer reviews for records that they do not own, so having a separate, higher set of fees just adds complexity to the fee schedule with no financial or strategic benefit.

Starting from January 2026, we will consolidate all peer review fees, regardless of who registers the review, to USD 0.25 for the first review of an article, with free registration for any subsequent reviews of that same record by the same member.
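To make the consolidated rule concrete, here is a minimal sketch of how the charge could be worked out for a batch of review registrations. It is an illustration under our reading of the rule (one charge per member per reviewed record), not a description of the billing system. The function name, member IDs, and DOIs are hypothetical.

```python
from typing import Iterable, Tuple

FIRST_REVIEW_FEE_CENTS = 25  # USD 0.25, kept in cents to avoid rounding issues


def peer_review_fees_usd(registrations: Iterable[Tuple[str, str]]) -> float:
    """Total fee in USD for (member_id, reviewed_doi) pairs, one pair per review.

    Only the first review that a member registers for a given record is
    charged; later reviews of that record by the same member are free.
    """
    charged = set()
    total_cents = 0
    for member_id, reviewed_doi in registrations:
        # DOIs are case-insensitive, so normalise before comparing.
        key = (member_id, reviewed_doi.lower())
        if key not in charged:
            charged.add(key)
            total_cents += FIRST_REVIEW_FEE_CENTS
    return total_cents / 100


example = [
    ("member-1", "10.5555/article-a"),  # charged
    ("member-1", "10.5555/article-a"),  # free: same member, same record
    ("member-1", "10.5555/article-b"),  # charged: different record
    ("member-2", "10.5555/article-a"),  # charged: different member
]
print(peer_review_fees_usd(example))
```

The post only specifies the rule for the same member, so charging a second member for the same article is an assumption.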

Area of focus: Address and adjust back-year discounts for record registration

Another recommendation, the removal of back-year discounts for select record types (conference proceedings, technical reports and working papers, theses and dissertations, and posted content/preprints) due to underuse, hasn’t been approved yet. Based on feedback from the Board, more research will be conducted on trends related to specific record types, such as theses and dissertations, so we can better understand potential unintended consequences of such changes.

We’re looking to retain back-year discounts for record types where they continue to be well-used, including those for journal articles and book titles. We’re also looking to retain back-year discounts for grants, as these are at an early stage of adoption, and new funders coming on board naturally start with a backlog of grants to register in the Grant Linking System.

What happens next?

The Resourcing Crossref for Future Sustainability (RCFS) initiative is very broad, and in the coming months and years you can expect progress with other aspects of our fees and resourcing. There is more work to come, including the rebalancing of revenue from the use of our metadata, the future of fees for our funder members, and further changes to record registration fees.

We’re glad to see the first changes progressing to implementation, and would like to thank our Membership and Fees Committee and all members who took part in the consultations so far for your continued support.

Metadata excellence among new members: La Salle University, Perú

Yasiel Pérez Vera — Fri, 25 Jul 2025 00:00:00 +0000

Click here for the version in English

En 2025, lanzamos los Premios Crossref a los Metadatos, con el objetivo de destacar el rol de nuestra comunidad en la gestión y el enriquecimiento del registro académico. En esta publicación, destacamos a la Universidad La Salle, Perú, ganadora del premio a la excelencia entre los nuevos miembros, y contamos con la participación de Yasiel Pérez, Responsable Técnico y Editor de la Revista, quien comparte sus ideas:

Por qué los metadatos importan para nosotros

La Universidad La Salle se convirtió en miembro de Crossref hace relativamente poco tiempo, en 2023. Gestionamos nuestras revistas usando Open Journal Systems (OJS), y una vez que nos unimos a esta comunidad, los diferentes Consejos Editoriales compartimos la motivación de lograr una mayor visibilidad global, y vimos una oportunidad de mejora al proporcionar más metadatos y más completos.

El lado técnico de subsanar las deficiencias

Nuestras revistas, que llevan activas entre dos y cuatro años, han comenzado a enriquecer sus metadatos faltantes a niveles aceptables (¡creemos que aún podemos mejorar a niveles excelentes!). Gracias a mi formación como ingeniero de software, adaptamos el plugin de OJS para que admita campos de metadatos adicionales que no están disponibles en las versiones anteriores. El plugin requiere actualizaciones, por lo que realizamos modificaciones personalizadas para que sea compatible con los esquemas Crossref más recientes. Debido a limitaciones de tiempo, recursos humanos y financieros, consideramos más eficiente adaptar el plugin en lugar de adaptar nuestras instalaciones de OJS a las últimas versiones. Con estas modificaciones, depositamos los ROR ID, las licencias, las páginas de políticas y las actualizaciones de las revistas en Crossmark.

Por otro lado, hemos probado la versión con Soporte a Largo Plazo actual y la versión 3.5 de OJS, y recomiendo encarecidamente a cualquier usuario que actualice a cualquiera de estas versiones más recientes. Incluyen importantes parches de seguridad y, además, los plugins de Crossref son compatibles con los esquemas más recientes. Desafortunadamente, para nosotros, actualizar los sistemas desde una versión anterior a la 3.3 requiere tiempo adicional y soporte técnico, dada la importancia de los cambios de la v3.2 a la v3.3.

Haciendo las políticas sobre metadatos una prioridad

Tenemos un compromiso institucional con la provisión de metadatos enriquecidos. Contamos con políticas que exigen metadatos lo más completos posible como parte de nuestros flujos de trabajo, y lo convertimos en un requisito estricto. Naturalmente, existen algunos desafíos. Los metadatos abiertos y transparentes aún están relativamente poco valorados. A veces, los editores no comprenden completamente las implicaciones de proporcionar metadatos enriquecidos; mostrar su nombre en el sitio web no es lo mismo que tenerlo en los metadatos, por lo que la conexión entre la versión de registro y su visibilidad no siempre es evidente para autores y editores. Los apoyamos proporcionando directrices y capacitación a los consejos editoriales y equipos de las revistas. Por ejemplo, si una afiliación no está disponible en ROR, animamos a los autores a solicitar su inclusión en el registro.

Por otro lado, esto también nos motiva. Nos estamos preparando para empezar a incluir metadatos de subvenciones y financiación en nuestros flujos de trabajo. También apuntamos a utilizar estos datos para estudiar el impacto de nuestras políticas editoriales en la visibilidad, el uso, las citas, la indexación y otras métricas institucionales. La Universidad La Salle es una organización interesante porque formamos una red de universidades de todo el mundo, lo que provoca errores en la identificación adecuada.

Creemos que ciertamente otras organizaciones pueden lograr altos niveles de enriquecimiento de metadatos. Esto tiene dos aspectos fundamentales: uno técnico y otro organizativo. Desde nuestra perspectiva, el primer paso es obtener el apoyo de la organización y establecer políticas a nivel de toda la organización. Las soluciones técnicas pueden seguir después y no son fundamentalmente difíciles en comparación con conseguir que la comunidad proporcione metadatos buenos y completos.

Una vez que se consigue la asignación de recursos, se planifica la hoja de ruta para recopilar más metadatos. Es mejor tenerlos y no usarlos que necesitarlos y no tenerlos. Por ejemplo, ya estamos recopilando los roles de autor utilizando la taxonomía CRediT, por lo que una vez que sea totalmente compatible con el esquema de Crossref, queremos estar preparados para enviarlos. Idealmente, nos gustaría ver compatibilidad con identificadores alternativos y más tipos de fechas. Recopilamos las fechas de envío y aceptación a través de Crossmark y asignamos simultáneamente DOI, PURL y ARK. Con el tiempo suficiente, también planeamos implementar la revisión por pares abierta en nuestras revistas.

Lo que el reconocimiento nos ayudó a lograr

Recibir este premio ha tenido un profundo impacto en nuestra organización; nos ayuda a reforzar el mensaje que intentamos transmitir a nuestra comunidad. Abrió los ojos de las autoridades y los gestores de presupuesto, y también está aumentando la visibilidad de la organización en la región. Queremos ser vistos como un ejemplo en la comunidad local y regional: «Si una institución provincial puede hacerlo, otras también». Hemos comenzado a recibir llamadas solicitando capacitación para otras organizaciones. Por lo tanto, este premio ha sido sin duda fundamental para nosotros.

Version in English

In 2025, we launched the Crossref Metadata Awards, aiming to highlight our community’s role in stewarding and enriching the scholarly record. In this post, we put the spotlight on La Salle University, Perú, winner of the award for excellence among new members, and Yasiel Pérez, Technical Head and Journal Editor, shares his insights:

Why metadata matters to us

La Salle University became a Crossref member relatively recently, in 2023. We manage our journals using Open Journal Systems (OJS), and once we became part of this community, the different editorial boards shared the motivation of achieving greater global visibility, and we saw an opportunity for improvement by providing more and richer metadata.

Technical side of filling the gaps

Our journals, which have been active for two to four years, have started enriching their missing metadata to acceptable levels (we still think we can improve to excellent levels!). Thanks to my background as a software engineer, we adapted the OJS plugin to support additional metadata fields that are not available in the older versions. The plugin requires updates, so we made custom modifications to support the latest Crossref schemas. Because of time, staffing, and financial constraints, we found it more efficient to adapt the plugin than to upgrade our OJS installations to the latest versions. With these modifications, we deposit ROR IDs, licences, policy pages, and the journals’ updates in Crossmark.

On the other hand, we have tested the current long-term support (LTS) version and version 3.5 of OJS, and I strongly recommend that any user upgrade to one of these more recent versions. They include important security patches, and their Crossref plugins are compatible with the latest schemas. Unfortunately, for us, upgrading the systems from a version older than 3.3 requires additional time and technical support, given the significance of the changes from v3.2 to v3.3.

Making metadata a policy priority

We have an institutional commitment to the provision of rich metadata. We have policies in place that require metadata to be as complete as possible as part of our workflows, and we make this a strict requirement. Naturally, there are some challenges. Open and transparent metadata is still relatively underappreciated. Sometimes editors don’t fully understand the implications of providing rich metadata; displaying your name on the website is not the same as having it in the metadata, so the connection between the version of record and its visibility is not always evident to authors and editors. We support them by providing guidelines and training to the editorial boards and journal teams. For example, if an affiliation is not available in ROR, we encourage authors to request its inclusion in the registry.

On the other hand, this is also a motivational push for us. We are preparing to start including grant and funding metadata in our workflows. We also aim to use this data to study the impact of our editorial policies on visibility, use, citations, indexing, and other institutional metrics. La Salle University is an interesting organization because we are a network of universities across the world, which leads to mistakes in identifying us correctly.

We certainly think that other organizations can achieve high levels of metadata enrichment. There are two fundamental aspects to it: a technical aspect and an organizational aspect. From our point of view, the first step is gaining organizational support and establishing organization-wide policies. The technical solutions can follow and are not fundamentally difficult compared with getting the community to provide good and complete metadata.

Once you manage to secure the assignment of resources, you can plan the roadmap for collecting more metadata. It’s better to have it and not use it than to need it and not have it. For example, we already collect author roles using the CRediT taxonomy, so once it is fully supported by Crossref’s schema, we want to be prepared to submit them. Ideally, we would like to see support for alternative identifiers and more types of dates. We collect submission and acceptance dates via Crossmark, and we simultaneously assign DOIs, PURLs, and ARKs. Given enough time, we are also planning to implement open peer review in our journals.

What the recognition helped us achieve

Receiving this award has been profoundly impactful for our organization; it helps us reinforce the message that we are trying to deliver to our community. It opened the eyes of the authorities and budget managers, and it is also increasing the organization’s visibility in the region. We want to be seen as an example in the local and regional community—“if a provincial institution can do it, others can too.” We have started receiving calls requesting training for other organizations. So, this award has certainly become pivotal for us.

Crossref at Beijing International Book Fair 2025

Johanssen Obanda — Thu, 24 Jul 2025 00:00:00 +0000

This June, we presented at the Beijing International Book Fair (BIBF) and connected directly with our growing community in China. With a surge of interest from Chinese publishers and partners, it was clear: there’s a strong and rising curiosity around how metadata plays a vital role in maintaining the integrity of the scholarly record.

And we were not alone: our incredible Crossref Ambassadors based in the region joined us at the booth, and together we hosted visitors and answered questions. Throughout the fair, we engaged in passionate conversations, provided metadata guidance, and shared our knowledge as part of a panel session focused on how metadata supports scholarship. Ms. Ran Dang, Editorial Director at Atlantis Press (Springer Nature), supports Crossref outreach and advocates for Open Access and Open Science. Ms. Xiaofeng Guo, Director at Sin-Chn Scientific Press, leads DOI infrastructure efforts in China and supports Crossref members across the region. Mr. Gantulga Lkhagva, Founder and CEO of Mongolian Digital Knowledge Solutions and MongoliaJOL, works to strengthen local scholarly publishing and promote metadata best practices.

Photo: Crossref Ambassadors and Staff

This was the first time some of us had met in person after years of online collaboration, and the sense of connection and shared purpose was energising. Our Ambassadors also contributed to this post, sharing their favourite moments, key takeaways, and stories from the fair.

A snapshot from the panel discussion

During BIBF, we hosted a panel session focused on the role of metadata in supporting scholarship. Ms. Alicia Wang (Vice President, CNPIEC Kexin Technology Co., Ltd), Robbykha Rosalien (Membership Support Specialist, Crossref), Johanssen Obanda (Community Engagement Manager, Crossref), and our Ambassadors joined the panel, and we were glad to have a mix of Crossref members, Metadata Plus users, and curious participants join the discussion.

Photo: Panel session - Ms. Alicia Wang, Mr. Gantulga Lkhagva, Ms. Robbykha Rosalien, Mr. Johanssen Obanda, Ms. Xiaofeng Guo, Ms. Ran Dang.

Ms Xiaofeng Guo making a presentation about how metadata supports scholarship

Key questions from the session included the status of open abstracts in Crossref, how retracted articles affect citation tracking and research integrity, and what happens when DOIs no longer resolve due to unmaintained landing pages.

Robbykha explained our DOI resolution and archival systems, clarifying that DOIs are designed to always resolve, even when the original content moves or becomes unavailable. We also touched on the work Crossref is doing to support transparency around retractions, and the goals of The Initiative for Open Abstracts, which aims to make research summaries more accessible.

Metadata Plus use cases from China

Two of our Metadata Plus users were present during the panel and generously shared how they are leveraging Crossref metadata in their work.

Jie He from ScienceRiver described how their team translates Crossref metadata from English into Chinese, making it possible for users in China to search for relevant academic literature originally published outside the mainland. Their efforts open up global research to local audiences, bridging language and accessibility gaps. This conversation also led to broader discussions about multilingual metadata and the work our Metadata Advisory Group hopes to support in this area.

Eurasia Academic Publishing Group, based in Hong Kong, talked about using Crossref metadata coupled with AI approaches to develop a tool for readers, editors, and institutions to help assess the integrity of research articles and detect paper mills.

Reflections from our Ambassadors and the community

One common thread throughout our time at BIBF was the recognition that many of our resources, documentation, and support materials are still primarily in English. For Chinese-speaking community members who are new to Crossref or metadata concepts, this creates a pretty steep learning curve. We heard this clearly, and we know there’s work to do in making our services more accessible across languages.

From personal highlights to fascinating conversations, here’s what some of our Ambassadors had to say:

I am very happy to have met with colleagues from Crossref and several Ambassadors from Asia! We have met many times online, but this was the first time we met face-to-face and worked together to engage with our members and host events! I learned a great deal from our face-to-face exchanges, including updates on Crossref’s latest use cases, industry development trends, and even information about my colleagues’ hometowns. We built friendships and successfully participated in the first BIBF event for Crossref, which was the biggest takeaway!

我非常高兴，能够与Crossref的同事和亚洲的几位大使见面！我们曾经多次在网络会议中见面，但是这是第一次面对面，并且共同面对用户、举办活动！在我们面对面的交流中我也学到了很多，包括Crossref的最新应用案例，行业发展情况，甚至同事们自己家乡的情况！我们建立了友谊，成功举办了第一次BIBF活动，这是最大的收获！

At the BIBF exhibition and events, we had good conversations with our Chinese partners and some members, and learned about actual application needs and use cases, which was very helpful to me. Most of the people I met spoke Chinese, but their publishers or institutions may have come from countries and regions outside mainland China, such as Singapore, Hong Kong, and Taiwan.

在此次BIBF展览和活动中，我们与中国的合作伙伴以及很多用户面对面交流，了解到实际的应用需求和应用案例，这对我帮助很大。我接触的客户多半讲华语，但是他们的出版社或机构可能来自新加坡、香港、台湾等中国大陆以外的国家和地区。

I also participated in the BIBF Forum events held before the exhibition, including the PubTech Conference, the first STM Asia-Pacific Conference, and the networking dinner. These three events were jointly organised by China National Publications Import and Export (Group) Corporation (CNPIEC), STM, and the Society of China University Journals (SCUJ). During the events, I heard about the latest developments in the publishing industry and gained valuable insights into hot topics. I also met many new and old friends and partners, some from China and others from around the world. Interacting with them not only allowed me to reminisce about the past but also provided me with new perspectives and expanded my professional network.

我这次也参加了在展览之前举办的BIBF论坛活动，包括的PubTech论坛，以及首界STM亚太会议和交流晚宴。这三个活动是由中国图书进出口公司（CNPIEC）、STM和中国高校科技期刊研究会（SCUJ）联合举办的。在活动中我听到了很多出版行业的最新发展以及针对热点问题的真知灼见，见到了很多新老朋友和伙伴，他们部分来自中国，部分来自世界各地。与他们交流不仅让我重温旧时光，也获得了新的见解、新的人脉。

Discussion with Ms. Bo Li from China Education Publication Import & Export Corporation (CEPIEC) on matching papers with their funding grants from China. This is an excellent use case for Crossref’s Grant Linking System (GLS) service and related metadata. We introduced the GLS service and Crossref metadata to Ms. Bo Li and will follow up with her and her colleagues to help them use Crossref’s metadata to complete this task more easily.

与中国教育图书进出口公司的李博女士讨论为科研基金匹配项目资助的论文元数据。这是一个非常好的应用案例，可以利用Crossref的GLS服务以及相关元数据。我们向李博介绍了GLS服务以及元数据的相关情况，之后还将与她和她的同事进行深入讨论，帮助他们利用Crossref的元数据更快捷地完成此项工作。

Discussion with Dr. Zhu Xuefeng. Their team has developed an application that identifies research integrity issues in journals and articles. They primarily utilise Crossref metadata (including article metadata and Retraction Watch data), the WithdrarXiv dataset of withdrawn arXiv papers, and ORCID and Research Organization Registry (ROR) data, among others. By linking and integrating these data, they calculate the research integrity risk of relevant journals and articles, providing a reference for authors submitting manuscripts, editors reviewing manuscripts, and institutions monitoring research integrity issues.

与朱学峰博士的讨论。他们的团队开发了一款应用程序，识别期刊/文章的科研诚信问题。他们主要利用了Crossref元数据（包括文章元数据和撤稿观察数据），arXiv的撤回数据集，以及ORCID和ROR数据等，通过关联、集成这些数据计算相关期刊/文章的科研诚信风险，为作者投稿、编辑审稿、机构监测科研诚信问题等提供参考。

At the Crossref BIBF event, Ms. Wang Xuan, Vice President of CNPIEC Kexin Technology Co., Ltd, a Crossref sponsor in China, discussed the strong demand for reliable data sources when applying AI in the field of scientific research, as well as how Crossref metadata can provide strong support. She proposed that all AI products focusing on scientific research should show the original DOIs for the academic resources they cite in the results they provide to users, to enhance the reliability and traceability of data sources. She committed that her company, Kexin, as a provider of research AI assistants, will implement this functionality in its products and hopes to promote this as a best practice to all research AI application developers and providers. This reflects that, as cutting-edge technology advances and requirements for research integrity and compliance continue to rise, the support that Crossref metadata provides for scholarship will become increasingly broad and important.

在Crossref BIBF活动上，中图科信公司（Crossref中国赞助机构）副总经理王轩女士在讨论中阐述了关于AI在科研领域应用时对于可信数据来源的强烈需求，以及Crossref元数据如何能提供有力支撑的想法。她倡议所有的科研AI产品在为用户提供结果时，应对引用的学术资源提供原始的DOI标识，以增强数据来源的可信度和可追踪性。她承诺中图科信公司作为科研AI助手的提供者将在其产品中实现这一功能，并希望能将此作为最佳实践向所有科研AI应用的开发者、提供者进行推广。这反映了随着前沿科技发展以及科研诚信与合规要求不断提升，Crossref元数据对于学术研究提供的支撑作用将越来越广泛、越来越重要。

Connecting the dots: FWF’s transition to linked grant metadata to support a thriving culture of openness

Rocío Gaudioso Pedraza — Wed, 23 Jul 2025 00:00:00 +0000

Click here for the version in German

As a new Community Engagement Manager at Crossref, dedicated to working with the funder community, I frequently hear requests for examples and case studies of ‘funders like us’ adopting Crossref’s Grant Linking System (GLS). This has spurred me to start a series of blog posts presenting funders’ perspectives on joining Crossref and using our system, to demonstrate how it’s done.

In the first case study of the series, I speak with Katharina Rieck, Open Science Manager at the Austrian Science Fund (FWF), Austria’s national funding agency for basic research, about the agency’s approach to research metadata, transparency and openness, and the role that the Grant Linking System plays in it.

With a strong track record in Open Access and Open Science, the FWF’s decision to implement grant IDs represents more than a mere technical upgrade. What began as an initiative to enhance the openness and interoperability of grant information illustrates that truly open research infrastructure is not solely a matter of systems, but about people, policies and collaboration.

Katharina was also elected to the Crossref Board at our November 2024 Annual Meeting, and started her three-year term in January 2025.

Could you introduce your organisation? And what is your role?

The Austrian Science Fund (FWF) is Austria’s national funding agency for basic research. The FWF funds all disciplines, from Social Sciences and Humanities to Life Sciences and Natural Sciences and Technology. As Open Science Manager, I am responsible for developing the FWF’s Open Science strategy, including the development of the Open Access Policy for Peer-Reviewed Publications, the Open Access Policy for Research Data as well as the FWF Research Data Management Policy. I am also responsible for the development and implementation of funding instruments such as the FWF Open-Access Block Grant and support for Open Science infrastructures.

What motivated you to join Crossref?

For more than two decades, the FWF has actively promoted and supported various aspects of Open Science. In 2004, it published its first Open Access Policy, making it one of the first funding organizations worldwide to adopt an Open Access policy for publications. In line with the commitment to open research information as a core pillar of Open Science, the FWF has taken further steps to strengthen openness and transparency: it joined Crossref to register grant DOIs and became a signatory of the Barcelona Declaration on Open Research Information.

While funding metadata––information about projects funded by the FWF––has long been freely available on our website, the launch of the Research Radar in 2023 marked a significant step forward. Our goal was not only to maintain accessibility but to ensure that the data published in the Research Radar is interoperable and aligned with the FAIR principles. By implementing the Grant Linking System from Crossref, we assign each FWF funded project a unique, persistent identifier with associated metadata, helping to make FWF grant information open, interoperable and sustainable.

Can you tell us about your experience using the Grant Linking System?

We have been using the Grant Linking System since November 2023. With the launch of the FWF’s new website and the introduction of the Research Radar, we began registering Crossref grant IDs (DOIs) for all grants included in the Research Radar database. As a result, all FWF-funded projects dating back to 1995 are now uniquely identifiable. The process of registering grant metadata with Crossref is straightforward, and we have set up a smooth internal workflow that enables the registration of DOIs after the FWF’s funding decision.

It is important to note that implementing Crossref grant IDs involved more than just a technical setup––it required the development of new internal processes and coordination through a dedicated Crossref grant DOI implementation group. The implementation process also resulted in a revised structure for grant numbers (DOI suffixes) for FWF-funded projects, establishing a sustainable and future-proof system.

How was your journey to socialise the Grant Linking System within your research community? How did you communicate the importance of identifiers and grant metadata to your grant holders?

The introduction of grant DOIs was supported by a comprehensive communication strategy, including dedicated online resources (e.g., New Identification Numbers for FWF Projects –– FWF), updates across multiple pages of the FWF website (such as Carrying out Your Project –– FWF), and presentations at various events. This communication strategy aimed to explain the purpose and value of the “new numbers”, ensuring that researchers and stakeholders understood how they contribute to greater visibility, traceability, and openness of funded research.

As a funding organisation, we require grant recipients to acknowledge FWF support in all research outputs resulting from their projects. With the integration of grant DOIs into FWF’s metadata, the standardised acknowledgment text was updated to ensure that the DOIs are now included in outputs. The new required wording is: ‘This research was funded in whole or in part by the Austrian Science Fund (FWF) [grant DOI],’ and is now a requirement in the FWF funding agreement. Including the grant DOI both in the output metadata and the acknowledgment text enhances traceability and supports more effective analysis of FWF-funded outputs.
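For grant holders or tool builders who want to generate this acknowledgment consistently, a minimal sketch follows. It assumes the grant DOI is placed inside the square brackets of the required wording. The function name is hypothetical, the DOI shown is a made-up placeholder rather than a real grant, and the pattern is only a rough shape check, not full DOI validation.

```python
import re

# Rough shape check only: "10.", a registrant code, a slash, and a suffix.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def fwf_acknowledgment(grant_doi: str) -> str:
    """Build the funding acknowledgment sentence for a given grant DOI."""
    doi = grant_doi.strip()
    if not DOI_SHAPE.match(doi):
        raise ValueError(f"Does not look like a DOI: {grant_doi!r}")
    return (
        "This research was funded in whole or in part by the "
        f"Austrian Science Fund (FWF) [{doi}]"
    )


# Placeholder DOI for illustration only.
print(fwf_acknowledgment("10.55776/EXAMPLE1"))
```

Generating the text from the DOI, rather than typing it by hand, keeps the identifier identical in the acknowledgment and in the output metadata.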

What do you find useful about registering grant metadata with Crossref?

One of the key benefits of registering grant metadata is the enhanced interconnectivity and the unique identification of FWF’s grant information. By registering our grants with Crossref, funding information becomes more than just information on the FWF website––it becomes interoperable data that is accessible and reusable. This not only increases visibility but also enables us to better analyse the outcomes of funded projects and ensures that the data is accessible as well as (re)usable by the broader research community.

In addition to assigning Crossref Grant IDs and registering grant metadata, the FWF has required ORCID IDs for researchers since 2016 and mandates the use of ROR IDs for institutions. The consistent use of persistent identifiers in metadata ensures the interoperability of FWF grant information and facilitates seamless integration with external data sources.

What are your hopes for the GLS and greater transparency in funding metadata in general?

The FAIRness and openness of research information––including metadata on funding information, research outputs, researchers, and institutions––are fundamental to a well-functioning research ecosystem. I hope to see a broader adoption of persistent identifiers in metadata, particularly in grant information, as well as a broader commitment to openly sharing research information as expressed in the Barcelona Declaration. Moreover, a key objective should be to ensure the highest possible accuracy of metadata at the point of entry. This entails, for instance, that publication metadata accurately includes funding metadata.

What were the key challenges you encountered when embracing the GLS, and how did you overcome them?

One of the key challenges we encountered when adopting the GLS was ensuring seamless integration in our existing IT infrastructure and workflows. Integrating the new number across different systems required considerable coordination. We overcame this challenge by establishing a dedicated implementation team that included IT experts.

Another challenge involved communicating and disseminating information regarding the grant DOI, ensuring that researchers and other relevant stakeholders were adequately informed. This was successfully managed through targeted and comprehensive communication efforts.

Based on your experience, what would be your advice for colleagues from other research funders?

It is important to recognise that registering grant identifiers and metadata goes beyond a mere technical implementation. This is an opportunity to engage with diverse stakeholders, rethink processes and highlight the value of open funding metadata for the entire research community.

We are grateful to Katharina Rieck and FWF for generously sharing their insights and know-how. Their experience highlights the importance of seeing metadata not just as information, but as a shared resource that connects and empowers the research community.

Version in German

The title has been changed slightly from the original version. Translation by Lena Stoll.

Connecting the Dots: How the FWF fosters a culture of openness by moving to connected funding metadata

As the new Community Engagement Manager at Crossref dedicated to working with funders, I am often asked whether I can provide examples and case studies from “funders like us” that have already adopted Crossref’s Grant Linking System (GLS). This prompted me to start a blog series in which I present funders’ perspectives on Crossref membership and on using our system, to show how it works.

In the first case study of this series, I speak with Katharina Rieck, Open Science Manager at the Austrian Science Fund (FWF), Austria’s national funding agency for basic research, about the FWF’s approach to research metadata, transparency, and openness, and about the role that the Grant Linking System plays in it.

Given the FWF’s many years of experience in open access and open science, its decision to introduce grant IDs (DOIs for grants) represents more than just a technical improvement. The initiative began with the aim of improving the openness and interoperability of funding information, but it soon became clear that a truly open research infrastructure is not only a matter of systems but also involves people, policies, processes, and collaboration.

Katharina Rieck was also elected to Crossref’s Board of Directors at our annual meeting in November 2024 and began her three-year term in January 2025.

Please briefly introduce the FWF and tell our readers what your role there is.

The Austrian Science Fund (FWF) is Austria’s national funding organisation for basic research. The FWF funds all disciplines, from the social sciences and humanities through the life sciences to the natural sciences and engineering. As Open Science Manager, I am responsible for developing the FWF’s open science strategy, including the development of the open access policy for peer-reviewed publications, the open access policy for research data, and the FWF’s research data management policy. I am also responsible for developing and implementing funding instruments such as the FWF’s open access block grant (Open-Access-Pauschale), as well as for supporting open science infrastructures.

What motivated you to join Crossref?

The FWF has been actively promoting and supporting various aspects of open science for more than two decades. In 2004, it published its first open access policy, making it one of the first funding organisations worldwide to introduce an open access policy for publications. In line with its commitment to open research information as a central pillar of open science, the FWF has taken further steps to strengthen openness and transparency: it joined Crossref in order to register grant DOIs, and it is a signatory of the Barcelona Declaration on Open Research Information.

Metadata on research funding, that is, information about FWF-funded projects, has long been freely available on our website. However, the introduction of the Research Radar in 2023 was another significant step forward. Our goal was not only to maintain open access to the metadata, but also to ensure that the data published in the Research Radar is interoperable and consistent with the FAIR principles. By applying Crossref’s Grant Linking System, every FWF-funded project now receives a unique, immutable ID with associated metadata, so that information on FWF funding is open, interoperable, and sustainably available.

Can you tell us more about your experience with the Grant Linking System?

We have been using the Grant Linking System since November 2023. With the launch of the new FWF website and the Research Radar, we began registering Crossref grant IDs (DOIs) for all grants included in the Research Radar database. As a result, all FWF-funded projects since 1995 are now uniquely identifiable. Registering grant metadata with Crossref is straightforward, and we have developed a smooth internal workflow for registering DOIs after the FWF’s funding decision.

It is important to mention that introducing Crossref grant IDs took more than just setting up technical processes: we also developed new internal procedures and formed a dedicated working group to coordinate Crossref grant DOIs. In the course of this process, we also revised the structure of the project numbers for FWF-funded projects (that is, the DOI suffixes), thereby building a sustainable and future-proof system.

What has your experience been of promoting the Grant Linking System within your research community? How did you convey the importance of identifiers and metadata to your grant holders?

We supported the introduction of grant DOIs with a comprehensive communication strategy, including dedicated online resources (e.g., New Identification Numbers for FWF Projects), updates to several pages of the FWF website (e.g., Carrying out Your Project), and presentations at various events. The aim of this communication strategy was to explain the purpose and benefits of the “new numbers” and to ensure that researchers and stakeholders understand how they contribute to greater visibility, traceability, and openness of funded research.

As a funding organisation, we require our grant holders to acknowledge FWF support in all research outputs resulting from the project. With the integration of grant DOIs into the FWF’s metadata, we updated the standardised acknowledgement text to ensure that the DOIs are mentioned in the outputs. The new required wording is: “This research was funded in whole or in part by the Austrian Science Fund (FWF) [grant DOI].” It is stipulated in every FWF funding agreement. Including grant DOIs both in the metadata and in the acknowledgement text of scholarly outputs improves traceability and enables a more precise analysis of FWF-funded outputs.

What do you find most helpful about registering funding metadata with Crossref?

One of the main benefits of registering funding metadata is the improved interconnection and the unique identification of the FWF’s funding information. By registering our projects with Crossref, funding information becomes more than just information on our website: it becomes interoperable data that is retrievable and reusable. This not only increases visibility but also enables us to better analyse the outcomes of funded projects, and it ensures that the data is accessible and (re)usable for the wider research community.

In addition to assigning Crossref grant IDs and registering funding metadata, the FWF has required ORCID iDs for researchers since 2016 and mandates the use of ROR IDs for institutions. The consistent use of persistent identifiers in the metadata ensures the interoperability of FWF funding information and facilitates seamless integration with external data sources.

What do you hope for from the GLS and from greater transparency in funding metadata in general?

The FAIRness and openness of research information, including metadata on funding information, research outputs, researchers, and institutions, are essential for a well-functioning research ecosystem. I hope for broader adoption of persistent identifiers in metadata, particularly in funding information, and for a greater commitment to openly sharing research information, as called for, for example, in the Barcelona Declaration on Open Research Information. In addition, it should be ensured that metadata is as accurate as possible at the point of entry, that is, when it is created. This means, among other things, that publication metadata should contain the correct funding metadata.

What challenges arose when introducing the GLS, and how did you overcome them?

One of the biggest challenges was integrating the Grant Linking System seamlessly into our existing IT infrastructure and workflows. Integrating the “new number” into the various systems required a great deal of coordination. We overcame this challenge by forming a dedicated working group for the implementation of Crossref grant DOIs, which also included IT experts.

Another challenge was communicating and disseminating information about grant DOIs so that researchers and other stakeholders were appropriately informed. We achieved this through targeted and comprehensive communication measures.

Based on your own experience, what advice would you give to colleagues at other funders?

It is important to understand that registering grant IDs and metadata goes beyond a mere technical implementation. The process offers an opportunity to engage with different stakeholders, rethink processes, and underline the value of open funding metadata for the entire research community.

We thank Katharina Rieck and the FWF for their willingness to share their insights and know-how so generously. Their account has shown us how important it is to view metadata not just as information, but as a shared resource that can connect and strengthen the entire research community.

Data Science @Crossref

Dominika Tkaczyk — Mon, 07 Jul 2025 00:00:00 +0000

To address the growing scale and complexity of scholarly data, we’ve launched a new data science function at Crossref. In April, we were excited to welcome our first data scientists, Jason Portenoy and Alex Bédard-Vallée, to the team. With their arrival, the Data Science team is now fully up and running. In this blog post, we’re sharing our vision and what’s ahead for data science at Crossref.

New approach to achieve our mission

Over the last few years, we have witnessed substantial growth of the scholarly community in general, and Crossref in particular. This has been reflected in the increase in the volume and variety of the data we collect, store and process, including scholarly metadata and Crossref operational data related to membership, DOI registrations, billing, usage measurement, and other activities.

On the one hand, this growth opens new possibilities for using the data to better understand the scholarly landscape, serve our community, develop services, and make informed decisions. On the other hand, it forces us to address a set of challenges related to the scale and complexity of the data.

The new Data Science team, created as part of last year’s broader organisational changes, will address these challenges and fulfil our data-related ambitions. As part of our strategic mission, we created the following vision for the Data Science team within Crossref and our community:

The Data Science team uses scientific research and data science to deliver, assess, improve, and enrich scholarly metadata.

The work of the Data Science team broadly entails two types of projects: 1) data analysis & insights; and 2) data services & workflows.

Data analysis & insights: The goal of these kinds of projects is to broaden our understanding of the scholarly record and our community and help Crossref make decisions in a data-driven way, without trying to create any specific application or product. They will help Crossref explore new strategic directions, make more informed decisions, monitor the trends and outcomes of certain decisions and policies, and discover and share new insights with the community. This category also involves large and small data assessments and analyses, measuring and monitoring certain metrics, verifying hypotheses, answering questions using data, monitoring trends in the metadata, forecasting, data visualisation, reporting, and interpreting results.

Data services & workflows: The goal of these kinds of projects is to apply scientific knowledge and data analysis to build and maintain Crossref services, tools, and workflows. The Data Science team collaborates with other Crossref teams on the research, design and implementation of the Crossref system and its various components. This will involve modelling across different data stores and APIs, as well as designing efficient and robust data workflows for various processes, including metadata deposit, validation, and dissemination. Furthermore, the team will investigate and implement modern tools and techniques for efficient data processing, storage and analysis, and strategies for data enrichment. Finally, the Data Science team is involved in planning and implementing comprehensive monitoring and reporting for various features and services.

Connecting with the community

Crossref exists as part of a diverse, global community of 22,000 members from 160 countries, plus countless systems that rely on our metadata. Launching the new Data Science function gives us a great opportunity to connect more deeply and in new ways with the wider scholarly community. We’re keen to engage with Crossref members, users of our services, and partner organisations to better understand trends and needs, and to contribute to others’ community initiatives and awareness.

One area we’re particularly interested in is the growing range of initiatives in the metascience space. We’re looking to expand and solidify our understanding of how researchers use our data and services, and to learn more about their needs and perspectives. These insights will help inform the design and functionality of our data workflows and APIs over the long term.

We’re also committed to supporting the scholarly community’s efforts to preserve the integrity of the scholarly record (ISR). By applying modern, scalable data processing techniques, we aim to help detect and investigate potential issues affecting metadata quality, including both intentional manipulation and unintentional errors or inconsistencies.

More broadly, we’re looking forward to engaging with our community on scalable data processing approaches, as well as best practices and standards for processing and enriching scholarly metadata.

Introducing new members of the team

We couldn’t pursue our ambitious goals without the dedication and passion of our team. In April, we were thrilled to welcome two data scientists, Jason Portenoy and Alex Bédard-Vallée, to the Crossref team.

Alex Bédard-Vallée brings over six years of experience extracting meaningful insights from large-scale bibliometric data in the research and scholarly publishing sector, with the aim of better serving the scholarly community. Prior to Crossref, during his tenure at Elsevier, he was instrumental in modernising data infrastructure, significantly enhancing the efficiency of massive research data pipelines. His contributions included developing automated data quality checks, creating reusable Python tools to streamline data access, and leveraging machine learning techniques to uncover research trends. Alex provided key insights for major reports, contributing to evaluations for the Canada Research Chairs Program and the NSF Science and Engineering Indicators between 2020 and 2024. Alex holds an M.Sc. in Quantum Physics (2018) and a B.Sc. in Physics (2016) from the Université de Sherbrooke.

Jason Portenoy is a New York-based data scientist with a background in bibliometric research and building applications using scholarly data. Through his work, he has become a passionate advocate for the maintenance and improvement of high-quality scholarly metadata. He holds a PhD in Information Science from the University of Washington where he studied how scholarly metadata can offer insights into scientific activity and help develop tools to address information overload. He brings experience working at OpenAlex, Semantic Scholar, and other organisations concerned with scholarly communication. Most recently, he was the Senior Data Engineer at OpenAlex, and he is now excited to continue his work using data science to support and strengthen crucial open scholarly infrastructure.

What’s next for us?

In the short term, we are focusing on two main projects: analysing how reliably DOIs resolve, and detecting discrepancies in bibliographic references at scale.

DOI resolutions: DOIs are persistent identifiers and links that are meant to consistently resolve to landing pages representing the objects they identify. Crossref members have certain obligations, one of which is to update the metadata whenever the location of a landing page changes, so that the DOI continues to resolve correctly. Some prior work has suggested this doesn’t always happen, which leaves gaps in the scholarly record. We’re now analysing metadata from a broad sample of members to better understand the scale of the issue, and to identify cases where members may need to update their metadata records.
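As a rough illustration of what a resolution check can look like, here is a minimal Python sketch. It is not the analysis the team is running: the function names, the status labels, and the decision to treat only 404 and 410 as broken are assumptions made for this example, and the `10.5555/...` DOIs in the usage note are placeholders.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def classify_status(status):
    """Map an HTTP status code (or None for a network failure) to a coarse label."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "resolves"
    if status in (404, 410):
        return "broken"
    # Anything else (403, 405, 5xx, ...) needs a human or a retry with GET.
    return "needs-review"


def fetch_status(doi, timeout=10):
    """Follow the doi.org redirect chain and return the final HTTP status."""
    request = Request(
        f"https://doi.org/{doi}",
        method="HEAD",
        headers={"User-Agent": "doi-resolution-sketch/0.1"},  # hypothetical agent string
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error status.
        return error.code
    except OSError:
        # DNS failures, refused connections, and timeouts all end up here.
        return None


def summarise(dois, fetch=fetch_status):
    """Count how many DOIs fall under each label."""
    counts = {}
    for doi in dois:
        label = classify_status(fetch(doi))
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Calling `summarise(["10.5555/placeholder-1", "10.5555/placeholder-2"])` would return a dictionary of label counts. Note that a 2xx status only shows that some page was served; it does not prove that the page still represents the object the DOI identifies, which is why a real analysis needs more than status codes.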

Detecting discrepancies in bibliographic references: Following last year’s reports of discrepancies between bibliographic references in metadata records and those found in full-text PDFs, we’ve explored ways to run broader, systematic checks across a larger set of members and metadata records. The goal was to understand how widespread these inconsistencies are and to identify cases where members may need support in correcting references in their metadata records. Ultimately, we aim to create a collaborative process that improves the accuracy and reliability of bibliographic references across the scholarly record, enhancing research discovery and reproducibility and ensuring impact assessments are reliable.
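To make the idea of a reference discrepancy concrete, the sketch below compares two lists of cited DOIs for the same work: one taken from the deposited metadata and one assumed to have been extracted from the full text already. Both the extraction step and the function names are assumptions for illustration; this is not the method used in the project.

```python
RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(value):
    """Lower-case a DOI and strip common resolver prefixes so variants compare equal."""
    doi = value.strip().lower()
    for prefix in RESOLVER_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def compare_references(metadata_dois, fulltext_dois):
    """Report which cited DOIs appear in only one of the two sources."""
    deposited = {normalise_doi(d) for d in metadata_dois}
    observed = {normalise_doi(d) for d in fulltext_dois}
    return {
        "only_in_metadata": sorted(deposited - observed),
        "only_in_fulltext": sorted(observed - deposited),
        "shared": sorted(deposited & observed),
    }
```

References that appear only in the metadata, or only in the full text, are the candidates a member might be asked to review. References without DOIs would need string matching instead, which this sketch does not attempt.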

Look out for forthcoming blog posts with more details on these projects!

Looking further ahead, Crossref has two big projects in which the Data Science team will play a central role: developing dashboards, and improving metadata matching.

Data dashboards: We are planning to develop a series of dashboards to monitor the state of the scholarly record over time. These will include both work-level statistics (e.g., how many works of a given type have been registered?) and more detailed insights at the relationship level (e.g., how many bibliographic references have been automatically matched? How often are ROR IDs included in funder assertions?). Upstream, this will require us to build an environment where all relevant data sources can be combined, as well as adopting a suite of scalable tools and data processing techniques.
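A work-level statistic such as “how many works of a given type have been registered?” can already be approximated today with a facet query against the public REST API. The sketch below assumes the `type-name` facet and the `message.facets.<name>.values` response layout; the helper names are made up for this example, and it says nothing about how the planned dashboards will be built.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.crossref.org/works"


def facet_url(facet="type-name", mailto=None):
    """Build a query that returns facet counts only (rows=0 means no work records)."""
    params = {"facet": f"{facet}:*", "rows": 0}
    if mailto:
        # Identifying yourself is the documented way to be routed to the polite pool.
        params["mailto"] = mailto
    return f"{API_ROOT}?{urlencode(params)}"


def counts_from_response(payload, facet="type-name"):
    """Turn the facet section of a response into (label, count) pairs, largest first."""
    values = payload["message"]["facets"][facet]["values"]
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


def fetch_counts(facet="type-name", mailto=None):
    """Run the query against the live API and return the sorted counts."""
    with urlopen(facet_url(facet, mailto), timeout=30) as response:
        return counts_from_response(json.load(response), facet)
```

Storing the output of `fetch_counts()` at regular intervals would give a simple time series per work type. Relationship-level questions, such as how many references were matched automatically, need data that a single facet query does not expose.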

Metadata matching: In April, we commenced the matching project. It is a major effort to rebuild Crossref’s metadata matching workflows using modern software development and data science practices. The goal is to create a dedicated consolidated matching workflow that will eventually replace all existing production matching processes, with results made available through the REST API. This project covers six matching tasks: bibliographic reference matching, funder name matching, preprint matching, affiliation matching, grant matching, and title matching.

(In the meantime, as we do not have a good mechanism to add matching results to the REST API yet, we separately released two datasets with relationships discovered by automated matching strategies: a dataset of relationships between preprints and journal articles, and a dataset of relationships involving research organisations.)
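For readers new to metadata matching, the following toy example shows the general shape of one of the tasks listed above, title matching. It uses Python’s standard-library `difflib` and an arbitrary threshold of 0.9; both are stand-ins chosen for this sketch, not the strategies Crossref uses or plans to use.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalise_title(title):
    """Strip accents, case, and punctuation so trivial variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", stripped.lower()).strip()


def title_similarity(a, b):
    """Similarity between 0.0 and 1.0 based on the normalised titles."""
    return SequenceMatcher(None, normalise_title(a), normalise_title(b)).ratio()


def best_match(query, candidates, threshold=0.9):
    """Return (candidate, score) for the best match at or above the threshold, else None."""
    scored = [(candidate, title_similarity(query, candidate)) for candidate in candidates]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None
```

For instance, `best_match("Open metadata for all", ["Closed data", "Open Metadata for All!"])` picks the second candidate, while a list with no sufficiently similar title yields `None`. Production matching has to weigh more evidence than titles alone (authors, dates, identifiers) and has to scale to the full corpus, which is what makes it a project in its own right.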

As you can tell, we are very excited about Crossref’s role in the modern, open, community-focused future of scholarly infrastructure. The new Data Science team is a crucial component of this vision. If you’re interested in collaborating or learning more about data science at Crossref, we’d love to hear from you!

Scholarly blogs and their place in the research nexus

Lena Stoll — Tue, 24 Jun 2025 00:00:00 +0000

If you are reading this blog on our website, you may have noticed that alongside each post we now list a Crossref DOI link, which was not the case a few months ago (though we have retroactively added DOIs to all older posts too). You can find the persistent link for this post right above this paragraph. Go on, click on it, we’ll wait.

Are you back here? Good. As you probably expected, the DOI link for this post resolves to the post itself, and you should use it anytime you want to cite this post. But the DOI does more than just point readers to this page––it is part of a rich metadata record that includes the authors’ ORCID iDs, the publication date, and more. In other words, the posts on this blog are part of what we call the research nexus: the open network of relationships connecting research outputs, people, organisations, and actions.

Crossref research nexus vision

Why blogs deserve a place in the scholarly record

A blog post may not be the first thing that comes to mind when you think of scholarly outputs. But scholarly blogs have been around since at least the early 2000s and have carved out a niche for themselves as a type of “grey literature” that allows researchers to write about research in a way that may not fit neatly into more traditional, peer-reviewed publishing venues, but is also too long-form for social media. Science blogs can give readers a window into ongoing work that isn’t ready to publish yet, serve as a self-publishing venue, or allow researchers to comment on others’ work and recent developments in science and science communication. These kinds of perspectives add crucial context to the scholarly record that should not be overlooked.

However, as Martin Fenner explained at the #Crossref2023 annual meeting, blogs have largely not benefitted from the metadata and long-term archiving solutions that tend to be applied to more “traditional” forms of publishing. As a result, most blogs have been left out of the scholarly record. But in recent years, there have been some efforts in the community to change this. Earlier this year, ORCID added support for the work type blog post, among others, to align more closely with the Confederation of Open Access Repositories (COAR) vocabulary of resource types.

At our 2025 midyear community update, we asked our community what content types they saw as growing in importance. Blog posts were mentioned several times as a ‘trending’ record type, and as one that members would like to see support for in the Crossref system.

Eating our own dog food

We had already been thinking for a while about how our own blog should be a part of the research nexus. We started out by manually uploading XML files through our Admin tool for each post. We did this for a few months and quickly found, like many of our members do, that this can be a laborious and error-prone process.

In the product management world, the process of using the products you usually spend your time building and maintaining is often referred to as dogfooding. The idea is that firsthand experience makes it easier to understand your end users’ needs and feel their pain. We have certainly found that registering metadata for our blog posts has reinforced the importance not only of making manual registration easier for our members, but also of supporting and enabling machine-to-machine integrations.

What did we do?

The Crossref website, which includes this blog, uses an open-source static site generator named Hugo. Rather than using a content management system (CMS), we edit the website content in Markdown format using code editors. Whenever we start working on a post for this blog, we not only write the content of the post itself, but also include some front matter for the page, which contains some key metadata about the post.

The front matter of a recent post on this blog

We wanted this metadata to be part of the research nexus. But then there was also the question of archiving. Our membership terms state that:

The Member shall use best efforts to contract with a third-party archive or other content host (an “Archive”) (a list of which can be found here) for such Archive to preserve the Member’s Content and, in the event that the Member ceases to host the Member’s Content, to make such Content available for persistent linking.

So we knew that if this blog was to be part of the scholarly record, we would need to ensure that it would be available in perpetuity, even if www.crossref.org were to go offline one day.

Doing this properly was starting to look like a sizeable project!

Fortunately, we knew that others had already done some great work in this field, so we would not have to start from scratch. After considering our options, we opted to integrate our blog with an established workflow for registering blog metadata: the Rogue Scholar service.

The Rogue Scholar was launched in 2023 by Martin Fenner as an archive for scholarly blog posts, hosted by Front Matter. Rogue Scholar improves science blogs in important ways, including full-text search, long-term archiving, and DOIs and metadata, such as versions and relationships along with identifiers such as ORCID iDs and ROR IDs. It provides the necessary tools to treat blog posts as research outputs through better attribution, preservation, and discoverability.

How did we do it?

Rogue Scholar works by consuming RSS and Atom feeds (you may remember them from the days of getting headlines direct to your browser or feed reader). We created a new feed that includes the proposed DOI as each entry’s id and takes full advantage of the Atom format by listing the post’s authors and including their ORCID iDs. We also provide the entire post as the entry’s <content> to allow for full-text indexing and archiving.

The XML feed entry for a recent post on this blog

For each post, we generate and assign a unique DOI under the Crossref prefix 10.64000. The Rogue Scholar integration then registers the DOI along with the metadata of the post as posted content. If you are interested in getting a similar workflow set up for your blog, you can read more in the Rogue Scholar blog and documentation.
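To give a feel for the feed entry described above, here is a small Python sketch that turns front-matter-style metadata into an Atom entry. It is not our actual Hugo template: the dictionary keys, the DOI suffix, the choice to express the id as a doi.org URL, and the use of the author’s `uri` element for the ORCID iD are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def atom_entry(post):
    """Build one Atom <entry> from a dict of post metadata and return it as a string."""

    def tag(name):
        return f"{{{ATOM_NS}}}{name}"

    entry = ET.Element(tag("entry"))
    # Assumption: the proposed DOI is exposed as a resolvable URL in <id>.
    ET.SubElement(entry, tag("id")).text = f"https://doi.org/{post['doi']}"
    ET.SubElement(entry, tag("title")).text = post["title"]
    ET.SubElement(entry, tag("updated")).text = post["date"]
    ET.SubElement(entry, tag("link"), href=post["url"])
    for author in post["authors"]:
        person = ET.SubElement(entry, tag("author"))
        ET.SubElement(person, tag("name")).text = author["name"]
        # Assumption: the ORCID iD travels in the Atom person construct's <uri>.
        ET.SubElement(person, tag("uri")).text = f"https://orcid.org/{author['orcid']}"
    # The full post goes into <content>; ElementTree escapes the HTML for us.
    content = ET.SubElement(entry, tag("content"), type="html")
    content.text = post["html"]
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    example_post = {
        "doi": "10.64000/example-suffix",  # real prefix from this post, made-up suffix
        "title": "An example post",
        "date": "2025-06-24T00:00:00Z",
        "url": "https://www.crossref.org/blog/an-example-post/",  # hypothetical URL
        "authors": [{"name": "Example Author", "orcid": "0000-0002-1825-0097"}],  # sample iD
        "html": "<p>Full text of the post.</p>",
    }
    print(atom_entry(example_post))
```

Generating the entry from the same metadata that drives the page keeps the feed and the website in sync, which matters because the feed is what the registration workflow reads.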

What does the future hold for scholarly blogs?

Researchers are increasingly sharing their early work, or commenting on others’ work, in less formal ways, and if you look at the growth in the number of blogs covered in the Rogue Scholar platform in just a couple of years, it seems like science blogging is here to stay and will only increase. We believe that this practice is an integral part of a healthy scholarly ecosystem, and it needs to be represented in the research nexus.

The Crossref input schema does not include a blog work type, but we are planning to add it as a subtype of posted content in our next schema update. We will discuss this and other plans and ideas in the metadata advisory group that we are currently forming.

If you have thoughts on the role of blogs in the public discourse around science and science communications, or you would like to share your experience of registering metadata for your blog, let us know by commenting below. Your comments will be threaded in our community forum for discussion.

Sprinting to Progress: Behind the scenes of our first metadata sprint

Luis Montilla — Mon, 23 Jun 2025 00:00:00 +0000

If you take a peek at our blog, you’ll notice that metadata and community are the most frequently used categories. This is not a coincidence – community is central to everything we do at Crossref. Our first-ever Metadata Sprint was a natural step in strengthening both. Cue fanfare! And what better way of celebrating 25 years of Crossref?

We designed the Crossref Metadata Sprint as a relatively short event where people can form teams and tackle short problems. What kind of problems? While we expected many to involve coding, teams also explored documenting, translating, researching—anything that taps into our open, member-curated metadata. Our motivation behind this format was to create a space for networking, collaboration, and feedback, centered on co-creation using the scholarly metadata from our REST API, the Public Data File, and other sources.

What have we learned in planning

The journey towards the event was filled with valuable lessons from our community. Our initial call received submissions from 71 people, which was exciting but presented the first challenge: we felt our event would work better with a relatively small group. An additional challenge was the enthusiasm of people from different regions of the world who were eager to join but needed support to attend in person. It reminded us how global our community is, and how important it is to think about different ways of making participation possible, especially in future events.

We also wanted to make sure that participation wasn’t limited by technical background. The selection process included a preliminary review by several members of our team to bring in a mix of perspectives and reduce bias. The event welcomed participants from all kinds of expertise levels, including colleagues who had never worked with APIs before. We sought to provide common ground for all with several group calls, where we presented introductions to our tools and used the opportunity to collect requests about tools, specific data, and questions from the participants that could enhance their preparation during the sprint.

At the Crossref Metadata Sprint

I’ve recently stumbled upon the following quote from a recognized data scientist:

Numbers have an important story to tell. They rely on you to give them a clear and convincing voice. (Stephen Few) ¹

It made me think that we can substitute metadata for numbers and the idea still holds. Surrounded by the paleontological collections of the National Museum of Natural History in Madrid, 21 participants and 5 Crossref staff came together on the 8th of April to work on twelve different projects. These ranged from improvements to our Public Data File formats and exploring metadata completeness, to tackling multilingual metadata challenges, understanding citation impact for retracted works, and connecting Retraction Watch metadata with metadata from other knowledge graphs.

The different teams that participated in the first Crossref Metadata Sprint.
The initial hours were the most energetic (but not chaotic!) as most of the participants had the chance to interact in person for the first time, ideas were exchanged, and pre-formed groups became more stable (however, one of the advantages of the format is that teams don't have to be rigid). Twelve coffee- and tea-powered projects started taking shape, a few of which are part of larger ideas under development. By the end of the second day, we saw:

Author changes between preprints and published articles.
Coverage of funding information by publisher.
Enriching citations with Crossref metadata.
Funding metadata completeness.
Improvement to the Public Data File.
Interoperability between Crossref DOIs and hash-based identifiers.
University of Tetova’s metadata coverage.
Retraction Watch data mash-up.
Perspective about AI-driven multilingual metadata.
Public Data File in Google BigQuery.
Visibility of retractions across citations.
Visualising Crossref geographic member data.

Our team worked as part of some of these projects, providing valuable insights and feedback to the participants. We ended the first session with a group dinner and re-energised for the second day, which started with everybody fully immersed in their tasks. As we approached the conclusion, the groups started preparing some quick slides for a short presentation (that you can find here).

Our team and the participants left excited and looking forward to the next opportunity to collaborate. We certainly see the potential of recreating these spaces, and we’ll work on future editions in a different location. All of the project summaries and notes will remain stored in our metadata sprint GitLab repo. Would you like to know more about any of these ideas? Let us know in the comments.

The first Crossref Metadata Sprint in a nutshell

Participants

None of this would’ve been possible without our enthusiastic participants. Huge thanks to everyone! Here is the full list of those who attended our inaugural Sprint:

Name
Blessing Abumere
Ana Bermejo
Robert Bianchi
Adam Buttrick
María de la Paz
Nicoleta Roxana Dinu
Jack Ekinsmyth
Castedo Ellerman
Álvaro Hontanar
Bianca Kramer
Anne L’Hôte
Cyril Labbe
Alexandra Malaga
Agon Memeti
Kaitlin Newson
Yağmur Öztürk
Dietrich Rordorf
Mohamed Selim
Sajad Sepehri
Ramazan Turgut
Iñaki Úcar

¹ https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs/

Evolving the preprint evaluation world with Sciety

Luis Montilla — Tue, 17 Jun 2025 00:00:00 +0000

This post is based on an interview with the Sciety team at eLife.

What is Sciety?

Sciety is a community-led initiative, developed by a team within eLife, that brings together expert evaluations of papers in one place. It focuses on preprints, preprint review and curation.

Can you tell us more about how Sciety works?

Sciety aggregates preprints from different sources to facilitate the processes of discovery and evaluation. Groups can triage the content and offer preprint reviews and endorsements, and individual researchers can learn about and share preprints of interest and their evaluations. We see the value of increasing trust in preprints, and transparency around the process of peer review, and we are trying to highlight this value and encourage more people to take part.

There are two key angles to Sciety: first, as preprints proliferate, we’re helping to make people more productive in their research by only surfacing the content they might be interested in and that they know they can trust. Second, we are also trying to get more people involved in the public review and curation of preprints. Contributors on Sciety are part of ‘groups’, representing organisations and other communities that facilitate some form of preprint evaluation. We’re broadly talking about peer review, but we also see the highlighting and summarisation of research. eLife, Biophysics Colab, MetaROR and Gigabyte, for example, are all providing some kind of review summary which Sciety shows as a ‘curation statement’. There’s also this additional layer of individual curation on top of it: we have people creating their own highlights in lists which they curate by topic; for example, ‘preprints by authors in the Global South’ or ‘Papers we want to discuss in our lab’. There is also an update feed available to users to help them keep track of all the reviews and endorsements from the groups they follow. We post these assessments and reviews alongside the preprint, which others can then use as an indicator of trust: why should one care about this particular study? As a given group – let’s say GigaByte – and its reviewers highlight the specific strengths of a preprint or reference an updated version, this feedback offers essential context for readers.

By making this evaluation and curation activity visible, Sciety clarifies who has reviewed the work and which groups have added it to their lists. These signals are invaluable for readers seeking reliable, curated research. The activity feed, which at present shows you all the added value in the form of comments, reviews and curation we are bringing from diverse sources, could be expanded to show different forms of curation activity in the future. Furthermore, other providers ingest and surface this information on their own platforms, such as Europe PMC and bioRxiv.

What is your main use of Crossref resources?

We started using the Crossref API to pull in the front matter of articles. Originally, these were only bioRxiv preprints, and then we expanded to various other preprint servers. We would aggregate reviews and build on top of all the preprint servers that have put the authors’ content out there.

We were mostly after a representation of the papers that we could link to: titles, authors, abstracts, and publication dates, plus a way to get to all of this from the DOI of a paper, the classic Crossref entry point. Initially, we used the public API, but the performance wasn’t high enough for what we needed and we switched to Metadata Plus. This immediately increased the speed at which we got data, to the point where we could compose pages on the fly and talk to Crossref simultaneously, even if we needed to pull 10 or 20 different paper titles at the same time to show a list of articles. It stayed that way for a long time. Next, we implemented caching – that is, we started storing temporary local copies to improve performance further. Eventually, we expanded the set of preprint servers we were interested in. It’s always been quite a good experience to be able to put in a DOI and use essentially the same code to pull out titles, author information and so on. Crossref does this great job of aggregating the world of content so that we don’t have to. The metadata standardisation via Crossref’s API saves us the need to write special code for every new preprint server.
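To illustrate the "put in a DOI and use the same code" idea, here is a minimal Python sketch, not Sciety's actual implementation. It assumes the public REST API route `https://api.crossref.org/works/{doi}` and the `title`, `author`, `DOI`, and `type` fields found in the `message` part of a works response. The helper names `works_url` and `summarise_work` are made up for this example, and the network call itself is left out so the parsing logic can be read on its own.

```python
from urllib.parse import quote

# Public REST API route for a single work (assumed here; Metadata Plus
# users would add their own credentials when making the request).
API_BASE = "https://api.crossref.org/works/"


def works_url(doi: str) -> str:
    """Build the works URL for one DOI, escaping unusual characters."""
    return API_BASE + quote(doi, safe="/")


def summarise_work(message: dict) -> dict:
    """Pick out the front-matter fields needed to list a paper.

    `message` is the "message" object of a works response. The same code
    applies regardless of which preprint server registered the DOI.
    """
    titles = message.get("title") or []
    authors = []
    for author in message.get("author", []):
        # Person authors carry given/family names; organisations carry "name".
        personal = " ".join(
            part for part in (author.get("given"), author.get("family")) if part
        )
        authors.append(personal or author.get("name", ""))
    return {
        "doi": message.get("DOI"),
        "title": titles[0] if titles else None,
        "authors": authors,
        "type": message.get("type"),
    }
```

Because the extraction depends only on the shape of the metadata record, adding another preprint server to a listing page would not require server-specific parsing code.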

By the end of 2023, we were interested in multiple revisions and versions of a single preprint. Because the scholarly world is moving on, we can now see cases where the updates to a manuscript produce multiple versions in bioRxiv, and these might eventually evolve into an article in eLife, Nature, or another journal. The publication history complexity of papers has been increasing and we started relying a lot on Crossref to trace the relationships and the different versions of a paper across time. There is some good support on the relationship metadata on Crossref APIs, where you can see that a preprint has a new version with a different DOI, or conversely, that a preprint has an older version. Or you can see that a preprint has become a journal article, or the journal article was originally a preprint – along with all the dates that accompany these different versions. And we can establish the time it took for a preprint to become a journal article. In some cases it can take years, which is not great, right? We don’t want science to be stuck and not relied upon for years. So it helps us to make our case that preprints are the evolution of publishing, that authors publish them and then the preprints evolve rather than being stuck between gates kept by journals.
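As a rough illustration of tracing these relationships, the Python sketch below reads the `relation` object of a works record and estimates the time from preprint to journal article. It assumes relation keys such as `is-preprint-of` and `has-preprint`, and the `posted` and `published` date fields with their `date-parts` arrays. The function names are hypothetical, and real records may lack some of these fields.

```python
from datetime import date


def related_dois(message: dict, relation_type: str) -> list[str]:
    """Return DOIs linked to this record by one relation type,
    for example "is-preprint-of", "has-preprint" or "has-version"."""
    relations = message.get("relation", {})
    return [
        item["id"]
        for item in relations.get(relation_type, [])
        if item.get("id-type") == "doi"
    ]


def to_date(date_field: dict) -> date:
    """Convert a {"date-parts": [[year, month, day]]} object to a date.

    Month and day may be absent in the metadata; they default to 1 here,
    which is a simplification made for this sketch.
    """
    parts = list(date_field["date-parts"][0]) + [1, 1]
    year, month, day = parts[:3]
    return date(year, month, day)


def days_to_publication(preprint: dict, article: dict) -> int:
    """Days between a preprint's posted date and the article's published date."""
    return (to_date(article["published"]) - to_date(preprint["posted"])).days
```

A caller would first look up the preprint, follow `related_dois(preprint, "is-preprint-of")` to the journal article record, and then pass both records to `days_to_publication`.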

What can you tell us about the use of preprints?

We have noticed an increase in the interest in how a paper evolves over time and the cross-links between different preprint expressions or journal articles. We’re now seeing enthusiasm from those who are trying alternative publishing models to bring reviewed scientific preprints to people faster, and there is also interest in the transparency of a journal. And I think that’s part of what the Crossref relationship metadata gives us.

For example, we collaborated on a paper aimed at enhancing the culture of preprint peer review. One of the things we observed was that it was published on an OSF preprint server, and then went on to be published in PLOS Biology. As we’d started this project to show the relationships between something that had originally been a preprint, we noticed that the connection between PLOS and OSF for that specific preprint was not explicit. So, we asked a colleague if this was something that could be done. And our contact at PLOS said, “yes, we’ll do this”. At the time, we were aware of Crossref’s intention to either make this more manageable or to do it in bulk. This also prompted another group on Sciety to explore whether they could do the same. Consequently, GigaByte and GigaScience, two other reviewing communities on Sciety, inquired with their publishing platform, Riverview, if they could do the same. Eventually, they realised there was a way to connect the dots through Crossref, and they also started doing it. So, there seems to be a lot of enthusiasm around this idea of making the relationships more explicit: we should show if something has been a preprint, because it’s important to the authors, and it’s important to show the transparency in the journey. That was a real-world example of something that we’re able to service through Sciety by using the Crossref metadata, and the community is responding in a very positive manner to that.

How has your experience been using Crossref services? What are you looking forward to seeing in the future?

The works endpoint is really 99% of what we have historically been interested in. We generally experiment by putting DOIs in the public API or trying to discover content in the API itself. The amount of data is so big that there are always different examples of what we seek. And we don’t have many performance problems now because we have adopted some aggressive caching: anything that comes from Crossref is typically cached for 24 hours.
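The 24-hour caching described above could look something like the following Python sketch. It is an in-memory illustration under stated assumptions, not the approach Sciety uses: the `TTLCache` class, its `fetch` callback, and the injectable `clock` are all invented for this example.

```python
import time
from typing import Callable

ONE_DAY_SECONDS = 24 * 60 * 60


class TTLCache:
    """Keep fetched metadata records for a fixed time before refetching."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float = ONE_DAY_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # function that retrieves one record by DOI
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, doi: str) -> dict:
        # DOIs are case-insensitive, so normalise the cache key.
        key = doi.lower()
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        record = self._fetch(doi)
        self._entries[key] = (now, record)
        return record
```

Passing the fetch function in keeps the cache independent of how the request is made, so the same class works whether records come from the public API or from Metadata Plus.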

For example, take a bioRxiv preprint that might have multiple versions available on bioRxiv itself, because it’s quite common for authors to update the preprint as they make new changes to it. With this context, an example of something we would like to see is supporting the preprint version number. So this is something that we could implement for bioRxiv over some specific preprint servers on Sciety. But in the end, as we expanded our set of preprint servers, we had to get rid of that, because there wasn’t a sustainable way to aggregate it across most servers, like we would do with Crossref. So there’s probably a space there for papers as living documents. And we certainly have an interest in preprint-specific metadata – that’s where we will place our bets.

Also, as part of the preprint review metadata group, which is something that formed out of the recent meeting with Europe PMC and ASAPbio, we’re trying to drive forward a recommendation and prototypes for more consistency in preprint review metadata. It’s quite exciting to be involved in this and, as you can see, Sciety is a place where we’re starting to pull all this stuff together. And like I say, it is a bit of a Wild West. There are so many things that are called a review, but in metadata, we know there are different terminologies. As people are saying, everyone should be commenting on preprints, everyone should be curating them, and we’re trying to make some sense of that.

Working on Sciety and exploring Crossref metadata to make preprint review more open and valuable has been a rewarding experience.

With thanks to Giorgio Sironi, former Tech Lead Manager, and Mark Williams, Product Manager, at eLife

Destacando nuestra comunidad en Colombia

Susan Collins — Thu, 05 Jun 2025 00:00:00 +0000

English version

Dado que Crossref celebra su 25º aniversario este año, nos gustaría destacar algunas de las regiones activas y comprometidas en nuestra comunidad global.

Durante los primeros 25 años, la composición de los miembros de Crossref ha evolucionado significativamente. De un puñado de grandes editoriales fundadoras, ahora tenemos más de 22.000 miembros de 160 países. Casi dos tercios de ellos se identifican como universidades, bibliotecas, entidades gubernamentales, fundaciones, editoriales académicas, e institutos de investigación.

Una de las regiones de mayor crecimiento es Latinoamérica, con más de 3.200 miembros, la mitad de los cuales se unió en los últimos cinco años. Colombia fue uno de los primeros miembros de Crossref en Latinoamérica y continúa siendo uno de los países más activos con 242 organizaciones.

“Creo que las organizaciones en Colombia siempre están abiertas a nuevos cambios, y a implementar nuevas estrategias que permitan mejorar o generar vínculos entre diversos actores, el programa Nexo podría verse de gran utilidad puesto que Colombia está uno de los grandes generadores de investigación en la región, y el poder conectar de una manera ágil y rápida toda una red de investigación va a representar grandes ventajas en los procesos”, – dice nuestro Embajador Juan Felipe Vargas Martínez, Cofundador y Director de Journals & Authors, en Medellín.

Una de las razones del aumento en la participación en Colombia es nuestro programa de patrocinadores. Los patrocinadores proveen apoyo a organizaciones más pequeñas que a menudo enfrentan barreras financieras, técnicas, y lingüísticas que les dificultan convertirse en miembros de Crossref. Uno de los primeros patrocinadores en Colombia, Journals & Authors, se unió en 2016, siendo de los primeros en Latinoamérica. Ahora tenemos cinco patrocinadores ubicados en Colombia, apoyando 114 miembros.

Nuestros patrocinadores también han sido aliados clave en ayudarnos a interactuar con la comunidad, facilitando numerosos webinars y apoyando nuestras reuniones presenciales en Colombia en 2019 y 2024. Su conocimiento de la comunidad editorial a lo largo del país y sus extensas redes ayudan a las organizaciones nuevas a aprender más sobre Crossref de manera accesible, y a crecer continuamente la participación con nosotros.

También tenemos embajadores altamente dedicados ubicados en Colombia que son fuertes promotores de la misión de Crossref: Nicolás Mejía Torres y Juan Felipe Vargas Martínez. A lo largo de los años, ellos han sido instrumentales en ayudar a organizar eventos presenciales y webinarios para miembros, así como también en representar a Crossref en eventos en toda Latinoamérica. Puedes aprender sobre nuestras discusiones en el resumen de los eventos más recientes en nuestro Foro Comunitario. Recientemente Juan Felipe y Nicolás participaron en la Feria Internacional del Libro en Bogotá donde presentaron una charla sobre los beneficios de los metadatos académicos abiertos.

Nuestra membresía en Colombia está conformada fundamentalmente por universidades, sociedades, e instituciones públicas. Casi todas las revistas dejan su contenido disponible abiertamente. La mayoría del contenido de revistas se publica usando la plataforma de publicación OJS de PKP - Colombia es el 8vo mayor usuario de OJS globalmente, y el segundo mayor en Latinoamérica.

“Entendemos que hay todavía mucho margen de uso de editoriales colombianas de Crossref.” Jaime Iván Hurtado, CEO & Fundador de Hipertexto-Netizen, un patrocinador de Crossref, reporta que “algunas hacen uso del DOI pero centradas en revistas tímidamente en los libros y poco en los capítulos de libros.” Hipertexto ha estado contribuyendo al incremento en el uso de identificadores persistentes para libros y capítulos de libros a través de sus herramientas y manejo estandarizado de metadatos.

Los miembros de Crossref a menudo conocen la importancia de los identificadores persistentes para su contenido, pero hay una necesidad de incrementar la conciencia sobre los beneficios y la importancia de incluir metadatos adicionales. Estamos conscientes de que muchos editores ofrecen su tiempo de manera voluntaria, lo cual puede limitar su disponibilidad para entrenamiento adicional y participación en eventos relacionados con la edición y las buenas prácticas para el manejo de metadatos. Queremos aumentar las oportunidades para el entrenamiento tanto presencial como remotamente, y nuestros patrocinadores y embajadores han sido aliados clave en la facilitación de estos eventos. En febrero de 2024 nos aliamos con nuestro patrocinador Biteca en un evento de dos días en Bogotá, en el que participaron más de 100 miembros. Hubo discusiones activas sobre los fundamentos de Crossref y el rol de los metadatos de calidad en la visibilidad de contenido, así como también presentaciones sobre la integridad y ética en la investigación y la publicación, con compañeros clave como COPE, PKP, Scielo, y DOAJ.

En Colombia no hay un requerimiento de usar identificadores persistentes (o no específicamente el DOI). Cada institución decide si usarlos de manera independiente, así que vemos con agrado tantos miembros de Crossref activos, registrando su contenido, y cada mes se unen más. Ellos reconocen el beneficio de los metadatos, así como también el ser parte de la comunidad de Crossref en general: “En Colombia, Crossref es un referente gracias al uso del DOI. Si bien en sus inicios este identificador se veía como otro requisito más que complicaba el trabajo de las editoriales, hoy es reconocido como una herramienta clave para mejorar la visibilidad y el impacto de las publicaciones. Asimismo, Crossref, a través de sus encuentros y recursos, brinda apoyo a los equipos editoriales al ofrecer pautas, herramientas e información valiosa que facilita la adopción de buenas prácticas y el cumplimiento de estándares de calidad” reporta Luz Ayda Becerra, Consultora de Innovación con nuestro patrocinador Biteca.

Las organizaciones tienen varias razones para convertirse en miembros de Crossref - la principal motivación es incrementar la visibilidad global de su contenido y, por lo tanto, incrementar el impacto de sus publicaciones. Los metadatos de Crossref son accesibles de manera abierta para todos en la comunidad. Cada mes tenemos millones de búsquedas en nuestra base de datos por parte de investigadores, bibliotecas, herramientas que perfilan autores, servicios de búsqueda, y muchos más. Otras partes usan estos metadatos para crear herramientas y servicios que incrementan la visibilidad y la recuperabilidad del contenido de los miembros.

Sin embargo, existen desafíos que los miembros aún enfrentan cuando trabajan con nosotros. El obstáculo más frecuentemente mencionado al trabajar con Crossref es el lenguaje. La mayoría de nuestros correos electrónicos, documentación y herramientas están en inglés, y a los miembros les gustaría tener la oportunidad de recibir soporte, recursos y correspondencia en español. Aquellos que trabajan con patrocinadores se benefician de soporte de esta manera. Estamos aumentando el número de oportunidades de entrenamiento remoto y webinarios en español, y nuestros embajadores han estado interactuando con la comunidad local para proveer recursos adicionales. A principios de este año, el primer miembro de nuestro equipo ubicado en un país de Latinoamérica se unió a nuestro equipo de soporte técnico, y ahora podemos proveer soporte en español (recursos como este aparecerán más frecuentemente ahora). Reconocemos que aun tenemos trabajo por hacer para que Crossref sea más accesible a las comunidades globalmente.

Nuestros miembros han sugerido que más eventos locales y presenciales serían beneficiosos. Y estamos de acuerdo en que las interacciones cara a cara son una manera clave para nosotros construir relaciones e incrementar la representación y visibilidad en las comunidades, y aspiramos a crear oportunidades de interactuar con nuestros miembros en todos los rincones del mundo.

Mostrar cómo se utilizan los metadatos puede resaltar los beneficios y la importancia de incluir metadatos adicionales. Varios de nuestros miembros y patrocinadores han solicitado entrenamiento adicional en español sobre el uso de nuestras APIs, lo cual les permitiría obtener y analizar elementos clave de los metadatos.

“Al especializarse en este tipo de tecnologías, puedo analizar y estructurar la información de manera efectiva, generando informes útiles para los editores. Esto facilita la toma de decisiones informadas sobre sus publicaciones, optimizando la gestión editorial y asegurando una mejor visibilidad e impacto de los contenidos académicos.” (Luz Ayda Becerra)

En años anteriores Crossref ha sido invitado a participar en webinars y eventos presenciales en Colombia, dado el interés en crecimiento y la conciencia de la importancia de los metadatos para la comunidad de investigadores y la visibilidad de las publicaciones.

Gran parte de la información en este reporte proviene de encuestas enviadas a nuestros miembros, patrocinadores, y embajadores en Colombia. Apreciamos toda la retroalimentación, comentarios y sugerencias que hemos recibido, y queremos continuar la colaboración e incrementar la interacción con la comunidad.

English Version

A spotlight on our community in Colombia

As Crossref celebrates its 25th anniversary this year, we would like to highlight some of the active and engaged regions in our global community.

Over the past 25 years, the makeup of Crossref membership has evolved significantly; from a handful of founding large publishers, we now have more than 22,000 members from 160 countries. Nearly two-thirds of them self-identify as universities, libraries, government agencies, foundations, scholarly publishers, and research institutions.

One of our fastest-growing regions is Latin America, with over 3,200 members, half of whom joined us in the past five years. Colombia was one of the early adopters of Crossref from Latin America and remains one of our most active countries with 242 organisations.

“I believe that organisations in Colombia are always open to new changes and to implementing new strategies that allow for improvement or the creation of connections between diverse actors. The Research Nexus program could be very useful since Colombia is one of the largest producers of research in the region, and being able to connect an entire research network quickly and efficiently will represent significant advantages in the processes”, – says our Ambassador Juan Felipe Vargas Martínez, Co-founder and Director, Journals & Authors, in Medellín.

One of the reasons for increased participation in Colombia is our sponsor program. Sponsors provide support for smaller organisations that often face financial, technical, and language barriers that make becoming a member difficult. Our first sponsor in Colombia, Journals & Authors, joined in 2016, one of our first in Latin America. We now have five sponsors based in Colombia, supporting 114 members.

Our sponsors have also been key partners in helping us engage with the community, facilitating numerous webinars and supporting our in-person meetings in Colombia in 2019 and 2024. Their knowledge of the publishing community across the country and extensive networks help new organisations learn more about Crossref in an accessible way, and continuously grow participation with us.

We also have very dedicated ambassadors based in Colombia who are strong advocates for Crossref’s mission: Nicolás Mejía Torres and Juan Felipe Vargas Martínez. Over the years, they have been instrumental in helping to organise in-person events and webinars for members, as well as representing Crossref at events throughout Latin America. You can learn more about our discussions from the summary of the latest event on our Community Forum. Most recently, Juan Felipe and Nicolás attended the Bogotá International Book Fair, where they gave a presentation on the benefits of open academic metadata.

Our membership in Colombia is made up primarily of universities, societies, and public institutions. Almost all journals make their content openly available. Most of the journal content is published using the OJS publishing platform from PKP. Colombia is the eighth-largest user of OJS globally and the second-largest in Latin America.

“There is still considerable scope for Colombian publishers to utilise Crossref.” Jaime Iván Hurtado, CEO & Founder of Hipertexto-Netizen, a Crossref sponsor, reports that “while organisations use DOIs most commonly for journals, there’s potential for greater use for books and chapters.” Hipertexto has been contributing to the increased use of persistent identifiers for books and book chapters through their tools and standardised metadata management.

Members often know the importance of persistent identifiers for their content, but there is a need to increase awareness of the benefits and importance of including additional metadata. We’re aware that many editors volunteer their time, which can limit their availability for additional training and participation in events related to publishing and metadata best practices. We aim to increase opportunities for training, both in-person and online, and our sponsors and ambassadors have been key partners in facilitating these events. In February 2024, we partnered with our Sponsor, Biteca, on a two-day event in Bogotá, attended by over 100 members. There were lively discussions on the fundamentals of Crossref and the role of quality metadata for content discovery, as well as additional presentations on research integrity and publication ethics, with key partners including COPE, PKP, Scielo, and DOAJ.

There is no requirement to use persistent identifiers (or specifically DOIs) in Colombia. Each institution decides whether to use them independently, so we’re delighted to see so many are active Crossref members, registering their content, and more are joining every month. They recognise the benefit of metadata, as well as being part of the Crossref community at large: “In Colombia, Crossref is a benchmark thanks to its use of the DOI. While initially viewed as yet another requirement that complicated the work of publishers, this identifier (and related metadata) is now recognised as a key tool for improving the visibility and impact of publications. Furthermore, through its meetings and resources, Crossref supports editorial teams by offering guidelines, tools, and valuable information that facilitate the adoption of best practices and compliance with quality standards,” reports Luz Ayda Becerra, Innovation Advisor with our sponsor, Biteca.

Organisations have various reasons for becoming members of Crossref – the main motivation is to increase the global visibility of their content and, therefore, to increase the impact of their publications. Crossref’s metadata is openly accessible and free for everyone in the community. Each month, we have millions of queries to our database from researchers, libraries, author profiling tools, discovery services and many more. Third parties use this metadata to create tools and services that increase visibility and discoverability of members’ content.

There are, however, challenges that members still face when working with us. The most frequently listed obstacle in working with Crossref is language. Most of our emails, documentation and tools are in English, and members would like the opportunity for support, resources, and correspondence in Spanish. Those working with sponsors benefit from their support in this way. For all, we are increasing the number of Spanish language online training opportunities and webinars, and our ambassadors have been engaging with the local community to provide additional resources. Earlier this year, the first staff member based in Latin America joined our technical support team, and we can now provide Spanish language support (resources like this will appear more frequently now). We recognise that we still have work to do to make Crossref more accessible to global communities.

Members have suggested that more local in-person events would be beneficial. And we agree - face-to-face interactions are a key way for us to build relationships and increase representation and visibility in communities, and we aspire to create opportunities to engage with members in all corners of the world.

Showing how metadata is used can highlight the benefits and importance of including additional metadata. Several of our members and sponsors have requested additional Spanish language training on using our APIs, which would enable them to obtain and analyse key metadata elements.

“By specialising in these technologies, I can effectively analyse and structure information, generating useful reports for editors. This facilitates informed decision-making regarding their publications, optimising editorial management, and ensuring greater visibility and impact of scholarly content.” (Luz Ayda Becerra)

Over the past several years, Crossref has been invited to participate in webinars and in-person events in Colombia, as there is an increased interest and awareness of the importance of metadata for the research community and the visibility of publications.

Much of the information in this report is taken from a survey sent to our members, sponsors, and ambassadors in Colombia. We appreciate all the feedback, comments, and suggestions we received, and we look forward to continuing our collaborations and increasing our engagement with the community.

Our annual open call for expressions of interest to join our board

Lucy Ofiesh — Wed, 14 May 2025 00:00:00 +0000

The Crossref Nominating Committee invites expressions of interest to join the Board of Directors of Crossref for the term starting in January 2026. The committee will gather responses from those interested and create the slate of candidates that our membership will vote on in an election in September.

Expressions of interest will be due Monday, June 9th, 2025

This is an exciting time to join the board, as we have a number of active projects underway. Our focus is on how our community and metadata can contribute to ensuring the integrity of the scholarly record. We are redesigning our content system to better serve the changing needs of our community. We’re broadening our metadata record to capture richer funding and institutional affiliations. New board members will be part of on-going discussions about how to make our fees simpler and more equitable. Additionally, we envision a future where the scholarly record prioritizes relationships between research outputs to build a holistic research nexus. The board helps guide this work.

About our board elections

The board is elected through the “one member, one vote” policy wherein every member organisation of Crossref has a single vote to elect representatives to the Crossref board. Board terms are for three years, and this year, there are five seats open for election.

The board maintains a balance of seats, with eight seats for smaller members and eight seats for larger members (based on total revenue to Crossref). This is an effort to ensure that the scholarly community’s diversity of experiences and perspectives is represented in decisions made at Crossref.

This year, we will elect four of the larger member seats (membership tiers $3,900 and above) and one of the smaller member seats (membership tiers $1,650 and below). You don’t need to specify which seat you are applying for; we will provide that information to the nominating committee.

The online election will open in September, with results announced at the annual meeting scheduled for October 22nd. New members will begin their term in January 2026.

About the Nominating Committee

The Nominating Committee reviews the expressions of interest and selects a slate of candidates for election. The slate put forward will exceed the total number of open seats. The committee considers the statements of interest, organisational size, geography, and experience.

James Phillpotts*, Oxford University Press, committee chair
Abiodun Falodun, University of Benin
Wendy Patterson*, Beilstein Institut
Chaerul Umam, National Library of Indonesia
Amanda Ward*, Taylor & Francis

(*) indicates Crossref board member

Board roles and responsibilities

Crossref’s services provide a central infrastructure for scholarly communications. Crossref’s board helps shape the future of our services and by extension, impacts the broader scholarly ecosystem. We are looking for board members to contribute their experience and perspective.

The role of the board at Crossref is to provide strategic and financial oversight of the organisation, as well as guidance to the Executive Director and the staff leadership team, with the key responsibilities being:

Setting the strategic direction for the organisation;
Providing financial oversight; and
Approving new policies and services.

The board is representative of our membership base and guides the staff leadership team on trends affecting scholarly communications.

The work of the board takes place in board meetings and board committees. The board sets strategic directions for the organisation while also providing oversight into policy changes and implementation. Board members join four meetings each year that typically take place in January, March, July, and November. The July meeting is in-person and may take place in a variety of international locations; travel support is provided when needed. January, March, and November board meetings are held virtually, and all committee meetings take place virtually. Each board member should sit on at least one Crossref committee. Care is taken to accommodate the wide range of time zones in which our board members live.

While the expressions of interest are specific to an individual, the seat that is elected to the board belongs to the member organisation. The primary board member also names an alternate who may attend meetings in the event that the primary board member is unable to. There is no personal financial obligation to sit on the board. The member organisation must remain in good standing.

Board members are expected to be comfortable assuming the responsibilities listed above and to prepare and participate in board meeting discussions.

Who can apply to join the board?

What does the committee look for?

The committee looks for skills and experience that will complement the rest of the board. Candidates from countries and regions not currently reflected on the board are strongly encouraged to apply. Successful candidates often have some or all of these characteristics:

Demonstrate a commitment to or understanding of our strategic agenda or the Principles of Open Scholarly Infrastructure;
Have expertise that may be underrepresented on the board currently;
Hold decision-making positions in their organisations;
Have experience with governance or community involvement;
Represent member organisations that are active in the scholarly communications ecosystem;
Demonstrate metadata best practices as shown in the member’s participation report.

The board is also encouraging Crossref members who are research funders to apply.

What does the application evaluation process look like?

Open call for board interest, May 14 to June 9th: Any active member in good standing can apply for a seat on the board. This includes direct members, sponsored members, and GEM members. Sponsoring organisations, service providers, and Metadata Plus subscribers who are not also members are not eligible to sit on the board.

Application review, June through August: Applications will be reviewed by our Nominating Committee. We also gather internal information about the member organisation, such as metadata habits, history with Crossref, and any previous experience in Crossref working groups or community initiatives.

We might also refer to external information to help the committee’s review, including LinkedIn profiles or member organisation websites and publications.

Brief interviews with final candidates, August: The committee will hold brief virtual interviews with the top candidates before finalising the slate of nominations.

Announcement of the slate and election, September: The committee will announce the final slate of candidates in September and the online election will begin, culminating at the annual meeting at the end of October.

How to apply

Please click here to submit your expression of interest by Monday, June 9th. We ask for a brief statement about how your organisation could enhance the Crossref board and a brief personal statement about your interest and experience with Crossref.

Please contact me with any questions at voting@crossref.org.

Notice of amendments to Crossref membership terms and bylaws

Amanda Bartell — Sun, 11 May 2025 00:00:00 +0000

In its March 2025 meeting, the Crossref board unanimously voted to update both the Crossref bylaws and the Crossref membership terms to:

Provide more clarity and alignment between our bylaws and membership terms, where they had become out of sync over the years.
Reflect previous board motions and bring both documents up-to-date with current processes for suspending and revoking membership, and reviewing those decisions.
Work towards being more explicit about what “Member Practices” should look like in terms of preserving the integrity of the scholarly record.

Link to updated membership terms and link to updated bylaws

The bylaw changes are effective immediately, and the updated version of the membership terms will come into effect on 11th July 2025.

In accordance with the 60-day notice period, we have emailed the Primary contact on all our active member accounts today. Note: Members do not need to do anything in response to these changes - by continuing to use our services after 11th July, they are accepting the latest version of the terms.

Changes to the membership terms

The membership terms will be updated on 11th July to be clearer on, among other things, the importance of accurate metadata, using DOI links everywhere, the all-important reference linking obligation, and the process for suspending and revoking/terminating membership. It also introduces the new concept of “Member Practices”, which a dedicated community committee will propose for board approval. More information about this will follow soon.

You can find the specific changes below, or take a look at this marked-up PDF showing the changes between the current (from June 2022) terms and the revised (July 2025) terms.

Topic	Section	Summary of Change(s)
Terminology	Various sections (e.g., 1, 2(i), 2(k), 5)	Streamlines some legal language to enhance clarity and readability.
Member Practices	2(a)	Establishes an obligation of Members to comply with Member Practices, to be established soon through a dedicated committee.
Unauthorised use of metadata	2(d)	Highlights the harmful impact of unauthorised use or deposit of metadata on Crossref, its Members, and the integrity of the scholarly record.
Reference linking	2(f), (g)	Updates the language referring to reference linking, and makes explicit Members’ obligation to maintain reference linking throughout membership, not only upon first joining Crossref. It also makes it clear that members should use DOI links wherever they communicate about any item with a DOI.
Displaying identifiers	2(h)	Strengthens Members’ obligation to display DOIs in accordance with Crossref’s Display Guidelines (by eliminating the “commercially reasonable efforts” qualifier).
Fees	3	Expands the definition of “Fees” to include all usage fees and fees for optional services, in addition to annual fees and Content Registration fees. Crossref’s right to suspend or terminate a Member’s account for non-payment extends to any of these fees.
Termination of Membership	9	Significantly revises the provision regarding termination of a Member’s membership by Crossref:
		Updates the bases for ‘for-cause’ termination, to include ongoing misrepresentations in a Member’s practices; misleading use or creation of DOIs; and failure to pay fees due (without the former 120-day minimum duration of nonpayment);
		Clarifies the distinction between suspension and termination (also referred to as revocation or expulsion) of a Member’s Crossref membership;
		Eliminates the existing procedures for automatic Board review of a termination or extended suspension. (Crossref’s bylaws have been amended to prescribe a new suspension/termination process and right to request Board review);
		Adds a termination trigger for cases where a Sponsor cancels its agreement with a Sponsored Member. (The member, of course, has the option to move to a new Sponsor, or re-join Crossref as an independent member).
Notice contacts	8(d)	Updates Crossref’s Notice contact; updates the list of required Member contacts.

Changes to the bylaws

Our bylaws have needed updating for a while, but since these seldom change, we’ve saved up a few changes, also to bring them in line with the revised membership terms.

We’ve now modernised the language, ensured that the bylaws match what’s in the membership terms, and we’ve added in motions that have been agreed by the board but not updated in the bylaws over the last few years. We’ve also updated the bylaws in line with the new membership revocation process in the new July 2025 membership terms. The new bylaws also allow for a new group of members to be created to help Crossref define Member Practices.

You can find a summary of the changes below, or take a look at this marked-up PDF showing all the changes to the bylaws.

Topic	Section	Summary of Changes
Terminology	Various sections	Eliminates gender-specific terminology, e.g. replaces “Chairman” with “Chair”.
		Makes minor clean-up edits (e.g. deletion of unused “Reserved” section and renumbering).
Membership Qualification	Art. I Sec. 1	Replaces “publishes” professional and scholarly materials with “produces” professional and scholarly materials to match the language in the already-current membership terms.
Non-Voting Membership	Art. I Sec. 2; Art. IV Secs. 7, 8; Art. VII Sec. 4	Reflects the establishment of a non-voting Member category as previously approved by the Board.
Membership Procedures	Art. I Sec. 3; Art. I Sec. 5	Clarifies that acceptance of new Members is delegable to Crossref personnel generally, replacing a narrow reference to the Executive Director.
		Eliminates superfluous procedural steps regarding Member resignation.
Suspension and Termination of Membership	Art. I Sec. 6	Significantly revises the provision regarding termination of a Member’s membership by Crossref:
		Updates the bases for ‘for-cause’ termination, to include various specific prongs (matching those already in the Member Terms), while maintaining the catch-all for conduct prejudicial to Crossref’s best interests.
		Authorises the Board to define standards and procedures for ‘for-cause’ terminations, or establish a committee (which can be comprised of both Board members and non-Board members) for that purpose.
		Specifies that Crossref staff is responsible for implementing the ‘for-cause’ termination standards.
		Eliminates the existing procedures for automatic Board review of a termination or extended suspension; specifies the Board’s authority to delegate discretionary appeals/review to the ExCo or other committee of Board members.
		Restates that temporary suspension may be used in lieu of, or in advance of, termination.
Annual Meeting	Art. IV Sec. 1	Updates language around the timing of the annual Member meeting:
		Replaces reference to the “second week of November” with “during the month of October or November”.
		Eliminates language regarding avoiding legal or religious holidays; given Crossref’s global footprint, this is not feasible.

Thanks for reading this far!

Don’t forget, members do not need to do anything in response to these changes - by continuing to register metadata after 11th July, they are accepting the latest version of the terms. But do let us know if you have any questions by emailing member@crossref.org.

Meet six winners of the first ever Crossref Metadata Awards

Kornelia Korzec — Wed, 07 May 2025 00:00:00 +0000

Marking our 25th anniversary, we launch the Crossref Metadata Awards to emphasise our community’s role in stewarding and enriching the scholarly record.

We are pleased to recognise Noyam Publishers, GigaScience Press, eLife, American Society for Microbiology, and Universidad La Salle Arequipa Perú with the Crossref Metadata Excellence Awards, and Instituto Geologico y Minero de España with the Crossref Metadata Enrichment Award. These inaugural awards highlight the leadership of members who show dedication to the best metadata practices.

Crossref exists to make scholarly communications better by making research objects easy to find, cite, link, assess, and reuse. Our members weave the research nexus: a rich and reusable open network of connections between works resulting from the scholarly process and the people and institutions engaged in it.

Rich metadata improves discoverability of and trust in published works. Many institutions now strive to turn towards open research information in their reporting, assessment and evaluation. And so we believe it’s time to give credit to members that are doing the best work in supporting others across the scholarly ecosystem with their metadata.

The awards presented today will be followed by a series of blog interviews, where the winners will share how they achieved their high level of metadata completeness.

Starting in 2025, we will hold the awards every other year.

Read on to get more acquainted with the winners, learn about other high performing organisations and overall trends in metadata practices we see at Crossref.

Recognising Metadata Excellence

Noyam Publishers is based in Ghana. Colleagues had the pleasure of meeting them in person, during the Crossref Accra event this March. Striving for visibility motivates Noyam’s high performance when it comes to metadata. With 57% coverage of key metadata elements across their records, they are a leader among the members in our Global Equitable Membership (GEM) program.

Among other GEM members who show high participation in the research nexus, we see more than 40% coverage of key metadata elements for the records registered by University of Sierra Leone Teaching Hospitals Complex in Sierra Leone, Queen Arwa University in Yemen, Kathmandu University School of Education in Nepal, and International Journal for Innovation Education and Research in Bangladesh.

GigaScience Press, based in Hong Kong, is the leader among small members (organisations with less than USD 1 mln of publishing revenue or expenses). Discoverability drives their high metadata standards, and GigaScience Press sees these as having advantages in terms of service integrations and development too. They are quick to credit the expertise of their technology partner, River Valley Technologies, as the strategic contributor to their achieving 82% coverage of key metadata elements across their records.

It’s worth highlighting that the competition among our small members was much closer than in any other category! Stichting SciPost (Netherlands) also show more than 80% coverage across their records, followed by Life Science Alliance, LLC (United States), National Institute for Health and Care Research (United Kingdom), and Universidad La Salle Arequipa (Peru), each of which achieved more than 70% metadata coverage across their registered works.

eLife leads among our medium members (organisations between USD 1 mln and 10 mln of publishing revenue or expenses) with 85% coverage of key metadata elements. They have shown dedication to metadata quality and consistently high performance over the years. They are also the first publisher to include Crossref grant IDs in their records, adopting the Grant Linking System.

Other medium-sized organisations to note are MDPI AG in Switzerland, and XMLink in South Korea – while there’s a significant gap to the leader, each of these organisations has more than 50% coverage of key metadata elements across their records.

It appears that large members (organisations with more than USD 10 mln of publishing revenue or expenses) struggle to achieve consistency in metadata quality across all of their records. Yet, we are delighted to recognise the American Society for Microbiology in the United States, who embarked on a large metadata quality improvement project several years ago. It continues to bear fruit, as we see 56% metadata coverage across ASM’s records. They’ve shared their experience on our blog already, so this time we’ll invite them to follow up with the latest updates on their metadata practices.

American Geophysical Union (AGU), Public Library of Science (PLOS), SAGE Publications, and Wiley, all based in the United States, are ASM’s closest runners-up. While the gap is significant, each of these organisations still has more than 40% metadata coverage across their records. PLOS has an impressive proportion of Crossmark-enabled works (99%), and American Geophysical Union and Wiley are registering a significant proportion of abstracts for their records (87% and 59% respectively).

It often takes time to hone new processes and learn about metadata practices, so we decided to recognise metadata excellence among our new members: organisations that joined Crossref within the past two years. Our inaugural award for excellence among new members goes to Universidad La Salle Arequipa Perú, who joined Crossref in May 2023, and have 71% metadata coverage across their records.

Rewarding Metadata Enrichment

Our members don’t just register their records with us – they also steward and maintain their metadata over time. As new technical capabilities and metadata elements become available, members have the ability to update their metadata. We decided to recognise the member who achieved the biggest transformation to their records in the past two years: Instituto Geologico y Minero de España, based in Spain, jumped from just over 1% to more than 40% metadata coverage for their records in the space of the past two years.

Others who made a jump of more than 30% in their metadata completeness in the past two years are Cabrera Research Lab (United States), Centro de Investigaciones Sociologicas (Spain), Bon View Publishing PTE (Singapore), Asociacion Colombiana de Neurologia (Colombia), Instituto Superior Tecnológico Almirante Illingworth (Ecuador), and Tashkent State University of Economics (Uzbekistan).

How did we select the winners?

Our Metadata Excellence Awardees have been selected on the basis of the overall highest coverage of metadata elements included in Participation Reports as of March 2025, and the Metadata Enrichment Award was based on the comparison between performance on the same criteria between March 2023 and March 2025. Participation Reports are openly available and provide information about the proportion of a given member’s records that include the following high-value metadata elements:

References
Abstracts
ORCID iDs
Affiliations
ROR IDs
Funder Registry IDs
Funding award numbers
Crossmark enabled
Text mining URLs
License URLs

The report also includes Similarity Check URLs. However, since Similarity Check is an optional service that attracts a separate fee – it wouldn’t be equitable to include it in our analysis.
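To make the selection criteria concrete, here is a minimal sketch of how an overall coverage score and an enrichment score could be derived from Participation Report figures. This post does not spell out how the ten elements are combined, so the unweighted mean used here is an assumption, as are the field names and the sample figures for a hypothetical member.

```python
from statistics import mean

# The ten elements listed above. Similarity Check URLs are deliberately
# left out, as explained in the text.
ELEMENTS = [
    "references",
    "abstracts",
    "orcid_ids",
    "affiliations",
    "ror_ids",
    "funder_registry_ids",
    "funding_award_numbers",
    "crossmark_enabled",
    "text_mining_urls",
    "license_urls",
]


def overall_coverage(report):
    """Unweighted mean of per-element coverage (assumed scoring rule).

    `report` maps element names to the proportion (0.0 to 1.0) of a
    member's records that include the element. Elements missing from the
    report count as 0; keys not in ELEMENTS are ignored.
    """
    return mean(report.get(name, 0.0) for name in ELEMENTS)


def enrichment(before, after):
    """Change in overall coverage between two snapshots of the same member."""
    return overall_coverage(after) - overall_coverage(before)


# Hypothetical snapshots for one member, two years apart.
march_2023 = {"abstracts": 0.1}
march_2025 = {name: 0.4 for name in ELEMENTS}
march_2025["similarity_check_urls"] = 1.0  # ignored by the score

print(round(overall_coverage(march_2025), 2))
print(round(enrichment(march_2023, march_2025), 2))
```

Under this assumed rule, a member with no Similarity Check subscription is not penalised, because that field never enters the mean.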

We encourage all members to periodically monitor their participation reports, and we offer frequent drop-in metadata health-check sessions, where we review the reports together and offer advice on making improvements in areas where our members experience challenges.

In a membership of more than 22,000 organisations, it’s difficult to recognise just one organisation as a model of best practices. There are many nuances that influence the performance and we would like to be transparent about some considerations we made in our awarding process.

First of all, we considered volume of publishing as a key variable, and decided to qualify organisations with a minimum of 20 items of registered content.

We also recognise that size matters – and decided to award our Metadata Excellence Awards in four categories corresponding with organisational size and resourcing.
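The two considerations above can be sketched as a simple eligibility and category check. The size bands are the ones quoted earlier in this post (under USD 1 mln, USD 1 mln to 10 mln, over USD 10 mln, plus the GEM program); reading those as the four categories, the handling of the exact boundary values, and all names in the code are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MIN_REGISTERED_ITEMS = 20  # qualification threshold stated above


@dataclass
class Member:
    name: str
    registered_items: int
    publishing_revenue_usd: float  # publishing revenue or expenses
    in_gem_program: bool = False


def award_category(member: Member) -> Optional[str]:
    """Return the assumed award category, or None if the member does not qualify."""
    if member.registered_items < MIN_REGISTERED_ITEMS:
        return None
    if member.in_gem_program:
        return "GEM"
    if member.publishing_revenue_usd < 1_000_000:
        return "small"
    # Assumption: exactly USD 10 mln still counts as medium.
    if member.publishing_revenue_usd <= 10_000_000:
        return "medium"
    return "large"


# Hypothetical members, for illustration only.
print(award_category(Member("Example Press", 150, 400_000)))
print(award_category(Member("Tiny Journal", 12, 50_000)))
```

The award for new members is a separate cut of the same data (organisations that joined within the past two years), so it is not modelled here.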

Beyond the winners – overview of good metadata practice across different types of works

The scholarly communications landscape is always evolving, and new types of content arise all the time. Crossref schema enables rich metadata collection about journal articles, books, book chapters, preprints, conference proceedings, technical reports, as well as grants, and more.

At this point, the most prolific way of sharing scholarship – at least judging by the number of records registered with Crossref – is the journal article. There are 112,982,290 journal articles in the Crossref database, and in 2024 alone our members created 6,747,031 journal article records with us.

When it comes to books (2,212,221 total records) and book chapters (22,892,785 total records), publishers with the richest metadata records include Universitatsbibliothek Kiel (Germany) with more than 50% coverage of key metadata elements across their book records, and 70% for their book chapters. RTI Press (US) also has strong metadata for books (52%), while Firenze University Press (Italy) has 56% metadata coverage across their book chapters. Incidentally, Universitatsbibliothek Kiel are also leaders in metadata for conference proceedings (53% metadata coverage of those records).

Preprints and posted content (including preprints, eprints, working papers, reports) are relatively new on the scene and growing rapidly – Crossref has 1,683,351 preprint records (413,742 registered in 2024). The richest metadata records for preprints belong to eLife (UK), who cover more than 50% of key metadata elements across their preprint records in Crossref. Springer Science and Business Media LLC (Netherlands) have 48% metadata coverage for their preprints, American Chemical Society (ACS; United States) has 46%, and UNISA Press (South Africa) and PeerJ (US) follow with 44% coverage.

The newest of record types that can be registered with us are grants. At present this is an early adopters domain, with 152,810 registered grants so far. The European Union (represented in Crossref by the Publications Office of the European Union) registered the most grants to date.

Beyond the winners – overview of coverage in key metadata elements

When it comes to the key metadata elements reflected in our Participation Reports, coverage varies widely. For example, overall 21% of records in Crossref have abstract metadata; 2,000 members include abstracts for all of their records, while 1,000 don’t include any. Deposition of ORCID iDs is growing but still very low, with only 10% of records including ORCID iDs.

Affiliation metadata, broadly sought after by many stakeholders in the scholarly ecosystem - not least because of its role as a key marker of trust - is growing steadily but slowly: only 16% of records included it at the end of March 2025. With recent improvements in our helper tools (especially the latest version of the record registration form), and the upcoming developments in other publishing software (notably the upcoming 3.5 version of OJS), which support affiliation metadata better – we’re expecting a significant improvement in the coming months.

As with affiliations, when research integrity judgements are concerned, another key element is the funding information. The growing interest in metadata among funders further strengthens the case for increasing inclusion of funder information in this way, ideally including Crossref grant DOIs that funders are registering in the hope of using the Grant Linking System to help their assessment and evaluation work. At the moment the space for improvement is vast, with only 6% of Crossref metadata including funder IDs and award numbers.

We support ROR IDs in both affiliation and funding metadata, but adoption among our members is slow. So far the top five contributors of ROR IDs to Crossref are Fonds de recherche du Québec, eLife, American Physical Society (APS), Optica Publishing Group, and Wellcome.

Licence metadata is currently included for 43% of records in Crossref, and we see that thousands of members don’t include it. Not all members realise that this is a practical challenge for their authors, as it hinders institutions and funders who seek to monitor compliance with their openness mandates.

Finally, references metadata is the lifeblood of the research nexus, supporting transparency and discoverability of scholarship. We’ve got 44% coverage of reference metadata across records registered in Crossref. While reference linking is a member obligation, including references in the metadata is a recommended best practice. The way references are recognised and included in works varies by publication type and discipline, which makes it harder for some members to provide it.

There’s an ongoing need to raise awareness about the role of metadata among the wider community, including editors and researchers. We have collaborated with practitioners, supporters, and users of metadata to develop relevant resources as part of the Metadata 20/20 initiative.

We make efforts to educate our members about best practices when it comes to registering their metadata with us and offer a range of support options, including technical support on our Community Forum. Recognising the leaders in metadata participation is part of that process too. With the upcoming blog series from our awardees, we hope to spur peer-to-peer learning to facilitate widespread improvements and to raise the profile of metadata quality among the community.

Metadata Advisory Group call for applications

Patricia Feeney — Fri, 02 May 2025 00:00:00 +0000

We’ve been accelerating our metadata development efforts and recently released version 5.4 of our metadata schema, and are planning to release version 5.5 (including support for multiple contributor roles and the CRediT taxonomy) this summer. We will also extend our grants schema based on the Funders Advisory Group work, and make progress on other changes as set out on our new metadata development roadmap.

As we work towards the vision of the rich and reusable open network of relationships connecting research organisations, people, things, and actions, dubbed the Research Nexus, our schemas need to change to accommodate the evolving landscape of research processes and communications.

In the past we convened the Metadata Interest Group that helped shape the current set of updates we’re now working through, including changes to names, expansion of support for abstracts, dates, and multilingual metadata. As we’ll soon move into new territory (support for subjects, keywords, and other metadata essential to developing a robust research nexus), we want to further enlist the support of our community to help shape the metadata we collect and the metadata best practices we promote.

We are inviting Crossref members, metadata users, and others with an interest in shaping metadata development at Crossref to apply to join our new Metadata Advisory Group.

The purpose of the group is to contribute your advice and insight to help shape our metadata development as we broaden the metadata we collect and outputs we support to better align with the Research Nexus. Group participants will help shape metadata development at Crossref, and will discuss potential new metadata to adopt, best practices, and the overall needs of metadata providers and users.

We’re looking for participants with experience with XML, JSON, and other metadata formats. We’ll cover a range of topics but we would particularly like to engage with those of you with an interest in emerging content types.

The Metadata Advisory Group will meet quarterly and we’ll accommodate multiple time zones as needed as we want participation to reflect the regional diversity of our membership.

If you’re interested, please submit an application!

Reflections from Crossref Accra 2025 - Strengthening open science and partnerships in Ghana

Johanssen Obanda — Tue, 29 Apr 2025 00:00:00 +0000

Crossref is a membership organisation, and it’s the global community of members that creates the Research Nexus together. Meeting our community locally is a highlight and an important learning experience. This year, we started by connecting with a growing community in Accra, Ghana - our first in-person event in the country, which is included in our GEM program. From 14 members in 2023 to 31 in 2025, our community in Ghana is blooming.

At its core, Crossref Accra 2025 was about showing up for the community in Ghana - listening, learning, and building together. On the 20th of March, we welcomed 66 participants: journal editors, university staff, librarians, and researchers. People who are doing the real work of making scholarly publishing happen in the region.

Photo: Participants from across Ghana’s research and publishing landscape.

We started the day with a walkthrough of Crossref’s services, then shifted into more tailored conversations - talking metadata quality, improving discoverability, and making Crossref tools work for the local context. The panel featuring AJOL, WACREN, and CARLIGH was a key moment. We heard honest reflections about journal sustainability, the barriers to indexing, and how Open Access can grow if local infrastructure is supported. Each organisation shared how they’re working to strengthen research communities and where they see Crossref fitting into that bigger picture.

Photo: Crossref Ambassador Richard Lamptey moderates a panel with WACREN’s Effah Amponsah, CARLIGH’s Mac Anthony Cobblah, and AJOL’s Kylie van Zyl on sustaining journals and advancing Open Access in the region.

During the dedicated listening session, participants spoke candidly about the cost burden of APCs, the over-reliance on foreign journals for recognition, and the uphill battle local journals face, from limited resources to slow workflows. There was a clear push for stronger local publishing platforms and more training around tools like OJS. People want technical clarity: How does Crossref fit into their workflows? What’s involved in registering metadata and DOIs? What’s the actual value? Many also voiced interest in strengthening relationships with indexing services, and connecting university presses more directly with Crossref.

The afternoon breakout sessions were hands-on. One group explored how to use the Participation Reports to check metadata completeness, while the other dove into using the Crossref API. People started swapping tips, asking questions, and brainstorming ways to improve how their institutions handle metadata. Several wanted to know how to automate more of their workflows through OJS, boost reference linking, and pull better reports from the Crossref system.

Photo: A collage of snapshots capturing activities at the Crossref Accra event.

Outside the main event, we also visited some of our members and stopped by the Association of African Universities. These visits gave us more time for deeper conversations about publishing workflows, ORCID uptake, metadata visibility, and the bigger picture of Open Access in Ghana. We heard a lot about the potential for more equitable partnerships and stronger local ownership of publishing infrastructure.

Post-event feedback made one thing clear: people want more opportunities to learn - more practical workshops, more guidance on using Crossref tools, and more support navigating the technical side of things. There’s growing interest in forming a local user group, a space to keep sharing, troubleshooting, and moving forward together. And the desire to improve indexing and visibility was a recurring theme. People see registering identifiers for content as an essential step on that journey. There’s also a broader concern about long-term sustainability and ethical publishing practices. Many journals are doing their best in tough conditions, and there’s a real appetite for honest conversations about quality, trust, and resilience.

Photo: Crossref staff and ambassadors with member Amy Asimah from Regional Maritime University. Pictured: Johanssen Obanda, Oumy Ndiaye, Evans Atoni, Patience Mbum, Audrey Kenni Nganmeni, Ginny Hendricks, and Richard Lamptey.

Crossref Accra 2025 reminded us how valuable these local gatherings are - not just for sharing tools and workflows, but for building lasting connections. We’re grateful to our Ambassadors and team who helped make it happen. There’s so much potential in Ghana’s scholarly community, and in West Africa more broadly, as we saw again at WACREN in Senegal a couple of weeks later. We’re committed to deepening our support across the region and to working with local partners to help that potential grow.

Enhancing DOI Accessibility for All Users

Patrick Vale — Mon, 28 Apr 2025 00:00:00 +0000

2025 Update

In 2022, we set out to update our DOI display guidelines with the intention to adopt the proposals in 2025. It’s important to note from the outset that we are not mandating any immediate changes to the DOI display guidelines. Instead, we are working with our community to co-create a solution that addresses the diverse needs of all users, rather than imposing technical changes that may not suit everyone.

Background

DOI links are the lifeblood of scholarly communication. They’re the canonical identifiers that enable researchers to find, cite, and assess academic work. In essence, they’re stable, reliable, and easy to use—provided you can see them. But what happens when a user can’t rely on visual cues?

The Accessibility Challenge

For users of screen readers and other assistive technologies, the full value of a DOI link can be lost. While sighted users benefit from the context surrounding a DOI link—such as the title, abstract, and other metadata—screen reader users often hear just the bare URL. This means they might not know what content the DOI link represents, leading to confusion and a diminished browsing experience.

The problem is compounded by the technical nature of DOI links. Being URIs (Uniform Resource Identifiers), they don’t naturally lend themselves to the same accessibility techniques as standard URLs. When we attempted to tweak DOI links directly, every change that improved accessibility for one group inadvertently hindered another. Whether it was a WCAG (Web Content Accessibility Guidelines) rule or an ARIA (Accessible Rich Internet Applications) attribute, a solution that worked in one area would break in another.

A Community-Driven Approach

Realizing that a one-size-fits-all fix wouldn’t work, we took a different approach - one that involved the community from the outset. After consulting with early adopters and attending an insightful session with the JATS4R accessibility group, it became clear that the answer lay in experimentation and iteration. Rather than modifying the DOI display guidelines immediately, we are developing a tool that enhances the user experience without disrupting the current standards.

It’s worth noting that this solution places the responsibility on the end user rather than on publishers and platform providers. However, by doing so, users can have a consistent browsing experience regardless of the platform they use to access scholarly content. This approach also serves as an important stepping stone toward a future publisher-provided solution—be it via accessibility-focused JavaScript or a mandated dual-link implementation—and any efforts to recommend or mandate such changes will benefit greatly from concrete evidence of the effectiveness and scalability of this approach.

Introducing the DOI Accessibility Enhancer

Here we share our DOI Accessibility Enhancer browser extension, first demonstrated at the recent Crossref Annual Meeting. Available now on the Firefox Add-on Store and the Chrome Web Store, this extension is designed to improve the experience of DOI links for screen reader users without altering the default behavior for sighted users.

How It Works

Scanning for DOI Links: The extension scans any webpage for DOI links.
Querying Metadata: Once a DOI is detected, it queries the Crossref REST API to retrieve the title of the corresponding scholarly work.
Enhancing the Link: The title is then injected as a screen-reader–only link. This means that when a screen reader user navigates to the DOI, they hear the title of the paper rather than the opaque URL.
Maintaining Visual Integrity: For sighted users, the original DOI link remains unchanged—visible, clickable, and easy to copy-and-paste.
Highlighting for Testing: An optional feature highlights updated links, making it easier for developers and testers to see the changes in action.
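
The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the extension's actual code (which runs in the browser): the function names, the `sr-only` CSS class, and the generated markup are all assumptions made for this example. The one real interface used is the Crossref REST API's works route, `https://api.crossref.org/works/{doi}`, whose JSON response carries the title as a list under `message.title`.

```python
import re
from html import escape
from typing import List, Optional
from urllib.parse import quote

# Matches doi.org links and captures the DOI itself, e.g. 10.5555/12345678.
DOI_LINK = re.compile(r'https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/[^\s"<>]+)', re.IGNORECASE)


def find_dois(page_html: str) -> List[str]:
    """Step 1: scan a page for DOI links, keeping first-seen order without duplicates."""
    found: List[str] = []
    for doi in DOI_LINK.findall(page_html):
        if doi not in found:
            found.append(doi)
    return found


def works_url(doi: str) -> str:
    """Step 2: build the Crossref REST API URL that returns metadata for one DOI."""
    return "https://api.crossref.org/works/" + quote(doi)


def title_from_response(payload: dict) -> Optional[str]:
    """Step 2, continued: pull the first title out of a parsed API response."""
    titles = payload.get("message", {}).get("title") or []
    return titles[0] if titles else None


def screen_reader_link(doi: str, title: str) -> str:
    """Step 3 (hypothetical markup): a visually hidden link that announces the title.

    The 'sr-only' class is an assumed CSS helper that hides content visually
    while keeping it available to assistive technology.
    """
    return '<a class="sr-only" href="https://doi.org/{}">{}</a>'.format(
        escape(doi, quote=True), escape(title)
    )
```

In a real run, the payload passed to `title_from_response` would be the parsed JSON fetched from `works_url(doi)`; the visible DOI link would be left untouched, with the hidden link added alongside it.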

Get Involved

This project is very much a community effort. The extension is open-source, and we welcome feedback and contributions via our GitLab repository, email, or Community Forum. Your real-life experiences and insights will drive future improvements, ensuring that our solution meets the diverse needs of all users.

Try It Out

If you’re using Firefox, head over to the Firefox Add-on Store and install the DOI Accessibility Enhancer today. If you’re a Chrome user, you can find the extension directly in the Chrome Web Store. If you use a screen reader you’ll experience the difference firsthand - and if you don’t, give it a try with VoiceOver enabled (Command-F5 on a Mac).

Together, we can advance scholarly accessibility and ensure that critical research remains discoverable for everyone.

Supporting Membership through the Sponsor Program

Susan Collins — Fri, 18 Apr 2025 00:00:00 +0000

Sponsors make Crossref membership accessible to organisations that would otherwise face barriers to joining us. They also provide support to facilitate participation, which increases the amount and diversity of metadata in the global Research Nexus. This in turn improves discoverability and transparency of scholarship behind the works.

Growing number of sponsors

Our first sponsors joined in 2008, but the program started to grow rapidly between 2012 and 2014, with the addition of sponsors in South Korea, Türkiye, Russia, India, and Ukraine. In 2015, we welcomed our first South American sponsor from Brazil, followed by more sponsors in Latin America starting in 2016, and our first sponsor in Indonesia in 2017.

As of December 2024, Crossref works with 124 sponsoring organisations that support 12,195 sponsored members.

In 2021, we updated the criteria for organisations to be accepted as sponsors, raising the bar to ensure that potential sponsors accurately and successfully represent Crossref in the community. We also paused the acceptance of new Sponsors from regions where such organisations are already prolific. By doing so, we can focus on growing the program in areas with the greatest need.

In 2024, we added eight new sponsors to the program; these included our first sponsor in Bangladesh (our first GEM sponsor), as well as sponsors in China, Kazakhstan, Pakistan, Türkiye, Tunisia, Iraq, and Kenya.

Our five largest sponsors, based on the number of members they support (as of the end of 2024) are:

Relawan Jurnal Indonesia, Indonesia - 3076 members
Associacao Brasileira de Editores Cientificos do Brasil (ABEC Brasil), Brazil - 1312 members
Tubitak Ulakbim DergiPark, Türkiye - 1248 members
NEICON ISP, Russia - 713 members
Kyobobook Center, South Korea - 419 members

The majority of sponsors are much smaller than this, looking after 25 or fewer Sponsored Members.

Each sponsor has specific criteria for what kind of organisations they work with. Some are dedicated to supporting organisations in a specific country or region, while others select by language, subject area, or use of a specific platform, e.g. OJS.

Our sponsors are distributed across all regions of the world, and we’re continuously working to forge networks with organisations in regions with the least coverage, to ensure scholarly communicators anywhere can join Crossref and contribute to the Research Nexus.

Asia Pacific: 22
Central and Eastern Europe: 29
Central and South Asia: 25
Latin America and the Caribbean: 24
North Africa and the Middle East: 3
Sub-Saharan Africa: 2
US and Canada: 5
Western Europe: 14
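
As a quick check on these figures, the short sketch below adds up the regional counts and works out each region's share. The variable names are ours; the numbers are taken directly from the list above.

```python
# Sponsors per region, as listed above.
sponsors_by_region = {
    "Asia Pacific": 22,
    "Central and Eastern Europe": 29,
    "Central and South Asia": 25,
    "Latin America and the Caribbean": 24,
    "North Africa and the Middle East": 3,
    "Sub-Saharan Africa": 2,
    "US and Canada": 5,
    "Western Europe": 14,
}

# The regional counts add up to the 124 sponsoring organisations reported for December 2024.
total_sponsors = sum(sponsors_by_region.values())

# Each region's share of all sponsors, as a percentage rounded to one decimal place.
share_by_region = {
    region: round(100 * count / total_sponsors, 1)
    for region, count in sponsors_by_region.items()
}

for region, share in sorted(share_by_region.items(), key=lambda item: -item[1]):
    print(f"{region}: {share}%")
```

The two African and Middle Eastern regions together account for 5 of the 124 sponsors, which is where the least coverage mentioned above shows up in the numbers.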

Currently, sponsored members represent 115 different countries, with the largest proportions from Latin America, South-eastern Asia, and Eastern Europe. Nearly two-thirds of sponsored members self-identify as universities, libraries, government agencies, foundations, scholar publishers, and research institutions.

To date, sponsored members have contributed 6.5 million works to the Research Nexus.

Importantly, sponsored members can fully participate in Crossref: they are stewards of their records (even if some choose to delegate this activity to their sponsor), they can vote, stand for election to our Board of Directors, and collaborate with others in the Crossref community, just like any other member.

Sponsors are key partners for us in making participation easier for organisations in their communities. They work with us to provide administrative, billing, technical, and local language support to the members they work with. Depending on the financial model, they may charge members for their services.

The technical support sponsors provide is often more tailored and quicker than what the Crossref team could offer. For example, sponsors can provide service in their local language using their preferred method (helpdesk, WhatsApp, phone, email), which varies widely by region. Where they charge fees, they tend to collect those in the local currency. Some sponsors even take care of all record registration for the members they support.

It’s important to note that sponsors can only support the participation of organisations that would otherwise be in the current $275 fee tier (or up to $500 for funders) if these organisations were to join independently. Regardless of the number of sponsored members, the sponsor pays one membership fee on behalf of them all, and then they also pay all the registration fees that are due on behalf of their sponsored members, which alleviates challenges related to paying in foreign currency. Overall, sponsors make Crossref membership more economical for the organisations that participate this way, and Crossref benefits from billing efficiencies.

In a recent survey of sponsored members (carried out in July 2023, with 204 responses from members working with 53 sponsors), the majority of sponsored members (88%) said that sponsors met their expectations and 85% are likely or very likely to recommend their sponsors to another organisation.

Respondents indicated that the aspects of working with a sponsor that were most valued are technical support (72%), financial assistance/no annual fee (37.3%), ability to pay in local currency (43%), and local language support (44%).

It’s important to note that sponsors often offer many non-Crossref services to members too, including website design, copy editing, typesetting, publishing platform setup, JATS XML markup, and assistance with submitting content to third-party databases.

Sponsors represent Crossref in the community. They also assist us in connecting with their communities locally. In 2024, we collaborated with Biteca for an event in Bogotá, and Relawan Jurnal Indonesia for a two-day event in Jakarta. Both sponsors advised on venues, promoted the event to the members they support, coordinated local guest speakers, and provided translation services as needed. We also collaborated with Hipertexto-Netizen on engaging our community at the Guadalajara Book Fair. The success of these events was in part due to our collaboration with each sponsor.

Ensuring quality experience for our members

We try to make sure that every sponsor we work with will be able to commit to helping our members long-term. We offer training too, with an expectation that they can disseminate the learning to their members. The majority of sponsored members report receiving some training from their Sponsors (with 70% in our survey saying they’ve received adequate training on all services, while only 3% haven’t received any so far). Most recently we engaged sponsors with the Participation Reports to help them improve metadata completeness for their members.

In 2024, we met with sponsors individually to review how things are going for them and their members, assessing member metadata quality and additional services, and inviting their feedback about the program and suggestions for improvements that Crossref could make.

We’ve learnt a lot about practices related to record registration and training, about business models, and especially about a whole range of attitudes and approaches related to metadata completeness. Some sponsors register content for all or some of their members, while others provide technical support but do not register the content directly for members.

Members who used OJS often had higher metadata completeness scores because of the ease of use and availability of the plugins. Some sponsors noted that many journal editors are volunteers and don’t have the time or financial resources to collect extra metadata or update existing metadata records; they collect only what is required to register an item. Several sponsors also reported a barrier in authors’ mindset: they don’t tend to see the value of including ORCID iDs or ROR IDs in their submissions. Somewhat surprisingly, we learned that not all members see the value in including references in their deposits, and some don’t wish to take the time to add them. This is a concern, as relationships created by references are a cornerstone of the Research Nexus and markedly support discoverability of the content.

Sometimes, sponsors are unable to continue to provide services, or they are unable to meet the obligations of being a sponsor and their accounts are closed. In the cases where a sponsor account is closed, we will work with their members to find an alternative sponsor when possible.

Similarity Check is an external service, provided in partnership with iThenticate, that is available to Crossref members at a more competitive price, and it is in demand among sponsored members too. Currently, 78 sponsors offer Similarity Check to their members (however, not all sponsored members working with these sponsors have elected to use the service).

Piotr Karwasinski of sponsor LIBCOM was pleased that “All the rules of Crossref are unified. Everything is the same for everyone - the same for big publishers as well as small. Equal for everyone.”

Costs can sometimes be a concern; sponsors in India and Algeria both noted that US$1 is a lot of money for some. In response, we pointed to the fee review being conducted as part of the Resourcing Crossref for Future Sustainability (RCFS) project.

In summary

As we move toward realizing our vision of a connected Research Nexus, building a network for the global community must include input from all of the global community. When Crossref began 25 years ago our first members were mainly from the United States and Western Europe, but today our membership is much more global and diverse. Though our membership has grown to more than 22,000 organisations around the world, we are not seeing significant membership growth from all regions.

In the last few years, almost half of our members came from Southeastern Asia, Eastern Europe, and Latin America combined. However, there is much slower growth in other regions, most notably Northern and Sub-Saharan Africa, and parts of Central Asia, with only 5% of new member applications coming from these regions collectively. We know there are organisations in those areas contributing to the scholarly record, but many continue to face financial, technical, and administrative barriers to becoming members.

The Sponsor Program is one of the avenues established to address and reduce barriers and to help facilitate membership and participation for all knowledge-sharing organisations worldwide. Ensuring it remains strong and successful requires collaboration, communication, and comprehensive training.

Request for proposals: Crossref website information architecture review

Lena Stoll — Thu, 17 Apr 2025 00:00:00 +0000

We are looking for an organisation to perform an audit of, and propose changes to, the structure and information architecture underlying our website, with the aim of making it easier for everyone in our community to navigate the website and find the information they need.

UPDATE, August 2025: We are partnering with Cazinc and Cactus AI Solutions on this work. Stay tuned for updates on the progress of this project over the coming months.

About Crossref

Crossref is a nonprofit membership organisation that exists to make scholarly communications better. We run open infrastructure to link research objects, entities, and actions, creating a lasting and reusable scholarly record that underpins open science and makes research outputs easy to find, cite, link, assess, and reuse.

Together with our 22,000 members in 160 countries, we drive metadata exchange and support nearly 2 billion monthly API queries, facilitating global research communication, for the benefit of society. Our members include research institutions, publishers, libraries, funders, government bodies, and other stakeholders in the scholarly communications ecosystem.

About the Crossref website

We launched the current website in 2016. A few years later, we custom-developed the current Documentation section, moving from a separate site (Zendesk, and prior to that HelpIQ). We subsequently launched a Discourse community forum and actively encourage self-service there. Despite these efforts, we still answered about 50,000 support emails in 2024.

We use the Hugo static site generator, and all the content, assets, and code are open in GitLab. We have dedicated staging and sandbox branches, and use staging for editing instead of the usual git merge requests, and sandbox for testing more substantial code or navigation changes.

We share the responsibility for editing across the teams, with a page owner/author denoted for each page. Most staff use VSCode for editing; we don’t have or need a CMS. We deploy changes to the live site around twice a week. Several custom shortcodes are in place, such as for tables and displaying related information based on tags, or for presentation elements like highlight boxes or columns. We host (many) images and files directly in the repository, rather than using a CDN. We use Algolia for site search, which was chosen because it can support multiple languages.

Current website structure

There are currently four main sections of the website:

Get involved: this landing page is the most up-to-date with our current positioning and messaging. The section includes how to join as a member and the ways you can participate, obligations and benefits; a welcome page for new members to get started; events and webinars like our annual meeting; special projects or campaigns that need landing pages; fees; programs such as for service providers and ambassadors; global equitable membership; code of conduct; and working groups (which are different from board committees).
Find a service: listing the purpose and value/benefits for each service, such as content registration, metadata retrieval/APIs/Search, Crossmark, Similarity Check, Grant Linking System, and some other quasi-services that require members to develop or enable something, like reference linking or the Open Funder Registry or ROR.
Documentation: following more-or-less our “managed member journey” pathway, this includes getting set up, how to create DOI suffixes, how to select the right tool for content registration, how to interpret the various reports that members receive, what to expect in terms of invoicing, schema library and best practices for metadata sharing incl. guidance on principles to follow and sample XML files to edit. Each ‘service’ then has its own documentation section too.
About: governance, including information about our board, committees, and bylaws. Financial information and annual reports. Staff pages, org chart, jobs, and policies incl. employee handbooks. History of Crossref and mission. Under the sub-heading “Operations & sustainability”, there is also detailed information about membership processes such as revocations, managing legal sanctions, member practices, and member offboarding.

Additionally, the website hosts our blog and allows users to sign up for our newsletter, which are two key ways in which we keep our community informed.

Project overview

End goal

We want to allow our community to self-serve with information about what Crossref does, how to become a member, how to use our tools, and how to participate in our programs and services. The Principles of Open Scholarly Infrastructure are central to how we operate, and we want the information about the how, what, and why of Crossref to not only be openly available, but also easy to discover and reuse.

Visitors to www.crossref.org should be offered the information that they are looking for quickly and intuitively. A reduction in the number of help-desk tickets we receive (in 2024 we answered 50,000 of them) would be an indication of an improved self-service website, as would lower bounce rates.

Scope and deliverables

At the end of this information architecture review project, we expect to have agreed on a set of recommendations for tackling the problem statements laid out in the appendix of this document, as well as a plan for how the recommendations should be implemented. This plan will form the basis for an implementation project in 2026. We encourage applications both from organisations who would also be comfortable taking on the implementation project and from those who feel their expertise is specific to the review project described herein.

Specifically, we expect the following deliverables:

Assessment of key user needs (through analytics and/or user interviews incl. editors)
Audit and analysis of current site structure and how it serves key user pathways
Recommendations for content re-architecture, navigation and search improvements
Strategy for taxonomy and/or tagging system
Strategy for documentation site setup
Strategy for information pathways between website, docs, community forum, ticketing systems
Recommended roadmap for 2026 implementation project
Nice to have: Wireframes or annotated sitemaps for future site layout

Problem statements

It is difficult to find information about our services. Even Crossref staff often use search engines to find a page on our website rather than navigating to it or using the built-in search on the website. It’s often not clear whether the information you are looking for is on the “Find a service” page or the “Documentation” page for a given service, and there is no consistent cross-linking between the two groups of pages. There is a search bar prominently placed on the home page, but the search currently only looks for direct matches between the search terms and page contents (with some declensions, stopwords, and fuzziness to allow for typos). We have limited tracking available in Algolia, but can see that in a 7-day span in March 2025, a large portion of searches (78%) returned no results.
It is difficult to navigate our website. The home page contains some quick links to key pages, but they are not very visible. In order to navigate the website from the home page, users have to expand a hamburger menu which takes up the whole page, and are then presented with an overwhelming number of options. Once users have left the home page, the way they navigate depends on which section of the website they find themselves in: all pages have breadcrumbs going back to Home, while only Documentation pages have a hierarchical sidebar. In order to switch between the basic groups of pages (Get involved, Find a service, Documentation, About), users have to use the global hamburger menu.
Our home page doesn’t do a very good job of explaining who we are and what we do. A lot of real estate is taken up by images and recent news items without much context. Bounce rates from the home page are high (65% as of March 2025).
Our user interfaces and reports are not easily accessible from our website. While we are not a SaaS organisation, there is an established pattern of being able to access an organisation’s services directly from its website (often via a login button at the top right). This is complicated by the fact that we don’t have one single frontend “platform”. In fact we don’t have a single page linking out to the various frontends and interfaces, nor do we have a consistent pattern of linking out to an interface from the documentation page describing how to use it.
Some of the pages and grouping of pages are outdated and don’t reflect our current priorities or ways of working anymore. For example, the Get involved section still features Special programs and Service providers quite prominently, but the cross-functional programs that shape most of our strategic work now (Co-creation and Community Trends, Contributing to the Research Nexus, Open and Sustainable Operations, Metadata Development) are not represented. Find a service strongly suggests we’re a service provider, whereas most of our services are enabling infrastructure, requiring members to build or act on something. Some more recently created pages don’t fit neatly into any of the current groupings: e.g., API Learning Hub can be found under Get involved and in the home page footer, but doesn’t really belong in either. We also have time-limited, special projects or campaigns like the 25th anniversary of Crossref or the Resourcing Crossref for Future Sustainability project, for which there isn’t a great home. Lastly, we want to host additional content on our website in future, such as our own staff publications; instructions on how to find our codebases and how to contribute to them; how to build technical integrations; how to report bugs; and general best practices in scholarly communications (e.g. in the context of our work on the integrity of the scholarly record), which is not really part of the documentation of our services.

Project budget and timeline

We have a maximum budget of $20,000 allocated to the information architecture review project. The projected timeline is as follows:

RFP issued: April 17, 2025
Final deadline for proposals: May 15, 2025
Shortlisted applicant interviews: May 2025
Appointment made: June 2025
Project kick-off: July 2025
Final deliverables due: October 2025

If you are interested in applying but don’t think this timeline is deliverable for you, please contact us to suggest what would be realistic for you or your organisation before applying.

Proposal submission requirements

Proposals, as well as any questions, should be submitted to Lena Stoll by 15 May 2025.

Please include the following in your proposal:

Company background and relevant experience with open-source static sites and mission-driven communications
Case studies or examples of comparable work
Your approach to the proposed project and how you would structure it
Team bios and roles incl. typical timezones
Timeline and milestone estimates
Proposed budget, including breakdown
Proposed cadence of check-ins, communications, milestones, and deliverables
Contact information

Proposal evaluation criteria

We will evaluate proposals based on:

Demonstrated understanding of our mission and community needs
Proven experience designing for multilingual and multinational audiences
Expertise in mission-driven business-to-business communications and information architecture
Quality of previous work and case studies
Value for money

We look forward to hearing from you!

The programs approach: our experiences during the first quarter of 2025

Helena Cousijn — Tue, 08 Apr 2025 00:00:00 +0000

At the end of last year, we were excited to announce our renewed commitment to community and the launch of three cross-functional programs to guide and accelerate our work. We introduced this new approach to work towards better cross-team alignment, shared responsibility, and improved communication and learning, and to make more progress on the things members need.

In line with the Crossref strategic agenda, the three programs focus on:

Co-creation and Community Trends (CCT): This program is responsible for interfaces such as reports/dashboards, record registration interfaces, connections and collaborations such as Open Funder Registry, ROR, ORCID auto-update, as well as OJS and other partner integrations. This program also includes the Crossref website and any front-end interfaces to support other programs. It includes initiatives aimed at upholding the integrity of the scholarly record and our tools in this area, such as Crossmark and retraction/correction tooling, and Similarity Check for text comparisons.
Contributing to the Research Nexus (CRN): This program manages and oversees all activities relating to contributing to the Research Nexus. A lot of the work in this program revolves around our REST API, but also includes our other APIs, incorporating external data sources like Retraction Watch and Event Data, building out metadata matching services with the new data science team, supporting the community of metadata users with API sprints and more modern options for retrieving metadata based on usage and need.
Open and Sustainable Operations (OSO): This program manages and oversees all activities related to making our operations more open, transparent, and sustainable. This program focuses on supporting and strengthening the core functions our members rely on and enabling future growth. It includes metadata deposit and processing, most apps (e.g. for managing titles), authentication, and architectural and infrastructural projects like moving from the data centre to the AWS cloud service. This program also includes modernising our operations in general, which covers not just technology but also finance and human resources, through projects like membership process automation, financial analyses, and business system integrations.

The approach we are taking is to support the work within the programs through (internal) cross-functional steering groups. Led by three program leads (who share updates on their programs below), three program steering groups meet regularly to discuss the topics and work that fall within the scope of each program. The steering groups consist of representatives from all teams within Crossref, which means every steering group has people from the community team, membership team, technical team, data science team, and operations and finance team, bringing all the perspectives and expertise needed to prioritise the next steps for Crossref and fostering broad knowledge sharing and shared responsibility.

Although the whole organisation contributes to these programs, they are coordinated by the Programs and Services team. The team was formed towards the end of 2024, and on the 1st of February, Helena Cousijn joined Crossref in the new role of Director of Programs and Services. Helena has a background in both product management and community engagement and is very excited to help Crossref shape the programs approach and work with all teams across the organisation to drive the strategic agenda forward!

If you’d like to keep an eye on the work that is happening within each program, you can find more information on the Crossref productboard.

Co-creation and Community Trends (CCT)

The mission of the CCT program is to build and foster relationships with our community and other services and organisations within it, so that Crossref can meet and anticipate community needs. Curiosity and listening are at the core of how we co-create to tackle emerging challenges, develop best practices, and explore new ideas for building the Research Nexus. We want our work to benefit all of Crossref’s diverse stakeholders - from our own colleagues and members to underrepresented communities in the wider scholarly ecosystem.

In the first quarter of 2025, our focus areas have been:

Improving our new record registration form, which already supports grants and was launched in beta for journal articles in 2024. For example, the form now has a built-in reference deposit feature. Join the conversation on the community forum for updates and feedback on this new helper tool.
Running a series of multilingual metadata health check webinars. There are more of these coming up throughout Q2, so it’s not too late to sign up for one if you are interested.
Integrating with Rogue Scholar to automate the assignment of DOIs to, and the archiving of, posts on this very blog.
Planning for the inaugural Crossref Metadata Awards - join our upcoming community call on 7 May to find out what this is all about.

In the coming months, we are hoping to tackle the following:

Kick off a project to review the information architecture of this website and look into how we can make our documentation and related information more helpful and easier to navigate.
Expand the record registration form for journal articles to allow easy editing of previously submitted records. This will allow us to sunset the long-deprecated Metadata Manager tool, as was first announced in 2021.
Begin building new record registration forms for more work types. Watch this space.
Explore options for supporting the integration of additional software systems with Crossref, building on our existing approach with OJS plugins, with a focus on open-source tools relied upon by our members for registering metadata.
Restore faceted search on Crossref Metadata Search. This feature was disabled in 2022 following intermittent performance issues. We believe recent improvements to Metadata Search will allow us to bring some filters back, although we will need to start small so as not to overload our systems with these more complex queries.

Contributing to the Research Nexus (CRN)

The research nexus is a rich and reusable open network that represents scholarly activity. It consists of connections between research organisations, people, things, and actions; it’s an evolving model of the scholarly record that the global community can build on forever for the benefit of society.

Our metadata is already a contribution to the research nexus, but there is much more we would like to do. Our next steps will be to consolidate our existing data and services, and build the technical capacity, partnerships, and knowledge to enhance our contribution with new relationships. Some parts of our data storage and workflows don’t yet have the flexibility to fully capture all types of research objects and how they are connected.

To support this process, the main priorities in the program are:

Collaborate with our community. We want to get to know users of our metadata better and work more collaboratively alongside them. Also, we seek partners to contribute new data sources that will enhance our metadata with additional relationships.
Share the research nexus vision. We know that we aren’t alone in developing the research nexus, so we will reach out to others with a similar vision and identify where we have common goals.
Maintain our technology. We have already identified technical improvements we can make to our REST API, and we need to keep on top of monitoring and fixing bugs. We also need to build capacity for new types of data and relationships. Our other endpoints, such as the XML API and forwardlinks (for citations), need maintenance and are likely to be affected by a planned redesign of our core architecture.
Build a new matching service. Identifying relationships between metadata records is a key part of the research nexus. We have already improved reference matching over the years, and we’re looking to implement funding, affiliations, and version matching next. We’ve carried out research on several types of matching and are looking at building a new service to handle it in production.

In the first quarter of 2025, we’ve been working on:

Schema changes, making the first significant updates to our schema for several years, including adding the capacity for depositing ROR IDs for funding organisations in funding metadata.
Delivering Retraction Watch retractions via the REST API, integrated with member-supplied retraction/correction data.
Getting the community involved and understanding needs, planning a sprint and various workshops.
Plenty of under-the-hood updates to the REST API, and more significant upgrades to come later this year.

Next up, we will:

Plan and build out the new matching service.
Improve representation of some metadata in the REST API, including Crossref members, journals, and typed citations such as data citations.
Update the grants schema to extend the award types and respond to new funder member requests.
Add contributor roles to the schema, including CRediT.
Ask our community about metadata retrieval, including the various APIs and the Metadata Plus subscription service.
Upgrade elements of the REST API and optimise the underlying technical infrastructure.

Open and Sustainable Operations (OSO)

The OSO program is centered on transparency and sustainability of our technical systems and our business and people operations. We focus on maintaining critical systems and operations and ensuring their security, addressing technical and operational debt, and controlling or reducing costs - to Crossref, our community, or the environment. We’re always keen to tackle projects to automate repetitive and manual tasks – of which we have many – and pay down technical debt, being as open and transparent as possible along the way.

Our most recently completed work includes:

Moving from Oracle to an open-source database, PostgreSQL. This work aligns with the POSI principles and sets us up for a more robust, reliable, and modern infrastructure.
Implementing metadata schema changes for deposit submission and processing, so we can now accept ROR IDs in funding metadata, as well as the changes in the latest schema version (5.4.0), which include the new ability to label references with a type (such as dataset, software, blog post, article, etc.).
Automating parts of the process to keep Sponsor information on our website up to date and make it easier to search, so our community can find relevant and accurate information about our Sponsors and how to work with them, and our membership team spends less time keeping the website current.

Ongoing work in our program includes:

Moving from a physical data center into the cloud (AWS). The PostgreSQL migration was the first step needed to enable our move to the cloud, which will allow us to operate more sustainably and efficiently.
Automating new member setup in our systems, which is largely a manual process now.

And coming up are:

Making changes in our core system to accept the upcoming 5.5 metadata schema version.
Extracting billing code from our main codebase, to set up as its own service. This will allow us to simplify our code and make it easier to maintain. We’ll also be implementing the changes to billing enacted as part of the Resourcing Crossref for Future Sustainability program (TBD!).
Holding a “systems workshop” in April, to understand how our current system(s) are and aren’t meeting staff, member, and community needs, and how we might go about building the open, sustainable Crossref system of the future.

What have we learned so far?

Internal communication

One of the reasons to implement a programs approach was to improve internal communication across the organisation. With all teams being represented on all steering groups, everyone is in the loop when decisions are taken. We see that this way, people feel more connected to the strategic agenda and, importantly, the ‘why’ is clearer to people. It is easier to get perspectives from across the organisation because contributing to these conversations is now part of people’s day jobs and so it’s easier to ask for their time. We are still looking to improve how we facilitate group discussions and decision-making to ensure we make the most of the program steering groups.

Planning and delivery

Working closely with people from across the organisation has helped with more effective planning. A closer collaboration between program leads and developers makes the delivery of new features and functionality more accurate and predictable. With the community and support teams also being part of the conversation, they can plan related comms and support/documentation efforts in a timely manner. So far, it has also been easier to get more things delivered. We have some big projects coming up this year that will be a good test for the programs approach!

Cross-cutting topics

The implementation of a cross-functional approach facilitates discussions around cross-cutting topics, but also leads to the question of how cross-cutting topics fit within a specific program! Maybe you already noticed that work on metadata schema 5.4 and the planned work on 5.5 are included under both Contributing to the Research Nexus and Open and Sustainable Operations in the update above. Because metadata development impacts many of our systems, work was needed within all programs to enable these changes - the input, the output and the interfaces. Later this year, we’re planning to share some visuals that better explain which projects sit with which program and how we deal with cross-cutting topics.

Alignment

One of the most important things for the approach to be successful is that people are bought in and willing to participate and communicate. For cross-organisational alignment, a culture needs to be in place (or developed) where people are willing to collaborate and be open and transparent about their work. In a practical sense, we are still looking at how we can better align our code bases with the current programs so that it is easier to develop the relevant expertise within the programs.

We hope to see many of you at our upcoming community call on 7 May. Please register to join as we discuss some of the work included in this update.

Version 5.4.0 metadata schema update now available

Patricia Feeney — Wed, 19 Mar 2025 00:00:00 +0000

This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.

What is in this update?

Publication typing for citations

This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.

Support for version numbering

Members can now supply a version number across all relevant record types, including journal articles, book chapters, conference papers, posted content/preprints, datasets, reports, standards, and dissertations. The versioning update also includes an optional description field.

Members who version content are encouraged to register a new DOI with each version and supply the ‘isVersionOf’ relationship to connect versions to each other, facilitating the Research Nexus and allowing members to avoid additional content registration fees, which don’t apply for versions.

Preprint status

This is specific to the ‘posted content’ record type and comes as a result of the recommendations of the Preprints Advisory Group. The new status field allows repositories to flag a preprint as ‘withdrawn’ or ‘removed,’ a situation specific to posted content.

There are some other minor updates as well, including:

An expansion of the language codes supported by a language attribute.
Additions to the archive locations we collect. Our membership terms ask members to archive their content where possible, ensuring their DOIs are able to resolve to the content persistently, and we ask that the archive(s) they use are identified in the metadata records registered with us.
We’ve increased the number of ISBNs supported per item from 6 to 100.

If you would like to begin using this schema, a brief transition guide is here. A full set of schema files is in our GitLab repository, and more information is available in our website documentation for schema 5.4.0.

What’s next?

We’ve already begun working on our next update, which will be an expansion of contributor roles. We’ll allow multiple contributor roles instead of the single role we currently support, and we’ll add ‘corresponding author’ and ‘other’ to the Crossref role vocabulary. We will also be adding full support for CRediT.

We’re also hoping to fit in a remodeling of our group contributor (currently labeled ‘organisation’ in our input schema) in the next update, and I would appreciate feedback on this planned update.

More changes are planned, including an update to our grants schema, and expanded support for abstracts. We’ll be circulating details about those updates soon.

Join us for the Mid-year Community Call on 7th May to hear more!

2025 public data file now available

Martyn Rittman — Wed, 12 Mar 2025 00:00:00 +0000

Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 22,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.

Our metadata is used by thousands of services, researchers, and other organisations. We make it openly available through our APIs, which can be used to obtain a subset of records. If you want to work with our full corpus, the best way is to get a copy of the public data file and update it via the REST API with any new records created or changed since its release.

By providing an annual copy of the full corpus, we also expand the ways in which the metadata can be used and interrogated. It is ideal for groups using large samples of the scholarly record, such as metaresearchers or research integrity experts. You can find examples of the public data file used in research on journal editorial practices and in projects investigating gaps in the scholarly record.

How to access the public data file

The total size of the file is 197 GB and it is available in JSON-lines format. We also provide an experimental tool to convert the file to an SQLite database. Before downloading the full dataset, you may wish to download the sample dataset containing 100 files (with 100 records in each, around 24 MB). This is a randomly sampled subset of metadata records and can be used for prototyping and development.

To get a copy of the annual data file you can access it directly via https://doi.org/10.13003/87bfgcee6g, or get the sample dataset and previous public data files from Academic Torrents. We make a donation to Academic Torrents to support their work, which allows the data to be accessible in this way. Some organisations have reported policies that prevent access to torrents, so we provide a copy that can be downloaded from AWS, which requires an AWS account and a small payment to cover the data transfer costs. You can find the details about access here.

We have some tips for working with the public data file. If you would like to have access to monthly snapshots of the whole corpus, along with higher API rate limits and other benefits, you can subscribe to Metadata Plus.

What’s different this year?

This year’s public data file contains an additional 9 million records, and many updates to previously deposited records. The formats and method of access are the same as last year, except that it uses JSON lines, meaning that each metadata record is on a single line and the file suffix is jsonl instead of json. The records have been sorted by DOI, meaning it should be easier to navigate.
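Because each record sits on its own line, the file can be processed as a stream rather than loaded into memory in one go. The following is a minimal Python sketch under that assumption. The function names and sample rows are hypothetical, and it assumes each record carries `DOI` and `type` fields as in REST API output.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one metadata record per non-empty JSON-lines row."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_types(lines: Iterable[str]) -> Counter:
    """Tally records by their 'type' field, one record at a time."""
    return Counter(rec.get("type", "unknown") for rec in iter_records(lines))


# Hypothetical sample rows standing in for lines of a .jsonl file.
sample = [
    '{"DOI": "10.5555/example-1", "type": "journal-article"}',
    '{"DOI": "10.5555/example-2", "type": "posted-content"}',
    '',
    '{"DOI": "10.5555/example-3", "type": "journal-article"}',
]
print(count_types(sample))
```

With a real file, an open text-mode file handle can be passed in place of `sample`, since iterating over a file handle also yields one line at a time.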

A change this year is that the file does not contain aliased DOIs, which are DOIs that are redirected to another DOI. Aliasing is necessary on rare occasions, for example when two DOIs are registered for the same content. Previously we haven’t indicated aliasing in the REST API and public data files; this year only the prime DOIs (the ones to which they are redirected) are included. This makes statistical analysis of the metadata more accurate, but beware that it may give different results in cases where many aliased DOIs were previously counted. See this community forum post for more details.

The file also contains retractions from the Retraction Watch database, which was acquired by Crossref in September 2023 and recently integrated into the REST API.

If you have questions, want to let us know how you will use the metadata, or want to discuss anything on the topic of retrieving Crossref metadata, head to our community forum. From there, you can also keep updated about changes to our schema and APIs.

Come ROR with us: Using ROR IDs in place of Funder IDs

Patricia Feeney — Wed, 05 Mar 2025 00:00:00 +0000

Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organisations.

As you probably know, the Research Organisation Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite, and Crossref, launched in 2019, that fulfills the long-standing need for an open organisation identifier.

In 2023, we shared our plan to transition the Open Funder Registry into ROR. More recently, we announced that we were planning to update our schema so that it is possible to collect ROR IDs where we currently collect Funder IDs such as in the funding metadata section for works and funder section for grants. Now that we have completed this work, Crossref members can start depositing ROR IDs where they would normally deposit Funder IDs. This update also means that the community, including funders, service providers, researchers, and data scientists can retrieve this metadata via our API.

So come and ROR with us and start depositing ROR IDs for both researcher affiliations and funding organisations.

Open Funder Registry-ROR transition

This is of course a significant first step in the Open Funder Registry to ROR transition.

We’ve always said that we would continue supporting Funder IDs in our schema and in our tools and services until the community is ready to transition - and we will. In the last year, Crossref and ROR conducted a series of Open Funder Registry user interviews to help us understand how it was being used and identify practical challenges to this transition in our members’ workflow (thank you to those who took part, it was incredibly useful!).

One major takeaway from this consultation was around the pivotal role that peer review management systems played in the Open Funder Registry-ROR transition. We look forward to seeing more service providers integrating with ROR in the future. If you are a service provider and are ready to integrate with ROR, drop support@ror.org an email.

Including ROR IDs in Crossref metadata

If you are ready to begin including ROR IDs in your funding metadata, you only need to include the ROR ID itself to identify a funder.

For example:

<fr:program name="fundref">
 <fr:assertion name="ror">https://ror.org/00fq5cm18</fr:assertion>
 <fr:assertion name="award_number">10.3030/725840</fr:assertion>
</fr:program>

Examples of more complex combinations of funding information are available in our documentation. This update has been made across all schema that support funding metadata.
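For members who generate deposit XML programmatically, the funding fragment above could be produced along the following lines. This is a minimal sketch, not Crossref tooling: the `build_funding` helper is hypothetical, and the namespace URI bound to the `fr:` prefix is an assumption that should be checked against the schema files you deposit with.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the fr: prefix in the deposit schema.
FR_NS = "http://www.crossref.org/fundref.xsd"
ET.register_namespace("fr", FR_NS)


def build_funding(ror_id: str, award_number: str) -> str:
    """Return an fr:program fragment that identifies the funder by ROR ID only."""
    program = ET.Element(f"{{{FR_NS}}}program", {"name": "fundref"})
    ror = ET.SubElement(program, f"{{{FR_NS}}}assertion", {"name": "ror"})
    ror.text = ror_id
    award = ET.SubElement(program, f"{{{FR_NS}}}assertion", {"name": "award_number"})
    award.text = award_number
    return ET.tostring(program, encoding="unicode")


# Values taken from the example in this post.
xml_fragment = build_funding("https://ror.org/00fq5cm18", "10.3030/725840")
print(xml_fragment)
```

No funder name is written, in line with the identifier-only approach described in this post.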

Our grants schema has recently been updated to version 0.2.0 to support ROR IDs in place of funder identifiers as well. As with funding metadata, only the ROR ID needs to be supplied within the record:

<funding amount="750" currency="USD" funding-percentage="75" funding-type="APC">
 <ROR>https://ror.org/02twcfp32</ROR>
 <funding-scheme>Sofa Lending Programme</funding-scheme>
</funding>

Previously, a funder name was collected alongside the funder identifier for both grant records and funding data. To avoid redundant, incorrect, or conflicting metadata, we now accept the identifier alone, because each ROR ID already has a metadata record. The organisation name exists within the record in the ROR registry, and the ROR record is the authoritative source of the name.

ROR IDs in JSON outputs

We have an existing legacy practice of representing Open Funder Registry IDs as just a DOI, but ROR IDs are represented in the JSON outputs as a full URL with id-type “ROR”, for example:

Funding metadata


 "funder": [
   {
     "award": [
       "10.3030/725840"
     ],
     "id": [
       {
         "id": "https://ror.org/02twcfp32",
         "id-type": "ROR",
         "asserted-by": "publisher"
       }
     ]
   }
 ],

Grant funder information


"funding": [
  {
    "type": "infrastructure",
    "award-amount": {
      "amount": 750.0,
      "currency": "USD",
      "percentage": 75
    },
    "funder": {
      "id": [
        {
          "id": "https://ror.org/02twcfp32",
          "id-type": "ROR",
          "asserted-by": "publisher"
        }
      ]
    }
  }
]
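For metadata users, pulling ROR IDs out of the funding metadata shown above might look like the following Python sketch. The `funder_ror_ids` helper is hypothetical, and the record is trimmed to the fields shown in this post; real records contain many more fields.

```python
from typing import List


def funder_ror_ids(work: dict) -> List[str]:
    """Collect ROR IDs from the 'funder' list of a work record."""
    ror_ids = []
    for funder in work.get("funder", []):
        for identifier in funder.get("id", []):
            if identifier.get("id-type") == "ROR":
                ror_ids.append(identifier["id"])
    return ror_ids


# Record trimmed to the fields in the funding metadata example.
work = {
    "funder": [
        {
            "award": ["10.3030/725840"],
            "id": [
                {
                    "id": "https://ror.org/02twcfp32",
                    "id-type": "ROR",
                    "asserted-by": "publisher",
                }
            ],
        }
    ]
}
print(funder_ror_ids(work))
```

Filtering on `id-type` rather than on the shape of the `id` string leaves room for other identifier types appearing in the same list.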

If you have any questions or feedback, get in touch with us at support@crossref.org!

The GEM program - Year Two 2024

Susan Collins — Thu, 27 Feb 2025 00:00:00 +0000

We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organisations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.

The program began in January 2023 with 214 existing members, and 131 more joined throughout the year. In 2024, we saw 127 organisations joining via the GEM program, bringing the total number of participants to 458. We welcomed our first-ever members from Sierra Leone and Honduras, as well as our first Sponsor in Bangladesh (Sponsors are organisations that work with us to provide administrative, billing, technical, and local language support to the members they work with).

Of 458 organisations participating in the GEM program, 380 are independent members, 77 are sponsored, and there is one sponsoring organisation. To date, these members have contributed over 279,000 works to the Research Nexus, our concept of a fully connected global scholarly ecosystem.

Though we have Sponsors based elsewhere, working with members who are in GEM countries (e.g. PKP), we will continue to consult with our ambassadors and other partners to identify potential new sponsors that are based in GEM countries.

Number of Crossref GEM members by country:

GEM Country (Alphabetically)	Total No. of Members
Afghanistan	17
Bangladesh	120
Benin	5
Bhutan	6
Burkina Faso	4
Burundi	2
Cambodia	8
Central African Republic	1
Congo, Democratic Republic	15
Ethiopia	13
Ghana	27
Guyana	2
Haiti	1
Honduras	1
Kosovo	8
Kyrgyz Republic	23
Lao, People’s Democratic Republic	2
Madagascar	4
Malawi	2
Maldives	3
Mali	3
Mauritania	1
Mozambique	2
Myanmar	1
Nepal	50
Nicaragua	2
Rwanda	7
Senegal	7
Sierra Leone	1
Somalia	9
Sri Lanka	14
Sudan	19
Tajikistan	4
Tanzania, United Republic of	21
Togo	1
Uganda	17
Yemen	30
Zambia	5

Number of Crossref members in GEM Program Countries

We are excited about our in-person event taking place in a few weeks in Accra, Ghana, as a direct result of the increasing participation and interest in Crossref from the region.

We can see a clear connection between outreach activities conducted by us and our Ambassadors and the increase in awareness and the number of members joining from related countries. These were Bangladesh, Nepal, Uganda, and Tanzania in 2023, and Ghana, Zambia, Sri Lanka, and Tanzania in 2024.

From our Ambassadors’ activities in the GEM countries, some recurring questions emerged highlighting barriers to joining Crossref. It’s important to recognise that many institutions struggle with funding and technical expertise. It’s no surprise that they are often concerned with the maintenance of their membership over the long term. We emphasize that GEM is a sustained measure to accommodate knowledge-sharing organisations in regions under financial strain. Whilst the program addresses the costs of membership and content registration, our Ambassadors can assist further by offering technical support with record registration, metadata best practices, and integrating Crossref services with existing systems, including Open Journal Systems (OJS), and by discussing how registering metadata improves research visibility.

We are grateful to our Ambassadors for directly supporting the GEM program within their countries through webinars and presenting in person at conferences: Shaharima Parvin and MD Jahangir in Bangladesh, Richard Bruce Lamptey in Ghana, Niranjan Koirala in Nepal, Oumy Ndiaye in Senegal, Lasith Gunawardena in Sri Lanka, and Baraka Manjale Ngussa in Tanzania.

Retraction Watch retractions now in the Crossref API

Martyn Rittman — Wed, 29 Jan 2025 00:00:00 +0000

Retractions and corrections from Retraction Watch are now available in Crossref’s REST API. Back in September 2023, we announced the acquisition of the Retraction Watch database with an ongoing shared service. Since then, they have sent us regular updates, which are publicly available as a csv file. Our aim has always been to better integrate these retractions with our existing metadata, and today we’ve met that goal.

This is the first time we have supplemented our metadata with a third-party data source. Until now, our APIs have included metadata provided by Crossref members along with outputs from our internal enrichment workflows, such as matches found for bibliographic reference matching and funders. Third party metadata has been gathered in Event Data, but this has been stored and delivered separately.

Knowing when work has been retracted is critical for assessing the integrity of research, and this enhancement of the data will be a great benefit to the community.

Where does the data come from?

Retraction Watch carefully curates retractions, pulling them from several non-Crossref sources, including PubMed and publisher websites. Each entry is manually checked and annotated before being added to the database. This high level of curation and broad coverage, together with our shared goal of making changes to metadata more visible, is what made a partnership between Crossref and Retraction Watch attractive.

“Our goal with the Retraction Watch Database has always been for it to be as useful to as many people as possible, and available from as many sources as possible,” says Ivan Oransky, co-founder of Retraction Watch and executive director of The Center For Scientific Integrity, its parent nonprofit organisation. “Integration with Crossref’s REST API is a huge step in that direction.”

Where can I see the retractions?

If you use a service that collects Crossref metadata, you will start to see the Retraction Watch retractions as they are picked up. To access the data directly, you can find retractions from both Crossref members and Retraction Watch in our REST API, for example with the following request for all retractions:

https://api.crossref.org/v1/works?filter=update-type:retraction

Or for an individual record:

https://api.crossref.org/v1/works/10.1177/17588359231172420

In the results here you will see an update-to field:

"update-to": [
  {
    "updated": {
      "date-parts": [
        [2023,4,22]
      ],
      "date-time": "2023-04-22T00:00:00Z",
      "timestamp": 1682121600000
    },
    "DOI": "10.1177/1758835920922055",
    "type": "retraction",
    "source": "publisher",
    "label": "Retraction"
  },
  {
    "updated": {
      "date-parts": [
        [2023,4,22]
      ],
      "date-time": "2023-04-22T00:00:00Z",
      "timestamp": 1682121600000
    },
    "DOI": "10.1177/17588359231172420",
    "type": "retraction",
    "source": "retraction-watch",
    "label": "Retraction",
    "record-id": 44124
  }
]

The source field states where the retraction came from. Currently, it can have two values: publisher or retraction-watch. Note that the same retraction may be included multiple times from different sources.
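Since one retraction can appear once per source, a consumer may want to group the update-to entries before counting them. The sketch below is one possible approach in Python, using only the fields shown in this post. The helper names are hypothetical, and no network request is made; the URL builder simply reproduces the request given earlier.

```python
from collections import defaultdict
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org/v1/works"


def retractions_url() -> str:
    """Build the request for all retractions given earlier in this post."""
    query = urlencode({"filter": "update-type:retraction"}, safe=":")
    return f"{API_BASE}?{query}"


def sources_by_doi(work: dict) -> dict:
    """Map each updated DOI to the set of sources asserting a retraction."""
    grouped = defaultdict(set)
    for update in work.get("update-to", []):
        if update.get("type") == "retraction":
            grouped[update["DOI"]].add(update.get("source", "unknown"))
    return dict(grouped)


# Record trimmed to the fields this sketch reads from the example response.
work = {
    "update-to": [
        {
            "DOI": "10.1177/1758835920922055",
            "type": "retraction",
            "source": "publisher",
        },
        {
            "DOI": "10.1177/17588359231172420",
            "type": "retraction",
            "source": "retraction-watch",
            "record-id": 44124,
        },
    ]
}
print(retractions_url())
print(sources_by_doi(work))
```

Grouping by DOI means a retraction reported by both the publisher and Retraction Watch for the same DOI is counted once, with both sources recorded against it.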

Retraction Watch retractions will remain available on GitLab in CSV format and be updated on working days. The record-id refers to the entry in the CSV file with further details, such as the reason for retraction.

There is full documentation available for the Crossref REST API and if you are new to REST APIs, see our learning hub to get started which includes a tutorial about accessing retractions.

What can I do with the retractions?

Like the rest of our metadata, the retractions are freely available. If you use or operate a tool that ingests retractions, the new entries will start to be picked up immediately. The Retraction Watch database includes a larger number of retractions than the Crossref database, so you should see an increase in the total.

We have heard from organisations that would like to build new research integrity tools based on this data. We look forward to seeing the benefits brought by wider availability of the Retraction Watch retractions, and how they can provide better context to research outputs.

While Crossref metadata is freely available to reuse without a license, if you make use of the Retraction Watch retraction metadata in a published work, we kindly request that you provide a citation to the source.

If you have questions or comments, please head over to the section of our forum dedicated to integrity of the scholarly record.

POSI 2.0 feedback

Ed Pentz — Tue, 28 Jan 2025 00:00:00 +0000

As a provider of foundational open scholarly infrastructure, Crossref is an adopter of the Principles of Open Scholarly Infrastructure (POSI). In December 2024 we posted our updated POSI self-assessment. POSI provides an invaluable framework for transparency, accountability, sustainability and community alignment. There are 21 other POSI adopters.

Together, we are now undertaking a public consultation on proposed revisions for a version 2.0 release of the principles, which would update the current version 1.1 of the principles, released in November 2023.

This is a crucial step in ensuring that POSI evolves to meet the needs of the community. Whether you are part of an organisation that has adopted POSI, is considering adoption, interacts with POSI-aligned groups, or you have a personal interest in open scholarly infrastructure, your perspective is invaluable.

Some additional context about POSI

POSI is not an organisation; POSI adopters are an informal group of those that have conducted self-assessments.
The POSI principles are not rules or a checklist; organisations or groups can adopt or interpret them to fit many different circumstances.
Our goal is for POSI self-assessments to be made publicly available and for interested communities to assess and monitor updates and progress.

How to Participate

If your organisation has adopted POSI, is considering adoption, interacts with POSI-aligned groups, or you have a personal interest in open scholarly infrastructure, your perspective is invaluable.

Review the Proposed POSI 2.0 Revisions.
Share your thoughts via our short survey.

Deadline: March 5, 2025

Together, we can shape the future of open scholarly infrastructure. Join the conversation and make your voice heard!

Metadata matching: beyond correctness

Dominika Tkaczyk — Wed, 08 Jan 2025 00:00:00 +0000

https://doi.org/10.13003/axeer1ee

In our previous entry, we explained that thorough evaluation is key to understanding a matching strategy’s performance. While evaluation is what allows us to assess the correctness of matching, choosing the best matching strategy is, unfortunately, not as simple as selecting the one that yields the best matches. Instead, these decisions usually depend on weighing multiple factors based on your particular circumstances. This is true not only for metadata matching, but for many technical choices that require navigating trade-offs. In this blog post, the last one in the metadata matching series, we outline a subjective set of criteria we would recommend you consider when making decisions about matching.

Openness

Matching tools come in many different shapes and sizes: web applications, APIs, command-line tools, sometimes even enchanted crystal balls showing matched identifiers emerging from a mysterious mist! No matter what form they take, an important consideration is whether the source code and all the related resources for the matching are openly available.

Matching strategies that are either closed-source, or rely on closed-source services for their matching logic, make it difficult to fully understand and explain matching processes. This lack of transparency also makes it impossible to adjust or improve the matching logic, since we cannot understand or improve code we cannot see.

Users are similarly impeded from identifying flaws or suggesting improvements to processes they are unable to examine. By blocking this community participation, we also lose the proven cycle of real-world testing, refinement, and validation that has strengthened myriad open source projects. The cumulative impact of both minor and major community-driven refinements over time is incredibly valuable and should not be underestimated.

Using open source matching will also help build trust in the matching workflows and results. This is one reason why open source is one of the tenets of the Principles of Open Scholarly Infrastructure, adopted by Crossref, DataCite, ROR, and other organisations who build and maintain open scholarly infrastructure.

When evaluating matching strategies, we strongly recommend prioritizing those that are fully open source. This not only ensures their transparency and trustworthiness, but also allows for the kind of continuous improvement that results from this visibility and community engagement.

Explainability

In terms of our ability to understand and improve a matching strategy, using an open source model is only the first step. What typically matters most in the context of building and maintaining matching services is that we are able to understand their underlying code and have a clear model of how matches are derived from their corresponding inputs. Even if the matching code itself and all of the resources used in the matching are open, if they are poorly documented, lack reproducibility or tests, or are otherwise opaque, there is no guarantee that it will be possible to understand or improve the strategy. Striving for a high level of interpretability in our matching plays a determinative role in how well we can understand and modify our strategies in the future.

Being able to explain the behaviour of the matching will also help you to respond to and incorporate user feedback. When users encounter errors, you will be able to do things like advise them on how to modify or clean their inputs so that the results are better. Conversely, examining the behaviour of the strategy relative to user inputs and feedback can provide you with ideas for improving the matching.

Typically, heuristic-based strategies, such as those that use forms of search or string similarity measures, like edit distance, are easier to explain than, say, machine learning models. If a strategy uses machine learning, at least some internal decisions might be made by passing data through a complex network of algebraic equations. Those can be mysterious and non-deterministic, and are famous for being hard to interpret. This doesn’t mean they should be avoided entirely - we have built and use many machine-learning-based tools ourselves! Instead, it is a good idea to weigh how their inherent lack of explainability could affect your ability to continue work on the strategy and respond to user needs, relative to all the available options.
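To illustrate why a heuristic is easy to explain, here is a minimal sketch of an edit-distance matcher. It is not one of our production strategies: the function names, the candidate names, and the max_distance threshold are all hypothetical, chosen only to show that every decision can be traced to a single number.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def match(query, candidates, max_distance=3):
    """Return (best candidate, distance), or (None, None) if nothing is close."""
    if not candidates:
        return None, None
    scored = [(edit_distance(normalise(query), normalise(c)), c) for c in candidates]
    distance, name = min(scored)
    if distance <= max_distance:
        return name, distance
    return None, None


# Hypothetical candidate names, for illustration only.
names = ["Example University", "Sample Institute of Technology"]
print(match("Exmple Universty", names))
print(match("Completely Different Org", names))
```

When a user asks why an input did or did not match, the answer is the distance and the threshold, which is the kind of clear model of how matches are derived that a learned model rarely offers.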

Complexity

Complexity is another aspect that can greatly affect how easy it is to maintain the strategy. Complexity is related to how many different components the strategy has and how difficult they are to use and maintain. When a strategy has multiple interconnected parts, each component becomes a potential failure point that requires discrete assessment and maintenance.

Consider, for example, two different approaches to a matching strategy: one that uses a single machine learning model versus another that uses an ensemble of models. A single model requires maintaining one set of training data, a single training pipeline, and one deployment process. If the model’s performance unexpectedly deteriorates, whether because of an issue with the training data, a configuration error, or the need for additional input sanitization, the source of the problem is easier to isolate and fix.

The ensemble, by contrast, combines multiple, specialized models, each requiring its own training data, tests, updates, and deployments. If one model in the ensemble is found to reduce the performance of the strategy, the interdependence between models can cause this degradation to cascade through the entire system and undermine its overall reliability. Correcting for these errors becomes more challenging. If fixing one model’s performance requires retraining or adjusting its outputs, this could require recalibrating the entire ensemble to maintain the balance between models, identify regressions, and prevent new errors from emerging.

In general, preferring simpler strategies not only reduces operational overhead, but also makes it easier to diagnose issues, test changes, and iterate on user feedback. When problems arise, having fewer moving parts means fewer places to look for the root cause and fewer components that could be affected by any fixes.

Flexibility

The metadata to which we match grows and changes over time. New records are created, existing ones are updated, with schemas changing and evolving alongside. The resources that underlie our matching are also not static. The libraries we depend on may deprecate features between versions, or the taxonomies we use to categorize results might undergo significant revisions. We thus rarely have the luxury of deploying a matching strategy once and using it forever without any changes. A good strategy has to be flexible enough to adapt to such changes, with this adaptation also being both technically feasible and practical to implement.

Much of this flexibility is also determined by a matching strategy’s ability to incorporate new data. Strategies that use continuously updated databases or indices can immediately match against new metadata as it appears in the system. By contrast, some machine learning-based approaches require training on target matches and can thus be limited in flexibility and face more constraints. While some models can be incrementally updated to recognize new matches, others require retraining from scratch to incorporate these changes - a process that can be both time-consuming and resource-intensive.

Paying close attention to a strategy’s flexibility and favoring this aspect, when possible, can significantly impact its long-term viability. When comparing different matching strategies, flexibility should thus be a primary concern in your decision-making process.

Resources

Matching strategies can vary significantly in their resource requirements, including things like CPU and GPU utilization, memory consumption, storage capacity, and network bandwidth. These requirements are directly related to infrastructure costs and energy consumption, so when evaluating a matching strategy, it is necessary to assess its resource demands across all phases of the matching lifecycle. This includes things like initial model training, re-training, index construction, updates and management for all aspects of the strategy, as well as the real-world processing of matching requests. It is a good idea to measure and monitor resource usage carefully in considering which strategies to use, as the best performing strategy may also be too resource intensive to run as a service or might grow to this state over time with additional utilization.

Speed

Matching strategies can operate at a wide range of speeds, from milliseconds to minutes per match. Since the overall response time of a strategy can affect both system scalability and user experience, we should always assess the strategy’s performance for different usage scenarios and scales of data. While some strategies might perform adequately with small datasets, they can also exhibit exponential slowdowns as data volume and complexity increase or as concurrent requests grow in number. We should therefore consider carefully how requirements for matching speed might evolve with increased usage, data complexity, and total anticipated growth. The fastest matching strategy might not always be the best choice if it comes at the cost of reduced accuracy or requires large amounts of resources, but unacceptable latency can make an otherwise excellent strategy unusable in practice for many use cases.
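One simple way to see how a strategy’s speed changes with data volume is to time it against record sets of increasing size. The sketch below is illustrative only: naive_match is a hypothetical linear-scan strategy, the record sets are synthetic, and real measurements should also cover concurrent requests and realistic inputs.

```python
import time


def naive_match(query, records):
    """Hypothetical strategy: linear scan with case-insensitive exact match."""
    q = query.lower()
    return [r for r in records if r.lower() == q]


def time_strategy(strategy, query, records, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        strategy(query, records)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Time the same query against synthetic datasets of growing size.
results = {}
for size in (1_000, 10_000, 100_000):
    records = [f"record {i}" for i in range(size)]
    results[size] = time_strategy(naive_match, "record 42", records)
    print(f"{size:>7} records: {results[size] * 1000:.3f} ms per query")
```

Plotting or tabulating such timings across sizes makes it easier to spot a strategy whose latency grows faster than the data it is matching against.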

Putting it all together

The typical life cycle of developing a metadata matching strategy is as follows:

Scoping: we define the matching task, along with its inputs and outputs.
Research: we research what existing strategies are available for our task and/or we develop our own.
Evaluation: we evaluate all available strategies, whether internally or externally developed, exploring all of the aspects described above.
Decision: we choose which strategy (if any) we want to use in our production system.
Production setup: we prepare the production models, indexes, and other resources needed for the matching.
Maintenance: we monitor and adapt the strategy relative to changing data, user feedback, and new resource requirements.

In practice, these phases do not happen all at once, nor in this strict order. Often we need to proceed through multiple iterations of them to arrive at the best strategy. For example, if initial evaluation of a strategy yields poor results, we might return to the research phase to investigate other strategies or refine our understanding of the task. Often, during the maintenance phase, we receive feedback from users that indicates potential areas of improvement and then pursue them with a new round of research and evaluation.

As we cycle through these phases, ideally all the aspects described in this entry, along with the results of the evaluation, would be taken into account. Of course, this means that these decisions have to be based on multiple criteria and involve trade-offs between matching performance and all other considerations. In making these complex and difficult choices, it is useful to consider two primary questions:

Are any of the considered matching strategies good enough for our use case?
Out of all the considered strategies that are sufficient for our use case, which would be the best?

The first question requires us to create clear and quantifiable criteria that allow for eliminating some of the potential strategies. As we have indicated, these could include things like the strategy being open source, minimum performance baselines using measures like precision or recall, and operational thresholds, like the strategy being able to return results quickly, relative to user expectations or the volume of data to be processed. It should be fairly easy to test these requirements and eliminate any strategies that fall short of them. If the strategies are difficult to assess, that is likely a mark against them.

If no strategies meet these criteria, we have two options: either to abandon matching entirely or to reassess and relax our criteria to align with the available options. While the former is always an option, adopting a more pragmatic lens, framed in terms of potential value (or harm) to users, might be beneficial. Sometimes we approach matching tasks with expectations that are too high, and a dose of realism helps us to re-center our perspectives. After more consideration, you might decide that your criteria were too stringent or realize that you need to better define and decompose the tasks to fit the available options.

When multiple strategies appear viable, the selection process becomes more nuanced. When evaluating strategies across these various dimensions, we should try to avoid placing undue weight on minor performance differences. Evaluation metrics are useful estimates of performance, but do not always translate to real-world applications and changing data. In cases where a more complex strategy offers only marginal improvements over a simpler alternative, the maintenance and operational benefits of the simpler solution often outweigh small performance gains.
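The two questions above can be sketched as a two-stage selection: first eliminate strategies that miss the minimum criteria, then prefer the simplest among those whose quality is close to the best. Everything in this example is an assumption made for illustration: the strategy names and numbers are invented, the thresholds are arbitrary, F1 is used as one possible way to combine precision and recall, and the number of components is only a rough proxy for complexity.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    open_source: bool
    precision: float
    recall: float
    latency_ms: float
    components: int  # rough, hypothetical proxy for complexity


# Invented evaluation results, for illustration only.
candidates = [
    Strategy("search-heuristic", True, 0.95, 0.90, 40, 1),
    Strategy("ml-ensemble", True, 0.96, 0.91, 300, 5),
    Strategy("closed-service", False, 0.97, 0.93, 80, 1),
    Strategy("exact-string", True, 0.99, 0.40, 5, 1),
]


def good_enough(s, min_precision=0.9, min_recall=0.8, max_latency_ms=500):
    """Question 1: does the strategy meet every minimum criterion?"""
    return (s.open_source
            and s.precision >= min_precision
            and s.recall >= min_recall
            and s.latency_ms <= max_latency_ms)


def f1(s):
    """Harmonic mean of precision and recall."""
    return 2 * s.precision * s.recall / (s.precision + s.recall)


def choose(strategies, margin=0.02):
    """Question 2: among viable strategies, prefer the simplest near-best one."""
    viable = [s for s in strategies if good_enough(s)]
    if not viable:
        return None  # time to relax the criteria or abandon matching
    best = max(f1(s) for s in viable)
    # Treat F1 differences within `margin` as too small to matter.
    near_best = [s for s in viable if best - f1(s) <= margin]
    return min(near_best, key=lambda s: (s.components, -f1(s)))


chosen = choose(candidates)
print(chosen.name if chosen else "no viable strategy")
```

With these invented numbers the ensemble scores marginally higher, yet the single-component heuristic is chosen because the difference falls within the margin, mirroring the advice not to place undue weight on minor performance differences.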

This concludes our series on metadata matching, where we described the conceptual, product, and technical aspects of matching and its applications. We hope this overview was instructive and helps you to make better decisions about the use of matching in your own tools and services!

A progress update and a renewed commitment to community

Ginny Hendricks — Thu, 12 Dec 2024 00:00:00 +0000

Looking back over 2024, we wanted to reflect on where we are in meeting our goals, and report on the progress and plans that affect you - our community of 21,000 organisational members as well as the vast number of research initiatives and scientific bodies that rely on Crossref metadata.

In this post, we will give an update on our roadmap, including what is completed, underway, and up next, and a bit about what’s paused and why. We’ll describe how we have been making resourcing and prioritisation decisions, including a revised management structure, and introduce new cross-functional program groups to collectively take the work forward more effectively.

It’s important to acknowledge that Crossref has evolved significantly from just five years ago - our member count has more than doubled from 10,000 to 21,000 organisations since 2019 and they include all kinds of organisations such as funders, universities, government bodies, NGOs, and of course scholar- and library-led publishers. The smaller organisations now collectively contribute the majority of Crossref funding. We’ve gone from 100 million records to 160 million in five years, and our metadata is retrieved more than 2 billion times monthly, quadrupling what it was five years ago.

It’s within this context that we’ve spent quite a lot of time thinking about scalability, how we collect and process feedback and contributions from many organisations, how to automate our operations, and refining the plans for the next few years.

Our strategic agenda remains the same

A few times a year we update the strategy page where there is a quadrant of projects showing what’s completed, in progress, up next, and in planning/ideas - for each strategic theme. We also link from there to our live public roadmap which shows more specifics about individual projects, including projected timelines, and is updated more frequently.

If you’ve been watching the strategy page, checking in on the public roadmap or this blog, or joining webinars and annual meetings, you’ll know that we’ve had some longstanding plans to—among other things—reduce technical debt, rebuild our metadata management system, move to the cloud, modernise our schema, support multiple languages, and partner with multiple data sources to build the Research Nexus.

You’ve heard us talk about these initiatives a lot, but you’ve not seen particularly swift action.

Moving the work forward more effectively

Earlier this year, it became clear that our almost three-year project to build a new relationships API had not worked out. The project, dubbed ‘manifold’, was to initially deliver data citations, and eventually replace our central metadata system, but what was prototyped didn’t scale, even with a subset of our metadata. We weren’t confident enough about the project’s timeline or costs to justifiably continue investing further time and resources.

Meanwhile, we’d barely scratched the surface of our aim to pay down technical and operational debt, and we’d also been neglecting to keep the live system up to date with the numerous metadata changes that have been queued up, waiting to be implemented.

We knew the manifold project was ambitious – our system has grown in complexity over the years. We were trying to rebuild the car while driving it (our system needed to continue to operate and be maintained by our team) while trying to design a new approach to manage the many relationships between 160+ million database records. In the years we worked on this project, we learned a lot that will inform future plans for a large system redesign.

In March this year, we decided to pause the manifold project. We apologised to our community partners for not delivering the promised data<->literature matches they hoped to use. They were frustrated but thankfully understanding.

We then resolved to focus on backend infrastructural changes, conduct cross-training so that all of our staff would become familiar with current in-use systems instead of greenfield tech (for now), and start to make a dent in the backlog of bugs and long-promised schema updates in our mainstream services.

We’re happy to report some movement on these things and some milestones that have been achieved in these areas in recent months.

Fostering a happy and dedicated team

Any kind of work can only happen when our staff are in a good place, feeling supported and comfortable to question things, and well-equipped with information, purpose, and clear priorities. In June, when the whole staff met up in person, we had some really good conversations about culture, communication, and about sharing responsibilities. Some people ran birds-of-a-feather sessions to explore the issues that had been keeping them up at night, such as authentication/security, and rebuilding the Crossref System (CS), and the team also co-created a set of prioritisation drivers that are now in use within our roadmap and planning processes.

Taking on feedback from the all-staff meeting and then the July board meeting, we thought strategically about the organisational structure Crossref would need over the next few years to reflect the growth in scope and size, and fulfil its longer term goals. We have long had an ambitious agenda but realised we didn’t yet have the capacity to do it all. So we came to the conclusion that we needed an updated team and management structure to take us through the next phase of our development.

The structural changes were concluded at the end of November. They included:

Moving Technology under Operations, since Technology—though a vital enabler—still works in service to our mission and in support of our community, just like other operational things like board governance and finance.
Reframing product development as Programs and Services, and reducing our workstreams from five product portfolios to three programs. We formed cross-team steering groups around clearly articulated program areas (more on those below).
Broadening the leadership to include an Executive team and an extended Director team, and forming a Senior Management Team (SMT). These changes ensure that the collective responsibility for Crossref now rests on a wider group of experts who can back each other up and share the risk and the knowledge, rather than on just a few individuals.
We started recruiting for directors for two new leadership positions. We’ll welcome a new Director of Programs and Services and a new Director of Technology in the new year.
Evolving the strategic initiatives team into a data science team, integrating research & development functions throughout all teams and with the SMT taking collective responsibility for strategic initiatives.

Unfortunately, with the shift in approach for product development and by sharing responsibility for strategic initiatives and research among the wider team, we made the difficult decision that four positions would no longer work within the new structure.

A new approach: joined-up initiatives and cross-functional programs

Research has always been an important role for Crossref, but as this function had been annexed from our regular work, it became hard to coordinate strategic initiatives across the wider organisation. In recent years we inadvertently created more technical debt for ourselves, i.e., we built multiple prototype tools without plans for adoption or moving them into production. Strategic initiatives, by their nature, need thorough research and high-level alignment, so we made such initiatives, including Resourcing Crossref for Future Sustainability (RCFS) and improving the Integrity of the Scholarly Record (ISR), the responsibility of the whole senior management team.

Some useful research had been conducted, but we were never in a position to act on any of it. Particularly promising work has been in the field of metadata matching, and with the growth in the community reliance on our metadata, and attention on data quality rightly increasing, we decided to create a new data science team to be dedicated to this work, led by Dominika Tkaczyk.

We had also struggled with a traditional product management approach since all our tools and activities are interconnected, and we found we were trying to do too many things at once but not all of them very effectively. We also acknowledged that product management comes from the commercial (e.g. retail) world and is therefore designed to help companies sell/upsell, which is not our goal. So we looked to other approaches more suitable to mission-based nonprofits.

Introducing three programs

We have introduced cross-functional program management in order to work towards the following:

better cross-team alignment
shared responsibility
improved communication and learning
more progress on the things members need.

Supporting the strategic theme of co-creation, a new program, facilitated by Program Lead Lena Stoll, now manages and oversees all activities around co-creation and community trends. A cross-team steering group just began meeting regularly and will be responsible for interfaces such as reports/dashboards, record registration interfaces, connections and collaborations such as Open Funder Registry, ROR, ORCID auto-update, as well as OJS and other partner integrations. This program also includes the Crossref website and any front-end things to support other programs. And it includes ISR (the integrity of the scholarly record) and our tools in this area such as Crossmark and retraction/correction tooling, and Similarity Check for text comparisons.

Supporting the strategic theme of complete and global metadata and relationships, a new program, facilitated by Program Lead Martyn Rittman, now manages and oversees all activities relating to contributing to the Research Nexus. Working particularly closely with the metadata team, led by Patricia Feeney, this program addresses how metadata is modelled, used, enriched, and extended. Work includes our APIs, incorporating external data sources like Retraction Watch and Event Data, building out metadata matching services with the new data science team, and supporting the community of metadata users with API sprints and more modern options for retrieving metadata based on usage and need.

Supporting the strategic theme of open and sustainable operations and keeping to the POSI framework, a new program, facilitated by Program Lead Sara Bowman, now manages and oversees all activities relating to making our operations more open, transparent, and sustainable. This program focuses on supporting and strengthening the core functions our members rely on and enabling future growth. It includes metadata deposit and processing, most apps for e.g. managing titles, authentication, and architectural and infrastructural projects like moving from the data centre to the AWS cloud service. This program also includes modernising our operations in general, which is not just technology but also finance and human resources, so projects like membership process automation, fee modelling and financial analyses, and business system integrations.

The Programs will start to be reflected across our website and in our communications from next year.

What are Crossref’s new prioritisation drivers?

These are the drivers that our ~40 staff co-created in June that are guiding decisions about the priorities on our roadmap. New ideas will be evaluated in the following areas:

Encourage participation from new or under-represented communities
Respond to and lead trends in scholarly communications
Benefit the greatest number of members and users
Reflect on how the community works with each other and allow members to self-serve
Expand to support and connect relevant resource types and metadata fields
Make it easier to create and update metadata
Enhance metadata for completeness and accuracy
Make it easier to retrieve and use metadata
Automate repetitive/manual tasks
Address technical and operational debt
Maintain critical systems and operations and ensure their security
Control or reduce costs - to Crossref, our community, or the environment

We’re happy to report that the changes made this year have resulted in a productive last few months of the year. As reported in our annual meeting, here is the progress update.

What’s paused

A relationships API endpoint and, therefore, a specific data citation feed
Manifold, the three-year effort to modernise our tech stack
Most of the strategic initiatives prototypes that can’t yet be scaled, such as Labs API and Labs reports

What’s recently completed

We succeeded in moving the entire Crossref corpus to an open-source database, PostgreSQL
Fixed numerous REST API data quality issues and lots of troublesome bugs
Schema development - support for ROR as a Funder identifier is live and currently in testing
We automated some very manual membership and billing processes, saving hundreds of staff hours a year
Released a new form for journal article record registration, building on the grant registration form
Upgraded Participation Reports to include Affiliations and ROR IDs
Launched a new API Learning Hub

Since the rest of the community stops for no Crossref product roadmap issue, we also progressed a number of community and governance initiatives:

The Grant Linking System (GLS) reached 5 years with over 40 funders joining Crossref and registering over 130,000 grants and awards, including use of facilities and projects
Our research for Resourcing Crossref for Future Sustainability (RCFS) with the Membership & Fees Committee is going well, and we’ll have new fee proposals for review in 2025
The Integrity of the Scholarly Record (ISR) conversations have deepened, and we’ve formed strong relationships with editorial experts and research integrity sleuths, who are getting up to speed on our metadata. We’re also working with some sleuthing consultants to change our processes to handle deceptive member behaviour such as paper mills, cloned journals, and citation manipulation. The new data science team plays a role here, along with membership and governance.

What’s currently in focus

In our efforts to do less but do it more effectively, we have two current priorities:

Get out of the physical data centre and into the cloud.
Develop Schema 5.4.

These two projects are underway, involving lots of communication and learning. Since we haven’t released any schema updates in many years, all our staff are learning for the first time how a metadata schema model is interpreted in a systemic way, learning about the structure of research objects, and honing the process as they go. We’ve high hopes we’ll be in a position to release continuous metadata schema versions and catch up on the backlog over the coming years.

What’s next

Continuous metadata development, with contributor roles up next
Retraction Watch data integrated into the REST API so users have a single source of retraction/correction data
Upgraded preprint matching and notifications
Modelling more equitable fees through the RCFS projects
Piloting a non-voting membership category

Once we’re fully in the cloud and in the groove of metadata updates, and with the support of newly-hired technology and program directors joining in the new year, we’ll turn our attention to rebuilding the central metadata system that we call the Crossref System, or “CS”, and report more on this next year.

So that was our summary of 2024 and an indication of what’s coming in 2025 and beyond; sorry it’s so long, and thanks for reading this far! Next year we’ll get back to more regular updates as the strategic agenda and the programs progress.

A summary of our Annual Meeting

Rosa Morais Clark — Mon, 09 Dec 2024 00:00:00 +0000

The Crossref2024 annual meeting gathered our community for a packed agenda of updates, demos, and lively discussions on advancing our shared goals. The day was filled with insights and energy, from practical demos of Crossref’s latest API features to community reflections on the Research Nexus initiative and the Board elections.

Our Board elections are always the focal point of the Annual Meeting. We want to start reflecting on the day by congratulating our newly elected board members: Katharina Rieck from Austrian Science Fund (FWF), Lisa Schiff from California Digital Library, Aaron Wood from American Psychological Association, and Amanda Ward from Taylor and Francis, who will officially join (and re-join) in January 2025. Their diverse expertise and perspectives will undoubtedly bring fresh insights to Crossref’s ongoing mission.

The meeting started with a recap of our mission and priorities. Ed Pentz reiterated the Research Nexus vision of increasing transparency of the connections that make up the scholarly record and underpin the research ecosystem.

Crossref is dedicated to openness, community ownership, and a stable, accessible infrastructure that researchers, publishers, funders, and institutions can rely on for the long term. This is demonstrated by Crossref’s commitment to the Principles of Open Scholarly Infrastructure (POSI), which constitute commitments to building a resilient and transparent infrastructure for research: sustainability, community governance, and openness. Ed emphasized how Crossref is aligning with these principles and collaborating with other adopters to reflect on them and continuously align them with the needs of the scholarly community, with a public consultation on proposed revisions to POSI forthcoming next year.

Ginny Hendricks highlighted key membership and metadata trends. She noted that as of 2024, half of Crossref members are based in Asia. This year, as always in recent years, we saw many new organisations from Indonesia, Turkey, India, and Brazil join us. Removing those fast-growing countries for the chart’s clarity, we can see that some of the next most active countries are Pakistan, Mexico, Spain, Bangladesh, and Ecuador, among others.

There are now ~163 million open metadata records with Crossref DOIs, and Ginny pointed out increases in the registration of preprints, peer-review reports, and grants. In terms of metadata elements, it’s good to see that more publishers recognize the importance of including abstracts and ROR IDs in their metadata records. Also, in line with the community’s concerns about integrity, our members have been enriching their records with direct assertions of retractions.

Then, Ginny went on to report on the progress towards our strategic goals:

Contribute to an environment where the community identifies and co-creates solutions for broad benefit
A sustainable source of complete, open, and global scholarly metadata and relationships
Manage Crossref openly and sustainably, modernizing and making transparent all operations so that we are accountable to the communities that govern us.
Foster a strong team because reliable infrastructure needs committed people who contribute to and realize the vision and thrive in doing it.

Demos

Lena Stoll and Patrick Vale’s session gave members a practical preview of our latest tools.

Patrick started by reflecting on the challenge of making our identifiers useful for people using screen readers (and other assistive technologies). He thanked all who responded to our past consultation on the topic and presented the Crossref DOI Accessibility Enhancer, a browser plug-in initially available for Firefox (and soon also for Chrome). He shared the GitLab repo for anyone interested in trying it and invited feedback, as we’re hoping to iterate on this.

Patrick then went on to talk about our openness to community contributions to Crossref tools, with an example of the recent contribution from CWTS Leiden to our Participation Reports. Thanks to their work, our members can now see the proportion of works they’ve registered that include affiliation information and ROR IDs, alongside the previously available key metadata such as references, abstracts, ORCID iDs, funding information, or Crossmark.

Finally, Lena demonstrated the latest extension of our record management tool that’s just been made available to make manual registration of metadata records for journal articles easier. The new form is flexible and driven by our metadata schema. Importantly for our members, it simplifies the workflow with input validations and automated ISSN matching, and it enables members to register author affiliations with an integrated ROR look-up. We hope this will support our smaller members, who are relying on our helper tools to register their content.

Throughout the session, members were encouraged to use these tools and explore new resources available through Crossref. We believe that by taking advantage of these resources, you can enhance your research and publishing experience, and contribute to the growth and development of the scholarly community.

The discussion about open scholarly infrastructure

The panel on open scholarly infrastructure brought together experts with a wide range of experience in the field. Moderated by Lucy Ofiesh, Crossref’s Chief Operating Officer, the discussion featured six invited speakers who shared their insights on the opportunities and challenges facing the scholarly ecosystem: Ed Pentz, Crossref; Sarah Lippincott, Dryad; Amélie Church, Sorbonne University; Joanna Ball, DOAJ; Ann Li, Airiti; and Richard Bruce Lamptey, Kwame Nkrumah University of Science and Technology.

The panel talked about what openness in scholarly infrastructure means, why it’s important, its sustainability, and how to tackle challenges and gaps across the ecosystem. They highlighted frameworks like the Principles of Open Scholarly Infrastructure (POSI), the Barcelona Declaration, and the FOREST Framework as key tools for guiding work on governance, sustainability, and equity. The discussion highlighted the need for more collaboration, inclusivity, and practical ways to ensure open infrastructure remains sustainable in the long run.

They also stressed how openness supports research integrity: transparent systems allow researchers to question methods, verify findings, and preserve data. Amélie Church expanded on this point, underscoring the important role of open infrastructure in addressing challenges to integrity. She explained that such transparency enables the scholarly community to scrutinize research processes, ensuring the quality of outputs and their impact on society. Without openness, researchers face barriers to maintaining trust in their work, making open infrastructure necessary for research integrity and public confidence in science.

“By focusing on accessibility, transparency, and community engagement, open infrastructure can reshape academic and research ecosystems in transformative ways.” ~Richard Bruce Lamptey

Regarding sustainability, Sarah Lippincott stressed the importance of aligning funding models with community needs while addressing governance challenges. She pointed out that while initial funding can launch infrastructure, long-term sustainability requires consistent community investment and robust governance frameworks. This balance, she explained, is essential to ensure equity and transparency.

Collaboration was another important topic. Joanna Ball and Sarah Lippincott shared examples of how pooling expertise and resources—such as in the global support for ROR—can strengthen systems and make them more sustainable. These initiatives show the power of collective efforts in addressing technical and resource barriers. However, inclusivity remains an ongoing challenge.

The panel discussed the ways in which language barriers, resource limitations, and reliance on proprietary systems continue to exclude researchers from underrepresented regions. Ann Li highlighted how addressing these disparities is critical to ensuring the global accessibility of open infrastructure. By fostering inclusive practices, the scholarly community can mitigate biases and build tools that reflect a broader range of research contributions.

“My hope is that open infrastructure can have the resources that it needs to thrive, not just merely survive, and also that open infrastructure communities and organisations look to the value of frameworks that we’ve talked about today to help align themselves and improve their policies and practices, because there’s always room for growth, even in the best, most well-intentioned communities.” ~Sarah Lippincott, Dryad

The panel wrapped up the discussion by expressing optimism for the future of open scholarly infrastructure and emphasized the importance of continued investment, collaboration across organisations, and transparency in operations. The discussion reinforced the idea that open infrastructure provides a strong foundation for research that is equitable, sustainable, and accessible to all.

Updates from our Community

We enjoyed talks from our community about increasing their participation in the Research Nexus by adopting, using and enhancing metadata in different ways. Robbykha Rosalien hosted talks from Europe PMC, the Dutch Research Council, eLife, and CSIRO in Session I, and Amanda French hosted CLOCKSS, Sciety, and Redalyc in Session II.

Michael Parkin talked about preprints in Europe PMC. Europe PMC is a database for life science literature and a platform for content-based innovation. They started indexing preprints via the Crossref REST API in 2018. Michael presented their work on the discoverability of preprints in their database, including reflections on early challenges, as well as the latest efforts in surfacing available community reviews.

Hans de Jonge talked about the Dutch Research Council’s (NWO) dedication to open science, with policies ensuring that publications and data funded by NWO are openly available. NWO embraces open science principles for its own metadata and is a signatory of the Barcelona Declaration on Open Research Information. Hans focused on NWO’s recent introduction of Grant IDs through Crossref’s Grant Linking System (GLS). He shared their approach, the motivations behind introducing Grant IDs, and some challenges they faced.

Frederick Atherden explained how eLife, a nonprofit led by scientists, uses Crossref’s Grant Linking System to include grant DOIs in their publication metadata. eLife allows authors to add grant DOIs during submission, and they developed a tool to match grant numbers with DOIs during the proofing process to improve accuracy. Their goal is to follow best practices for metadata, making content easier to find and link to.

Brietta Pike covered how CSIRO is working to improve metadata quality for its journals, making research more discoverable and trustworthy. CSIRO faced challenges like inconsistent XML tagging, outdated systems, and data loss. To address these, they formed a project team, created a clear XML stylesheet, and updated their workflows. Recent progress includes better funding data, clearer license information, and more complete affiliation tagging. These efforts aim to support a more transparent and accessible research environment.

Alicia Wise of CLOCKSS talked about recent collaborations seeking to safeguard our cultural and scholarly heritage over the long term. CLOCKSS, a community-run archive, is dedicated to preserving scholarly content so that it remains accessible and unchanged for future generations. True preservation requires securely storing content in trusted archives that are actively maintained. A group of librarians and publishers developed a guide to help publishers preserve content. They also established an archival standard for EPUB formats to ensure ebooks can be stored effectively, and launched a pilot project to track preserved books, helping libraries and scholars identify safely stored titles.

Mark Williams from Sciety talked about how Sciety uses Crossref metadata to create detailed preprint histories. By partnering with organisations and communities worldwide, the Sciety platform gathers public reviews, highlights, and recommendations on preprinted research, helping researchers evaluate the quality and relevance of new studies. Through linking related preprints and journal articles, Sciety builds a connected view of each research work. Although challenges like inconsistent terminology and identifier gaps persist, these efforts enhance the visibility and credibility of preprints.

Arianna Becerril-García of AmeliCA/Redalyc shared insights on diamond open-access journals in Latin America. Redalyc is an open-access infrastructure that supports journals by providing free services like visibility and production tools. Redalyc has a role in sustaining Latin America’s unique approach to open-access publishing, where most journals are backed by academic institutions and public funds, allowing free access for both readers and authors. Arianna stressed the need to treat these journals as digital public goods and urged the communities they serve to help ensure their long-term sustainability. Despite limited resources and global under-recognition, these journals serve an international research audience, including authors from Europe, Africa, and Asia. Redalyc and other open infrastructures play a key role by offering tools that reduce production costs and improve discoverability, all without financial barriers. She noted that this approach aligns with UNESCO’s open science framework, which promotes inclusivity and addresses long-standing inequalities in scholarly publishing.

Afternoon of more resources and updates from Crossref

After a mid-day break (in Europe), Luis Montilla kicked off the second session with a practical tutorial on Crossref’s REST API. Following his introduction to the Crossref API last year, this time he offered a step-by-step guide to help attendees maximize the API’s capabilities for metadata retrieval, with advice on:

Managing large data requests with pagination and iterations
Incorporating safety mechanisms: to avoid hitting rate limits, Luis recommended adding pauses between requests and shared example scripts to streamline this.
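
The two pieces of advice above can be combined in a short script. The following is a minimal sketch, not the example shared in the session: it assumes the public `https://api.crossref.org/works` endpoint with its `rows` and `cursor` parameters for deep paging, and the function names `fetch_page` and `iter_works`, the page size, and the one-second pause are illustrative choices.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://api.crossref.org/works"


def fetch_page(params):
    """Fetch one page from the /works endpoint and return its 'message' object."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def iter_works(query_params, rows=100, pause_seconds=1.0, max_pages=None,
               fetch=fetch_page, sleep=time.sleep):
    """Yield work records one by one, paging through results with a cursor.

    `fetch` and `sleep` are injectable so the loop can be tried out
    without network access (hypothetical design choice for this sketch).
    """
    cursor = "*"  # "*" asks the API to start a new cursor
    pages = 0
    while max_pages is None or pages < max_pages:
        if pages:
            # Pause between consecutive requests to avoid hitting rate limits.
            sleep(pause_seconds)
        params = dict(query_params, rows=rows, cursor=cursor)
        message = fetch(params)
        items = message.get("items", [])
        if not items:
            break  # an empty page means the result set is exhausted
        yield from items
        cursor = message.get("next-cursor")
        pages += 1
        if cursor is None:
            break
```

For instance, `iter_works({"filter": "type:posted-content", "mailto": "name@example.org"}, max_pages=2)` would yield up to 200 preprint records; the email address is a placeholder for your own contact address.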

For those interested in learning more, take a look at the Crossref API Learning Hub, a new resource offering guides, scripts, and training materials to simplify complex queries. Please share questions about things you’re not sure about in our community forum, to help guide development of future demos.

Patricia Feeney followed with updates on metadata schema changes. She introduced our recent shift to integrate the Funder Registry with ROR, which allows members to use a single identifier system, simplifying data management by reducing redundancy. Patricia explained that, for now, the current identifiers remain valid, so members won’t need to make immediate changes. She also outlined planned support for version metadata, typed citations, and future plans to expand support for contributor role vocabularies, and invited community participation in a planned multilingual metadata working group.

Next, Kora Korzec offered an update on the progress in our research on Resourcing Crossref for Future Sustainability and opened up a discussion about the best ways of assessing our members’ size and ability to pay. In light of our ambition to streamline discounts, we also invited suggestions for discounts to support accessibility and fuller participation in the Research Nexus.

As part of the discussion, we’ve learned who was in attendance during the session:

We’ve heard a lot of support for our current GEM program. While it was clear from our poll that publishing revenue is not the most relevant measure of size or capacity for all those present, establishing a good alternative proved challenging. The idea of considering the size of the organisation as its largest entity was discussed, and important points were raised about budgets in different types of distributed organisations (e.g., on the position of libraries within large universities).

The official Annual Meeting part commenced after the discussion with a report on the State of Crossref from Lucy Ofiesh, and concluded with our Board election. Lucy highlighted some of the key accomplishments of the year so far, including:

Research for Resourcing Crossref for Future Sustainability (RCFS)
Integrity of the Scholarly Record (ISR)
Grant Linking System (GLS) reached 5 years
Automated some very manual membership processes
Released new form for journal article record registration
Upgraded Participation Reports to include Affiliations and ROR IDs
Launched a new API Learning Hub
Paused further development of a Relationships API
Migrated to a new open-source database
Schema development - ROR as Funder identifiers
REST API bug fixes and metadata consistency fixes.

Then she reflected on the membership growth: Crossref is now made up of 21,000 organisations from 160 countries. We reviewed our 2024 year-end financial forecast. As we’re bouncing back from COVID-19, our travel expenses have grown this year, and so have the fees for cloud services hosting. These are all as planned and happen in the context of healthy growth, including that from adoption and increased usage of paid services. We’re in a healthy financial position as membership revenue and usage fees, like content registration and Similarity Check document checking fees, continue to grow from the previous year.

Thank you to everyone who joined us for Crossref2024. This year’s meeting showcased our collective dedication to advancing open, accessible research infrastructure and underscored the power of collaboration in building a stronger scholarly community. As we reflect on the rich discussions and insights shared during the event, it’s clear our community is committed to advancing open and sustainable scholarly infrastructure.

Looking ahead, we’ll continue collaborating with members and partners to tackle challenges, expand accessibility, and foster collaboration. A key focus will be enhancing tools and metadata standards to serve the community better. Through innovative solutions and strategic initiatives like the Research Nexus, our collective efforts will make research more connected and accessible for all.

For anyone who couldn’t attend live, recordings are now available on our website. We’re excited to see how the ideas exchanged during this meeting spark progress across the scholarly ecosystem in the coming months.

2024 POSI audit

Lucy Ofiesh — Sat, 07 Dec 2024 00:00:00 +0000

Background

The Principles of Open Scholarly Infrastructure (POSI) provides a set of guidelines for operating open infrastructure in service to the scholarly community. It sets out 16 points to ensure that the infrastructure on which the scholarly and research communities rely is openly governed, sustainable, and replicable. Each POSI adopter regularly reviews progress, conducts periodic audits, and self-reports how they’re working towards each of the principles.

In 2020, Crossref’s board voted to adopt the Principles of Open Scholarly Infrastructure, and we completed our first self-audit. We published our next review in 2022.

The POSI adopters have continued to review the principles, reflecting on the effects of adopting them and providing a revision to the principles in late 2023. We use the revised principles for this latest review.

Key

We use a traffic light system to indicate where we believe we stand against each of the 16 principles. We now also use up/down arrows to show any significant movement, and an ‘i’ where there is something of note with narrative.

red indicates we are not fulfilling the principle.
yellow indicates we are making progress towards meeting the principle.
green indicates we are fulfilling the principle.
An up arrow means this is a new change, where we’ve moved ‘up’ the traffic lights in comparison to the previous audit. We would use a down arrow if ‘down’ ever happens too.
An ‘i’ means that something of note has changed in comparison to the previous audit.

GOVERNANCE

Coverage across the scholarly enterprise
Stakeholder governed
Non-discriminatory participation or membership
Transparent governance
Cannot lobby
Living will
Formal incentives to fulfil mission & wind-down

What’s changed with governance

Stakeholder governed

We’ve been yellow and we’re still yellow, but it has been improving. In the past, we’ve reported that we are working towards this but we’re not there yet because we didn’t have representation on the board from certain types of members, specifically research funders and research institutions. In the incoming 2025 board class, we have both. Six out of our 16 board seats are held by universities, university presses, or libraries. We also look forward to adding a new research funder, the Austrian Science Fund (FWF), to the board in January.

None of this, though, is hardcoded into the structure of the board. We extend an open call for board interest; any active member can apply for consideration. The Nominating Committee prepares a slate with a diverse range of candidates and organisations, and it is then up to the membership to elect board members.

With only 16 board seats and >21,000 members in 160 countries, being fully stakeholder-governed is challenging. Further, there are important contributors to the community that we all rely on who are not eligible for board seats because they are not members, as defined in our by-laws, such as sponsors, service providers, and metadata users.

We don’t consider this principle fulfilled, and that’s a good thing to keep note of; we must keep aspiring to have a broader, more comprehensive representation of our evolving community. The board continues to discuss stakeholder representation.

SUSTAINABILITY

Time-limited funds are used only for time-limited activities
Goal to generate surplus
Goal to create financial reserves
Mission-consistent revenue generation
Revenue based on services, not data

What’s changed with sustainability

Goal to create financial reserves

This was yellow and is now green. In 2023, we met our goal of maintaining a contingency fund of 12 months of operating costs. We also topped up this fund in 2024 to keep pace with our growing operating expenses. The revisions for POSI 1.1 actually removed the specificity of a 12-month timeline, allowing each adopting organisation to set its own goal; in Crossref’s case, 12 months remains appropriate.

INSURANCE

Open source
Open data (within constraints of privacy laws)
Available data (within constraints of privacy laws)
Patent non-assertion

What’s changed with insurance

Open source

This was yellow and still is, but we’re making improvements. In September of this year we migrated our database off of a closed-source solution and onto PostgreSQL. This has improved the performance of the system and is an important step towards paying down technical debt and moving the system fully into the cloud.

Patent non-assertion

This was yellow and is now green. We confirm that we do not hold any patents, and we have a published policy on it that is available for inspection and reuse by anyone in the community.

In summary

These are the main changes of note for our 2024 POSI update. The summary is that we’ve maintained all our greens, and of the four principles that were yellow last time, two have moved to green (financial reserves; patent non-assertion) and two have remained yellow but seen some progress of note (stakeholder governed; open source).

Please share any comments or questions; commenting here adds a public record of the discussion on our community forum. Here is an image to share, if needed.

We continue to learn from the POSI adopters group—now numbering 23 organisations—and the group will soon share a draft of POSI v2 for community comment. We look forward to the ongoing discussions with this group, and others, to keep improving and holding ourselves to account.

Summary of the environmental impact of Crossref

Ed Pentz — Thu, 05 Dec 2024 00:00:00 +0000

In June 2022, we wrote a blog post “Rethinking staff travel, meetings, and events” outlining our new approach to staff travel, meetings, and events with the goal of not going back to ‘normal’ after the pandemic. We took into account three key areas:

The environment and climate change
Inclusion
Work/life balance

We are aware that many of our members are also interested in minimizing their impacts on the environment, and we are overdue for an update on meeting our own commitments, so here goes our summary for the year 2023!

To be honest, the picture is mixed. On the positive side, we are traveling less and differently compared with 2019. Most of our events have been online, with some regional in-person ones, reducing our carbon footprint and increasing inclusivity with more people attending Crossref events. On the negative side, it hasn’t been easy to collect the data and figure out the best tools for calculating emissions, and we certainly haven’t captured all of our carbon emissions. Our approach has been to not let the perfect be the enemy of the good and we’ve focused on our largest source of carbon emissions - air travel.

Some of the positive things:

We have maintained our strategic approach to consider environmental, inclusion, and work/life balance issues when we plan travel and to make the most of in-person events by focusing on those that involve interaction, such as listening and learning from our members and users, deepening relationships, co-creating, and forming new alliances
Crossref Annual Meetings and community updates have been online and in different time zones.
Crossref board meetings have been reduced from three in-person meetings per year to one face-to-face and two online meetings per year.
We had an optional all-staff in-person meeting in June 2023 (and this year too).
For the in-person board and staff meetings, we have selected locations that minimize the overall amount of travel and maximize direct flights.
We have maintained our country focus for in-person local meetings supported by regional Ambassadors.
We met our goal of keeping total travel and meeting expenses below 60% of 2019 costs even though we have more staff and membership growth has continued. The amount of money spent is a rough proxy for our carbon impact.
We no longer have an office in Oxford and will not renew the lease on our Lynnfield, MA office, so we will have no physical offices by the end of 2024. This is not a large carbon emission reduction and is more a result of being a “distributed first” organisation with staff in 11 different countries.
We recorded data on staff travel (flights, trains, cars, hotels) for 2023 to use as a baseline for comparison with future years. In 2023 the carbon emissions from travel and meetings were about 105 tCO2e.
We used a tool provided by Amazon Web Services (AWS) and a third-party calculator for Zoom to estimate the impact of these services. In 2023 this was 0.216 tCO2e for AWS and about 0.1 tCO2e for Zoom.

Some challenges

Compiling data is difficult and time-consuming for a small organisation
There are many different calculators and metrics to use and it’s difficult to decide which to use and how much detail to go into
We haven’t yet estimated the carbon footprint of staff home working
We were able to calculate the emissions from AWS but not our data center
We didn’t estimate the emissions from our offices. We had a small office in Oxford until November 2023, and we have an office near Boston - we won’t be renewing the lease in 2025 so won’t have any offices.

Total travel and meetings spending

Year	Amount	Percentage of 2019
2019 actuals	$585,482	100%
2020 actuals	$91,700	16%
2021 actuals	$19,066	3%
2022 actuals	$74,416	13%
2023 actuals	$305,737	52%
2024 budget	$333,500	57%
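
As a quick check of the table above, the percentages and the 60% target can be recomputed from the dollar amounts. This is a minimal sketch; the names `spending`, `percent_of_baseline`, and `TARGET_PERCENT` are illustrative and do not come from any Crossref tool.

```python
# Travel and meetings spending in US dollars, taken from the table above.
spending = {
    "2019 actuals": 585_482,
    "2020 actuals": 91_700,
    "2021 actuals": 19_066,
    "2022 actuals": 74_416,
    "2023 actuals": 305_737,
    "2024 budget": 333_500,
}

BASELINE_YEAR = "2019 actuals"
TARGET_PERCENT = 60  # goal: stay below 60% of 2019 costs


def percent_of_baseline(amount, baseline):
    """Return amount as a percentage of the baseline amount."""
    return 100 * amount / baseline


baseline = spending[BASELINE_YEAR]
for year, amount in spending.items():
    share = percent_of_baseline(amount, baseline)
    status = "below target" if share < TARGET_PERCENT else "at or above target"
    print(f"{year}: ${amount:,} = {share:.1f}% of 2019 ({status})")
```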

We recorded carbon emissions from travel of about 105 tCO2e for 2023, which gives us a baseline to compare with future years. Now that we have started collecting travel data, it will be easier: staff can do it as they travel throughout the year.

Our Executive Director, Ed Pentz, looked at his personal and work flights: the carbon emissions were 18 tCO2e in 2019 and 2.7 tCO2e in 2023, a big change in the right direction.

Hosting services

We use AWS for hosting our REST APIs, Crossref Metadata Search, the website, and Labs projects. Our main metadata registry is still in a data center, which is not included in this calculation. For 2023 Amazon reports Crossref’s carbon emissions were 0.216 tCO2e compared with 0.266 tCO2e in 2022. Crossref is planning to move out of the data center and fully to AWS by the end of 2024, which will increase our AWS usage and therefore the related emissions. Compared to travel, the footprint from AWS is minimal.

Online meetings

As a distributed, remote-first organisation Crossref is a heavy Zoom user: it’s essential for staff and for engaging with our community. However, Zoom doesn’t provide tools or estimates of the carbon impact of Zoom meetings. We used a tool provided by Utility Bidder, which makes a lot of estimates and assumptions. In 2023 Crossref had almost 800,000 meeting minutes. This translated into an average of 1.92 kg of CO2 emissions per week, or about 100 kg per year.

Some studies have estimated that turning off video reduces the carbon footprint of meetings. However, this can be a false saving, since video is often important for creating a connection and having a productive meeting, and a Zoom meeting with video is still much, much better than traveling, particularly if flying is involved.

Tools we used

In order to calculate emissions for flights and train journeys, we chose to use Carbon Calculator. We didn’t calculate emissions from hotel stays but looked at the Hotel Footprinting tools and may add hotels to calculations in the future.

Offsetting

We don’t offset our emissions from travel or other operations and don’t have plans to do this. Offsetting emissions is problematic in a number of different ways so we don’t feel confident in doing it.

We did tree-planting as a “thank you” for the time of respondents in our metadata survey. Intended as an alternative to more commercial types of incentives rather than off-setting for our emissions, this resulted in 921 trees planted for the Gewocha Forest, Ethiopia via Ecologi.

Wrapping up

We’ve learned a lot over the last couple of years. Collecting accurate data is challenging and time-consuming, especially for a small organisation. For us, this has been a new lens for viewing our activities; it remains a true learning journey, and we have made permanent changes. In 2024 and beyond we are going to continue to follow the travel, meetings, and events policies that we announced in 2022. We will continue to capture our air travel emissions, and in 2025 we will more accurately capture train journeys and hotel stays. We will also continue calculating our Zoom and AWS emissions as best as we can. What we’ve learnt in the process of capturing and calculating our 2023 emissions helped us set things up to enable more prompt reporting on these impacts in the future.

We expect that many of our members and our community at large assess their environmental impact or are embarking on similar projects, to understand and curb emissions. We’re keen to discuss this and learn together to reduce our environmental impact as an organisation.

Metadata beyond discoverability

Ginny Hendricks — Tue, 03 Dec 2024 00:00:00 +0000

Metadata is one of the most important tools needed to communicate with each other about science and scholarship. It tells the story of research that travels throughout systems and subjects and even to future generations. We have metadata for organising and describing content, metadata for provenance and ownership information, and metadata is increasingly used as signals of trust.

Following our panel discussion on the same subject at the ALPSP University Press Redux conference in May 2024, in this post we explore the idea that metadata, once considered important mostly for discoverability, is now a vital element used for evidence and the integrity of the scholarly record. We share our experiences and views on the significance of metadata and on metadata workflows from the perspective of academic and university presses – thus we primarily concentrate on the context of books and journal articles.

The communication of knowledge is facilitated by tiny elements of metadata flitting around between thousands of systems telling minuscule parts of the story about a research work. And it isn’t just titles and authors and abstracts – what we think of as metadata has really evolved as more nuance is needed in the assessment and absorption of information. Who paid for this research and how much, how exactly did everyone contribute, what data was produced and is it available for me to reuse it, as well as, increasingly, things like post-publication comments, assertions from “readers like me”, who has reproduced this research or refuted these conclusions.

Different types of published works – journal articles, book chapters, preprints, dissertations – are described by different types of metadata. And those metadata elements can be of varying importance for different users. In this article, we will talk about metadata from the perspectives of four personas highlighted by Metadata 20/20:

Metadata Creators, who provide descriptive information (metadata) about research and scholarly outputs.
Metadata Curators, who classify, normalise and standardise this descriptive information to increase its value as a resource.
Metadata Custodians, who store and maintain this descriptive information and make it available for consumers.
Metadata Consumers, who knowingly or unknowingly use the descriptive information to find, discover, connect, and cite research and scholarly outputs.

Our approach delineates the metadata lifecycle, from authorship through production and discovery to continuous curation. Though some of the metadata is generated outside of that linear process, and much happens before the authorship step, we see it as a clear and useful breakdown of how metadata contributes to a new piece of content.

Authorship

The first stage in the metadata lifecycle, authorship, is just the beginning of a dynamic process with many collaborators. A formative piece of the puzzle, authorship involves the authors or contributors, the editorial team and/or the marketing team, and this is when the project and its metadata take shape. During this stage, the book or journal’s metadata exists only between the originators and the publisher, allowing the most opportunity for creativity and enhancement. Once the metadata reaches the next checkpoint along the lifecycle and is sent out externally, it’s more difficult and riskier to make major changes to the key metadata elements. In scholarly monograph publishing especially, we have the advantage of longer production lead times, which give us more time to amend and manipulate metadata at this stage.

At this stage, authors may have ideas of titles, subtitles and descriptions and it is up to the editors and other team members at the publisher to think strategically about how this can be optimised. The marketing and sales teams may be thinking about how the abstracts, keywords, and classifications can be best optimised for the web, leading to increased sales. Discoverability and interoperability of metadata for a book or journal, especially the use of persistent identifiers, is beneficial both for the author – in that their book is easily discovered, used, and cited – and for the publisher – increased visibility, sales, and usage.

Current challenges at the authorship stage include changing goalposts for metadata standards and accessibility requirements, which also have knock-on effects in subsequent stages in the metadata lifecycle. One of the key challenges with these is that they require buy-in from multiple players to keep up with and amend, and publishers must think closely about how these changes may affect metadata workflows for books at different stages of publication.

Production

As a book or journal article comes into production, it’s time to update and release the metadata to retailers, libraries, data aggregators and distributors. The metadata should be updated and checked to make sure that it’s still a good reflection of the product or the content that it describes and complete enough to release, including a final cover image in the case of books. This is still very much a collaborative effort with multiple roles involved. Technical details, such as spine width, page extents, and weight, are added, capturing the final specification. The editorial team may update metadata entered into systems earlier in the process, for example by reviewing the prices, updating subject classification codes or amending the chapter order. If any of the content is to be published open access, appropriate licensing and access metadata need to be included, so that users of the content are clear about what they can (and can’t!) do with it. Metadata that’s not yet captured upstream can be added or enhanced. For example, vendors already involved in the production process can verify that persistent identifiers (PIDs) are present and correct in funding metadata.

More and more metadata elements are being requested by supply chain partners. For example, new requirements are being introduced to provide commodity codes, spine width, carton quantities, gratis copy value and country of manufacture. There may be differences in metadata depending on the methods of production. For example, country of manufacture will be supplied differently when using traditional print methods, where the whole print run is carried out at one location, than when a title is manufactured print on demand and the location of printing is determined by the delivery address.

In an XML-first workflow, metadata can be captured with the content files to aid with discovery. This usually requires multiple systems, both internal and external. These systems need to be able to work together to ensure that only up-to-date metadata is used. Metadata will change throughout the production process, whether it’s the publication of an accepted manuscript through to the final version of record, or pre-order information to the published version, so updates need to feed out regularly.

The right metadata needs to go to the right recipient. Some is not useful or cannot be processed by certain recipients. For example, a printer, retailer, librarian or data aggregator each have their own needs and use cases and may receive and process metadata in different formats or require different fields.

Discovery

Discovery is the series of actions taken by an end user to retrieve and access relevant content they do not know about. Discovery can happen everywhere: Google (a search engine), a library catalog, a publisher platform, etc. In the academic sector, however, discovery is mostly associated with the use of discovery systems.

The technological landscape of libraries has developed in the last 15 years. Discovery systems are tools libraries subscribe to in order to allow their end users to have one search experience within their library holdings. It is paramount for librarians that library collections are used; hence, it is very important for them that the discovery system of their choice contains all the relevant metadata. Libraries expect their discovery service to include their content coverage as comprehensively as possible. Content items not represented or misrepresented in a discovery system create challenges to libraries in how they might otherwise ensure that these materials are discovered and accessed.

Libraries’ adoption and usage of discovery systems rest on the belief that the most important benefits of this technology are the single search box and the configuration flexibility. Libraries invest a significant amount of money in discovery services. An increase in usage is the success indicator of this adoption and a positive return on investment.

The backbone of discovery systems is formed by three crucial elements: a user interface, a metadata index, and a link resolver or Knowledge Base. These elements, along with a back-end control panel for librarian configuration, are the key components that enable the discovery process.

The discovery index, a database storing descriptive data from various content providers, data sets, and content types, is a testament to the collaborative efforts of content providers and discovery systems vendors. Their work under the Discovery Metadata Sharing partnership agreements, which establish the format, scope, frequency, and support of the collaboration, is instrumental in meeting librarians’ expectations.

Format

For most cases, the discovery metadata integration processes have settled into the following two metadata delivery workflows.

Metadata for the discovery index: Discovery systems have traditionally made efforts to work with various metadata formats like MARC, proprietary templates, etc., but the preferred format is XML. This metadata could include all the bibliographic data, including index terms and full text at the article and chapter level.

Metadata for link resolvers and knowledge bases: Knowledge bases are tools that contain information about what is included in products, packages, and/or databases. KBART is the preferred format in this area. It includes a set of basic bibliographic descriptions at the publication level and linking information for direct and OpenURL syntaxes.

Frequency

The delivery channels vary, and the frequency could range from daily to yearly, depending on the publication schedule.

Scope

Library collections include various content types, including archival materials, open access, and multimedia alongside the more traditional books and periodicals. Different content types will require different metadata elements to make a comprehensive discovery-friendly description, and the metadata elements will impact the formats in use.

Discovery services will receive this data and prioritise uploading. They will select and manipulate the required metadata elements according to their system requirements. These metadata tweaks and selections are not always communicated to the content providers and/or libraries. Ultimately, librarians decide which metadata will be visible on their discovery tool and the linking methods of their choice.

As described, Discovery is a complex area where the activities of its main stakeholders are interconnected. The success of the end users’ discovery journey from search to access depends on the successful integration, implementation, and maintenance of the discovery systems. This necessitates a combined effort from the three discovery stakeholders: content providers, discovery system providers, and libraries. Their collaborative work is not just crucial, but integral to supporting discovery and fulfilment in the most efficient manner possible. Your active involvement in this process is what makes it successful.

How do we ensure discoverability?

Electronic resources do not exist in isolation but are assessed and used depending on their level of integration in the discovery landscape where libraries and patrons are active. From a content provider’s perspective, discoverability is about the number and efficiency of entry points to our products created in third-party discovery products.

The level of discovery integration has a direct impact on sales and upsell opportunities. Products that are not discoverable are difficult to work with, and the opposite is true for products that are considered discoverable. Your role in ensuring discoverability directly influences the user experience and sales, making your work crucial and impactful. The term ‘Discoverability’ is critical in library discovery systems. It refers to the extent to which eResources are searchable in a discovery system, and it directly influences the ease with which users can find the information they need, thereby enhancing their overall experience. In practical terms, the degree of discoverability will be impacted by the quality of the metadata supplied, the transformations the metadata undergoes during integration into discovery systems, and the ongoing maintenance of the configuration.

The general principles of metadata quality also apply in this area: accuracy, completeness, and timely delivery. Metadata enrichment practices, such as the use of identifiers and standards, are also applicable. Your meticulous attention to these principles is crucial to the effectiveness of the discovery process.

Discovery as a mindset in the publishing process will increase discoverability, as it will be influenced by product designs (whether the content is linkable) and which metadata outputs are possible. For example, author-generated index terms will be more effective for meeting research search terms, and detailed article titles will probably be more discoverable than general titles. Finally, all the integration, descriptive metadata, configurations, etc., leave much room for errors. The flow is complex; on occasion, the products and content are more complicated to describe than tools can handle, and there are millions of holdings per library to manage. Constant maintenance and troubleshooting are crucial elements to maintaining and increasing discoverability.

Metadata beyond publication

In the lead-up to publication, finalising rich complete metadata can seem like establishing a fixed set of information. Post-publication, however, the metadata workflow should be dynamic, able to evolve to keep pace with new demands and opportunities. Think of metadata as a journey rather than a one-time destination, and look at ways to futureproof your metadata by actively adapting to some of the following types of change.

Changing Publisher Goals and Product Needs

Metadata should align with changing priorities for a publisher. Developing new formats, shifts in commissioning focus or building new distribution partnerships may require metadata updates. For instance, re-releasing content in audiobook form or digitising a backlist title warrants a metadata review to ensure current and prospective readers find accurate, relevant information.

Changing Technology and Metadata Standards

Advances in technology, from artificial intelligence to emerging metadata standards, offer enhanced possibilities for capturing and updating metadata. AI, for example, can help enrich metadata with more precise subject tagging, while new metadata formats may offer greater compatibility across platforms and discovery services. Staying current with these tools can help publishers manage metadata more efficiently and enhance discoverability.

Changing Societal Values

As society evolves, so do expectations for inclusive and socially responsible metadata. Utilising new categorisation codes, such as those for the United Nations Sustainable Development Goals, can align metadata with emerging social goals. Similarly, publishers may need to revisit keywords and category codes to reflect language changes, balancing the integrity of historic records with the need for current, appropriate terminology.

Changing Industry Priorities

Commitments to accessibility and sustainability have prompted developments in metadata. Increasingly, publishers need to be able to use metadata to build a record of sustainable production methods, such as paper sources, printing methods or ink types. New metadata fields for accessibility specifications will also support more inclusive reader experiences going forward. Metadata will play an increasingly vital role in meeting industry standards for accessibility, EUDR and EAA compliance, and environmental transparency.

Changing Customer and Librarian Expectations

Finally, as the metadata expectations of customers grow and the nature of roles and responsibilities in library and collection management professions develops, teamwork and making good use of available resources are essential. Publishers don’t have to tackle this alone. Working with organisations such as Crossref or Book Industry Communication (BIC), signing up to newsletters and webinars, and forming an in-house discovery group are all great ways of sharing ideas and best practice, and of ensuring your metadata workflow is adaptable and responsive. Be part of the conversation now rather than struggling to keep up down the line!

What are some challenges and opportunities with metadata?

JM: Metadata that establishes permanence is a real opportunity in a digital landscape where content can move or be taken down, links can rot, and website certificates can expire. Persistent identifiers like ORCID iDs for people and DOIs for content are key examples of metadata that establish enduring routes to, and provenance of, published digital content.

KM: Metadata creation, maintenance and change have long been seen as manual processes. AI tools offer a real opportunity for metadata creation and review, especially for keywords and classification codes, at a scale and speed that has the potential to transform metadata workflows, particularly for backlist transformation. A challenge we face for monograph metadata more specifically is that much of the scholarly metadata infrastructure is built around the journal article, and it can be difficult to fit longer form content into these systems of discovery.

MT: Metadata is crucial. Good metadata (complete, accurate, and timely) is the base for smooth integrations and easy discovery interactions with eResources. Bad metadata (inaccurate, incomplete, late) will be the main reason for undiscovered content. At this point, the eResources industry is still based on different versions of the same metadata, which is the leading cause of problems. It is probably time to start considering a unique record approach. This unique record, which will be complete and accurate, could be used by different systems for different purposes. I know there are many details to define here, but if you think about it, it is not impossible and could solve the many known issues.

How do you ensure the quality and completeness of your metadata? Do you have ways of auditing it?

SP: Validation of data is really important, so choosing or building a system that’s set up to do this is an important foundation. It’s straightforward to check for completeness of fields and I run daily checks on our book metadata to make sure there’s nothing missing in the files feeding out. Quality can be more challenging to monitor. Feedback from data recipients is key, and accreditation schemes such as the BIC Metadata Excellence Award are a great way to benchmark progress. Good training and clear documentation help to make sure that everyone involved in creating and updating metadata understands exactly what they need to do and the standards they need to meet.

KM: Earlier this year we completed a year-long data cleansing project as part of our move to a new title management database. This gave us the time to address gaps in backlist metadata as well as to identify any inconsistencies across records for the same book, and enrich key metadata fields like classification codes, keywords and PIDs. For frontlist titles, each person owns a number of fields to ensure they are complete before a book’s metadata is distributed – some of these have validation tools which will prevent a book’s metadata from being sent out unless it is complete.

MT: Strict and consistent internal processes are essential to ensure quality and completeness. Following the different standards and industry recommendations helps to keep the quality at high standards. Random manual checks and system-based checks help to ensure everything is good. We carry out projects where we work with specific aspects of the metadata. This building-blocks approach ensures the different data layers are as good as possible. As with any project, metadata projects should have specific goals, outcomes, resources, and documentation.
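
The field-completeness checks mentioned in the answers above can be pictured with a minimal sketch. It is not any of the presses' actual tooling: the record layout, the list of required fields and the function names are all assumptions made for illustration, and the ISBN is a placeholder value.

```python
# Hypothetical required fields; a real list would come from the
# publisher's own metadata policy and its recipients' requirements.
REQUIRED_FIELDS = ("title", "isbn", "publication_date", "subject_codes")


def is_empty(value):
    """Treat None, blank strings and empty collections as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in one record."""
    return [field for field in required if is_empty(record.get(field))]


def audit(records, required=REQUIRED_FIELDS):
    """Map the id of each incomplete record to the fields it lacks."""
    report = {}
    for record in records:
        gaps = missing_fields(record, required)
        if gaps:
            report[record.get("id", "<no id>")] = gaps
    return report


# Two made-up records: the second has a blank ISBN and no subject codes.
records = [
    {"id": "book-1", "title": "A Complete Record", "isbn": "9780000000002",
     "publication_date": "2024-05-01", "subject_codes": ["ABC"]},
    {"id": "book-2", "title": "Missing Bits", "isbn": " ",
     "publication_date": "2024-06-01", "subject_codes": []},
]
print(audit(records))  # {'book-2': ['isbn', 'subject_codes']}
```

A check like this only covers completeness. As noted above, quality is harder to monitor and still depends on feedback from data recipients and on manual review.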

How do you know if (and how much) metadata helps achieve your goals?

JM: Take any available opportunities to find out what people think of your metadata. Through library conferences, institutional customer feedback, and work with the library team at our home institution, we’ve had some really useful and interesting conversations about MUP’s metadata and where we can improve it to make it as relevant as possible for different stakeholder needs.

MT: Customers and Discovery partners will inform us if something is incorrect. Usage data is also a good indicator of how healthy our metadata is. Following industry standards is another good reference point for assessing the metadata. Finally, the metadata is only good when we know what we want to use it for. So, always considering what we are trying to achieve helps us understand how effective the metadata is.

KM: As the others have noted here, and we represent a range of different types and sizes of publishers, measuring the direct impact of metadata is an ongoing challenge. We think about the different end users who might encounter our metadata further down the supply chain – retail customers searching on Amazon, librarians filtering results on purchasing platforms, researchers finding our books and journals through citations on popular online search engines – and consider what elements of our metadata might help reach those people in the right ways.

JM: Ideally, you’ll see an uplift in sales or usage for every metadata element that you add, review or expand, although it can be challenging to quantify and prove a direct correlation between richer metadata and higher revenue or discoverability, as there will be other factors involved. For my Operations team, what is certain is that richer, more comprehensive metadata means fewer errors are thrown up by the distribution systems and feeds we use, which means colleagues save time and gain productivity by not having to resolve and rerun failed jobs, chase missing information from other teams, or manually send information to third parties. My job is also made easier because things like size and weight of every printed product are recorded in our bibliographic database as standard, easy to report on and analyse, which helps with forecasting costs for inventory storage or shipping. Metadata can be powerful.

Research Integrity Roundtable 2024

Martyn Rittman — Fri, 15 Nov 2024 00:00:00 +0000

For the third year in a row, Crossref hosted a roundtable on research integrity prior to the Frankfurt Book Fair. This year the event looked at Crossmark, our tool to display retractions and other post-publication updates to readers.

Since the start of 2024, we have been carrying out a consultation on Crossmark, gathering feedback and input from a range of members. The roundtable discussion was a chance to check and refine some of the conclusions we’ve come to, and gather more suggestions on the way forward. As in previous years, we were able to include a range of organisations, which led to lively and interesting discussions. See below for the full participant list.

Crossmark feedback

We started by presenting Crossmark and a summary of the consultation process. There are a number of areas where we have learned more about how the community operates or found that Crossmark needs to adapt. These include:

Implementation: Our members have struggled to implement Crossmark and uptake is low. At the same time, in many organisations the workflows for handling retractions are not well-defined because they are rarely used, if ever. The responsibility for updating Crossref metadata can be unclear and this may be a factor in the low uptake.

Education: There are different levels of understanding about how to handle retractions. Some members are very defensive when asked about retractions, others state they will never make updates to published works. How can we have a constructive conversation where the value of communicating updates appropriately is recognised?

Community engagement: Given the different scales, locations, disciplines, and technologies used by our members, it looks like one size will not fit all when it comes to updates. How can we get continual, representative feedback on new tools and processes?

Metadata assertions: Crossmark allows the deposit of metadata using custom field names; however, this metadata seems to be of limited use and is not highly valued by the community. Should we continue to collect it? Can we make some of the most-used field names part of our standard schema?

Changing the Crossmark UI: Although we didn’t specifically ask about it during the consultation, the look of the Crossmark logo often came up, along with concern that it is not recognised and not well-used. Can we change the look and behaviour so that it has more impact?

NISO Recommendations

Patrick Hargitt represented the NISO group on Communication of Retractions, Removals, and Expressions of Concern (CREC). The group’s recommendations were published earlier this year and cover how retractions are communicated. CREC arose from an earlier project, RISRS. A large part of the motivation is that retracted works continue to be cited, with citing authors apparently unaware of the retraction. Patrick presented the CREC recommendations, which cover:

Metadata receipt, display, and distribution,
Which metadata elements to communicate,
How to implement the recommendations,
Discussion of some special cases,
Key stakeholders and their responsibilities.

The two presentations prompted discussion, which was taken into the first of two workshops.

First workshop: Improving collection of retractions and Crossmark

The first workshop looked at proposed changes to Crossmark and how to encourage more members to deposit their retractions, corrections, and other post-publication updates. Several important themes emerged.

First, the question of whose responsibility it should be to provide metadata on retractions and similar updates. Crossref has a responsibility to work with the community to obtain high quality and complete metadata; publishers should take responsibility for handling issues of research integrity and reporting them to relevant downstream services, like Crossref; and platforms need to provide tools that allow easy reporting of retractions.

The value of Crossmark appearing in PDFs was reiterated. The fact that a PDF can be downloaded, and years later there is a way to tell whether it has been retracted or not is highly valued. There was also the suggestion that the Crossmark logo on web pages can indicate a change before it has been clicked. This is something that we have been considering at Crossref and it was useful to have the idea reinforced. Another suggestion was that a browser plugin would make a good complement to Crossmark.

Implementation issues with Crossmark were raised, including that it’s difficult to validate whether a specific implementation is complete. There are a number of different changes (to metadata deposit and content, and websites) that need to work together to have Crossmark fully functional. There were several questions and a discussion about Retraction Watch data. Some were about understanding its collection and validation. A number of participants are actively using the data and it was great to see the variety of applications.

Second workshop: Community use of retraction metadata

The second workshop focused on a broader set of downstream organisations that might want to make use of retraction metadata. We looked at stakeholders and their needs, and attempted to match them up with existing tools. Several gaps were identified as a result, which may provide opportunities for new services or collaborations to fill them.

We identified a number of tools available for publishers, editorial systems, metadata researchers, and readers. A good example is reference managers, many of which are now highlighting retracted works to authors. This can help to reduce the number of retracted works being cited. Publishing platforms are also providing support to editors, using tools that include retraction metadata.
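
To make the reference manager example concrete, here is a minimal sketch of how a tool might flag retracted works in a reference list. It does not describe any particular product: the function names are hypothetical, the DOIs are made-up placeholders, and the set of retracted DOIs is assumed to have been obtained separately (for example from retraction metadata or the Retraction Watch data). The only domain fact it relies on is that DOIs are case-insensitive, so they are normalised before comparison.

```python
# URL and scheme prefixes that commonly appear in front of a DOI.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi):
    """Lower-case a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def flag_retracted(reference_dois, retracted_dois):
    """Return the references (as originally written) that are retracted."""
    retracted = {normalise_doi(d) for d in retracted_dois}
    return [d for d in reference_dois if normalise_doi(d) in retracted]


# Placeholder DOIs for illustration only.
references = [
    "https://doi.org/10.5555/EXAMPLE.1",
    "10.5555/example.2",
    "doi:10.5555/example.3",
]
retracted = {"10.5555/example.1", "10.5555/example.3"}

print(flag_retracted(references, retracted))
# ['https://doi.org/10.5555/EXAMPLE.1', 'doi:10.5555/example.3']
```

The sketch is only as good as the retraction data it is given, which is why the completeness of retraction metadata matters so much for these tools.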

Some of the stakeholders identified have limited tools for identifying retractions that are relevant to them. These include funders, archives and repositories, journalists, and institutions.

Often, there are pathways for retraction data to be communicated but they are not being sufficiently used. There needs to be a concerted effort to improve the quality of retraction metadata for tools to function better. For example, a second author on a paper might not know that a correction or retraction is planned for their article. If their email or ORCID isn’t included in the metadata, an alerting tool wouldn’t be able to let them know. A similar argument can be made for institutions or funders if they are not well-identified in the metadata.
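
The alerting example can be sketched in a few lines. The record layout below (a `contributors` list with optional `orcid` and `email` keys) is an assumption made for illustration, not the Crossref schema, and the identifier and email address are placeholders.

```python
def unreachable_contributors(work):
    """Names of contributors with neither an ORCID iD nor an email address."""
    return [
        person.get("name", "<unnamed>")
        for person in work.get("contributors", [])
        if not (person.get("orcid") or person.get("email"))
    ]


# A made-up record: the second author has no identifier or contact details.
work = {
    "title": "An Example Article",
    "contributors": [
        {"name": "First Author", "orcid": "0000-0000-0000-0000"},
        {"name": "Second Author"},
        {"name": "Third Author", "email": "third.author@example.org"},
    ],
}

print(unreachable_contributors(work))  # ['Second Author']
```

In this toy case an alerting tool could reach the first and third authors but would have no way to notify the second author of a planned correction or retraction.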

The question of standardisation of metadata was raised. It seems too early to implement a full set of standards at the moment. CREC and similar initiatives have documented and accommodated a range of practices while providing guidance and principles to work towards. More discussion is needed in the community to work out paths that could be applied across the broad spectrum of scholarly communication.

Conclusion

The event was very valuable in bringing up a range of topics related to retraction and communication of post-publication changes to scholarly works. We are grateful to all of the participants for their contributions and sharing their diverse experience and opinions with us.

Research integrity is an area of flux, with significant changes over the past few years. While there has been progress, there remain gaps in metadata and tools to communicate retractions. This is something that Crossref will continue to contribute to, and Crossmark clearly still has a role to play.

Some of the ideas and suggestions from the discussion can be implemented in the near future. Others need further development, and we will continue to engage the community. Reading this, there may be topics where you feel you have a role to play. We are keen to partner with other organisations in this space as we continue to improve the transparency and communication of metadata for post-publication updates.

Participants

Many thanks to the participants. Here is the full list of those that attended:

Name	Role	Organisation
Aaron Wood	Head, Product & Content Management	American Psychological Association
Adya Misra	Associate Director, Research Integrity	Sage
Bianca Kramer		Sesame Open Science
Constanze Schelhorn	Head of Indexing	MDPI
Guillaume Cabanac	Full Professor	University of Toulouse
Hong Zhou	Director of AI Product	Wiley
Jennifer Wright	Head of Publication Ethics and Research Integrity	Cambridge University Press
Johanssen Obanda	Community Engagement Manager	Crossref
Joris van Rossum	Program Director	STM Solutions
Kathryn Weber-Boer	Data & Analytics	Digital Science
Kornelia Korzec	Director of Community	Crossref
Kruna Vukmirovic	Publisher- Journals	The Institution of Engineering and Technology
Lena Stoll	Product Manager	Crossref
Leslie McIntosh	VP, Research Integrity	Digital Science
Liying Yang	Professor	CAS Library
Luis Montilla	Technical Community Manager	Crossref
Madhura Amdekar	Community Engagement Manager	Crossref
Martyn Rittman	Program Lead	Crossref
Maryna Kovalyova	Member Experience Manager	Crossref
Mina Roussenova	Project Manager, Strategic Projects	Karger
Osnat Vilenchik	VP Content Operations	Ex Libris, part of Clarivate
Patrick Hargitt	Senior Director of Product Management	Atypon/Wiley
Paul Davis	Tech Support & R&D Analyst	Crossref
Sami Benchekroun	CEO	Morressier
Scott Delman	Director of Publications	Association for Computing Machinery (ACM)
Shilpi Mehra	Head, Research Integrity & Paperpal Preflight	Cactus Communications
Sichao Tong		Chinese Academy of Sciences, Library

How good is your matching?

Dominika Tkaczyk — Wed, 06 Nov 2024 00:00:00 +0000

https://doi.org/10.13003/ief7aibi

In our previous blog post in this series, we explained why no metadata matching strategy can return perfect results. Thankfully, however, this does not mean that it’s impossible to know anything about the quality of matching. Indeed, we can (and should!) measure how close (or far) we are from achieving perfection with our matching. Read on to learn how this can be done!

How about we start with a quiz? Imagine a database of scholarly metadata that needs to be enriched with identifiers, such as ORCIDs or ROR IDs. Hopefully, by this point in our series this is recognizable as a classic matching problem. In searching for a solution, you identify an externally-developed matching tool that makes one of the below claims. Which of the following would demonstrate satisfactory performance?

It is a cutting-edge, state-of-the-art, intelligent-as-they-come, bullet-proof technology! All the big players are using it. You won’t find anything better!
The tool was tested on the metadata of 10 articles we authored, and many identifiers were matched.
The quality of our matching is 98%.

Okay, okay, trick question. The correct answer here is to opt for secret answer #4: “I wouldn’t be satisfied by any of these claims!” Let’s dig in a bit more to why this is the correct response.

The importance of the evaluation

Before we decide to integrate a matching strategy, it is important to understand as much as possible about how it will perform. Whether it is used in a semi or fully automated fashion, metadata matching will result in the creation of new relationships between things like works, authors, funding sources, and institutions. Those relationships will then, in turn, be used by the consumers of this metadata to guide their understanding and perhaps even to make important decisions about those same entities. As organisations providing scholarly infrastructure, we must therefore take it as our paramount responsibility to understand any caveats or shortcomings of the scholarly metadata we make available, including that resulting from matching.

Proper evaluation is what allows us to do this, as it is impossible to know how well a given matching strategy will perform in its absence. This is true no matter how simple or complex a matching strategy may seem. Complex methods can be tailored to data with specific characteristics and might fail when faced with something different from this. Simple methods might be only appropriate for clean metadata or a narrow set of use cases.

Beyond complexity, matching strategies themselves vary widely in character, inheriting biases from their design, training data, or how a problem has been formulated. Some prioritise avoiding false negatives, while others focus on minimising false positives. Even a generally high-performing strategy might not be perfectly aligned with your specific needs or data. In some cases, the task also itself might be too challenging, or the available metadata too noisy, for any matching strategy to perform adequately.

Evaluation is, again, how we understand these nuances and make informed decisions about whether to implement matching or avoid it altogether. By now, it should also be clear that the notion “we don’t need to evaluate” is far from ideal! Given its importance, let’s explore how evaluation is actually done.

Evaluation process

In general, a proper evaluation procedure should consist of the following steps:

Preparation of an evaluation dataset containing many examples of matching inputs and the corresponding expected outputs.
Applying the strategy to all inputs from the dataset and recording the responses.
Comparing the expected outputs with the outputs from the strategy.
Converting the results of the above comparison into evaluation metrics.

From this accounting, we can see that there are two primary components for the evaluation process: an evaluation dataset and metrics.
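The first three steps can be sketched in a few lines of Python. This is only an illustration: the dataset, the identifiers, and the `naive_strategy` lookup are all invented for the example and do not correspond to any real matching tool.

```python
from collections import Counter

# Step 1: an evaluation dataset of (input, expected output) pairs.
# None means "no match expected". All values are made up for illustration.
DATASET = [
    ("University of Example", "id-1"),
    ("Example Univ.", "id-1"),
    ("Sample Institute", "id-2"),
    ("Unknown Org", None),
]

# A deliberately naive, hypothetical strategy: exact lookup in a small table.
KNOWN = {"University of Example": "id-1", "Sample Institute": "id-3"}


def naive_strategy(text):
    """Return an identifier for the input string, or None if nothing matches."""
    return KNOWN.get(text)


def classify(expected, actual):
    """Step 3: compare one expected output with the strategy's output."""
    if actual is None:
        return "correct_empty" if expected is None else "missed"
    return "correct_match" if actual == expected else "wrong_match"


def evaluate(strategy, dataset):
    """Step 2: apply the strategy to every input and tally the comparisons."""
    return Counter(classify(expected, strategy(text)) for text, expected in dataset)


outcomes = evaluate(naive_strategy, DATASET)
print(dict(outcomes))
```

The tally of outcome types is the raw material for step 4, where the counts are turned into metrics.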

Evaluation dataset

It’s useful to conceive of an evaluation dataset as the specification for an ideal matching strategy, describing what would be returned from our forever-elusive perfect matching. In practice, this means that the dataset should contain a number of real-world example inputs, along with the corresponding ideal or expected outputs, and that all data should be in the same format as the strategy is expected to process. The outputs should themselves also conform to the strategy’s overall requirements, for example, by being consistent with its cardinality, meaning whether zero, one, or multiple matches should be returned and under what circumstances. In terms of size, it’s generally useful to calculate the ideal number of evaluation examples using a sample size calculator or using standardised measures, but as a quick rule of thumb: fewer than 100 examples is probably insufficient, more than 1,000 or 2,000 is generally acceptable.
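As a rough illustration of what a sample size calculator does, the sketch below uses the standard formula for estimating a proportion (such as the share of correctly matched examples) within a chosen margin of error. The function name and its defaults (95% confidence, worst-case proportion of 0.5) are our own assumptions for the example, and this is only one of several possible approaches.

```python
import math


def sample_size(margin_of_error, confidence_z=1.96, proportion=0.5, population=None):
    """Estimate how many examples are needed to measure a proportion.

    margin_of_error: acceptable error, e.g. 0.05 for plus or minus 5 points.
    confidence_z: z-score for the confidence level (1.96 is roughly 95%).
    proportion: expected proportion; 0.5 is the most conservative choice.
    population: optional size of the full dataset, used for the
        finite population correction.
    """
    n = (confidence_z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: smaller datasets need fewer samples.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


print(sample_size(0.05))
print(sample_size(0.03))
print(sample_size(0.05, population=10000))
```

Under these assumptions, a margin of error of 5 points calls for a few hundred examples and 3 points for roughly a thousand, which is in line with the rule of thumb above.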

It is also important that the evaluation dataset be representative of the data to be matched in order to ensure reliable results. Using unrepresentative data, even if convenient, can lead to biased or misleading evaluations. For example, if matching affiliations from various journals, building an evaluation dataset solely from one journal that already assigns ROR IDs to authors’ affiliations might be tempting. The data, having been already annotated, allow us to avoid the tedious work of labelling, and we might even know that it is produced by a high-quality source. This is still, unfortunately, a flawed approach. In practice, such datasets are unlikely to represent the entire range of affiliations to be matched, potentially leading to a significant discrepancy between the evaluated quality and the actual performance of the matching strategy when applied to the full dataset. To assess a matching strategy’s effectiveness, we have to resist shortcuts and instead do our best to create truly representative evaluation datasets to be confident that we’ve accurately measured its performance.

Evaluation metrics

Evaluation metrics are what allow us to summarise the results of the evaluation into a single number. Metrics give us a quick way to get an estimation of how close the strategy was to achieving perfect results. They are also useful if we want to compare different strategies with each other or decide whether the strategy is sufficient for our use case, removing the need to compare countless evaluation examples from different strategies against one another.

The simplest metric is accuracy, which can be calculated as the fraction of the dataset examples that were matched correctly. While a commonsense benchmark, accuracy can be misleading, and we generally do not recommend using it. To understand why, let’s consider the following small dataset and the responses from two strategies:

Input	Expected output	Strategy 1	Strategy 2
string 1	ID 1	ID 1	ID 1
string 2	ID 2	ID 3	Empty output
string 3	Empty output	Empty output	Empty output

Both strategies achieved the same accuracy, 0.67, making one mistake each on the second affiliation string. However, a closer examination reveals that these error types are distinct. The first strategy matched to an incorrect identifier, while the second refused to return any value. This illustrates the limitation of accuracy as a measure: it generally fails to capture important nuances in strategy behaviour. In our example, the first strategy appears more permissive, returning matches even in unclear circumstances, while the second is more conservative, withholding them when uncertain. Although using such a small dataset would preclude drawing any definitive conclusions, it highlights how relying on accuracy alone can obscure differences in performance.

For evaluating matching strategies, we instead recommend using two metrics: precision and recall. To recap from our previous blog post:

Precision is calculated as the number of correctly matched relationships resulting from a strategy, divided by the total number of matched relationships. It can also be interpreted as the probability that a match is correct. Low precision indicates a high rate of false positives, which are incorrect relationships created by the strategy.
Recall is calculated as the number of correctly matched relationships resulting from a strategy, divided by the number of true (expected) relationships. It can also be interpreted as the probability that a true (correct) relationship will be created by the strategy. Low recall means a high rate of false negatives, which are relationships that should have been created by the strategy but were not made.

Applying these measures to our prior example, the strategies achieved the following results:

Strategy 1: accuracy 0.67, precision 0.5, recall 0.5
Strategy 2: accuracy 0.67, precision 1.0, recall 0.5

As we can see, while both strategies have the same accuracy, using precision and recall better describes the difference between the two sets of results. Strategy 1’s lower precision indicates it made false positive matches, while Strategy 2’s perfect precision shows that it made none. The identical recall scores show both identified half of the possible matches.
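For readers who want to check the arithmetic, here is a small Python sketch that recomputes the three metrics from the example table. The `metrics` helper is written for this illustration only; returning 0.0 when a denominator is zero is a convention we assume here, not something prescribed by the definitions.

```python
# Rows from the example table: (expected, strategy 1, strategy 2).
# None stands for "Empty output".
ROWS = [
    ("ID 1", "ID 1", "ID 1"),
    ("ID 2", "ID 3", None),
    (None, None, None),
]


def metrics(expected, actual):
    """Return (accuracy, precision, recall) for one strategy's outputs."""
    pairs = list(zip(expected, actual))
    # Accuracy: fraction of examples where the output equals the expectation.
    accuracy = sum(e == a for e, a in pairs) / len(pairs)
    # Correctly matched relationships: a non-empty output equal to the expectation.
    correct = sum(1 for e, a in pairs if a is not None and a == e)
    matched = sum(1 for a in actual if a is not None)    # all relationships created
    true = sum(1 for e in expected if e is not None)     # all expected relationships
    precision = correct / matched if matched else 0.0
    recall = correct / true if true else 0.0
    return round(accuracy, 2), precision, recall


expected = [row[0] for row in ROWS]
strategy_1 = metrics(expected, [row[1] for row in ROWS])
strategy_2 = metrics(expected, [row[2] for row in ROWS])
print(strategy_1)
print(strategy_2)
```

Note that the third row, where an empty output was expected and returned, raises accuracy but plays no part in either precision or recall.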

Of course, results calculated using such a small dataset are not very meaningful. If we obtained these scores from a large, representative evaluation dataset, it would indicate to us that Strategy 1 risks introducing many incorrect relationships, while Strategy 2 would be unlikely to do so. In both cases, we would still expect approximately half of the possible relationships to be missing from the strategies’ outputs.

Which one is more important to prioritise, precision or recall? It depends on the use case. As a general rule, if you want to use the strategy in a fully automated way, without any form of manual review or correction of the results, we recommend paying more attention to precision. Privileging precision will allow you to better control the number of incorrect relationships added to your data. If you want to use the strategy in a semi-automated fashion, where there is a manual examination of and a chance to correct the results, pay more attention to recall. Doing so will help ensure that enough options are presented during the manual review stage and fewer relationships will be missed as a result.

To get a more balanced estimation of performance, we can also consider both precision and recall at the same time using a measure called F-score. F-score combines precision and recall into a single number, with variable weight given to either aspect. There are three commonly used types, each calculated as the weighted harmonic mean of precision and recall:

F0.5: Precision is weighted more heavily. It treats precision as twice as important as recall. A high F0.5 score indicates performance that minimises false positives.
F1: Equal weight is given to both precision and recall. It can be interpreted as the most balanced score in this set. High F1 indicates good overall performance, with both false positives and false negatives being minimised equally.
F2: Recall is weighted more heavily. It treats recall as twice as important as precision. A high F2 score indicates performance that minimises false negatives.

Each of these variants allows for fine-tuning the evaluation metric to align with your expectations for a specific matching task. Choose whichever reflects the relative importance of precision versus recall for your use case.
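The three variants share one formula, in which a parameter β sets how much more weight recall receives than precision: Fβ = (1 + β²) × precision × recall / (β² × precision + recall). The Python sketch below applies it to the precision and recall values of the two example strategies; the `f_score` function name and the zero-division fallback are our own choices for the illustration.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta < 1 favours precision, beta > 1 favours recall, beta = 1 is balanced.
    """
    if precision == 0 and recall == 0:
        return 0.0  # assumed convention when both metrics are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall of the two example strategies discussed earlier.
STRATEGIES = {"Strategy 1": (0.5, 0.5), "Strategy 2": (1.0, 0.5)}

for name, (p, r) in STRATEGIES.items():
    scores = {f"F{beta}": round(f_score(p, r, beta), 3) for beta in (0.5, 1, 2)}
    print(name, scores)
```

Strategy 1 scores 0.5 on every variant because its precision and recall are equal. Strategy 2, with perfect precision but the same recall, scores highest on F0.5 and lowest on F2, which shows how the choice of variant changes the ranking signal.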

To summarise, to avoid falling prey to misleading sales pitches or silly quizzes, it is important to have a good understanding of the performance of any strategies you are building or integrating. With thorough evaluation, including a representative dataset and carefully considered metrics, we can estimate the quality of matching and, by extension, its resulting relationships.

Now that we’ve covered how to evaluate effectively, we can move on to some other aspects of metadata matching. Our next blog post will take a final, more holistic view of matching, exploring some complementary considerations to all of the preceding. Stay tuned for more!

Update on the Resourcing Crossref for Future Sustainability research

Kornelia Korzec — Mon, 28 Oct 2024 00:00:00 +0000

We’re in year two of the Resourcing Crossref for Future Sustainability (RCFS) research. This report provides an update on progress to date, specifically on research we’ve conducted to better understand the impact of our fees and possible changes.

Crossref is in a good financial position with our current fees, which haven’t increased in 20 years. This project is seeking to future-proof our fees by:

Making fees more equitable
Simplifying our complex fee schedule
Rebalancing revenue sources

In order to review all aspects of our fees, we’ve planned five projects to look into specific aspects of our current fees that may need to change to achieve the goals above. This is an update on the research and discussions that have been underway with our Membership & Fees Committee and our Board, and what we’ve learned so far in each of these areas.

Goal 1: More equitable fees

To ensure our fees going into the future are more equitable, we’re carrying out two parallel projects: evaluation of the lowest membership tier, and the review of the basis for deciding the membership tiers and distribution of membership across them.

Project 1: Evaluate the lowest membership tier and propose a more equitable pricing structure.

All Crossref members pay an annual membership fee. These fees are tiered, and different members pay a different fee depending on the annual publishing revenue that their organisation receives (or publishing expenses if they don’t receive any publishing revenue).

We entered into this project recognising that we have too many membership tiers and the definition we use to size members is not consistent and can be confusing (e.g. different basis for funders than other organisations, and both are different still from subscribers to our Metadata Plus service). The idea of the membership tiers was to use publishing revenue as a proxy for “ability to pay”. We really want to develop proposals for a more equitable pricing structure. However, we don’t know enough about our members’ capacity to pay to be able to model an alternative approach.

Our current lowest fee tier is $275 (USD) for any organisation with annual publishing revenue (or publishing expenses where the organisation doesn’t receive publishing revenue) of $0 to $1 million, and this is the tier where we focus our attention in our first project of the RCFS program. The difference between an organisation with revenue or expenses of USD 0, and an organisation with revenue or expenses of USD 1 million, is huge. Hardly any new members have joined in any other tier in the past several years. Of the 21,000 active members, more than 20,000 fall into the USD 275 tier - either directly (as an independent member) or indirectly (through a sponsor, where their fees would be lower). A fee structure that would fit better with the realities of our community might entail breaking our current $275 fee tier down into two or more finer-grained tiers.

At the moment, the majority of Crossref’s revenues come from the bottom membership tiers; 65% of membership revenues come from organisations in the USD 275 tier. We also know that many of those members (86%) are paying more in membership dues than in content registration, whereas other members have the inverse relationship between annual dues and content registration. Overall, the members in the USD 275 tier contributed 34% of Crossref’s revenue last year, and the members in the >USD 50 million tier contributed 29%.

Members’ survey

Between April and May this year, we surveyed all independent members in the USD 275 tier. We asked questions about their operating size, how they’re funded, and how Crossref’s fees affect them. At the time of the survey, there were 8,027 members in this category. We received 1,054 responses; with a 13% response rate and broad representation globally, we are confident in the sample size. One-third of respondents said they were part of a larger organisation (such as a department or a library in a research institution).

Chart 1: Organisation revenue or funding

The majority of respondents in this category (65%) have annual revenue or expense of less than USD 100,000; with 48% operating with less than USD 10,000.

Chart 2: Sources of funding

When asked about the sources of funding (as an indicator of how stable these organisations might be and how readily accessible their funding is) the most frequent answer was public or government funding, and then article processing charges. If organisations relied on two sources of funding, the most common combination was public funding and article processing charges, and it was relatively rare for these organisations to have multiple sources of funding.

Chart 3: What percentage of expenses do you spend on Crossref fees?

The majority (61%) of respondents spend less than 5% of their expenses on Crossref fees. However, we have also learnt that for some volunteer-run publications, Crossref fees might be some of the only expenses they incur. Interestingly, the percentage of expenses spent on Crossref is fairly consistently spread across the continents.

Project 2: Review the basis and distribution of membership tiers

This project examines options for how we define the capacity to pay, how members are distributed across tiers, and the right levels of member fees.

There are currently a range of prices for our annual fees, based on an organisation’s ability to pay. We have used the metric of annual publishing expense or revenue as an indicator of that ability, but in some cases it doesn’t apply. As per our fee principles, we have not differentiated between organisation types. Nonprofit and commercial entities pay the same price (caveat: research funders still have a separate fee schedule, but that was intended to be temporary).

We conducted a review of other annual fee models to benchmark our approach against six like-minded organisations working in the context of scholarly communications and infrastructure. We looked at whether these organisations based their fees on one or more of the following:

Volume: e.g., research output, # of journals
Budget: e.g., total annual revenue or expenses
Relevant budget: e.g. publishing revenue
Organisation type: e.g. variance in fee based on publisher, institution, or funder
Country-level economic data: e.g., discounting based on World Bank classification, discounting based on purchasing power calculation.

Chart 4: Annual fee schedules comparisons between Crossref and CORE, DOAJ, Dryad, OA Switchboard, OpenCitations and ORCID.

There are three consistent themes among our peers: the total annual revenue and volume levels are the most common basis for membership fees among other organisations, and almost all offer discounted fees to accommodate country-based economic circumstances, utilising World Bank’s data (this is currently achieved at Crossref via the GEM program, which we have full intention of incorporating into our future fees whatever other decisions we might take). Only one other organisation uses publishing revenue or expenses as a basis for annual fees, while the potentially more transparent and less ambiguous data point of the total revenue factors in three other annual fee models.

For subscribers to our Metadata Plus service, the fee tier is selected based on whichever is the higher between their total annual revenue (including earned and fundraised, e.g. grants) or annual operating expenses (including staff and non-staff, e.g. occupancy, equipment, licences etc.). At present, we have limited understanding of the budgets of our members and how this may compare to their publishing revenues or expenses. We are looking to learn more about this as part of our annual membership data checking process, where we email all our members to ask them to confirm contact details for their organisation and the staff involved in managing their Crossref account. This year, we’re also asking all members about their organisation’s annual operating budget (or planned annual expenses) to help inform our discussions. In our case, the volume of outputs (in this case the number of items and associated metadata registered with Crossref) is recognised by the registration fees mechanism.

Consulting with organisations outside Crossref membership

To help us inform how our fees can be more equitable, it’s important to invite voices of organisations that may currently be unable to join us - due to fees or technical barriers. We hope that learning more about their circumstances will help us make sure that we improve accessibility of Crossref membership to all organisations that publish scholarly and professional works. We commissioned Accucoms to carry out a consultation on our behalf.

So far, from a handful of interviews with publishers from Nigeria, the DRC, Canada and the USA, we’ve learnt that while virtually all offer open access to their publications, the majority have no publishing income, and where income is derived via APCs it’s modest and only applicable in rare circumstances. Through institutional funding and/or grants, these organisations have modest operational budgets, yet our respondents lacked clarity over the particulars. In terms of participation in professional networks and international publishing organisations, only one of the organisations we interviewed participates in DOAJ, and another is a member of OASPA; in both cases their participation is free. Among the interviewees, two organisations were interested in Crossref membership in the past but encountered technical barriers to joining.

With only five interviews to date, the consultation is still open and we’re keen to hear from more organisations that are not Crossref members but have considered our membership at some point.

Goal 2: Simplify complex fees

Projects 3 & 4: Review volume and back-year discounts for Content Registration

Along with our membership fees, our members also pay usage-based registration fees for records (scholarly works and grants) they register with us. Different content types render different costs for our members, and the fees are subject to discounts related to the age of publication and volume of registrations. Records for items older than two years have a lower fee associated with them, to help incentivise registration of such ‘back-year’ materials with great gains for the Research Nexus. There are also discounts related to the volume of transactions – which again depend on the content types.

These discounts are intended to encourage certain behaviours, specifically encouraging members to register older records in large quantities to better complete the scholarly record. Not all content types have back-year or volume discounts, and the rate of discount varies. This creates quite a complex system of fees. To the extent that the discount is successful in encouraging this behaviour, we want to preserve it, but in many cases these discounts see little to no activity.

Following the discussions of the Membership and Fees Committee, chaired by Vincas Grigas, Vilnius University, we are preparing to consult with the small number of members who currently receive volume discounts to discuss what the impact would be if we removed them.

We plan to identify and preserve the well-used back-year discounts, which encourage registration of old content, such as books, journal articles, and grants. However, some types of discount are hardly ever used, and we are considering removing these to simplify the fees. This work will focus on the technical implications of removing some of the underused back-year discounts from the billing code and on consulting with members to understand any impact.

Goal 3: Rebalance revenue sources

Project 5: Reflect increase in metadata usage and perceived shift of value toward metadata distribution

All Crossref metadata is made freely and openly available to everyone. However, some organisations may be looking for a service level agreement in delivery of the metadata, plus more regular snapshots and priority service/rate limits. For those organisations, we have an optional Metadata Plus service.

The final project is looking at the fees for this service. We are interested in making sure that Crossref metadata is available and used by the community where it can contribute to their objectives – related to discovery, analysis, integrity, and more. The optional paid service we offer aims to support the external tools that facilitate business and scholarly processes for the community. We are heartened to see that the appetite for the use of metadata seems to be growing, and the value of open research information is increasingly and widely recognised. We want to ensure that the users of metadata contribute proportionally to the maintenance of the records created and curated by our members.

Conclusion

At this point, most of the projects are generating a lot of questions, and work is underway to deliver answers about capacity to pay, discounts, metadata usage, and the barriers faced by organisations in our community.

What we have found so far is that two of our goals – simplification and equity – are often at odds with each other, and this is especially true with the $275 tier.

We welcome comments, suggestions and questions.

Meet the candidates and vote in our 2024 Board elections

Lucy Ofiesh — Tue, 24 Sep 2024 00:00:00 +0000

On behalf of the Nominating Committee, I’m pleased to share the slate of candidates for the 2024 board election.

Each year we do an open call for board interest. This year, the Nominating Committee received 53 submissions from members worldwide to fill four open board seats.

We maintain a balanced board of 8 large member seats and 8 small member seats. Size is determined based on the organisation’s membership tier (small members fall in the $0-$1,650 tiers and large members in the $3,900 - $50,000 tiers). We have two large member seats and two small member seats open for election in 2024.

We were pleased to see the diversity in candidates, with applicants from 24 countries. We also received three applications from research funders, which we specifically identified as a priority in the committee’s remit for this year. The committee was keen to prepare a diverse slate of organisation types, individual skills, and global representation.

The Nominating Committee presents the following slate.

The 2024 slate

Tier 1 candidates (electing two seats):

Katharina Rieck, Austrian Science Fund (FWF)
Lisa Schiff, California Digital Library
Ejaz Khan, Health Services Academy, Pakistan Journal of Public Health
Karthikeyan Ramalingam, MM Publishers

Tier 2 candidates (electing two seats):

Aaron Wood, American Psychological Association
Dan Shanahan, PLOS
Amanda Ward, Taylor and Francis

Please read the candidates’ statements

Every member has a vote

If your organisation is a voting member in good standing as of September 11th, 2024, you are eligible to vote.

The voting contact for your organisation will receive a ballot from eBallot, a third-party election platform. You should receive your ballot by Wednesday, September 25th, and you will have until 15:00 UTC on October 29th to submit your ballot.

The election results will be announced at Crossref2024, our annual online meeting, on October 29th, 2024.

If you have any questions about our election process, please contact me

Happy voting!

The myth of perfect metadata matching

Dominika Tkaczyk — Wed, 28 Aug 2024 00:00:00 +0000

https://doi.org/10.13003/pied3tho

In our previous instalments of the blog series about matching (see part 1 and part 2), we explained what metadata matching is, why it is important and described its basic terminology. In this entry, we will discuss a few common beliefs about metadata matching that are often encountered when interacting with users, developers, integrators, and other stakeholders. Spoiler alert: we are calling them myths because these beliefs are not true! Read on to learn why.

If you have stuck with us this far in our series, hopefully, you are at least a bit excited about the possibility of creating new relationships between the works, authors, institutions, preprints, datasets, and myriad other objects in our existing scholarly metadata. Who would not want all of these to be better connected?

We have to pause for a moment and be honest with you: metadata matching is a complex problem, and doing it correctly requires significant effort. What is worse, even if we do everything right, our matching won’t be perfect. This may be counterintuitive. Perhaps you’ve heard that matching is not a hard problem, or have encountered people surprised that a matching strategy returned a wrong or incomplete answer. Sometimes, it is obvious to a person from looking at some specific example that a match should (or should not) have been made, so they naturally assume that a change to account for this has to be simple.

Misconceptions like these can be problematic. They create confusion around matching, drive users’ expectations to unreasonable levels, and make people drastically underestimate the effort needed to build and integrate matching strategies. So let’s dive right in and debunk a few common myths about metadata matching.

Myth #1: A metadata matching strategy should be 100% correct

Anyone who has built or supported a matching strategy has likely encountered the following belief: it is possible to develop a perfect strategy, meaning one that always returns the correct results, no matter the inputs. The unfortunate truth is that while one’s aim should always be to design matching strategies that return correct results, once we move beyond the simplest class of problems or artificially clean data, no strategy can achieve this outcome. In thinking through why this is the case, some inherent constraints become obvious:

The inputs to matching are often strings in human-readable formats, which can vary wildly in their structure, order and completeness. Since they’re intended to be parsed by people, instead of machines, they’re inherently lossy and frequently unstructured, anticipating that a person can infer from the source context what is being referenced. Matching strategies, although built to make sense of unstructured data, unfortunately, don’t have the luxury of this flexibility. A strategy has to account for translating a messy, partial, or inconsistent input into a correct and structured match.

Consider, for example, the following inputs to an affiliation matching strategy:

“Department of Radiology, St. Mary’s Hospital, London W2 1NY, UK”
“Saint Mary’s Hospital, Manchester University NHS Foundation Trust”
“St. Mary’s Medical Center, San Francisco, CA”
“St Mary’s Hosp., Dublin”
“St Mary’s Hospital Imperial College Healthcare NHS Trust”
“聖マリア病院”

In order to correctly identify the organisations mentioned here, the matching strategy must be able to distinguish between different ways of representing the same institution, disambiguate multiple institutions that have similar names, handle variant forms for the parts of each name (Saint/St./St), identify the same name in different languages (“聖マリア病院” is Japanese for “St. Mary’s Hospital”), and make assumptions about partial or ambiguous locations translating to more precise references. While a person reviewing each of these strings might be able to accomplish these tasks, even here there are some challenges. Does “St Mary’s Hosp., Dublin” refer to the hospital in Ireland or a separate hospital in one of the many cities that share this name? Should we presume that because “聖マリア病院” is in Japanese, this refers to a hospital in Japan? Would someone, by default, be aware that St. Mary’s Hospital in London is part of the Imperial College Healthcare NHS Trust, such that inputs one and five refer to the same organisation?
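To make the difficulty concrete, here is a minimal Python sketch of the kind of name normalisation a strategy might start with. The `normalise` function and its handful of rules are assumptions made purely for illustration; they are not taken from any existing matching strategy.

```python
import re

AFFILIATIONS = [
    "Department of Radiology, St. Mary's Hospital, London W2 1NY, UK",
    "Saint Mary's Hospital, Manchester University NHS Foundation Trust",
    "St. Mary's Medical Center, San Francisco, CA",
    "St Mary's Hosp., Dublin",
    "St Mary's Hospital Imperial College Healthcare NHS Trust",
    "聖マリア病院",
]


def normalise(affiliation):
    """Lowercase the string and unify a few variant spellings (illustrative rules only)."""
    text = affiliation.lower().replace("\u2019", "'")   # curly to straight apostrophe
    text = re.sub(r"\bsaint\b|\bst\b\.?", "st", text)   # Saint / St. / St -> st
    text = re.sub(r"\bhosp\b\.?", "hospital", text)     # Hosp. -> hospital
    text = re.sub(r"[^\w\s']", " ", text)               # drop remaining punctuation
    return " ".join(text.split())                       # collapse whitespace


NAME = "st mary's hospital"
same_name = sum(NAME in normalise(a) for a in AFFILIATIONS)

for a in AFFILIATIONS:
    print(normalise(a))
print(same_name, "of", len(AFFILIATIONS), "inputs contain the same normalised name")
```

Four of the six inputs end up containing the same normalised name even though they mention London, Manchester, and Dublin, and the Japanese string is left untouched. Handling the spelling variants is the easy part; deciding which organisation each string actually refers to still needs location, hierarchy, and language knowledge that simple rules like these do not provide.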

An additional challenge lies in the quality of the data, which in the context of matching, encompasses both the input and the dataset being matched against. In real-world circumstances, no dataset is fully accurate, complete, or current, and certainly not all three. As a result, there will always be functionally random differences between inputs to the strategy and the entities to be matched. A theoretically perfect matching strategy would thus need to distinguish between inconsequential discrepancies resulting from gaps, errors, and variable forms of reference and actual, meaningful differences indicating an incorrect match. As one might imagine, this would require near total knowledge of the meaning and context for all inputs and outputs, a nigh-on impossible task for any person or system!

As a consequence, no metadata matching strategy will ever be perfect. It is unreasonable for us to expect them to be. This does not mean, of course, that all strategies are equally flawed or destined to forever return middling results. Some are better than others and we can improve them over time. Which brings us to the next myth:

Myth #2: It is always a good idea to adapt the matching strategy to a specific input

Matching strategies are not static. They can - and should - be improved. There is, however, a deceptive trap that one can fall into when attempting to improve a matching strategy: whenever we encounter an incorrect or missing result for a specific input, we treat the problem like a software bug and adapt the strategy to work better for that input, without considering all other cases.

The more complicated reality is that the quality of matching results is controlled through a complex set of trade-offs between precision and recall that determine the kind and number of relationships created between items:

Precision is calculated as the number of correctly matched relationships resulting from a strategy, divided by the total number of matched relationships. It can also be interpreted as the probability that a match is correct. Low precision indicates a high rate of false positives, which are incorrect relationships created by the strategy.
Recall is calculated as the number of correctly matched relationships resulting from a strategy, divided by the number of true (expected) relationships. It can also be interpreted as the probability that a true (correct) relationship will be created by the strategy. Low recall means a high rate of false negatives, which are relationships that should have been created by the strategy but were not made.

The diagram depicts false negatives and false positives. The ideal outcome would be that the ellipses are identical, matched relationships are exactly the same as true relationships, and there are no false negatives or false positives. In practice, we try to make the intersection as big as possible.

The tradeoff between precision and recall roughly means that modifying the strategy to improve recall will decrease precision, and vice versa.
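To make the two definitions concrete, here is a minimal sketch that computes precision and recall from two sets of relationships. The identifiers (`aff-1`, `org-A`, and so on) and the function name are invented for illustration and do not come from any real dataset or Crossref tool.

```python
def precision_recall(matched: set, true: set) -> tuple:
    """Return (precision, recall) for a set of matched relationships.

    matched: relationships created by the strategy
    true:    relationships that should exist (the expected results)
    """
    correct = matched & true  # relationships that are both matched and true
    precision = len(correct) / len(matched) if matched else 0.0
    recall = len(correct) / len(true) if true else 0.0
    return precision, recall


# Hypothetical toy data: (input, matched organisation) pairs.
true_rel = {("aff-1", "org-A"), ("aff-2", "org-B"),
            ("aff-3", "org-C"), ("aff-4", "org-D")}
matched_rel = {("aff-1", "org-A"), ("aff-2", "org-B"),
               ("aff-3", "org-X")}  # one false positive, "aff-4" is missed

precision, recall = precision_recall(matched_rel, true_rel)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three matched relationships are correct, and two of the four true relationships were found, so the one false positive lowers precision while the two missed relationships lower recall.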

Imagine, for example, we received a report about a relationship that was missed by matching because of a partial, noisy, or ambiguous input. We might be tempted to resolve this issue by relaxing our matching criteria. Unfortunately, this will have a cost of a higher overall rate of false positive matches.

Conversely, if we encounter a case where the matching has returned an incorrect match, we might attempt to make the matching strategy stricter to avoid this result. We should remember, however, that this may have the consequence of causing the strategy to skip many perfectly valid matches.

The tradeoff between precision and recall. (a) A strict strategy prioritises precision over recall resulting in more false negatives. (b) A relaxed strategy prioritises recall over precision resulting in more false positives.
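The effect of a strict versus a relaxed strategy can be sketched with a single similarity threshold. This assumes, purely for illustration, that the strategy assigns each candidate match a score and accepts those at or above a cut-off; the scores, labels, and threshold values below are invented.

```python
# Hypothetical candidates: (similarity score, whether the match is actually true).
candidates = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.60, True), (0.50, False), (0.40, True), (0.30, False),
]


def evaluate(threshold: float) -> tuple:
    """Return (precision, recall) when accepting candidates scoring >= threshold."""
    accepted = [is_true for score, is_true in candidates if score >= threshold]
    true_positives = sum(accepted)
    total_true = sum(is_true for _, is_true in candidates)
    precision = true_positives / len(accepted) if accepted else 0.0
    recall = true_positives / total_true
    return precision, recall


strict = evaluate(0.75)   # accepts only the three highest-scoring candidates
relaxed = evaluate(0.35)  # accepts seven candidates, including two false ones

print("strict :", strict)
print("relaxed:", relaxed)
```

With this toy data the strict threshold makes no false matches but misses two true relationships, while the relaxed threshold finds every true relationship at the cost of two false positives.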

Striking this balance becomes even more difficult when attempting to address multiple issues at once, or when considering constraints like the time and resources consumed by each aspect of the strategy. Each choice can compound the individual effects in unanticipated and expensive ways. Ultimately, then, the aim of matching can’t be to achieve perfect results for every single case. Fixing one particular situation might not be desirable, as it can result in breaking multiple other cases. Instead, we have to find a locally optimal balance that maximises the strategy’s utility within these inherent limitations. This means accepting some level of imperfection as not just inevitable, but necessary for implementing a workable strategy. When you consider all this, you might conclude that…

Myth #3: We shouldn’t do large-scale, unsupervised matching

Imperfect matching strategies, when applied automatically to real-world large datasets, might:

Fail to discover some relationships (false negatives), an outcome that may not be terribly problematic. In the worst case scenario, we have wasted a great deal of effort developing matching strategies that do not improve our metadata.
Create incorrect relationships between items (false positives), which seems like a potentially larger problem, since we have added incorrect relationships to the metadata.

Many have the instinct to avoid false positives at any cost, even if this means missing many additional correct relationships at the same time. They might come to the conclusion that if we cannot have 100% precision (see our previous myth), we simply should not allow matching strategies to act in an automated, unsupervised way on large datasets. While there might be circumstances where this belief is rational, in the context of the scholarly record, this notion is seriously flawed.

First, if you are dealing with any medium to large-sized dataset, it almost certainly contains errors, even before you apply any automated processing to it. Even if data is submitted and curated by users, they can still make mistakes, and might themselves be using automated tools for extracting the data from other sources, without your knowledge. It is thus not entirely obvious that applying an (imperfect) matching strategy to create more relationships would actually make the data quality worse.

Second, while we cannot eliminate all matching errors, we can place a high priority on precision when developing strategies, with the aim of keeping the number of incorrectly matched results as low as possible. We can also make use of additional mechanisms to easily correct for incorrectly matched results, for example doing so manually, in response to error reports.

Finally, the results of matching should always contain provenance information to distinguish them from those that have been manually curated. This way, the users can make their own decisions about whether to use and trust the matching results, relative to their use case.
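One way to picture provenance information is as an extra field on each relationship that records how it was asserted. The sketch below is an assumption for illustration only: the `Relationship` class, its field names, the strategy label, and the DOI and ROR-style values are hypothetical placeholders and do not reflect Crossref's actual metadata schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Relationship:
    subject: str                    # e.g. a work identifier (placeholder value)
    target: str                     # e.g. an organisation identifier (placeholder value)
    asserted_by: str                # "curated" or "matching" (hypothetical vocabulary)
    strategy: Optional[str] = None  # which matching strategy produced it, if any


def select(relationships: List[Relationship], allow_matched: bool) -> List[Relationship]:
    """Keep curated relationships, plus matched ones if the user opts in."""
    return [r for r in relationships
            if r.asserted_by == "curated" or allow_matched]


records = [
    Relationship("10.5555/example-1", "ror-placeholder-a", "curated"),
    Relationship("10.5555/example-2", "ror-placeholder-b", "matching",
                 strategy="affiliation-matching-v1"),
]

curated_only = select(records, allow_matched=False)
everything = select(records, allow_matched=True)
print(len(curated_only), len(everything))
```

A consumer with a low tolerance for false positives could use only the curated subset, while one that values coverage could include the matched relationships as well.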

By applying those additional checks, we can minimise the negative effects of incorrect matching, while at the same time reaping the benefits of filling gaps in the scholarly record.

Myth #4: We can only ever guess at the accuracy of our matching results

In attempting to determine the correctness of our matching, we immediately encounter a number of inherent limitations. The sheer number of entries in many datasets prevents a thorough, manual validation of the results, but if we instead use too few or too specific items as our benchmarks, these are unlikely to be representative of overall performance. The unpredictable nature of future data adds another wrinkle: will our matching always be as successful as when we first benchmarked it, or will its performance degrade relative to some change in the data?

With so many unknowns, are we then doomed? No! We have rigorous and scientific tools at our disposal that can help us estimate how accurate our matching will be. How do we use them? Well, that is a big and fairly technical topic, so we will leave you with this little cliffhanger. See you in the next post!

Re-introducing Participation Reports to encourage best practices in open metadata

Lena Stoll — Thu, 25 Jul 2024 00:00:00 +0000

We’ve just released an update to our participation report, which provides a view for our members into how they are each working towards best practices in open metadata. Prompted by some of the signatories and organizers of the Barcelona Declaration, which Crossref supports, and with the help of our friends at CWTS Leiden, we have fast-tracked the work to include an updated set of metadata best practices in participation reports for our members. The reports now give a more complete picture of each member’s activity.

What do we mean by ‘participation’?

Crossref runs open infrastructure to link research objects, entities, and actions, creating a lasting and reusable scholarly record. As a not-for-profit with over 20,000 members in 160 countries, we drive metadata exchange and support nearly 2 billion monthly API queries, facilitating global research communication.

To make this system work, members strive to provide as much metadata as possible through Crossref to ensure it is openly distributed throughout the scholarly ecosystem at scale rather than bilaterally, thereby realizing the collective benefit of membership. Together, our membership provides and uses a rich nexus of information (known as the research nexus) on which the community can build tools to help progress knowledge.

Each member commits to certain terms, such as keeping metadata current, updating links for their DOIs to redirect to, linking references and other objects, and preserving their content in perpetuity. Beyond this, we also encourage members to register as much rich metadata as is relevant and possible.

Creating and providing richer metadata is a key part of participation in Crossref; we’ve long encouraged a more complete scholarly record, such as through Metadata 20/20, and through supporting or leading initiatives for specific metadata, like open citations (I4OC), open abstracts (I4OA), open contributors (ORCID), and open affiliations (ROR).

Which metadata elements are considered best practices?

Alongside basic bibliographic metadata such as title, authors, and publication date(s), we encourage members to register metadata in the following fields:

Example participation report for Crossref member University of Szeged

References

A list of all the references used by a work. This is particularly relevant for journal articles but the references can include any type of object, including datasets, versions, preprints, and more. Additionally, we encourage these to be added into relationships, where relevant.

Abstracts

A description of the work. These are particularly useful for discovery systems that will promote the work, and are often used in downstream analyses such as for detecting integrity issues.

Contributor IDs (ORCID)

All authors should be included in a work’s metadata, ideally alongside their verified ORCID identifier.

Affiliations / Affiliation IDs (ROR)

Members are able to register contributor affiliations as free text, but we are encouraging everyone to add ROR IDs for affiliations as the recommended best practice, as an identifier differentiates between organisations and avoids mistyping. These two fields have newly been added to the participation reports interface in the most recent update.

Funder IDs (OFR)

Acknowledging the organisation(s) that funded the work. We encourage the inclusion of Open Funder Registry identifiers to make the funding metadata more usable. This will evolve into an additional use case for ROR over time.

Funding award numbers / Grant IDs (Crossref)

A number or identifier assigned by the funding organisation to identify the specific award of funding or other support such as use of equipment or facilities, prizes, tuition, etc. The Crossref Grant Linking System includes a unique persistent link that can be connected with outputs, activities, people, and organisations.

Crossmark

The Crossmark service gives readers quick and easy access to the current status of a record, including any corrections, retractions, or updates, via a button embedded on PDFs or a web article. Openly adding corrections, retractions, and errata is a critical part of publishing, and the button provides readers with an easy in-context alert.

Similarity Check URLs

The Similarity Check service helps editors to identify text-based plagiarism through our collective agreement for the membership to access Turnitin’s powerful text comparison tool, iThenticate. Specific full-text links are required to participate in this service.

License URLs

URLs pointing to a license that explains the terms and conditions under which readers can access content. These links are crucial to denote intended downstream use.

Text mining URLs

Full-text URLs that help researchers in meta-science easily locate your content for text and data mining.

What is a participation report?

Participation reports are a visualization of the data representing members’ participation in the scholarly record, which is available via our open REST API. There’s a separate participation report for each member, and each report shows what percentage of that member’s metadata records include 11 key metadata elements. These key elements add context and richness, and help to open up members’ work to easier discovery and wider and more varied use. As a member, you can use participation reports to see for yourself where the gaps in your organisation’s metadata are, and perhaps compare your performance to others. Participation reports are free and open to everyone - so you can also check the report for any other members you are interested in.
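Because the underlying data comes from the open REST API, the percentages in a report can in principle be derived by anyone. The sketch below assumes that a member record at `https://api.crossref.org/members/{id}` carries a `coverage` object whose values are fractions between 0 and 1 and whose keys end in `-current` or `-backfile`; the exact key names and the sample values are assumptions for illustration, so check the API documentation before relying on them. No network request is made here.

```python
def member_url(member_id: int) -> str:
    """Build the REST API URL for one member record (the ID is supplied by the caller)."""
    return f"https://api.crossref.org/members/{member_id}"


def coverage_percentages(coverage: dict, suffix: str = "-current") -> dict:
    """Convert coverage fractions (0..1) into percentages for one time period."""
    result = {}
    for key, fraction in coverage.items():
        if key.endswith(suffix):
            element = key[: -len(suffix)]  # strip the period suffix from the key
            result[element] = round(fraction * 100, 1)
    return result


# Hypothetical coverage object, shaped like the assumed API response.
sample_coverage = {
    "abstracts-current": 0.5,
    "orcids-current": 0.25,
    "ror-ids-current": 0.0,
    "abstracts-backfile": 0.125,
}

report = coverage_percentages(sample_coverage)
print(report)
```

A value of 0.0 for an element, as for the invented `ror-ids` entry here, is the kind of gap that a participation report makes visible.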

We first introduced participation reports in 2018. At the time, Anna Tolwinska and Kirsty Meddings wrote:

Metadata is at the heart of all our services. With a growing range of members participating in our community—often compiling or depositing metadata on behalf of each other—the need to educate and express obligations and best practice has increased. In addition, we’ve seen more and more researchers and tools making use of our APIs to harvest, analyze and re-purpose the metadata our members register, so we’ve been very aware of the need to be more explicit about what this metadata enables, why, how, and for whom.

All of that still rings true today. But as the research nexus continues to evolve, so should the tools that intend to reflect it. For example, in 2022, we removed the Open references field from participation reports after a board vote to change our policy and update the membership terms meant that all references deposited with Crossref would be open by default. And now we’ve expanded the list of fields again, adding coverage data for contributor affiliation text and ROR identifiers.

Putting it in practice

To find out how you measure up when it comes to participation, type the name of your member organisation into the search box. You may be surprised by what you find—we often speak to members who thought they were registering a certain type of metadata for all their records, only to learn from their participation report that something is getting lost along the way.

You can only address gaps in your metadata if you know that they exist.

More information, as well as a breakdown of the now 11 key metadata elements listed in every participation report and tips on improving your scores, is available in our documentation.

And if you have any questions or feedback, come talk to us on the community forum or request a metadata Health Check by emailing the community team.

Metadata schema development plans

Patricia Feeney — Mon, 22 Jul 2024 00:00:00 +0000

It’s been a while: here’s a metadata update and request for feedback

In Spring 2023 we sent out a survey to our community with the goal of assessing what our priorities for metadata development should be: what projects is our community ready to support? Where is the greatest need? What are the roadblocks?

The intention was to help prioritize our metadata development work. There’s a lot we want to do, a lot our community needs from us, but we really want to make sure we’re focusing on the projects that will have the most immediate impact for now.

Several projects were proposed, based on community demand over time. All are projects we intend to support long-term.

Projects

The projects included in the survey were:

Alternate names - We proposed adding a repeatable ‘name’ element to allow for names that aren’t separated by given/family/surname.
Updates to funding data - this update will be released in the near future and includes:
- Expand ROR support - Allow members to supply ROR ID instead of funder ID in funding data and grant records.
- Include Grant DOIs in funding metadata.
Publication typing in citations - Support citation type in citation metadata (for example article, preprint, data, software, etc.).
Expand contributor role support - Allow multiple contributor roles to be provided per contributor and add support for external vocabularies (like CRediT)
Expand abstract support - We currently require all abstracts to be formatted using JATS. We will be adding new abstract formats, including BITS and ONIX (which have been requested), as well as a generic abstract format (non-JATS).
Statements - Add support for free text statements such as data availability, acknowledgments, funding, and conflict of interest.
Contributor identifiers - Accept contributor identifiers such as ISNI (in addition to ORCID, which is already supported).
Conference event IDs - Identifiers for conference events.

What’s next?

The results show a clear preference for publication types in citations, abstract markup, and expanded support for multilingual metadata, followed by expanding contributor roles to support multiple roles and the CRediT taxonomy. The results have helped us prioritize our work and we’re advancing several projects soon based on our readiness to move forward.

First up is publication typing in citations and statements - we hope to be able to make this ready for registration in the coming months, but want to confirm a few things first, primarily the list of ‘types’ to apply to citations, so please review and comment: Metadata updates in need of feedback July 2024

We also have been discussing expansions to our support for preprints metadata with our Preprints Advisory Group and have a number of preprint-specific updates that will be rolled out in the coming months as well, including support for versions and status. These proposed changes are also available for comment.

And finally, we will be expanding support for contributor roles to include multiple roles per contributor, as well as adding support for the CRediT taxonomy. This update is yet to be scheduled but we do have the inputs and output planning done and welcome any comments on this as well.

We will also be continuing work on other projects highlighted in the survey that aren’t quite ready to go:

Multilingual metadata: Support for multilingual metadata in particular is very important and will require a fairly significant technical effort, so we want to be sure we get this right. At minimum we need to include repeatable fields flagged with language metadata for most items, and there may be other considerations as well, such as the scope of languages supported.

As we develop new metadata segments we’re keeping language metadata in mind, but I’d like to form a short-term working group to help shape this update - this group will be focused on the details of supporting multilingual metadata in our inputs and outputs, so conversations will be very XML and JSON heavy. If you are interested and available please contact pfeeney@crossref.org.
Abstract markup: we are currently in the research phase of this project but will be proposing updates and asking for input this fall. At the moment support for BITS and ONIX abstracts has been requested, as well as an agnostic format.
Expansion of name and contributor ID support: work is under way for this as well, and I should have inputs and outputs for feedback in the coming months.

We anticipate more developments and requests for feedback in the future as we still have other projects from the list above to get to. I’ve opened up a ‘Metadata Development’ section in our Community Forum to invite discussion and will be kicking off a renewed Metadata Interest Group in the fall.

Crossmark community consultation: What did we learn?

Martyn Rittman — Tue, 02 Jul 2024 00:00:00 +0000

In the first half of this year we’ve been talking to our community about post-publication changes and Crossmark. When a piece of research is published it isn’t the end of the journey—it is read, reused, and sometimes modified. That’s why we run Crossmark, as a way to provide notifications of important changes to research made after publication. Readers can see if the research they are looking at has updates by clicking the Crossmark logo. They also see useful information about the editorial process, and links to things like funding and registered clinical trials. All of this contributes to what we call the integrity of the scholarly record.

Crossmark has been around a long time and the context around it is constantly changing. It last had a major update in 2016 and in 2020 we removed fees for its use.

The past few years have seen a more intense focus on research integrity among the scholarly communications community, leading to more retractions and to large-scale manipulation of editorial processes being called out. At the same time, we haven’t seen an increase in the uptake of Crossmark, which is still used by only a minority of our members. We would like to know why the uptake is low and whether there is more we can do in this area. To dig into this, in the first part of 2024 we reached out to members of our community.

What did we do?

We wanted to learn about attitudes towards Crossmark and related aspects of research integrity. This was done in several ways:

Structured interviews with eight of our members.
Round tables at Crossref LIVE events in Bogota and Nairobi.
A survey of a selection of our members, which received 94 responses.

The topics we asked about were related to how post-publication updates are made and communicated, and which metadata demonstrates good practice.

We are extremely grateful to the members who contributed. They provided valuable feedback and have helped to shape the future of Crossmark and our approach to the integrity of the scholarly record.

What did we find?

Across the various groups there were a few common themes, which fell into several areas.

Communication of updates is highly valued, and seen as the most important role that Crossmark can play. Some of those we spoke to would like readers to see if there is an update as soon as a page opens, without having to open a popup. This could be done by having a logo that changes colour, shape, or size.
Conversely, not as much enthusiasm was shown for the metadata assertions. These are additional fields that can be displayed to readers in the Crossmark popup. There wasn’t a strong consensus on which commonly-made assertions are the most important for research integrity.
There is diversity in attitudes towards making updates to published works, what research integrity means, and approaches to workflows for updates. Even within a single organisation, a number of different workflows and multiple staff members might be called on to update published research. This makes things complex and means that it can be difficult to fit Crossmark in.
There are technical challenges to getting started with Crossmark. Those responsible for implementing Crossmark are often technical staff who struggle with the documentation we provide in English. There is also no plugin for OJS, a widely-used open source editorial software. It is more difficult to deposit Crossmark metadata for books than journal articles, and many article types don’t permit Crossmark metadata at all. On the other hand, those who successfully installed Crossmark found it easy to use and low-maintenance.

Overall, it seems that Crossmark still has an important role to play but there are changes and improvements we can make.

What’s next?

Here are the main areas we intend to follow up on in the coming months.

Implementation

We need to look at how to make implementation more straightforward. Can we provide multilingual documentation and plugins, run workshops or webinars, or make changes to Crossmark to lower the barrier to entry?

Understanding workflows

Can we collaborate with our members and other organisations to reach a better understanding of how to update published works? Are there alternative workflows we need to support? Have we made it too difficult to understand and implement the options we currently have?

While updates are always likely to be rare, we want to help members understand the benefits of making them. We talked to some members who were proud of never having published a retraction or correction, which left us wondering whether they are missing legitimate opportunities to correct the scholarly record. We also know that for some members and many work types (preprints, for example), updates are made without a separate published notification. Can we better understand the role that the published updates play and communicate updates even if there isn’t a published notice?

Ongoing feedback

Clearly one size doesn’t fit all when it comes to implementing and communicating updates. We need to find ways of keeping in touch with the community to test new solutions with as broad a range of members as possible. We want to avoid catering to a minority and leaving others struggling to find ways to implement a solution.

Custom metadata?

Is there an ongoing need for metadata assertions? Many of the assertions currently made are possible as standard metadata and others could be included in our deposit schema. We want to consider removing the option to add assertions. This needs more feedback from the community, especially those who currently make use of assertions.

Redesign the UI

Crossmark doesn’t have the recognition with readers we would like. Is there a way we can redesign it to make it more associated with Crossref and accurate metadata? We intend to explore different designs, and test them with members and readers.

Celebrating five years of Grant IDs: where are we with the Crossref Grant Linking System?

Kornelia Korzec — Mon, 01 Jul 2024 00:00:00 +0000

We’re happy to note that this month, we are marking five years since Crossref launched its Grant Linking System. The Grant Linking System (GLS) started life as a joint community effort to create ‘grant identifiers’ and support the needs of funders in the scholarly communications infrastructure.

The system includes a funder-designed metadata schema and a unique link for each award which enables connections with millions of research outputs, better reporting on the research and outcomes of funding, and a contribution to open science infrastructure. Our first activity to highlight the moment was to host a community call last week where around 30 existing and potential funder members joined to discuss the benefits and the steps to take to participate in the Grant Linking System (GLS).

Some organisations at the forefront of adopting Crossref’s Grant Linking System presented their challenges and how they overcame them, shared the benefits they are reaping from participating, and provided some tips about their processes and workflows.

The funding organisations whose experiences were shared included Wellcome, FCT (Foundation for Science and Technology, Portugal), and NWO (Dutch Research Council). They were joined by a new group of foundations, research councils, and private research funders from around the world—from Kenya to Singapore to Estonia—to have a first introduction to the GLS and connect them with colleagues who are further along on their journey.

We also heard about tools and integrations: a new open source Crossref plugin for the Fluxx platform; grant management systems with in-built Crossref integrations, such as ProposalCentral; Europe PMC GrantFinder, which was the first to implement the GLS on Wellcome’s behalf and hosts their grants; and eLife, one of the first publishers to start referencing Crossref grant links in their publications, both online and in the open metadata for others to retrieve.

Read on for further information or watch the recording of the event.

What is the Crossref Grant Linking System?

The Crossref Grant Linking System, conceptualised in 2017, and launched in 2019, captures and helps clarify funding relationships for scholarly outputs. Thanks to interconnectedness with the 160 million metadata records collected and curated by Crossref members, it enables funders as well as scholars to track and analyse funding patterns and evaluate programmes, and it supports assertions about the integrity of scholarly records.

Features of the GLS

Globally unique persistent link and identifier for each grant
Connected with 160 million published outputs
Funder-designed metadata schema, including project, investigator, value, and award-type information
Programmatic or no-code methods to send metadata
- Thanks to the Gordon and Betty Moore Foundation who funded development of the online grant registration form
Open search and API for all to discover funding outcomes; all metadata is distributed openly to thousands of tools and services
Crossref-hosted landing pages
A global community of ~50 funder advisors and 35+ funders already in the Grant Linking System
Membership of Crossref; influence the foundational infrastructure powering open research

The last five years have seen the GLS grow through membership, metadata, and community contributions.

The momentum for this programme is building, as illustrated by the increasing numbers of metadata records (and related relationships) we’re seeing. The 35 funder members represent over 100 funding programmes and have created 125,000 grant records already.

During last week’s call, it was helpful to hear from the community what they see as key benefits of the Crossref Grant Linking System:

Meaningfully delivering on and supporting Open Science policies and mandates, and contributing ‘their bit’ to the transparency of the evidence trail in the scholarly ecosystem.
Reporting and evaluating the funding programmes, essential for the public funders who need to demonstrate the value for money in allocating their funds and other support.
Supporting a more holistic assessment of scholarship and scholars, especially as and when metadata becomes included with a full array of outputs, not limited to books and articles.

How the Crossref Grant Linking System supports Open Science policy

Since 2020, all the grant records have been openly available through our REST API, which is queried more than 1.8 billion times every month, so these metadata records are distributed to thousands of systems across the research enterprise. In a 2022 blog, Ed Pentz and Ginny Hendricks laid out guidance for research funders on meeting open science guidelines using existing open infrastructure such as Crossref, ORCID, and ROR. Syman Stevens, a grantmaking and private philanthropy consultant, highlighted on the call that the funders he works with are increasingly interested in ways to deliver on their open science policy and that participation in the GLS is a tangible thing they can do to meet this goal.
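As a rough illustration of what "openly available through our REST API" can look like in practice, the sketch below builds a query URL for grant records. It assumes the `/works` route accepts a `filter=type:grant` parameter and a `rows` parameter; treat both as assumptions to verify against the REST API documentation. The function name is hypothetical and no request is sent.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.crossref.org/works"


def grants_query_url(rows: int = 5) -> str:
    """Build a URL asking for a few records of type 'grant' (assumed filter name)."""
    params = {"filter": "type:grant", "rows": rows}
    # urlencode percent-encodes the colon in the filter value.
    return f"{API_ROOT}?{urlencode(params)}"


url = grants_query_url()
print(url)
# The JSON response could then be fetched with any HTTP client,
# for example urllib.request.urlopen(url), and inspected by hand.
```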

As part of its open science policy, NWO will start participating in the Crossref Grant Linking System from July 2025. Research funders are a part of the scholarly communications system; we not only provide the funding to do the actual research but can also be the authoritative source of data about the projects we have funded and the outputs arising from that funding. Increasingly, all these elements – grants, researchers, outputs - are linked with metadata and unique identifiers to ensure that research is findable and accessible.

– Hans de Jonge, Director of Open Science NL, part of the Dutch Research Council (NWO)

How funders leverage the Grant Linking System in their reporting and assessment

Looking back to the origins of the system, it’s important to recognise the work of the initial working groups. Through their contribution, funders helped design the initial metadata schema for grants as well as establish the governance and fees for this service, and our Advisory Group continues to inform further developments. In this way, the Grant Linking System reflects the needs and wishes of funders, enabling them to contribute and see their data as part of the wider ecosystem.

An excellent example of that synergy in action is the use case presented by Cátia Laranjeira, manager of the PTCRIS programme at the Foundation for Science and Technology, Portugal (FCT). PTCRIS is the Foundation’s integrated national information ecosystem that supports scientific activity management. Cátia reflected on the relative fragmentation of spaces where the scientific outputs are found, and PTCRIS’s ambition for aggregating metadata in one place to be able to trace and evaluate programmes in light of the related outputs. At the start of the programme, they identified lack of a persistent identifier for grants as a major shortcoming of the system. Crossref GLS naturally fits in with their goals.

The initiative by FCT to assign unique DOIs to national public funding through Crossref is a game-changer for open science, linking funding directly to scientific outcomes and boosting transparency. Join us in this effort: let’s make every grant count and ensure open access to research information!

– Cátia Laranjeira, PTCRIS Program Manager at Fundação para a Ciência e a Tecnologia (FCT Portugal)

FCT initially piloted a small subset of their grants (approximately 6,000 recent awards) at the end of 2023. Cátia pointed to researchers’ keen participation in this programme as one of its successes and, thanks to word of mouth, FCT has already been approached by researchers requesting unique Crossref links for their grants! This appetite for grant IDs will soon be more fully satisfied, as FCT is preparing to register all of their grants with Crossref, to enable further insights into funding and outcome flows and to help them demonstrate value for money for the public resources they manage. Via interfaces for grant management and standardised online CVs, the system is also enabling researchers to use it in their own future reporting and career development.

In the ensuing discussion, Rachel Bruce of UKRI mentioned that she’s hopeful that GLS will help funders ‘close the loop’ on more holistic reward and recognition, allowing for inclusion of evidence for a broader set of outputs in those processes.

How the community is working to integrate open infrastructure

Melissa Harrison, Team Leader at EMBL-EBI, manages Europe PMC and a complementary data science team, who were part of the initial FREYA project – supporting infrastructure delivery for unique identifiers for grants. The team has been adding grant records to Crossref on Wellcome’s behalf since 2019. Melissa highlighted the shortcomings of internal award numbers, which don’t tend to be understood outside of the ecosystem where they are produced (that is the funder’s administrative system), are almost certainly not unique, and don’t resolve to or connect with anything in the wider ecosystem. Therefore internal award numbers can’t signify relationships with other outputs or assets in the wider world. By contrast, Crossref’s Grant IDs are unique, persistent, resolvable, and interrelated with other Crossref metadata, whilst being retrievable for other systems to link to too.

Persistent identifiers for grants was the next logical step after identifiers for funders - open metadata registered with a PID in a central service like Crossref is invaluable to build the full picture of the research enterprise.

– Melissa Harrison, Team Leader, Literature Services at EMBL-EBI

Ease of execution is important for scaling the Grant Linking System and enabling its use in a diverse set of circumstances in the open science ecosystem. Altum was the trailblazer, first integrating its grant management platform Proposal Central with GLS. It was good to hear that others are now joining the integration efforts. Syman Stevens talked about the recent work initiated by Joe McArthur at OA Works to develop a simple, open-source plug-in for any of the major grant management systems, enabling funders to deposit their grant metadata with Crossref GLS at the click of a button. Syman demonstrated the resulting interface in Fluxx, which allows for creating a record and sending grant metadata to Crossref as part of regular grant management within the platform. He pointed out that, while this integration was developed for Fluxx, all code and documentation are openly available on GitHub and can potentially be forked or adapted as necessary for reuse in other grant management systems.

It is heartening that others in the community are seeing such a need for this that they’re funding and creating their own tools to advance participation and use of the GLS.

Finally, Fred Atherden, Head of Production Operations at eLife, presented how they include Crossref grant identifiers in publication metadata for the version of record of the works published on their platform. eLife is the first publisher to fully integrate Crossref grant identifiers both within the article display and in the metadata. Fred shared that in addition to collecting the data from the authors, eLife also attempts matching, albeit using very restrictive methodology, to enable more grant metadata in their publication records. They recognise that so far there are very few publishers including persistent links for grants in this way, and talked about plans to start collecting and including this data further upstream, and including them in the future for reviewed preprints.

Acknowledgements and how to participate in the GLS

Reflecting on the last five years, thanks must go to the >35 funders who are already participating (see logo mashup below), to our current volunteers and to those partners working to promote and make use of the Grant Linking System. We also acknowledge that the GLS would not have been possible without the Crossref board members at the time, our staff including alumni Josh Brown, Jennifer Kemp, Rachael Lammey, and Geoffrey Bilder, or without the early dedicated time and input from the following people and organisations on our working groups for governance and fees, and for metadata modelling:

Yasushi Ogasaka and Ritsuko Nakajima, Japan Science & Technology Agency
Neil Thakur and Brian Haugen, US National Institutes of Health
Jo McEntyre and Michael Parkin, Europe PMC
Robert Kiley and Nina Frentop, Wellcome
Alexis-Michel Mugabushaka and Diego Chialva, European Research Council
Lance Vowell and Carly Robinson, OSTI/US Dept of Energy
Ashley Moore and Kevin Dolby, UKRI (Research Councils UK / Medical Research Council)
Salvo da Rosa, Children’s Tumor Foundation
Trisha Cruse, DataCite

To learn more about the Crossref Grant Linking System, the best place to start is our service page. And for the next step, please reach out to us for a conversation about any questions specific to your organisation and any questions that may need to be addressed in order to enable your full participation.

Grant DOIs enhance the discovery and accessibility of funded project information and are one of the important links in a connected research ecosystem. I’m grateful and proud to contribute to the robustness and interconnectedness of the research infrastructure. Few funders are currently participating in the Crossref Grant Linking System, and I encourage others to consider doing so. This adoption follows the “network effect,” where the value and utility increase as more people participate, encouraging even wider adoption.

– Kristin Eldon Whylly, Senior Grants Manager and Change Management Lead at Templeton World Charity Fund (TWCF)

You can email me via feedback@crossref.org or set up a call with me when it suits you (you can overlay your own calendar using the toggle at the top right). We look forward to welcoming even more funders and to seeing those relationships in the open science infrastructure grow even further in the coming years.

The anatomy of metadata matching

Dominika Tkaczyk — Thu, 27 Jun 2024 00:00:00 +0000

https://doi.org/10.13003/zie7reeg

In our previous blog post about metadata matching, we discussed what it is and why we need it (tl;dr: to discover more relationships within the scholarly record). Here, we will describe some basic matching-related terminology and the components of a matching process. We will also pose some typical product questions to consider when developing or integrating matching solutions.

Basic terminology

Metadata matching is a high-level concept, with many different problems falling into this category. Indeed, no matter how much we like to focus on the similarities between different forms of matching, matching affiliation strings to ROR IDs or matching preprints to journal papers are still different in several important ways. At Crossref and ROR, we call these problems matching tasks.

Simply put, a matching task defines the kind or nature of the matching. Examples of matching tasks are bibliographic reference matching, affiliation matching, grant matching, or preprint matching.

Every matching task has an input, which is all the data that is needed to perform the matching. Input data can come in many shapes and forms, depending on the matching task. For example, all of the following could be inputs to a matching task:

Department of Molecular Medicine, Sapporo Medical University, Sapporo 060-8556, Japan

<fr:program xmlns:fr="http://www.crossref.org/fundref.xsd" name="fundref">
  <fr:assertion name="fundgroup">
    <fr:assertion name="funder_name">
      European Union's Horizon 2020 Research and Innovation Program through Marie Sklodowska Curie
      <fr:assertion name="funder_identifier">http://dx.doi.org/10.13039/501100000780</fr:assertion>
    </fr:assertion>
    <fr:assertion name="award_number">721624</fr:assertion>
  </fr:assertion>
</fr:program>

Everitt, W. N., & Kalf, H. (2007). The Bessel differential equation and the Hankel transform. Journal of Computational and Applied Mathematics, 208(1), 3–19.

{
  "title": "Functional single-cell genomics of human cytomegalovirus infection",
  "issued": "2021-10-25",
  "author": [
    {"given": "Marco Y.", "family": "Hein"},
    {"given": "Jonathan S.", "family": "Weissman", "ORCID": "http://orcid.org/0000-0003-2445-670X"}
  ]
}

Every matching task also has an output. For our purposes, this is almost exclusively zero or more matched identifiers. In the context of a specific matching task, output identifiers may be of a specific type (e.g. we might match to a ROR ID, and never to an ORCID ID). In some cases, there can be a certain target set as well (i.e. matching only to DataCite DOIs). The output identifiers can have different cardinality depending on the task, meaning that the matching task might allow for zero, one, or more identifiers as a result of matching to a single input.

A matching strategy defines how the matching is done. Multiple strategies can exist for a specific matching task. Compound strategies can run other strategies and combine their outcomes into a single result.

In some cases, we may also want the matching strategy to output a confidence score for each matched identifier. A confidence score represents the degree of certainty or likelihood that the matched identifier is correct, typically expressed as a value between 0 and 1. This score may help with post-processing or further interpretation of the results.

To summarise, the anatomy of the matching task can be diagrammed as follows:
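The same anatomy can also be written down as a handful of types. This is a minimal sketch for illustration only: the names `MatchedIdentifier`, `Strategy`, `exact_name_strategy`, and `compound` are assumptions made for this example, not part of any Crossref or ROR library, and the lookup table is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MatchedIdentifier:
    """One output of a matching task: an identifier plus a confidence score."""
    identifier: str
    confidence: float  # between 0 and 1


# A strategy maps one input to zero or more matched identifiers.
Strategy = Callable[[str], List[MatchedIdentifier]]


def exact_name_strategy(known: Dict[str, str]) -> Strategy:
    """Toy strategy: match only when the input equals a known name exactly."""
    def run(text: str) -> List[MatchedIdentifier]:
        identifier = known.get(text.strip().lower())
        return [MatchedIdentifier(identifier, 1.0)] if identifier else []
    return run


def compound(strategies: List[Strategy]) -> Strategy:
    """Compound strategy: run several strategies, keep the best score per identifier."""
    def run(text: str) -> List[MatchedIdentifier]:
        best: Dict[str, float] = {}
        for strategy in strategies:
            for match in strategy(text):
                best[match.identifier] = max(best.get(match.identifier, 0.0), match.confidence)
        ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
        return [MatchedIdentifier(identifier, score) for identifier, score in ranked]
    return run


# Hypothetical lookup table; the ROR ID is the one this post gives for Monash University.
KNOWN = {"monash university": "https://ror.org/02bfwt286"}
matcher = compound([exact_name_strategy(KNOWN)])
```

Here the task is fixed by the types (a string in, a list of identifiers out), while the strategy is just one interchangeable way of producing that output.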

How to specify a matching task

Whenever we plan the development or integration of a matching solution, it is good to begin by answering a few basic questions:

What problem do we plan to solve with our matching task? What would we call our matching task and how would we describe it?
What do we expect as the input for this matching task? Which input formats do we need to be able to accept? What information do we expect to find in this input?
What kind of identifiers should be output? Is there a target set of identifiers? Can our matching output zero/one/or multiple identifiers, and under what conditions might that occur?

These sound fairly simple, but the answers to these questions can be remarkably complex. Once one tries to apply these concepts to real-world problems, they might encounter several non-obvious challenges.

For example, one common concern is at what level we should define each matching task. Consider the following problems:

Matching bibliographic reference strings to DOIs. Example input:

Everitt, W. N., & Kalf, H. (2007). The Bessel differential equation and the Hankel transform. Journal of Computational and Applied Mathematics, 208(1), 3–19.

Matching structured bibliographic reference to DOIs. Example input:

{
  "volume": "208",
  "author": "Everitt",
  "journal-title": "J. Comput. Appl. Math.",
  "article-title": "The Bessel differential equation and the Hankel transform",
  "first-page": "3",
  "year": "2007",
  "issue": "1"
}

Are those discrete matching tasks (unstructured reference matching vs. structured reference matching), or are they the same task (reference matching) that can accept different types of inputs (unstructured or structured)?

Similarly, let’s compare the following tasks:

Matching affiliation strings to ROR IDs. Example input:

Department of Molecular Medicine, Sapporo Medical University, Sapporo 060-8556, Japan

Matching funder names to ROR IDs. Example input:

Alexander von Humboldt Foundation

Are these different matching tasks (affiliation matching vs. funder matching), or the same task with different inputs (organisation matching)?

Defining the boundaries of a matching task can also be difficult. Consider, for example, the need to obtain ROR IDs for organisations mentioned in the acknowledgements section of a full-text academic paper. To begin, one may first extract the acknowledgements section from the full text, then run something like a named entity recognition (NER) tool to isolate the organisation names from the extracted text, and finally match these names to ROR IDs. Is this entire process matching, with the input being the full text of a paper? Or perhaps matching starts with the acknowledgements section as the input? Or is it only the last phase, where we try to match the extracted name to the ROR ID, that constitutes the matching task, with the extraction phases being completely separate processes?

There are also important questions related to the expected behaviour of a matching strategy. Consider, for example, developing an affiliation matching strategy where we define our input as “an affiliation string”. What should happen when the strategy receives something else as input, for example, song lyrics? Perhaps the strategy should simply return no matches, or an error, or we could say that in such a situation the behaviour is undefined and it simply doesn’t matter what is returned. But what should happen if the input is the lyrics of Street Life by Roxy Music, a song that mentions the names of a few universities that happen to have ROR IDs?

It is likewise important to consider what should happen if different parts of the input match to different identifiers, like in the following example:

Department of Haematology, Eastern Health and Monash University, Box Hill, Australia

Here, “Eastern Health” matches to https://ror.org/00vyyx863 and “Monash University” to https://ror.org/02bfwt286. Should the matching strategy return all the identifiers, one of them (if so, which one?), or nothing at all?
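One way to make this decision explicit is to treat it as a named policy that is applied after candidates have been found. The sketch below is only an illustration: the function `resolve`, the policy names, and the confidence scores are all invented for this example.

```python
from typing import List, Tuple

# (identifier, confidence) pairs; the scores are invented for illustration.
Match = Tuple[str, float]


def resolve(matches: List[Match], policy: str) -> List[str]:
    """Reduce candidate matches to the final output under a named policy."""
    if policy == "all":
        return [identifier for identifier, _ in matches]
    if policy == "best":
        # Ties keep the earliest candidate, because max() returns the first maximum.
        return [max(matches, key=lambda m: m[1])[0]] if matches else []
    if policy == "unambiguous":
        # Return a result only when exactly one candidate was found.
        return [matches[0][0]] if len(matches) == 1 else []
    raise ValueError(f"unknown policy: {policy}")


candidates = [
    ("https://ror.org/00vyyx863", 0.9),  # "Eastern Health"
    ("https://ror.org/02bfwt286", 0.9),  # "Monash University"
]
```

With these candidates, `"all"` returns both ROR IDs, `"best"` returns the first one (a tie), and `"unambiguous"` returns nothing. Which of these is right depends on what users of the output expect.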

Similar questions arise when it is possible to match to multiple versions (or duplicates) in the target identifier set. This can happen, for example, in the context of bibliographic reference matching or preprint matching. Multiple matches may occur when there are different editions, reprints, or variations of the same publication in the target dataset, each with its own unique identifier.

If you are waiting for an answer to these questions, we unfortunately must disappoint you here. These can only be answered in the context of a specific problem, considering who the users are and what it is they need and expect.

Did you notice any other subtleties related to metadata matching and its concerns? Are there other non-obvious questions that should be considered when planning to develop or integrate metadata matching strategies? Let us know—we’d love to hear from you!

Drawing on the Research Nexus with Policy documents: Overton’s use of Crossref API

Luis Montilla — Sat, 15 Jun 2024 00:00:00 +0000

Update 2024-07-01: This post is based on an interview with Euan Adie, founder and director of Overton.

What is Overton?

Overton is a big database of government policy documents, also including sources like intergovernmental organisations, think tanks, and big NGOs, and in general anyone who’s trying to influence a government policy maker. What we’re interested in is, basically, taking all the good parts of the scholarly record and applying some of that to the policy world. By this we mean finding all the documents, finding what’s out there, collecting metadata for them consistently, fitting it to our schema, extracting references from all the policy documents we find, adding links between them, and then we also do citation analysis.

What do you mean by the good parts of the scholarly record?

What I mean by the good parts of the scholarly record is, from a data perspective, having persistent open metadata for items on different stable, interoperable platforms and being able to build up layers of data to suit specific use cases. That’s a better approach than trying to do everything in a silo here and a silo there and trying to do stuff bit by bit or in a hundred different ways.

There’s also a bad part, which is less to do with metadata and more around citation analysis and responsible metrics. With all this data… as the famous Spider-Man quote goes… with great power comes great responsibility: once you start systematically collecting this data, it’s very easy to fall into the trap of thinking that we can put numbers on it, and then maybe start reading meaning into those numbers, and then it spirals out of control. So the idea for Overton was: can we take the system, some of the infrastructure, and apply those ideas? But then come at it already knowing where the later pitfalls are and try to avoid them.

What is your main use of Crossref resources?

We rely heavily on Crossref to link policy documents to the scholarly record. The question we’re trying to answer is: does this government document cite academic work? We work a lot with universities, think tanks, and IGOs. They’re asking where is the research we produce ending up? Is it being used by the government? In some countries, like the UK, there’s a big impact agenda where it’s quite important to demonstrate that for government funding. In the US as well, state universities for example aim to impact the local policy environment. Right? Are we producing things that went on to change life for local residents for the better? And that’s really what we’re trying to support. And so that’s one of the main use cases of the database.

Can you tell us a little bit more about the story of Overton, how did this idea start?

It really came from two things. The first one is that I’d always been interested in this area and before Overton, I founded a company called Altmetric.com, which was looking at kind of broader impact metrics for papers. And we looked at Twitter, and news, and blogs, and other things, including policy. But policy wasn’t a primary focus.

When I left Altmetric two things were happening in the UK – not that everything is about Brexit, but Brexit was happening, and then COVID happened as well. And in both cases, I think it just drove home to me that other people seemed to be very interested in the evidence that the government has used to make decisions. Be they good decisions like some of the evidence-based initiatives in COVID or bad decisions like Brexit. So, how can you find out what it was? And it is actually very difficult to do. You can’t really track back how this decision was made. I thought that there is a growing need for that kind of impact analysis. So the second thing was, can we do something that helps make it easy to see what evidence goes into policy? The scholarly evidence but also the other kind of policy influence that goes into any document or discussion.

What are the main challenges that you face when you are trying to retrieve these policy documents?

Well, first is another thing that the scholarly record does well, which is persistence. We have CLOCKSS and all the dark archives¹. So the whole idea is that if you have a DOI, if something moves, we can track it and it maintains the ID, and even if the publisher goes bust it’ll never disappear. For citing it, then there’s always going to be a copy of it somewhere available even if it’s in a library or a dark archive.

One of the biggest challenges with policy documents is that that kind of persistence doesn’t exist… There are a lot of statistics about link rot², and they hold true for policy documents as much as anywhere else. Every year a percentage of links everywhere basically break because websites are redesigned. When a government changes, it’s even worse because it can be by design. If you think about it, a new government comes into power, they change… let’s say the Department of Agriculture and they merge it with the Department of Fisheries. That creates a completely new third thing. And the other two departments disappear or they start linking off, like, redirecting or whatever.

One of the challenges is just keeping track of all the changes in the landscape and constantly trying to stay on top of the data. And that’s a big part of what we do. Another challenge for us, and I think about it compared to journals, when you cite something in a scholarly document, you cite it in a given style, but there are no standards for referencing styles in policy documents. So even in the same document, we can see, like, four or five different ways of referring to something, and sometimes they’re missing important data and sometimes they’re not. And it means when we’re using Crossref search, we usually have much more unparsable text.
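For readers unfamiliar with this kind of lookup: free-text reference strings can be sent to the REST API’s `/works` endpoint through the `query.bibliographic` parameter. The sketch below is not Overton’s pipeline; the function names and the placeholder email address are assumptions made for illustration, and only the URL builder is exercised here (the second function makes a network call).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.crossref.org/works"


def reference_query_url(reference: str, mailto: str, rows: int = 3) -> str:
    """Build a /works URL that searches bibliographic fields for a free-text reference."""
    params = {"query.bibliographic": reference, "rows": rows, "mailto": mailto}
    return f"{API_ROOT}?{urlencode(params)}"


def top_candidates(reference: str, mailto: str) -> list:
    """Fetch candidate records (network call); returns (DOI, score) pairs."""
    with urlopen(reference_query_url(reference, mailto), timeout=30) as response:
        items = json.load(response)["message"]["items"]
    return [(item["DOI"], item.get("score")) for item in items]


url = reference_query_url(
    "Everitt & Kalf 2007 Bessel differential equation Hankel transform",
    mailto="you@example.org",  # placeholder address
)
```

The API returns candidates ranked by a relevance score; deciding whether the top candidate is good enough to count as a match is left to the caller, and is harder when the reference text is incomplete or inconsistently styled.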

How has your experience been so far using our Crossref API or our services in general?

It’s been great. I would happily say this anywhere, I always talk about the Crossref API as being one of the best examples of a well-done scholarly infrastructure API. It’s well-documented. It’s fast. It’s clear. The rate limits are clear. It’s up when it should be up. I like that you can trust it. So the technical aspect is great. From an organisational aspect, in contrast with a lot of infrastructure in the scholarly world that you don’t know if it’s even going to be there in a given time, Crossref is pretty stable.

What would you say are the main challenges or things that we can improve in the future? What other expectations or suggestions do you have?

It depends on whether we’re talking about how the service could be improved or how the data could be improved. Data-wise, and I appreciate this is a publisher problem, not a Crossref one, we still have to pull other data from OpenAlex, for example, for things like affiliations, just because it’s missing from so many articles. And then equally things like ORCID for authors. And in fact also disambiguation in general. This is a huge problem that either the user doesn’t solve or you end up using a hundred different author disambiguation systems. I don’t know if that’s necessarily something Crossref wants to get into, but there’s definitely nothing generally accepted out there already.

Another kind of improvement I see is to make sure that changes in one API are reflected in the other, and they don’t get out of sync. When somebody updates their ORCID record, I’d like it reflected in the Crossref record if we’re using that as the “canonical” metadata record for the DOI. Retrospectively enriching records.

I think it’s harder than I expected to just find preprints because you can’t simply use the item type, but I understand that this is maybe a bigger issue. So maybe it’s not something for the short term.

Finally, this is very specific, but we experienced friction when going from the snapshots to having something useful, either in Elasticsearch or in, like, Postgres. It might be nice to have some open-source scripts to download and process everything, convert it to relational tables, or send it to an Elasticsearch cluster or something.
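As a rough idea of what such a script might look like, here is a minimal sketch that loads one snapshot file into a relational table using only the Python standard library. It rests on an assumption that should be checked against the snapshot actually downloaded: that each file is gzipped JSON with a top-level `items` array of work records. The function name, table name, and choice of columns are hypothetical.

```python
import gzip
import json
import sqlite3


def load_snapshot_file(path: str, connection: sqlite3.Connection) -> int:
    """Load one gzipped snapshot file into a flat `works` table; returns rows written."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS works (doi TEXT PRIMARY KEY, type TEXT, title TEXT)"
    )
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        items = json.load(handle)["items"]  # assumed layout: {"items": [ ...records... ]}
    rows = [
        # `title` is a list in Crossref records; keep the first entry, or NULL if absent.
        (item["DOI"], item.get("type"), (item.get("title") or [None])[0])
        for item in items
    ]
    connection.executemany("INSERT OR REPLACE INTO works VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)
```

Calling this once per file, with a single SQLite connection, would give a queryable table of DOIs, types, and titles; nested fields such as authors or references would need their own tables.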

Rebalancing our REST API traffic

Stewart Houten — Tue, 04 Jun 2024 00:00:00 +0000

Since we first launched our REST API around 2013 as a Labs project, it has evolved well beyond a prototype into arguably Crossref’s most visible and valuable service. It is the result of 20,000 organisations around the world that have worked for many years to curate and share metadata about their various resources, from research grants to research articles and other component inputs and outputs of research.

The REST API is relied on by a large part of the research information community and beyond, seeing around 1.8 billion requests each month. Just five years ago, that average monthly number was 600 million. Our members are the heaviest users, using it for all kinds of information about their own records or picking up connections like citations and other relationships. Databases, discovery tools, libraries, and governments all use the API. Research groups use it for all sorts of things such as analysing trends in science or recording retractions and corrections.

So the chances are high that almost any tool you rely on in scientific research has somewhere incorporated metadata through us.

Optimising performance

For some time, we’ve been noticing reduced performance in a number of ways. Periodically we go through a flurry of manually blocking and unblocking the IP addresses of requesters that are hammering the service and degrading it for everyone else; this is, of course, only minimally effective and very short-term. You can always watch our status page for alerts. This is the current one about REST API performance: https://status.crossref.org/incidents/d7k4ml9vvswv.

As the number of users and requests has grown, our strategies for serving those requests must evolve. This post discusses how we’re approaching balancing the growth in usage for the immediate term and provides some thoughts about things we could try in the future on which we’ll gladly take feedback and advice.

Load balancing

In 2018, we started routing users through three different pools (public, polite, and plus). This coincided with the launch of Metadata Plus, a paid-for service with monthly data dumps and very high rate limits. Note that all metadata is exactly the same and real-time across all pools. We also, more recently, introduced an internal pool. Here’s more about them:

Plus: This is the aforementioned premium option; it’s really for ‘enterprise-wide’ use in production services and is not really relevant here.
Public: This is the default and is the one that is struggling at the moment. You don’t have to identify yourself and, in theory, we don’t have to work through the night to support it if it’s struggling (although we often do). Public currently receives around 30,000 requests per minute.
Polite: Traffic is routed to polite simply by detecting a mailto in the header. Any system or person including an email is routed to a currently quieter pool, which means we can always get in touch for troubleshooting (and only troubleshooting). Polite currently receives around 5,000 requests per minute.
Internal: In 2021, we introduced a new pool just for our own tools where we can control and predict the traffic. Internal currently receives around 1,000 requests per minute.
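For anyone who has not used the polite pool before, identifying yourself means including a `mailto:` address in the `User-Agent` header (a `mailto` query parameter also works). A minimal sketch using the Python standard library follows; the tool name and email address are placeholders, and the helper function is hypothetical.

```python
from urllib.request import Request


def polite_request(url: str, tool: str, email: str) -> Request:
    """Build a request whose User-Agent carries a mailto, as the polite pool expects."""
    return Request(url, headers={"User-Agent": f"{tool} (mailto:{email})"})


request = polite_request(
    "https://api.crossref.org/works?rows=1",
    tool="ExampleHarvester/0.1",  # hypothetical tool name
    email="you@example.org",      # placeholder address
)
# Pass `request` to urllib.request.urlopen() to actually send it.
```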

The volumes of traffic across public, polite and internal pools are very different and yet each pool has always had similar resources. The purpose of each of these pools has been long-established but our efforts to ask the community to use polite by default have not been particularly successful and it is clear that we don’t have the right balance.

The internal pool has been dedicated to our internal services that have predictable usage and that have requests that are not initiated by external users. The internal pool has previously included reference matching but not Crossmark, Event Data, or search.crossref.org, which all use the polite pool instead, along with the community. We have the capacity on the internal pool to shift all of this “internal” traffic across, and in doing so we will create more capacity for genuine polite users and redefine what we consider to be “internal”.

Creating more capacity on polite will also give us the opportunity to load-balance requests to both polite and public across the two pools. We are at a point where we cannot eke more performance out of the API without architectural changes. In order to buy ourselves time to address this properly, we will modify the routing of polite and public and evenly distribute requests to the two pools 50/50.

The public and polite pools have equal resources at the moment yet handle very different volumes of traffic (30,000 req/min vs 5,000 req/min), and with the proposed changes to internal traffic the polite pool would handle a fraction of this. The result would look something like 31,000 req/min evenly distributed across public and polite.
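The arithmetic behind these figures can be spelled out as follows. Note that the amount of polite traffic moving to the internal pool is inferred here from the numbers quoted above, not stated directly.

```python
# Requests per minute, as quoted in the post.
public, polite = 30_000, 5_000
combined_after_move = 31_000

# Inferred, not stated: polite traffic that comes from Crossref's own tools
# and moves to the internal pool.
moved_to_internal = public + polite - combined_after_move
per_pool = combined_after_move / 2  # 50/50 routing across public and polite

print(moved_to_internal, per_pool)  # prints: 4000 15500.0
```

In other words, each of the two pools would serve roughly 15,500 requests per minute, instead of one serving 30,000 and the other 5,000.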

Rate limiting

Our rate-limiting also needs review. We track a number of metrics in our web proxy but only deny requests on one of them: the number of requests per second. On public and polite we limit each IP address to sending 50 req/sec, and if this rate is exceeded users are denied access for 10 seconds. These limits are generous and we cannot realistically support this volume of requests for all users of the public or polite API.
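To make the current rule concrete, here is a minimal sketch of a per-IP limiter with the two parameters mentioned above (50 requests per second, 10-second denial). It illustrates the rule only and is not the actual proxy configuration; the class name and the sliding one-second window are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


class RateLimiter:
    """Allow `limit` requests per IP per second; on excess, block the IP for `penalty` seconds."""

    def __init__(self, limit: int = 50, penalty: float = 10.0) -> None:
        self.limit = limit
        self.penalty = penalty
        self.recent: Dict[str, List[float]] = defaultdict(list)
        self.blocked_until: Dict[str, float] = {}

    def allow(self, ip: str, now: float) -> bool:
        """Decide whether a request from `ip` arriving at time `now` (seconds) is served."""
        if now < self.blocked_until.get(ip, 0.0):
            return False
        # Keep only timestamps from the last second (sliding window).
        window = [t for t in self.recent[ip] if now - t < 1.0]
        if len(window) >= self.limit:
            self.blocked_until[ip] = now + self.penalty
            self.recent[ip] = []
            return False
        window.append(now)
        self.recent[ip] = window
        return True
```

A limiter like this counts requests only as they arrive, so it says nothing about how long each one takes to complete.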

However, when requests are taking a long time to return, we potentially have a separate problem of high concurrency as hundreds of requests could be sent before the first one has returned. We intend to identify and impose an appropriate rate limit on concurrent requests from each IP to prevent a small number of users from disproportionately affecting all users with long-running queries.
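A limit on concurrent requests differs from a per-second limit in that it counts requests that have started but not yet returned. The sketch below shows the idea only; the class and method names are hypothetical, the post does not say what the limit will be, and a real implementation would also need to be safe under concurrent access.

```python
from collections import Counter


class ConcurrencyLimiter:
    """Cap the number of in-flight requests per IP address."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.in_flight: Counter = Counter()

    def start(self, ip: str) -> bool:
        """Call when a request arrives; False means it should be rejected."""
        if self.in_flight[ip] >= self.max_in_flight:
            return False
        self.in_flight[ip] += 1
        return True

    def finish(self, ip: str) -> None:
        """Call when a previously accepted request has returned."""
        if self.in_flight[ip] > 0:
            self.in_flight[ip] -= 1
```

With a cap of this kind, a client sending slow queries is held back until its earlier requests complete, regardless of how few requests per second it sends.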

Longer-term

So, in the short-term we will revise our pool traffic as described above. We’ll do that this week. Then we will review the current rate limits and reduce them to something more reasonable for the majority of users. And we’ll identify and introduce a rate limit for concurrent requests from each user.

Longer-term, we need to rearchitect our Elasticsearch pools so that we can:

Reduce shard sizes to improve performance of queries
Balance data shards and replicas more evenly
Optimise our instance types for our workload

Want to help?

Thanks for asking!

Firstly, please, everyone, do always put an email in your API request headers. While the short-term plan will help stabilise performance, this habit will always help us troubleshoot: for example, we can contact you instead of blocking you!

Secondly, we know many of you incorporate Crossref metadata, add lots of value to it in order to deliver important services, and also develop APIs of your own. We’d love any comments or recommendations from those of you handling similar situations on scaling and optimising API performance. You can comment on this post which is managed via our Discourse forum. We’ll also be adding updates to this thread as well as on status.crossref.org. If you’d like to be in touch with any of us directly, all our emails are firstinitiallastname@crossref.org.

Metadata matching 101: what is it and why do we need it?

Dominika Tkaczyk — Thu, 16 May 2024 00:00:00 +0000

https://doi.org/10.13003/aewi1cai

At Crossref and ROR, we develop and run processes that match metadata at scale, creating relationships between millions of entities in the scholarly record. Over the last few years, we’ve spent a lot of time diving into details about metadata matching strategies, evaluation, and integration. It is quite possibly our favourite thing to talk and write about! But sometimes it is good to step back and look at the problem from a wider perspective. In this blog, the first one in a series about metadata matching, we will cover the very basics of matching: what it is, how we do it, and why we devote so much effort to this problem.

What is metadata matching?

Would you be able to find the DOI for the work referenced in this citation?

Everitt, W. N., & Kalf, H. (2007). The Bessel differential equation and the Hankel transform. Journal of Computational and Applied Mathematics, 208(1), 3–19.

We bet you could! You might begin, for example, by pasting the whole citation, or only the title, into a search engine of your choice. This would probably return multiple results, which you would quickly skim. Then you might click on the links for a few of the top results, those that look promising. Some of the websites you visit might contain a DOI. Perhaps you would briefly compare the metadata provided on the website against what you see in the citation. If most of this information matches (see what we did there?), you would conclude that the DOI from that website is, in fact, the DOI for the cited paper.

Well done! You just performed metadata matching, specifically, bibliographic reference matching. Matching in general can be defined as the task or process of finding an identifier for an item based on its structured or unstructured “description” (in this case: finding a DOI of a cited article based on a citation string).
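The manual procedure above can be sketched in a few lines of Python. This is only an illustration, not a description of how Crossref's matching actually works: the candidate records, their field names (`doi`, `title`, `authors`, `year`), the placeholder DOIs, and the 0.8 threshold are all assumptions made up for the example. The idea is simply to check how much of each candidate's metadata appears in the citation string and to accept the best candidate only if the overlap is convincing.

```python
import re

def tokens(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(citation, candidate):
    """Fraction of the candidate's title, author and year tokens found in the citation."""
    wanted = (tokens(candidate["title"])
              | tokens(" ".join(candidate["authors"]))
              | {str(candidate["year"])})
    return len(wanted & tokens(citation)) / len(wanted)

def match_reference(citation, candidates, threshold=0.8):
    """Return the DOI of the best-scoring candidate, or None if no candidate is convincing."""
    best = max(candidates, key=lambda c: score(citation, c), default=None)
    if best is None or score(citation, best) < threshold:
        return None
    return best["doi"]

citation = ("Everitt, W. N., & Kalf, H. (2007). The Bessel differential equation and "
            "the Hankel transform. Journal of Computational and Applied Mathematics, "
            "208(1), 3–19.")

# Hypothetical search results; the DOIs are placeholders, not real identifiers.
candidates = [
    {"doi": "10.5555/example-a",
     "title": "The Bessel differential equation and the Hankel transform",
     "authors": ["Everitt", "Kalf"], "year": 2007},
    {"doi": "10.5555/example-b",
     "title": "Hankel transforms in signal processing",
     "authors": ["Smith"], "year": 1999},
]

print(match_reference(citation, candidates))
```

Returning `None` when nothing clears the threshold matters as much as finding the right candidate: a matcher that always returns its top result would create false relationships for works that are not in the database at all.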

But matching doesn’t have to just be about citations and DOIs. There are many other instances of matching we can think of, for example:

finding the ROR ID for an organisation based on an affiliation string,
finding the ORCID ID for a researcher based on the person’s name and affiliation,
finding the ROR ID for a funder based on the acknowledgements section of a research paper,
finding the grant DOI based on an award number and a funder name.

Matching doesn’t have to be done manually. It is possible to develop fully automated strategies for metadata matching and employ them at scale. It is also possible to use a hybrid approach, where automated strategies assist users by providing suggestions.

Developing automated matching strategies is not a trivial task, and if we want to do it right, it takes a great deal of time and effort. This brings us to our next question: is it worth it?

Why do we need matching?

In short, metadata matching gives us a more complete picture of the research nexus by discovering missing relationships between various entities within and throughout the scholarly record.

These relationships are very powerful. They provide important context for any entity, whether it is a research output, a funder, a research institution, or an author. Imagine for a moment the scholarly record without any such relationships, where all bibliographic references, affiliations (institution names and addresses), and funding information (funder names and grant titles) are provided as unstructured strings only. In such a world, how would you calculate the number of times a particular research paper was cited? How would you get a list of research outputs supported by a specific funder? It would be incredibly challenging to navigate, summarise, and describe research activities, especially considering the scale. Thankfully, these and many other questions can be answered thanks to metadata matching that discovers relationships between entities in the scholarly record.

There are two primary ways we can use metadata matching in our workflows: as semi-automated tools that help users look up the appropriate identifiers or as fully automated processes that enrich the metadata in various scholarly databases.

The first approach is quite similar to the example we described at the beginning. If you are submitting scholarly metadata, for example of a new article to be published, you can use metadata matching to look up identifiers for the various entities and include these identifiers in the submission. For example, with the help of metadata matching, instead of submitting citation strings, you could provide the DOIs for works cited in the paper and instead of the name and address of your organisation, you could provide its ROR ID. To make this easier for people, metadata submission systems and applications sometimes integrate metadata matching tools into user interfaces.

The second approach allows large, existing sources of scholarly metadata to be enriched with identifiers in a fully automated way. For example, we can match affiliation strings to ROR IDs using a combination of machine learning models and ROR’s default matching service, effectively adding more relationships between people and organisations. We can also compare journal articles and preprints metadata in the Crossref database by calculating similarity scores for titles, authors, and years of publication to match them with each other and provide more relationships between preprints and journal articles. This automated enrichment can be done at any point in time, even after research outputs have been formally published.
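As a rough illustration of the preprint-to-article comparison described above, the sketch below combines similarity scores for titles, authors, and years into a single number. It uses Python's standard `difflib` module. The record layout, the weights, and the two-year window are assumptions chosen for the example, not the strategy Crossref runs in production.

```python
from difflib import SequenceMatcher

def title_similarity(a, b):
    """Character-level similarity of two titles, between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def author_similarity(a, b):
    """Jaccard overlap of two lists of author family names."""
    set_a = {name.lower() for name in a}
    set_b = {name.lower() for name in b}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def year_similarity(preprint_year, article_year):
    """1.0 if the article appeared in the preprint's year or up to two years later."""
    gap = article_year - preprint_year
    return 1.0 if 0 <= gap <= 2 else 0.0

def match_score(preprint, article, weights=(0.6, 0.3, 0.1)):
    """Weighted combination of the three similarity scores (weights are illustrative)."""
    w_title, w_authors, w_year = weights
    return (w_title * title_similarity(preprint["title"], article["title"])
            + w_authors * author_similarity(preprint["authors"], article["authors"])
            + w_year * year_similarity(preprint["year"], article["year"]))

# Hypothetical records for demonstration only.
preprint = {"title": "Metadata matching at scale",
            "authors": ["Nowak", "Lee"], "year": 2023}
same_work = {"title": "Metadata matching at scale",
             "authors": ["Lee", "Nowak"], "year": 2024}
other_work = {"title": "A survey of deep sea corals",
              "authors": ["Garcia"], "year": 2013}

print(match_score(preprint, same_work))
print(match_score(preprint, other_work))
```

In a fully automated setting, a pair would be linked only when its score exceeds a threshold tuned on labelled examples, because no user is there to catch a wrong match.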

There are fundamental differences between these two approaches. The first is done under the supervision of a user, and for the second, the matching strategy makes all the decisions autonomously. As a result, the first approach will typically (although not always) result in better quality matches. By contrast, the second approach is much faster, generally less expensive, and scales to even very large data sources.

In the end, no matter what approach is used, the goal is to achieve a more complete accounting of the relationships between entities in the scholarly record.

This blog is the first one in a series about metadata matching. In the coming weeks, we will cover more detail about the product features related to metadata matching, explain why metadata matching is not a trivial problem, and share how we can develop, assess, compare, and choose matching strategies. Stay tuned!

2024 public data file now available, featuring new experimental formats

Patrick Polischuk — Tue, 14 May 2024 00:00:00 +0000

This year’s public data file is now available, featuring over 156 million metadata records deposited with Crossref through the end of April 2024 from over 19,000 members. A full breakdown of Crossref metadata statistics is available here.

Like last year, you can download all of these records in one go via Academic Torrents or directly from Amazon S3 via the “requester pays” method.

Download the file: The torrent download can be initiated here. Instructions for downloading via the “requester pays” method, along with other tips for using these files, can be found on the “Tips for working with Crossref public data files and Plus snapshots” page.

In January, Martin Eve announced that we had been experimenting with alternative file formats meant to make our public data files easier to use by broader audiences. This year’s file will be published alongside the tools that can be used on the public data file to produce two experimental formats: JSON-lines and SQLite (and a bonus Rust version). You can read more about our thinking behind this work in Martin’s blog post, and we are keen to hear your thoughts on these alternatives.

Our annual public data file is meant to facilitate individuals and organisations interested in working with the entirety of our metadata corpus. Starting with the majority of our metadata records in one file should be much easier than starting from scratch with our API, but because Crossref metadata is always openly available, you can use the API to keep your local copy up to date with new and updated records.

If you’re curious about what you’ll get with the public data file, we’ve also published a sample version so that you can take a peek before committing to downloading the ~212 GB file. This file includes a random sample of JSON files and is available exclusively via torrent here.

We hope you find this public data file useful. Should you have any questions about how to access or use the file, please see the tips below, or share your questions below (you will be redirected to our community forum).

Tips for using the torrent and retrieving incremental updates

Use the public data file if you want all Crossref metadata records. Everyone is welcome to the metadata, but it will be much faster for you and much easier on our APIs to get so many records in one file. Here are some tips on how to work with the file.
Use the REST API to incrementally add new and updated records once you have the initial file. Here is how to get started (and avoid getting blocked in your enthusiasm to use all this great metadata!).
Bibliographic metadata is generally required, but much of the other metadata is optional, so records will vary in quality and completeness.
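To make the incremental-update tip concrete, here is a minimal sketch of how a request for records indexed since a given date might be built against the REST API's `/works` route. It uses the `from-index-date` filter, cursor-based paging (`cursor=*` for the first page), and the `mailto` parameter for identifying yourself. The helper name, the default page size, and the example date and email address are assumptions for illustration, so check the REST API documentation for current limits before relying on it.

```python
from urllib.parse import urlencode

API_WORKS = "https://api.crossref.org/works"

def incremental_update_url(since, cursor="*", rows=1000, mailto=None):
    """Build a /works URL for records indexed on or after `since` (YYYY-MM-DD).

    Pass cursor="*" for the first page, then the `next-cursor` value from each
    response to fetch the following page.
    """
    params = {
        "filter": f"from-index-date:{since}",
        "rows": rows,
        "cursor": cursor,
    }
    if mailto:
        # Identifying yourself lets the API operators contact you if something goes wrong.
        params["mailto"] = mailto
    return f"{API_WORKS}?{urlencode(params)}"

# Example: everything indexed since the cut-off of the 2024 public data file.
url = incremental_update_url("2024-05-01", mailto="you@example.org")
print(url)
```

Fetching each page, reading `next-cursor` from the response, and stopping when a page comes back empty is left out here so that the sketch stays runnable without network access.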

Questions, comments, and feedback are welcome at support@crossref.org.

Integrity of the Scholarly Record (ISR): what do research institutions think?

Madhura Amdekar — Thu, 09 May 2024 00:00:00 +0000

Earlier this year, we reported on the roundtable discussion event that we had organised in Frankfurt on the heels of the Frankfurt Book Fair 2023. This event was the second in the series of roundtable events that we are holding with our community to hear from you how we can all work together to preserve the integrity of the scholarly record - you can read more about insights from these events and about ISR in this series of blogs.

Research institutions are one of the most important stakeholders in the endeavour of research integrity, and any conversation around ISR is incomplete without the views of this key community. This fact was acknowledged at the second ISR roundtable event, and one of the main takeaways from the discussions was to make more focused efforts to hear the viewpoints of researchers and academics.

As the first step in this direction, we organised an online discussion on the integrity of the scholarly record, to which we invited: researchers and academics, research integrity experts based at academic institutions, Crossref members, as well as other organisations working on this topic such as COPE and Digital Science. The primary objective of this event was to hear from this community their perspectives on preserving and leveraging the integrity of the scholarly record and to identify opportunities for collaboration in this area. To ensure common ground, we also wanted to share information about Crossref metadata, the Research Nexus vision, and our position and role in the integrity of the scholarly record.

To facilitate this, the event started with an introduction by Kora Korzec, Head of Community Engagement and Communication at Crossref, to our mission and vision and the importance of capturing the relationships between the objects, people and places involved in research through the Research Nexus. Amanda Bartell, Head of Member Experience, was next and she spoke about the scholarly record and the role that Crossref plays in preserving the record’s integrity. In her presentation, Amanda emphasised that Crossref’s role is not to assess the quality of content deposited by the members but rather to provide infrastructure that enables the community to provide and use metadata about the scholarly content produced by members. It’s important not to put up barriers to entry, but to work with all publishers to encourage best practices.

Dominika Tkaczyk, Head of Strategic Initiatives, shared details of a few Crossref projects that focus on monitoring and improving metadata completeness, thereby supporting ISR. These projects include improving the Participation Reports, using metadata matching to discover new relationships (e.g., preprint published as work, work supported by funder, etc.), and importing more retractions and other updates from the Retraction Watch database that was acquired and made openly available by Crossref. Dominika used these examples to highlight the ways in which open and complete metadata can help in uncovering large-scale trends and systemic concerns. The final speaker was Amanda French, ROR Technical Community Manager, who introduced the audience to the Research Organisation Registry, or ROR.

To accomplish the primary aim of the event, which was to hear the community’s viewpoints, the participants were divided into breakout groups for discussions and given three prompts to answer. The rest of the blog is a summary of what we heard from the participants.

1. Is Crossref’s role what you expected? What surprised you? What are we missing?

An overarching sentiment from the academics in the audience was that Crossref does so much more than is known to researchers! They were surprised by the range of activities underway at Crossref. At the same time, there were calls for Crossref to play a bigger role. Suggestions included playing a leadership role in deciding which metadata elements are a priority, providing guidance on the main metadata components important for signalling trust, playing a greater role in connecting various identifiers to ensure that relationships between different content types are preserved well, and, by virtue of Crossref’s unique position in the community, coordinating the efforts of institutions, publishers and service providers around research integrity. There was broad agreement that by providing the essential infrastructure, Crossref acts as the base upon which other actors in the scholarly community can build.

2. What metadata elements do you consider important for signalling trust?

Many participants spoke about the various ways in which author identity and affiliation are important as trust signals. Being able to identify when an author has changed institutions, or being able to make a distinction between authors who have the same name is important. Author affiliations that are authentic and verified would go a long way in establishing trust.

Multiple assertions, e.g. for affiliations, would be welcome. The use cases for this could be when research starts at one institution and is carried over to another, or when researchers affiliated with an institution may perform part of the research overseas. Some of the participants, who actively investigate research data, shared that abstracts are valuable because they can be used for large scale analyses related to research integrity.

Other metadata elements that came up during this discussion were data on peer review, ethics approval, patient and donor consent in medical research, editorial boards (especially of special issues), pre-registration, funding metadata, datasets and programming scripts.

3. What value do you see in the integrity and completeness of the scholarly record in the way you operate? How do you contribute to it? How can it support you to achieve your own goals?

Participants acknowledged that the integrity of the metadata and the scholarly record is essential. Ensuring this integrity is a dynamic process, much akin to the concept of organised scepticism, which is the notion that all scientific work should be trusted subject to its verification. Several ideas were shared on how to advance the integrity and completeness of the scholarly record. One recommendation was to use multiple metadata trust markers, as that can make it harder for bad actors to game the system, although this may run the risk of making things complicated. Another suggestion was to make metadata part of the onboarding procedure: by gathering staff ORCID iDs during the onboarding process and sharing the institutional ROR ID with staff to promote its use, institutions can ensure that this information is routinely made available. The metadata deposited with Crossref should be integrated with downstream workflows to better facilitate the use of this rich metadata. An example of this is to integrate Crossmark with other research tools such as reference management software.

The participants acknowledged that this discussion underlined for them the fact that having identifiers is not in itself an indicator of quality and that the underlying metadata records and wider context are key to understanding the trustworthiness of the content.

This event was a good first step towards engaging researchers and academics in the conversation about ISR. It connected folks working in different parts of the world who are united by their interest in research integrity. There was good engagement among all and commitment to continue these conversations in the future, with many participants planning to connect at the World Conference on Research Integrity in June (I’ll be attending as well, for anyone who wants to continue the conversation - along with my colleagues Fabienne and Evans).

At Crossref, we plan on continuing these conversations with all segments of the community to understand their needs and perceptions around metadata. The greater the awareness about the importance of metadata and its applications, including for research integrity, the richer the metadata that we are able to collect together. This will lead to building a comprehensive Research Nexus and emergence of more relationships therein. Please write in response to this post on our Community Forum if you have any thoughts on this as we’d love to hear from you.

List of participants


Manu Goyal	International Journal of Cancer
Panagiotis Kavouras	University of Oslo
Dorothy Bishop	University of Oxford
Zhesi (Phil) Shen	Centre of Scientometrics, National Science Library, Chinese Academy of Sciences
Wouter Vandevelde	KU Leuven
Leslie McIntosh	Digital Science
Elizabeth Noonan	University College Cork
Radek Gomola*	Masaryk University Press
	Queensland University of Technology
	London School of Hygiene & Tropical Medicine
	Vilnius Gediminas Technical University Library
	Committee on Publication Ethics (COPE)
Ginny Hendricks	Chief Program Officer, Crossref
Kornelia Korzec	Director of Community, Crossref
Amanda Bartell	Director of Membership, Crossref
Dominika Tkaczyk	Director of Data Science, Crossref
Amanda French	Technical Community Manager, Crossref
Madhura Amdekar	Community Engagement Manager, Crossref

*Note: name added 21-May-2024

Seeking consultancy: understanding joining obstacles for non-member journals

Ginny Hendricks — Wed, 01 May 2024 00:00:00 +0000

Crossref is undertaking a large program, dubbed 'RCFS' (Resourcing Crossref for Future Sustainability) that will initially tackle five specific issues with our fees. We haven’t increased any of our fees in nearly two decades, and while we’re still okay financially and do not have a revenue growth goal, we do have inclusion and simplification goals. This report from Research Consulting helped to narrow down the five priority projects for 2024-2025 around these three core goals:

Scope of the RCFS Program 2024-2025

GOAL: MORE EQUITABLE FEES

Project 1: Evaluate the USD $275 annual membership fee tier and propose a more equitable pricing structure, which might entail breaking this down into two or more different tiers.
Project 2: Define a new basis for sizing and tiering members for their capacity to pay

GOAL: SIMPLIFY COMPLEX FEES

Project 3: Address and adjust volume discounts for Content Registration
Project 4: Address and adjust back-year discounts for Content Registration

GOAL: REBALANCE REVENUE SOURCES

Project 5: Reflect the increasing value of Crossref as a metadata source, likely increasing Metadata Plus fees

Work to date

As part of the RCFS program, we are working closely with our Membership & Fees Committee to discuss insights, gather feedback, and make recommendations to the Board. As a first step, we have surveyed and received responses from around 1000 of the current 8000 Crossref members in our lowest membership fee tier (USD $275). We are now starting to distill that data and will discuss it on our community call on May 8th and subsequently with the M&F Committee to inform recommendations for fee changes that may go into effect in 2025 or 2026.

Request For Information (RFI) about community consultation project

While we have useful data from existing Crossref members, we know that there are many thousands of journals that are not (yet) members, and we need to understand this group better, in particular, to document and address the financial obstacles as well as the technical or social challenges.

We are looking for community facilitation expertise, with multiple language skills, to conduct a series of focus groups with non-member journals, with a summary and insights report (in English) provided by the end of June 2024.

All the data and documentation will be available publicly on the dedicated RCFS Program website.

As well as designing, conducting, and summarising the results of some focus groups (participants for which will be gathered via our own contacts and those of partners such as DOAJ, EIFL, and the Free Journal Network) we would like the consultant to review work such as the DIAMAS institutional publishing report, and identify data relevant to Crossref’s fee model.

If you would like to respond, please provide the following information and send it to Kora Korzec at feedback@crossref.org by 15th May:

Your consultancy organisation and your role within it
Examples of similar market research undertaken
Languages spoken within your team
Confirmation that the timeline is workable
Approximate fee, likely range, or structure/basis for your fee

Equally, if you represent a journal or group of journals, such as Diamond Open Access journals, and are not yet using Crossref, please get in touch and we can include your group in the research.

Thank you!

This year's call for expressions of interest to join our board

Lucy Ofiesh — Fri, 26 Apr 2024 00:00:00 +0000

The Crossref Nominating Committee is inviting expressions of interest to join the Board of Directors of Crossref for the term starting in January 2025. The committee will gather responses from those interested and create the slate of candidates that our membership will vote on in an election in September.

Expressions of interest will be due Monday, May 27th, 2024

This is an exciting time to join the board, as we have a number of active projects underway: We are considering resourcing Crossref for a sustainable future and board members will be part of deciding any changes to our fees scheme and overseeing its implementation. We’re focusing on how our community and metadata can contribute to ensuring the integrity of the scholarly record. We’re broadening our metadata record to capture richer funding and institutional affiliations. We’re working towards a future where the scholarly record prioritizes relationships between research outputs to build a holistic research nexus. The board helps guide this work.

About the board elections

This year, we will elect two of the larger member seats (membership tiers $3,900 and above) and two of the smaller member seats (membership tiers $1,650 and below). You don’t need to specify which seat you are applying for; we will provide that information to the nominating committee.

The online election will open in September, with results announced at the annual meeting on October 29th, 2024. New members will begin their term in January 2025.

About the Nominating Committee

2024 Nominating Committee

James Phillpotts*, Director of Content Transformation and Standards, Oxford University Press, committee chair
Oscar Donde*, Editor in Chief, Pan Africa Science Journal
Rose L’Huillier*, Senior Vice President Researcher Products, Elsevier
Ivy Mutambanengwe-Matanga, Chief Operating Officer, African Journals Online
Adam Sewell, Chief Technology Officer, IOP Publishing

(*) indicates Crossref board member

What is the committee looking for this year

Demonstrate a commitment to or understanding of our strategic agenda or the Principles of Open Scholarly Infrastructure;
Have expertise that may be underrepresented on the board currently;
Hold senior/director-level positions in their organisations;
Have experience with governance or community involvement;
Represent member organisations that are active in the scholarly communications ecosystem;
Demonstrate metadata best practices as shown in the member’s participation report

The board is also encouraging Crossref members who are research funders to apply.

Board roles and responsibilities

Setting the strategic direction for the organisation;
Providing financial oversight; and
Approving new policies and services.

The board is representative of our membership base and guides the staff leadership team on trends affecting scholarly communications. The board sets strategic directions for the organisation while also providing oversight into policy changes and implementation. Board members have a fiduciary responsibility to ensure sound operations. They do this by attending board meetings as well as joining more specific board committees.

Who can apply to join the board?

What is expected of board members?

Board members attend four meetings each year that typically take place in January, March, July, and November. Meetings have taken place in a variety of international locations and travel support is provided when needed. January, March, and November board meetings are held virtually, and all committee meetings take place virtually. Each board member should sit on at least one Crossref committee. Care is taken to accommodate the wide range of time zones in which our board members live.

Board members are expected to be comfortable assuming the responsibilities listed above and to prepare and participate in board meeting discussions.

How to apply

Please click here to submit your expression of interest. We ask for a brief statement about how your organisation could enhance our board and a brief personal statement about your interest and experience with Crossref.

Please contact me with any questions at lofiesh@crossref.org

Common views and questions about metadata across Africa

Johanssen Obanda — Wed, 24 Apr 2024 00:00:00 +0000

This past year has been a captivating journey of immersion within the Crossref community, a mix of online interactions and meaningful in-person experiences. From the engaging Sustainability Research and Innovation Conference in Port Elizabeth, South Africa, to the impactful webinars conducted globally, this has been more than just a professional endeavour; it has been a personal exploration of collaboration, insights, and a shared commitment to pushing the boundaries of scholarly communication.

Working collaboratively with research funders and research organisations

Cocreation activity in smaller groups at the SRI conference.

The adventure began with a significant in-person event, the Sustainability Research and Innovation Conference. In the coastal city of Port Elizabeth, South Africa, I had the honour of hosting a parallel co-creation session titled “Connecting Science to Society: A Network Approach to Improving Science Communication in the Global South.” The co-creation session addressed research discoverability and accessibility among early-career researchers. Apart from some immediate feedback from the researchers in the room about how they might use co-creation beyond the conference to improve their research experience and outcome, I also had conversations with research funders from the Belmont Forum, Future Earth, and National Research Foundation - South Africa and the National Research Foundation - Mozambique about connecting their grants and grantees with their published outputs referencing Crossref’s Open Funder Registry and research grants registration. A different side conversation was about a community organisation in Botswana that is interested in registering patents with Crossref for proper referencing and protecting the intellectual property of their research on the indigenous communities’ innovations and the associated published work. These conversations are ongoing, unveiling a new understanding of unique needs and opportunities to pursue with research funders and research organisations working on indigenous knowledge and innovations.

Learning from organisations in GEM-eligible countries

The journey extended globally through a series of webinars conducted in Bangladesh, Tanzania, Nepal, and Ghana. Collaborating with dedicated Ambassadors and my colleagues leading the Global Equitable Membership (GEM) program, we witnessed an increase in Crossref membership from the GEM countries and initial metadata registration. The GEM Program offers relief from both Crossref membership and Crossref content registration fees for organisations in the least economically advantaged countries in the world, based on the World Bank’s IDA list. Susan, in her blog post, “The GEM Program: Year One”, elaborated on the significance of these efforts and their impact on fostering equitable access to scholarly resources and communication through the expansion of Crossref’s membership base in underrepresented regions, such as Bangladesh, Tanzania, Nepal, and Ghana. Specific concerns encountered while presenting the GEM program included feedback expressing reservations about the program’s approach, particularly in deciding on eligible countries, and advocating eligibility for the program to be extended to all the non-GEM countries in Africa. Additionally, a conversation with some organisations brought up concerns regarding the program’s sustainability, with inquiries about whether GEM was merely a free trial or freemium service, and seeking assurances against future fees. The audience found these sessions helpful, acknowledging that joining fees were no longer going to be a barrier, yet questions about the program’s longevity brought out the need for sustained support.

Discussing how The Research Nexus can support the community

My journey then led me to Makerere University in Uganda for the Consortium of Uganda University Libraries (CUUL 2023) conference and the Forum for Open Research in MENA (FORM 2023) in Abu Dhabi. In Uganda, I noticed how university libraries, institutional repositories, and the research and education network service provider had come together in a consortium that played a crucial role in bridging the digital gap and supporting the adoption of open infrastructure. The event was mainly attended by librarians from different universities in Uganda. Most of those I connected with needed more information about Crossref and had questions about how Crossref DOIs are different from ARKs, which they commonly use in their publishing workflows. At FORM 2023, in my presentation titled “The Research Nexus: A Rich and Reusable Open Network of Relationships in the Scholarly Record,” I shared Crossref’s vision for a connected research ecosystem with an audience comprising researchers, research administrators, funders, and a good number of big publishers like IEEE and Taylor & Francis. The Research Nexus seeks to reveal relationships beyond persistent identifiers, utilising rich metadata to connect various scholarly components. I also took the opportunity at both events to share about The Publishers Learning And Community Exchange (PLACE), an online forum promoting best practices in scholarly publishing. The goal was to show attendees how they can actively contribute to and benefit from this vision, fostering a robust and interconnected research community through Crossref’s open infrastructure.

Photo with Dr. Salwan Abdulateef, Crossref Ambassador - Iraq

I enjoyed the opportunity to join the National Open Science Dialogue by TCC Africa, which provided crucial insights, emphasising the need for assessing awareness, implementing comprehensive policies, and fostering collaboration around Open Science. Higher education institutions were recognized as influencers in the global Open Science movement, while a call for an inclusive research environment was underscored through open access and data sharing. The dialogue emphasized a collective effort involving policymakers, educators, researchers, and institutions, focusing on inclusivity and collaboration to advance Open Science in East Africa.

Exploring how rich metadata can provide trust signals with members in Kenya

Reflecting on the Crossref Nairobi event that happened in February 2024, it was an enriching experience exploring key issues shaping scholarly publishing in Kenya. The discussions also touched on the role of metadata as a trust signal and a tool for the persistence of the scholarly record, particularly in regions where data protection challenges persist. This is exemplified by concerns raised during the event about the fear of data theft, misuse, or loss, especially in places with comparatively weaker data protection laws. The presence of robust metadata, particularly with detailed provenance information, becomes crucial in such contexts, as it enables better identification and handling of potential misuse. Thus, through effective metadata implementation and the persistence facilitated by identifiers, the management of data risks can be significantly improved.

The insights from existing Crossref members pointed out contextual challenges, regional differences, and the importance of effective post-publication processes. The conference served as a valuable platform for dialogue, emphasising the collective commitment to continuous improvement of scholarly communication in the country, and the need for continuous awareness and training on making the most of Crossref services. The roundtable discussions during the Crossmark service consultation brought to light various reflections and considerations regarding post-publication changes in publishing workflows. The Crossmark service was a new discovery for most participants, with potential value recognized in facilitating current updates on articles. However, there are existing barriers such as a lack of awareness and technical expertise, suggesting the need for further education to facilitate adoption. Overall, the consultation provided a platform for introspection and exploration of avenues for improving post-publication practices in scholarly publishing.

Crossref Nairobi group photo

We organised the Crossref Nairobi event with the help of colleagues from the outreach team, local Ambassadors Mercury Shitindo of Kenya and Baraka Ngussa of Tanzania, and our Board Member in Kenya, Oscar Donde. It was the first time I saw both my colleagues and Ambassadors in action and working closely together, making presentations and accommodating last-minute facilitation changes to the program. Compared to attending or speaking at an event, organising one was a unique experience requiring a lot of planning in advance for logistics and the event program, identifying and keeping in touch with important stakeholders, ushering guests and being on standby for any matters that come up about the event. All of that went very well thanks to the team on the ground and cooperative participants.

Exploring the role of open infrastructure for African universities

Attending the recent WACREN 2024 conference was an eye-opening experience that highlighted the role of open infrastructure in addressing challenges faced by African universities. Participants also called for a focus on open access systems and for decolonizing knowledge, and raised the affordability of DOIs and questions of local ownership amidst global initiatives. Global persistent identifier providers, including ORCID and DataCite, had a presence at the conference, alongside passionate advocates for more locally managed, decentralised infrastructure. These are concerns that Crossref needs to understand better, as we seek to find effective ways of supporting equitable participation in the Research Nexus. The conference resonated with a call for continued work in fostering accessibility, sharing, and leveraging resources to accelerate research and innovation in Africa.

Photo with our Ambassadors from West Africa at WACREN 2024 event: Blessing Abumere - Nigeria, Audrey Kenni Nganmeni - Cameroon, Richard Lamptey - Ghana and Oumy Ndiaye - Senegal.

Conversations with Crossref Ambassadors brought about a shared narrative across universities in some African countries. These institutions are actively embracing digital shifts, setting up institutional repositories using platforms like DSpace and OJS. However, challenges persist, particularly in funding and technical capacity. It’s heartening to see how national and regional research and education networks step in to help with internet connectivity, opening up collaboration opportunities with other interoperable infrastructure, setting up repositories, providing hosting services and even managing content identifiers.

Deceptive publishing practices remain a shared concern, and we’ve had requests at these meetings for stricter inclusion criteria for membership of Crossref to ensure quality and trustworthiness of articles accessible through Crossref metadata.

We’ve explained to those we’ve met that Crossref doesn’t (and can’t) assess the quality of content or the integrity of the research process. We don’t have the people or the skills, and it isn’t our mission to be the gatekeepers of research quality. A DOI record is just an indication that something was published; it isn’t an indication of quality.

However, we do still have a vital role in preserving the integrity of the scholarly record. We provide the infrastructure which enables those who produce scholarly outputs to provide metadata (effectively evidence) about how they ensure the quality of content and how the outputs fit into the scholarly record. The scholarly record - that network of published outputs, inputs, relationships and contexts - is captured through the metadata records that our members register with us, and that we then distribute freely and openly through our API. The richer and more comprehensive Crossref records are, the more context there is for our members and for the whole scholarly research ecosystem to make their own decisions around trustworthiness. Blocking access to the infrastructure creates gaps in the scholarly record, but also potentially blocks legitimate newcomers.

“Crossref is focused on enriching metadata to provide more and better trust signals while keeping barriers to membership and participation as low as possible to enable an inclusive scholarly record.” Read more about Crossref’s role in preserving the integrity of the Scholarly record in the blog post by Amanda Bartell.

While the landscape of digital scholarly publication witnesses significant strides, a crucial need persists: preserving metadata and interconnecting it with the global scholarly record. It’s not just about discoverability, a theme resonating strongly within the community, but about enabling reproducibility, upholding research and editorial integrity, and facilitating reporting and assessment.

The path forward

As I reflect on this year of immersing myself within the Crossref community, building awareness in new communities, and learning more about the different perceptions across the region, it feels like a personal progression of growth and discovery. From the captivating in-person moments to the global webinars and collaborative efforts to address challenges in scholarly communication, this journey is not just a professional pursuit; it’s a personal exploration. The path forward involves continued support, intensified awareness-building, and sustained dialogue, ensuring that the scholarly ecosystem continues to thrive, evolve, and leave a lasting impact.

Testing times

Martin Eve — Wed, 03 Apr 2024 00:00:00 +0000

One of the challenges that we face in Labs and Research at Crossref is that, as we prototype various tools, we need the community to be able to test them. Often, this involves asking for deposit to a different endpoint or changing the way that a platform works to incorporate a prototype.

The problem is that our community is hugely varied in its technical capacity and level of ability when it comes to modifying their platform. Some mega-publishers, for instance, outsource their platforms and so are dependent on third-party developers/organisations when they want to make a change. Many smaller publishers, by contrast, use systems such as OJS, which come with Crossref plugins that make life very easy… but that require hard code changes to accommodate prototypes. Such changes are way beyond the technical capacity of most journal editors.

So how can we prototype new ideas and test them? One way is by creating new interstitial interfaces that allow people to manually supplement metadata or register for prototype services. Of course, this requires additional work on the part of the user. Every time they wish to participate they have to visit an extra web page and re-input details that, surely, were included in the original deposit.

Another way would be for plugin developers to have an advanced option field that allowed end-users to change their deposit endpoint. It would be excellent to see this feature in OJS, Janeway, and also proprietary systems. This would allow us to work with the community to test new prototype mechanisms, without forcing anyone to edit code. Many systems already include the ability to switch between Crossref’s “test” system and our live deposit API. All I am really suggesting here is the logical next step: allow advanced users to specify a deposit endpoint of their own choosing so that we can give them access to prototype systems.
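As a rough illustration of what such an advanced option could look like inside a plugin, here is a minimal Python sketch. The DepositSettings class and resolve_deposit_endpoint function are hypothetical names invented for this example, not part of OJS, Janeway, or any Crossref library, and the two endpoint URLs are assumptions that should be checked against current Crossref documentation.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Assumed values for Crossref's live and test deposit endpoints; verify them
# against the current Crossref documentation before relying on them.
LIVE_ENDPOINT = "https://doi.crossref.org/servlet/deposit"
TEST_ENDPOINT = "https://test.crossref.org/servlet/deposit"


@dataclass
class DepositSettings:
    """Hypothetical plugin settings, as they might appear on an admin form."""

    use_test_system: bool = False
    custom_endpoint: Optional[str] = None  # the proposed "advanced" field


def resolve_deposit_endpoint(settings: DepositSettings) -> str:
    """Return the URL that deposits should be sent to."""
    if settings.custom_endpoint:
        parsed = urlparse(settings.custom_endpoint)
        # Only accept absolute HTTPS URLs, since credentials travel with a deposit.
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError("custom endpoint must be an absolute https:// URL")
        return settings.custom_endpoint
    # Fall back to the existing test/live switch that many systems already have.
    return TEST_ENDPOINT if settings.use_test_system else LIVE_ENDPOINT
```

With this shape, the existing test/live toggle keeps working unchanged, and a prototype endpoint is used only when an advanced user has explicitly filled in the extra field.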

Of course, it’s not always that simple. Sometimes, prototype systems will require new data fields on submission, for example. In those cases, there is nothing for it except to modify the plugin or to provide a separate interface. But sometimes, as in the case of the Op Cit project (more on which soon), all the data is already in place; we just need to direct users to a different endpoint. Such changes would definitely make testing times less trying.

Mending Chesterton's Fence: Open Source Decision-making

Joe Wass — Mon, 18 Mar 2024 00:00:00 +0000

When each line of code is written it is surrounded by a sea of context: who in the community this is for, what problem we’re trying to solve, what technical assumptions we’re making, what we already tried but didn’t work, how much coffee we’ve had today. All of these have an effect on the software we write.

By the time the next person looks at that code, some of that context will have evaporated. There may be helpful code comments, tests, and specifications to explain how it should behave. But they don’t explain the path not taken, and why we didn’t take it. Or those occasions where the facts changed, so we changed our mind.

Some parts of our system are as old as Crossref itself. Whilst our process still involves coffee, it’s safe to say that most of our working assumptions have changed, and for good reasons! We have to be very careful when working with our oldest code. We always consider why it was written that way, and what might have changed since. We’re always on the lookout for Chesterton’s Fence!

Leaving a Trail

We’re building a new generation of systems at Crossref, and as we go we’re being deliberate about supporting the people who will maintain them.

When our oldest code was written, the software development team all worked in an office with a whiteboard or three, and the code was proprietary. Twenty years later, things are very different. The software development team is spread over 8 timezones. Thanks to POSI, all the new code we write is open source, so the next people to read that code might not even be Crossref staff.

Working increasingly asynchronously, without that whiteboard, we need to record the options, collect evidence, and peer-review them within the team.

So for the past couple of years the software team has maintained a decision register. The first decision we recorded was that we should record decisions! Since then we have recorded the significant decisions as they arise. Plus some historical ones.

These aren’t functional specifications, which describe what the system should do. They record the decisions and trade-offs we made along the way to get to the how. Look out for another blog post about specifications.

By leaving a trail of explanations as we go, we make it easier for people to understand why code was written, and what has changed. We’re writing the story of our new systems. This makes it easier to alter the system in future in response to changes in our community, and the metadata they use.

Difficult Decisions

There are some fun challenges to building systems at Crossref. We have a lot of data. Our schema is very diverse, and has a vast amount of domain knowledge embedded in it. It’s changed over time to accommodate 20 years of scholarly publishing innovations. Our community is diverse too, from small one-person publishers with a handful of articles, through to large ones that publish millions.

What might be an obvious decision for a database table with a thousand rows doesn’t always translate to a million. When you get to a billion, things change again. An initially sensible choice might not scale. And a scalable solution might look over-engineered if we had millions of DOIs, rather than hundreds of millions.

The diversity of the data also poses challenges. A very simple feature might get complicated or expensive when it meets the heterogeneity of our metadata and membership. What might scale for journal article or grant metadata might not work for book chapters.

The big decisions need careful discussion, experimentation, and justification.

2NF or not 2NF

One such recent decision was how we structure our SQL schema for the database that powers our new ‘relationships’ REST API endpoint, currently in development.

The data model is simple: we have a table of Relationships which connect pairs of Items. And each Item can have properties (such as a type). The way to model this is straightforward, following conventional normalization rules:

We built the API around it, and all was well.

We then added a feature which lets you look up relationships based on the properties of the subject or object. For example “find citations where the subject is an article and the object is a dataset”. This design worked well in our initial testing. We loaded more data into it, and it continued to work well.
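To make the normalised design concrete, here is a small sketch. It is not the real Crossref schema: the table and column names are invented for illustration, each Item carries just one property (its type), and Python's built-in sqlite3 stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalised layout: an item's type is stored once, on the item.
    CREATE TABLE item (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
    CREATE TABLE relationship (
        id INTEGER PRIMARY KEY,
        subject_id INTEGER NOT NULL REFERENCES item(id),
        relation TEXT NOT NULL,
        object_id INTEGER NOT NULL REFERENCES item(id)
    );
""")
conn.executemany(
    "INSERT INTO item (id, type) VALUES (?, ?)",
    [(1, "article"), (2, "dataset"), (3, "article")],
)
conn.executemany(
    "INSERT INTO relationship (subject_id, relation, object_id) VALUES (?, ?, ?)",
    [(1, "cites", 2), (1, "cites", 3), (3, "cites", 2)],
)


def find_relationships(conn, relation, subject_type, object_type):
    """Look up relationships by the types of their subject and object.

    Filtering on item properties requires joining the item table twice.
    """
    rows = conn.execute(
        """
        SELECT r.subject_id, r.object_id
        FROM relationship AS r
        JOIN item AS s ON s.id = r.subject_id
        JOIN item AS o ON o.id = r.object_id
        WHERE r.relation = ? AND s.type = ? AND o.type = ?
        ORDER BY r.subject_id, r.object_id
        """,
        (relation, subject_type, object_type),
    )
    return rows.fetchall()


# "Find citations where the subject is an article and the object is a dataset."
matches = find_relationships(conn, "cites", "article", "dataset")
```

The two joins are cheap at this size; they are the part of the query whose cost depends on the size, shape and distribution of the data.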

And then, the context changed. Once we tested loading a billion relationships into the database, performance dropped. The characteristics of the data (size, shape and distribution) reached a point where the database was unable to run queries in a timely way. The PostgreSQL query planner became unpredictable and occasionally produced some quite exciting query plans (to non-technical readers: databases are neither the time nor the place for excitement).

This is a normal experience in scaling up a system. We expected that something like this would happen at some point, but you don’t know when it will happen until you try. We bounced around some ideas and came up with a couple of alternatives. Each made trade-offs around processing time, data storage and query flexibility. The best way to evaluate them was to use real data at a representative scale.

One of the options was denormalisation. This is a conventional solution to this kind of problem, but was not our first choice as it involves extra machinery to keep the data up-to-date, and more storage. It would not have been the correct solution for a smaller dataset. But we had the evidence that the other two approaches would not scale predictably.

By combining the data into one table, we can serve up API requests much more predictably, and with much better performance. This code is now running with the right performance. Technical readers note that this diagram is simplified. The real SQL schema is a little different.
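The trade-off can be sketched in a few lines. As with the diagram, this is a simplification and not the real schema: the relationship_flat table, its columns and the retype_item helper are hypothetical, and sqlite3 is used in place of PostgreSQL purely so the example is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical denormalised layout: each row repeats the item types.
    CREATE TABLE relationship_flat (
        subject_id INTEGER NOT NULL,
        subject_type TEXT NOT NULL,
        relation TEXT NOT NULL,
        object_id INTEGER NOT NULL,
        object_type TEXT NOT NULL
    );
    CREATE INDEX relationship_flat_lookup
        ON relationship_flat (relation, subject_type, object_type);
""")
conn.executemany(
    "INSERT INTO relationship_flat VALUES (?, ?, ?, ?, ?)",
    [
        (1, "article", "cites", 2, "dataset"),
        (1, "article", "cites", 3, "article"),
        (3, "article", "cites", 2, "dataset"),
    ],
)


def find_relationships(conn, relation, subject_type, object_type):
    """Single-table lookup: no joins, so one index can serve the query."""
    rows = conn.execute(
        """
        SELECT subject_id, object_id FROM relationship_flat
        WHERE relation = ? AND subject_type = ? AND object_type = ?
        ORDER BY subject_id, object_id
        """,
        (relation, subject_type, object_type),
    )
    return rows.fetchall()


def retype_item(conn, item_id, new_type):
    """The 'extra machinery': a changed type must be copied to every row."""
    with conn:  # commit both updates together
        conn.execute(
            "UPDATE relationship_flat SET subject_type = ? WHERE subject_id = ?",
            (new_type, item_id),
        )
        conn.execute(
            "UPDATE relationship_flat SET object_type = ? WHERE object_id = ?",
            (new_type, item_id),
        )
```

The read path becomes a single indexed lookup, while every change to an item's properties now has to touch all the relationship rows that mention it, which is the storage and upkeep cost described above.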

Without writing this history down, and explaining what we tried, someone might misunderstand the reason for the code and try to simplify it. Decision record DR-0500 guards against that.

But one day, when the context changes, future developers will be able to come back and modify the code, because they understand why it was like that in the first place.

Credential Checking at Crossref

Martin Eve — Fri, 15 Mar 2024 00:00:00 +0000

It turns out that one of the things that is really difficult at Crossref is checking whether a set of Crossref credentials has permission to act on a specific DOI prefix. This is the result of many legacy systems storing various mappings in various different software components, from our Content System through to our CRM.

To this end, I wrote a basic application, credcheck, that will allow you to test a Crossref credential against an API.

There are two modes of usage. First, a command-line interface that allows you to run a basic command and get feedback:

Usage: cli.py [OPTIONS] USERNAME PASSWORD DOI

Second, you can use it as a programmatic library in Python:

import cred
credential = cred.Credential(username=username, password=password, doi=doi)

if not credential.is_authenticated():
…

if credential.is_authorised():
…

The tool separates authentication (whether the given username and password are valid) from authorisation (whether the valid credentials are usable against a specific DOI/prefix).

For technical information, the way this works is by attempting to run a report on the specific DOI in question and then scraping the response page. We hope, at some future point, that there will be a real API for this, but for now this solves the problem as a bridge.

Subject codes, incomplete and unreliable, have got to go

Patrick Polischuk — Wed, 13 Mar 2024 00:00:00 +0000

Subject classifications have been available via the REST API for many years but have not been complete or reliable from the start and will soon be deprecated.

The subject metadata element was born out of a Labs experiment intended to enrich the metadata returned via Crossref Metadata Search with All Science Journal Classification (ASJC) codes from Scopus. This feature was developed when the REST API was still fairly new, and we now recognize that the initial implementation worked its way into the service prematurely.

While subject classifications in Crossref metadata could be very useful, the current implementation in the REST API is problematic for three primary reasons:

They are misleadingly exposed in the API as a property of the work, when in fact they are a property of the container (e.g. a journal or conference proceeding). Just because a journal’s broad topic category is “X” doesn’t mean that a particular article in the journal is about “X.”

Existing works may have outdated subjects. Originally, subject codes were not updated periodically. However, subjects exposed in the /journals route are now updated once a day. Those exposed via the /works endpoint are indexed along with works, and so when a new subject list is ingested, new DOIs start getting new subjects, but existing works may have outdated subjects. We don’t have a mechanism for forcing updates when incorrect subject values are returned via the REST API, so this data can be stale and incorrect.

They are not applied to everything. This is because the Scopus list does not cover all the journals that Crossref has (conversely, the Scopus list contains some journals Crossref does not have), and does not contain other container types.

The Labs team investigated options for improving subject classification coverage but ultimately concluded that none of the available solutions adequately addresses the coverage problem. For more, please see Esha Datta’s findings published at Force11’s Upstream: https://doi.org/10.54900/n6dnt-xpq48

Where does that leave us? Rather than continuing to supply unreliable and misleading subject category metadata, we will be deprecating this feature in the coming weeks. To minimize disruption and avoid breaking changes at this time, we will be removing this data from our index, so the subject element will simply be empty. We may remove the subject element in the future.
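For anyone whose code reads this field, the practical consequence is to treat it as optional. A minimal defensive sketch in Python follows; subjects_of is a hypothetical helper, and the work argument is assumed to be a single work record already parsed from the REST API's JSON into a dictionary.

```python
from typing import Any, List, Mapping


def subjects_of(work: Mapping[str, Any]) -> List[str]:
    """Return the subjects of a work record, or [] if there are none.

    Copes with a populated list, an empty list, and a missing key, so the
    caller keeps working if the element is later removed altogether.
    """
    value = work.get("subject")
    if not isinstance(value, list):
        return []
    return [subject for subject in value if isinstance(subject, str)]
```

Code written this way behaves the same whether the subject element is present but empty or has been removed from the response.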

We know that the community’s desire for subject-based analysis of metadata is very strong, and we have supported efforts to establish a multidisciplinary taxonomy. Inaccurate codes in the meantime do not help but actually hinder these efforts, giving the false impression that they are correct.

We aim to deprecate the subject codes in April of this year.

Please let us know if you have any questions or concerns by leaving a comment below, which will start a thread in our community forum.

Frequently asked questions

Q. Will the subject field continue to be available and functional?
A. The subject metadata element will continue to be included in the JSON response but will not return any values.

Q. Will new subject codes be added in the future?
A. We do not have any current plans to add new subject codes in the future.

Q. I received a notification about this, but we don’t use subject codes. Do I need to do anything?
A. No, if you do not currently use the subject element, you do not need to do anything about this change.

Q. I noticed that wrong or inaccurate subject codes were assigned to my works. Is this a solution?
A. Yes. Until we can identify an accurate and sustainable system for assigning subject codes to Crossref metadata records, we want to stop assigning inaccurate subject codes and remove all existing assignments.

DOAJ and Crossref renew their partnership to support the least-resourced journals

Ginny Hendricks — Wed, 06 Mar 2024 00:00:00 +0000

Crossref and DOAJ share the aim to encourage the dissemination and use of scholarly research using online technologies and to work with and through regional and international networks, partners, and user communities for the achievement of their aims to build local institutional capacity and sustainability. Both organisations agreed to work together in 2021 in a variety of ways, but primarily to ‘encourage the dissemination and use of scholarly research using online technologies, and regional and international networks, partners and communities, helping to build local institutional capacity and sustainability around the world.’ Some of the fruits of this labour are:

DOAJ added support for Crossref XML to make it easier for publishers to upload metadata
Closer collaboration between customer/member support at both organisations, making it easier for publishers and journal editors to navigate both services’ technologies
The launch of PLACE: ‘a ‘one-stop shop’ for information to support publishers in adopting best practices the industry developed’ (together with other partners)
A pilot gap analysis of the journals in DOAJ with the possibility of helping them start to use and resolve DOIs.

The new agreement, signed earlier this month, will slightly shift focus to build upon existing collaborations, particularly around metadata. One of the primary sections of the MOU is enhancing support for the least-resourced journals by:

Assigning DOIs and depositing the metadata with Crossref
Finding ways to improve their DOAJ application experience to help them become indexed
Collecting and ingesting their Crossref metadata into DOAJ
Helping them to get preserved via JASPER or similar initiatives
Helping identify other local partners, such as Crossref Sponsoring Organisations, to support their use of Crossref services

‘It’s great that we can further underpin what is already a good working relationship. Both Crossref and DOAJ are central to discovery so it’s a natural partnership. Helping journals meet better standards and become indexed to make them more discoverable on a global scale is at the heart of our strategy. This agreement opens up a new avenue that allows the community to really focus on supporting those journals and the research they publish.’

– Joanna Ball, Managing Director of DOAJ

‘The collaborations with DOAJ so far only reconfirmed our shared goal to help make the global scholarly communications system more equitable wherever we can. Our joint projects aim to seek out and devise support for resource-constrained journals in multiple ways. DOAJ’s work is essential in helping journals to develop good practice, while Crossref offers an open infrastructure to ensure all journals can be included and discoverable in the global scholarly record.’

– Ginny Hendricks, Director of Member and Community Outreach at Crossref

——– END ——

About DOAJ

DOAJ is a community-curated online directory that indexes and provides access to high quality, open access, peer reviewed journals. DOAJ deploys around one hundred carefully selected volunteers from the community of library and other academic disciplines to assist in curating open access journals. This independent database contains over 20,400 peer-reviewed open access journals covering all areas of science, technology, medicine, social sciences, arts and humanities. DOAJ is financially supported worldwide by libraries, publishers and other like-minded organisations. DOAJ services (including the evaluation of journals) are free for all, and all data provided by DOAJ are harvestable via OAI-PMH and the API. See https://doaj.org/ for more information.

About Crossref

Crossref is a global community-governed open scholarly infrastructure that makes all kinds of research objects easy to find, assess, and reuse through a number of services critical to research communications, including an open metadata API that sees over 1.5 billion queries every month. Crossref’s ~20,000 members come from 155 countries and are made up of universities, publishers, funders, government bodies, libraries, and research groups. Their ~155 million DOI records contribute to the collective vision of a rich and reusable open network of relationships connecting research organisations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society.

For more information please contact: dominic@doaj.org and rclark@crossref.org

What do we know about DOIs

Martin Eve — Thu, 29 Feb 2024 00:00:00 +0000

Crossref holds metadata for approximately 150 million scholarly artifacts. These range from peer-reviewed journal articles through to scholarly books through to scientific blog posts. In fact, amid such heterogeneity, the only singular factor that unites such items is that they have been assigned a digital object identifier (DOI): a unique identification string that can be used to resolve to a resource pertaining to said metadata (often, but not always, a copy of the work identified by the metadata).

What, though, do we actually know about the state of persistence of these links? How many DOIs resolve correctly? How many landing pages, at the other end of the DOI resolution, contain the information that is supposed to be there, including the title and the DOI itself? How can we find out?

The first and seemingly most obvious way that we can obtain some of these data is by working through the most recent sample of DOIs and attempting to fetch metadata from each of them using a standard Python script. This involves using the httpx library to attempt to resolve each of the DOIs to a resource, visiting that resource and seeing what the landing page yields.

Even this is not straightforward. Landing pages can be HTML resources or they can be PDF files, among other things. In the case of PDF files, to detect a run of text is not simple as a single line break can be enough to foil our search. Nonetheless, when using this strategy we find the following statistics:

Total DOI count in sample: 5000
Number of HTTP 200 response: 3301*
Percentage of HTTP 200 responses: 66.02%
Number of titles found on landing page: 1580
Percentage of titles found on landing page: 31.60%
Number of DOIs in recommended format found on landing page: 1410
Percentage of DOIs in recommended format found on landing page: 28.20%
Number of titles and DOIs found on landing page: 929
Percentage of titles and DOIs found on landing page: 18.58%
Number of PDFs found on landing page: 1469
Percentage of PDFs found on landing page: 29.38%
Percent of PDFs found on landing pages that loaded: 44.50%

* an HTTP 200 response means that the web page loaded correctly
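The per-page checks behind figures like these can be sketched roughly as follows. This is not the script that produced the numbers above: the function names are hypothetical, fetching the page and extracting text from PDFs are left out, and "recommended format" is assumed here to mean the DOI displayed as a full https://doi.org/ link. Whitespace is collapsed before matching because a single line break in the page text is enough to foil a naive search.

```python
import re


def normalise(text: str) -> str:
    """Collapse runs of whitespace (including line breaks) and ignore case."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def check_landing_page(page_text: str, title: str, doi: str) -> dict:
    """Report whether the title and the DOI link appear in the page text."""
    haystack = normalise(page_text)
    needle = normalise(title)
    title_found = bool(needle) and needle in haystack
    # Assumption: the recommended display format is the full https://doi.org/ URL.
    doi_found = f"https://doi.org/{doi}".casefold() in haystack
    return {
        "title": title_found,
        "doi": doi_found,
        "both": title_found and doi_found,
    }


def percentage(count: int, total: int) -> float:
    """Share of the sample as a percentage, rounded to two decimal places."""
    return round(100 * count / total, 2)
```

For instance, percentage(3301, 5000) gives the 66.02% reported above for HTTP 200 responses.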

While these numbers look quite low, the problem here is that a large number of scholarly publishers use Digital Rights Management techniques on their sites that block a crawl of this type. We can use systems like Playwright to remote control browsers to do the crawling, so that the request looks as much like a genuine user as possible and evades such detection systems. However, lots of these sites detect headless browsers (where the browser is invisible and running on a server) and block them with a 403 Forbidden error.

There’s a great GitHub JavaScript suite, Detect Headless, that aims to help evade headless detection. The tests it uses are:

User Agent: in a browser running with Puppeteer in headless mode, the user agent includes Headless.
App Version: same as User Agent above.
Plugins: headless browsers don’t have any plugins. So we can say that if it has a plugin it’s headful, but not the reverse, since some browsers, like Firefox, don’t have default plugins.
Plugins Prototype: check if the Plugin and PluginArray prototypes are correct.
Mime Type: similar to the Plugins test, where headless browsers don’t have any mime type.
Mime Type Prototype: check if the MimeType and MimeTypeArray prototypes are correct.
Languages: all headful browsers have at least one language. So we can say that if it has no language it’s headless.
Webdriver: this property is true when running in a headless browser.
Time elapse: it pops an alert() on the page and, if it’s closed too fast, that means it’s headless.
Chrome element: it’s specific to the Chrome browser, which has an element window.chrome.
Permission: in headless mode Notification.permission and navigator.permissions.query report contradictory values.
Devtool: Puppeteer works over the devtools protocol; this test checks whether devtools is present or not.
Broken Image: all browsers have a default nonzero broken image size, and this may not be the case on a headless browser.
Outer Dimension: the attributes outerHeight and outerWidth have value 0 on a headless browser.
Connection Rtt: the attribute navigator.connection.rtt, if present, has value 0 on a headless browser.
Mouse Move: the attributes movementX and movementY on every MouseEvent have value 0 on a headless browser.

Using the stealth plugin for Playwright also allows us to evade most of these checks. This just leaves Mouse Move and Broken Image detection, which I thought would not outweigh all the other factors. We can also jitter the connection with arbitrary delays so that it should appear to be coming at random intervals, rather than a robotic crawl.
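The jitter itself is simple to sketch. In the hypothetical helper below, visit stands for whatever function actually loads a DOI's landing page (for example a Playwright page load); the delay bounds are arbitrary illustrative values, not the ones used in this crawl.

```python
import random
import time
from typing import Callable, Iterable, List, Optional


def crawl_with_jitter(
    dois: Iterable[str],
    visit: Callable[[str], object],
    min_delay: float = 2.0,
    max_delay: float = 10.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Visit each DOI in turn, pausing for a random interval between visits."""
    rng = rng or random.Random()
    results = []
    for index, doi in enumerate(dois):
        if index > 0:
            # A uniformly random pause breaks up the regular rhythm of a crawl.
            sleep(rng.uniform(min_delay, max_delay))
        results.append(visit(doi))
    return results
```

Passing sleep and rng as parameters is only there to make the helper easy to test without real waiting; in normal use the defaults apply.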

Yet the basic fact is that we are still blocked from crawling many sites. This does not happen when we put the browser into headful mode, so current detection techniques have clearly evolved in the past half decade (since Detect Headless was designed).

If, however, we run the browser in a headful mode, the results are strikingly different:

Total DOI count in sample: 5000
Number of HTTP 200 response: 4852
Percent of HTTP 200 responses: 97.04%
Number of titles found on landing page: 2547
Percentage of titles found on landing page: 50.94%
Number of DOIs in recommended format found on landing page: 2424
Percentage of DOIs in recommended format found on landing page: 48.48%
Number of titles and DOIs found on landing page: 1574
Percentage of titles and DOIs found on landing page: 31.48%
Number of PDFs found on landing page: 2085
Percentage of PDFs found on landing page: 41.70%
Percentage of PDFs found on landing pages that loaded: 42.97%

Let’s talk about the resolution statistics. Other studies, looking at general links on the web, have found a link-rot rate of about 60%-70% over a ten-year period (Lessig, Zittrain, and Albert 2014; Stox 2022). The DOI resolution rate that we have, with 97% of links resolving (or a 3% link-rot rate), is far better and more robust than a web link in general.

Is 3% a good or a bad number? It’s more robust than the web in general, but it still means that for every 100 DOIs, just under 3 will fail to resolve. We also cannot tell whether these DOIs are resolving to the correct target, except by using the metadata detection metrics (are the title and DOI on the landing page?), which we could only detect at a far lower rate. It is entirely possible for a website to resolve with an HTTP 200 (OK) response, but for the page in question to be something very different to what the user expected, a phenomenon dubbed content drift. A good example is domain hijacking, where a domain name expires and a spam company buys it up. The links still resolve to a web page, but instead of an article on RNA, for a hypothetical example, the user gets adverts for rubber welding hose. That said, other studies are also prone to this and there is no guarantee that content drift doesn’t affect a huge proportion of supposedly good links in the other studies, too.

Of course, one of the most frustrating elements of this exercise is having to work around publisher blocks on content when visiting using a server-only robot script. It’s important for us periodically to monitor the uptime rate of the DOI system. We also recognise, though, that publishers want to block malicious traffic. However, we can’t perform our monitoring in an easy, automatic way if headless scripts are blocked from resolving DOIs and visiting their respective landing pages. This is not even a call for open access; it’s just saying that current anti-bot techniques, sometimes implemented for legitimate reasons, stifle our ability to know the landscape. Even if the bot resolved a DOI to just a paywall, it would be easier for us to monitor this than it is now. Similarly, CAPTCHA systems such as Cloudflare that would seem to offer an easy way to distinguish between humans (good) and robots (bad) can make life very difficult at the monitoring end. We would certainly be grateful for any proposed solution that could help us to work around these mechanisms.

Conclusion

I wanted this information so that we can take a snapshot of a page and then, at a later stage, determine whether it is down or has changed substantially. To do this, we are developing Shelob, an experimental content drift spider system; that’s what we’ve used so far to conduct this analysis. Over time, we hope, Shelob will evolve to give us a way to detect when content has drifted or gone offline. If, however, we can’t detect whether an endpoint is good in the first place, then we likewise cannot detect when things have gone wrong. On the other hand, if we find the DOI and title on the landing page when we first visit, but at some future point this degrades, we might be able to say with some confidence that the original has died. I, personally, would encourage publishers not to block automated crawlers, because it’s valuable to be able to determine these types of figures.
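The snapshot-and-compare idea described above can be illustrated with a small sketch. This is not Shelob's actual code; the names `Snapshot`, `take_snapshot`, and `assess`, the simple substring matching, and the example DOI and title are all assumptions made for illustration. It records whether the DOI and title were found on a landing page at first visit and only reports possible drift when a page that once carried both no longer does.

```python
import html
import re
from dataclasses import dataclass
from typing import Optional


def normalise(text: str) -> str:
    """Unescape HTML entities, collapse whitespace, and lower-case for loose matching."""
    return re.sub(r"\s+", " ", html.unescape(text)).strip().lower()


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record of what was detectable on a landing page at one visit."""
    has_doi: bool
    has_title: bool


def take_snapshot(page_html: str, doi: str, title: str) -> Snapshot:
    """Check whether the DOI and the title appear anywhere in the page source."""
    body = normalise(page_html)
    return Snapshot(
        has_doi=normalise(doi) in body,
        has_title=normalise(title) in body,
    )


def assess(first: Snapshot, later: Optional[Snapshot]) -> str:
    """Compare a later visit against the first one.

    `later` is None when the page could not be fetched at all.
    """
    if later is None:
        return "offline"
    if not (first.has_doi and first.has_title):
        # The endpoint was never confirmed as good, so a change cannot be judged.
        return "unknown"
    if later.has_doi and later.has_title:
        return "ok"
    return "possible drift"


# Illustrative pages only; the DOI and title are made up for this example.
original = (
    "<html><head><title>RNA folding kinetics</title></head>"
    "<body>doi:10.5555/12345678</body></html>"
)
hijacked = "<html><body>Buy rubber welding hose today!</body></html>"

first_visit = take_snapshot(original, "10.5555/12345678", "RNA folding kinetics")
second_visit = take_snapshot(hijacked, "10.5555/12345678", "RNA folding kinetics")
print(assess(first_visit, second_visit))
```

The "unknown" branch mirrors the point made above: if the first visit could not confirm the DOI and title, a later absence tells us nothing about whether the content has changed.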

Works Cited

Lessig, Lawrence, Jonathan Zittrain, and Kendra Albert. 2014. ‘Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations’. Harvard Law Review 127 (4). https://harvardlawreview.org/forum/vol-127/perma-scoping-and-addressing-the-problem-of-link-and-reference-rot-in-legal-citations/.

Stox, Patrick. 2022. ‘Ahrefs Study on Link Rot’. SEO Blog by Ahrefs. 29 April 2022. https://ahrefs.com/blog/link-rot-study/.

The Lammey Effect

Crossref — Fri, 16 Feb 2024 00:00:00 +0000

We’re equally sad and proud to report that Rachael Lammey is moving on in her career to the very lucky team at 67Bricks. Her last day at Crossref is today, Friday 16th February. Which is too soon for us, but very exciting for her!

It’s hard to overstate Rachael’s impact on Crossref’s growth and success in her 12 years here. She started as a Product Manager and developed that role into a broad and central function. She soon moved into the newly-formed community team as International Outreach Manager, where she grew important programs such as Sponsors, Ambassadors, and a series of ‘LIVE’ events around the world. She went on to manage her own team and establish some of the most important strategic relationships that Crossref now feels fortunate to have.

Rachael was a significant part of the growth and adoption of new initiatives such as Crossmark, Similarity Check, the REST API, preprints, grants, data citation, and ROR. She's contributed to numerous organisations such as EASE, ALPSP, SSP, ISMTE, and STM, and most recently co-chaired the NISO working group on retractions and corrections.

As Head of Strategic Initiatives, and most recently, Director of Product, Rachael has shown dedication and leadership, supporting and strengthening not just her own teams but all of us across the organisation, encouraging us to do better while being one of the easiest people to work with.

The ‘butterfly effect’ is the notion that the world is deeply interconnected and that one small occurrence can influence a much larger complex system. Rachael embodies that notion, having created positive ripples and waves—and certainly many connections—in the scholarly record, in our organisation, and across the community.

Messages from colleagues

Rachael, I was saddened when I first heard the news that you were moving on to another opportunity. Your professionalism, work ethic, and positive attitude have been inspirational to work around. I have enjoyed the opportunities we have had to collaborate. As you move on to a new experience I wish you success and happiness in your future endeavors. Your presence will be missed at Crossref! Best Wishes.

– Ryan

I will miss you, Rachael. It has been great working with you for the few months that I have been at Crossref. I also cannot forget kayaking together with you and capsizing on the return to the shore, but almost professionally recovering. We would have made the best team this time around. I wish you all the best and many wins in your new role.

– Obanda

I feel like the luckiest human to have worked with Rachael over the last 4 years. She’s the perfect mix of smart and funny and knowing how to get things done. Rachael is a big part of what makes Crossref culture so special — I’ve never felt so supported in a role as when Rachael was my manager and for that I am very grateful. I will miss her wit and humor and her pragmatic approach to work and life!

– Sara

One of my first ‘Crossref LIVE’ events was with Rachael in Brazil in 2016. At the time, my role mostly focused on membership, and we had just started working more closely with ABEC, a large organisation in Brazil that sponsored quite a few members. Rachael managed the sponsor program then and thought this would be a good opportunity to collaborate with a sponsor on an event, and she asked me to join her. There is so much planning for these - venues, local partners, presentations, meetings - and she had all the details in order and made the event such a success. Rachael was supportive, encouraging, and I learned so much from working with her. The Brazil trip was such a positive experience that I realised I wanted to focus more closely on community engagement. Rachael encouraged me to do so.

She and I went on to partner on more LIVE events together. Our time in Indonesia was perhaps one of the most memorable for me - as well as our LIVE event, we had an unexpected tour of Yogyakarta with our Indonesian hosts, involving a tour of Prambanan Temple (see photo below), batik fabric shopping, visiting a few universities, and a stop at our hosts’ home. All the while trying not to let the winding car ride and traffic get the better of us. Our event the next day went perfectly, and I told her, half-jokingly, that the whole experience renewed my faith in humanity. Of note, we also drank the only bottle of wine available in the hotel bar.

Rachael was also my Crossref running buddy, and we spent quite a few miles together - in Brazil, NH, Maine, Oxford, and Spain. During our runs, topics ranged from Game of Thrones to Idris Elba to sportsing, but not so much about work. The next time I find myself in England, we will run a few more miles together, followed by a pint. Thank you for everything!

– Susan

Many have pointed out how talented, wise, or skilled you are and I certainly will not contradict a single word of it but that’s not what comes to mind first for me. Those traits, while true, pale in comparison to the person you are. Your positive, bright demeanor and the way everyone always feels better just being around you. I have dreaded some meetings from time to time. But whenever I’ve been involved in something with you, I’ve always left feeling better than when I started (no matter how grumpy I may have entered). You have been a consistent bright light in the Crossref constellation and you will truly be missed.

– Jon

Rachael! You are the best at cutting through all the bulls**t to get at what really needs to happen and why! Your knowledge is broad and deep, as is your institutional memory for all things Crossref and scholarly publishing. And your unflappability in pretty much any context is admirable and inspiring. We’ll miss you big time! Wishing you all the success at 67Bricks and otherwise.

– Shayn

Hey Rachael - I’m happy to be writing this note of Congratulations!! to you, particularly because it would be awkward to explain this bit of verklempt I’m feeling. Our interactions have been limited, but my impressions of you are of confidence, calm, capability, and collegiality. Thanks so much for your work with the Billing team. I’m sorry we are losing you, and am also so glad to know that you are out there at the forefront of inspiring others elsewhere, not only in the work you do, but also how you go about it.

– Laura

Hey Rachael, Just a big THANK YOU for helping me out all this time. I've had so many questions, and you've always been there to answer them. I always knew I could count on you. Thanks for those heartening chats when I needed a boost, and for including me in webinars and recordings - it really helped me improve. Remember that funny mistake I made on a recording when I called us 'Rochael'? We sure had a good laugh! I'm gonna miss those times and working with you. Can't wait to catch up with you over a drink the next time I'm in town. Wishing you all the very best and once again, thanks for everything!

– Rosa

I am happy we got to enjoy some delicious vegetarian/vegan meals and wine together. I guess I should also mention that I enjoyed recruiting, HR and business fun with you too. Thank you for being such a big part of Crossref for 12 years! Have fun conquering your new chapter. Congratulations!

– Michelle

Rachael! You will be missed. I have really enjoyed our chats and work together. I will miss our wide ranging talks about food, books, and your descriptions of all the sportsing, which I would admire because I can barely manage a short run. :-) Thanks so much for being you and let’s stay in touch! Congratulations on your new endeavor, you’re going to be great.

– Esha

When Rachael joined Crossref she brought a lot of enthusiasm and interest in learning about all that we were doing and also about what we could do. Her ideas and engaging leadership are wonderful for creating interest and drive to make projects happen. It has been wonderful to work with her over the 12 years here. I always look forward to seeing her and hearing what she has been doing outside of Crossref as well as inside. I will miss her but I know she will be doing great things wherever she may be.

– Tim

We’ve had a number of opportunities to reminisce, gassing each other up about how great it has been to work together, so I won’t do too much more of that here. But we will continue building on all of your contributions at Crossref and will carry forward your truth-telling and problem-solving approach to the work we do here. Best of luck with all the future has to offer, and we will certainly miss having you on the team.

– Patrick P

Rachael - I will miss you. I’ve really enjoyed working with you, hanging out while traveling, and getting recommendations on good books to read. Crossref won’t be the same without you. I think you have worked in the most different areas of Crossref and on the most projects of anybody, ever. Your commitment, professionalism and humour helped make Crossref what it is today. Your sportsing is also very impressive. All the best.

– Ed

Not all heroes wear capes! Rachael defines that saying so much with her ethic of getting things DONE! I know she loves to get things done but the speed and quality in which that happens is second-to-none. Rachael will be massively missed at Crossref and 67Bricks don’t yet know what they have found. I enjoyed working with Rachael throughout my tenure at Crossref, she has helped me a huge amount in developing my programming skills and has always been encouraging throughout, especially with the ’toil-bashing’ which is substantial and overwhelming at times.

On a more personal note, she is a great drinking buddy and always motivates me to be more active… by making me feel lazy. The number of hours Rachael would work was crazy, but then I always thought that anyone who gets up that early to go for runs must be a little crazy! All the best in your future endeavors and don’t be a stranger.

– Paul

When I started at Crossref in March 2015—at the UKSG conference in Glasgow—Rachael was leading a workshop on text mining, showing off in full glory her ‘unicorn’ mix of skills from her technical knowledge of metadata and APIs to her facilitation techniques with a large group of people, clearly a community whose needs she knew inside out. Later that evening, Rachael took it upon herself to induct me in the ways of Crossref. One of the most important things she thought I should know was that we were all trusted and treated like adults - there was no micromanagement and I was to feel completely free to challenge the status quo.

After one of the first ‘LIVE’ events, in Vilnius, I realised that it was Rachael who had created and embodied that trusted vibe through her own approach. She has been entrusted with so many programs, projects, teams, and tricky situations. Almost every launch, release, announcement, or achievement at Crossref very likely had Rachael’s eye on it at some stage, certainly the ‘actually-getting-it-done’ stage. Our close working relationship over the last nine years grew into a great friendship and I’m not quite sure how I’ll feel when the reality sets in and she’s not here for a quick chat, always a reality check. Working with Rachael has been inspiring, exciting, reassuring, and hilarious (that dry 'Norn Iron' humour!). 67Bricks is so fortunate and I can’t wait to watch her help them go from strength to strength, just like she has done for Crossref. See you soon, Ranty Rachael, no doubt putting the world to rights over a bottle of Malbec and many eyerolls 🙄.

– Ginny

Although our time working together only overlapped the short span of two years, I appreciate how much of a champion you were for ROR and everything else you did at Crossref! I’m sure you’ll continue to do the same, among many other great things, in your new journey at 67 Bricks. You will be missed!

– Adam

Rachael, It has been wonderful working with you!! You are truly a special person. I always looked forward to when we chatted over slack, had a call together, or got to spend time together in person. You are sure to do amazing things on your next adventure. You will truly be missed!! I hope we can stay in touch! Good luck, Rachael!!! Fondly, Amy.

– Amy

I am happy I got to meet Rachael when I joined Crossref in December 2023. We spoke generally about the Products team at Crossref, the differences and similarities between the African and British culture and upcoming projects on automation. You were really patient towards explaining and providing great information on metadata and research. Thank you so much for always responding swiftly to my requests pertaining to Finance issues. I have no doubt that you would be missed at Crossref and would keep doing great things into the future!!! Congratulations Rachael.

– Patience

I will greatly miss working with you, Rachael; you have been a stalwart of reliability and enthusiasm during my time at Crossref and the organisation will not be the same without you. That said, of course, I wish you all the best of luck and success in your future endeavours!

– Martin

Rachael- Congratulations on this new opportunity, I am thrilled for you! I am also very sorry that our time at Crossref did not overlap much and I am grateful for all the chances I had of interacting with you (including being able to meet you in London recently)- you were always very helpful and kind to me. I am hopeful that our paths will cross again in the future. We will definitely miss you here, and I wish you all the best for all the exciting things ahead.

– Madhura

My third week at Crossref back in 2017 was at the annual meeting in Singapore, and not getting into the timezone and not sleeping for 4 days was eased by our visit to a rooftop nightclub on the penultimate night - just before you headed off to Indonesia for a series of meetings with members and sponsors. I still don’t know where you get all your energy!
I’m so sorry you’re leaving - I’ll really miss your honesty, your approach to getting things done, and of course seeing Rosie on our zoom calls. Looking forward to seeing what’s next for 67 Bricks - exciting times!

– Amanda B

Rachael, it’s been a pleasure to work with you. You’re always ready to help and ever full of information. We’ve only just got coordinated on the perennial challenge of timelines! You took things on and got them done, as you said. The world of schol comms won’t even know how much it has to thank you for, probably chiefly for seeing the Retraction Watch data acquisition through and opening it up for all. I will miss your honesty and energy, and the opportunity to challenge you again on the amount of food consumed in one sitting… I don’t think you’ll need luck in your next place, but I wish you that it is all you want it to be.

– Kora

I’m so glad to have met you in person over these couple of days in London shortly after I joined Crossref and it’s such a shame we didn’t have much time to work together more and spend more time (not working) together. Thanks for the introduction to the Scampi Fries - you’ve changed my life forever (for the best obviously)!

– Maryna

Thank you for your collaboration and friendship over the past decade! You will be missed. We've worked on a long series of abbreviations, acronyms, and portmanteaus! Thanks for organizing countless things, from conference satellites to conference rooms. Your long record as fire warden was unblemished. 67bricks will benefit from your singular drive and attention to detail. All the best!

– Joe

Rachael! One thing I admire most in a person is a facility with metaphor accompanied by the ability to see to the heart of a matter, and hoo boy do you have those qualities in spades. I remember so clearly your talk at the Crossref team meeting in Spain in 2023 in which you clarified the Big Picture for us all in an extremely enlightening way, and then, in a smaller but equally impressive achievement, casually mentioning in a Funder Registry meeting that funders should start “stretching and warming up” for the transition to ROR – boy did I latch on to that terrific image. I wish you all the best at 67 Bricks.

– Amanda F

Rachael, thank you so much for all the support, patience, honesty, and determination. I will certainly miss our chats, work-related and non-work related. I wish you all the best in your new ventures!

– Dominika

Rachael - thank you for your boundless patience, generosity, and sense of humour. I’m very grateful I got to learn the Crossref ropes (cropes?) from you. Looking forward to randomly running into you on the Bristol karaoke circuit in 10 years’ time and performing an epic duet of Dancing in the Dark together. There’s a joke in there somewhere about you being the boss.

– Lena

Rachael, Congrats on your new opportunity. You will be greatly missed here. Through the years we have only been at the same events in person a handful of times but I will always remember your amazing personality and sense of humor. I am thankful to have spent some time with you at 2020 PIDapalooza.

– Maria

Thank you, Rachael. Thank you. I know everyone is telling you that they’re sad to see you go (I am too; we all are). I keep thinking if I delay telling you that, maybe the day won’t come when you walk out the Crossref doors. But here it is. Just wanted you to know that I appreciate you. I appreciate you pushing us forward. I appreciate you being an advocate for all things Crossref. We’ll all miss you. Best of luck at 67bricks!

– Isaac

On one of our first meet ups together, I drove us from the Lynnfield office to Logan airport in rush hour, and we managed to survive the Bostonian road rage in one piece. We spent the ride talking through the intricacies of a sponsoring organisation’s agreement. Rachael has been a safe set of hands and an encyclopedia of institutional memory for Crossref for 12 years.
Rachael is one of those people who’s as equally competent as she is a pleasure to work with. She’s an innate leader because people want to get behind her. She shows her depth of understanding while also inviting input from everyone in the room. I’ll miss our Zoom calls, our marathon Friday sessions, and our post-meeting pub visits.

– Lucy

Hello! Here’s to hoping your new workplace appreciates you as much as you were here – they’re lucky to have you. I only wish we had the chance to interact more. Many hugs!

– Luis

Rachael, I will really really miss you, professionally and personally (but you know this already !). I'll miss all our work, dog, book and putting the world to rights chats. You'll be brilliant whatever you do and wherever you go (67Bricks have no idea how lucky they are !). Just keep 'getting stuff done' and have fun 😀

– Fabienne

You will be sorely missed but can be very proud of what you’ve done during your time at Crossref, I’m sure you’ll continue to have a big impact. You’ve always been a pleasure to work with: efficient, supportive, and always with a sense of fun and enjoyment. That’s probably one of the things that drew me to Crossref even before we worked together as colleagues. Thanks for the support and positivity you’ve brought on many, many occasions and best wishes for the future!

– Martyn

Hey Rachael! I might have not had the chance to meet with you much while still around but I’ll definitely miss your jokes and the good vibes you were bringing to each call! Looking forward to taking over your place for board games when around Bristol ;) Wishing you a great start in the new place!

– Panos

ThunderCats are on the move. ThunderCats are loose. Says it all, really. Best of luck in your new endeavours.

– Mike

Crossref won’t be the same without Rachael and we wish her well on her way to even greater things.

Good luck, Lammey!

Ed Pentz accepts the 2024 NISO Miles Conrad Award

Rosa Morais Clark — Tue, 13 Feb 2024 00:00:00 +0000

Great news to share: our Executive Director, Ed Pentz, has been selected as the 2024 recipient of the Miles Conrad Award from the USA’s National Information Standards Organization (NISO). The award is testament to an individual’s lifetime contribution to the information community, and we couldn’t be more delighted that Ed was voted to be this year’s well-deserved recipient.

During the NISO Plus conference this week in Baltimore, USA, Ed accepted his award and delivered the 2024 Miles Conrad lecture, reflecting on how far open scholarly infrastructure has come, and the part he has played in this at Crossref and through numerous other collaborative initiatives.

Established in 1965, the Miles Conrad Award gives recognition to those who’ve made substantial contributions to the information community over a lifetime. Named after the founder of the National Federation of Abstracting and Indexing Services (NFAIS)—an association that since merged with NISO—the award encourages innovation in content management and dissemination. Over the years, leaders and innovators who have significantly influenced the field of information exchange have been honored with the award. Ed has joined an illustrious group!

Ed’s leadership in collaboration and diplomacy has led to Crossref’s success in making research objects more accessible and useful to a wide global audience, including publishers, researchers, funders, societies, libraries, and more. Crossref’s founding purpose is stated as:

“To promote the development and cooperative use of new and innovative technologies to speed and facilitate scientific and other scholarly research”.

Acknowledging his privilege as a Western, university-educated, white man, which he comments has helped his career, Ed prioritises collaboration, open communication, teamwork, and equity in creating a positive, trusted environment that has brought together a diverse team of 49 colleagues from 11 countries. The organisation’s culture allows everyone to grow and contribute to the mission of a connected research nexus by including and developing solutions for community members across the globe.

Before his journey with Crossref, Ed held a number of roles at Harcourt Brace, including launching Academic Press’s first online journal. This experience led to his involvement with the DOI-X pilot project, which became the foundation for Crossref. Since its launch in 2000, under his leadership, Crossref has become an important component of the research ecosystem, an open scholarly infrastructure with nearly 20,000 members across more than 150 countries. Crossref is now the main source of more than 155 million records about all kinds of research objects, and this open metadata registry is relied upon by thousands of tools and services across the whole research system.

Ed’s influence is also evident throughout the wider world of open scholarly infrastructure; aside from establishing Crossref, he co-founded ROR and was a founding member of ORCID, where he also served as board Chair. Further, he has engaged with the community by holding various advisory positions, including with the DOI Foundation, the Digital Object Naming Authority (DONA), and the Coalition for Diversity in Scholarly Publishing (C4DISC).

Ed also emphasised that the long-term success of community initiatives lies in patience and the ability to agree on high-level principles of purpose and governance, which oil the wheels of collaboration, encourage participation, and enable more progressive change that builds and lasts over time. He says, “to solve collective problems it takes collaboration and diplomacy, bringing together a group of stakeholders, balancing their different concerns, building trust, and reaching consensus.”

The adoption of the Principles of Open Scholarly Infrastructure (POSI), along with (so far) 14 other organisations, was a key turning point for Crossref, Ed said, and one which has already paved the way for more openness of key metadata for the community, including references and retractions, as well as closer partnerships with many of the other POSI adoptees, given their shared understanding and experience.

Referencing the current “peak hype” around artificial intelligence (AI), Ed points to the challenge of research integrity and the “growing field of science sleuthing” as a forthcoming area that Crossref and open metadata may help tackle at scale, including through Crossref’s Integrity of the Scholarly Record (ISR) Program and—of course—community-wide collaboration.

In concluding his talk, Ed describes his hopes and dreams for scholarly communications in the future. He would like to see more balance in diversity in the leadership of open scholarly infrastructure, extended integrations among the various foundational infrastructures, and a fully connected system where the scholarly record is inclusive globally.

Ed, on behalf of all your proud colleagues at Crossref, thank you and congratulations!

ISR Roundtable 2023: The future of preserving the integrity of the scholarly record together

Madhura Amdekar — Tue, 06 Feb 2024 00:00:00 +0000

Metadata about research objects and the relationships between them form the basis of the scholarly record: rich metadata has the potential to provide a richer context for scholarly output, and in particular, can provide trust signals to indicate integrity. Information on who authored a research work, who funded it, which other research works it cites, and whether it was updated, can act as signals of trustworthiness. Crossref provides foundational infrastructure to connect and preserve these records, but the creation of these records is an ongoing and complex community effort. Crossref has always shown a deep commitment to preserving the integrity of the scholarly record in an open and scalable manner.

Given the increasing concerns in the community about matters of research integrity and integrity of the scholarly record (ISR), we at Crossref have been engaging with community members to understand what developments are needed. In 2022, we organised a roundtable discussion to talk about our role and the applicability of Crossref’s services in preserving and assessing the integrity of the scholarly record. We’ve acted on much of that feedback since, and so in October 2023, we organised a follow-up event, once more gathering representatives of publishers, research integrity experts, policy-makers, academic institutions, funders, and researchers (the full list of participants can be found in the appendix). This post aims to offer insight into the discussions at this event and the next steps. The objective of this event was to take the conversation forward by:

Sharing the progress made by Crossref on matters related to ISR since the last roundtable event.
Sharing information about how metadata contributes to the Research Nexus, and can act as trust markers for research outputs.
Apprising the community about the latest membership trends and examples of activities that we see, such as title transfer disputes, unregistered DOIs, requests for deleting records, and sneaked references.
Building upon the ideas discussed during the 2022 roundtable event to progress the conversation about issues related to ISR.
Learning from the participants about their experiences of pursuing research integrity initiatives.
Last but no less importantly, hearing from the participants their perspectives on strategies for preserving the integrity of the scholarly record, and opportunities for collaborating to leverage metadata to assess the integrity of the scholarly record.

The event was kicked off by Ed Pentz, who spoke to the participants about how integrity is key to Crossref’s mission, and Crossref’s vision of the Research Nexus. Next, Amanda Bartell, the Head of Member Experience at Crossref, shared the recent developments and trends in community behaviour. She expanded upon the actions taken by Crossref as part of its ISR program since the last roundtable event, which include:

Acquisition and opening of the Retraction Watch database, which makes it easier to access information on retractions and corrections.
Increased participation in the Global Equitable Membership (GEM) program, enabling a wider section of the community to provide and access trust signals.
Newer developments around metadata that act as trust signals: e.g. 120K grants or awards now have a Crossref DOI, and the planned transition of the Open Funder Registry into ROR.
Recruitment of a Community Manager to focus on working with publishers and editors, including on ISR (that’s me!), and recruitment of a Technical Community Manager to enable greater use of our APIs.

Amanda highlighted that all Crossref members should be using ROR IDs to provide affiliations for authors (along with ORCID iDs) in their Crossref metadata. She also shared some recent examples of community behaviours that we have seen, such as requests from authors to delete records of works that were published without their permission, title ownership disputes between publishers, and the recent instance of sneaked references.

Ivan Oransky, co-founder of Retraction Watch, and Lena Stoll, Product Manager at Crossref, were next, and they spoke about the future of the Retraction Watch database, and about the Crossmark service. After this, some of the other roundtable participants shared initiatives that they have undertaken that support ISR:

Jodi Schneider from the University of Illinois Urbana-Champaign spoke about NISO’s CREC Working Group that has created a Recommended Practice that should be followed by relevant stakeholders for communicating retracted research (Crossref’s Director of Product Rachael Lammey was the co-chair of that group).
Kihong Kim from the Korean Council of Science Editors shared information about the workshops that the Council has organised for researchers on publishing in journals.
Alberto Martín-Martín from Universidad de Granada presented his thoughts on how to reconcile the publishing system and the institutional view of tracking research outputs.
Bianca Kramer from Sesame Science spoke about her analysis of and the implications of sneaked references, duplicate references, and missing references for citation integrity.
Joris van Rossum from STM Solutions spoke about the STM Integrity Hub and the integrity tools that are being developed in collaboration with some publishers.

Some of the most valuable reflections stemmed from discussions in small groups on these three key questions:

What value do you see in the integrity and completeness of the scholarly record in the way you operate? How do you contribute to it? How can it support you to achieve your own goals?
Are you aware of Crossref services? What are the barriers to more uptake? What are the challenges and opportunities?
What information is essential and nice to have for you in the scholarly records to support trust signalling and ascertaining trustworthiness?

As groups shared their discussions, a few themes became apparent that I would like to elaborate on further.

What is “complete”?

Given the prompt to talk about the value of completeness of the scholarly record, an immediate reaction at most tables was: how much metadata qualifies as “complete” metadata? Can the scholarly record be considered complete if some publishers or journals do not use Crossref? What is the optimum level of metadata that should be deposited by members - should a minimum data standard be defined by disciplines, or should there be standard data requirements for all? The composition of metadata appears to change over time, too, as the processes change and our ability to record their facets increases. While there were spirited discussions about what constitutes a complete scholarly record, everyone agreed that “completeness” of metadata, as much as is possible, should be the aim. Unambiguous and consistent standards may help with this, for example, the principles and best practices created by the Metadata 2020 community, and potentially also a set of recognition standards and reproducibility badges.

Global participation is equally important for a truly “complete” scholarly record. In order to enable as many in the scholarly community as possible to participate in Crossref services and metadata, Crossref launched the Global Equitable Membership (GEM) program in 2023. Under this initiative, membership and content registration fees are waived for members from the least economically advantaged countries. We are seeing first signs that this initiative meaningfully lowers the barriers to participation for organisations based in those countries, and allows the global community to contribute towards the building of a comprehensive research ecosystem.

At the end of the day, it is important to recognize that rich metadata is crucial because it can be used for all kinds of analysis, which in turn can drive decision-making. Even if some of the metadata components are sporadically missing, that could be acceptable, because every piece of data counts!

Corrections and Retractions

As in the previous year, retractions and corrections continued to be a topic of great interest in this year’s roundtable. This was not surprising given their relevance as trust indicators as well as the recent acquisition of the Retraction Watch database by Crossref. Having heard from Ivan about the Retraction Watch taxonomy of reasons for retractions and the metadata included in the database, participants expressed the need to investigate this taxonomy as a community standard. While the Retraction Watch taxonomy is not widely known, we at Crossref are working to map the Crossmark taxonomy to the Retraction Watch taxonomy, which will enable complete integration of the Retraction Watch database with the Crossref database.

It would also be useful to add more information to retraction notices. More information about the reasons for retraction would help destigmatize retractions, and certain additional information, such as submission dates for those outputs, might help ethical investigations determine whether manuscripts were being submitted to multiple publishers simultaneously.

On the topic of retractions, another aspect that came up in the room was the incentive for researchers to publish as much and as quickly as possible. If researchers engage in unethical publishing practices due to this pressure to publish, that is hugely detrimental to the cause of research integrity and to the progress of scientific research in general. However, there is a distinction to be made between the integrity of the research and the integrity of the scholarly record. Unethical research and publishing practices, including but not limited to data falsification, fabrication, and plagiarism, affect research integrity, while the integrity of the scholarly record is affected by unavailability of metadata, outdated metadata, incomplete metadata records, and incorrect metadata (e.g. as seen in the case of sneaked references).

There was a lot of discussion about Crossmark, a cross-platform service provided by Crossref that allows readers to discover whether an item has been updated, corrected, or retracted just by clicking a button that is standardised across publication platforms. While most participants acknowledged its importance, they also pointed out that its uptake among publishers has been limited, perhaps because it is difficult to implement and because its purpose is not always clear to readers. There were suggestions to add a notification system to Crossmark such that every time a published output is retracted, a notification goes out. This seemed of particular interest to funders, whose grievance was that they are usually the last to find out when research that they have funded is retracted. They would welcome notifications that would alert them to such events.

We already have plans to consult with the community more specifically about what changes they’d like to see to Crossmark that will enable them to implement it easily and use it more frequently. Take a look at this thread on our community forum and add your thoughts for our next steps on Crossmark.

The importance of education

There was an overwhelming sentiment that there was a need for collective arbitration of research integrity issues. However, everyone recognized that this is not a role for Crossref. We can act as a “trust broker” by bridging different metadata and identifiers that otherwise might not interact, creating a network of research outputs whose credibility can be verified by others. Many participants called for Crossref to increase its efforts in educating community members about the importance of metadata and how different pieces can be linked together to make meaningful connections.

Research practices vary between countries, and between institutions. Correspondingly, the metadata being provided by diverse Crossref members may also vary. There is an opportunity here for the global research community to work together to increase awareness about ethical standards, so that a lack of specific metadata or its variances (e.g. unusually formatted metadata, or non-standard metadata fields) may not be construed as “lower quality” metadata. Many felt that the greatest need for education about metadata is for the academic community – although individual researchers contribute a wealth of metadata associated with any published research output, they do not necessarily understand how metadata contributes to the completeness of the scholarly record. There is a further opportunity to talk to the academic community about how different metadata components link together to form a rich network, supporting visibility and confidence in their work. A greater awareness about these topics is likely to encourage researchers to provide more metadata and identifiers.

While most participants at the roundtable event agreed about the need for this conversation and the educational opportunities here, if Crossref were to lead these efforts, it would represent, in some eyes, a diversion from its mission. We do have several initiatives already to support our communities. As part of the Crossref Ambassadors program, volunteers from the international scholarly community who believe in Crossref’s mission liaise with our team to conduct training in their communities about using Crossref services and, generally, about the importance of metadata. In 2023, we also launched a new online public forum, the PLACE, in collaboration with the Committee on Publication Ethics (COPE), the Directory of Open Access Journals (DOAJ), and the Open Access Scholarly Publishers Association (OASPA). This forum is a place where new publishers can connect with these organisations and learn about best practices in scholarly publishing via discussion posts and by asking questions, as they get started. Another initiative that is designed to help new Crossref members is the “Managed Member Journey”: as members join and move through the various stages of membership, key information is shared with them during each of these stages in the form of triggered automated emails, web pages, and webinars.

While Crossref’s direct interactions with researchers are limited, we welcome the community’s recognition of the need to raise awareness about these matters. We have started engaging more closely with the reporters of metadata issues, in many cases investigators and ‘sleuths’ in the area of research integrity, and plan some closer collaborations with this group in 2024. We are open to supporting community efforts to inform other stakeholders about the importance and uses of metadata.

Incentives for the community

Another theme that was heard repeatedly was “incentives”: incentives for researchers to contribute to a “complete” scholarly record, incentives for publishers to improve metadata, and incentives for everyone to report on and register retractions.

As I mentioned before, a shared sentiment is that researchers may not be aware of the value of rich metadata. While more publications, increased citations, and greater grant funding are some examples of incentives that are part of the current academic settings, the right incentives probably do not exist for researchers to provide complete metadata. With the diverse set of participants present at this meeting, some groups also discussed how the current research assessment system can change to incorporate other metrics, perhaps those based on open science and open data.

What could be the incentives for publishers to improve the metadata collected and deposited by them? One suggestion was that clearly defined benefits of rich metadata can incentivise publishers. Being aware of what funders are mandating can be another incentive. On the same note, funders will benefit from knowing what metadata is being provided by publishers. This metadata is available through our open API, and nine key checks on members’ activity are available through our public Participation Reports.
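As a rough illustration of how coverage checks of this kind can be computed from open metadata, here is a minimal Python sketch. It assumes work records shaped like the JSON returned by the Crossref REST API; the four fields checked and the sample records (with made-up DOIs) are illustrative assumptions, and this is not a description of how Participation Reports are actually calculated.

```python
from typing import Dict, Iterable, Sequence

# Fields to check. Chosen for illustration only; this is not the official
# list of checks used in Participation Reports.
FIELDS = ("funder", "license", "abstract", "reference")


def coverage(works: Iterable[dict], fields: Sequence[str] = FIELDS) -> Dict[str, float]:
    """Return, per field, the percentage of works with a non-empty value."""
    works = list(works)
    if not works:
        return {field: 0.0 for field in fields}
    return {
        field: round(100 * sum(1 for w in works if w.get(field)) / len(works), 1)
        for field in fields
    }


# Hypothetical sample records; the DOIs and values are made up.
sample = [
    {"DOI": "10.5555/example.1",
     "funder": [{"name": "Example Funder"}],
     "license": [{"URL": "https://example.org/licence"}]},
    {"DOI": "10.5555/example.2",
     "abstract": "Example abstract text",
     "license": [{"URL": "https://example.org/licence"}]},
    {"DOI": "10.5555/example.3", "funder": [], "reference": [{"key": "ref1"}]},
    {"DOI": "10.5555/example.4"},
]

report = coverage(sample)
print(report)
```

An empty list (as in the third record's `funder`) is counted as missing, which is a design choice: a field that is present but empty tells a funder or reader nothing.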

Retractions featured again in the discussion on the topic of incentives. As shared by Ivan, retractions are on the rise every year, with about 43k retractions currently in the Retraction Watch database. On the other hand, retractions registered in Crossmark at the time of the meeting numbered just 14k and have recently jumped up to 25k thanks to Hindawi/Wiley’s dedication to good open metadata. Besides the fact that the uptake of Crossmark by Crossref members is limited, another reason for the low number of retractions being registered is the associated stigma. Corrections and errata are usually conflated with retractions, and all these terms, which represent different kinds of updates that may happen to a published item, have a stigma associated with them in the academic community. There is a need to destigmatize retractions, and perhaps incentivize them by noting that these updates are essential to uphold the integrity of the scholarly record and to highlight the publishers that are showing leadership in addressing the issues openly through up-to-date Crossref metadata.

What metadata is nice to have in the scholarly record?

We asked everyone what information they think is essential as well as “nice to have” in the scholarly record to support trust signalling, and we heard a range of answers. Peer review information was recognized to be important. This would include data on who the peer reviewers were and standard peer review terminology that has been published by NISO. More generally, as much metadata as possible about the main actors of the peer review process was considered important - such as designating who the corresponding author is, and who the handling editor or the decision-making editor was.

As special issues led by guest editors have attracted attention of late due to the uncovering of irregularities in some of them, one of the first suggestions in this context was more metadata about special issues. Participants thought that it would be useful to collect and distribute information on handling/guest editors of special issues, peer reviewers, as well as submission and acceptance dates. Recently, COPE has released guidance on “best practices for guest-edited collections”, highlighting that this topic is at the forefront for the scholarly information industry.

Adding information on ethical approvals provided by institutional review boards would add more nuance to the research outputs. Metadata about clinical trials helps to add transparency to research in a field where reproducibility is of primary importance. Conflicts of interest are another factor that could be a cause of concern if not reported accurately; these declarations were mentioned by the participants as important for signalling trust.

Recognizing that it is the relationships in the metadata that add context to research output, participants echoed that better interlinking between preprints and their published versions is required. To aid with all of this, it has been suggested that a complete list of all metadata that can be deposited with Crossref be made available in a simple format, so that members have more visibility about all the possibilities that exist for providing metadata.
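To make the idea of interlinking concrete, here is a small Python sketch that reads the links asserted in a work record. It assumes a record shaped like the Crossref REST API output, where relationships appear under a `relation` key grouped by type (for example `is-preprint-of`); the sample record and its DOIs are hypothetical.

```python
from typing import List


def linked_dois(work: dict, relation_type: str) -> List[str]:
    """Return the DOIs asserted under one relation type of a work record."""
    relations = work.get("relation", {})
    return [
        item["id"].lower()  # DOIs are case-insensitive, so normalise them
        for item in relations.get(relation_type, [])
        if item.get("id-type") == "doi" and "id" in item
    ]


# Hypothetical preprint record with made-up DOIs.
preprint = {
    "DOI": "10.5555/preprint.1",
    "type": "posted-content",
    "relation": {
        "is-preprint-of": [
            {"id-type": "doi", "id": "10.5555/Article.1", "asserted-by": "subject"}
        ]
    },
}

published = linked_dois(preprint, "is-preprint-of")
print(published)
```

A record with no `relation` entry simply yields an empty list, which is exactly the gap participants were asking to see closed.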

Next steps

We asked all participants if the discussions prompted them to plan to take any actions in the near future. Several attendees reflected that the discussion encouraged them to go back and review the metadata that they are depositing with Crossref, and how they can make more use of the data openly available from Crossref. We also heard how some found training opportunities therein - discussion points from the event could be included in workshops for affiliated researchers, and in COPE guidance for members. As encouraged by members of NISO’s CREC Working Group, some participants were looking to respond to the (then open) consultation on the draft Recommended Practice, NISO RP-45-202X, Communication of Retractions, Removals, and Expressions of Concern (CREC). One message resonated loud and clear: preserving the integrity of the scholarly record cannot be a lone endeavour and has to be a community effort. Attendees expressed their commitment to continue these conversations, with the next most opportune time being at the STM week. Everyone recognised that collaboration in this space is the need of the hour: facilitating information and data sharing across all the players in the ecosystem would be crucial to progressing this topic. As Bianca Kramer declared during her presentation, “I am committed to using only open data in my research, as access to data is important for the community to detect problems at scale”.

At our end, we are looking to act on suggestions that are specific to Crossref:

Consultation with the community about Crossmark

One of the first things that we are doing in early 2024 is to consult with our community about the developments needed in the Crossmark service. Our key aim with this exercise would be to understand how we can enable a more effective uptake of this service so that Crossref members can easily fulfil their obligation of keeping their records updated. We are keen to understand what we can do to help our members to send us metadata about updates to an output, and how we can help downstream services that use this data. Insights from this consultation will also help inform how the Retraction Watch data can be most effectively integrated into Crossmark and communicated to users. Please visit the discussion and add your thoughts here: https://community.crossref.org/t/communicating-post-publication-updates-inviting-feedback-on-the-next-steps-for-crossmark/.

Development of resources for using our API

As there is clearly no dearth of metadata components that the community thinks would be “nice to have” for signalling trust, it is equally important to equip users and downstream service providers to access the rich metadata that is available from Crossref. This rich metadata opens up new avenues for the development of services and resources that can benefit the scholarly community. For this reason, we plan to prioritise the development of resources for using Crossref APIs. These efforts would include making available workbooks with a variety of API use cases, ranging from basic API queries to obtaining grant information or citation data, as well as retrieving corrections, retractions, and update information, especially once the Retraction Watch dataset is merged with the rest of the Crossref metadata.
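As a hint of what one such use case might look like, here is a minimal Python sketch for finding retraction notices. It builds a query against the public REST API `works` endpoint using the `update-type` filter, and reads the `update-to` field of a returned notice. The helper names and the sample notice are hypothetical, and no network request is made here.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

API_ROOT = "https://api.crossref.org/works"


def update_notices_url(update_type: str = "retraction", rows: int = 20,
                       mailto: Optional[str] = None) -> str:
    """Build a works query for notices of one update type."""
    params = {"filter": f"update-type:{update_type}", "rows": rows}
    if mailto:
        # Optional contact address identifying the caller to the API.
        params["mailto"] = mailto
    return f"{API_ROOT}?{urlencode(params)}"


def updated_dois(notice: dict) -> List[Tuple[str, str]]:
    """List (DOI, update type) pairs for the works a notice updates."""
    return [(u.get("DOI"), u.get("type")) for u in notice.get("update-to", [])]


url = update_notices_url()
print(url)

# Hypothetical notice record with made-up DOIs.
notice = {
    "DOI": "10.5555/notice.1",
    "update-to": [{"DOI": "10.5555/article.1", "type": "retraction"}],
}
print(updated_dois(notice))
```

Fetching the URL with any HTTP client returns JSON whose `message.items` list can be passed, item by item, to `updated_dois`.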

Working group to facilitate community efforts for preserving ISR

We are looking to set up a working group that will help the various stakeholders in the scholarly ecosystem work together towards preserving the integrity of the scholarly record. One direction for the group could be to consider the role and impact of Crossref metadata in ISR. Another area of focus will be to enrich information about retractions, corrections, and expressions of concern. Raising industry-wide awareness about the current concerns in upholding the integrity of the scholarly record, and how comprehensive metadata can act as markers of trust about research output, would be another focal point.

Continued community outreach

We will continue our efforts to engage with the community on the very important issues surrounding ISR. We are particularly keen to redouble our efforts to include more funders and institutions in these conversations. Preserving the integrity of the scholarly record needs to be a truly inclusive effort and will benefit from diverse voices in the community. With that in mind, consulting with the community in Asia is next on our radar.

We look forward to working with the community further on this important topic - if you are keen to participate in these discussions and want to contribute towards preserving the integrity of the scholarly record, we would love to hear from you. Please write to us at feedback@crossref.org if you have any suggestions on this topic.

Appendix: Participant list

Name	Role	Organisation
Ed Pentz	Executive Director	Crossref
Amanda Bartell	Head of Member Experience	Crossref
Madhura Amdekar	Community Engagement Manager	Crossref
Luis Montilla	Technical Community Manager	Crossref
Lena Stoll	Product Manager	Crossref
Kora Korzec	Head of Community Engagement and Communications	Crossref
Ivan Oransky	Co-Founder	Retraction Watch
Jennifer Wright	Research Integrity Manager	Cambridge University Press
Guntram Bauer	Director of Science Policy & Communications	Human Frontier Science Program
Wendy Patterson	Scientific Director	Beilstein-Institut
Sarah Jenkins	Director, Research Integrity & Publishing Ethics	Elsevier
Helene Stewart	Director, Editorial Relations Web of Science	Clarivate
Bianca Kramer	Advisor, Research Analyst, Facilitator	Sesame Open Science
Adya Misra	Research Integrity and Inclusion Manager	Sage
Andrew Joseph		Wits University Press
Theodora Bloom	Executive Editor	BMJ
Alberto Martín-Martín	Assistant Professor	Universidad de Granada
Aaron Wood	Head, Product & Content Management	American Psychological Association
Fred Atherden	Head of Production Operations	eLife
Kihong Kim		Korean Council of Science Editors
David Flanagan	Senior Director, Data Science	Wiley
Chiara Di Giambattista	Communications Director	OpenCitations
Scott Delman	Director of Publications	ACM
Chi Wai (Rick) Lee	General Manager	World Scientific Publishing Co (WSPC)
Leslie McIntosh	VP, Research Integrity	Digital Science
Adam Day	Director	Clear Skies
Damaris Critchlow	Project Manager	Karger
Tamara Welschot	Head of Research Integrity, Prevention	Springer Nature
Kathryn Dally	Research Integrity and Policy Lead	Research Services, University of Oxford
Masahiko Hayashi	Director, JSPS Bonn Office	Japan Society for the Promotion of Science
Simone Taylor	Chief, Publishing	American Psychiatric Association
Christna Chap	Head of Editorial Development	Karger Publishers
Coromoto Power Febres	Research Integrity Manager	Emerald Publishing
Carole Chapin	Project Manager	French Office for Research Integrity
Jodi Schneider	Associate Professor of Information Sciences	University of Illinois Urbana-Champaign
Oliver Koepler	Head of Lab Linked Scientific Knowledge	TIB - Leibniz Information Centre for Science and Technology
Heather Staines		Delta Think
Eri Anno		JSPS Bonn office
Joris van Rossum		STM Solutions
Anita de Waard	VP Research Collaborations	Elsevier

RORing ahead: using ROR in place of the Open Funder Registry

Rachael Lammey — Tue, 30 Jan 2024 00:00:00 +0000

A few months ago we announced our plan to deprecate our support for the Open Funder Registry in favour of using the ROR Registry to support both affiliation and funder use cases. The feedback we’ve had from the community has been positive and supports our members, service providers and metadata users who are already starting to move in this direction.

We wanted to provide an update on work that’s underway to make this transition happen, and how you can get involved in working together with us on this.

Overall, we are building more comprehensive support for ROR into Crossref’s services. Some of this work is specifically to support using ROR to identify funding organisations in place of funder registry IDs. We have a number of parallel, complementary projects underway to support different elements of this work:

We are evolving our metadata schema so that we can collect ROR IDs in places where we currently support the collection of Funder IDs.
We are analysing the coverage of Funder ID to ROR ID mappings and testing the way we expose them in our APIs.
We are developing new matching strategies to match text strings to ROR IDs.

1. Schema updates

Everything flows from being able to get ROR IDs into the Crossref metadata!

We are evolving our metadata schema so that we can collect ROR IDs in places where we already support the collection of Funder IDs – for instance, in the funding section of the metadata for works and in the funder section for grants.

We’re working with members and service providers so that they can try sending us this data via a pipeline our Labs team has built to test schema updates before they go live. We are actively recruiting members to help us test our new pipeline by providing sample XML for registration. Planned metadata inputs and outputs are detailed in Including ROR as a funder identifier in your metadata (metadata prototyping instructions); we’d encourage you to provide feedback on these in the document, ideally in the next two weeks. We’re aiming to release an updated schema that supports these changes in Q1 2024.

2. Modelling ROR ID/Funder ID mappings in our metadata model

We have integrated the ROR registry into our evolving metadata model, and we have started work to integrate the Funder Registry. The aim is to create more flexibility in how Crossref’s metadata can be supplemented and queried, and give more clarity as to which party asserted or created a metadata element.

We’re working on an early iteration of how the model handles ROR IDs, funder IDs and their equivalencies. Once we have something to share, we’ll welcome community feedback on this approach and on the metadata model in general.

3. Developing new matching strategies to match text strings to ROR IDs

Ideally, everyone would always use persistent identifiers to exchange information about contributor and awardee affiliations, organisations related to works, as well as funders supporting the research. In practice, this information is often exchanged as data without identifiers, such as affiliation strings (e.g. “University of Virginia, Charlottesville, VA”), funder names, or even funding acknowledgements (e.g. “Funding and support generously provided by the Ford Foundation”). In such situations, a good metadata matching strategy can help map these to persistent identifiers.
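To give a feel for what matching involves, here is a deliberately naive Python sketch: it scores each organisation by the share of its name tokens found in the input string and returns the best candidate above a threshold. This is an illustration only, not the strategy under development at Crossref or ROR; the tiny registry and its identifiers are placeholders, not real ROR IDs.

```python
import re
from typing import Dict, Optional, Set

STOPWORDS = {"of", "the", "and", "for"}

# Hypothetical mini-registry; the identifiers are placeholders.
REGISTRY: Dict[str, str] = {
    "https://ror.org/EXAMPLE-1": "University of Virginia",
    "https://ror.org/EXAMPLE-2": "Virginia Commonwealth University",
    "https://ror.org/EXAMPLE-3": "Ford Foundation",
}


def tokens(text: str) -> Set[str]:
    """Lower-case the text, split on non-alphanumerics, drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def match(text: str, registry: Dict[str, str] = REGISTRY,
          threshold: float = 0.8) -> Optional[str]:
    """Return the identifier whose name is best covered by the text, if any."""
    text_tokens = tokens(text)
    best_id, best_score = None, 0.0
    for org_id, name in registry.items():
        name_tokens = tokens(name)
        if not name_tokens:
            continue
        # Share of the organisation's name tokens that appear in the text.
        score = len(name_tokens & text_tokens) / len(name_tokens)
        if score > best_score:
            best_id, best_score = org_id, score
    return best_id if best_score >= threshold else None


print(match("University of Virginia, Charlottesville, VA"))
print(match("Funding and support generously provided by the Ford Foundation"))
print(match("Unknown Institute"))
```

A real strategy has to cope with abbreviations, translations, name changes, and organisations that share most of their words, which is why rigorous evaluation on real-life data matters.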

Currently, we are focused on developing reliable strategies for matching affiliation strings to ROR IDs. In the future, we will adapt the strategies to support funder names and funding acknowledgements as well. All the strategies will be rigorously evaluated using real-life data. We will make the strategies, as well as the evaluation datasets and evaluation results, publicly available for anyone to use. If you are interested in collaborating on the development or the evaluation of the matching strategies, please get in touch!

In the future, we might also apply some of the new matching strategies at Crossref, to the metadata our members send us. This would allow us to insert matched identifiers into the metadata to better connect organisations with other items in the scholarly record. We already have a process that matches the names of funders supporting research against the Funder Registry and enriches the metadata with matched Funder Registry IDs. Developing and evaluating reliable matching strategies will allow us to modify this process to use ROR IDs instead, and extend it to support other use cases, such as contributor affiliations.

What will the transition mean for you?

We do recommend that you begin looking at what it will take to integrate ROR into your systems and workflows for identifying funders. Talk to your service providers about this to ready them for this change. To reiterate the point from the earlier post, in the short term, and even in the medium term, Funder IDs aren’t going away and the Funder IDs will continue to resolve – they are persistent, after all. Eventually, however, the Funder Registry will cease to be updated, so any new funders will only be registrable in Crossref metadata with ROR IDs. Legacy Funder IDs and their mapping to ROR IDs will be maintained, so if Crossref members submit a legacy Funder ID, it will get mapped to a ROR ID automatically. Note, too, that Crossref is committed to maintaining the current funder API endpoints until ROR IDs become the predominant identifier for newly registered content. We also know that there are questions that we’ll want to tackle with the community as we all make progress, some we know and some we don’t know. With that in mind:
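For those starting to think about their own systems, the mapping step might be sketched as follows in Python. This is a minimal illustration that assumes you hold your own lookup table from Funder IDs to ROR IDs; the table entries are placeholders, not real identifiers, and the sketch does not describe how Crossref will implement the automatic mapping.

```python
from typing import Dict, Optional

# Hypothetical lookup table. Funder IDs are DOIs under the 10.13039 prefix;
# both identifiers below are placeholders, not real records.
FUNDER_TO_ROR: Dict[str, str] = {
    "10.13039/000000000001": "https://ror.org/EXAMPLE-A",
}

RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def normalise_funder_id(value: str) -> str:
    """Reduce a Funder ID to its bare DOI form, e.g. 10.13039/..."""
    value = value.strip().lower()
    for prefix in RESOLVER_PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def ror_for_funder(value: str, mapping: Dict[str, str] = FUNDER_TO_ROR) -> Optional[str]:
    """Return the mapped ROR ID, or None when no mapping is known."""
    return mapping.get(normalise_funder_id(value))


print(ror_for_funder("https://doi.org/10.13039/000000000001"))
print(ror_for_funder("10.13039/999999999999"))
```

Returning `None` for an unmapped ID, rather than raising an error, lets a workflow keep the legacy identifier and flag the record for review.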

Tell us what you need!

We want to hear from you! We have set up several channels of communication meant to ensure that you can tell both ROR and Crossref what will make this transition easier for you and that you can get answers to your questions.

First, we are conducting a series of Open Funder Registry user interviews designed to deepen our understanding of where Funder IDs are being used in workflows and systems. Write to community@ror.org if you’d like to participate in these interviews to show and tell us how you’re using Funder IDs.

Second, in 2024, we will be running a follow-up to the funding data workshop we ran in June 2023. Please get in touch if your organisation would be interested in participating in the discussion.

Solving your technical support questions in a snap!

Isaac Farley — Thu, 25 Jan 2024 00:00:00 +0000

My name is Isaac Farley, Crossref Technical Support Manager. We’ve got a collective post here from our technical support team - staff members and contractors - since we all have what I think will be a helpful perspective on the question: ‘What’s that one thing that you wish you could snap your fingers and make clearer and easier for our members?’ Within, you’ll find us referencing our Community Forum, the open support platform where you can get answers from all of us and other Crossref members and users. We invite you to join us there; how about asking your next question of us there? Or, simply let us know how we did with this post. We’d love to hear from you!

A little about us and what drives the team

I’m fortunate to manage a great team - Evans, Kathleen, Paul, Poppy, and Shayn - who enjoy and are hardwired to guide. We have different strengths and interests, but the thing that unites us is that we are energized when we can unpick tricky problems for all of you, our members and users. In 2023, the technical support team answered around 11,000 questions from all of you. We do that with one-to-one requests sent to us via email and within our support center (using a closed-source software called Zendesk). And, we’ve been providing more and more support in our Community Forum, where we’re aiming for open interactions, so we can all learn from the rich exchanges with all of you (the Forum has an integration with Zendesk, so posts made in the Forum are delivered to us there, so our team won’t miss any of your questions).

We established in the previous paragraph that we have a great technical support team who all pride themselves on helping you. But we’re also human; the reality is that many of those ~11,000 technical support questions asked of us in 2023 were repetitive, and there are always trends in the questions asked. That’s another important reason why we’re hoping to have more and more of these questions asked and answered within our Community Forum; again, so we can all learn from one another. We know certain parts of content registration, metadata retrieval, and everything in between are, well, complicated. The Crossref learning curve can be steep for all of us. Collectively, our technical support team has more than 25 years of Crossref experience, and we’re continuously learning new things about the Research Nexus and the scholarly ecosystem from one another and all of you.

Learning through this complexity is one of the most enriching parts of our days. Our daily stand-up, modeled on different software development methodologies, is where together we troubleshoot tangly questions from all of you, share ideas, and just keep up-to-date on the latest from across the organisation, and it leads to a lot of knowledge exchange. So, years ago, we decided to transform the issues we discuss in those stand-ups into public-facing posts in our Community Forum. It gave us the opportunity to share much-needed examples in a new community space; and, we knew, since these were the issues we were all discussing and learning from ourselves, that many of you would also benefit from us surfacing the topics openly. We call these posts tickets of the month, since the majority of topics we discuss have originated from tickets in our support center.

Examples of some of the most popular topics in the last two-plus years have been:

Snapping our fingers

Like I said, these posts originated from real-life questions put to us by our community members. In most cases, we’ve been asked these questions by many of you. These Community Forum posts are our attempts to unlock understanding of our services, rich metadata, or the larger Research Nexus. Said another way: we all see value in putting in the effort to post one more example or answer that nuanced question. Perhaps one of our posts will include an example that really resonates with you and/or your work.

In that spirit, I asked Evans, Kathleen, Paul, Poppy, and Shayn to answer this question below (yes, I’m going to weigh in, too):

What’s that one thing that you wish you could snap your fingers and make clearer and easier for our members?

Evans, Technical Support Specialist

As a publisher and a Crossref member, at one point or another, you might have made a mistake in the metadata deposited for a given DOI. I’m sure after the slight ‘shock’, the next question you had in mind was, ‘How can I correct this mistake?’ Well, here is a simplified guide on how to do that correction/update!

Can I modify/update the metadata of a registered DOI? As indicated by my colleague Shayn below in this blog post, Crossref DOIs are designed to be persistent (and cannot be changed/deleted once registered). But YES, you can update the metadata associated with any of your registered DOIs whenever necessary, at no additional fee.

How can I perform a standard metadata update? To add, change, or remove any metadata element from your existing records, you generally just need to resubmit your complete metadata record with the correct/new changes included. How you choose to update a DOI metadata record is highly dependent on the content registration tool/platform you are using/comfortable working with, as described below:

OJS: Navigate to the article record you wish to update, add in your new metadata/delete relevant metadata fields, and deposit it again using the Crossref import/export plugin. You must be running at least OJS 3.1.2 and have the Crossref import/export plugin enabled.
Web deposit form: Open the web deposit form, and re-enter all the metadata, including the new changes - leave the relevant field blank to delete it, or add in your new metadata to update it - and resubmit the form (note: there are a handful of exceptions to this for the web deposit form).
Depositing XML files with Crossref: Make changes to the relevant XML file and resubmit it to Crossref via the admin tool. When making an update, you must supply all the bibliographic metadata for the record being updated, not just the fields that need to be changed. During the update process, we overwrite the existing metadata with the new information you submit, and insert null values for any fields not supplied in the update. This means, for example, that if you’ve supplied an online publication date in your initial deposit, you’ll need to include that date in subsequent deposits if you wish to retain it. Note that the value included in the element must be incremented each time a DOI is updated.
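To make the overwrite behaviour concrete, here is a minimal sketch in Python. It is only an illustration of the principle described above, not how our deposit system is implemented, and the field names and the apply_update helper are hypothetical.

```python
# Hypothetical field names, chosen only for illustration.
FIELDS = ("title", "online_publication_date", "print_publication_date")


def apply_update(existing: dict, resubmitted: dict) -> dict:
    """Overwrite the stored record with the resubmitted one.

    The existing record is deliberately ignored: any field missing from
    the resubmission becomes None rather than being carried over.
    """
    return {field: resubmitted.get(field) for field in FIELDS}


stored = {
    "title": "An Exmaple Article",  # the typo we want to correct
    "online_publication_date": "2023-05-01",
    "print_publication_date": None,
}

# Partial update: only the corrected title is supplied.
partial = apply_update(stored, {"title": "An Example Article"})

# Full update: the corrected title plus everything we want to keep.
full = apply_update(stored, {**stored, "title": "An Example Article"})

print(partial["online_publication_date"])  # None: the date was lost
print(full["online_publication_date"])     # 2023-05-01: the date was kept
```

The partial update loses the online publication date, while the full resubmission keeps it. That is why the complete record needs to go into every update.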

If you’re looking for real-life examples of other members who have updated their metadata, the Community Forum is a great starting point. If you have follow-up questions on any of the existing threads, I invite you to post a message today.

Kathleen, Technical Support Specialist

One of my favorite types of queries to tackle are those regarding content registration problems. I love a good mystery and getting to the bottom of why that pesky submission just didn’t succeed. Sometimes members come to us with an error message and specific questions about what has gone awry. But, in fact, two of the most common questions we receive are: 1) I deposited something; did it work? and 2) I deposited something; why isn’t it showing up?!

To address the first question of whether your submission went through or not, I wrote a forum post last June about how to use the admin tool to see whether your registration was successful. We know there are also email alerts and perhaps status messages within your own registration platform, but using the admin tool is a great way to concretely check where your submission has ended up. If it’s not there, we didn’t get it!

Using the admin tool is also a great way to get more details about the submission and more information in case the submission happened to fail. You may have had the experience in which you contacted us with a question about a failed deposit, and we asked you for the submission ID. You can find that info in the admin tool! And we ask for that, because that helps us get to the bottom of those error message mysteries.

And, as for the second question of when will your DOI be active, my colleague, Paul, wrote a fantastic post on the forum (with an excellent flowchart and all!), explaining when you can expect to see your DOI up and running. Often members will submit a deposit and expect the DOI to resolve immediately. When that doesn’t happen, many think that something has gone wrong or perhaps there is an error, but, in fact, our systems may still be updating and processing the metadata.

I recommend giving these two posts a read if you’re at all concerned about whether you’re depositing your content correctly or not. Hopefully, they’ll help ease your content-registration worries.

Isaac, Technical Support Manager

Oh, thanks for asking! Many of our members, after receiving one of our reports, will respond to us in support with a message similar to: ‘What did I do wrong? Please help me fix this. I don’t want to be out of compliance!’

The receipt of one of our reports does not necessarily mean that you’ve done anything wrong. In truth, the reports we send to our official member contacts are produced using very simple logic. It’s true that they may signal larger, more complicated problems, but we really need your help to determine next steps. In some cases, no action is needed because there is no issue for members to fix (e.g., many failed resolutions within the resolution reports).

Let’s look at the conflict and resolution reports since those are the reports we get the most questions about:

Conflict reports are the most complicated of our reports to navigate. But, the reports are generated using simple logic: if you register two or more DOIs with matching bibliographic metadata, we’ll flag those DOIs as being in conflict, which will generate a warning message at the time of registration and a subsequent conflict report. When members receive this report, we often get the sense that members simply want us, the technical support team, to tell them how to fix it. The problem is we don’t know your content, so we don’t know if the two DOIs do represent a duplicate, or if both DOIs, while having very similar bibliographic metadata, are legitimate and will be maintained going forward (e.g., for errata). Paul wrote a great post in our community forum about what conflicts are and how to resolve them.
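As a rough illustration of that simple logic, the Python sketch below groups records by a normalised bibliographic key and reports any group holding more than one DOI. The fields in the key, the helper names, and the example DOIs are all assumptions made for the example; this is not the matching rule our system applies.

```python
from collections import defaultdict


def metadata_key(record: dict) -> tuple:
    """Build a comparison key from a few bibliographic fields (assumed set)."""
    return (
        record["journal"].strip().lower(),
        record["title"].strip().lower(),
        record["volume"],
        record["first_page"],
    )


def find_conflicts(records: list[dict]) -> list[list[str]]:
    """Return groups of DOIs whose bibliographic keys are identical."""
    groups = defaultdict(list)
    for record in records:
        groups[metadata_key(record)].append(record["doi"])
    return [sorted(dois) for dois in groups.values() if len(dois) > 1]


# Made-up records on the 10.5555 test prefix.
records = [
    {"doi": "10.5555/example-1", "journal": "Journal of Examples",
     "title": "A Study", "volume": "1", "first_page": "10"},
    {"doi": "10.5555/example-2", "journal": "Journal of Examples",
     "title": "a study ", "volume": "1", "first_page": "10"},
    {"doi": "10.5555/example-3", "journal": "Journal of Examples",
     "title": "Another Study", "volume": "1", "first_page": "25"},
]

conflicts = find_conflicts(records)
print(conflicts)
```

A check like this can only say that two records look alike. Whether they are a true duplicate or two legitimate items (an article and its erratum, say) is something only the publisher can decide.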

Resolution reports, like conflict reports, are generated using simple logic: a resolution is the result of a click on that DOI. If a DOI has been registered, that click results in a successful resolution. If that DOI has not been registered, that click results in a failed resolution. Our monthly report is a count of those resolutions - successful and failed. Failures can represent content registration errors in a member’s workflow. Or, they can signal that an end user has made a mistake when attempting to click the DOI in question. So, for example, an end user perhaps added an extra period onto their DOI link. Instead of trying to resolve https://doi.org/10.5555/cupnfcm2wj, a legitimate DOI, they added a period to the end and tried to resolve https://doi.org/10.5555/cupnfcm2wj. instead. That extra period at the end of the DOI has made it a completely different DOI that is not registered with us, thus they get a failed resolution. This is pretty common. For members with content being regularly clicked, there will be user errors in the logs appearing as failed resolutions. The first question members should ask themselves when reviewing the failed .csv report within the resolution report is: ‘are any of these DOIs legitimate DOIs that I thought we had registered?’ We have more on the basics of resolution reports also over in our Community Forum.
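If you keep a list of the DOIs you have registered, a first pass over the failed .csv can be sketched along these lines in Python. The classify_failed_doi helper, the set of trailing characters, and the two labels are assumptions for illustration, and the check treats DOIs as case-insensitive.

```python
# Characters that often get attached to a DOI when it is copied from running text.
TRAILING_JUNK = ".,;:)]}\"'"


def classify_failed_doi(failed_doi: str, registered: set[str]) -> str:
    """Label a failed resolution as a likely user error or something to check.

    `registered` is assumed to hold the member's registered DOIs in lowercase.
    """
    original = failed_doi.strip().lower()
    cleaned = original.rstrip(TRAILING_JUNK)
    if cleaned != original and cleaned in registered:
        return "likely user error"
    return "check registration"


registered = {"10.5555/cupnfcm2wj"}

print(classify_failed_doi("10.5555/cupnfcm2wj.", registered))
print(classify_failed_doi("10.5555/not-yet-deposited", registered))
```

The first call is reported as a likely user error because removing the extra period yields a registered DOI. The second is left for you to check against your own workflow, which is the question that matters most when reviewing the report.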

Paul, Technical Support Specialist & R&D Support Analyst

I know we were asked to name “one thing” but I have two that are closely related. May I snap my fingers twice and fix two issues? [Of course, Paul! Take it away!]

Paul’s first snap

One of the most asked questions we get in support is “why is my DOI not working?” 90% of the time it is down to a failed submission. A good proportion of those failures are a result of title mismatches between the deposited container title and the one we have stored on the system here. There are other error messages that occur, too, which I wrote about back in 2020.

So, you might ask, “why do we fail submissions because of title differences?”

Well, the title and ISSN/ISBN and/or the title level DOIs act like locks to the title record, which need the right keys to unlock the title so that you can add or update the records against it. So if you don’t match what was in the original submission, you get a failure. Without that stringent check, we would have way too many iterations of titles and matching to those would be a nightmare. Not to mention sorting those DOIs into one container in the REST API.

Isaac wrote a great forum post about these title-level issues as well.

If a title update is required due to an error in an original title deposit, it needs to be made by the support team, so get in touch with us on the Community Forum.

And, a second

Permissions against titles and DOIs: Lots of our members don’t realise that each DOI has its own permissions against the prefix that currently ‘owns’ or is associated with that DOI in the background.

It would be fair to assume you can tell just by looking at a DOI who the current publisher is, based on the prefix at the start —but that’s not always the case. Things can (and often do) change. Individual journals get purchased by other publishers, and whole organisations get bought and sold.

What you can tell from looking at a DOI prefix is who originally registered it, but not necessarily who it currently belongs to. That’s because if a journal (or whole organisation) is acquired, DOIs don’t get deleted and re-registered to the new owner. The update will of course be reflected in the relevant metadata, but the prefix on the DOI will stay the same. It never changes, and that’s the whole point: that’s what makes the DOI persistent. Isaac also wrote about this in much more detail, explaining the internal Crossref processes, in his blog “What can often change, but always stays the same?”

These permissions are very important to understand when it comes to title transfers and working with updating your metadata against transferred DOIs to prevent duplicate DOIs for the same work.

Poppy, Technical Support Contractor

As a researcher myself, I’d like to talk about references in a journal article, book, conference paper, etc. (I’ll just use ‘article’ going forward for simplicity). These are the references included in an article by the author. References in one article result in citations for another article. It’s the thing every author dreams of and accruing citations can be a big deal for authors, journals, and publishers.

For readers, articles with no references can be less discoverable using systems that use citation links for relevance, and that discoverability is of critical importance for our members who decide to register references with us. We all want your content to be shared, cited, linked, and used far and wide.

We receive many questions from authors asking why citations don’t show up; it’s usually because the metadata deposit included no references. There may be an assumption that our process is like Google Scholar, which crawls full text and websites. This misunderstanding has a big impact on references and citation counts. Because we do not store a copy of the paper, our intake system does not extract references from the article, regardless of whether they have a DOI. This is one of the main reasons that Crossref citation counts are lower than those of services that use extraction methods. We only store the data that a publisher registers and maintains with us. On deposit of a metadata record that includes references, our system performs a matching process: if there is a match, a cited-by connection is applied to the metadata. For deposits with no references, however, there is no data to match to other articles, which limits discoverability and means the cited-by counts of the cited works do not increase.

An article with no references has big impacts for the authors, the journal, the publisher, researchers, and ultimately, the readers. This can mean decreased distribution of the content itself, reduced citation counts for cited articles, lower impact metrics for journals, and can ultimately affect value for publishers. For example, researchers just don’t include articles without references for scientometric analysis.

Our documentation on references includes the elements for both structured and unstructured data. Including the DOI in the structured data is best practice as it provides a precise location with rich data for matching. If the matcher does not see a link between the deposited DOI and the cited DOI at the time of deposit, then the references are stored to be crawled with other matching algorithms later. So, we’re always working to create those rich cited-by linkages between works (raising the content’s profile and overall discoverability), no matter when you register reference metadata. You can also see how your publisher is doing on depositing references by viewing their Participation Report. If you are an author, you can check if your DOIs that were registered contained any references by using our REST API. Don’t see them? You can always contact the editor of the journal or the publisher that published your paper and ask them to add them. Didn’t hear back? Just drop us a line in the Community Forum, we’re happy to help.
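For authors who want to try the REST API check, here is a minimal Python sketch. It reads the reference-count and is-referenced-by-count fields from a work record returned by the public works route. The helper names are made up for the example, fetch_work needs network access, and the sample record is invented.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def summarise_references(work: dict) -> dict:
    """Summarise the reference and citation fields of a REST API work record."""
    deposited = work.get("reference-count", 0)
    return {
        "references_deposited": deposited,
        "cited_by": work.get("is-referenced-by-count", 0),
        "has_references": deposited > 0,
    }


def fetch_work(doi: str) -> dict:
    """Fetch one work record from the public REST API (needs network access)."""
    url = "https://api.crossref.org/works/" + quote(doi)
    with urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


# Invented record, shaped like the "message" part of an API response.
sample = {"DOI": "10.5555/example-1", "reference-count": 0, "is-referenced-by-count": 3}
summary = summarise_references(sample)
print(summary)
```

To check a real record, pass the result of fetch_work for your DOI to summarise_references. A reference count of zero suggests that no references were deposited with that record.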

Shayn, Technical Support Specialist

Let’s ‘zoom out’ to the big picture. What are DOIs for? What makes them useful? What are we all doing here anyway?!

There are a lot of different answers to those questions. It’s a complex picture. But, way back in the late ‘90s, the DOI system was designed in order to allow for the creation of unique and persistent identifiers. Crossref members use these identifiers to represent their research outputs and publications. This allows for reliable linking to those items, and the ability to identify and communicate the relationships between them, notably (but not exclusively!) citation relationships.

So, what do we mean when we say that Crossref DOIs should be unique and persistent? In basic terms, unique means that there is only a single Crossref DOI registered for a given citable research output. And persistent means that the DOI associated with a given research output today will continue to be associated with, and link to, that same research output indefinitely into the future.

Yes, there are some grey areas, and we know that everything doesn’t always work 100% perfectly all the time. But, the more deviations from persistence and uniqueness, the harder it becomes for end-users, publishers, Crossref, and other services which make use of our metadata to reliably find research outputs and reliably relate them to one another. It weakens the value and utility of DOIs for everyone.

So, what does this mean in practice?

Be certain that every item you register with Crossref is something you can maintain in the long-term.
- Have an arrangement with an archive that can take responsibility for your content if your organisation stops hosting it or ceases to exist.
- Don’t register things that you know will only exist for a short time.
When you’re about to register new content, be absolutely sure that it hasn’t been registered already, either by your organisation or any other organisation.
- If you acquire a new journal from another publisher, have a process in place to check what content has already been registered and adopt the use of the DOIs registered by the prior publisher for that content. We can always provide a list of the existing DOIs for a journal.
- If you publish books, and have a co-publishing agreement with another publisher, distributor, or hosting platform, be aware that one of those other parties may have already registered DOIs for your books. Adopt the use of those DOIs rather than assigning and registering new ones. And, if you don’t want them to do that going forward, communicate that to your co-publishing partners.
When mistakes happen, inadvertently resulting in duplicate DOIs for a single item, identify them quickly. Alias the new duplicate DOI to the long-standing original DOI, and remove all instances of the new DOI from your website or platform.
Ensure that your publishing software, platform, or journal management system can accommodate DOIs with various prefixes for the same publication. You should be able to use (display, link, update metadata and URLs for) the DOIs registered for older content by any prior publishers as easily as you use the DOIs that you registered yourself for more recent content.

Things like persistence and uniqueness can sound like theoretical abstractions, but they actually play an important role in the day-to-day grind of your publishing operations. Their impact on linking, citing, discovery, and analysis of your content is concrete and important. Thus, it’s not surprising that we often hear from members and others in the research community who share this commitment to persistence, uniqueness, and overall rich, accurate metadata. You’ll see that play out in the Community Forum where members and users get involved to troubleshoot issues, compare notes, and share ideas with us and one another. We appreciate the commitment to the Research Nexus and the overall spirit to serve in this growing community. Like we said at the top, we’re all wired to contribute in this way, so building an open, welcoming space that moves us forward excites us.

Again, we invite you to join the discussion on this and many other Crossref-related topics over in our Community Forum.

The GEM program - year one

Susan Collins — Wed, 24 Jan 2024 00:00:00 +0000

In January 2023, we began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organisations located in the least economically advantaged countries in the world. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees.

The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change. Sri Lanka was added to the GEM program in March 2023 as they were recategorised to the IDA classification by the World Bank.

When the program launched, we had 214 existing members eligible for the program who then were no longer charged for membership or content registration. Since the program began, we have welcomed an additional 131 new members into the program, including our first members from Cambodia and Togo.

Country	As of 1/1/2023 (start of GEM)	Additions in 2023 (end of first year of GEM)	Total
Afghanistan	6	4	10
Bangladesh	56	33	89
Benin	1	1	2
Bhutan	4	2	6
Burkina Faso	2	0	2
Burundi	1	0	1
Cambodia	0	2	2
Central African Republic	1	0	1
Congo, Democratic Republic	1	11	12
Ethiopia	4	6	10
Ghana	14	7	21
Guyana	1	1	2
Haiti	1	0	1
Kosovo	2	2	4
Kyrgyz Republic	22	3	25
Laos	1	0	1
Madagascar	1	1	2
Malawi	1	0	1
Maldives	1	0	1
Mali	2	0	2
Mauritania	1	0	1
Myanmar	1	0	1
Nepal	20	18	38
Nicaragua	1	0	1
Rwanda	4	1	5
Senegal	3	3	6
Somalia	2	2	4
Sri Lanka	13	5	18
Sudan	9	2	11
Tajikistan	5	1	6
Tanzania	9	7	16
Togo	0	1	1
Uganda	3	6	9
Yemen	16	12	28
Zambia	5	0	5

With help from our ambassadors based in GEM countries, we organised and co-hosted several webinars to introduce the program, along with an introduction to Crossref, and the benefits of including all kinds of research objects in the Research Nexus.

In April, our team, together with ambassador Binayak Raj Pandey, provided an overview of Crossref for members and organisations in Nepal.
Our team and ambassadors Dr Md Jahangir Alam and Shaharima Parvin hosted two webinars in May for members and organisations in Bangladesh. The first webinar provided an introduction to Crossref, our services, and the GEM Program. The second webinar focused on the methods to register content and how to add and update metadata.
In September, ambassador Baraka Manjale Ngussa joined us for an introductory webinar aimed at organisations in Tanzania.
In November, CARLIGH (the Consortium of Academic and Research Libraries in Ghana), Crossref, and EIFL co-hosted a webinar for librarians and journal editors in Ghana with a discussion on the GEM program and Crossref services.

In 2024, we will continue to collaborate with our ambassadors and other members of the community to offer more opportunities for organisations in GEM-eligible countries to learn about the program and the benefits of membership for content discovery.

The program was initially met with scepticism by some organisations in GEM-eligible countries, who wanted to be certain that it wasn’t a free trial, that there are no hidden fees, or that they would be required to pay later for other services. Others expressed concern that Crossref would introduce fees after a year or two. Though we were able to clarify these aspects of the program, we understand the concerns and are working to ensure we provide clarity and transparency about the program. Additionally, we will be conducting a complete review of our fees in 2024, and we will ensure that GEM-eligible members will have input.

Although the program offers relief from fees, many organisations require technical assistance and language support. The GEM program would benefit from an increase in local Sponsors to facilitate membership and provide support, particularly in countries with the highest growth, such as Bangladesh, Nepal, Yemen, Kyrgyz Republic, and Ghana. Though we have Sponsors working with members who are in GEM countries (e.g. PKP), we do not yet have any Sponsors who are based in a GEM country.

We will be working with relevant like-minded organisations, such as PKP, DOAJ, INASP, OASPA, EIFL, and others, to help identify suitable candidates for new Sponsors in underserved regions and engage them proactively. Additionally, we will consult with our ambassadors in GEM countries to help identify potential Sponsors. We are beginning the year by making the most of the momentum created in African countries (Uganda, Ghana, Tanzania) and looking to develop new networks in other parts of the world in Q2-Q4 of this year.

Increasing Crossref Data Reusability With Format Experiments

Martin Eve — Fri, 19 Jan 2024 00:00:00 +0000

Every year, Crossref releases a full public data file of all of our metadata. This is partly a commitment to POSI and partly just what we do. We want the community to re-use our metadata and to find interesting ends to which they can be put!

However, we have also recognized, for some time, that 170GB of compressed .tar.gz files, spread over 27,000 items, is not the easiest of formats with which to work. For instance, there’s no indexing capacity on these files, meaning that it is virtually impossible simply to pull out the record for a DOI. Decompressing the .tar.gz files takes a good three hours or more even on high-end hardware, without any additional processing.

To that end, the Crossref Labs team has been experimenting with different formats for trial release that might allow us to reach broader audiences, including those who have not previously worked with our metadata files. The two new formats, alongside the existing data file format, with which we have been experimenting, are JSON lines and SQLite.

JSON-L

The first format with which we’ve been experimenting is JSON-L (JSON lines). With one JSON entry per line, as opposed to one giant JSON file/block, JSON-L lends itself to better parallelisation in systems such as Apache Spark, because the data can easily be partitioned.

This data format also has the benefit of being appendable, one line at a time. Unlike conventional JSON, which requires the entire structure to be parsed in-memory before an append is possible, JSON-L can simply be written to and updated. It’s also possible to do multi-threaded write operations on the file, without each thread having to parse the entire JSON structure and then sync with other threads.

In our experiments, JSON-L came with substantial parallelisation benefits. Our routines to calculate citation counts can be completed in ~20-25 minutes. Calculating the number of resolutions per container title takes less than half an hour.
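The append-and-stream pattern can be sketched in a few lines of Python. This is a small illustration with made-up records and a temporary file, not our production routines; the function names and the works.jsonl file name are assumptions.

```python
import json
import os
import tempfile
from collections import Counter

# Temporary location for the example file.
path = os.path.join(tempfile.mkdtemp(), "works.jsonl")


def append_record(path: str, record: dict) -> None:
    """Append one record as a single line; existing lines are never re-read."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def count_by_type(path: str) -> Counter:
    """Stream the file one line at a time and count records per type."""
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                counts[json.loads(line)["type"]] += 1
    return counts


append_record(path, {"DOI": "10.5555/example-1", "type": "journal-article"})
append_record(path, {"DOI": "10.5555/example-2", "type": "book-chapter"})
append_record(path, {"DOI": "10.5555/example-3", "type": "journal-article"})

counts = count_by_type(path)
print(counts["journal-article"], counts["book-chapter"])  # 2 1
```

Because every line is a complete JSON document, a file like this can be split at line boundaries and each part handed to a separate worker. That is what makes the format easy to partition.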

SQLite

SQLite is a library written in C with client bindings for Python, Java, C#, and many other languages that produces an on-disk, portable, single-file SQL database. You can produce the SQLite file using our openly available Rust program, rustsqlitepacker. We also have a Python script that can produce the final SQLite file, for those happier working in this language.

The resultant SQLite file is approximately 900GB in size, so it requires quite a lot of free disk space to create in the first place (alongside storage of the data file that is needed to build it). However, queries are snappy when looking up by DOI and other indexes can be constructed (the indexing part of the procedure takes about 1.5 hours per field).

The database structure, at present, is the bare minimum that will work. It contains a list of fields for searching/indexing – DOI, URL, member, prefix, type, created, and deposited – and a metadata field that contains the JSON response that would be returned by the API for this value.

This allows for the processing and extraction of individual JSON elements using SQLite’s built-in json_extract method. For example, to get just the title of an item, you can use:

SELECT json_extract(metadata, '$.title') FROM works WHERE doi = '10.1080/10436928.2020.1709713';
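For those who would rather try this from Python, here is a minimal sketch using the standard sqlite3 module and a tiny in-memory database. The column list follows the fields described above, but the column types, the index name, and the sample record are assumptions, and it relies on the SQLite build behind Python including the JSON functions. Since the API returns the title as an array, the path $.title[0] picks out the first title as plain text.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: the searchable fields plus a JSON metadata column.
conn.execute(
    """
    CREATE TABLE works (
        doi TEXT, url TEXT, member TEXT, prefix TEXT,
        type TEXT, created TEXT, deposited TEXT, metadata TEXT
    )
    """
)
conn.execute("CREATE INDEX idx_works_doi ON works (doi)")

# Invented record on the 10.5555 test prefix.
record = {
    "DOI": "10.5555/example-1",
    "type": "journal-article",
    "title": ["An Example Article"],
}
conn.execute(
    "INSERT INTO works (doi, prefix, type, metadata) VALUES (?, ?, ?, ?)",
    (record["DOI"], "10.5555", record["type"], json.dumps(record)),
)

# Look up by DOI (indexed) and pull a single element out of the JSON.
row = conn.execute(
    "SELECT json_extract(metadata, '$.title[0]') FROM works WHERE doi = ?",
    (record["DOI"],),
).fetchone()
title = row[0]
print(title)
```

Passing the DOI as a bound parameter rather than pasting it into the SQL string sidesteps quoting problems altogether.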

The balance that we have had to strike here is between flattening the JSON so that more fields are indexable and searchable, and the time and processing that it takes to create the database in the first place. The first draft version of our experiment was wildly ambitious in flattening all the records and using an Object-Relational Mapper (ORM) to present Python models of the database. Like painting the Forth Bridge, this initial attempt would not finish in any sane length of time. Indeed, by the time we’d created this year’s data file, we’d need to begin work on the next.

What are the anticipated use cases here? When people need to do an offline metadata search on an embedded device, for instance, the portability and indexed lookup of the SQLite database can be very appealing. One of our team has even got the database running on a Raspberry Pi 5. You can also load the database into Datasette if you want to explore it visually.

Where do we go from here with this? It would be good to flatten a few more fields, but we would welcome feedback on use cases that we haven’t anticipated for SQLite and we’d love to hear whether this is already too unwieldy (at 900GB).

Data Files

As usual, we will be releasing the annual data file in the next few months. As an experiment this year, we will also be releasing the tools that can be used on that file to produce these alternative file formats. We will consider releasing the final data files for each of these formats, too.

What we would like to hear from the community is whether there are other data file formats that you might wish to use. Are there use cases that we haven’t anticipated? What would you ideally like in terms of file formats?

I4OA Hall of Fame - 2023 edition

Bianca Kramer — Tue, 09 Jan 2024 00:00:00 +0000

The Initiative for Open Abstracts (I4OA) was launched in September 2020 to advocate and promote the unrestricted availability of the abstracts of the world’s scholarly publications, particularly journal articles and book chapters, in trusted repositories where they are open and machine-accessible. I4OA calls on all scholarly publishers to open the abstracts of their published works and, where possible, to submit them to Crossref.

Since the launch of I4OA, we have been tracking the openness of abstracts for all Crossref members over time (for data and code, see this GitHub repository). For a subset of 40+, mostly larger, publishers, the proportion of current journal articles (published in the current year and preceding two years) that have abstracts deposited in Crossref is shown in a chart on the I4OA website, which is updated quarterly (Figure 1).

Figure 1: Proportion of current journal articles from selected publishers that have open abstracts in Crossref. Data collected on January 1, 2024 for publication years 2021-2023. Publishers already supporting I4OA are shown in orange.
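The proportion shown in the chart can be thought of as a simple ratio. The Python sketch below is an assumption-laden illustration with invented records, a simplified year field, and a hypothetical open_abstract_share helper; the actual data collection code lives in the GitHub repository mentioned above.

```python
def open_abstract_share(records: list[dict], current_year: int) -> float:
    """Share of current journal articles that carry an abstract.

    "Current" means published in the current year or the two preceding years.
    """
    current = [
        r for r in records
        if r["type"] == "journal-article"
        and current_year - 2 <= r["year"] <= current_year
    ]
    if not current:
        return 0.0
    with_abstract = sum(1 for r in current if r.get("abstract"))
    return with_abstract / len(current)


# Invented records for illustration.
records = [
    {"type": "journal-article", "year": 2023, "abstract": "Some text."},
    {"type": "journal-article", "year": 2022},
    {"type": "journal-article", "year": 2021, "abstract": "More text."},
    {"type": "journal-article", "year": 2019, "abstract": "Too old to count."},
    {"type": "book-chapter", "year": 2023, "abstract": "Not a journal article."},
]

share = open_abstract_share(records, current_year=2023)
print(f"{share:.0%}")
```

Only the three journal articles from 2021-2023 are counted here, and two of them have an abstract.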

These longitudinal data and accompanying visualisations allow us to identify and highlight good examples from 2023: publishers (both large and small) who newly started to make abstracts openly available last year and/or who managed to get the proportion of their articles with open abstracts close to 100%¹.

While we highlight some of these examples below in our ‘Hall of Fame’, it’s important to also acknowledge all the publishers that already were depositing abstracts to Crossref for most or all of their journal articles prior to 2023, thereby contributing to the availability of abstracts as part of a rich ecosystem of open metadata, for others to use and build upon.

Hall of Fame - Part 1: publishers included in I4OA visualisation

For the set of (mostly larger) publishers included in the visualisation on the I4OA website, Figure 2 shows the difference in the proportion of abstracts available in Crossref between January and December 2023 for journal articles published in 2021-2023.

A number of publishers stand out from this figure:

Wiley announced in October 2022 that it was joining I4OA and would be making abstracts available through Crossref. In August 2023, Wiley started to deposit abstracts to Crossref, and at the end of 2023, the proportion of current journal articles with open abstracts was 77%.

This makes Wiley the first of the four largest traditional commercial publishers to deposit abstracts for the majority of journal articles they publish. Springer Nature does this only for their current open access articles, while Elsevier and Taylor & Francis² do not yet provide abstracts to Crossref at all. SAGE, the fifth largest traditional commercial publisher, was a founding member of I4OA and has open abstracts for 85% of current journal articles.
Among society publishers, the American Geophysical Union (AGU) went from 7% to 99% open abstracts for current journal articles last year, which is a great achievement. The publishing arm of the American Institute of Physics (AIP Publishing) joins them in reaching close to 100% open abstracts, going from 41% to 95% in 2023.

¹Depending on the type of journal(s) of a given publisher, the maximal coverage of open abstracts will often be somewhat below 100%, as in Crossref, all journal content is assigned the type ‘journal article’. This includes e.g. editorials, letters to the editor and other publication types that are not always expected to have abstracts.

²The numbers for Wiley and Taylor & Francis do not include Hindawi and F1000 Research, respectively, as these have separate Crossref member IDs. Like most fully open access publishers, both Hindawi and F1000 Research have high proportions of open abstracts (81% and 98%, respectively).

CAIRN and Project Muse, two publishing platforms in the humanities and social sciences representing a number of individual publishers, both started including abstracts in the metadata they provide to Crossref in 2023. At the end of 2023, CAIRN had abstracts available for 41% of current journal articles, while Project Muse was just starting out at 5%. Both will hopefully increase further this coming year.
Returning to traditional commercial publishers, Wolters Kluwer Health, part of Wolters Kluwer, had seen a slow growth in the proportion of journal articles with open abstracts in the years prior to 2023, going from 2% to 10%. However, they showed a rapid increase in 2023, ending the year with 52% open abstracts.

While it is good to see publishers who have publicly committed their support for I4OA follow through with opening their abstracts (like Wiley and AIP), it is also very encouraging to see publishers who are not (yet) listed as I4OA supporters do so. This shows a growing awareness and action on this issue beyond advocacy through I4OA alone. And of course, we would love to list these publishers on our website as official supporters of I4OA!

Figure 2 also shows some cases where the proportion of open abstracts has gone down during the year. This can be due to temporary technical issues in depositing abstracts (as was the case for Hindawi). Theoretically, the proportion of open abstracts can also go down when publishers stop providing abstracts altogether during the year, but we have not observed that to be the case.

Figure 2: Development in the proportion of open abstracts in 2023 for current journal articles (publication years 2021-2023) from selected publishers. Publishers already supporting I4OA are shown in orange. Light orange/blue dots show the proportion of open abstracts in January 2023, and dark orange/blue dots in December 2023.

Hall of Fame - Part 2: other publishers

Among the many publishers not included in the limited selection shown in the I4OA visualisation, there are also some interesting highlights of publishers either starting out to deposit abstracts (and reaching a sizeable proportion) or having deposited open abstracts for almost all their current journal articles in 2023. The examples below drew our attention in 2023; they include a number of medium-sized publishers as well as a group of smaller publishers that deserve special attention.

The European Molecular Biology Organization (EMBO) went from 0% to 42% open abstracts in 2023. However, from January 2024 onwards, several EMBO journals were transferred to Springer Nature, so EMBO can no longer be tracked at publisher level in Crossref. It will still be possible to look at the development of open abstracts for individual EMBO journals.
The Institution of Engineering and Technology (IET), a medium-sized publisher, started to deposit abstracts in 2023, reaching 33% open abstracts for current journal articles at the end of the year.
The Acoustical Society of America (ASA) had open abstracts for almost all their current journal articles at the end of 2023, increasing from 50% to 97%.
Finally, in the second quarter of 2023, a group of over 200 smaller Turkish publishers saw large increases in their coverage of open abstracts, resulting in open abstracts for 95%-100% of their current journal articles. Consultation with Crossref pointed to the potential supporting role of DergiPark, one of the largest Crossref sponsors in Turkey. This is a great example of developments in open metadata at smaller publishers.

Looking forward

At the beginning of 2024, the proportion of current journal articles published by Crossref members with open abstracts has reached 49.7%, up from 20.7% when I4OA was launched in September 2020. This is thanks to a growing number of publishers who are depositing abstracts to Crossref, often depositing open abstracts for close to 100% of their journal articles.

This blog post has highlighted a number of publishers who contributed to this growth in the availability of open abstracts in 2023. We hope these examples will inspire other publishers to start doing the same and are looking forward to following the growth in the availability of open abstracts in 2024.

For publishers that started to deposit abstracts in recent years and are doing so for newly published articles only, our data on open abstracts for current journal articles will look better in 2024 than in 2023, as only articles published in the current year and two preceding years are taken into account.

However, the benefits of having abstracts openly available from a central location such as Crossref (both for direct usage and for integration in other open scholarly infrastructures) are not limited to recent publications. Hopefully, publishers currently depositing abstracts to Crossref will continue to do so both for newly published articles and for the backfiles of journal articles already published.

Publishers who would like to be added to the list of I4OA supporters, or who would like more information on how to deposit abstracts for both new and existing journal articles, are very welcome to reach out to I4OA. More information about open abstracts in general, and I4OA in particular, can also be found in the FAQ on the I4OA website.

The author would like to thank Ludo Waltman (CWTS) and Ginny Hendricks (Crossref) for useful feedback on an earlier draft of this post.

This blog post is published under a CC BY 4.0 license. The header image is an adaptation of an image by Adam Jones available from Wikimedia Commons (https://commons.wikimedia.org/wiki/File:Interior_02_of_Rock_%26_Roll_Hall_of_Fame_and_Museum,_Cleveland_%28by_Adam_Jones%29.jpg) and is shared under a CC BY-SA license.

Discovering relationships between preprints and journal articles

Dominika Tkaczyk — Thu, 07 Dec 2023 00:00:00 +0000

In the scholarly communications environment, the evolution of a journal article can be traced by the relationships it has with its preprints. Those preprint–journal article relationships are an important component of the research nexus. Some of those relationships are provided by Crossref members (including publishers, universities, research groups, funders, etc.) when they deposit metadata with Crossref, but we know that a significant number of them are missing. To fill this gap, we developed a new automated strategy for discovering relationships between preprints and journal articles and applied it to all the preprints in the Crossref database. We made the resulting dataset, containing both publisher-asserted and automatically discovered relationships, publicly available for anyone to analyse.

TL;DR

We have developed a new, heuristic-based strategy for matching journal articles to their preprints. It achieved the following results on the evaluation dataset: precision 0.99, recall 0.95, F0.5 0.98. The code is available here.
We applied the strategy to all the preprints in the Crossref database. It discovered 627K preprint–journal article relationships.
We gathered all preprint–journal article relationships deposited by Crossref members, merged them with those discovered by the new strategy, and made everything available as a dataset. There are 642K relationships in the dataset, including:
- 296K provided by the publisher and discovered by the strategy,
- 331K new relationships discovered by the strategy only,
- 15K provided by the publisher only.
In the future, we plan to replace our current matching strategy with the new one and make all discovered relationships available through the Crossref REST API.

Introduction

Relationships between preprints and journal articles link different versions of research outputs and allow one to follow the evolution of a publication over time. The Crossref deposit schema allows Crossref members to provide these relationships for new publications, either as a has-preprint relationship deposited with a journal article, or an is-preprint-of relationship deposited with a preprint.

To assist members who deposit preprints, we also try to connect deposited journal articles with preprints. The current method looks for an exact match between the title and first authors. We send possible matches as suggestions to the preprint server, which decides whether to update the metadata with the relationship.

At the time of writing, 137,837 journal articles in the Crossref database have a has-preprint relationship¹, and 562,225 works of type posted-content (preprints belong to this type) have an is-preprint-of relationship².

We suspected that many preprint–journal article relationships are missing, as some members inevitably fail to deposit them, even after suggestions from the current matching strategy. Another factor is that the current strategy is fairly conservative, and probably misses a significant number of relationships. For these reasons, we decided to investigate whether we could improve on the current process. Doing so would allow us to infer missing relationships on a large scale, similar to how we automatically match bibliographic references to DOIs.

This preprint matching task can be defined in two directions:

We start with a journal article and we want to find all its preprints.
We start with a preprint and we want to find a subsequently published journal article.

On the one hand, matching from journal articles to preprints would allow us to enrich the database continually with new relationships, either periodically or every time new content is added. Since journal articles tend to appear in the database later than their preprints, it makes sense for a new journal article to trigger the matching and not the other way round. This way we can expect the potential matches to be already in the database at the time of matching.

On the other hand, matching from preprints to journal articles can be useful in a situation where we want to add relationships in an existing database retrospectively. In our case, the database contains many more journal articles than preprints, so for performance reasons it is better to start with preprints.

In both cases we are dealing with structured matching, meaning that we match a metadata record of a work (preprint or journal article), rather than unstructured text.

As a result of matching a single preprint or a single journal article, we should expect zero or more matched journal articles/preprints. Multiple matches occur when:

there are multiple versions of the matched preprint and/or
matched works have duplicates.

The image shows the result of matching a journal article to two versions of a preprint:

Matching strategy

Our matching strategy uses the following workflow:

Gathering a short list of candidates using the Crossref REST API.
Scoring the similarity between the input item and each candidate.
A final decision about which candidates, if any, should be returned as matches.

Gathering candidates is done using the Crossref REST API’s query.bibliographic parameter. The query is a concatenation of the title and authors’ last names of the input item. We filter the candidates based on their type, to leave only preprints or only journal articles, depending on the direction of the matching. In the future, instead of getting the candidates from the REST API, we will be using a dedicated search engine, optimised for preprint matching.
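To make the candidate-gathering step concrete, here is a minimal sketch of how such a query could be assembled. It only builds the request URL and does not call the API. The function name, the example title and author names, the use of the `filter` parameter for type filtering, and the `rows` value are assumptions for illustration; the actual strategy may filter candidates differently.

```python
from urllib.parse import urlencode

API_URL = "https://api.crossref.org/works"


def candidate_query_url(title, author_last_names, target_type, rows=10):
    """Build a Crossref REST API URL that searches for match candidates.

    target_type is "posted-content" when matching a journal article to
    preprints, or "journal-article" when matching in the other direction.
    """
    # The query is the title concatenated with the authors' last names.
    query = " ".join([title, *author_last_names])
    params = {
        "query.bibliographic": query,
        # Assumption: candidates are restricted by type via the filter parameter.
        "filter": f"type:{target_type}",
        # Assumption: a short candidate list of 10 items is enough.
        "rows": rows,
    }
    return f"{API_URL}?{urlencode(params)}"


# Hypothetical input item, for illustration only.
url = candidate_query_url("An example title", ["Smith", "Jones"], "posted-content")
print(url)
```

Note that posted-content covers more than preprints, so a real implementation would also need to check the subtype of each returned candidate.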

Scoring candidates is heuristic-based. Similarities between titles, authors, and years are scored independently, and the final score is their average. Titles are compared in a fuzzy way using the rapidfuzz library. Authors are compared pairwise using the ORCID ID, or first/last names if ORCID ID is not available. The similarity score between issued years is 1 if the article was published no earlier than one year before the preprint and no later than three years after the preprint, or 0 otherwise.
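The scoring described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the published code: it uses Python's standard `difflib` in place of rapidfuzz so that it runs without extra dependencies, the record layout (`title`, `authors`, `year`) is hypothetical, and the way pairwise author comparisons are aggregated (best match per author, then averaged) is our own guess.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    # Stand-in for rapidfuzz: fuzzy ratio in [0, 1] from the standard library.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def author_similarity(a, b):
    # Compare by ORCID iD when both authors have one, otherwise by name.
    if a.get("orcid") and b.get("orcid"):
        return 1.0 if a["orcid"] == b["orcid"] else 0.0
    name_a = f"{a.get('given', '')} {a.get('family', '')}".strip()
    name_b = f"{b.get('given', '')} {b.get('family', '')}".strip()
    return text_similarity(name_a, name_b)


def authors_similarity(authors_a, authors_b):
    # Assumption: take each author's best pairwise match and average them.
    if not authors_a or not authors_b:
        return 0.0
    best = [max(author_similarity(a, b) for b in authors_b) for a in authors_a]
    return sum(best) / len(best)


def year_similarity(preprint_year, article_year):
    # 1 if the article appeared no earlier than one year before the preprint
    # and no later than three years after it, 0 otherwise.
    return 1.0 if -1 <= article_year - preprint_year <= 3 else 0.0


def score(preprint, article):
    # The final score is the average of the three independent similarities.
    parts = [
        text_similarity(preprint["title"], article["title"]),
        authors_similarity(preprint["authors"], article["authors"]),
        year_similarity(preprint["year"], article["year"]),
    ]
    return sum(parts) / len(parts)
```

Because the year component is binary, a candidate outside the allowed window loses a full third of the maximum score regardless of how well its title and authors match.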

The final decision is made based on two parameters: minimum score and maximum score difference, both chosen based on a validation dataset. The following diagram depicts the results of applying these two parameters in all possible scenarios. First, any candidate scoring below the minimum score is rejected (grey area in the diagram). Second, the scores of the remaining candidates are compared with the score of the top candidate. If the score of a candidate is close enough to the score of the top candidate, it is returned as a match (blue area).

This process can result in the following scenarios:

Scenario A: there is no candidate above the minimum score. This means nothing matches sufficiently, so nothing is returned.
Scenario B: there is only one candidate above the minimum score. This means it is the best match and we don’t have much of a choice, so it is returned.
Scenario C: there are multiple candidates above the minimum score, and they all have similar scores. This means they all are similarly good matches, so all are returned.
Scenario D: there are multiple candidates above the minimum score, but their scores differ a lot. In this case, we don’t want to return all of them, but only those that are close to the top match. Intuitively, we don’t want to return less-than-great matches if we have really great ones. This is when the maximum score difference comes into play: we return the candidates with the “score distance” to the top candidate lower than the maximum score difference.
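The four scenarios follow from one small decision rule, sketched below. The function name, the candidate labels, and the threshold values are made up for illustration; the real thresholds were chosen on a validation dataset and are not given in this post.

```python
def select_matches(scored_candidates, min_score, max_score_diff):
    """Return the candidates that should be reported as matches.

    scored_candidates is a list of (identifier, score) pairs.
    """
    # Step 1: reject every candidate scoring below the minimum score.
    above = [(c, s) for c, s in scored_candidates if s >= min_score]
    if not above:
        return []  # Scenario A: nothing matches sufficiently.
    # Step 2: keep only candidates close enough to the top score.
    top = max(s for _, s in above)
    return [(c, s) for c, s in above if top - s < max_score_diff]


# Hypothetical thresholds, for illustration only.
MIN_SCORE, MAX_DIFF = 0.85, 0.05

# Scenario D: three candidates pass the minimum score, but the third is
# too far below the top candidate and is dropped.
matches = select_matches(
    [("cand-1", 0.99), ("cand-2", 0.98), ("cand-3", 0.88), ("cand-4", 0.40)],
    MIN_SCORE,
    MAX_DIFF,
)
print(matches)
```

Scenarios B and C need no special handling: a single surviving candidate is always within the allowed distance of itself, and similarly scored candidates all pass the second step.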

We evaluated this strategy on a test set sampled from the Crossref metadata records. The test set contains 3,000 pairs (journal article, set of corresponding preprints). Half of the journal articles have known preprints and the other half don’t. The test set can be accessed here.

We used precision, recall, and F0.5 as evaluation metrics:

Precision measures the fraction of the matched relationships that are correct.
Recall measures the fraction of the true relationships that were matched.
F0.5 combines precision and recall in a way that favours precision.

The strategy achieved the following results: precision 0.9921, recall 0.9474, F0.5 0.9828. The average processing time was 0.96s.
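As a quick check, the reported F0.5 can be recomputed from the reported precision and recall using the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The helper below is a small sketch written for this post, not part of the evaluation code.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Precision and recall as reported above.
print(round(f_beta(0.9921, 0.9474), 4))  # 0.9828
```

With beta = 0.5 the result sits much closer to the precision than to the recall, which is why F0.5 is a natural choice when false matches are considered more costly than missed ones.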

We have made this strategy (journal article -> preprints) available through the (experimental) API: https://marple.research.crossref.org/match?task=preprint-matching&strategy=preprint-sbmv&input=10.1109/access.2022.3213707. The input is the DOI of a journal article we want to match to preprints, and the output is a list of matches found, along with the score for each.

We have investigated other approaches to making decisions about which candidates to return as matches (step 3 above), including using machine learning. At present none have outperformed the heuristic approach described above. The heuristic method is also preferred because of its fast performance.

Preprint–journal article relationship dataset

We applied the strategy to the entire Crossref database:

We selected all preprints published until the end of August 2023. This included only works with type posted-content and subtype preprint, as reported by the REST API. There were 1,050,247 of them.
We ran the matching strategy (preprint -> journal article) on them. This resulted in 627,011 preprint–journal article relationships.
The resulting relationships were combined with the relationships deposited by the Crossref members. We included relationships of types has-preprint or is-preprint-of, where both sides of the relationship exist in our database, were published until the end of August 2023, and are of proper types and subtypes (type=journal-article for the journal article and type=posted-content, subtype=preprint for the preprint).

The resulting dataset is a single CSV file with the following fields:

preprint DOI (string)
journal article DOI (string)
whether the publisher of the journal article deposited this relationship (boolean)
whether the publisher of the preprint deposited this relationship (boolean)
the confidence score returned by the strategy (float, empty if the strategy did not discover this relationship)

The dataset contains:

641,950 relationships in total, including 580,532 preprints and 565,129 journal articles,
14,939 of them were deposited by the Crossref members, but not discovered by the strategy,
330,826 of them were discovered by the strategy, but not provided by any Crossref member,
296,185 of them were both deposited by a Crossref member and discovered by the strategy.
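A breakdown like the one above can be reproduced from the CSV file with a few lines of Python. The sketch below runs on three made-up rows with placeholder DOIs; the column names and the `true`/`false` encoding of the boolean fields are assumptions, so adjust them to the header of the actual file.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; column names are hypothetical.
SAMPLE = """preprint_doi,article_doi,article_publisher_deposited,preprint_publisher_deposited,score
10.5555/p1,10.5555/a1,false,true,0.97
10.5555/p2,10.5555/a2,false,false,0.91
10.5555/p3,10.5555/a3,true,false,
"""


def categorise(row):
    # A relationship counts as deposited if either publisher asserted it.
    deposited = "true" in (
        row["article_publisher_deposited"].strip().lower(),
        row["preprint_publisher_deposited"].strip().lower(),
    )
    # An empty score means the strategy did not discover the relationship.
    discovered = row["score"].strip() != ""
    if deposited and discovered:
        return "deposited and discovered"
    if discovered:
        return "discovered only"
    if deposited:
        return "deposited only"
    return "neither"


def summarise(csv_file):
    return Counter(categorise(row) for row in csv.DictReader(csv_file))


counts = summarise(io.StringIO(SAMPLE))
for category, n in sorted(counts.items()):
    print(category, n)
```

To run it on the real dataset, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(path, newline="")`.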

The dataset can be downloaded here.

Conclusions and what’s next

Overall, based on the number of existing and newly discovered preprint–journal article relationships, it seems that employing automated matching strategies would approximately double the number of these relationships in the Crossref database. In the future, we would like to match new journal articles on an ongoing basis. We also plan to make all discovered relationships available through the REST API.

In the meantime, we will be publishing the discovered relationships in the form of datasets, and we invite anyone interested to further analyse this data. And if you find out something interesting about preprints and their relationships, do let us know!

Perspectives: Madhura Amdekar on meeting the community and pursuing passion for research integrity

Madhura Amdekar — Tue, 05 Dec 2023 00:00:00 +0000

The second half of 2023 brought with it a couple of big life changes for me: not only did I move to the Netherlands from India, I also started a new and exciting job at Crossref as the newest Community Engagement Manager. In this role, I am a part of the Community Engagement and Communications team, and my key responsibility is to engage with the global community of scholarly editors, publishers, and editorial organisations to develop sustained programs that help editors to leverage rich metadata.

This represents an exciting phase in my professional journey, as I now have the chance to learn and develop new skills, broaden my understanding of the publishing landscape, and at the same time be able to leverage the experience I gained so far. I originally trained as an ecologist, obtaining a PhD studying colour change in a tropical agamid lizard in India at the Indian Institute of Science (Bengaluru, India). Having immensely enjoyed the process of writing manuscripts based on the data that resulted from my PhD thesis, I was drawn to working in the scholarly publishing industry. I worked for 3.5 years as a Senior Associate at Wiley, overseeing an editor support service by devising strategic scale-up planning and process improvement initiatives.

I then moved countries as well as jobs and joined Crossref. The world of scholarly communications is a rapidly changing ecosystem that is ably supported by scholarly infrastructure - the sets of tools and services that support this industry. Being a part of Crossref, a global organisation that provides open scholarly infrastructure, allows me to work with and make an impact on the broad scholarly community, which ranges from publishers of all shapes and sizes and funders to academic institutions and researchers.

So far, the integrity of the scholarly record (ISR) has been the focus of my work. Now more than ever, the community is cognizant of the need to uphold the integrity of the scholarly output. Metadata and relationships between research outputs can support this endeavour in a substantial manner, because information such as who contributed to a research output, who funded it, who cites it, and whether it was updated after publication aids provenance and provides signals about whether the output is trustworthy.

Most of Crossref’s tools and services play a key role here: be it reference linking to allow researchers to increase discoverability of their work, tracking post-publication updates to research outputs via Crossmark, or detecting text plagiarism via Similarity Check. We noticed that not all editors and editorial teams regard metadata as a signal of integrity, and some might be unaware of the benefits of rich metadata. Therefore, my priority is to utilise opportunities to engage with editors about how metadata can provide trust indicators about a research output. I aim to empower editors to collect and leverage rich metadata.

While I am no stranger to the world of scholarly communications, engaging with the broader Crossref community has been a new experience for me. In my day-to-day work, I employ a range of different skills such as program design and management, content planning and outreach, networking, and meeting facilitation. I have also been participating in trainings to enhance my skill set – I recently completed a training course on Community Engagement Fundamentals, which has equipped me with a better understanding of the concepts and strategies that I will need as a community manager. I also completed the Group Facilitation Methods training course led by the Institute of Cultural Affairs (ICA), where I learnt a couple of effective methods for group facilitation and leading workshops.

Equipped with these skills, I have moderated a few community events already – most prominently the community call about Crossref and Retraction Watch to discuss Crossref’s acquisition and opening up of the Retraction Watch database. It was a valuable experience to contribute to the planning of an online event and host a panel of distinguished guests.

I was also fortunate to be able to meet our community members in person: I supported the organisation of the Frankfurt roundtable event that was held as part of Crossref’s Integrity of the Scholarly Record (ISR) program, where we engaged with community members to get their perspectives on how to work together towards preserving the integrity of the scholarly record (keep watching this space for a forthcoming blog summarising the outcomes from this event!). Additionally, I attended the Frankfurt Book Fair – the experience of getting to meet our members and to hear from them first-hand about all things Crossref was unparalleled! I used this opportunity to meet several of our publisher members and discuss their viewpoints about engaging with editors on ISR. The idea was received positively: we heard specific suggestions of metadata that would be of interest to readers of scientific manuscripts, and our members also expressed interest in finding out more about how metadata can act as markers of trust for a research output. I plan to use the insights from these meetings for the development of the ISR editor engagement program.

As I reflect on the past three months, there are a few things that have stood out to me. In terms of work, no two days are the same. My work plan for the day can range from making presentations for outreach activities, creating content such as this blog post, and working on an engagement strategy, to planning events, attending online or offline community meetings, facilitating or moderating some of those events, and networking with community members. This variety in work keeps me motivated to give my best each day. I am also grateful that I have the ability to make an impact with my work in an area that I am passionate about. In my previous job, I had developed a good understanding of research integrity and publication ethics. As a community manager now, I’m looking to work with editorial teams on the integrity of the scholarly record. This role gives me an opportunity to further nurture this interest of mine.

At times, working remotely from home has been a challenge. However, I have enjoyed attending in-person events as they are not just a chance to meet our community members, but also a chance to meet my colleagues and connect with them.

I feel privileged to be able to connect with research communities all over the world and make a meaningful contribution towards supporting the discoverability and impact of their work. I am particularly excited to work at the forefront of shaping the future of preserving the integrity of the scholarly record, in tandem with our community. If this is a topic that excites you as well, I am keen to hear from you. It has been a wonderful three months at Crossref so far and I look forward to future collaborations with our community to develop effective ways of supporting and empowering editors to make the most of metadata for their publications.

Joint statement on research data

Hylke Koers — Tue, 28 Nov 2023 00:00:00 +0000

STM, DataCite, and Crossref are pleased to announce an updated joint statement on research data.

In 2012, DataCite and STM drafted an initial joint statement on the linkability and citability of research data. With nearly 10 million data citations tracked, thousands of repositories adopting data citation best practices, thousands of journals adopting data policies, data availability statements and establishing persistent links between articles and datasets, and the introduction of data policies by an increasing number of funders, there has been significant progress since. It now seems appropriate to focus on providing updated recommendations for the various stakeholders involved in research data sharing.

The premise of the original joint statement still stands: most stakeholders across the spectrum of researchers, funders, librarians and publishers agree about the benefits of making research data available and findable for reuse by others. This improves utility and rigor of the scholarly record. Still, research data sharing is not yet a self-evident step in the research lifecycle. We now have sufficient scholarly communication infrastructure in place to bring about widespread change and believe momentum is building for collective action.

It is in this context that DataCite, a global membership community working with over 2800 repositories around the world, and STM, whose membership consists of over 140 scientific, technical, and medical publishing organisations, are issuing this joint statement. Crossref, a nonprofit open infrastructure with over 18,000 institutional members from 150 countries, joins this call, recognising the need for an amplified focus on data citation. The aim of this statement is to accelerate adoption of best practices and policies, and encourage further development of critical policies in collaboration with a wide group of stakeholders.

Signatories of this statement recommend the following as best practice in research data sharing:

When publishing their results, researchers deposit related research data and outputs in a trustworthy data repository that assigns persistent identifiers (DOIs where available). Researchers link to research data using persistent identifiers.
When using research data created by others, researchers provide attribution by citing the datasets in the reference section using persistent identifiers.
Data repositories enable sharing of research outputs in a FAIR way, including support for metadata quality and completeness.
Publishers set appropriate journal data policies, describing the way in which data is to be shared alongside the published article.
Publishers set instructions for authors to include Data Citations with persistent identifiers in the references section of articles.
Publishers include Data Citations and links to data in Data Availability Statements with persistent identifiers (DOIs where available) in the article metadata registered with Crossref.
In addition to Data Citations, Data Availability Statements (human- and machine-readable) are included in published articles where appropriate.
Repositories and publishers connect articles and datasets through persistent identifier connections in the metadata and reference lists.
Funders and research organisations provide researchers with guidance on open science practices, track compliance with open science policies where possible, and promote and incentivize researchers to openly share, cite and link research data.
Funders, policymaking institutions, publishers and research organisations collaborate towards aligning FAIR research data policies and guidelines.
All stakeholders collaborate in the development of tools, processes, and incentives throughout the research cycle to enable sharing of high-quality research data, making all steps in the process clear, easy and efficient for researchers by providing support and guidance.
Stakeholders responsible for research assessment take into account data sharing and data citation in their reward and recognition system structures.

We, the following signatories, shall adopt and promote the relevant best practices laid out above. We hope that our action inspires the community, including researchers, research funders, research institutions, data repositories and publishers, to join us in making it easy for researchers to share, link and cite research data.

Endorse the statement here.

What was the talk of #Crossref2023?

Kornelia Korzec — Tue, 21 Nov 2023 00:00:00 +0000

Have you attended any of our annual meeting sessions this year? Ah, yes – there were many in this conference-style event. Like many of my colleagues, I attended them all because it is so great to connect with our global community, and hear your thoughts on the developments at Crossref, and the stories you share.

Let me offer some highlights from the event and a reflection on some emergent themes of the day. You can browse the recordings and slides archived on our Annual Meeting page.

Ginny Hendricks opened the meeting by reminding everyone about the research nexus vision, and the work that’s underway to bring us closer to it. Ginny went on to highlight progress in metadata and relationships being registered by our members, and mentioned members that have particularly rich metadata records – with the special joint recognition for learned societies of South Korea. Participation statistics can be reviewed in our Labs Member Metadata Metrics Tables.

Since 2018 we’ve seen a 512% increase in the number of abstracts included in the metadata; with Wiley’s recent addition of millions of abstracts to their records largely contributing to this change. On the relationships side, in the same period, we’ve noted a staggering 3004% growth in preprint-to-article links, and we’re pleased to report a growing number of funding relationships being made available thanks to more and more funders registering Crossref DOIs for grants.

For those who couldn’t join us at such an early hour, Ed Pentz included some of these highlights in his own strategic update later in the day. However, he focused on our activity and plans towards fulfilling our four strategic goals:

To contribute to an environment where the community identifies and co-creates solutions for broad benefit
To be a sustainable source of complete, open, and global scholarly metadata and relationships
To be publicly accountable to the Principles of Open Scholarly Infrastructure (POSI) practices of sustainability, insurance, and governance
To foster a strong team—because reliable infrastructure needs committed people who contribute to and realise the vision, and thrive doing it

Speakers from across our global community shared their initiatives too. Most of these talks have been accompanied by posters or abstracts shared on our Community Forum and still available for preview and discussion:

Making data citations available at scale: The Global Open Data Citation Corpus by Iratxe Puebla;
“Who Cares?” Defining Citation Style in Scholarly Journals by Vincas Grigas and Pavla Vizváry;
DOI registration for scholarly blogs by Martin Fenner;
Enhancing Research Connections through Metadata: A Case Study with AGU and CHORUS by Tara Packer, Kristina Vrouwenvelder, Shelley Stall;
Index Crossref, Integrity, Professional And Institutional Development by Engjellushe Zenelaj;
Brazilian retractions in the Retraction Watch Database - RWDB by Edilson Damasio; and
Now that you’ve published, what do you do with Metadata? - by Joann Fogleson.

In addition to these updates, we’ve heard from:

Izabela Szyprowska (OP, European Commission), Nikolaos Mitrakis (RTD, European Commission), and Paola Mazzucchi (mEDRA), who talked about the process and rationale of implementing Crossref DOIs for grants at the European Commission; and
Amanda French from ROR/Crossref, who spoke about the new ‘ROR / Open Funder Registry overlap’ tool.

We also assembled a diverse panel and invited the community to discuss “What we still need to build a robust Research Nexus?” The discussion ranged from how different parts of our community currently use existing metadata, to how we can come together to make improvements, especially in the area of standards and equitability, and touched on metadata priorities. I’ll highlight some of the threads below, but it’s certainly worth engaging with the full recording of the discussion, and offering your own perspective on the Community Forum or by commenting below.

Having participated in the whole day of talks, I found that a few themes emerged as popular in the community: data citations, making it easier to register metadata, making better use of metadata, retractions, and equity of participation in the research nexus.

Data citations

With the advances in the Crossref API relationships endpoint, Martyn Rittman demonstrated how we’re now providing more comprehensive support for data citations. You can follow his demonstration in the Collab Notebook he used for the demo and shared for your perusal. He also mentioned that the developments in this feature of our API will soon replace the current service provided via the Events API. Feel free to connect with Martyn on the community forum and comment with questions and suggestions.

As mentioned above, DataCite’s Iratxe Puebla mentioned the Make Data Count initiative and the leaky pipeline of data citations we’ve got at the moment in the scholarly literature, obscuring the true picture of data reuse. This prevents the community from recognising and incentivising data creation and reuse appropriately. One way of addressing this is the Global Open Data Citation Corpus. Crossref and DataCite collaborate closely in connecting and making that data available.

Linking datasets, as well as software, was reported as part of the AGU and CHORUS initiative in Enhancing Research Connections through Metadata.

Data sharing and citing is as much a culture problem as a technology problem. As Iratxe Puebla admitted, there are many norms and processes for capturing and sharing that information, and DataCite is interested in hearing about different use cases. As highlighting data’s relationship with works is a growing interest for our community, hopefully more understanding and perhaps even commonality can be built soon.

Making it easier to register metadata

As part of the Demonstrations session, we’ve seen two developments to support members with registering their metadata more easily.

Crossref’s Lena Stoll shared plans for the new version of the Crossref Registration Form, the helper tool for manual registration of metadata, which translates the submission into XML, for inclusion in the Crossref database. At the moment, the form only accepts grant registrations, but it will be bolstered before the end of the year to include journal articles then other record types in time.

Erik Hanson from PKP demonstrated the latest OJS version, commenting on specific changes made in the new version in response to the key pain points reported by users of the previous release.

In addition, we’ve heard of two independent projects by Martin Fenner and Esha Datta to enable metadata registration and Crossref DOIs for scholarly blogs.

Making better use of metadata

Supported by the beginner’s demo of our REST API by Luis Montilla, many speakers highlighted opportunities for making good use of Crossref’s open metadata. Nikolaos Mitrakis of the European Commission talked about the implementation of Crossref DOIs for grants as a step towards tracing and connecting the grants with not just academic but also societal outcomes of the awards, and the plans for using those in the evaluation and steering of their funding programmes.
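
For readers who have not used the REST API before, here is a minimal sketch of the kind of request a first session might start with. It is not the code from the demo. The `type:grant` filter, the `rows` value, the helper names and the contact address are illustrative assumptions; the `/works` route and the `message` → `items` → `DOI` response structure are those of the public REST API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.crossref.org/works"


def build_works_url(filters, rows=5, mailto=None):
    """Build a /works request URL from a dict of filter names and values."""
    params = {
        "filter": ",".join(f"{name}:{value}" for name, value in filters.items()),
        "rows": rows,
    }
    if mailto:
        # Identifying yourself with an email address is considered polite use.
        params["mailto"] = mailto
    return f"{API_BASE}?{urlencode(params)}"


def fetch_dois(url):
    """Fetch one page of results and return the DOI of each record."""
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return [item["DOI"] for item in payload["message"]["items"]]
```

Calling `fetch_dois(build_works_url({"type": "grant"}, mailto="name@example.org"))` would then return the DOIs of a handful of registered grant records, assuming network access.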

Joann Fogleson of the American Society of Civil Engineers offered a buzzy metaphor, comparing publishers’ role in working with metadata to that of a pollinator – collecting the metadata at one end, then registering, displaying and making it available to different services, in order to enable a richer scholarly environment for discovery.

Many of the major themes found their way into the discussion of what is still needed to build a robust network of connections between scholarly objects, institutions and individuals. One of the ways Ludo Waltman of CWTS, Leiden University, intends to use our open metadata is as part of the upcoming open-source version of the Leiden Ranking, and he invited the community to contribute and help optimise this project to provide an alternative to closed and selective databases.

Panellists also spoke of new opportunities in the light of data mining and machine learning. Ran Dang of Atlantis Press shared, from a publisher’s perspective, a concern about the standard of metadata across cultures and disciplines, and the need to digitise past publications – which can then help better leverage multilingual scholarship. Matt Buys of DataCite pointed to the Global Data Citation Corpus they are developing, which leverages a SciBERT model to pull out data citations and brings these together with Crossref/DataCite citation metadata.

Opening the data is essential to enabling its wider use, and here Ludo gave the example of the fantastic outcome for references metadata, which has been made open by default for the entire corpus of Crossref-registered works. He hopes that this can inspire us to make similar progress in other areas.

On a slight tangent from metadata use, yet speaking of excellent examples of the community making progress together, Ginny pointed to ROR and how it is becoming a new standard for solving the longstanding problem of standardising affiliation metadata.

Retractions

Perhaps not entirely surprisingly, given Crossref’s recent acquisition of the Retraction Watch database and the decision to make the data openly available, retractions featured in a few different talks at the meeting. First, Lena Stoll and Martin Eve from Crossref shared how that data can be accessed: as a CSV file from https://api.labs.crossref.org/data/retractionwatch?your-email@here (replace your-email@here with your own email address). The Crossref Labs API also displays information about retractions in the /works/ route when metadata is available. There are plans for incorporating this information with our REST API in the future.

Ed and Ginny showed stats for increases in retraction metadata registered in Crossmark but commented on limited participation in Crossmark overall. Recording retraction information in this way is still important: alongside the Retraction Watch data, it allows for multiple assertions of that information and increases confidence in its accuracy. We’re preparing to consult with the community at large about the future direction of the Crossmark service, to make it easier to implement and more useful for readers.
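
The CSV download described above can be sketched in a few lines. This is a minimal illustration rather than an official client: the helper names are hypothetical, the email address is a placeholder, and no particular column names are assumed, since each row is simply returned as a dictionary keyed by whatever header the file provides.

```python
import csv
import io
from urllib.request import urlopen

LABS_BASE = "https://api.labs.crossref.org"


def retraction_watch_url(email):
    """Build the download URL, appending the contact email as the query string."""
    return f"{LABS_BASE}/data/retractionwatch?{email}"


def parse_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the file's own header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_retractions(email):
    """Download the CSV and return its rows (requires network access)."""
    with urlopen(retraction_watch_url(email), timeout=60) as response:
        # UTF-8 is an assumption about the file's encoding.
        text = response.read().decode("utf-8", errors="replace")
    return parse_rows(text)
```

Keeping the URL builder and the parser separate from the download makes it easy to test the logic without a network connection, and to reuse `parse_rows` on a copy of the file saved locally.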

Finally, Edilson Damasio from State University of Maringá-UEM, Brazil, and a long-time Crossref Ambassador, presented the analysis of Brazilian records in the Retraction Watch data, and he promises further analysis to come, comparing the situation across geographies.

Equity of participation in the research nexus

Amanda Bartell opened the research nexus discussion with a reminder of what that vision entails and pointing out commonality of goals in the community – “Like others, Crossref has a vision of a rich and reusable open network of relationships connecting research organisations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society. We call this interconnected network the Research Nexus, but others in the community have different names for it, such as knowledge graph or PID graph.”

The richness of this network depends upon the participation of all those who produce and publish scholarship, so naturally the topic of equity emerged in that discussion. In addition to Ran Dang’s concern for multilingualism and digitisation of past publications from all parts of the world, Mercury Shitindo of St Paul’s University, Kenya, talked of the need for more education, training and accessible resources for her community, to be able to participate more effectively in this ecosystem. She sees affiliations and citations as priorities there, as these enable transparency and facilitate collaborations. Matt Buys of DataCite echoed her point, talking about the importance of the role of contributors: “It’s important not to lose sight of people and places – to recognise the importance of contributor roles in the PID-graph”.

Earlier in the day, we mentioned the launch of our Global Equitable Membership, or GEM programme. Since January, 110 new organisations from eligible countries have joined Crossref fee-free. Ginny was quick to admit that the need for a fee-waiver programme like this stems from the regular fees schedule not being in tune with our global membership, and she mentioned the upcoming fees review.

Financial barriers are often what get attention, yet reducing barriers to participation with technology is equally important for building a robust research nexus. With the planned changes to our registration form, we’ll make it easier to register works for those who don’t regularly use XML.

Johanssen Obanda took time to show examples of community activity and events organised by our global network of Ambassadors, and to thank all our advocates and partners for their tireless work. They are also helping to tackle barriers, supporting our members to actively participate in the research nexus with their metadata, and helping to enable the community to make good use of the network of relationships that data denotes.

Showcasing our “One member one vote” truth, the Board election was the focal point of the annual meeting, as always. We closed the ballot and announced the results, with seven members selected to join the Board in 2024.

The event went very smoothly overall. Talks were delivered efficiently, the panellists shared diverse perspectives and we elected our new Board members. Huge thanks to Rosa Clark, our Communications and Events Manager, who orchestrated the event and has been a constant behind-the-scenes presence supervising the entire show. I’m grateful to all colleagues at Crossref, who helped make it an enjoyable experience and an informative event for our community. Finally – it wouldn’t be a real meeting without the active participation of the speakers and panellists, who shared their metadata stories, and even joined us for some relaxed unplugged chats.

Perspectives: Luis Montilla on making science fiction concepts a reality in the scholarly ecosystem

Luis Montilla — Mon, 20 Nov 2023 00:00:00 +0000

Hello, readers! My name is Luis, and I’ve recently started a new role as the Technical Community Manager at Crossref, where I aim to bridge the gap between some of our services and our community awareness to enhance the Research Nexus. I’m excited to share my thoughts with you.

My journey from research to science communications infrastructure has been a gradual transition. As a Master’s student in Biological Sciences, I often felt curious about what happens behind the scenes after a paper is submitted and published. For example, I wondered about the fate of data left in a drawer, or copied to a hard drive and forgotten, once the paper is online. I come from a university that shares its name with at least three completely different universities in Latin America, and whose name is also pretty similar to that of another one with multiple offices across the region, which made me wonder if there was a standard way of identifying our affiliations. And then we have the topic of our names in Hispanoamérica. We use two family names, and more often than not, we have a middle name (and then I could tell you stories about multiple-word middle names), which inevitably leads to authors having many combinations of full names and hyphenations.

This curiosity led me to volunteer in the Journal of the Venezuelan Society of Ecology. This role has been a transformative experience because my goal was to learn more about the publishing aspect of science. Still, today I realize that this is a fraction of what the scholarly ecosystem represents. The experience allowed me to grasp the importance of having a community with a sense of belonging, the relevance of multilingualism, and the importance of having access to an open infrastructure that allows smaller communities to be participants in the global dynamics. Moreover, it seemed to me that a research paper is more than the capstone of a building that we place and then move on to the next project or the next experiment; instead, it is a node in the vast network of human knowledge, connected to other papers through references, but also to all the other elements that are produced as part of the research, namely datasets, protocols, code, presentations, posters, preprints, peer-review reports and more. In short, the research metadata extends the life of the research output and makes it visible to the rest of the community.

This brings us to my onboarding to the Crossref team. At Crossref, I became part of a team and a driving force whose idea of the Research Nexus ¹ aligns perfectly with my aspirations. And to explain myself better, I’ll draw an analogy using one of my favorite authors. In Isaac Asimov’s Second Foundation, one character shows another a wall covered to the last millimeter with equations and writings. He describes his contribution to “The Plan” as follows: “…Every red mark you see on the wall is the contribution of a man among us who lived since Seldon”.² This idea sounded fascinating to me and only possible in a sci-fi book: a massive integrated research ecosystem where scientists focused more on how their contributions fit into the big picture. Today I have come to think that metadata helps materialize this idea by interconnecting all knowledge. More importantly, in stark contrast to Asimov’s plan, developed and guarded by a secret society, Crossref’s research nexus is a “reusable open network,” “a scholarly record that the global community can build on forever.” In a world with undeniably unequal access to resources, providing open access and fostering community contributions to this growing collective effort is a fundamental condition for empowering underrepresented voices and making them visible.

We make available a series of tools to access and probe this data, including our REST API, but we know its potential is far from being realized. As Technical Community Manager at Crossref, my primary responsibility is to understand the needs of our community members who interact with our REST API. I aim to build and maintain relationships with new and existing metadata users to promote the effective usage of our API. I will also be working closely with organisations such as hosting platforms, manuscript submission systems, and general publisher services. In essence, I want to ensure that our community across the globe is aware of the vast possibilities that come with using and contributing to the Research Nexus.

I am committed to fostering an engaged and collaborative technical community. As we move forward, I look forward to sharing insights, experiences, and knowledge with all of you. Stay tuned for more updates, and let’s explore the world of APIs, metadata, and scholarly communities together!

¹ Crossref (2021) The research nexus. Accessed on 20 October 2023.
² Asimov, I. (1953) Second Foundation. Gnome Press.

Similarity check update: A new similarity report and AI writing detection tool soon to be available to iThenticate v2 users

Fabienne Michaud — Wed, 01 Nov 2023 00:00:00 +0000

In May, we updated you on the latest changes and improvements to the new version of iThenticate and let you know that a new similarity report and AI writing detection tool were on the horizon.

On Wednesday 1 November 2023, Turnitin (who produce iThenticate) will be releasing a brand new similarity report and a free preview of their AI writing detection tool in iThenticate v2. The AI writing detection tool will be enabled by default and account administrators will be able to switch it off/on.

Turnitin will be running a webinar on their new similarity report and AI writing detection tool on ~~Tuesday 28 November~~ (EDIT 23/11/16: Monday 11 December 2023). More information on the webinar and how to register will be communicated by Turnitin in the coming weeks.

New similarity report

On Wednesday, all iThenticate v2 users will have access to the new version of the similarity report which will include:

a word count and the number of text blocks for each matched source
the ability to include or exclude overlapping sources from the overall similarity score
a clearer colour differentiation between the different sources
improved accessibility features

Enabling the new similarity report

The new similarity report will be enabled as a default for all your journals. Account administrators wishing to switch off the new similarity report can do so by going to Settings and selecting from the General tab, under the New Similarity Report Experience heading, the Disable option.

Classic view / new view

As this will be a significant change to your current experience, Turnitin have provided access for a period of time to the ‘classic view’ and you will be able to toggle between the original interface and the new one by clicking on ‘Switch to the classic view’ or ‘Switch to the new view’ buttons at the top of your report.

The similarity score will continue to be available at the top right-hand corner of the similarity report.

Exclusions

By clicking on the Filters button you’ll be able to check and/or adjust your report’s section and repository exclusions.

Please note that the exclusions previously set up by account administrators should be unchanged by this release.

Sources / Match Groups view

The Sources view will be the default view and will list all sources. By using the on/off button next to ‘Show overlapping sources’, you’ll be able to include or exclude overlapping sources. This will be ‘off’ as a default.

The Match Groups view is completely new and may not suit everyone’s needs. It is divided into four categories ‘Not Cited or Quoted’, ‘Missing Quotations’, ‘Missing Citation’ and ‘Cited and Quoted’ and will highlight matches found in your text.

PDF report

You’ll also now find the PDF report in the top right-hand corner of the similarity report, by clicking on the ‘download’ icon.

Submission details

‘Submission Details’ is now located under the ‘i’ icon in the top right-hand corner of your report. This is where you will find the oid (or unique number) for your manuscript, which Turnitin will ask you to provide when you are reporting a technical issue.

Turnitin’s documentation for the new similarity report

AI writing detection tool

Many of you have been concerned about the use of AI writing in the research papers you’ve received since the launch of ChatGPT last November and have been in touch to enquire about the availability of an AI writing detection tool for Crossref members.

You will also have read that Turnitin have developed an AI writing detection tool and have made it available to their education sector customers since April. Based on the feedback they’ve received from the education community, Turnitin published an update in May, followed by a helpful video and further information on the false positive rates in June.

I am pleased to announce that Turnitin’s AI writing detection tool will be available as a free preview to iThenticate v2 users, via the new version of the similarity report, from Wednesday 1 November until the end of December 2023.

Enabling AI writing detection

Our preference was to have the new AI writing detection tool turned ‘off’ as a default, however this hasn’t been possible. Account administrators can turn this feature off by going to Settings and selecting the Crossref Web tab and scrolling down to the AI Writing section at the very bottom of the page. The feature is applied to all submissions when it is enabled.

Please note that AI Writing detection is only available in the new similarity report.

Integrations

There is currently no integration between manuscript tracking systems and the AI writing detection tool. However, the AI score will be available via the similarity report. If the AI writing detection tool has been set as ‘off’ by the account administrator, there will be no score and the ‘AI Writing’ heading will not be visible on the similarity report.

File requirements

Turnitin have made some important file requirements available for the tool to run a report:

Must be written in English
A minimum of 300 words
A maximum of 15,000 words
The file size must be less than 100 MB
Accepted file types are .docx, .pdf, .rtf and .txt
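
The requirements above can be sketched as a local pre-check to run before uploading. This is only an illustration under stated assumptions: the function name and its parameters are hypothetical, the word count and language are supplied by the caller rather than detected, and “100 MB” is interpreted here as 100 × 1024 × 1024 bytes. It is not part of iThenticate.

```python
from pathlib import Path

MIN_WORDS = 300
MAX_WORDS = 15_000
MAX_BYTES = 100 * 1024 * 1024  # assumption: "100 MB" read as binary megabytes
ACCEPTED_TYPES = {".docx", ".pdf", ".rtf", ".txt"}


def check_requirements(filename, size_bytes, word_count, is_english=True):
    """Return a list of unmet requirements; an empty list means all are met."""
    problems = []
    if not is_english:
        problems.append("must be written in English")
    if word_count < MIN_WORDS:
        problems.append(f"fewer than {MIN_WORDS} words")
    if word_count > MAX_WORDS:
        problems.append(f"more than {MAX_WORDS} words")
    if size_bytes >= MAX_BYTES:
        problems.append("file size is not less than 100 MB")
    if Path(filename).suffix.lower() not in ACCEPTED_TYPES:
        problems.append("file type is not .docx, .pdf, .rtf or .txt")
    return problems
```

For example, `check_requirements("manuscript.pdf", 2_000_000, 6_500)` returns an empty list, while a 100-word `.odt` file would be reported as failing both the minimum length and the file type requirement.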

If your file does not meet the above requirements, iThenticate v2 will display an error message.

Turnitin’s AI writing detection tool has been developed to detect GPT 3, 3.5, 4 and other variants. More information on this is available on their FAQs page.

Turnitin have provided the following guidance regarding the AI scores:

“Blue with a percentage between 0 and 100: The submission has processed successfully. The displayed percentage indicates the amount of qualifying text within the submission that Turnitin’s AI writing detection model determines was generated by AI. As noted previously, this percentage is not necessarily the percentage of the entire submission. If text within the submission was not considered long-form prose text, it will not be included.

Our testing has found that there is a higher incidence of false positives when the percentage is between 1 and 20. In order to reduce the likelihood of misinterpretation, the AI indicator will display an asterisk (*) for percentages between 1 and 20 to call attention to the fact that the score is less reliable.

To explore the results of the AI writing detection capabilities, select the indicator to open the AI writing report. The AI writing report opens in a new tab of the window used to launch the Similarity Report. If you have a pop-up blocker installed, ensure it allows Turnitin pop-ups.”
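
For anyone recording these scores in their own editorial notes, the rule quoted above can be expressed as a small helper. This is a sketch with a hypothetical function name. It encodes only the 1 to 20 range that Turnitin describes as less reliable and does not reproduce how the indicator is actually drawn on screen.

```python
def interpret_ai_score(percentage):
    """Flag scores that Turnitin's guidance describes as less reliable."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return {
        "percentage": percentage,
        # Per the quoted guidance, 1 to 20 is shown with an asterisk
        # because false positives are more common in that range.
        "less_reliable": 1 <= percentage <= 20,
    }
```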

Please note that, unlike the similarity report, the AI writing report will only provide a score and highlight the blocks of text likely to have been written by an AI tool; it will not list source matches.

We encourage you to test the writing detection tool as much as possible during the free preview period (1 November-31 December 2023).

Paraphrase detection

~~Turnitin are planning to release a beta version of their new paraphrase detection tool at the end of this year/Q1, 2024. It will be initially available as a free preview for a short period of time.~~ (EDIT 23/11/16: There is currently no timeline available for Turnitin’s paraphrase detection tool, which is having a knock-on effect on the availability of the AI writing and paraphrase detection bundle and associated fees previously mentioned in this post)

AI and paraphrase detection bundle (EDIT 23/11/16: AI writing detection tool)

Once the free preview period ends, Turnitin ~~would like to offer Crossref members an AI and paraphrase detection bundle~~ (EDIT 23/11/16: are planning to make their AI writing detection tool available) from 2024 - this means that if you choose to subscribe to this new service, you will be charged an additional fee each time you upload a manuscript.

Fixes

Many of you have been waiting for fixes to the aggregation of URLs issues in the matched sources of the similarity report and to the doc-to-doc PDF report in iThenticate v2. Turnitin are planning to release fixes for these before the end of 2023.

✏️ Do get in touch via support@crossref.org if you have any questions about iThenticate v1 or v2 or start a discussion by commenting on this post below or in our Community Forum.

Perspectives: Audrey Kenni-Nemaleu on scholarly communications in Cameroon

Audrey Kenni-Nemaleu — Thu, 05 Oct 2023 00:00:00 +0000

Our Perspectives blog series highlights different members of our diverse, global community at Crossref. We learn more about their lives and how they came to know and work with us, and we hear insights about the scholarly research landscape in their country, the challenges they face, and their plans for the future.

Notre série de blogs Perspectives met en lumière différents membres de la communauté internationale de Crossref. Nous en apprenons davantage sur leur vie et sur la manière dont ils ont appris à nous connaître et à travailler avec nous, et nous entendons parler du paysage de la recherche universitaire dans leur pays, des défis auxquels ils sont confrontés et de leurs projets pour l’avenir.

Today, we meet Audrey Kenni-Nemaleu, Crossref Ambassador in Cameroon and Assistant Editor of the Pan-African Medical Journal (PAMJ). Audrey is excited about engaging Crossref’s community in French West Africa. Please take a moment to read and listen to Audrey’s perspective.

Aujourd’hui, nous rencontrons Audrey Kenni-Nemaleu, ambassadrice Crossref au Cameroun et rédactrice adjointe du Pan-African Medical Journal (PAMJ). Audrey est enthousiaste à l’idée d’impliquer la communauté Crossref en Afrique occidentale française. Veuillez prendre un moment pour lire et écouter le point de vue d’Audrey.


Tell us a bit about your organisation, your objectives, and your role
Pouvez-vous nous parler de votre organisation, vos objectifs et votre rôle ?

My name is Audrey Kenni Nganmeni-Nemaleu, and I am an assistant editor for the Pan-African Medical Journal. I am specifically responsible for editing the form of articles, ensuring that they meet the journal’s standards. Furthermore, I am my journal’s focal point for Crossref; that is to say, I am responsible for managing all the problems that our editors may encounter with DOIs and the various Crossref services to which our journal has subscribed. My role is also to manage all the conflicts that we may encounter with the DOIs submitted to Crossref. I train our journal staff in using Crossref services. I am also my journal’s focal point for COPE (Committee on Publication Ethics), which is an organisation that helps to regulate ethical publishing practices. It is in this capacity that I participate in COPE’s webinars on behalf of our journal.

Je m’appelle Audrey Kenni Nganmeni Nemaleu, éditrice assistante pour le Pan African Medical Journal. Je m’occupe précisément de traiter les articles sur le plan de la forme en m’assurant qu’ils respectent les normes du journal. Par ailleurs je suis point focal de mon journal pour Crossref c’est-à-dire je suis chargée de gérer tous les problèmes que l’ensemble des éditeurs peuvent rencontrer avec les DOIs et les différents services de Crossref auxquels notre journal a souscrit. Mon rôle également c’est de gérer tous les conflits qu’on peut rencontrer avec les DOIs soumis à Crossref. Je forme également le personnel de notre journal à l’utilisation des services de Crossref. Je suis aussi point focal de mon journal pour COPE (Committee of Publications ethics) qui est un organisme qui aide dans la régulation des pratiques éthiques en matière de publication. C’est dans ce cadre que je participe à tous les webinaires de cette organisation afin qu’il y ait toujours au moins une personne qui participe à ces webinaires pour le compte de notre journal.

What is one thing that others should know about your country and its research activity?
Que doivent savoir les autres sur les activités de recherche dans votre pays ?

In my country, Cameroon, the research activity is still young. There are few scientific journals and we are actually the most influential journal in our country and subregion. There are also few schools or institutions that focus especially on research. For the time being, research activities in my country mainly revolve around congresses and conferences where researchers can exhibit their works. There is very little support for scientific research in my country.

Dans mon pays, le Cameroun, la recherche scientifique est encore jeune. Il existe peu de revues scientifiques et nous sommes en fait le journal le plus influent de notre pays et de notre sous-région. Il existe également peu d’écoles ou d’institutions spécialisées dans la recherche. Pour l’instant, les activités de recherche dans mon pays s’articulent principalement autour de congrès et de conférences où les chercheurs peuvent exposer leurs travaux. Il y a très peu de soutien à la recherche scientifique dans mon pays.

Are there trends in scholarly communications that are unique to your part of the world?
Existe-t-il des tendances particulières en matière de recherche scientifique dans votre région ?

In this part of the world, we do our best to follow the code of ethics of the various organisations of which we are members: Committee on Publication Ethics (COPE), World Association of Medical Editors (WAME), Open Access Scholarly Publishing Association (OASPA). What we have seen emerging recently is the organisation, by professional scientific societies, of small conferences, workshops and meetings to exchange information. These small events are less costly to organize, hence their growing popularity. We support these activities through sponsorship, and use them as opportunities to strengthen young researchers’ capacities in areas such as scientific writing and publication ethics. We also use those opportunities to introduce young researchers to concepts such as Open Access, Open Science, DOIs and other modern publishing services.

Dans notre pays, nous nous efforçons de suivre le code de déontologie des différentes organisations dont nous sommes membre : Committee of publication ethics (COPE), World Association of Medical Editors (WAME), Open Access Scholarly Publishing Association (OASPA). Ce que l’on a vu émerger récemment, c’est l’organisation, par des sociétés scientifiques professionnelles, de petits colloques, ateliers et réunions d’échange d’informations. Ces petits événements sont moins coûteux à organiser, d’où leur gain en popularité. Nous soutenons ces activités par le sponsoring et les utilisons comme des opportunités pour renforcer les capacités des jeunes chercheurs dans des domaines tels que l’écriture scientifique, l’éthique de la publication. Nous utilisons également ces opportunités pour leur présenter des concepts tels que le libre accès, la science ouverte, les DOIs et d’autres services d’édition modernes.

What about any political policies, challenges, or mandates that you have to consider in your work?
Quels sont les politiques, défis ou mandats auxquels vous faites face dans votre travail ?

Operating a journal in our context is challenging. The critical challenges are as basic as constant availability of electricity or stable and fast internet connectivity. How to maintain a stable revenue stream to support the journal is also a critical challenge. Most of our authors are young and self-funded, with limited resources. Most cannot afford our article publishing fees, even though these are very low by comparison. So we have to be extremely creative to operate.

Faire fonctionner une revue dans notre contexte est difficile. Les défis critiques sont aussi fondamentaux que la disponibilité constante de l’électricité ou une connexion Internet stable et rapide. Comment maintenir un flux stable des revenus pour soutenir la revue constitue également un défi crucial. La plupart de nos auteurs sont jeunes, autofinancés, avec des ressources limitées et par conséquent n’arrivent pas à payer les frais de publication d’articles pourtant très bas. Nous devons donc être extrêmement créatifs pour gérer nos charges.

How would you describe the value of being part of the Crossref community; what impact has your participation had on your goals?
Comment décririez-vous la valeur de faire partie de la communauté Crossref ? Quel est l’impact de votre participation sur vos objectifs ?

As a Crossref Ambassador, I talk about Crossref with those around me, including my colleagues in Kenya and Cameroon. I have shared the links to Crossref webinars with my colleagues, and I have invited them to become ambassadors by sharing the links to join the community. I have participated in several ambassador training webinars on different themes, including how to submit DOIs to Crossref and ORCID. I also participated in a Crossref event in Nairobi, Kenya. It was a memorable moment where I was able to meet other ambassadors. We were able to have a small meeting on the difficulties we encountered in growing the Crossref community in Africa. We produced a document to this effect, which we submitted to Crossref in 2022. For the moment, I have not yet been able to organize an event as an ambassador, but I would like to, with the help of Crossref. Being an ambassador is not the easiest thing, because in our context people sometimes do not understand the use of Crossref’s services: we are in an environment where the DOI is not yet very well known, and where even publishers know nothing about it. People often ask me whether this work is paid, and they are discouraged when they learn that it is voluntary.

Comme ambassadrice de Crossref, je parle autour de moi de Crossref, parmi mes collègues qu’ils soient au Kenya ou au Cameroun. J’ai partagé les liens pour participer à des webinaires de Crossref à mes collègues. Je les ai invités à devenir des ambassadeurs en partageant avec eux les liens pour rejoindre la communauté. J’ai participé à plusieurs webinaires de formation des ambassadeurs sur différents thèmes notamment ORCID. J’ai également participé à un évènement de Crossref à Nairobi au Kenya. C’était un moment mémorable où j’ai pu rencontrer d’autres ambassadeurs. Nous avons pu faire une petite réunion sur les difficultés que nous rencontrons pour faire grandir la communauté Crossref en Afrique. Nous avons d’ailleurs produit un document à cet effet que nous avons soumis à Crossref en 2022. Pour l’instant, je n’ai pas encore pu organiser d’évènement dans le cadre d’ambassadeur, mais j’aimerais avec l’aide de Crossref voir comment le faire. Être ambassadrice n’est pas la chose la plus facile car parfois dans notre contexte les gens ne comprennent pas le bien-fondé des services de Crossref car on est dans un environnement où le DOI n’est pas encore très connu, et où beaucoup de journaux et même d’éditeurs ne savent rien de cela. Une question qu’on me pose souvent est celle de savoir si ce travail est rémunéré, et les gens se découragent quand ils apprennent que c’est du bénévolat.

For you, what would be the most important thing Crossref could change (do more of/do better in)?
Pour vous, quelle serait la chose la plus importante que Crossref pourrait changer (faire plus/faire mieux) ?

Crossref could invest in more capacity building, events, and communications in this part of the world. Why not localize Crossref in the francophone part of Africa? Crossref could offer continuing educational activities to professionals in order to improve their skills or acquire new knowledge in metadata and correlative disciplines. Crossref could also sponsor/support journal publishing and scholarship in Africa.

Crossref pourrait investir dans davantage de renforcement des capacités, d’événements et de communications dans cette partie du monde. Pourquoi ne pas localiser Crossref dans la partie francophone de l’Afrique ? Crossref pourrait proposer des activités de formation continue aux professionnels afin d’améliorer leurs compétences ou d’acquérir de nouvelles connaissances dans les métadonnées et les disciplines corrélatives. Crossref pourrait également sponsoriser/soutenir la publication de revues et les bourses d’études en Afrique.

Which other organisations do you collaborate with or are pivotal to your work in open science?
Avec quelles autres organisations collaborez-vous ou alors quelles sont les organismes pivot au cœur de votre travail en science ouverte ?

I collaborate with various institutions such as COPE (Committee on Publication Ethics), AJOL African Journals Online, and OASPA (Open Access Scholarly Publishing Association). I attend webinars of these organisations on behalf of my journal.

Je collabore avec diverses institutions telles que COPE (Committee on Publication Ethics), AJOL African Journals Online, et OASPA (Open Access Scholarly Publishing Association). J’assiste à des webinaires de ces organisations au nom de ma revue.

What are your plans for the future?
Quels sont vos plans pour l’avenir ?

My plan for the future is to continue working in science communication with different other organisations, and more within my community.

Mon plan pour l’avenir est de continuer à travailler dans le domaine de la communication scientifique avec différentes autres organisations, et davantage au sein de ma communauté.

Thank you, Audrey!

Merci, Audrey !

Feedback on automatic digital preservation and self-healing DOIs

Martin Eve — Thu, 28 Sep 2023 00:00:00 +0000

Thank you to everyone who responded with feedback on the Op Cit proposal. This post clarifies, defends, and amends the original proposal in light of the responses that have been sent. We have endeavoured to respond to every point that was raised, either here or in the document comments themselves.

We strongly prefer for this to be developed in collaboration with CLOCKSS, LOCKSS, and/or Portico, i.e. through established preservation services that already have existing arrangements in place, are properly funded, and understand the problem space. There is low level of trust in the Internet Archive, also given a number of ongoing court cases and erratic behavior in the past. People are questioning the sustainability and stability of IA, and given it is not funded by publishers or other major STM stakeholders there is low confidence in IA setting their priorities in a way that is aligned with that of the publishing industry.

We acknowledge that some of our members have a low level of trust in the Internet Archive, but many of our members (primarily open access ones) work very closely with the IA, and our research has shown that, without the IA, the majority of our smaller open access members would have almost no preservation at all. We have already had conversations with CLOCKSS and Portico about involvement in the pilot and about what scaling to production would look like. That said, for a proof-of-concept, the Internet Archive presents a very easy way to get off the ground, with a stable system that has been running for almost 30 years.

This seems to be a service for OA content only, but people wonder for how long. Someone already spotted an internal CrossRef comment on the working doc that suggested “why not just make it default for everything & everyone”, and that raises concern.

The primary audience for this service is small OA publishers that are, at present, poorly preserved. These publishers present a problem for the whole scholarly environment because linking to their works can prove non-persistent if preservation is not well handled. Enhancing preservation for this sector therefore benefits the entire publishing industry by creating a persistent linking environment. We have no plans to make this the “default for everything and everyone” because the licensing challenges alone are massive, but also because it isn’t necessary. Large publishers like Elsevier are doing a good job of digitally preserving their content. We want this service to target the areas that are currently weaker.

Crossref will always respect the content rights of our members. We never force our members to release their content through Crossref that they don’t ask us to release.

The purpose of the Op Cit project is to make it easier for our members to fulfil commitments they already made when they joined Crossref.

Crossref is fundamentally an infrastructure for preserving citations and links in the scholarly record. We cannot do that if the content being cited or linked to disappears.

When signing the Crossref membership agreement, members agree to employ their best efforts to preserve their content with archiving services so that Crossref can continue to link citations to it even in extremis, for example if they have ceased operations.

Some of our members already do this well. They have already made arrangements with the major archiving providers. They do not need the Op Cit service to help them with archiving. However, the Op Cit service will still help them ensure that the DOIs that they cite continue to work. So it will still benefit them even if they don’t use it directly.

However, our research shows that many of our members are not fulfilling the commitments they made when joining Crossref. Over the next few years, we will be trying to fix this, primarily through outreach: encouraging members to set up archiving arrangements with the archives of their choice and to record those arrangements with Crossref.

But we know some members will find this too technically challenging and/or costly. [And frankly, given what we’ve learned of the archiving landscape, we can see their point.] The proposed Op Cit service is for these members. The vast majority of these members are Open Access publishers, so the “rights” questions are far more straightforward, which makes the implementation of such a service much more tractable.

Someone asked what this means for the publisher-specific DOI prefix for this content? Will this be lost?

No.

There is concern about the interstitial page that Crossref would build that gives the user access options. The value of Crossref to publishers is adding services that are invisible and beneficial to users, not adding a visible step that requires user action.

There is nothing in Crossref’s terms that says that we have to be invisible. The basic truth is that detecting content drift is really hard and several efforts to do so before have failed. Without a reliable way of knowing whether we should display the interstitial page, which may become possible in future, we have to display something for now, or the preservation function will not work.

Crossref has, also, supported user-facing interstitial services for over a decade, including:

Multiple Resolution
Coaccess
CrossMark
Crossref Metadata Search
REST API

So we have a long track record of non-B2B service provision.

There is confusion about why Crossref seems to want to build the capacity to “lock” records in absence of flexibility. People feel no need for Crossref to get involved here.

This is a misunderstanding of the terminology. The Internet Archive allows the domain owner to request content to be removed. This would mean that, in future, if a new domain owner wanted, they could remove previously preserved material from the archive, thereby breaking the preservation function. When we say we want to “lock” a record, we mean that a future domain owner cannot remove content from the preservation archive. This also prevents domain hijackers from compromising the digital preservation.

There is concern about the possibility to hack this system to give uncontrolled access to all full-text content by attacking publishing systems and making them unavailable. This is an unhappy path scenario but something on people’s minds.

The system only works on content that is provided with an explicitly stated open license (see response above).

I think this project would be improved by better addressing the people doing the preservation maintenance work that this requires. Digital preservation is primarily a labor problem, as the technical challenges are usually easier than the challenge of consistently paying people to keep everything maintained over time. Through that lens, this is primarily a technical solution to offload labor resources from small repositories to (for now) the Internet Archive, where you can get benefits from the economies of scale. There are definitely cases where that could be useful! But I think making this more explicit will further a shared understanding of advantages and disadvantages and help you all see future roadblocks and opportunities for this approach.

This consultation phase was designed, precisely, to ensure that those working in the space could have their say. While this is a technical project, we recognize that any solution must value and understand labor. That means that any scaling to production must and will also include a funding solution to address the social labor challenge.

Is there any sense in polling either the IA Wayback Machine or the LANL Memento Aggregator first to determine if snapshot(s) already exist?

We could do this, but it would add an additional hop/lookup on deposit. Plus, we want to store the specific version deposited at the specific time it is done, including re-deposits.
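
For readers who want to see what such a lookup would involve, here is a minimal sketch in Python against the Internet Archive's public Wayback availability endpoint (`https://archive.org/wayback/available`). The helper names and the sample response values are illustrative assumptions and not part of the Op Cit design; the sketch only builds the query URL and reads a response of the documented shape, without making a network call.

```python
import json
from urllib.parse import urlencode

WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_url(target_url, timestamp=None):
    """Build a Wayback availability query for target_url.

    timestamp is an optional YYYYMMDD[hhmmss] string; when given, the API
    reports the snapshot closest to that moment.
    """
    params = {"url": target_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABILITY + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the closest snapshot URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Illustrative response body: hypothetical values in the documented shape.
SAMPLE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20230101000000/https://example.org/article",
            "timestamp": "20230101000000",
            "status": "200",
        }
    }
})

print(availability_url("https://example.org/article"))
print(closest_snapshot(SAMPLE))
```

Each such query is the extra hop mentioned above, and a snapshot that already exists is not guaranteed to match the version being deposited.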

I would encourage looking at a distributed file system like IPFS (https://en.wikipedia.org/wiki/InterPlanetary_File_System). This would allow easy duplication, switching and peering of preservation providers. Correctly leveraged with IPNS; resolution, version tracking and version immutability also become benefits. Later after beta the IPNS metadata could be included as DOI metadata.

We had considered IPFS for other projects, but really, for this, we want to go with recognised archives, not end up running our own infrastructure for preservation.

It might be useful to look into the 10320/loc option for the Handle server: the https://www.handle.net/overviews/handle_type_10320_loc.html. I can imagine a use case where a machine agent might want to access an archive directly without needing to go to an interstitial page.

It is good to see reference to the HANDLE system and alternative ways that we might use it. We will consult internally on the technical viability of this.

In general, though, we prefer to use web-native mechanisms when they are available. We already support direct machine access via HTTP redirects and by exposing resource URLs in the metadata that can be retrieved via content negotiation. In this case, we would be looking at supporting the semantics of the HTTP 300 (Multiple Choices) status code.
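
As a rough illustration of the existing content negotiation route, the Python sketch below builds (but does not send) a request that asks the DOI resolver for CSL JSON metadata instead of the landing page. The function name is our own, and the DOI is simply a Crossref blog DOI used as an example.

```python
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"
CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi, media_type=CSL_JSON):
    """Build a content-negotiated request for a DOI without sending it."""
    # The Accept header tells the resolver which representation is wanted.
    return Request(DOI_RESOLVER + doi, headers={"Accept": media_type})


req = metadata_request("10.13003/c23rw1d9")
print(req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would follow the redirects and return metadata rather than HTML. How a 300 response would be exposed to such clients is not yet specified, so it is left out of the sketch.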

I’m curious to see how this will work for DOI versioning mechanisms like in Zenodo, where you have one DOI to reference all versions as well as version specific DOIs. If your record contains metadata + many files and a new version just versions one of the several files my assumption is that within the proposed system an entire new set (so all files) is archived. In theory this could also be a logical package, where simply the delta is stored, but I guess in a distributed preservation framework like the one proposed here, this would be hard to achieve.

This is a good point and it could lead to many more, frustrating, hops before the user reaches the content. We will conduct further research into this scenario, but we also note that Zenodo’s DOIs do not come from Crossref, but from DataCite.

There’s a decent body of research at this point on automated content drift detection. This recent paper: https://ceur-ws.org/Vol-3246/10_Paper3.pdf likely has links to other relevant articles.

We have no illusions about the difficulty of detecting semantic drift but this is helpful and interesting. We will read this material and related articles to appraise the current state of content drift detection.

Out of curiosity, will we be using one type of archive (i.e., IA or CLOCKSS or LOCKSS or whatever) or will it possibly be a combination of a few archives? Reading the comments, it looks like some of them charge a fee, so I see why we’d use open source solutions first. Also, eventually could it be something that the member chooses? i.e. which archive they might want to use. Again, the latter question isn’t something for the prototype, but I’m curious about this use case. Also, I wonder about the implementation details if it is more than one archive. The question is totally moot of course, if we’re sticking with one archive for now.

The design will allow for deposit in multiple archives – and we will have to design a sustainability model that will cover those archives that need funding. As above, this is an important part of the move to production.

Will be good for future interoperability to make sure at least one of the hashes is a SoftWare Hash IDentifier (see swhid.org). The ID is not really software specific and will interoperate with the Software Heritage Archive and git repositories.

We will certainly ensure best practices for checksums.
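
To make the suggestion concrete, here is a small Python sketch that computes a conventional SHA-256 checksum alongside a SWHID for a single file. It assumes the content-type SWHID form (`swh:1:cnt:` followed by the SHA-1 of a git-style blob header plus the bytes); the function names are illustrative, and nothing here reflects a decision about which hashes Op Cit will store.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Plain SHA-256 checksum of the deposited bytes."""
    return hashlib.sha256(content).hexdigest()


def swhid_for_content(content: bytes) -> str:
    """SWHID for one file: SHA-1 over 'blob <length>\\0' followed by the content.

    This is the same value git computes for a blob, which is why the
    identifier interoperates with git repositories.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return "swh:1:cnt:" + hashlib.sha1(header + content).hexdigest()


deposit = b""  # an empty file, used only because its hashes are well known
print(sha256_hex(deposit))
print(swhid_for_content(deposit))
```

Storing both values costs little, and the SWHID can be checked against any git checkout of the same file.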

Comments on the Interstitial Page

I’d keep the interstitial page without planning its eradication. (See why in the last paragraph) I’d even advocate for it to be a beautiful and useful reminder to users that “This content is preserved”. I’d go further and recommend that publishers deposit alternate urls of other preservation agents like PMC etc, that would also be displayed. This page could even be merged with multi-resolution system.

The why: I’m concerned of hackers and of predatory publishers exploiting the spider heuristics by highjacking small journals and keeping just enough metadata as in them as to fool the resolver and then adding links to whatever products, scams and whatnots…

Technical. Scraping landing pages is hard. We’ve had a lot of projects to do this over the years. You can mitigate the risk by tiering / heuristics. Maybe even feedback loop to publishers to encourage them to put the right metadata on the landing page.

This is the only part of this proposal that I don’t like. People are used to DOIs resolving directly to content, and I don’t think that should be changed unless absolutely necessary. I would prefer that the DOI resolves to the publisher’s copy if it exists, and the IA copy otherwise.

We will continue the discussion about the interstitial page. The basic technical fact, as above, is that detecting content drift is hard and so we may need, at least, to start with the page. However, some commentators presented reasons for keeping it.

We also have already supported interstitial pages for multiple resolution and co-access for over a decade.

It is the member’s choice whether they wish to deposit alternative URLs, and we already have a mechanism for this.

2023 board election slate

Lucy Ofiesh — Wed, 27 Sep 2023 00:00:00 +0000

I’m pleased to share the 2023 board election slate. Crossref’s Nominating Committee received 87 submissions from members worldwide to fill seven open board seats.

We maintain a balance of eight large member seats and eight small member seats. A member’s size is determined based on the membership fee tier they pay. We look at how our total revenue is generated across the membership tiers and split it down the middle. Like last year, about half of our revenue came from members in the tiers $0 - $1,650, and the other half came from members in tiers $3,900 - $50,000. We have two large member seats and five small member seats open for election in 2023.
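
The "split it down the middle" rule can be sketched in a few lines of Python. The revenue figures below are invented purely for illustration, and the list of tier fees is an assumption rather than an official breakdown; only the idea of dividing tiers where cumulative revenue reaches about half comes from the description above.

```python
def split_tiers(revenue_by_tier):
    """Split fee tiers into small and large groups at the boundary where
    cumulative revenue comes closest to half of the total.

    revenue_by_tier: list of (tier_fee, revenue) pairs sorted by tier_fee.
    """
    total = sum(revenue for _, revenue in revenue_by_tier)
    best_index, best_gap, running = 0, float("inf"), 0
    for index, (_, revenue) in enumerate(revenue_by_tier):
        running += revenue
        gap = abs(running - total / 2)
        if gap < best_gap:
            best_index, best_gap = index, gap
    small = [fee for fee, _ in revenue_by_tier[: best_index + 1]]
    large = [fee for fee, _ in revenue_by_tier[best_index + 1 :]]
    return small, large


# Hypothetical revenue per tier (arbitrary units), not real Crossref data.
EXAMPLE = [(275, 30), (550, 10), (1650, 10), (3900, 12), (8300, 10),
           (14000, 8), (22000, 8), (33000, 6), (50000, 6)]

small_tiers, large_tiers = split_tiers(EXAMPLE)
print(small_tiers, large_tiers)
```

With these made-up numbers the boundary falls between the $1,650 and $3,900 tiers, mirroring the split described for this year.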

The Nominating Committee presents the following slate.

The 2023 slate

Tier 1 candidates (electing five seats):

Beilstein-Institut, Wendy Patterson
Korean Council of Science Editors, Kihong Kim
Lujosh Ventures Limited, Olu Joshua
NISC Ltd, Mike Schramm
OpenEdition, Marin Dacos
Universidad Autónoma de Chile, Dr. Ivan Suazo
Vilnius University, Vincas Grigas

Tier 2 candidates (electing two seats):

Association for Computing Machinery (ACM), Scott Delman
Oxford University Press, James Phillpotts
Public Library of Science (PLOS), Dan Shanahan
University of Chicago Press, Ashley Towne

Here are the candidates’ organisational and personal statements

You can be part of this important process by voting in the election

If your organisation is a voting member in good standing of Crossref as of September 10th, 2023, you are eligible to vote when voting opens on September 27th, 2023.

How can you vote?

Your organisation’s designated voting contact will receive an email from eBallot the week of September 25th with the Formal Notice of Meeting and Proxy Form with concise instructions on how to vote. The email will include a username and password with a link to our voting platform.

The election results will be announced at the LIVE23 online meeting on October 31st, 2023. Save the date! Incoming members will take their seats at the March 2024 board meeting.

News: Crossref and Retraction Watch

Ginny Hendricks — Tue, 12 Sep 2023 00:00:00 +0000

https://doi.org/10.13003/c23rw1d9

Crossref acquires Retraction Watch data and opens it for the scientific community

Agreement to combine and publicly distribute data about tens of thousands of retracted research papers, and grow the service together

12th September 2023 - The Center for Scientific Integrity, the organisation behind the Retraction Watch blog and database, and Crossref, the global infrastructure underpinning research communications, both not-for-profits, announced today that the Retraction Watch database has been acquired by Crossref and made a public resource. An agreement between the two organisations will allow Retraction Watch to keep the data populated on an ongoing basis and always open, alongside publishers registering their retraction notices directly with Crossref.

Both organisations have a shared mission to make it easier to assess the trustworthiness of scholarly outputs. Retractions are an important part of science and scholarship regulating themselves and are a sign that academic publishing is doing its job. But there are more journals and papers than ever, so identifying and tracking retracted papers has become much harder for publishers and readers. That, in turn, makes it difficult for readers and authors to know whether they are reading or citing work that has been retracted. Combining efforts to create the largest single open-source database of retractions reduces duplication, making it more efficient, transparent, and accessible for all.

Product Director Rachael Lammey says, “Crossref is focused on documenting and clarifying the scholarly record in an open and scalable form. For a decade, our members have been recording corrections and retractions through our infrastructure, and incorporating the Crossmark button to alert readers. Collaborating with Retraction Watch augments publisher efforts by filling in critical gaps in our coverage, helps the downstream services that rely on high-quality, open data about retractions, and ultimately directly benefits the research community.”

The Center for Scientific Integrity and the Retraction Watch blog will remain separate from Crossref and will continue their journalistic work investigating retractions and related issues; the agreement with Crossref is confined to the database only and Crossref itself remains a neutral facilitator in efforts to assess the quality of scientific works. Both organisations consider publishers to be the primary stewards of the scholarly record and they are encouraged to continue to add retractions to their Crossref metadata as a priority.

“Retraction Watch has always worked to make our highly comprehensive and accurate retraction data available to as many people as possible. We are deeply grateful to the foundations, individuals, and members of the publishing services industry who have supported our efforts and laid the groundwork for this development,” said Ivan Oransky, executive director of the Center for Scientific Integrity and co-founder of Retraction Watch. “This agreement means that the Retraction Watch Database has sustainable funding to allow its work to continue and improve.”

Please join Crossref and Retraction Watch leadership, among other special guests, for a community call on 27th September at 1 p.m. UTC to discuss this new development in the pursuit of research integrity.

Supporting details

Crossref retractions number 14k, and the Retraction Watch database currently numbers 43k. There is some overlap, making a total of around 50k retractions.
The full dataset has been released through Crossref’s Labs API, initially as a .csv file to download directly: https://api.labs.crossref.org/data/retractionwatch?name@email.org (add your ‘mailto’). Edit: 2024-10-10: The full dataset is available in a git repository at https://gitlab.com/crossref/retraction-watch-data.
The Crossref Labs API also displays information about retractions in the /works/ route when metadata is available, such as https://api.labs.crossref.org/works/10.2147/CMAR.S324920?name@email.org (add your ‘mailto’). If you don’t have a .json viewer, please see below for screenshot.
Crossref is paying an initial acquisition fee of USD $175,000 and will pay Retraction Watch USD $120,000 each year, increasing by 5% each year.
The initial term of the contract is five years. ~~The full text of the contract will be made public in the coming fortnight.~~ EDIT 2023-09-26: Here is the signed agreement.
There will be a community call on 27th September at 1 p.m. UTC (your time zone here). Please register.
An open FAQ document is available to collect questions to be answered at the webinar.
This announcement will always be accessible via Crossref DOI https://doi.org/10.13003/c23rw1d9; please use this persistent link for sharing.
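
For anyone scripting against the Labs API links above, the following Python sketch shows one way to attach your email address to the request URL. It assumes the address is passed as a `mailto` query parameter, as in Crossref's REST API; the function names are illustrative, and the sketch deliberately does not guess at the structure of the JSON that comes back.

```python
from urllib.parse import quote, urlencode

LABS_API = "https://api.labs.crossref.org"


def labs_works_url(doi, mailto):
    """URL for one work in the Labs API, identifying the caller by email."""
    query = urlencode({"mailto": mailto})
    # Keep the slash in the DOI; percent-encode anything else that needs it.
    return f"{LABS_API}/works/{quote(doi, safe='/')}?{query}"


def retraction_csv_url(mailto):
    """URL for the Retraction Watch CSV download route mentioned above."""
    return f"{LABS_API}/data/retractionwatch?{urlencode({'mailto': mailto})}"


print(labs_works_url("10.2147/CMAR.S324920", "name@email.org"))
print(retraction_csv_url("name@email.org"))
```

Replace `name@email.org` with your own address before making requests.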

About Retraction Watch and The Center for Scientific Integrity

The Center for Scientific Integrity is a U.S. 501(c)3 non-profit whose mission is to promote transparency and integrity in science and scientific publishing, and to disseminate best practices and increase efficiency in science. In addition to maintaining and curating the Retraction Watch Database, the Center is the home of Retraction Watch, a blog founded in 2010 that reports on scholarly retractions and related issues in research integrity.

About Crossref

Crossref is a global community infrastructure that makes all kinds of research objects easy to find, assess, and reuse through a number of services critical to research communications, including an open metadata API that sees over 2 billion queries every month. Crossref’s >19,000 members come from 151 countries and are predominantly university-based. Their ~150 million DOI records contribute to the collective vision of a rich and reusable open network of relationships connecting research organisations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society.

Enquiries

For Retraction Watch/Center for Scientific Integrity: Ivan Oransky, ivan@retractionwatch.com
For Crossref: Ginny Hendricks, ginny@crossref.org

A screenshot of an example Labs API metadata record with a Retraction Watch-asserted retraction

Open Funder Registry to transition into Research Organisation Registry (ROR)

Amanda French — Thu, 07 Sep 2023 00:00:00 +0000

Today, we are announcing a long-term plan to deprecate the Open Funder Registry. For some time, we have understood that there is significant overlap between the Funder Registry and the Research Organisation Registry (ROR), and funders and publishers have been asking us whether they should use Funder IDs or ROR IDs to identify funders. It has therefore become clear that merging the two registries will make workflows more efficient and less confusing for all concerned. Crossref and ROR are therefore working together to ensure that Crossref members and funders can use ROR to simplify persistent identifier integrations, to register better metadata, and to help connect research outputs to research funders.

Just yesterday, we published a summary of a recent workshop between funders and publishers on funding metadata workflows that we convened with the Dutch Research Council (NWO) and Sesame Open Science. As the report notes, “open funding metadata is arguably the next big thing” [in Open Science]. That being the case, we think this is the ideal time to strengthen our support of open funding metadata by beginning this transition to ROR.

Comparing the features of ROR and the Funder Registry

Let’s look at some of the major similarities and differences between the two registries, including their history, features, scope, and usage, since there are important nuances and distinctions that are helpful to understand.

Overview

ROR	Funder Registry
Launched in 2019	Launched in 2013
Primary use case is contributor affiliation	Primary use case is funding acknowledgement
105k+ records	35k+ records
CC0 data	CC0 data
REST API	REST API
Free to use	Free to use
Entire registry downloadable as JSON and CSV	Entire registry downloadable as RDF; funder names and IDs downloadable as CSV
Records contain mappings to other IDs	Records do not contain mappings to other IDs
Organisation relationships and hierarchy	Organisation relationships and hierarchy
8 organisation types	2 funder types, 8 funder subtypes
Open source code and multiple open-source tools available	Open source code
Web-based registry search	Web-based search for works in Crossref associated with each Funder ID
Web-based landing pages for each ROR record	JSON landing pages for each Funder Registry record
Updated monthly	Updated monthly
Public curation process	Private curation process
Anyone can request changes and additions	Anyone can request changes and additions
Stable financial support	Stable financial support
Beginning to be supported in funding and publishing workflows	Somewhat well supported in most funding and publishing workflows
Currently used by 260+ Crossref members ¹	Currently used by 2100+ Crossref members ²

History

The Open Funder Registry was launched as FundRef over a decade ago to enable the community to cite research financing and assert it within the scholarly record, acknowledging the organisations granting their support. Elsevier generously donated the seed data for the Funder Registry and has managed its curation for the last ten years, while we have maintained the technical operations and promoted community adoption of the Funder Registry.

The Research Organisation Registry (ROR) was introduced in 2019 by the California Digital Library, DataCite, and Crossref to enable the community to cite contributor affiliations and assert them within the scholarly record, acknowledging the organisations that housed or performed the research. Digital Science generously donated the seed data for the Research Organisation Registry from its Global Research Identifier Database (GRID) initiative, and Crossref, DataCite, and the California Digital Library have contributed labor and resources to turn ROR into a mature, independent, freely available offering.

Scope

One key difference between the registries is that ROR has always included funding organisations, and ROR records have always included mappings to Funder IDs where available, while the reverse is not true: the Funder Registry includes only funding organisations, not other kinds of organisations, and Funder Registry records do not currently include mappings to ROR IDs or other identifiers. It therefore makes sense to expand ROR’s initial contributor affiliation use case to include the function of identifying research financing.

Usage

More Crossref members use Funder IDs than use ROR IDs, to be sure. You can see from the table above that the number of Crossref members using Funder IDs in Crossref records (2100+) is roughly eight times the number of Crossref members using ROR IDs in Crossref records (260+). But note too that the current rate of adoption is far higher for ROR than it is for the Funder Registry. Since January of 2022, we’ve seen a gratifying number of publishers and service providers beginning to use ROR identifiers for contributor affiliations in Crossref. In the last year, the number of Crossref members depositing ROR IDs has increased by 356%, while the number depositing Funder IDs has increased only by 12%. As evidenced by its ballooning API traffic, too, with more than 20 million requests last month,³ ROR is clearly being used by many scholarly research systems for many purposes. The more systems that use an identifier, the more valuable that identifier becomes as a vehicle for exchanging information.

Even though ROR’s primary use case has been to identify contributor affiliations, ROR is in fact already being used by funders. Nineteen funding organisations are depositing ROR IDs in their grant records with Crossref to denote principal investigator affiliations,⁴ and, following a meeting of the Crossref Funder Advisory Group last month, all eighty funder members are primed to start using ROR IDs to identify themselves in grant records. DataCite has allowed ROR IDs as a funding identifier since 2019⁵, and while there are currently over 877,000 DataCite records that use Funder IDs to identify funders,⁶ there are also over 161,000 DataCite records that use ROR IDs to identify funders.⁷

Tools and services

Both the Funder Registry and ROR offer open data and open source code, but we think that ROR’s suite of free and open source utilities (some of which were developed by Crossref staff) gives it a competitive advantage. We know that publishers and their service providers have ongoing challenges in collecting and matching funding information from authors and in validating Funder IDs. With ROR’s extensive toolkit, publishers and their technology providers who adopt ROR will be in a much better position to improve the accuracy of funding acknowledgements in metadata, which can in turn enable the development of reliable analytics, tools, and services for funders, regulators, research facilities, and the public.

Crossref has built tools based on OpenRefine for both the Funder Registry and ROR: the Open Funder Registry Reconciliation Service and the ROR Reconciler are both useful ways to clean messy data. ROR, however, also offers a much-used API endpoint that helps match organisation names to ROR IDs, and several third parties have also developed and shared open source matching tools and services for ROR. Crossref and ROR are also collaborating on new strategies for affiliation matching that will be able to match funding references.
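
As a hedged illustration of the matching endpoint, the Python sketch below builds a query using the ROR API's `affiliation` parameter and picks the match the API marks as chosen. The sample response is a hypothetical, trimmed-down body with a placeholder ROR ID; real responses carry more fields, and the helper names are our own.

```python
import json
from urllib.parse import urlencode

ROR_API = "https://api.ror.org/organizations"


def affiliation_query_url(affiliation_text):
    """Build an affiliation-matching query for a free-text organisation name."""
    return ROR_API + "?" + urlencode({"affiliation": affiliation_text})


def chosen_match(response_text):
    """Return the ROR ID of the item flagged as chosen, or None."""
    data = json.loads(response_text)
    for item in data.get("items", []):
        if item.get("chosen"):
            return item["organization"]["id"]
    return None


# Hypothetical response: the ID is a placeholder, not a real ROR record.
SAMPLE = json.dumps({
    "items": [
        {"score": 0.4, "chosen": False,
         "organization": {"id": "https://ror.org/placeholder-b"}},
        {"score": 1.0, "chosen": True,
         "organization": {"id": "https://ror.org/placeholder-a"}},
    ]
})

print(affiliation_query_url("Dept. of Physics, Example University"))
print(chosen_match(SAMPLE))
```

Returning `None` when nothing is flagged as chosen leaves ambiguous names for a human curator instead of guessing.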

Community engagement models

The Funder Registry has been curated for over a decade through time and expertise generously donated by Elsevier. ROR offers more transparency and community involvement; it is openly governed by Crossref, DataCite, and the California Digital Library and is advised by a global network of community stakeholders through its Steering Group and Community Advisory Group. ROR is openly curated and is aided by a global Curation Advisory Board of volunteers.

Summary

For all of the above reasons, then, we believe that in the long term ROR will serve the community better as an identifier for funders. In a future post, we’ll do an even deeper dive into comparing the Funder Registry and ROR, comparing the metadata and data in each registry and giving statistics on funder assertions in our metadata.

What will this mean for you?

The many organisations whose tools, services, and workflows have been architected to use Funder Registry IDs will find this transition a challenge, and we don’t want to make light of that issue. Over the last ten years, we have encouraged the community to adopt Funder IDs, and the community has demonstrably recognized the benefits of doing so. Publishers have put a great deal of time, thought, and effort into collecting funder data and including it in Crossref metadata, and they have built internal reports and workflows around the Funder Registry. Both Crossref and ROR are committed to making the transition from the Funder Registry to the Research Organisation Registry as simple as possible for those who have adopted the Funder Registry.

If you are not already using the Funder Registry and are planning to begin standardizing funding data, we recommend that you use ROR to identify funders. If you are currently using the Funder Registry in your systems and workflows, don’t worry! In the short term, and even in the medium term, Funder IDs aren’t going away. Eventually, however, the Funder Registry will cease to be updated, so any new funders will only be registrable in Crossref metadata with ROR IDs. Legacy Funder IDs and their mapping to ROR IDs will be maintained, so if Crossref members submit a legacy Funder ID, it will get mapped to a ROR ID automatically. Note, too, that Crossref is committed to maintaining the current funder API endpoints until ROR IDs become the predominant identifier for newly registered content.
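As a rough illustration of the automatic mapping described above, the lookup could work along these lines. This is a minimal sketch, not Crossref's implementation: the table contents, the function name `to_ror_id`, and the placeholder identifiers are all assumptions made for the example. The only detail taken from the real registries is that Funder IDs are DOIs under the 10.13039 prefix and that ROR IDs are ror.org URLs.

```python
from typing import Optional

# Hypothetical lookup table: both IDs are placeholders, not real registry entries.
LEGACY_FUNDER_TO_ROR = {
    "10.13039/000000000001": "https://ror.org/00placeholder1",
}

# Common URL forms in which a DOI-based Funder ID may be submitted.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def to_ror_id(identifier: str) -> Optional[str]:
    """Return a ROR ID for a funder identifier, or None if it cannot be mapped."""
    value = identifier.strip()
    if value.startswith("https://ror.org/"):
        return value  # already a ROR ID, nothing to map
    # Reduce a Funder ID given as a URL to its bare DOI before the lookup.
    lowered = value.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    return LEGACY_FUNDER_TO_ROR.get(value)
```

A caller would keep the submitted Funder ID alongside the returned ROR ID, and treat a `None` result as a funder that still needs manual reconciliation.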

In short, if you are already using Funder IDs, you can and should continue to do so. However, we do recommend that you begin looking at what it will take to integrate ROR into your systems and workflows for identifying funders. Think of it as warming up before a workout: it’s time to start swinging your arms and stretching your hamstrings.

We face challenges in this transition, too. Of these, we think the largest will be (1) completing the reconciliation work involved in mapping Funder IDs to ROR IDs, and (2) overhauling Crossref’s schemas, APIs, and deposit tools to support ROR IDs in all the ways we currently support Funder IDs. We’ll discuss both of these challenges in future blog posts, but it’s worth saying that any challenges pale in comparison to the benefit of enabling the whole community to use a single open identifier in multiple places in the scholarly record.

Tell us what you need!

We want to hear from you. You can use our Community Forum to talk to us about the Crossref Funder Registry, and you can join the ROR Slack to talk to the ROR team and community. You can also contact Crossref via our request form or email ROR at info@ror.org, and you can attend online Crossref events and ROR events to get updates from us and ask us your questions.

One of the major messages we’re already hearing from funders and publishers is expressed in yesterday’s post on open funding metadata: “While many concluded that there was still a long way to go to solve the many technical challenges related to funding metadata, attendees were unanimous on its importance.” We look forward to beginning this important work together.

Open funding metadata through Crossref; a workshop to discuss challenges and improving workflows

Hans de Jonge — Wed, 06 Sep 2023 00:00:00 +0000

Ten years on from the launch of the Open Funder Registry (OFR, formerly FundRef), there is renewed interest in the potential of openly available funding metadata through Crossref. And with that: calls to improve the quality and completeness of that data. Currently, about 25% of Crossref records contain some kind of funding information. Over the years, this figure has grown steadily. A number of recent publications have shown, however, that there is considerable variation in the extent to which publishers deposit these data to Crossref. Technical but also business issues seem to lie at the root of this. Crossref - in close collaboration with the Dutch Research Council NWO and Sesame Open Science - brought together a group of 26 organisations from across the ecosystem to discuss the barriers and possible solutions. This blog presents some anonymized lessons learned.

There is no Open Science without open metadata

The interest in the potential of this open-source funding metadata seems to be entering a new stage. When registering (or updating) a DOI record for a publication, publishers can include information about the funding of the research. The Open Funder Registry grew out of recommendations in the report from the US Scholarly Publishing Roundtable in 2010. During the Annual Meeting of Crossref that year, Frederick Dylla, CEO of the American Institute of Physics, argued that in order to make research funding information in publications accessible, it needed to be presented in a standard way and stored in a central location.

The benefits of having open funding metadata available, listed by Dylla in his presentation 13 years ago, are still very valid:

Researchers benefit because it increases transparency of their funding sources and supports the requirements they already have from their funders.
For funders, having this data available is essential because it allows them to identify the published outcomes of publicly funded research. This is essential for monitoring compliance with open access policies, and also important given the pressures funders face to account for their spending of public money.
For publishers, funding metadata provides a valuable service, as it provides insight into how the research they publish is funded.

Although Crossref has been collating funding metadata for many years, there seems to be a renewed interest in this service. Publishers have long expressed a desire to solve the challenges; meta-researchers need this information in order to analyze research on research; editors are concerned with research integrity, including funding trends; and funders themselves need to track the reach and return of their support.

Open Science seems to be an important driver: As we move to an ecosystem built on Open Science principles, not only publications, data, and software need to be openly available, but also the metadata associated with those scholarly outputs. Indeed, in an Open Science world, all meta information should be open, and academia should not be dependent anymore on data from proprietary bibliographic databases. Indicators for research assessment and policy development should be open indicators, derived from open metadata. Much has been done in this area already, in the context of Open Citations and Open Abstracts. While many in the community have focused on the bigger picture of advocating for all open metadata, e.g. Metadata 20/20, open funding metadata is arguably the next big thing. Open Research Information, including open metadata, must be a strategic priority for science and society.

Room for improvement

After ten years of collecting funding metadata, 25% of records in Crossref contain some kind of funding information, a figure reached through steady growth over that time. A number of recent studies have shown, however, that there is room for improvement. A case study published by two of the present authors has shown that the extent to which publishers deposit funding information to Crossref varies considerably. Some larger society presses - American Chemical Society (ACS), American Physical Society (APS), and Royal Society of Chemistry (RSC) - perform exceptionally well, with almost 100% of publications containing funding information. But there is still a large number of publishers - among them large legacy publishers - that attain substantially lower figures or do not seem to deposit funding metadata at all. Our case study has shown that this often cannot be explained by authors not having provided any funding information, as the information is frequently available in the acknowledgement sections of the papers. Somehow, however, this data does not find its way to Crossref.

Workflows and challenges: collect, retain, validate, deposit

In order to chart the challenges that publishers face when collecting this information, we organized a roundtable session. 26 organisations were invited from across the ecosystem. These included: major publishers (American Chemical Society, British Medical Journal, Elsevier, IOP Publishing, PLOS, Royal Society of Chemistry, Sage, Springer Nature, Taylor & Francis, and Wiley), funders (European Research Council, Austrian Research Council, Dutch Research Council, OSTI-DOE, UKRI, and Michael J Fox Foundation) as well as service providers (Aries Editorial Manager, PKP / OJS, Scholastica, and eJournal Press).

In order to map the potential barriers and challenges publishers face, participants were presented with a workflow scheme representing a hypothetical production process.

This workflow outlined the steps in the production process at which funder information would potentially be handled, as well as some of the considerations that might be at play at each step.

collecting funder information (upon submission or acceptance)
extracting funder information from full text
retaining funder information through the production workflow
including funder information in article metadata
making metadata and/or full text available for indexing

Participants were invited to comment on this workflow and place digital dots in the scheme to identify challenges in the collection, retention, and deposit of funding information. These pain points were afterwards fleshed out in break-out groups.

Lessons learned

1. Still a lack of awareness among editors and authors

For many journals and publishers, collecting funding information starts when papers are submitted through submission systems. Many publishers use the same systems, ScholarOne and Editorial Manager, though many have multiple systems in place for different portfolios of journals. Around 25,000 journals use PKP’s Open Journal Systems, and Scholastica and eJournal Press are growing in popularity and importance. All of them allow authors to enter funder information, but this by no means implies that all journals make use of it. Submission systems are highly customizable, and publishers tend to tailor systems to the needs and wishes of their journals. Editors who do not see much value in collecting funding metadata therefore present a first ‘weak link’. Publishers and tech providers agreed that more outreach is needed about the importance of funding metadata among editors and authors.

2. Improvements are needed in submission systems

Where journals and publishers agree on asking authors to register funding information through the submission systems, many express a tension between collecting structured metadata and making it as easy as possible for authors. Many are hesitant to use mandatory input fields. Instead, funding metadata is often collected as free text, giving rise to a plethora of ambiguities. Most systems suggest funder names from the Open Funder Registry based on what the author types. A lot seems to go wrong at this stage. Authors often persist in the wrong spelling of their funder and do not choose predefined suggestions, making it very difficult to match input to Funder IDs. Publishers estimated that up to 50% of entries are not matched. Trivial differences like “Bill & Melinda” versus “Bill and Melinda” or “Netherlands organisation” versus “Netherlands Organisation” result in errors. Here, autocomplete techniques seem to be in dire need of improvement. Based on a preliminary analysis of funder name variants used in Crossref, adding up to 3 of the most frequently used name variants to the list of ‘alternative funder names’ in the Funder Registry could solve around 60% of missed matches.
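To make the kind of mismatch described above concrete, here is a minimal sketch of how free-text input could be normalized before it is compared against preferred and alternative funder names. It is an illustration only, not how any submission system or the Funder Registry actually matches names: the function names, the registry structure, and the placeholder IDs are assumptions for the example.

```python
import re
import unicodedata


def normalize_funder_name(name: str) -> str:
    """Reduce a free-text funder name to a comparison key."""
    # Fold accents and case so that spelling variants compare equal.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.casefold()
    # Treat "&" and "and" as the same token.
    text = text.replace("&", " and ")
    # Replace punctuation with spaces, then collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def build_index(registry):
    """Map every normalized preferred or alternative name to its funder ID."""
    index = {}
    for funder_id, names in registry.items():
        for name in names:
            index[normalize_funder_name(name)] = funder_id
    return index


# Hypothetical registry extract: the keys are placeholders, not real Funder IDs.
REGISTRY = {
    "funder-A": ["Bill & Melinda Gates Foundation"],
    "funder-B": ["Netherlands Organisation for Scientific Research", "NWO"],
}
INDEX = build_index(REGISTRY)


def match(author_input: str):
    """Return the funder ID for the author's input, or None if unmatched."""
    return INDEX.get(normalize_funder_name(author_input))
```

With this index, “Bill and Melinda Gates Foundation” and “netherlands organisation for scientific research” both resolve to an ID, and each extra name variant added to a funder's list widens what can be matched without any fuzzy logic.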

3. A lot can be learned from how some publishers have changed and organized their workflows

Faced with these issues, the Royal Society of Chemistry has invested in innovative workflows to enhance the availability of funding metadata, and presented to the group the details of how they have tackled the issue. Instead of relying solely on the free-text input of the authors, they also work with third-party production vendors to programmatically extract information from the acknowledgement section of papers. Data from the two sources are compared, and when differences or conflicts are noted, the data is fixed, completed, and reformatted. The next step is crucial - the newly-cleansed funding data is fed back to the author for validation, and retained during the production phase of the paper. Implementation of this validation stage has increased the availability of funding metadata by 30%. In 2023, 80% of papers published by RSC have some kind of structured funding metadata. An additional benefit of this feedback loop was its educational effect: it alerts authors to the importance of correct funding information. But even RSC continues to struggle with issues of funder name ambiguity, use of acronyms, authors reporting grant or award names instead of funder names, issues with phraseology of funding acknowledgements, and frustrations with the user experience of the service provider integrations with the OFR.

Many publishers agreed that collecting funding information from full-text papers is the preferred option, not only because it lowers the burden for authors, but also because it potentially yields better data: this is where authors are expected to include the information as part of their funder’s commitments.

4. Retaining information and submitting: no big deal

At the beginning of the workshop, it was expected that the retention of funding information and its propagation through various interlinked systems might pose problems for publishers. However, participants did not identify this as a problem. Nor was there mention of any challenges in depositing information to Crossref, or of downstream databases having difficulties retrieving the metadata.

5. There is a genuine interest across the ecosystem to improve funding information in Crossref

While many concluded that there was still a long way to go to solve the many technical challenges related to funding metadata, attendees were unanimous on its importance. Participants agreed that these improvements would require investments from publishers. A willingness to make those investments was expressed, along with a sense that publishers who do so should be incentivised for it, maybe as part of the agreements they have with library consortia. JISC’s recent contract with Taylor & Francis (page 164, Section 7a (iii)) is a good example of how consortia can successfully negotiate the supply of high quality metadata, including funding metadata. It was agreed that another solution could be to allow the additional deposit of the free-text acknowledgement section as a metadata field in Crossref. Instead of educating authors to enter their data correctly or relying on publishers and tech providers to improve their systems to turn free-text funder acknowledgements into structured data, text mining and machine learning could facilitate the improvement of this data.

Next steps

For this workshop, we concentrated on the collection and registration of funding metadata by publishers and did not go into the important, related, issue of the Crossref Grant Linking System (Grant IDs) nor of the plans to further align funder IDs with ROR IDs, both projects that help the community to better record funding information.

Next steps resulting from this community workshop:

Funders are encouraged to join and register their grants with Crossref DOIs so that registered grants can in future be linked directly to publications and other outputs. About 50 funders have already created around 90,000 grant records. The more grant DOIs that are created by funders, the more likely publishers will be able to prioritize collecting them in their own publication metadata.
Publishers are encouraged to work with their service providers to prioritize the quality of the open funding metadata through Crossref, which is a source for downstream analyses and inclusion by many thousands of tools and services.
Other stakeholders are also offering opportunities to focus on funding metadata, showing a growing interest in the completeness of funder metadata. Examples include OA Switchboard’s funder pilot, which also looks at the potential to feed enriched metadata back to Crossref to make them publicly available, and the Open Research Funders Group’s work to promote the improvement of tracking research output, including funding metadata, which includes an active working group in this area.
Crossref will continue to work with publishers and service providers to encourage and make it easier to include funder information in article metadata, including the use of grant identifiers and funder identifiers. Work is underway to bring the Open Funder Registry closer to ROR (Research Organisation Registry), and Crossref is planning, at some point in the future, to merge the OFR into ROR, as ROR has a much wider scope and is more broadly community-governed. Crossref has also begun some work on collecting ROR IDs where we currently collect Funder IDs. More technical information is available in this ticket.

We would like to thank all the participants of the workshop for their openness and commitment to working through these issues together. It was a rare opportunity to share insights from publishers, service providers, funders, and researchers - and a useful first step in co-creating a shared understanding of the challenges and charting a path forward.

Perspectives: My thoughts on starting my new role at Crossref

Johanssen Obanda — Thu, 06 Jul 2023 00:00:00 +0000

My name is Johanssen Obanda. I joined Crossref in February 2023 as a Community Engagement Manager to look after the Ambassadors program and help with other outreach activities. I work remotely from Kenya, where there is an increasing interest in improving the exposure of scholarship by Kenyan researchers and ultimately by the wider community of African researchers. In this blog, I’m sharing the experience and insights of my first 4 months in this role.

Right before joining Crossref, I was working as Stakeholder Manager with AfricArXiv, a community-led digital archive for African research communication. I transitioned to working with Crossref to take up a more challenging role, so I can apply the community-building and social innovation skills I gained over the last five years in my profession.

What surprised me the most here is realising that such a robust infrastructure is being administered by a relatively small team. I wondered how the team keeps the services running and builds new solutions for the community. However, I am impressed by the collaborative culture, positive and healthy work environment, and great systems.

I work within the Community Engagement and Communications team, where we collaboratively address members’ questions and challenges, plan events, create helpful content for our community and keep in touch with them. We help grow our community and create a better experience using our products and services.

My main focus has been the Ambassador programme, which started in 2018 and currently comprises 48 Ambassadors globally. The Ambassadors are our trusted contacts who support and engage our communities locally to make scholarly communications better. Through one-on-one virtual interaction with most of them, I noted that there was little interaction among the Ambassadors. Most of our Ambassadors want to connect more, both face-to-face and online. In the coming months, we aim to design our meetings together with the Ambassadors to encourage better exchange and relationships.

I value Crossref’s insistence on diversity, equity and inclusion, and I enjoy contributing to those activities. Working with my colleagues in the outreach team to organise webinars and activities for the Global Equitable Membership (GEM) programme has been an exciting experience. I particularly enjoyed engaging with our Ambassadors Shaharima Parvin and Jahangir Alam from Bangladesh, and Binayak Pandey from Nepal, in organising the initial webinars for the GEM program in their countries. I feel it is one of the ways of creating more in-depth connections between our communities and our Ambassadors while making it possible for more institutions to be part of Crossref and contribute to scholarly communication.

I have made a few webinar presentations online and recently did one in-person poster presentation in South Africa at the Sustainability, Research and Innovation conference. I gained more confidence interacting with the wider Crossref community and a deeper understanding of Crossref’s services. I look forward to more opportunities to discuss Crossref’s mission with the community and to collaborate with like-minded organisations, contributing to joint initiatives, such as the upcoming Better Together webinar series with ORCID and DataCite, and the Forum for Open Research in MENA events.

I experienced the challenges of working remotely in many ways. On a couple of days there was no power; on other days the internet connection was painfully slow. From time to time I had to hop from one restaurant to another in the hope of finding somewhere quiet enough for a good meeting with my colleagues, until I had a more dependable workstation. On the positive side, coordinating meeting times with colleagues, taking on tasks asynchronously and collaborating in real-time across different tools are making me more agile, patient and empathetic with myself and my colleagues.

I am driven by the impact I want to contribute in my career working with Crossref, which is to build an inclusive research ecosystem where researchers across the globe can easily access scientific knowledge and make meaningful connections. And I feel confident about my colleagues, our systems and infrastructure and my capabilities to be part of a thriving community and organisation.

A Request for Comment - Automatic Digital Preservation and Self-Healing DOIs

Martin Eve — Thu, 29 Jun 2023 00:00:00 +0000

Digital preservation is crucial to the “persistence” of persistent identifiers. Without a reliable archival solution, if a Crossref member ceases operations or there is a technical disaster, the identifier will no longer resolve. This is why the Crossref member terms insist that publishers make best efforts to ensure deposit in a reputable archive service. This means that, if there is a system failure, the DOI will continue to resolve and the content will remain accessible. This is how we protect the integrity of the scholarly record.

I will write another post, soon, on the reality of preservation of items with a Crossref DOI, but recent work in the Labs team has determined that we have a situation of drastic under-preservation of much scholarly material that has been assigned a persistent identifier. In particular, content from our smaller Crossref members, with limited financial resources, is often precariously preserved. Further, DOI URLs are not always updated, even when, for instance, the underlying domain has been registered by a different third party. This results in DOIs pointing to new, hijacked, and elapsed content that does not reflect the metadata that we hold.

We (Geoffrey) have (has) long-harboured ambitions to build a system that would allow for automatic deposit into an archive and then to present access options to the resolving user. This would ensure that all Crossref content had at least one archival solution backing it and greatly contribute to the improved persistent resolvability of our DOIs. We refer to this, internally, as “Project Op Cit”. And we’re now in a position to begin building it.

However, we need to get this right from the design phase out. We need input from librarians working in the digital preservation space. We need input from members on whether they would use such a service. We are not digital preservation experts and we are acutely aware that we need the expertise of those who are, particularly where we’ve had to take some shortcuts. For instance: we are aware that the Internet Archive is perhaps not the first choice of many digital preservation librarians and specialists, who opt for specific scholarly-communications solutions. However, it is easy, open, and free. Hence, we propose for the prototype to use IA, on the assumption that this will be a proof-of-concept only, which we will expand to other archives if there is demand and once it works.

So: please do read the below and add your comments and questions to this thread in the community forum (link below), or send me queries/concerns by email. It would be excellent if we could receive comments by mid-August 2023. If you would rather comment on a Google doc, that’s also possible.

If enough people are interested, we could also host a community call to discuss this design and its prototyping. Do please, when emailing, let me know if this is of interest.

Project Op Cit (Self-Healing DOIs)

Request for Comment

This document sets out the problem statement, a proposed prototype solution, and a transition path to production if successful.

Proposed Prototype Solution

For members who opt in to the service, we have a special class of DOI (only for open-access content) where, when the DOI is registered:

We immediately make an archive of the item with any archiving services that care to participate in the project (minimally, the Internet Archive, which is the easiest for us to begin with, but the archival system will be modular/pluggable). The Internet Archive Python Library should let us submit to them. We could pursue other arrangements with CLOCKSS, LOCKSS, and Portico.
We update the XML to reflect the archives to which it has been submitted.
The DOI landing page is redirected to an interstitial page that we control. This page gives the user access options.
We develop processes to determine whether the original URL “works”. The heuristics that define whether a resource has changed substantially or works need long-term consideration and real-world testing. Using the interstitial page approach will allow us to refine this, with a long-term goal of eradicating it.

Figure 1: The Deposit Process

Figure 2: The Resolution Process

Potential Challenges

Content drift. It would be extremely difficult to distinguish a change in content from (e.g.) a change in page structure, except in the case of binary fulltext. However, we can poll for the DOI at an HTML endpoint and detect when binary fulltext items, such as a PDF, change.
Latency on resolver if lookup is real-time. For this reason, we need a periodic crawler so that resolvers do not wait for real-time detection on access.
If using Internet Archive, the domain owner (at the present moment) can request the removal of content. We would need the capacity to “lock” records that are being used as Op Cit redirection archival copies. This requires a further conversation with the Internet Archive.
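The binary-fulltext case mentioned under content drift is the one part of the problem that reduces to a simple comparison. A minimal sketch, assuming a SHA-256 digest is stored at deposit time (the proposal does not name a hash algorithm, and the function names here are hypothetical):

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hex digest that identifies the exact bytes of a fulltext file."""
    return hashlib.sha256(content).hexdigest()


def has_drifted(stored_hash: str, current_content: bytes) -> bool:
    """True if the bytes fetched now differ from those hashed at deposit."""
    return fingerprint(current_content) != stored_hash
```

Any byte-level difference flips the result, so a publisher re-generating a PDF with identical text would also count as drift. That is one reason the heuristics need real-world testing before they drive redirection.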

Prototype Components/Architecture

Registration Proxy and Database (“Fleming”)

The registration proxy implements a pass-through to the deposit API and hosts a relational database of self-healing DOIs (Postgres). It will be hosted at api.labs.crossref.org/deposit/opcit and clients will have to use this endpoint to deposit. Simultaneously, the proxy will:

Determine the license status of the incoming item.
If the license is open and fulltext is provided, deposit a copy in selected digital preservation archives. Store proof of licensing attestation.
In the case of binary files (fulltext PDF), store a hash of the content.
Store the DOI, binary hash, and all URLs in a relational database under “pending” state.
Pass through the request to Crossref’s content registration system.
Monitor the result of this request and remove stored data if registration fails.
Re-registration through Fleming will update existing entries and re-fix their data against content drift at this time.
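The bookkeeping part of the steps above can be sketched as follows. This is an illustration under stated assumptions rather than the Fleming design: it uses an in-memory SQLite database as a stand-in for Postgres, it skips the license check and the archive deposit, and the table layout, function names, and the `register` callback are all hypothetical.

```python
import hashlib
import json
import sqlite3


def open_db():
    """Create an in-memory stand-in for the self-healing DOI database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE entries ("
        " doi TEXT PRIMARY KEY, content_hash TEXT, urls TEXT, state TEXT)"
    )
    return db


def store_pending(db, doi, urls, fulltext=None):
    """Record the DOI, the fulltext hash (if any), and all URLs as 'pending'."""
    content_hash = None
    if fulltext is not None:
        content_hash = hashlib.sha256(fulltext).hexdigest()
    # INSERT OR REPLACE lets a re-registration refresh the stored hash and URLs.
    db.execute(
        "INSERT OR REPLACE INTO entries VALUES (?, ?, ?, 'pending')",
        (doi, content_hash, json.dumps(urls)),
    )


def deposit(db, doi, urls, fulltext, register):
    """Store the entry, pass the request through, and clean up on failure."""
    store_pending(db, doi, urls, fulltext)
    succeeded = register(doi)  # placeholder for the real pass-through call
    if not succeeded:
        db.execute("DELETE FROM entries WHERE doi = ?", (doi,))
    return succeeded


def state_of(db, doi):
    """Return the stored state for a DOI, or None if there is no entry."""
    row = db.execute(
        "SELECT state FROM entries WHERE doi = ?", (doi,)
    ).fetchone()
    return row[0] if row else None
```

Writing the row before the pass-through and deleting it on failure mirrors the order of the steps in the list. Promotion from "pending" to "active" is left to the spider.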

Spider (“Shelob”)

A series of components that:

Check that “pending” DOIs have been successfully registered. Remove those that have not and move those that have to “active” state.
Dereference “active” DOIs and ensure that we have the most current URL in case updates have gone directly to the live resolver.
Periodically crawl URLs in the self-healing database.
On HTTP 301 code, update database entry to point to new permanent URL.
On HTTP 302 code, follow the temporary redirect expecting the original content.
On HTTP 4xx codes, mark the entry as dead.
On HTTP 200 code of HTML landing page, parse the page for the presence of the DOI. If the DOI is not present, mark the entry as dead.
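The status-code rules above can be sketched as a single decision function. This is a minimal illustration, not the Shelob implementation: the `Entry` class and `apply_crawl_result` are hypothetical names, the HTTP fetch itself is left out, and checking for the DOI with a case-insensitive substring search is an assumption (DOIs are case-insensitive, but a real parser would look at the page structure).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    doi: str
    url: str
    state: str = "active"


def apply_crawl_result(
    entry: Entry,
    status: int,
    location: Optional[str] = None,
    body: str = "",
) -> Optional[str]:
    """Update the entry for one crawl response.

    Returns a URL the crawler should fetch next, or None if there is none.
    """
    if status == 301 and location:
        # Permanent redirect: remember the new URL and check it next.
        entry.url = location
        return location
    if status == 302 and location:
        # Temporary redirect: follow it, but keep the stored URL unchanged.
        return location
    if 400 <= status < 500:
        entry.state = "dead"
        return None
    if status == 200 and entry.doi.lower() not in body.lower():
        # The landing page loads but no longer mentions the DOI.
        entry.state = "dead"
    return None
```

Responses the list does not cover, such as 5xx errors, leave the entry untouched in this sketch, on the assumption that a server error is more likely to be transient than a sign that the content has gone.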

Resolver Proxy (“Hippocrates”)

Display an interstitial landing page with archival versions and an explanation.
At some future point, for active entries, resolve to the stored URL (faster but could be de-synced) or pass the request to the live resolver (requires an extra hop but will always be in-sync with deposit).
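The future behaviour described in the second point involves a choice between two strategies for active entries. A minimal sketch of that choice, assuming entries are plain dictionaries and using hypothetical action labels (none of these names come from the proposal):

```python
def choose_response(entry, prefer_stored_url=True):
    """Decide how the resolver proxy answers a request for one DOI.

    Returns an (action, target) pair. The action labels are illustrative.
    """
    if entry["state"] != "active":
        # Dead or unverified entries get the interstitial page with archives.
        return ("interstitial", entry["archive_urls"])
    if prefer_stored_url:
        # Faster, but the stored URL could lag behind a recent deposit.
        return ("redirect", entry["url"])
    # One extra hop, but always in sync with what was deposited.
    return ("live_resolver", entry["doi"])
```

In the prototype as proposed, every request would take the interstitial branch. The flag only matters once the failure heuristics are trusted enough to skip that page for healthy entries.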

Observability and statistics

Metrics we will collect:

Count of DOIs using Op Cit
Count of visitors arriving on Op Cit landing pages
Usage count of each outgoing link/access option

A daily report will present:

Newly “failed” entries that we believe have died
These will be checked extensively, particularly at first, to ascertain whether our failure heuristics are valid
Entries that have recovered

Errors will be logged and monitored via Grafana.

Documentation and Automated Tests

Core assumptions and new behaviours of the platform will be documented as part of the prototype.
Automated tests will be written, especially for the spider (“Shelob”), which must handle a diverse variety of real-world situations.

Prototype Architecture Requirements

Postgres RDS for resolution/self-healing DOI data (AWS).
FastAPI hosting for passthrough proxy (fly.io).
EC2 hosting for the spider (AWS).
FastAPI hosting for resolver proxy (fly.io).

Transition to Production

If this prototype garners popular appeal, a transition to production would need to keep some prototype components and rewrite others.

“Fleming” would need to be rewritten as a deposit module or integrated with the deposit process of Manifold (the next-generation system at Crossref). If this would create too much overhead, it need not be a blocking process in the deposit.
“Shelob” would continue to need to run continuously and to scale with the adoption of self-healing DOIs unless one of the other options were used.
Prototype architecture will be written so that spidering can be distributed between several servers, if required.
“Hippocrates” would need to be integrated into the live link resolver. Depending on how a field for a self-healing DOI is embedded in Manifold, this may not need any additional database hits.

Back Content

We also have a database of back content stored by the Internet Archive, mapped to DOIs where they have been able to do so. This data source could be used to enable self-healing DOIs on all content in this archive.

Crossref Research and Development: Releasing our Tools from the Ground Up

Martin Eve — Wed, 21 Jun 2023 00:00:00 +0000

This is the first post in a series designed to showcase what we do in the Crossref R&D group, also known as Crossref Labs, which over the last few years has been strengthened, first with Dominika Tkaczyk and Esha Datta, last year with part of Paul Davis’s time, and more recently, yours truly. Research and development are, obviously, crucial for any organisation that doesn’t want to stand still. The R&D group builds prototypes, experimental solutions, and data-mining applications that can help us to understand our member base, in the service of future evolution of the organisation. One of the strategic pillars of Crossref is that we want to contribute to an environment in which the scholarly research community identifies shared problems and co-creates solutions for broad benefit. We do this in all teams through research and engagement with our expanding community.

For example, if the metadata team wants to implement a new field in our schema, it helps to have a prototype to show to members. The Labs team would implement such a prototype. If we want to know the answer to a question about the 150m or so metadata records we have – e.g. how many DOIs are duplicates? – it’s the Labs team that will work on this.

Such prototypes can often seem esoteric and one-off, so it can be easy to believe that nobody else would ever re-use our components. At the same time, we find ourselves consistently working with the same infrastructures, re-using different code blocks across many applications. One of the tasks I have been working on is to extract these duplicated functions and to get them into external code libraries.

Why is this important? As many readers doubtless know, Crossref is committed to The Principles of Open Scholarly Infrastructure. For reasons of insurance, everything we do and newly develop is open source and we want our members to be able to re-use the software that we create. It’s also important because, if we centralize these low-level building blocks, we make it much easier to fix bugs when they occur, which would otherwise be distributed across all of our projects.

As a result, Crossref Labs has a series of small code libraries that we have released for various service interactions. We often find ourselves needing to interact with AWS services. Indeed, Crossref’s live systems are in the process of transitioning to running in the cloud, rather than our own data centre. It makes sense, therefore, for prototype Labs systems to run on this infrastructure, too. However, the boto3 library is not terribly Pythonic. As a result, many of our low-level tools wrap our interactions with AWS. These include:

CLAWS: the Crossref Labs Amazon Web Services toolkit. The CLAWS library gives speedy and Pythonic access to functions that we use again and again. This includes downloading files from and pushing data to S3 buckets (often in parallel/asynchronously), fetching secrets from AWS Secrets Manager, generating pre-signed URLs, and more.
Longsight: A range of common logging functions for the observability of Python AWS cloud applications. Less mature than CLAWS, this is the starting point for observability across Labs applications. It supports running in AWS Lambda function contexts or pushing your logs to AWS Cloudwatch from anywhere else. It also supports logging metrics in structured forms. Crucially, the logs are all converted into machine-readable JSON format. This allows us to export the metrics into Grafana dashboards to visualize failure and performance.
Distrunner: decentralized data processing on AWS services. Easily the least mature and most experimental of these libraries, distrunner is one of the ways that we distribute the workloads of our recurrent data processing. A number of the Labs projects require us to run recurrent data-processing tasks. For instance, my colleague Dominika Tkaczyk has developed the sampling framework that is regenerated once per week. We use Apache Airflow (and, specifically, Amazon Managed Workflows for Apache Airflow) to host these periodic tasks. This is useful because it gives us quick, visual oversight if tasks fail. However, the Airflow worker instances on AWS are quite severely underpowered and unsuitable for large in-memory activities. Hence, the sampling framework fires up a Spark instance for its processing. Often, though, we do not need the parallelization of Spark and just want to be able to run a generic Python script in a more powerful environment. That’s what distrunner is designed to do. The current version uses Coiled but this may change in the future.
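To make the idea of machine-readable JSON logs (as described for Longsight above) more concrete, here is a minimal sketch using only Python’s standard logging module. It does not use Longsight’s actual API: the JsonFormatter class, the build_logger helper, and the "metrics" field name are all hypothetical and chosen purely for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "metrics" is an assumed field name, supplied via logging's `extra=`.
        metrics = getattr(record, "metrics", None)
        if metrics is not None:
            payload["metrics"] = metrics
        return json.dumps(payload, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("labs-demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Write to an in-memory buffer so the example runs anywhere.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("sample regenerated", extra={"metrics": {"records": 5000, "seconds": 12.4}})
print(buffer.getvalue().strip())
```

Because every line is a self-contained JSON object, a downstream tool can parse the metrics without any text scraping, which is the property that makes dashboards straightforward to build.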

While these tools will be useful to nobody except programmers – and this has been quite a technical post – there is a broader philosophical point to be made about this approach, in which everything is available for re-use, “from the ground up”. The point is: we also try, in Labs and in the process of “R&Ding”, to work without privileged access. That is: I don’t get “inside” access to a database that isn’t accessible to external users. I have to work with the same APIs and systems as would an end-user of our services. This means that, when we develop internal libraries, it’s worth releasing them, because they use systems that are accessible to any of our users.

I should also say that our openness is more than unidirectional. While we are putting a lot of effort into ensuring that everything new we put out is openly accessible, we are also open to contributions coming in. If we’ve built something and you make changes or improve it, please do get in touch or submit a pull request. Openness has to work both ways if projects are truly to be used by the community.

Future posts – coming soon! – will introduce some of the technologies and projects that we have been building atop this infrastructure. This includes a Labs API system; new functionality to retrieve unpaginated datasets of whole API routes; a study of the preservation status of DOI-assigned content; and a mechanism for modeling new metadata fields.

Our annual call for board nominations

Lucy Ofiesh — Tue, 30 May 2023 00:00:00 +0000

The Crossref Nominating Committee invites expressions of interest to join the Board of Directors of Crossref for the term starting in March 2024. The committee will gather responses from those interested and create the slate of candidates that our members will vote on in an election in September.

Expressions of interest will be due Monday, June 26th, 2023.

About the board elections

The board maintains a balance of seats, with eight seats for smaller members and eight seats for larger members (based on total revenue to Crossref). This is to ensure that the diversity of experiences and perspectives of the scholarly community are represented in decisions made at Crossref.

This year we will elect two of the larger member seats (membership tiers $3,900 and above) and five of the smaller member seats (membership tiers $1,650 and below). You don’t need to specify which seat you are applying for. We will provide that information to the nominating committee.

The election takes place online, and voting will open in September. Election results will be shared at the annual meeting on October 31st. New members will commence their term in March 2024.

About the Nominating Committee

2023 Nominating Committee:

Aaron Wood, American Psychological Association, chair*
Oscar Donde, Pan Africa Science Journal*
David Haber, American Society for Microbiology
Rose L’Huillier, Elsevier*
Marie Souliere, Frontiers

(*) indicates Crossref board member

What does the committee look for

demonstrate a commitment to or understanding of our strategic agenda or the Principles of Open Scholarly Infrastructure;
have expertise that may be underrepresented on the board currently;
hold senior/director-level positions in their organisations;
have experience with governance or community involvement;
represent member organisations that are active in the scholarly communications ecosystem;
demonstrate metadata best practices as shown in the member’s participation report

Board roles and responsibilities

Crossref’s services provide a central infrastructure to scholarly communications. Crossref’s board helps shape the future of our services and, by extension, impacts the broader scholarly ecosystem. We are looking for board members to contribute their experience and perspective.

Setting the strategic direction for the organisation;
Providing financial oversight; and
Approving new policies and services.

The board is representative of our membership base and guides the staff leadership team on trends affecting scholarly communications. The board sets strategic directions for the organisation while also providing oversight into policy changes and implementation. Board members have a fiduciary responsibility to ensure sound operations. Board members do this by attending board meetings, as well as joining more specific board committees.

Who can apply to join the board?

What is expected of board members?

Board members attend three meetings each year that typically take place in March, July, and November. Meetings have taken place in various international locations, and travel support is provided when needed. March and November board meetings are held virtually, and all committee meetings take place virtually. Each board member should sit on at least one Crossref committee. Care is taken to accommodate the wide range of timezones in which our board members live.

While the expressions of interest are specific to an individual, the seat that is elected to the board belongs to the member organisation. The primary board member also names an alternate who may attend meetings if the primary board member cannot. There is no personal financial obligation to sit on the board. The member organisation must remain in good standing.

Board members are expected to be comfortable assuming the responsibilities listed above and to prepare for and participate in board meeting discussions.

How to apply

Please click here to submit your expression of interest. We ask for a brief statement about how your organisation could enhance the Crossref board and a brief personal statement about your interest and experience with Crossref.

Please contact me with any questions at lofiesh@crossref.org

Metadata connects the global community – summary of our Community update 2023

Kornelia Korzec — Fri, 12 May 2023 00:00:00 +0000

We were delighted to engage with over 200 community members in our latest Community update calls. We aimed to present a diverse selection of highlights on our progress and discuss your questions about participating in the Research Nexus. For those who didn’t get a chance to join us, I’ll briefly summarise the content of the sessions here and I invite you to join the conversations on the Community Forum.

You can take a look at the slides here and the recordings of the calls are available here.

TL;DR

The membership is growing, including that in the GEM programme countries, and we focus on adding new Sponsors in areas where we have insufficient coverage to support prospective members
The grant registration form is available for funders who don’t use XML, and we’re working to expand to other record types
The preview of the Relationship API endpoint is available – start exploring relationships between different records and record types, from citations to funding, and more
Usefulness of metadata records for inferring integrity of the content or publisher relies on all members of the community contributing to this effort. Crossref will continue to enrich our schema to capture new types of relevant information and to promote the best metadata practices.
Cited-by is now open for everyone to use 🎉 – no need for additional authorisation steps – Registering your references will have even greater impact now!
The Labs participation report is available and it’s been a hit. Please note that this tool is still under development – new functionalities can be added but there might also be bugs that we are yet to resolve, so don’t hold off with feedback.
We’ve received close to 1,000 responses in our first ever Metadata Priorities Survey. It’s still open until 18th of May and we encourage all members to take it. So far we’ve learnt that the majority of our respondents are keen to deposit as much metadata as possible – and some would like to register more than we currently enable.

Metadata completeness and integrity

A key theme of the call was encouraging greater participation in the Research Nexus and the importance of complete metadata. One particular benefit of a rich and transparent metadata network is the opportunity to infer judgments on the integrity of the scholarly record (ISR). Amanda Bartell, Head of Member Experience, highlighted that the community agrees that availability of information about relationships between research outputs, institutions and other elements of the scholarly ecosystem together provide essential context for deciding about trustworthiness of organisations and their published content. Conversely, it can make it harder for parties to pass off information as trustworthy when that context is missing. Amanda summarised community feedback related to Crossref’s role in the integrity of the scholarly record in her recent blog post.

Our members can contribute to that rich network of relationships by curating their metadata and providing contextual information – especially the highly sought-after elements highlighted in the presentation.

Our community

Since LIVE22, we have had 1,130 new members join us. That includes 51 organisations from countries included in our Global Equitable Membership (GEM) programme. You can find out more in the latest news about the programme on our Community Forum from Susan Collins, Community Engagement Manager.

We see great opportunities in enriching our metadata corpus with works carried out in some of the least economically-advantaged regions of the world. Registering their content with us will increase its discoverability for global scholarship, while adding important relationships into the Research Nexus. We’re glad to see new members joining us under the auspices of the Global Equitable Membership (GEM) programme and we’re reaching out to existing and new communities with our Ambassadors, to encourage more metadata registrations.

Our Sponsors and Ambassadors, alongside our Outreach and Membership Team, support members to participate as effectively as possible in the Research Nexus. We’re delighted to see both programmes growing, with eight new Sponsors and seven new Ambassadors having joined us since October.

Simultaneously, we’re working with like-minded organisations to provide useful resources for the growing and changing scholarly communications community. The recent launch of the online forum for new publishers seeking to learn about best practices in the industry, The PLACE, is another way in which we hope to support wider participation in the Research Nexus, and promote open and sustainable practices.

With our growing community in mind, we have planned a webinar later this month to provide an overview of Crossref – including member benefits and obligations, and how to use our services.

Service news

References metadata is essential for connecting works with one another. It enables provision of citation information, aids discoverability for researchers, as well as assessment and evaluation for institutions and funders. It’s almost a year since all the references metadata deposited with Crossref was made openly available. At the moment, 52.0% of journal articles, and 44.5% of all works have references. Martyn Rittman, Product Manager for the Cited-by service, says “It’s not bad, but we can do better!”

With three different mechanisms for doing it available to our members, we hope that all have a suitable tool to fit with their needs. You can register references with XML via HTTPS POST (structured or unstructured), with the dedicated OJS Plugin if you’re an OJS user, or with our Simple Text Query (unstructured text) – this is especially relevant to the Web Deposit Form users. We find that journal articles with deposited references seem to be cited more than those without, and by a lot: 21.8 vs. 6.1 incoming citations on average!
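For readers who want to check reference coverage themselves, the sketch below shows one way a percentage like those above could be derived from the REST API. It is an illustration under stated assumptions, not a description of how the figures in this post were produced: it assumes the "has-references" and "type" filters and the "total-results" field of the works route, the helper names are hypothetical, and the counts in the sample responses are made up so that the example runs offline.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://api.crossref.org/works"


def count_url(filters: dict) -> str:
    """Build a /works URL that asks only for a count (rows=0) under the given filters."""
    filter_value = ",".join(f"{name}:{value}" for name, value in filters.items())
    return f"{API_ROOT}?{urlencode({'filter': filter_value, 'rows': 0})}"


def total_results(response_body: str) -> int:
    """Pull the total number of matching works out of a response body."""
    return json.loads(response_body)["message"]["total-results"]


def coverage(with_references: int, total: int) -> float:
    """Percentage of works that have references, rounded to one decimal place."""
    return round(100 * with_references / total, 1) if total else 0.0


all_articles_url = count_url({"type": "journal-article"})
with_refs_url = count_url({"type": "journal-article", "has-references": "true"})

# Made-up response bodies standing in for real API calls.
sample_all = '{"status": "ok", "message": {"total-results": 1000}}'
sample_with_refs = '{"status": "ok", "message": {"total-results": 520}}'

share = coverage(total_results(sample_with_refs), total_results(sample_all))
print(all_articles_url)
print(with_refs_url)
print(share)
```

Asking for zero rows keeps the request light: only the count comes back, not the records themselves.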

We have now made our Cited-by service open to all. To realise its full benefit, it is essential to register your references.

There were concerns in the community about references ‘lost’ as part of supplementary material that may not be registered in its own right. Colleagues advised that if the data has an identifier, such as a DataCite DOI, you can add a relationship to say that it’s supplementary material (see https://www.crossref.org/documentation/schema-library/markup-guide-metadata-segments/relationships/) or add them as a reference. Martyn is curious to hear from others in the community on this topic. There is an increasing focus on data citations and we’d like to see how we can better support them.

Many members have questions related to plans for replacing Metadata Manager. Rachael Lammey, Director of Product, explained that we’re working on broadening our new Grant Registration Form to include more record types over the course of 2023. It has a few advantages over the current Web Deposit Form. It allows you to save a local copy once you first register a piece of content. It makes updating your records easier: you can drop that file onto the form to load the metadata, then update and redeposit it rather than having to fill out the information all over again. We have also started adding automatic lookup fields to help users populate information on affiliations more accurately using ROR IDs. We will keep you posted on the progress with new developments and ask for beta testers for new record types as they are added.

Metadata about an individual work is not as useful as the opportunity to interrogate the relationships between works and within the global scholarly output. The preview of the Relationship API endpoint (https://community.crossref.org/t/relationships-are-here/3523), modest as it is at this stage – with only 1% of our relationship metadata included (or 10 million relationships) – offers a powerful demonstration of the way in which metadata contextualises research outputs within the entangled network of ever-progressing scholarship.

We’ve also mentioned the recent transition of our website to GitLab, which allows everyone to contribute by creating merge requests and issues. Through this open collaboration, which supports our commitment to meet the Principles of Open Scholarly Infrastructure, we aim to cultivate a sense of ownership among contributors and make our information and documentation more useful and efficient for everyone.

Labs participation report

Organisations who wish to keep a close eye on their metadata – to understand what they deposit, how that compares with other members, and what could be improved – can start using our Labs participation reports. We encourage you to test this not-yet-finished tool and let us know your feedback. Participants at our updates found it very informative, with the opportunity to preview contents of recent deposits, see the participation breakdowns by prefix, and use the improved data visualisation. We had questions about how data citation counts are generated in the report. Martyn Rittman explained that: “This is a prototype and that’s one of the issues we need to tidy up! We know via Event Data and our Scholix endpoint what is a dataset, but that hasn’t yet been incorporated to the Labs Reports”. There was also a suggestion of enabling export of simple lists of all of a member’s DOIs with respective URLs from the report and the team might look into that. In the meantime, lists of DOIs missing specific metadata types are already downloadable.

To learn more about the reports, try them out, and to provide feedback, please take a look at the information shared recently by Paul Davis, Tech Support Specialist & R&D Support Analyst.

Metadata priorities

Patricia Feeney, Head of Metadata, shared some updates about the current metadata corpus registered with Crossref, and some recent trends.

She then went on to summarise some preliminary results of our ongoing metadata priorities survey, which all members are encouraged to take part in by 18th of May. So far, we’ve received close to 1,000 responses. We’ve learnt that the majority of our respondents are keen to deposit as much metadata as possible – and some would like to register more than we currently enable. Close to half of the respondents who did not express an interest in sharing all metadata are still interested to learn more about the value of their metadata.

The survey consults our members about their preferences for developing any of the potential projects under consideration:

Contributor IDs
Contributor roles/ CRediT
Alternate names
Multilingual metadata
Expand abstract support
Citation types (content)
Conference event IDs

It appears that support for citation types is the strongest among our respondents, while very polarised views have been shared about multilingual metadata and expanding support for abstracts. Among other suggestions, we received a lot of comments related to keywords. Overall, support for all projects was strong.

The verdicts are not in yet – still time to respond to the survey and make your metadata priorities known!

Thank you and keep in touch

With much of the content shared ahead of time through our Community Forum, the sessions were bubbling with questions and valuable comments from the community. We look forward to continuing the conversations asynchronously on the Community Forum. Please don’t hesitate to share your thoughts and ask further questions. We’d also love to hear suggestions for topics of the most interest for our future updates.

The more complete the metadata we collect together, the more connections in the ecosystem become transparent. This creates opportunities for discovery and collaborations, and greater insights about the scholarly process. Our community is growing in numbers, diversity, and technical capacity for building the Research Nexus together. We welcome your questions and suggestions of initiatives that support the fullest participation possible.

2023 public data file now available with new and improved retrieval options

Patrick Polischuk — Tue, 02 May 2023 00:00:00 +0000

We have some exciting news for fans of big batches of metadata: this year’s public data file is now available. Like in years past, we’ve wrapped up all of our metadata records into a single download for those who want to get started using all Crossref metadata records.

We’ve once again made this year’s public data file available via Academic Torrents, and in response to some feedback we’ve received from public data file users, we’ve taken a few additional steps to make accessing this 185 GB file a little easier.

First, we’re proactively hosting seeds in a few locations around the world to improve torrent download performance in terms of both speed and reliability.

And second, we’ve added an option to download this year’s public data file directly from Amazon S3 for a small transaction fee paid by the recipient, bypassing the need to use the torrent altogether. The fee just covers the AWS cost of the download. Instructions for downloading the public data file via the “Requester Pays” method are available on the “Tips for working with Crossref public data files and Plus snapshots” page.

The 2023 public data file features over 140 million metadata records deposited with Crossref through the end of March 2023, including over 76,000 grant records. Because Crossref metadata is always openly available, you can use our API to keep your local copy of our metadata corpus up to date with new and updated records.

In previous years, closed and limited references were removed from the public data file. Since we updated our membership terms to make all deposited references open in 2022, the 2023 public data file for the first time includes all references deposited with us.

We hope you find this public data file useful. Should you have any questions about how to access or use the file, please see the tips below, or bring your questions to our community forum.

Tips for using the torrent and retrieving incremental updates

Use the public data file if you want all Crossref metadata records. Everyone is welcome to the metadata, but it will be much faster for you and much easier on our APIs to get so many records in one file. Here are some tips on how to work with the file.
Use the REST API to incrementally add new and updated records once you have the initial file. Here is how to get started (and avoid getting blocked in your enthusiasm to use all this great metadata!).
Bibliographic metadata is generally required, but much other metadata is optional, so records will vary in quality and completeness.
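The incremental-update tip above can be sketched roughly as follows. This is an illustration rather than an official recipe: it assumes the works route’s "from-index-date" filter, the "cursor" and "rows" parameters, and the "next-cursor" response field, and the function names, sample DOIs, and start date are hypothetical. The HTTP call is injected as a fetch function so that the example runs offline and leaves retries and rate limiting to the caller.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

API_ROOT = "https://api.crossref.org/works"


def page_url(since: str, cursor: str = "*", rows: int = 1000,
             mailto: Optional[str] = None) -> str:
    """Build one page request for records indexed on or after `since` (YYYY-MM-DD)."""
    params = {"filter": f"from-index-date:{since}", "rows": rows, "cursor": cursor}
    if mailto:
        # Identifying yourself is part of using the API politely.
        params["mailto"] = mailto
    return f"{API_ROOT}?{urlencode(params)}"


def updated_records(since: str, fetch: Callable[[str], str]) -> Iterator[dict]:
    """Follow the cursor from page to page until a page comes back empty.

    `fetch` takes a URL and returns the response body as text.
    """
    cursor = "*"
    while True:
        message = json.loads(fetch(page_url(since, cursor)))["message"]
        items = message.get("items", [])
        if not items:
            return
        yield from items
        cursor = message["next-cursor"]


# Offline demonstration with made-up pages instead of real API responses.
FAKE_PAGES = {
    "*": {"items": [{"DOI": "10.5555/example-1"}, {"DOI": "10.5555/example-2"}],
          "next-cursor": "page-2"},
    "page-2": {"items": [], "next-cursor": "page-2"},
}


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP client: serve the fake page matching the cursor."""
    cursor = parse_qs(urlsplit(url).query)["cursor"][0]
    return json.dumps({"message": FAKE_PAGES[cursor]})


demo_dois = [record["DOI"] for record in updated_records("2023-04-01", fake_fetch)]
print(demo_dois)
```

In real use, fetch would wrap an HTTP client of your choice, and each returned record would replace or extend the matching entry in your local copy of the public data file.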

Questions, comments, and feedback are welcome at support@crossref.org.

Similarity Check: look out for a refreshed interface and improvements for iThenticate v2 account administrators

Fabienne Michaud — Mon, 01 May 2023 00:00:00 +0000

In 2022, we flagged up some changes to Similarity Check, which were taking place in v2 of Turnitin’s iThenticate tool used by members participating in the service. We noted that further enhancements were planned, and want to highlight some changes that are coming very soon. These changes will affect functionality that is used by account administrators, and do not affect the Similarity Reports themselves.

From Wednesday 3 May 2023, administrators of iThenticate v2 accounts will notice some changes to the interface and improvements to the Users, Groups, Integrations, Statistics and Paper Lookup sections.

Logging in

iThenticate v2 account administrators and browser users will see a new login page when logging in to iThenticate v2.

A refreshed interface

Once logged in to iThenticate v2, account administrators will see an updated design, with improved notifications to let them know whether a task/action has been successfully completed or not.

Users

There will be improvements to the user management system for account administrators, including a much clearer navigation menu for managing active, pending and deactivated users.

There will also be a filtering option on the Users page to search for active, pending and deactivated users by first name, last name, email address, group and date added. In addition coloured labels will be introduced to easily identify the level of access (or ‘Role’) for each user.

An improved bulk user import process will be available, with clearer guidance on any issues that may arise during the upload. This new development will also include new screens for adding and editing users with more notifications to help prevent mistakes.

Integrations

For account administrators managing peer review management system integrations and needing to generate API keys, the Integrations page will be improved to make copying API keys simpler.

Statistics

iThenticate v2 administrators will also notice some improvements to the Statistics page. Usage data should load faster and will be sortable by user group. They will also be able to generate large usage reports of over 100k submissions.

Paper lookup

The Paper lookup will allow iThenticate v2 account administrators to find submissions that have been made from any integration connected to their iThenticate v2 account. They can be found by searching the paper ID (or oid number) of the submission.

Please note: the ability to search for submissions by the user’s name is available for manuscripts submitted via the iThenticate v2 website only and not for papers submitted via an integration.

New password requirements

To improve the security of users’ accounts, new password requirements will be introduced, including a minimum of 8 characters, 1 special character, 1 upper case letter, and 1 number.
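As a rough illustration of these rules, the sketch below checks a candidate password against each of the four requirements. It is not Turnitin’s implementation: the function name is hypothetical, and treating ASCII punctuation as the set of “special” characters is an assumption, since the announcement does not define it.

```python
import string


def password_problems(password: str) -> list:
    """Return the list of requirements that the password fails (empty if it passes)."""
    problems = []
    if len(password) < 8:
        problems.append("fewer than 8 characters")
    # Assumption: "special" means ASCII punctuation such as # ! or %.
    if not any(ch in string.punctuation for ch in password):
        problems.append("no special character")
    if not any(ch.isupper() for ch in password):
        problems.append("no upper case letter")
    if not any(ch.isdigit() for ch in password):
        problems.append("no number")
    return problems


print(password_problems("Metadata#2023"))
print(password_problems("short"))
```

Returning every failed rule at once, rather than stopping at the first, lets a form show the user everything that needs fixing in one go.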

Next in iThenticate v2

Turnitin, who produce iThenticate, are currently working on a number of new features and developments including an improved similarity report, paraphrase and AI writing detection. A detailed timeline is not yet available but we’ll be updating you on these new developments in the coming months.

✏️ Do get in touch via support@crossref.org if you have any questions about iThenticate v1 or v2 or start a discussion by commenting on this post below.

ISR part four: Working together as a community to preserve the integrity of the scholarly record

Amanda Bartell — Wed, 26 Apr 2023 00:00:00 +0000

We’ve been spending some time speaking to the community about our role in research integrity, and particularly the integrity of the scholarly record. In this blog, we’ll be sharing what we’ve discovered, and what we’ve been up to in this area.

We’ve discussed in our previous posts in the “Integrity of the Scholarly Record (ISR)” series that the infrastructure Crossref builds and operates (together with our partners and integrators) captures and preserves the scholarly record, making it openly available for humans and machines through metadata and relationships about all research activity. This Research Nexus makes it easier and faster for everyone involved in research performance, management, and communications to understand information in context and make decisions about the trustworthiness of organisations and their published research outputs. Conversely, it can make it harder for parties to pass off information as trustworthy when the information doesn’t include that context.

The community needs open scholarly infrastructure that can adapt to the changes in scholarly research and communications, and we’ve been changing and adapting already by building on the concept of the scholarly record with our vision:

Like others, we envision a rich and reusable open network of relationships connecting research organisations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society.

We don’t assess the quality of the work that our members register, and we keep the barriers to membership deliberately low to ensure that we are capturing as much of the scholarly record as possible and encouraging best practice. We are careful to talk about Crossref’s specific role being with the Integrity of the Scholarly Record (ISR), and not the broader area of ‘research integrity’ (i.e. the integrity of the research process or content itself).

But there are many challenges and threats to research integrity and the integrity of the scholarly record, and there are tradeoffs with keeping the barriers to membership low. With that in mind, we have been dedicating more time to speaking with the community to explore what part we play now, and should play in future, in helping the community assess and improve trustworthiness in the scholarly record. We also want to work out where we can make use of our neutral, central role to convene different groups in scholarly communications to work together on these challenges.

A revealing afternoon in Frankfurt

Our starting point was a roundtable discussion in Frankfurt in October 2022. We organized it to coincide with the Frankfurt Book Fair, but the invited participants were from a wider spectrum than just publishers. The 40 invited participants represented editors, funders, research integrity professionals at publishers, representatives of ministries of science, and other partner organisations such as OASPA, COPE, STM and DOAJ.

This half-day session enabled us to sense-check our thinking with the community and get input into whether our position is the best one for their needs.

Ed Pentz introduced the session by reminding participants that integrity is key to Crossref’s mission and is the basis of the shared Research Nexus vision. Amanda (that’s me) talked through our current membership processes, recent membership trends, why wider participation is key, and the sort of questions the community comes to Crossref to solve (e.g. title ownership disputes). Finally, Ginny Hendricks talked through the specific services and metadata that Crossref has already developed to support the community as signals of trustworthiness, and introduced some new activities and ideas.

You can check out the slide deck and, for more background, read our previous posts in the ISR series.

Participants then split into small groups representing a mix of communities, and we asked them to discuss three key questions:

Is Crossref’s role what you expected? What surprised you? What are we missing?
Are you aware of Crossref services? What are the barriers to more uptake? What are the challenges and opportunities?
What more could Crossref or its members do?

After discussion, each small group fed back to the room, and we followed up with a whole group discussion, before ending the day with a post-it note exercise for what Crossref should start doing, stop doing, and continue doing.

Here’s what we learned.

The importance of whole community involvement in research integrity and ISR

The need for all parts of the community to come together to solve the problems of research integrity came through loud and clear - there is no single group that can solve this problem on its own.

Publishers expressed frustration that responsibility for research integrity has been placed seemingly solely in their hands when institutions and funders can “unwittingly incentivise bad behaviour”. But it was clear that funders are just as concerned with research integrity issues, with many having made a dedicated trip for the roundtable. There were comments that bringing publishers and funders together around these issues was a rare but important opportunity, and there were calls for this to be an annual event. Both funders and publishers called for more involvement from and inclusion of research institutions in the discussion.

The group agreed that Crossref’s main focus should continue to be capturing and sharing the scholarly record, and that metadata and relationships are key for attribution, evidence, and provenance. One participant commented that “you can’t make open science work unless the metadata is complete” and that this would only happen with efforts throughout the community. Accurate and complete metadata needs to be:

pushed for by funders and institutions (through advocacy and policy)
provided by the authors and other contributors
collated, curated, and registered by the publishers and repositories
collected, matched, (sometimes cleansed), and distributed by Crossref.
(and we would add “prioritised by all who want to support open infrastructure over commercial alternatives”)

Interestingly, this echoes the ‘metadata personas’ output of the Metadata 20/20 initiative which defined roles in the community’s collective metadata effort:

Metadata Creators: providing descriptive information (metadata) about research and scholarly objects.
Metadata Curators: classifying, normalising, and standardising this descriptive information to increase its value as a resource.
Metadata Custodians: storing and maintaining this descriptive information and making it available for consumers.
Metadata Consumers: knowingly or unknowingly using the descriptive information to find, discover, connect, cite, and assess research objects.

Importance of whole-publisher involvement

A few participants, particularly those in editorial or integrity roles at publishing organisations, had not previously made the connection that metadata could be important signals of integrity. This highlighted a key problem - working with Crossref is seen by publishers as a technical/production workflow issue, and so knowledge of the benefits of metadata can be siloed within those teams. Crossref needs to reach out to editorial and research integrity teams to explain that good metadata isn’t just an end in itself and reinforce the impact it has on research integrity. This buy-in from across publisher organisations is vital.

We’re currently recruiting a Community Engagement Manager with editorial or research integrity experience to dedicate time to this area, to advocate for richer metadata within the editorial community, and progress this important conversation.

Agreement on the importance of metadata but an acknowledgment that this brings extra cost

Most participants agreed that rich metadata and relationships provide a core tool in establishing and protecting integrity. But they also acknowledged that collecting and registering more metadata often comes with an extra cost - whether that’s from system changes or just extra staff time. This is particularly true where publishers are working with third-party platforms and suppliers where there may be additional costs for adding fields and functionality to collect more metadata and register it with Crossref. Where knowledge of metadata is siloed in technical and production teams, and the wider benefits aren’t acknowledged, it can be hard to get internal buy-in for these extra costs and efforts.

The Frankfurt group also pointed out that the benefits of more comprehensive metadata (and what this means for ISR) are spread across the research ecosystem, but it is the publisher that usually bears the costs.

Need to define which metadata elements are trust signals and make it easier for the community to provide and access them

Over the course of the discussion, participants identified various elements as important to capture as “trust signals”, along with relationships such as those for retractions, conferences, reviewers, and data, and a record of when Crossref membership has been revoked for cause. We need to spend time identifying and prioritising these so that our members can do the same.

We need to make it easier for smaller, less technically-resourced members to provide this metadata, both through our tools and our documentation, as “doing this work can be very geeky and the documentation isn’t easy to understand as a layperson”.

There was also a discussion about where the metadata comes from - should community members be able to contribute metadata and assertions to other members’ records? If the provenance is captured then yes.

Once the metadata is captured, there remain challenges for users in where to start with the 145 million Crossref records. The groups asked Crossref to make it easier for community members to understand and use these records to make informed decisions, including by creating and sharing sample queries, libraries, and case studies.

We’re currently recruiting a Technical Community Manager to help improve the support we provide in this area to API users, service providers, and other metadata integrators.

The importance of retractions/corrections information

There was a lot of discussion about retractions and their importance as trust indicators. The group was surprised by how few retractions are currently registered with Crossref through Crossmark (12k). There was a lot of discussion around why Crossmark isn’t currently being adopted, and interest in taking this forward.
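
For readers who want to check a figure like this themselves, here is a minimal sketch of a sample query against the REST API. It assumes that the `update-type` filter and the `total-results` field behave as described in the public REST API documentation; the function names are invented for this example, and the count returned today will differ from the figure quoted at the roundtable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.crossref.org/works"


def retraction_count_url(mailto=None):
    """Build a /works query that asks only for the number of retraction updates."""
    # rows=0 requests no records, just the summary (including the total count).
    params = {"filter": "update-type:retraction", "rows": 0}
    if mailto:
        # Optional contact address, passed as a query parameter.
        params["mailto"] = mailto
    return f"{API_ROOT}?{urlencode(params)}"


def total_results(response_text):
    """Pull the record count out of a /works JSON response body."""
    return json.loads(response_text)["message"]["total-results"]


def fetch_retraction_count(mailto=None):
    """Perform the request (needs network access) and return the count."""
    with urlopen(retraction_count_url(mailto)) as response:
        return total_results(response.read().decode("utf-8"))
```

Calling `fetch_retraction_count()` performs the live request; the two helper functions can be used on their own to inspect the URL or to parse a saved response.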

This needs to be a focus for Crossref, to encourage members to register retractions, corrections, and updates, and to make it easier for smaller publishers. There are new and emerging publishers who really want tools to help them demonstrate the legitimacy of their research, and an easy way for them to record corrections and retractions is key.

In their paper Towards a connected and dynamic scholarly record of updates, corrections, and retractions (September 17th, 2022), Ginny Hendricks, Rachael Lammey, and Martyn Rittman discuss how retraction information could be more effectively used - for example, letting a preprint reader know that the resulting article has been retracted, or letting the author of an article know the data that they’ve based their work on has been withdrawn.

Collecting the information is just the start - cascading retraction information throughout the research ecosystem is the main goal, and Crossref plays a central role here. As noted in the Information Quality Lab’s project Reducing the inadvertent spread of retracted science: Shaping a research and implementation agenda, “Many retracted papers are not marked as retracted on publisher and aggregator sites, and retracted articles may still be found in readers’ PDF libraries, including in reference management systems such as Zotero, EndNote, and Mendeley”.

It’s particularly important that this information is fed back to funders and institutions, and the group discussed having push notifications to these audiences for retractions. Some funders even employ staff members whose main purpose is to identify retractions.

It was pointed out that there may be good sources of retraction information (such as Retraction Watch) that Crossref could incorporate and match in our metadata.

Gaps in ‘ownership’, and Crossref’s role

The group discussed the many gaps in ownership for elements of research integrity, and some groups wondered if Crossref should actually change our approach and take on more responsibility for vetting content. However, after discussion, the group mostly agreed that this would mean a change of mission (and more staff) for Crossref and potentially limit global participation, thus making the metadata corpus less useful. Crossref should provide the widest possible metadata in an easy-to-consume format, and “other organisations can provide the verification layer”.

It was acknowledged that it would be easy for Crossref to get overwhelmed, so we ended the day by discussing not only what we should start doing, but also what we should stop doing. Unsurprisingly, there was a lot more to continue or start doing than stop doing!

However, the fact remains that there are gaps in ownership - for example, there is no central arbiter of who ‘owns’ a journal. Also, where do you go if you have a problem with a journal? Often the Committee on Publication Ethics (COPE) is seen as a solution, but they can’t solve this problem alone - it needs a coordinated effort from funders, institutions, publishers, and other partner organisations such as the Open Access Scholarly Publishing Association (OASPA), the Directory of Open Access Journals (DOAJ), and like-minded organisations.

Many noted that Crossref is well-positioned to convene horizontal multi-stakeholder discussions to start to find solutions.

We also know that there are other industry initiatives aimed at supporting this work. The STM Association’s work on an Integrity Hub is gathering pace and aims to provide, among other things, ‘a cloud-based environment for publishers to check submitted articles for research integrity issues’.

What happened next? Turns out, it really is all about relationships…

Since this meeting in Frankfurt last October, we’ve been focusing on relationships - thinking about how we capture them in our metadata, and working in partnership with other organisations to bolster our support for ISR.

The rest of this blog post highlights some of the activities underway:

Increasing participation in Crossref

In January 2023, we launched our new GEM Program, which offers relief from fees for members in the least economically-advantaged countries in the world. By opening up participation even further, we aim to extend the corpus of open metadata, giving opportunities for more connections, more context, and more relationships.

Supporting members in meeting best practices

ISR blog 2 explained more about how we help new members become “good Crossref citizens” with automated onboarding emails, extensive documentation, events and webinars, and help from our support team, Ambassadors, and other members in our Community Forum.

We’ve recently joined forces with COPE, DOAJ, and OASPA to create a new online public forum for organisations interested in adopting best practices in scholarly publishing. At the Publishers Learning And Community Exchange or The PLACE, new scholarly publishers can access information from multiple agencies in one place, ask questions of the experts, and join conversations with each other. Do take a look!

Being clearer on the impact of better metadata

As discussed earlier, better metadata can sometimes bring extra costs, and it’s helpful to understand the impact of this investment. We know from our ongoing outreach work that it’s difficult for our members to keep hearing that Crossref needs more and better metadata. They ask us for resources and increasingly want to see hard evidence of benefits to them. We recently showcased the journey of the American Society for Microbiology which went from ‘zero to hero’ in terms of metadata participation and completeness in Crossref. They describe their efforts to increase their registered metadata over the last few years, and note a significant increase in their average monthly successful DOI resolutions from ~390,000 in 2015 to an average of ~3.7 million in 2022. They found that “the more metadata we push out into the ecosystem, the more it appears to be used… Remembering that your publishing program benefits as much as everyone else’s when you deposit more metadata can help refine your short-term and long-term priorities.”

We know we sound like a broken record sometimes, but now other members can take it from ASM!

Encouraging better metadata and more relationships and identifying ‘trust signals’

We’re trying to make it easier for members to accurately register key metadata fields, with the launch of our new grants registration form which will be extended to journals and other record types soon. This includes a ROR lookup - adding this unique identifier for research organisations gives even better context for the metadata.

We are also working to make it possible for anyone to contribute to metadata records, and have the provenance of these contributions clearly asserted.

Metadata adoption is still a key goal for our staff; indeed our new 2023-2025 strategic roadmap specifies…

“We want to be a sustainable source of complete, open, and global scholarly metadata and relationships. We are working towards this vision of a ‘Research Nexus’ by demonstrating the value of richer and connected open metadata, incentivising people to meet best practices, while making it easier to do so.”

… with item number one under projects ‘in focus’, being: “Adoption activities to focus on top metadata adoption priorities, which are:

We’re continuing to talk with the community to work out which metadata elements are most useful as trust signals, and we’re trying to prioritise some of the schema changes required to capture new elements. If you haven’t already, please respond to Patricia Feeney’s metadata priorities survey.

Thinking about retractions and corrections

We’ve been closely involved with the NISO CREC working group, and they should be making the initial draft recommendations public soon - watch this space!

Making it easier to view and compare metadata and expand the relationships

Our Participation Reports provide a visualisation of the metadata that’s available via our free REST API. There’s a separate Participation Report for each member, and it shows what percentage of that member’s content includes nine key metadata elements. It’s an important tool to help those in the community understand our metadata more easily.
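
The percentages shown in a Participation Report can also be read from the REST API directly. The following is a minimal sketch, assuming that the `/members/{id}` response carries a `coverage` object whose keys end in `-current` or `-backfile` and whose values are fractions between 0 and 1. Those key names, and the helper function itself, are assumptions for illustration, so check a live response before relying on them.

```python
def coverage_percentages(member_message, suffix="-current"):
    """Turn a member's coverage fractions into percentages, keyed by element name.

    `member_message` is the parsed "message" object of a /members/{id} response.
    Only keys ending in `suffix` are kept, and the suffix is stripped from the name.
    """
    coverage = member_message.get("coverage", {})
    return {
        key[: -len(suffix)]: round(value * 100, 1)
        for key, value in sorted(coverage.items())
        if key.endswith(suffix)
    }


# Illustrative input only: the element names and values below are made up.
sample_message = {
    "coverage": {
        "references-current": 0.912,
        "orcids-current": 0.5,
        "orcids-backfile": 0.1,
    }
}
print(coverage_percentages(sample_message))
```

Passing `suffix="-backfile"` would report the same elements for older content instead.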

We have been working on a new version of Participation Reports, allowing more comparison between members, and extra metadata elements to communicate trustworthiness, including whether each member has thought about the long-term preservation of their content, and whether it has been added to a repository. There is a test version to look at in our Labs sandbox. Do take a look and provide feedback.

We’ve also made public our list of members whose membership was revoked for contravention of the membership terms.

Continuing to work with funders

We’re continuing to work with funders through our growing funder membership, the Funder Advisory Group and other groups, including the Open Research Funders Group, the HRA, Altum, Europe PMC, and the ORCID Funder Interest Group. And we’re continuing to build the important relationships between funding and outputs (see Dominika Tkaczyk’s recent report) and engage with this key audience for research integrity.

Discussions with the community

We’ll be talking about ISR at our next community update on May 3rd - there are two versions of the meeting depending on your timezone - do sign up if you haven’t already. And if you’re attending the SSP conference in June, do come along to our panel “Working together to preserve the integrity of the scholarly record in a transparent and trustworthy way”.

We’re hiring! New technical, community, and membership roles at Crossref

Michelle Cancel — Fri, 21 Apr 2023 00:00:00 +0000

Do you want to help make research communications better in all corners of the globe? Come and join the world of nonprofit open infrastructure and be part of improving the creation and sharing of knowledge.

We are recruiting for three new staff positions, all new roles and all fully remote and flexible. See below for more about our ethos and what it’s like working at Crossref.

🚀 Technical Community Manager, working with our ‘integrators’ so all repository/publishing platforms and plugins, all API users incl. managing contracts with subscribers, and generally helping a very nice bunch of RESTful API dabblers, both novice and intermediate. The goal is to offer more interactive engagement such as sprints, and more technical consultation to help the community with things like query efficiency, public data dump ingestion, etc. Thousands of users exist, from individual researchers and small academic tools to giant technology companies. Researching and analysing usage and building tools to meet their needs is key, so this role works closely with Product and R&D colleagues and likely needs a developer or developer-advocacy background.

🎯 Member Experience Manager, ramping up to handle the mammoth operation that is… membership, currently 18,000 members from 150 countries, and onboarding the ~180 new joiners we welcome monthly, mostly from Africa and Asia. This role involves lots of education and relationship management, but because of the scale, we also need someone with a real business process/analysis approach, improving how our systems function so that the operation flows seamlessly and isn’t a pain for people (both members and staff). This role manages two full-time Member Support Specialists (UK and Indonesia) and three part-time contractors (USA, France, and one other as yet unknown).

🎈 Community Engagement Manager, working with the global community of scholarly editors at a time when research integrity is top of mind for our entire ecosystem. This is a classic community role for someone keen to cross over from managing or editing journals or books and perhaps make your volunteer work official. Activities will include program and project management, event and working group facilitation, communications and content creation. You’d be interacting with groups like the Asian Council of Science Editors, the European Association of Science Editors, and the Council of Science Editors, plus many more that you’d identify. It’s all about helping editors, who work hand-in-hand with authors, to think about metadata as signals of trust and better use available services, such as those for retraction management or plagiarism checking, and helping to define needs for emerging activity too, such as machine-generated content.

Working at Crossref

We’re a not-for-profit membership organisation that exists to make scholarly communications better. We rally the community; tag and share metadata; run an open infrastructure; play with technology; and make tools and services—all to help put research in context.

Crossref sits at the heart of the global exchange of research information, and our job is to make it possible—and easier—to find, cite, link, assess, and reuse research, from journals and books, to preprints, data, and grants. Through partnerships and collaborations we engage with members in 150 countries (and counting) and it’s very important to us to nurture that community.

We’re about 45 staff and remote-first. This means that we support our teams working asynchronously and with flexible hours. We are dedicated to an open and fair research ecosystem and that’s reflected in our ethos and staff culture. We like to work hard but we have fun too! We take a creative, iterative approach to our projects, and believe that all team members can enrich the culture and performance of our whole organisation. Check out the organisation chart.

We are active supporters of ongoing professional development opportunities and promote self-learning at every opportunity. Crossref has a healthy financial situation and we only continue to grow. While we won’t have a clear hierarchical path for staff to follow, there are always evolving opportunities to progress and be challenged.

We especially encourage applications from people with backgrounds historically under-represented in research and scholarly communications.

Bookmark our jobs page to watch for future opportunities!

The PLACE for new publishers – a one-stop-shop for information and a friendly community

Kornelia Korzec — Mon, 17 Apr 2023 00:00:00 +0000

The Publishers Learning And Community Exchange (PLACE) at theplace.discourse.group is a new online public forum created for organisations interested in adopting best practices in scholarly publishing. New scholarly publishers can access information from multiple agencies in one place, ask questions of the experts and join conversations with each other.

Scholarly publishing is an interesting niche of an industry – it appears at the same time ancillary and necessary to the practice and development of scholarship itself. The sooner and more easily a piece of academic work is shared, the greater the chance that others will find and build upon it. Many practices of the publishing industry have been developed to support discovery and integrity of the scholarship that produces shareable works, and as the landscape of scholarly communications constantly evolves, a number of agencies arose to promote and continuously update the standards and best practices within it.

We realise that the sheer number of agencies involved in regulating and preserving scholarly content is in itself a challenge and can be confusing. Newer publishers may find it difficult to know where to go to find the right information, what policies they need to follow or international criteria they need to meet and how to go about doing so. When time or finances are tight, it’s not easy to try to reinvent the wheel.

Following the long-established practice of signposting organisations between us, we’ve worked together with the Committee on Publication Ethics (COPE), the Directory of Open Access Journals (DOAJ), and the Open Access Scholarly Publishers Association (OASPA) to establish the PLACE. We share values and goals to work more effectively to better support the needs of our communities. Each organisation is taking actions to lower barriers to participation and provide greater support for the organisations that publish scholarly and professional content that we work with.

Hence, we envisaged the PLACE as a ‘one stop shop’ for access to more consolidated and plainly put information, to support publishers in adopting best practices the industry developed. We also hope that by setting the information service as a forum, we will encourage open exchange with publishers who aspire to do things right, as well as between them.

Renewed Persistence

Joe Wass — Sat, 01 Apr 2023 00:00:00 +0000

We believe in Persistent Identifiers. We believe in defence in depth. Today we’re excited to announce an upgrade to our data resilience strategy.

Defence in depth means layers of security and resilience, and that means layers of backups. For some years now, our last line of defence has been a reliable, tried-and-tested technology. One that’s been around for a while. Yes, I’m talking about the humble 5¼ inch floppy disk.

This may come as a surprise to some. When things go well, you’re probably never aware of them. In day to day use, the only time a typical Crossref user sees a floppy disk is when they click ‘save’ (yes, some journals still require submissions in Microsoft Word).

History

But why?

Let me take you back to the early days of Crossref. The technology scene was different. This data was too important to trust to new and unproven technologies like Zip disks, CD-Rs or USB Thumb Drives. So we started with punched cards.

IBM 5081-style punched card.

Punched cards are reliable and durable as long as you don’t fold, spindle or mutilate them. But even in 2001 we knew that punched cards’ days were numbered. The capacity of 80 characters kept DOIs short. Translating DOIs into EBCDIC made ASCII a challenge, let alone SICIs. We kept a close eye on the nascent Unicode.

Breathing Room

In 2017 the change of DOI display guidelines from http://dx.doi.org to https://doi.org shortened each DOI by 2 characters, buying us some time. But eventually we knew we had to upgrade to something more modern.

So we migrated to 5¼ inch floppy disks.

5¼ Floppy disk in drive

At 640 KB per disk these were a huge improvement. We could fit around 20,000 DOIs on one floppy. Today we only need around 10,000 floppy disks to store all of our DOIs (not the metadata, just the DOIs). Surprisingly this only takes about 20 metres of shelf space to store.

Typical work from home setup. Getting ready to backup some DOIs!

The move to working-from-home brought an unexpected benefit. Staff mail floppy disks to each other and keep them in constant rotation, which produces a distributed, fault-tolerant system.

Persistence Means Change

But it can’t last forever. DOI registration shows no sign of slowing down. It’s clear we need a new, compact storage medium. So, after months of research, we’ve invested in new equipment.

Today we announce our migration to 3½ inch floppies.

If it goes to plan you won’t even notice the change.

Image credits

Punched card: IBM 5081-style punched card. Derived from public domain by Gwern.

Start citing data now. Not later

Geoffrey Bilder — Thu, 23 Mar 2023 00:00:00 +0000

Recording data citations supports data reuse and aids research integrity and reproducibility. Crossref makes it easy for our members to submit data citations to support the scholarly record.

TL;DR

Citations are essential/core metadata that all members should submit for all articles, conference proceedings, preprints, and books. Submitting data citations to Crossref has long been possible. And it’s easy. You just need to:

Include data citations in the references section as you would for any other citation
Include a DOI or other persistent identifier for the data if it is available - just as you would for any other citation
Submit the references to Crossref through the content registration process as you would for any other record

And your data citations will flow through all the normal processes that Crossref applies to citations. And they will be distributed openly to the community (including DataCite!) via Crossref’s services and APIs. All data citations deposited with Crossref will be exposed in the (soon-to-be-launched) Data Citation Corpus.

And then, you can sit back and congratulate yourself for making your publication more useful to researchers who want to be able to reuse the data underlying your publications.
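
To make the three steps in the TL;DR concrete, here is a minimal sketch that builds the reference portion of a deposit with a data citation sitting alongside an ordinary one. It assumes the `citation_list`, `citation`, `doi`, and `unstructured_citation` element names from the Crossref deposit schema; the helper function, the `ref1`-style keys, and the second reference are placeholders for illustration, and a real deposit needs the rest of the record around this fragment.

```python
import xml.etree.ElementTree as ET


def build_citation_list(references):
    """Build a <citation_list> fragment from a list of reference dicts.

    Each dict may carry a "doi", an "unstructured" citation string, or both.
    Data citations get no special treatment: they are listed like any other.
    """
    citation_list = ET.Element("citation_list")
    for index, ref in enumerate(references, start=1):
        citation = ET.SubElement(citation_list, "citation", key=f"ref{index}")
        if ref.get("doi"):
            ET.SubElement(citation, "doi").text = ref["doi"]
        if ref.get("unstructured"):
            ET.SubElement(citation, "unstructured_citation").text = ref["unstructured"]
    return ET.tostring(citation_list, encoding="unicode")


references = [
    # The Dryad dataset DOI used as an example elsewhere in this post.
    {"doi": "10.5061/dryad.854j2"},
    # Placeholder for an ordinary reference that has no DOI.
    {"unstructured": "Placeholder citation text for a work without a DOI."},
]
print(build_citation_list(references))
```

The point of the sketch is that the dataset reference takes exactly the same shape as every other entry in the list.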

Background

You might ask, “So if submitting Data Citations to Crossref has long been possible, why do you have to write this?”

Historically, authors did not cite data in the way they cited publications. Instead, they would often refer to the data in the main text of the article. This has made it hard to determine what data lay behind the research and/or access the data.

But the research community has increasingly recognized that data is a first-class research output and that we should treat it as such. In short, we should formally cite data.

But because citing data is a comparatively new practice, it has been subject to a lot of new analysis. And unsurprisingly, people analyzing data citation have discovered that there is a lot of nuance to citation of any kind.

There are lots of reasons for citing something. There are lots of internalized conventions for citing things. And there are different conventions for citation for different research objects. And SSH citation practice differs from STEM. And legal citation practices are different from scholarly citation practices. And citation practices even vary by subdiscipline and by journal.

Those who have been looking at what it means to “cite data” have naturally stumbled into a thicket of divergent practices - some of which are historical holdovers, some of which are stylistic preferences, and some of which are clearly adaptations to deal with the specific needs of certain research objects/containers or different disciplines.

The temptation has been to try and rationalize this before extending the practice of citation to data.

“Maybe because data is a distinct record type, we should include the fact that it is a data citation in the citation itself?”

“Maybe because people cite data for different reasons, we should include a typology of citation types in all data citations?”

And so you may hear some people say, “hold off on data citation - we don’t have an optimal way to do it yet, and it can be very complicated.”

But guess what?

We currently don’t label citations to monographs as “citation to monograph.”

And we don’t currently include the reason for citation when we are citing a journal article.

It would be very cool if we did. And it would likely make citations even more useful if we did.

But citations are already useful even without these features. And so, to delay citing data indefinitely because we have an opportunity to improve the act of citation is just perverse. Our community has always opted for progress over perfection.

For one thing - the efforts are not mutually exclusive. We can start citing data with the current limitations of citation practices and simultaneously propose mechanisms for making citation more useful in the future, including new guidelines to deal with the unique issues that citing data poses.

But in the meantime, we will be doing researchers a giant favour if we at least include our imperfect, ambiguous, and unconventional references to data in the references section of an article so that they can be accessed and processed along with all the other imperfect, ambiguous and variant citations that we find so useful.

Some of our members are already doing this. They have been for a long time. And they haven’t found it any more complicated than managing non-data references in the past.

Join them and make your metadata more useful.

Cite data now. Don’t put it off.

And Crossref will continue to work with DataCite and the rest of the community to make the distribution even easier and more useful.

So who is already citing data?

Top 10 members depositing data citations from November-May 2022

(broken down by DOI prefix, which is why you see some publishers listed twice):

Prefix	Member name	Data citations deposited
10.1038	Springer Science and Business Media LLC	7174
10.1016	Elsevier BV	6527
10.1007	Springer Science and Business Media LLC	4748
10.5194	Copernicus GmbH	3017
10.1080	Informa UK Limited	2346
10.1177	SAGE Publications	2082
10.1002	Wiley	2048
10.1111	Wiley	1888
10.1108	Emerald	1876
10.3390	MDPI AG	1827

Top 10 data citations per deposited work

(again, broken down by prefix)

Member name	Prefix	Data citations deposited	Data citations per work
Consortium Erudit	10.7202	580	1.149
SLACK, Inc.	10.3928	462	0.646
S. Karger AG	10.1159	1653	0.532
Proceedings of the National Academy of Sciences	10.1073	973	0.502
American Academy of Pediatrics (AAP)	10.1542	486	0.397
F1000 Research Ltd	10.12688	552	0.341
American Association for the Advancement of Science (AAAS)	10.1126	952	0.317
Springer Science and Business Media LLC	10.1038	7174	0.231
JMIR Publications Inc.	10.2196	864	0.187
American Geophysical Union (AGU)	10.1029	692	0.166

These are for the prefixes with the most data citations deposited (>500 in 6 months) so there might be smaller members doing better than this.

Summaries are great, but I want to see some actual examples!

Here are some examples showing how data is cited by our members:

This eLife article: https://doi.org/10.7554/eLife.26410 cites this dataset in Dryad https://doi.org/10.5061/dryad.854j2.
This Copernicus article: https://doi.org/10.5194/acp-22-7105-2022 cites this dataset https://doi.org/10.24381/cds.bd0915c6
This Sciendo article: https://doi.org/10.2478/plc-2021-0008 cites this APA-hosted language competence test https://doi.org/10.1037/t15159-000
This De Gruyter article: https://doi.org/10.1515/opth-2020-0160 cites this bibliography at Oxford Bibliographies: https://doi.org/10.1093/OBO/9780195396584-0012

And here are some example API requests for discovering more metadata citations. You can use these API requests as examples and adapt them to your own needs.
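As a minimal sketch of how such a lookup might be scripted (not the requests from the original post), the Python below builds the REST API URL for one of the articles listed above and pulls the cited DOIs out of a work record's `reference` list. The helper names `work_url` and `cited_dois` are hypothetical, and `sample_message` is a hand-written, heavily trimmed stand-in for a real response, not actual API output.

```python
from urllib.parse import quote

API_BASE = "https://api.crossref.org/works/"


def work_url(doi):
    """Build the REST API URL for a single work's metadata record."""
    return API_BASE + quote(doi, safe="/")


def cited_dois(message):
    """Collect the DOIs found in a work record's 'reference' list, lowercased."""
    return [ref["DOI"].lower() for ref in message.get("reference", []) if "DOI" in ref]


# Hand-written stand-in for the "message" part of a response (illustrative only;
# the reference keys and the second entry are made up for this sketch).
sample_message = {
    "DOI": "10.7554/elife.26410",
    "reference": [
        {"key": "bib1", "DOI": "10.5061/dryad.854j2"},
        {"key": "bib2", "unstructured": "A reference deposited without a DOI"},
    ],
}

print(work_url("10.7554/eLife.26410"))
print(cited_dois(sample_message))
```

References deposited without a DOI are skipped here, so this only surfaces citations that can already be matched to an identifier.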

Shooting for the stars – ASM’s journey towards complete metadata

Kornelia Korzec — Tue, 14 Mar 2023 00:00:00 +0000

At Crossref, we care a lot about the completeness and quality of metadata. Gathering robust metadata from across the global network of scholarly communication is essential for effective co-creation of the research nexus and making the inner workings of academia traceable and transparent. We invest time in community initiatives such as Metadata 2020 and Better Together webinars. We encourage members to take time to look up their participation reports, and our team can support you if you’re looking to understand and improve any aspects of metadata coverage of your content.

In 2022, we observed with delight the growth of one of our members from basic coverage of their publications to over 90% in most areas, with no less than 70% of the corpus covered by all key types of metadata Crossref enables (see their own participation report for details). Here, Deborah Plavin and David Haber share the story of ASM’s success and lessons learnt along the way.

Could you introduce your organisation?

The American Society for Microbiology publishes 16 peer-reviewed journals advancing the microbial sciences, from food microbiology, to genomics and the microbiome, comprising 14% of all microbiology articles. Six of those are open-access journals, and 56% of ASM’s published papers are open access. Together, our journals contribute 25% of all microbiology citations.

Would you tell us a little more about yourselves?

DH: David Haber, Publishing Operations Director at the American Society for Microbiology. I live in a century-old house that is in a perpetual state of renovation due to my inability to stop starting new projects before I complete old ones.

DP: Deborah Plavin, Digital Publishing Manager at the American Society for Microbiology. Following David’s example, my apartment in Washington D.C. is just up the block from one of the homes Duke Ellington lived in https://www.hmdb.org/m.asp?m=142334.

What value do society publishers in general see in metadata in your view?

DP: In my view, robust metadata allows publishers to look at changes over time, do comparative analysis within and across research areas, more easily identify trends, and plan for future analysis (e.g., if we deposit data citation information and we change our processes to make it more straightforward, do we see any change in the percentage of articles that include that information, etc.).

DH: To echo Deborah’s point, to be able to name something distinctly and clearly identify its specific attributes is vital to understanding past research and planning for future possibilities. One of our fundamental roles as a publisher for a non-profit society is to properly lay this metadata foundation so that we can provide services and new venues for our members, authors, and readers that match their needs and track with the trends in research. Without good and robust metadata, it is impossible to truly understand the direction in which our community is pointing us.

Metadata for your own research outputs in the last year has grown rapidly. Why such focus on metadata in 2022?

DP: This is something that ASM has been chipping away at over time. Years ago we found that it wasn’t always easy to take advantage of deposits that included new kinds of metadata. That was either because we needed to work out how and where to capture it in the process or because platform providers weren’t always ready — coming up with ways to process the XML that publishers supply in many different ways takes time. These back-end processes that feed the infrastructure aren’t usually of great interest to stakeholders, and so it allowed us to play around, flounder, fail, refine, and try again.

We looked at having 3rd parties deposit metadata for us, and while that helped expand the kind of metadata we were delivering, it created workflow challenges of its own. What turned out to be most effective was budgeting for content cleanup projects and depositing updated and more robust metadata to Crossref.

We also benefited from a platform migration, which allowed us to take advantage of additional resources during that process.

DH: Coming from a production background, I have always been fascinated with the when and how of capturing key metadata during the publishing process. When are those data good and valuable, and when should they be tossed or cleaned up for downstream deliveries? Because Deborah and ASM directors saw a more complete Crossref metadata set for our corpus as a truly valuable target, we were able to really think hard about what kind of data we were capturing and when, how those requirements may have influenced our various policies and copyediting requirements over the years, and how best to re-engineer our processes with the goal of good metadata capture throughout our publishing workflows. From our perspective, Crossref gave us a target, a “this-is-cool-bit-of-info” that Crossref can collect in a deposit; therefore, how can we capture that during our processes while driving further efficiencies? ASM journals had been so driven by legacy print workflows that such a change in perspective (toward metadata as a publishing object) really allowed us to re-imagine almost everything we do as a publisher.

Has the OSTP memo influenced your effort?

DP: I think that the Nelson memo hasn’t changed our focus; instead, I think it’s been another data point supporting our efforts and work in this area.

DH: Deborah is exactly right. The release of this memo only re-affirmed our commitment to creating complete and rich metadata. The Nelson memo points to many possible paths forward, in terms of both Open Access and Open Science, but we feel our work on improving our metadata outputs positions us well to pick a path that best suits our goals as a non-profit society publisher.

How big was this effort? Could you draw us a picture of how many colleagues or parts of the organisation were involved? Did you involve any external stakeholders, such as authors, editors, or others?

DH: It was simple. Took five minutes… In all seriousness, the key is having the support of the organisation as a whole. To do this properly, it is vitally important to know the end from the beginning, so to speak. It is one thing to say let’s start capturing ORCID IDs and deliver them to Crossref, but it is completely another to create a cohesive process in which those IDs are authenticated and validated throughout the workflow. So something as simple as a statement “ORCID IDs seem cool, let’s try to capture them” could affect how researchers submit files, how reviewers log into various systems (i.e., ORCID as SSO), how data are passed to production vendors, what copyeditors and XML QC people need to be focused on, and what integrations authors may expect at the time of publication. Being part of an organisation that embraced such change allowed us to proceed with care with each improvement to the metadata we made.

But that is more about incremental improvement. The beginning of this process started when we were making upgrades to our online publishing platform, and we were trying to figure out how best to get DOIs registered for our older content. When we started looking at this, we soon realized that, sure, we could do the bare minimum and just assign DOIs to this older content outside the source XML/SGML, but did that make sense? Wouldn’t it make more sense, especially since we were updating the corpus to a new DTD, to populate the source content with these newly assigned DOIs? Once we decided that we were going to revise the older content with DOIs, it made sense for us to create a custom XSL transform routine to generate Crossref deposits that would capture as much metadata as possible. So, working with a vendor to clean and update our content for one project (an online platform update) allowed us also to make massive improvements to our Crossref metadata as a side benefit.

Of course, I do have to apologize to the STM community for the Crossref outages in late 2019. That was just me depositing thousands of records in batches one sleepless night.

What were the key challenges you encountered in this project, and how did you overcome them?

DH: Resources and time are always an issue. Much of the work was done in-house in spare moments captured here and there. But there are great resources on GitHub and at Crossref to help focus on defining what is important and what is possible in such a project. And, honestly, defining what was important and weighing that against the effort to find said important bit in the corpus of articles we have was the most challenging part of this process. In other words, limiting the focus. Once one decides to start looking at the inconsistencies in older content, it is hard not to say: “Oh, look. That semi-important footnote was treated as a generic author note rather than a conflict-of-interest statement; let’s fix that.” Once you start down that path, you can spend years fiddling with stuff. For me, a key mantra was: “We now have access to the content. We can always do another Crossref metadata update if things change or shift over time.”

Have there been any important milestones along the way you were able to celebrate? Or any set-backs you had to resolve in the process?

DP: For as long as I can remember, the importance of good metadata has been among the loudest messages of best practice in the industry. I don’t think that I have been able to really quantify or demonstrate the value of that work. Looking at the consistent increases in the Crossref monthly resolution reports that we saw between 2015 and 2022 and looking at our participation reports has helped provide some measure of progress. For example, the average number of monthly successful resolutions in that Crossref report in 2015 was ~390,000. The last time I checked, the 2022 numbers were ~3.7 million. In 2023, I hope that we will be able to leverage Event Data for this as well.

The setbacks have fallen into two categories: timing and process. Our internal resourcing to get this done within our preferred time frame, to have the content loaded and delivered, and triage problems—it’s a battle between the calendar and competing priorities.

DH: When Deborah first shared those stats with me, I was floored. I don’t think either of us suspected such an increase was possible. For me, the biggest setback was mistakenly sending about 50,000 DOI records to queue and watching them all fail because I grabbed the wrong batch. Ooops. I never made that mistake again, though.

Was any specific type of metadata or any part of the schema particularly easy or particularly difficult to get right in ASM’s production process?

DH: For us, the most difficult piece of metadata revolves around data availability and how we capture linked data resources (outside of data citation resources). Because of our current editorial style (which had been print-centric for years), we did not do a good job of identifying whether there are data associated with published content in a consistent machine-readable way. We did some experiments with one of our journals to capture this outside of our normal Crossref deposit routine, but that was not as accurate or sustainable as we would have liked. But, in that experiment, we learned a few things about how we treat these data throughout our publishing process and we have plans to create a sustainable integrated workflow for this to capture resource/data linkages in our Crossref deposits.

What were your thoughts on last year’s move to open references metadata? Has that impacted on your project in any way?

DP: We were really excited about this; based on the rather limited approach to sorting out impact at the moment, the more metadata we push out into the ecosystem, the more it appears to be used. In my view, that is at the core of what society publishers want to do—ensure that research is accessible and discoverable wherever our users expect to find it.

DH: 100% agree.

How did you keep motivated and on-course throughout?

DP: These kinds of things are never done; for example, we have placeholders for CRediT roles, and getting ready for that work as part of a DTD migration will be the next big thing. The motivation for that is really meeting our commitment to the community, seeing the impact of the author metadata versus article metadata, and seeing what we can learn.

DH: Metadata at its core is one of the pillars of our service as a publisher. To provide the best service, we need to provide the best metadata possible. Just remembering that this can be incremental allows us to celebrate the large moments and the small. And whether one is partying with a massive 7-layer cake or a smaller cake pop, both are sweet and motivating.

Now that the project is completed, are you seeing the benefits you were hoping to achieve?

DP: This is a hard one to answer as we are using limited measurements at this time. At a high level, I am pleased. While I am eager to leverage event data in the coming year, it would be really helpful to get feedback from the community on how we can improve as well as other ways to evaluate impact.

DH: I want to take up this idea of metadata as a service once more. I don’t mean in terms of discoverability or searchability, either. Let’s take ORCID deposited into Crossref as an example. When done properly (with the proper authentication and validation occurring in the background), we are able to integrate citation data directly to an author’s ORCID profile. We have found that this small service is really appreciated.

Is there any metadata that you’d like to be able to include with your publishing records in the future that isn’t possible currently? What would it be and why?

DP: CRediT roles would be great because it could give greater insight into collaboration within and across disciplines, it could allow for some automation and integration opportunities in the peer review process, and maybe it would visualize aspects of authors’ careers.

DH: I second capturing CRediT roles. What would be really interesting is also creating a standard that quantifies the accessibility conformance/rating of content and passing that into Crossref.

What was the key lesson you learned from this project?

DP: Incremental change can be just as challenging as a massive overhaul, and so it’s important to reevaluate your goals along the way—things always change. There have been cases where we were able to do things that we hadn’t initially thought were feasible.

DH: Always keep the larger goal in mind and remember that any project can birth a new project. Everything does not happen at once.

What’s your next big challenge for 2023?

DP: There is a lot to contend with in the industry right now, and in addition to that we are going through some serious infrastructure changes in our program. With all that madness comes many opportunities. For that reason, when I take a step back from the tactical implications of all that and what we are interested in doing, I think our biggest challenge in 2023 will be identifying what has made an impact and why.

DH: In the short-term, it is making sure that none of our production process changes has negatively affected the past metadata work we spent so much time honing. Once that settles down, it will be determining the best way forward from a publishing perspective in handling true versioning and capturing accurate event data.

Based on your experience, what would be your advice for colleagues from other scholarly publishing organisations?

DP: It can seem daunting, but the small wins can create momentum and do not have to be expensive. Remembering that your publishing program benefits as much as everyone else’s when you deposit more metadata can help refine your short-term and long-term priorities.

DH: Don’t be afraid of making a mess of things. Messes are okay. They aren’t risky. They just reveal the clutter. And clutter gives one reason to clean things up.

THANK YOU for the interview!

About the American Society for Microbiology

The American Society for Microbiology is one of the largest professional societies dedicated to the life sciences and is composed of 30,000 scientists and health practitioners. ASM’s mission is to promote and advance the microbial sciences.

ASM advances the microbial sciences through conferences, publications, certifications and educational opportunities. It enhances laboratory capacity around the globe through training and resources. It provides a network for scientists in academia, industry and clinical settings. Additionally, ASM promotes a deeper understanding of the microbial sciences to diverse audiences. For more information about ASM visit asm.org.

In the know on workflows: The metadata user working group

Jennifer Kemp — Tue, 28 Feb 2023 00:00:00 +0000

What’s in the metadata matters because it is So.Heavily.Used.

You might be tired of hearing me say it but that doesn’t make it any less true. Our open APIs now see over 1 billion queries per month. The metadata is ingested, displayed and redistributed by a vast, global array of systems and services that in whole or in part are often designed to point users to relevant content. It’s also heavily used by researchers, who author the content that is described in the metadata they analyze. It’s an interconnected supply chain of users large and small, occasional and entirely reliant on regular querying.

Tl;dr

Crossref recently wrapped up our first Working Group for users of the metadata, a group that plays a key role in discoverability and the metadata supply chain. You can jump directly to the stakeholder-specific recommendations or take a moment to share your use case or feedback.

Why a metadata user group? Why now?

A majority of Crossref metadata users rely on our free, open APIs and many are anonymous. A small but growing group of users pay for a guaranteed service level option and while their individual needs and feedback have long been integrated into Crossref’s work, as a group they provide a window into the workflows and use cases for the metadata of the scholarly record. As this use grows in strategic importance, to both Crossref and the wider community, it was clear that we might be overdue for a deeper dive into user workflows.

In 2021, we surveyed these subscribers for their feedback and brought together a few volunteers over a series of 5 calls to dig into a number of topics specific to regular users of metadata. This group, the first primarily non-member working group at Crossref, wrapped up in December 2022, and we are grateful for their time:

Achraf Azhar, Centre pour la Communication Scientifique Directe (CCSD)
Satam Choudhury, HighWire Press
Nees Jan van Eck, CWTS-Leiden University
Bethany Harris, Jisc
Ajay Kumar, Nova Techset
David Levy, Pubmill
Bruno Ohana, biologit
Michael Parkin, European Bioinformatics Institute (EMBL-EBI)
Axton Pitt, Litmaps
Dave Schott, Copyright Clearance Center (CCC)
Stephan Stahlschmidt, German Centre for Higher Education Research and Science Studies (DZHW)

This post is intended to summarize the work we did, to highlight the role of metadata users in research communications, to provide a few ideas for future efforts and, crucially, to get your feedback on the findings and recommendations. Though this particular group set out to meet for a limited time, we hope this report helps facilitate ongoing conversations with the user community.

Survey Highlights

If you’re looking for an easy overview of users and use cases, here’s a great starting point.

If you interpret this graphic to mean that there is a lot of variety centered on a few high level use cases, the survey and our experiences with users certainly support that. A few key takeaways from the 2021 survey may be useful context:

Frequency of use: At least 60% of respondents query metadata on a daily basis
Use cases
- Finding and enhancing metadata as well as using it for general discovery are all common use cases
- For most users, matching DOIs and citations is a common need but for a significant group, it is their primary use case
- Analyzing the corpus for research was a consistent use case for 13% of respondents
Metadata of particular interest
- Abstracts are the most desirable non-bibliographic metadata, followed by affiliation information, including RORs
- Some other elements (beyond citation information) that respondents find useful are:
  - Corrections and retractions
  - Relationship metadata
  - Book chapters
  - Grant information

NB: The survey did not ask about references but we are frequently asked why they’re not included more often.

It’s also worth noting that about a third of respondents said that correct metadata is more important to them than any particular element.

There is more to this survey that isn’t covered here but it was kept fairly short to help with the response rate. Knowing we would have some focused time to discuss issues too numerous or nuanced to reasonably address in a survey, we compiled a long list of questions and topics for the Working Group then followed up with a second, more detailed survey to kick off the meeting series.

What we set out to address

We had three primary goals for this Working Group:

Highlight the efforts of metadata users in enabling discovery and discoverability
Determine direction(s) for improved engagement
Inform the Crossref product development roadmap for metadata retrieval services

Of course, everyone involved had some questions and topics of interest to cover, including (but not limited to):

Understanding publisher workflows
How best to introduce changes, e.g. for a high volume of updated records
Understanding the Crossref schema
Query efficiencies, i.e. ‘tips and tricks’ (here for the REST API)
Which scripts, tools and/or programs are used in workflows
What other metadata sources are used
What kind of normalization or processing is done on ingest
How metadata errors are handled

What did we learn?

Workflows
I started with the admittedly ambitious goal of collecting a library of workflows. After a few years of working with users, I learned never to assume what a user was doing with the metadata, why or how. For example, some subscribers use Plus snapshots (a monthly set of all records), regularly or occasionally and some don’t use them at all. Understanding why users make the choices they do is always helpful.

In my experience, workflows are frequently characterized as “set it and forget it.” It’s hard to know how often and how easily they might be adapted when, for example, a new record type like peer review reports becomes available. So, it’s worth exploring when and how to highlight to users changes that might be of interest.

As it turned out, half the group had their workflows mostly or fully documented. For the rest, workflows were partially documented or not documented at all, or the availability of documentation was unknown. Helping users document their workflows, to the extent possible, should be a mutually beneficial effort to explore going forward. We’re doing similar work with the aim of making ours more transparent and replicable.

Feedback on subscriber services
User feedback might be the most obvious and directly consequential work of this group, at least for Crossref - understanding how well the services used meet their needs and what might be improved.

One frequent suggestion for improvement is faster response time on queries. This is an area we’ve focused on for some time, because refining queries to be more efficient is often the most straightforward way to improve response times and one reason for the emphasis on workflows.
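To make "more efficient queries" concrete, here is a minimal sketch, assuming a user who pages through REST API results in bulk. It combines request parameters that the public REST API documents: `filter` to narrow the result set, `select` to return only the fields needed, `rows` with `cursor` for deep paging, and `mailto` to identify the caller. The function name `build_query`, the chosen filters, and the email address are illustrative assumptions, and these are not necessarily the tips the Working Group discussed.

```python
from urllib.parse import urlencode

WORKS_ENDPOINT = "https://api.crossref.org/works"


def build_query(filters, fields, rows=1000, cursor="*", mailto=None):
    """Assemble a /works URL that asks only for what the workflow needs."""
    params = {
        # Narrow the result set on the server side instead of after download.
        "filter": ",".join(f"{name}:{value}" for name, value in filters.items()),
        # Return only the listed fields to keep responses small.
        "select": ",".join(fields),
        # Fetch large pages and walk them with a cursor rather than offsets.
        "rows": rows,
        "cursor": cursor,
    }
    if mailto:
        # Identifying yourself lets the API operators contact you about problems.
        params["mailto"] = mailto
    # Keep ':', ',', '*' and '@' unescaped so the URL stays readable.
    return WORKS_ENDPOINT + "?" + urlencode(params, safe=":,*@")


url = build_query(
    {"type": "journal-article", "from-pub-date": "2022-01-01"},
    ["DOI", "title"],
    mailto="user@example.org",  # placeholder address
)
print(url)
```

For subsequent pages, a workflow would pass the `next-cursor` value from each response back in as `cursor` until no more items are returned.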

We also discussed whether and how to notify users of changes of interest. Just defining “change” is complex since changes are so frequent and may often be considered very minor. We’ve been experimenting a bit over the past few years with notifying these users in cases where we’re aware of upcoming large volumes of changes, which is sometimes the case when landing page URLs are updated due to a platform change, for example. It was incredibly useful to discuss with the group what volume of records would be a useful threshold to trigger a notification (100K if you’re curious).

But perhaps the most common feedback we get from all users is on the metadata itself and the myriad quality issues involved. The group spent a fair amount of time discussing how this affects their work and shared a few examples of notable concerns:

Author name issues, e.g. ‘Anonymous’ is an option for authors, but it and placeholders like ‘n/a’ are sometimes entered in surname fields
Invalid DOIs are sometimes found in reference lists
Garbled characters from text not rendering properly
Affiliation information is often not included or incomplete (e.g. doesn’t include RORs)
Inconsistencies in commonly included information, e.g. ISSNs
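Concerns like these lend themselves to simple automated checks on ingest. The sketch below is one hypothetical way a metadata user might flag them, assuming records shaped like REST API work records (`author` with `family` and `affiliation`, `reference` with `DOI`, plus `title` and `ISSN`). The function `find_issues`, the placeholder list, and both patterns are assumptions made for illustration; the DOI pattern in particular is a loose shape check, not Crossref's validation rule.

```python
import re

# Loose shape checks, assumed for illustration only.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")
ISSN_SHAPE = re.compile(r"^\d{4}-\d{3}[\dXx]$")
PLACEHOLDER_SURNAMES = {"anonymous", "n/a", "na", "none", "unknown"}


def find_issues(record):
    """Return a list of human-readable problems found in one work record."""
    issues = []
    for author in record.get("author", []):
        family = author.get("family", "")
        if family.strip().lower() in PLACEHOLDER_SURNAMES:
            issues.append(f"placeholder surname: {family!r}")
        if not author.get("affiliation"):
            issues.append(f"no affiliation for author: {family!r}")
    for ref in record.get("reference", []):
        doi = ref.get("DOI")
        if doi and not DOI_SHAPE.match(doi):
            issues.append(f"malformed reference DOI: {doi!r}")
    for title in record.get("title", []):
        # U+FFFD is the Unicode replacement character left by failed decoding.
        if "\ufffd" in title:
            issues.append("garbled characters in title")
    for issn in record.get("ISSN", []):
        if not ISSN_SHAPE.match(issn):
            issues.append(f"malformed ISSN: {issn!r}")
    return issues


# Made-up record that trips every check once.
sample = {
    "title": ["A study of caf\ufffd culture"],
    "author": [{"family": "n/a", "affiliation": []}],
    "reference": [{"key": "ref1", "DOI": "doi:10.1000/xyz"}],
    "ISSN": ["12345678"],
}

for issue in find_issues(sample):
    print(issue)
```

Checks like these only catch problems of form; a well-formed but wrong DOI or ISSN would still need to be reported to the member or to Crossref support.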

It’s worth noting that a common misunderstanding - not just among users - is what is required in the metadata. Users nearly always expect more metadata and more consistency than is actually available. The introduction of Participation Reports a few years ago was a very useful start to what is an ongoing discussion about the variable nature of metadata quality and completeness.

Users in the metadata supply chain
A few years ago, our colleague Joe Wass used Event Data to put together this chart of referrals from non-publisher sources in 2015.

The role of metadata users in discoverability of content is key in my view and one that often doesn’t get enough attention, especially given that the systems and services that use this information often use it to point their own users to relevant resources. And because they work so closely with the metadata, users frequently report errors and so serve as a sort of de facto quality control. So, unfortunately, the effects of incomplete or incorrect metadata on these users might be the most powerful way to highlight the need for more and better metadata.

What are the recommendations?

In discussions with the Working Group, a few themes emerged, largely around best practices, which, by their nature, tend to be aspirational.

If you’re not already familiar with the personas and Best Practices and Principles of Metadata 2020, that is a useful starting point (I am admittedly biased here!) and many are echoed in the following recommendations:

For users:

Document and periodically review workflows
Report errors to members or to Crossref support and reflect corrections when they’re made (metadata and content)
Understand what is and isn’t in the metadata
Follow best practices for using APIs

For Crossref:

Define a set of metadata changes, e.g. to affiliations, to further the discussion around thresholds for notifying users of ‘high volumes’ of changes
Provide an output schema
Continue refining the input schema to include information like preprint server name, journal article sub types (research article, review article, letter, editorial, etc.), corresponding author flags, raw funding statement texts, provenance information, etc.
Collaborate on improving processes for reporting metadata errors and making corrections and enhancements

For metadata providers (publishers, funders and their service providers):

Follow Metadata 2020 Metadata Principles and Practices
Consistency is important, e.g. using the same, correct relationship for preprint to VoR links for all records
- Workarounds such as putting information into a field that is ‘close’ but not meant for it can be considered a kind of error
Understand the roles and needs of users in amplifying your outputs
Respond promptly to reports of metadata errors
Whenever possible, provide PIDs (ORCID IDs, ROR IDs, etc.) in addition to (not as a substitute for) textual metadata
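As an illustration of the last recommendation, here is a hedged sketch of carrying a PID alongside the textual metadata rather than in place of it, with a basic well-formedness check on the ORCID iD before it is passed on. The check digit calculation follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers; the function names and the record layout are hypothetical, and a checksum only catches typos. It does not show that the iD belongs to the named author, which still requires authentication.

```python
import re

ORCID_SHAPE = re.compile(r"[0-9]{15}[0-9X]")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """True if the iD has 16 characters and its final check character matches."""
    compact = orcid.replace("-", "").upper()
    if not ORCID_SHAPE.fullmatch(compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


def contributor(name, affiliation, orcid=None):
    """Keep the textual fields always; add the PID only when it passes the check."""
    entry = {"name": name, "affiliation": affiliation}
    if orcid and is_well_formed_orcid(orcid):
        entry["orcid"] = orcid
    return entry


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(contributor("Josiah Carberry", "Brown University", "0000-0002-1825-0097"))
```

Because the name and affiliation strings are always retained, downstream users who cannot resolve the identifier still have something to work with.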

What is still unclear or unfinished?

Honestly, a lot. We knew from the outset that the group would conclude with much more work to be done, in part because there is so much variety under the umbrella of metadata users and many answers lead to more questions and in part because the metadata and the user community will continue to evolve. Even without a standing group that meets regularly, it’s very much an ongoing conversation and we invite you to join it.

Now it’s your turn–can you help fill in the blanks?

Does any or all of this resonate with you? Do you take exception to any of it? Do you have suggestions for continuing the conversation?

Specifically, can you help fill in any of the literal blanks? We’ve prepared a short survey that we hope can serve as a template for collecting (anonymous) workflows. Please take just a few minutes to answer a few short questions such as how often you query for metadata.

If you are willing to share examples of your queries or have questions or further comments, please get in touch.

Perspectives: Mohamad Mostafa on scholarly communications in UAE

Mohamad Mostafa — Mon, 27 Feb 2023 00:00:00 +0000

Our Perspectives blog series highlights different members of our diverse global community at Crossref. We learn more about their lives and how they came to know and work with us, and we hear insights about the research landscape in their country, the challenges they face, and their plans for the future.

As we continue with our Perspectives blog series, today, we meet Mohamad Mostafa, Crossref Ambassador in the UAE and Production Manager at Knowledge E. Mohamad is passionate about helping improve the discoverability of research through rich metadata. We invite you to read and listen to what Mohamad has to say!

بينما نواصل سلسلة مدونة توقعات - وجهات نظر الخاصة بنا، نلتقي اليوم مع محمد مصطفى، سفير كروس رف في الإمارات العربية المتحدة ومدير الإنتاج في نوليدج اي . محمد متحمس للمساعدة في تحسين إمكانية اكتشاف البحث من خلال البيانات الوصفية الغنية. ندعوكم لقراءة ما يقوله محمد والاستماع إليه!


Tell us a bit about your organisation, your objectives, and your role

أخبرنا قليلاً عن مؤسستك وأهدافك ودورك

My name is Mohamad Mostafa, and I am the Production Manager at Knowledge E. Within our publishing program, we publish around 2000 articles across 13 titles that are fully Open Access, which is something that I really value.

اسمي محمد مصطفى، وأنا مدير الإنتاج في نولدج إي. ضمن برنامج النشر الخاص بنا، ننشر حوالي 2000 مقالة عبر 13 عنوانًا مفتوح الوصول بالكامل، وهو أمر أقدره حقًا.

In a world that’s moving faster than ever, the availability, quality, and pursuit of knowledge are fundamental for advancement. Knowledge E, in line with its vision of developing a more knowledgeable world, helps institutions advance the quality of their research; move towards teaching excellence; upgrade library technology, services, and practices; and advance scholarship through journal publication, management, and training. In other words, it works with higher education institutions, research centres, ministries, publishers, and scholars to solve our society’s most significant challenges.

في عالم يتحرك بشكل أسرع من أي وقت مضى، يعد توافر المعرفة وجودتها والسعي وراءها أمورًا أساسية للتقدم. إن نوليدج إي، تماشياً مع رؤيتها لتطوير عالم أكثر معرفة ودراية، تساعد المؤسسات على تحسين جودة أبحاثها؛ التحرك نحو التميز في التدريس؛ ترقية مكتباتها الرقمية والخدمات والممارسات المتعلقة بها؛ ودعم المنح الدراسية المتقدمة من خلال نشر المجلات وإدارتها والتدريب. بمعنى آخر، تعمل شركة نولدج إي مع مؤسسات التعليم العالي ومراكز البحث والوزارات والناشرين والعلماء لحل أهم التحديات التي تواجه مجتمعنا.

I am also a Crossref Ambassador. As part of the ambassador program, we aim to raise awareness about Crossref services among librarians, publishers, editors, and authors in the Middle East and North Africa region. As part of this, we run workshops in English and Arabic, emphasizing the importance of comprehensive metadata and persistent identifiers. We also help research communities improve their understanding of how to use Crossref services. The importance of making regional research objects easy to find, cite and reuse encouraged me to join the ambassador program.

أنا أيضًا سفير كروس رف. كجزء من برنامج السفراء، نهدف إلى زيادة الوعي حول خدمات Crossref بين أمناء المكتبات والناشرين والمحررين والمؤلفين في منطقة الشرق الأوسط وشمال إفريقيا. وكجزء من هذا، فإننا ندير ورش عمل باللغتين الإنجليزية والعربية، للتأكيد على أهمية البيانات الوصفية الشاملة والمعرفات المستمرة. نحن أيضًا نساعد مجتمعات البحث على تحسين فهمهم لكيفية استخدام خدمات .Crossref شجعتني أهمية تسهيل العثور على عناصر البحث الإقليمية والاستشهاد بها وإعادة استخدامها على الانضمام إلى برنامج سفراء كروس رف.

What is one thing that others should know about your country and its research activity?

ما هو الشيء الذي يجب أن يعرفه الآخرون عن بلدك ونشاطه البحثي؟

A lot of regional research is being produced in Arabic. Even without proper infrastructure (international publishing ecosystems such as peer review systems, indexes, citation databases, and submission systems lack language support) and despite inadequate awareness of the services (such as Crossref solutions) that can help with the discoverability and visibility of this research, the Arab region is increasingly recognised as a global leader in research outputs. Generally, these are some of the challenges and frustrations associated with the MENA (Middle East/North Africa) region.

يتم إنتاج الكثير من الأبحاث الإقليمية (باللغة العربية) وحتى بدون بنية تحتية مناسبة (نقص الدعم اللغوي داخل أنظمة النشر الدولية مثل أنظمة مراجعة الأقران، والفهارس، وقواعد بيانات الاستشهادات، وأنظمة التقديم، وما إلى ذلك) وعدم كفاية الوعي حول الخدمات المختلفة (مثل حلول(Crossref التي يمكن أن تساعد في اكتشاف هذه البحوث وإبرازها، يتم الاعتراف بالمنطقة العربية بشكل متزايد كرائد عالمي في مخرجات البحث. بشكل عام، هذه بعض التحديات والإحباطات المرتبطة بمنطقة الشرق الأوسط وشمال إفريقيا.

Are there trends in scholarly communications that are unique to your part of the world?

هل توجد اتجاهات في الاتصالات العلمية فريدة من نوعها في الجزء الذي تعيش فيه من العالم؟

In general, Open Access and Open Research are getting more and more attention in our region currently. We have recently launched the Forum for Open Research in MENA to raise awareness about all the new scholarly communications trends and support the Middle East and North Africa movement towards Open Science.

بشكل عام، يحظى الوصول الحر والبحث المفتوح باهتمام متزايد في منطقتنا حاليًا. لقد أطلقنا مؤخرًا منتدى الأبحاث المفتوحة في منطقة الشرق الأوسط وشمال إفريقيا لزيادة الوعي حول الاتصالات العلمية الجديدة ودعم حركة الشرق الأوسط وشمال إفريقيا نحو العلوم المفتوحة.

The Forum for Open Research in MENA (FORM) is a non-profit membership organisation supporting the advancement of open science policies and practices in research communities and institutions across the Arab world.

منتدى البحوث المفتوحة في الشرق الأوسط وشمال إفريقيا (FORM) هو منظمة غير ربحية ذات عضوية تدعم النهوض بسياسات وممارسات العلوم المفتوحة في المجتمعات والمؤسسات البحثية في جميع أنحاء العالم العربي.

We believe the Arab world has the resources and capability to play a pivotal role in the global transition towards more accessible, sustainable, and inclusive research and education models. And we want to support all our research communities and stakeholder groups in the journey towards a more ‘open’ world. Our vision is to help unlock research for and in the Arab world. Our mission is to support the advancement of open science practices in research libraries and universities across the Arab world by facilitating the exchange of actionable insights and developing practical policies.

نعتقد أن العالم العربي لديه الموارد والقدرة على لعب دور محوري في التحول العالمي نحو نماذج بحث وتعليم أكثر سهولة واستدامة وشمولية. ونريد دعم جميع مجتمعاتنا البحثية ومجموعات أصحاب المصلحة في رحلتنا نحو عالم أكثر "انفتاحًا". رؤيتنا هي دعم الوصول الحر والبحوث المفتوحة في العالم العربي. ومهمتنا هي دعم تقدم ممارسات العلوم المفتوحة في مكتبات البحث والجامعات في جميع أنحاء العالم العربي من خلال تسهيل تبادل الأفكار القابلة للتنفيذ وتطوير السياسات العملية.

Our first Annual Forum was held in Cairo in October 2022 (as part of the global Open Access Week initiative). The event was a huge success, with over 1,100 delegates from over 48 countries across the globe. The next Annual Forum will be hosted in the UAE in October 2023, and details will be available shortly on our website.

عقد المنتدى السنوي الأول في القاهرة في أكتوبر 2022 (كجزء من مبادرة أسبوع الوصول الحر العالمي). حقق الحدث نجاحًا كبيرًا، حيث حضره أكثر من 1100 مندوب من أكثر من 48 دولة حول العالم. سيتم استضافة المنتدى السنوي القادم في دولة الإمارات العربية المتحدة في أكتوبر 2023، وستتوفر التفاصيل قريبًا على موقعنا.

How would you describe the value of being part of the Crossref community; what impact has your participation had on your goals?

كيف تصف قيمة أن تكون جزءًا من مجتمعCrossref ؟ ما هو تأثير مشاركتك على أهدافك؟

I have been a Crossref ambassador for more than 5 years now, and I can really say that it has been a great experience being part of such an amazing and collaborative community. We got the chance to interact with different publishers and service providers and participate in different Crossref annual events. It’s also perfectly aligned with our vision of supporting Open Research.

لقد كنت سفيرًا لـ Crossref لأكثر من 5 سنوات حتى الآن، ويمكنني حقًا أن أقول إنها كانت تجربة رائعة أن أكون جزءًا من هذا المجتمع المذهل والتعاوني. لقد أتيحت لنا الفرصة للتفاعل مع مختلف الناشرين ومقدمي الخدمات والمشاركة في الأحداث السنوية المختلفة لـ .Crossref كما أنه يتماشى تمامًا مع رؤيتنا لدعم البحث المفتوح.

Recently, we have delivered a series of three Arabic webinars that offered basic metadata information and advanced insights about the role of metadata and how Crossref services can help an institution. These webinars have been well received by the community of regional publishers, university presses, and librarians. Dozens of questions have been answered, and technical enquiries have been resolved. It was a great experience, and it was good to see that kind of interest in our community. Also, more educational webinars are yet to come!

قدمنا مؤخرًا سلسلة من ثلاث ندوات عربية عبر الإنترنت تمحورت حول معلومات البيانات الوصفية الأساسية ورؤى متقدمة حول دور البيانات الوصفية وكيف يمكن لخدمات Crossref أن تساعد المؤسسات البحثية. لقيت هذه الندوات عبر الإنترنت استحسان مجتمع الناشرين الإقليميين دور النشر الجامعية وأمناء المكتبات. تمت الإجابة على عشرات الأسئلة، وتم الرد على الاستفسارات الفنية. لقد كانت تجربة رائعة، وكان من المفرح أن نرى هذا النوع من الاهتمام في مجتمعنا. بالإضافة إلى ذلك، سيتم تقديم المزيد من الندوات التعليمية على الإنترنت في المستقبل.

For you, what would be the most important thing Crossref could change (do more of/do better in)?

بالنسبة لك، ما هو الشيء الأكثر أهمية الذي يمكن لـ Crossref تغييره (القيام بالمزيد / القيام بعمل أفضل في)؟

Language is still a barrier in some parts of the Arab region, so producing more educational content in different formats (webinars, flyers, videos with subtitles, etc.) would be highly appreciated here. 

لا تزال اللغة تشكل حاجزًا في بعض المناطق العربية، لذا سيكون إنتاج المزيد من المحتوى التعليمي بتنسيقات مختلفة (ندوات عبر الإنترنت، ونشرات، ومقاطع فيديو مع ترجمة، وما إلى ذلك) موضع تقدير كبير هنا.

Which other organisations do you collaborate with or are pivotal to your work in open scholarship?

ما هي المنظمات الأخرى التي تتعاون معها أو التي تلعب دورًا محوريًا في عملك في مجال الابحاث المفتوحة؟

We work closely with ORCID and invite them to our events, support DOAJ via our charitable Foundation, and rely heavily on PKP products, mainly Open Journal Systems (OJS), with plans to expand and start using Open Monograph Press (OMP).

إننا نعمل عن كثب مع ORCiD ونقدر دعمهم لفاعلياتنا، كما ندعم DOAJ عبر موقعنا ومؤسستنا الخيرية، ونعتمد بشكل كبير على منتجات مشروع المعرفة العامة وخاصة المجلة المفتوحة أنظمة (OJS) كما أننا نود التوسع والبدء في استخدام Open Monograph Press (OMP).

What are the post-pandemic challenges/hopes you are facing and how are you adapting to them/what you’re looking forward to?

ما هي التحديات / الآمال التي تواجهها في فترة ما بعد الجائحة وكيف تتكيف معها / ما الذي تتطلع إليه؟

We aim for more face-to-face meetings and onsite workshops/conferences as the world opens up again. In addition, we have launched the Forum for Open Research in MENA (FORM), a non-profit membership organisation supporting the advancement of Open Science policies and practices in research communities and institutions across the Arab region.

نحن نهدف إلى المزيد من الاجتماعات وجهًا لوجه وورش العمل / المؤتمرات. بالإضافة إلى ذلك، أطلقنا منتدى البحث المفتوحMENA (FORM) ، وهي منظمة غير ربحية ذات عضوية تدعم النهوض بسياسات وممارسات العلوم المفتوحة في مجتمعات ومؤسسات البحث في جميع أنحاء المنطقة العربية.

A catalyst for positive action, we work with key stakeholders to develop and implement a pragmatic programme to facilitate the transition toward more accessible, inclusive, and sustainable research and education models in the Arab region. Our driving focus is on building the resources, the membership, the organisational structures, and the broader community to support the advancement of Open Science in research communities and research institutions across the Arab world.

كمحفز للعمل الإيجابي، نحن نعمل مع أصحاب المصلحة الرئيسيين للتطوير وتنفيذ برنامج عملي لتسهيل الانتقال نحو المزيد من نماذج البحث والتعليم الشاملة والمستدامة والتي يسهل الوصول إليها في المنطقة العربية. ينصب تركيزنا الدافع على بناء الموارد، والعضوية، والهياكل التنظيمية، والمجتمع الأوسع لدعم تقدم العلوم المفتوحة في المجتمعات البحثية والمؤسسات البحثية عبر العالم العربي.

Following the huge success of our 2022 Annual Forum (held in Cairo with the support and endorsement of UNESCO and the Egyptian Knowledge Bank), which attracted over 1100 delegates from 48 countries, our 2023 Annual Forum will be held in Abu Dhabi in the UAE. For more details about the event and the call for papers, see our website: https://forumforopenresearch.com

بعد النجاح الكبير لمنتدى 2022 السنوي (الذي عقد في القاهرة مع دعم وتأييد اليونسكو وبنك المعرفة المصري)، التي اجتذبت أكثر من 1100 مندوب من 48 دولة، المنتدى السنوي لعام 2023 سيعقد في أبو ظبي في دولة الإمارات العربية المتحدة. لمزيد من التفاصيل حول الحدث والدعوة للمشاركة، راجع موقعنا على الإنترنت:https://forumforopenresearch.com

What are your plans for the future?

ما هي خططك المستقبلية؟

Keep working with different global and regional stakeholders to help the transition of our region towards Open Science.

استمر في العمل مع مختلف الشركاء العالميين والإقليميين للمساعدة في انتقال منطقتنا العربية نحو العلوم المفتوحة.

Thank you, Mohamad!

شكرا لك يا محمد!

The more the merrier, or how more registered grants means more relationships with outputs

Dominika Tkaczyk — Wed, 22 Feb 2023 00:00:00 +0000

One of the main motivators for funders registering grants with Crossref is to simplify the process of research reporting with more automatic matching of research outputs to specific awards. In March 2022, we developed a simple approach for linking grants to research outputs and analysed how many such relationships could be established. In January 2023, we repeated this analysis to see how the situation changed within ten months. Interested? Read on!

TL;DR

The overall numbers changed a lot between March 2022 and January 2023:
- the total number of registered grants doubled (from ~38k to ~76k)
- the total number of relationships established between grants and research outputs quadrupled (from ~21k to ~92k)
- the percentage of linked grants increased substantially (from 10% to 23%)
Most of this growth can be attributed to one funder, the European Union. They started registering grants with us in December 2022, and:
- their grants constitute 47% of all grants registered by January 2023 and 95% of grants registered between March 2022 and January 2023
- 72% of all established relationships involve their grants
We have further work planned both internally and with the community to consolidate and build out important relationships between funding and research outputs.

Introduction

When we started to develop, think and talk about grant registration at Crossref back in 2017, one of the key things we expected this to support was easier, more efficient, accurate analysis of research outputs funded by specific awards.

This is backed up by conversations with funders who are keen to fill in gaps in the map of the research landscape with new data points and better quality information; to search for grants, investigators, projects, or organisations associated with awards; and to simplify the process of research reporting with automatic matching of outputs to grants.

This is in keeping with, and informed, our recent recommendations about how funding agencies can meet open science guidance using existing open infrastructure, which included input from ORCID and DataCite. It’s also in keeping with recent studies on how important funding and grant metadata is to help the community use this information in their own research.

To meet these expectations, we need not only identifiers and metadata of grants, but also relationships between them and research outputs supported by them. Unfortunately, our schema does not make it easy to directly deposit such relationships, and so there are only a handful of them available. But we wouldn’t let such a minor obstacle stop us! In March 2022 we analysed the metadata of registered grants and developed a simple matching approach to automatically link grants to research outputs supported by them. Back then, we were able to find 20,834 relationships, involving 17,082 research outputs and 3,858 grants (which was 10% of all registered grants).

Now that we are seeing the accumulation of grant metadata being registered with Crossref, we have a bigger dataset to test these expectations against than we did a year ago. So we decided to do the analysis again. And the results are in, they’re open, and they’re positive. We’ll explain below.

The methodology

To spare you from having to read the old analysis in detail, here is a very brief summary of the matching methodology. To find relationships between grants and research outputs, we iterated over all registered grants, and for each grant we searched for research outputs that looked like they might have been supported by this grant. We established a relationship between a grant and a research output if one of the following three scenarios was true:

1. The research output contained the DOI of the grant (deposited as the award number).
2. The award number in the grant was the same as the award number in the research output, the research output contained the funder ID, and one of the following was true:
   a. Funder ID in the grant was the same as the funder ID in the research output
   b. Funder ID in the grant replaced or was replaced by the funder ID in the research output
   c. Funder ID in the grant was an ancestor or a descendant of the funder ID in the research output
3. The award number in the grant was the same as the award number in the research output, the research output did not contain the funder ID, and one of the following was true:
   a. Funder name in the research output was the same as the funder name in the grant
   b. Funder name in the research output was the same as the name of a funder that replaced or was replaced by the funder in the grant
   c. Funder name in the research output was the same as the name of an ancestor or a descendant of the funder in the grant

Note that the replaced/replaced-by relationships and ancestor/descendant hierarchy are taken from the Funder Registry.
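The three scenarios can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the code used for the analysis: the `Funder`, `Grant`, and `Funding` classes, the `match` function, the scenario labels, and the tiny in-memory registry are all hypothetical, and award numbers and funder names are compared by exact equality, which is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Funder:
    """Hypothetical, simplified view of one Funder Registry entry."""
    name: str
    replaced: frozenset = frozenset()   # IDs this funder replaced or was replaced by
    hierarchy: frozenset = frozenset()  # IDs of its ancestors and descendants


@dataclass(frozen=True)
class Grant:
    doi: str
    award_number: str
    funder_id: str


@dataclass(frozen=True)
class Funding:
    """One funding entry in a research output's metadata."""
    award_number: str
    funder_id: Optional[str] = None
    funder_name: Optional[str] = None


def match(grant: Grant, funding: Funding, registry: Dict[str, Funder]) -> Optional[str]:
    """Return a label for the scenario linking grant and funding entry, or None."""
    # Scenario 1: the grant DOI was deposited as the award number.
    # DOIs are case-insensitive, so compare them lowercased.
    if funding.award_number.lower() == grant.doi.lower():
        return "grant-doi"

    # Scenarios 2 and 3 both require identical award numbers.
    if funding.award_number != grant.award_number:
        return None

    funder = registry[grant.funder_id]

    if funding.funder_id is not None:
        # Scenario 2: the research output carries a funder ID.
        if funding.funder_id == grant.funder_id:
            return "funder-id"
        if funding.funder_id in funder.replaced:
            return "funder-id-replaced"
        if funding.funder_id in funder.hierarchy:
            return "funder-id-hierarchy"
        return None

    # Scenario 3: no funder ID, so fall back to the funder name.
    if funding.funder_name is None:
        return None

    def names(ids):
        return {registry[i].name for i in ids if i in registry}

    if funding.funder_name == funder.name:
        return "funder-name"
    if funding.funder_name in names(funder.replaced):
        return "funder-name-replaced"
    if funding.funder_name in names(funder.hierarchy):
        return "funder-name-hierarchy"
    return None


# Made-up registry entries and grant, for illustration only.
registry = {
    "f-parent": Funder("Parent Council", hierarchy=frozenset({"f-child"})),
    "f-child": Funder("Child Institute", hierarchy=frozenset({"f-parent"})),
}
grant = Grant(doi="10.5555/example-grant", award_number="AB-123", funder_id="f-child")
print(match(grant, Funding("AB-123", funder_id="f-parent"), registry))  # funder-id-hierarchy
```

Returning a label rather than a plain boolean makes it possible to break the established relationships down by matching method afterwards.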

Current results

Since March 2022, six additional funders have started registering grants with us. As a result, the total number of grants doubled, and the total number of established relationships between grants and research outputs, linked grants, and linked research outputs quadrupled. Here is the comparison of the total numbers of grants, established relationships, linked grants, and linked research outputs in March 2022 and in January 2023:

95% of grants registered within the ten months between March 2022 and January 2023 were registered by one funder: the European Union. This suggests that this funder contributed a lot to the rapid increase in the number of established relationships. It looks like this funder’s grant metadata is of high quality and closely matches the funding information given in the research outputs supported by this funder’s grants.

Let’s also compare the breakdowns of all established relationships by the matching method:

The distributions are a bit different. Currently, the percentage of relationships established based on the replaced/replaced-by relationship is much smaller than before, suggesting that newer data uses correct funder IDs instead of deprecated ones. Also, the percentage of the relationships matched by the funder ID increased from 40% to 48%, which is great, because this is the most reliable way of matching.

And here we have the statistics broken down by grant registrants. Only funders with at least 100 registered grants are included. The table shows the number of relationships, grants, linked grants, and linked research outputs, and is sorted by the percentage of linked grants.

funder	relationships	linked research outputs	grants	linked grants
European Union	66,562	60,630	35,530	12,688 (36%)
Gordon and Betty Moore Foundation	93	92	113	33 (29%)
Japan Science and Technology Agency (JST)	15,584	13,464	9,923	2,323 (23%)
James S. McDonnell Foundation	519	513	577	121 (21%)
Melanoma Research Alliance	188	185	425	82 (19%)
Muscular Dystrophy Association	50	50	178	25 (14%)
Parkinson’s Foundation	30	29	107	15 (14%)
Asia-Pacific Network for Global Change Research	127	127	560	70 (13%)
The ALS Association	96	90	477	58 (12%)
Wellcome	8,868	6,436	17,537	1,735 (10%)
American Cancer Society	19	19	266	15 (6%)
Templeton World Charity Foundation	2	2	281	2 (0.7%)
Office of Scientific and Technical Information (OSTI)	73	69	8,723	62 (0.7%)
Children’s Tumor Foundation	1	1	662	1 (0.1%)
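As a quick check, the percentages in the last column can be recomputed from the grant counts. The sketch below copies three rows from the table above and assumes, as the column name suggests, that the percentage is linked grants divided by registered grants; the `linked_share` helper is a name made up for this illustration.

```python
# (registered grants, linked grants) copied from three rows of the table above
rows = {
    "European Union": (35530, 12688),
    "Wellcome": (17537, 1735),
    "Office of Scientific and Technical Information (OSTI)": (8723, 62),
}


def linked_share(grants: int, linked: int) -> float:
    """Percentage of a funder's registered grants that are linked to research outputs."""
    return 100 * linked / grants


for funder, (grants, linked) in rows.items():
    print(f"{funder}: {linked_share(grants, linked):.1f}%")
# European Union: 35.7%
# Wellcome: 9.9%
# Office of Scientific and Technical Information (OSTI): 0.7%
```

Rounded to whole numbers, the first two values give the 36% and 10% shown in the table.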

There are substantial differences between the percentages of linked grants from different funders. One of the newest registrants, the European Union, is at the top of the table with 36% of their grants linked to research outputs. This further confirms the high quality of the metadata registered by this member. It is worth noticing that this member is responsible for the majority of the growth reported here as they cover Horizon Europe, the European Research Council, and many other funding bodies and schemes.

Why are these percentages so low for some funders? It could be caused by systematic discrepancies between the award numbers attached to the grants and those reported in research outputs. It could also be the case that most grants registered by a given funder are new grants, and the research outputs supported by them simply have not been published yet. Time will tell!

What’s next

We’re dedicating lots of time in 2023 to examine, evolve, and expose the matching we do and can do at Crossref across different metadata fields. We then plan to incorporate matching improvements into our services so that everyone can benefit.

This isn’t a standalone piece of work. As you can see, the more award metadata we have connected to grants by funders and connected to outputs by those who post or publish research, the better we’ll be able to do this. To make it easier for more funders to participate, and based on funder feedback, we’ve built a simple tool for members to register their grants. We will also work to help incorporate grant identifiers into publishing and funder workflows, and further our discussions with the funders in our Funder Advisory Group and the wider community, including working together with the Open Research Funders Group, the HRA, Altum, Europe PMC, the OSTP, and the ORCID Funder Interest Group. And there will be more to come as we work together to consolidate and build out important relationships between funding and outputs - for everyone.

Follow-up

Every new thing takes time to get off the ground and to show evidence of its value. We’ve seen a significant step forward recently with funders joining and contributing to the research nexus. Publishers have been contributing funding data for years, and it’s now becoming much clearer to see how these two communities and these two sets of metadata are coming together to make research smoother and easier to manage and evaluate. If you are ready to register grants, talk about linking up your outputs, or just want to learn more about this work, we’d love to hear from you.

Don't take it from us: Funder metadata matters

Jennifer Kemp — Thu, 16 Feb 2023 00:00:00 +0000

Why the focus on funding information?

We are often asked who uses Crossref metadata and for what. One common use case is researchers in bibliometrics and scientometrics (among other fields) doing meta-analyses on the entire corpus of records. As we pass the 10-year mark for the Funder Registry and 5 years of funders joining Crossref as members to register their grants, it’s worth a look at some recent research that focuses specifically on funding information. After all, there is funding behind so much scholarly work that it seems obvious it would be routinely documented in the scholarly record. But it often isn’t, and that’s a problem. These sources make clear the need for accurate funding information and the problems that the lack of it creates.

First, a few notes for context on these sources and the issues they discuss:

- The percent of records with funding information reached about 25% as of 2021. Not all registered items are the result of funding, but the true share is surely much higher than 25%, so there is considerable room for improvement. The authors cite publishers that omit funding information as well as those that include it routinely. Overall, society publishers are at the top of the list of those that do it well.
- Three of the four sources found it difficult, in some cases, to confirm the funding information in the metadata against the original sources. This initially surprised me, though less so once I thought about the strange nature of metadata workflows.
- The complexity of fully and correctly acknowledging multiple sources of funding in any given publication is a recurring theme.
- All of the sources mention the need for manual work in analyzing funding and publication information.

The first two papers are from the same 2022 issue of Quantitative Science Studies and are complementary.

Alexis-Michel Mugabushaka, Nees Jan van Eck, Ludo Waltman; Funding COVID-19 research: Insights from an exploratory analysis using open data infrastructures. Quantitative Science Studies 2022; 3 (3): 560–582. doi: https://doi.org/10.1162/qss_a_00212

This first paper tackles the timely question of determining which funders have supported publications of COVID-19 research and compares coverage of funding data in Crossref to that in Scopus and Web of Science. Even with so much urgent attention focused on the pandemic, the authors found that only 17% of publications in the COVID-focused CORD-19 database have funding identified in their Crossref records. We’re often asked about differences in the metadata (and citation counts) between Crossref and other sources such as Scopus. In this case, both proprietary sources studied have more funder coverage. If you are disappointed in these results or want to learn more, I encourage you to read the authors’ recommendations for improving funding data in Crossref or get in touch with us.

Bianca Kramer, Hans de Jonge; The availability and completeness of open funder metadata: Case study for publications funded by the Dutch Research Council. Quantitative Science Studies 2022; 3 (3): 583–599. doi: https://doi.org/10.1162/qss_a_00210

This next paper focuses on a set of outputs funded by the NWO (the Dutch Research Council). Since the funder is already known, the authors could look at multiple sources (Crossref and others) to see whether or where the NWO is correctly identified as the funder. This study also found better coverage in proprietary sources like Web of Science than in Crossref. Knowing that not all outputs are the result of funded research, this paper provides a new and useful baseline for comparing percentages of coverage. Discussions of research funding so often focus on the physical and life sciences, so it’s very good to see that 37% of works in this study are in the humanities and social sciences.

Borst, T., Mielck, J., Nannt, M., Riese, W. (2022). Extracting Funder Information from Scientific Papers - Experiences with Question Answering. In: , et al. Linking Theory and Practice of Digital Libraries. TPDL 2022. Lecture Notes in Computer Science, vol 13541. Springer, Cham. https://doi.org/10.1007/978-3-031-16802-4_24

Given the considerable effort required to conduct these analyses, it’s only logical to consider automating as much of the work as possible. This next paper focuses on automatic recognition of funders in economics papers in digital libraries. An interesting complication described here is the inclusion of funding for open access fees in acknowledgments. While the authors conclude that automated text mining of funder information performs better than manual curation, they also state that manual indexing is still necessary “for a gold standard of reliable metadata.”

Habermann, T. (2022). Funder Metadata: Identifiers and Award Numbers. https://metadatagamechangers.com/blog/2022/2/2/funder-metadata-identifiers-and-award-numbers

Finally, this concise blog post looks at RORs as well as funder names and acronyms. The author shows how acronyms contribute to the need for manual analysis. He also spends some time on award numbers, which is one of the three funding elements publishers can (and, as we’ve seen, should) include in their metadata. Award numbers are also a focus of this work and, unfortunately, another frequent reason for additional manual work.

A common theme: More metadata needed

Though collectively, this research paints a fairly dim picture of the current availability, completeness and accuracy of existing funding information in publication metadata, all is not lost. This is a good opportunity to point out the value and availability of grant records since unique, persistent identifiers for grants (yes, DOIs for grants) paired with more and better funding metadata from publishers go a very long way to realizing the vision of the Research Nexus. And it certainly would make things a whole lot easier for the researchers who use this open metadata to analyze the scholarly record for the rest of us.

Refocusing our Sponsors Program; a call for new Sponsors in specific countries

Susan Collins — Mon, 06 Feb 2023 00:00:00 +0000

Some small organisations who want to register metadata for their research and participate in Crossref are not able to do so due to financial, technical, or language barriers. To attempt to reduce these barriers we have developed several programs to help facilitate membership. One of the most significant—and successful—has been our Sponsor program.

Sponsors are organisations that are generally not producing scholarly content themselves but work with or publish on behalf of groups of smaller organisations that wish to join Crossref but face barriers to do so independently. Sponsors work directly with Crossref in order to provide billing, technical, and, if applicable, language support to Members.

Because Sponsors are important partners in facilitating membership there is a high bar to meet to be accepted as a Sponsor. To ensure that an organisation can accurately represent Crossref and has the resources to be successful we created a set of criteria that must be met to be considered.

Our Sponsors program has grown considerably over the last decade and has now become the primary route to membership for emerging markets and small or academic-adjacent publishing operations.

The program began in 2012 with four Sponsors, based primarily in South Korea and Turkey, representing fewer than 100 members. In the next stage of development, the program covered Brazil, India, and Ukraine, and nearly 1300 members. At the end of 2022, the program had grown to over 100 sponsors from 45 countries representing over 11,000 of our members.

Though the program continues to expand, there are still regions where we lack Sponsors, while having an abundance in others. We are working with members, ambassadors, and the community to help identify organisations that may be a fit with the Sponsor program and based in those regions where coverage is lacking.

This January we announced our Global Equitable Membership (GEM) Program which offers relief from membership and content registration fees for members in the least economically-advantaged countries in the world. Eligibility for the program is based on a member’s country on our curated list.

Though the GEM program reduces financial barriers to becoming a member, many organisations still require technical assistance and local language support. Working with a Sponsor would help organisations overcome these burdens. However, there is little or no Sponsor coverage for organisations located in most GEM-eligible countries. That means that in places like Bangladesh, Nepal, and Senegal, where we’ve seen a lot of growth, more organisations could join us if a suitable local Sponsor could support them.

We have decided to pause accepting new Sponsors that are based in regions where Sponsor numbers are already very high or that are not based in a GEM region. By doing so we can focus on growing the program in areas where there is the greatest need.

We are also going to focus on how best to support our current 100+ Sponsors and work with them to evaluate ways to improve the program. We will bolster training, resources, and outreach activities, and solicit feedback on additional ways we can help.

We would love to hear from organisations based in GEM countries who might consider becoming a Sponsor. But our invitation for Sponsors is not limited to the support for the GEM program. There are countries where the GEM program won’t apply, but where growth is high and no Sponsor is present. In particular, we seek support in the following countries where member numbers are growing but could be better supported.

Country/state	Region	No. Crossref members
Nigeria	Sub-Saharan Africa (Western)	99
Philippines	South-eastern Asia	81
Kenya	Sub-Saharan Africa (Eastern)	40
Egypt	Northern Africa	26
Sri Lanka	Southern Asia	13

If your organisation is based in one of these regions and supports or provides services to scholarly publishers in one of the above countries, please take a look at the criteria set out on our website and do get in touch to start the conversation if you think you can meet them. We’re excited to hear from you!

Measuring Metadata Impacts: Books Discoverability in Google Scholar

Lettie Conrad — Wed, 25 Jan 2023 00:00:00 +0000

This blog post is from Lettie Conrad and Michelle Urberg, cross-posted from The Scholarly Kitchen.
As sponsors of this project, we at Crossref are excited to see this work shared out.

The scholarly publishing community talks a LOT about metadata and the need for high-quality, interoperable, and machine-readable descriptors of the content we disseminate. However, as we’ve reflected on previously in the Kitchen, despite well-established information standards (e.g., persistent identifiers), our industry lacks a shared framework to measure the value and impact of the metadata we produce.

In 2021, we embarked on a Crossref-sponsored study designed to measure how metadata impacts end-user experiences and contributes to the successful discovery of academic and research literature via the mainstream web. Specifically, we set out to learn if scholarly books with DOIs (and associated metadata) were more easily found in Google Scholar than those without DOIs.

Initial results indicated that DOIs have an indirect influence on the discoverability of scholarly books in Google Scholar – however, we found no direct linkage between book DOIs and the quality of Google Scholar indexing or users’ ability to access the full text via search-result links. Although Google Scholar claims to not use DOI metadata in its search index, the results of our mixed-methods study of 100+ books (from 20 publishers) demonstrate that books with DOIs are generally more discoverable than those without DOIs.

As we finalize our analysis, we are sharing some early results and inviting input from our community. What relevant lessons can we glean from this exercise? What changes might book publishers consider based on the outcomes of this study?

Background on the study

This study was designed to evaluate metadata impacts and benefits to users. Given Google Scholar’s popularity with a range of stakeholders in our industry, we set out to measure metadata impacts on discoverability in the mainstream web, focusing on Google Scholar.

Our test method and analysis rubric were developed based on our own information-user research, in particular how readers search and retrieve scholarly ebooks, as well as published studies about academic information experiences and research practices. We rated the search performance of more than 100 scholarly books using preset test queries (two for each title). The books tested in this study came from publishers of all sorts and sizes, and represent both monographs and edited volumes from a range of fields; some were open access and others were published under traditional licensing models.

We developed and executed known-item test searches that were designed to simulate common researcher practices. Heuristic analysis of the search results was used to rate the search performance on a 5-point scoring rubric, which was designed to measure the degree of friction in locating the book in question. This method allowed us to compare specific book and metadata attributes by their search performance scores, and so to assess the impact of book metadata on content discoverability in Google Scholar.
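
As a rough illustration of this kind of comparison, the sketch below groups rubric ratings by a book attribute and averages them. It is not the study's actual analysis: the record layout, the `has_doi` field, and the function name `mean_score_by` are assumptions made for the example, and the ratings are placeholders rather than study results.

```python
from collections import defaultdict
from statistics import mean

# Placeholder ratings for illustration only; these are NOT results from the study.
# "score" stands for a 1-5 rubric rating of one test query for one book.
sample_ratings = [
    {"title": "Book A", "has_doi": True, "score": 5},
    {"title": "Book A", "has_doi": True, "score": 4},
    {"title": "Book B", "has_doi": False, "score": 3},
    {"title": "Book B", "has_doi": False, "score": 2},
]


def mean_score_by(ratings, attribute):
    """Group rubric scores by one book attribute and return the mean per group."""
    groups = defaultdict(list)
    for rating in ratings:
        if not 1 <= rating["score"] <= 5:
            raise ValueError("rubric scores are expected to be between 1 and 5")
        groups[rating[attribute]].append(rating["score"])
    return {value: mean(scores) for value, scores in groups.items()}


print(mean_score_by(sample_ratings, "has_doi"))
```

The same helper could be pointed at any other attribute recorded per book, such as access model or field of study, provided each rating carries that attribute.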

Results and findings

In this study, we learned that high-value fields include the primary title paired with subtitles, author/editor surnames and/or field of study. Queries using full book titles performed the best across the board. Those using publication dates and/or author/editor surnames and/or publisher names, but without the book title, were the lowest performers.

Surprisingly, our discoverability scores show no significant variation in performance by the type of book, whether edited or authored. Open-access titles performed somewhat better than traditional ones. Books covering humanities and social science fields performed a bit better than STM books, but only by a slim difference (that is not statistically significant).

We primarily tested the discoverability of book titles, from equal numbers of books with and without chapter-level DOIs. We ran similar tests for chapter-title discoverability but found the majority of test queries for chapters led users to the full book itself. While books without title-level DOIs were found to be less discoverable, we did not find a measurable difference between books with or without chapter-level DOIs. (Note: All books in this study with chapter-level DOIs assigned also carried a title-level DOI, which was found to be fairly common.)

Based on these results, we are developing a theory that books with DOIs perform better in Google Scholar because they benefit from the structured, open metadata associated with those DOIs – which are used by hundreds of platforms and services, and therefore are “seeded” throughout the mainstream web, which Scholar may draw on for indexing, linking, etc. That said, these results also suggest that publishers are best served by a metadata strategy that is well attuned to the protocols expected of each channel for book search and discovery. In a recent conversation about our findings, Anurag Acharya himself noted that these results underscore the need for publishers to invest in the robust construction and broad distribution of book metadata.

In this study, we have observed that the metadata protocols surrounding Google Scholar are not fully integrated into our industry’s established scholarly information standards bodies, like NISO, or infrastructure organisations, like Crossref. While some mainstream data standards prevail in the Scholar index, like the use of schema.org and HTTP, some key metadata attributes seem to be lacking. For example, an indicator of the type of scholarly book (monograph, handbook, etc.) would improve Google Scholar’s search index and could be used to filter search results, thereby improving users’ experiences discovering scholarly books. One clear challenge for book publishers today is the fact that Google Scholar operates outside of our community-governed scholarly information infrastructure.

What comes next

While this study focused on Google Scholar, the results and lessons learned are applicable to other mainstream channels of information seeking/discovery. Our report, due out spring 2023, will contribute to the literature intended to support user-centric information systems design and content architecture by scholarly publishers and service providers.

As we write up our findings, we intend to develop a framework that can help publishers and others measure the impact of their work to enrich and distribute scholarly metadata. We hope this first systematic review of the impacts of metadata on the discoverability of books in Google Scholar will provide valuable insights for this community. In the meantime, please share your thoughts and questions in the comments below – or reach out to us directly (see Lettie’s profile here and Michelle’s profile here).

Acknowledgments: The authors would like to thank Jennifer Kemp at Crossref for the inspiration to take this dive into the metadata literature and reflect on its impact on research information experiences. Special thanks to Anurag Acharya at Google Scholar for his consultation during this study.

Introducing our new Global Equitable Membership (GEM) program

Susan Collins — Wed, 07 Dec 2022 00:00:00 +0000

When Crossref began over 20 years ago, our members were primarily from the United States and Western Europe, but for several years our membership has been more global and diverse, growing to almost 18,000 organisations around the world, representing 148 countries.

As we continue to grow, finding ways to help organisations participate in Crossref is an important part of our mission and approach. Our goal of creating the Research Nexus—a rich and reusable open network of relationships connecting research organisations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society—can only be achieved by ensuring that participation in Crossref is accessible to all. Building a network for the global community must include input from all of the global community.

Although Crossref membership is open to all organisations that produce scholarly and professional materials, cost and technical challenges can be barriers to joining for many organisations. To address some of these challenges, we created our Sponsors Program, which provides technical, financial and local language support. We also collaborate with the Public Knowledge Project on the Open Journals Platform to develop plugins for OJS users.

Additionally, we had a limited ‘fee assistance’ program to waive the content registration fees for members working under specific Sponsor arrangements, including INASP and African Journals Online (AJOL). Learning from the experiences of such successful partnerships, starting in January 2023, we are expanding this program to provide greater membership equitability and accessibility to organisations located in the least economically-advantaged countries in the world through our Global Equitable Membership (GEM) Program. This new scheme now encompasses the annual fee as well as the content registration fees.

Eligibility for the program is based on a member’s country. We have curated the list, predominantly based on the International Development Association (IDA) list and excluding anywhere we are bound by international sanctions. From January 2023, organisations based in countries listed in our GEM program will be eligible to join Crossref and contribute with their metadata to a robust scholarly record at no cost. This also applies to 187 existing members in eligible countries who will no longer be charged for Crossref membership or content registration.

Existing Crossref members in GEM-eligible countries

Bangladesh (54)	Burundi (1)	Kiribati (0)
Kyrgyz Republic (20)	Central African Republic (1)	Lesotho (0)
Nepal (19)	Democratic Republic of the Congo (1)	Liberia (0)
Ghana (15)	Guyana (1)	Marshall Islands (0)
Yemen (10)	Haiti (1)	Mauritania (0)
Sudan (7)	Honduras (1)	Micronesia (0)
Tanzania (7)	Laos (1)	Mozambique (0)
Afghanistan (6)	Madagascar (1)	Nicaragua (0)
Ethiopia (5)	Malawi (1)	Niger (0)
Zambia (5)	Maldives (1)	Samoa (0)
Bhutan (4)	Myanmar (1)	Sao Tome and Principe (0)
Rwanda (4)	Cambodia (1)	Sierra Leone (0)
Tajikistan (4)	Chad (1)	Solomon Islands (0)
Kosovo (3)	Comoros (1)	South Sudan (0)
Senegal (3)	Cote d’Ivoire (1)	Togo (0)
Uganda (3)	Djibouti (1)	Tonga (0)
Burkina Faso (2)	Eritrea (1)	Tuvalu (0)
Mali (2)	Gambia (1)	Vanuatu (0)
Somalia (2)	Guinea (1)
Benin (1)	Guinea-Bissau (1)

The list of countries will undergo an annual review, to follow the latest guidance from IDA, which uses the somewhat simplistic World Bank income classifications but applies a more granular blend of criteria for economic health, thereby allowing for greater nuance, such as indicating countries where the gap between rich and poor is very wide.

The program results from our experience working with and knowing the communities through Sponsors and working with past members who have struggled to pay. It aims to bring us closer to our vision of building an inclusive, rich and open network of relationships underpinning the scholarly record. With the support of the Membership and Fees Committee, the launch of the program was confirmed with the recent unanimous vote of our Board to evolve our fee assistance program into a more expansive scheme. GEM presents a more comprehensive and equitable solution than our former arrangements. It involves an opportunity to join Crossref and contribute scholarly metadata to our global community on a zero-fee basis for membership and content registration. This offering will be applied by default to organisations based in all eligible countries, irrespective of joining through any specific Sponsor, or independently.

While the GEM Program will alleviate financial barriers, and we hope to see the numbers above grow significantly, the GEM program will not necessarily help ease technical or administrative burdens. We still need our valued Sponsors for that and we seek new Sponsors in the above locations. We would love to hear from organisations based in GEM countries who might consider becoming a Sponsor or otherwise support local colleagues in building experience of metadata and working with global open scholarly infrastructure systems like Crossref. Please reach out to me to discuss ideas or with any other questions or comments.

How funding agencies can meet OSTP (and Open Science) guidance using existing open infrastructure

Ed Pentz — Thu, 17 Nov 2022 00:00:00 +0000

In August 2022, the United States Office of Science and Technology Policy (OSTP) issued a memo (PDF) on ensuring free, immediate, and equitable access to federally funded research (a.k.a. the “Nelson memo”). Crossref is particularly interested in and relevant for the areas of this guidance that cover metadata and persistent identifiers—and the infrastructure and services that make them useful.

Funding bodies worldwide are increasingly involved in research infrastructure for dissemination and discovery. While this post does respond to the OSTP guidelines point-by-point, the information here applies to all funding bodies in all countries. It will be equally useful for publishers and other systems that operate in the scholarly research ecosystem.

In response to calls from our community for more specifics, this post:

Provides an overview of the specific ways that Crossref (along with organisations and initiatives like DataCite, ORCID, and ROR) helps U.S. federal agencies—and indeed any other funder—meet critical aspects of the recommendations.
Restates our intent to collaborate with all stakeholders in the scholarly research ecosystem, including the OSTP, the US federal agencies, our existing funder, publisher, and university members, to support the recommendation as plans develop.
References the work and adoption of Crossref Grant DOIs, including analyses of existing metadata matching funding to outputs.
Highlights that what’s outlined in the memo aligns with our longstanding mission to capture and maintain the scholarly record and our vision of the Research Nexus, as we describe in our current blog series, regarding our role in preserving the integrity of the scholarly record (ISR).

Infrastructure already exists to support funder goals; it just needs more adoption

Ensuring free, immediate, and equitable access to metadata that captures the scholarly record is an essential part of meeting the aims of the memo but also supporting Open Science globally.

In September, Crossref, ORCID, DataCite, and ROR participated in the 2022 Forum on Global Grants Management run by Altum, and the summary provides a good example of the importance of open infrastructure and open metadata to the goals of Open Science:

Open Science begins with open infrastructure: Attendees agreed that Open Science relies on many other ‘opens’ – most notably, open metadata, open infrastructure, and open governance. Metadata and DOIs (digital object identifiers) for publications, grants, and research outputs, are essential to illuminate the connections that exist between funding and outcomes. That metadata runs on infrastructure powered by organisations such as Crossref, ORCID, ROR, and DataCite.

As a foundational scholarly infrastructure committed to meeting the Principles of Open Scholarly Infrastructure (POSI) of governance, insurance, and sustainability, Crossref plays an essential role in implementing and supporting key aspects of the guidance. For many years, we have been focused on the integrity of the scholarly record (ISR), and the shared vision to collectively achieve what we call the Research Nexus, which is described as

A rich and reusable open network of relationships connecting research organisations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society.

Metadata—including persistent identifiers and relationships between different research objects—is the foundation of the Research Nexus and is critical to openly and sustainably fulfilling the OSTP memo’s recommendations.

This topic of open metadata and identifiers isn’t just an issue for research resulting from US federal funding. We are working to implement open scholarly infrastructure globally, bringing significant benefits to the whole scholarly research ecosystem.

The current situation brings to mind the William Gibson quote, “The future is already here - it’s just not evenly distributed yet”. Much of the open infrastructure to support the identifier, metadata and reporting requirements of the OSTP memo already exists, but it is unevenly implemented. Increased collaboration and effort will be needed to bring this all to fruition.

We set out below some steps that all stakeholders can take to meet not just the OSTP guidelines, but Open Science goals more broadly, and globally.

What does ‘adoption’ look like? How exactly do funders and other stakeholders work with this infrastructure?

The OSTP memo calls for specific actions concerning metadata and identifiers where, fortunately, open and global solutions already exist.

For example, item 4 a) says, “Collect and make publicly available appropriate metadata associated with scholarly publications and data resulting from federally funded research.” Crossref and DataCite make metadata, including persistent identifiers (DOIs, to be specific), openly available for a broad range of research objects, from publications to data. Item 4 c) reads, “Assign unique digital persistent identifiers to all scientific research and development awards and intramural research protocols”. Again, federal agencies and other funders are already joining Crossref to register awards and grants and distribute these records openly. However, adoption is uneven: only a few funders register awards and grants with DOIs so far, and that number needs to increase.

Here is an ideal workflow that funders and publishers can already follow

Funders join Crossref to register grants and awards (or indeed any other object such as reports). They apply on our website, accept our terms, and provide key information such as contact details. The annual membership fee ranges from USD 200 to USD 1,200.
Funders and publishers collect ROR IDs and authenticated ORCID iDs for all authors/awardees and their affiliations.
Funders register a Crossref DOI for the award/grant, including awardees’ ORCID iDs and ROR IDs. They send us XML information about the grant (note that we will imminently release an online form to make it easier for the less technical funders). Many funder members register the metadata through a third party, such as Altum (if they use ProposalCentral) or Europe PMC.
At the same time, funders update the awardees’ ORCID record directly with the Crossref Grant DOI and metadata.
Grantees produce research objects and outputs such as data, protocols, code, preprints, articles, conference papers, book chapters, etc.
These objects are registered with Crossref or DataCite, and DOIs are created by the publisher or repository members who include ORCID iDs, Crossref Grant DOIs (gathered from the author), ROR IDs for affiliations for all contributors, and other key metadata such as licensing information, and in the case of publications - references and abstracts. Note that the publisher works its magic (actually, publishers do a lot of editorial and production work, such as including data citations in the references using DataCite DOIs for the data in data repositories).
On the Crossref side, we do a bunch of processing and matching and are planning to refine this and do more. Sometimes relationships are notified and added, such as data citation, preprints related to articles or funding acknowledgements converted from free text to Open Funder Registry IDs and names.
Grant records with Crossref DOIs are now part of the scholarly record. All stakeholders may retrieve the open metadata and relationships through our public APIs. Crossref and DataCite will always provide open metadata, as safeguarded by our respective commitments to POSI.
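
The retrieval step at the end of this workflow can be sketched with the Crossref REST API. This is a minimal sketch rather than an official client: it assumes the public `https://api.crossref.org/works` endpoint with the `type:grant` filter and the optional `mailto` parameter, the helper names are hypothetical, and the DOI shown in the usage note is a made-up placeholder.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_ROOT = "https://api.crossref.org"


def grant_list_url(rows=5, mailto=None):
    """Build a /works query restricted to records of type 'grant'."""
    params = {"filter": "type:grant", "rows": rows}
    if mailto:
        # Optional contact address; assumed here to be passed as a query parameter.
        params["mailto"] = mailto
    return f"{API_ROOT}/works?{urlencode(params)}"


def work_url(doi):
    """Build the URL for the metadata record of a single DOI."""
    return f"{API_ROOT}/works/{quote(doi, safe='/')}"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the request URL; call fetch_json(...) to actually query the API.
    print(grant_list_url(rows=2))
```

Calling `fetch_json(work_url("10.5555/12345678"))` would request one record, where `10.5555/12345678` is only a placeholder DOI. Building URLs separately from fetching keeps the sketch easy to check without a network connection.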

Anyone can use the open metadata registered with Crossref, DataCite and ORCID as connections have been established between (ideally all) research objects and entities through open metadata and identifiers. This means that:

Funding agencies can monitor compliance with their policies
Publishers can identify the funder and meet their requirements
Funding agencies can assess and report on the reach and return of their funding programs
The provenance and integrity of the scholarly record is preserved and discoverable, benefitting all stakeholders.

Suggestions for meeting OSTP and Open Science guidance, point by point

OSTP Recommendation	Publishers should…	Funding agencies should…
4 a) Collect and make publicly available appropriate metadata associated with scholarly publications and data resulting from federally funded research	For scholarly publications: register comprehensive metadata & DOIs with Crossref. For scholarly data: register comprehensive metadata and DOIs with DataCite.	Use Crossref’s API to retrieve publication and other metadata. Use DataCite’s API to retrieve data/repository metadata.
i) all author and co-author names, affiliations, and sources of funding, referencing digital persistent identifiers, as appropriate;	Collect and validate the following from authors at manuscript submission: ROR IDs, ORCID iDs, and Crossref Grant DOIs. Include data citations in reference lists, preferably with DataCite DOIs.	Register awards and grants with Crossref and create DOI records for them. Use ORCID’s API to retrieve validated contributor metadata. Update contributors’ ORCID records with Crossref Grant DOIs and metadata. Use the ROR API to retrieve and verify affiliation metadata. Recommend data citations be included in published outputs.
ii) the date of publication; and,	Include acceptance and publication dates in Crossref metadata.	Use Crossref’s API to retrieve publication dates.
iii) a unique digital persistent identifier for the research output;	For scholarly publications and research outputs: register full metadata & DOIs with Crossref. For scholarly data: register full metadata and DOIs with DataCite.	Use Crossref and DataCite APIs to retrieve DOIs for research outputs.
4 b) Instruct federally funded researchers to obtain a digital persistent identifier that meets the common/core standards of a digital persistent identifier service defined in the NSPM-33 Implementation Guidance, include it in published research outputs when available, and provide federal agencies with the metadata associated with all published research outputs they produce, consistent with the law, privacy, and security considerations.	Collect ORCID iDs on manuscript submission for all authors. Register Crossref and DataCite DOIs and metadata for research outputs, including data.	Recommend that researchers applying for funding obtain an ORCID iD and collect them upon grant application for all applicants. Prepopulate grant applications with CV and publication information from applicants’ ORCID records. ORCID iDs should be included in the grants registered by the agencies with Crossref. Agencies can use our open APIs to retrieve the metadata on publications and data rather than ask researchers to do it, saving time and effort.
4 c) Assign unique digital persistent identifiers to all scientific research and development awards and intramural research protocols that have appropriate metadata linking the funding agency and their awardees through their digital persistent identifiers.		Join Crossref to register Crossref Grant DOIs, including ROR IDs and ORCID iDs. Ensure grant proposal and assessment systems integrate with Crossref, with ROR for affiliations, and with ORCID for applicants/awardees.
5 a) coordinate between federal science agencies to enhance efficiency and reduce redundancy in public access plans and policies, including as it relates to digital repository access;	Work with agencies to ensure a smooth, automated workflow.	Using and supporting existing open scholarly infrastructure and using open identifiers will avoid duplication of effort and make the overall ecosystem more efficient.
5 b) improve awareness of federally funded research results by all potential users and communities;	Collect Crossref Grant DOIs from authors and use them to link from publications to grant information.	Communicate your Crossref Grant DOIs and open grant metadata widely via human and machine interfaces. Inclusion in the Crossref API will enhance dissemination and discoverability. Update contributors’ ORCID records with Crossref Grant DOIs and metadata.
5 c) consider measures to reduce inequities in the publishing of, and access to, federally funded research and data, especially among individuals from underserved backgrounds and those who are early in their careers;		Registering grants and sharing metadata through Crossref means it’s part of the world’s largest open community-governed metadata exchange and makes it available to the entire world without restriction.
5 d) develop procedures and practices to reduce the burden on federally funded researchers in complying with public access requirements;	Ensure your systems and those you work with make it as easy as possible for authors to provide the necessary metadata and persistent identifiers - work towards as much automation as possible and pulling from other systems rather than asking for data to be re-keyed.	Ensure the platforms you work with, such as grant proposal or assessment systems, retrieve and prepopulate ROR IDs, ORCID iDs, and Crossref and DataCite DOIs and associated metadata whenever possible so that the researchers don’t have to manually rekey or reformat data.
5 e) recommend standard consistent benchmarks and metrics to monitor and assess implementation and iterative improvement of public access policies over time;		Ensure that platforms and systems integrate with ROR, ORCID, Crossref, and DataCite so that this open metadata can lead to the creation of benchmarks and metrics.
5 f) improve monitoring and encourage compliance with public access policies and plans;	Use open infrastructure to help authors easily comply with public access and funder/institution policies. Automate systems as much as possible.	Using the open infrastructure, metadata, and identifiers outlined in this post will make monitoring more straightforward and compliance easier for all stakeholders. The community can build services on open infrastructure and metadata.
5 g) coordinate engagement with stakeholders, including but not limited to publishers, libraries, museums, professional societies, researchers, and other interested non-governmental parties on federal agency public access efforts;	Work with the global open infrastructure organisations (Crossref, DataCite and ORCID) whose members include funding agencies, societies, publishers, universities, libraries, repositories, museums, NGOs, and many other stakeholders - all looking to improve the efficiency of the research ecosystem.	Work with the global open infrastructure organisations (Crossref, DataCite and ORCID) whose members include funding agencies, societies, publishers, universities, libraries, repositories, museums, NGOs, and many other stakeholders - all looking to improve the efficiency of the research ecosystem.
5 h) develop guidance on desirable characteristics of—and best practices for sharing in—online digital publication repositories;	Support automated systems that use metadata and identifiers to populate repositories automatically.	Collaborate with publishers, Crossref and others to develop automated systems to populate repositories.
5 j) develop strategies to make federally funded publications, data, and other such research outputs and their metadata are findable, accessible, interoperable, and re-useable, to the American public and the scientific community in an equitable and secure manner.	Provide and support a range of discovery services based on open infrastructure.	Encourage discovery services - and develop services - that use the open infrastructure, metadata and persistent identifiers to enable.
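
One small piece of the "collect and validate" advice in the table can be sketched in code: checking that an ORCID iD typed in by an author is well formed. The sketch below applies the ISO 7064 MOD 11-2 check character that ORCID iDs carry. The function names are hypothetical, and passing this check only shows the iD is structurally valid; it does not replace collecting authenticated iDs through ORCID itself.

```python
import re

# Four hyphen-separated groups of four characters; the last may be the letter X.
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True when the iD has the expected shape and a matching check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


# 0000-0002-1825-0097 is the iD ORCID uses for its fictional example researcher.
print(is_well_formed_orcid("0000-0002-1825-0097"))
```

A submission or grant system could run a check like this before storing an iD, so that typos are caught at entry rather than during later metadata matching.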

Everybody needs to play their part

A lot of the work on making the above happen is already underway, and there is widespread adoption of open identifiers and metadata, but as noted above, funders are still early in the adoption journey, and implementation among all stakeholders is patchy.

Critical parts of the infrastructure rely on third-party platforms that supply tools and systems to authors, funders, and publishers - so coordinating the support for the appropriate metadata and identifiers in these systems and tools is very important.

We are emphasising how our existing open scholarly infrastructure systems are helping. But we also know that it’s not all perfect yet. Infrastructure is always evolving, metadata is never complete, refactoring workflows and systems can be costly, and integration can always be smoother. But our existing open infrastructure has already delivered significant benefits, and broader adoption will bring additional benefits to the whole scholarly research and communications ecosystem and help achieve the promise of Open Science in advancing human knowledge.

While working on this coordination and integration, we all try to remember that it should minimise work for researchers, and processes should be as automated as possible.

Collaboration is key to making this all work.

We already work with many funders through our Advisory Group and our 30 funder members, 25 of whom have so far collectively registered around 40,000 Crossref Grant DOIs, retrievable from our open API. Some grants are even matched to resulting outputs already, and some funders have recently dug into Crossref metadata to analyse outcomes from their investments, such as the Dutch Research Council (NWO), which presents findings and makes a case for greater emphasis on Crossref funding metadata.

We also work closely with partners Europe PMC and Altum, and we engage in community research and discussion, for example, through the Open Research Funders Group.

Alongside our fellow infrastructures and open identifier registries ORCID, DataCite, and ROR, we integrate with and support each other operationally and out in the community.

We will continue focusing our resources and efforts on engaging with funders, including US federal agencies responding to the OSTP guidelines, and all stakeholders to support the entire global scholarly research ecosystem.

Everyone has a part to play, and we must all pull together to prioritize this work.

Who’s in?

Please get in touch with Ed, Ginny, or Jennifer (or indeed DataCite or ORCID or ROR) if you’d like to have a discussion about the workflows described here, or just to make sure you’re up to date on the latest developments and opportunities we describe. We look forward to working with all funding agencies to support them as they develop their plans.

Better preprint metadata through community participation

Martyn Rittman — Wed, 09 Nov 2022 00:00:00 +0000

Preprints have become an important tool for rapidly communicating and iterating on research outputs. There is now a range of preprint servers, some subject-specific, some based on a particular geographical area, and others linked to publishers or individual journals in addition to generalist platforms. In 2016 the Crossref schema started to support preprints and since then the number of metadata records has grown to around 16,000 new preprint DOIs per month.

Preprints aren’t the same as journal articles, books, or conference papers. They have unique features, and how they are viewed and integrated into the publishing process has evolved over the past six years. For this reason, we have been revisiting the preprint metadata schema and decided that the best approach would be to form an advisory group (AG) of preprint practitioners and experts to help us.

The AG has identified a number of areas in which preprint metadata could be improved. Four of these were considered to have the highest priority:

Withdrawal and removal of preprints.
Preprints as an article type (not a subtype of posted content) in the schema.
Relationships between preprints and other outputs.
Versioning of preprints.

The members of the AG set to work with great enthusiasm, sharing perspectives and expertise. This led to a first tranche of recommendations shared for feedback earlier this year, and we’re grateful for engagement and feedback from the community over the last few months.

What did the community say?

Some of the points raised in the feedback were:

Could the origin of a withdrawal be included in the metadata, in particular whether it was requested by an author or another party?
Can the metadata represent when a preprint has been submitted to a journal and what stage it is in the editorial process?
Crossref is not alone in looking at preprint metadata, and several NISO groups are also engaged in related work.
Interoperability and the ability to create relationships with identifiers beyond DOIs are important to maintain an accurate and comprehensive record of research outputs.

These will form the basis for ongoing discussions.

What happens next?

There are three next steps that we will be taking.

The recommendations outline only the outcomes of discussions in a relatively brief format. We have been working on a more detailed paper to communicate more about what was discussed and provide some extra justification and alternatives.
The AG will continue to meet and discuss the points raised during consultation on the recommendations, along with topics that were considered a lower priority at an earlier stage.
We will draw up a set of proposals for specific changes to the metadata schema that will reflect the outcomes of the recommendations and discussions.

Although the initial period for feedback on preprint metadata has ended, we welcome feedback at any time. If you would like to get in touch, please contact me or any member of the advisory group.

Forming new relationships: Contributing to Open source

Patrick Vale — Wed, 19 Oct 2022 00:00:00 +0000

TL;DR

One of the things that makes me glad to work at Crossref is the principles to which we hold ourselves, and the most public and measurable of those must be the Principles of Open Scholarly Infrastructure, or POSI, for short. These ambitions lay out how we want to operate - to be open in our governance, in our membership and also in our source code and data. And it’s that openness of source code that’s the reason for my post today - on 26th September 2022, our first collaboration with the JSON Forms open-source project was released into the wild.

Like most organisations, we depend heavily on open-source software for our operations - the software is universally available, generally high quality and ‘free’. And it’s easy to take that dependency, and the associated dependency on free time and effort on the part of the maintainers, for granted - but that’s not very sustainable. In fact, we believe relying on open-source software without helping to sustain it is an anti-pattern, and this project marks the start of our efforts to make funding open-source software a standard part of our technology budget.

This isn’t the first time we’ve supported or released open-source software. Indeed, for the past few years all our new software has been open source, and we’re in the process of replacing old closed code with new, so that eventually all our code will be open source. But this is the first time we’ve contributed extensively to something that isn’t focussed primarily on us and our services. This is a project that we will find very useful, but it is a general purpose tool, and it’s already gaining traction in the community.

Background and motivations

A while back, I was tasked to do a quick spike of work on testing the theory that we could use automated form generation tools to bring new interfaces to our users more quickly, and make them easier for “people who aren’t devs” to adapt and manage. We wanted to build a new user interface for registering content, and especially we wanted to make it easier for funders to register the grants they were awarding. As well as being more approachable by a less-technical audience, we also wanted these forms to be accessible (in terms of a11y and users of assistive technology) and localisable - we wanted a solution that would cater to the needs of our rapidly diversifying membership.

Enter JSON Schema

We were clear about one side of the puzzle - we knew that we had to look beyond the XML ecosystem upon which much of our existing system is built - and landed on JSON Schema. JSON Schema is a ‘vocabulary that allows you to annotate and validate JSON documents’. This means you can describe the shape you expect your data to take, and apply constraints-based validation to that. Which means, in terms of a form library, that you can infer the structure of the form and test that the data entered into it matches what you expect. More than that, you can use that built-in validation to provide error messages to help people get the data right, first time.
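To make the idea of constraints-based validation concrete, here is a minimal sketch in Python. It is a toy that understands only a handful of JSON Schema keywords (type, required, properties, minLength, minimum); it is not a real JSON Schema validator and not what Crossref runs. The grant schema and its field names are hypothetical, invented for illustration.

```python
# Hypothetical schema for a grant record; the field names are illustrative only.
GRANT_SCHEMA = {
    "type": "object",
    "required": ["title", "funder"],
    "properties": {
        "title": {"type": "string", "minLength": 1},
        "funder": {"type": "string", "minLength": 1},
        "amount": {"type": "number", "minimum": 0},
    },
}

_TYPES = {"string": str, "object": dict}


def _matches_type(value, type_name):
    """Check a value against the small set of types this toy supports."""
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, _TYPES[type_name])


def validate(data, schema, path="$"):
    """Return human-readable error messages; an empty list means valid."""
    type_name = schema.get("type")
    if type_name is not None and not _matches_type(data, type_name):
        return [f"{path}: expected {type_name}"]
    errors = []
    if type_name == "object":
        for name in schema.get("required", []):
            if name not in data:
                errors.append(f"{path}.{name}: required field is missing")
        for name, subschema in schema.get("properties", {}).items():
            if name in data:
                errors.extend(validate(data[name], subschema, f"{path}.{name}"))
    if type_name == "string" and len(data) < schema.get("minLength", 0):
        errors.append(f"{path}: must be at least {schema['minLength']} character(s)")
    if type_name == "number" and "minimum" in schema and data < schema["minimum"]:
        errors.append(f"{path}: must be >= {schema['minimum']}")
    return errors


if __name__ == "__main__":
    for message in validate({"title": "", "amount": -5}, GRANT_SCHEMA):
        print(message)
```

The same schema document both describes the expected shape up front and drives the error messages, which is the closed loop that makes the approach attractive for forms.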

Working backwards from the outcome, the argument for adopting JSON Schema is compelling. It provides a mechanism for checking that data you are handling (for example, receiving input from a form) conforms to the constraints that you declare, but also allows you to tell people up-front, in a human and machine-readable way, what structure and format you will accept. This closed loop of data annotation and validation gets more appealing when you look at the wide adoption of JSON Schema across languages and libraries. You can pretty much guarantee that for whatever client- or server-side technology you are using, there will be a JSON Schema validator for it. Being able to share schemas across your systems (and equally importantly, with third parties) moves JSON Schema from ‘just’ being about data validation, to a key supportive technology.

Building a form derived from a JSON Schema is an equally attractive prospect. JSON Schema was conceived during the AjaxWorld conference in 2007 as a ‘JSON-based format for defining the structure of JSON data’, and its use as a form-generation tool is relatively new, but there is growing community interest. There is even a discussion about how to best create a JSON Schema vocabulary, specifically geared towards addressing some of the needs of form generation users. However, even in its current form, a JSON Schema can be passed to a library, and a very serviceable user interface appears. The devil is always in the detail, and the client-side libraries differ in their abilities to customise areas such as layout (you may not always want your form fields to appear in exactly the same order as they do in your JSON Schema), custom elements (you might want something that wasn’t a form input, or that changes based on user input) and localisation. The ability to flexibly customise the appearance and behaviour of the interface was a key factor in our selection of a client-side form generation library.
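As a rough illustration of how a form can be inferred from a schema, and why layout customisation matters, here is a small Python sketch. It is not how JSON Forms or any of the libraries discussed here work internally. The function name, the widget mapping and the example schema are all assumptions made for this example.

```python
# Hypothetical schema; "title" here is the standard JSON Schema annotation
# keyword, used below as the human-readable field label.
SCHEMA = {
    "type": "object",
    "required": ["title"],
    "properties": {
        "title": {"type": "string", "title": "Grant title"},
        "funder": {"type": "string", "title": "Funder name"},
        "amount": {"type": "number"},
    },
}

# Assumed mapping from schema types to generic widget kinds.
WIDGETS = {"string": "text", "number": "number", "boolean": "checkbox"}


def form_fields(schema, order=None):
    """Derive one field descriptor per schema property.

    By default fields appear in schema order; passing `order` (a list of
    property names) overrides the layout without touching the schema.
    """
    properties = schema.get("properties", {})
    required = set(schema.get("required", []))
    names = order if order is not None else list(properties)
    fields = []
    for name in names:
        prop = properties[name]
        fields.append({
            "name": name,
            "label": prop.get("title", name),
            "widget": WIDGETS.get(prop.get("type"), "text"),
            "required": name in required,
        })
    return fields


if __name__ == "__main__":
    for field in form_fields(SCHEMA, order=["funder", "title", "amount"]):
        print(field)
```

Keeping the layout order separate from the schema is one simple way to let the form differ from the data description, which is the kind of flexibility we were looking for in a library.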

Choosing a library

The other side of the puzzle was less clear - choosing a UI library that would take this JSON Schema, and turn it into a useful, and usable, form. I made the prototype using the venerable React JSON Schema form. This worked well as a proof of concept, but veered dramatically off our chosen Frontend stack of VueJS and Vuetify, and had some architectural constraints that would limit the scope of customisations we could make to our forms. So I went off looking for libraries that would work with our stack and came up with Vuetify JSON Schema Form, and JSON Forms.

Vuetify JSON Schema Form matched our stack perfectly, but made some interesting decisions about the layout of data within the form, and that wouldn’t suit our purposes without dramatic modification.

JSON Forms was an abstracted library, with a core handling the JSON Schema transformation and validation, and separate rendering libraries to handle the form generation. This was great - they had renderers for Angular, React, and even some support for VueJS. But not Vuetify.

Clearly, we were going to have to make something.

We made contact with the maintainers of both short-listed libraries to see how we could collaborate in creating a tool that would meet all of our (and hopefully, much of the wider community’s) requirements. Both maintainers were very helpful, and we had constructive discussions in both cases. In the end, we decided that the abstracted nature of the JSON Forms project was a better fit for our needs, providing a flexible platform on which we - and others - could extend. We were fortunate to receive funding from the Gordon and Betty Moore Foundation (Grant Agreement #10485) in order to accelerate this work, so we could provide a Grant Registration UI more quickly. We paid a large portion of that funding to the library maintainers, and Crossref contributed a portion of my time on the project. This allowed us to enter into an agreement with EclipseSource, the maintainers of JSON Forms, to collaboratively develop the new VueJS and Vuetify renderer library. Stefan Dirix, the lead maintainer, worked with me to build it.

We didn’t forget about Vuetify JSON Schema Form though, and by way of appreciation for their help in the early stages, Crossref made a contribution towards the continued development of that library.

JSON Forms - now with Vuetify

Work started on the JSON Forms Vuetify renderer set in September 2021 - Stefan quickly created the first early prototypes of the new form renderers - but then we had a stroke of luck. Our repository received more input from the community. The one that made us sit up and take real notice was the news that someone else had already ported the JSON Forms React renderer set to Vue/Vuetify - and was offering this as a contribution. Krasimir Chobantonov’s fantastic first contribution got merged in at the end of the month. This propelled the project forward massively, and was an early validation of the value of working in the open. Needless to say, we were very grateful. Another example of the open source value chain was that Stefan - as the maintainer - could take the time to carefully review and tidy up the incoming code, so what was merged was the product of two great developers.

Having this great head start meant we could turn our attention to one of the other big areas we wanted to get right - localisation. Traditionally, JSON Schema-generated forms have handled localisation (translation of text and adjustment of date and numerical formats) by wholesale duplication and translation of the schema. This is cumbersome, and doesn’t integrate very well with custom error messages, nor external sources of interface messages (think form labels, descriptions, placeholders). So Stefan came up with a proposal, which we accepted, to add complete i18n support to the library. We now have a mechanism by which you can hook up a translation engine of your choice, and JSON Forms will use that to look up messages, before falling back to the validator (also localised!) and finally, the JSON Schema’s defaults. This gives much stronger integration and allows the community to plug in their existing localisation methods - no wasted effort.
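The lookup order described above (translation engine first, then the validator's message, then the schema's default) can be sketched in a few lines of Python. This is only an illustration of the fallback idea, not the JSON Forms API: the function name, its parameters and the message keys are hypothetical, and returning the key itself as a last resort is an assumption added for this example.

```python
def resolve_message(key, translate, validator_message=None, schema_default=None):
    """Pick the first available message in the fallback chain.

    translate: any callable mapping a message key to a string or None,
    standing in for "a translation engine of your choice".
    """
    for candidate in (translate(key), validator_message, schema_default):
        if candidate:
            return candidate
    # Assumption for this sketch: show the raw key if nothing else is found.
    return key


if __name__ == "__main__":
    # A plain dict lookup plays the role of the translation engine here.
    french = {"title.label": "Titre"}
    print(resolve_message("title.label", french.get, schema_default="Title"))
    print(resolve_message("funder.label", french.get, schema_default="Funder"))
```

Because the translation engine is just a callable, an existing localisation setup can be plugged in without duplicating or translating the schema itself.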

Since the localisation addition, we’ve been working on fine-tuning the layout engine, making bug fixes, and integrating more closely with the underlying Vuetify library. This allows developers to more easily use the existing Vuetify parameters to change the style and behaviour of their form widgets. Again, no wasted effort.

We’re lucky to have an active community - @kchobantonov continues to make great contributions and push the library forward in unexpected ways - and the library is gaining popularity, with an average of a few hundred downloads per day.

Some of our funder members have already seen this work in action, and given their feedback on early iterations of the user interface that supports registering grant records. We’ll be releasing this publicly very soon to get feedback from members - and then using that feedback to iterate on the grants registration form, and look towards extending it to other record types.

Open source POSItivity

A continuous theme throughout this project has been the willingness of people working on these open source projects to be generous with their time and experience. Whether it has been form generation libraries, the JSON Schema project or maintainers of localisation plug-ins - help, advice and encouragement have never been far away. And that’s appreciated. But it’s not something that we, or any other organisation who relies on the software they produce, should take for granted. Open source software helps everyone who uses it, and there’s a real opportunity within our community to make meaningful steps towards supporting its sustainability. Ironically, it’s often the most-used general purpose tools that get the least attention. We can change that.

Look out for more

Look out for more posts from the engineering team, coming soon!

References

JSON Binpack: A space-efficient schema-driven and schema-less binary serialization specification based on JSON Schema (Chapter 3.2.1 History and Relevance)

https://web.archive.org/web/20071026190426/http://www.json.com/2007/09/27/json-schema-proposal-collaboration/

ISR part three: Where does Crossref have the most impact on helping the community to assess the trustworthiness of the scholarly record?

Rachael Lammey — Mon, 17 Oct 2022 00:00:00 +0000

Ans: metadata and services are all underpinned by POSI.

Leading into a blog post with a question always makes my brain jump ahead to answer that question with the simplest answer possible. I was a nightmare English Literature student. ‘Was Macbeth purely a villain?’ ‘No’. *leaves exam*

Just like not giving one-word answers to exam questions, playing our role in the integrity of the scholarly record and helping our members enhance theirs takes thought, explanation, transparency, and work.

Some of the elements Amanda outlines in the previous posts in this series (Part 1, Part 2) really resonated from a product perspective:

We must be cautious that our best practices for demonstrating legitimacy and identifying deceptive behaviour do not raise already-high barriers for emerging publications or organisations that present themselves in ways that some may not recognize as professional standards. Disruption is different from deception. Crossref has an opportunity to think about how to identify deceptive actions and pair that with our efforts to bring more people on board and support their full participation in our ecosystem.

We don’t have the means or desire to be the arbiter of research quality (whatever that means). However, we operate neutrally, at the center of scholarly communications, and we can help develop a shared consensus or framework. Our metadata elements and tools can be positioned to signal or detect trustworthiness. An important distinction is that we can play a role in assessing legitimacy (activities of the actors) but not in quality (calibre of the content itself).

Crossref has lots of plans (and lots to do) to improve our role in ISR

Rather than a long list of things we want to do in terms of tools, services, and functionality, it feels more manageable to break this work into three key areas.

1. Collecting better information in better ways

We think many elements of the metadata our members record with us help expose important information about the research, e.g., authors, publication dates, and abstracts. We also help our members assess submissions for originality via our Similarity Check service, and the ongoing migration to iThenticate V2 aims to better support this aspect of the publication process.

Beyond this, as Amanda points out, ‘once members start registering their content, their metadata speaks about their practices’. Seeing who published a work along with the metadata they provide (validated ORCID IDs to identify the authors, reference lists and links to related research and data, and important updates to the work via Crossmark) all contributes to showing not just the ‘what’ but the ‘how’, so that the community can use that information to support their decision-making.

I always want to stress that this work is not just an ‘ask’ for our members. We are moving in the same direction as we improve the things we do to support organisations in registering their records with us, answering their questions, working with partner organisations like PKP, consulting with our community on pain points, and thinking about how we can better enhance and facilitate their work. We’ve been fortunate that our community has taken the time to engage in discussions with Turnitin on iThenticate improvements, do user testing sessions as we build simple user interfaces to record grants, lead calls and conversations on improving grant metadata and supporting the uptake of ROR and data citation, and provide thoughtful feedback on our recent preprint on CRE metadata. This all helps us to explain, structure, and prioritize our product work.

There are also some closely related R&D-led projects that are already informing our thinking:

A more responsive version of participation reports so that it’s easier for members to identify gaps in their metadata and compare against others.
Making it easier to get metadata back in a format where members can easily redeposit it.
Better matching to help us and our members augment the metadata they send us to add value to the work we all do.

We said in the previous blog posts that we’ll pose questions about what kinds of metadata give what kind of levels of trustworthiness, and have previously highlighted the following activities:

Reporting corrections and retractions through Crossmark metadata. We know that our members are collecting this information, but often it isn’t making it through metadata workflows to us. We’re part of the NISO CREC (Communication of Retractions, Removals, and Expressions of Concern) working group with many of our members and metadata users, as this feels like something critical to address.
Assessing originality using Similarity Check. On average, we’re seeing 320 new Similarity Check subscribers each year, with over 10 million checks being done each year by our members.
Establishing provenance and stakeholders through ORCID and ROR. At the time of writing, we have over 30,000 ROR IDs in Crossref, and this is growing steadily across different record types. ROR is keen to support adoption and so are we.
Acknowledging funding and other support through the use of the Open Funder Registry and registering grants metadata. This has improved in quality and completeness since we launched the Funder Registry in 2014 and with more comprehensive support for grants in more recent years. But we still have work to do, as this paper by Kramer and de Jonge points out: The availability and completeness of open funder metadata.
Citing data for transparency and reproducibility, including linking to related research data. Scholix, MDC and STM Research Data groups.
Demonstrating open peer review by registering peer review reports. Members have already recorded over 300,000 peer reviews with Crossref, opening up this information on their processes.

In your organisation, what weight do you give these? We know that some of our members register some of these things in more volume than others - is that due to their perceived value, technical limitations, or ‘we’re working on it, give us time?’ Do you think of them in the context of the integrity of the record or are we off the mark? Are there other things we haven’t mentioned in this blog that we could capture, report on and highlight?

2. Disseminating this information and supporting its downstream use

We want to make it as easy as possible for everyone to access and use the metadata our members register with us. Especially as some of the biggest metadata users are our members and, more selfishly, us! But there’s no point collecting metadata to support ISR if it’s unwieldy and difficult to access and use.

We’re working on a project, described in the mid-year community update by a number of my colleagues, to break down internal metadata silos and model the metadata in a more flexible way. This will lend itself to better information collection and exchange, and support of the Research Nexus by building a relationships API to let anyone see all of the relationships Crossref can see between a given work and, well, anything else related to it (citations, links to preprints, links to data to name but a few).

Part of that work will involve supplementing the metadata our members register with high-quality, curated data from selected sources, making it clear where those assertions have come from.

We want our API to perform consistently and well, to contain all the metadata our members register, handle it appropriately, and be able to keep the information in it up-to-date.

Our API will underpin the reports we provide our members (among other things) so that we can provide simple interfaces for organisations to check how they’re doing along with more functional requests. Do their DOIs resolve? Are they submitting metadata updates when they publish a correction? How much will they be billed in a given quarter? We have a lot of internal reporting and need to build more, and if we want to use these, chances are many others do too, so we should open those up.

3. Trying to live up to POSI to underpin this work

When I see a new project, initiative, tool or service in the research ecosystem the first thing I want to do is find out about the organisation itself so that I can base some decisions on that. Lateral reading in action.

At Crossref, we want to show who we are beyond just our tools, services, and products and be transparent about our values. That’s why we have adopted the Principles of Open Scholarly Infrastructure or POSI for short. Now we need to meet these principles and we’re working towards that. POSI proposes three areas that an Open Infrastructure organisation like Crossref can address to garner the trust of the broader scholarly community: accountability (governance), funding (sustainability), and protection of community interests (insurance). POSI also proposes a set of concrete commitments that an organisation can make to build community trust in each area.

So POSI isn’t just opening code and metadata, it’s telling our community how we handle membership, governance, product development, technical and financial stability and security, holding our hands up when we’ve got something wrong, and actively looking to improve upon the things we do.

Are you still reading? If so, you’ve done better than many of my examiners, I’m sure. So stay with us as we work together to ensure we bring quality, transparency, and integrity to the work we all do.

The next part in this series will report back on the feedback and discussions and potentially propose some new or adjusted priorities. Join us at the Frankfurt Book Fair this week (hall 4.2, booth M5) or comment on this post below.

ISR part two: How our membership approach helps to preserve the integrity of the scholarly record

Amanda Bartell — Mon, 10 Oct 2022 00:00:00 +0000

In part one of our series on the Integrity of the Scholarly Record (ISR), we talked about how the metadata that our members register with us helps to preserve the integrity of the record, and in particular how ’trust signals’ in the metadata, combined with relationships and context, can help the community assess the work.

In this second blog, we describe membership eligibility and what you can and cannot tell simply from the fact that an organisation is a Crossref member; why increasing participation and reducing barriers actually helps to enhance the integrity of the scholarly record; and how we handle the very small number of cases where there may be a question mark.

Who can become a Crossref member and do we check new applicants?

Membership is open to organisations that “produce professional and scholarly materials and content”, and this is deliberately defined broadly. We’re a global community of members with content in all disciplines, in many formats, with all kinds of business models - research institutions, publishers, government agencies, research funders, banks, museums and many more.

Essentially, if your content is likely to be cited in the research ecosystem and you consider it part of the evidence trail, then you’re eligible to join.

We ask organisations to complete an online application form and accept our member terms. On receipt of the application, we run a few very basic checks to ensure that:

The applicant can meet the membership criteria and seems to have the capacity to fulfill the obligations (and follow our code of conduct).
We are legally permitted to accept them as a member (for example, we can’t accept applications from some countries due to sanctions).
They haven’t previously been a member of Crossref whose membership was revoked.
They haven’t misrepresented themselves in the application (such as their location).
The applicant or an affiliate is not already a member of Crossref (so that we can advise they join under a single membership fee).

As long as the applicant can meet these requirements, and as long as they are able to pay any membership fees upfront for their first year of membership, they are able to become a Crossref member, get a DOI prefix, and start registering their metadata to share it with the global scholarly community.

We are aware that some organisations in some regions may not be able to join Crossref independently. There may be barriers for them - the cost of membership fees, the fact that we only accept payment in US dollars, language barriers or technical barriers.

To help increase participation globally, we work with sponsors in some regions. All sponsors facilitate membership for organisations who wish to participate in Crossref. They pay one central membership fee on behalf of all the members they work with, and they also pay content registration fees on behalf of their members. Many sponsors register content on behalf of their members, and even if they don’t, most provide local language and technical support. Sponsors are able to charge for their services, but it can be a very economical route for a member to join. In the last year, out of the 2,322 new members that we’ve welcomed, almost 58% joined via a sponsor.

We also waive registration fees for members in certain lower income countries who join via three of our sponsors, and we are planning to expand this program soon (pending board approval in November). [EDIT 2022-November-23: The new Global Equitable Membership (GEM) Program was approved and takes effect 1st January 2023]

The importance of keeping barriers to entry low

As you can see, the checks that we run on new applicants are fairly limited in scope. In the last year, we’ve welcomed 2,322 new members and we only declined 39 applications. And 34 of these declined applications were effectively from one organisation whose membership was revoked in 2019.

Even this minimal set of checks takes a lot of research and keeps our member support specialists very busy - thank you Sally Jennings and Robbykha Rosalien (as well as contractors Kim and Collin).

So why shouldn’t we run more extensive checks on new member applicants? Why don’t we check the quality of their content, or that they are following best practices? Why don’t we decline membership for organisations that can’t demonstrate editorial integrity or that aren’t meeting 100% of the membership obligations from the start?

Never mind the additional capacity that more extensive checks on the over 200 applicants we receive per month would entail; it’s more in keeping with our mission to:

enable equitable participation; and
focus on evidence.

Equitable participation

Inclusivity is very important to us - after all, one of our organisational truths (the guiding principles for everything we do) is “come one, come all”, and this is mirrored in the POSI principles that commit us to broad stakeholder representation. We know that for new organisations, it may take them a while to be able to completely fulfil the membership obligations. We support them with information to help them understand what being a participant in the Crossref community entails. These organisations would have less of a chance of developing better practices if we were to limit membership in Crossref to ‘proven’ candidates. Besides, it would introduce a chicken-and-egg problem: if joining and sharing metadata through Crossref is widely considered best practice, new entrants need to join Crossref in order to show that they are adopting best practices.

Trust signals and the Research Nexus

Secondly, it’s not our role to make such a call; we don’t have the expertise to decide if an organisation would be considered “good” at what they are producing. Other organisations provide guidance in this area, for example through the Principles of Transparency and Best Practice in Scholarly Publishing. Instead, we focus on the decision-making tools, metadata, and relationships that can help provide trust signals for the community.

Once members start registering their content, their activity and metadata speak about their practices – others in the community can process that metadata, combined with its wider context, and identify trust signals to make their own decisions. That metadata can only be shared in an open and machine-readable way if an organisation joins Crossref and starts registering their records and underpinning data with us.

To paint a more detailed picture of the scholarly record, our priority is to get more and varied organisations contributing to the research nexus, rather than putting up barriers and blockers until they are performing perfectly. If they aren’t acting in the best interests of the scholarly community, then having the metadata available to assess will quickly make that obvious and hopefully encourage changes - sunlight being the best disinfectant, as the saying goes.

As we said in the first ISR blog:

“Crossref itself doesn’t assess the quality of content or the integrity of the research process but rather enables those who produce scholarly outputs to provide metadata (effectively evidence) about how they ensure the quality of content and how the outputs fit into the scholarly record.”

In our next post in the series, we’ll talk more about the workflow and decision-making tools we have in place and are planning to develop. We’ll pose questions about what kinds of metadata give what kind of levels of trustworthiness.

Helping new members become “good Crossref citizens”

Once an applicant becomes a member, we help them to completely fulfil the membership terms - ensuring that, for example, they register and display DOIs, keep their metadata up to date, and implement reference linking properly.

We have a lot of documentation on our website, we run regular events and webinars, and we have a series of automated onboarding emails for new members to help them move through the key stages of the member journey from set up and onboarding to levelling up and using additional services like Crossmark and Similarity Check. Our staff are also on hand alongside Ambassadors and other members in our Community Forum. Speaking of POSI (and transparent operations) we receive around 3,000 emails per month with support requests so we are gradually moving support from closed 1:1 email to the more public and efficient community support forum.

We work with members who aren’t fulfilling the obligations to understand challenges and help explain what they need to do. This is currently reactive, but we have plans to automate checks on whether members are meeting the membership terms in future.

Outside of confirming that our members are behaving as “good Crossref citizens”, there aren’t many other areas where the membership team typically gets involved. Our mission is to help preserve the integrity of the scholarly record by making the metadata provided by our members openly available in a machine-readable format. We don’t investigate our members’ business practices or take a deep dive into their editorial processes (such as peer review), and there are many areas where we aren’t able to get involved. For example, we cannot arbitrate title ownership disputes.

It’s all about preserving the integrity of the scholarly record

We do sometimes revoke membership, but this is for limited reasons:

unpaid invoices;
legal sanctions or judgments against the member or its home country; or
contravention of the membership terms.

Membership revocation due to unpaid invoices

We spend a lot of time communicating with members who haven’t paid their invoices and ensuring they have the information they need to solve the problem. Revoking membership due to unpaid fees is an absolute last step for us, but financial sustainability means we can keep the organisation afloat and keep our infrastructure running.

Where members have unpaid fees, we eventually suspend their access to register new records and then ultimately revoke their membership if the fees remain unpaid. Once an organisation’s membership has been revoked, they would need to re-apply if they wanted to become a member again in the future. If accepted, the applicant would need to pay all outstanding invoices before re-joining.

In March 2022, we revoked membership for around 140 members due to unpaid invoices (out of a total of over 17,000 active members).

Membership revocation due to sanctions

Occasionally, we are informed of sanctions that we need to comply with, such as those that followed Russia’s invasion of Ukraine, when each Russian member needed to be checked against individual sanctions and some memberships were revoked. Such revocations have to be voted on by the Executive Committee and then ratified by the board. Read more information on our sanctions process.

Membership revocation for cause

Very occasionally there may be evidence that a member is in contravention of the membership terms. This may include:

Misrepresentation in the original membership application
Fraudulent use of identifiers or metadata
Contravening the code of conduct
Any other basis set forth in our governing documents.

We always try to work together with the member to solve problems, and again, revoking membership is an absolute last step. The revocation has to be voted on by the Executive Committee and then ratified by the board.

Our first ever revocation for cause was in July 2019 for OMICS, after the board voted that the US Federal Trade Commission’s ruling against them amounted to a cause for revocation. There have been a handful of cases since. For example, most recently in September this year we revoked membership for a member who was registering DOIs for journals with the ISSNs of similarly-named publications.

There’s more information about our processes to revoke membership on our website.

More participation for the win

In conclusion, we believe that the more parties able to participate in Crossref and provide metadata and context for the research nexus, the more robust this makes the scholarly record.

But do you agree? Are these measures enough? What other information about our membership operations would help us be more transparent? As we said in our first blog, we need your help to establish whether our approach is still the right one, if we are missing anything and what else we might be able to do.

Here’s how you can help:

Join the discussion about the integrity of the scholarly record on our community forum.
Keep an eye out for future blog posts and meetings. We are having a small, in-person discussion prior to the Frankfurt Book Fair and will report on this in a future blog post.
Sign up to attend Crossref LIVE22 for updates on these topics and all things Crossref.
Join and support initiatives and organisations that we partner with or who use our metadata to look at ethical practices, for example, COPE, DOAJ, and OASPA, and review the Principles of Transparency in Scholarly Publishing, which these organisations worked on with WAME.

ISR part one: What is our role in preserving the integrity of the scholarly record?

Amanda Bartell — Thu, 22 Sep 2022 00:00:00 +0000

The integrity of the scholarly record is an essential aspect of research integrity. Every initiative and service that we have launched since our founding has been focused on documenting and clarifying the scholarly record in an open, machine-actionable and scalable form. All of this has been done to make it easier for the community to assess the trustworthiness of scholarly outputs. Now that the scholarly record itself has evolved beyond the published outputs at the end of the research process – to include both the elements of that process and its aftermath – preserving its integrity poses new challenges that we strive to meet. We are reaching out to the community to help inform these efforts.

Scholarly research, and therefore scholarly communications, are rapidly changing with the development of new approaches, technologies, and models. We need open scholarly infrastructure that can adapt to these changes and provide trust signals that enable assessment of the integrity of the research and reflect the ways that research is changing. Crossref has been changing and adapting by building on the concept of the scholarly record with our vision of the Research Nexus:

“a rich and reusable open network of relationships connecting research organisations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society”.

The foundation of the scholarly record and Research Nexus is metadata and relationships - the richer and more comprehensive the metadata and relationships in Crossref records, the more context there is for our members and for the whole scholarly research ecosystem. This will lead to a range of benefits from better discovery and saving researchers time to the assessment of research impact and research integrity. This is why Crossref is focused on enriching metadata to provide more and better trust signals while keeping barriers to membership and participation as low as possible to enable an inclusive scholarly record.

We want to engage with the community to emphasise this role, share our plans for the future, and get feedback to establish if we are heading in the right direction.

This blog explains our current position and will be followed by subsequent posts exploring all our services and plans in this area, as well as more details on our membership operations and policies.

What is “Integrity of the Scholarly Record” (ISR), and how does it feed into Research Integrity?

The US National Institutes of Health (NIH) defines research integrity as a set of values in scientific research: honesty; accuracy; efficiency; and objectivity. It’s concerned with the soundness of the process of science. As a subset of that, the outputs of the scholarly publishing process create a “scholarly record” which allows those in the community to find evidence and context to help confirm whether these values have been adhered to. The scholarly record is Crossref’s focus. This means that Crossref itself doesn’t assess the quality of content or the integrity of the research process but rather enables those who produce scholarly outputs to provide metadata (effectively evidence) about how they ensure the quality of content and how the outputs fit into the scholarly record (through reference links, ORCID iDs for authors, ROR IDs for affiliations, funding and licensing information, etc.).

Crossref members include any organisation that produces research objects and materials (publishers, societies, universities, funders, research institutions, scholars) so they can establish a persistent record—tied to a persistent and unique identifier—for these outputs and supply metadata about this content in an open, machine-readable way. Maintaining this record for the long term, and adding in an important layer of context, establishes the integrity of the scholarly record as well as ensuring it is something that can be used by the whole community to improve scholarly research for generations to come.

The scholarly record is about more than just published outputs - it’s also a network of inputs, relationships, and contexts

In the past, the Scholarly Record was seen as just the published outputs at the end of the research process - for example, journal articles or book chapters. But as the OCLC Research Group notes in their 2014 report on The Evolving Scholarly Record:

“The boundaries of the scholarly record are in flux, as they stretch to extend over an ever-expanding range of materials.”

OCLC describes how outputs at the “process” and “aftermath” stages of the research process are becoming increasingly important alongside the outputs at the traditional “outcomes” stage.

We like to take this even further. We think the evolving Scholarly Record is about more than just recording different types of works. As the above report notes “The scholarly record is evolving to have greater emphasis on collecting and curating context of scholarly inquiry […] One can imagine an article in quantitative biology published in a Wiley journal, the data for which resides in Dryad; the e-print in arXiv; and the conference poster in F1000. All of these materials may be considered part of the scholarly record, but no single institution will collect them all. Instead, access is achieved through a coordination of stewardship roles in which the scholarly record is decomposed into discrete, interrelated units that organisations specialize in collecting, preserving, and making available.”

It’s this interrelatedness that we think is important, and Crossref plays an important role in collecting, matching, and sharing those relationships. We now focus on this ‘nexus’ - so no longer primarily the different types of objects, but increasingly the interplay and relationships between them. The context, rather than the individual metadata elements, is what’s key.

Martin Eve explores this idea further in his blog What is the Scholarly Record, suggesting “the scholarly record is a decentralized network of evolving truth assertions” and “Whether a truth assertion is part of the scholarly record is determined by another set of distributed assertions and their power configurations (say, through institutional affiliation) of the individuals who make such assertions.”

Barbara Fister’s excellent talk about the importance of lateral reading as a way to understand information systems discusses how professional fact checkers “engaged in “lateral reading,” check other sources for context before spending time reading and analyzing a source.”

Fister highlights the “SIFT” approach from A Curriculum for Civic Online Reasoning, created by a group of educators at Stanford University for students to evaluate online content. She argues that this approach is also useful for assessing scholarly materials, noting:

“The networked, social nature of scholarship is worth making explicit”.

Where does Crossref fit in? Where do we have the most impact and opportunity?

To address the question of our role in the integrity of the scholarly record, we need to understand several aspects that Crossref has to balance in this capacity, such as:

We don’t have the means or desire to be the arbiter of research quality. However, we operate neutrally, at the centre of scholarly communications, and we can help develop a shared consensus or framework. Our metadata elements and tools can be positioned to signal or detect trustworthiness. An important distinction is that we can play a role in assessing legitimacy but not in assessing quality.
We must be cautious that our best practices for demonstrating legitimacy and handling less-than-legitimate behaviour do not raise already-high barriers for emerging publications or organisations that present in ways that some may not recognise as professional standards. Disruption is different from deception. In discussions with our board this point has come out strongly: that Crossref has an opportunity to think about how to help the community identify deceptive actions and pair that with our efforts to bring more people on board.
Addressing this issue may involve changes to our membership eligibility and processes, bylaws, policies, staff resources, and technical and metadata solutions; actually, a combination of all these aspects. Many of these are projects that are already planned and we have ideas for extending these.
We regularly review the process we use for evaluating when and why to revoke membership for reasons other than non-payment. The volume of cases that we believe justify membership revocation—while a tiny fraction of members—is growing and does take staff and legal resources to address.

Crossref and our members already help preserve the integrity of the scholarly record in significant ways

Almost all of our services in some way touch on enabling people to express and evaluate trustworthiness; our mission statement commits us to “making research objects easy to find, cite, link, assess, and reuse […] all to help put research in context.”

We have, of course, specific tools and services that augment this activity too. Many members are active in:

Reporting corrections and retractions through Crossmark metadata.
Assessing originality using Similarity Check.
Conveying their stewardship via the public participation report.
Establishing provenance and stakeholders through funding metadata, ORCID, and ROR.
Acknowledging funding through the use of the Open Funder Registry and registering grants metadata.
Citing data for transparency and reproducibility, including linking to related research data via Event Data.
Demonstrating open peer review by registering peer review reports.

As recently concluded in this Nature editorial calling for us to think beyond open references,

“Depositing all relevant metadata in Crossref should become the norm in scholarly publishing.”

For those members just starting out on their journey, there are some immediate specific things that all members are able to do. Check your participation report and start registering more metadata to add that contextual layer:

References
Abstracts
Corrections and retractions via Crossmark
License links
ORCID IDs for authors
ROR IDs for affiliations
Grant IDs for funding acknowledgements
Cite data (preferably using DataCite DOIs in reference lists)
Register all related objects such as versions and translations via relationships
Register grants with Crossref (funder members).

By enabling our members to register their research objects and create metadata records about them that are freely and openly shared with the scholarly community, we help them communicate the context and trustworthiness of those objects.

And within that metadata, they can create relationships not just between research objects but also between research stakeholders - the individuals, affiliations, funders, and other players involved. That’s why we work so closely with other parts of foundational scholarly infrastructure (ORCID, DataCite, ROR) and why we now have more than 30 funders registering grants with us. We want to help to capture, identify, and link together all these important elements and more to deliver context for the scholarly record.

We started this blog by talking about the changes that are taking place in the world of research and how the infrastructure needs to adapt and change. Although we have extensive plans in place to improve our contribution to ISR, we need your help to establish whether our role is still the right one, whether we are missing anything and what else we might be able to do.

Join the discussion about the integrity of the scholarly record and the Research Nexus on our Community Forum.
Keep an eye out for future blog posts and meetings. We are having a small, in-person discussion prior to the Frankfurt Book Fair and will report on this in a future blog post.
Sign up to attend Crossref LIVE22 for updates on these topics and all things Crossref.
Join and support initiatives and organisations that we partner with or who use our metadata to look at ethical practices in publishing, for example, COPE, DOAJ, and OASPA, and review the Principles of Transparency in Scholarly Publishing, which these organisations worked on with WAME.

In the coming weeks, we will post more about our product and metadata plans and also about the specifics of membership operations and cases we see and how we’re currently addressing them.

2022 Board Election

Lucy Ofiesh — Fri, 16 Sep 2022 00:00:00 +0000

I’m pleased to share the 2022 board election slate. Crossref’s Nominating Committee received 40 submissions from members worldwide to fill five open board seats.

We maintain a balance of eight large member seats and eight small member seats. A member’s size is determined based on the membership fee tier they pay. We look at how our total revenue is generated across the membership tiers and split it down the middle. Like last year, about half of our revenue came from members in the tiers $0 - $1,650, and the other half came from members in tiers $3,900 - $50,000. We have four large member seats and one small member seat open for election in 2022.

The Nominating Committee presents the following slate.

The 2022 slate

Tier 1 candidates (electing one seat):

eLife, Damian Pattinson, Executive Director
Pan Africa Science Journal, Oscar Donde, Editor in Chief

Tier 2 candidates (electing four seats):

Clarivate, Christine Stohn, Director of Product Management
Elsevier, Rose L’Huillier, Senior Vice President Researcher Products
The MIT Press, Nick Lindsay, Journals and Open Access Director
Springer Nature, Anjalie Nawaratne, VP Data Transformation & Chief Business Architect
Wiley, Allyn Molina, Group Vice President, Research Publishing

Here are the candidates’ organisational and personal statements

You can be part of this important process by voting in the election

If your organisation is a voting member in good standing of Crossref as of September 6th, 2022, you are eligible to vote when voting opens on September 20th, 2022.

How can you vote?

Your organisation’s designated voting contact will receive an email the week of September 19th with the Formal Notice of Meeting and Proxy Form with concise instructions on how to vote. You will also receive a username and password with a link to our voting platform.

The election results will be announced at the LIVE22 online meeting on October 26th, 2022. Save the date! Incoming members will take their seats at the March 2023 board meeting.

Accessibility for Crossref DOI Links: Call for comments on proposed new guidelines

Jennifer Kemp — Tue, 06 Sep 2022 00:00:00 +0000

Our entire community – members, metadata users, service providers, community organisations and researchers – create and/or use DOIs in some way so making them more accessible is a worthy and overdue effort.

For the first time in five years and only the second time ever, we are recommending some changes to our DOI display guidelines (the changes aren’t really for display but more on that below). We don’t take such changes lightly, because we know it means updating established workflows. We appreciate the questions that prompted us to make this recommendation and we know it’s critical that we get community input on the proposed updates.

TL;DR

Here is a quick overview:

DOIs and URLs themselves don’t really tell readers much. People with visual impairments rely on screen readers to read out loud the contents of a page. We’re asking for the title of each DOI to be added, in an ARIA (Accessible Rich Internet Applications) attribute, so these users understand what these links are for.
Accessible text, as this kind of description is known, should be included for all links, but at this time, we’re specifically recommending it for landing pages of newly registered records.
It’s not required, yet. We’re proposing a 2 year recommendation period and we want your feedback on the particulars, including timing and how we can help. Please take a short survey and/or get in touch and share your thoughts.
We’ll finalize these recommendations after assessing the feedback. Please check back for updates.

What is changing, when and why

The proposed updates are meant to improve overall usability, particularly for people with visual impairments, by aligning our guidelines with modern accessibility requirements such as the new W3C recommendations and the European Accessibility Act. This means that assistive technologies such as screen readers can interpret DOI links.

Why are changes being recommended?

DOIs are unique and persistent links to items in the scholarly record so it makes sense that they link to the full URLs for the associated content, for example a journal article. The issue for people who rely on screen readers is that a DOI link doesn’t provide title or other information to give that link context. Users of screen readers need to know what the destination of a link is.

These users often lack the context that other users have; in fact, they may be presented with links in a document as a list. That’s why all links, not just DOI links, need what is called “accessible text.” Providing additional information for links requires ARIA (Accessible Rich Internet Applications) techniques. This speaks to the Web Content Accessibility Guidelines (WCAG), the standard guidelines for accessibility across the web, specifically success criterion 2.4.4 - Link Purpose (In Context), which aims to ‘help users understand the purpose of each link so they can decide whether they want to follow the link.’

For your feedback: recommended draft changes

We recommend the addition of an ARIA attribute for DOI links that conveys the descriptive title of the content represented by the DOI, so that screen readers can interpret DOI links. This means that, while the DOI display itself doesn’t actually change, the link is enhanced with additional, contextual information for the user of assistive technology, in one of two ways, either:

an aria-label attribute, described as ‘a way to place a descriptive text label on an object,’ identifying the destination, or
an aria-describedby attribute pointing to where the destination is identified in the surrounding text.

The updated HTML for a journal article*, for example, would be:

<a href="https://doi.org/10.5555/12345678" aria-label="DOI for Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory">https://doi.org/10.5555/12345678</a>

Here the aria-label has been set to the value of the ‘title’ property as retrieved from the Crossref REST API at https://api.crossref.org/v1/works/10.5555/12345678.

*Note that fields may vary slightly for different record types.

This proposed solution allows screen readers to read aloud to users the value of the aria-label attribute, instead of the full DOI in the link text.
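
As a rough illustration of the markup described above, here is a minimal Python sketch that builds a DOI link with an aria-label. The function name accessible_doi_link is hypothetical, and the sketch assumes the title has already been retrieved (for example from the REST API record mentioned above). It also assumes the DOI needs no percent-encoding in the URL, which will not hold for every DOI.

```python
from html import escape


def accessible_doi_link(doi: str, title: str) -> str:
    """Return an HTML anchor for a DOI whose aria-label carries the title.

    Hypothetical helper for illustration; not an official Crossref tool.
    """
    url = f"https://doi.org/{doi}"
    # Escape the label so quotes or angle brackets in a title cannot break the attribute.
    label = escape(f"DOI for {title}", quote=True)
    # The visible link text stays the full DOI URL, as the display guidelines require.
    return f'<a href="{escape(url, quote=True)}" aria-label="{label}">{escape(url)}</a>'


link = accessible_doi_link(
    "10.5555/12345678",
    "Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory",
)
print(link)
```

Escaping the title matters in practice because titles often contain quotation marks or ampersands, which would otherwise produce malformed HTML.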

At this time, we are recommending the change for landing pages in particular, but it can and should be applied to wherever DOI links appear, whenever feasible (more on this below).

Our guidelines will continue to state that the DOI should always be displayed as a full URL link; that will not change. Neither will content registration: we are not asking for additional information in your deposits.

It’s not perfect, but it’s very worthwhile

This recommendation has some limitations worth noting but it must be said that there is no perfect solution.

DOI links appear in lots of places - PDFs for one notable example. We reviewed and tested the recommendation with Bill Kasdorf, Principal, Kasdorf & Associates, LLC, Richard Orme, CEO, DAISY Consortium, and George Kerscher, Chief Innovations Officer, DAISY Consortium-Senior Officer, Global Literacy, Benetech, who graciously provided their time and expertise. EPUBs and websites proved to be easy to update; other formats, notably PDFs, less so. Widespread adoption of accessible DOIs is so important and we don’t want confusion or frustration to get in the way of making progress. We support and welcome efforts to include an ARIA attribute wherever DOI links appear, but we recommend focusing on landing pages, for now.

Patrick Vale, Crossref Senior Front End Developer, explains that:

“DOI links serve a very specific purpose: to provide the persistent link to an item in the scholarly record. And as such, they present an unusual set of requirements when balancing accurately presenting the information they encode - the persistent link - and making that link accessible, and understandable. With these proposed changes, we hope to strike this balance.”

We know it will be a challenge (more on that below) but we think it’s absolutely a worthwhile effort. Indeed, we are undertaking a project to update our own website to meet these recommendations and to review overall accessibility.

As Bill Kasdorf notes:

“Most people have no idea how many people with visual impairments there are. Not only is it unfair to those people not to provide accessible text for links, the authors and publishers of the linked resource are missing a lot of readers. This update is a great move by Crossref, and every bit aligned with its mission to make scholarly content discoverable and consumable.”

We propose the following timeline, also for your feedback

Once finalized, following community feedback, the updated guidelines will be issued as a recommendation for a suggested period of two years starting next year, 2023. Beginning in 2025, the changes will be required for landing pages of newly registered content (and strongly recommended for existing registered content). Feedback on this approach and timeline is also encouraged.

Help us help you

We are conscious that adding descriptive information to DOI links places a significant responsibility on the members and Service Providers creating and hosting these links. Therefore, we are also considering the creation of a tool to help with implementation. Initial discussions suggest this could be a JavaScript helper tool, which could be included on member websites. We also welcome feedback as to how such a tool might be implemented, and how it would best integrate with existing sites and workflows.

Call for comments - by 1st November

We hope that this proposal is a welcome one and that the timing is good for moving forward together toward greater accessibility of the scholarly record. We welcome questions, feedback and suggestions through 1st November via the survey below or by email to feedback@crossref.org

Small changes, big impact

We’re excited to make changes that improve accessibility and we look forward to the community’s response to our proposal. We will share aggregated feedback in an updated post later this year.

A note on language

Multiple sources were consulted to find the most appropriate and inclusive term(s) for users of screen readers in this context. “Print disabled,” for example, seemed to be a good candidate but was ultimately deemed likely to be confusing to a very global publishing audience, who often don’t physically print anything. Sources differ slightly, for example between the US and UK and of course, this English text may well be translated into other languages. Feedback on the terms used here is also very welcome.

Additional resources

The Inclusive Publishing Hub (DAISY Consortium)
National Center on Disability and Journalism (Arizona State University, US)
Inclusive Language guidance (UK government)
The American Psychological Association (APA) Bias-Free Language Disability Guide
The Open Access Books Network (OABN)

Martin Paul Eve is joining our R&D group as a Principal Developer

Geoffrey Bilder — Fri, 26 Aug 2022 00:00:00 +0000

I’m delighted to say that Martin Paul Eve will be joining Crossref as a Principal R&D Developer starting in January 2023.

As a Professor of Literature, Technology, and Publishing at Birkbeck, University of London, Martin has always worked on issues relating to metadata and scholarly infrastructure. In joining the Crossref R&D group, Martin can focus full-time on helping us design and build a new generation of services and tools to help the research community navigate and make sense of the scholarly record.

Martin himself explains the logic of this move on his own blog, so I won’t attempt to do the same here other than to say:

praxis makes perfect.

(mic drop)

Created with DALL·E, an AI system by OpenAI, with the prompt: ‘A bookwheel in the style of the 16th-century illustration by Agostino Ramelli and where the books are replaced by open laptops’

Flies in your metadata (ointment)

Isaac Farley — Mon, 25 Jul 2022 00:00:00 +0000

Quality metadata is foundational to the research nexus and all Crossref services. When inaccuracies creep in, these create problems that get compounded down the line. No wonder that reports of metadata errors from authors, members, and other metadata users are some of the most common messages our technical support team receives (we encourage you to continue to report these metadata errors).

We make members’ metadata openly available via our APIs, which means people and machines can incorporate it into their research tools and services - thus, we all want it to be accurate. Manuscript tracking services, search services, bibliographic management software, library systems, author profiling tools, specialist subject databases, scholarly sharing networks - all of these (and more) incorporate scholarly metadata into their software and services. They use our APIs to help them get the most complete, up-to-date set of metadata from all of our publisher members. And of course, members themselves are able to use our free APIs too (and often do; our members account for the vast majority of overall metadata usage).

We know many organisations use Crossref metadata. We highlighted several different examples in our API case study blog series and user stories. Now, consider how errors could be (and often are) amplified throughout the whole research ecosystem.

While many inaccuracies in the metadata have clear consequences (e.g., if an author’s name is misspelled or their ORCID iD is registered with a typo, the ability to credit the author with their work can be compromised), there are others, like this example of typos in the publication date, that may seem subtle, but also have repercussions. When we receive reports of metadata quality inaccuracies, we review the claims and work to connect metadata users with our members to investigate and then correct those inaccuracies.

Thus, while Crossref does not update, edit, or correct publisher-provided metadata directly, we do work to enrich and improve the scholarly record, a goal we’re always striving for. Let’s look at a few common examples and how to avoid them.

Pagination faux pas

First page marked as 1

In the XML registered

<pages>
<first_page>1</first_page>
<last_page>1</last_page>
</pages>

https://api.crossref.org/works?filter=type:journal-article&select=DOI,title,issue,page&sample=100

Other pagination errors

In the XML registered

<item_number item_number_type="article-number">1</item_number>

In the XML registered

<pages>
<first_page>121-123</first_page>
<last_page>129</last_page>
</pages>
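
The pagination problems shown above can be caught with a simple check before registration. The following is a minimal sketch, assuming the pages fragment is available as a string; the function name pagination_warnings and the two heuristics are illustrative assumptions, not Crossref validation rules, and anything flagged still needs a human look.

```python
import xml.etree.ElementTree as ET


def pagination_warnings(pages_xml: str) -> list[str]:
    """Return warnings for a <pages> fragment (hypothetical heuristics)."""
    pages = ET.fromstring(pages_xml)
    first = (pages.findtext("first_page") or "").strip()
    last = (pages.findtext("last_page") or "").strip()
    warnings = []
    # A value such as "121-123" looks like a range packed into a single field.
    if "-" in first or "-" in last:
        warnings.append("page field contains a range")
    # Mirrors the "first page marked as 1" example; flag it for manual review.
    if first == "1" and last == "1":
        warnings.append("first and last page are both 1")
    return warnings


print(pagination_warnings(
    "<pages><first_page>121-123</first_page><last_page>129</last_page></pages>"
))
```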

Author naming lapses

Examples: Titles (Dr., Prof. etc.) in the given_name field; Suffixes (Jr., III, etc.) in the surname field; superscript number, asterisk, or dagger after author names (usually carried over from website formatting that references affiliations); full name in surname field

In the XML registered

<contributors>
<person_name sequence="first" contributor_role="author">
<given_name>DOCTOR KATHRYN</given_name>
<surname>RAILLY</surname>
</person_name>
<person_name sequence="additional" contributor_role="author">
<given_name>DOCTOR JOSIAH S.</given_name>
<surname>CARBERRY</surname>
</person_name>
</contributors>

<contributors>
<person_name contributor_role="author" sequence="first">
<surname>Mahmoud Rizk</surname>
</person_name>
<person_name contributor_role="author" sequence="additional">
<surname>Asta L Andersen(</surname>
</person_name>
</contributors>
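
The naming lapses above lend themselves to simple pre-deposit checks. The following Python sketch is only an illustration: the function `name_problems`, the word lists, and the heuristics are assumptions made for this example, and real names are varied enough that anything flagged should be reviewed by a person rather than corrected automatically.

```python
import re

# Illustrative word lists; extend them for your own content.
TITLE_PREFIXES = ("dr", "dr.", "doctor", "prof", "prof.", "professor")
SUFFIXES = ("jr", "jr.", "sr", "sr.", "ii", "iii", "iv")

def name_problems(given_name, surname):
    """Return warnings for a given_name/surname pair (hypothetical helper)."""
    problems = []
    given = (given_name or "").strip()
    sur = (surname or "").strip()

    given_tokens = given.lower().split()
    if given_tokens and given_tokens[0] in TITLE_PREFIXES:
        problems.append("title in given_name")

    sur_tokens = sur.lower().replace(",", " ").split()
    if len(sur_tokens) > 1 and sur_tokens[-1] in SUFFIXES:
        problems.append("suffix in surname")

    # Digits, asterisks, daggers, and brackets usually come from
    # affiliation markers copied over from a web page.
    if re.search(r"[\d*\u2020\u2021()]", given + sur):
        problems.append("stray digit or symbol in name")

    # A multi-word surname with no given name may be a full name.
    if not given and len(sur.split()) > 1:
        problems.append("possible full name in surname")
    return problems

print(name_problems("DOCTOR KATHRYN", "RAILLY"))
print(name_problems(None, "Asta L Andersen("))
```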

Organisations as authors slip-ups

Examples: The contributor role for person names is for persons, not organisational contributors, but we see this violated from time to time. Unfortunately, no persons are being credited with contributing to content that has these errors present in the metadata record.

In the XML registered

<contributors>
<person_name sequence="first" contributor_role="author">
<surname>Society</surname>
</person_name>
</contributors>

<contributors>
<person_name contributor_role="author" sequence="first">
<given_name>University of Melbourne</given_name>
<surname>University of Melbourne</surname>
</person_name>
</contributors>

Null no-nos

Examples: Too many times we see “N/A”, “null”, “none” in various fields (pages, authors, volume/issue numbers, titles, etc.). If you don’t have or know the metadata, it’s better to omit it for optional metadata elements than to include inaccuracies in the metadata record.

In the XML registered

<journal_volume>
<volume>null</volume>

<pages>
<first_page>null</first_page>
<last_page>null</last_page>
</pages>

<person_name sequence="first" contributor_role="author">
<given_name>Not Available</given_name>
<surname>Not Available</surname>
</person_name>
<person_name sequence="additional" contributor_role="author">
<given_name>Not Available</given_name>
<surname>Not Available</surname>
</person_name>
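
One way to follow the advice above is to strip placeholder values from optional fields before building the deposit. The sketch below assumes the metadata is held in a plain Python dictionary of strings; the names `PLACEHOLDERS`, `is_placeholder`, and `drop_placeholders` are hypothetical. Because words like "Null" can be genuine surnames, we would suggest applying this only to optional fields such as volume, issue, and pages, and reviewing author names by hand.

```python
# Values that usually mean "we did not have this information".
PLACEHOLDERS = {"n/a", "null", "none", "not available"}

def is_placeholder(value):
    """True if a string value is one of the common placeholder words."""
    return isinstance(value, str) and value.strip().lower() in PLACEHOLDERS

def drop_placeholders(record):
    """Return a copy of the record without placeholder-valued fields."""
    return {key: value for key, value in record.items()
            if not is_placeholder(value)}

record = {"volume": "null", "issue": "3", "first_page": "N/A"}
print(drop_placeholders(record))
```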

Where to go from here?

One thing we’ve said throughout this blog that we’ll reiterate here is: accurate metadata is important. It’s important in itself, and the metadata registered with us is heavily used by many systems and services, so think Crossref and beyond. In addition to that expanding perspective, there are practical steps members and metadata users can take to help us:

As a member registering metadata with us:

make sure we have a current metadata quality contact for your account and update us if there’s a change
if you receive an email request from us to investigate a potential metadata error, help us
if you do not know what to enter into a metadata element or helper tool field, please leave it blank; some of the errors shown in this blog may have been placeholders that the responsible members intended to come back and correct later, and that is also a practice to avoid
if you find a record in need of an update, update it - updates to existing records are always free (we do this to encourage updates and the resulting accurate, rich metadata, so take advantage of it).

As a metadata user:

if you spot a metadata record that doesn’t seem right, let us know with an email to support@crossref.org and/or report it to the member responsible for maintaining the metadata record (if you have a good contact there)
if you’re eager to confirm the last update of a metadata record, our REST API is a great resource; here’s a handy query to use as a starting point, which returns records on our Crossref prefix 10.5555 that were updated in 2022: https://api.crossref.org/prefixes/10.5555/works?rows=500&filter=from-update-date:2022-01-01,until-update-date:2022-12-31&mailto=support@crossref.org
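
If you want to adapt that kind of query to your own prefix and date range, a small helper can assemble the URL. This is a minimal sketch: the function name `updated_works_url` and its parameters are our own invention for illustration. It bounds both ends of the range by update date using the REST API's `from-update-date` and `until-update-date` filters, and it only builds the URL, leaving the request itself to whatever HTTP client you prefer.

```python
from urllib.parse import urlencode

def updated_works_url(prefix, start, end, mailto, rows=500):
    """Build a REST API URL for works on a prefix updated between two dates."""
    filters = f"from-update-date:{start},until-update-date:{end}"
    # Keep ':', ',' and '@' readable instead of percent-encoding them.
    query = urlencode(
        {"rows": rows, "filter": filters, "mailto": mailto},
        safe=":,@",
    )
    return f"https://api.crossref.org/prefixes/{prefix}/works?{query}"

print(updated_works_url("10.5555", "2022-01-01", "2022-12-31", "you@example.org"))
```

Replace the example email address with your own so that we can get in touch if there is a problem with your requests.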

Making connections between research objects is critical, and inaccurate metadata complicates that process. We’re continually working to better understand this, too. That’s why we’re currently researching the reach and effects of metadata. Our technical support team is always eager to assist in correcting errors. We’re also keen on avoiding those mistakes altogether, so if you are uncertain about a metadata element or have questions about anything included in this blog post, please do contact us at support@crossref.org. Or, better yet, post your question in the community forum so all members and users can benefit from the exchange. If you have a question, chances are others do as well.

How I think about ROR as infrastructure

Amanda French — Fri, 08 Jul 2022 00:00:00 +0000

The other day I was out and about and got into a conversation with someone who asked me about my doctoral work in English literature. I’ve had the same conversation many times: I tell someone (only if they ask!) that my dissertation was a history of the villanelle, and then they cheerfully admit that they don’t know what a villanelle is, and then I ask them if they’re familiar with Dylan Thomas’s poem “Do not go gentle into that good night.” So far, everyone has heard of it – it’s a very well-known poem indeed. I then explain that “Do not go gentle into that good night” is a villanelle, and that a villanelle is a poetic form something like a sonnet. So far, everyone also knows what a sonnet is, which is why I use that as a comparison, even though a villanelle isn’t all that much like a sonnet, in my opinion. They’re both poetic forms, however, with a particular standard number of lines and a particular standard rhyme scheme, so in that sense they certainly are alike.

Oddly enough, I think my early background in the study of poetic form is very much of a piece with my new role here at Crossref as Technical Community Manager for ROR, the Research Organization Registry. Both poetic form and metadata are invisible to most people, but both are valuable infrastructure. Both poetic form and metadata involve generally-accepted practices and standards that differ between different groups of people and change over time. Both writing formal poetry and creating rich metadata can seem burdensome and rigid to some people, but to my mind, both are generative. A solid underlying foundation allows for all kinds of creativity to flourish on the surface.

That might be part of why as soon as I heard about ROR I understood its tremendous potential. As someone who’s worked in digital humanities and scholarly communication for over fifteen years, I’ve long appreciated the value of clean, standard, comprehensive metadata in general. For instance, I explained the origin and value of the Dublin Core metadata standard to many a history scholar in the Omeka workshops I often taught at THATCamp. Later, while overseeing the institutional repository at Virginia Tech University Libraries, I learned even more about both the importance and the difficulty of creating, acquiring, and providing good metadata. When the pandemic began in 2020, I learned more than I ever wanted to know about messy data as Community Lead for The COVID Tracking Project at The Atlantic.

Data and metadata are, let’s admit it, very hard to keep clean and consistent as they travel through multiple systems, and that’s why it’s important to regularize as much as we can through automatic means such as APIs that use agreed-upon standards. Scholarship is a network of networks, and common identifiers like DOIs and ORCIDs enable the interchange of information in those networks about scholarly outputs and scholars, and thus they enable scholarship itself. What could be more important than that?

But the organisations that employ, fund, and publish scholarly researchers have had a hard time keeping track of everything “their” researchers have given to the world. That’s the problem that ROR, “a community-led registry of open, sustainable, usable, and unique identifiers for every research organisation in the world,” can help solve. In an ideal world, universities might use ROR IDs to track the research their faculty have produced, certainly, but they might also discover which universities their faculty’s co-authors most often come from. Funders might use ROR IDs to identify the research outputs that have benefited from their funds, certainly, but they might also analyze whether they are funding enough researchers from institutions in rural areas. Publishers might use ROR IDs to offer affiliation searching in their own public interfaces, certainly, but they might also create internal reports on compliance with institution-level transformative Open Access agreements. Once something like ROR is widely adopted, the vision of the Research Nexus becomes closer to reality: “A rich and reusable open network of relationships connecting research organisations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society.” ROR is all about the “organisations” part of that alluring vision.

If you’re curious about ROR and want to learn more (hey, that rhymes!), you might want to watch the highly informative presentation from September 2021 “Working with ROR as a Crossref Member”, in which you’ll learn several interesting things, including the following:

ROR itself is not an organisation, but an initiative supported jointly by Crossref, DataCite, and the California Digital Library;
Crossref members cited institutional affiliation identifiers as one of their top priorities in 2019, second only to abstracts;
The specifics of how one recent ROR integrator, the open access journal publisher Hindawi, used the ROR API to create a typeahead widget in its manuscript submission system that replaces user-supplied free text with a standard institution name and a ROR ID behind the scenes, helping them to generate useful internal reports about institutional payments; and
Crossref supports the submission of ROR IDs in its XML content registration process and makes ROR IDs available in its API.

I’m also enthusiastically inviting you to get in touch with me if you’d like to learn more about ROR or if you’d like to tell me about your previous experience with ROR. And if you don’t get in touch with me, please be aware that I might well reach out to you – I’m eager to hear what you hope for from ROR, but also what you’re skeptical about. For, after all, I learn by going where I have to go – don’t we all?

Seeing your place in the Research Nexus

Kornelia Korzec — Wed, 22 Jun 2022 00:00:00 +0000

Having joined the Crossref team merely a week previously, I found the mid-year community update on June 14th a fantastic opportunity to learn about the Research Nexus vision. We explored its building blocks and practical implementation steps within our reach, and within our imagination of the future.

Read on (or watch the recording) for a whistlestop tour of everything – from what on Earth the Research Nexus is, through how it’s taking shape at Crossref and how you are involved, to what concerns the community has about the vision and how we’re going to address them.

Summary of presentations

The idea is simple in principle: scholarly records ought to be transparent – available for all to examine and learn from. Much of scientific production and communication these days has a heavy digital footprint, so building the Nexus is simply a matter of connecting the loose strands, right? Yet, as the scholarly record is a reflection of the continuous progress made by multiple actors within the context of scientific structures and processes, bringing the Nexus to life is a little short of simple.

“What we think of as metadata is expanding, and the notion of ‘record types’ is changing” – said Ginny Hendricks. A great majority of scholarly ‘objects’, whether they are data sets, research articles, monographs, or others, undergo many processes (including review, publication, licensing, correction, derivation) and influence knowledge and practice over time.

Making that progress visible and discoverable will allow for tracing the development of ideas and changes in our thinking over time. Transparency of the complete scholarly record will help us understand the impact of science funding and changing policies. It can support a more robust and comprehensive assessment of research, and contribute to improving integrity within science as well as public trust in it.

The Research Nexus concept was first introduced by Jennifer Lin in 2017 as “Better research through better metadata”. Important adaptations to the model were needed to break it out of the content-specific schema. Ginny also pointed out that the concept is shared among the scholarly infrastructure community, citing a report from 2015 by OCLC Research on conscious coordination for stewardship of the evolving scholarly record.

Patricia Feeney gave us reasons for optimism in building a robust Nexus. She showed the areas of greatest growth in metadata reported to Crossref and shared a public roadmap of the types of information we’re asked to enable in the future. We’re seeing a true boom in registrations of datasets and peer review reports, and the relationship metadata for our records is improving too. At the dawn of defaulting to open references, 44% of records we hold have associated references and that is growing. Provision of the newly enabled affiliation information (ROR IDs) is on the rise, as is the funder information. Some conversations and questions followed, highlighting the need for further guidance in these areas.

To make a case for enriching metadata records, Martyn Rittman demonstrated examples of traceability of research influence on realities outside academia. He captured recent examples of data citations and other references present not just between scholarly papers, but also in policy documents and popular media. These allow for greater discoverability of literature – but also show the public influence and impact of the research and the work’s context in our wider society.

While Martyn shared our blue-skies aspiration to streamline Crossref’s APIs to offer insight into all these relationships with a single service, Joe Wass grounded those ambitions in the reality of technical work underway. His team’s attention is divided between three main areas. They continue to maintain and debug our existing infrastructure. They are developing self-service solutions for members. Finally, they are mapping and planning improved infrastructure, evaluating technology against the Research Nexus vision.

Bringing it back to the source (of metadata), Rachael Lammey offered a very practical guide to key activities enabling Research Nexus that all members can take on now. She highlighted the benefits of collecting and registering data citations, ROR IDs, and grant funding information. She went on to talk about challenges of subject classification (at a journal level) that our research and development efforts are focusing on at the moment.

Summary of discussions

Publishing has changed dramatically and our members recognise increasing opportunities for transparency of the scholarly record. Breaking the distant vision of Research Nexus down into actionable chunks made it more relatable for call participants. Many reflected on seeing their place in it properly for the first time. Yet, challenges remain and many were brought to the fore in the discussions.

The reliability and usability of the technology for registering metadata with Crossref needs to improve. We need to do better in supporting multi-language and multi-alphabet information. We need not just to develop systems anew, but also to streamline the way content is registered and annotated, and to continue to disambiguate competing identifiers. Different record types, chiefly books, present specific challenges in this regard. Finally, making all that metadata accessible and usable is key to enabling insights from the rich data we collectively make available.

Technology is important, but it won’t overcome the barriers that exist in mindsets. Siloed thinking means that publishers may not be sensitive to the benefits that improved relationship metadata could have for colleagues working on assessment, even within the same institutions. Greater guidance or best practices for new identifiers, such as ORCID iDs, ROR IDs, and grant identifiers, would allow more publishers to get on board with the changes. Researchers often don’t help the cause either – many don’t realise the role and benefits of metadata for their work and are reluctant to provide rich information related to it, perceiving it as a bureaucratic burden.

In a nutshell, I learnt that – while the concept of Research Nexus is pretty complex – we’re all already participating in making it a reality. I’m grateful to the call participants for sharing their challenges and ideas so generously. It means we can work to address those. I’ll be sure to follow up on requests for support and clearer guidelines about citing data and recording ROR IDs and grant information in the metadata, and we’ll engage our community on the complex topics of record updates (corrections, retractions, and versions). Be sure to keep in touch with the conversations on the Community Forum. I’ll see you there!

Announcing our new Head of Strategic Initiatives: Dominika Tkaczyk

Geoffrey Bilder — Fri, 10 Jun 2022 00:00:00 +0000

TL;DR

A year ago, we announced that we were putting the “R” back in R&D. That was when Rachael Lammey joined the R&D team as the Head of Strategic Initiatives.

And now, with Rachael assuming the role of Product Director, I’m delighted to announce that Dominika Tkaczyk has agreed to take over Rachael’s role as the Head of Strategic Initiatives. Of course, you might already know her.

We will also immediately start recruiting for a new Principal R&D Developer to work with Esha and Dominika on the R&D team.

What does this mean for R&D?

Before I talk about what Dominika’s move means in practice, I just want to take a moment to thank Rachael for the time she spent working with us. Over the past year, she has injected a massive amount of energy into the group and rebuilt the team’s momentum. This is exactly what we asked her to do.

Rachael’s first task was to repatriate her two R&D colleagues, whom we had loaned to work on other urgent projects. Dominika was the technical lead on the port and relaunch of the REST API. Esha was the technical lead for the ROR initiative. In addition, Rachael has been working with Esha, Dominika, Paul Davis, and me on several shorter-term strategic projects that are shaping our overall development strategy:

Exploring and implementing a new approach to building content registration front ends. This approach is schema-driven and bakes in localization and accessibility support from the start. The new approach is currently the basis for the grant registration tool that our Product & Tech teams are now testing with our new funder members.
Exploring and ultimately rejecting a “pull-based” approach to registering metadata, where Crossref would harvest structured metadata from member landing pages instead of asking members to deposit it with us via XML. You are not really doing R&D unless some of your ideas fail. In this case, we quickly discovered that the logistics of crawling our members’ websites, combined with the sparsity of structured metadata in landing pages, made a pull-based approach fragile and impractical.
Exploring the use of ML techniques to fill gaps in the journal classification data that is currently in the REST API. Gaining new data science badges in the process.
Exploring alternative approaches to building community-extendable reporting tools using standard data science tooling and techniques.
Exploring how we can help reduce support toil by using data science tools like notebooks to create new support tools and self-serve UIs for information frequently requested by members that can otherwise prove difficult to get using our existing tools.
Looking at extending the matching technology previously developed by labs to try to better match funder grant information to research outputs.

And this is just a sample of projects Rachael helped promote and prioritize. It is the nature of many of the larger R&D projects that you don’t see the immediate results until long after they’ve been conceived and put into motion. This means that Rachael has been working on some things over the past year that are not yet public.

But, with any luck, we may see some significant new developments in how Crossref collects and distributes information about significant updates to the scholarly record, including retractions and withdrawals. We are also likely to see more work to promote data citation amongst our members. And finally, we are likely to see an attempt to create a community-managed and open research classification taxonomy. Of course, as is the case with research projects, there is no guarantee that any of these nascent ideas/projects will make it into a production service. Still, if even one of them does, it will become as vital a part of open scholarly infrastructure as DOIs, ORCIDs, or ROR IDs are now.

And we will have Rachael and the hard work of the R&D group, important cameos from others, and community input to thank for giving them the initial push to realization. So that’s a pretty good track record for just a year in the R&D group.

Passing the torch

And this is a track record I’m confident that Dominika can match as she takes over Rachael’s role.

Soon after Dominika joined the Crossref R&D team, she started to expand her activities to include more production engineering practice, team leadership, and community outreach. She has also worked extensively with support and outreach, providing them with data science consulting and mentoring in software development. Her new role as the Head of Strategic Initiatives will continue this trend. She will spend less time prototyping software and analyzing data and more time liaising with our members and the broader community to understand their needs and design R&D projects to test approaches to meeting those needs. This means a lot more liaising with other Crossref teams, speaking with our members and the wider community, and participating in working groups and conferences.

It also probably means a lot less programming and analysis. But programming and building prototypes are critical to the R&D team. And so the first thing we will do is start recruiting for a new Principal R&D Developer to continue working along with Esha on conducting experiments and developing POCs.

I’m looking forward to the next year. With Rachael taking the role of Product Director and Dominika taking over as the Head of Strategic Initiatives, we are well-positioned to make profound technical and conceptual improvements to Crossref’s services while simultaneously working with the community to line up our next strategic priorities.

Rethinking staff travel, meetings, and events

Ginny Hendricks — Tue, 07 Jun 2022 00:00:00 +0000

As a distributed, global, and community-led organisation, sharing information and listening to our members both online and in person has always been integral to what we do.

For many years Crossref has held both in-person and online meetings and events, which involved a fair amount of travel by our staff, board, and community. This changed drastically in March 2020, when we had to stop traveling and stop having in-person meetings and events. Due to the hard work and creativity of our team and the support of our Ambassadors and Sponsors, we were able to move to exclusively online meetings and events and maintain connections with colleagues, members, and much of the scholarly research community.

Online meetings have benefits compared to in-person ones; they have a much lower carbon footprint, and they can be more inclusive because people don’t have to find the time and money to travel. But there are limitations to online meetings; individual connections made in person do become harder to maintain, and new connections are more difficult to make and grow online. Sometimes just by sitting with someone, meeting their team and drinking their tea, free-flowing conversation leads to real progress.

But with over 17,000 members in 150 countries, our small staff can’t be everywhere, and we need to consider the personal as well as the environmental impacts.

When we started work on the 2022 budget last year, our staff and board took the opportunity to think about our approach, with the goal of not going back to ‘normal’. So we asked ourselves, now that we have a better sense of what works and what doesn’t, how can we make our travel and in-person meetings have a greater impact on our goals, while also traveling less and reducing our impact on the environment?

We decided that in the context of our mission and values, we had to take into account three key areas:

The environment and climate change
Inclusion
Work/life balance.

We developed an updated strategy for in-person and online meetings from 2022 onwards along with a set of recommendations and commitments to reduce our carbon footprint. The commitments were approved by the board at its November 2021 meeting.

Our plan for online and in-person meetings

Online events will generally be aimed at broad groups, in multiple timezones, to inform, update, and test general ideas and assumptions at scale. In contrast, in-person events will be smaller, focusing on deep learning, co-creation, and collaborating through various formats such as workshops, roundtables, or sprints, ideally working toward a specific outcome. These smaller in-person meetings will be scheduled alongside other community events so there will be fewer trips on the whole but each trip more consolidated.

Each in-person meeting will have stated goals such as recruiting and onboarding a new Sponsor, bringing our Ambassadors together to build relationships and share best practices, or getting experts together in a room to help decide important policies, improve some code, or plan new initiatives. At the moment, we are not planning ‘hybrid’ events as we don’t believe they will help meet our goals.

While online meetings and webinars provide a breadth of interactions, in-person meetings can provide greater depth and opportunities for more meaningful engagement and purposeful discussion, and it is this depth that we have missed over the last two and a half years. Therefore, we are identifying focus countries where we plan on engaging more with local community groups. Each country-level engagement plan includes outreach and communications activities and some in-person meetings.

Factors and aims for selecting focus countries

Inclusion is important for us and we are committed to supporting the needs of our community members worldwide. We aim to combine meaningful conversations with informational activities. We want to provide time in the day for technical problem solving and/or a more strategically focused session, both of which have worked well in the past. We hope to learn more about trends in our selected focus countries, including the challenges our members face, local publishing norms, and barriers to participation in Crossref, and to understand and help to adapt government policies.

We consider a number of factors when selecting countries on which to focus our activities:

Where we have a relatively large number of members.
Where we are seeing an increase in new members joining.
Where we have not undertaken engagement activities in at least 3 years.
Where we have good contacts to collaborate with, i.e., a national funder, a sponsor or ambassador, a government body, or another organisation aligned with our mission.
Where we have very few members but where research output is high according to other sources, in order to understand and overcome barriers to participating in Crossref.
Where we can consolidate multiple engagement activities in one trip, for example run a LIVE (informational) meeting or workshop, develop relationships with a key Sponsor, or discuss national research policy with government representatives.
Where we can coordinate our engagement efforts alongside other local community events.

Our environmental commitments

In line with rethinking how we engage with our members and making sure we do so in the most sustainable, inclusive, and impactful way, we are making the following commitments:

Crossref staff will think strategically and consider environmental, inclusion, and work/life balance issues when they plan travel. We will make the most of in-person events by focusing on those that involve interaction, such as listening and learning from our members and users, deepening relationships, co-creating, and forming new alliances.
We will travel less and have fewer face-to-face meetings going forward compared with 2019 as a baseline year. The 2022 travel and events budget was reduced by 40%, to 60% of the 2019 budget. Travel and in-person events for the first half of 2022 have been limited, so we will make this same commitment for 2023, still using 2019 as the baseline.
Crossref will track the carbon footprint of staff travel to meetings and events. We will regularly review the data and find ways to reduce the environmental impact.
We will combine stakeholder visits with event trips and vice versa whenever possible (if you take one plane trip to a location 1000 miles away instead of two, you reduce your impact by 0.5t).
As planned before the COVID-19 pandemic, the Crossref LIVE Annual Meetings will remain online only and will be held in different time zones. Having them in different time zones will enable global sharing of updates with a lower environmental impact.
Crossref board meetings will be reduced from three in-person meetings per year to one face-to-face and two online meetings per year.
Fewer staff will attend fewer in-person conferences and will combine them with other travel.
For Crossref staff meetings, it is important for our distributed staff to meet face-to-face as a whole organisation and as teams. We will plan for one all-staff in-person meeting per year (at which there can also be team meetings). Additional team meetings will be based on the reduced travel and meetings budget. Where possible, team meetings will be combined with other meetings (e.g. conferences or other community events).
While trips that combine meetings may mean longer time away from home, we will still try to avoid staff having to travel or be away on weekends. We will also:
- Avoid short-haul flights (under 2.5 to 3 hours) where trains are available.
- Book hotels within walking distance of the event locations (if safe) in order to reduce taxi use.
- Use public transport and trains (if efficient and safe).
- Select hotels that have good sustainability plans in place, seeking out ‘green’ hotels where available and within budget.
- Prioritize locations where the fewest staff have to travel or where they travel the shortest distances.

Reporting

From now on, we will:

Track staff travel, including the number of trips, miles flown, and the carbon impact.
Estimate the carbon footprint of our two offices, staff home working, our data center, and our cloud infrastructure.
Track all Crossref-hosted events, in-person and online, and review them annually (what went well, what can be improved, how to further reduce carbon footprint) as part of the budgeting process.

Many organisations are now rethinking how to go about travel, conferences, meetings, and work in general. The pandemic may have been the trigger for a big shift in the ways we work and interact, and not all of it was welcome or should continue; however, sometimes it takes a big event to give us the space to sit back, reflect, and change things for the better going forward. As always, we’ll evaluate these approaches over time.

All of this means we may be declining some in-person meetings (and when we do, please don’t take it personally) but we still look forward to engaging with our community in a purposeful way.

This feels like a good time to give a shout-out to all our Ambassadors and Sponsors around the world who are very important for insight and engagement, and we will continue to partner with them for both online and in-person meetings.

Annual call for board nominations

Lucy Ofiesh — Tue, 31 May 2022 00:00:00 +0000

The Crossref Nominating Committee is inviting expressions of interest to join the Board of Directors of Crossref for the term starting in March 2023. The committee will gather responses from those interested and create the slate of candidates that our membership will vote on in an election in September.

Expressions of interest will be due Friday, June 24th, 2022.

About the board elections

The board maintains a balance of seats, with eight seats for smaller members and eight seats for larger members (based on total revenue to Crossref). This is in an effort to ensure that the diversity of experiences and perspectives of the scholarly community are represented in decisions made at Crossref.

This year we will elect four of the larger member seats (membership tiers $3,900 and above) and one of the smaller member seats (membership tiers $1,650 and below). You don’t need to specify which seat you are applying for. We will provide that information to the Nominating Committee.

The election takes place online and voting will open in September. Election results will be shared at the annual meeting in October. New members will commence their term in March 2023.

About the Nominating Committee

2022 Nominating Committee:

Abel Packer, SciELO, Brazil, chair*
Patrick Alexander, Penn State University Press, US
Nisha Doshi, Cambridge University Press, UK
Marc Hurlbert, Melanoma Research Alliance, US*
Kihong Kim, Korean Council of Science Editors, South Korea*

(*) indicates Crossref board member

What does the committee look for

The committee looks for skills and experience that will complement the rest of the board. Candidates from countries and regions that are not currently reflected on the board are strongly encouraged to apply. Successful candidates often demonstrate a commitment to or understanding of our strategic agenda or the Principles of Open Scholarly Infrastructure; hold positions within their organisations that may be underrepresented on the board currently; and/or have experience with governance or community involvement. The Nominating Committee will also review the member organisation’s participation report.

Who can apply to join the board?

Board roles and responsibilities

Crossref’s services provide central infrastructure to scholarly communications. Crossref’s board helps shape the future of our services, and by extension, impacts the broader scholarly ecosystem. We are looking for board members to contribute their experience and perspective.

Setting the strategic direction for the organisation;
Providing financial oversight; and
Approving new policies and services.

The board is representative of our membership base and guides the staff leadership team on trends affecting scholarly communications. The board sets strategic directions for the organisation while also providing oversight into policy changes and implementation. Board members have a fiduciary responsibility to ensure sound operations. Board members do this by attending board meetings, as well as joining more specific board committees.

What is expected of board members?

Board members attend three meetings each year that typically take place in March, July, and November. Meetings have taken place in a variety of international locations and travel support is provided when needed. Following travel restrictions as a result of COVID-19, the board adopted a plan to convene at least one of the board meetings virtually each year, and all committee meetings take place virtually. Most board members sit on at least one Crossref committee. Care is taken to accommodate the wide range of time zones in which our board members live.

While individuals apply to join the board, the seat that is elected to the board ultimately belongs to the member organisation. The primary board member also names an alternate who may attend meetings in the event that the primary board member is unable to. There is no personal financial obligation to sit on the board. The member organisation must remain in good standing.

Board members are expected to be comfortable assuming the responsibilities listed above and to prepare and participate in board meeting discussions.

How to apply

Please contact me with any questions at lofiesh@crossref.org

2022 public data file of more than 134 million metadata records now available

Patrick Polischuk — Fri, 13 May 2022 00:00:00 +0000

In 2020 we released our first public data file, something we’ve turned into an annual affair supporting our commitment to the Principles of Open Scholarly Infrastructure (POSI). We’ve just posted the 2022 file, which can now be downloaded via torrent like in years past.

We aim to publish these in the first quarter of each year, though as you may notice, we’re a little behind our intended schedule. The reason for this delay was that we wanted to make critical new metadata fields available, including resource URLs and titles with markup.

Crossref metadata is always openly available via our API. We recommend you use this method to incrementally add new and updated records once you’re up and running with an annual public data file. If you’re interested in more frequent and regular “full-file” downloads, consider subscribing to our Metadata Plus program. Plus subscribers have access to monthly snapshots in JSON and XML formats.

Every year our metadata corpus grows. The 2020 file was 65 GB and held 112 million records; 2021 came in at 102 GB and 120 million records. This year the file weighs in at 160 GB and contains metadata for 134 million records, or all Crossref records registered up to and including April 30, 2022.

Tips for using the torrent and retrieving incremental updates

Use the torrent if you want all of these records. Everyone is welcome to the metadata, but it will be much faster for you and much easier on our APIs to get so many records in one file. Here are some tips on how to work with the file.
Use the REST API to incrementally add new and updated records once you’ve got the initial file. Here is how to get started (and avoid getting blocked in your enthusiasm to use all this great metadata!).
‘Limited’ and ‘closed’ references are not included in the file or our open APIs. And while bibliographic metadata is generally required, lots of metadata is optional, so that records will vary in quality and completeness.
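
As a rough illustration of the incremental-update tip above, here is a minimal sketch in Python. It is not an official client. It assumes the REST API's `from-index-date` filter, cursor-based paging (`cursor=*`), and the `mailto` parameter for identifying yourself to the API; the helper name `build_update_url` and the email address are hypothetical. The code only builds the request URL, so nothing is fetched.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.crossref.org/works"


def build_update_url(since_date, mailto, cursor="*", rows=1000):
    """Build a /works URL for records indexed on or after since_date (YYYY-MM-DD).

    build_update_url is a hypothetical helper for illustration only.
    """
    params = {
        # Assumption: from-index-date selects records (re)indexed since a date.
        "filter": f"from-index-date:{since_date}",
        "rows": rows,
        # "*" starts a cursor; later requests would pass the cursor value
        # returned by the previous response.
        "cursor": cursor,
        # Identifying yourself is assumed to help you avoid being blocked.
        "mailto": mailto,
    }
    return f"{API_ROOT}?{urlencode(params)}"


# The 2022 file covers records up to and including April 30, 2022.
url = build_update_url("2022-04-30", "you@example.org")
print(url)
```

In a real harvest you would request this URL, store the returned records, and repeat with the cursor from each response until no more items come back. Keeping a single sequential loop rather than many parallel requests is the gentler choice for the API.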

Questions, comments, and feedback are welcome at support@crossref.org.

Announcing our new Director of Product: Rachael Lammey

Ed Pentz — Thu, 12 May 2022 00:00:00 +0000

Unfortunately, Bryan Vickery has moved on to pastures new. I would like to thank him for his many contributions at Crossref and we all wish him well.

I’m now pleased to announce that Rachael Lammey will be Crossref’s new Director of Product starting on Monday, May 16th.

Rachael’s skills and experience are perfectly suited for this role. She has been at Crossref since 2012 and has deep knowledge and experience of all things Crossref: our mission; our members; our culture; and our services.

In all her roles at Crossref Rachael has demonstrated how community-focused product development can be done.

Starting as a Product Manager for Similarity Check and Crossmark, she then led community discussions on text and data mining and taxonomies, introduced our support of preprints, and led the very successful ORCID Auto-update integration. She initiated our important partnership with the Public Knowledge Project including scoping and overseeing the joint plugin development work over the years. She helped to grow the Sponsors program, establish the LIVE informational events, oversaw the founding of our ambassador program, engaged more research funders and institutions, and became a go-to person for data citation expertise in our community.

In her brief time in our Research & Development team, she helped to kick off that group’s reinvigoration and has engaged with numerous new community and technical initiatives. Such relationships, together with her knowledge of our systems and API, have enabled her to be a key driver in the development and adoption of ROR and grants, two of the highest strategic priorities of recent years.

Rachael says:

“Alignment in planning and focusing on delivering outcomes will be my initial priorities. I’m conscious that we have a lot in play and I want to support the product team in their existing and ambitious goals while working with the leadership team and our very diverse community to focus and prioritise our development roadmap. I’m really grateful for this opportunity and I am looking forward to working with our members, users, and other open infrastructure organisations in this new capacity”.

Our staff and the board are very enthusiastic about Rachael’s appointment and we know our community will be too. Please join us in congratulating Rachael!

Similarity Check: what’s new with iThenticate v2?

Fabienne Michaud — Tue, 10 May 2022 00:00:00 +0000

Since we announced the launch of a new version of iThenticate last September, a number of you have upgraded and become familiar with iThenticate v2 and its new and improved features, which include:

A faster, more user-friendly and responsive interface
A preprint exclusion filter, giving users the ability to identify content on preprint servers more easily
A new “red flag” feature that signals the detection of hidden text such as text/quotation marks in white font, or suspicious character replacement
A private repository available for browser users, allowing them to compare against their previous submissions to identify duplicate submissions within their organisation
A content portal, helping users check how much of their own research output has been successfully indexed, and self-diagnose and fix content that has failed to be indexed in iThenticate.

We’ve received some great feedback from iThenticate v2 users and user testers:

“There are a lot of new and helpful features implemented in version 2 of iThenticate.”

– Beilstein Institut

“The updates to the user interface make working with the new version a pleasure. It has a very modern feel and is easy to use, as an app on a phone. We particularly like being able to click on a link and easily exclude a source from view with just a few clicks. The response time and speed of download are also greatly improved which will cut down processing time on our end.”

– Frontiers

“I like the ability to be able to exclude content directly from the report.”

– American Chemical Society

More information for administrators and users is available on the Turnitin website: iThenticate v2 documentation.

Upgrading to iThenticate v2

In September, we started inviting new and existing Similarity Check subscribers using iThenticate in the browser to upgrade to this new version. And now some of the manuscript submission systems have completed their integrations with the new version of iThenticate too, so users of these systems can start to migrate. Morressier users are already using iThenticate v2, and in the next few days, we will be emailing all eJournalPress users. We know the other major manuscript submission systems are also working on their integrations, and we’ll be in touch with members using them as soon as they confirm they are ready.

Manuscript tracking system integrations

All Similarity Check subscribers using a manuscript tracking system will particularly appreciate a closer integration with iThenticate v2, which means that users will be able to view their Similarity Report and investigate sources within their manuscript tracking system.

eJournalPress

eJournalPress users will also be able to customise their iThenticate v2 settings via a configuration interface and to decide, for example, to include or exclude bibliographies from their Similarity Reports. The new integration will also show the top five matches returned by iThenticate directly in the eJournalPress interface.

eJournalPress configuration settings in iThenticate v2

Editorial Manager and ScholarOne

Aries (Editorial Manager) and Clarivate (ScholarOne) are planning to release their iThenticate v2 integrations later this year and we will be inviting users to upgrade in the coming months.

Please check our community forum for updates on manuscript tracking system integrations.

More new and improved features

User-friendly PDF report

“The report is clean and easy to read.”

– The National Academies of Sciences, Engineering, and Medicine

“The clickable links will save us a considerable amount of time as they make it easy for the author to understand where the overlap is coming from, meaning we do not need to spend time clarifying overlap reports to the authors. The summary page is also very useful as authors and editors are easily able to see which sections have been included and excluded from the report.”

– Frontiers

The PDF version of the Similarity Report has been completely redesigned and can easily be downloaded, emailed and printed. It contains a summary of the report (word count, character count, number of pages, file size, excluded sections, and submission and report dates) as well as the similarity score and a list of the top sources with clickable links.

First page of the Similarity Report in iThenticate v2

Summary and clickable links in the new Similarity Report in iThenticate v2

Custom section exclusion filter

In iThenticate v2, users can now exclude standard sections, such as authors, affiliations, ethics statements, and acknowledgments, from the Similarity Report. These sections often affect similarity scores. You can choose from the templates available and/or create your own custom section exclusions from the admin portal.

Custom section exclusion filter in the iThenticate v2 admin portal

Summary of excluded custom sections on the iThenticate v2 Similarity Report

“The user interface is definitely more responsive than v1, especially when I am looking at the full-text viewing mode, scrolling through the text to compare matches, reading through the box of text in the matching source […] I also especially like the options around excluding, I was able to see our submitted work was also taken into the database and showed matches against the papers we’d uploaded already. Going forward, this is a really interesting thing for us, especially if we are looking at duplicated content in the same journal.”

– Taylor & Francis

User reporting

Details of user activity including folder names, similarity scores, word count, and file format are now also available in iThenticate v2 and downloadable as Excel and CSV files.

Up next

Product development

Further enhancements to existing features and interface such as the view full-text mode, user groups, and custom section exclusions are planned for this year. Paraphrase detection and citation matching are currently in development.

iThenticate v2 training

iThenticate v2 documentation is available from the Turnitin website. Training videos and webinars will be available later on in the year.

✏️ Do get in touch via support@crossref.org if you have any questions about iThenticate v1 or v2 or start a discussion by commenting on this blog post below.

Do you want to be a Crossref Ambassador?

Vanessa Fairhurst — Thu, 14 Apr 2022 00:00:00 +0000

A re-cap

We kicked off our Ambassador Program in 2018 after consultation with our members, who told us they wanted greater support and representation in their local regions, time zones, and languages.

We also recognized that our membership has grown and changed dramatically over recent years and that it is likely to continue to do so. We now have over 16,000 members across 140 countries. As we work to understand what’s to come and ensure that we are meeting the needs of such an expansive community, having trusted local contacts we can work closely with is key to ensuring we are more proactive in engaging with new audiences and supporting existing members.

We know that Crossref still remains inaccessible to many around the world, and in line with our strategic goal to engage communities, we want to lower the barriers to participation. Our Ambassadors are essential to us achieving this goal as we look to develop additional content in languages other than English, identify organisations to work closer with to support local research ecosystems, provide more in-person and online events in local time zones and languages, and do more in terms of open support via our community forum.

What are our ambassadors up to now?

We currently have a team of 30 ambassadors, spanning Indonesia, Turkey, Ukraine, India, Bangladesh, Colombia, Mexico, Tanzania, Cameroon, Nigeria, Russia, Brazil, USA, UAE, Australia, China, Malaysia, Mongolia, Singapore, and Taiwan. The program is reviewed annually, welcoming new faces and sometimes sadly saying goodbye to others. This enables us to continue improving how we work together and ensures the Ambassador team remains a diverse group of committed individuals that have the time and support from Crossref to fully participate in the program.

Over the last 3 years, we’ve had some great successes alongside a few challenges, not least of which has been working across 15 countries during a pandemic. We have all experienced the additional personal and professional strain that COVID-19 brought along, including shifts in the way we work and anxieties in the way we go about our lives. Of course, it has also meant that all our interactions have been restricted to Zoom, which has many benefits but doesn’t compare to face-to-face interactions when it comes to building strong working relationships, particularly across language and cultural barriers.

Despite this, our ambassador team helped us run 15 multi-lingual webinars last year, including Content Registration in Arabic, Getting Started with Books in Brazilian Portuguese, and an Introduction to Crossref in Chinese. They also helped us translate various materials and content into other languages, provided feedback on our new developments, took part in beta-testing, provided support to members on our community forum, and participated in calls to contribute to the program’s future.

I love helping people get to know Crossref’s products and services.

I was proud to work as Ambassador and give an online Chinese webinar to introduce Crossref and the services in Oct. 2021.

I am glad to be of help to Spanish speakers who are not able to grasp all the Crossref information correctly because of a language barrier or because they don’t have the time to read and explore all the information available.

Very happy to be able to take part as an Ambassador and, through that, to promote the use and uptake of Crossref’s products.

I feel so blessed meeting with many diverse friends in Crossref ranging from Europe to Asia continents.

Feeling happy by giving back knowledge to my regional community.

The future is ours to co-create

As countries are slowly dropping restrictions and we are taking our first cautious steps into a potential ‘post-pandemic’ world, our Community Engagement and Communication team has been looking at what this means for our activities in 2022 and beyond.

A big part of this is identifying local communities and groups to engage with to learn what challenges our members are facing, what barriers to participation in Crossref still exist, and how we can overcome these together. This practice is also fundamental to our vision of the Research Nexus––a rich and reusable open network of relationships connecting research organisations, people, things, and actions––which can only become a reality if everyone can fully contribute to the scholarly record.

As such, we would like to expand our Ambassador Program and particularly encourage applications from those based in the following countries:

Argentina
Chile
Canada
Croatia
El Salvador
Germany
Ghana
Iraq
Kenya
Nicaragua
Nigeria
Peru
Poland
Vietnam

As one of our ambassadors, you will become a key part of the Crossref community. You will be our first port of call for updates and for testing new products or services, you will become well connected to our wide network of members, and you will work closely with us to make scholarly communications better for all.

If you are interested in participating, please read more on our Ambassadors page. You can submit an application letting us know why you are interested, how you work with Crossref currently, and a bit more about yourself. We will then follow up with you to discuss your ideas and the program in more detail.

The Ambassador Program is quite flexible, so you can choose how and when you contribute based on your comfort levels and other commitments. However, it does come with some minimum requirements: attending two team calls a year, being responsive and letting us know if anything is preventing you from participating, and completing our annual feedback survey so we can continue to improve the program going forward. A good level of English and a firm understanding of our services and systems at Crossref are also a must to participate fully in the program and provide support to others in your local community. If you have just joined Crossref or want to learn more about how to work with us, then the Ambassador Program may be too much for you right now, but our documentation has lots of helpful information and step-by-step guides, and you could also look at attending one of our events or joining our community forum.

If you have any questions, you can always contact us at feedback@crossref.org. We look forward to hearing from you!

Amendments to membership terms to open reference distribution and include UK jurisdiction

Ginny Hendricks — Mon, 04 Apr 2022 00:00:00 +0000

TL;DR

Forthcoming amendments to Crossref’s membership terms will include:

Removal of ‘reference distribution preference’ policy: all references in Crossref will be treated as open metadata from 3rd June 2022.
An addition to sanctions jurisdictions: the United Kingdom will be added to sanctions jurisdictions that Crossref needs to comply with.

Sponsors and members have been emailed today with the 60-day notice needed for changes in terms.

Reference distribution preferences

In 2017, when we consolidated our metadata services under Metadata Plus, we made it possible for members to set a preference for the distribution of references to Open, Limited, or Closed. Prior to the 2017 change, we acted as a broker of 1:1 feeds of parts of metadata for parts of our community - clearly a role that was not scalable.

We are well under way in paying back technical debt on our 20-year-old metadata system and effectively rearchitecting it. We therefore recently needed to decide whether to rewrite code for a capability that hardly any member was using. Just one member has chosen Closed, and Limited was the default for a while, but the vast majority of our members now prefer Open distribution. Additionally, bringing references in line with other metadata significantly simplifies this work and will speed up the technical development.

The Crossref Board discussed the issue in our meeting on 10th March 2022, and voted to remove the reference distribution policy set in 2017. All board motions go on our website, and the wording of this particular motion is:

Resolve that, based on a technical assessment, we will change the reference distribution policy so that all references registered with Crossref are treated the same as other metadata, following a planned transition.

This motion means that 60 days from today, on 3rd June 2022, all references in Crossref will become open and, from then on, will be available through our API. As with all other metadata, if members cannot make references available, or do not want them openly distributed, they can choose not to deposit them. However, depositing references is necessary in order to retrieve citation links from our members-only Cited-by API.

Check the documentation for information on how to deposit references and use Cited-by. Also look up your participation dashboard to see if you are already registering references and your current distribution setting.

Sanctions jurisdictions

Following the UK departing from the European Union, we needed to add the United Kingdom as a separate jurisdiction that we must comply with, alongside the United Nations, the United States of America, and the European Union.

Where there are either relevant financial or governance-based sanctions against individuals, organisations, geographic regions, or whole countries, Crossref is legally bound to comply with these four different jurisdictions. These laws supersede our own governing bylaws.

We have launched a new operations and sustainability section of our website, which includes a sanctions page which we will keep updated with any changes and actions we’re taking.

The specific terms that will change

The complete membership terms are online here. In the text below, any text to be removed is shown in ‘strike-through’ text and any additions are in bold. These new terms will be in effect from 3rd June 2022.

5. Distribution of Metadata by Crossref. Without limiting the provisions of Section 4 above, the Member acknowledges and agrees that~~, subject to the Member’s reference distribution preference,~~ all Metadata and Identifiers registered with Crossref are made available for reuse without restriction through (but not limited to) public APIs and search interfaces, which enhances discoverability of Content. Metadata and Identifiers may also be licensed to third party subscribers along with an agreement for Crossref to provide third parties with certain higher levels of support and service. ~~For the avoidance of doubt, the scope of Crossref’s distribution (if any) of a Member’s references is based on such Member’s reference distribution preference, as established by the Member in accordance with the “Reference Distribution” page on the Website.~~

20. Compliance. Each of the Member and Crossref shall perform under this Agreement in compliance with all laws, rules, and regulations of any jurisdiction which is or may be applicable to its business and activities, including anti-corruption, copyright, privacy, and data protection laws, rules, and regulations.

The Member warrants that neither it nor any of its affiliates, officers, directors, employees, or members is (i) a person whose name appears on the list of Specially Designated Nationals and Blocked Persons published by the Office of Foreign Assets Control, U.S. Department of Treasury (“OFAC”), (ii) a department, agency or instrumentality of, or is otherwise controlled by or acting on behalf of, directly or indirectly, any such person; (iii) a department, agency, or instrumentality of the government of a country subject to comprehensive U.S. economic sanctions administered by OFAC; or (iv) is subject to sanctions by the United Nations, **the United Kingdom,** or the European Union.

As always, please get in touch with us via member@crossref.org with any questions.

With a little help from your Crossref friends: Better metadata

Jennifer Kemp — Thu, 31 Mar 2022 00:00:00 +0000

We talk so much about more and better metadata that a reasonable question might be: what is Crossref doing to help?

Members and their service partners do the heavy lifting to provide Crossref with metadata and we don’t change what is supplied to us. One reason is that members can and often do change their records (important note: updated records do not incur fees!). However, we do a fair amount of behind-the-scenes work to check and report on the metadata as well as to add context and relationships. As a result, some of what you see in the metadata (and some of what you don’t) is facilitated, added or updated by Crossref.

Much of the work is automated but some of it still requires manual intervention (sound familiar?). Here’s an overview:

Before registration

Our open APIs allow for Crossref metadata to be used throughout research and scholarly communications systems and services, before and after records are registered with us. Those who have used a search function in something like a manuscript submission system, rather than having to hand key or copy and paste the information, will appreciate how these integrations reduce time, effort and the likelihood of errors in collecting metadata well before it gets to Crossref.

For example, it’s very common for members to use the metadata to add DOIs to reference lists when preparing deposits. Of course, new members first need a prefix (and a memberID and name, but more on that later) in order to register content. We also provide a suffix generator for help in constructing DOIs. If you’re not sure how best to make use of existing metadata in deposits, we’ve got a few options for you. Questions are welcome.

We don’t often put it this way but we should: Crossref members rely on the metadata as much as, if not more than, the rest of the community. More and better metadata directly benefits our members.

Upon registration

There are a number of ways we work with the metadata when deposits are received.

Checking for uniqueness: In order to avoid duplicate records, we check to make sure that a title or work hasn’t been registered before. Depending on what we find, a conflict report or failed registration may result.
Adding DOIs to references: When references come to us without DOIs, we’ll try to match and add them.
ORCID auto-update: We automatically update authors’ ORCID records (with their permission of course) whenever deposits include their ORCID iDs.
Preprint to VoR reports: We compare title information and provide notifications of matching records to members depositing preprints, to help them fulfill their obligation to link to Versions of Record (VoRs), where they exist.
Relationships: Like preprint to VoR links, components are another kind of relationship. These might be supplementary material such as figures we can link to the ‘parent’ record.
Funding data: When members register only a funder name as part of the information on who funded the work, we’ll try to match it to its identifier from the Funder Registry, to support better linking between funders and works.
Timestamps: We add date-times for first created and last updated to member-supplied timestamps.
Count of references: That’s right, we count all the references for each record that includes them and add the total to the metadata.
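
To make the list above a little more concrete, here is a minimal sketch of how a metadata user might spot some of these additions in a record. It rests on several assumptions: that REST API work records carry a `reference-count` field, `created` and `deposited` date-times, and a `doi-asserted-by` marker on references and funders whose identifiers were matched by Crossref. The sample record is hand-written and every value in it is made up; `summarise_enrichment` is a hypothetical helper name.

```python
# Hand-written sample shaped like a REST API work record.
# All DOIs, names, and dates below are made up for illustration.
sample_work = {
    "DOI": "10.5555/example",
    "reference-count": 2,
    "created": {"date-time": "2022-01-10T09:00:00Z"},
    "deposited": {"date-time": "2022-03-02T15:30:00Z"},
    "reference": [
        {"key": "ref1", "DOI": "10.5555/aaa", "doi-asserted-by": "crossref"},
        {"key": "ref2", "DOI": "10.5555/bbb", "doi-asserted-by": "publisher"},
    ],
    "funder": [
        {
            "name": "Example Funder",
            "DOI": "10.13039/000000000000",
            "doi-asserted-by": "crossref",
        },
    ],
}


def summarise_enrichment(work):
    """Summarise fields assumed to be added or matched on the Crossref side."""
    refs = work.get("reference", [])
    funders = work.get("funder", [])
    return {
        # Falls back to counting the list if the total is absent.
        "reference_count": work.get("reference-count", len(refs)),
        "reference_dois_matched_by_crossref": sum(
            1 for r in refs if r.get("doi-asserted-by") == "crossref"
        ),
        "funder_ids_matched_by_crossref": sum(
            1 for f in funders if f.get("doi-asserted-by") == "crossref"
        ),
        "first_created": work.get("created", {}).get("date-time"),
        "last_deposited": work.get("deposited", {}).get("date-time"),
    }


summary = summarise_enrichment(sample_work)
print(summary)
```

The helper uses `.get()` throughout because, as noted elsewhere on this blog, much metadata is optional and records vary in completeness.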

After registration

Once registered, we check, report on and update metadata in a few ways.

Link checking: We email each member a monthly Resolution Report with details of the number of failed and successful resolutions for their DOIs. If someone in the community reports a DOI that isn’t registered, we email the member a DOI Error Report.
Citation counts and matches: Citation counts for records of members participating in our Cited-by service are openly available in our REST API. The matching citations themselves are available to members, for their own records only.
Title transfers: Title, prefix and DOI transfers are common and require assistance from our team.
MemberID: It’s not uncommon for members to have more than one prefix. The memberID means users of the REST API can query for records associated with all of a member’s prefixes.
Digital preservation: We handle the infrequent but critical update of URLs that are necessary when titles are triggered for digital preservation. We also preserve the metadata itself, with both CLOCKSS and Portico.
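
The memberID and citation count points above can be sketched as follows. This is an illustration under assumptions rather than a reference implementation: it assumes the REST API exposes a member's records at `/members/{id}/works`, that the `select` parameter can limit the returned fields, and that the open citation count appears as `is-referenced-by-count`. The member ID `1234`, the email address, and both helper names are hypothetical.

```python
from urllib.parse import urlencode


def member_works_url(member_id, mailto, rows=20):
    """Build a URL for works across all of one member's prefixes."""
    params = {
        # Ask only for the DOI and the open citation count.
        "select": "DOI,is-referenced-by-count",
        "rows": rows,
        "mailto": mailto,
    }
    return f"https://api.crossref.org/members/{member_id}/works?{urlencode(params)}"


def citation_counts(items):
    """Map each DOI to its citation count, treating a missing count as 0."""
    return {item["DOI"]: item.get("is-referenced-by-count", 0) for item in items}


# 1234 is a made-up member ID; the items are made-up sample records.
url = member_works_url(1234, "you@example.org")
counts = citation_counts(
    [
        {"DOI": "10.5555/a", "is-referenced-by-count": 3},
        {"DOI": "10.5555/b"},
    ]
)
print(url)
print(counts)
```

Querying by member rather than by prefix means you do not need to know in advance how many prefixes a member holds.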

Of course, since records are often redeposited with updates (note, deposit fees are only charged once per record), some of these processes on our side are repeated as necessary.

This list isn’t exhaustive and other needs and opportunities will emerge. For example, we are looking at matching to add ROR IDs, as we do for funderIDs, and doing some research into how we might determine and assert subject classifications at the work-level. If you’re interested in more about this kind of work, you’ll want to read this recent post by my Labs colleague Dominika on matching grants to outputs.

Get in touch if you have questions or for more information.

Perspectives: Bruna Erlandsson on scholarly communications in Brazil

Bruna Erlandsson — Mon, 28 Mar 2022 00:00:00 +0000

Join us for the first in our Perspectives blog series. In this series of blogs, we will be meeting different members of our diverse, global community at Crossref. We learn more about their lives, how they came to know and work with us, and we hear insights about the scholarly research landscape in their country, challenges they face, and plans for the future.

In our first blog, we meet Bruna Erlandsson, Crossref Ambassador in Brazil, co-owner of Linceu Editorial, and client services manager at ABEC Brasil. Bruna has dedicated her career to scholarly publishing and has worked with Crossref for many years. We invite you to have a read and a listen below to meet Bruna!

Tell us a bit about your organisation, your objectives, and your role
Conte-nos um pouco sobre sua organização, seus objetivos e sua função

I am a co-founder of the company Linceu Editorial, dedicated to publishing scientific and technological research in ethical, creative, and innovative ways. We strive to provide quality editorial services that meet standard industry requirements and best practices, increase visibility, attract readers and potential authors, and ensure their work is properly cited. My personal goal is to be recognized by the scientific community for providing excellent service to our clients.

Sou sócia proprietária da empresa Linceu Editorial, que se dedica à editoração de artigos científicos de inúmeras revistas, de forma ética, criativa e inovadora. Buscamos atribuir aos periódicos de nosso portfólio os requisitos de qualidade editorial alinhados às melhores práticas editoriais, de forma que aumentem sua visibilidade e atraiam leitores, potenciais autores e, não menos importante, que recebam citações em seus artigos. Meu objetivo pessoal é obter reconhecimento da comunidade científica por meio de uma prestação de serviço em nível de excelência.

What is one thing that others should know about your country and its research activity?
O que os outros deveriam saber sobre seu país e sua atividade de pesquisa?

Brazil is the South American leader in publishing scientific articles in Open Access journals. However, it faces challenges due to the absence of a more comprehensive public policy to support scientific editors. As a result, most journals are produced by teaching and/or research institutions or scientific associations with volunteer editorial teams that, although lacking professional journal production skills, produce high-quality journals. Only a tiny percentage of Brazilian journals are published through commercial publishers.

O Brasil é o líder sul-americano na publicação de artigos científicos, com destaque para as revistas em acesso aberto. No entanto, enfrenta desafios em função da ausência de uma política pública mais abrangente para apoio aos editores científicos. A maior parte dos periódicos é produzida por instituições de ensino/pesquisa ou Sociedades Científicas, tendo uma equipe editorial voluntária e carecendo de profissionalização em sua produção, embora, em muitos casos, apresentem boa qualidade. Apenas uma pequena porcentagem de periódicos brasileiros é publicada por meio de um publisher comercial.

Are there trends in scholarly communications that are unique to your part of the world?
Existem tendências nas comunicações acadêmicas que são únicas em sua parte do mundo?

I wouldn’t say unique. However, adherence to Open Science practices, such as preprints and making research data available, is already part of the editorial culture. On the other hand, open peer review is not yet well accepted by everyone in the scientific community, and only a few journals adopt it. In addition, in some areas of research, such as Education and Social Science, researchers are very active - on forums, in discussion lists and attending the same conferences - so there’s this feeling that ‘everyone knows everyone’, which can lead to potential conflicts of interest and apprehensiveness around open peer review, particularly when it comes to publishing a negative review.

Eu não diria única, mas penso que, no Brasil, a adesão às práticas da ciência aberta, como publicação em preprint e disponibilização de dados de pesquisa, já fazem parte da cultura editorial. Por outro lado, a revisão aberta ainda não é bem aceita por toda comunidade científica, sendo poucos os periódicos que o adotam. Além disso, em algumas áreas de conhecimento com grande produção local, como por exemplo a Ciências Sociais e Educação, a interação entre membros da comunidade é muito grande, visto que são pesquisadores muito ativos em fóruns, listas de discussões e conferências da área, causando a sensação de que “todo mundo conhece todo mundo”, resultando em um possível conflito de interesse, visto que existe um grande receio em publicar um parecer aberto, especialmente se o caso for um parecer negativo.

What about any political policies, challenges, or mandates that you have to consider in your work?
E as políticas, desafios ou mandatos políticos que você deve considerar em seu trabalho?

In Latin America we have Redalyc, a large indexing database and digital library of Open Access journals, which has recently excluded a number of journals for charging APCs (Article Processing Charges), on the understanding that this would go against its Diamond Open Access requirement.

However, in Brazil - in general - the understanding of Open Access is not so limited. Charging APCs is in fact encouraged by many as a form of self-sustainability of the journal while still being Open Access.

As for challenges, one of the biggest is whether or not to publish in English. Although the number of Brazilian journals that publish exclusively in English or in both languages (Portuguese and English) is remarkably high, there is still a belief that local science is only of interest to the local public, and so some question whether there is value in publishing in English (or other languages). For example, if an author writes a research paper about a small riverside community in the countryside of Acre state in Brazil, they might ask why someone outside the country would be interested in reading that.

Aqui na América Latina, temos uma grande base indexadora, Redalyc, e biblioteca digital de periódicos de Acesso Aberto que, recentemente, excluíu da base um número considerável de periódicos que cobrassem qualquer tipo de taxa de publicação, por entender que isso iria contra os requisitos de seu modelo de Acesso Aberto Diamante (periódicos em acesso aberto livre de taxa de publicação).

No entanto, no Brasil, em geral, o entendimento é outro, a cobrança de taxas de processamento não descaracteriza o acesso aberto, sendo, na verdade, encorajado por muitos como uma forma de auto-sustentabilidade do periódico.

Já em relação a desafios, acredito que um dos maiores é a questão de publicar ou não em inglês. Embora seja notável o número de periódicos brasileiros que publicam exclusivamente em inglês ou ainda nos dois idiomas (português e inglês), existe ainda a crença de que a ciência local só teria interesse do público local, criando assim o questionamento se há ou não o valor em publicar em outro idioma. Por exemplo, se uma pesquisa estuda algo sobre uma comunidade ribeirinha no interior do estado do Acre, aqui no Brasil, é comum existir a dúvida se algo tão específico seria do interesse de alguém de fora do nosso país.

How would you describe the value of being part of the Crossref community; what impact has your participation had on your goals?
Como você descreveria o valor de fazer parte da comunidade Crossref; que impacto teve sua participação em seus objetivos?

I get immense value from being part of the Crossref community. Being a Crossref Ambassador brings greater recognition and legitimacy to my role working with editors and adds value to my company’s services as well. The title of Ambassador enhances trust in my opinions and presentations, and in the support and clarification I provide to those asking questions. However, it also comes with a great responsibility to do this well, which motivates me to always keep up to date with developments at Crossref. Through the Ambassador Program I have given several webinars for Crossref and the Associação Brasileira de Editores Científicos (ABEC Brasil), which provide much-needed information and support to Portuguese-speaking Crossref members as well as enhancing the visibility of my professional activities at Linceu Editorial.

É um valor enorme fazer parte da comunidade Crossref! Ser Embaixadora do Crossref traz um reconhecimento entre os editores e agrega valor aos serviços de minha empresa. Esse título assegura confiabilidade em minhas opiniões, apresentações, e esclarecimentos de dúvidas, o que traz junto uma grande responsabilidade que me motiva a me manter sempre atualizada com tudo em relação ao Crossref. Através do Programa de Embaixadores eu ministrei diversos webinários para a Crossref e também para a Associação Brasileira de Editores Científicos (ABEC Brasil), fornecendo muitas informações necessárias para os membros da Crossref que falam português, e também isso tudo acaba por retornar em visibilidade para as minhas atividades profissionais na Linceu Editorial.

For you, what would be the most important thing Crossref could change (do more of/do better in)?
Para você, qual seria a coisa mais importante que o Crossref poderia mudar (fazer mais/fazer melhor)?

I think there is still a need for more multilingual training, both online and face-to-face, which has been particularly lacking during the pandemic, to provide more information on Crossref services beyond Content Registration. For example, Similarity Check is a service that people still have a lot of questions about (such as ‘what is the magic similarity percentage score to identify plagiarism?’ Answer - there isn’t one!). Crossmark is another service where I believe people could benefit from more training on its importance in the publication process, not only in cases of retraction but also in guaranteeing that the article is up-to-date and trustworthy. In Brazil many people use Open Journal Systems (OJS), so the development of Crossref service-specific plugins and training on how to use them is really useful!

Acho que ainda há necessidade de mais treinamentos multilíngues, tanto online quanto presencial – o que tem sido particularmente escasso durante a pandemia – para fornecer mais informações sobre os serviços do Crossref além do Registro de Conteúdo. Por exemplo, o Similarity Check é um serviço sobre o qual as pessoas ainda têm muitas dúvidas (como ‘qual é a porcentagem de similaridade aceitável para identificar plágio?’ Resposta - não existe!). O Crossmark é outro serviço onde acredito que as pessoas poderiam se beneficiar de mais treinamento sobre sua importância no processo de publicação, não apenas em casos de retratação, mas também para garantir que o artigo esteja sempre atualizado e confiável. No Brasil muitas pessoas usam o Open Journal Systems (OJS) e por isso o desenvolvimento de plugins específicos do serviço Crossref e treinamento sobre como usá-los seriam muito úteis!

Which other organisations do you collaborate with or are pivotal to your work in open science?
Com quais outras organizações você colabora ou é fundamental para o seu trabalho em ciência aberta?

I contribute to ABEC Brasil in a variety of ways including speaking on short courses about Crossref, designing content for lectures as part of an online program called ABEC Educação (which will be launched soon), and as a volunteer consultant to answer a variety of questions from editors regarding content registration at Crossref.

Contribuo com a ABEC Brasil, participando tanto como ministrante de minicursos sobre ferramentas Crossref quanto como conteudista de um curso no Programa EaD ABEC Educação (que será lançado em breve), além de como consultora voluntária para atender a diversas dúvidas de editores em relação a depósito de conteúdo.

What are the post-pandemic challenges you are facing and how are you adapting to them?
Quais são os desafios pós-pandemia que você está enfrentando e como você está se adaptando a eles?

Considering the current situation in Brazil, I don’t think we have reached ‘post-pandemic’ just yet. Although vaccination is taking place successfully, there are still many uncertainties and fears. A good example of this is Crossref LIVE Brazil, which was canceled at the start of the pandemic, and at the moment we still don’t know when we will be able to reschedule it. It still feels too risky to bring a number of speakers from abroad to Brazil and too soon to hold such a large in-person event.

However, if I had to highlight one challenge I’ve been facing, it would be something more personal rather than work-related. Beyond a shadow of a doubt, it would be the lack of human contact! It has been really hard to get used to not gathering together with family and friends and not being able to travel, meet new people, and experience new cultures. To deal with it, I spend my free time planning the places I will go to and people I will visit as soon as this whole situation is over!

Para ser honesta, considerando a realidade atual no Brasil, eu ainda não considero o momento atual “pós-pandemia”. Embora a vacinação esteja ocorrendo com sucesso, ainda existem muitas incertezas e medos. Um exemplo bem claro é o Crossref Live in Brazil, que foi cancelado assim que a pandemia foi “anunciada” e, até hoje, não sabemos quando ocorrerá, pois ainda soa muito arriscado trazer palestrantes de fora para o Brasil e também se encontrar com diversas pessoas em um evento presencial.

No entanto, se eu tivesse que destacar um desafio que tenho enfrentado, seria algo mais pessoal e não relacionado ao trabalho. E, sem sombras de dúvidas, seria a falta de contato humano! Está sendo realmente complicado se acostumar em não encontrar amigos e familiares, e também não poder viajar e conhecer novos lugares, pessoas e culturas – o jeito que encontrei para lidar com isso é gastar meu tempo livre planejando todos os lugares que irei e todas as pessoas que visitarei assim que essa situação toda passar.

What are your plans for the future?
Quais são seus planos para o futuro?

My plans for the future include continuously learning more and more about scholarly publishing including the various services that Crossref provides. I want to be able to help publishers implement valuable tools into their workflows such as Similarity Check and Crossmark, and contribute to greater scientific dissemination of Brazilian research so that Brazilian journals can get the global recognition, visibility and value they deserve.

Meus planos para o futuro incluem aprender cada vez mais e mais sobre publicação científica, incluindo os vários serviços que o Crossref oferece. Quero poder ajudar os editores a implementar ferramentas valiosas em seus fluxos de trabalho, como Similarity Check e Crossmark, e contribuir para uma maior divulgação científica das pesquisas brasileiras para que os periódicos brasileiros possam obter o reconhecimento global, visibilidade e valor que merecem.

Thank you, Bruna!
Obrigado, Bruna!

Outage of March 24, 2022

Geoffrey Bilder — Thu, 24 Mar 2022 00:00:00 +0000

So here I am, apologizing again. Have I mentioned that I hate computers?

We had a large data center outage. It lasted 17 hours. It meant that pretty much all Crossref services were unavailable - our main website, our content registration system, our reports, our APIs. 17 hours was a long time for us - but it was also an inconvenient time for numerous members, service providers, integrators, and users. We apologise for this.

Like the outage last October, the issue was related to the data center that we are trying to leave. However, unlike last time, our single nearby network admin wasn’t in surgery at the time. Tim was alerted in the early hours of his morning and was able to get up and immediately investigate.

Although we have both secondary and tertiary backup connections, neither activated appropriately.

The problem was with incomplete BGP (Border Gateway Protocol) settings on our primary connection’s network provider’s side. We never noticed this because our backup connection had the correct and complete BGP settings. But our backup circuit went down (we don’t know why yet), and when the router with complete settings went down, only the router with the incomplete settings was available and so everything went down.

We hadn’t yet fully configured the tertiary connection to cut over automatically. This meant cutting over to the tertiary during the outage would have required manual and potentially error-prone reconfiguration. Not something we wanted to do in a hurry with a sleep-deprived network admin.

It’s not an excuse at all. But we are currently down two people in our infrastructure group. One of our infrastructure staff recently left for a startup, and we are already hiring a new third position. In short, our one long-suffering sysadmin had to field this all by himself. But hey - we are hiring a Head of Infrastructure, and if you are interested you can now see the work you’d have cut out for you!

So things are back up and we’ve resolved the incident but we are carefully and cautiously monitoring. We will further analyze what went wrong and post an update when we have a clearer picture.

I apologize for the downstream pain this outage will have inevitably caused. We realize that many people will now be scrambling to clean things up after this lengthy outage.

More when I have it… but for now I’ll mostly be curled up in a ball.

Announcing the ROR Sustaining Supporters program

Ed Pentz — Wed, 23 Mar 2022 00:00:00 +0000

In collaboration with California Digital Library and DataCite, Crossref guides the operations of the Research Organisation Registry (ROR). ROR is community-driven and has an independent sustainability plan involving grants, donations, and in-kind support from our staff.

ROR is a vital component of the Research Nexus, our vision of a fully connected open research ecosystem. It helps people identify, connect, and analyze the affiliations of those contributing to, producing, and publishing all kinds of research objects. Crossref added support for ROR to its schema and REST API in 2021 and we are asking Crossref members to use ROR IDs for author affiliations in the metadata they deposit with Crossref. But this post is about how the Crossref community can support ROR in another way.

All three lead organisations—as well as the ROR initiative—have publicly committed to the POSI Principles and we know that our diverse and global community is increasingly interested in showing its support for open scholarly infrastructure too. Now there’s an opportunity to show that support; the following blog by Maria Gould, cross-posted from the ROR blog, explains how.

ROR begins a new round of community fundraising

Since ROR launched in 2019, we have been charting a path to sustainability that leverages our broad community network and diversifies our funding sources. ROR is currently funded through a combination of in-kind support from its three operating organisations, project-based grant funds, and financial contributions from community members.

While ROR aims to minimize overhead and contain costs, it still requires resources to build and maintain the registry’s infrastructure, especially as adoption continues to grow. ROR has been working to establish independent revenue streams that complement ROR’s in-kind support, avoid dependence on grant funds, and ensure the registry data remains openly available.

This year, ROR is initiating a new round of community fundraising. Building on the community fundraising campaign we ran during 2019-2021, we are renewing a call for organisations to commit to supporting ROR financially. We are launching a Sustaining Supporters program that opens up new ways for organisations to participate in the collective funding of ROR.

ROR Sustaining Supporters program

With the Sustaining Supporters program, organisations are encouraged to support ROR’s operating expenses on a recurring annual basis. Any organisation that signs up to support ROR through the end of 2022 will be recognized as a Founding Supporter and receive a supporter badge that can be displayed on their website.

We want to make the process of contributing to ROR as easy as possible. To that end, organisations can support ROR at any amount that works for their budget and capacity. Also, to simplify the invoicing process, organisations that are already members of Crossref or DataCite can choose to receive an invoice for their ROR contributions directly from Crossref or DataCite. However, if organisations prefer, they can also be invoiced directly by ROR.

Why support ROR

ROR aims to be an example of the power and potential of community-funded open infrastructure. ROR is committed to providing open, stakeholder-governed infrastructure for research organisation identifiers and associated metadata. Implementation of ROR IDs in scholarly infrastructure and metadata enables more efficient discovery and tracking of research outputs across institutions and funding bodies.

The Sustaining Supporters program is the next step in ROR’s sustainability journey. ROR is continuing to explore future potential paid service tiers designed for those organisations and companies that rely heavily on our infrastructure, which would complement the supporters program. However, rest assured that any paid services will not impact the availability of ROR data or our commitment to supporting our community, in line with our commitment to the Principles of Open Scholarly Infrastructure (POSI).

We’ve all seen key infrastructure components disappear, be enclosed, or get acquired. We are also realistic about how much effort and cost is involved in sustaining key components of open infrastructure that the scholarly community depends on. And we are committed to doing this right. That means not just sustaining core infrastructures, but investing in them so that they can evolve alongside community needs.

ROR is a free resource for the research community. However, this shared infrastructure does require a collective funding approach that can sustain it as a common good.

Join us!

This is an exciting moment to be part of ROR’s growth. Let’s fund open infrastructure together!

If your organisation is interested in supporting ROR and helping to fund open, community-led infrastructure, sign up here.

Follow the money, or how to link grants to research outputs

Dominika Tkaczyk — Tue, 22 Mar 2022 00:00:00 +0000

The ecosystem of scholarly metadata is filled with relationships between items of various types: a person authored a paper, a paper cites a book, a funder funded research. Those relationships are absolutely essential: an item without them is missing the most basic context about its structure, origin, and impact. No wonder that finding and exposing such relationships is considered very important by virtually all parties involved. Probably the most famous instance of this problem is finding citation links between research outputs. Lately, another instance has been drawing more and more attention: linking research outputs with grants used as their funding source. How can this be done and how many such links can we observe?

TL;DR

We looked for links between research outputs and grants registered with Crossref.
Grant DOIs alone are not enough for linking research outputs with grants, because the funding information in research outputs typically does not contain grant DOIs (yet). Award numbers alone are also not enough because they are not globally unique.
We used either grant DOIs (if available) or the combination of award number and funder information to match grants to research outputs.
In total, we found 20,834 links between research outputs and registered grants, involving 17,082 research outputs and 3,858 grants (10% of all registered grants)¹.
Erroneous and incomplete metadata, especially involving award numbers, is the main factor that prevents linking research outputs to grants.

Introduction

The ecosystem of scholarly metadata is filled with relationships between items of various types: a person authored a paper, an author works at a university, a paper cites a book, a book contains a chapter, a funder funded research. Those relationships are absolutely essential: an item without them is missing the most basic context about its structure, origin, and impact.

No wonder that finding and exposing relationships between items in the scientific ecosystem is considered very important by virtually all parties involved. Probably the most famous instance of this problem is finding citation links between research outputs. Another, relatively new example, is linking research outputs with grants used as their funding source.

At Crossref, for some time now we have been seeing a steady growth of funder membership and grant registration. We are aware that the possibility of finding relationships between grants and research outputs is a big reason why funders are registering grants with us in the first place. Being able to see which research outputs are being supported by which grants helps reduce the reporting burden on researchers, funders, and institutions alike, especially now with the addition of ROR IDs to help complete the picture. Exposing relationships between research outputs and grants also increases the transparency of funding sources of the research, making it easier to assess and trust scientific findings.

But how can we find those relationships and how many of them can we already observe? Thankfully our REST API, recently equipped with the grant metadata, can help us answer these questions.

The perfect scenario

Imagine a world where the metadata of any scientific output states all relationships with other items existing in the scientific ecosystem, and those related items are always referred to by their persistent identifiers, allowing all this information to be accessed in a fully machine-readable way… Lovely, right?

In the case of citations, in such a perfect world every bibliographic reference has a DOI of the cited item. And in the case of funding information, a scientific paper contains grant DOIs, stating the funded-by relationships between the paper and the grants.

But, as the last two years have painfully taught us all, life is not all rainbows and unicorns.

The reality kicks in

We know that around 71% of bibliographic references are deposited with Crossref without a DOI of the cited item. This means that if we want to establish citation links between items, we need to match the bibliographic references using the provided metadata, which is not a trivial task and can potentially introduce errors.

And the situation with the funding information and grant DOIs is even worse.

Problem #1: our schema does not allow the publishers to attach grant DOIs to research outputs

This issue is 100% on us. Because grant DOIs are relatively new, our deposit schema does not yet allow publishers to specify a grant DOI in the funding information of a research output, even if they wanted to. We are working on changing this.

Interestingly, it looks like persistent identifiers always find a way. Among more than 7.4 million research outputs with funding information, we noticed 6 cases where a grant DOI was provided as an award number. For example, in 10.1093/nar/gkaa994 we have the following:

"funder": [
  {
    "name": "Wellcome Trust",
    "award": ["10.35802/108758"],
    "doi-asserted-by": "publisher",
    "DOI": "10.13039/100010269"
  }, ...
]

This may not be 100% correct from the schema perspective, but it is very useful when one is interested in linking grants to research outputs!

But those cases are extremely rare outliers. For the vast majority of the outputs, grant DOIs are not present in the metadata. This means that, just like in the case of bibliographic references, we have to use the metadata to match funding information to grants.

Funding information is typically given as a pair: award number, funder information. Grants contain similar metadata. One might be tempted to use only the award number for linking, as in some cases it can look like a grant identifier.

Let’s consider an example. We want to find all papers funded by grant 10.37807/gbmf7622. The award number is GBMF7622. A simple approach might be to search for items with this award number in Crossref’s REST API, which returns 12 results². However, one of the resulting items is the grant itself³. So excluding that, it seems like there are 12-1=11 research outputs funded by this grant.
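The naive lookup described above can be sketched roughly as follows. This is only an illustration, not the code behind the numbers in this post: it assumes the public REST API endpoint https://api.crossref.org/works and its award.number filter, the helper names are made up, and the sample results are hand-written placeholders rather than a real API response.

```python
from urllib.parse import urlencode

API_WORKS = "https://api.crossref.org/works"  # assumed public endpoint


def award_query_url(award_number, rows=100):
    """Build a REST API URL that filters works by award number."""
    params = {"filter": f"award.number:{award_number}", "rows": rows}
    return f"{API_WORKS}?{urlencode(params)}"


def drop_grant_itself(items, grant_doi):
    """Remove the grant's own record from the search results."""
    return [it for it in items if it.get("DOI", "").lower() != grant_doi.lower()]


url = award_query_url("GBMF7622")

# Hypothetical, hand-made results standing in for a real API response.
sample_items = [
    {"DOI": "10.37807/gbmf7622"},        # the grant itself
    {"DOI": "10.5555/example-paper-1"},  # placeholder DOI
    {"DOI": "10.5555/example-paper-2"},  # placeholder DOI
]
outputs = drop_grant_itself(sample_items, "10.37807/GBMF7622")
```

DOIs are compared case-insensitively here because the award number (GBMF7622) and the grant DOI (10.37807/gbmf7622) differ in case.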

Simple and easy, right? Well, think again.

Problem #2: award numbers are not unique

Let’s look at another example grant: 10.25585/60000600. Its award number is 2817 and the funder is the US Department of Energy.

When we search for this award we get 10 results⁴. Like before, one of them is our grant. After examining the remaining 9 we will see that:

3 items have been funded by the Joint Genome Institute, which according to the Funder Registry has been incorporated into Basic Energy Sciences, which is a descendant of the US Department of Energy
2 items have been funded by International Rett Syndrome Foundation from the US
2 items have been funded by Agencia Nacional de Promoción Científica y Tecnológica from Argentina
1 item has been funded by Arak University of Medical Sciences from Iran
1 item has been funded by Shahrekord University also from Iran

So among only 9 items mentioning the same award number we have in fact 5 different grants. Our input grant should probably be linked only to the three items mentioning Joint Genome Institute. The main problem illustrated here is that the award numbers are not globally unique, and thus should not be treated like identifiers.
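The breakdown above can be reproduced with a few lines of counting. The snippet below is a small sketch that hard-codes the funder names listed above instead of querying the API; the variable names are illustrative only.

```python
from collections import Counter

# Funder names of the 9 candidate items, as listed above.
candidate_funders = (
    ["Joint Genome Institute"] * 3
    + ["International Rett Syndrome Foundation"] * 2
    + ["Agencia Nacional de Promoción Científica y Tecnológica"] * 2
    + ["Arak University of Medical Sciences"]
    + ["Shahrekord University"]
)

by_funder = Counter(candidate_funders)

# One award number, but several unrelated funders behind it.
distinct_funders = len(by_funder)

# Only items whose funder is related to the grant's funder should be linked.
kept = by_funder["Joint Genome Institute"]
```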

Indeed, among the 38,326 grants registered so far, we have 37,608 distinct award numbers, and 716 of those award numbers appear in more than one grant. This issue comes in two flavours: conflicts between and within funders.

Between-funder award number conflicts

A conflict between funders is when more than one funder uses the same award number for one of their grants. This is expected - award numbers are assigned by funders internally and are not designed to be a globally unique identifier.

Out of 716 award numbers that appear in multiple grants, 12 are numbers that appear in grants of different funders. For example, there are two grants with the award number 105626:

Systemic MFG-E8 Blockade as Melanoma Therapy funded by Melanoma Research Alliance
Institutional Strategic Support Fund Phase2 FY2014/16 funded by Wellcome Trust

Because of those conflicts, we cannot simply rely on the award numbers for linking grants to research outputs. Instead, we have to use more information to be sure that the links are correctly established.

Within-funder award number conflicts

To our big surprise, it turns out that the majority of the award number conflicts happen not between different funders, but within the grants of a single funder. Out of 716 award numbers that appear in multiple grants, 704 appear in multiple grants of a single funder only. Such situations are not expected and could indicate an error or some other systematic issue with the data.

Interestingly, out of those 704 award numbers, 700 are associated with the US Department of Energy. We’ve followed up with them in order to clarify or resolve this. The US Department of Energy pointed out a fundamental issue with the data model: currently a grant deposited with Crossref has to have at least one funder DOI, and no other way of identifying the associated organisation is allowed. At the same time, some of the facilities that should appear in their grants’ metadata are not funders at all and thus cannot be identified by a funder DOI. In the future, they plan to identify those facilities in their grant metadata by providing ROR IDs.

Because of within-funder award number conflicts, in some cases it might be difficult to distinguish between two grants with the same award number and funder. A solution might be to use additional information or simply not accept any links if a research output cannot be reliably linked to one grant only.
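The two kinds of conflict can be told apart mechanically by grouping grants by award number and counting the distinct funders in each group. The following is a minimal sketch of that idea, not the code used for this analysis. The `(grant DOI, funder DOI, award number)` tuple layout is an assumption, and all identifiers in the sample are placeholders.

```python
from collections import defaultdict


def classify_award_conflicts(grants):
    """Group grants by award number and classify the conflicts.

    `grants` is an iterable of (grant_doi, funder_doi, award_number) tuples;
    this layout is an assumption made for the example.
    Returns two dicts keyed by award number: (between_funders, within_funder).
    """
    by_award = defaultdict(list)
    for grant_doi, funder_doi, award_number in grants:
        by_award[award_number].append((grant_doi, funder_doi))

    between, within = {}, {}
    for award_number, entries in by_award.items():
        if len(entries) < 2:
            continue  # the award number is unique, so there is no conflict
        funders = {funder for _, funder in entries}
        if len(funders) > 1:
            between[award_number] = entries  # shared by different funders
        else:
            within[award_number] = entries   # repeated by a single funder only
    return between, within


# Placeholder records; none of these DOIs are real identifiers.
sample = [
    ("10.5555/grant-1", "10.13039/funder-a", "105626"),
    ("10.5555/grant-2", "10.13039/funder-b", "105626"),
    ("10.5555/grant-3", "10.13039/funder-c", "1931"),
    ("10.5555/grant-4", "10.13039/funder-c", "1931"),
    ("10.5555/grant-5", "10.13039/funder-c", "2817"),
]
between, within = classify_award_conflicts(sample)
print(sorted(between))  # ['105626']
print(sorted(within))   # ['1931']
```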

Our linking approach

Based on all those observations, we adopted the following approach:

We iterated over all registered grants and, for each one, performed the following steps:
- We used the award.number:<grant DOI> filter in the REST API to find all items listing the given grant’s DOI as the award number. Because this is based on the grant’s persistent identifier, we recorded those links without any further verification.
- We used the award.number:<grant award number> filter in the REST API to find all items listing the grant’s award number in the funding information. Each resulting item was then verified by comparing the funder information in the item to the funder information in the grant. We recorded the link between the grant and the candidate item only if the verification succeeded.
In the final step, we examined all recorded links to make sure that each pair (research output, award number) is linked to at most one grant. Links violating this rule were flagged as not reliable.
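The candidate search and the final consistency check can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code used for the analysis: the URL pattern follows the `award.number` filter queries quoted in the footnotes, percent-encoding the filter value is an assumption, and the `(output DOI, award number, grant DOI)` tuple layout is hypothetical. No network requests are made.

```python
from collections import defaultdict
from urllib.parse import quote

API = "https://api.crossref.org/works"


def award_filter_url(value):
    """Build a REST API query for items listing `value` as an award number.

    `value` can be a grant DOI (first step) or an award number (second step).
    Percent-encoding the value is an assumption made for this sketch.
    """
    return f"{API}?filter=award.number:{quote(value, safe='')}"


def split_reliable(links):
    """Final step: a (research output, award number) pair may link to at most one grant.

    `links` is an iterable of (output_doi, award_number, grant_doi) tuples.
    Returns (reliable, unreliable) lists of links.
    """
    links = list(links)
    grants_per_pair = defaultdict(set)
    for output_doi, award_number, grant_doi in links:
        grants_per_pair[(output_doi, award_number)].add(grant_doi)

    reliable, unreliable = [], []
    for link in links:
        output_doi, award_number, _ = link
        if len(grants_per_pair[(output_doi, award_number)]) == 1:
            reliable.append(link)
        else:
            unreliable.append(link)
    return reliable, unreliable


links = [
    # An ambiguous case taken from this post: one paper, one award number, two grants.
    ("10.1128/genomea.00159-18", "1931", "10.46936/10.25585/60001053"),
    ("10.1128/genomea.00159-18", "1931", "10.46936/genr.proj.2000.1931/60002530"),
    # Placeholder identifiers for an unambiguous link.
    ("10.5555/output-x", "AWARD-1", "10.5555/grant-x"),
]
reliable, unreliable = split_reliable(links)
print(len(reliable), len(unreliable))  # 1 2
```

Flagging both links in an ambiguous pair, rather than picking one, mirrors the rule that a link is only kept when a research output can be tied to a single grant.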

We used different techniques to verify the funder information between the research output (item) and the grant, depending on what information is available. Grants always have the funder DOI. The item, however, can have the funder DOI, the funder name, or both.

If the funder DOI was available on both sides, the following rules were used for the funder verification (ordered by decreasing confidence):

Both the item and the grant contain the same funder DOI, for example, 10.35802/089928 and 10.1242/jcs.196758
The funder in the item replaced or was replaced by the funder in the grant (according to the Funder Registry), for example, 10.35802/104848 and 10.1136/medethics-2020-106821
The funder in the item is an ancestor or a descendant of the funder in the grant (according to the Funder Registry), for example, 10.46936/sthm.proj.2010.40084/60004575 and 10.1016/j.heliyon.2018.e00629

If the funder DOI was not available in the item, the following rules were used for the funder verification (ordered by decreasing confidence):

The funder name in the item is the same (ignoring the case) as the funder name in the grant, for example, 10.35802/110166 and 10.12688/wellcomeopenres.14645.4
The funder name in the item is the same (ignoring the case) as the name of the funder that replaced/was replaced by the funder in the grant, for example, 10.35802/206194 and 10.1172/jci.insight.96381
The funder name in the item is the same (ignoring the case) as the name of the ancestor/descendant of the funder in the grant, for example, 10.46936/cpbl.proj.2001.2191/60002922 and 10.1109/tkde.2016.2628180
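The two rule lists above amount to an ordered cascade: try the strongest rule first and stop at the first one that holds. Here is a minimal sketch of that cascade. The `registry` dictionary is a hypothetical stand-in for the Funder Registry (a funder DOI mapped to its name, the funders it replaced or was replaced by, and its ancestors and descendants). The rule labels and all identifiers are made up for illustration, and the grant's funder name is assumed to come from this registry entry.

```python
def verify_funder(item, grant_funder_doi, registry):
    """Return the label of the first rule that matches, or None if none does.

    `item` is a dict with optional "funder_doi" and "funder_name" keys.
    `registry[doi]` is a dict with "name", "replaced" and "family" keys,
    the last two being sets of funder DOIs (hypothetical structure).
    """
    entry = registry[grant_funder_doi]

    item_doi = item.get("funder_doi")
    if item_doi:
        # Funder DOI available on both sides: compare identifiers only.
        if item_doi == grant_funder_doi:
            return "same DOI"
        if item_doi in entry["replaced"]:
            return "replaced DOI"
        if item_doi in entry["family"]:
            return "related DOI"
        return None

    # No funder DOI in the item: fall back to case-insensitive name comparison.
    name = item.get("funder_name", "").casefold()
    if not name:
        return None

    def names_of(dois):
        return {registry[d]["name"].casefold() for d in dois if d in registry}

    if name == entry["name"].casefold():
        return "same name"
    if name in names_of(entry["replaced"]):
        return "replaced name"
    if name in names_of(entry["family"]):
        return "related name"
    return None


# Placeholder registry; these are not real Funder Registry entries.
registry = {
    "10.13039/parent": {"name": "Example Department", "replaced": set(),
                        "family": {"10.13039/child"}},
    "10.13039/child": {"name": "Example Facility Office", "replaced": {"10.13039/old"},
                       "family": {"10.13039/parent"}},
    "10.13039/old": {"name": "Example Office (former)", "replaced": {"10.13039/child"},
                     "family": set()},
}
print(verify_funder({"funder_name": "EXAMPLE DEPARTMENT"}, "10.13039/child", registry))
# related name
```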

Note that this is in fact very similar to our reference matching approach. In both cases, first we search for candidate items, and then verify the candidates by comparing the metadata. The actual metadata used for the verification varies, because different information is typically given in the bibliographic reference and the funding information.

What we found

This procedure applied to the entire Crossref dataset resulted in 20,846 links between research outputs and grants⁵. Of those, 12 were flagged as unreliable, because they involved more than one grant linked to the same item and award number. The rest of this section focuses on the remaining 20,834 links.

The 20,834 links involve 17,082 distinct research outputs and 3,858 distinct grants (10.1% of all registered grants).

Here is the breakdown into the verification approaches used:

Verification	#links	%links
The item contains grant DOI - no verification	6	<0.1%
Funder DOIs are the same	8,364	40.1%
Funder DOIs are related with a replaced/was replaced by relationship	3,704	17.8%
Funder DOIs are related with an ancestor/descendant relationship	7,718	37.0%
Funder names are the same	591	2.8%
The name of the funder in the item is the same as the name of the funder that replaced/was replaced by the funder in the grant	364	1.7%
The name of the funder in the item is the same as the name of the ancestor or descendant of the funder in the grant	87	0.4%

In most cases, just using the funder DOIs for the verification was enough. Verifying by the funder name added 1,042 links, which is 5% of all links.

And here are statistics for individual funders. Only funders with at least 10 deposited grants are listed in the table. The table shows the number of detected links, the number of distinct research outputs linked, the total number of outputs mentioning the given funder DOI, and the number of grants.

Funder	#links	#linked research outputs	#total outputs with funder DOI	#grants
Japan Science and Technology Agency	11,922	10,411	25,779	9,383
Wellcome Trust (including both funder DOIs 10.13039/100004440 and 10.13039/100010269)	8,001	6,246	49,492	17,534
James S. McDonnell Foundation	463	457	2,534	557
Melanoma Research Alliance	152	150	894	392
Asia-Pacific Network for Global Change Research	100	100	838	539
ALS Association	84	78	909	434
U.S. Department of Energy	56	52	97,482	8,462
Gordon and Betty Moore Foundation	51	50	5,928	94
American Cancer Society	3	3	7,276	107
Children’s Tumor Foundation	1	1	759	630
American Parkinson Disease Association	0	0	181	12
Neurofibromatosis Therapeutic Acceleration Program	0	0	101	68
International Anesthesia Research Society	0	0	94	34
Australian National Data Service	0	0	92	67

Note that the fourth column reports the total number of outputs registered with Crossref and mentioning the given funder DOI, including grants, journal papers and all other record types.

It is interesting to compare the number of linked research outputs for a given funder with the total number of research outputs mentioning a given funder DOI. In general, for a funder that registers grants, the more research outputs mentioning this funder, the more links we should be able to find.

And for some funders (Japan Science and Technology Agency, Melanoma Research Alliance, Asia-Pacific Network for Global Change Research, Wellcome Trust, James S. McDonnell Foundation), the number of linked outputs is indeed high, as compared with how many outputs mention the funder in the first place. This suggests our procedure was quite successful in linking outputs funded by these funders, meaning that in general the metadata in their grants and the funding information in the research outputs match.

On the other hand, we have a few funders for which we managed to link only a very small fraction of research outputs. There are several potential explanations here. A simple one is that not all relevant grants have been deposited yet. For example, a funder might be registering new grants only, whereas many research outputs mention older, not yet registered grants. It is also possible that there are systematic differences in how the publishers deposit the funding information in articles and other outputs, and how it is given in grants. Such differences might prevent us from establishing links, contributing to the overall low percentage of linked grants.

The importance of being precise

Here are some examples of existing links that should’ve been found, but were not.

The award number in grant 10.48105/pc.gr.93156 is CTF-2020-01-004. This article: 10.3390/ijms22094716 mentions award number 2020‐01‐004 and the same funder (Children’s Tumor Foundation). It is very probable that this is the same grant, but our procedure expects exactly the same award number, and so the two were not linked.

Paper 10.1128/genomea.00159-18 contains award number 1931 and U.S. Department of Energy as the funder. There are two grants with the same award number and funder: 10.46936/10.25585/60001053 and 10.46936/genr.proj.2000.1931/60002530. It is difficult to choose between them, and these links were marked as unreliable.

These examples could be signs of systematic errors and/or discrepancies that effectively prevent linking of those funders’ grants.

What’s next

In problems such as linking grants to research outputs, there are typically two key ingredients of success, which are also the main areas for improvement: the quality of the metadata, and the strength of the linking approach.

The metadata could be improved greatly by addressing existing discrepancies between grants and research outputs and allowing (and encouraging!) the publishers to provide grant DOIs in the funding information. Thankfully, we are not alone in those efforts. Both this recent Upstream blog from Alexis-Michel Mugabushaka, and this Scholarly Kitchen post from Robert Harrington call for the development and adoption of grant DOIs in scholarly metadata.

In terms of the linking approach, there are some ideas that could be used to further improve the linking accuracy and completeness:

The verification by funder name could be fuzzy and allow for minor variations like typos or additional words.
Apart from replaced/replaced by and ancestor/descendant, there are other relationships between funders in the Funder Registry: continuation of, incorporates/incorporated into, merged with, renamed as, split into/split from. We could also consider those relationships during the funder validation.
Apart from the funder information, there is other information that could be potentially used for verification, for example, the names of the authors and the investigators, the domain, or keywords.
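As an illustration of the first idea, fuzzy comparison of funder names could be sketched with Python's standard `difflib` module. This is only one possible technique, not something the current procedure does, and the 0.85 threshold is an arbitrary assumption that would need tuning against real data.

```python
from difflib import SequenceMatcher


def names_match(a, b, threshold=0.85):
    """Case-insensitive fuzzy comparison of two funder names.

    Whitespace is collapsed before comparing. The default threshold is an
    arbitrary choice for this sketch.
    """
    a_norm = " ".join(a.casefold().split())
    b_norm = " ".join(b.casefold().split())
    return SequenceMatcher(None, a_norm, b_norm).ratio() >= threshold


print(names_match("Wellcome Trust", "Welcome Trust"))              # True (a typo)
print(names_match("Wellcome Trust", "The Wellcome Trust"))         # True (an additional word)
print(names_match("Wellcome Trust", "U.S. Department of Energy"))  # False
```

A looser threshold finds more links but also risks matching distinct funders with similar names, so any fuzzy match would probably deserve a lower confidence than an exact one.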

If you have any questions, do get in touch!

¹ All numbers are as of March 8, 2022
² https://api.crossref.org/works?filter=award.number:gbmf7622
³ https://api.crossref.org/works?filter=award.number:gbmf7622,type:grant
⁴ https://api.crossref.org/works?filter=award.number:2817
⁵ The code and data are available here: https://gitlab.com/crossref/labs_data_analyses/-/tree/master/analyses/22-01-26-grants-matching

A Registry of Editorial Boards - a new trust signal for scholarly communications?

Fabienne Michaud — Wed, 09 Mar 2022 00:00:00 +0000

Background

Perhaps, like us, you’ve noticed that it is not always easy to find information on who is on a journal’s editorial board and, when you do, it is often unclear when it was last updated. The editorial board details might be displayed in multiple places (such as the publisher’s website and the platform where the content is hosted), which may or may not be in sync. Retrieving this information for any kind of analysis always requires manually checking and exporting the data from a website (as illustrated by the Open Editors research and its dataset).

For well-established as well as early career researchers, membership of an editorial board demonstrates their contribution to their community, brings prestige, improves (or maintains) their professional profile and often increases their chances of being published.

Whilst most journal websites only give the names of the editors, some add a country, some include affiliations, and very few link to a professional profile such as an ORCID iD. Even when it’s clear when the editorial board details were updated, it’s hardly ever possible to find information on past editorial boards, and almost none list declarations of competing interest.

We hear of instances where a researcher’s name has been listed on the board of a journal without their knowledge or agreement, potentially to deceive other researchers into submitting their manuscripts. Regular reports of impersonation, nepotism, collusion and conflicts of interest have become a cause for concern.

Similarly, recent studies on gender representation and on gender and geographical disparity on editorial boards have highlighted the need to do better in this area and provide trusted, reliable and coherent information on editorial board members in order to add transparency, prevent unethical behaviour, maintain trust, and promote and support research integrity.

Registry of Editorial Boards

We are proposing the creation of some form of Registry of Editorial Boards to encourage best practice around editorial boards’ information and governance that can easily be accessed and used by the community.

What we have in mind

A Registry of Editorial Boards could be a new trust-signal for Crossref members and details would be included on a member’s Participation Report.

Crossref members would register and maintain this information for their journal titles in a similar way as they currently manage their metadata. Only the owner of the title, or their trusted service provider, would be able to update it. Editors would be linked by ORCID iD and ROR and Crossref would use ‘autoupdate’ to push editorship information to ORCID profiles, saving researchers time. The information would be made available via Crossref’s API.

This new service would introduce more transparency and automation to the editorial process, connect content platforms (e.g. peer review management systems, publishers’ websites, ORCID and other author register systems, ROR and bibliographic databases), and make available current and historical information on editorial boards, including metadata on the editorial board members’ full affiliations.
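To make the idea of current and historical board information a little more concrete, here is a purely hypothetical sketch of what a single registry record might hold and how a past board could be reconstructed from such records. Nothing here reflects an actual Crossref schema or API: the class, its field names and the sample identifiers are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BoardMembership:
    """One editor's term on one journal's board (hypothetical record layout)."""
    journal_issn: str
    orcid: str                  # the editor's ORCID iD
    ror_id: str                 # ROR ID of the editor's affiliation
    role: str
    start: date
    end: Optional[date] = None  # None means the term is still running


def board_on(memberships, journal_issn, day):
    """Return the memberships of a journal that were active on `day`."""
    return [
        m for m in memberships
        if m.journal_issn == journal_issn
        and m.start <= day
        and (m.end is None or day <= m.end)
    ]


# Placeholder data; none of these identifiers are real.
records = [
    BoardMembership("0000-0000", "0000-0000-0000-0001", "https://ror.org/example1",
                    "Editor-in-Chief", date(2015, 1, 1), date(2019, 12, 31)),
    BoardMembership("0000-0000", "0000-0000-0000-0002", "https://ror.org/example2",
                    "Editor-in-Chief", date(2020, 1, 1)),
]
print([m.orcid for m in board_on(records, "0000-0000", date(2018, 6, 1))])
# ['0000-0000-0000-0001']
```

Keeping start and end dates on every record, instead of overwriting a single current list, is what would make past boards recoverable.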

The benefits for the community

The benefits would be wide-ranging for the different stakeholders in the scholarly communications community, from publishers, researchers, institutions and funders to bibliometricians and librarians, including:

providing those involved in the peer review process and research ethics a single, authoritative and up-to-date resource on editorial boards
reducing fraudulent claims to be or to have been on an editorial board of a publication in order to be published or publish others
connecting and automating editorship role updates with e.g. ORCID, ROR, etc.
generating a detailed analysis of the publication practices of editorial board members and their close contacts
assessing any relationships between authors, reviewers and editorial board members for conflict of interest, etc.
supporting researchers responding to a request to join an editorial board, making proactive approaches to a journal or wanting to ensure that an editorial board is representative of its community and assess its levels of diversity and inclusivity
providing increased visibility to researchers, particularly to early career researchers

Your feedback

Before we progress further, we would like to fully understand what the needs of the community are and whether members would be willing and have the capacity to participate and contribute regularly in registering and maintaining details of their editorial boards.

✏️ Please let us know what your thoughts and experience are with editorial boards by completing this brief survey by 31 March 2022.

POSI fan tutte

Geoffrey Bilder — Tue, 08 Mar 2022 00:00:00 +0000

Just over a year ago, Crossref announced that our board had adopted the Principles of Open Scholarly Infrastructure (POSI).

It was a well-timed announcement, as 2021 yet again showed just how dangerous it is for us to assume that the infrastructure systems we depend on for scholarly research will not disappear altogether or adopt a radically different focus. We adopted POSI to ensure that Crossref would not meet the same fate.

POSI proposes three areas that an Open Infrastructure organisation can address to garner the trust of the broader scholarly community: accountability (governance), funding (sustainability), and protection of community interests (insurance). POSI also proposes a set of concrete commitments that an organisation can make to build community trust in each area. There are 16 such commitments.

In our announcement of Crossref’s adoption of POSI, we made two critical points:

One doesn’t have to meet all the commitments of POSI already to adopt it. For one thing, this would make it impossible for new organisations to adopt POSI. So instead, we should view the adoption of the POSI principles as a “statement of intent” against which stakeholders can measure an organisation’s progress.
Conversely, meeting all of the POSI principles doesn’t mean an organisation can relax. It is always possible for an organisation to regress on a particular commitment. For example, an emergency expenditure might mean that the organisation no longer maintains a 12-month contingency fund and therefore has to replenish it.

With these two points made, we ended our announcement with a candid self-audit against the principles. We concluded that Crossref was already entirely or partially meeting the requirements of 15 of the 16 POSI commitments. And adopting the 16th commitment would just formalize a direction Crossref had already been heading toward for several years. We also said that we would update our self-audit regularly.

But before we continue with the Crossref POSI audit update, we should talk about the immediate aftermath of our adopting the principles.

Since Crossref adopted POSI, nine other organisations have made the same commitment and conducted similar self-audits. We affectionately call them the “POSI Posse”.

Dryad
ROR
JOSS
OurResearch
OpenCitations
DataCite
OA Switchboard
Sciety
Europe PMC

These organisations represent a critical part of the hidden infrastructure that scholarly research depends on every day. By committing to POSI, they are helping ensure their accountability to the research community. They are also emphasizing that stakeholders must participate in the governance and stewardship of organisations running that infrastructure.

But perhaps most importantly, these ten organisations that have publicly committed to adopting POSI will not suddenly disappear or change priorities without giving the community time to react and, if need be, intervene.

There are also more quotidian advantages to these organisations adopting POSI. Adopting the principles makes it easier for the respective organisations to collaborate to make research infrastructure more effective and efficient. The foundation of effective collaboration is trust. And so, by agreeing that we share basic principles of operation, we virtually eliminate a whole slew of negotiations that typically need to occur before two organisations trust each other enough to collaborate closely on projects.

One of Crossref’s strategic priorities is to “collaborate and partner” with other organisations on improving our open scholarly infrastructure. And the easiest way to collaborate with us is to adhere to the same principles. So we look forward to more scholarly infrastructure organisations adopting POSI in 2022 so that, together, we can make research infrastructure work better.

Establishing this level of trust has already paid significant dividends with the Research Organization Registry (ROR) - a relatively new infrastructure project founded jointly by DataCite, CDL, and Crossref.

Having nine organisations adopt POSI so soon after our announcement was a wonderful feeling. It is hard for us to convey how happy we are about this without gushing.

Here is a picture of me gushing.

But now we have some outstanding business: updating our self-audit.

This post is the first of our regular updates on our progress (or regress) on meeting the POSI principles.

TL;DR

We didn’t regress on any commitment. We’ve improved a little bit where we were not meeting the POSI principles, but we have still not met all our POSI commitments.

Area	Commitment	2020	2021
Governance	Coverage across the research enterprise	green	green
	Non-discriminatory membership	green	green
	Transparent operations	green	green
	Cannot lobby	green	green
	Living will	green	green
	Formal incentives to fulfill mission & wind-down	green	green
	Stakeholder-governed	red	yellow
Sustainability	Time-limited funds are used only for time-limited activities	green	green
	Goal to generate surplus	green	green
	Goal to create a contingency fund to support operations for 12 months	yellow	yellow
	Mission-consistent revenue generation	green	green
	Revenue based on services, not data	green	green
Insurance	Available data (within constraints of privacy laws)	green	green
	Patent non-assertion	yellow	yellow
	Open source	yellow	yellow
	Open data (within constraints of privacy laws)	yellow	green

Details

Stakeholder governance moves from red to yellow

Our only red mark in our POSI self-audit was against the principle of stakeholder governance. Our board did not yet reflect our members’ diversity or the broader stakeholder community. In particular, as funders have become more central to shaping the scholarly communications landscape, it seemed important that Crossref have funder representation in our governance.

So this year, the Crossref nominations committee was charged with proposing a board slate that addressed some of our representational gaps. They did this, and as a direct result, two of the members elected to next year’s board were a funder (Melanoma Research Alliance) and a significant preprint platform (Center for Open Science).

These new additions to our board mark a significant improvement in stakeholder governance, but we can do more. Researchers and research institutions are also substantial Crossref stakeholders. We need to have a better representation of their concerns.

Also, there are still members of the scholarly communications community who depend on Crossref but cannot afford to join it because our fees are too high for them. Since membership is a prerequisite to participation in Crossref governance, we are also placing emphasis on figuring out how to further extend Crossref membership to those who still cannot afford it, through programs like Sponsorship, country-level journal gap analyses work, and a forthcoming fee review. So this is a source of stakeholder governance inequity that may be best handled by our membership & fees committee rather than our nominations committee.

In short, we’ve made progress on our stakeholder governance commitment. Still, we need to do more, so we are updating our adherence to the POSI stakeholder governance principle from red to yellow.

Another place where we have improved things is under the banner of “transparency.” But here, we see one of the shortcomings of the “traffic light” representation used in the self-audit. The degree to which one meets a commitment falls along a gradient. And this gradient cannot be represented accurately in the ternary classification of red/yellow/green. So while last year we marked ourselves as “green” under the commitment to transparency, over the past year we have become greener. We did this by creating sections on our website that provide further detail on our governance and finances - even including the 990 forms that US tax authorities require non-profits to submit. So what do we do here? Make it neon-green? Make it blink?

Sustainability stays yellow

In our first self-audit, we had several yellow marks- places where we were doing OK, but where we needed to make improvements.

The first yellow mark involved one of the principles of “sustainability,” which stipulates that an organisation should have a goal to create a contingency fund to support operations for 12 months. At the time, we had a contingency fund of 9 months. The board instructed the finance committee to develop a plan for meeting the new 12-month goal. To do this, the board decided to create three funds. The first is fairly flexible and holds operating expenses for three months. Staff leadership can use this fund at their discretion to manage cash flow issues and support budgeted expenses. The second holds operating expenses for 12 months. This fund is board-restricted and is only meant to be used in emergencies, to help with substantial changes in our financial position or, in extremis, to fund an orderly wind-down of Crossref’s operations. The third is another board-restricted fund that can be used to fund new investments or new Crossref initiatives.

Furthermore, the board’s investment committee established guidelines for investing our operating and investment surpluses. Any surpluses are first applied to the 3-month fund. Once that funding goal is met, any surpluses are applied to the 12-month fund. And once both the 3-month and 12-month funding goals are met, any further surpluses go into the third fund.
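The order in which surpluses fill the three funds is a simple waterfall, which can be sketched as follows. The function, the fund names and all figures are invented for illustration and do not come from Crossref’s finances. Amounts are plain integers in an unspecified currency unit.

```python
def allocate_surplus(surplus, monthly_expenses, balances):
    """Distribute a surplus across the three funds in priority order.

    `balances` maps "three_month", "twelve_month" and "investment" to the
    current balance of each fund (hypothetical names). Returns a new dict.
    """
    targets = {
        "three_month": 3 * monthly_expenses,
        "twelve_month": 12 * monthly_expenses,
    }
    updated = dict(balances)
    remaining = surplus
    # Fill the 3-month fund first, then the 12-month fund.
    for fund in ("three_month", "twelve_month"):
        shortfall = max(targets[fund] - updated[fund], 0)
        top_up = min(remaining, shortfall)
        updated[fund] += top_up
        remaining -= top_up
    # Whatever is left goes to the fund for new investments and initiatives.
    updated["investment"] += remaining
    return updated


print(allocate_surplus(
    surplus=200,
    monthly_expenses=100,
    balances={"three_month": 250, "twelve_month": 1100, "investment": 0},
))
# {'three_month': 300, 'twelve_month': 1200, 'investment': 50}
```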

But again, the simple yellow mark against this item does not capture this level of detail. We only get to turn it green once we have the 12-month fund in place.

It looks like we will meet the goal in 2022, but it is hard to say exactly when. If we did shades of color, we might make it chartreuse. But nobody wants to see chartreuse. So while we have made significant progress here, our commitment to maintaining a 12-month contingency fund remains yellow until we have reached our goal.

Patent non-assertion stays yellow

The second yellow mark was against publishing a patent non-assertion statement. This feels like a missed opportunity because it will be straightforward for us to do, but we have not yet done it. We have never applied for patents, and we don’t intend to start. In short, nothing is blocking us from doing this other than our natural reluctance to have to draft anything that involves lawyers. Our lawyers are very nice people, but everything we have to draft with them makes our eyes glaze over. We need to get this done ASAP in 2022.

Open source remains yellow

The third yellow mark makes me cringe because, as technical director, it is firmly in my bailiwick. We have committed to open-sourcing all of our code. In last year’s self-audit, I predicted that we should be able to open all of our code within 12 to 18 months. I was wrong. That means this commitment remains yellow. And what’s more, it is likely to remain yellow for a year or two. Let me try and explain why.

First, I should note that all new services that we’ve written since 2007 have been released as open source (under an MIT license). These include our REST API, Crossmark, Metadata Search, and Event Data. You can find all our open-source code on GitLab.

This leaves us with our “content system” with its legacy code, which handles content registration, OAI-PMH, OpenURL, and XML APIs. This code was originally developed for Crossref by a third party (whom I won’t name because they are in no way to blame for our predicament). Crossref only took over the development of the code base internally around 2010. But the system has accumulated over twenty years of technical debt and includes many once-common engineering practices that are deprecated (to put it delicately). Additionally, the code is a labyrinth of dependencies on very old libraries under very old licenses.

And although we have spent much of the past two years replacing critical parts of the system’s authentication and authorization code, I am certain that there remain swathes of code that, under scrutiny, would prove a security nightmare.

Now we know that so-called “security through obscurity” is bad practice. Our legacy code base illustrates the point. We had credentials embedded in the code. We had backdoors and application-level root access. We had countless places where we didn’t sanitize input. But the code was private, and so it gave developers a false sense of confidence when they occasionally took these shortcuts in the interest of developing new features more quickly. And in those early days of hyper-growth, we often had to develop things very, very quickly. Technical debt, like any debt, is a tradeoff.

As I said, we’ve cleaned a ton of this stuff up. For example, we’ve replaced our primary authentication system. But this experience has made us better appreciate just how difficult it would be to harden a system this old.

And besides, we are already replacing it - albeit incrementally. We have been extracting and rewriting key components of the old system, and we plan to continue to extract and rewrite until there is nothing left of the old code. All this new code is, naturally, open-source. And it follows modern security practices.

And so we face a difficult choice: do we try to fix code that is hard to fix and that we are replacing anyway, or do we focus just on replacing the code and making sure the new, open-source code follows modern security best practices? We’ve chosen to take the latter route. But it does mean this entry will have a yellow circle next to it for a few more years as we replace things.

Open data moves from yellow to green

And this brings us to our final yellow mark, which was next to the principle of open data. The root of the problem is that what we colloquially call “Crossref metadata” is a mix of elements, some of which come from our members, some from third parties, and some from Crossref itself. These elements, in turn, each have different copyright implications.

On top of this, Crossref has terms and conditions for its members and terms and conditions for specific services. These terms and conditions grant Crossref the right to do things with some classes of metadata and not do things with other classes of metadata - regardless of copyright.

The net result is that users can freely use and redistribute any metadata they retrieve via our APIs or in our periodic public data files. But it also means we cannot just slap a CC0 waiver on all the data. Instead, we have to specify exactly what copyright and terms apply to each class of data. We’d never done this in a clear and accessible way, so some of our users were understandably concerned that maybe we were hedging or perhaps the reuse rights were unclear. But we are not hedging; they are clear. They just weren’t documented. And now they are. In human-readable form. And soon-to-be in machine-readable form. So we can move this from yellow to green.

Reflections on the year since our adoption of POSI

When the Crossref board adopted POSI last year, frankly, a few of us were surprised. We never doubted Crossref’s direction as an open infrastructure organisation, but we were not sure that others would see the value in making a public commitment to the principles. We’d heard some people say that they thought adopting them would be seen as “Virtue Signaling.” Which, to be fair, it is. This shouldn’t be surprising or contentious. Our entire scholarly communication system is based on virtue signaling. But, of course, the term “virtue signaling” (with scare quotes) is also sometimes used to insinuate that such signaling is disingenuous and designed primarily for marketing purposes. And that would be a real danger. But the principles were drafted with a built-in safeguard against disingenuous use. The commitments POSI lists are practical things that can be verified by anyone. Is our data open? Does the diversity of our board reflect the diversity of our stakeholders?

So from the start, we knew that the community would be able to hold us to our commitments. And knowing that made it imperative that we develop a mechanism and process for tracking whether we were meeting them. Thus was born the self-audit.

And the self-audit, in turn, has served as a forcing function to ensure that we didn’t just launch a proclamation and then forget about it. We needed to integrate our POSI commitments into all aspects of our day-to-day work. As such, “Live up to POSI” is now a prominent part of Crossref’s Strategic Agenda. POSI has become a fundamental part of our planning and our public product roadmap. POSI has even become a part of our internal staff annual development plans.

Adopting POSI has changed the way we work. It has changed the way the board works. It has changed the way staff works.

And we hope that it is having a similar effect on our fellow POSI Posse.

But how about changing the way POSI works?

Now that Crossref and the nine other members of the POSI Posse have had a year of considering and/or living up to the POSI standards, what would we change? What would we add?

A few themes have started to emerge as we’ve fielded questions from the current POSI Posse and others who have expressed an interest in adopting POSI.

How does POSI apply to non-membership organisations?
Can POSI apply to commercial organisations?
How could POSI be extended to apply to open infrastructure organisations outside of scholarly communication?
How in the hell do you pronounce “POSI?”

We’ve tried to answer some of these questions in the POSI FAQ, but can we update POSI so that we don’t need the FAQ? Or at least so that we can start a new FAQ?

And, critically, if we change POSI, how do we ensure we make it stronger and not weaker? Because, to be candid, some of the questions that we’ve fielded have come from parties concerned that POSI is too restrictive. That, for example, the stipulation that revenue should be based on services and not on data makes for inflexible business models. Yes. It does. Deliberately.

Because one of the biggest barriers to a community being able to fork digital infrastructure is closed (incl. fee-based) data. And one of the fundamental positions of POSI is one the authors learned from open-source communities. This is that these efforts can fail no matter how much care you take to ensure financial sustainability and how much care you take to ensure community-based governance. The ultimate power the open-source community has is to take the code and fork it. This is the insurance policy that helps keep open source projects honest. And we have tried our best to bake this lesson into the POSI principles. We don’t want to weaken POSI. They are, after all, principles.

So in 2022, we look forward to more organisations endorsing POSI. And the current POSI Posse has started a conversation about how we can strengthen the principles and also extend them so that they can more easily be applied to different kinds of organisations and perhaps even in different sectors. A summary of these discussions will be published in the coming weeks.

But how will we open these conversations to the broader community? How will we engage those who have yet to adopt the principles but are interested in doing so? What about those interested but perhaps only if they are adapted in some way?

We already have a mechanism for soliciting feedback, questions, and suggestions concerning POSI. However, it is a relatively primitive system, based on either sending email to one of the POSI Posse or raising a GitLab ticket. It was the best we could do in the short time we had to put together the POSI site. An MVP, if you will. The feedback mechanism served us well over the past year; we engaged with many interested parties and even managed to help nine of them adopt the principles.

But as with all things POSI - there is room for improvement. And so, we hope to have a more user-friendly way to solicit public feedback and hold discussions. This feedback and our own experiences with adopting POSI over the past year will, in turn, inform our efforts at revising POSI to take into account the things we’ve learned since POSI was originally written.

So look out for announcements on the POSI site. And we look forward to another year of expanding the list of POSI adopters and continuing our own POSI progress. If you’re POSI-curious, get in touch with any of the ten POSI adopters to start a conversation about your own path towards truly open infrastructure.

Image integrity: Help us figure out the scale of the problem

Fabienne Michaud — Mon, 07 Feb 2022 00:00:00 +0000

Some context

The Similarity Check Advisory Group met a number of times last year to discuss current and emerging originality issues with text-based content. During those meetings, the topic of image integrity was highlighted as an area of growing concern in scholarly communications, particularly in the life sciences.

Over the last few months, we have also read with interest the recommendations for handling image integrity issues by the STM Working Group on Image Alteration and Duplication Detection, closely followed image integrity sleuths such as Elizabeth Bik, and, like many of you, noticed that image manipulation is increasingly given as the reason for retractions.

Image integrity issues are often associated with paper mill activity but can also originate from an individual’s intentional or unintentional unethical behaviour. Currently, such issues with figures and images are being identified manually or by using an image integrity tool, comparing images within the same article and/or the publisher’s past publications only - and we know that this is a source of frustration for the Crossref members we have spoken to.

What next?

As reported in Nature last December, we believe Crossref is in a unique position to spearhead a cross-publisher solution, similar to what we do for text-based originality checking, as part of our Similarity Check service.

Before we start exploring potential software options, we need your help to understand:

the scale of the issues and whether these are focused on specific disciplines
the type of issues we should prioritise e.g. duplication, beautification, rotation, plagiarism, GAN-generated images/deep-fakes, etc.
what software (if any) members are using or trialling
whether a cross-publisher service with the collective benefit of shared images would be of sufficient interest to the community

✏️ Let us know what your experience and thoughts are on image integrity by completing this survey.

We’re planning to complete our research soon and share the results with you, along with our proposed next steps.

Hiccups with credentials in the Test Admin Tool

Isaac Farley — Wed, 26 Jan 2022 00:00:00 +0000

TL;DR

We inadvertently deleted data in our authentication sandbox that stored member credentials for our Test Admin Tool - test.crossref.org. We’re restoring credentials using our production data, but this will mean that some members have credentials that are out-of-sync. Please contact support@crossref.org if you have issues accessing test.crossref.org.

2025 update

We’re working to scale back our support for the test admin tool. We will continue to support our XML parser for anyone wanting to test their XML. If you’re a service provider and would like to test your integrations, which we will continue to support, you may POST submissions to our test system using https://test.crossref.org/servlet/deposit. You’ll need to email us at support@crossref.org so we can configure an account within the test system before you test your integration.

The details

Earlier today the credentials in our authentication sandbox were inadvertently deleted. This was a mistake on our end that has resulted in those credentials no longer being stored for our members using our Test Admin Tool - test.crossref.org.

To be clear, this error has had no impact on the production Admin Tool - doi.crossref.org - or any member’s access to registering content therein. If you’re a member who registers content with us using our helper tools (e.g., the web deposit form) or OJS, you’re likely unfamiliar with the Test Admin Tool, and this issue will not affect you or your registration of content.

We don’t configure all member accounts for the Test Admin Tool, so, fortunately, this is an issue for the minority of our members. That said, for those members who do use the Test Admin Tool, this is not a trivial problem. And, we’re going to dedicate additional resources across the organisation to ensure it is fixed.

Next steps

We’ve repopulated the credentials in the Test Admin Tool based on our production accounts. It was our best option. While we don’t know your current credentials, our support and membership teams do know that the majority of our members using the Test Admin Tool have historically shared credentials between the Test Admin Tool and our production Admin Tool - doi.crossref.org. That means that many of you will be able to access the Test Admin Tool using those shared credentials; but some of you - who have used different credentials between the two systems - will not.

We also know that for many of you testing submissions is an integral step in your workflow, so we’ve determined this is an all-hands-on-deck situation and our staff, across the organisation, will be assisting members who have issues with access to test.crossref.org. Starting today, we’re actively monitoring submissions to the Test Admin Tool for access errors through Friday, 11 February. We’ll be proactively contacting affected members to reset their passwords. If you encounter problems before we reach out to you, please do contact us at support@crossref.org and include ‘Accessing Test Admin Tool’ in your subject line.

A ROR-some update to our API

Rachael Lammey — Wed, 19 Jan 2022 00:00:00 +0000

Earlier this year, Ginny posted an exciting update on Crossref’s progress with adopting ROR, the Research Organisation Registry for affiliations, announcing that we’d started the collection of ROR identifiers in our metadata input schema. 🦁

The capacity to accept ROR IDs to help reliably identify institutions is really important but the real value comes from their open availability alongside the other metadata registered with us, such as for publications like journal articles, book chapters, preprints, and for other objects such as grants. So today’s news is that ROR IDs are now connected in Crossref metadata and openly available via our APIs. 🎉

This means ROR can be used by and within all the tools and services that integrate with Crossref APIs to analyse, search, recommend, or evaluate research. It’s an important element of the Research Nexus, our vision of a fully connected open research ecosystem, and helps identify, share, and link the affiliations of those producing and publishing different types of research or receiving grants.

Now that this metadata is available, it helps confer the downstream benefits of ROR for different (and interconnected) groups:

It makes it easier for institutions to find and measure their research output by the articles their researchers have published, or perhaps make it easier to track the grants they’ve received.
Funders need to be able to discover and track the research and researchers they have supported.
Academic librarians need to easily find all of the publications associated with their campus.
Journals need to know where authors are affiliated so they can determine eligibility for institutionally sponsored publishing agreements.
Editors can use more accurate information on author and reviewer institutions during the peer review process, which can help avoid potential conflicts of interest.

Those are just a handful of use cases, which is why disseminating ROR affiliation identifiers via our APIs is so important; it lets others choose to do what they need to with the information, without restriction.

The story so far

A growing number of our members have started to include ROR in the metadata they register with us, so we’re excited to be able to see this via simple API queries.

At the time of writing we can see nearly 4,000 RORs being registered by these 21 members (we’ve removed test accounts). Note that many of these are being baked into metadata being registered for grant records, also recently released and now findable through the REST API:

"Wellcome": 2821,
"Natural Resources Canada/CMSS/Information Management": 277,
"University of Szeged": 139,
"RTI Press": 104,
"American Cancer Society": 103,
"University of Missouri Libraries": 77,
"Keldysh Institute of Applied Mathematics": 52,
"Boise State University, Albertsons Library": 52,
"Australian Research Data Commons (ARDC)": 52,
"The Neurofibromatosis Therapeutic Acceleration Program": 49,
"Boise State University": 12,
"The ALS Association": 11,
"Children's Tumor Foundation": 9,
"Episteme Health Inc": 3,
"The University of the Witwatersrand": 2,
"Office of Scientific and Technical Information": 2,
"AGH University of Science and Technology Press": 2,
"York University Libraries": 1,
"SZTEPress": 1,
"Masaryk University Press": 1,
"Institut für Germanistik der Universität Szeged": 1,

Our grants schema accommodated ROR first, so it’s the funder members and grant records that dominate the adoption of ROR… so far! But there are a few articles and reports there too already. These record types include ROR in their records:

"Grant": 3047,
"Report": 382,
"Dissertation": 164,
"Journal Article": 140,
"Conference Paper": 22,
"Posted Content": 12,
"Dataset": 7,
"Monograph": 6,
"Book": 3,
"Chapter": 2,
"Proceedings Series": 1,
"Peer Review": 1,
"Journal Issue": 1,
"Book Set": 1,
"Book Series": 1

We can currently see 205 different ROR IDs in Crossref metadata, with the most frequently provided ROR ID being https://ror.org/02jx3x895, or University College London, as it’s also known.
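The member and record-type tallies above read like facet counts from the REST API. As a minimal sketch, assuming the `has-ror-id` filter and the `publisher-name` and `type-name` facets behave as described in the REST API documentation (none of them are named in this post), the Python below builds such a query and uses the record-type figures quoted here to work out how heavily grants dominate. The function names are ours, for illustration only.

```python
from urllib.parse import urlencode

# Record-type counts copied from the list earlier in this post.
ROR_RECORDS_BY_TYPE = {
    "Grant": 3047, "Report": 382, "Dissertation": 164, "Journal Article": 140,
    "Conference Paper": 22, "Posted Content": 12, "Dataset": 7, "Monograph": 6,
    "Book": 3, "Chapter": 2, "Proceedings Series": 1, "Peer Review": 1,
    "Journal Issue": 1, "Book Set": 1, "Book Series": 1,
}


def ror_facet_url(facet: str) -> str:
    """Build a zero-row facet query over records that carry a ROR ID.

    Assumption: the filter and facet names come from the REST API
    documentation, not from this post.
    """
    params = {"filter": "has-ror-id:true", "facet": f"{facet}:*", "rows": 0}
    # Keep ':' and '*' readable instead of percent-encoding them.
    return "https://api.crossref.org/works?" + urlencode(params, safe=":*")


def share(counts: dict[str, int], key: str) -> float:
    """Fraction of all counted records that fall under `key`."""
    return counts[key] / sum(counts.values())


if __name__ == "__main__":
    print(ror_facet_url("type-name"))
    print(f"{share(ROR_RECORDS_BY_TYPE, 'Grant'):.1%} of these records are grants")
```

On the figures quoted here, grants account for roughly 80% of the 3,790 records that include a ROR ID.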

If you’re a Crossref member keen to assert affiliation identification in your content, our recent webinar, Working with ROR as a Crossref member: what you need to know, covers all the detail.

Interested in using the information? Dig into our REST API documentation and into the API itself, and use the polite pool if you can (i.e. identify yourself). There’s also a wealth of information on the ROR support site or being shared among integrators in the growing ROR community.
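Identifying yourself for the polite pool usually means supplying a contact email address. Here is a minimal sketch, assuming the `mailto` query parameter and a descriptive `User-Agent` header as described in the REST API documentation; the tool name and email address are placeholders, and `polite_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request


def polite_request(path: str, params: dict, mailto: str,
                   tool: str = "example-tool/0.1") -> Request:
    """Prepare (but do not send) a REST API request that identifies its sender."""
    query = urlencode({**params, "mailto": mailto}, safe=":*@")
    headers = {"User-Agent": f"{tool} (mailto:{mailto})"}
    return Request(f"https://api.crossref.org{path}?{query}", headers=headers)


req = polite_request("/works", {"rows": 5}, "you@example.org")
# Nothing is sent until the request is opened, e.g. urllib.request.urlopen(req).
```

Building the request separately from sending it keeps the identification logic in one place, so every call your tool makes carries the same contact details.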

Join us in doing more with ROR!

Event Data now with added references

Martyn Rittman — Wed, 10 Nov 2021 00:00:00 +0000

Event Data is our service to capture online mentions of Crossref records. We monitor data archives, Wikipedia, social media, blogs, news, and other sources. Our main focus has been on gathering data from external sources; however, we know that there is a great deal of Crossref metadata that can be made available as events. Earlier this year we started adding relationship metadata, and over the last few months we have been working on bringing in citations between records.

Our members deposit references alongside other metadata, and we have a lot of them. In fact, we have over 1.2 billion, with hundreds of thousands of new references added each day. While our metadata APIs make it easy to see which works are cited, it is much more difficult to find a list of citations to a specific work. We can make this easier by presenting citations as events in Event Data. Now that the huge majority of our members have responded positively to the Initiative for Open Citations (I4OC) campaign and Crossref’s open-by-default reference policy, the move to make this data available via Event Data is a natural step.

A bumpy ride, but we got there

Adding such a large amount of data means a significant increase in the data coming into Event Data, which has presented some challenges. We’ve known for some time that Event Data is not very stable, but we expected it to cope with the new data coming in. We mitigated the risk by initially only looking at new data, not trying to immediately back-fill with old references. Unfortunately, even with this limitation it hasn’t been a smooth ride: our first effort to put references into Event Data uncovered bugs we didn’t know about, and we had to walk back the changes. We tried again and found that we were hitting rate limits for our own APIs. This is a sure sign of technical debt: we shouldn’t need to be shifting large amounts of our own data from one place to another, and not at rates that could be putting stress on APIs used by others in the community.

We have managed to work around these problems and I’m pleased to say that we are now adding metadata from reference lists to Event Data. They can be accessed via the Event Data API: https://api.eventdata.crossref.org/v1/events?rows=10&source=crossref&relation-type=references&from-collected-date=2021-10-01
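As a rough sketch of how the query above could be assembled and its results turned into "who cites this work" lists, the Python below rebuilds the URL from its parameters and groups events by the cited work. The `subj_id` and `obj_id` field names are our assumption about the event format, and the sample events are invented for illustration.

```python
from collections import defaultdict
from urllib.parse import urlencode

EVENTS_ENDPOINT = "https://api.eventdata.crossref.org/v1/events"


def reference_events_url(from_collected_date: str, rows: int = 10) -> str:
    """Build the reference-events query shown above for a given start date."""
    params = {
        "rows": rows,
        "source": "crossref",
        "relation-type": "references",
        "from-collected-date": from_collected_date,
    }
    return f"{EVENTS_ENDPOINT}?{urlencode(params)}"


def citations_by_work(events: list[dict]) -> dict[str, list[str]]:
    """Map each cited work (assumed `obj_id`) to the works citing it (assumed `subj_id`)."""
    cited_by = defaultdict(list)
    for event in events:
        cited_by[event["obj_id"]].append(event["subj_id"])
    return dict(cited_by)


# Invented events, shaped the way we assume the API returns them.
sample_events = [
    {"subj_id": "https://doi.org/10.5555/citing-a", "obj_id": "https://doi.org/10.5555/cited"},
    {"subj_id": "https://doi.org/10.5555/citing-b", "obj_id": "https://doi.org/10.5555/cited"},
]
```

Grouping by the cited work is what turns a stream of "A references B" events into the list of citations to a specific work that the metadata APIs make hard to obtain.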

Where to next?

There remains work to be done. We would like to backfill references, and there is also further work to include relationships to objects that have identifiers other than Crossref records (genes, proteins, ArXiv identifiers, and so on). Our work on investigating sources is proceeding and we will be looking to add more next year. While possible, these steps will be costly and time-consuming if we proceed without significant changes to the infrastructure supporting Event Data.

When we started Event Data the volumes of data were much smaller and our infrastructure coped well, but as we’ve said here before, it’s in need of an overhaul. In fact, our recent experience and some other considerations are making us look at some very fundamental changes in how we record events.

We are therefore working on a new data model that will allow events to be stored alongside the rest of our metadata. This work is still in the early stages, but if we are successful it will mean that we won’t need to move data between databases. It will also make it easier to provide access to all of our reference metadata along with other relationships that we’re not currently able to provide, and give us the capacity to add new data sources.

Open references

[EDIT 6th June 2022 - all references are now open by default with the March 2022 board vote to remove any restrictions on reference distribution].

It is worth noting that only open references will be available via Event Data. This covers 88% of works with references at present. Members have the option to deposit references with limited visibility, meaning only Metadata Plus users can access them; or closed visibility, meaning that only the member who owns the cited work can retrieve the citation, via Cited-by.

We encourage our members to make their references open and deposit them as metadata. It makes them usable downstream by thousands of tools that researchers use. Including open references also improves the quality of metadata, and there are reciprocal benefits for the large number of members who openly share their reference data: they contribute to a large, openly available pool of data with many applications that advance research, and drive usage of the content published by our members.

If you are a Crossref member and unsure whether your reference metadata is open or not, check your participation report. This will tell you the percentage of your records with deposited references, and the percentage of those that are open. You can change the reference visibility preference for each DOI prefix that you own by contacting our support team. For guidance on how to deposit references, see our user documentation.

Come and get your grant metadata!

Rachael Lammey — Mon, 08 Nov 2021 00:00:00 +0000

Tl;dr: Metadata for the (currently 26,000) grants that have been registered by our funder members is now available via the REST API. This is quite a milestone in our program to include funding in Crossref infrastructure and a step forward in our mission to connect all.the.things. This post gives you all the queries you might need to satisfy your curiosity and start to see what’s possible with deeper analysis. So have a look and see what useful things you can discover.

How it started

Back in 2017 we posted the outcomes of some discussions with a newly-reformed Funder Advisory Group, plotting Crossref’s path. In 2018, Wellcome described their rationale for supporting the grants effort with the help of Europe PMC, and in 2019 the sub-groups of the Advisory Board put out a call for feedback on the metadata plan as the fee model they created was also approved by our board.

Since late 2019, research funders have been registering metadata and identifiers for their grants with us. We currently have a healthy 26k grants registered with us, via 13 funding organisations. I’d specifically highlight Wellcome for volume (registering via Europe PMC), and the Australian Research Data Commons (ARDC), which was the first funder to include ROR IDs in its grant metadata, really getting the value of connecting all related entities and contributors.

The reasons for registering grants with Crossref? Let’s recap:

Support of open data and information about grants
Streamlined discovery of funded content
Improved analytics and data quality
More complete picture of outputs and impact
Better value from investments in reporting services
Improved timeliness, completeness and accuracy of reporting: save time for researchers
More complete information to support analysis and evaluation without relying on manual data entry

How it’s going

For grant information to be used, it’s key that it is openly available and disseminated as widely as possible. That work starts with funders registering their grants, and continues with us. Now that we’ve completed the REST API’s Elasticsearch migration, we’re happy to announce that all our grant information is now available via our REST API.

Here’s a snippet of the kind of metadata you can see related to the grants registered with us. This is information related to grant record https://doi.org/10.35802/218300, found using this request (https://api.crossref.org/works/10.35802/218300) which you can use to see the full metadata record:

"publisher": "Wellcome",
"award": "107769",
"DOI": "10.35802/107769",
"type": "grant",
"created": {
  "date-parts": [
    [
      2019,
      9,
      25
    ]
  ],
  "date-time": "2019-09-25T07:17:20Z",
  "timestamp": 1569395840000
},
"source": "Crossref",
"prefix": "10.35802",
"member": "13928",
"project": [
  {
    "project-title": [
      {
        "title": "Initiative to Develop African Research Leaders (IDeAL)"
      }
    ],
    "project-description": [
      {
        "description": "Research is key in tackling the heath challenges that Africa faces. In KWTRP we have been committed to building sustainable capacity alongside an active and diverse research programme covering social science, health services research, epidemiology, laboratory science including molecular biology and bioinformatics. Our strategy has been successful in delivering high quality PhD training, leveraging individual funding and programme funding in order to place students in productive groups and provide high quality supervision and mentorship. Here we plan to consolidate and build on these outputs to address long-term sustainability. We will emphasise the full career path needed to generate research leaders. KWTRP aims to address capacity building for research through an initiative that employs a progressive and long term outlook in the development of local research leadership. The overall aim of the \"Initiative to Develop African Research Leaders\" (IDeAL) is to build a critical mass of African researchers who are technically proficient as scientists and well-equipped to independently lead science at international level, able to engage with funders, policy makers and governments, and to act as supervisors and mentors for the next generation of researchers.",
        "language": "en"
      },

If you dig in, you can see information about the project, investigators (including their ORCID iDs), the funder, award type, amount, description of the grant, and a link to the public page showing information about the grant. More information on the required and optional fields is available in our grants markup guide.
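To give a feel for working with a record like this, here is a minimal sketch that pulls the project title and creation date out of a hand-trimmed copy of the snippet above. The helper names are ours, and the trimmed dictionary keeps only the fields the helpers need; a real response carries many more.

```python
from datetime import date, datetime, timezone

# Hand-trimmed copy of the snippet above; values are unchanged.
record = {
    "publisher": "Wellcome",
    "award": "107769",
    "DOI": "10.35802/107769",
    "type": "grant",
    "created": {
        "date-parts": [[2019, 9, 25]],
        "date-time": "2019-09-25T07:17:20Z",
        "timestamp": 1569395840000,
    },
    "project": [
        {"project-title": [{"title": "Initiative to Develop African Research Leaders (IDeAL)"}]}
    ],
}


def project_titles(rec: dict) -> list[str]:
    """Collect every project title, tolerating records with no project block."""
    return [
        title["title"]
        for project in rec.get("project", [])
        for title in project.get("project-title", [])
    ]


def created_on(rec: dict) -> date:
    """Read the creation date from the first `date-parts` triple."""
    year, month, day = rec["created"]["date-parts"][0]
    return date(year, month, day)


def created_at(rec: dict) -> datetime:
    """Convert the millisecond `timestamp` into a UTC datetime."""
    return datetime.fromtimestamp(rec["created"]["timestamp"] / 1000, tz=timezone.utc)
```

The three date representations in `created` agree with each other: the millisecond timestamp converts to the same instant as the `date-time` string.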

Here are some examples of the kind of things you can now ask:

Show me who is registering grants:

https://api.crossref.org/types/grant/works?rows=0&facet=funder-name:*

Show me all of the grants registered by Wellcome:

https://api.crossref.org/works?query.funder-name=Wellcome&filter=type:grant

Show me all of the grants associated with the investigator name Caldas:

https://api.crossref.org/works?query.contributor=Caldas&filter=type:grant

And bibliographic queries finding entries in…

Award number:

https://api.crossref.org/works?query.bibliographic=7196&filter=type:grant

Project title:

https://api.crossref.org/works?query.bibliographic=RIZ1&filter=type:grant
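The example queries above share one pattern: a query or facet parameter plus, where needed, a `type:grant` filter. A small sketch of helpers that generate them follows; the function names are hypothetical, while the parameters are the ones used in the URLs above.

```python
from urllib.parse import urlencode

API = "https://api.crossref.org"


def _url(path: str, params: dict) -> str:
    # Keep ':' and '*' unescaped so the output matches the URLs in this post.
    return f"{API}{path}?{urlencode(params, safe=':*')}"


def grant_funders_facet() -> str:
    """Who is registering grants: facet counts only, no records."""
    return _url("/types/grant/works", {"rows": 0, "facet": "funder-name:*"})


def grants_by_funder(name: str) -> str:
    return _url("/works", {"query.funder-name": name, "filter": "type:grant"})


def grants_by_investigator(name: str) -> str:
    return _url("/works", {"query.contributor": name, "filter": "type:grant"})


def grants_matching(text: str) -> str:
    """Bibliographic search, e.g. on an award number or project title."""
    return _url("/works", {"query.bibliographic": text, "filter": "type:grant"})
```

For example, `grants_by_funder("Wellcome")` returns the Wellcome query shown above, and `grants_matching("7196")` returns the award-number query.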

More to do

This is a milestone but it’s not the end of the story. We have more to do: adding relationships, encouraging the use of this metadata amongst publishers and their platforms, and adding grant records to our tools such as Participation Reports and Metadata Search. But in the meantime, feel free to get in touch if you have queries about registering grants with us or about using the related metadata in your tools and services.

This information will grow over time as more funders join Crossref and add their grant metadata, and as more analysis becomes possible. We’re looking forward to the next steps!

Update on the outage of October 6, 2021

Geoffrey Bilder — Wed, 27 Oct 2021 00:00:00 +0000

In my blog post on October 6th, I promised an update on what caused the outage and what we are doing to avoid it happening again. This is that update.

Crossref hosts its services in a hybrid environment. Our original services are all hosted in a data center in Massachusetts, but we host new services with a cloud provider. We also have a few R&D systems hosted with Hetzner.

We know an organisation our size has no business running its own data center, and we have been slowly moving services out of the data center and into the cloud.

For example, over the past nine months, we have moved our authentication service and our REST APIs to the cloud.

And, we are working on moving the other existing services too. For example, we are in the midst of moving Event Data and, our next target, after Event Data, is the content registration system.

All new services are deployed to the cloud by default.

While moving services out of the data center, we have also been trying to shore up the data center to ensure it continues to function during the transition. One of the weaknesses we identified in the data center was that the same provider managed both our primary network connection and our backup connection (albeit on entirely different physical networks). We understood that we really needed a separate provider to ensure adequate redundancy, and we had already had a third network drop installed from a different provider. But, unfortunately, it had not yet been activated and connected.

Meanwhile, our original network provider for the first two connections informed us months ago that they would be doing some major work on our backup connection. However, they assured us that it would not affect the primary connection, something we confirmed with them repeatedly since we knew our replacement backup connection was not yet active.

But, the change our provider made did affect both the backup (as intended) and the primary (not intended). They were as surprised as we were, which kind of underscores why we want two separate providers as well as two separate network connections.

So both our primary and secondary networks went down while we had not yet activated our replacement secondary network.

Also, our only local infrastructure team member was in surgery at the time (He is fine. It was routine. Thanks for asking).

This meant we had to send a local developer to the data center, but the data center’s authentication process had changed since the last time said developer had visited (pre-pandemic). So, yeah, it took us a long time to even get into the data center.

By then, our infrastructure team member was out of surgery and on the phone with our network provider, who realized their mistake and reverted everything. This whole process (getting network connectivity restored, not the surgery) took almost two hours.

Unfortunately, the outage didn’t just affect services hosted in the data center. It also affected our cloud-hosted systems. This is because all of our requests were still routed to the data center first, after which those destined for the cloud were split out and redirected. This routing made sense when the bulk of our requests were for services hosted in the data center. But, within the past month, that calculus had shifted. Most of our requests now are for cloud-based services. We were scheduled to switch to routing traffic through our cloud provider first, and had this been in place, many of our services would have continued running during the data center outage.

It is very tempting to stop this explanation here and leave people with the impression that:

The root cause of the outage was the unpredicted interaction between the maintenance on our backup line and the functionality of our primary line;
Our slowness to respond was exclusively down to one of the two members of our infrastructure staff being (cough) indisposed at the time.

But the whole event uncovered several other issues as well.

Namely:

Even if one of our three lines had stayed active, the routers in the data center would not have cut over to the redundant working system because we had misconfigured them and we had not tested them;
We did not keep current documentation on the changing security processes for accessing the data center;
Our alerting system does not support the kind of escalation logic and coverage scheduling that would have allowed us to automatically detect when our primary data center administrator didn’t respond (being in surgery and all) and redirect alerts and warnings to secondary responders; and
We need to accelerate our move out of the data center.

What are we doing to address these issues?

Completing the installation of the backup connection with a second provider;
Scheduling a test of our router’s cutover processes where we will actually pull the plug on our primary connection to ensure that failover is working as intended. We will give users ample warning before conducting this test;
Revising our emergency contact procedures and updating our documentation for navigating our data center’s security process;
Replacing our alerting system with one that gives us better control over escalation rules; and
Adding a third FTE to the infrastructure team to help us accelerate our move to the cloud and to implement infrastructure management best practices.
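To make "escalation rules" and "coverage scheduling" concrete, here is a purely illustrative sketch. It is not our alerting system or any particular product's API; the names, the availability flag, and the ten-minute acknowledgement window are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Responder:
    name: str
    available: bool  # what the coverage schedule says right now


def escalation_chain(responders: list[Responder]) -> list[Responder]:
    """Keep the listed order but skip anyone the schedule marks as unavailable."""
    return [r for r in responders if r.available]


def who_to_page(responders: list[Responder], minutes_unacknowledged: int,
                ack_timeout_minutes: int = 10) -> Optional[str]:
    """Move one step down the chain each time the acknowledgement window lapses."""
    chain = escalation_chain(responders)
    if not chain:
        return None
    step = min(minutes_unacknowledged // ack_timeout_minutes, len(chain) - 1)
    return chain[step].name


team = [Responder("primary", available=False), Responder("secondary", available=True)]
```

With a schedule that knows the primary administrator is unavailable, `who_to_page(team, 0)` goes straight to the secondary responder instead of waiting for a page that nobody can answer.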

October 6th, 2021, was a bad day. But we’ve learned from it. So if we have a bad day in the future, it will at least be different.

More new faces at Crossref

Lindsay Russell — Thu, 21 Oct 2021 00:00:00 +0000

Looking at the road ahead, we’ve set some ambitious goals for ourselves and continue to see new members join from around the world, now numbering 16,000. To help achieve all that we plan in the years to come, we’ve grown our teams quite a bit over the last couple of years, and we are happy to welcome Carlos, Evans, Fabienne, Mike, Panos, and Patrick.

Our Software Development team has seen the most growth with the addition of Carlos, Mike, Panos, and Patrick; collectively, they bring specialist skills that are helping us to pay down technical debt, modernize our underlying infrastructure, and prepare for a consistent front-end experience. As a member of the Product team, Fabienne has a fresh take on our Similarity Check service, steering the upgrade to iThenticate v2. And Evans brings a scientific researcher perspective to our Member Experience team along with experience as a member who’s worked with our tools.

And now some words from each of them.

Carlos Del Ojo Elias

I am a computer scientist with a master’s degree in Bioinformatics. I previously worked as a security auditor. I’ve got experience in research and software development both in academia and industry. It’s very exciting for me to join Crossref as a Senior Software Developer on the technology team. My current project involves working on the authentication and authorization subsystems, exploring state-of-the-art technologies in order to improve our services. I have always enjoyed contributing to the open-source community, so it is a pleasure for me to work in an organisation that promotes the principles of openness and transparency of software and data.

Evans Atoni

I am a member of the Technical Support team having joined Crossref just a few weeks ago. I’m passionate about advancing open access and POSI. Helping our members sort through knotty technical queries and building relations with them to service their very diverse needs is what I’m most excited about in my role. In my spare time, I enjoy anything outdoors, family time, and traveling. I work remotely from Nairobi, Kenya.

Fabienne Michaud

I joined Crossref in April 2021 as a Product Manager for scholarly stewardship, which includes the content comparison tool Similarity Check, and I am thrilled to be a member of such a lovely, supportive and international team. I have a background in teaching and have worked in academic, research, and not-for-profit libraries in the UK for over 20 years in academic liaison, customer services, and management roles. These experiences have given me a user-centered approach and a drive to find collaborative, reliable, and pertinent technological solutions to support the research and scholarly community. Since starting at Crossref, and through my work with the Similarity Check Advisory Group, I have developed a good understanding of the current ethical issues facing the publishing sector (such as paper mills and other manipulations of the publication process) and a particular interest in how AI and automation tools can play a part in addressing these challenges.

Mike Gill

I’ve been a software developer for twenty years, having studied software engineering at university. During my career, I have worked mostly in the banking and engineering industries so this is my first time working in scholarly publishing. I confess that before joining Crossref I wasn’t aware that the community existed so I was excited to see how I could ply my trade in this new (to me!) field. The role also appealed as, having primarily been a team leader/line manager in my recent career, this was an opportunity to be hands-on again and work with modern languages such as Kotlin. In the end, though, what really sealed it for me was reading on the Crossref website that ‘we take the work seriously but not necessarily ourselves’ which basically sums me up. So I knew I’d be in good company and that has proven to be the case!

Panos Pandis

I joined Crossref as a Senior Software Developer in 2020, in the middle of the coronavirus pandemic. Moving to Crossref has been a much-needed breath of fresh air. I’m a big fan of open-source, and at Crossref, it just feels like home. Even more so after our recent commitment to the Principles of Open Scholarly Infrastructure (POSI). My main focus at the moment is Crossref’s Event Data service. I’m fascinated by the potential of Event Data and the broad audience I get to support and communicate with through the project. So if you spot me in a room, feel free to ask me anything about Clojure/Kotlin, Event Data, obscure technology, or kombucha recipes.

Patrick Vale

I’m delighted to have joined Crossref as the first Frontend Developer. My role covers the inauguration of a scalable framework in which we can build future User Interfaces, and generally making people’s lives easier as they interact with our products and services - if a human uses it, I’m interested! It’s my intention to provide a platform on which we can quickly iterate to build and adapt our interfaces to suit the rapidly changing needs of our community. It’s been a pleasure to learn about the impact Crossref has across the scholarly spectrum; and to work with a team of open, practical, and downright friendly colleagues is a privilege. Outside of work, I enjoy cycling, growing things, and most recently, avoiding two small cats while moving from anywhere to anywhere around the house.

Your contributions have been impactful and it will be fun to see all that you will surely contribute to our road ahead!

Outage of October 6, 2021

Geoffrey Bilder — Wed, 06 Oct 2021 00:00:00 +0000

On October 6 at ~14:00 UTC, our data centre outside of Boston, MA went down. This affected most of our network services, even ones not hosted in the data centre. The problem was that both our primary and backup network connections went down at the same time. We’re not sure why yet. We are consulting with our network provider. It took us 2 hours to get our systems back online.

We are going to reprocess content that was in the process of being registered at the time of the outage in order to make sure everything gets registered correctly. This may take a few days to complete.

Why did we have such a complete outage and why did it take us so long to fix it?

We still run a significant amount of our infrastructure in a data centre outside of Boston that we manage ourselves. Even though we’ve been moving many of our services to the cloud, all our traffic was still routed through the data centre - so when it went down, most of our cloud services were unavailable as well.
It took us a long time to fix this because our infrastructure team only has two people in it. Only one of them is located near the data centre and was at the doctor’s when the outage occurred. Although we were alerted to the problem immediately, we had to send one of our development team members to the data centre to diagnose and fix the problem.

We have been aware of these weaknesses in our system since I took the role of director of technology in 2019, and we have been putting most of our efforts over the past two years into fixing them.

We know that an organisation of our size has no business trying to run and maintain a physical data centre ourselves. One of the strengths of cloud-based systems is that they can be administered from anywhere and don’t require anyone to physically go to a data centre to replace failed hardware or check that network connections are, in fact, live. We’ve been trying to move to the cloud as fast as we can. All new services that we build are cloud-based. At the same time we’ve been moving systems out of the data centre - starting with those that put the biggest load on our systems. To further aid this process we have budgeted to add an FTE to the infrastructure team in 2022.

What is really painful about this event is that we had just completed the last bit of work we needed to do before changing our traffic routing so that it would hit the cloud first instead of the data centre first. This would not have avoided the outage we just experienced, but it would have made it a bit less severe.

What is even more painful is that we had recently installed a third network connection with an entirely different provider because we were worried about just this kind of situation. But this third connection wasn’t yet active.

We already have a long list of tickets that we’ve created to address problems we faced in recovering from this outage. The list will undoubtedly grow as we complete a postmortem over the next few days. I will report back when we have more detail of what happened and have a solid plan for how to avoid anything similar in the future.

We know that an outage of this severity and duration has caused a lot of people who depend on our services extra work and anxiety. For this, we apologise profusely.

But at least we didn’t need to use an angle grinder.

2021 Board Election

Lucy Ofiesh — Tue, 28 Sep 2021 00:00:00 +0000

We are pleased to share the 2021 board election slate. Crossref’s Nominating Committee received over 60 submissions from members worldwide to fill five open board seats. It was a fantastic group of applicants and showed the strength of our membership community.

There are five seats open for election (three small, two large), and the Nominating Committee presents the following slate.

The 2021 slate

Candidate organisations, in alphabetical order, for the Small category (three seats available):

California Digital Library, University of California, Lisa Schiff
Center for Open Science, Nici Pfeiffer
Melanoma Research Alliance, Kristen Mueller
Morressier, Sebastian Rose
NISC, Mike Schramm

Candidate organisations, in alphabetical order, for the Large category (two seats available):

AIP Publishing (AIP), Penelope Lewis
American Psychological Association (APA), Jasper Simons
Association for Computing Machinery (ACM), Scott Delman

Here are the candidates’ organisational and personal statements

You can be part of this important process by voting in the election

If your organisation is a voting member in good standing of Crossref as of September 20, 2021, you are eligible to vote when voting opens on September 29, 2021.

How can you vote?

On September 29, 2021, your organisation’s designated voting contact will receive an email with the Formal Notice of Meeting and Proxy Form with concise instructions on how to vote. You will also receive a user name and password with a link to our voting platform.

The election results will be announced at the LIVE21 online meeting on November 9, 2021. Save the date!

Similarity Check news: iThenticate v2.0 ready for launch

Fabienne Michaud — Mon, 20 Sep 2021 00:00:00 +0000

Last year, we announced the upcoming launch of a new version of iThenticate, the product from Turnitin that powers Crossref Similarity Check. We know some of you have been waiting a long time for this upgrade and we are very happy to share with you that we are now ready to release it.

We will be rolling out this new version in stages, so not everyone will be able to upgrade to the new version immediately. We’ll start with new Crossref Similarity Check subscribers who use iThenticate in the browser, and one member who uses iThenticate via the eJournalPress API integration.

Next month, we will reach out to existing Crossref Similarity Check subscribers who use iThenticate in the browser (rather than through a manuscript tracking system), and further eJournalPress users. From then on, we’ll be contacting those of you who use Similarity Check through your manuscript tracking system, as and when your providers are ready to work with the new version.

Crossref Similarity Check - first things first

Crossref Similarity Check is a content comparison tool, powered by iThenticate and produced by Turnitin, to check the originality of scholarly works and detect potential cases of plagiarism. Crossref members are eligible for this service, which offers them a reduced rate for document checking (plus enhanced functionality) in exchange for making their own published content available to be indexed into the iThenticate database.

The Crossref Similarity Check service continues to grow in membership (1,531 members in 2020; 1,964 members in 2021, to date) and in the number of documents checked (1,922,621 manuscripts checked between January and July 2020 and 2,419,612 over the same period this year).

Just as with the current version of iThenticate, Crossref Similarity Check subscribers will be able to compare documents against a vast database of internet sources and over 78 million full-text documents contributed by the Crossref members that use the service:

Crossref - research articles, books, and conference proceedings provided by publishers of scholarly content all over the world
Crossref posted content - preprints, eprints, working papers, reports, dissertations, and many other types of content that has not been formally published but has been registered with Crossref
Internet - a database of archived and live publicly-available web pages, including billions of pages of existing content, and with tens of thousands of new pages added each day
Publications - third-party periodical, journal, and publication content including many major professional journals, periodicals, and business publications from sources other than Crossref Similarity Check members
Your Indexed Documents - other documents you have uploaded for checking (within your Crossref Similarity Check user account only, and not added to iThenticate’s main indexes)

What’s new

We are delighted to introduce the following new features and enhancements with iThenticate v2.0:

Increased document upload capacity
Suspicious and hidden character detection
Preprint exclusion filter
Refreshed and responsive interface
Similarity reports - save and share
Annotations
Content portal
Improved API

Increased document upload capacity

This new version of iThenticate has an increased document upload capacity of up to 800 pages/200 MB and adds document upload from Google Drive. Please note that per-document fees treat a maximum of 25,000 words as one billable unit (25,001-50,000 words is two billing units, and so on). (EDIT 21/11/4: corrected from characters to words.)
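The billing rule above amounts to a ceiling division on the word count. Here is a minimal sketch in Python; the function and constant names are hypothetical, and the only figure taken from this post is the 25,000-word unit size. How words are counted in practice is up to iThenticate, not this snippet.

```python
WORDS_PER_UNIT = 25_000  # from the note above: up to 25,000 words is one billable unit


def billable_units(word_count: int) -> int:
    """Return the number of billable units for a document (hypothetical helper)."""
    if word_count < 1:
        raise ValueError("word_count must be a positive integer")
    # Integer ceiling division: 25,000 -> 1 unit, 25,001 -> 2 units, and so on.
    return (word_count + WORDS_PER_UNIT - 1) // WORDS_PER_UNIT
```

For instance, under this rule a 60,000-word monograph would count as three units.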

Suspicious or hidden character detection

A new ‘Red flag’ feature, highlighted at the top right hand side of the Similarity report and with in-line markers, signals the detection of hidden text such as text/quotation marks in white font or suspicious character replacement e.g., the substitution of a Latin e for a Cyrillic е or a Latin o for a Greek ο, which may have been deliberately added to avoid text-matching detection.

Red flag feature: Hidden characters in the iThenticate v2.0 Similarity report
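To make the idea of suspicious character replacement concrete, here is a rough Python sketch that flags non-Latin letters inside words that otherwise contain Latin letters. This is only an illustration of the general technique, not how iThenticate implements its Red flag feature; the function name and the mixed-script heuristic are assumptions.

```python
import unicodedata


def suspicious_characters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for non-Latin letters in mixed-script words."""
    findings = []
    offset = 0
    for word in text.split(" "):
        names = [unicodedata.name(ch, "") for ch in word]
        # Only words that contain at least one Latin letter are treated as candidates,
        # so text written entirely in Cyrillic or Greek is not flagged.
        has_latin = any(name.startswith("LATIN") for name in names)
        for i, (ch, name) in enumerate(zip(word, names)):
            if has_latin and ch.isalpha() and not name.startswith("LATIN"):
                findings.append((offset + i, ch, name))
        offset += len(word) + 1  # +1 for the space removed by split
    return findings


# "d\u0435tection" uses a Cyrillic "е" in place of the Latin "e".
flags = suspicious_characters("see d\u0435tection")
```

A real detector would also need to handle hidden text (for example, white-on-white characters), which depends on document formatting rather than on the characters themselves.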

Preprint exclusion filter

Increasingly, authors are making available a preprint of their article, either before or at the same time as submitting it to a journal. With Turnitin, we have therefore developed a new exclusion filter for ‘Preprint Sources’, which can be applied directly from your Similarity report.

Refreshed and responsive interface

The new iThenticate has a cleaner, more intuitive and accessible interface, with responsive design for ease of use on different screen sizes. The Similarity report is no longer a static image but a text that can be searched, copied and pasted. The display of matches has been improved and simplified with two views only: ‘Sources overview’ and ‘All sources’.

Similarity report in iThenticate v2.0

Similarity reports - save and share

You can now save Similarity reports as a PDF file and share them via email through the iThenticate interface with authors. Please note: this is still a work in progress and enhancements to this feature will be released in the coming months.

Annotations

Annotations in Similarity reports is a brand new feature available in private mode only (in shared folders) in this initial release. Annotations will display the date, time and comments and can be edited or deleted as required. These private annotations will not be included in the ‘save and share’ features mentioned above. Public, shareable, annotations will be included in a future release.

Private annotations in the new Similarity report

Content portal

The new ‘Content portal’ is a useful tool to check how much of your own published content has been successfully indexed into the iThenticate database and is now searchable. It will also help you self-diagnose and fix the content that has failed to be indexed.

Improved API for subscribers who integrate Similarity Check with their manuscript tracking system

API users will benefit from a new integration with manuscript tracking systems which will allow the display of the largest matching word count and the top 5 source matches alongside the Similarity score.

What’s next

We’re expecting a number of new features and enhancements to iThenticate version 2.0 as well as further manuscript tracking system API integrations in the coming months:

User/usage reporting functionality
Editorial Manager API integration
Further enhancements to the Similarity report user interface
Parent/child account management reporting, to assist Crossref Sponsors
Public vs. private annotations
Document resubmission flow
Customisable welcome email

We’ll keep you posted

We will post updates here as soon as new features, enhancements and API integrations are available and/or we are ready to upgrade the next group of members.

We’ll be contacting subscribers in stages to upgrade you to the new version, so keep your eyes open for an email from us. As you know, you have to supply full-text Similarity Check URLs in your Crossref metadata for over 90% of your own published content in order to be eligible for the service. We’ll be checking that anyone who wants to upgrade to v2.0 is still at 90% or above. You can check this yourself in advance on our eligibility checker - if you’ve fallen below 90%, the tool will give you instructions for adding your missing full-text Similarity Check URLs.

In the meantime, you will find the Similarity Check service documentation for the current version of iThenticate on our website. The documentation for the new version can be found on the Crossref Similarity Check site provided by Turnitin.

✏️ Do get in touch via support@crossref.org if you have any questions or suggestions or start a discussion on our Community Forum

Lesson learned, the hard way: Let’s not do that again!

Isaac Farley — Wed, 08 Sep 2021 00:00:00 +0000

TL;DR

We missed an error that led to the resource resolution URLs of some 500,000+ records being incorrectly updated. We have reverted the incorrect resolution URLs affected by this problem. And, we’re putting in place checks and changes in our processes to ensure this does not happen again.

How we got here

Our technical support team was contacted in late June by Wiley about updating resolution URLs for their content. It’s a common request of our technical support team, one meant to make the URL update process more efficient, but this was a particularly large request. Shortly thereafter, we were provided with nearly 1,200 separate files by Atypon on behalf of Wiley in order to update the resolution URLs of ~9 million records. We manually spot checked over 50 of these files, because, prior to this issue, our technical support team did not have a mechanism to automatically check for errors. That labor intensive review did not turn up any problems. That is, those 50 samples had none of the header errors that were found later.

Among the files we didn’t check, some had headers in which the owning member’s DOI prefix (fromPrefix) differed from the acquiring member’s DOI prefix (toPrefix). In a URL update request, the prefixes should always be the same.

And still other files included requests to update records with DOIs that had never even been registered. Here are two examples showing how the header affects a URL update request:

H:email=support@crossref.org;fromPrefix=10.5555;toPrefix=10.5555

10.5555/doi1 http://www.newurl.com/whatever

10.5555/doi2 http://www.newurl.com/whatever2

In the example above, these fictional DOIs are both under prefix 10.5555. Thus, the result of this request will ONLY be that the resolution URLs of DOI 10.5555/doi1 and 10.5555/doi2 are updated in the metadata.

H:email=support@crossref.org;fromPrefix=10.5555;toPrefix=10.9876

10.5555/doi1 http://www.newurl.com/whatever

10.5555/doi2 http://www.newurl.com/whatever2

In this second example, these fictional DOIs are both under prefix 10.5555, but because the toPrefix in the header differs from the fromPrefix, the result of this request will be that the resolution URLs of 10.5555/doi1 and 10.5555/doi2 are updated in the metadata AND the owning prefix of both records will be transferred from prefix 10.5555 to prefix 10.9876.
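The difference between the two examples can be caught mechanically. Here is a minimal Python sketch of such a check, assuming the header is the first non-blank line, starts with `H:`, and separates its fields with semicolons as in the examples above. The function and class names are hypothetical and this is not Crossref's actual checker. It also cannot tell whether a DOI has ever been registered, which would need a lookup against the registry.

```python
from dataclasses import dataclass, field


@dataclass
class FileReport:
    from_prefix: str
    to_prefix: str
    problems: list[str] = field(default_factory=list)


def check_url_update_file(text: str) -> FileReport:
    """Flag header and DOI problems in one URL update file (illustrative only)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    if not header.startswith("H:"):
        raise ValueError("first non-blank line must be an H: header")
    # "email=...;fromPrefix=...;toPrefix=..." -> {"email": ..., "fromPrefix": ..., ...}
    fields = dict(part.split("=", 1) for part in header[2:].split(";"))
    report = FileReport(fields["fromPrefix"], fields["toPrefix"])
    if report.from_prefix != report.to_prefix:
        # A mismatch means the request would also transfer ownership.
        report.problems.append(
            f"fromPrefix {report.from_prefix} differs from toPrefix {report.to_prefix}"
        )
    for row in rows:
        doi = row.split()[0]
        if not doi.startswith(report.from_prefix + "/"):
            report.problems.append(f"DOI {doi} is not under prefix {report.from_prefix}")
    return report


url_only = check_url_update_file(
    "H:email=support@crossref.org;fromPrefix=10.5555;toPrefix=10.5555\n"
    "10.5555/doi1 http://www.newurl.com/whatever\n"
)
with_transfer = check_url_update_file(
    "H:email=support@crossref.org;fromPrefix=10.5555;toPrefix=10.9876\n"
    "10.5555/doi1 http://www.newurl.com/whatever\n"
)
```

With these inputs, `url_only.problems` is empty, while `with_transfer.problems` holds one entry describing the prefix mismatch.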

We kicked off the URL update request on 30 June and all legitimate DOIs whose files were free of errors were updated by 7 July (yes, it takes about a week to update the resolution URLs for ~9 million records).

On 9 July, Peter Strickland of the International Union of Crystallography, one of 22 members affected by this mistake, contacted us to enquire how/why much of their content was resolving to incorrect URLs and why our search interface showed Wiley as the owner of their content. Peter was rightly concerned. We were, too. Our technical support team quickly elevated this issue, because, frankly, this is not the first time our finicky URL update process has caused unwanted metadata updates, albeit not quite at this volume.

How we investigated the problem

We rallied our internal team. Our initial investigation suggested that ~600,000 DOIs had been erroneously included and updated through the 1,200 files. To be as cautious as we could, we later extended that estimate to cover other conditions, raising it to over 1 million DOIs. In the end, we determined that the incorrect files attempted updates of 1,228,041 DOIs. Due to the errors in the files (i.e., erroneous headers and non-registered DOIs), we only actually updated and then reverted 520,512 DOIs. The other 700,000+ DOIs were never updated (because of errors in the original files provided to us) or simply had never been registered with us.

Before this mistake, Crossref had never reverted a member’s metadata update. To be clear, and as I said above, we have had other URL update mistakes like this one over the years; they were just smaller in scale. We knew there were holes in our process that needed to be plugged. And we knew we needed a better solution for members to manage these updates themselves without our manual intervention. So, while there were mistakes made in the files supplied to us, this was our error and we’re fixing it; more on that below.

For this situation, we quickly realized that reverting the metadata update was the best option for us, although we did not have an existing process in place to execute that reversion. That’s because we only keep the current version of each metadata record. We couldn’t back out of the change; we couldn’t simply restore these records to the metadata registered with us as of late June, because we no longer had an easily accessible, central record of those previous resolution URLs. What we did have was a record of all the previous submissions made against each DOI, so our technical team focused their efforts there.

How we fixed all those records

We had two errors to correct: the ownership transfers (those records that had inadvertent and mismatched from/to prefixes) and the incorrect resolution URLs. We reverted all of the ownership transfers on 9 July and then double and triple checked that ownership during the week of 12 July to ensure we didn’t miss anything.

The resolution reversion was more complicated. We invested in creating a patch to identify the records that had been updated by our team, and then extract the last legitimate resolution URL registered with us by the owning member in order to revert the metadata for each record. In order to provide confidence that this mistake was contained, we also built a check into the patch to ensure that those DOIs that did have their ownership temporarily transferred were not updated during the few days that ownership was incorrect. That check helped us determine that none of the 520,512 DOIs were incorrectly updated beyond this mistaken URL update request.
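The logic described above (find the affected records, then recover the last legitimate resolution URL from the submission history) can be sketched roughly as follows. This Python example is purely illustrative and is not the actual patch: the `Submission` record, the `batch_id` field, and both function names are hypothetical stand-ins for whatever the real submission log contains.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Submission:
    doi: str
    url: str
    submitted_at: datetime
    batch_id: str  # hypothetical marker identifying which request made the update


def last_legitimate_url(history: list[Submission], bad_batches: set[str]) -> Optional[str]:
    """Return the most recent URL that did not come from an erroneous batch."""
    legit = [s for s in history if s.batch_id not in bad_batches]
    if not legit:
        return None
    return max(legit, key=lambda s: s.submitted_at).url


def plan_reversions(
    histories: dict[str, list[Submission]], bad_batches: set[str]
) -> dict[str, str]:
    """Map each affected DOI to the URL it should be reverted to."""
    plan = {}
    for doi, history in histories.items():
        if not history:
            continue
        latest = max(history, key=lambda s: s.submitted_at)
        if latest.batch_id not in bad_batches:
            continue  # current URL did not come from the erroneous request
        url = last_legitimate_url(history, bad_batches)
        if url is not None:
            plan[doi] = url
    return plan
```

Building the plan separately from applying it makes it possible to review which DOIs would change, and to what, before any record is touched.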

The technical team built and tested this patch. The tests turned up gaps in the patch, so we refined it during the week of 2021 July 12. We kicked off the reversion of these records on Monday, 19 July at 20:05 UTC and the patch completed all reversions at 20:14 UTC, Thursday, 22 July.

In the end, we successfully reverted all of the resolution URLs for those 520,512 DOIs we identified; provided daily updates and apologies to the 22 affected members; together we worked some longer hours; and persevered.

Ed updates everyone internally on the situation and thanks all the people who worked together to resolve the issue

Next up

We don’t want this to ever happen again. Like, never. We clearly need to make changes to our internal processes to prevent this in the future.

Here’s what’s ahead:

We are building a checker that we can run URL update files through to automate our checks. This means we will be able to check every single file in a large batch, rather than relying on manual and labor intensive spot-checking;
As said above, one compounding issue in this mistake was the mismatched from/to prefixes in the file headers. Our technical support team uses the same file headers to transfer ownership/stewardship of a record or set of records between members AND to update resolution URLs. These two tasks are almost never legitimately completed in the same file. That is, there is usually a lag between ownership transfers and resolution URL updates (most members will request an ownership transfer and then a month or two later update their URLs). Because of this, simply decoupling these two tasks (feel free to follow our work at this link) would help eliminate a glaring risk, so we’re working on that too;
Lastly, we’re researching ways we can streamline resource resolution URL updates. You can also monitor our progress on this one. No promises or specifics yet, but we’re eager to reduce toil on our technical support team, avoid problems like this one, and provide members safe and straightforward ways to better update your metadata.

Thanks for the support of the whole Crossref team and our community - and for reading this far! Never a dull moment…

Crossref Conversations: audio blog about helping open science

Rosa Morais Clark — Fri, 20 Aug 2021 00:00:00 +0000

Crossref Conversations is an audio blog we’re trying out that will cover various topics important to our community. This conversation is between colleagues Anna Tolwinska and Rosa Morais Clark, discussing how we can make research happen faster, with fewer hurdles, and how Crossref can help. Our members have been asking us how Crossref can support open science, and we have a few insights to share. So we invite you to have a listen.

[UPDATE: Since this recording ROR IDs are now part of the Crossref schema.]

Helpful links

Here are links to all the sources mentioned in the recording.

Thanks for listening!

Some rip-RORing news for affiliation metadata

Ginny Hendricks — Mon, 26 Jul 2021 00:00:00 +0000

We’ve just added to our input schema the ability to include affiliation information using ROR identifiers. Members who register content using XML can now include ROR IDs, and we’ll add the capability to our manual content registration form, participation reports, and metadata retrieval APIs in the near future. And we are inviting members to a Crossref/ROR webinar on 29th September at 3pm UTC.

The background

We’ve been working on the Research Organisation Registry (ROR) as a community initiative for the last few years. Along with the California Digital Library and DataCite, our staff has been involved in setting the strategy, planning governance and sustainability, developing technical infrastructure, hiring/loaning staff, and engaging with people in person and online. In our view, it’s the best current model of a collaborative initiative between like-minded open scholarly infrastructure (OSI) organisations.

Last year, Project Manager Maria Gould described the case for publishers adopting ROR and ROR was ranked the number one priority at our last in-person annual meeting. Now it’s time that Crossref’s services themselves took up the baton to meet the growing demand.

The inclusion of ROR in the Crossref metadata will help everyone in the scholarly ecosystem make critical connections more easily. For example, research institutions need to monitor and measure their output by the articles and other resources their researchers have produced. Journals need to know with which institutions authors are affiliated to determine eligibility for institutionally sponsored publishing agreements. Funders need to be able to discover and track the research and researchers they have supported. Academic librarians need to easily find all of the publications associated with their campus.

Earlier this month, GRID and ROR announced that after working together to seed the community-run Research Organisation Registry, GRID would be retiring from public service and handing the proverbial torch over to ROR as the scholarly community’s reliable universal open identifier for affiliations. That means that our members who have been using GRID now need to consider their move to ROR and think about how they can add ROR IDs into the metadata that they manage and share through Crossref.

The plan

We’ve been able to include ROR IDs as affiliation information in our grant metadata schema for two years, since July 2019. And the Australian Research Data Commons (ARDC) was the first member to add ROR IDs to the Crossref system in 2020. In early July, we completed the work to accept ROR IDs for affiliation assertions for all other types of records with an affiliation or institution element, such as journal articles, book chapters, preprints, datasets, dissertations, and many more.

Next, we will commence the plans to support ROR in our other tools and services, such as Participation Reports. We’ll work on alignment with the Open Funder Registry and share our plans to collect the information via the new user interface we’re developing for registering and managing metadata. Open Journal Systems (OJS) already has a ROR Plugin, developed by the German National Library of Science and Technology (TIB). This supports the collection of ROR IDs and future releases of this plugin and the OJS DOI plugin will allow including ROR IDs in the metadata sent to Crossref, to support thousands of our members to share ROR IDs via their Crossref metadata. We also aim to add ROR to our metadata retrieval options, including the REST API, which recently saw the start of an unblocking with our move to a more robust technical foundation.

The call for participation

Many Crossref publishers, funders, and service providers are already planning to integrate ROR with their systems, map their affiliation data to ROR, and include ROR in Crossref metadata. In addition to publishers and funders, libraries, repositories, and other stakeholders are developing support for ROR. For example, the Plan S Journal Checker tool uses ROR IDs to let people check whether a particular journal is compliant with an author’s funder and institutional open access policies. In addition, the ROR website shows a growing list of active and in-progress ROR integrations.

Crossref members registering research grants via Altum’s ProposalCentral system can already add ROR IDs. Now those registering articles, books, preprints, datasets, dissertations, and other research objects, can start including much clearer and all-important affiliation metadata as part of their content registration going forward. As with all newly-introduced metadata elements, we recommend adding ROR IDs from now and ongoing, but planning a distinct project to backfill older records. We know that more than 80% of records have been updated and enriched at least once with additional and cleaner metadata, so as members do this routinely, they can include ROR IDs alongside updating URLs, license or funding information, and other metadata.

For information on how ROR will be supported in the Crossref metadata, take a look at our latest schema release (version 5.3.0) or at this journal article example XML.
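As a rough illustration of what an affiliation with a ROR ID might look like, the Python sketch below builds a small XML fragment. The element names (`affiliations`, `institution`, `institution_name`, `institution_id` with `type="ror"`) reflect our reading of the 5.3.0 schema and should be verified against the schema release and the example XML linked above. The helper name is hypothetical, the ROR ID is a placeholder rather than a real identifier, and the fragment omits the XML namespace that a full deposit file would carry.

```python
import xml.etree.ElementTree as ET


def affiliation_element(name: str, ror_id: str) -> ET.Element:
    """Build an <affiliations> fragment for one institution (element names are assumptions)."""
    affiliations = ET.Element("affiliations")
    institution = ET.SubElement(affiliations, "institution")
    ET.SubElement(institution, "institution_name").text = name
    # The ROR ID is carried as an institution_id with type="ror".
    ET.SubElement(institution, "institution_id", type="ror").text = ror_id
    return affiliations


# Placeholder values only; substitute the institution's real name and ROR ID.
fragment = affiliation_element("Example University", "https://ror.org/0example00")
xml_text = ET.tostring(fragment, encoding="unicode")
```

In a real deposit this fragment would sit inside the contributor element it describes.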

Join the discussion in our forum below and register for the Crossref/ROR webinar on September 29th at 3pm UTC to learn all you need to know about incorporating ROR into your Crossref metadata.

RFP: Help evaluate the reach and effects of metadata

Jennifer Kemp — Wed, 21 Jul 2021 00:00:00 +0000

UPDATE, 14 October 2021:

We received several excellent proposals in response to this RFP and we’d like to thank everyone involved for their time and enthusiasm.

We are excited to announce the two projects that have been selected, to run through early 2023. Stay tuned!

With or Without: Measuring Impacts of Books Metadata
This project will test the premise that academic books metadata improves discoverability and usage by assessing the impact of book chapter records with DOIs (distinct from metadata associated with the entire book) with associated chapter and book attributes. The study aims to prove or disprove its hypothesis and rank metadata attributes by their association with successful content discovery and access. The findings will be considered alongside similar metadata research in order to develop a metadata efficacy framework, which can be used to determine the return on metadata investments by publishers and service providers.

Lettie Y. Conrad and Michelle Urberg, Independent consultants

Metadata For Everyone
This project will explore the quality, consistency and completeness of metadata from various individual journals and communities. The project will pay special attention to elements that are most likely to vary across cultures, such as names and those that are potentially multi-lingual, with the understanding that metadata issues do not affect all communities in the same way.

Juan Pablo Alperin, Associate Director of Research, Public Knowledge Project & Co-Director, Scholarly Communications Lab
Mike Nason, Scholarly Communications & Publishing Librarian, University of New Brunswick Libraries
Marco Tullney, Head of Publishing Services & Coordination Open Access at TIB – Leibniz Information Centre for Science and Technology

We’re excited (and a little nervous) to launch a new research project designed to assess the effects of metadata on research communications. We’re expecting this effort to be a significant contribution to the existing research on the topic and we’re really looking forward to getting started. We’re also a little nervous because of course we don’t know what the conclusions will be (after all, if we did, we wouldn’t be starting this project).

Assume nothing

It seems logical and very widely accepted that more and better metadata leads to good things. Does it? If so, how, and how do we know? What does the ‘before and after’ look like when metadata is corrected or enhanced? There are so many questions, so many stakeholders and enough variation around record types (books come to mind) and disciplines (hello citation styles) that the topic warrants all the attention it gets and more. This project is designed to be very broad in scope, sampling from various criteria, and is expected to last about a year.

Interested in getting involved?

If you’re a researcher involved in scientometrics or bibliometrics or if you’re a consultant with experience in original research, please have a read of the RFP and get in touch with a statement of interest by 1st September or with questions in the meantime. We’re looking for an individual, research group or organisation that will work with us over the course of the project to define terms, finalize the approach, analyze the data and communicate the results, whatever they may be.

RFP responses are requested by 1st September so don’t hesitate to get in touch with questions.

If you’re interested in the project but not in responding to the RFP, you may still be able to help. We would appreciate wide circulation of this announcement to help us find qualified respondents to the RFP so please do share this with your network. And, of course, we hope you stay tuned for the outcome of the work. Check back with us on that in about a year…

Behind the scenes improvements to the REST API

Patrick Polischuk — Tue, 06 Jul 2021 00:00:00 +0000

UPDATE, 24 August 2021: All pools have been migrated to the new Elasticsearch-backed API, which already appears to be more stable and performant than the outgoing Solr API. Please report any issues via our Crossref issue repository in Gitlab.

UPDATE, 9 August 2021: The cutovers for the polite and Plus pools are delayed again. We’re still working to ensure acceptable performance and stability before serving responses from the new application and infrastructure. Each cutover is currently delayed by one more week–the polite pool is scheduled for 2021 August 17 and the Plus pool is scheduled for 2021 August 24.

UPDATE, 2 August 2021: The cutovers for the polite and Plus pools are delayed. We’ve been mirroring traffic to the new polite pool and want to ensure acceptable performance and stability before serving responses from the new application and infrastructure. Each cutover is currently delayed by one week–the polite pool is scheduled for 2021 August 10 and the Plus pool is scheduled for 2021 August 17.

UPDATE, 13 July 2021: The first stage of the cutover is complete, so requests to the public pool are now being served by the new REST API. We took a slightly different approach to performing the cutover, so the “Documentation” and “Temporary domain” sections below have been updated.

Our REST API is the primary interface for anybody to fetch the metadata of content registered with us, and we’ve been working hard on a more robust REST API service that’s about to go live.

The REST API is free to use and it gets around 300 million requests each month (we encourage users to adhere to our etiquette guidelines to keep things running smoothly). It is used for bibliometric studies, by platforms like Dimensions, by organisations like the National Library of Sweden, and to support countless other efforts.

We also offer enhanced access to our APIs and other services with Metadata Plus, and we recommend it for production services and others that benefit from guaranteed up-time, a higher rate limit, and priority support from our helpful staff.

For a while now, we’ve been working to migrate the REST API from Solr to Elasticsearch and from our datacenter to a cloud platform in order to address issues of scalability and extensibility.

We’re pleased to announce that we’ll be cutting over to the Elasticsearch-backed version of the REST API over the next few weeks, beginning July 13. This cutover will occur one pool at a time–the public pool will be migrated first, followed by the polite pool on August 3, and the Plus pool on August 10 (see ’etiquette’ link above if you’re unfamiliar with our different pools). Please note updates at the top of this post for changes to the original schedule.
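As a rough illustration of how a client ends up in one pool or another, here is a minimal sketch of building a `/works` request URL. It assumes, per the etiquette guidelines, that supplying a contact address in a `mailto` query parameter is what routes a request to the polite pool; the helper name `build_works_url` and the example address are hypothetical, so check the current documentation before relying on this.

```python
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org"  # domain named in this post


def build_works_url(query, mailto=None, rows=20):
    """Build a /works query URL.

    Assumption: adding a mailto parameter selects the polite pool;
    without it the request is served from the public pool.
    """
    params = {"query": query, "rows": rows}
    if mailto:
        params["mailto"] = mailto
    return f"{API_BASE}/works?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical contact address, for illustration only.
    print(build_works_url("metadata quality", mailto="you@example.org"))
```

Keeping the pool choice in one helper means a script can switch pools without touching the rest of its request logic.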

We’ve thoroughly tested the functionality and performance of the new REST API, and we’d like to invite you to test it out before we move production traffic to the new service. Try out your favorite API queries at https://api.production.crossref.org/.

Feature parity, but note a few differences

One of our primary objectives was to maintain feature parity between the old and new services, avoiding any breaking changes that might cause problems for existing services integrating with the REST API. We implemented a regression test suite which has given us the confidence to make such a foundational change. During the course of this project, we found it necessary and a good opportunity to make a few modifications. In each case, we analyzed usage and aimed to avoid making any breaking changes. We hope these represent improvements to the behavior and consistency of the REST API.

The group-title filter uses exact matching. This filter previously worked but was undocumented and unsupported.
The directory filter is deprecated. This was meant to be an experimental, unsupported filter, and the data has not met the standard we require.
The affiliation facet returns counts of affiliation strings rather than counts of terms within affiliation fields (thus resolving this Github issue).
Cursors may be used to page through results from the /members, /funders, and /journals routes, in addition to /works.
While we suggest that everyone use cursors for pagination, we still support the offset functionality. We have introduced a limit of 80000 for offset values for the /members, /funders, and /journals routes.
The offset limit behavior has changed slightly: the limit now applies to the sum of rows and offset rather than to offset alone.
The published field is now present in API responses.
The /licenses route returns paged results.
Sorting by submitted is no longer supported. This was never officially supported or documented.
The /quality route has been removed. This was an undocumented, experimental feature.
Funder name in /works metadata is the name provided by the publisher.
Empty relation fields correctly return an empty object.
Only ISBN and isbn-type for a record will be returned. ISBNs for associated volumes will be omitted.
The institution field is a list.
query uses different stop word defaults, though we expect querying to remain roughly the same.
API responses may feature slightly different scores, as they come from different backends.
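For the cursor change in the list above, a paging loop might look like the following minimal sketch. It assumes the usual cursor convention of starting with `cursor=*` and reading a `next-cursor` value from the `message` object of each response; the function names `fetch_json` and `iter_items` are hypothetical, and the `fetch` argument exists only so the loop can be exercised without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.crossref.org"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_items(route, rows=100, max_items=None, fetch=fetch_json):
    """Yield items from a list route such as 'journals' using cursor paging."""
    cursor = "*"  # assumed starting cursor value
    yielded = 0
    while True:
        query = urlencode({"rows": rows, "cursor": cursor})
        message = fetch(f"{API_BASE}/{route}?{query}")["message"]
        items = message.get("items", [])
        if not items:
            return  # an empty page is treated as the end of the result set
        for item in items:
            yield item
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return
        cursor = message.get("next-cursor")
        if not cursor:
            return
```

Something like `for journal in iter_items("journals", max_items=250): ...` would then walk the route page by page without using offsets at all, which sidesteps the offset limit entirely.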

Some technical notes on the cutover

Documentation

The above changes are documented in our new REST API documentation, which is now automatically generated via Swagger, resulting in more comprehensive coverage and more efficient feature development. During the cutover, the right documentation for you will depend on which pool you are using. The documentation for the new API can be found by visiting the API in a browser, or by navigating to https://api.crossref.org/help; and the docs for the old API remain here: https://github.com/CrossRef/rest-api-doc. The Github-hosted documentation will be deprecated once the cutover is complete.

This may not come as news, but bears repeating as we mentioned GitHub. We have moved our source code repositories from GitHub to GitLab, including all of our issue tracking.

Temporary domain

UPDATE: We ended up performing the public pool cutover via reverse proxies instead of redirects–please disregard the note about temporary domains below. The api.crossref.org domain will remain the domain regardless of which pool you’re using or where we are in the cutover process.

Please note that the api.production.crossref.org domain is a temporary domain we are using during this cutover period. Traffic will be redirected to the new service one pool at a time via a 307 http redirect. Once the cutover is complete, we will go back to using the api.crossref.org domain. Do not update any software, scripts, libraries, tools, etc. to use the temporary domain.

Differences in query results

Due to inherent differences in how Solr and Elasticsearch perform queries and rank results, you may see slightly different results when comparing the old and new services. If for whatever reason your workflow involves using multiple API pools (which we don’t recommend), you may see inconsistent results.

Cursor behavior

Cursors may break if your script is paging through results at the exact moment the cutover is performed, and you should retry your request once the release is complete. We will post the precise maintenance window to https://status.crossref.org/.
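One way to make a paging script tolerant of a short interruption is to wrap each request in a small retry helper. This is a minimal sketch under stated assumptions: the helper name `call_with_retry`, the attempt count, and the delays are all hypothetical, and a cursor that has been invalidated by the cutover may still need the script to start again from the first page.

```python
import time


def call_with_retry(request, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call request() and retry on network errors, waiting longer each time.

    request is any zero-argument callable that performs one API call.
    OSError covers urllib's URLError and HTTPError as well as socket errors.
    """
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(delay_seconds * attempt)  # linear backoff between attempts
```

Waiting a little longer after each failure keeps a paused script from adding load while the release is still in progress.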

Filing issues

Feature requests and bug reports should be filed into the Crossref issue repository in Gitlab during this testing phase and once the new Elasticsearch-backed API is live in production.

Coming next

While we hope the benefits of improved stability and extensibility are as exciting to you as they are to us, “feature parity” may not be the most thrilling message for our API users. In truth, one of the more exciting aspects of completing this migration is the end of the code freeze we instituted at the start of this effort. Now, we can work on new feature development and a continuous stream of bug fixes. We also improved the automatic test coverage as part of the work, meaning we can deliver features with greater confidence.

The first new feature we’ll be delivering via the REST API will be support for the “grants” record type, allowing for the retrieval of metadata for grants that have been registered with us, now numbering over 20,000 from 8 different funder members. This work is well underway and will be released once we are confident that the new REST API is stable in production. From there, we’ll continue to select the highest priority issues from our REST API backlog.

As always, should you have any questions about our REST API, check out the metadata retrieval section of our website, start a discussion on our community forum, file a Gitlab issue as mentioned above, or you can contact us via support@crossref.org.

DOAJ and Crossref sign agreement to remove barriers to scholarly publishing for all

Ginny Hendricks — Mon, 21 Jun 2021 00:00:00 +0000

22 June 2021, London, UK and Boston, MA, USA — The future of global open access publishing received a boost today with the signing of a Memorandum of Understanding between the Directory of Open Access Journals (DOAJ) and Crossref. The MOU formalizes an already strong partnership between the two organisations and furthers their shared pursuit of an open scholarly communications ecosystem that is inclusive of emerging publishing communities.

Both organisations aim to encourage the dissemination and use of scholarly research using open infrastructure, online technologies, regional and international networks, and community partners - all supporting local institutional capacity and sustainability around the world.

“DOAJ is delighted to be formalizing today’s agreement with Crossref, an organisation we are already closely aligned with. Together we stand a greater chance of encouraging an open, fair, and fully inclusive future for scholarly publishing,” said Lars Bjørnshauge, DOAJ Founder and Managing Director.

The agreement will enable content from journals indexed on DOAJ to be more easily identified through the use of Crossref metadata. The MOU also covers the exchange of a variety of services and information and greater coordination of technical and strategic requirements between DOAJ and Crossref. Included too is the development of outreach and training materials, coordination of service and feature development, as well as research studies to explore the overlaps and gaps in the journals and metadata covered by each organisation.

“As academic-led journals continue to grow in number and geographic reach, it’s important we support this community more effectively. Our partnership with DOAJ means we can share strategies, data, and resources in order to lower barriers for emerging publishers around the world,” said Ginny Hendricks, Crossref’s Director of Member & Community Outreach.

About DOAJ

DOAJ is a community-curated online directory that indexes and provides access to high-quality, open access, peer-reviewed journals. DOAJ deploys more than one hundred carefully selected volunteers from among the community of library and other academic disciplines to assist in the curation of open access journals. This independent database contains over 15,000 peer-reviewed open access journals covering all areas of science, technology, medicine, social sciences, arts and humanities. DOAJ is financially supported worldwide by libraries, publishers and other like-minded organisations. DOAJ services (including the evaluation of journals) are free for all, and all data provided by DOAJ are harvestable via OAI-PMH and the API. See doaj.org for more information.

About Crossref

Crossref makes research objects easy to find, cite, link, assess, and reuse. We’re a not-for-profit membership organisation that exists to make scholarly communications better. We rally the community; tag and share metadata; run an open infrastructure; play with technology; and make tools and services—all to help put research in context. Visit crossref.org for further information.

Please contact louise@doaj.org or feedback@crossref.org with any questions.

Event Data: Help us fill in the gaps

Martyn Rittman — Fri, 11 Jun 2021 00:00:00 +0000

UPDATE August 2, 2021: This work was awarded to Laura Paglione of the Spherical Cow Group.

Since we launched our Event Data service in 2017, we have collected around 740 million events from 12 different sources. Each event is an online mention of the research associated with a DOI, either via the DOI directly or using the associated URL. However, we know that there is much more out there. Because of this, we would like to explore where we could expand.

We invite proposals to conduct a gap analysis for Event Data sources, looking at what we currently collect and seeing what more could be added. For the most relevant new sources, we are seeking an estimate of the effort needed to include them and an assessment of whether inclusion is possible at all: we know that some sources are paywalled or have restrictive licensing that is not compatible with Event Data.

The aim of the project is to identify a list of potential new sources. With community input, we will look to add a number of these to Event Data in the future based on needs and priorities.

For full details of the requirements and how to make a proposal, see here. The deadline for proposals is 11 July 2021 and we anticipate that the work will be completed by the end of October 2021.

An Advisory Group for Preprints

Martyn Rittman — Wed, 09 Jun 2021 00:00:00 +0000

We are delighted to announce the formation of a new Advisory Group to support us in improving preprint metadata. Preprints have grown in popularity over the last few years, with increasing focus brought by the need to rapidly disseminate knowledge in the midst of a global pandemic. We have supported metadata deposits for preprints under the record type ‘posted content’ since 2016, and members currently register a total of around 17,000 new preprint metadata records each month.

As preprints develop and different practices arise, we are keen to re-examine the metadata schema: to do this properly we need community input. We want to ensure that the schema is fit for purpose and supports the diversity of ways in which preprints are posted, linked with other objects, and used. Metadata schema need regular review, and this is just one example of a number of areas we are looking to update. Several topics we see as a high priority for preprints are better notification for when a preprint has been withdrawn or removed, accurate recording of versioning, and better indication of preprint server names.

We have invited a number of organisations we know to be active in this area, and are looking forward to some very positive discussions. Participants span five continents and include members who post preprints, indexing services, and others with significant experience in the area of preprints. The first meeting took place earlier this week and brought up a diverse range of themes that will be tackled in future meetings.

Time to put the "R" back in "R&D"

Geoffrey Bilder — Mon, 07 Jun 2021 00:00:00 +0000

It is time to put the ‘R’ back into R&D.

The Crossref R&D team was originally created to focus on the kinds of research projects that have allowed Crossref to make transformational technology changes, launch innovative new services, and engage with entirely new constituencies. Some illustrious projects that had their origins in the R&D group include:

DOI Content Negotiation
Similarity Check (originally CrossCheck)
ORCID (originally Author DOIs)
Crossmark
The Open Funder Registry
The Crossref REST API
Linked Clinical Trials
Event Data
Grant registration
ROR

And for each project that has graduated, there have been several that have not. Some projects were simply designed to gather data. Others just didn’t generate enough interest. You are not truly experimenting if you don’t fail occasionally too.

Recently we’ve been doing very little experimenting of any kind. Instead, the R&D team has mostly been seconded to the software development team to help them through a period of organisational and process change. We would not have made it through the past two years without their help.

But now we’re ready to focus on more ‘R’ and less ‘D’. And to that end, we are increasing the size of the team as well. Rachael Lammey will be joining the team as Head of Strategic Initiatives. She will work alongside our Principal R&D Developers, Esha Datta and Dominika Tkaczyk. Together they will be able to engage with new communities and immediately start experimenting with ways in which Crossref might be able to address their needs and use-cases.

We hope to soon add to our list of distinguished R&D project alumni.

Rationale & details

The Crossref R&D group (AKA “Labs”) has been the incubator of many services that are now in production and which form a fundamental part of Crossref’s identity and value. Similarity Check, ORCID, Crossmark, Open Funder Registry, The REST API, Linked Clinical Trials, and Event Data all started as R&D projects. More recently the enhancement of our reference matching infrastructure and the development and launch of ROR were also R&D projects.

And prior to the formation of the outreach group in 2015, the R&D group also led a critical function engaging with communities that, at the time, Crossref only had tangential connections with: PKP; DOAJ; funders; and the data and altmetrics communities.

But since the R&D group merged with the technology team back in 2019, we have done very little “R” and very little community engagement of our own. Instead, the R&D team has supported the development team through a period of major cross-cutting projects and organisational change. Dominika has led the REST API rewrite and Esha, when she is not acting as technical lead on ROR, has also worked on the API rewrite and has kept Crossref metadata search on its feet. We would not have been able to make it through the past few years without their help.

Throughout this period, Rachael Lammey has continued the vital work of identifying, engaging with, and advocating for members of our community who we previously didn’t even know were members of our community.

The strength of the R&D group was that it combined outreach, product, and development functions. It was not only able to engage with new constituencies, but to quickly experiment with ways in which Crossref might be able to serve them. Previously, members of the R&D team would return from a conference or workshop that no Crossref member had ever attended before with a set of new contacts and ideas for new services and tools. They’d form interest groups and develop prototypes. Sometimes the interest groups would lead nowhere and sometimes the prototypes would be discarded. But critically, some of them would turn into the major services and organisations that now form a foundational part of open scholarly infrastructure.

And this is why it makes so much sense for Rachael to join the R&D team. The group is most effective when it is able to engage with new communities and immediately start experimenting with ways in which Crossref might be able to address their needs and use-cases. Rachael’s extensive experience in both product management and outreach—combined with Esha and Dominika’s experience leading development projects—is exactly what we need to reinvigorate the group and put the R back into R&D.

To kick off, we are going to be working on some small-ish, discrete projects. These include:

Better matching and linking of preprints to published articles;
Extending our journal title classification to cover all journal and conference proceedings titles; and
Tools to allow us to community-source structured metadata correction information and feed it back to our members.

We will consult with and update the community on the kinds of projects we are working on through regular tech updates and a revitalised Labs area of our website.

Oh, and we will certainly be designing some new Labs creatures.
–G

The road ahead: our strategy through 2025

Ginny Hendricks — Thu, 03 Jun 2021 00:00:00 +0000

This announcement has been in the works for some time, but everything seems to take longer when there is a pandemic going on, including finding time and headspace to plan out our strategy for the next few years.

Over the last year or so we have had our heads down addressing how to scale our 20-yr-old system and operation – and adapting to new ways of working. But we’ve also spent time talking to people, forging alliances, looking ahead, and making plans. So we’re happy to now let everyone know exactly what we’ve been up to lately, what we are heading towards in 2025, and what projects and programs are prioritised on our near-term agenda.

Tl;dr

Introducing the new Crossref strategy through 2025, extending the one we published in 2018
There are now two additional strategic goals, to make six: bolstering our team; living up to POSI
Good progress has been made in reducing operational and technical debt - a lot of learning too
We’re unblocking stuff to get more done, including expanding R&D (more on that next week)
We have a new public roadmap 🎉
Come to next week’s mid-year update webinar to hear what’s happening and up next.

The emergence of a strategic agenda

2018 seems like a decade ago, doesn’t it? Back then we set out a 2018-2021 strategic direction—now archived—that described four goals: adapt to expanding constituencies; simplify and enrich services; selectively collaborate and partner with others; and improve our metadata quality and comprehensiveness. These themes were formed from the output of a planning exercise with our board in mid-2017 which tackled scenarios that remain true today, including: the increasing diversity in scholarly publishing (library-publishing, academic-led journals, shifting geographic dominance, etc.); the growth in preprints and other content formats; the sustainability of scholarly publishing (who is funding it and whether that is an expanding or shrinking pool); and the increase in policy and regulation in this space.

That meeting was the catalyst for embracing openness and a broader set of constituents. It was also decisive about Crossref’s role in this evolving community to focus on our core competencies, defined as:

A reputation as a trusted, neutral one-stop source of metadata and services

Managing scholarly infrastructure with technical knowledge and innovation

Convening and facilitating scholarly community collaboration.

So you can see how we got to focusing on metadata, services, infrastructure, and broad community collaboration.

Ahh, 2019, such an innocent time

When we wrote our post at the end of 2019 A turning point is a time for reflection we highlighted—with data—how different the Crossref community is nowadays. The post also linked to the results of our ‘value’ research project and a fact file which had even more hard data and posed the question Which Crossref initiatives should be top or bottom priorities?. To answer that, the LIVE19 annual meeting group voted (using betting chips) on priority initiatives, with the following results:

Support and implement ROR
Metadata best practices and principles
Support for multiple languages
Address technical and operational debt
Schema updates such as JATS and CRediT
Engagement with funders

We all know what happened next: the collective health and social trauma of the COVID-19 pandemic. All of us struggled. You all did too. Homeschooling, homeworking, homestaying. Caring for—and even saying goodbye to—sick friends and family. Also beloved colleagues. Alongside these unfamiliar new stresses, members were joining in growing numbers, funders kept joining to register grants, conferences went online and we loved them (before then hating them), the number of records we hosted kept going up, and publishing (especially preprints) skyrocketed.

The plan hasn’t actually changed much. Those charts in the 2019 fact file still make for remarkable reading as those same trends continue. We simply haven’t had time to update people on where we are with plans. So it’s high time we give an update on these priorities as well as contextualise them in longer-term goals.

But first, some framing

The chart below shows the approach we took to organise our thinking. A lot of it isn’t new; we have had the current mission statement, key messages (rally, tag, run, play, make), and truths since the rebranding work in 2015/2016. More recently, we have added POSI to our values, describing the principles and rules by which we operate as a committed open scholarly infrastructure organisation.

We already have a lot of 'words'. So why do we also need a vision statement and where do the goals fit in? In order to prioritize the things we will work on first, we need to be able to track everything to a higher vision, ensuring that everything we do is working toward an agreed destination. When we have organisation-wide goals, it means that everyone is clear on the direction, is able to prioritize individual and team work, and can see how their contribution fits in. This, in turn, instills confidence and motivation amongst staff as well as members and users.

Our working vision statement (feedback needed!) is:

We envision a rich and reusable open network of relationships connecting research organisations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society.

A vision is, of course, shared. It isn’t Crossref-specific but describes the world in which we all want to work together in future.

Now for those contextual six goals

Full details are on the new strategy page but here’s a summary below.

This goal is all about people, support, culture, and resilience. Not just because we’re coming through a pandemic, but also because we’re growing and we need to be able to scale and manage growth more purposefully, with appropriate policies, fees, and resources.

We published a POSI self-assessment earlier this year and like-minded initiatives are following suit. This is a stated goal because we want to be held publicly accountable to the Principles of Open Scholarly Infrastructure standards of governance, insurance, and sustainability.

This goal centres on growth, strengthening relationships, community facilitation, and content. Working with a growing number of Sponsors helps us lower barriers to participation around the world, including in languages other than English. Expanding the support we offer for research funders and institutions is a priority.

This goal involves researching and communicating the value of richer, connected, and reusable, open metadata, and incentivising people to meet best practices, while also making it possible (and easier) to do so.

We’ve always collaborated but we want to work even more closely with like-minded organisations to solve problems together. Perhaps in future we could also partner with others to find operating efficiencies for our overlapping stakeholders.

This goal is all about focus. And about delivering easy-to-use tools that are critically important for our community. A lot of invisible work has been happening behind the scenes; we’ve been strengthening (and will continue to strengthen) our code-base (while opening up all code) in order to unblock some of the initiatives we know people have been waiting for.

Read more about what projects are included in the above goals in our full 2025 strategic agenda.

You’re invited to a mid-year update webinar

Rather than saving everything for our annual—usually November—meeting, we’ll also do a mid-year update and plan to do so in May or June every year from now on, in addition to the November updates which include the board election and governance and budget information.

This year, we’re covering some of the main product development work that we have completed, that is underway, and that is planned for the next quarter. We’ll run it live twice: once for those in or near the Americas timezones (June 8th, 3pm UTC) and once for those in or near Asia Pacific timezones (June 9th, 6am UTC). We have a lot to cover in 90 minutes, including unveiling our public roadmap (http://bit.ly/crossref-roadmap), but we’re going to try really hard to have a few minutes to discuss questions too.

In the meantime, or indeed anytime, join the discussion on our community forum.

We want to be held accountable to these goals so we’re reliant on you, as a community, to let us know what you think of our 2025 strategic agenda. As always, we’re grateful for your support and advice.

Our annual open call for board nominations

Lucy Ofiesh — Thu, 27 May 2021 00:00:00 +0000

Crossref’s Nominating Committee is inviting expressions of interest to join the Board of Directors of Crossref for the term starting in 2022. The committee will gather responses from those interested and create the slate of candidates that our membership will vote on in an election in September. Expressions of interest will be due Friday, June 25th, 2021.

Board roles and responsibilities

Setting the strategic direction for the organisation;
Providing financial oversight; and
Approving new policies and services.

The board is representative of our membership base and guides the staff leadership team on trends affecting scholarly communications. The board sets strategic directions for the organisation while also providing oversight into policy changes and implementation. Board members have a fiduciary responsibility to ensure sound operations. Board members do this by attending board meetings, as well as joining more specific board committees.

Who can apply to join the board?

Any active member of Crossref can apply to join the board. Crossref membership is open to organisations that produce content, such as academic presses, commercial publishers, standards organisations, and research funders. In fact, this year the board has specifically included in the committee’s remit to “propose at least one name from a funder member for the current round of elections.”

There is a link at the bottom of this post to submit your expression of interest.

What is expected of board members?

Board members are expected to be comfortable assuming the responsibilities listed above and to prepare for and participate in board meeting discussions.

About the election

This year we will elect two of the large member seats (membership tiers $3,900 and above) and three of the small member seats (membership tiers $1,650 and below). You don’t need to specify which seat you are applying for. We will provide that information to the nominating committee.

The election takes place online and voting will open in September. Election results will be shared at the November board meeting and new members will commence their term in 2022.

About the nominating committee

The nominating committee will review the expressions of interest and select a slate of candidates for election. The slate put forward will exceed the total number of open seats. The committee considers the statements of interest, organisational size, geography, gender, and experience.

2021 Nominating Committee:

Liz Allen, F1000/Taylor & Francis, London, UK, committee chair
Melissa Harrison, eLife, Cambridge, UK
Andrew Joseph, Wits University Press, Johannesburg, South Africa
Abel Packer, SciELO, São Paulo, Brazil
Lisa Scott, New England Journal of Medicine, Boston, USA

How do you apply to join the board?

Please click here to submit your expression of interest or contact me.

Service Provider perspectives: A few minutes with our publisher hosting platforms

Jennifer Kemp — Mon, 24 May 2021 00:00:00 +0000

Service Providers work on behalf of our members by creating, registering, querying and/or displaying metadata. We rely on this group to support our schema as it evolves, to roll out new and updated services to members and to work closely with us on a variety of matters of mutual interest. Many of our Service Providers have been with us since the early days of Crossref. Others have joined as scholarly communications has grown and services have evolved. Though fewer than 20 in number, their impact far outweighs the size of the group.

They, like us, work with a great variety of members and have a broad view into publishing trends. In this post, we focus on views from some of the publishing hosting platform Service Providers, who’ve taken the time to share their thoughts on a few questions:

What is the biggest change you’ve experienced working with publisher metadata over the last few years and how have you adapted to it?

It has become more and more important that not only the DOIs are registered with the minimum of necessary metadata to get the DOIs registered, but that a most complete set of metadata is being sent along – including author identifiers, funding information, abstracts, licenses, to support other Crossref services and improve discoverability.

– de Gruyter

Our clients are increasingly aware of the key role metadata plays in the effective dissemination of research. With an increasing number of published articles and a clear domination of “search engines” and aggregation of content, metadata is the primary means of making sure that publications reach the right audience. Publishers’ value-add includes not just copy editing, formatting, and packaging, but also now creating journal articles for the digital age that are discoverable and well linked to the research corpus. Furthermore, we sense a clear move toward standardization, which goes beyond the structure to introduce standardized semantics: adopting common taxonomies for classifying content in different dimensions. Our response is to introduce effective, automated and consistent services that capture, and surface metadata throughout the value chain from authoring to publication and search.

– Atypon

Highwire’s publishers are always looking to use the latest DTD (Document Type Definition) for the content to stay up to current standards. Currently this would be JATS 1.2. They are choosing to remain current so that they can stay on top of all or new metadata that can enrich their deposits. We have handled this well and offer support for the latest version of DTD when they are released, but some publishers are not always familiar with what can/should be deposited with their content and this can be a learning process for them.

– MPS Limited

How do you explain to clients (and others!) why correct, quality metadata is important?

In the digital age, metadata is the key to enabling effective content consumption. Publications that cannot be effectively discovered are of little value. We can only increase the impact of research with “discoverable” and “machine readable” publications. So ensuring correct and quality metadata is the key to optimizing not only the processing (finding the right journal, editor, reviewers) but also to positioning each publication properly. As the volume of published scientific research increases, article metadata is the way forward — it brings “order” and enables our community to manage this volume.

– Atypon

Highwire always positions itself as “good content in” means “good content out”. This is true for our own content stores. Strong and valid metadata will result in valid and strong deposits. We explain this to all new clients on-boarded with Highwire and the use of current standards and for current client projects where content should/can be enriched through re-load.

– MPS Limited

Getting our journals to care about metadata is a two step process: First, make sure they understand how metadata will help their journal succeed (i.e. why it matters to them). Second, make it easy for them to produce metadata while minimizing the cost, time, or complexity of their workflow. The first step – making a case for why metadata matters – is often easier than you’d think. At the very least, most journal editors understand that metadata, e.g., JATS or DOI registration, is an important signifier of professionalism / prestige. In other words, they see that top journals publish metadata and want the same for their journal. From a more technical standpoint, metadata is important because that’s the format computers understand and, like it or not, the publishing ecosystem relies on computers to deliver all sorts of critical services – such as indexing, archiving, and discoverability. So, if you’re not publishing metadata, you’re likely missing the benefit of these services. The second step – making it easy to produce metadata – is more difficult. Journal editors generally understand metadata matters but often lack the technical skills or resources necessary to create metadata. This is where a platform, such as Scholastica, can be very helpful. Because platforms work with many journals, they can invest in tools to automate the creation of metadata, reducing costs for all their clients. For example, most platforms offer integrations to support automatic DOI registration. At Scholastica, we’re pushing this idea even further with automatic integration to more complicated services such as PubMed Central. By reducing cost and complexity, we can help new or small-budget journals have the same quality metadata normally reserved for large, established journals.

– Scholastica

We are sending other publishers’ metadata to academic libraries and distribution channels. Erroneous metadata will have a direct impact on how discoverable a title may be. The more uniform and correct the metadata, the better it will be indexed in other places.

– de Gruyter

What is the one industry development or trend you’re most excited about for the near future and why?

Open Science and the ability to deliver research with the tools for reproducing it is the most exciting and game changing trend. Technology has enabled the output of science to transition from two-dimensional printed text delivery into globally accessible and responsive web-based delivery. We are now taking the next steps to further leverage web technology to enhance research output with rich assets ranging from audio and video, datasets, executable code, high-resolution imagery, interactive applications and more. As more assets accompany research publications, viewing these assets as modular, individually citable, and reusable becomes a requirement. We are reviewing the whole research output flow from authoring to publishing, and most importantly to its dissemination through the myriad of discovery tools now available.

– Atypon

The move of everything to the cloud – this is changing and improving our infrastructure, our possibility to scale and to stay on top of technological development.

– de Gruyter

Thanks very much to the interviewees for their time and thoughts. We look forward to working with our entire Service Provider group on questions like these and many more. If you’d like more details, you can read about our Service Provider program or contact me for more information.

Next steps for Content Registration

Sara Bowman — Mon, 17 May 2021 00:00:00 +0000

UPDATE, December 2025: The legacy Metadata Manager interface will be switched off on 1 January 2026. We have been in touch with affected members throughout the year with guidance and resources on making the switch to our newest helper tool or alternative content registration methods.

---

Hi, I’m Sara, one of the Product Managers here at Crossref. I joined the team in April 2020, primarily tasked with looking after Content Registration mechanisms. Prior to Crossref, I worked on open source software to support scientific research. I’ve learned a lot in the last year about how our community works with us, and I’m looking forward to working more closely with you in the coming year to improve Content Registration tools.

Just over a year ago, we updated you on the status of Metadata Manager. TL;DR: We learned that our approach with the tool wasn’t flexible enough to easily and quickly add other record types or update the input schema, and paused new development. We’re back with another update on Metadata Manager and our strategy for Content Registration user interfaces (UIs) going forward.

Our helper tools for Content Registration

The bulk of content registered with us is registered programmatically; that is, by our members’ (or their service providers’) machines talking to our machines using our APIs. But plenty of our members don’t have the technical expertise to work with us this way. For those members, we provide various helper tools to assist with manual content registration.

We offer a variety of interfaces for registering many different types of content, including Web Deposit form for most record types, Metadata Manager for journal content, and Simple Text Query to register references. Each of these has its own use cases and limitations, leading to a confusing and inconsistent experience for members who are manually depositing metadata. From our perspective, maintaining this many interfaces in different codebases is inefficient, in part because an update to the schema likely leads to separate updates in each of them. A unified user interface to register content would both improve and simplify the user experience for you, our community, and make updates quicker and more efficient. The original goal of Metadata Manager was to be this unified interface. But we’ve learned that the approach we took was flawed: there have been problems reported by users, and the tool itself isn’t flexible enough to easily and quickly add new record types or support new fields when our input schema changes.

A new approach to helper tools

So we’ve decided to build something new and retire the old. We’ll be focusing on creating a brand new Content Registration user interface that will eventually replace Metadata Manager, the Web Deposit form, and Simple Text Query. And what we’ve learned from our experiences with Metadata Manager and Web Deposit has greatly influenced our strategy going forward. The new tool will:

Have a Community focus

Design for small - Our membership demographic is evolving. A large (and growing) number of our members are very small, often with a single publication and no technical resources. Creating XML can be a barrier to participating in Crossref, and our helper tools are designed to lower that barrier.
Accessibility and localization support - All of our UIs should support major international accessibility guidelines and translation into local languages, to meet the needs of our global membership.
Open source code - Build in the open, so that others can contribute. This could mean an entire UI that we haven’t prioritized, or adding a new translation file, or tweaking some CSS.

Follow user-centered design processes

Unified user interface - Improve user experience and simplify tools and services by providing members with one place to go to register content via a UI.
Rapid iteration - Focus on a technical solution that allows for rapid development of UIs to support new record types and updates to our schema.
Building the right features for the right users - The needs of our large members and smaller members are different. Experience has shown us that the core audience for a helper tool is smaller members; we’ll tailor the features to solve the challenges of our smaller members.

Allow us to build content for the future

Tactical approach to record types - Quickly build UIs in a strategic order. We can’t build support for every record type at once, so we want to identify and build in the areas of highest impact/lowest effort first.
Deliberate approach to supported fields - Not all members will supply metadata for all fields in our schema. Building a UI to support all fields for a specific record type before moving on to another slows progress on that next record type. We’ll identify the most-used and most-useful fields to support first, and add more in a future iteration if needed.

Deprecating Metadata Manager

In order to free up the resources to develop the new Content Registration UIs, we need to stop doing other things - that means not adding to, supporting, or bug-fixing other Content Registration tools. We’re setting an aggressive goal of sunsetting Metadata Manager by the end of 2021, with a commitment to a smooth transition to our new tool. This means that new members should not start using Metadata Manager. New members who need a helper tool have a few choices:

those who use the OJS platform from PKP to host their journals (OJS V3 and above) should use the third party Crossref OJS plugin to register their content.
other new members should use the Web Deposit form
current members who are using Metadata Manager may continue to do so, but are advised that we won’t be doing bug fixes or further development on the tool, and that support will be scaled back. If possible, you should transition over to using the Web Deposit form.

This wasn’t a decision made lightly, but one made after considering multiple options and all the data available to us about member usage and internal resources.

To highlight some of the data that led to this decision: the Support team tracks the types of support tickets they handle. In 2020, the 3rd most common ticket type was Metadata Manager-related. But less than 4% of metadata records registered with us are registered using Metadata Manager. Supporting Metadata Manager requires resources disproportionate to the amount of use the tool gets. For comparison, twice as many records are registered using the Web Deposit Form, but it generates far fewer Support tickets. To fix the bugs and issues reported about Metadata Manager requires an equally disproportionate amount of developer resources. So far, we have been unable to free up resources we would need to fix them all. Continuing to maintain this tool is effectively preventing us from building something new that will better meet the needs of our smaller members.

We know this will surprise and concern some of you, especially heavy users of Metadata Manager. We’re committed to making this a smooth transition, and over the coming months, we’ll provide more guidance to help current members migrate to our other tools.

Involving the community

Building a tool that allows us to create and adapt content registration forms based on example input files is an exciting new approach - one that will allow us to better serve the needs of our smaller members across multiple record types and support those who want to adapt our tools to their own needs. We’ve already begun work on a proof-of-concept tool aligned with this new strategy and I’m excited to drive it to production. As this project develops, we’ll keep in close contact with members, conducting user interviews, feedback sessions, and using usage data to help guide our decision-making on features and design. As we’ll be building in the open, we’ll have prototypes to share along the way as we iterate to produce a tool that will stand the test of time as well as scale to support even more content and members in future. We welcome your feedback over on our Community Forum, where we’ve set up a dedicated category to discuss this topic.

Doing more with relationships - via Event Data

Martyn Rittman — Fri, 14 May 2021 00:00:00 +0000

Crossref aims to link research together, making related items more findable, increasing transparency, and showing how ideas spread and develop. There are a number of moving parts in this effort: some related to capturing and storing linking information, others to making it available.

By including relationship metadata in Event Data, we are taking a big step to improve the visibility of a large number of links between metadata. We know this is long-promised and we’re pleased that making this valuable metadata available supports a number of important initiatives. We will also be backfilling, so all previously deposited relationships will eventually become available as events. The first step will be to add relationships between items that have DOIs, such as between a research article and a related review report or dataset.

What are relationships?

When members register metadata with us, they have the possibility to identify other works, items, and websites that they know are related. This might be supplementary material or previous versions of a work (especially for preprints and working papers). Equally, identifiers for a protein, gene, or organism used in the research can be included. These are recorded as ‘relationships’ and can be accessed in the same way as the rest of the metadata we hold about registered content.

Some examples

Relationships in the metadata show links to the published article from this bioRxiv preprint. In the Crossref REST API:

"relation": {
 "is-preprint-of": [
 {
 "id-type": "doi",
 "id": "10.1038/s41467-020-17892-0",
 "asserted-by": "subject"
 }
 ],
 "cites": []
},

And now in Event Data:

"subj": {
 "pid": "https://doi.org/10.1101/2020.05.21.109546",
 "url": "https://doi.org/10.1101/2020.05.21.109546",
 "work_type_id": "posted-content"
},
"obj": {
 "pid": "https://doi.org/10.1038/s41467-020-17892-0",
 "url": "https://doi.org/10.1038/s41467-020-17892-0",
 "method": "doi-literal",
 "verification": "literal",
 "work-type-id": "journal-article"
},

Linking to a dataset in the Dryad Digital Repository by a recent eLife article. In the Crossref metadata:

"relation": {
 "is-supplemented-by": [
 {
 "id-type": "doi",
 "id": "10.5061/dryad.s58qh",
 "asserted-by": "subject"
 }
 ],
 "references": [
 {
 "id-type": "doi",
 "id": "10.5061/dryad.s58qh",
 "asserted-by": "subject"
 }
 ],
 "cites": []
},

And now in Event Data:

"subj": {
 "pid": "https://doi.org/10.7554/elife.19920",
 "url": "https://doi.org/10.7554/elife.19920",
 "work_type_id": "journal-article"
},
"obj": {
 "pid": "https://doi.org/10.5061/dryad.s58qh",
 "url": "https://doi.org/10.5061/dryad.s58qh",
 "method": "doi-literal",
 "verification": "literal",
 "work-type-id": "Dataset"
},

If you are interested in relationships for a single DOI, we still recommend checking the metadata of that record. However, Event Data is a great option for looking across multiple records: for example, to check for relationships across a prefix, in a given time period, or for a specific type of relationship.
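As a minimal sketch of the multi-record case, the snippet below only builds a query URL that filters events by subject DOI prefix, relationship type, and date range; it does not send a request. The base URL and the parameter names (`subj-id.prefix`, `relation-type`, `from-occurred-date`, `until-occurred-date`, `source`, `rows`, `mailto`) are assumptions based on the published Event Data query API and should be checked against the current documentation. The `source` value and the email address are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed base URL for the Event Data query API; verify against the docs.
EVENT_DATA_URL = "https://api.eventdata.crossref.org/v1/events"


def build_relationship_query(prefix, relation_type, from_date, until_date, mailto):
    """Build a query URL for events whose subject DOI starts with `prefix`."""
    params = {
        "mailto": mailto,                  # placeholder contact address
        "source": "crossref",              # assumed source name for relationship events
        "subj-id.prefix": prefix,          # e.g. a member's DOI prefix
        "relation-type": relation_type,    # one specific type of relationship
        "from-occurred-date": from_date,   # start of the time period (YYYY-MM-DD)
        "until-occurred-date": until_date, # end of the time period (YYYY-MM-DD)
        "rows": 100,
    }
    return f"{EVENT_DATA_URL}?{urlencode(params)}"


# The eLife prefix from the example above, over an illustrative date range.
url = build_relationship_query(
    "10.7554", "references", "2021-01-01", "2021-05-01", "you@example.org"
)
print(url)
```

Keeping the URL construction separate from the HTTP call makes it easy to inspect or log the exact query before running it against the live service.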

Data citation

Data citations can be included in metadata deposits as relationship metadata, usually using the ‘is-supplemented-by’ relationship. By creating an event from each relationship, we make the links between journal articles or books and the data they rely on more visible. This makes the data much easier to locate.
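To illustrate the idea of one event per relationship, here is a small sketch that flattens a `relation` block, shaped like the eLife example above, into one subject/object link per related DOI. It is not Crossref's actual event pipeline: the function name and the output keys (`subj`, `relation`, `obj`) are hypothetical and chosen only for readability.

```python
def relations_to_links(subject_doi, relation):
    """Flatten a REST API style `relation` block into one link per related DOI."""
    links = []
    for relation_type, targets in relation.items():
        for target in targets:
            if target.get("id-type") != "doi":
                continue  # this sketch only handles DOI-to-DOI relationships
            links.append({
                "subj": f"https://doi.org/{subject_doi}",
                "relation": relation_type,
                "obj": f"https://doi.org/{target['id']}",
            })
    return links


# Relation metadata copied from the eLife / Dryad example above.
relation = {
    "is-supplemented-by": [
        {"id-type": "doi", "id": "10.5061/dryad.s58qh", "asserted-by": "subject"}
    ],
    "references": [
        {"id-type": "doi", "id": "10.5061/dryad.s58qh", "asserted-by": "subject"}
    ],
    "cites": [],
}

links = relations_to_links("10.7554/elife.19920", relation)
for link in links:
    print(link["subj"], link["relation"], link["obj"])
```

Each non-empty relationship yields its own link, so the same dataset appears twice here: once as a supplement and once as a reference.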

Many datasets have DOIs which are usually recorded with DataCite, meaning you are unlikely to find them via searches of Crossref metadata. Making data citation relationship metadata available in Event Data means it will be available in the same format as citations from datasets to articles (which DataCite sends to Event Data) and citations from articles to datasets from Crossref reference metadata (more to come on this later this year). It also means we will convert this information into Scholix format so that it can be harvested and combined with other sets of Scholix-compliant article/data links. Data citations will therefore be available for the community to identify, share, link and recognise research data. We’re working with initiatives like Make Data Count and STM’s research data program to support the growing uptake of good data citation practices. This is a big step forward in making data citation happen for the community; we have more to do, but Crossref is committed to completing this work as a strategic priority.

What’s next?

In this first stage we are adding relationships that link two objects with a DOI, and later this year we will bring in relationships using other identifiers such as accession numbers and URIs. That will make it more straightforward to ask questions of Event Data such as which organisms have relationships to which works with a DOI.

More info and staying in touch

Find out more about Event Data in our support documentation or check out tickets in the GitLab repo.
Keep informed and ask us anything via our community forum for Event Data discussion

Open-source code: giving back

Joel Schuweiler — Fri, 30 Apr 2021 00:00:00 +0000

TL;DR

Hi, I’m Joel
GitLab UI unsatisfactory
Wrote a UI to use the API
Wrote a missing API
Open company contributes changes back to another open company
Now have a method for getting work done much easier
Hurrah!

I’m Joel, a Senior Site Reliability Engineer here at Crossref. I have a long background in open source, software development, and solving unique problems. One of my earliest computer influences was my father. He wrote software to support scientists in search of things like the top quark, the most massive of all observed elementary particles.

One day my father came home with over 40 floppy disks, excited to have this cool, free operating system called Linux. Together we installed Linux and ended up with a fully functional computer. Learning and using Linux opened up an entirely new world to me of amazing open-source software that I could use freely. As I enjoyed all this new software now available to me, I tried to fix any bugs or problems I’d encounter and report solutions for them to the software developers. It felt great to be able to contribute back so others could benefit.

Software teams tend to manage their workflow by writing issues, reviewing them to make sure they make sense and have an achievable goal, estimating how much time they will take to complete, and finally––the crucial step––putting the issues in the order in which they should be completed. To manage my work, I’ve always used Jira––a product designed to help teams of all types prioritize work––and for the first time in over a decade, I find myself not using it in my work.

Product development tracking with GitLab

The Crossref team took the decision a few years ago to move all our development and product tracking work to GitLab––a commercial open-source product anyone can use to help keep track of software throughout the development life cycle––with an open-by-default policy. Work is tracked using the issues feature of GitLab. GitLab will host it, so you don’t have to worry about maintenance and backups. One major drawback I discovered with GitLab is a lack of maturity when it comes to doing light project management work.

This is where the trouble begins with GitLab.

In the board view of your issues, you can transition your issues from waiting to in progress, and from in progress to done. The problem with this view is that it’s width-restricted, and things like tags on issues, which are used to help categorize, take up valuable vertical space. With enough tags and a long enough subject line, you can only see five issues at a time on a MacBook Pro monitor, for example.

In the list view of your issues, you get a clean, compact view: the perfect view to order issues. However, there’s one major flaw: it’s paginated. (You know when you’re shopping and they make you click to see another page of goods? Yes, like that.) The problem with GitLab’s implementation is that you can drag and drop issues within a given page, but there is no way to move an issue to another page in the list of results. Additionally, all newly created issues are added to the end of the list.

The solution

I went about finding a solution by visiting GitLab’s own public issue page and found that requests requiring user interface (UI) changes would languish; in some cases, they would go years without getting approval. Instead of putting in all the work to open an issue with them, only to have it be discarded or ignored, I decided to look for another way.

GitLab has an API; what more could I need? I discovered I could log in and get a list of all the issues, by project and by group. “This is perfect!”, I thought. I can write my own UI around it. It took three evenings to write a UI that was satisfactory to me. When I started writing JavaScript to interact with the UI, I learned that re-ordering issues didn’t actually have an API. Further investigation led me to the issue tracker, where I found an issue by a GitLab employee asking for the same functionality––the ability to re-order issues.
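Getting every issue into a single, unpaginated list is the part that makes a custom ordering UI possible. The sketch below shows one way to do that, assuming a paginated listing like GitLab's (which uses `page` and `per_page` query parameters). The `fetch_page` callable is hypothetical: in a real client it would call the project or group issues endpoint, but here it is faked so the example runs offline.

```python
def fetch_all_issues(fetch_page, per_page=100):
    """Collect issues from every page of a paginated listing.

    `fetch_page(page, per_page)` is a hypothetical callable returning the list
    of issues on one page; a page shorter than `per_page` ends the loop.
    """
    issues = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        issues.extend(batch)
        if len(batch) < per_page:
            break  # last (possibly empty) page reached
        page += 1
    return issues


# Stand-in for a real API call: 7 fake issues served 3 per page.
FAKE_ISSUES = [{"iid": n, "title": f"Issue {n}"} for n in range(1, 8)]


def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return FAKE_ISSUES[start:start + per_page]


all_issues = fetch_all_issues(fake_fetch_page, per_page=3)
print(len(all_issues))  # 7
```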

While in a chatroom for GitLab development, I was genuinely surprised by my experience. There was quick attentive help on locating the file I would need to implement the change, they set up a development environment, and even helped submit tests for my code while I worked on updating documentation and writing a changelog entry. It felt like GitLab must’ve designated an employee to work with the community on submitting improvements. In no time, the API for re-ordering was implemented. After the scheduled monthly release of GitLab rolled out with my new API, I was able to easily re-order issues.
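For readers curious what a re-order call might look like, here is a hedged sketch that prepares, but does not send, such a request. The post doesn't name the endpoint; the path (`/projects/:id/issues/:issue_iid/reorder`) and the `move_after_id` / `move_before_id` parameters are assumptions based on GitLab's published issues API and should be verified there. The host, project ID, issue IDs, and token are placeholders.

```python
import json
from urllib.request import Request


def build_reorder_request(base_url, project_id, issue_iid, token,
                          move_after_id=None, move_before_id=None):
    """Prepare (but do not send) a PUT request that re-orders one issue."""
    body = {}
    if move_after_id is not None:
        body["move_after_id"] = move_after_id    # place the issue after this one
    if move_before_id is not None:
        body["move_before_id"] = move_before_id  # place the issue before this one
    # Assumed endpoint path; check GitLab's API documentation.
    url = f"{base_url}/api/v4/projects/{project_id}/issues/{issue_iid}/reorder"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="PUT",
    )


req = build_reorder_request(
    "https://gitlab.example.com", 42, 7, "placeholder-token", move_after_id=1001
)
print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; that step is left out so the sketch has no side effects.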

GitLab’s response when help was needed along the way was impressive. Now there is a much easier method for getting work done that everyone can use. It’s rewarding when you can contribute back to the community for all to benefit.

Is GitLab as polished as Jira? No. Did they embrace me making changes by being open from the start and providing help along the way? Yes. Do I see Jira shifting its culture to match? Unlikely.

By emulating GitLab, an open organisation like Crossref has a shot at encouraging community development.

Stepping up our deposit processing game

Isaac Farley — Mon, 08 Mar 2021 00:00:00 +0000

Some of you who have submitted content to us during the first two months of 2021 may have experienced content registration delays. We noticed; you did, too.

The time between us receiving XML from members, to the content being registered with us and the DOI resolving to the correct resolution URL, is usually a matter of minutes. Some submissions take longer - for example, book registrations with large reference lists, or very large files from larger publishers can take up to 24 to 48 hours to process.

However, in January and February 2021 we saw content registration delays of several days for all record types and all file sizes.

Tell me more

Januaries and Februaries are usually busy at Crossref. Journal ownership changes hands. Members migrate from one platform to another (and can need to update tens of thousands of their resolution URLs). And, many of you are registering your first issues, books, or conferences of the year. Others of you have heard the calls of The Initiative for Open Citations (I4OC) and The Initiative for Open Abstracts (I4OA) and are enriching your metadata accordingly (thank you!). Tickets into our support and membership colleagues peak for the year. But did we see significantly more submissions this year?

We did see larger-than-normal numbers of submissions in the first two months of the year. For the entire month of January 2021, we received nearly 1 million more submissions into our admin tool deposit queue than we did in January 2020 (2,757,781 in 2021 versus 1,848,261 in 2020). Under normal circumstances, this would lead to an increase in our processing times, so there’s that to consider. But there was also something else at play this year. We desperately needed to upgrade our load balancer, and so we did. Unfortunately, and unforeseen at the time, these upgrades caused hiccups in our deposit processing and slowed down submissions even further, building up the number of unprocessed submissions in the queue.
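To put the year-over-year jump in perspective, the quick calculation below uses only the two January figures quoted above.

```python
jan_2021 = 2_757_781  # submissions received in January 2021
jan_2020 = 1_848_261  # submissions received in January 2020

increase = jan_2021 - jan_2020
percent_increase = 100 * increase / jan_2020

print(f"{increase:,} more submissions ({percent_increase:.1f}% increase)")
# prints: 909,520 more submissions (49.2% increase)
```

In other words, the queue had to absorb roughly half as many submissions again as in the same month a year earlier.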

When we saw the impact this was having, we suspended the load balancer work until things were stable again. We also increased the resources serving our queue to bring it back down to normal. To make sure we don’t face the same problem again, we have put in better tools to detect trends in queue usage, tools which, in turn, will allow us to anticipate problems in the queue instead of reacting to them after they’ve already occurred. And as a longer-term project, we are addressing two decades of technical debt and rearchitecting our system so that it is much more efficient.

Gory technical details

As part of our effort to resolve our technical debt, we’re looking to transition more of our services to the cloud. To accomplish this, we first needed to upgrade our internal traffic handling capabilities to route things to their new locations better. This upgrade caused some unforeseen and hard to notice problems, like the queue being stalled. Since the queue still showed things in process, it wasn’t immediately apparent that things were not processing (normally the processing on the queue will clear a thread if a significant problem occurs).

We initially noticed a problem on 5 February and thought we had a fix in place on the 10th. But, we again realized on 16 February that the underlying problem had recurred, and we needed a closer investigation.

For many reasons, it took us too long to realize the connection; we only did so once people started complaining.

While our technical team worked on those load balancer upgrades, some of your submissions lingered for days in the deposit queue. In a few examples, larger submissions took over a week to complete processing. Total pending submissions began to push nearly 100,000, an unusually large backlog. We called an emergency meeting, paused all related work, and dedicated additional time and resources to processing all pending submissions. On 22 February, we completed working through the backlog of pending submissions and new submissions were being processed at normal levels. As we finish up this blog on 2 March, there are fewer than 3,000 pending submissions in the queue, the oldest of which has been there for less than three hours.

This brings us back to the entire rationale for what we are doing with the load balancer - which, ironically, was to move some services out of the data centre so that we could free-up resources and scale things more dynamically to match the ebbs and flows of your content registration.

But before we proceed, we’ll be looking at what happened. The bumps associated with upgrading ancient software were expected, so we were looking for side effects. We just didn’t look in the right place. And we should have detected that the queues had stalled well before people started to report it to us. A lot of our queue management is still manual. This means we are not adjusting it 24x7. So if something does come in when we are not around, it can exacerbate problems quickly.
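The kind of stall described here, where items still show as in process but nothing completes, can be caught with a very simple check. The sketch below is illustrative only and not a description of Crossref's monitoring: the `(pending, completed_total)` sample shape, the function name, and the window size are all assumptions.

```python
def is_stalled(samples, window=3):
    """Return True if the queue looks stalled over the last `window` samples.

    Each sample is a (pending, completed_total) pair taken at a regular
    interval. The queue counts as stalled when work is pending throughout
    the window but the running total of completed items has not moved.
    """
    if len(samples) < window:
        return False  # not enough history to judge
    recent = samples[-window:]
    has_pending = all(pending > 0 for pending, _ in recent)
    no_progress = recent[-1][1] == recent[0][1]
    return has_pending and no_progress


# Made-up samples: the first queue is draining, the second is stuck and growing.
healthy = [(500, 1000), (480, 1040), (450, 1100)]
stalled = [(500, 1000), (900, 1000), (1400, 1000)]

print(is_stalled(healthy), is_stalled(stalled))  # False True
```

Watching completions rather than the "in process" flag is the point: a queue can look busy while finishing nothing.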

What are we going to do about it?

In a word: much. We know that timely deposit processing is critical. We can and will do better.

First off, we have increased the number of concurrently processing threads dedicated to metadata uploads in our deposit queue from 20 to 25. That’s a permanent increase. A million more submissions in a month necessitates additional resources, but that’s only a short-term patch. And we were only able to make this change recently due to some index optimizations we implemented late last year.

One of the other things that we’ve immediately put into place is a better system for measuring trends in our queue usage so that we can, in turn, anticipate rather than react to surges in the queue. And, of course, the next step will be to automate this queue management.
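
As a rough illustration of what measuring queue trends can look like, the sketch below keeps a rolling window of queue-depth samples and flags sustained growth. It is a minimal example under stated assumptions, not a description of our production monitoring: the `QueueTrend` class, the window size, and the growth threshold are all hypothetical.

```python
from collections import deque


class QueueTrend:
    """Track recent queue-depth samples and flag sustained growth.

    Hypothetical helper for illustration; names and thresholds are assumptions.
    """

    def __init__(self, window=6, growth_threshold=0.2):
        self.samples = deque(maxlen=window)        # most recent queue depths
        self.growth_threshold = growth_threshold   # 0.2 means 20% growth

    def record(self, pending):
        """Store the latest count of pending submissions."""
        self.samples.append(pending)

    def is_surging(self):
        """Return True if the queue grew steadily across a full window."""
        values = list(self.samples)
        # A trend is only judged once the window is full.
        if len(values) < self.samples.maxlen:
            return False
        # Every sample must be at least as large as the one before it ...
        rising = all(b >= a for a, b in zip(values, values[1:]))
        # ... and overall growth must exceed the threshold.
        grown = values[-1] > values[0] * (1 + self.growth_threshold)
        return rising and grown
```

Feeding such a tracker one sample per polling interval would let an alert fire while a backlog is still building, rather than after it has already stalled.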

All this is part of an overall, multi-year effort to address a boat-load of technical debt that we’ve accumulated over two decades. Our system was designed to handle a few million DOIs. It has been incrementally poked and prodded to deal with well over a hundred million. But it is suffering.

Anybody who is even semi-technically-aware might be wondering what all the fuss is about. Why can’t we fix this relatively easily? After all, 130 million records, though a significant milestone for Crossref, does not in any way qualify as “big data.” All our DOI records fit onto an average sized micro-SD card. There are open source toolchains that can manage data many, many times this size. We’ve occasionally used these tools to load and analyse all our DOI records on a desktop computer. And it has taken just a few minutes (admittedly using a beefier-than-usual desktop computer). So how can a queue with just 100,000 items in it take so long to process?

Our scale problem isn’t so much about the number of records we process. It is about the 20 years of accumulated processing rules and services that we have in place. Much of this is undocumented, and the rationale for it has been lost over the decades. It is this complexity that slows us down.

And one of the challenges we face as we move to a new architecture is deciding which of these rules and services are “essential complexity” and which are not. For example, we have very complex rules for verifying that submissions contain a correct journal title. These rules involve a lot of text matching and, until they are successfully completed, they block the rest of the registration process.

But the workflow these rules are designed for is one that was developed before ISSNs were widely deposited and before we had our own, internal title identifiers for items that do not have an ISSN. And so a lot of this process is probably anachronistic. It is not clear which (if any) parts of it are still essential.

We have layers upon layers of these kinds of processing rules, many of which are mutually dependent and which are therefore not easily amenable to the kind of horizontal scaling that is the basis for modern, scalable data processing toolchains. All this means that, as part of moving to a new architecture, we also have to understand which rules and services we need to move over and which ones have outlived their usefulness. And we need to understand which remaining rules can be decoupled so that they can be run in parallel instead of in sequence.
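
To make the idea of decoupling concrete, here is a minimal sketch of checks that do not depend on each other being run concurrently instead of one after another. The three check functions and the submission fields are hypothetical and do not mirror our real processing rules; they only show the shape of the approach.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical, mutually independent checks. Each takes a submission dict
# and returns an error message, or None if the check passes.
def check_has_doi(sub):
    return None if sub.get("doi") else "missing DOI"


def check_has_title(sub):
    return None if sub.get("title") else "missing title"


def check_has_issn_or_title_id(sub):
    if sub.get("issn") or sub.get("title_id"):
        return None
    return "missing ISSN or title ID"


INDEPENDENT_CHECKS = [check_has_doi, check_has_title, check_has_issn_or_title_id]


def validate(sub):
    """Run the independent checks concurrently and collect any errors."""
    with ThreadPoolExecutor(max_workers=len(INDEPENDENT_CHECKS)) as pool:
        # map() yields results in the same order as INDEPENDENT_CHECKS.
        results = pool.map(lambda check: check(sub), INDEPENDENT_CHECKS)
        return [msg for msg in results if msg is not None]
```

Threads are used here only to show the structure. In Python, CPU-heavy work such as text matching would need processes or separate workers to gain real parallelism, and rules that depend on each other's output would still have to run in sequence.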

Pro tip: Due to the current checks performed in our admin tool, for those of you submitting XML, the most efficient way to do so is by packaging the equivalent of a journal issue’s worth of content in each submission (i.e., ten to twelve content items - a 1 MB submission is our suggested file size when striving for efficient processing)
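
If you generate your XML programmatically, the tip above can be approximated with a small batching helper. This is a sketch only: the `batch_items` function is hypothetical, the default limits simply restate the suggestion above (up to twelve items, roughly 1 MB), and it ignores the size of the surrounding submission envelope.

```python
def batch_items(items, max_items=12, max_bytes=1_000_000):
    """Group XML fragments (strings) into batches bounded by count and size."""
    batches, current, current_size = [], [], 0
    for item in items:
        size = len(item.encode("utf-8"))
        # Start a new batch if adding this item would exceed either limit.
        if current and (len(current) >= max_items or current_size + size > max_bytes):
            batches.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

A single item larger than `max_bytes` still ends up in a batch of its own rather than being dropped.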

Which brings us conveniently back to queues. We did not react soon enough to the queue backing up. We can do much better at monitoring and managing our existing registration pipeline infrastructure. But we are not fooling ourselves into thinking this will deal with the systemic issue.

We recognize that, with current technology and tools, it is absurd that a queue of 100,000 items should take so long to process. It is also important that people know that we are addressing the root of the issues as well. And that we’re not succumbing to the now-legendary anti-pattern of trying to rewrite our system from scratch. Instead we are building a framework that will allow us to incrementally extract the essential complexity of our existing system and discard some of the anachronistic jetsam that has accumulated over the years.

Content Registration should typically take seconds. We wanted to let you know, that we know, and we are working on it.

Discuss all things metadata in our new community forum

Vanessa Fairhurst — Thu, 11 Feb 2021 00:00:00 +0000

TL;DR: We have a Community Forum (yay!), you can come and join it here: community.crossref.org.

Community is fundamental to us at Crossref; we wouldn’t be where we are or achieve the great things we do without the involvement of you, our diverse and engaged members and users. Crossref was founded as a collaboration of publishers with the shared goal of making links between research outputs easier, building a foundational infrastructure making research easier to find, cite, link, assess, and re-use. It is at the very core of what we do and who we are. Our global community now includes publishers, libraries, government agencies, funders, researchers, universities, ambassadors, and more from over 140 countries. We are also actively part of the larger scholarly research community, which includes other open scholarly infrastructure organisations, metadata users and aggregators, open science initiatives, and others with shared aims and values.

What do we mean by ‘community’?

‘Community’ is often one of those words which gets bandied around without much thought given to its meaning. At Crossref, we are aware that expertise lies within our broad, global community and we engage with them (you!) in a variety of ways to ensure that decisions we make are community-led and that what we do, as well as what we don’t do, is in line with the views of our members and developed with your insights and input. We do this via our working groups, committees, ambassador program, beta-testing groups, in-person and online events, webinars, and on-going dialogues and feedback via our support channels and even social media. We are also involved in a number of collaborative projects with other organisations such as ROR, Metadata 2020, Make Data Count, PIDapalooza, and the FREYA project to name but a few.

Community is more than just signing up to be a Crossref member. It’s more than just attending an event or a webinar, or levelling up to include the use of a service like Crossmark or Similarity Check –– it’s really engaging with us and creating something together of shared value for the scholarly community. As an organisation, we’ve been so thrilled that there is a new group dedicated to highlighting community managers and our work. We are working with –– and learning a lot from –– the Centre for Scientific Collaboration & Community Engagement to improve the way we interact and involve people in Crossref. The model below shows a trajectory towards true collaboration that we aim to follow in the coming months and years.

Cite as: Center for Scientific Collaboration and Community Engagement. (2020) The CSCCE Community Participation Model – A framework for member engagement and information flow in STEM communities. Woodley and Pratt doi: 10.5281/zenodo.3997802

In the current climate, there are additional challenges and limitations on how we interact with all the various communities that we as individuals are a part of, both professionally and personally. I wrote in my last blog about how we have moved our events online and thought about new ways to better connect and engage with our community virtually. One of those ways is our Community Forum.

The purpose of our community forum

Hosted on the open-source discussion platform Discourse, you can find our forum at community.crossref.org. The goal of the community forum is to create an inclusive, open space where Crossref members, ambassadors, sponsors, service providers, and others who share a passion for scholarly infrastructure, can connect. This enables collaborative problem-solving, the sharing of expertise and experiences across time zones and languages, and allows members to post questions to be answered by other community members or even our staff. Members of the community engage via creating posts, commenting on existing content in the forum, volunteering for working groups or beta-testing projects, helping to co-create materials that include translations and shared FAQs, giving feedback on new developments, and joining online events and webinars. Throughout these interactions, we expect that those who use the community forum will form relationships –– a collective working together to advance their work with Crossref and shape the future of scholarly infrastructure.

When I joined Crossref as Community Manager over three years ago, the idea of a forum had already begun to take shape, but it wasn’t quite there just yet. There was additional research and consultation with the community to be done to check this was the approach we wanted to take.

This involved speaking to others working in scholarly communications about forums they were involved in running or were an active participant of –– check out the PKP forum for instance if you haven’t already –– and having numerous valuable conversations about successes, potential downfalls, and realistic expectations. The most important –– and commonly cited –– takeaway is that building an online community takes time. We are still at the start of this journey. It will only work if it is a place of value for all and a place where people feel a sense of belonging and co-ownership.

Preparing to roll out the forum

We tested the platform with a small group of beta-testers and also sent out a survey to over 1,700 of our members, taking a sample with a geographical and organisational spread. The responses thankfully held no major surprises and reinforced our belief that this is something of use to people.

Key research findings

77% of respondents had previously contacted our Support team for help resolving an issue.
90% stated either ‘yes’ or ‘maybe’ to whether they would use a community forum to post their questions, though over half have never used a forum before.
The most common reasons given for joining are ‘Community support in solving issues or answering questions’, ‘To locate FAQs and quickly find answers to common issues’, and ‘To connect with others working in a similar role and/or with similar interests’.
The most commonly stated things that would discourage or limit members’ participation are how time-consuming and complex the forum is to use, and any potential language barriers.

Things you can do on the forum

We hope this will provide a much more open level of support for the community, enabling us to bring out all those great questions and thoughtful conversations we receive via our Support channels into the public sphere, where we can all benefit from these rich exchanges. Ultimately our goal for the future is that this space is owned by you, the Crossref community. This is a platform for you to connect and build relationships with others working in scholarly communications: metadata fanatics, identifier aficionados, developer gurus, and open research enthusiasts - we welcome you all!

Share what activities or projects you are working on and get input from others.
Share issues that you need some help resolving: post a question to the forum in your native language and get help from another community member.
Give us feedback on our plans and help us shape future developments at Crossref.
Test out new tools and services.
Find out about upcoming events and webinars, and share any you think are of interest to the community.
Help us identify better ways of working together through Crossref and co-create new materials and projects.

How to get started

So, how do I sign up, you ask? Simply head over to community.crossref.org and set up an account. There’s a useful How-To guide available on our welcome post, as well as some Community Guidelines all our members should follow.

Do you have a question about registering or updating your metadata? Then head over to the Content Registration category and post your query to the group. Want to find out about getting started with the Similarity Check service? Then take a look at our Similarity Check topic in our services category. Or maybe you want to know more about upcoming multilingual webinars at Crossref, or perhaps you have one of your own you’d like to share? Then check out the Community Calendar.

We’re also looking for talented linguists out there to help us translate our welcome email template into multiple languages so that anyone joining the community can get a welcome in their native language. To join in, visit my post in our ‘Questions from Crossref’ category.

We look forward to seeing you in the community soon!

Event Data: A Plan of Action

Martyn Rittman — Mon, 01 Feb 2021 00:00:00 +0000

Event Data uncovers links between Crossref-registered DOIs and diverse places where they are mentioned across the internet. Whereas a citation links one research article to another, events are a way to create links to locations such as news articles, data sets, Wikipedia entries, and social media mentions. We’ve collected events for several years and make them openly available via an API for anyone to access, as well as creating open logs of how we found each event. Some organisations are already using Event Data and we are keen for more to come on board.

Last year we gave an update on Event Data with apologies for being so quiet and a promise of more information at a later date. It’s been some time, so here goes…

I joined Crossref in the middle of last year as a Product Manager and was tasked with looking into Event Data. The first thing I found was a large amount of enthusiasm for Event Data, both within Crossref and further afield. The idea of gathering information beyond the metadata deposited by our members is popular, and creates valuable connections between DOIs and a range of other sources. Interest spans the spectrum of academic research, publishing, bibliometrics, and beyond.

At the same time, I found a project with a very solid, well-built code base but unstable performance. After being put into production in 2018, we didn’t provide sufficient support. Coupled with staff changes and other competing priorities, Event Data hasn’t had the opportunity to live up to early expectations.

To address these issues, we have embarked on a plan to make the server infrastructure more robust, improve monitoring, and make sure that the future of Event Data makes the best use of the resources we have without over-stretching. It means working with the community to determine the most essential aspects of Event Data, and providing support where it’s needed.

The steps below are not necessarily sequential and some depend on the completion of work in other parts of Crossref, but they outline the priorities we have for Event Data in 2021.

The Plan

Stability

Since we put in place our original Event Data infrastructure, the amount of incoming data has grown, and at an ever-increasing rate. In 2017 we were creating 2 million new events per month; that number is now over 20 million. We have known for some time that we need to refresh the infrastructure, but didn’t have the resources to move forward: now we do.

In the first part of the plan we will renew the server infrastructure that underpins Event Data. Maybe not a headline-grabbing move, but the aim is to reduce downtime and pull in missing data. Through improving our monitoring and shortening the response time when things go wrong, we will be able to ensure that events are added on a regular basis and the API can reliably handle requests.

We’ve made the first steps in this direction by upgrading our API infrastructure and making some other tweaks to improve performance. There is still work to do, but we’ve already seen a significant improvement in performance, with uptime of around 99.99% in December.

Consolidation

The second component of the plan is to review performance and data quality. We will evaluate the event sources, update artefacts (such as the lists of publisher landing pages and news websites), and review performance reporting. This will help us to have a better understanding of Event Data in its current form: if the stability component is about improving what comes in and goes out, this part will give us increased confidence in what Event Data already contains.

Future roadmap

While the two steps above are being carried out, we will revisit the applications of Event Data and talk to organisations that currently use it or have expressed an interest. These conversations will feed into future development in which we will evaluate new sources and other ways to optimize the service.

Central to the roadmap will be continued support of the data citation endpoint in Scholix format, which we run in close collaboration with DataCite. Additionally, we will add new data from relationships between Crossref works, for example where a preprint is matched to a journal article, or where there are corrections, retractions, or translations of works.

We expect to continue supporting the current sources of events and where there are organisations with either a strong interest in a particular source or a database of events that they can send directly, we are keen to build collaborations. Event Data, like everything that Crossref does, is a community-based effort.

Staying in touch

To join the conversation about Event Data and keep informed, head over to our Community pages. You can also check out our GitLab pages. At the end of last year we updated the Education pages where you can learn more about Event Data.

New public data file: 120+ million metadata records

Jennifer Kemp — Tue, 19 Jan 2021 00:00:00 +0000

2020 wasn’t all bad. In April of last year, we released our first public data file. Though Crossref metadata is always openly available––and our board recently cemented this by voting to adopt the Principles of Open Scholarly Infrastructure (POSI)––we’ve decided to release an updated file. This will provide a more efficient way to get such a large volume of records. The file (JSON records, 102.6GB) is now available, with thanks once again to Academic Torrents.

Use of our open APIs continues to grow, as does the metadata. Last year’s file was 112 million records and 65GB. Just nine months later (though it feels longer than that!), the new file is over 120 million records and over 102GB. That’s all of the Crossref records ever registered up to and including January 7, 2021. We continue to see around 10% growth in records each year––and while journal articles account for most of the volume, preprints and book chapters are two of our fast-growing record types. In addition to the growth in the number of records, many of the records are getting bigger and better as members look at their participation report and understand the value of enriching metadata records for distribution throughout the scholarly ecosystem. Elsevier recently opened its references, enriching over 12 million records. A number of members, including Royal Society, Sage, Emerald, OUP, World Scientific and more, have started adding abstracts, which now number over 9 million.

Help us help you––using the torrent and other important notes

We decided to release these public data files largely to help support COVID-19 research efforts but of course use cases for Crossref metadata vary widely and a few pointers should help all users:

Use the torrent if you want all of these records. Everyone is welcome to the metadata but it will be much faster for you and much easier on our APIs to get so many records in one file.
Use the REST API to incrementally add new and updated records once you’ve got the initial file. Here is how to get started (and avoid getting blocked in your enthusiasm to use all this great metadata!).
‘Limited’ and ‘closed’ references are not included in the file or our open APIs. And, while bibliographic metadata is generally required, lots of metadata is optional, so records will vary in quality and completeness.
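
For the incremental updates mentioned in the pointers above, a request can be built along the following lines. This is a minimal sketch rather than an official client: the `incremental_url` helper is hypothetical, and while the `from-index-date` filter, cursor-based paging, and the `mailto` parameter are, to the best of our knowledge, supported by the REST API, please check the API documentation before relying on them.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.crossref.org/works"


def incremental_url(since, mailto, cursor="*", rows=1000):
    """Build a URL for records indexed on or after `since` (YYYY-MM-DD)."""
    params = {
        "filter": f"from-index-date:{since}",  # records changed since this date
        "rows": rows,                          # page size
        "cursor": cursor,                      # "*" starts a new cursor
        "mailto": mailto,                      # identifies you to the API
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Example: everything indexed since the date of the public data file.
print(incremental_url("2021-01-07", "name@example.org"))
```

Each response includes a cursor value for the next page; passing that value as `cursor` on the following request walks through the remaining results.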

Questions, comments and feedback are welcome at support@crossref.org.

Here’s hoping 2021 is a better year for us all! Stay well.

A tribute to our Kirsty

Crossref — Wed, 16 Dec 2020 00:00:00 +0000

Our colleague and friend, Kirsty Meddings, passed away peacefully on 10th December at home with her family, after a sudden and aggressive cancer. She was a huge part of Crossref, our culture, and our lives for the last twelve years.

Kirsty Meddings is a name that almost everyone in scholarly publishing knows; she was part of a generation of Oxford women in publishing technology who have progressed through the industry, adapted to its changes, spotted new opportunities, and supported each other throughout. We hope this post will do justice to her memory in our profession.

Kirsty’s early career

After completing her degree in English and Spanish American Literature at Warwick University, Kirsty started her career in scholarly communications and publishing at Blackwell’s Information Services. She was there for a year before joining CatchWord, an online journal start-up, in 1998, as Electronic Publisher and Account Manager and in 1999 was promoted to the new role of Library Relations Manager.

CatchWord was acquired by Ingenta and Kirsty moved into product management working on integrating the CatchWord and Ingenta platforms and launching IngentaConnect in 2004. Ingenta became Publishing Technology in 2005 and Kirsty was Product Development Manager working with engineering, business development, and users on developing online products and services. She was also involved in a range of community initiatives including COUNTER, KBART, and ICEDIS.

Joining Crossref

She was an early pioneer in electronic and online publishing - an innovator who understood scholarly publishing, technology, libraries, and people - a powerful combination. And Crossref was quick to offer her a role.

In Kirsty’s introduction to Crossref she was described by the recruiter as:

An experienced and highly capable individual with a solid background in product development, marketing and customer service issues related to the supply of scholarly electronic content from publishers to library and end user audiences. A good communicator and team worker with sound technical understanding and an excellent grasp of publishing industry issues.

This adequately captures Kirsty’s impressive professional achievements, but not her personality. Kirsty was a Product Manager at Crossref for 12 years and was a valued and loved friend and colleague. Committed to Crossref—its values and people—she was funny, human, and always asked tough questions.

She joined us on October 27th, 2008 as our first Product Manager and the third UK employee. In her time at Crossref, Kirsty made a major impact, working on a range of important projects and services - particularly new, innovative services. Not long after she started at Crossref, she wrote a “day in the life” profile for the journal Serials that perfectly captures what it was like in 2009 at Crossref Oxford (there were three of us in Oxford and only ten total staff at Crossref): Meddings, K., 2009. Mini-profile: a day in the life of a product manager. Serials, 22(1), pp.5–6. DOI: http://doi.org/10.1629/225

Her own biography, on her staff page, states:

Kirsty Meddings has been involved in a diverse set of initiatives that have kept her busy since 2008. She has spent most of her career in scholarly communications, in a variety of marketing and product development roles for intermediaries and technology suppliers. She speaks conversational geek and competent publishing, and is working towards fluency in both.

See? Funny!

Professional achievements

Kirsty started out working on CrossCheck, now Similarity Check, the plagiarism screening service that launched in 2008. The service was in need of some attention and better organisation - Kirsty got stuck in, whipped it into shape and it has gone on to be one of Crossref’s most widely-adopted services. This article that Kirsty wrote for ISMTE’s publication, EON, remains useful nearly 10 years after it was written! Kirsty successfully managed the partnership with Turnitin (starting as iParadigms), the technical provider for Similarity Check, for many years. Colleagues there are mourning her loss too.

Kirsty was instrumental in launching Crossmark, which became a production service in 2012. After a few changes of hands, she resumed work on the service in recent years, and announced the removal of Crossmark fees to better support uptake in 2020.

The addition of clinical trial information to the Crossref metadata was a community-driven initiative, developed from the concept of threaded publications. There were/are lots of moving parts in this initiative, and in many ways it was one of the precursors to the idea of the Research Nexus: linking via metadata and relationships to provide a clearer picture of the ecosystem that exists around a research object.

What was once FundRef (ahh, those logos!) has matured into the Open Funder Registry under Kirsty’s stewardship. In collaboration with Elsevier, the registry has grown from an initial 4,000 funders, to over 25,000 and we can see over 5 million works registered with Crossref that are linked to at least one funder. More recently, Kirsty was the Product Manager for the registration of research grants with Crossref, working with our Funder Advisory Group, and she was starting to work with CDL and DataCite to absorb the Funder Registry into ROR.

In 2018, Kirsty launched our first ever dashboard for member best practice. She led the development and design of Participation Reports and the decision of which checks would be most important for the scholarly community to assess. This has quickly become one of Crossref’s most valuable and used tools.

Public speaking

Kirsty always spoke with authority across a range of topics, appearing totally calm even if she was nervous. Among many talks, she spoke at the STM seminar on Publication Ethics and Research Integrity, ISMTE, UKSG, ALPSP seminars, the COPE Forum, and a recent online LIVE event, and she ran numerous CrossCheck, CrossMark, FundRef and TDM webinars.

She was a frequent presenter at many of Crossref’s annual meetings, and enjoyed the opportunity to meet and catch up with our members, the board, and the community (many of whom always ask after her). Checking in after conferences on who said what, who’s moving where, what feedback we had, and picking up on opportunities for further collaboration were all things that we looked forward to sharing.

To use UKSG’s own words, Kirsty was always a staunch supporter of the organisation - attending, exhibiting, and speaking at many UKSG conferences and events over her whole career. She was also a legend at the dinners, on the dance floor, and in the bar. At the 2019 conference she tallied the votes at the quiz night - Kirsty loved a quiz! We had an all-staff end-of-year quiz via Zoom last week and it was just not the same without her.

Here are Kirsty’s slides on SlideShare, some videos of Kirsty’s talks on YouTube, and her ORCID record which lists her published works.

Strong friendships

One of the most rewarding experiences of working at Crossref is meeting up with the whole team and with our members. Jetlag, hunting out coeliac-friendly food, staying up far too late chatting, trying to fit in exploring bits of cities around board and other meetings, presenting, organizing, thinking, laughing (I’m sure to the annoyance of other plane passengers)—these experiences were all part and parcel of working with Kirsty, and where many of us cemented connections with her.

We started a message board and within days it was populated with numerous stories, poems, and photos from so many friends and colleagues on whom Kirsty made such a lasting and loving impression.

Kirsty’s message board

It’s impossible to capture someone’s character in a blog, but some of the words that carry across the messages that people have shared are empathy, compassion, honesty, intelligence, brilliance, sincerity, laughter, human, passion, openness, and fun. We’ll miss her immensely.

Kirsty was somewhat of an expert in grief. She lost her first husband, James Culling, to leukemia in December 2012, leaving her a widow with two sons, Dan, 7 at the time, and Luke, just 6 months old. A few years later, through the charity Widowed And Young (WAY), she met Martin Eggleston. Martin and his daughter Amy joined Kirsty, Dan, and Luke, and they created a very happy blended family. Some of us went to their wedding and it was an incredible event full of love and laughter - and of course music. Always music.

Kirsty represented us, along with Rachael, at the funeral of another colleague last year, Christine Hone, in Amsterdam. Kirsty helped all of us get through the grief then. And because she made it okay to grieve and to talk about grief, it is heartbreaking and also comforting that she is indirectly helping us all now to be better able to handle her own death.

How we can honour Kirsty’s memory

We heard that Kirsty’s last words were “I’m listening”. Which is just so fitting. She was always ready with an ear, a shoulder to support us all, and indeed she demanded that we express ourselves honestly.

If you want to share memories of Kirsty, you can join others who have done so on the message board or just take a few minutes to read through.

And there is a justgiving page in memory of Kirsty for Maggie’s Oxford, a branch of a cancer support charity who helped her and her family through James’s death and is now helping her family again.

Professionally, Kirsty made major contributions at Crossref and in scholarly communications in general. More importantly, she had a profound impact on a personal level with many people. Our thoughts are with Martin, Dan, Amy, and Luke, and also with Kirsty’s mum Val, her brother Colin, her in-laws, her close friends, and all the people who—like the rest of us—are better for knowing her, and will never forget her.

Fast, citable feedback: Peer reviews for preprints and other record types

Martyn Rittman — Wed, 09 Dec 2020 00:00:00 +0000

Crossref has supported depositing metadata for preprints since 2016 and peer reviews since 2018. Now we are putting the two together; in fact, we will permit peer reviews to be registered for any record type.

Currently, peer reviews can be registered for journal articles, but that means that they can only be related to some of the content our members deposit. Preprints, books, chapters, working papers, dissertations, and a host of other works can also be registered with Crossref. A number of these frequently undergo some form of review and many of our members and voices in the community have called for us to widen the net on peer reviews, including journal publishers, book publishers, review platforms, and preprint servers. We’ve listened and taken action, and from now on Crossref members can add relationship metadata that links peer reviews to any record type. The metadata will also contain the type of review, stating whether it is a referee report, author response, or community comment, etc. This allows accurate reporting on whether the peer review is happening within a traditional editorial process or elsewhere.
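
To show how this relationship metadata hangs together, here is a small data-structure sketch. It is illustrative only: the `ReviewLink` class, its field names, and the string values are hypothetical and are not taken from the Crossref deposit schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewLink:
    """A peer review linked to the item it reviews (hypothetical model)."""
    review_doi: str     # DOI registered for the peer review itself
    reviewed_doi: str   # DOI of the item under review
    reviewed_type: str  # any record type, e.g. "preprint" or "book chapter"
    kind: str           # e.g. "referee report" or "community comment"


def count_by_kind(links):
    """Count reviews per kind, as a basis for reporting on review activity."""
    return dict(Counter(link.kind for link in links))


links = [
    ReviewLink("10.5555/r1", "10.5555/p1", "preprint", "referee report"),
    ReviewLink("10.5555/r2", "10.5555/p1", "preprint", "community comment"),
    ReviewLink("10.5555/r3", "10.5555/b1", "book chapter", "referee report"),
]
print(count_by_kind(links))
```

Because each link carries both the record type of the reviewed item and the kind of review, the same records can be grouped either way, for instance to separate community comments on preprints from referee reports on book chapters.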

Reviews for preprints

In the last decade there has been an increase in the number of disciplines using preprints. Since we enabled registration of preprint metadata, preprints have become our fastest-growing record type. Preprints, working papers, and other forms of early publication help to accelerate dissemination of the latest research and discovery. They can also promote discussion on important topics, and help authors to improve papers before an editorial decision for journal publication. During the COVID-19 pandemic, preprints have become invaluable for speeding the publication of vital research and case studies.

On the other hand, preprints do not undergo formal review and editorial approval, leading to concerns about the dissemination of false information. While the issue of misinformation in preprints has been discussed for some time, the COVID-19 pandemic has brought it more sharply into focus. Organisations that post preprints need to balance the benefits of rapid dissemination with promoting their responsible use.

To support the feedback process, preprint servers along with a growing number of other platforms and services offer scholars the opportunity to post public comments on preprints. By doing this, they give extra context for readers, provide suggestions for authors, and raise awareness of work that could be flawed or too preliminary.

Another growing trend is journal publishers adopting editorial processes that involve preprint-first options and open peer review. As Dr. Stephanie Dawson from ScienceOpen says:

“We have long believed in rewarding reviewers by assigning Crossref DOIs to their open reviews to make them citable objects and we were one of the first users of Crossref’s peer review schema. However, a large percentage of the articles reviewed on ScienceOpen are publicly available preprints. The UCL Open: Environment journal hosted on the platform, for example, is based on a workflow of open peer review of preprints. Our customers, editors, reviewers and authors are therefore extremely happy that these reviews can now also be assigned a Crossref peer review DOI for more accountability and transparency in scholarly publishing.”

At Crossref, we’re continually looking to support more record types and relations between them to build trust, support reproducibility and increase discoverability of content. This is another small step in building the research nexus and we look forward to working with members depositing peer reviews of preprints.

404: Support team down for essential maintenance

Amanda Bartell — Fri, 04 Dec 2020 00:00:00 +0000

2020 has been a very challenging year, and we can all agree that everyone needs a break. Crossref will be providing very limited technical and membership support from 21st December to 3rd January to allow our staff to rest and recharge. We’ll be back on January 4th raring to answer your questions. Amanda explains more about why we made this decision.

As we all know, 2020 has been an unprecedented year, with the COVID-19 pandemic affecting lives across the globe.

It’s been amazing to watch our members pivot their working practices and continue to publish content and register it with Crossref to keep the wheels of research and scholarly communications moving.

Since January, we’ve seen 9,079,082 items registered with Crossref, up 13% on 2019. 2628 new members have also joined during that time and we now have almost 13.5k members from 139 countries. We’ve seen over 337 million requests to our REST API on average per month in 2020, a 9% increase over 2019 (and over 600 million total metadata queries on average per month across all our APIs and services).

Of course, all this activity brings an increasing number of requests for help and support. Since the start of 2020, we have answered almost 24,000 support tickets from the community. Sometimes these just need a quick answer or a link to our documentation. Sometimes it’s a straightforward new member application or a routine query. But sometimes a prospective member needs a lot of advice, and sometimes a long-standing member or user needs in-depth investigations and consultancy. Sometimes the request highlights a problem in one of our systems that needs input from our product and development colleagues. But either way, it’s keeping our small team of five full-time employees very busy.

Vanessa wrote earlier in the year about how our Community Outreach team has changed its working practices this year. As Head of Member Experience I’ve been incredibly impressed by the way our membership, support and billing staff have done the same - remaining really focused on the needs of the Crossref community while (at the same time) balancing this with the demands of working from home, childcare, home-schooling, and supporting those affected by the pandemic in their own community. Isaac’s thoughtful post on our forum about his first week working at home because of the pandemic really highlighted some of these challenges.

We take work/life balance seriously at Crossref. We want to make sure that we’re able to continue to help the Crossref community effectively in 2021, but are also able to continue to look after ourselves, our families, and our own communities in this difficult time. We all hope that 2021 will be a very different year, but there’s still likely to be disruption ahead for all of us, and one thing is sure: there will continue to be plenty more requests coming in for our small team to stay on top of in the meantime.

With this in mind, we want to make sure that our support staff are able to properly rest and recharge during what is a holiday period for many of us coming up. We’ll be operating with just one person each on the technical support and membership support side between 21st December and 3rd January. This means that while we’ll be able to answer urgent queries, non-urgent questions will be left unanswered until 4th January. And we’ll not take on any new members between 21st December and 3rd January too.

We know many of you will be continuing to work during this period. If you have a non-urgent question, do take a look at our support documentation in the meantime, or see if other members (or our amazing Ambassadors) are able to help on our forum. If you can’t find what you’re looking for and it’s urgent, we hope that the limited staff who are on call will still be able to help you out.

Colleagues in the US have recently celebrated their Thanksgiving, and I remain enormously thankful for our team here at Crossref, and for you all in the scholarly community for your enthusiasm for working together collectively to help the world find, cite, link, assess, and reuse scholarly content. We all really appreciate your patience while we reset ready for 2021. Happy Holidays!

Crossref’s Board votes to adopt the Principles of Open Scholarly Infrastructure

Geoffrey Bilder — Wed, 02 Dec 2020 00:00:00 +0000

TL;DR

On November 11th 2020, the Crossref Board voted to adopt the “Principles of Open Scholarly Infrastructure” (POSI). POSI is a list of sixteen commitments that will now guide the board, staff, and Crossref’s development as an organisation into the future. It is an important public statement to make in Crossref’s twentieth anniversary year. Crossref has followed principles since its founding, and meets most of the POSI, but publicly committing to a codified and measurable set of principles is a big step. If 2019 was a reflective turning point, and mid-2020 was about Crossref committing to open scholarly infrastructure and collaboration, this is now announcing a very deliberate path. And we’re just a little bit giddy about it.

Here is a picture of me being “giddy.”

If you just want to see the principles that the board has endorsed, you can see them here:

https://doi.org/10.24343/C34W2H

But if you also want some background and want to understand some of the implications of Crossref adopting the principles, read on…

Warning - this is a long post.

Background and Origins

Some of you may be surprised that we’ve done this - simply because you always assumed we operated under these principles anyway. And we have. Mostly.

The “Principles of Open Scholarly Infrastructure” were largely inspired by a set of uncodified rules and norms that Crossref had been operating under for years. So how did we get to this circular situation where we are making a big announcement about adopting something we have largely been doing anyway?

Six years ago I met with Cameron Neylon and Jennifer Lin when they were still at PLOS and we decided that we wanted to write a blog post about…

Well, it doesn’t really matter.

We never finished writing that blog post because we got distracted by an issue that we kept seeing which was that services that the scholarly community depended on were increasingly taking directions that seemed antithetical to the community’s interests.

We were concerned because the scholarly community was becoming increasingly distrustful of infrastructure services. We wondered if there were any practices that we could point to that might mitigate the risk of infrastructure being co-opted and that would help build trust. Fortunately, we had two great models to look at:

Crossref, which had a set of informal rules and norms that it had followed since its founding (e.g., transparency of operations, being business-model neutral, one member one vote).
ORCID, an organisation that was spun-out of Crossref and which had adopted a written set of principles, based largely on codifying practices that they had seen at Crossref.

And so we wrote these practices up and added a few that we thought were missing. And we posted a different blog post to the one we had originally planned. It was titled “The Principles of Open Scholarly Infrastructures.” And the blog post became popular. And we did a bunch of talks about the Principles. And, much to our surprise, POSI has influenced the directions and policies of a number of organisations and initiatives since, including SPARC, Invest in Open Infrastructure, Open Data Institute, OA Switchboard, and others.

Elsewhere, community organisations and likeminded community members helped further develop the implementation of POSI through discussions at FORCE11 and through additional blog posts and books. Some, like Dryad and ROR, started to work to align their organisational structure to embrace POSI.

And this left Crossref in a strange position. Although we were largely the inspiration for these Principles - we ourselves had never codified and adopted them.

Motivations. Why Now?

Because it is the right thing to do for those that currently depend on Crossref

It is a healthy thing for the organisation to do. Adopting these principles strengthens Crossref’s governance. After twenty years, Crossref infrastructure has become critical to a broad segment of the community. As our membership profile changes, and as our broader stakeholder community expands, we need to explicitly evolve our governance to reflect stakeholders. And it would be irresponsible to continue to have our governance guided by a set of informal conventions. Particularly in the context of a global political period where we’ve seen the informal operating conventions and policy understandings of at least two major democracies ignored or discarded.

Because it could help make the creation of new, sustainable, open scholarly infrastructure easier and less expensive

There is a lot of new interest in open scholarly infrastructure. New infrastructure services and systems are being proposed almost every month. Many of them seek extensive advice and consulting from Crossref. A subset of these are incubated through Crossref. And a subset of these become Crossref services. Others are spun out as separate organisations (e.g., ORCID) or were specifically initiated as collaborations (e.g., ROR).

Our experience has been that the vast majority of work involved in these infrastructure projects was in establishing trust amongst the stakeholder community. We think that Crossref adopting the principles will help to address fundamental questions about accountability and sustainability that are inevitably raised when a new constituency approaches Crossref with an idea for collaborating on a new or existing infrastructure service. In short, adopting the principles will make future collaboration easier.

Adopting the Principles: Plus ça change

The Principles of Open Scholarly Infrastructure (POSI) proposes three areas that an Open Infrastructure organisation can address in order to garner the trust of the broader scholarly community: accountability (governance), funding (sustainability), and protection of community interests (insurance).

POSI proposes a set of concrete commitments that an organisation can make to build trust in each of these areas. There are 16 such commitments. Of these 16 commitments, Crossref is already completely or partially meeting the requirements of 15. And adopting the 16th commitment just formalises a direction Crossref has been heading toward for several years.

Critically, “adopting” POSI does not mean that we have to instantly meet all of the criteria. After all, when ORCID adopted its principles, it didn’t meet any of them. They were adopted to make a statement of intent. And they were publicly adopted so that the community could measure the organisation’s progress as well as to allow the community to detect if ORCID started to stray from its stated intentions.

Adopting the principles is akin to adopting a mission statement or a vision statement. It is an aspirational guide, not a description of the status quo.

Having said that, the principles are more concrete than a mission or vision statement, and this makes them easier to measure.

It is also important to note that the criteria are designed to balance each other. So, for example, one would not want to change the governance or business model to better support the mission if doing so would also threaten the sustainability of the organisation.

And finally, meeting a commitment is an ongoing process - it is not a one-off event. The organisation needs to keep measuring their performance against the principles in order to make sure that they have not inadvertently regressed.

Implications

Before adopting the principles, we did a candid self-audit to see which ones we thought we currently met and which ones we still needed to work on.

The three areas and sixteen commitments that are proposed in POSI are all designed to ensure that an infrastructure cannot be co-opted by a particular party or interest group.

And the last area, “Insurance,” is the backstop that makes sure that, if some in the community feel that the infrastructure organisation has gone in a radically wrong direction, they can recreate the infrastructure as it was when they were comfortable with it, and they will not be hindered by practices or policies that lock them into the existing organisation.

This “insurance” is very much inspired by Crossref. Crossref itself was built, in part, to make sure that publishers were not locked into platforms and that journals and societies were not locked into publishers. Using the indirect Crossref DOI linking mechanism ensures that content can move between platforms and publishers without breaking vital citation links. Moving between platforms or publishers is never easy. And it isn’t cheap. But using Crossref DOIs for citation links at least makes it possible.

Crossref has an extra insurance level as well. It is built on the DOI and Handle infrastructure. If Crossref were to take a direction that some of its members found unacceptable, those members could join another DOI Registry agency more amenable to them. It wouldn’t be easy. It wouldn’t be cheap. But it would be possible.

And this knowledge helps keep Crossref grounded and attuned to the needs and concerns of its members. We know that our members are not “trapped” with us. We don’t take lightly the trust placed in us. And we know that there is trust still to build with various corners of our community. And it is this knowledge that helps keep us from developing the disdainful, take-it-or-leave-it, attitude that can be the cliché characteristic of infrastructure organisations.

So the fundamental, overarching goal of POSI is to set out principles that ensure that the stakeholders of an infrastructure organisation have a clear say in setting its agenda and priorities and that, in extremis, the stakeholders can leave and create an alternative infrastructure if the original organisation becomes unresponsive, hostile, or disappears.

As we look at how Crossref currently maps to the principles, please keep in mind three things:

1. If we have marked something as green, that doesn’t mean we think we do this perfectly. It simply means that we already have internal processes that focus on this commitment and we have evidence that these processes have thus far been working.
2. The fact that something is green and has “thus-far been working” does not mean that we should rest easy. We could regress. Our processes need to be able to detect and address regressions.
3. The commitments are supposed to be balanced. So we don’t want to do something to turn something green if it has an irreversible impact on another commitment. So, for example, we should not address a shortfall in the contingency fund by generating revenue in a way that ultimately hurts Crossref’s mission.

The implication of #3 above is that it may take us some time to meet all of the commitments. But again, the community can measure our progress against meeting the commitments.

So how does Crossref currently meet POSI?

Governance

🟢 Coverage across the research enterprise.
🟢 Non-discriminatory membership
🟢 Transparent operations
🟢 Cannot lobby
🟢 Living will
🟢 Formal incentives to fulfil mission & wind-down
🔴 Stakeholder Governed

Sustainability

🟢 Time-limited funds are used only for time-limited activities.
🟢 Goal to generate surplus
🟡 Goal to create contingency fund to support operations for 12 months
🟢 Mission-consistent revenue generation
🟢 Revenue based on services, not data

Insurance

🟢 Available data (within constraints of privacy laws)
🟡 Patent non-assertion
🟡 Open source
🟢 Open data (within constraints of privacy laws)

Governance

If an infrastructure is successful and becomes critical to the community, we need to ensure it is not co-opted by particular interest groups. Similarly, we need to ensure that any organisation does not confuse serving itself with serving its stakeholders. How do we ensure that the system is run “humbly”, that it recognises it doesn’t have a right to exist beyond the support it provides for the community and that it plans accordingly? How do we ensure that the system remains responsive to the changing needs of the community?

– POSI

In the area of governance, Crossref clearly meets six of the seven criteria listed. We will discuss these first.

🟢 Coverage across the research enterprise

it is increasingly clear that research transcends disciplines, geography, institutions and stakeholders. The infrastructure that supports it needs to do the same.

– POSI

Crossref includes members who publish in the STM, HSS and Professional spheres. There are still some gaps in our coverage (e.g., monographs, law), but this is not through policy or lack of trying.

Crossref has members in 139 countries and has agreements with people in 150 countries. However, note that geographic diversity is not the same as language diversity. Although we have members in many countries, the vast majority of our registered content is still in English. This does not reflect the trends in research outputs. We still need to do a lot of work to support non-English publications and non-English speaking members. But we have already identified this as a priority and are working on a number of initiatives to better support research communication in languages other than English.

🟢 Non-discriminatory membership

we see the best option as an “opt-in” approach with a principle of non-discrimination where any stakeholder group may express an interest and should be welcome. The process of representation in day to day governance must also be inclusive with governance that reflects the demographics of the membership

– POSI

It is first worth noting that “non-discriminatory” does not mean that we cannot have standards, obligations, and rules that all members of Crossref have to adhere to. It simply means that said rules are clear and that we apply them uniformly.

Crossref has always had catholic membership criteria. Although we have historically defined ourselves as a primarily “publisher” organisation, we define “publisher” loosely as anybody who produces content that commonly references or is referenced by scholarly literature. Historically, this has included NGOs, IGOs, standards bodies, institutional archives, and professional publishers. More recently it has expanded to include preprint archives and funders.

The requirements for joining Crossref are few. We admit any applicant who:

Agrees to the obligations of membership.
Can pay the fees.

In practice we have historically had a policy of rejecting individuals as members. But even this is probably a pointless distinction as many of our members are “organisations” consisting of one person.

And fundamental to Crossref’s governance is that a member’s influence in the governance of Crossref is not tied to the level of financial investment they make in the organisation. All members have the same single vote. All board members have one vote.

Recently, we have also made changes to our governance and election process. The first was to introduce contested elections for the board. The second was to ensure that board membership was proportionally balanced amongst the membership tiers. Even as recently as 2017, when the Board established a Governance Committee, the idea of weighting votes to membership tiers was roundly rejected - on principle.

This is not to say that we can relax on this point. For example, as more funders and institutions join Crossref, we will need to make sure that our governance reflects that. We talk about this more in the section on governance.

Some will also point out that our fees are themselves a form of discrimination as they can still be an insurmountable barrier to some in the community. We understand this and, without trying to make light of or dismiss the situation, we are also confident that we are constantly looking at ways to lower the barrier-to-entry for joining Crossref. Our fees have gone steadily down since we were founded and we are constantly reviewing them to try and make them more equitable. We have created a category of sponsoring organisations to defray the costs of membership. We collaborate closely with organisations like PKP to try and build tools and services that make participation in Crossref easier and less expensive.

🟢 Transparent operations

achieving trust in the selection of representatives to governance groups will be best achieved through transparent processes and operations in general (within the constraints of privacy laws).

– POSI

Crossref has transparent finances and a transparent governance process. Much of this is simply a byproduct of the regulations governing non-profits with tax exempt status in the US and our specific registration as a non-profit membership association in New York State.

Until fairly recently, the obvious exception to this was Crossref’s use of pre-picked slates in board elections, but we have since improved this with an open election process.

🟢 Cannot lobby

the community, not infrastructure organisations, should collectively drive regulatory change. An infrastructure organisation’s role is to provide a base for others to work on and should depend on its community to support the creation of a legislative environment that affects it

– POSI

Crossref has never lobbied. Partly this is a byproduct of our commitment to be business-model neutral as most lobbying efforts in the industry seem to center around promoting the views held by members who share a business model.

But also, Crossref has never lobbied on its own behalf. We have always relied on our members and the community to point out and promote Crossref if there is any area of legislative policy that the Crossref infrastructure could help with.

🟢 Living will

a powerful way to create trust is to publicly describe a plan addressing the condition under which an organisation would be wound down, how this would happen, and how any ongoing assets could be archived and preserved when passed to a successor organisation. Any such organisation would need to honour this same set of principles

– POSI

Crossref has two relationships that require us to set out plans for an orderly wind-down.

The first is a condition of our incorporation as a non-profit in the state of New York. This explicitly includes a provision that requires us to hand over our operations and responsibilities to a successor non-profit organisation that has a similar constituency and mission. The NY State Attorney General reviews and approves any major changes to ensure this requirement is met.

The second is a condition of our being members of the DOI Foundation, which includes provisions for us to hand over management of DOIs to another registration agency should Crossref ever wind-down. It is worth noting that we have already seen this clause invoked for other registration agencies that have wound down and who have, as part of the DOI Foundation provisions, handed responsibility for their DOIs to Crossref.

This is not to say that we are perfect on this score. We do not, for example, have any single place that outlines the steps that would need to be taken in order to execute the requirements laid out by our obligations to the state of New York and the IDF.

🟢 Formal incentives to fulfil mission & wind-down

infrastructures exist for a specific purpose and that purpose can be radically simplified or even rendered unnecessary by technological or social change. If it is possible the organisation (and staff) should have direct incentives to deliver on the mission and wind down.

– POSI

Crossref has a track record of periodically reviewing our services and decommissioning those that are no longer needed - either because they have fulfilled their specific mission or because there is simply waning interest in them (arguably, the same thing).

Again, this is not to say we are perfect on this score. We also have, by our last count, about 30 specialised, overlapping APIs, many of which are used by just a handful of users. These have escaped our normal scrutiny because they never had the status of a formal service and had not been through our product management process.

But still, Crossref has long made it a habit to question its own existence. At virtually every board annual strategy meeting we ask the question “will technology X make Crossref unnecessary?” We need to continue with the attitude that the best thing we could do for our members is to make ourselves unnecessary.

🔴 Stakeholder Governed

a board-governed organisation drawn from the stakeholder community builds more confidence that the organisation will take decisions driven by community consensus and consideration of different interests.

– POSI

Overall, Crossref meets most of the Governance requirements with the notable exception of broader stakeholder involvement.

Of course, the key to this is how you define “stakeholder.”

Some may dispute this and argue that Crossref “stakeholders” are “publishers” because they are the parties that invested in creating Crossref.

But this narrow definition of “stakeholder” - focusing solely on those who have “invested” - is not widely held. In fact, common phrases like “stakeholder economy” and “stakeholder capitalism” describe the exact opposite - systems that don’t just focus on the “investor”, but which instead balance benefits to the investor with benefits to employees, the broader community, society, and the environment.

It is this latter, broader definition of “stakeholder” that is used in POSI.

And just in case anybody still thinks that people other than publishers don’t consider themselves “stakeholders” in the Crossref infrastructure, we simply point to this, recently tweeted by Brea Manuel, a researcher, in celebration of their publication in Nature Reviews Chemistry (read it, and learn how to recruit and retain a diverse workforce):

Sustainability

Financial sustainability is a key element of creating trust. “Trust” often elides multiple elements: intentions, resources, and checks and balances. An organisation that is both well meaning and has the right expertise will still not be trusted if it does not have sustainable resources to execute its mission. How do we ensure that an organisation has the resources to meet its obligations?

– POSI

In the area of sustainability, Crossref clearly meets four of the five criteria listed and is most of the way to meeting the fifth.

🟢 Time-limited funds are used only for time-limited activities

day to day operations should be supported by day to day sustainable revenue sources. Grant dependency for funding operations makes them fragile and more easily distracted from building core infrastructure.

– POSI

Crossref has never supported production activities based on grants. Indeed Crossref’s delivery on this point is what inspired the approach taken in this principle. This distinguishes Crossref from many grant-funded infrastructure initiatives which either barely stay afloat or disappear altogether. Even those that survive often do so by pursuing solutions that align with their funders’ interests over their users’ needs.

🟢 Goal to generate surplus

organisations which define sustainability based merely on recovering costs are brittle and stagnant. It is not enough to merely survive, it has to be able to adapt and change. To weather economic, social and technological volatility, they need financial resources beyond immediate operating costs.

– POSI

Crossref has always attempted to generate a surplus. Crossref has generated surpluses since 2002 - so for 18 years of its 20-year existence.

🟡 Goal to create contingency fund to support operations for 12 months

a high priority should be generating a contingency fund that can support a complete, orderly wind down (12 months in most cases). This fund should be separate from those allocated to covering operating risk and investment in development.

– POSI

Crossref currently has a contingency fund that would support operations for 9 months. Although this may be standard for industry, it seems prudent to extend this in the case of infrastructure organisations, particularly when they are membership organisations. First, the very fact that something is infrastructure implies that the systemic effects of its failing ungracefully could have industry-wide repercussions. Second, the decision-making process of a membership organisation whose governance is voluntary is inherently slower. It has taken the Crossref Board 9 months, for example, just to discuss the ramifications of adopting POSI.

Given our recent financial performance, we expect Crossref could comfortably increase the contingency fund to support 12 months of operations within the next 2-3 years.

🟢 Mission-consistent revenue generation

potential revenue sources should be considered for consistency with the organisational mission and not run counter to the aims of the organisation.

– POSI

Crossref has a good track record of periodically reviewing our services and fees and adjusting them to better support Crossref’s mission. The role of the Membership & Fees Committee in advising the Board has been critical. The very first example of this was in the early days of Crossref when we dropped matching fees because they were disincentivising members from linking their references. Crossref was also quick to recognise that, in order to support global research and reach smaller publishers in lower income countries, we had to develop a sponsoring mechanism to help defray the costs and ameliorate the technical complexity of participating in Crossref. Most recently we have taken the decision to drop fees for Crossmark as it was clear they had become a barrier to our members distributing retraction and correction notifications in a machine actionable format.

🟢 Revenue based on services, not data

data related to the running of the research enterprise should be a community property. Appropriate revenue sources might include value-added services, consulting, API Service Level Agreements or membership fees

– POSI

Crossref does not charge for or resell its members’ data. Doing so would restrict dissemination and reduce the discoverability of our members’ content. Instead our revenue comes from a combination of membership fees and service fees. The DOI registration is a member service that generates the bulk of our revenue. But our SLA-backed APIs are becoming increasingly popular as members and others seek to integrate Crossref metadata into their production workflows and services.

Insurance

Even with the best possible governance structures, critical infrastructure can still be co opted by a subset of stakeholders or simply drift away from the needs of the community. Long term trust requires the community to believe it retains control. Here we can learn from Open Source practices. To ensure that the community can take control if necessary, the infrastructure must be “forkable.” The community could replicate the entire system if the organisation loses the support of stakeholders, despite all established checks and balances. Each crucial part then must be legally and technically capable of replication, including software systems and data. Forking carries a high cost, and in practice this would always remain challenging. But the ability of the community to recreate the infrastructure will create confidence in the system. The possibility of forking prompts all players to work well together, spurring a virtuous cycle. Acts that reduce the feasibility of forking then are strong signals that concerns should be raised. The following principles should ensure that, as a whole, the organisation in extremis is forkable.

– POSI

Crossref clearly meets two of the four Insurance requirements. And the remaining two can be met easily with some clarification and time.

The “governance” section of POSI is designed to ensure that an infrastructure organisation is beholden to the broader stakeholder community and that it cannot be co-opted by a particular party or special interest. And the “sustainability” section of POSI is designed to ensure that the infrastructure organisation takes the financial steps to ensure it can weather sudden changes in the financial or technical environment. But the last section, “insurance”, is designed to protect stakeholder interests in case either “governance” or “sustainability” fails.

The term “forkable” comes from the Open Source software community where it is used to indicate when a software community’s interests diverge and they decide to split a project into several projects, with each new project focusing on a particular sub-community’s interests.

One of the immediate worries that people have when they first hear of the concept of “forkability” is that it will encourage the creation of many variations of a project based on frivolous criteria. But this simply does not happen. Forking a project is never easy and takes a lot of effort. It is only done successfully when a critical mass of the community becomes unhappy with the direction a project is taking and is willing to take on the substantial burden of running an entirely separate project. Without such a critical mass, the fork just withers and has virtually no effect on the original project.

And the reason for this is simple: the mere knowledge that a project is “forkable” forces project maintainers to balance the interests of the community so that no sizable subgroup grows dissatisfied enough to fork the project.

Forkability encourages responsiveness to the community by making sure that the community is not “locked-in.”

Crossref itself was founded, in part, to prevent lock-in. Use of the DOI in linking citations makes it easier for publishers to move platforms, and for journals and societies to move between publishers.

And Crossref itself is architected in part to ensure that lock-in is not possible. Crossref is just one of several DOI registration agencies. Members unhappy with Crossref can move to another DOI registration agency and their citation links will continue to work. But there are things we could do to make this even easier.

🟢 Available data (within constraints of privacy laws)

It is not enough that the data be made “open” if there is not a practical way to actually obtain it. Underlying data should be made easily available via periodic data dumps.

– POSI

Crossref provides public APIs that allow users to access Crossref metadata. We are planning to eventually release yearly public data files. We already did this once when we released a public data file in support of COVID-19 research. This in no way prevents the provision of data through paid Service Level Agreement tiers that provide guarantees of regularity, availability or reliability for those that need it. Existing Metadata Plus customers primarily use data that is available through the open API or existing dumps, but value additional services that support their use-cases.
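
As a rough illustration of the public API access described above, the sketch below builds a request URL for a single work in the Crossref REST API and pulls a few fields out of a response body. The `https://api.crossref.org/works/{doi}` route and the `message` envelope reflect the public REST API as we understand it; the helper names and the sample record are hypothetical, and no network call is made.

```python
import json
from urllib.parse import quote

API_ROOT = "https://api.crossref.org"  # public REST API root (assumed unchanged)


def work_url(doi: str) -> str:
    """Return the REST API URL for a single DOI (hypothetical helper)."""
    # Percent-encode the DOI but keep the slash between prefix and suffix.
    return f"{API_ROOT}/works/{quote(doi, safe='/')}"


def summarise(response_text: str) -> dict:
    """Extract a few bibliographic fields from a /works/{doi} response body."""
    message = json.loads(response_text)["message"]
    return {
        "doi": message.get("DOI"),
        # Titles are delivered as a list; take the first one if present.
        "title": (message.get("title") or [None])[0],
        "type": message.get("type"),
    }


# Hand-written sample shaped like an API response, for illustration only.
SAMPLE = json.dumps({
    "status": "ok",
    "message": {
        "DOI": "10.5555/12345678",
        "title": ["An example title"],
        "type": "journal-article",
    },
})

print(work_url("10.5555/12345678"))
print(summarise(SAMPLE))
```

In practice the response text would come from an HTTP GET on the URL; the parsing step is kept separate here so that it can be tried without network access.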

🟡 Patent non-assertion

The organisation should commit to a patent non-assertion covenant. The organisation may obtain patents to protect its own operations, but not use them to prevent the community from replicating the infrastructure.

– POSI

Crossref has never registered a patent. But the DOI Foundation, with significant support from Crossref, had to respond to (and then monitored) a set of patent applications that, had they been granted, the DOI System would have infringed. The applications were filed more than 15 years ago and have not been successful, so they are not a current concern. As a result of this, the DOI Foundation adopted a patent policy in 2005 that covers all Registration Agencies and protects the DOI System. We may want to register protective patents in the future in order to enable us to defend ourselves against patent trolls.

The problem with patents is that they could be used by an organisation to prevent the infrastructure forking. One technique that has been used by major companies to assure communities that they will not be affected by patents, is to make a patent non-assertion covenant. For example, IBM, Microsoft and Google have made non-assertion statements in order to assure the open source and standards communities that they participate in that they will not co-opt an open source project or open standard by asserting patents on code or processes they contribute.

Though Crossref has never registered a patent, issuing a patent non-assertion covenant would help assure stakeholders that we would not use patents in the future to prevent the community from forking the system.

🟡 Open source

All software required to run the infrastructure should be available under an open source license. This does not include other software that may be involved with running the organisation.

– POSI

All code for new initiatives since 2007 has been released under an open source MIT license. The legacy Content System code could be open sourced within 12-18 months with no extra effort.

If some Crossref stakeholders wanted to “fork” Crossref or leave for another DOI registration agency, their biggest hurdle would be trying to recreate the twenty years worth of rules and algorithms we use for processing and matching metadata. Without access to the source code of the system, it would be almost impossible for these to be reverse engineered.

Similarly, without access to the source code of our system, it is difficult to ensure that Crossref is, indeed, non-discriminatory in the way it works with member content. It would be possible, for example, for Crossref to modify its matching algorithms to deliberately favour or deprecate some members’ content.

If we want to assure the community that we are managing our member metadata fairly and if we want to provide even better insurance to our members and the broader stakeholders, we should make all of our code open source.

The legacy so-called “CS” (content system) is in the process of being refactored. The only reason we cannot open source this immediately is that we still need to make some security changes to it. These security changes are being done as part of a current refactoring project and should be completed without any extra effort within 12-18 months. After that, we can open source the code.

🟡 Open data (within constraints of privacy laws)

For an infrastructure to be forked it will be necessary to replicate all relevant data. The CC0 waiver is best practice in making data legally available. Privacy and data protection laws will limit the extent to which this is possible.

– POSI

Achieving this simply requires us to clarify copyright and license information. Doing so will not have any effect on the metadata registered in Crossref by our members.

First we should outline the current copyright status of a Crossref metadata record.

The fundamental issue is that what we colloquially call “Crossref metadata” is actually a mix of elements, some of which come from our members, some of which come from third parties, and some of which come from Crossref itself. These elements, in turn, each have different copyright implications.

On top of this, Crossref has terms and conditions for its members and terms and conditions for specific services. These grant Crossref the right to do things with some classes of metadata and not do things with other classes of metadata - regardless of copyright.

Let’s start with the easiest case. Crossref already has two services with CC0 metadata:

The Open Funder Registry
Event Data

Obviously, the POSI open data provision would not change anything for either service.

The next easiest case is private data. Crossref collects PII (usernames, passwords, IP addresses, etc.). This would remain private. And we will continue to manage it in conformance with GDPR. It would not be affected by the open data provision of POSI.

Next let’s look at what most people probably think of as “Crossref metadata”, that is, the basic bibliographic metadata that Crossref has collected from its members since its founding (titles, authors, volumes, issues, etc). For the record, this does not include abstracts.

Since 2000 Crossref has stated that it considers this basic bibliographic metadata to be “facts.” And under US law (Crossref is registered in the US) these facts are not subject to copyright at all. If this data is not subject to copyright at all, there is no way Crossref can “waive the copyright” under CC0. This metadata would not be affected at all under the open data provision of POSI.

More recently, some of our members have been submitting abstracts to Crossref. These are copyrighted. In the case of subscription publishers, the copyright usually belongs to the publisher. In the case of open access publishers, the copyright most often belongs to the authors. In both cases, Crossref cannot waive copyright under CC0 because the copyright is not ours to waive. However, we are allowed to redistribute the abstracts with our metadata because that is part of the terms and conditions we have with our members. We already have language that notes the distinct copyright status of the abstracts in our metadata, but, ideally, we should extend our schema to make that information available in a machine actionable form as well. In short, the copyright status of abstracts would not be affected at all by the open data provision of POSI.

Crossref also has its Reference Distribution Policy that the board adopted in 2017 - limited and closed references are not distributed by Crossref and this won’t change. [EDIT 6th June 2022 - all references are now open by default with the March 2022 board vote to remove any restrictions on reference distribution].

And this leaves us with the one thing that would be affected by the open data provision of POSI: data that is created by Crossref itself as a byproduct of our services. By law, this data is under Crossref’s copyright unless we explicitly waive it. This data includes things like participation reports, conflict reports, member IDs and Cited-by counts (just the counts, not the references), and any aggregations of our otherwise uncopyrighted data that might, by aggregating it, be subject to sui generis database rights. At the moment, although we distribute this data freely and without restriction, we have no explicit copyright statement attached to it. All we would be seeking to do is explicitly say that data generated by Crossref will be distributed CC0. Again, at first it would be enough to just specify this in human readable form, along with our other copyright information. But, eventually, we would want to include this information in machine actionable form in the metadata itself.

To summarise:

Metadata type	Example	Current Copyright	Change under POSI
Already CC0	Open Funder Registry, Event Data	CC0	None
Private	Log files, user IDs	Private	None
Bibliographic	Title, authors, volume, issue	Facts	None
Closed references		Facts - but no distribution under the reference distribution board policy from 2017	None
Limited references		Facts - but no public distribution under the reference distribution board policy from 2017	None
Open references		Facts	None
Crossref-generated data	Participation data, reports, extracts	Copyright Crossref	CC0

[EDIT 6th June 2022 - all references are now open by default with the March 2022 board vote to remove any restrictions on reference distribution].

No member metadata will be affected by our adopting the open data provision of POSI. The only data that would be affected is data generated by Crossref itself.

However, the adoption of this principle would likely have an effect on our decisions about future services. For example, under this principle we would not launch any new services where the data was not freely reusable or the copyright of the data was not CC0.

Conclusion and Next steps

So again we face the paradox: we are announcing something that is simultaneously insignificant and important. It is insignificant in that we are simply saying that we will continue to do what we have largely been doing since Crossref was founded. But it is important because, in codifying what we have been doing, we are also confirming that these principles actually worked: they were essential to building the trust that allowed us to function over the past twenty years, and they will continue to be essential in the future, as we look to work with existing organisations to strengthen current infrastructures, and work with new stakeholders to develop new infrastructures.

So much of the work in building scholarly infrastructure is about building trust. We would love to see other organisations and services adopt POSI as well. Doing so would help us to collaborate more efficiently by allowing us to confirm from the outset that our fundamental values align. And having a set of verifiable commitments that we can point to will also help build the community’s trust in our respective organisations and services.

And this brings us to an important point. Although POSI might have been inspired by Crossref, POSI is not a “Crossref thang” and it never has been. The movement to create open scholarly infrastructures and to define and clarify the ground rules within which they operate has become a much broader community concern.

To this end, we’ve worked with some sibling infrastructure organisations—such as Dryad and ROR—as well as the original authors of POSI to create a website where we could host the list of principles independent of the original blog post and independent of any single organisation:

openscholarlyinfrastructure.org

Minimally, this provides a place for anybody who wants to link to or cite POSI - either because they are endorsing them, or because they are simply discussing them.

If we see enough activity of this type, then the site could evolve to become a register of those organisations and services who have formally adopted POSI and a place where they can link to their self-assessments against the principles.

The community promoting, discussing and applying POSI has long since grown beyond the original authors of the POSI blog post. And it is also much larger than any single organisation. Our hope is that this website encourages that growth.

And, of course, in addition to the external outreach and coordination, Crossref still has internal work to do in addressing the outstanding issues that were raised in our own self-assessment above. We need to increase our contingency funds. We need to publish a patent non-assertion covenant. We need to open source our core software. And we need to clarify our metadata license information and make it explicit that Crossref waives copyright (using CC0) for any metadata generated by Crossref. And, finally, as Crossref expands and starts working with different stakeholders, we will need to adjust our governance and the composition of our board accordingly. We will, of course, post updates here as we make progress on addressing these areas.

2020 marked Crossref’s 20th birthday. What a grim year to have an anniversary. But we are, at least, ending it on a little bit of a high. We are delighted that the issue of open scholarly infrastructure has become so prominent in the community. And we are eager to help strengthen and extend this infrastructure. The decision by Crossref’s board to adopt POSI is the equivalent of Crossref finally adopting a written constitution. And it is a fitting launch to our next twenty years.

Calling all 24-hour (PID) party people!

Kathleen Luschek — Tue, 13 Oct 2020 00:00:00 +0000

While we wish we could be together in person to celebrate the fifth PIDapalooza, there’s an upside to moving it online: now everyone can participate in the universe’s best PID party! With 24 hours of non-stop PID programming, you’ll be able to come to the party no matter where you happen to be.

Send us your ideas for #PIDapalooza21

Now is your chance to share your work in the #PIDapalooza21 spotlight! We’re seeking proposals for short, interactive sessions about what you are doing––or want to do––with persistent identifiers and the communities that love and use them.

#PIDapalooza21 will feature sessions around the broad theme of PIDs and Open Research Infrastructure, focusing on the following areas:

Theme 1. PIDs 101

For PID beginners! You’ve got just 30 minutes to get attendees up to speed on a PID or PIDs. Make it fast! Make it fact-filled! Make it fun!

Theme 2. PID Communities International

Have you always wanted to host a Spanish-language PID session, or bring together PID people in the humanities? Tell us how you’d connect with PID peers around the world!

Theme 3. PID Success Stories

There’s nothing better than hearing about what’s working in the PID world––and why! Share your success stories so we can all benefit from them.

Theme 4. PID Party!

It wouldn’t be PIDapalooza without the party sessions, so be creative! Help us make this the best PID party ever!

Propose a session now!

The call for proposals will be open until October 30. Submit your PIDea now!

*Note: The PIDapalooza submission form uses Google. If you are unable to access Google Forms, email your session idea.

Get the full low-down on #PIDapalooza21 at the PIDapalooza website.

EASE Council Post: Rachael Lammey on the Research Nexus

Rachael Lammey — Mon, 12 Oct 2020 00:00:00 +0000

This blog was initially posted on the European Association of Science Editors (EASE) blog: “EASE Council Post: Rachael Lammey on the Research Nexus”. EASE President Duncan Nicholas accurately introduces it as a whole lot of information and insights about metadata and communication standards packed into one post…

I was given a wide brief to decide on the topic of my EASE blog, so I thought I’d write one that tries to encompass everything - I’ll explain what I mean by that.

In the past, Crossref has had the opportunity to talk to EASE members about the importance of registering content whose metadata contains important information related to the article. Richer metadata helps to connect the content to other key information such as who wrote it, who it was funded by, the relevant license, the research it cites, any updates to the work such as corrections and retractions, and the data that underpin the research. The use of open persistent identifiers like DOIs, funder IDs, ORCID iDs and ROR IDs is always recommended.

Such rich and connected metadata also helps discoverability of the published research in a different way than just direct access; if you can find something based on looking at the publications related to a particular funder, author, or institution, then there are more ways to come across what you’re looking for. Making links between objects underpinning the research also helps put the research in context and can help further research by making connections to other valuable information that may have been more difficult to make otherwise.

I’ve mentioned the Research Nexus in the title of this post. It’s achieved by declaring relationships between publications and other associated research objects, and from those objects to related publications. The metadata that reveals relationships between research objects can be as informative as the objects themselves. These relationships can assert certain facts that may not be otherwise obvious: this is our goal with the Research Nexus. These relationships and assertions need not only to exist on the web pages of the outputs, but also to be reflected in a standard way in the metadata so that the information is computer-readable and can be used at scale. As Jennifer Lin, who coined the term, explains:

“Researchers are adopting new tools that create consistency and shareability in their experimental methods. Increasingly, these are viewed as key components in driving reproducibility and replicability. They provide transparency in reporting key methodological and analytical information. They are also used for sharing the artefacts which make up a processing trail for the results: data, material, analytical code, and related software on which the conclusions of the paper rely. Where expert feedback was also shared, such reviews further enrich this record.”

In her Crossref blog, Jennifer goes on to give some examples, including:

Linking to an entire collection of methods and video protocols via Protocols.io
Linking to software and peer reviews in JOSS
Linking to preprint, data, code, source code, peer reviews in Gigascience

I’d include an additional example of linking research to the grant using the grant identifier and associated metadata from the funding section of this PLOS paper (read more about the example from EuroPMC who register grants with Crossref for Wellcome).

These links can be established by adding them into the Crossref relationship metadata schema. The information is then made available to anyone via our open APIs, so that they can easily see and use the information.
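
To give a feel for how such relationships might be consumed once they are available through the open APIs, here is a minimal sketch that flattens the `relation` block of a work record into simple tuples. It assumes the record follows the shape we understand the REST API to use (a mapping from relationship type to a list of targets with `id-type` and `id` keys); the function name and the sample record, including its DOIs, are made up for illustration.

```python
from __future__ import annotations


def list_relations(work: dict) -> list[tuple[str, str, str]]:
    """Flatten a work's 'relation' block into (relation type, id type, id) tuples."""
    rows = []
    # Sort by relationship type so the output order is predictable.
    for rel_type, targets in sorted(work.get("relation", {}).items()):
        for target in targets:
            rows.append((rel_type, target.get("id-type", ""), target.get("id", "")))
    return rows


# Hypothetical article record linked to a dataset and a preprint.
SAMPLE_WORK = {
    "DOI": "10.5555/article",
    "relation": {
        "is-supplemented-by": [
            {"id-type": "doi", "id": "10.5555/dataset", "asserted-by": "subject"}
        ],
        "has-preprint": [
            {"id-type": "doi", "id": "10.5555/preprint", "asserted-by": "subject"}
        ],
    },
}

for rel_type, id_type, target_id in list_relations(SAMPLE_WORK):
    print(f"{SAMPLE_WORK['DOI']} {rel_type} {id_type}:{target_id}")
```

A record without any declared relationships simply yields an empty list, which is the common case for content registered without relationship metadata.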

In all of these, publishers and other parties are linking to associated research outputs to support the reproducibility and discoverability of content.

The reproducibility point is worth reiterating; EASE has always supported projects to maintain high standards around the review of research, publication standards and ethics, and the reduction of research waste. And connecting articles to data, preprints, protocols, and peer reviews, and making the relationships open for analysis will help achieve this.

We also know that there is work and cost involved in establishing these links, and we’re working on ways to lower the barriers to doing so by:

Revisiting what we charge to encourage best practice. Starting in 2020, we have removed fees for registering vital information on corrections, retractions and other Crossmark metadata. This is timely in light of the updates to the EASE Standardised Retraction form.
We’re also working to remove fees for translations and versions that are linked together by the appropriate relationship metadata so that publishers posting translations or different versions of an article don’t have to pay multiple times for these. Our Membership & Fees Committee is currently reviewing other ways we can support publishers keen to make these connections.
Finding ways to make it easier for publishers to collect this information from authors e.g. submission systems integrations with data repositories to collect robust information on article/data links.
Allowing the registration of peer review metadata for content other than journal articles e.g. books, preprints (coming soon).
Making it easier for publishers to register this information with us at Crossref via the provision of simple to use tools, interfaces and reporting.

The outputs of the research process, such as journal articles, don’t exist in isolation - you only have to look at the interest in the corpus of COVID-19 publications, preprints and associated data to see this. This thinking is also supported by campaigns like Metadata 2020, which advocates that “richer, connected, and reusable, open metadata will advance scholarly pursuits for the benefit of society.” The relationships revealed by the Research Nexus may one day help progress research to realise benefits that help us all, providing we all make efforts to effectively support them. More to come…

2020 Board Election

Lucy Ofiesh — Mon, 28 Sep 2020 00:00:00 +0000

This year, Crossref’s Nominating Committee assumed the task of developing a slate of candidates to fill six open board seats. We are grateful that in the midst of a challenging year, we received over 70 expressions of interest from all around the world, a 40% increase from last year’s response. It was an extraordinary pool of applicants and a testament to the strength of our membership community.

There are six seats open for election (two large, four small), and the Nominating Committee is pleased to present the following slate.

The 2020 slate

Candidate organisations, in alphabetical order, for the Small category (four seats available):

Beilstein-Institut, Wendy Patterson
Korean Council of Science Editors, Kihong Kim
OpenEdition, Marin Dacos
Scientific Electronic Library Online (SciELO), Abel Packer
The University of Hong Kong, Jesse Xiao

Candidate organisations, in alphabetical order, for the Large category (two seats available):

AIP Publishing, Jason Wilde
Oxford University Press, James Phillpotts
Taylor & Francis, Liz Allen

Here are the candidates’ organisational and personal statements

You can be part of this important process, by voting in the election

If your organisation is a voting member in good standing of Crossref as of September 14, 2020, you are eligible to vote when voting opens on September 30, 2020.

How can you vote?

On September 30, 2020, your organisation’s designated voting contact will receive an email with the Formal Notice of Meeting and Proxy Form with concise instructions on how to vote. You will also receive a user name and password with a link to our voting platform.

The election results will be announced at the LIVE20 virtual meeting on November 10, 2020.

Open Abstracts: Where are we?

Ludo Waltman — Fri, 25 Sep 2020 00:00:00 +0000

The Initiative for Open Abstracts (I4OA) launched this week. The initiative calls on scholarly publishers to make the abstracts of their publications openly available. More specifically, publishers that work with Crossref to register DOIs for their publications are requested to include abstracts in the metadata they deposit in Crossref. These abstracts will then be made openly available by Crossref. 39 publishers have already agreed to join I4OA and to open their abstracts.

Where are we at the moment in terms of openness of abstracts? For an individual publisher working with Crossref, the percentage of the publisher’s content for which an abstract is available in Crossref can be found in Crossref’s Participation Reports. The chart presented below gives the overall picture (as of September 1, 2020) for medium-sized and large publishers working with Crossref. The vertical axis shows the number of journal articles of a publisher in the period 2018-2020. Because of the large differences between publishers in the number of articles they publish, this axis has a logarithmic scale. The horizontal axis shows the percentage of the articles of a publisher for which an abstract is available in Crossref. The orange dots represent publishers that have agreed to join I4OA. The publishers colored in blue have not yet agreed to join the initiative.

A similar chart was published a few months ago in this blog post on the importance of open abstracts. Comparing the above chart with the one published a few months ago, the first effects of I4OA are already visible. While for most publishers the percentage of abstracts available in Crossref has hardly changed, it has increased from 11% to 95% for the Royal Society, one of the founding publishers of I4OA. This reflects the efforts the Royal Society has made over the past months to improve the availability of abstracts in Crossref for its content, not only for new content but also for existing content. For SAGE, another founding publisher of I4OA, the percentage of abstracts available in Crossref has increased from 38% to 50%. A further increase can be expected to take place in the coming months. The third founding publisher of I4OA, Hindawi, has remained at a stable level, with abstracts being available for 97% of its content.

The above chart shows that many publishers supporting I4OA are already making abstracts available in Crossref. Other publishers do not yet make abstracts available in Crossref but have nevertheless decided to join I4OA. This is the case for Frontiers, PLOS, and Karger, and also for several smaller publishers not visible in the above chart, such as EMBO and Ubiquity Press. These publishers are currently adjusting their workflows and will start submitting abstracts to Crossref soon.

Of the publishers that have not yet joined I4OA, some may not yet be aware of I4OA, while others may need more time to decide whether they will join the initiative. As can be seen in the above chart, most publishers that have not yet joined I4OA do not make abstracts available in Crossref at the moment. However, some publishers have not yet joined I4OA even though they do make abstracts available in Crossref. We hope these publishers will join I4OA soon. By joining the initiative, these publishers would formalize their commitment to openness of abstracts.

None of the publishers in the above chart makes abstracts available in Crossref for 100% of its journal content. Some publishers, such as Copernicus and Hindawi, are close to 100%, but even these publishers have some content for which no abstract is available. Importantly, this does not necessarily mean that publishers have failed to submit abstracts to Crossref for some of their content. Instead, it may simply mean that some of their journal content does not have an abstract. Research articles usually have an abstract, but many other types of content published in journals, such as book reviews, letters, editorials, and corrections, often do not have an abstract. For most publishers, it is therefore impossible to make abstracts available for 100% of their content. Moreover, since Crossref does not distinguish between different types of content published in journals, we cannot provide separate statistics on the availability of abstracts for different types of journal content.

As an example, let’s consider Brill, a publisher that has joined I4OA and that mainly focuses on the humanities and social sciences. Abstracts are available in Crossref for 57% of Brill’s content in the period 2018-2020. This may suggest that Brill has failed to submit abstracts to Crossref for a significant share of its content. However, when we look up journal publications of Brill in 2018 and 2019 in the Web of Science database, abstracts turn out to be available for only 68% of these publications. Assuming that Web of Science has more or less complete coverage of abstracts, this seems to indicate that Brill has already submitted most of its abstracts to Crossref. In fact, Web of Science shows that about a quarter of the publications of Brill are book reviews and that hardly any of these book reviews has an abstract. This illustrates why some publishers, for instance those that publish many book reviews, cannot be expected to get close to 100% availability of abstracts.
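
The reasoning in the Brill example can be made explicit with a small back-of-the-envelope calculation. The sketch below divides the Crossref share by the Web of Science share to estimate what fraction of Brill's existing abstracts has already been deposited. It rests on the same assumption as the text (Web of Science coverage of abstracts is more or less complete) and glosses over the fact that the two percentages refer to slightly different periods, so the result is indicative only. The function name is hypothetical.

```python
# Figures quoted in the text above.
crossref_share = 0.57  # Brill content with an abstract in Crossref, 2018-2020
wos_share = 0.68       # Brill publications with an abstract in Web of Science, 2018-2019


def estimated_deposit_rate(crossref: float, reference: float) -> float:
    """Estimate the share of existing abstracts that has reached Crossref.

    Assumes the reference database has an abstract for every publication
    that actually has one.
    """
    return crossref / reference


rate = estimated_deposit_rate(crossref_share, wos_share)
print(f"Estimated share of Brill's abstracts already in Crossref: {rate:.0%}")
```

Under these assumptions the estimate comes out at roughly 84%, which is consistent with the conclusion that Brill has already submitted most of its abstracts.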

Despite the above caveats, it is clear that there is still a long way to go in improving the availability of abstracts in Crossref. As of September 1, 2020, abstracts were available for 21% of all journal articles in Crossref in the period 2018-2020. In Web of Science (Science Citation Index Expanded, Social Sciences Citation Index, and Arts & Humanities Citation Index), 86% of all journal publications in 2018 and 2019 that have a DOI also have an abstract.
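A rough way to reproduce a coverage figure like the one above is to ask the Crossref REST API for two counts (all journal articles in a period, and those with an abstract) and divide one by the other. The sketch below assumes the public `https://api.crossref.org/works` endpoint and its `type`, `from-pub-date`, `until-pub-date`, and `has-abstract` filters. The helper names are hypothetical, the date range is only an example, and the code builds URLs and reads a response payload without making any network calls.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.crossref.org/works"


def count_url(from_date, until_date, with_abstract_only=False):
    """Build a URL whose response reports only a total (rows=0)."""
    filters = [
        "type:journal-article",
        f"from-pub-date:{from_date}",
        f"until-pub-date:{until_date}",
    ]
    if with_abstract_only:
        # Restrict the count to records that carry an abstract.
        filters.append("has-abstract:true")
    params = {"filter": ",".join(filters), "rows": 0}
    # Keep ':' and ',' unescaped so the filter string stays readable.
    return API_ROOT + "?" + urlencode(params, safe=":,")


def total_results(payload):
    """Read the record count from a decoded JSON response."""
    return payload["message"]["total-results"]


def abstract_share(with_abstract, all_items):
    """Percentage of items that have an abstract."""
    return 100.0 * with_abstract / all_items if all_items else 0.0
```

Fetching `count_url("2018-01-01", "2020-12-31")` and the same URL with `with_abstract_only=True`, then passing the two totals to `abstract_share`, gives a percentage comparable to the one quoted above. The exact value depends on when the query is run.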

Publishers who wish to distribute their abstracts openly through Crossref can include them in the normal content registration process. They can send XML to Crossref (using Crossref’s metadata deposit schema), either directly via HTTPS POST or via the Crossref admin system. For back-content, a resubmission of the full XML is required. In addition, various tools can be used to deposit abstracts. Open Journal Systems (OJS) has a plugin that supports the depositing of abstracts. Metadata Manager also facilitates this, but only for journal articles. Crossref’s web deposit form does not yet support abstracts, but Crossref is working on this.
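For members who generate their deposit XML programmatically, the abstract is carried as a JATS-namespaced element inside the article record. The following is a minimal sketch of building that fragment with Python's standard library. It assumes the JATS namespace URI `http://www.ncbi.nlm.nih.gov/JATS1` and the `jats:abstract` / `jats:p` element names. The helper `build_abstract` is hypothetical, and where the fragment sits in a full deposit should be checked against Crossref's schema documentation.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for JATS elements embedded in a Crossref deposit.
JATS_NS = "http://www.ncbi.nlm.nih.gov/JATS1"
ET.register_namespace("jats", JATS_NS)


def build_abstract(paragraphs):
    """Return a <jats:abstract> element with one <jats:p> per paragraph."""
    abstract = ET.Element(f"{{{JATS_NS}}}abstract")
    for text in paragraphs:
        p = ET.SubElement(abstract, f"{{{JATS_NS}}}p")
        p.text = text
    return abstract


# Serialize the fragment so it can be placed inside an article record.
fragment = ET.tostring(
    build_abstract(["Example abstract text."]), encoding="unicode"
)
```

The resulting string would still need to be embedded in a complete, schema-valid deposit file before being sent to Crossref.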

To keep track of the progress publishers are making in depositing abstracts in Crossref, we plan to publish regular updates of the chart presented above on the I4OA website. We look forward to witnessing the impact of I4OA in the coming months!

Thank you to guest authors Bianca Kramer and Ludo Waltman, as well as the other founding members of I4OA.

Get involved with Peer Review Week 2020 and register your peer reviews with Crossref

Amanda Bartell — Mon, 21 Sep 2020 00:00:00 +0000

Just when you thought 2020 couldn’t go any faster, it’s Peer Review Week again! Peer review is such an important part of the research process, and highlighting the role it plays is key to retaining and reinforcing trust in the publishing process.

As the Peer Review Week team states:

“Maintaining trust in the peer review decision-making process is paramount if we are to solve the world’s most pressing problems. This includes ensuring that the peer review process is transparent (easily discoverable, accessible, and understandable by anyone writing, reviewing, or reading peer-reviewed content) and that everyone involved in the process receives the training and education needed to play their part in making it reliable and trustworthy.”

A key way that publishers can make peer reviews easily discoverable and accessible is by registering them with Crossref - creating a persistent identifier for each review, linking it to the relevant article, and providing rich metadata to show what part this item played in the evolution of the content. It also gives a way to acknowledge the incredible work done by academics in this area.

For Peer Review Week last year, Rosa and Rachael from Crossref created this short video to explain more.

Fast forward to 2020 and over 75k peer reviews have now been registered with us by a range of members including Wiley, PeerJ, eLife, Stichting SciPost, Emerald, IOP Publishing, Publons, The Royal Society, and Copernicus. We encourage all members to register peer reviews with us, and you can keep up to date with everyone who is doing so using this API query. (We recommend installing a JSON viewer for your browser to view these results if you haven’t done so already.)
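The exact query linked above is not reproduced here, but a query of this kind can be sketched as follows. It assumes the public Crossref REST API at `https://api.crossref.org/works`, the `type:peer-review` filter, and the `publisher-name` facet. The function names are hypothetical, and the code builds the URL and parses a decoded response without making any network calls.

```python
from urllib.parse import urlencode


def peer_review_facet_url():
    """URL listing how many peer reviews each publisher has registered."""
    params = {
        "filter": "type:peer-review",   # only peer review records
        "facet": "publisher-name:*",    # count records per publisher
        "rows": 0,                      # facet counts only, no records
    }
    # Keep ':' and '*' unescaped so the query stays readable.
    return "https://api.crossref.org/works?" + urlencode(params, safe=":*")


def publishers_by_count(payload):
    """Return (publisher, count) pairs, largest count first."""
    values = payload["message"]["facets"]["publisher-name"]["values"]
    return sorted(values.items(), key=lambda item: item[1], reverse=True)
```

Opening the URL from `peer_review_facet_url()` in a browser with a JSON viewer shows the raw facet counts. `publishers_by_count` does the same ranking on a response that has already been decoded from JSON.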

Register peer reviews and contribute to the Research Nexus

At Crossref, we talk a lot about the research nexus, and it’s a theme that you’re going to hear a lot more about from us in the coming months and years.

The published article no longer has the supremacy it once did, and other outputs - and inputs - have increasing importance. Linked data and protocols are key for reproducibility, peer reviews increase trust and show the evolution of knowledge, and other research objects help increase the discoverability of content. Registering these objects and stating the relationships between them support the research nexus.

Peer reviews in particular are key to demonstrating that the scholarly record is not fixed - it’s a living entity that moves and changes over time. Registering peer reviews formally integrates these objects into the scholarly record and makes sure the links between the reviews and the article both exist and persist over time. It allows analysis or research on peer reviews and highlights richer discussions than those provided by the article alone, showing how discussion and conversation help to evolve knowledge. In particular, post-publication reviews highlight how the article is no longer the endpoint - after publication, research is further validated (or not!) and new ideas emerge and build on each other. You can see a real-life example of this from F1000 in a blog post written by Jennifer Lin a few years ago.

As we’ve said before:

Article metadata + peer review metadata = a fuller picture of the evolution of knowledge

Registering peer reviews also provides publishing transparency and reviewer accountability, and enables contributors to get credit for their work. If peer review metadata includes ORCID IDs, our ORCID auto-update service means that we can automatically update the author’s ORCID record (with their permission), while our forthcoming schema update will take this even further, making CRediT roles available in our schema.

How to register peer reviews with Crossref

You need to be a member of Crossref to register your peer reviews with us, and at the moment the only way to do so is by sending us your XML files. Unfortunately, our helper tools (the OJS plugin, Metadata Manager, and the web deposit form) don’t yet support peer reviews.

You can find out more about registering peer reviews on our website - we even have a range of markup examples.

We know that there’s a range of outputs from the peer review process, and our schema allows you to identify many of them, including referee reports, decision letters, and author responses. You can include outputs from the initial submission only, or cover all subsequent rounds of revisions, giving a really clear picture of the evolution of the article. Members can even register content for discussions after the article was published, such as post-publication reviews.

Get involved with Peer Review Week 2020

We’re looking forward to seeing the debate sparked by Peer Review Week and hearing from our members about this important area. You can get involved by checking out the Peer Review Week 2020 website or following @PeerRevWeek and the hashtags #PeerRevWk20 #trustinpeerreview on Twitter.

We’re excited to see what examples of the evolution of knowledge will be discoverable in registered and linked peer reviews this time next year!

Crossref at the Frankfurt Digital Book Fair

Rosa Morais Clark — Thu, 17 Sep 2020 00:00:00 +0000

Frankfurt Book Fair (#FBM20) will be online this year since people are really not traveling right now. This special edition of #FBM20 will have an extensive digital program in which we will be participating. So you can hang out with us from anywhere in the world!

Similar to the in-person event of years past, members of our technical support, membership, and outreach teams will be on hand at our online Crossref Cafe.

Here are our Crossref Cafe hours:

Date and time	Support	Membership	Community outreach	Product
Wed 14 Oct 8:00 - 9:00 UTC	Paul	Sally	Vanessa	Bryan
Wed 14 Oct 14:00 - 15:00 UTC	Shayn	Anna	Susan	Sara
Thu 15 Oct 8:00 - 9:00 UTC	Paul	Laura	Vanessa	Martyn
Thu 15 Oct 14:00 - 15:00 UTC	Isaac, Shayn	Anna, Kathleen	Susan	Kirsty
Fri 16 Oct 8:00 - 9:00 UTC	Paul	Amanda	Vanessa, Rachael	Rakesh
Fri 16 Oct 14:00 - 15:00 UTC	Isaac, Shayn	Anna, Kathleen	Susan

Who will be online:

Susan, Vanessa, and Rachael can talk to you about our upcoming events.
Kirsty can talk to you about Crossmark.
Kathleen can explain Similarity Check.
Laura can show you how to use Metadata Manager for Content Registration.
Isaac, Shayn, and Paul can help troubleshoot any metadata, DOI, or reporting needs.
Sara can talk to you about content registration.
Anna will give you a ‘metadata health check’ including a tour of your Participation Report.
Rakesh can talk to you about product design.
Sally and Amanda can answer your questions about membership.
Martyn can talk to you about Cited-by.
Bryan can talk to you about recent updates to our products and services.

We are happy to schedule one-on-one virtual meetings as well.

Please do drop in to say “Guten Tag”. We’re looking forward to seeing you online!

Blog on Crossref

Innovation in scientific publishing and its implications for Crossref DOI registration practices - MetaROR’s approach

DOI registration for articles on the MetaROR platform

Record type for articles on the MetaROR platform

Summary of MetaROR’s approach to Crossref DOI registration

Outlook

A spotlight on our community in Indonesia

Translation in Bahasa Indonesia

Insights from a roundtable on author affiliation metadata

Insights from presenters

Insights into challenges

Inherent data complexity

Author-related issues

Technical barriers

Publisher practices

Insights into solutions

Adopt collective approaches

Engage authors and institutions

Improve the tech

Encourage publisher best practices

Moving forward

References

Participating organizations

The GEM program - Year Three and program expansion for 2026

Overview of the first 3 years of GEM

Program expansion in 2026

Reduction of Grant DOI registration fees: a boost for the Research Nexus

A fee reduction and a two-year fee waiver pilot

Supercharging the Grant Linking System

Why give funders a fee break and not others?

The best way of acknowledging research funding in the metadata: Crossref Grant ID

Highlights of a very busy year: our 2025 annual report

Strategic theme 1: Contribute to an environment where the community identifies and co-creates solutions for broad benefit

Enhanced tools and services

Deprecations and modernisation

Eating our own DOI dogfood

Community connections

Strategic theme 2: A sustainable source of complete, open, and global scholarly metadata and relationships

Schema developments

Public data file

Inaugural Metadata Awards

Metadata Matching project

Data infrastructure and Research Nexus participation dashboard

Retraction Watch integration

Metadata API and services improvements

Strategic theme 3: Manage Crossref openly and sustainably, modernising and making transparent all operations so that we are accountable to the communities that govern us

Infrastructure modernisation

RCFS Projects

Membership growth, efficiencies, and accessibility

Growth by the numbers

Organisational sustainability

Open governance through board election and annual meeting

Strategic theme 4: Foster a strong team—because reliable infrastructure needs committed people who contribute to and realise the vision, and thrive doing it

Team structure

New staff and new roles

Supporting a thriving global culture

Twenty-five years of Crossref: reflections from the 2025 annual meeting and board election

Shared perspectives from the community

Governance and election results

Tools in practice

Crossref then & now

Roadmap highlights

Resourcing Crossref for Future Sustainability (RCFS)

Behind the scenes: metadata, data science

Data science at Crossref

Community highlights

Closing reflections

Wellcome and Europe PMC: supporting Open Research through open metadata

What motivated you to join Crossref?

The way Wellcome implemented the Grant Linking System is a bit unique, given that it partnered with Europe PMC for the technical implementation and metadata registration with Crossref. Can you tell us more about how it works?

How is Wellcome leveraging the funding metadata and Crossref grants IDs that are being shared and registered with Crossref?

Wellcome is streamlining the way of asking grantees to report on their publications, facilitated by Europe PMC. Can you tell us a bit more about how this will work and what role metadata will play?

If you look into the future, what would your hopes be for the GLS and greater transparency in funding metadata in general? What do you think that we could achieve collectively as a community?

What would you say to colleagues in other funders about investing in open metadata?

Some things are big because they are small – the new fee tier for Crossref members takes effect

Version in Español

Algunas cosas son grandes porque son pequeñas: la nueva tarifa para los miembros de Crossref entra en vigencia

It's Time: Planning for Metadata Schema Deprecation

Which schema will be deprecated?

Why deprecate now?

Se pudesse alterar algo no GLS ou na forma como os metadados dos subsídios que regista são utilizados, o que seria?