Text and Data Mining

Latest blog posts

2026 July 09

Schema 5.5 now available: adding CRediT, new record types for blogs and posters, and more

Research is rarely limited to a single contributor performing a single role. Behind every research output are people contributing in various ways: software development, data analyses, methodology design, and much more. Often, the same person contributes in several of these ways. Until now, Crossref metadata could only capture part of that picture, but this is changing with Schema 5.5.

2026 July 02

Take part in UX Research at Crossref

Through user experience research (UXR) initiatives that take into account our diverse membership and community, we can have a continuous, deeper understanding of the role of metadata in our members’ workflows, and ensure that our work continues to meet our community’s needs. Your support is the key to this process, and will positively impact the wider community - and if you’d like to start today, you can take part in our latest initiative: help us improve our Events page by sharing your thoughts on the page’s feedback form.

2026 June 30

Building, refining, and connecting: summary of our May 2026 community update

Our 2026 Community Update took place on 13 May. Two calls, one for the eastern and one for the western time zone, highlighted how our global community is growing, how we’re refining the metadata that supports trust in the scholarly record, and connecting records more effectively through our latest tools.

2026 June 26

From commitment to connection: 200,000 grants in the scholarly record

Funding is one of the key enablers of the research lifecycle, but has been one of the hardest parts of the scholarly record to identify, describe and connect. This is slowly changing as we have recently reached a very exciting milestone for Crossref’s Grant Linking System (GLS). What makes it remarkable is not only the numbers reached, but where the data comes from. Research funders, who joined Crossref as members, have actively contributed more than 200,000 grants to the Research Nexus (Figure 1).

Evolving our support for text-and-data mining

Many researchers want to carry out analysis and extraction of information from large sets of data, such as journal articles and other scholarly content. Methods such as screen-scraping are error-prone, place too much strain on content sites and may be unrepeatable or break if site layouts change. Providing researchers with automated access to the full-text content via DOIs and Crossref metadata reduces these problems, allowing for easy deduplication and reproducibility. Supporting text and data mining echoes our mission to make research outputs easy to find, cite, link, assess, and reuse.
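As a rough illustration of the metadata-driven approach described above, the sketch below pulls full-text links out of work records and deduplicates them by DOI. It assumes records shaped like those returned by the Crossref REST API, with a `DOI` field and a `link` list whose entries carry `URL` and `intended-application`. The sample records, the `example.org` URLs, and the helper names `text_mining_links` and `collect_by_doi` are made up for illustration.

```python
from typing import Any


def text_mining_links(work: dict[str, Any]) -> list[str]:
    """Return the full-text URLs that one work record flags for text mining."""
    urls: list[str] = []
    for link in work.get("link", []):
        # Assumption: entries meant for mining are marked "text-mining".
        if link.get("intended-application") == "text-mining":
            url = link.get("URL")
            if url and url not in urls:
                urls.append(url)
    return urls


def collect_by_doi(works: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each DOI to its text-mining links, keeping the first record per DOI."""
    by_doi: dict[str, list[str]] = {}
    for work in works:
        # DOIs are case-insensitive, so normalise before comparing.
        doi = work.get("DOI", "").lower()
        if doi and doi not in by_doi:
            by_doi[doi] = text_mining_links(work)
    return by_doi


# Hypothetical records; real ones would come from a metadata query.
SAMPLE_WORKS = [
    {
        "DOI": "10.5555/EXAMPLE.1",
        "link": [
            {"URL": "https://example.org/fulltext/1.xml",
             "content-type": "application/xml",
             "intended-application": "text-mining"},
            {"URL": "https://example.org/fulltext/1.pdf",
             "content-type": "application/pdf",
             "intended-application": "similarity-checking"},
        ],
    },
    # Same DOI in different case: treated as a duplicate and skipped.
    {"DOI": "10.5555/example.1", "link": []},
    # A record with no full-text links at all.
    {"DOI": "10.5555/example.2"},
]

if __name__ == "__main__":
    for doi, urls in collect_by_doi(SAMPLE_WORKS).items():
        print(doi, urls)
```

Keying the result on the normalised DOI is what makes a run repeatable: the same set of DOIs yields the same set of links, regardless of how a publisher's site layout changes.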

OTMI - An Update

We’ve just posted an update about OTMI (the Open Text Mining Interface) on our Web Publishing blog Nascent. This post details the following changes:

- Contact email - otmi@nature.com
- Wiki - http://opentextmining.org/
- Repository - https://web.archive.org/web/20090706181310/http://www.nature.com/otmi/journals.opml

The OTMI content repository currently provides two years’ worth of full text across five of our titles:

- Nature
- Nature Genetics
- Nature Reviews Drug Discovery
- Nature Structural & Molecular Biology
- The Pharmacogenomics Journal

See the wiki for draft technical specs and for a sample script to generate the OTMI files. And feel free to add to the wiki on existing pages or create new pages as required.
