
Metadata principles and practices

When you register your content with us, you create a metadata record for a digital object. The metadata within that record becomes an enduring, widely distributed connection to the research nexus.

Beyond basic bibliographic metadata, our requirements are minimal. We’d like to require everything, but we don’t because:

  • Not all metadata fields are relevant. For example, not all journals have volumes and issues, and not all articles have funding.
  • Our members are not always able to send us everything, and having some metadata is better than having no metadata. For example, it’s better to have an identifier attached to basic bibliographic information than for there to be no identifier at all.
  • Some metadata are hard to come by. For example, digitized back issues may not have good reference lists available.

However, we hope all members will follow our metadata best practices rather than just meeting the basic requirements. This will ensure that the records and identifiers you register with us are discoverable and connected.

Principles (modeled on Metadata 20/20 principles)

Metadata 20/20 has a set of basic principles that can be applied to our metadata to ensure that it is Compatible, Complete, Credible and Curated.

Principles are aspirational: they help us define what we hope to accomplish with our metadata. So while we don’t meet all of the principles completely, they can still guide us as we move forward. Let’s take a look at the Metadata 20/20 principles one by one.

COMPATIBLE: provide a guide to content for machines and people

So, metadata must be as open, interoperable, parsable, machine-actionable, and human-readable as possible.

How are we compatible?

  • The metadata provided to Crossref is made freely and openly available through our APIs.
  • Crossref metadata is provided in both JSON and XML formats. Our JSON and ‘UNIXSD’ XML formats are comprehensive and contain all metadata registered with us (with the exception of the reference data that certain members have elected not to share).
  • We also provide limited metadata tailored for specific purposes via content negotiation (BibTeX, RIS, RDF).
  • We try to make use of vocabularies and identifiers as much as possible, and allow free text only when there is no other option.
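To make the access routes above concrete, here is a minimal Python sketch that builds (but does not send) two kinds of request: one to doi.org using an `Accept` header for content negotiation, and one to the REST API `works` route for the full JSON record. The MIME types in `FORMATS` and the helper names are assumptions chosen for illustration, not an official client.

```python
import urllib.parse
import urllib.request

# Accept headers for a few content-negotiation formats (assumed MIME types;
# check the content negotiation documentation for the current list).
FORMATS = {
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
    "rdf-xml": "application/rdf+xml",
    "turtle": "text/turtle",
}


def content_negotiation_request(doi: str, fmt: str) -> urllib.request.Request:
    """Build a request asking doi.org for a DOI's metadata in format `fmt`."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def rest_api_request(doi: str) -> urllib.request.Request:
    """Build a request for the full JSON record from the REST API works route."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/json"})
```

Sending either request is a matter of passing it to `urllib.request.urlopen(...)`; for example, `urlopen(content_negotiation_request("10.5555/12345678", "bibtex")).read().decode()` would fetch a BibTeX entry for Crossref's test DOI. urllib keeps the `Accept` header when it follows the doi.org redirect.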

What more can we do?

  • Provide a JSON schema to make REST API outputs easier to ingest.
  • Adopt and support existing and new standards that define the metadata we collect.
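A published schema would let consumers check REST API output mechanically before ingesting it. The sketch below approximates that idea with a hand-rolled type check in Python rather than a real JSON Schema validator; the two schema dictionaries are hypothetical and cover only a handful of fields from the response envelope (`status`, `message-type`, `message`) and a work record (`DOI`, `type`, `title`).

```python
from typing import Any

# Hypothetical, minimal "schemas": expected keys mapped to Python types.
ENVELOPE_SCHEMA: dict[str, type] = {"status": str, "message-type": str, "message": dict}
WORK_SCHEMA: dict[str, type] = {"DOI": str, "type": str, "title": list}


def validate(obj: dict[str, Any], schema: dict[str, type], path: str = "") -> list[str]:
    """Return a list of problems: missing keys or values of the wrong type."""
    errors = []
    for key, expected in schema.items():
        if key not in obj:
            errors.append(f"{path}{key}: missing")
        elif not isinstance(obj[key], expected):
            errors.append(
                f"{path}{key}: expected {expected.__name__}, got {type(obj[key]).__name__}"
            )
    return errors


def validate_work_response(resp: dict[str, Any]) -> list[str]:
    """Check the response envelope, then the work record inside `message`."""
    errors = validate(resp, ENVELOPE_SCHEMA)
    if isinstance(resp.get("message"), dict):
        errors += validate(resp["message"], WORK_SCHEMA, "message.")
    return errors
```

An empty list means the record passed these few checks; a formal JSON Schema would cover far more, including nested objects and allowed values.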

COMPLETE: reflect the content, components and relationships as published

So, metadata must be as complete and comprehensive as possible.

How are we complete?

We aim to collect all metadata that is relevant to describing and using the scholarly content registered with us, and work to make it possible for members to send this metadata to us.

What more can we do?

A lot. This is our biggest challenge, and we need to:

  • Make it easy for members to send metadata to us.
  • Make it easy for members to assess the metadata they have sent to us.
  • Evolve our schema (or evolve beyond an XML schema) to support new types of content and metadata segments quickly.
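As an illustration of what "assessing the metadata you have sent" could look like, here is a minimal Python sketch that takes the `message` object of a REST API work record and reports which optional elements are present. The field names (`reference`, `funder`, `license`, `abstract`, `author`, `ORCID`) appear in REST API output, but the checklist itself and the all-authors ORCID rule are assumptions for this example, not Crossref's definition of completeness.

```python
from typing import Any

# Optional elements to look for: an illustrative checklist only.
CHECKS = ("reference", "funder", "license", "abstract")


def completeness(work: dict[str, Any]) -> dict[str, bool]:
    """Report which optional elements a REST API work record contains."""
    # An empty list or missing key both count as "not present".
    report = {key: bool(work.get(key)) for key in CHECKS}
    authors = work.get("author", [])
    # Treat ORCID iDs as present only if every listed author has one.
    report["orcid"] = bool(authors) and all("ORCID" in a for a in authors)
    return report


def score(report: dict[str, bool]) -> float:
    """Fraction of checks that passed."""
    return sum(report.values()) / len(report)
```

In practice the `work` argument would be `json.load(response)["message"]` for a response from the REST API `works` route.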

CREDIBLE: enable content discoverability and longevity

So, metadata must be of clear provenance, trustworthy and accurate.

How are we credible?

Our metadata is provided to us by our members, and we don’t curate or clean it up in any way. We do insert some metadata into our outputs, such as DOI matches for citations and recursive relationships, and we clearly flag those pieces as inserted by Crossref.

This means that, good or bad, metadata accuracy depends on the quality of the metadata our members provide.
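To show how that flagging can be used downstream, here is a minimal Python sketch that counts a work's reference DOIs by who supplied them. It assumes that, in REST API output, a reference object with a DOI carries a `doi-asserted-by` value such as `publisher` (supplied by the member) or `crossref` (matched by us); please verify the field name and values against a live record before relying on them.

```python
from collections import Counter
from typing import Any


def doi_provenance(work: dict[str, Any]) -> Counter:
    """Count reference DOIs by who asserted them (assumed `doi-asserted-by` field)."""
    counts: Counter = Counter()
    for ref in work.get("reference", []):
        if "DOI" in ref:
            # "publisher" = supplied by the member, "crossref" = matched by Crossref.
            counts[ref.get("doi-asserted-by", "unknown")] += 1
        else:
            # Reference deposited without a DOI and not matched to one.
            counts["no-doi"] += 1
    return counts
```

A high share of `no-doi` references may point to citations that could not be matched, for example because the deposited reference text was incomplete.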

What more can we do?

We can:

  • Facilitate reporting and correction of metadata errors identified by metadata users.
  • Create tools to help members assess their metadata quality.

CURATED: reflect updates and new elements

So, metadata must be maintained over time.

How are we curated?

An important obligation for our members is to keep metadata up to date. For some, this may mean periodically updating registered URLs; for others, it may mean ensuring that license and Crossmark data is current.

What more can we do?

  • Assess and report URLs that are broken.
  • Provide tools to allow members to assess their license metadata.
  • Make sure that DOIs that move from member to member are maintained.
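A license-assessment tool could start with simple checks like the ones below. This Python sketch reads the `license` array of a REST API work record, where each entry typically has `URL`, `start`, `content-version` and `delay-in-days`. The specific rules (every license needs a URL and start date, and at least one should apply to the version of record, `vor`) are assumptions chosen for illustration.

```python
from typing import Any


def license_issues(work: dict[str, Any]) -> list[str]:
    """List simple problems with a work's license metadata (illustrative rules)."""
    licenses = work.get("license", [])
    if not licenses:
        return ["no license metadata"]
    issues = []
    for i, lic in enumerate(licenses):
        if not lic.get("URL"):
            issues.append(f"license {i}: missing URL")
        if "start" not in lic:
            issues.append(f"license {i}: missing start date")
    # Assumed rule: at least one license should cover the version of record.
    if not any(lic.get("content-version") == "vor" for lic in licenses):
        issues.append("no license for the version of record (vor)")
    return issues
```

Running this over a member's records and grouping the messages would give a quick view of where license metadata is missing or incomplete.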

Page owner: Patricia Feeney   |   Last updated 2021-October-22