The Crossref Curriculum

Administrative metadata

Administrative metadata provides information about the origin and maintenance of a research object, including a link for accessing its full text. It also includes the information needed to support the preservation of a research object, such as archiving arrangements. For example, a particular application and operating system may be required in order to access a digital file.

Learn more about the three main types of metadata: descriptive (bibliographic), administrative, and structural.

Access to full-text through a landing page

Include the URL for access to full-text, so that readers can access your content. Learn more about creating a landing page.

Funding information

Add funder information, including the funder’s unique identifier from the Funder Registry, and help build connections between funders and research outputs.

Linking research funding and published outcomes

Funding data is used by funders to track the publications that result from their grants, including use of facilities, equipment, salary awards, and so on.

Publishers can contribute by depositing the funding acknowledgements from their publications as part of their standard metadata. The deposit should include funder names, funder IDs, and associated grant numbers.

Funding data can be searched using our interfaces for people or our APIs for machines. This data clarifies the scholarly record, and makes life easier for researchers who may need to comply with requirements to make their published results publicly available.

How to collect and register funding data

  • Ask authors to submit the names of their funder(s) and grant numbers when they submit their manuscript, or extract funding information from the text of accepted manuscripts
  • Match funder names to their corresponding Funder ID in the Funder Registry
  • Deposit with Crossref funder name(s), ID(s) and grant ID(s) for each DOI.
    • You can register funding data as a stand-alone deposit (useful for backfiles) or as part of your standard metadata deposit (for current content)
  • Make use of our metadata retrieval tools to check the metadata we hold for your publications (and to retrieve metadata for your own analysis)
  • Check your progress using Participation Reports (beta) to see the percentage of your deposits that have funding data (and other key metadata elements) registered.
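The matching step above can be sketched in Python. This is a minimal illustration, not Crossref tooling: `REGISTRY`, `normalize`, and `match_funders` are hypothetical names, and the funder IDs are placeholders rather than real Funder Registry entries. In practice the lookup table would come from the Funder Registry itself.

```python
import re

# Hypothetical local extract of the Funder Registry: normalized name -> funder ID.
# These IDs are placeholders for illustration, not real registry entries.
REGISTRY = {
    "example science foundation": "10.13039/EXAMPLE-1",
    "example health council": "10.13039/EXAMPLE-2",
}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def match_funders(doi, acknowledgements):
    """Pair each (funder name, grant number) with a funder ID, or None if unmatched."""
    records = []
    for funder_name, grant_id in acknowledgements:
        records.append({
            "doi": doi,
            "funder_name": funder_name,
            "funder_id": REGISTRY.get(normalize(funder_name)),
            "grant_id": grant_id,
        })
    return records


records = match_funders(
    "10.5555/12345678",
    [("Example Science Foundation.", "ABC-123"), ("Unknown Trust", "X1")],
)
```

Records whose `funder_id` is `None` are the ones that still need manual matching before deposit.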

License information

Copyright is a type of intellectual property, which allows the copyright owner to protect against others copying or reproducing their work. Copyright arises automatically when a work that qualifies for protection is created. Scholarly communication relies on researchers sharing, adapting, and building on the work of others, so a license (an official permission or permit) is needed in order for copyrighted content to be used in these ways.

Including license information (or access indicators) in your deposit is very helpful in letting readers know how they can access and use your content, for example, in text and data mining. You can include access indicators in metadata deposits.

Examples of licenses

An additional element (<ai:program>) has been added from schema version 4.3.2 to support the access indicators schema (AccessIndicators.xsd).

License information metadata collected includes:

  • free-to-read status (free_to_read)
  • license URL element (license_ref)
  • start_date attribute, optional, date format YYYY-MM-DD
  • applies_to attribute, optional, allowed values are:
    • vor (version of record)
    • am (accepted manuscript)
    • tdm (text mining)

Note that free-to-read is an access indicator, separate from the license. It’s used to show that a work is available at no charge for a limited time, but would normally be behind a paywall.

Access indicators may be included in a metadata deposit, submitted as a resource deposit, or uploaded as a .csv file, and may be included with Crossmark metadata where applicable. The ai namespace must be included in the schema declaration, for example:

<doi_batch version="4.3.6" xmlns="https://www.crossref.org/schema/4.3.6" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://www.crossref.org/schema/4.3.6 https://www.crossref.org/schemas/crossref4.3.6.xsd" xmlns:ai="https://www.crossref.org/AccessIndicators.xsd">
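As a rough sketch of how the fields listed above fit together, the following Python builds an `<ai:program>` element with the standard library. The function name `build_access_indicators` is hypothetical, and the namespace URI is copied from the schema declaration shown above; the validation simply mirrors the allowed values and date format described on this page.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the schema declaration example on this page.
AI_NS = "https://www.crossref.org/AccessIndicators.xsd"
ALLOWED_VERSIONS = {"vor", "am", "tdm"}
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # YYYY-MM-DD


def build_access_indicators(licenses, free_to_read=False):
    """Build an ai:program element from (url, applies_to, start_date) tuples.

    applies_to and start_date are optional and may be None.
    """
    ET.register_namespace("ai", AI_NS)
    program = ET.Element(f"{{{AI_NS}}}program", {"name": "AccessIndicators"})
    if free_to_read:
        ET.SubElement(program, f"{{{AI_NS}}}free_to_read")
    for url, applies_to, start_date in licenses:
        attrs = {}
        if applies_to is not None:
            if applies_to not in ALLOWED_VERSIONS:
                raise ValueError(f"unsupported applies_to value: {applies_to!r}")
            attrs["applies_to"] = applies_to
        if start_date is not None:
            if not DATE_RE.match(start_date):
                raise ValueError(f"start_date must be YYYY-MM-DD: {start_date!r}")
            attrs["start_date"] = start_date
        ref = ET.SubElement(program, f"{{{AI_NS}}}license_ref", attrs)
        ref.text = url
    return program


program = build_access_indicators(
    [("https://www.crossref.org/license", "vor", "2011-02-11")],
    free_to_read=True,
)
```

Calling `ET.tostring(program, encoding="unicode")` serializes the element for inclusion in a deposit file.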

Best practice for license information

This guidance helps members register better license metadata with Crossref. Better license metadata helps academic institutions identify content written by their researchers and determine how this content may be used, particularly in an automated, machine-readable way.

Institutions need to know which article version may be exposed on an open repository, and from what date. It is no longer sufficient simply to describe in words how they may calculate the embargo end date, for example, by referring them to a general set of terms and conditions that apply to all of your content across its whole lifecycle. They need to know whether this version of this article can be exposed on their repository and, if so, from what specific date, and what repository readers can then do with the content they find there.

The Crossref schema contains all the fields you need to specify this unambiguously. By doing so, you can also be more confident that institutions will have the information they need to respect your terms and conditions.

How Crossref collects license information

A single Crossref DOI can be associated with metadata relating to multiple versions of a work: the author’s accepted manuscript (AAM), the version of record (VoR), or a version intended for text and data mining (TDM). Each of these versions can have its own license conditions. To reflect this, works in Crossref can have multiple license elements. Each license element can contain a URL to a license, the article version to which the license applies, and the license start date. Together, these can describe nuanced license terms across different versions of the work.

An analysis of Crossref metadata by Jisc found that while 48% of journal articles published in 2017 had license information, the licenses most often referred to the text and data mining version of the work, and licenses were still being used inconsistently for the version of record (VoR) or accepted manuscript (AM).

A major concern is that many members link to their general terms and conditions rather than to licenses that apply at specific times to specific versions of a work. For example, a member may set its policies out in a general terms and conditions page, and link to it in the license metadata:

<license_ref applies_to="vor" start_date="2019-01-01">
    http://www.publisherwebsite.com/general_terms_and_conditions
</license_ref>

On the terms and conditions page, the member could spell out, for example, the license that applies to the VoR, the restrictions that apply to the AAM during its embargo period, and details of how the AAM may be used after its embargo period. A repository manager would then have to go through the terms and conditions, and manually calculate the embargo end date, in order to determine whether the work could be deposited to a repository. This is a prohibitively onerous process for institutions, and risks content being used outside the terms of member policies because of human error.

It would be helpful if members could instead set out specific licenses for each stage in each article’s lifecycle, for each of its versions. If the licensing terms for a version will change (for example, because it may be exposed on a repository after an embargo period), then a separate license should be used, with the start_date attribute indicating when the new license comes into effect. Using start dates for this license information is best practice in general, as it can validate immediate open access, which is at the heart of many institutional and funder policies. This is set out in more detail in the examples below.

Example: Green OA with Creative Commons license

In this example, a work is published on 1 January 2019. Under the member’s policy, the VoR is under access controls. The AAM is under embargo for a six-month period and then becomes open access under a CC BY-NC-ND license.

Green OA with Creative Commons license

By using a Creative Commons license with a start date, the embargo end date can be unambiguously deduced from the metadata.
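To make the date arithmetic concrete, here is a minimal Python sketch that derives the embargo end date for the scenario above (published 1 January 2019, six-month embargo) and formats a post-embargo license element for the accepted manuscript. The helper names are hypothetical, and the sketch covers only the post-embargo license; any licenses for the VoR or for the embargo period itself would be separate license_ref elements.

```python
import calendar
from datetime import date

CC_BY_NC_ND = "https://creativecommons.org/licenses/by-nc-nd/4.0/"


def add_months(start, months):
    """Shift a date forward by whole months, clamping the day to the month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def post_embargo_license_ref(published, embargo_months, license_url):
    """Format the license_ref that takes effect when the embargo ends."""
    embargo_end = add_months(published, embargo_months)
    return (
        f'<ai:license_ref applies_to="am" start_date="{embargo_end.isoformat()}">'
        f"{license_url}</ai:license_ref>"
    )


license_ref = post_embargo_license_ref(date(2019, 1, 1), 6, CC_BY_NC_ND)
```

With these inputs the start date comes out as 2019-07-01, which is the date a repository could read directly from the metadata.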

Example: Green OA with member-defined post-embargo license

Linking to a Creative Commons license is optimal whenever possible, as this is an unambiguously open license and so will be readily recognizable as identifying the post-embargo period. It is also a standard license which makes it more easily machine-readable. However, if you need to define your own open license, you can instead link to that in the metadata along with the appropriate start date.

Green OA with member-defined post-embargo license

Repository managers will still be able to unambiguously distinguish works that can be made available after an embargo period, albeit involving a brief manual check, provided the license identifies itself explicitly as referring specifically to the post-embargo period. It would not be suitable to provide a single URL containing license terms for both the pre-embargo and post-embargo period, for example:

<license_ref applies_to="am" start_date="2019-01-01">
    http://www.publisherwebsite.com/am_general_terms
</license_ref>

This would not allow institutions to unambiguously determine the embargo end date and license, and so should be avoided.

Example: Gold OA

In the case of gold OA, the licenses are simple: both the AAM and the VoR have an open license (in this example, CC BY) that starts no later than the date of publication. The start date could optionally be omitted entirely, since the license terms will apply for the article’s lifetime.

Gold OA license

Use cases

Having clear, unambiguous license metadata helps institutions use the content within your terms and conditions. For example, an institution could use Crossref to find works published by researchers at their organisation (provided you have also populated the affiliations of all the (co-)authors), and check programmatically for the presence and with-effect dates of any open license(s). This would show whether (and if so when) the work can be exposed on their repository.
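A programmatic check of this kind might look like the following Python sketch, which reads an `<ai:program>` fragment and reports when a given version becomes available under an open license. Several things here are assumptions: the function names are hypothetical, "open" is approximated as any Creative Commons license URL, a missing start_date is treated as open from the outset, and the namespace URI is the one shown in the schema declaration earlier on this page.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace URI as shown in the schema declaration example on this page.
AI = "{https://www.crossref.org/AccessIndicators.xsd}"

# Assumption: any Creative Commons license URL counts as an open license.
OPEN_PREFIXES = (
    "https://creativecommons.org/licenses/",
    "http://creativecommons.org/licenses/",
)


def open_from(program_xml, version):
    """Return the earliest start date of an open license for a version, or None."""
    root = ET.fromstring(program_xml)
    dates = []
    for ref in root.iter(AI + "license_ref"):
        url = (ref.text or "").strip()
        if ref.get("applies_to") != version or not url.startswith(OPEN_PREFIXES):
            continue
        start = ref.get("start_date")
        # Assumption: no start_date means the license applies from the outset.
        dates.append(date.fromisoformat(start) if start else date.min)
    return min(dates) if dates else None


def can_expose(program_xml, version, on):
    """True if the version is under an open license on the given date."""
    start = open_from(program_xml, version)
    return start is not None and start <= on


sample = (
    '<ai:program xmlns:ai="https://www.crossref.org/AccessIndicators.xsd" '
    'name="AccessIndicators">'
    '<ai:license_ref applies_to="vor" start_date="2019-01-01">'
    "https://www.example.com/vor-terms</ai:license_ref>"
    '<ai:license_ref applies_to="am" start_date="2019-07-01">'
    "https://creativecommons.org/licenses/by-nc-nd/4.0/</ai:license_ref>"
    "</ai:program>"
)
am_open_date = open_from(sample, "am")
```

A member-defined open license would need its URL added to `OPEN_PREFIXES`, which is one reason a standard license is easier for institutions to process.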

How to populate your Crossref metadata with license information

There are multiple ways that members can add license information to the metadata they deposit/have deposited with Crossref:

Example license information as part of a metadata deposit

<publication_date media_type="print">
  <year>2013</year>
</publication_date>
<pages>
 <first_page>13</first_page>
</pages>
<ai:program name="AccessIndicators">
 <ai:free_to_read start_date="2011-02-11"/>
 <ai:license_ref applies_to="vor" start_date="2011-02-11">https://www.crossref.org/license</ai:license_ref>
</ai:program>
<doi_data>
 <doi>10.5555/openAI_test2</doi>
 <resource>https://www.crossref.org/test</resource>
</doi_data>

Example license information as part of a resource deposit

<body>
   <!-- license updates with dates / free to read info included -->
   <lic_ref_data>
      <doi>10.5555/pubdate1</doi>
      <ai:program name="AccessIndicators">
         <ai:free_to_read/>
         <ai:license_ref applies_to="vor" start_date="2011-01-11">https://www.crossref.org/vor-license</ai:license_ref>
         <ai:license_ref applies_to="am" start_date="2012-01-11">https://www.crossref.org/am-license</ai:license_ref>
         <ai:license_ref applies_to="tdm" start_date="2012-01-11">https://www.crossref.org/tdm-license</ai:license_ref>
      </ai:program>
   </lic_ref_data>
   <!-- license updates with just license URL included -->
   <lic_ref_data>
      <doi>10.5555/pubdate1</doi>
      <ai:program name="AccessIndicators">
         <ai:free_to_read/>
         <ai:license_ref>https://www.crossref.org/vor-license</ai:license_ref>
      </ai:program>
   </lic_ref_data>
</body>

Article numbers or IDs

Journal articles and other scholarly works often have an ID such as an article number, eLocator, or e-location ID instead of a page number. In these cases, do not use the <first_page> tag to capture the ID. Instead, use the <item_number> tag with the item_number_type attribute value set to article_number.

Example article number or ID

<publication_date media_type="online">
   <month>5</month>
   <day>10</day>
   <year>2017</year>
</publication_date>
<publisher_item>
   <item_number item_number_type="article_number">3D9324F1-16B1-11D7-8645000102C</item_number>
</publisher_item>
<crossmark>

Internal and other identifiers

You can include identifiers that are not explicitly defined in our deposit schema section within the optional <publisher_item> section. <publisher_item> is also used to capture article or e-location IDs. This option should only be used for identifiers that identify the item being registered. Use relationships to capture identifiers for related items.

Examples of identifier types include:

  • PII
  • SICI
  • DOI
  • DAI
  • Z39.23
  • ISO-std-ref
  • std-designation
  • report-number
  • other

Example of an identifier

<publisher_item>
   <identifier id_type="pii">s00022098195001808</identifier>
</publisher_item>
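The two uses of `<publisher_item>` described above (article numbers and other identifiers) can be combined in one element. The Python sketch below is illustrative only: `build_publisher_item` is a hypothetical helper, the lowercase spellings of the first four identifier types are an assumption based on the `pii` example above, and placing `<item_number>` before `<identifier>` is also an assumption that should be checked against the deposit schema.

```python
import xml.etree.ElementTree as ET

# Assumption: lowercase spellings for the first four types, following the "pii" example.
ID_TYPES = {
    "pii", "sici", "doi", "dai", "Z39.23",
    "ISO-std-ref", "std-designation", "report-number", "other",
}


def build_publisher_item(article_number=None, identifiers=()):
    """Build a publisher_item element from an article number and (id_type, value) pairs."""
    item = ET.Element("publisher_item")
    if article_number is not None:
        number = ET.SubElement(
            item, "item_number", {"item_number_type": "article_number"}
        )
        number.text = article_number
    for id_type, value in identifiers:
        if id_type not in ID_TYPES:
            raise ValueError(f"unsupported id_type: {id_type!r}")
        identifier = ET.SubElement(item, "identifier", {"id_type": id_type})
        identifier.text = value
    return item


item = build_publisher_item(
    article_number="e01234",  # hypothetical article number
    identifiers=[("pii", "s00022098195001808")],
)
xml_text = ET.tostring(item, encoding="unicode")
```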

Publication IDs

Every publication in the Crossref system is assigned a unique publication ID. These are used mostly for internal purposes, but may be useful when retrieving data in bulk or identifying a specific title. Publication IDs may be retrieved using OAI-PMH, or from the browsable title list.

Find publication IDs using an OAI-PMH request

An OAI-PMH ListSets request will return titles and publication IDs for journals, books, conference proceedings, and series-level data.

J (journal) is the default set. To retrieve book or conference proceeding titles, set=B must be specified, and set=S for series-level titles. Sets may be further limited by member prefix. Learn more about OAI-PMH.

The publication ID is listed within the <setSpec> element, after the set and member prefix. For example, within the following set, 24 is the publication ID for Journal of Clinical Psychology:

<set>
   <setSpec>J:10.1002:24</setSpec>
   <setName>Journal of Clinical Psychology</setName>
</set>
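Splitting a setSpec value into its parts is straightforward. The following Python sketch assumes the colon-separated layout described above (set, then member prefix, then publication ID); `SetSpec` and `parse_set_spec` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SetSpec(NamedTuple):
    pub_type: str                  # J, B, or S
    prefix: Optional[str]          # member prefix, if present
    publication_id: Optional[str]  # publication ID, if present


def parse_set_spec(set_spec):
    """Split a setSpec such as 'J:10.1002:24' into type, prefix, and publication ID."""
    parts = set_spec.strip().split(":")
    if not 1 <= len(parts) <= 3 or parts[0] not in {"J", "B", "S"}:
        raise ValueError(f"unexpected setSpec: {set_spec!r}")
    parts += [None] * (3 - len(parts))  # pad missing trailing components
    return SetSpec(*parts)


journal = parse_set_spec("J:10.1002:24")
```

Here `journal.publication_id` is `"24"`, matching the Journal of Clinical Psychology example.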

Find publication IDs using the browsable title list

The browsable title list includes the publication ID next to each title in the search results. Select the purple ID icon to reveal the ID. For most purposes, publication IDs are preceded by the publication type (J, B, or S for journal, book, or series).

Archive locations

Digital preservation is a combination of policies, strategies, and actions that ensure persistent access to digital content over time. It includes archiving arrangements. The Digital Preservation Coalition’s Digital Preservation Handbook gives a good introduction to practicalities and best practices in archiving arrangements.

Under the Crossref member obligations, you are asked to make best efforts to have your content archived by an archiving organization, and you are encouraged to include information about your designated archive in your metadata. This helps us work with archives to ensure your DOIs continue to resolve to your content, even if your organization ceases to operate.

The archives listed in our deposit schema section are:

Another archiving service is PKP Preservation Network (PKP PN).

Please contact us if you have archiving arrangements with an organization that is not listed.

To include archiving metadata, insert the relevant archive information into your metadata above the doi_data section, for example:

<archive_locations>
    <archive name="CLOCKSS"/>
    <archive name="Internet Archive"/>
    <archive name="Portico"/>
    <archive name="KB"/>
</archive_locations>
<doi_data>
    <doi>10.32013/12345678</doi>
    <resource>https://www.crossref.org/xml-samples/</resource>
</doi_data>
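If records are assembled programmatically, the placement rule above (archive information goes above doi_data) can be handled as in this Python sketch. It assumes an un-namespaced fragment for simplicity, so tag names would need the schema namespace in a real deposit file. `add_archive_locations` and the `journal_article` wrapper are illustrative choices.

```python
import xml.etree.ElementTree as ET


def add_archive_locations(record, archive_names):
    """Insert an archive_locations element immediately before the record's doi_data child."""
    positions = [i for i, child in enumerate(record) if child.tag == "doi_data"]
    if not positions:
        raise ValueError("record has no doi_data child")
    locations = ET.Element("archive_locations")
    for name in archive_names:
        ET.SubElement(locations, "archive", {"name": name})
    record.insert(positions[0], locations)
    return record


record = ET.fromstring(
    "<journal_article>"
    "<doi_data>"
    "<doi>10.32013/12345678</doi>"
    "<resource>https://www.crossref.org/xml-samples/</resource>"
    "</doi_data>"
    "</journal_article>"
)
add_archive_locations(record, ["CLOCKSS", "Portico"])
```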

Update or retraction information

Learn more about flagging content that has been updated, corrected, or retracted using Crossmark.

Last Updated: 2020 April 8 by Laura J. Wilkinson