Metadata Plus snapshots

Metadata Plus snapshots provide access to our 101,886,768 metadata records in a single file, an easy way to retrieve an up-to-date copy of our records. Snapshots are available to Metadata Plus service users.

The files are made available via a /snapshots route in the REST API, which offers a compressed .tar file (tar.gz) containing the full extract of the metadata corpus in either JSON or XML format.
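Once an archive has been downloaded, it can be inspected without unpacking it fully. The sketch below uses Python's standard tarfile module to list the files inside a .tar.gz archive in streaming mode. The internal layout of the snapshot (file names, number of files) is not described here, so the function name and the idea of listing members first are illustrative assumptions only.

```python
import tarfile
from typing import Iterator, Tuple


def iter_snapshot_members(archive_path: str) -> Iterator[Tuple[str, int]]:
    """Yield (name, size_in_bytes) for each regular file in a .tar.gz archive.

    The archive is opened in streaming mode ("r|gz"), so it is read
    sequentially and is never loaded into memory as a whole.
    """
    with tarfile.open(archive_path, mode="r|gz") as archive:
        for member in archive:
            # Skip directories, links and other non-file entries.
            if member.isfile():
                yield member.name, member.size
```

Streaming mode suits very large archives because it only moves forward through the file; the trade-off is that members must be processed in the order they appear.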

How to access snapshots

A new snapshot is created each month and is available by the 5th day of the month. It contains all records up to and including the previous month.

If you’re looking for the most up-to-date snapshot (all records up to and including the previous month), you can use the following URLs which will always alias to the current month:

If you want to check whether a particular snapshot is available, you can send an HTTP HEAD request using the following URL patterns:

  • JSON output: https://api.crossref.org/snapshots/monthly/{YYYY/MM}/all.json.tar.gz
  • XML output: https://api.crossref.org/snapshots/monthly/{YYYY/MM}/all.xml.tar.gz

Please note that XML snapshots are available in UNIXSD format only.

As snapshots are available to Metadata Plus users only, you will need to identify yourself in the request by using a “Crossref-Plus-API-Token” HTTP header with your access token. The example below shows how this should be formatted, with XXX replaced by your token:

Crossref-Plus-API-Token: Bearer XXX
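As a rough illustration, the URL patterns and the token header can be combined into a HEAD request with Python's standard library. The helper names snapshot_url and head_request are hypothetical, and the sketch assumes that {YYYY/MM} means a four-digit year followed by a zero-padded two-digit month.

```python
import urllib.request

# URL pattern from this page; zero-padding of the month is an assumption.
SNAPSHOT_URL = (
    "https://api.crossref.org/snapshots/monthly/"
    "{year:04d}/{month:02d}/all.{fmt}.tar.gz"
)


def snapshot_url(year: int, month: int, fmt: str = "json") -> str:
    """Build the URL of the monthly snapshot in JSON or XML format."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return SNAPSHOT_URL.format(year=year, month=month, fmt=fmt)


def head_request(url: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a HEAD request carrying the Plus token."""
    return urllib.request.Request(
        url,
        method="HEAD",
        headers={"Crossref-Plus-API-Token": f"Bearer {token}"},
    )
```

Passing the prepared request to urllib.request.urlopen sends it; no file content is transferred for a HEAD request, so this is a cheap way to check whether a given month exists before starting a download.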

The files are very large (over 42 GB), so they may take a while to download, depending on the speed of your internet connection.
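Because of the file size, it is safer to write the response to disk in chunks than to read it into memory. The following is a minimal sketch using only the Python standard library; copy_stream, download_snapshot and the 1 MiB chunk size are illustrative choices, not part of the service.

```python
import shutil
import urllib.request
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # copy 1 MiB at a time (arbitrary choice)


def copy_stream(source: BinaryIO, dest_path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy a binary stream to dest_path in chunks; return the bytes written."""
    with open(dest_path, "wb") as dest:
        shutil.copyfileobj(source, dest, chunk_size)
        return dest.tell()


def download_snapshot(url: str, token: str, dest_path: str) -> int:
    """Download a snapshot to dest_path, sending the Plus token header."""
    request = urllib.request.Request(
        url,
        headers={"Crossref-Plus-API-Token": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return copy_stream(response, dest_path)
```

This sketch does not resume interrupted transfers; for a download of this size you may prefer a tool that supports resuming.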

Note: we have updated our Plus authorization procedures to use the “Crossref-Plus-API-Token” custom header so that you no longer need to follow the two-step workaround previously documented here. We will continue to support the previous “Authorization” header until no later than 2019-01-31, though we may end support earlier if we can confirm that the old procedure is no longer in use.

Please contact our support team if you’re unable to access snapshots.

Keeping your data current

For applications where you want to keep a copy of our metadata records current, use OAI PMH Plus (as described above) or the REST API to query for new records at your preferred interval.


FAQs

When are snapshots for each month made available?

Snapshots will become available around the 5th day of each calendar month. We are working to optimize this process to make them available sooner.

Are snapshots for ‘all time’ available?

Snapshots are kept available for the current and previous quarters. Each quarter we remove the files from the quarter before the previous one (e.g. on 1st April the files from the previous Oct/Nov/Dec are removed).
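The retention rule can be turned into a small calculation. The sketch below lists the months, from the start of the previous quarter up to the current month, for which a snapshot could be expected on a given date. It assumes calendar quarters starting in January, April, July and October, and that snapshots are labelled by the month in their URL; the function names are hypothetical. The current month's snapshot may still be missing before the 5th.

```python
from datetime import date
from typing import List, Tuple


def quarter_start(year: int, month: int) -> Tuple[int, int]:
    """Return (year, month) of the first month of the calendar quarter."""
    return year, 3 * ((month - 1) // 3) + 1


def expected_snapshot_months(today: date) -> List[Tuple[int, int]]:
    """Months from the start of the previous quarter through the current month."""
    year, month = quarter_start(today.year, today.month)
    # Step back three months to reach the start of the previous quarter.
    month -= 3
    if month < 1:
        month += 12
        year -= 1
    months = []
    while (year, month) <= (today.year, today.month):
        months.append((year, month))
        month += 1
        if month > 12:
            month = 1
            year += 1
    return months
```

For example, on 10 April 2018 this yields January through April 2018, which matches the rule that the Oct/Nov/Dec 2017 files are removed on 1st April.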

I’m seeing a 404 error when I request the URL

If you’re looking for the current month, this may be because the archive hasn’t been created for that month yet. They are usually available by the 5th of each month.

If you’re looking for a month that’s more than 6 months old, it may be that the snapshot has been deleted. On a quarterly basis, we remove the files from two quarters previous (e.g. on April 1, 2018 the files from Oct/Nov/Dec 2017 will be removed).

If you aren’t looking for a particularly new or particularly old archive and you’re still seeing a 404 error, please contact us using the dedicated Plus support email: plus-support@crossref.org.

I’m seeing a 401 error when I request the URL

Snapshots are only available to Metadata Plus users. This message means that the system doesn’t recognise you as a Metadata Plus user. If you’re already a Metadata Plus user, make sure you’re using your correct token in the header of your query. If you’re still having problems, contact us using the dedicated Plus user support email.
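The 404 and 401 cases described in these FAQs can be handled in a script. Below is a minimal sketch that sends a HEAD request and translates the status code into a short hint; the function names and the hint wording are illustrative, and other status codes are simply reported as unexpected.

```python
import urllib.error
import urllib.request


def explain_status(status: int) -> str:
    """Map an HTTP status code to a short hint based on the FAQs."""
    if status == 200:
        return "snapshot is available"
    if status == 401:
        return "token missing or not recognised as a Metadata Plus user"
    if status == 404:
        return "snapshot not created yet, or already removed"
    return f"unexpected status {status}"


def check_snapshot(url: str, token: str) -> str:
    """Send a HEAD request for url and describe the outcome."""
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"Crossref-Plus-API-Token": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return explain_status(response.status)
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return explain_status(error.code)
```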

I need a full snapshot mid-month

Snapshot archives are provided at the start of each month. The archive contains all the registered content received by Crossref up until that time. (Really? Yes, all of it.) If you need a snapshot mid-month, you should download and ingest the latest archive and then harvest and ingest the registered content that has changed since then.

To get the registered content that has changed since an archive was created, use OAI PMH Plus or the REST API. For example, if the archive was created on January 31, 2018, then the OAI PMH Plus harvest’s initial URL is:

https://oai.crossref.org/oai?verb=ListRecords&set=J&from=2018-01-31&metadataPrefix=cr_unixsd

This will harvest journal data. If you are interested in book data then use the “B” set.

https://oai.crossref.org/oai?verb=ListRecords&set=B&from=2018-01-31&metadataPrefix=cr_unixsd

If you are interested in series data then use the “S” set.

https://oai.crossref.org/oai?verb=ListRecords&set=S&from=2018-01-31&metadataPrefix=cr_unixsd

It is important to use the “created” date and not the “completed” date. It takes time to build the archive and so changes will occur during the build. Using the created date ensures those changes are harvested too.
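The initial harvest URLs shown above follow one pattern, so they can be generated from the archive's created date. This is a minimal sketch using Python's standard library; initial_harvest_url and the SETS mapping are hypothetical names, and only the three set codes mentioned on this page are accepted.

```python
from datetime import date
from urllib.parse import urlencode

OAI_ENDPOINT = "https://oai.crossref.org/oai"

# Set codes mentioned on this page.
SETS = {"journals": "J", "books": "B", "series": "S"}


def initial_harvest_url(set_code: str, created: date) -> str:
    """Build the first ListRecords URL for changes since the created date."""
    if set_code not in SETS.values():
        raise ValueError("set_code must be one of 'J', 'B' or 'S'")
    query = urlencode(
        {
            "verb": "ListRecords",
            "set": set_code,
            "from": created.isoformat(),  # the archive's created date
            "metadataPrefix": "cr_unixsd",
        }
    )
    return f"{OAI_ENDPOINT}?{query}"
```

Calling initial_harvest_url("J", date(2018, 1, 31)) reproduces the journal URL given above. Sending the request still requires your Plus token, and this sketch only builds the first URL of a harvest.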


Please contact our Plus support team with any questions.

Last Updated: 2018 April 30 by Amanda Bartell