Metadata Plus snapshots

Metadata Plus snapshots provide our 98,922,134 metadata records in a single file, an easy way to retrieve an up-to-date copy of our records. Snapshots are available to Metadata Plus service users.

The files are available via the /snapshots route in the REST API, which offers a compressed tar file (.tar.gz) containing the full extract of the metadata corpus in either JSON or XML format.
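Because the archive is a gzip-compressed tar file, it can be read as a stream rather than unpacked in full. The following is a minimal Python sketch of that idea for the JSON snapshot. The helper name iter_json_members is hypothetical, and the sketch assumes that the archive's members are individual .json files; the layout inside the archive is not described here, so check it against a real download before relying on it.

```python
import json
import tarfile
from typing import Any, BinaryIO, Iterator, Tuple


def iter_json_members(fileobj: BinaryIO) -> Iterator[Tuple[str, Any]]:
    """Yield (member name, parsed JSON) for each .json file in a tar.gz stream.

    Assumption: each member of the archive is a standalone JSON document.
    """
    # Mode "r|gz" reads the archive sequentially, so the archive is never
    # loaded into memory as a whole or unpacked to disk.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            # Skip directories and anything that is not a .json file.
            if not member.isfile() or not member.name.endswith(".json"):
                continue
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            yield member.name, json.load(extracted)
```

A caller would open the downloaded file in binary mode, for example open("all.json.tar.gz", "rb"), and loop over the pairs that iter_json_members yields.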

How to access snapshots

New snapshots are created each month and made available by the 5th day, covering all records up to and including the previous month.

If you’re looking for the most up-to-date snapshot (all records up to and including the previous month), you can use the following URLs which will always alias to the current month:

If you want to check whether a particular snapshot is available, you can make an HTTP HEAD request using the following URL patterns:

  • JSON output: https://api.crossref.org/snapshots/monthly/{YYYY/MM}/all.json.tar.gz
  • XML output: https://api.crossref.org/snapshots/monthly/{YYYY/MM}/all.xml.tar.gz

Please note that XML snapshots are available in UNIXSD format only.

As snapshots are available to Metadata Plus users only, you will need to identify yourself in the request by using an “Authorization” HTTP header with your access token. The example below shows how this should be formatted, with XXX replaced by your token:

Authorization: Bearer XXX
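As an illustration of the HEAD check and the Authorization header together, here is a minimal Python sketch using only the standard library. The function names (snapshot_url, head_request, snapshot_status) are hypothetical. The sketch deliberately does not follow redirects and simply reports the status code it receives; how the server answers a HEAD request beyond the 404 and 401 cases covered in the FAQs is an assumption to verify.

```python
import urllib.error
import urllib.request

BASE_URL = "https://api.crossref.org/snapshots/monthly"


def snapshot_url(year: int, month: int, fmt: str = "json") -> str:
    """Fill in the {YYYY/MM} URL pattern for a JSON or XML snapshot."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"{BASE_URL}/{year:04d}/{month:02d}/all.{fmt}.tar.gz"


def head_request(url: str, token: str) -> urllib.request.Request:
    """Build a HEAD request carrying the Metadata Plus access token."""
    return urllib.request.Request(
        url, method="HEAD", headers={"Authorization": f"Bearer {token}"}
    )


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects; a 3xx then surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def snapshot_status(year: int, month: int, fmt: str, token: str) -> int:
    """Return the raw HTTP status code for the snapshot (e.g. 404 or 401)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = head_request(snapshot_url(year, month, fmt), token)
    try:
        with opener.open(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, snapshot_status(2018, 4, "xml", "XXX") would send the HEAD request for the April 2018 XML snapshot with the header shown above.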

The files are very large (>42 GB), so they may take a while to download, depending on the speed of your internet connection.

Note: Users accessing snapshots via curl or other command line tools may find that authorization fails due to a redirect issue. If you are experiencing difficulties, the following two-step curl workaround may help:

1st curl command:

curl --verbose -H 'Authorization: Bearer YOUR_TOKEN_HERE' -H 'User-Agent: Downloader/1.1 (mailto:YOUR_EMAIL_HERE)' -o all.xml.tar.gz https://doi.crossref.org/snapshots/monthly/2018/04/all.xml.tar.gz

Use the value of the “Location:” response header provided in the output of the 1st command within your 2nd command.

2nd curl command:

curl 'YOUR_LOCATION_VALUE_HERE' > file.tar.gz

Please contact our support team if you’re unable to access snapshots. We plan to roll out a modified approach to authorization to avoid this redirect issue soon.
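The same two-step idea can be sketched in Python with the standard library: send the authorized request without following the redirect, read the "Location" header, then fetch that location without the Authorization header. This is an illustrative sketch, not an official client. The names NoRedirect, resolve_location and download are hypothetical, and the assumption that the first response is a 3xx redirect carrying a Location header comes only from the curl workaround above.

```python
import shutil
import urllib.error
import urllib.request

REDIRECT_CODES = (301, 302, 303, 307, 308)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def resolve_location(url: str, token: str, email: str) -> str:
    """Step 1: send the authorized request and return the Location header."""
    request = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "User-Agent": f"Downloader/1.1 (mailto:{email})",
    })
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(request, timeout=60):
            pass
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        if err.code in REDIRECT_CODES and location:
            return location
        raise  # e.g. 401 or 404, as described in the FAQs
    # No redirect happened, so the workaround does not apply to this URL.
    raise RuntimeError("server did not redirect; try a direct download")


def download(location: str, destination: str) -> None:
    """Step 2: fetch the redirect target without the Authorization header."""
    with urllib.request.urlopen(location, timeout=60) as response:
        with open(destination, "wb") as out:
            # Copy in 1 MiB chunks so the large file is never held in memory.
            shutil.copyfileobj(response, out, length=1024 * 1024)
```

Calling resolve_location(...) and passing its result to download(...) mirrors the 1st and 2nd curl commands respectively.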

Keeping your data current

For applications where you want to keep a copy of our metadata records current, use OAI PMH Plus (as described above) or the REST API to query for new records at your preferred interval.


FAQs

When are snapshots for each month made available?

Snapshots will become available around the 5th day of each calendar month. We are working to optimize this process to make them available sooner.
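As a rough illustration of this schedule, the Python sketch below guesses which monthly snapshot is the newest one likely to exist on a given date. The function name latest_expected_snapshot and the release_day parameter are hypothetical, the 5th is only an approximate cut-off, and the sketch assumes that the {YYYY/MM} in the URL names the month in which the snapshot is published.

```python
from datetime import date
from typing import Tuple


def latest_expected_snapshot(today: date, release_day: int = 5) -> Tuple[int, int]:
    """Return (year, month) of the newest snapshot likely to exist on `today`.

    Assumption: a month's snapshot exists from `release_day` of that month.
    """
    if today.day >= release_day:
        return today.year, today.month
    # Before the release day, fall back to the previous month's snapshot.
    if today.month == 1:
        return today.year - 1, 12
    return today.year, today.month - 1
```

Since availability is only approximate, a HEAD request against the resulting URL remains the reliable check.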

Are snapshots for ‘all time’ available?

Snapshots are kept for the current and previous quarters. At the start of each quarter we remove the files from the quarter before the previous one (e.g. on 1st April the files from the previous Oct/Nov/Dec are removed).
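The retention rule can be expressed as simple quarter arithmetic. The Python sketch below is one reading of the rule, not a description of how the service implements it: quarter_index and is_retained are hypothetical helpers, and the sketch assumes calendar quarters (Jan to Mar, Apr to Jun, and so on) and that a snapshot is identified by the {YYYY/MM} in its URL.

```python
from datetime import date


def quarter_index(year: int, month: int) -> int:
    """Number calendar quarters consecutively so they can be subtracted."""
    return year * 4 + (month - 1) // 3


def is_retained(snapshot_year: int, snapshot_month: int, today: date) -> bool:
    """True if the snapshot falls in the current or the previous quarter."""
    age = quarter_index(today.year, today.month) - quarter_index(
        snapshot_year, snapshot_month
    )
    return 0 <= age <= 1
```

With this reading, is_retained(2017, 12, date(2018, 4, 1)) is False, which matches the example of the Oct/Nov/Dec files being removed on 1st April.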

I’m seeing a 404 error when I request the URL

If you’re looking for the current month, this may be because the archive hasn’t been created for that month yet. They are usually available by the 5th of each month.

If you’re looking for a month that’s more than 6 months old, it may be that the snapshot has been deleted. On a quarterly basis, we will remove the files from two quarters previous (e.g. on April 1, 2018 the files from Oct/Nov/Dec 2017 will be removed). If you aren’t looking for a particularly new or particularly old archive and you’re still seeing a 404 error, please contact us using the dedicated Plus support email: plus-support@crossref.org.

I’m seeing a 401 error when I request the URL

Snapshots are only available to Metadata Plus users. This message means that the system doesn’t recognise you as a Metadata Plus user. If you’re already a Metadata Plus user, make sure you’re using your correct token in the header of your query. If you’re still having problems, contact us using the dedicated Plus user support email.

I need a full snapshot mid-month

Snapshot archives are provided at the start of each month. The archive contains all the registered content received by Crossref up until that time. (Really? Yes, all of it.) If you need a snapshot mid-month, you should download and ingest the latest archive and then harvest and ingest the registered content that has changed since then.

To get the registered content that has changed since an archive was created, use OAI PMH Plus or the REST API. For example, if the archive was created on January 31, 2018, then the OAI PMH Plus harvest’s initial URL is:

https://oai.crossref.org/oai?verb=ListRecords&set=J&from=2018-01-31&metadataPrefix=cr_unixsd

This will harvest journal data. If you are interested in book data then use the “B” set.

https://oai.crossref.org/oai?verb=ListRecords&set=B&from=2018-01-31&metadataPrefix=cr_unixsd

If you are interested in series data then use the “S” set.

https://oai.crossref.org/oai?verb=ListRecords&set=S&from=2018-01-31&metadataPrefix=cr_unixsd

It is important to use the “created” date and not the “completed” date. It takes time to build the archive and so changes will occur during the build. Using the created date ensures those changes are harvested too.
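The initial harvest URLs above all follow one pattern, so they can be generated from the archive's created date and the set letter. The Python sketch below shows one way to do that. The helper name harvest_url is hypothetical, and the sketch only builds the initial URL; authentication and paging through the results are not covered here.

```python
from datetime import date
from urllib.parse import urlencode

OAI_ENDPOINT = "https://oai.crossref.org/oai"
SETS = ("J", "B", "S")  # journals, books, series


def harvest_url(created: date, record_set: str = "J") -> str:
    """Build the initial ListRecords URL from the archive's created date."""
    if record_set not in SETS:
        raise ValueError("record_set must be 'J', 'B' or 'S'")
    query = urlencode({
        "verb": "ListRecords",
        "set": record_set,
        "from": created.isoformat(),  # the created date, not the completed date
        "metadataPrefix": "cr_unixsd",
    })
    return f"{OAI_ENDPOINT}?{query}"
```

For an archive created on January 31, 2018, harvest_url(date(2018, 1, 31), "B") produces the book harvest URL shown above.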


Please contact our Plus support team with any questions.

Last Updated: 2018 August 6 by Amanda Bartell