CrossRef operates an OAI-PMH service for the distribution of metadata to subscribers (see http://www.crossref.org/metadata_services.html). This system is based on the OCLC version 2 repository framework and implements the interface as documented at http://www.openarchives.org/OAI/openarchivesprotocol.html.
Access to CrossRef’s service for retrieving DOI metadata is restricted by the IP address of the server performing the harvest. Our repository supports selective harvesting according to sets defined by the hierarchy of publisher, journal title, and year of publication. (Note: because of the size of the repository, performing a ListRecords action on the entire collection is strongly discouraged. Every ListRecords request must include a set specification.)
Our service allows public access to two OAI verbs, ListSets and ListIdentifiers, which allow discovery of coverage information.
Setspecs are formatted as follows:
Note: content type: J for journals, B for book or conference proceeding titles
With the ListSets request the 'set' parameter is optional. Omitting the set parameter returns a listing of all publishers, all their journal titles, and each year of publication for which we have DOIs.
With the ListIdentifiers request the set, from, and until parameters are optional. The from and until parameters specify the dates when the DOIs were registered or updated with CrossRef, not the publication date.
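As a rough illustration of how the optional parameters combine, the sketch below builds request URLs for the two public verbs. The base URL is a placeholder and not the real service address, the helper name build_request is invented for this example, and the metadataPrefix value oai_dc is an assumption (the OAI-PMH specification requires a metadataPrefix on ListIdentifiers; the prefix CrossRef expects may differ).

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the base URL of the actual service.
BASE_URL = "https://oai.example.org/OAIHandler"


def build_request(verb, base_url=BASE_URL, **params):
    """Return an OAI-PMH request URL, leaving out parameters set to None."""
    query = {"verb": verb}
    for name, value in params.items():
        if value is not None:
            # 'from' is a Python keyword, so callers pass it as 'from_'.
            query[name.rstrip("_")] = value
    return base_url + "?" + urlencode(query)


# ListSets with no 'set' parameter asks for the full listing.
list_sets_url = build_request("ListSets")

# ListIdentifiers limited to records registered or updated in a date range.
# The dates use the YYYY-MM-DD form defined by OAI-PMH.
list_ids_url = build_request(
    "ListIdentifiers",
    metadataPrefix="oai_dc",  # assumed prefix
    from_="2010-01-01",
    until="2010-01-31",
)

print(list_sets_url)
print(list_ids_url)
```

Passing set=None (or omitting it) leaves the parameter out of the URL, which matches the optional behaviour described above.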
Many OAI requests are too big to be retrieved in a single transaction. If a given response contains a resumption token, the user must make an additional request to retrieve the rest of the data. Note that resumption tokens remain viable for only 5 minutes.
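The resumption-token cycle can be sketched as a loop that keeps requesting pages until no token is returned. This is a minimal sketch, not the sample application mentioned elsewhere in this document: the base URL is a placeholder, and the names harvest and http_fetch are invented for illustration. The XML namespace and the rule that a follow-up request carries only the verb and the token come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
# Hypothetical endpoint: substitute the base URL of the actual service.
BASE_URL = "https://oai.example.org/OAIHandler"


def http_fetch(url):
    """Fetch a URL and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def harvest(params, fetch=http_fetch, base_url=BASE_URL):
    """Yield each parsed response page, following resumption tokens."""
    query = dict(params)
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(query)))
        yield root
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token element marks the final page.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        query = {
            "verb": params["verb"],
            "resumptionToken": token.text.strip(),
        }
```

Because tokens expire after 5 minutes, a harvester built this way should store or process each page quickly and request the next one promptly, rather than doing slow work between requests.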
CrossRef's Metadata Service is available by subscription. After an organization applies for a subscription, members of CrossRef may decide whether they wish their metadata to be included in distributions to the subscribing organization (known as an opportunity to ‘opt-out’). As a result, the subscribing organization may be able to retrieve less than the full set of metadata available at CrossRef. Upon acceptance of an application, the subscriber supplies CrossRef with two IP addresses from which it will be allowed to retrieve metadata. Any request received from these IP addresses is filtered to exclude the metadata of non-participating CrossRef members. The two publicly available requests mentioned above are not subject to this limitation when they come from an IP address not assigned to a known subscriber. When received from a subscriber's registered IP addresses, these two requests return only data pertinent to that subscriber (e.g. a ListSets request will omit any publisher who has chosen not to distribute to that subscriber).
CrossRef may also establish a subscriber using an ‘opt-in’ model where each publisher must specifically request that their data be enabled for transmission to such a recipient.
Note: opt-out and opt-in are established at the title level. Titles may have more than one depositor, but OAI delivery is determined by the prefix identified as the owner in the CrossRef system.
As mentioned above, the size of the CrossRef repository precludes using one OAI request to retrieve all the available data. Often a subscriber first obtains a list of the available sets using the ListSets request and then repeatedly submits ListRecords requests for each publisher. This approach can require a substantial amount of time. Upon request, CrossRef staff can produce an archive of all the data available to a given subscriber, which can then be downloaded via FTP. Please submit requests to support@crossref.org.
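The set-by-set approach described above might look like the following sketch, which reads the setSpec values out of a ListSets response and prepares one ListRecords request per set. The base URL is a placeholder, the function names are invented for this example, the metadataPrefix value oai_dc is an assumption, and the set names in the sample response (SET-A, SET-B) are dummies that do not follow the real CrossRef setspec format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
# Hypothetical endpoint: substitute the base URL of the actual service.
BASE_URL = "https://oai.example.org/OAIHandler"


def set_specs(list_sets_xml):
    """Extract every setSpec value from a ListSets response."""
    root = ET.fromstring(list_sets_xml)
    return [el.text.strip() for el in root.iter(f"{OAI_NS}setSpec") if el.text]


def list_records_urls(list_sets_xml, metadata_prefix="oai_dc", base_url=BASE_URL):
    """Build one ListRecords request URL per set, as ListRecords requires a set."""
    return [
        base_url + "?" + urlencode(
            {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": spec}
        )
        for spec in set_specs(list_sets_xml)
    ]


# Dummy ListSets response with placeholder set names.
SAMPLE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListSets>'
    "<set><setSpec>SET-A</setSpec><setName>Example A</setName></set>"
    "<set><setSpec>SET-B</setSpec><setName>Example B</setName></set>"
    "</ListSets></OAI-PMH>"
)

for url in list_records_urls(SAMPLE):
    print(url)
```

Each generated request may itself return a resumption token, so a full harvest repeats the token-following step for every set, which is one reason this approach takes so long on a large repository.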
A sample application for harvesting CrossRef OAI data is available at http://www.crossref.org/08downloads/oaipmhRequest.zip