query spec

How to Query


(Revised: January 18, 2006)

Overview

The CrossRef query Resolver accepts bibliographic metadata and returns the corresponding DOI. Queries are formatted either in XML or in a bar ("|", also known as pipe) delimited format containing 10 fields for queries against the journal holdings and 12 fields for queries against the reference (book) and conference proceeding holdings. Queries are submitted interactively through the Web browser interface or programmatically via the system's HTTP programming interface.

The Resolver will also accept a DOI as input and will return the associated metadata. The syntax for this 'reverse' DOI lookup is very simple (no pipes) and is described in the interface discussions below.

In addition, CrossRef supports OpenURL by operating a version 1.0 compliant resolver and by accepting version 0.1 DOI queries.

XML Queries

XML is the preferred format to use when sending queries into the CrossRef system. This format is more precise, offers more functionality and is extensible. All new query features will only be offered through the XML format. Support of the legacy pipe'd format will continue but it will not be extended. The latest schema for XML queries can be found at http://www.crossref.org/qschema/crossref_query_input2.0.xsd

The following example demonstrates the XML query format. Here fuzzy matching is enabled (the default) for journal title but has been turned off for author name and issue.

<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="1.0" xmlns = "http://www.crossref.org/qschema/1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<head>
<email_address>ckoscher@crossref.org</email_address>
<doi_batch_id>trackingid1</doi_batch_id>
</head>
<body>
<query key="MyKey1" enable-multiple-hits="false"
forward-match="false">
<journal_title match="fuzzy">Chem Commun</journal_title>
<author match="exact">Moulton</author>
<issue match="exact">8</issue>
<volume/>
<first_page>863</first_page>
<year>2001</year>
</query>
</body>
</query_batch>


Query Match Alerts

The 'forward-match' attribute is used to enable Query Match Alerts. This feature eliminates the need for users to repeatedly poll CrossRef for queries that do not initially return a DOI. When a query is marked to enable alerts, the CrossRef system will automatically send an email containing the query results to the specified address.

Process Tracking

Tracking IDs allow you to inspect the status of a batch job uploaded to the system. By executing an HTTP request in the following form you can retrieve information on when a job was processed:

http://doi.crossref.org/servlet/submissionDownload?usr=<USER>&pwd=<PWD>&doi_batch_id=trackingid1&type=result

Query results are not saved by the system, so the tracking ID function cannot return the results of a query job.

Multiple Hits

Setting the enable-multiple-hits attribute to "true" instructs the system to return several results if it is unable to reduce the candidate matches to a single item. Normally, if a query produces an ambiguous result set, the system returns no data at all. This behavior is necessary since most processes querying CrossRef are automated and are incapable of deciding amongst multiple DOI records. However, in an editorial environment where a person is involved in the query, multiple results may be valuable.

This is demonstrated on the Guest Query form available on the CrossRef Web site.

CrossRef Query Format (Journal - 10 fields)

There are 10 fields in a journal query. The only required values are a journal identifier (supplied as ISSN and/or journal title) and either author or first page to help identify the article. The remaining fields are optional; however, since the Resolver attempts to find a unique DOI for each query, it is strongly recommended that the query specify as much data as possible. While a query supplying only journal title and first page is legal, the chances of finding a unique match would be very slim. In fact, as the CrossRef repository becomes more complete, very detailed queries will become even more essential.

Special considerations for each field in the query are described in the following table.

Field  Description    Consideration
ISN    ISSN           Print and/or electronic ISSN delimited by a ","
TTL    Journal Title  The full journal title or an abbreviation (or a combination of the two)
NAM    First Author   First author's surname
VID    Volume ID      Numerical or text (e.g. fall, Q1) volume specifier
IID    Issue ID       Numerical or text issue identifier
PID    Page ID        Number or text page identifier. Page is treated as a normalized number (leading 0s are removed). Leading character data (e.g. page or PP) should be omitted
YNO    Year           A number. Character data will cause an error. Two digit values are assumed to be 19XX
TYP    Type           full_text, abstract_only or bibliographic_record
KEY    Unique key     A unique key that will be echoed back with the query results. Useful for matching a query with its corresponding results
DOI    DOI            The DOI (always left blank in the query)

An example metadata query:

01291831,01291831|International Journal of Modern Physics C |Huang|11||287|2000|full_text||
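
As a rough illustration of the field order in the table above, the following Java sketch assembles a 10-field journal query. The class and method names are hypothetical and are not part of any CrossRef library; the validation simply mirrors the required values described above, and trimming each field is a design choice of the sketch rather than a documented requirement.

```java
import java.util.StringJoiner;

/** Hypothetical helper (not part of any CrossRef library) that assembles a journal query. */
final class JournalPipeQuery {
    private JournalPipeQuery() {}

    /**
     * Joins the fields in table order: ISN|TTL|NAM|VID|IID|PID|YNO|TYP|KEY|DOI.
     * Pass "" or null for a field that should be left blank.
     */
    static String build(String issn, String title, String author, String volume,
                        String issue, String page, String year, String type, String key) {
        if (blank(issn) && blank(title)) {
            throw new IllegalArgumentException("a journal identifier (ISSN and/or title) is required");
        }
        if (blank(author) && blank(page)) {
            throw new IllegalArgumentException("either first author or first page is required");
        }
        // The tenth field (DOI) is always left blank in a query.
        String[] fields = {issn, title, author, volume, issue, page, year, type, key, ""};
        StringJoiner joiner = new StringJoiner("|");
        for (String field : fields) {
            String value = field == null ? "" : field.trim();
            if (value.indexOf('|') >= 0 || value.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("field contains '|' or a line break: " + value);
            }
            joiner.add(value);
        }
        return joiner.toString();
    }

    private static boolean blank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(build("01291831", "International Journal of Modern Physics C",
                "Huang", "11", "", "287", "2000", "full_text", ""));
    }
}
```

Rejecting a pipe or line break inside a field keeps one query on one line with exactly nine delimiters, which is what the format above requires.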

When a query result is returned the data will be presented in either the same pipe delimited format (default) or as XML.

The CrossRef Resolver is specifically built to perform a fuzzy match on the query input. This happens in several steps:

  1. Certain field combinations are used to determine a list of potential DOI matches
  2. Journal title and author name are text matched using a customized weighting algorithm
  3. The potential match set is reduced to determine if a unique match is present

As a result, not every value in a query is necessarily used in finding a match, and the results reflect what was found in the repository for each field rather than what was submitted. This will most often happen with journal titles and author names due to misspellings.

CrossRef Query Format (Books/Conference Proceedings - 12 fields)

Two more fields are used to query books and conference proceedings in addition to the 10 used for journals.

Field  Description   Consideration
ISN    ISSN          Print and/or electronic ISSN delimited by a ","
STL    Serial Title  The full serial title or an abbreviation (or a combination of the two)
VTL    Volume Title  The full book title
NAM    First Author  First author's surname
VID    Volume ID     Numerical or text (e.g. fall, Q1) volume specifier
EID    Edition       Numerical or text edition identifier (books only)
PID    Page          Number or text page identifier. Page is treated as a normalized number (leading 0s are removed). Leading character data (e.g. page or PP) should be omitted
YNO    Year          A number. Character data will cause an error. Two digit values are assumed to be 19XX
CNO    Component #   Chapter, section or part inside the book/conf. proceeding
TYP    Type          full_text, abstract_only or bibliographic_record
KEY    Unique key    A unique key that will be echoed back with the query results. Useful for matching a query with its corresponding results
DOI    DOI           The DOI (always left blank in the query)


Conference proceeding deposits can include information about the conference event, the proceedings publication and the individual conference papers within the proceedings. When searching for conference proceeding DOIs consider the following:

  • Volume Title (VTL) is used to search against a combination of the proceedings title and the conference acronym.
  • The field Serial Title (STL) is not used and should be left blank.
  • Conference proceedings do not have a volume or edition number, therefore the field EID should be left blank.

Book deposit information includes title, volume and edition metadata about the book as well as title information about the content item (the chapter or section). The book level metadata may also contain series title information.

Example:

078037293X||10th IEEE International Conference on Fuzzy Systems (Cat No01CH37297) FUZZY-01|Ha|1||332|2001||||

The HTTP Query Interfaces

There are two methods for submitting queries: interactively via http://doi.crossref.org and programmatically using the CrossRef HTTP interface. Each of these methods operates in two modes. The first is a synchronous mode where you (or an automated system) submit a query and wait for the results. The second is an asynchronous mode where queries are submitted in a file (formatted as XML or pipe'd) and the results are returned later in an email message.

Using the Interactive Synchronous Query Interface

To use the browser interface, log in at http://doi.crossref.org using your CrossRef username and password. To obtain your username and password contact Chuck Koscher (ckoscher@crossref.org). Once logged on, select the Queries tab and then the Interactive Query Upload tab.

Enter your queries one per line (no line breaks within a query) using the XML or pipe delimited format described above, as shown in figure 1 below. Alternatively, try the Sample Query option as a demonstration. The query input form allows you to select the format of the returned results as either pipe delimited (default) or XML. The Area option is used to select between the production holdings repository (Live) and your account's test area. Each member account has a test area that can be used for trial uploads and queries. Each test area is isolated from other test areas.

Figure 1 - Query Input Form

The results from an interactive query in pipe delimited format are shown in figure 2 below. When using the XML format the results will be sent directly to your browser, which may attempt to display the XML.

Figure 2 - Query Results Page In Pipe'd Format

Note: CrossRef recommends that you limit the number of queries entered in a single operation to no more than 20. This will reduce the chance of a timeout occurring and interrupting your request.

Note: CrossRef also provides a simple forms based interface for submitting single queries using a guest account available at http://www.crossref.org/guestquery

Using the Interactive Asynchronous Query Upload

The asynchronous query upload feature has the advantage of not being prone to the HTTP connection timeouts that may occur when using the synchronous (wait for your results) mode. In addition, it utilizes CrossRef resources more efficiently and is highly recommended for large query jobs. Using this mode, requests containing 100 to 5000 queries can easily be handled.

Once logged into http://doi.crossref.org follow the Submissions tab to the Upload tab.  The Type selection is used to identify the kind of operation being performed. Caution: this form is also used to upload metadata deposits to the CrossRef system (Type selection=Metadata). For metadata queries in pipe'd or XML format select a Type of "Query". For DOI queries (DOIs submitted and metadata returned) select a Type of "DOI Query". Using the Browse button will bring up a familiar file selection window as shown in figure 3 below.

Figure 3 - Uploading an Asynchronous Query

For pipe formatted queries the file should contain the queries one per line and needs a header line to identify the return email address, as shown below.

H:email=ckoscher@crossref.org
1069-6563|ACADEMIC EMERGENCY MEDICINE|Verbeek PR|9|7|671|2002||key-001|
01291831|International Journal of Modern Physics C |Huang|11||287|2000||key-002|
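
The file layout shown above (one header line followed by one query per line) could be produced with a short Java sketch such as the one below. The class name is hypothetical, and the use of a plain line feed as the line terminator is an assumption of this sketch rather than something the text above specifies.

```java
import java.util.List;

/** Hypothetical helper that lays out a pipe formatted query file for asynchronous upload. */
final class PipeQueryFile {
    private PipeQueryFile() {}

    /** Returns the file contents: an "H:email=" header line, then one query per line. */
    static String build(String email, List<String> queries) {
        StringBuilder file = new StringBuilder();
        file.append("H:email=").append(email).append('\n');
        for (String query : queries) {
            // Each query must stay on a single line.
            if (query.indexOf('\n') >= 0 || query.indexOf('\r') >= 0) {
                throw new IllegalArgumentException("a query must fit on one line");
            }
            file.append(query).append('\n');
        }
        return file.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("ckoscher@crossref.org", List.of(
                "1069-6563|ACADEMIC EMERGENCY MEDICINE|Verbeek PR|9|7|671|2002||key-001|",
                "01291831|International Journal of Modern Physics C |Huang|11||287|2000||key-002|")));
    }
}
```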

For XML formatted queries the file should follow the query schema.

Using the HTTP Interface

The most common method of submitting queries to CrossRef is to have an automated service interact with the Resolver's HTTP interface. This interface supports both the synchronous and asynchronous modes of operation. When it is used in the synchronous mode, we strongly encourage grouping queries into requests containing at least 10 but fewer than 500 individual queries. These limits will help balance the load on CrossRef resources.

The HTTP interface supports both GET and POST methods for queries. Synchronous queries are performed using a URL with encoded parameters as follows:

http://doi.crossref.org/servlet/query?usr=<USR>&pwd=<PWD>&qdata=|Proc.%20Natl%20Acad.%20Sci.%20USA|Zhou|94|24|13215|1997|||

To place more than one query in the request, simply include each one in the qdata parameter, separated by '%0A':

example: qdata=|Proc.%20Natl%20Acad.%20Sci.%20USA|Zhou|94|24|13215|1997|||%0A|J.%20Mol.%20Biol.|Hagerman|260|||1996|||

In this example <USR> and <PWD> would be replaced with your account username and password. It is also necessary to URL encode the data provided in the 'qdata' parameter, because certain characters cannot be passed in a URL without causing problems. The table below lists the characters which must be encoded. For more information visit http://www.blooberry.com/indexdot/html/topics/urlencoding.htm

Character  Name                                    URL code
;          semicolon                               %3B
/          slash, virgule, separatrix, or solidus  %2F
?          question mark                           %3F
:          colon                                   %3A
@          at sign                                 %40
=          equals sign                             %3D
&          ampersand                               %26
lf         line feed                               %0A
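
Rather than encoding characters by hand, a client can let a library do it. The Java sketch below builds a synchronous query URL using the standard java.net.URLEncoder class (the Charset overload requires Java 10 or later); the class name SyncQueryUrl is hypothetical. Note one difference from the examples above: URLEncoder also encodes the pipe character as %7C. The sketch assumes the server applies ordinary percent-decoding to the qdata value, so that %7C is read back as a pipe; this has not been verified against the CrossRef system.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper that builds a synchronous query URL with an encoded qdata value. */
final class SyncQueryUrl {
    // Servlet address and parameter names are taken from the example above.
    private static final String BASE = "http://doi.crossref.org/servlet/query";

    private SyncQueryUrl() {}

    static String build(String user, String password, List<String> pipeQueries) {
        // Queries are joined with a line feed, which the encoder turns into %0A.
        String qdata = String.join("\n", pipeQueries);
        return BASE + "?usr=" + encode(user) + "&pwd=" + encode(password)
                + "&qdata=" + encode(qdata);
    }

    private static String encode(String value) {
        // URLEncoder writes a space as '+'; replacing it with %20 matches the examples above.
        // A literal '+' in the input has already become %2B, so the replacement is safe.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(build("USR", "PWD", List.of(
                "|Proc. Natl Acad. Sci. USA|Zhou|94|24|13215|1997|||",
                "|J. Mol. Biol.|Hagerman|260|||1996|||")));
    }
}
```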

To utilize the asynchronous interface you will need to construct an HTTP POST with the encType set to multipart/form-data.

The body of the multi-part message should be formatted as described above in the section on uploading asynchronous queries, while the remaining parameters expected in the URL are shown in the following table.

Form field: operation
Description: Depends on submission type
Possible values:
  • doMDUpload: For metadata submissions
  • doXSDMDUpload: same as doMDUpload
  • doQueryUpload: For Query submissions
  • doDOIQueryUpload: For DOI query submissions
  • Submit Batch File: same as doMDUpload (for backward compatibility)
Mandatory: NO
Default: doMDUpload

Form field: login_id
Description: CrossRef supplied login
Possible values: N/A
Mandatory: YES
Default: N/A

Form field: login_passwd
Description: CrossRef supplied password
Possible values: N/A
Mandatory: YES
Default: N/A

Form field: area
Description: Designated area for this submission
Possible values:
  • live
  • test
Mandatory: NO
Default: live

Content parts

Form field: fname
Description: Submission contents
Possible values: N/A
Mandatory: YES
Default: N/A
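
To make the parameters above more concrete, here is a minimal Java sketch (Java 11 or later, using the standard java.net.http client) of how such an upload might be assembled. It is an illustration under stated assumptions, not the official client: the class name is hypothetical, the upload address is passed in by the caller because this section does not state it, and the text/plain content type on the file part is a guess suited to pipe formatted files.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of an asynchronous query upload. */
final class AsyncQueryUpload {
    private AsyncQueryUpload() {}

    /** URL parameters from the table above, for example operation=doQueryUpload and area=test. */
    static String queryString(String operation, String loginId, String loginPasswd, String area) {
        return "operation=" + encode(operation) + "&login_id=" + encode(loginId)
                + "&login_passwd=" + encode(loginPasswd) + "&area=" + encode(area);
    }

    /** Builds a multipart/form-data body holding one file part named "fname". */
    static String multipartBody(String boundary, String fileName, String fileContent) {
        return "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"fname\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: text/plain; charset=UTF-8\r\n" // assumption, see lead-in
                + "\r\n"
                + fileContent + "\r\n"
                + "--" + boundary + "--\r\n";
    }

    /** Posts the body and returns the HTTP status code. uploadUrl must be supplied by the caller. */
    static int upload(String uploadUrl, String query, String boundary, String body) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(uploadUrl + "?" + query))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Trying the upload first with area set to test keeps experiments out of the live holdings.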

For complete technical documentation please see the help pages at http://doi.crossref.org/doc/userdoc.html.

Sample Java code is available at http://doi.crossref.org/doc/samples.zip. For help with Perl and Visual Basic programs please contact Chuck Koscher (ckoscher@crossref.org).

The Open Channel Interface (OCI)

The Open Channel interface offers significantly improved performance over the normal HTTP interface. This interface is intended for use when resolution of a query has an impact on an individual who may be waiting for the results. Users who are performing queries as part of their back end processing to populate local link databases should not use the OCI. Back end systems performing large volumes of queries where the results can be processed off line should consider using the query upload feature, whereby a file of queries (XML or pipe'd) is uploaded to the system, processed in a queue, and the results are emailed to the user.

The OCI operates much like a 'telnet' session in that the system performing the queries connects to a special port on the CrossRef system and then simply writes queries to the session and reads the results back. Here is a sample session where 5 pipe'd queries were submitted to the OCI (the line numbers have been added here to help describe the activity):

1) [root@cr2 root]# telnet 172.20.1.17 8081
2) Trying 172.20.1.17...
3) Connected to cr1.crossref.org (172.20.1.17).
4) Escape character is '^]'.
5) H:USR=creftest;PWD=******
6) AUTHORIZED
7) |curr opin struct biol|Zwickl|10||242|2000||KEY1|
8) 0959440X|Current Opinion in Structural Biology|Zwickl|10|2|242|2000|full_text|KEY1|10.1016/S0959-440X(00)00075-0
9) |nature|Groll|386||463|1997||KEY2|
10) |cell|Glickman|94||615|1998||KEY3|
11) |trends cell biol|Schwechheimer|11||420|2001||KEY4|
12) |mol cell|Kohler|7||1143|2001||KEY5|
13) 00928674|Cell|GLICKMAN|94|5|615|1998|full_text|KEY3|10.1016/S0092-8674(00)81603-7
14) 00280836,14764679|Nature|Groll|386|6624|463|1997|full_text|KEY2|10.1038/386463a0
15) 09628924|Trends in Cell Biology|Schwechheimer|11|10|420|2001|full_text|KEY4|10.1016/S0962-8924(01)02091-8
16) 10972765|Molecular Cell|KOHLER|7|6|1143|2001|full_text|KEY5|10.1016/S1097-2765(01)00274-X

Lines 7, 9, 10, 11 and 12 are the queries written by the user to the OCI. Lines 8, 13, 14, 15 and 16 are the query results written back by the OCI. Note that line 8 was returned very quickly, before any more queries could be input. The next four queries took a little time to process and were returned in the order in which they completed: the result for the third query (line 10, KEY3) was returned before the result for the second query (line 9, KEY2).

In Java the code would look like this:

1) Socket socket = new Socket(host, port);
2) PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
3) BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
4) out.println("H: usr=<username>;pwd=<pwd>");
5) String line = in.readLine();
6) out.println(qData);
7) line = in.readLine();

The first three lines open a connection to the OCI. Line 4 sends a login username and password while line 5 reads the status of the login (which would be "AUTHORIZED" if the connection is made). Lines 1 through 5 only need to be executed once. Lines 6 and 7 are then repeated for any queries to be resolved. The string 'qData' would contain the pipe'd query and the results would be read back into the variable 'line'.
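
Because results can come back in a different order from the queries, as in the sample session, a client will usually want to pair each result with its query through the KEY field. The Java sketch below shows one way to do that for 10-field journal results. The class name is hypothetical, and since this document does not describe how an unresolved query is reported, the sketch simply stores whatever the DOI field contains and skips any line that does not have 10 fields.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that indexes pipe'd journal results by their KEY field. */
final class OciResultIndex {
    private static final int FIELD_COUNT = 10; // journal format
    private static final int KEY_FIELD = 8;    // ninth field
    private static final int DOI_FIELD = 9;    // tenth field

    private OciResultIndex() {}

    /** Maps each KEY to the DOI field of its result line, in arrival order. */
    static Map<String, String> byKey(List<String> resultLines) {
        Map<String, String> doiByKey = new LinkedHashMap<>();
        for (String line : resultLines) {
            // A limit of -1 keeps trailing empty fields, so blank DOIs are not dropped.
            String[] fields = line.split("\\|", -1);
            if (fields.length != FIELD_COUNT) {
                continue; // not a journal result line (for example the AUTHORIZED status)
            }
            doiByKey.put(fields[KEY_FIELD], fields[DOI_FIELD]);
        }
        return doiByKey;
    }

    public static void main(String[] args) {
        Map<String, String> index = byKey(List.of(
                "00928674|Cell|GLICKMAN|94|5|615|1998|full_text|KEY3|10.1016/S0092-8674(00)81603-7",
                "00280836,14764679|Nature|Groll|386|6624|463|1997|full_text|KEY2|10.1038/386463a0"));
        System.out.println(index.get("KEY2"));
    }
}
```

In a live session, each string returned by in.readLine() would be passed through the same split logic as it arrives.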

Operation of the OCI interface does place certain demands on the CrossRef system and is only available once a user account has been authorized. If you feel this interface would provide a dramatic improvement to your operation please contact me to discuss the possibility of using the OCI.

OpenURL Queries

Metadata Queries

CrossRef maintains a resolver that accepts metadata, does a search to find the DOI and optionally redirects the caller to the target of the DOI (via dx.doi.org).

Example

http://doi.crossref.org/resolve?pid=<USR>:<PWD>&aulast=Maas%20LRM&title=JOURNAL%20OF%20PHYSICAL%20OCEANOGRAPHY&volume=32&issue=3&spage=870&date=2002

<USR> should be replaced with your username and <PWD> with your password.

Additional parameters accepted are:

issn - not recommended
stitle - short title which may be supplied as an alternative to title
sid - in older forms sid was used instead of pid (which is probably the more correct field to be used)

redirect - set to false to return the DOI instead of redirecting to the target URL (default is true)

Current limitations:

No parsing is being performed on the date field to extract the year from a more complex representation. I was unable to locate a defined format for an extended value to be accepted in this field.

DOI Queries

CrossRef currently supports DOI queries formatted as OpenURL version 0.1 requests. These queries are used to retrieve the basic metadata for a known DOI. This metadata includes journal identifiers (title and/or ISSN), first author, journal enumeration (volume, issue, page) and year.

http://doi.crossref.org/servlet/query?id=10.1006/jmbi.2000.4282&pid=<USR>:<PWD>

Where <USR> and <PWD> would be replaced with your account username and password.

Results will always be returned in an XML format as shown below.

<?xml version="1.0" encoding="UTF-8" ?>
<doi_batch version="0.3">
  <head>
    <doi_batch_id>Crossref::Resolver_26-Feb-2004@09:25:12</doi_batch_id>
    <timestamp>26-Feb-2004@09:25:12</timestamp>
    <depositor>
      <name />
      <email_address />
    </depositor>
    <registrant />
  </head>
  <body>
    <doi_record type="full_text" key="">
      <doi_data>
        <doi>10.1006/jmbi.2000.4282</doi>
        <url />
      </doi_data>
      <journal_article_metadata>
        <article>
          <author sequence="first">
            <given_name />
            <surname>Jiang</surname>
          </author>
          <date type="print">
            <year>2001</year>
          </date>
          <enumeration>
            <volume>305</volume>
            <issue>3</issue>
            <first_page>377</first_page>
          </enumeration>
        </article>
        <journal>
          <full_title>Journal of Molecular Biology</full_title>
          <issn type="print">00222836</issn>
          <issn type="electronic">10898638</issn>
        </journal>
      </journal_article_metadata>
    </doi_record>
  </body>
</doi_batch>
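
A result in this form can be read with the XML parser that ships with the JDK. The Java sketch below is one possible approach, not a prescribed one: the class name is hypothetical, it looks elements up by tag name as they appear in the sample above, and it returns an empty string for elements that are missing or empty (such as given_name in the sample).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for the XML returned by a DOI query. */
final class DoiQueryResult {
    private DoiQueryResult() {}

    /** Parses the result text; DOCTYPE declarations are refused as a precaution. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse query result", e);
        }
    }

    /** Text of the first element with the given tag name, or "" if it is absent or empty. */
    static String first(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

For the sample above, first(doc, "surname") would yield the first author's surname and first(doc, "full_title") the journal title. Note that issn occurs twice in the sample, so a caller interested in both values would need to iterate over the node list and check the type attribute instead of using first.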

Considerations When Querying

Journal Titles & ISSN

One of the most powerful features of the CrossRef system is its fuzzy matching processes. Consequently we strongly encourage the use of journal title as an identifier instead of ISSN. Queries that supply only ISSN tend not to resolve as well as those that supply just journal title or both title and ISSN. ISSN is essentially treated as a number resulting in the need to perform an exact match on the text string representation. Very little normalization can be done. Title however is a very complex value that can be normalized in several ways giving the matching function more options as it seeks to locate the proper DOI.

Real Time Queries

The CrossRef system was not intended to support real time queries where a metadata search to obtain a DOI is performed the instant a person clicks on a link. However, the synchronous HTTP interface will lend itself to being used in this manner. The primary concern regards the acceptable response time that the system can provide. Since most users are submitting batch queries via an automated process they are not expecting all their transactions to complete in one to two seconds. Typically if a request has only one query the transaction will complete in under 5 seconds (actually around 1 second). Requests with 20 to 50 queries often complete in around 30 seconds, while large queries (100-500) can take several minutes.

While our system is performing very well, and we are in the process of adding new hardware resources, we cannot assure the service level typically considered acceptable for real time queries. If you plan to use our system in this manner despite this caution, please let us know.

Batch Query Upload

Writing a program to perform the upload is a fairly simple process (in Java or Perl anyway). A fully functional Java program can be downloaded from http://www.crossref.org/08downloads/doQPost.java. This program accepts an XML file or text that must conform to the batch query formats. It also accepts a file (anything without a .XML extension) that is a list of XML files to deposit. It is run by issuing:

java doQPost <USR> <PWD> filename

In order to use this you will need a copy of a recent Java runtime and you'll need the HTTP Client library.

For more complete technical documentation please visit http://doi.crossref.org/doc/userdoc.html.


copyright 2002, pila, inc. all rights reserved