CrossRef encourages organizations which operate institutional repositories to assign DOIs to their original non-duplicative
works.
A few simple guidelines apply when the repository is a DSpace-based system (www.dspace.org). DSpace and the DOI
both use the Handle system for identifiers. When a DSpace repository is configured, it must be registered with CNRI
(http://www.cnri.reston.va.us/), which provides the repository operator with a Handle prefix (typically a sequence of numbers).
This is not a DOI prefix, which is also a sequence of numbers but always begins with "10.".
To assign DOIs to DSpace items, the repository must join CrossRef and be given a DOI prefix. Each item
for which a DOI is desired is then deposited with CrossRef using the CrossRef deposit schema. The DOI is best constructed as
shown here.
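Because a DOI prefix always begins with "10.", a deposit script can catch the common mistake of using the repository's Handle prefix where the DOI prefix belongs. The sketch below is hypothetical: the function names, the suffix "repo.42" and the Handle prefix "123456789" are illustrative, 10.5555 is the test prefix used elsewhere in this document, and the code does not reproduce CrossRef's recommended suffix construction.

```python
def is_doi_prefix(prefix: str) -> bool:
    """True if `prefix` has the shape of a DOI prefix: "10." plus numeric parts."""
    if not prefix.startswith("10."):
        return False
    registrant = prefix[3:]
    # Registrant codes are numeric and may be subdivided with dots (e.g. "10.1000.10").
    return bool(registrant) and all(part.isdigit() for part in registrant.split("."))


def make_doi(doi_prefix: str, suffix: str) -> str:
    """Join a CrossRef-issued DOI prefix and an item suffix into a DOI (hypothetical helper)."""
    if not is_doi_prefix(doi_prefix):
        raise ValueError(f"{doi_prefix!r} is not a DOI prefix (is it the Handle prefix?)")
    if not suffix:
        raise ValueError("the DOI suffix must not be empty")
    return f"{doi_prefix}/{suffix}"


print(make_doi("10.5555", "repo.42"))  # 10.5555/repo.42
```

Calling make_doi("123456789", "42") with a Handle-style prefix raises ValueError instead of producing an unregistrable DOI.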
When DOIs are deposited, the CrossRef system must assign each DOI to an internal title record. This is necessary to support
query processing, which depends heavily on matching query fields to a publication. On deposit, the publication title in the XML is used to determine whether CrossRef already has a record for the publication. If needed, a new title record is created, and the
DOIs in the deposit are then assigned to that title. Title matching is a simple string comparison, so an XML title using an
ampersand (&) is treated as distinct from one using the word "and". ISSNs are also handled simply:
once the system has determined which internal title record to use, the deposited ISSNs are associated with that record. If the same ISSN is deposited with different versions of a title, all versions end up having that ISSN.
Generally this is not a problem for query matching, since the evaluation rules for a query are much more
sophisticated and can overcome any confusion caused by such a condition. Further, the affected titles (e.g. '&' vs. 'and') are
typically related or are really the same publication.
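The title-record behavior described above can be illustrated with a toy model: titles are keyed by exact string, so "&" and "and" produce separate records, and ISSNs are simply added to whichever record the title selects. This is a sketch of the described behavior, not CrossRef's code; the class names, titles, ISSNs and DOIs are made up.

```python
from dataclasses import dataclass, field


@dataclass
class TitleRecord:
    title: str
    issns: set = field(default_factory=set)
    dois: list = field(default_factory=list)


class TitleIndex:
    """Toy model of title-record assignment by exact string comparison."""

    def __init__(self) -> None:
        self.records = {}  # exact title string -> TitleRecord

    def deposit(self, title: str, issns: list, doi: str) -> TitleRecord:
        # Exact comparison: "A & B" and "A and B" select different records.
        record = self.records.get(title)
        if record is None:
            record = self.records[title] = TitleRecord(title)
        # ISSNs are simply associated with whichever record the title selected.
        record.issns.update(issns)
        record.dois.append(doi)
        return record


idx = TitleIndex()
a = idx.deposit("Journal of Chemistry & Physics", ["1234-5679"], "10.5555/a1")
b = idx.deposit("Journal of Chemistry and Physics", ["1234-5679"], "10.5555/a2")
print(len(idx.records), a is b)  # 2 False
```

Both records now carry ISSN 1234-5679, which is the "all versions end up having that ISSN" situation.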
Title records are needed because of scale. By tracking titles as records, the CrossRef system has roughly 14,000 items to compare when it starts matching a query. If the publication title of each DOI were tracked and matched individually, the query matcher would have to start with over 22 million items.
Since the system was deployed in mid-2002, this design has had one notable consequence: the ISSNs returned in a query
result have been the ones held by the title record and not (necessarily) the ones deposited with the DOI. Compounding the
problem is the way the system saves ISSNs against a given title record. For a new title record, the system saves the first print
and electronic ISSNs it is given as the definitive print and electronic ISSNs for that title. Subsequent ISSNs are saved as additional
values and are used primarily to help the matching process. These first ISSNs are what is typically returned in the metadata
for all queries.
In early 2006 we introduced a new query result format (format=unixref) that solves this problem. Any query (XML or piped)
that requests this format returns the exact data sent in by the publisher for that DOI. The first query shown below
returns the ISSN 0163-1829, while the second returns 1098-0121. This is because the first ISSN belongs to an earlier
version of this journal, which has a different subtitle from the second. However, because of the way the deposits were made, the
CrossRef system has created only one internal title record for this publication.
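The difference between the default result and format=unixref can be modeled in a few lines: one shared title record keeps the first print ISSN it saw, while a per-DOI store keeps what was actually deposited. This is a toy sketch, not CrossRef's implementation; the class names and DOIs are hypothetical, and both ISSNs from the text are treated as print ISSNs for simplicity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TitleIssns:
    """ISSNs held by one internal title record."""
    print_issn: Optional[str] = None       # first print ISSN ever deposited
    electronic_issn: Optional[str] = None  # first electronic ISSN ever deposited
    additional: set = field(default_factory=set)  # later ISSNs, kept for matching

    def add(self, issn: str, media: str) -> None:
        if media == "print" and self.print_issn is None:
            self.print_issn = issn
        elif media == "electronic" and self.electronic_issn is None:
            self.electronic_issn = issn
        elif issn not in (self.print_issn, self.electronic_issn):
            self.additional.add(issn)


class Registry:
    """One title record shared by every DOI of the publication."""

    def __init__(self) -> None:
        self.title = TitleIssns()
        self.as_deposited = {}  # DOI -> print ISSN exactly as deposited

    def deposit(self, doi: str, print_issn: str) -> None:
        self.title.add(print_issn, "print")
        self.as_deposited[doi] = print_issn

    def query_print_issn(self, doi: str, fmt: str = "default") -> Optional[str]:
        if fmt == "unixref":
            return self.as_deposited[doi]  # the publisher's own data for this DOI
        return self.title.print_issn       # whatever the title record holds


reg = Registry()
reg.deposit("10.5555/earlier", "0163-1829")
reg.deposit("10.5555/later", "1098-0121")
print(reg.query_print_issn("10.5555/later"))             # 0163-1829
print(reg.query_print_issn("10.5555/later", "unixref"))  # 1098-0121
```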
CrossRef enables DOI assignment for the following content items:
Journals/working papers: journal title, volume, issue and article
Books: book series, title, chapter/section/entry
Conference proceedings: multi-volume title, title, paper
Components: sub-items of journal articles, book chapters/entries and conference papers, including figures, tables,
graphs and supplemental data.
NOTE: DOIs are required for journal articles, conference papers and book titles, but are optional for journal titles, volumes,
issues, book chapters/entries and components.
CrossRef is currently working to expand the types of content for which it registers DOIs to include theses, dissertations and
database records. Full technical details are available at "How to Deposit".
Depositing "As-Crawled" URLs to support metadata distribution to search engines
CrossRef initiatives related to full-text indexing of member content by various search engines have taken two forms.
1) CrossRef Search, in which a subgroup of CrossRef members had their content indexed and then offered through a
site-selective search form provided by Google. This form limited searches to the participating publishers and (more
importantly) excluded general Web content.
2) Site-maps: these tools help search engines verify the metadata they extract during a crawl and bind the DOI to the
text that has been indexed, so that search results can display and use the DOI for links.
These two items are separate and distinct. Publishers may have CrossRef perform #2 on their behalf while not participating in
#1 (and vice-versa).
A major problem arose with #2 when it became known that search engines were crawling publishers' sites using
URLs different from those associated with the DOI. This made it impossible for the search engines to line up CrossRef metadata
with the publisher's as-crawled full text.
To rectify this, a recommendation was made at the Google Scholar Publisher Open House (Sept 14th) to allow publishers to
deposit additional URL data with CrossRef so that it can be passed to the search engines along with the article metadata.
This lets publishers supply the data requested by Google (using a modified NLM DTD) through their existing
CrossRef processes, avoiding the need for a new interface with Google (and with any other search engines later).
This capability is now available in the CrossRef system. It utilizes the <collection> element of the existing schema. Here is an
example XML fragment from such a deposit:
The <doi_data> element is the same one currently used to deposit the URL bound to the DOI: the DOI goes in the <doi> element and its URL in
the first <resource> element directly under <doi_data>. The as-crawled URLs are then provided in a <collection>
structure. Immediately below the <collection> tag is a <property> tag whose type must be set to 'xref:url'. This tells CrossRef that
the items that follow are additional URLs for the DOI. Each URL has its own <item> tag, which also has a <property> tag whose
type describes the URL. This type value must begin with "xref:url:crawled", followed by one or a combination of these search engine names: google, msn, yahoo, scirus, altavista. Other names will be added as needed. If no name is included, the URL is
considered a generic as-crawled URL for use by any search engine robot.
For example, a type value of 'xref:url:crawled:blort' would cause an error.
If the same URL is used by more than one search engine, multiple names may be included in one property type value:
Ex: 'xref:url:crawled:google:scirus'
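The type-value rules above are easy to check before a deposit is submitted. The sketch below validates type strings against the listed engine names and builds a <collection> structure following the prose description. The helper names are hypothetical, and placing each crawled URL in a <resource> element inside its <item> is an ASSUMPTION, since the text does not say where the URL itself goes; check the CrossRef schema before relying on this layout.

```python
import xml.etree.ElementTree as ET

KNOWN_ENGINES = {"google", "msn", "yahoo", "scirus", "altavista"}
PREFIX = "xref:url:crawled"


def validate_crawled_type(type_value: str) -> None:
    """Raise ValueError unless type_value is 'xref:url:crawled' plus known engine names."""
    if type_value == PREFIX:
        return  # generic as-crawled URL, usable by any search engine robot
    if not type_value.startswith(PREFIX + ":"):
        raise ValueError(f"type must begin with {PREFIX!r}: {type_value!r}")
    names = type_value[len(PREFIX) + 1:].split(":")
    unknown = [n for n in names if n not in KNOWN_ENGINES]
    if unknown:
        raise ValueError(f"unknown search engine name(s) {unknown} in {type_value!r}")


def build_doi_data(doi: str, url: str, crawled: dict) -> ET.Element:
    """Build <doi_data> with the DOI's URL plus a <collection> of as-crawled URLs.

    `crawled` maps a property type (e.g. 'xref:url:crawled:google') to a URL.
    ASSUMPTION: each crawled URL goes in a <resource> child of its <item>.
    """
    doi_data = ET.Element("doi_data")
    ET.SubElement(doi_data, "doi").text = doi
    ET.SubElement(doi_data, "resource").text = url  # the URL bound to the DOI
    collection = ET.SubElement(doi_data, "collection")
    ET.SubElement(collection, "property", type="xref:url")
    for type_value, crawled_url in crawled.items():
        validate_crawled_type(type_value)
        item = ET.SubElement(collection, "item")
        ET.SubElement(item, "property", type=type_value)
        ET.SubElement(item, "resource").text = crawled_url
    return doi_data


fragment = build_doi_data(
    "10.5555/unique_doi_060103-01",
    "http://www.example.com/article/1",
    {"xref:url:crawled:google:scirus": "http://www.example.com/crawl/article1"},
)
print(ET.tostring(fragment, encoding="unicode"))
```

Passing a type such as 'xref:url:crawled:blort' makes build_doi_data raise ValueError before any XML is produced.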
Depositing crawled URLs by themselves
You may also deposit as-crawled URLs on their own using the CrossRef citation schema. All other DOI metadata
may be omitted; the URL data uses the same <collection> structure.
Once a DOI is assigned to a piece of content it should remain unchanged, and only one DOI should exist for a given piece of
content. When ownership of content changes from one publisher to another (currently this is most commonly a journal
changing hands), the ownership of the DOIs must be transferred. Normally, a CrossRef account can only be used to
create or modify DOIs whose prefix matches the prefix associated with the user's account. The ownership transfer modifies this
relationship so that a specific group of DOIs is marked as belonging to a prefix other than the prefix in the DOI. It is
important to note that the DOI itself is not modified at all in this process.
After a transfer, it is no longer possible to know with certainty who publishes a given journal simply by looking at the DOI's
prefix. This is acceptable, since the DOI is intended to be an opaque identifier.
In addition, the ownership transfer process allows us to assign a new URL to the DOI redirecting it to the acquiring publisher's
Web site. This shortcut is provided since the acquiring publisher may not always have the original XML for these transferred
DOIs and therefore would be unable to easily perform an update.
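The permission rule and the effect of a transfer can be sketched as a small lookup: by default a DOI belongs to the prefix embedded in it, and a transfer records an override without touching the DOI string. This is a toy model, not CrossRef's system; the class and method names and the acquiring prefix 10.9999 are hypothetical.

```python
def prefix_of(doi: str) -> str:
    """The DOI prefix is everything before the first '/'."""
    return doi.split("/", 1)[0]


class OwnershipRegistry:
    """Toy model: which account prefix may create/modify a given DOI."""

    def __init__(self) -> None:
        self.transferred = {}  # DOI -> owning prefix after a transfer

    def owner(self, doi: str) -> str:
        # Default: the DOI belongs to the prefix embedded in the DOI itself.
        return self.transferred.get(doi, prefix_of(doi))

    def transfer(self, dois: list, new_owner_prefix: str) -> None:
        # The DOI strings are never modified; only the ownership mapping changes.
        for doi in dois:
            self.transferred[doi] = new_owner_prefix

    def can_modify(self, account_prefix: str, doi: str) -> bool:
        return self.owner(doi) == account_prefix


reg = OwnershipRegistry()
doi = "10.5555/journal.1999.001"
reg.transfer([doi], "10.9999")
print(reg.can_modify("10.5555", doi), reg.can_modify("10.9999", doi))  # False True
```

After the transfer, prefix_of(doi) still returns "10.5555", which is why the prefix no longer reliably identifies the publisher.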
Omitting one or more fields in a deposit increases the chance of a conflict.
Conflicts are created for a variety of reasons, both intentionally and by mistake; it is the mistakes that the system is trying to
catch. Conflicts are created intentionally when there simply is not enough metadata to distinguish two DOIs. This often
happens with ongoing sections, for example 'Book Reviews', 'Letters' or 'Errata'. In these cases there is often no author, and
several items may appear on the same page of a single issue. In this situation the depositor can insert an <item_number
type='sequence-number'> tag into the deposit to give the article a unique metadata field. The DOI suffix is a handy
value to use with this tag.
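A conflict is essentially two DOIs whose compared metadata is identical, and a sequence number breaks the tie. The sketch below groups deposits by a metadata key and shows the DOI suffix being used as item_number. Which fields CrossRef actually compares is not listed here, so the FIELDS tuple, the helper names and the sample data are hypothetical.

```python
from collections import defaultdict

# HYPOTHETICAL selection of fields compared when looking for conflicts.
FIELDS = ("journal", "volume", "issue", "first_page", "year", "author", "item_number")


def find_conflicts(deposits: dict) -> list:
    """Return sorted groups of DOIs whose compared metadata is identical."""
    groups = defaultdict(list)
    for doi, md in deposits.items():
        groups[tuple(md.get(f) for f in FIELDS)].append(doi)
    return [sorted(dois) for dois in groups.values() if len(dois) > 1]


def with_sequence_number(doi: str, md: dict) -> dict:
    """Copy of md with item_number (type='sequence-number') set to the DOI suffix."""
    return {**md, "item_number": doi.split("/", 1)[1]}


# Two authorless reviews on the same page of the same issue.
page = {"journal": "Example Journal", "volume": "12", "issue": "3",
        "first_page": "201", "year": "2006", "author": None}
deposits = {"10.5555/rev.1": dict(page), "10.5555/rev.2": dict(page)}
print(find_conflicts(deposits))  # [['10.5555/rev.1', '10.5555/rev.2']]

deposits = {doi: with_sequence_number(doi, md) for doi, md in deposits.items()}
print(find_conflicts(deposits))  # []
```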
Conflicts result in poor query results, since the CrossRef system is unable to select a single DOI for a given metadata query.
Remember that CrossRef returns a DOI only when exactly one DOI is found; otherwise no results are returned. A conflict is resolved
either automatically, when one of the DOIs is re-deposited with new (and different) metadata, or by a CrossRef
administrator. The administrator can resolve the conflict by making one of the DOIs prime and the other an alias. Once a DOI
is classified as an alias it can no longer be updated and will always point to the same URL as the prime DOI. All updates to the
prime DOI will be reflected in the alias. Note: the administrator can undo the prime/alias selection,
effectively re-instating the conflict.
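The prime/alias rules (aliases cannot be updated, always follow the prime, and the choice can be undone) can be captured in a small toy model. It tracks URLs only and is not CrossRef's implementation; the class name, DOIs and URLs are hypothetical.

```python
class DoiStore:
    """Toy model of prime/alias conflict resolution (URLs only)."""

    def __init__(self) -> None:
        self.urls = {}      # DOI -> registered URL
        self.alias_of = {}  # alias DOI -> prime DOI

    def update_url(self, doi: str, url: str) -> None:
        if doi in self.alias_of:
            raise PermissionError(f"{doi} is an alias and can no longer be updated")
        self.urls[doi] = url

    def make_prime(self, prime: str, alias: str) -> None:
        self.alias_of[alias] = prime

    def undo(self, alias: str) -> None:
        # Reverting the prime/alias choice effectively re-instates the conflict.
        self.alias_of.pop(alias, None)

    def resolve(self, doi: str) -> str:
        # An alias always follows whatever URL its prime DOI currently has.
        return self.urls[self.alias_of.get(doi, doi)]


store = DoiStore()
store.update_url("10.5555/a", "http://www.example.com/a")
store.update_url("10.5555/b", "http://www.example.com/b")
store.make_prime("10.5555/a", "10.5555/b")
store.update_url("10.5555/a", "http://www.example.com/a-v2")
print(store.resolve("10.5555/b"))  # http://www.example.com/a-v2
```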
Periodically CrossRef runs a report that identifies all the unresolved conflicts for a given prefix and emails them to the affected
publisher. Each conflict is represented in the report by a record like this:
===========================================
ConfID: 1686
CauseID: 15569075
OtherID: 1158478,
JT: Macromolecular Chemistry and Physics
MD: Koopmann, 199 ,10,2119,1998,Synthesis and properties of poly(dimethyldiphenylsilylenemethylene)
DOI: 10.1002/(SICI)1521-3935(19981001)199:10<&
DOI: 10.1002/macp.1998.021991007 (1686-null )
===========================================
Each conflict has a unique ID (ConfID) and typically involves two DOIs. The deposit submission of the DOI causing the conflict is identified in the CauseID field, while the deposit submission of the affected DOI is listed as the OtherID. The journal title is listed on the line labeled JT, and the metadata for the DOIs is shown on the line labeled MD; since this is a conflict, the metadata for both DOIs is the same. The parenthetical value following each DOI (e.g. 1686-null) lists all the conflicts that DOI is involved in and the resolution status of each. Null means the conflict is unresolved.
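Publishers handling many conflicts may want to process the emailed report automatically. The parser below is a sketch based only on the single sample record shown above: the function name is hypothetical, and the layout used when a DOI is in several conflicts is not shown, so matching several "ID-status" pairs inside the parentheses is an assumption.

```python
import re

DOI_LINE = re.compile(r"^DOI:\s*(\S+)\s*\((.*)\)\s*$")
STATUS = re.compile(r"(\d+)-(\S+)")  # "1686-null" -> ("1686", "null")


def parse_conflict_record(text: str) -> dict:
    """Parse one record of the conflict report into a dict."""
    record = {"dois": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("="):
            continue  # skip blank lines and ===== separators
        m = DOI_LINE.match(line)
        if m:
            # (conflict ID, status) pairs; status None means unresolved ("null").
            conflicts = [(int(cid), None if status == "null" else status)
                         for cid, status in STATUS.findall(m.group(2))]
            record["dois"].append({"doi": m.group(1), "conflicts": conflicts})
        else:
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
    return record


SAMPLE = """===========================================
ConfID: 1686
CauseID: 15569075
OtherID: 1158478,
JT: Macromolecular Chemistry and Physics
DOI: 10.1002/macp.1998.021991007 (1686-null )
==========================================="""

rec = parse_conflict_record(SAMPLE)
print(rec["ConfID"], rec["dois"])
```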
In summary, to correct a conflict:
a) redeposit one of the DOIs with new (different) metadata and the conflict will automatically be erased
b) contact CrossRef and instruct us to delete conflict #ConfID
c) contact CrossRef and instruct us to make one of the DOIs in conflict #ConfID 'prime'
Note: while a CrossRef administrator must currently get involved to complete options b or c, we plan to change the
system to let users take these actions themselves.
Deleting a DOI
DOIs are supposed to be persistent and are therefore very difficult to delete. In fact, the CrossRef system has no "delete a DOI" function
(note: DOIs can be deleted from the Handle system under very specific conditions). Instead of deleting a DOI, we recommend updating it
to point to an explanatory page describing why the DOI is no longer available. CrossRef maintains a default page for this
purpose (http://www.crossref.org/deleted_DOI.html). After setting the URL for a DOI to this or a similar page, the journal title for the DOI
may be updated to "CrossRef Listing Of Deleted DOIs" and the ownership transferred to CrossRef. The DOI will then not be listed on any
of the normal CrossRef reports.
Note: When making an XML deposit using the above title for the journal please set any ISSN values to 00000000. Also, until the ownership
transfer takes place this title will appear in the Depositor Report under your publisher name.
An alternative method of deleting DOIs requires the involvement of a CrossRef administrator. This process entails creating two transfer
files which the administrator must process.
File 1: This file transfers the listed DOIs to the 'journal' CrossRef Listing Of Deleted DOIs.
the ASCII or ISO 8859-1 (Latin-1) character sets. ASCII and Latin-1 characters can be represented in one octet (byte) of information and are displayed properly in almost all editors and viewers on most computer systems.
Special characters go beyond this set and are best represented as Unicode characters. Information about Unicode is available
at the Unicode Project home page, which has links to some nice tools and resources and a very nice interactive code chart. Another interesting chart tool is available here.
Unicode was originally designed as a 16-bit (two-octet) character set, and files stored in 16-bit encodings such as UTF-16 require viewers and editors that understand the format. If you view such a file in a non-Unicode viewer you'll likely see a lot of gibberish
(little square boxes and question marks). To further complicate things, UTF-8 is an encoding scheme that represents each Unicode character as a sequence of one to four octets, which allows Unicode text to be stored in byte-oriented files such as XML files. Again, you need a UTF-8 capable viewer to work with these files. The XML used to make deposits to CrossRef is encoded using UTF-8 (<?xml version="1.0" encoding="UTF-8"?>).
Fortunately, ASCII maps directly into UTF-8: every ASCII file is also a valid UTF-8 file and can be viewed and edited in a UTF-8 compliant application (like a browser). Latin-1 characters keep the same numeric values in Unicode, although those above 127 are encoded as two bytes in UTF-8. The tricky part comes when we want to include a special Unicode character in an XML CrossRef deposit. There are two ways to achieve this:
1) Use a UTF-8 editor or tool when creating the XML and insert the characters directly into the file (this results in a sequence of one or more bytes per character in the file).
2) Encode the special character using a numeric character reference.
Option #1 is very easy if you have the right tools. The character Š (capital S with caron) has a decimal value of 352, which is 160 in hex. This converts to the UTF-8 sequence C5,A0 (in hex). A small XML file was created with this 2-byte sequence inserted between the <UTF_encoded> tags.
You'll see that this displays properly in a browser, but if you save the XML source and try to view it in certain editors it will not display correctly (it does display correctly in Notepad). Option #2 is the more common approach: it is done by writing a character reference in the XML that gives the numeric value of the character. For example, <surname>&#352;umbera</surname> includes the
special character Š. This character was also included as an entity reference in the small XML file mentioned above. Notice that
(most) browsers automatically convert the reference to the character's actual representation.
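The two options can be compared with Python's standard library: encoding the character directly as UTF-8 bytes, or writing a decimal numeric character reference, yields the same parsed text. The helper names are hypothetical; the surname is the one used in the example above.

```python
import xml.etree.ElementTree as ET


def utf8_hex(ch: str) -> str:
    """UTF-8 bytes of a string as comma-separated hex, e.g. 'C5,A0'."""
    return ",".join(f"{b:02X}" for b in ch.encode("utf-8"))


def char_ref(ch: str) -> str:
    """Decimal numeric character reference, e.g. '&#352;'."""
    return f"&#{ord(ch)};"


s_caron = chr(352)  # U+0160, LATIN CAPITAL LETTER S WITH CARON

# Option 1: raw UTF-8 bytes in the file.  Option 2: a numeric character reference.
raw = f"<surname>{s_caron}umbera</surname>".encode("utf-8")
ref = f"<surname>{char_ref(s_caron)}umbera</surname>".encode("ascii")

# Both parse to the same text; the reference form keeps the file pure ASCII.
print(ET.fromstring(raw).text == ET.fromstring(ref).text)  # True
print(utf8_hex(s_caron))   # C5,A0
print(utf8_hex("A"))       # 41 (ASCII bytes are unchanged in UTF-8)
print(utf8_hex("\u00e9"))  # C3,A9 (Latin-1 0xE9 becomes two bytes)
```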
Resolving problems with a DOI
Problems generally arise from one of three activities: depositing DOIs, querying for DOIs, or following links built with
DOIs. Each represents a step in the life cycle of a DOI, and the systems involved in each step are different. Solutions are generally
not very complicated; a bit of detective work is all that is required to figure things out. Problems are best
handled by contacting support@crossref.org. Please provide as much information about the problem as you can. Of particular
importance are the submission IDs for deposits and the location of bad DOI links.
Using parameter passing
Parameter passing is a feature that allows data placed on an outbound DOI link to be passed along to the target publisher. There are two different procedures. The first uses an undocumented feature of the DOI resolver,
while the second is a more robust implementation based on OpenURL, structured to prevent name collisions that would result in the
loss of data.
Procedure #1
For very limited situations where parameter passing is needed, the name-value pair called "urlappend" may be used.
Outbound links are constructed by placing the parameter values into this single URL variable. Before constructing the link, however, we must know whether the registered URL for the DOI already has parameters. For example, DOI 10.5555/unique_doi_060103-02
has no parameters on its registered URL, so passed parameters are added as follows:
http://dx.doi.org/10.5555/unique_doi_060103-02?urlappend=%3Fpassed_param=some_value
DOI 10.5555/unique_doi_060103-01 does have parameters on its registered URL, so passed parameters are added as follows:
http://dx.doi.org/10.5555/unique_doi_060103-01?urlappend=%26passed_param=some_value
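Building urlappend links by hand is error-prone because the leading '?' or '&' depends on the registered URL. The sketch below encodes that rule; the function name is hypothetical, it assumes simple names and values that contain no '&' or '=', and '=' is left unencoded only to match the examples above.

```python
from urllib.parse import quote


def urlappend_link(doi: str, registered_url_has_params: bool, params: dict,
                   resolver: str = "http://dx.doi.org/") -> str:
    """Build a DOI link that passes `params` via the resolver's urlappend variable."""
    # urlappend is appended verbatim to the registered URL, so the caller must
    # choose '?' (no existing query string) or '&' (existing query string).
    lead = "&" if registered_url_has_params else "?"
    payload = lead + "&".join(f"{name}={value}" for name, value in params.items())
    # Percent-encode '?' and '&' so the payload stays inside the single
    # urlappend value; '=' is kept as-is to mirror the examples above.
    return f"{resolver}{doi}?urlappend={quote(payload, safe='=')}"


print(urlappend_link("10.5555/unique_doi_060103-01", True, {"passed_param": "some_value"}))
# http://dx.doi.org/10.5555/unique_doi_060103-01?urlappend=%26passed_param=some_value
```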
Since the 'urlappend' function is very primitive, we must place the correct character (either a '?' or a '&') before the name-value
pair information we wish to pass on to the target. In addition, with this approach the target publisher may not be expecting
incoming parameters, and the names placed inside urlappend may conflict with the names of parameters already on the registered
URL.
Procedure #2
A more robust solution based on OpenURL is a bit more complicated, but it has several advantages:
1. Operates without having to know that the DOI's registered URL already has parameters
2. Prevents parameter name collisions
3. Allows the recipient of the link (the host of the target) to control whether they get passed parameters or not
4. Establishes a common vocabulary for passed parameters, reducing the need for bilateral coordination
OpenURL 1.0 defines a set of parameters to be used when conveying information in a hyperlink.
Parameter passing depends on two of these (rft_dat and rfr_dat) to act as 'wrappers' for the passed parameters and the
existing parameters on the DOI's registered URL. Parameter rft_dat (referent data)
contains the registered URL's entire parameter set as a 'nested' string (parameters within a
parameter). Parameter rfr_dat (referrer data) holds another 'nested' parameter string containing the
information the source wishes to pass on to the target. Data inside rfr_dat should conform to the
controlled vocabulary described in the Parameter Passing white paper. Since the DOI's registered URL parameters are safely nested inside rft_dat, parameter passing also forwards to the target any non-OpenURL parameters found on the source link. The intention here is to improve the flexibility of
parameter passing; however, use of rfr_dat is strongly encouraged because of its controlled vocabulary.
Example: The inbound link
http://dx.doi.org/openurl?url_ver=Z39.88-2003&rft_id=doi:10.5555/unique_doi_060103-01&rfr_dat=cr_setver=01&cr_pub=Source Publisher&cr_work=Source Journal Title&cr_src=SRC-NAME
is combined with the registered URL http://www.crossref.org/paramEcho?param1=done&param2=two to create the following outbound link:
http://www.crossref.org/paramEcho?url_ver=Z39.88-2003&rft_id=doi:10.5555/unique_doi_060103-01&rft_dat=param1=done&param2=two&rfr_dat=cr_setver=01&cr_pub=Source Publisher&cr_work=Source Journal Title&cr_src=SRC-NAME
(Note: For readability the examples are NOT shown in encoded form. If they were, the rft_dat
portion would look like this: rft_dat=param1%3Ddone%26param2%3Dtwo)
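The combination step in the example can be sketched with Python's urllib.parse: the source nests its referrer data in rfr_dat, and the resolver nests the registered URL's own query string in rft_dat while carrying every inbound parameter across. This is an illustration of the described behavior, not the resolver's code; the function names are hypothetical, the cr_* values come from the example, and parameter order may differ from the example (order does not matter).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def inbound_link(doi: str, referrer_data: dict) -> str:
    """Source-side link: referrer data is nested, encoded, inside rfr_dat."""
    query = urlencode([
        ("url_ver", "Z39.88-2003"),
        ("rft_id", f"doi:{doi}"),
        ("rfr_dat", urlencode(referrer_data)),  # parameters within a parameter
    ])
    return f"http://dx.doi.org/openurl?{query}"


def outbound_link(inbound_url: str, registered_url: str) -> str:
    """Resolver-side step: nest the registered URL's own parameters in rft_dat
    and carry every parameter of the inbound link across unchanged."""
    target = urlsplit(registered_url)
    params = parse_qsl(urlsplit(inbound_url).query, keep_blank_values=True)
    if target.query:
        params.append(("rft_dat", target.query))
    return urlunsplit((target.scheme, target.netloc, target.path,
                       urlencode(params), target.fragment))


link = inbound_link("10.5555/unique_doi_060103-01",
                    {"cr_setver": "01", "cr_src": "SRC-NAME"})
print(outbound_link(link, "http://www.crossref.org/paramEcho?param1=done&param2=two"))
```

Because every inbound parameter is copied, a non-OpenURL parameter such as santa=claus added to the source link also reaches the target, which matches the behavior described in the working example.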
Working Example
The following link uses a DOI without parameter passing and shows what the target
system would receive as incoming information. The values of 'param1' and 'param2' are part of the URL
registered in the Handle system for this DOI.
This next link uses the same DOI with parameter passing. Here the source of the link (this page) wants to send the name-value
pair 'stuff=good' to the target system. Following this link shows that the param1 and param2 variables are still passed to the
target, but they are wrapped inside the rft_dat parameter. The data 'stuff=good' is also passed to the target inside the rfr_dat
parameter. This demonstrates good practice, in that name conflicts are avoided.
This next link demonstrates how any top-level parameter will also be passed to the target, but not
wrapped in any OpenURL container. Here the name-value pair 'santa=claus' is appended to the link in a way that is not
consistent with the OpenURL specification (although the specification does not explicitly prohibit this). When you follow this link you'll see that the target receives this data.
This next link shows how the OpenURL-based parameter passing handles potential conflicts in parameter names. Following this
link, you'll see that the variable 'param1' is passed to the target twice.
However, the one nested inside rft_dat is safely known to be the one registered with the DOI's URL.
Using the primitive urlappend function above, the target cannot tell which value of 'param1' came from the DOI and which came from the source link (note: in building this link I must know that the DOI has parameters on its registered URL and thus insert the '%26').
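The name collision in the last two links can be reproduced offline: with urlappend the target receives two indistinguishable values of param1, while with the OpenURL approach the DOI's own value arrives nested inside rft_dat. The registered URL is the one from the example above; the function names and the value "from_source" are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

REGISTERED = "http://www.crossref.org/paramEcho?param1=done&param2=two"


def received_via_urlappend(appended: str) -> dict:
    """Procedure #1: urlappend text is glued verbatim onto the registered URL."""
    return parse_qs(urlsplit(REGISTERED + appended).query)


def received_via_openurl(source_params: list) -> dict:
    """Procedure #2: the registered parameters travel nested inside rft_dat."""
    target = urlsplit(REGISTERED)
    return parse_qs(urlencode(source_params + [("rft_dat", target.query)]))


primitive = received_via_urlappend("&param1=from_source")
print(primitive["param1"])  # ['done', 'from_source'] (origin of each value is lost)

wrapped = received_via_openurl([("param1", "from_source")])
print(wrapped["param1"])                          # ['from_source']
print(parse_qs(wrapped["rft_dat"][0])["param1"])  # ['done'] (known to be the DOI's own)
```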
Forward matching, now known as Query Match Alerts, is a feature whereby, if a query fails
when it is first submitted, CrossRef will remember it, and when a deposit is made that satisfies the query an email will be sent to the query's submitter. This eliminates the need to regularly poll CrossRef with failed queries to see whether they match any recent
deposit. To use this feature you must submit your queries as XML.
The important points are to set the "forward-match" attribute to true, make sure you fill in your email address, and assign a unique ID to the query using the "key" attribute. When you submit a query that does not immediately match, you will receive a notice telling you the query has been stored for
later evaluation.
After a deposit is made that matches the query, you will receive an email with your query results.
Please note the connection between the batch ID given for the XML query submission, the query key
given for the individual query in the submission, and the email results.
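The three required pieces (the forward-match attribute, the email address and a unique key) can be assembled programmatically. In the sketch below the function name and sample values are hypothetical, and the element names (query_batch, head, email_address, doi_batch_id, body, query, journal_title, year) are an ASSUMPTION modeled on CrossRef's XML query schema; namespace and schema-version declarations are omitted, so validate against the schema before submitting.

```python
import xml.etree.ElementTree as ET


def forward_match_batch(batch_id: str, email: str, key: str, fields: dict) -> str:
    """Serialize a one-query XML batch with forward matching switched on."""
    # ASSUMPTION: element names modeled on CrossRef's XML query schema.
    batch = ET.Element("query_batch")
    head = ET.SubElement(batch, "head")
    ET.SubElement(head, "email_address").text = email    # where alerts are sent
    ET.SubElement(head, "doi_batch_id").text = batch_id  # links the email to this batch
    body = ET.SubElement(batch, "body")
    # "forward-match" contains a hyphen, so it is passed in an attribute dict.
    query = ET.SubElement(body, "query", {"key": key, "forward-match": "true"})
    for name, value in fields.items():
        ET.SubElement(query, name).text = value
    return ET.tostring(batch, encoding="unicode")


print(forward_match_batch("batch-2006-001", "alerts@example.org", "q-0001",
                          {"journal_title": "Example Journal", "year": "2006"}))
```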
The CrossRef system now supports a query mode where only article title and first author surname are required. This service
is tuned to return one and only one DOI record. As with other metadata queries if the system finds more than one possible
DOI the results are considered ambiguous and thus no results are returned.
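The "one and only one" rule can be illustrated with a minimal lookup over title and first-author surname. The in-memory index, sample records and case-insensitive comparison are simplifications for illustration; CrossRef's real matching rules are far more sophisticated, and the function name is hypothetical.

```python
from typing import Optional


def article_author_query(index: list, title: str, surname: str) -> Optional[str]:
    """Return the DOI only if exactly one record matches title and first-author surname.

    `index` holds (article title, first-author surname, DOI) tuples.
    """
    matches = [doi for t, s, doi in index
               if t.casefold() == title.casefold() and s.casefold() == surname.casefold()]
    # More than one candidate is ambiguous, so nothing is returned (as for other
    # metadata queries); zero candidates also returns nothing.
    return matches[0] if len(matches) == 1 else None


index = [
    ("Synthesis of Example Polymers", "Koopmann", "10.5555/a"),
    ("Book Review", "Smith", "10.5555/b"),
    ("Book Review", "Smith", "10.5555/c"),
]
print(article_author_query(index, "synthesis of example polymers", "KOOPMANN"))  # 10.5555/a
print(article_author_query(index, "Book Review", "Smith"))  # None
```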
Currently these queries may only be performed using a pipe delimited format as shown in this example.
The query type must be set to ‘a’ (for article/author), otherwise the request will be considered a malformed metadata query.
These queries may also be submitted in bulk (asynchronous) using our batch upload feature. The uploaded file should be
formatted with a header containing the email address to which the results will be sent followed by one query per line as shown
here.