local hosting info

(updated 2/22/05)

Local Hosting by Members and Affiliates

Members and Affiliates have the option of receiving a direct feed of all the data submitted to CrossRef. This enables them to host the CrossRef data locally. Local Hosters pull the data from CrossRef in XML and can take either all or a subset of the content delivered to CrossRef. Local Hosting is for those organizations that wish to retrieve DOIs within their local system, without going out over the Internet to retrieve DOIs from CrossRef. Since the data is delivered in XML format, the Local Hoster must have the infrastructure to load the information into their own local system and databases.

Some of the conditions for Local Hosting are:

  • Local Hosting is available to PILA members and affiliates
  • Local Hosters are subject to the general membership terms and any specific terms for Local Hosting as determined by the Board of Directors
  • Local Hosters will not re-sell the data received for Local Hosting or provide a lookup service for other members' DOIs. There is no restriction on Members providing a lookup service for their own DOIs
  • The Annual Administrative Fee for Local Hosting is in addition to any other membership or deposit fees that may apply
  • Agents appointed by Members are not eligible for Local Hosting
  • There are no individual DOI look-up charges applied

Local Hosting Fees for 2004

Members:
  Partial (< 30% of DOIs)   $5,000
  Full                      $20,000

Affiliates:
  Partial (< 30% of DOIs)   $15,000
  Full                      $50,000


Technical Information About Local Hosting

The distribution of CrossRef metadata is accomplished via FTP of XML files from CrossRef's servers. FTP access to the file area
is controlled by IP addresses and account authentication. There are two data model styles to choose from: (1) daily XML as
deposited by member publishers, or (2) daily/weekly/monthly XML files listing the metadata organized by journal title.

1) This is a legacy data distribution method which should not be adopted by any new local hosters. This method makes available
on a daily basis the XML sent in by depositing publishers, minus any reference lists for a given article (see Note 1). These XML files are
structured according to our deposit Schema or to our older deposit DTD. In each file the publisher may be adding new DOIs or
updating the metadata for DOIs deposited on an earlier date. The data in a single file may pertain to more than one journal.
Each file (and possibly each DOI record in the file) will be marked with a timestamp. CrossRef only accepts the metadata if the
timestamp has been incremented from the previously recorded value.
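
A local hoster replaying these files may want to apply the same timestamp rule when loading records. The following is a minimal sketch only: the function name, the in-memory dictionary, and the treatment of the timestamp as a comparable integer are all assumptions for illustration, not part of the CrossRef schema.

```python
def apply_record(holdings, doi, timestamp, metadata):
    """Store a record only if its timestamp is newer than the one on file.

    holdings maps a DOI to a (timestamp, metadata) pair. Both the structure
    and the integer timestamp are assumptions made for this sketch.
    """
    current = holdings.get(doi)
    if current is not None and timestamp <= current[0]:
        return False  # timestamp not incremented: keep the existing record
    holdings[doi] = (timestamp, metadata)
    return True


holdings = {}
apply_record(holdings, "10.1000/example", 1, "first deposit")    # stored
apply_record(holdings, "10.1000/example", 1, "repeat deposit")   # ignored
apply_record(holdings, "10.1000/example", 2, "updated deposit")  # replaces
```

A real loader would write to a database rather than a dictionary, but the comparison step would look much the same.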

To reconstruct the CrossRef metadata holdings accurately, a local hoster must process all deposited XML files sequentially.
The file naming convention used for these files identifies the date the file was received
(e.g. 20030115_id_xxxx.xml). After establishing the metadata holdings, the local hoster must process each day's files in order
to remain current.
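
The sequential processing described above can be sketched by sorting file names on the date prefix. This assumes the first eight characters are always a YYYYMMDD received date, as in the example name. The file names in the sketch are made up, and ordering files within a single day by name is an assumption, since the text does not say how same-day files should be ordered.

```python
from datetime import datetime


def received_date(filename):
    """Return the date encoded in the first eight characters of the name."""
    return datetime.strptime(filename[:8], "%Y%m%d").date()


def in_processing_order(filenames):
    """Sort deposit files by received date; ties fall back to the full name."""
    return sorted(filenames, key=lambda name: (received_date(name), name))


# Hypothetical file names, for illustration only.
files = ["20030116_id_0007.xml", "20030115_id_0012.xml", "20030115_id_0003.xml"]
for name in in_processing_order(files):
    print(received_date(name), name)
```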

2) The second model of XML provides the essential metadata and comes in two flavors, ‘lite-weight’ and ‘medium-weight’. The
medium-weight XML is structured according to an output data schema. This data is available via FTP from a directory tree on
our server. In the root level of the tree there are two folders, mdByTitle-lite and mdByTitle-medium. Each of these contains 12
additional folders as shown here:

          day0 day1 day2 day3 day4 day5 day6
          full
          week0 week1 week2 week3

Also in the lite-weight folder are six text files:

          lastrundate_daily_books.txt
          lastrundate_daily_confproc.txt
          lastrundate_daily_journals.txt
          lastrundate_weekly_books.txt
          lastrundate_weekly_confproc.txt
          lastrundate_weekly_journals.txt


Data is generated on a daily, weekly and monthly basis. Each day the prior day’s DOI deposit activity (new or updated) is
written to one of the ‘dayX’ folders. The process rotates through the folders each week. Each week (on Saturday) the prior
week’s deposit activity is written to one of the ‘weekX’ folders. Again the process rotates through the folders. Once every four
weeks (on the same week that ‘week0’ is populated) the entire metadata set (all DOIs for all journals) is written to the ‘full’
folder.
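
Pulling the contents of one of these folders might look like the sketch below. It is an illustration under stated assumptions: the helper name is invented, the folder is assumed to contain only plain files, and the host, login and local path in the commented usage are placeholders. The `cwd`, `nlst` and `retrbinary` calls are standard methods of Python's `ftplib.FTP`.

```python
import os


def fetch_folder(ftp, remote_dir, local_dir):
    """Download every file in remote_dir into local_dir; return the names.

    ftp is an already logged-in ftplib.FTP connection (or any object with
    the same cwd/nlst/retrbinary methods).
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    fetched = []
    for name in ftp.nlst():
        with open(os.path.join(local_dir, name), "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
        fetched.append(name)
    return fetched


# Usage sketch; host, credentials and paths are placeholders:
# from ftplib import FTP
# with FTP("ftp.example.org") as ftp:
#     ftp.login("user", "password")
#     fetch_folder(ftp, "/mdByTitle-lite/day0", "incoming/day0")
```

Passing the connection in as a parameter keeps the download step separate from authentication, which the text says is also restricted by IP address.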

The TXT files describe which run was most recently made and which folder contains the data. For example,
lastrundate_daily_journals.txt may contain this single line, which indicates that the data for Sept 11, 2007 is in the day0 folder.

          11-SEP-2007:12-SEP-2007:0

If for some reason the daily process failed, the date range may span more than one day. This example shows that the deposit
data for Sept 11 through Sept 13 is in ‘day0’.

          11-SEP-2007:14-SEP-2007:0
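
A daily status line of this form can be split into the days covered and the folder name. The sketch below treats the second date as exclusive, which is an inference from the two examples (12-SEP for data of Sept 11, 14-SEP for data through Sept 13) rather than a documented rule. The function names are invented for illustration.

```python
from datetime import date, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def parse_run_date(text):
    """Turn a string such as '11-SEP-2007' into a date object."""
    day, month, year = text.strip().split("-")
    return date(int(year), MONTHS.index(month.upper()) + 1, int(day))


def parse_daily_run(line):
    """Return (list of days covered, folder name) for one daily status line."""
    start_text, end_text, index = line.strip().split(":")
    start = parse_run_date(start_text)
    end = parse_run_date(end_text)
    # Assumption: the end date is exclusive, as both examples in the text imply.
    days = [start + timedelta(days=n) for n in range((end - start).days)]
    return days, "day" + index


days, folder = parse_daily_run("11-SEP-2007:14-SEP-2007:0")
print(folder, [d.isoformat() for d in days])
```

Parsing the month names by hand avoids depending on the system locale.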

The weekly files contain two lines of text. The first identifies the date the run was made and the second indicates which folder
contains the data. Remember, the ‘full’ folder will be populated on the same week that the ‘week0’ folder is used.

          8-SEP-2007
          2
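
The two-line weekly file can be read in a similar way. In this sketch the function name and the returned dictionary keys are assumptions. The `full_refreshed` flag simply encodes the statement above that the ‘full’ folder is populated in the same week that ‘week0’ is used.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def parse_weekly_run(text):
    """Parse the run date and folder index from a weekly status file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    day, month, year = lines[0].split("-")
    index = int(lines[1])
    return {
        "run_date": date(int(year), MONTHS.index(month.upper()) + 1, int(day)),
        "folder": "week%d" % index,
        # Per the text, 'full' is rewritten in the week that week0 is used.
        "full_refreshed": index == 0,
    }


status = parse_weekly_run("8-SEP-2007\n2\n")
print(status["run_date"], status["folder"], status["full_refreshed"])
```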


Note 1: Currently the references deposited for an article as part of CrossRef’s forward linking service are not
made available to local hosters. References are only distributed to the publisher owning the parent article and to CrossRef Web
Services subscribers.

copyright 2002, pila, inc. all rights reserved