Although you will no doubt be eager to start checking manuscripts with CrossCheck, it is vital that you first make sure that your content is being successfully added to the CrossCheck database. This is important for several reasons:
- The value of the CrossCheck database is directly proportional to the percentage of relevant published literature that is included in the database. The sooner we get relevant scholarly literature indexed in the system, the less likely it is that CrossCheck members will get "false negatives" when they check manuscripts for plagiarism.
- Getting your content indexed immediately means that other users of the iParadigms system can detect if somebody is plagiarizing *your* content.
- Getting your content indexed means that you can start using the "CrossCheck Depositor" and "CrossCheck Deposited" logos on your site and content. These logos, in turn, serve as a deterrent to people who might be tempted to plagiarize your content.
- You cannot start using CrossCheck to detect plagiarism until your content is being successfully indexed.
The process of getting your content added to the CrossCheck database is simple. If your content is already indexed by search engines (e.g. Google, Google Scholar), then you can use virtually the same techniques to have your content added to CrossCheck.
The CrossCheck system is powered by the iParadigms plagiarism detection engine. To get your content added to the CrossCheck database, you must allow iParadigms to "crawl" your web site. This works the same way that Google or other search engines crawl your site, except that it should be far less resource-intensive and intrusive than a normal search engine crawl. Why is the iParadigms crawl less of a burden on your site than a search engine crawl? Because instead of having iParadigms crawl your entire site (including navigation pages, marketing ephemera, etc.), CrossRef tells iParadigms precisely what to crawl by giving them a list of your DOIs, which point only to your DOI-identified content. Moreover, once your site has been indexed for the first time, iParadigms will crawl only new content, and only when we tell them that you have added or updated DOIs in the CrossRef system. In short, after the first index of your site, iParadigms may never need to crawl your entire site again, and they will be informed immediately when you load new content.
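The incremental behavior described above amounts to comparing what has been deposited against what has already been crawled. The following Python sketch is purely illustrative: the `dois_to_crawl` function, its input dictionaries, and the `10.5555` example DOIs are hypothetical, and it does not describe how CrossRef or iParadigms actually schedule crawls.

```python
from datetime import datetime


def dois_to_crawl(deposited, last_crawled):
    """Return the DOIs that are new or updated since they were last crawled.

    deposited:    hypothetical mapping of DOI -> time of last deposit/update
    last_crawled: hypothetical mapping of DOI -> time of last crawl
    """
    pending = []
    for doi, updated in deposited.items():
        crawled = last_crawled.get(doi)
        # Never crawled, or updated after the last crawl: schedule it.
        if crawled is None or updated > crawled:
            pending.append(doi)
    return sorted(pending)


# Example with made-up DOIs and dates.
deposited = {
    "10.5555/a": datetime(2010, 1, 1),  # unchanged since last crawl
    "10.5555/b": datetime(2010, 3, 1),  # updated after last crawl
    "10.5555/c": datetime(2010, 3, 5),  # newly deposited
}
last_crawled = {
    "10.5555/a": datetime(2010, 2, 1),
    "10.5555/b": datetime(2010, 2, 1),
}
print(dois_to_crawl(deposited, last_crawled))  # ['10.5555/b', '10.5555/c']
```

On the first crawl `last_crawled` is empty, so every deposited DOI is returned; after that, only new or updated DOIs are.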
Enabling Indexing With Your Hosting Provider
If your site is hosted by a third party, your hosting provider can most likely enable iParadigms to index it. To do this, contact your hosting provider account manager and ask them to enable CrossCheck indexing for your site. Once your hosting provider has confirmed that they have enabled CrossCheck indexing, forward this confirmation to firstname.lastname@example.org and we will activate indexing. If you host your own content, continue reading.
Technical Details For Enabling CrossCheck Indexing
The iParadigms crawler can be accommodated using exactly the same techniques that you use for allowing search engines to index the full text of your site.
First, you will need to authorize the iParadigms crawler to access full text on your site. You can do this by allowing access to your site from the iParadigms crawler IP ranges:
IMPORTANT: Please note that the IP range has changed.
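Allowing access by IP range comes down to testing whether a request's source address falls inside one of the published networks. The Python sketch below uses the standard `ipaddress` module; because the actual iParadigms ranges are not reproduced here, it uses IETF documentation networks as placeholders, and the `path_to_serve` helper and its page mapping are hypothetical. The same test is what a server-side script for the redirection method would rely on. This sketch checks only the IP address: a User-Agent check could be added, but that header is trivially spoofed, so it is a weak basis for granting access to full text.

```python
import ipaddress

# PLACEHOLDER networks: these are IETF documentation ranges, NOT the real
# iParadigms crawler ranges. Replace them with the ranges CrossRef publishes.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_crawler_ip(remote_addr):
    """Return True if the client address lies in one of the allowed networks."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:  # malformed address string
        return False
    return any(addr in net for net in CRAWLER_NETWORKS)


def path_to_serve(requested_path, remote_addr, fulltext_for):
    """Redirection method: serve full text to the crawler instead of the
    landing page; everyone else gets the path they asked for.

    fulltext_for is a hypothetical mapping of landing-page path -> full-text path.
    """
    if is_crawler_ip(remote_addr) and requested_path in fulltext_for:
        return fulltext_for[requested_path]
    return requested_path


pages = {"/bar/abstract.html": "/bar/article.html"}
print(path_to_serve("/bar/abstract.html", "192.0.2.10", pages))   # /bar/article.html
print(path_to_serve("/bar/abstract.html", "203.0.113.5", pages))  # /bar/abstract.html
```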
Second, you need to be able to direct the iParadigms crawler to the full text of your content. The iParadigms system will do a "directed crawl" of your site using the DOIs that you have already deposited in CrossRef (see above). The problem is that these DOIs typically point to the landing page for the content in question (e.g. an abstract page). In order to direct the iParadigms crawler to the full text of the article pointed to by the DOI, you can use one of several standard methods:
- Redirection
- Landing page microformats
- Support "as-crawled" (aka "full text") URLs in your CrossRef metadata deposits
Again, these are standard mechanisms for supporting search engine crawlers, and it makes little difference which of them you support, as long as you support one. Note also that indexing speed depends directly on how fast you allow the iParadigms crawler to perform the directed crawl: if you throttle the crawler, indexing slows down accordingly.
Finally, each mechanism for enabling CrossCheck indexing is discussed in more detail below.
Redirection
This method uses a server-side script that checks the IP address and/or the User-Agent HTTP header of each request to determine when the iParadigms crawler is requesting a landing page. When the script detects the iParadigms crawler, the server returns the full text of the article instead of the landing page.
Landing Page Microformats
This method embeds either a META tag or a SPAN element in the landing page that explicitly identifies the link(s) to the full text of the article.
For instance, a META tag solution might include the following in the HEAD section of the HTML for the landing page:
<meta name="fulltext_html" content="http://www.foo.com/bar/article.html">
<meta name="fulltext_pdf" content="http://www.foo.com/bar/article.pdf">
Alternatively, if the landing page already includes links to the full text of an article, you could explicitly indicate which of the links on the landing page goes to the full text by surrounding it with a SPAN element. For example:
<span class="fulltext_html"><a href="http://www.foo.com/bar/article.html">Download HTML</a></span>
<span class="fulltext_pdf"><a href="http://www.foo.com/bar/article.pdf">Download PDF</a></span>
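To see how these microformats identify full text, here is a minimal Python sketch of a parser that collects the declared links from a landing page using the standard library's `html.parser`. It is illustrative only: it is not the iParadigms crawler, and it recognizes just the `fulltext_html` and `fulltext_pdf` names used in the examples above, whereas the microformat you actually use must be agreed with CrossRef.

```python
from html.parser import HTMLParser


class FullTextLinkParser(HTMLParser):
    """Collect full-text links declared with the META or SPAN microformat."""

    KINDS = ("fulltext_html", "fulltext_pdf")

    def __init__(self):
        super().__init__()
        self.links = {}         # kind -> URL, e.g. "fulltext_pdf" -> "http://..."
        self._open_kind = None  # kind of the enclosing fulltext SPAN, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") in self.KINDS:
            # META form: the URL is in the content attribute.
            content = (attrs.get("content") or "").strip()
            if content:
                self.links[attrs["name"]] = content
        elif tag == "span":
            # SPAN form: remember which kind of link the span wraps.
            classes = (attrs.get("class") or "").split()
            kind = next((c for c in classes if c in self.KINDS), None)
            if kind:
                self._open_kind = kind
        elif tag == "a" and self._open_kind:
            href = (attrs.get("href") or "").strip()
            if href:
                self.links[self._open_kind] = href

    def handle_endtag(self, tag):
        if tag == "span":
            self._open_kind = None


def find_fulltext_links(html):
    parser = FullTextLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


page = (
    '<meta name="fulltext_html" content="http://www.foo.com/bar/article.html">'
    '<span class="fulltext_pdf"><a href="http://www.foo.com/bar/article.pdf">'
    "Download PDF</a></span>"
)
print(find_fulltext_links(page))
```

Links outside a marked SPAN are ignored, which is what lets the SPAN form single out the full-text link among everything else on the landing page.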
Note that one disadvantage of the microformat mechanism is that it requires the iParadigms crawler to make two page reads to reach the full-text content: first the landing page, then the full text it points to. This, in turn, slows down indexing and puts a greater load on your site. Slow indexing with microformats can be a particular problem if you also throttle the iParadigms crawler. If you choose the microformat technique, we strongly recommend that you remove any throttling of the iParadigms crawler.
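A back-of-the-envelope calculation shows why the extra page read matters when combined with throttling. The sketch below assumes that each page read counts as one request against your throttle, that redirection and as-crawled URLs need one read per DOI while microformats need two, and it ignores network latency and page size; the DOI count and request rate in the example are made up.

```python
def crawl_hours(num_dois, requests_per_second, reads_per_doi):
    """Lower-bound crawl time in hours when the crawler is throttled."""
    total_requests = num_dois * reads_per_doi
    return total_requests / requests_per_second / 3600


# Hypothetical backlog of 36,000 DOIs with the crawler throttled to 1 request/s.
print(crawl_hours(36_000, 1.0, reads_per_doi=1))  # one read: 10.0 hours
print(crawl_hours(36_000, 1.0, reads_per_doi=2))  # microformat: 20.0 hours
```

Halving the throttle to 0.5 requests per second doubles both figures (40 hours for the microformat case), which is why removing throttles matters most for sites using microformats.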
If you decide to use this method for directing the iParadigms crawler to full text, you will need to describe the microformat you are using to CrossRef by sending an email to email@example.com.
As-Crawled URLs
The CrossRef metadata deposit format allows you to record an "as-crawled" URL in addition to the traditional landing page URL that is normally registered with a DOI. If you provide as-crawled URLs in your metadata, iParadigms will make use of them instead of the landing page URLs when crawling your site.
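As an illustration, the Python sketch below uses the standard `xml.etree.ElementTree` module to build a `doi_data` fragment that carries both the landing-page URL and an as-crawled URL. The element and attribute names (`collection property="crawler-based"`, `item crawler="iParadigms"`) reflect our understanding of the CrossRef deposit schema and should be checked against the current schema documentation before use; the sketch also omits the XML namespace and the rest of a full deposit, and the DOI and URLs are made up.

```python
import xml.etree.ElementTree as ET


def doi_data_with_as_crawled(doi, landing_url, as_crawled_url):
    """Build a <doi_data> fragment with a landing page and an as-crawled URL.

    Element names are an assumption based on the CrossRef deposit schema;
    namespaces and the surrounding deposit are omitted.
    """
    doi_data = ET.Element("doi_data")
    ET.SubElement(doi_data, "doi").text = doi
    # The usual landing-page URL registered with the DOI.
    ET.SubElement(doi_data, "resource").text = landing_url
    # The URL the iParadigms crawler should fetch instead.
    collection = ET.SubElement(doi_data, "collection", property="crawler-based")
    item = ET.SubElement(collection, "item", crawler="iParadigms")
    ET.SubElement(item, "resource").text = as_crawled_url
    return doi_data


fragment = doi_data_with_as_crawled(
    "10.5555/12345678",
    "http://www.foo.com/bar/abstract.html",
    "http://www.foo.com/bar/article.pdf",
)
print(ET.tostring(fragment, encoding="unicode"))
```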
More information about depositing as-crawled URLs can be found in the CrossRef help system at: