
Enabling Similarity Check Indexing


Although you will no doubt be eager to start checking manuscripts with Similarity Check, you must first make sure that your content is being successfully added to the Similarity Check database. This is important for several reasons:

  1. The value of the Similarity Check database is directly proportional to the percentage of relevant published literature that is included in the database. The sooner we get relevant scholarly literature indexed in the system, the less likely it is that Similarity Check members will get "false negatives" when they check manuscripts for plagiarism.
  2. Getting your content indexed immediately means that other users of the Turnitin system can detect if somebody is plagiarizing *your* content.
  3. Getting your content indexed means that you can start using the "Similarity Check Depositor" and "Similarity Check Deposited" logos on your site and content. This, in turn, will serve as a deterrent to people who might be tempted to plagiarize your content.
  4. You cannot start using Similarity Check to detect plagiarism until your content is being successfully indexed.

The Similarity Check system is powered by the Turnitin plagiarism detection engine. In order to get your content added to the Similarity Check database, you must allow Turnitin to "crawl" your web site. This is the same way in which Google or other search engines might crawl your site, except that it should be far less resource intensive and intrusive than a normal search engine crawl.

How can the Turnitin crawl be less of a burden on your site than other search engines? Instead of having Turnitin crawl your entire site (including navigation pages, marketing ephemera, etc.), Crossref will tell Turnitin precisely what to crawl by providing them with a list of your DOIs, which point only to your DOI-identified content. Indeed, once your site has been initially indexed, Turnitin will only crawl your new content when we tell them that you have added or updated DOIs in the Crossref system. In short, after the first index of your site, Turnitin may never have to crawl your entire site again, and they will be immediately informed when you have loaded new content.

Enabling Indexing With Your Hosting Provider

If your site is hosted by a third party, then enabling your site to be indexed by Turnitin can most likely be handled by your hosting provider and by the party who deposits your DOIs and associated metadata with Crossref. To do this, contact your hosting provider account manager and ask them to enable Similarity Check indexing for your site via the method below. Once your hosting provider has confirmed that they have enabled Similarity Check indexing, forward this confirmation to Crossref and we will activate indexing. If you host your own content, continue reading.

Technical Details For Enabling Similarity Check Indexing

The Turnitin crawler can be accommodated using exactly the same techniques that you use for allowing search engines to index the full text of your site.

First, you will need to authorize the Turnitin crawler to access full text on your site. You can do this by allowing access to your site from the Turnitin crawler IP ranges:

IMPORTANT: Please note that the IP range has changed.
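
As an illustration of the IP-based authorization described above, the following minimal Python sketch tests whether a requesting address falls inside an allowed range. The ranges shown are reserved documentation placeholders, not the real Turnitin crawler ranges, and the function name is hypothetical. Substitute the current ranges supplied by Crossref.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT the real Turnitin
# crawler ranges. Replace them with the ranges supplied by Crossref.
CRAWLER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_crawler_address(remote_addr):
    """Return True if remote_addr lies inside one of the allowed ranges."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in CRAWLER_RANGES)


print(is_crawler_address("192.0.2.17"))   # True: inside the first placeholder range
print(is_crawler_address("203.0.113.5"))  # False: outside both placeholder ranges
```

In practice this kind of check usually lives in the web server or access-control configuration rather than in application code. The sketch only shows the membership test itself.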

You should also make sure that your robots.txt file does not disallow the Turnitin crawler's user agent: TurnitinBot/2.1
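
Assuming the file in question is your site's robots.txt, one way to check the rule before requesting indexing is to test it with Python's standard urllib.robotparser module. This is a sketch only: the robots.txt contents, the publisher.example URL, and the crawler_allowed helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "TurnitinBot/2.1"

# Hypothetical robots.txt that lets TurnitinBot in while keeping other
# crawlers out of the full-text directory.
ALLOWING_ROBOTS = """\
User-agent: TurnitinBot
Disallow:

User-agent: *
Disallow: /fulltext/
"""

# Hypothetical robots.txt that blocks every crawler, TurnitinBot included.
BLOCKING_ROBOTS = """\
User-agent: *
Disallow: /fulltext/
"""


def crawler_allowed(robots_txt, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


full_text_url = "https://publisher.example/fulltext/article1.pdf"
print(crawler_allowed(ALLOWING_ROBOTS, full_text_url))  # True
print(crawler_allowed(BLOCKING_ROBOTS, full_text_url))  # False
```

Running the same check against a copy of your live robots.txt and a sample of your full-text URLs gives a quick indication of whether the crawler would be turned away.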

Second, you need to be able to direct the Turnitin crawler to the full text of your content. The Turnitin system will do a "directed crawl" of your site using the DOIs that you have already deposited in Crossref (see above). The problem is that these DOIs typically point to the landing page for the content in question (e.g. an abstract page). To direct the Turnitin crawler to the full text of the article pointed to by the DOI, you should use the following method:

Support "as-crawled" (aka "full text") URLs in your Crossref metadata deposits

The Crossref metadata deposit format allows you to record an "as-crawled" URL in addition to the traditional landing page URL that is normally registered with a DOI. If you provide as-crawled URLs in your metadata, Turnitin will make use of them instead of the landing page URLs when crawling your site.

More information about depositing as-crawled URLs can be found in the Crossref help system at:
These can be deposited as part of the full Crossref deposit, or as a resource-only deposit. Both examples are shown in the help page link above.
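
To give a rough idea of what an as-crawled URL looks like alongside the landing page URL, the Python sketch below builds a doi_data fragment with the standard library. The element and attribute names (collection, property="crawler-based", item, crawler="iParadigms") are assumptions about the Crossref deposit schema and should be confirmed against the help page before use. The DOI, the URLs, and the build_doi_data helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_doi_data(doi, landing_url, as_crawled_url):
    """Build a doi_data element carrying a landing page and an as-crawled URL.

    The element and attribute names are assumptions about the Crossref
    deposit schema; verify them against the Crossref help documentation.
    """
    doi_data = ET.Element("doi_data")
    ET.SubElement(doi_data, "doi").text = doi
    # Traditional landing page URL registered with the DOI.
    ET.SubElement(doi_data, "resource").text = landing_url
    # As-crawled (full text) URL intended for the Turnitin crawler.
    collection = ET.SubElement(doi_data, "collection", {"property": "crawler-based"})
    item = ET.SubElement(collection, "item", {"crawler": "iParadigms"})
    ET.SubElement(item, "resource").text = as_crawled_url
    return doi_data


fragment = build_doi_data(
    "10.5555/example.1",                       # hypothetical DOI
    "https://publisher.example/abstract/1",    # hypothetical landing page
    "https://publisher.example/fulltext/1",    # hypothetical full-text URL
)
print(ET.tostring(fragment, encoding="unicode"))
```

The fragment is not a complete deposit. It would sit inside the article record of a full deposit, or inside a resource-only deposit, as described in the help system.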

Publishers who use the web deposit form to deposit DOIs with Crossref can enter the as-crawled URLs/full-text links in the 'Turnitin URL' field when they are entering article information.

You can also upload this information for your existing DOI deposits by uploading a .csv file containing your DOIs and the associated full-text links via the Web Deposit Form. Information on how to compile this form is here:
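
As a rough illustration only, a .csv file pairing DOIs with full-text links could be produced with Python's csv module as sketched below. The two-column layout with no header row is an assumption, not the documented format, so check the Crossref instructions for the required layout. The DOIs, URLs, and the build_fulltext_csv helper are hypothetical.

```python
import csv
import io


def build_fulltext_csv(links):
    """Return CSV text with one 'DOI,full-text URL' row per entry in links.

    Assumes a two-column layout with no header row; the layout Crossref
    actually requires may differ.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for doi, url in links.items():
        writer.writerow([doi, url])
    return buffer.getvalue()


# Hypothetical DOIs and full-text URLs.
example_links = {
    "10.5555/example.1": "https://publisher.example/fulltext/1",
    "10.5555/example.2": "https://publisher.example/fulltext/2",
}

csv_text = build_fulltext_csv(example_links)
print(csv_text)
```

Writing csv_text to a file yields something that can be uploaded via the Web Deposit Form once the column layout has been matched to the Crossref instructions.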

If you have any questions about depositing this information, please contact Crossref.

Updated May 4, 2016.
