Flagging content that is "free" for text mining

The Crossref API can be used for locating the full text of published articles and preprints for the purpose of text mining.

Crossref members who have subscription-access content and who want to make some of that content available for text mining need to take the following steps.

The Crossref schema supports the NISO Access and License Indicators (ALI) section. Normally, the free_to_read element of ALI would be the recommended mechanism for indicating that content is available for free (i.e. “gratis”, which is not the same as “open”). However, the ALI free_to_read element is not currently exposed through our REST API filters.

We have therefore defined a workaround: members register both the ALI free_to_read element and an equivalent assertion. The assertion works with the REST API and allows researchers to locate content that has been flagged as “free.”

Steps TL;DR

1. Make sure your Crossref metadata includes full-text links for the relevant DOIs.

Crossref’s participation reports can be used to tell if you are already doing this. See the sections marked “Text mining URLs” and “Similarity Check URLs” to see what percentage of your registered content has some sort of full-text link.

2. Remove your platform’s access control restrictions from the URLs for the DOIs you would like to make available for free.

How you do this will vary from publisher to publisher and platform to platform. Please note that Crossref does not have any control over access to our members’ content.

3. Flag the DOIs that you are making available as “free.”

You can do this by submitting the relevant ALI free_to_read element as well as a Crossmark assertion for each relevant DOI. See details below.

4. Test the DOIs via the Crossref API to ensure that everything is working.


Flagging your DOIs as “free”

To flag your DOIs as “free”, you can add a single Crossmark assertion to each DOI and deposit the XML using our ‘resource-only deposit’ mechanism. (Note that as of January 2020 there is no longer a charge for participating in Crossmark, so this can be done without any additional fees.)

The following example “resource-only deposit” shows how you can add the ALI free_to_read element and a Crossmark free assertion to an existing Crossref metadata record.

<?xml version="1.0" encoding="UTF-8"?>
<doi_batch version="4.4.2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.crossref.org/schemas/4.4.2 http://www.crossref.org/schemas/doi_resources4.4.2.xsd">
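    <head>
        <!-- Replace below with a unique ID -->
        <doi_batch_id>UNIQUE_BATCH_ID</doi_batch_id>
        <depositor>
            <!-- Replace below with member name -->
            <depositor_name>Member Name</depositor_name>
            <!-- Replace below with the email address where errors should be reported -->
            <email_address>EMAIL_ADDRESS</email_address>
        </depositor>
    </head>
    <body>
        <crossmark_data>
            <!--the DOI being updated with CrossMark metadata -->
            <doi>YOUR_DOI</doi>
            <!--CrossMark metadata -->
            <crossmark>
                <!-- If you already have a Crossmark policy DOI, replace it below. If you do not
                have a Crossmark policy, then repeat the DOI being updated -->
                <crossmark_policy>YOUR_CROSSMARK_POLICY_DOI</crossmark_policy>
                <custom_metadata>
                    <assertion name="free" label="Free to read">This content has been made available to all.</assertion>
                </custom_metadata>
            </crossmark>
        </crossmark_data>
        <lic_ref_data>
            <!--the DOI being updated with CrossMark metadata-->
            <doi>YOUR_DOI</doi>
            <program xmlns="http://www.crossref.org/AccessIndicators.xsd">
                <free_to_read/>
            </program>
        </lic_ref_data>
    </body>
</doi_batch>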
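
Before uploading a deposit file, it can help to confirm that it is well-formed XML and that it carries both flags. The following is a minimal sketch, not part of any Crossref tooling: the function names `local_name` and `check_free_flags` are hypothetical, and the check matches elements by local name only, so it does not validate the file against the Crossref schema.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def check_free_flags(xml_text: str) -> dict:
    """Report whether a deposit contains the two 'free' flags.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_text)
    has_assertion = False
    has_ali = False
    for element in root.iter():
        name = local_name(element.tag)
        # The Crossmark assertion must be named "free" for the REST API filter.
        if name == "assertion" and element.get("name") == "free":
            has_assertion = True
        # The ALI element is recognised by its local name alone.
        elif name == "free_to_read":
            has_ali = True
    return {"crossmark_assertion": has_assertion, "ali_free_to_read": has_ali}
```

A typical use would be to read the deposit file into a string and pass it to `check_free_flags`, then upload only if both values in the returned dictionary are true.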

Assuming this record is saved as free_to_read.xml, you can deposit it via our XML API using curl as follows:

curl -F 'operation=doDOICitUpload' -F 'login_id=USERNAME' -F 'login_passwd=PASSWORD' -F 'fname=@free_to_read.xml' 'https://doi.crossref.org/servlet/deposit'

Note that it can take up to an hour before an update is reflected in the REST API.
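
Once the update has had time to propagate, one way to test a single DOI is to fetch its record from the REST API and look for the assertion. The sketch below assumes the work record exposes Crossmark assertions as a list under the `assertion` key, each with a `name` field; the helper names `is_flagged_free` and `fetch_work` are hypothetical, and `fetch_work` needs network access.

```python
import json
import urllib.parse
import urllib.request


def is_flagged_free(work: dict) -> bool:
    """Return True if a work record carries an assertion named 'free'."""
    return any(a.get("name") == "free" for a in work.get("assertion", []))


def fetch_work(doi: str) -> dict:
    """Fetch one work record from the Crossref REST API (requires network)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url) as response:
        # The record itself sits under the "message" key of the response.
        return json.load(response)["message"]
```

Calling `is_flagged_free(fetch_work(doi))` with one of your own DOIs should return True once the deposit has been processed.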

Querying articles flagged as free in the Crossref REST API

You may want to acquaint yourself with the documentation for the Crossref REST API.

Here is an example query that uses filters to identify content that has been asserted to be ‘free’ using the above technique.

Querying works that match “Covid 19”, have a free assertion associated with them, and have a full-text link:

https://api.crossref.org/v1/works?filter=assertion:free,has-full-text:t&query.bibliographic="Covid 19"

(note that as of 2020-03-12 this returns zero results)
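
For researchers who want to build this kind of query programmatically, the following is a minimal sketch that assembles the URL with proper percent-encoding. The function name `free_works_url` is hypothetical; the filter names and the `query.bibliographic` parameter are taken from the example above. The sketch passes the search terms without the surrounding double quotes.

```python
from urllib.parse import urlencode

API_WORKS = "https://api.crossref.org/v1/works"


def free_works_url(query: str) -> str:
    """Build a works query for content flagged 'free' that has a full-text link."""
    params = {
        # Both filters must match: a 'free' assertion and a full-text link.
        "filter": "assertion:free,has-full-text:t",
        "query.bibliographic": query,
    }
    return API_WORKS + "?" + urlencode(params)


print(free_works_url("Covid 19"))
```

The resulting URL can be opened in a browser or passed to any HTTP client.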

Last Updated: 2020 March 13 by Geoffrey Bilder