How we use Crossref metadata
Bruce Rosenblum, CEO of Inera Incorporated, talks about the work they are doing at Inera and how they’re using our metadata as part of their workflow.
Can you tell us a little bit about Inera and yourself?
I’ve always been fascinated by the intersection of publishing and technology. At Inera I help scholarly and technical publishers improve their workflows through technology, and build editorial and XML software solutions to improve the publication workflow. I lead the development teams for our eXtyles and Edifix products, and I also participate in community projects: co-authoring the original NLM DTD suite, developing the Crossref Metadata Deposit Schema in 2001, and serving for 8 years on the NISO board. I continue to work on JATS and BITS development, and I co-chair the NISO STS Working Group. Before joining Inera, I developed publishing technology such as an Apple II word processor for Chinese in 1981 and early microcomputer desktop publishing systems in the late 1980s.
At Inera, we develop and license the eXtyles family of Word-based editorial and XML software tools (including eXtyles, eXtyles NLM, eXtyles STS, and eXtyles SI) as well as Edifix, an online bibliographic reference solution. eXtyles and Edifix allow users to automate the most time-consuming aspects of document publication. Publishers of scholarly journals and books, standards, and government documents around the world rely on our software solutions to drive efficient, effective publishing workflows.
Inera and Crossref have collaborated since 2001, and we jointly won the 2014 NEPCo Award for the ongoing symbiotic relationship between our organizations.
What problem is your software and service trying to solve?
Publishers receive manuscripts from authors who have deep knowledge of their subject matter but are sometimes not expert writers and rarely expert users of Microsoft Word. Our eXtyles and Edifix solutions are designed to help publishers rapidly and accurately prepare these manuscripts for publication by automating a lot of technical and editorial cleanup, then producing high-quality JATS and BITS XML.
Within eXtyles and Edifix, we have sophisticated algorithms that heuristically parse bibliographic references, copyedit them automatically to a publisher’s editorial style, and then link them to Crossref and PubMed. These features eliminate a lot of repetitive detail copy editing work so that human editors can focus on higher-level editing tasks, and they produce more accurate bibliographies that include online links, with a fraction of the work that it would take to look up, check, and correct each reference manually.
Simply stated, we use Crossref metadata in our products to ensure that bibliographic reference lists are as complete, correct, and up to date as possible at the point of publication.
Both eXtyles and Edifix use Crossref metadata to improve reference lists. Our reference processing module pulls apart references to journal articles, books, book chapters, conference papers, and standards, applies elements based on the JATS reference model, and then reconstructs them according to the editorial style chosen by the user (e.g., AMA, APA, MLA, or a custom-configured style to meet customers’ requirements).
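To make the "pull apart, then reconstruct" idea concrete, here is a minimal sketch in Python. It is not Inera's code: it skips the parsing step entirely and starts from a reference that has already been split into elements. The class, the field names, and the two output templates are all assumptions made for illustration. The templates are loosely modelled on compact and author-date conventions and are not faithful AMA or APA implementations.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JournalReference:
    """A journal reference already split into elements (illustrative field names)."""
    authors: List[Tuple[str, str]]  # (surname, initials), e.g. ("Smith", "JA")
    year: str
    title: str
    journal: str
    volume: str
    issue: str
    pages: str


def render_compact(ref: JournalReference) -> str:
    """Hypothetical compact template: names without punctuation, year after journal."""
    names = ", ".join(f"{surname} {initials}" for surname, initials in ref.authors)
    return (f"{names}. {ref.title}. {ref.journal}. "
            f"{ref.year};{ref.volume}({ref.issue}):{ref.pages}.")


def render_author_date(ref: JournalReference) -> str:
    """Hypothetical author-date template: dotted initials, year after the authors."""
    names = [f"{surname}, {' '.join(c + '.' for c in initials)}"
             for surname, initials in ref.authors]
    if len(names) > 1:
        joined = ", ".join(names[:-1]) + ", & " + names[-1]
    else:
        joined = names[0]
    return (f"{joined} ({ref.year}). {ref.title}. {ref.journal}, "
            f"{ref.volume}({ref.issue}), {ref.pages}.")
```

Because the elements are stored once and rendered by a template, switching editorial style in this sketch means calling a different function on the same data rather than retyping the reference.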
Crossref metadata is used for two primary purposes. First, we query the Crossref database to obtain DOI links for journal articles, books, conferences, and other types of references. This link lookup helps our customers fulfill their Crossref membership obligations and helps ensure that researchers get appropriate credit for citations of their work. Second, we use the metadata obtained from Crossref to improve the accuracy of author-supplied reference entries.
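For readers who want to try a DOI lookup themselves, the sketch below uses Crossref's public REST API (the `works` endpoint with a `query.bibliographic` parameter) as a stand-in. This is an assumption for illustration: the interview describes Inera's tools as using XML queries, and nothing here reflects how eXtyles or Edifix are implemented. The helper names are hypothetical, and the code only builds the request URL and reads a response that has already been decoded from JSON, so it runs without network access.

```python
from typing import Optional
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_lookup_url(reference_text: str, rows: int = 3,
                     mailto: Optional[str] = None) -> str:
    """Build a free-text bibliographic query URL for the Crossref REST API."""
    params = {"query.bibliographic": reference_text, "rows": rows}
    if mailto:
        # Optional contact address, passed as a query parameter.
        params["mailto"] = mailto
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def best_doi(response: dict) -> Optional[str]:
    """Return the DOI of the top-ranked hit from a decoded response, if any."""
    items = response.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None
```

Taking the top hit blindly would reintroduce the false positives mentioned later in this interview, so a real matcher would compare the returned metadata against the reference before accepting the DOI.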
What values do you pull from our APIs?
The most important metadata value we retrieve is the DOI itself. Because the majority of bibliographic references in author manuscripts do not include DOIs, a key feature of our service is DOI retrieval. However, we use metadata well beyond the DOIs once we’ve matched a record. Even if a reference already has a DOI, we still do a traditional query, using the other available reference elements, to retrieve a DOI and compare the results to flag discrepancies. We’ve found that ~20% of author-supplied DOIs are incorrect, so correcting these discrepancies is one of myriad ways that our software uses Crossref metadata to improve references before publication.
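The DOI cross-check described above can be sketched as follows. This is an illustrative Python sketch, not Inera's implementation. The function names and status labels are hypothetical, and the only domain fact relied on is that DOIs are compared case-insensitively and are often written with a `doi:` or `https://doi.org/` prefix.

```python
import re
from typing import Optional

# Strip common ways of writing a DOI as a link or with a "doi:" label.
_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(doi: str) -> str:
    """Reduce a DOI to a bare, lower-case form for comparison."""
    return _PREFIX.sub("", doi.strip()).lower()


def check_doi(author_doi: Optional[str], retrieved_doi: Optional[str]) -> str:
    """Classify the author-supplied DOI against the one found by metadata query."""
    if not retrieved_doi:
        # Nothing came back from the query, so the author's DOI cannot be checked.
        return "unverified" if author_doi else "no-doi"
    if not author_doi:
        return "added"
    if normalize_doi(author_doi) == normalize_doi(retrieved_doi):
        return "confirmed"
    return "mismatch"
```

In this sketch a `mismatch` is only a flag for an editor; it does not decide which of the two DOIs is the correct one.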
We also pull all of the other fields that are used to build a bibliographic reference—complete author list, title of the item, publication date, volume, pages, and so on—and use these elements to correct and improve the reference. By filling in missing data (e.g., volume, issue, and page numbers) and flagging discrepancies between author-supplied entries and Crossref metadata (e.g., author names in a different order, words missing or misspelled in an article or chapter title), our software assures publishers of a high-quality bibliography with minimal manual effort.
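A simplified version of the fill-and-flag step might look like the sketch below. The field names, the flag wording, and the decision to fill gaps automatically while only flagging differences are assumptions for illustration, not a description of how eXtyles or Edifix behave.

```python
from typing import Dict, List, Tuple

FIELDS = ("authors", "title", "year", "volume", "issue", "pages")


def _norm(value: object) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(str(value).lower().split())


def reconcile(supplied: Dict[str, object],
              matched: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Fill gaps from the matched record and flag fields that disagree."""
    merged = dict(supplied)
    flags: List[str] = []
    for field in FIELDS:
        ours, theirs = supplied.get(field), matched.get(field)
        if not theirs:
            continue  # nothing to compare against
        if not ours:
            merged[field] = theirs
            flags.append(f"filled {field}")
        elif field == "authors":
            a = [_norm(name) for name in ours]
            b = [_norm(name) for name in theirs]
            if a != b:
                same_people = sorted(a) == sorted(b)
                flags.append("author order differs" if same_people else "check authors")
        elif _norm(ours) != _norm(theirs):
            flags.append(f"check {field}")
    return merged, flags
```

Disagreements are left as flags rather than silently overwritten, so an editor still decides whether the author or the deposited metadata is right.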
Finally, we use Crossmark metadata to flag references that may have been corrected—or retracted—to inform editors when an item may need further attention from an author. Did the author knowingly cite a retracted article? If not, does that change the science of the paper citing that retracted item?
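The flagging logic for corrected or retracted items can be sketched as below. The input shape is an assumption: each update is a small dictionary with a `type` and a `notice_doi` key, chosen for illustration and not taken from the Crossmark schema. Real update records would need to be mapped into this shape first.

```python
from typing import Dict, List

# Hypothetical mapping from update type to the label shown to an editor.
LABELS = {
    "retraction": "RETRACTED",
    "withdrawal": "WITHDRAWN",
    "correction": "corrected",
    "erratum": "corrected",
}


def crossmark_flags(reference_doi: str, updates: List[Dict[str, str]]) -> List[str]:
    """Return one editor-facing message per update that needs attention."""
    flags = []
    for update in updates:
        label = LABELS.get(update.get("type", "").lower())
        if label:
            notice = update.get("notice_doi", "unknown notice")
            flags.append(f"{reference_doi}: {label} (see {notice})")
    return flags
```

Update types that are not in the mapping are ignored in this sketch, so only corrections, retractions, and withdrawals interrupt the editor.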
Yes, we’ve built our own tools to query Crossref’s APIs. In 2002, we used the old “piped-query” API to submit elements of journal references, but we outgrew this API because it returned too many false positive results and missed other DOIs that were correct, and because we wanted to query Crossref for DOIs to other reference types (e.g., books, conference papers, reports) as well as journals. We switched to XML queries in 2006, and the result was a huge improvement in the quality and quantity of DOI links for our customers.
But just moving to XML queries still wasn’t good enough. Eight years ago, we wanted to improve DOI retrieval of non-journal items like reports, and we found that the existing Crossref APIs didn’t provide what we needed. So we collaborated with Crossref CTO Chuck Koscher to create the author–title query as an extension to regular XML queries. The result was a dramatic improvement in our ability to retrieve DOIs to non-journal items. The author–title query was a precursor to Crossref’s current metadata APIs, and it continues to serve us well.
All the time! Our customers are located all around the world in more than 25 countries on six continents, so Crossref metadata queries from our software are happening continually, at any time of the day or night, seven days a week, and even on holidays.
There are two other important ways that our software interacts with Crossref APIs every day. First, Crossref’s Simple Text Query (STQ) service, which is used by smaller publishers to meet their Crossref requirement to add DOIs to their reference lists, was built using Inera’s reference parsing engine. In this case, our software runs on Crossref’s servers and is an integral part of the Crossref ecosystem.
Second, to test our products, we run a comprehensive automated quality assurance process every night that tests all aspects of our software and ensures day-over-day stability. When we added Crossref linking functionality in 2003, we began running several thousand Crossref queries per night, looking for consistency in our software’s results. A few months later, we noted an unexpected change: a reference that had previously returned a DOI failed to link! We contacted Crossref about the “lost” DOI, and upon investigation, Crossref discovered that in the process of redepositing 20,000 DOIs, the publisher had accidentally inverted author surnames and given names in all of those records.
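The kind of day-over-day comparison that surfaces a "lost" DOI can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Inera's QA system: each night's results are assumed to be a dictionary from a test reference ID to the DOI returned (or `None`), and the function and key names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def diff_runs(previous: Dict[str, Optional[str]],
              current: Dict[str, Optional[str]]) -> Dict[str, list]:
    """Compare two nightly runs and report DOIs that were lost or changed."""
    lost: List[str] = []
    changed: List[Tuple[str, str, str]] = []
    for ref_id, old_doi in previous.items():
        new_doi = current.get(ref_id)
        if old_doi and not new_doi:
            # Linked yesterday, no link today: the case described above.
            lost.append(ref_id)
        elif old_doi and new_doi and old_doi.lower() != new_doi.lower():
            changed.append((ref_id, old_doi, new_doi))
    return {"lost": lost, "changed": changed}
```

A reference that appears under `lost` is not necessarily a software regression. As the anecdote above shows, the cause can be a change in the deposited metadata, which is why the report is worth sharing with Crossref.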
Crossref immediately recognized how valuable Inera’s automated testing, and its ability to unearth such errors, could be to Crossref and its members. Over time, the number of DOIs we test nightly has grown to tens of thousands, so we’ve worked with Crossref to develop an automated reporting and analysis process that makes detecting and resolving the issues highlighted by our internal testing more efficient.
The co-development of the author–title query API and the sharing of our nightly test suite results are just two examples that highlight the nature of the Inera–Crossref relationship: it’s characterized by technology integration, bidirectional information exchange, and innovative problem solving.
What are the future plans for Inera?
We’re constantly working to improve eXtyles and Edifix and to develop new and innovative ways to help our customers. Here are a few examples:
Two years ago, at the peak of the Zika outbreak, we received an urgent request from the World Health Organization to help them create DOIs for articles that had been submitted but not yet peer reviewed (see Zika Open). Within 16 hours of their request, we developed, tested, and deployed updated software that allowed WHO to publish information vital to researchers, including DOIs, within hours of receipt.
With respect to Crossref APIs, we plan to integrate the Crossref query features to retrieve DOIs for standards that are deposited by organizations like IEEE, ASTM, and BSI. We also plan to expand our linking and verification capabilities to incorporate newer reference types such as preprints and data citations.
More broadly, we’re very excited about the eXtyles Metadata Extraction technology we released last year. This technology can be used by online submission systems and preprint servers to automatically extract key metadata elements (title, abstract, authors, affiliations, keywords) from author-submitted manuscripts, no matter what “style” the author may have used to format the manuscript. This technology is already in use at Aries Systems to simplify the submission process. We look forward to seeing it used soon by preprint servers and institutional repositories to automate the collection and deposit of preprint metadata to Crossref.