As a follow-up to our blog posts on the Crossref REST API, we talked to SHARE about the work they’re doing and how they’re employing Crossref metadata as a piece of the puzzle. Cynthia Hudson-Vitale from SHARE explains in more detail…
Cynthia Hudson-Vitale, digital data librarian in Research Data and GIS Services at Washington University in St. Louis Libraries and visiting program officer for SHARE
SHARE (http://share-research.org) is building a free, open data set about research and scholarly activities across their life cycle. It is a higher education initiative whose mission is to maximize research impact by making research widely accessible, discoverable, and reusable. SHARE’s data set is free, openly licensed, and built with open source technology developed at the Center for Open Science (COS). Launched in beta in April 2015, the data set has grown to more than 6 million records from 100+ providers, including Crossref, Social Science Research Network (SSRN), DataONE, 50+ library institutional repositories, and more.
How is the Crossref REST API used within SHARE?
SHARE currently harvests metadata from Crossref using the Crossref application programming interface (API). We pull such metadata values as journal title, author, DOI, and publisher, to name just a few. This metadata is then fed into our data processing pipeline, where it is normalized and aggregated into the full data set.
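To make the harvest-and-normalize step more concrete, here is a minimal Python sketch. It is not SHARE's actual pipeline code. It assumes the public Crossref works endpoint at `https://api.crossref.org/works` and the field names that endpoint returns (`DOI`, `title`, `container-title`, `publisher`, `author`). The function names `fetch_works` and `normalize_work`, and the flattened output schema, are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Public Crossref REST API endpoint for work records.
CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def fetch_works(rows=5):
    """Fetch a small page of work records from the Crossref REST API."""
    url = f"{CROSSREF_WORKS_URL}?rows={rows}"
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    # Crossref wraps results as {"message": {"items": [...]}}.
    return payload["message"]["items"]


def normalize_work(item):
    """Flatten one Crossref work record into a simple dict.

    The output keys are a hypothetical target schema, not SHARE's own.
    """
    authors = []
    for author in item.get("author", []):
        # People have "given"/"family"; organizations may only have "name".
        parts = (author.get("given"), author.get("family"))
        name = " ".join(p for p in parts if p) or author.get("name")
        if name:
            authors.append(name)

    return {
        "doi": item.get("DOI"),
        # Crossref returns titles as lists; take the first entry if present.
        "title": (item.get("title") or [None])[0],
        "journal": (item.get("container-title") or [None])[0],
        "publisher": item.get("publisher"),
        "authors": authors,
    }
```

Calling `[normalize_work(w) for w in fetch_works()]` would return a list of flattened records ready for aggregation. A real harvester would also need paging, rate limiting, and error handling, which this sketch leaves out.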
What are the future plans for SHARE?
Phase II of SHARE, launched in late 2015, focuses on adding metadata providers, enhancing the metadata, and making connections and links between the metadata records. These links will show the entire life cycle of research and scholarship—connecting a data management plan, grant award information, data deposits, analytic/software code, pre-publications, final manuscripts, and more.
To move these plans forward, SHARE is applying machine-learning and automation techniques and working with the community to verify metadata enhancements and curate the metadata. Current technology work focuses on imputing subject domain keywords and object types into the SHARE data set using learning models and heuristics. Data models and schemas are in development to connect the stages of the research life cycle, link multiple instances of an object to a single entity, and capture metadata provenance.
What else would SHARE like to see in Crossref metadata?
We would love to see rights-declaration metadata elements and article references/citations included in the metadata about digital objects. The rights-declaration information is invaluable for individuals who want to know what category the object is in (public domain, copyrighted, etc.), what constraints or permission requirements exist, whom to contact, and more. Additionally, making article reference lists machine-readable and openly available would help reveal networks of research and facilitate meta-scholarship.
Does this give you any ideas? Feel free to get in touch with questions or take the API for a spin yourself and let us know what you can do with it!