Metadata and integrity: the unlikely bedfellows of scholarly research
I was invited recently to present parliamentary evidence to the House of Commons Science and Technology Select Committee on the subject of Research Integrity. For those not familiar with the arcane workings of the British Parliamentary system, a Select Committee is essentially the place where governments, and government bodies, are held to account. So it was refreshing to be invited to a hearing that wasn’t about Brexit.
The interest of the British Parliament in the integrity of scientific research confirms just how far science’s ongoing “reproducibility crisis” has reached. The fact that a large proportion of the published literature cannot be reproduced is clearly problematic, and this call to action from MPs is very welcome. And why would the government not be interested? At stake is the process of how new knowledge is created, and how reliable that purported knowledge is.
The other issue driving this review of research practices is the cases of deliberate fraud and wrongdoing that have recently made headlines (e.g., the STAP papers concerning the reprogramming of stem cells). While these cases are clearly dramatic outliers, they nevertheless diminish public confidence in scholarly research and the findings that come out of this enterprise.
As with most inquiries, the question quickly boiled down to: who is to blame? As Bill Grant MP asked me directly, “Where does the responsibility lie?”
My answer was lifted from an article by Ginny Barbour and colleagues in F1000Research this November (https://doi.org/10.12688/f1000research.13060.1): publishers are responsible for the integrity of the published literature, while institutions and employers are ultimately responsible for the conduct of their staff. Misconduct entails intent, usually an intent to deceive readers into accepting a conclusion the researcher wants them to reach. But journal editors can never know, and are not in a position to investigate, whether a researcher has deliberately falsified their data.
However, there are things that publishers can do to ensure high standards of integrity. Much of this involves getting authors to publish as much information as possible about what they have done: the more the reader can see of how data were generated, the more that reader can trust the findings communicated in the published article.
Article metadata directly supports this function. It gives structure and transparency to information about ethics and integrity. And because metadata is held independently of the main article, it remains readable even when the article itself is locked behind a paywall.
Crossref already provides metadata that can demonstrate the integrity of published articles. The metadata collected on 91+ million scholarly works across publishers and disciplines is open and freely accessible to all. Bibliographic information, for example, allows readers to see who the authors of the article are, where they are from, and what else they have published. Similarly, funding data allows readers to identify potential conflicts of interest, for example if the funder has commercial or political affiliations. Even if the reader cannot see the conflict of interest statement (or if the journal has not provided one), they can use the funding statement to surface potential conflicts.
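For readers who want to check funding metadata for themselves, here is a minimal sketch in Python of pulling funder names and award numbers out of a Crossref work record. It assumes the JSON shape returned by Crossref's public REST API, where a record's fields sit under "message" and any deposited funding data appears as a "funder" list. The helper names, the User-Agent string and the sample record are hypothetical and for illustration only.

```python
import json
import urllib.request
from urllib.parse import quote

# Public Crossref REST API endpoint for single work records.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def work_url(doi: str) -> str:
    """Build the REST API URL for one DOI, URL-encoding anything unsafe."""
    return CROSSREF_WORKS + quote(doi)


def fetch_record(doi: str, mailto: str) -> dict:
    """Download a work record (needs network access; not called below).

    Crossref asks API users to identify themselves; putting a contact
    address in the User-Agent is one way to do that. The mailto value
    is a placeholder to replace with your own address.
    """
    req = urllib.request.Request(
        work_url(doi),
        headers={"User-Agent": f"funder-check-sketch (mailto:{mailto})"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def funders(record: dict) -> list[tuple[str, list[str]]]:
    """Return (funder name, award numbers) pairs from a work record.

    Assumes the REST API response shape: the work sits under "message",
    and deposited funding data appears as a "funder" list. Records with
    no funding metadata simply yield an empty list.
    """
    pairs = []
    for f in record.get("message", {}).get("funder", []):
        pairs.append((f.get("name", "<unnamed funder>"), f.get("award", [])))
    return pairs


if __name__ == "__main__":
    # A hypothetical, hand-written record for illustration only;
    # it is not real Crossref data.
    sample = {"message": {"funder": [
        {"name": "Example Foundation", "award": ["EX-001"]},
    ]}}
    for name, awards in funders(sample):
        print(name, awards)
```

Run as a script, this prints the funder and award from the hand-written sample. Passing the result of fetch_record for a real DOI to funders would do the same for live metadata. Not every record includes funding data, and an empty result is itself worth noticing.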
And if they wanted, publishers could provide additional metadata to add still more transparency to the research process. Ethical approval by institutional review boards, for example, could be captured, and any protocol numbers traced back to the original ethics committee approval. At present the process of ethical approval varies from country to country, and from institution to institution. Encouraging authors and journals to deposit information on the approval process would both demonstrate the high ethical standards the author is working to, and also improve the standards themselves, since institutions would have to encode their approval processes in a way that is understandable to others. This could pave the way to significantly higher international ethical standards, all through a simple addition to the indexed metadata underlying the scholarly literature.
One key recommendation that I and many others made to the Committee was, in short, “show your work”. As a researcher, that means showing your data. As a publisher, that means showing what checks you have done. In both cases, metadata can help.
A major issue that publishers and researchers can, and should, address is the provision of the underlying scientific data. Today, most papers present only the end results of the authors' (often quite extensive) analyses. The case for sharing data is an obvious one: many recent cases of misconduct could have been identified earlier, or even avoided altogether, if editors and readers had had access to the underlying datasets.
Requiring authors to submit raw images alongside their edited figures would dramatically reduce the image manipulation that is rife in the literature (studies suggest that up to 20% of papers contain some kind of inappropriate figure manipulation, with around 1 in 40 showing manipulation beyond what could be expected from error). Similarly, providing the numbers on which a paper's analyses are based would allow readers to assess fully whether the data are distributed as would be expected from random sampling and, if they choose, to determine whether the data are sufficient to support the paper's statistical inferences. The Crossref schema, by attaching unique identifiers to data citations, makes this link between data and paper possible. (See the recent blog post on the Research Nexus for more information.)
For publishers, showing your work also means being transparent with readers about the editorial checks a manuscript has undergone. Crossref has a tool that enables this editorial transparency: it's called Crossmark. Crossmark lets readers see the most up-to-date information about an article, even on downloaded PDFs. In most cases it is used to show whether a given version of an article is the most recent one, or whether any corrigenda or retractions have since been issued. But it can also carry whatever information a publisher wishes to share about the paper. Some journals have experimented with using Crossmark to 'thread' publications together, for example by linking all the outputs generated from a single clinical trial registration number (described in a separate blog post). Publishers could go further and display metadata on the editorial checks they have performed, so that Crossmark tells readers a paper has been checked for plagiarism, for figure manipulation, or against reporting standards such as the CONSORT or ARRIVE guidelines. Here at Research Square we have been addressing this with a series of Badges that researchers can apply to their papers to show which checks have been performed.
Together, these implementations would provide value both to readers, who can see exactly what has been checked, and to publishers, who can show how rigorous their editorial processes are. They would also highlight the integrity of authors who have passed all of these checks.
Research integrity is not something that can be easily measured but, unlike wit or charm, it is something that people generally know that they have.* This means that they just need to be transparent in their output to demonstrate this to the world. Metadata provides a simple way of doing this, so researchers and publishers should make sure they provide it as openly as they can.
*with apologies to Laurie Lee for the mangled quote