Managing UK research data for future use
BMJ 2009; 338 doi: http://dx.doi.org/10.1136/bmj.b1252 (Published 25 March 2009) Cite this as: BMJ 2009;338:b1252
For some time the BMJ has been watching other journals’ efforts to encourage authors to make raw research data available. Now we are taking part too, by asking authors to include a data sharing statement at the end of each original research article. The statement will explain which additional data—if any—are available, to whom, and how. Those data could range from additional explanatory material to the complete dataset. People allowed access to the data might range from fellow researchers to everyone. And data might be available only on request, accessible online with a password, or openly accessible to all on the web with a link on bmj.com.
We understand that many authors wish to guard data until they have published all their own papers, and we know that data sharing is hard to do. But we hope that authors will, increasingly, set the data free, perhaps after a set period of personal use.
Data sharing means more than the open access publication of articles and the posting in online registries of study protocols and main results. Sharing allows other researchers—and perhaps scientists, clinicians, and patients—access to raw numbers, analyses, facts, ideas, and images that do not make it into published articles and registries. At its fullest extent, data sharing means free access for everyone. Many people would call this a moral obligation because most research is publicly funded and involves the public as participants. Other potential benefits include quicker scientific discovery and learning, better understanding of research methods and results, more transparency about the quality of research, and greater ability to confirm or refute research through replication.
Such sharing raises important questions about who owns the data,1 who gives permission to release the data (including funders, research participants, owners of the intellectual property, and copyright holders), where the data should be stored (in electronic repositories managed locally, nationally, or internationally; or in subject specific databases), how the data should be managed and made compatible across repositories, how the data should be accessed and mined, who should have access and when, and what limits may be needed to prevent misuse and mishandling of data. Yet, despite these and other complexities, the movement to free the world’s vast swathes of untapped research data is gathering speed.
This momentum is coming not just from a few open access advocates and proponents of making the web a searchable network of data as well as articles.2 As delegates at the UK Research Data Service (UKRDS) conference in London heard last month, many researchers would also like access to unpublished raw data.3 Yet academic, technology, publishing, and business interests currently conspire—deliberately or not—to keep data hidden away. Researchers lack the incentives and the means to analyse all the data that they generate, to manage data after funded projects have ended, and to share data other than informally with certain collaborators. A recent UKRDS study on the logistics and costs of developing and maintaining a national shared digital research data service concluded that such a service is feasible and worth funding, and that it could greatly increase UK universities’ potential for research and innovation and their global competitiveness.4 Meanwhile, many other countries are already on the case.
Data sharing is hardly a new idea. Physicists, environmentalists, and researchers in the basic biomedical sciences have been doing this for years. Funders such as the UK Medical Research Council,5 US National Institutes of Health,6 and Wellcome Trust7 already mandate sharing of data from research in basic science and genetics. The US National Heart, Lung, and Blood Institute has opened up to researchers worldwide its collection of genetic and clinical data from three asthma research networks and the Framingham Heart Study.8 Even GlaxoSmithKline has opened up its “patent pool” so that data relevant to finding drugs for neglected diseases can be explored by other researchers.9
Numerous science journals mandate data sharing too. For example, a condition of publication in a Nature journal is that “authors are required to make materials, data and associated protocols promptly available to others without preconditions. Data sets must be made freely available to readers from the date of publication, and must be provided to editors and peer-reviewers at submission, for the purposes of evaluating the manuscript. For the following types of data set, submission to a community-endorsed, public repository is mandatory.”10 These data sets include those containing DNA and protein sequences, macromolecular structures, microarray data, and chemical compound screening.
For most medical journals, however, sharing of clinical research data is a new and difficult concept. Last week the editors of the open access BioMed Central journal Trials spelled out the main ethical and editorial barriers to data sharing in medicine and, partly drawing on discussions with scientists and other editors (including TG), proposed some solutions (box).11 The maintenance of patient confidentiality is a major challenge, because the combination of clinical data, personal data, and the place of research can be enough to reveal a research participant’s identity. Hence clinical research data need to be anonymised carefully before sharing and, if a risk of identification remains, patients should be asked for consent to data sharing as well as consent to taking part in the research.
Sharing medical research data: ethical and editorial barriers and proposed solutions11
Ethics committees: encourage researchers to include plans to publish data in trial information sheets and discuss the safeguards in place to protect the privacy of patients
Research funding agencies: give greater scrutiny to data sharing plans and monitor their enforcement
Journal editors and publishers: recommend that authors prepare data in line with an agreed standard (which requires further consideration). Encourage deposition of data in the journal or suitable third party repository as part of the submission process, potentially via an accession number system, as for trial registration
Trialists: obtain explicit consent for publication of suitably anonymised raw data as part of patient recruitment procedures
Since 2007, the Annals of Internal Medicine has been asking authors to make a “reproducible research statement” at the end of each research paper.12 Authors state whether and within what limits they will share the original study protocol, the dataset used for the analysis, and the computer code used to produce the results. We gladly acknowledge that we are emulating this policy in introducing data sharing statements for BMJ research articles and bringing data sharing to authors’ and readers’ attention. We hope authors and readers will now join us in this debate and will help journals to set data free.
Competing interests: None declared
Provenance and peer review: Commissioned; not externally peer reviewed.