Making raw data more widely available
BMJ 2011; 342 doi: https://doi.org/10.1136/bmj.d2323 (Published 04 May 2011) Cite this as: BMJ 2011;342:d2323
- Andrew J Vickers, associate attending research methodologist
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
- vickersa{at}mskcc.org
Medical investigators routinely refuse to share data from medical studies, seeming to regard such data as private property rather than a public resource for the benefit of medical science and future generations of patients. One survey found that 75% of pharmaceutical researchers were opposed in principle to making raw data available.1 Two studies have found that only a minority of researchers (10% and 25%) share data when publishing in journals with explicit policies that raw data must be made available.2 3 Even in genomics research, where the principle that microarray data should be deposited in a publicly accessible database is widely accepted, many published studies do not have an associated data set publicly available.4 5
The benefits of sharing raw data from medical studies have been widely discussed.6 7 8 9 Data sharing ensures reproducibility, allows testing of secondary hypotheses, facilitates development of new statistical methods, provides a resource for teaching, aids design of new studies, simplifies data acquisition for meta-analysis, and helps prevent fraud and selective reporting.
Reasons that investigators give for why they cannot or should not share their data can be shown to be trivial. Researchers have claimed that data sharing would affect patient confidentiality, even though anonymisation of data is generally straightforward and can follow established guidelines.10 Alternatively, data requests have been refused on the grounds that compiling a data set would be too much work, although this has to be done anyway for the published statistical analysis. Another common worry is that data might be analysed using invalid methods, which is a fair point, but surely a judgment for the scientific community as a whole.
The system is broken and needs fixing: investigators ought to share data, but they do not, and they ignore explicit policies aimed at increasing the availability of data. One obvious solution would be for journals to beef up enforcement—for example, by demanding a data set or a link to a data repository at the time of paper submission in the same way that a conflict of interest statement is required before a manuscript can be sent for peer review. Another alternative is to hit researchers where it hurts: their funding.
The Wellcome Trust has devised its own initiative to tackle the problem by developing principles for funders with regard to data sharing.11 At a meeting in May 2010, about 17 funders committed to work together to increase the availability of data generated by the research they fund, signing up to a statement of purpose.12 The statement includes overall aspirations, such as data sets for published papers being archived and made available to other researchers in a clear and transparent manner; general principles—that data sharing should be equitable, efficient, and ethical; and specific recommendations for developing knowledge and good practice, such as data management standards.
The leadership of the Wellcome Trust should be applauded, even though there are currently no plans for enforcement, such as cutting the funding for researchers who refuse to share data. But it seems likely that open access to raw data will go the same way as open access to published papers. It started as a simple idea—that the results of research funded by the US government should be available to the American people—then became voluntary policy, which was toothless and failed, and was therefore transformed into a mandatory requirement. When researchers submit a new grant to the National Institutes of Health, or write a progress report for an existing grant, they have to provide the PubMed Central ID numbers for any papers based on federally supported research. It would not be a surprise if, in a decade’s time, funders finally get tired of paying for data that researchers keep to themselves. If the purpose of a grant is to improve public health, and if public health will benefit most from making grant related data more widely available, we should fully expect funders to demand that grantees share data: whoever pays the piper calls the tune.
In the meantime, a modest recommendation to the medical research community: get used to it. Once medical researchers start publishing their data, and depositing it in data archives, they will discover not only that it is painless, but that it affords huge advantages to medical science, and to patients present and future.
Footnotes
Competing interests: The author has completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declares: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Provenance and peer review: Commissioned; not externally peer reviewed.