Making data sharing the norm in medical research
BMJ 2023; 382 doi: https://doi.org/10.1136/bmj.p1434 (Published 11 July 2023) Cite this as: BMJ 2023;382:p1434
Linked Research: Prevalence and predictors of data and code sharing in the medical and health sciences
- Clara Locher, pharmacologist1,
- Gérard Le Goff2,
- Anne Le Louarn, codirector3,
- Ulrich Mansmann, professor of biometry and informatics4,
- Florian Naudet, professor of therapeutics1 5
- 1University of Rennes, CHU Rennes, Inserm, Irset (Institut de recherche en santé, environnement et travail)-UMR_S 1085, CIC 1414 (Centre of Clinical Investigation of Rennes), Rennes, France
- 2Patient representative, France Rein Bretagne, Laillé, France
- 3GCS CNCR (Comité National de Coordination de la Recherche), Paris, France
- 4Ludwig-Maximilians University Munich, Medical Faculty, Institute for Medical Information Processing, Biometry, and Epidemiology, Munich, Germany
- 5University Institute of France (IUF), Paris, France
- Correspondence to: F Naudet floriannaudet{at}gmail.com
Reuse of medical research data, which is conditional on access to individual participant data, is expected to maximise the value of medical research. It enables alternative hypothesis testing, validation of claims, exploration of controversies, restoration of unpublished trials, avoidance of duplicated efforts, and production of new knowledge from existing datasets. Given these benefits, politicians,123 funders,4 and publishers5 now support and implement data sharing policies. However, converging evidence indicates that current policies are unlikely to reach their goal of achieving data sharing. In a linked paper in The BMJ, Hamilton and colleagues (doi:10.1136/bmj-2023-075767) synthesised 105 meta-research studies examining 2 121 580 articles across 31 medical specialties and found that, despite some heterogeneity, data sharing rates are consistently low across medical research.6 Declared intentions to share data have increased over time but are not associated with any increase in actual data sharing.
Responsible sharing of such sensitive data is, however, a complex endeavour. Preparing, storing, processing, and administering data, and complying with legal and regulatory data protection requirements, all call for appropriate infrastructure and institutional support, generating additional costs.7 While these practical problems can be resolved pre-emptively, the suboptimal sharing of systematic review data, which are neither sensitive nor difficult to share, as shown in the study by Hamilton and colleagues,6 suggests deeper reasons for the lack of open data.
Unlike in the open source software community, where error correction is welcomed, sharing biomedical data can be perceived as risky given the possibility of uncovering errors that can damage a researcher's reputation. Some private companies and researchers may consider that they own the data they have collected and view data sharing as a competitive disadvantage because they lose exclusive use of the data. Further, the value of reuse, replication, or validation studies is sometimes underestimated by editors.8
These barriers to data sharing focus more on individual researchers’ or organisations’ interests and less on expected benefits to science and society. We cannot overestimate the positive impact on citizens of knowing how their health data are reused, and the contribution to improving healthcare. Trial participants are generally supportive of responsible sharing of de-identified individual patient data.9 Although data sharing is not without risk, the risk of re-identification of correctly pseudonymised information shared in a secure environment remains low in relation to the stakes. The new knowledge produced through data sharing is likely to make patients more aware of the positive benefit/risk balance, including the benefit of improving health for all, thus refocusing the debate.
Coordinated efforts can improve the situation. If leading funders such as the US National Institutes of Health and Horizon Europe implement and finance data sharing activities, other funders should follow. The French DGOS (Direction Générale de l'Offre de Soins), for example, funds clinical research with about €130m (£112m; $143m) per year10 but has no data sharing policy yet. Institutional review boards should systematically assess data sharing plans within their ethical evaluation of research projects, balancing the potential benefits (eg, large confirmatory trials might have more value than small exploratory studies) with the risks (eg, re-identification risks could be higher in trials of rare v frequent diseases). For any new research project, institutional review boards should be able to assess whether the research question would benefit from data sharing or whether it would be better resolved by reusing existing datasets, because data reuse avoids exposing new research participants to risk.
Journals and regulators should adopt mandatory policies, which are probably more effective than non-mandatory policies.6 The Committee on Publication Ethics should clarify how journals should react when confronted with empty promises of data sharing. Initiatives rewarding data sharing behaviours, such as the Hong Kong Principles11 or the Good Pharma Scorecard,12 should be widely endorsed to establish openness and transparency as core values. But even if it is possible to make open data a reality, shared data are useless if they are not requested. Available datasets are clearly underused.13 There is a need to train researchers not only to prepare and share but also to reuse existing datasets.14 Researchers trained in this way, so-called data champions, should disseminate the culture of open data everywhere (eg, funders, institutional review boards, data use and access committees), making data sharing a practice with real impact and not just wishful thinking.
Acknowledgments
We thank Angela Swaine Verdier for reviewing the English language used in this article.
Footnotes
Competing interests: The BMJ has judged that there are no disqualifying financial ties to commercial companies. The authors declare the following other interests: CL is involved in the doctoral network MSCA-DN SHARE-CTD (HORIZON-MSCA-2022-DN-01 101120360), funded by the EU, with additional contributions from Bayer, Smart Data Analysis and Statistics, SciCrunch, and ECRIN-ERIC. UM is the coordinator of MSCA-DN SHARE-CTD. FN received funding from the French National Research Agency (ANR-17-CE36-0010) and the French Ministries of Health and Research. FN is a work package leader in the OSIRIS project (Open Science to Increase Reproducibility in Science); the OSIRIS project has received funding from the EU’s Horizon Europe research and innovation programme. FN is also a work package leader in MSCA-DN SHARE-CTD. ALL and GLG declare no conflicts of interest. Further details of The BMJ policy on financial interests are here: https://www.bmj.com/sites/default/files/attachments/resources/2016/03/16-current-bmj-education-coi-form.pdf.
Provenance and peer review: Commissioned; not externally peer reviewed.