Rapid response to:

Analysis

Increased mortality associated with weekend hospital admission: a case for expanded seven day services?

BMJ 2015; 351 doi: https://doi.org/10.1136/bmj.h4596 (Published 05 September 2015) Cite this as: BMJ 2015;351:h4596

Rapid Response:

The BMJ should require all papers to share their analytic code.

Freemantle and colleagues [1] respond to calls for them to share the raw data from their paper by explaining that this is impossible, due to information governance issues around individual patients' electronic health records. They also allude to the fact that large numbers of other researchers have access to the same electronic health record data that they have used.

("Several correspondents have requested that we share ‘the raw data’, and while we would welcome the possibility of doing this, for governance reasons it is simply not possible. However, these data are available from the HSCIC to those with adequate provision for data management and security. Large datasets such as the one used for our analyses (≈150m) observations require specialist computer equipment and software to setup and undertake the analyses.")

This raises an important issue that is often neglected in discussions on transparency in science: the need for analytic code to be shared, as well as underlying data. While pseudonymised individual patient data often poses a reidentification risk, this is rarely the case with the analysis programs written in Stata, R, or other packages to analyse that data (excepting, at worst, some individual manual commands to correct errors in individual patients' data).

Sharing such code can be hugely informative for those wishing to understand in detail the analytic choices made. It also facilitates sensitivity analyses, to interrogate the impact of large numbers of specific individual analytic choices (which can often be discretionary) on the overall result. To take this very concrete example: were Freemantle et al to share their code for this paper, then those who have access to the same underlying data through their own licences could review the impact of the original team’s analytic choices very rapidly.
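To make the point about sensitivity analyses concrete, here is a minimal Python sketch of what shared analytic code allows. It is not Freemantle et al's method: the records are hand-made toy values, the function names are hypothetical, and the crude odds ratio stands in for whatever adjusted model a real paper would use. The two discretionary choices varied here (which days count as "the weekend", and the follow-up window for mortality) are assumptions chosen purely for illustration.

```python
from itertools import product

# Hypothetical toy records, NOT real patient data:
# (admission weekday, 0=Mon .. 6=Sun; days from admission to death, or None)
RECORDS = [
    (5, 10), (5, None), (6, 40), (6, None), (4, 20), (4, None),
    (0, None), (0, 25), (1, None), (2, None), (3, None), (3, 45),
]


def odds_ratio(records, weekend_days, follow_up_days):
    """Crude odds ratio of death within the follow-up window,
    weekend admissions versus weekday admissions."""
    counts = {(True, True): 0, (True, False): 0,
              (False, True): 0, (False, False): 0}
    for weekday, days_to_death in records:
        weekend = weekday in weekend_days
        died = days_to_death is not None and days_to_death <= follow_up_days
        counts[(weekend, died)] += 1
    # Add 0.5 to every cell (Haldane-Anscombe correction) so that an
    # empty cell in a small table cannot cause division by zero.
    a = counts[(True, True)] + 0.5
    b = counts[(True, False)] + 0.5
    c = counts[(False, True)] + 0.5
    d = counts[(False, False)] + 0.5
    return (a * d) / (b * c)


def sensitivity_table(records):
    """Re-run the same analysis under every combination of choices."""
    weekend_definitions = {"Sat-Sun": {5, 6}, "Fri-Sun": {4, 5, 6}}
    follow_up_windows = [30, 60]
    results = {}
    for (label, days), window in product(weekend_definitions.items(),
                                         follow_up_windows):
        results[(label, window)] = odds_ratio(records, days, window)
    return results


if __name__ == "__main__":
    for (label, window), value in sensitivity_table(RECORDS).items():
        print(f"weekend={label:8s} follow-up={window:2d}d  crude OR={value:.2f}")
```

With the original code in hand, a second team holding the same data would only need to change the entries in `weekend_definitions` or `follow_up_windows` (or their real equivalents) and re-run, rather than reconstructing the whole pipeline from the methods section.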

This is not a criticism or challenge aimed at Freemantle et al specifically. It would be excellent if they could share their code for this paper, but it would be even more useful if the BMJ changed its policy to require that all such analytic code be published as an appendix, as a matter of routine, alongside every quantitative paper.

Such a policy would do much more than simply enhance transparency and reproducibility. It would also create common access to a rapidly growing archive of analytic code, with clear and immediate clinical relevance, from which many researchers could learn new programming techniques and shortcuts, and from which whole routines could be re-used. This is particularly salient given the considerable amount of avoidable and laborious duplication of effort around data preparation in electronic health records research. Sharing code would therefore allow each publication to contribute even more to advancing science, and so help accelerate the discovery of important signals in patient data.

Unlike opening hospitals to routine activity and implementing full clinical and non-clinical staffing at weekends, it would also cost nothing.

[1] http://www.bmj.com/content/351/bmj.h4596/rr-45

Competing interests: BG co-founded the AllTrials.net campaign and works on various other projects on scientific transparency including www.COMPare-trials.org, www.OpenTrials.net and others. BG receives personal income from speaking and writing for lay audiences on problems in science including replication.

11 January 2016
Ben M Goldacre
Senior Clinical Research Fellow
Centre for Evidence Based Medicine, Nuffield Dept Primary Care, University of Oxford.
Centre for Evidence Based Medicine, Department of Primary Care Health Sciences, University of Oxford, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG