Re: All BMJ research papers should share their analytic code
I should preface my response by saying that I support the AllTrials campaign and have been following Dr Goldacre’s assessment of “outcome switching”. I feel obliged, however, to express my concerns about this most recent suggestion: the obligatory sharing of code.
Programmers already share their code through journals, conferences, and similar venues, so some of the benefits described already exist; and, if I understand things correctly, the industry (e.g. through PhUSE) is developing standard programs, since the regulatory environment imposes much sameness and repetition of output. Regarding the reuse of code: a reusable program must be flexible and user-friendly, which can take many more hours of the programmer’s time; it is one thing to have a working program and another to have reusable code. Code shared in journals is not idiosyncratic or study-specific; it has been cleaned, described in detail, and illustrated with an example. This is quite different from dumping lengthy code into a supplementary document to support an analysis in The BMJ. Almost all such code would be uninteresting, since well designed trials do not demand complex analyses, and it would merely articulate in the SAS language the assumptions already stated in the article. And in the cases where the code is interesting, even seasoned programmers may find it difficult to read, e.g. if there are copious macros, the style is unusual, or different languages have been used (industry statisticians favour SAS; academics favour R). Anyone involved in validating code will know that hundreds of queries are raised across a set of programs, and that almost all of these are ultimately dismissed by the author of the programs, owing to their more intimate knowledge of the data and code. This implies that making code available will likely elicit uninformed assertions of error or, on the other hand, the reuse of code without understanding it.
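The distinction drawn above, between a working program and reusable code, can be made concrete. The following is a minimal Python sketch (Python rather than SAS, purely for illustration); the dataset layout, the treatment-arm labels, and the sentinel value 999 for missing data are all hypothetical, not taken from any real trial program:

```python
# A study-specific "working program": the dataset shape, the arm label "A",
# and the undocumented convention that 999 means "missing" live only in the
# head of the original programmer.
def mean_change_trial_xyz(rows):
    vals = [r["wk12"] - r["base"] for r in rows
            if r["arm"] == "A" and r["wk12"] != 999]  # 999 = missing (undocumented)
    return sum(vals) / len(vals)


# The reusable version of the same calculation: parameterised, documented,
# and defensive. Writing, testing, and illustrating this costs the extra
# programmer hours the letter refers to.
def mean_change(rows, arm, baseline_key, endpoint_key, missing=None):
    """Mean change from baseline for one treatment arm.

    Rows whose endpoint equals the sentinel `missing` value are excluded.
    """
    vals = [r[endpoint_key] - r[baseline_key] for r in rows
            if r["arm"] == arm and r[endpoint_key] != missing]
    if not vals:
        raise ValueError("no usable rows for arm %r" % arm)
    return sum(vals) / len(vals)
```

Both functions compute the same number on the original study's data, but only the second can be safely handed to a stranger.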
Consider the case, years after an analysis has been programmed, in which a senior colleague believes they have spotted an error in the numbers and the programmer is obliged to placate them. Confronted with their own code, and lacking thorough documentation of the minor decisions taken at the time, the programmer will begin to question their own work. What hope, then, is there for a researcher skimming code attached to a BMJ article? I personally believe coding errors are far more rife than we would like to think. But what is needed is truly independent validation, which implies a dogged refusal to view the code; this returns us to the issue of sharing data and away from this new issue of sharing code. Companies usually distinguish several forms of validation, with independent coding the most rigorous and ‘code review’ the least reliable; it is the former that is needed and the latter that BMJ readers would be performing (only the former will reveal assumptions embedded deep in the code).
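The independent-coding (“double programming”) procedure described above can be sketched in a few lines. This is an illustrative Python toy with hypothetical function names, not anyone’s actual QC process; the essential point is that the second programmer works from the analysis specification alone and the two outputs are compared, with neither programmer reading the other’s code:

```python
def primary_result(data):
    # First programmer's implementation: a running total.
    total = 0.0
    n = 0
    for x in data:
        total += x
        n += 1
    return total / n


def qc_result(data):
    # Second programmer, independently implementing the same specification
    # ("report the arithmetic mean") without sight of primary_result.
    return sum(data) / len(data)


def reconcile(data, tol=1e-9):
    # The check is the comparison of outputs, not a reading of the code;
    # a shared misreading of the spec is the main thing this cannot catch.
    return abs(primary_result(data) - qc_result(data)) <= tol
```

A mismatch at `reconcile` forces both programmers back to the specification, which is precisely where hidden assumptions surface; merely reading someone else’s program tends to confirm whatever that program already does.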
Incidentally, I have written this too quickly and I am not primarily a programmer, so I hope genuine programmers respond to Dr Goldacre’s suggestion. For example, what will contract programmers think about releasing code filled with the tips and tricks they have accumulated over decades of service and education? The programmer’s time and reputation do not ‘cost nothing’.
Competing interests: No competing interests