
Letters: Sharing raw data

All BMJ research papers should share their analytic code

BMJ 2016; 352 doi: https://doi.org/10.1136/bmj.i886 (Published 18 February 2016) Cite this as: BMJ 2016;352:i886

Re: All BMJ research papers should share their analytic code

I should preface my response by saying that I support the AllTrials campaign and have been following Dr Goldacre’s assessment of “outcome switching”. I feel obliged, however, to express my concerns about this most recent suggestion, namely the obligatory sharing of code.

Programmers already share their code through journals, conferences, and so on, so some of the benefits described already exist; and, if I understand things correctly, the industry (e.g. PhUSE) is developing standard programs, since the regulatory environment imposes a great deal of sameness and repetition in output. Regarding reuse, code must be flexible and user friendly, and that can take many more hours of the programmer’s time: it is one thing to have a working program and quite another to have reusable code. Code shared in journals is not idiosyncratic or study specific; it is cleaned, described in detail, and illustrated with an example. That is quite different from dumping lengthy code into a supplementary document to support an analysis in the BMJ.

Almost all of the code would be uninteresting, since well designed trials do not demand complex analyses, and it would merely articulate in SAS the assumptions already stated in the article. In the cases where the code is interesting, even seasoned programmers may find it difficult to read, for example if there are copious macros, if the style is unusual, or if different languages have been used (industry statisticians favour SAS, academics favour R).

Anyone involved in validating code will know that hundreds of queries are raised across a set of programs and that almost all of them are ultimately dismissed by the author of the programs, who has a more intimate knowledge of the data and code. Making code available is therefore likely to elicit uninformed assertions about errors or, conversely, reuse of code that is not understood. Consider the case, years after an analysis has been programmed, when a superior believes they have spotted an error in the numbers and the programmer is obliged to placate them: confronted with their own code, and lacking thorough documentation of the minor decisions taken at the time, the programmer will begin to question their own work. What hope, then, is there for a researcher skimming code attached to a BMJ article?

I personally believe coding errors are far more rife than we would like to think. But what is needed is truly independent validation, which implies a dogged refusal to view the code, and that returns us to the issue of sharing data and away from this new issue of sharing code. Companies usually recognise several forms of validation, with independent coding the most rigorous and ‘code review’ the most unreliable; it is the former that is needed and the latter that BMJ readers would be performing (only the former will reveal assumptions embedded deep in the code).
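To make that distinction concrete, here is a minimal sketch of independent (double) programming, written in Python purely for illustration (in practice it would usually be SAS or R); the data file, column names, and tolerance are hypothetical. Two analysts derive the primary result from the same raw data without seeing each other’s code, and only the outputs are compared; a reader reviewing a single shared program has no such check.

    # Hypothetical illustration of independent (double) programming:
    # two analysts compute the primary result from the same raw data,
    # each without seeing the other's code, and only the outputs are compared.
    import csv

    def mean_change_analyst_a(path):
        # Analyst A: average the change from baseline across participants.
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        changes = [float(r["week12"]) - float(r["baseline"]) for r in rows]
        return sum(changes) / len(changes)

    def mean_change_analyst_b(path):
        # Analyst B: an independent re-implementation of the same estimand.
        with open(path, newline="") as f:
            total, n = 0.0, 0
            for row in csv.DictReader(f):
                total += float(row["week12"]) - float(row["baseline"])
                n += 1
        return total / n

    if __name__ == "__main__":
        a = mean_change_analyst_a("trial_data.csv")  # hypothetical file name
        b = mean_change_analyst_b("trial_data.csv")
        # Validation passes only if the independently derived results agree.
        print("match" if abs(a - b) < 1e-9 else f"discrepancy: {a} vs {b}")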

Incidentally, I have written this too quickly and I am not primarily a programmer, so I hope genuine programmers respond to Dr Goldacre’s suggestion. For example, what will contract programmers think about releasing code filled with tips and tricks accumulated over decades of service and education? The programmer’s time and reputation do not ‘cost nothing’.

Competing interests: No competing interests

19 February 2016
Paul M Brown
Biostatistician
Department of Medicine, University of Alberta