BMJ 2001;322:530 (3 March)

Information in practice

Infopoints

Publishing raw data and real time statistical analysis on e-journals

Authors of medical publications rarely provide readers with the full raw data from their work, giving only the summarised statistical analysis. Indeed, publishing the raw data in a paper journal would usually be impractical and of little help to readers, as transcribing the data from the printed page to a computer for further analysis would be laborious and prone to error. Without raw data, however, peer reviewers cannot check the statistical analysis, and others cannot do further work on the data.

I demonstrate a method of including the raw data within a web version of an audit project that includes real time data analysis (see details below). The raw data for this paper amount to only 1526 data items, but even a dataset of this size could not normally be included in a paper journal. The internet and most modern computers can cope with much larger datasets.

In the demonstration version I have included software that displays the database for readers to view. From this display the data can easily be copied and pasted into another application. The data can also be read directly in the HTML code with any browser, such as Internet Explorer, that allows users to view source code. The statistical analysis is carried out with JavaScript within the browser software, and all the algorithms are available within the HTML code for readers to inspect if desired.
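As an illustration only, the general idea of keeping raw data as readable text in the page and analysing it with JavaScript might be sketched as follows. This is not the code used in the demonstration paper: the names `rawText`, `parseErrors`, `mean`, and `sampleSd` are hypothetical, the record layout is an assumption, and the six values are invented for the example rather than taken from the audit.

```javascript
// Hypothetical raw data, held as plain text as it might appear in a page's source.
// Assumed layout: record id, error in days (estimated minus actual date of delivery).
// These values are invented for illustration; they are NOT the audit data.
const rawText = `
1,-3
2,5
3,0
4,-7
5,2
6,4
`;

// Turn the embedded text into an array of numbers (the second column).
function parseErrors(text) {
  return text
    .trim()
    .split("\n")
    .map((line) => Number(line.split(",")[1]));
}

// Arithmetic mean.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Sample standard deviation (n - 1 denominator).
function sampleSd(values) {
  const m = mean(values);
  const sumSquares = values.reduce((sum, v) => sum + (v - m) ** 2, 0);
  return Math.sqrt(sumSquares / (values.length - 1));
}

const errors = parseErrors(rawText);
console.log("n =", errors.length);
console.log("mean =", mean(errors).toFixed(2));
console.log("SD =", sampleSd(errors).toFixed(2));
```

Because both the data and the functions are ordinary text in the page, a reader who views the source sees exactly what was analysed and how.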

The demonstration paper is a simple audit cycle, but any publication involving a substantial amount of raw data could be published in this form with considerable advantage. Potential advantages of providing raw data and statistical software within the web version of a published paper include

  • Raw data remain available for the foreseeable future for other workers to analyse further
  • The data can be easily copied into other applications, making analysis by others a practical proposition
  • The data are available for effective meta-analysis
  • The statistical analysis is available to be checked by peer reviewers and readers
  • Internet publication has in practical terms unlimited capacity for data storage
  • Most journals will support a web version in the next few years.

Some of the advantages of electronic publishing have been realised with the launch of web versions of major journals such as the BMJ and Lancet. The practical limitations of sharing large amounts of data have been overcome by internet technology. At present, raw data from most research are likely to be filed away or lost in the depths of a hard disk once the paper is published.

If raw data were published with the original paper they would remain available, with appropriate permission and acknowledgement, to other workers in the specialty. Furthermore, if the data were published within the electronic version of a paper they could not become separated or lost, as they would be an integral part of the paper. Published evidence could be combined more effectively in meta-analysis if the raw data were available. Readers could also easily add to or alter the database and rerun the statistical analysis, knowing that the method of analysis would be identical to that used in the published article.
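The point about rerunning the analysis can be illustrated with a minimal sketch, again in JavaScript and again under stated assumptions: `published`, `amended`, and `summarise` are hypothetical names, and every value is invented for the example. The same function is applied to the original and to the amended data, so any difference in the results comes from the data alone.

```javascript
// Hypothetical published dataset (errors in days). Invented values, not the audit data.
const published = [-3, 5, 0, -7, 2, 4];

// A single analysis function, used unchanged for published and amended data.
function summarise(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  return { n, mean, sd: Math.sqrt(variance) };
}

// A reader adds two observations of their own (also invented).
// The published array is copied, not modified.
const amended = [...published, 1, -2];

const originalResult = summarise(published);
const rerunResult = summarise(amended);

console.log("published:", originalResult);
console.log("amended:  ", rerunResult);
```

Copying the published array rather than altering it keeps the original result reproducible alongside the reader's own.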

David J R Hutchon, consultant obstetrician and gynaecologist

Memorial Hospital, Darlington, County Durham DL3 6HX

Footnotes

   The demonstration paper, "A complete audit cycle of ultrasound estimation of the date of delivery," is available at www.hutchon.freeserve.co.uk/demo.htm

Competing interests: None declared.


© BMJ 2001
