Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers
BMJ 2010; 340 doi: https://doi.org/10.1136/bmj.c181 (Published 29 January 2010) Cite this as: BMJ 2010;340:c181
The need for proper data management in clinical research cannot be
overemphasized. As a reviewer and a consultant, I have often observed good
studies getting bogged down during preparation of data files and datasets
for analysis and sharing. Although spreadsheets and text files are
suitable formats for this purpose, they create problems if the data entry
person has used the spreadsheets as elaborate manual master charts,
without sticking to the basic structure of "a row for a record, a column
for a field or variable, and the first row for field names or variable
names". For this reason, I have found it useful to encourage researchers
to use the free software Epi Info (CDC, Atlanta, USA) as a data
management and analysis tool. It works the way people do: data entry
starts with forms, tables are built in the background, and variable
specifications can be stored and displayed. Being a relational database
system, it allows data to be recorded in functionally related forms that
can be linked as necessary. These data can then be exported either as a
spreadsheet or as a text file. In addition, the program can read or import
data from a variety of databases through Open Database Connectivity
(ODBC). I
think its use for data management can considerably facilitate proper
preparation, maintenance, and sharing of raw data.
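As a rough illustration of the "a row for a record, a column for a field,
and the first row for field names" structure, the sketch below checks an
exported text file for it before sharing. It is not part of Epi Info; the
function name check_rectangular and the sample data are hypothetical, and
comma-separated text is assumed.

```python
import csv
import io


def check_rectangular(text):
    """Return a list of structural problems in comma-separated text.

    An empty list means the text has one header row of unique, non-blank
    field names and every later row has the same number of fields.
    (Hypothetical helper for illustration only.)
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    header = rows[0]
    problems = []
    if any(not name.strip() for name in header):
        problems.append("blank field name in first row")
    if len(set(header)) != len(header):
        problems.append("duplicate field name in first row")

    # Data rows are numbered from 2 because row 1 holds the field names.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {number}: expected {len(header)} fields, found {len(row)}"
            )
    return problems


# Made-up sample data: a clean export and a "master chart" style sheet
# with a free-text subtotal line and an incomplete record.
good = "id,age,sex\n1,34,F\n2,51,M\n"
bad = "id,age,sex\n1,34,F\nGroup A totals\n2,51\n"

print(check_rectangular(good))
print(check_rectangular(bad))
```

A check of this kind only confirms the rectangular layout; it says nothing
about whether the values themselves are valid or correctly coded.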
Competing interests:
None declared
Sharing raw data: (another) idea of Francis Galton
Francis Galton wrote the following in 1901, in the first issue of
Biometrika (1):
"I have begun to think that no one ought to publish biometric
results, without lodging a well arranged and well bound manuscript copy of
all his data, in some place where it should be accessible, under
reasonable restrictions, to those who desire to verify his work."
Biometrika aimed to publish raw data as a precaution against
incompetent analysis. An anonymous editorial in the same issue (presumably
written by the journal's editors, WFR Weldon, K Pearson, and CB Davenport)
saw a key role for the journal in allowing contributions of statisticians
to solving biological problems (2):
"We shall publish careful biometric observations, even if they be
accompanied by only the most elementary statistical treatment; we shall
look forward to our mathematical workers supplementing such fundamental
observations by more elaborate statistical calculations. For this reason
we shall not only print as copious observational and experimental data as
possible, but endeavour to form a manuscript collection of such data
available for further research. We hope that every number of Biometrika
will present statistical material ready for the mathematician to calculate
and to reason upon. All such investigations ancillary to data appearing in
our pages we shall receive gladly and publish at the earliest
opportunity."
The founders of Biometrika hoped that biologists would also do their
share, by providing meaningful context to the data analysts. They wanted
to establish a fast mechanism for collaboration, something that is much
easier to achieve online today. Possibly, this is the direction for data
repositories: make it a two-way street. If a researcher makes her data and
analysis details accessible, then anyone who reanalyzes the data should
post their detailed methods and results in the same location.
1. Galton F. Biometry. Biometrika 1901;1:7-10
URL: http://www.jstor.org/stable/2331669 Accessed: 19/07/2010
2. Anonymous. The Spirit of Biometrika. Biometrika 1901;1:3-6
URL: http://www.jstor.org/stable/2331668 Accessed: 19/07/2010
Competing interests: No competing interests