Rapid Responses to:

PAPERS:
Sanaa Al-Marzouki, Stephen Evans, Tom Marshall, and Ian Roberts
Are these data real? Statistical methods for the detection of data fabrication in clinical trials
BMJ 2005; 331: 267-270

Detecting fraudulent or tampered data 1 August 2005
John P Coffey,
Consultant nuclear medicine physician
Royal Preston Hospital. Fulwood, Preston,PR2 9HT

A simple method for screening raw medical data for fraudulent or concocted values is Benford's Law (1). Benford's Law is a phenomenological law stating that, in series of figures or statistical tables, the digit 1 occurs as the first significant digit about 30% of the time, far more often than the roughly 11% (one in nine) that a uniform distribution of first digits would predict. The law, also known as the first digit phenomenon, applies principally though not exclusively to scale-invariant data, that is, dimensional data whose numerical values depend on the units chosen. It largely applies to data obtained in a semi-random manner and is sensitive to deliberate tampering with or biasing of data, after which the frequency profile of the first digits differs significantly from that predicted. For this reason its value in financial audit and other areas is being evaluated. To date its application to numerical data from medical imaging has not been reported.

Benford's Law applies to numbers drawn from a wide range of sources and is not restricted to scale-invariant data. Demonstrating this requires sophisticated and thorough investigation of central-limit-like theorems for the mantissae of random variables. As the number of variables increases, the density function approaches the logarithmic distribution.

If the law applies to data that are scale invariant, that is, not dimensionless, then any probability distribution $P(x)$ for these data must be invariant under a change of scale. Thus

$$P(kx) = f(k)\,P(x).$$

If $\int P(x)\,dx = 1$, then $\int P(kx)\,dx = 1/k$, and normalisation implies $f(k) = 1/k$. Differentiating with respect to $k$ and setting $k = 1$ gives

$$x\,P'(x) = -P(x),$$

whose solution is $P(x) \propto 1/x$. Although this is not a proper probability distribution, since its integral diverges, the laws of physics and convention impose cut-offs. If many powers of 10 lie between the cut-offs, then the probability that the first significant digit in base 10 is $D$ is given by the logarithmic distribution:

$$\frac{\ln\big((D+1)/D\big)}{\ln 10} = \frac{\ln(D+1) - \ln D}{\ln 10} = \log_{10}\left(1 + \frac{1}{D}\right).$$

The principle of logarithmically distributed significant digits in various scientific calculations is well known and widely exploited. An extensive range of diverse data sets obeys this logarithmic distribution for significant digits, and a considerable amount of empirical evidence supports the use of Benford's law. While many data sets do not follow the distribution, combinations of data tables tend to conform more closely to it. Hill (2,3) demonstrated in his theorem that random samples taken from randomly selected distributions will conform to the logarithmic law even though the initial distributions themselves do not. This theorem also implies that numerous methods of sampling from random distributions will show a trend towards the same logarithmic distribution.

Because the law is also base invariant, the same distribution of first digits should be obtained after converting the data to bases other than base 10. Similarly, probability distributions can be calculated for the frequency of the second or later digits. These tend to occur with more uniform frequency, e.g. approaching 10% for sixth order significant digits.

The value of this method is that tampering or external interference with data acquisition is readily detectable and leads to abnormal skewing or an unexpectedly increased frequency of first digits other than 1. Biased sampling techniques or concocted data tend to show a predominance of the first significant digit 6, with far fewer numbers starting with the digit 1. The method may provide a convenient means for medical statisticians to assess raw data.

Discussions on the desirability of publishing raw data used in medical research studies have largely centred on the difficulties involved. Publishing data on a journal website is feasible, however, as is the inclusion of data supplements in material sent to reviewers. In systematic reviews in particular, data from the primary studies should be available to readers of the review so that analyses can be checked and, where required, investigated. Benford's law may have an application in these circumstances in the assessment of numerical data at a basic level (5,6,7).

Further potential uses of the significant digit law might be in the testing of computer-generated mathematical models. This could have applications in the analysis of predicted data from models describing tracer kinetics or the compartmental distribution of tracers or contrast medium. "Goodness of fit" tests against the predicted frequency of digits have been devised and applied in the audit of accounting data (4). Application of Benford's law may therefore be helpful to clinical audit departments and medical statisticians in assessing bias in data acquisition in medical audit or research.
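
As a rough illustration of the logarithmic distribution discussed above, the following Python sketch tabulates the expected frequencies of the first and second significant digits. The function names are illustrative assumptions rather than part of any published method, and the second-digit formula is the usual generalisation obtained by summing the logarithmic law over all possible first digits.

```python
import math


def first_digit_probs():
    """Expected proportion of each first significant digit D = 1..9: log10(1 + 1/D)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def second_digit_probs():
    """Expected proportion of each second significant digit 0..9.

    Obtained by summing log10(1 + 1/(10*d1 + d2)) over first digits d1 = 1..9.
    """
    return {
        d2: sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))
        for d2 in range(10)
    }


def first_significant_digit(x):
    """Return the first significant digit of a non-zero number."""
    if x == 0:
        raise ValueError("zero has no significant digits")
    # Scientific notation always puts the first significant digit first.
    return int(f"{abs(x):.15e}"[0])


if __name__ == "__main__":
    for digit, p in first_digit_probs().items():
        print(f"first digit {digit}: {100 * p:.1f}%")
    for digit, p in second_digit_probs().items():
        print(f"second digit {digit}: {100 * p:.1f}%")
```

The second-digit proportions come out much flatter than the first-digit ones, which is the sense in which later digits approach a uniform 10%.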

References:

1. Benford F. The law of anomalous numbers. Proceedings of the American Philosophical Society 1938: 78; 551-572.

2. Hill T. P. A statistical derivation of the significant digit law. Statistical Science 1995: 10; 354-363.

3. Hill T. P. The first digit phenomenon. American Scientist 1998: 86; 358-363.

4. Nigrini M. A taxpayer compliance application of Benford's law. Journal of the American Taxation Association 1996: 18; 72-91.

5. Hutchon D.J.R. Infopoints: Publishing raw data and real time statistical analysis on e-journals. British Medical Journal 2001; 322:530.

6. Eysenbach G., Sa E-R. Code of conduct is needed for publishing raw data. British Medical Journal 2001; 323:166.

7. Information for authors. Clinical Chemistry; www.aacc.org/ccj/infoauth/stm

Competing interests: None declared

Checking data and assumptions 1 August 2005
R Allan Reese,
Senior statistician, Centre for Environment, Fisheries and Aquaculture Sciences
Cefas, Weymouth DT4 8UB

The paper states that, "In a randomised trial, the data at baseline should be similar in the randomised groups. ... This is the reason why, in general, tests for statistical significance are not conducted at baseline in genuine trials." This is a common, but in my opinion misguided, attitude. The key word is "should", and a test is always appropriate to test that assumption.

The question arises whether tests on baselines should be reported or published. It seems useful to report that the tests were carried out and either were not significant, confirming the randomisation as successful, or revealed features in the baseline measures that needed explanation or adjustment in the analysis. An obvious example would be if outliers (anomalous observations) were found. Except for explaining retrospective adjustment to the analysis, it should not be necessary to report baseline tests in detail. Tables such as Table 1 in the paper are rarely useful to the reader, in that they do not report "Results" of the study. Baseline values may appear within the results, since these will presumably show changes and need qualifying as absolute or relative.

Something implicit in the paper is the pressure to find "significant" results to justify publication. Since most studies are justified to funders or ethics committees on the expectation of finding "significant" results, it is often more significant (in the English sense) when expectations are not met. It would be a service to science if researchers were encouraged to report, very briefly, all studies on a basis of (1) what we expected (2) what we found (3) why [not].

Finally, it is regrettable that Al-Marzouki's analysis is reported with no indication that a right of reply has been offered to the impugned researcher.

Competing interests: None declared

What if a computer were used? 4 August 2005
Neville W Goodman,
Consultant Anaesthetist
Southmead Hospital, Bristol, BS10 5NB

What will happen if, from now on, fraudulent authors use computers to generate their fraudulent data?

Competing interests: None declared

Trial Data 4 August 2005
Mark R Daley,
Intensivist
Intensive Care Unit, Blacktown Hospital, Blacktown NSW 2148 Australia,
Bruce H. Graham

We read with great interest your articles regarding possible data fabrication. The paper by Al-Marzouki, Evans, Marshall and Roberts (1) illustrates the difficulties faced in trying to “prove” that data are falsified. A small point regarding the paper is the random selection of patients from five centres as the reference trial, compared with patients from a single centre. It may be argued that the variance of baseline data from the single-centre trial would be expected to be smaller than that from a multi-centre trial.

Of more overall concern is the number of large multi-centre trials now performed where even the primary authors do not see the original data. Data is supplied centrally already in electronic form, possibly via intermediaries with variable financial interest. Should authors state a (minimum) percentage of raw data and patient files they specifically assessed in their quality assurance of the data?

1. Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials BMJ 2005;331:267-270.

Competing interests: None declared

Single Blind is Double Blind 5 August 2005
John M. Williams,
Business Owner
Markanix Co., P. O. Box 2697, Redwood City, CA 94064

There are apparent problems with the Al-Marzouki, et al analysis as well as with the study being criticized.

For one thing, Al-Marzouki et al adopt the criticized study's term, "single-blind", in their final-digit analysis.

However, the criticism uses a DOUBLE-BLIND definition to criticize what was reported as a "single-blind" experiment.

By standard definition, an unblinded study is one in which the experimenter(s) and the subjects (clinical subjects) are aware of the conditions as they are administered.

In a single-blind study, the subjects are unaware of the conditions as they are administered, but the persons administering the trials are aware of them.

In a double-blind study, both the persons administering the trials and the subjects are unaware of the conditions during the trials. For example, see Bowman, et al, "Textbook of Pharmacology" (6th ed.), Chapter 20, "Clinical trial of new drugs".

Thus, apparently, Al-Marzouki et al applied a double-blind criterion incorrectly. If the criticized study indeed was single-blind, then the experimenters were aware of the conditions during trials, and some bias of the data should be expected. Thus, at least some of the statistical inference drawn by Al-Marzouki et al was not meaningful. A Bayesian analysis might have thrown more light on the data than the methods actually reported.

This is not to say that knowing falsification did not occur. It is to say that there are serious risks in applying statistics to draw conclusions about specific occurrences; it is the same problem as with circumstantial evidence in court, or with epidemiology in general. One should always err on the side of assuming improbable statistics rather than dishonesty. The statistics should raise suspicion, but dishonesty should be inferred only from more direct evidence.

Competing interests: None declared

Researcher as a source of uncontrollable bias 5 August 2005
Sviatoslav L. Plavinski,
Dean, College of Public Health
Medical Academy for Postgraduate Studies, 191015, St.Petersburg, Russia

The problems raised by the BMJ (1,2) are very important for the medical research community. Unfortunately, there is no simple solution to the problem of fraudulent or tampered data. The paper by Al-Marzouki, Evans, Marshall and Roberts (1) uses statistical tests to compare the variances of baseline variables between the two groups of randomized controlled trials (RCTs). Their implicit hypothesis (which seems to have been true in this case) is that a researcher who tampers with data does not know that the variances should be approximately equal, and generates all his data by hand.

Unfortunately, only a statistically illiterate researcher would do that, especially for publication in a high-impact journal. A researcher who lacks ethical constraints will simply decide on the necessary mean values, take the variance from published sources or from a small sample of patients, and then run a function, available in all statistical packages, to generate, for example, normally distributed data with a given mean and variance. He can then decide what result he would like to obtain and repeat the process for made-up follow-up data. He can even use non-normally distributed data, mixed distributions, etc. There is almost no way to uncover such fraud, except by collecting on-site evidence that the investigation was not performed. Such an inquiry is legally difficult and very expensive, especially in the computer era, as most data are accumulated in electronic form, which is easy to tamper with. All solutions (such as registration of all controlled trials with the possibility of unannounced on-site inspection, or collection of time-stamped data at a third-party site) will be very costly and will eventually harm mostly innocent researchers through a rise in research expenditure.
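
The point about variance comparisons can be illustrated with a small simulation. The sketch below, in Python, draws two groups from the same normal distribution and shows that their variance ratio sits close to 1, so a baseline variance comparison of the kind used in the paper would have nothing to flag. The sample size, mean, standard deviation and seed are arbitrary assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_group(n, mean, sd, rng):
    """Draw n normally distributed values with the given mean and sd."""
    return [rng.gauss(mean, sd) for _ in range(n)]


# Hypothetical baseline variable (e.g. a blood pressure in mm Hg); all values assumed.
rng = random.Random(2005)
group_a = simulate_group(200, 130.0, 15.0, rng)
group_b = simulate_group(200, 130.0, 15.0, rng)

# Ratio of sample variances; values near 1 look like successful randomisation.
variance_ratio = statistics.variance(group_a) / statistics.variance(group_b)
print(f"variance ratio between simulated groups: {variance_ratio:.2f}")
```

Because both groups come from one distribution, neither the means nor the variances differ by more than chance, which is why statistical screening alone cannot be relied on here.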

With the future of statistical fraud control bleak, more discussion should be directed at ways of decreasing the incentives for fraud. The impact of research fraud would be lessened if consumers of scientific information remembered, and demanded, reproducibility. This, in turn, should influence the decisions of Institutional Review Boards (IRBs) on repeating RCTs. It is now considered unethical to repeat an RCT if a previous one showed one treatment to be superior. An IRB in a different institution should allow repetition of an RCT if a controversial treatment was used in the previous RCT, or if the body of other knowledge does not support its results.

Unfortunately, the possibility of scientific fraud is relatively high. In a recent survey (3), 0.3% of US scientists funded by the NIH admitted that they had falsified or ‘cooked’ research data, and about one in seven (15.3%) indicated that they had dropped observations or data points from analyses on the basis of a gut feeling.

It is probably now time to consider the researcher as a source of possible bias that is not controlled by randomization, and to apply to decision-making about RCTs the same causation criteria (strength, consistency, specificity, relationship in time, biological gradient, biological plausibility, coherence of evidence, experiment and analogy) that were put forward by Sir Austin Bradford Hill and are standard for assessing causation in non-randomized studies.

1. Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials BMJ 2005;331:267-270.

2. White C. Suspected research fraud: difficulties of getting at the truth. BMJ 2005;331:281-288.

3. Martinson BC, Anderson MS, de Vries R. Scientists behaving badly. Nature 2005;435:737-738.

Competing interests: None declared

Response on Benford's Law 7 August 2005
Stephen J Evans,
Prof of Pharmacoepidemiology
LSHTM, London WC1E 7HT

It is my experience that Benford’s law is of very limited use in the detection of fabrication or falsification in medical research data. It is of unquestioned value in financial fraud, including health claims data, but research data are of a different nature. For example, systolic blood pressures in a very large number of patients in a typical randomised trial may all have the first digit as 1. Their cholesterol values may have no first digits of 1 and almost certainly none of 2. These are clear departures from Benford’s law but are definitely not examples of fabrication or falsification.

If all the variables are taken together then still the pattern does not conform to Benford’s law. In neither of the trials studied in our paper do any of the variables taken singly show even a remote fit to the distribution suggested by Benford. Taken together the 5 variables in common between the trials do not show such a fit, so that lack of fit to Benford’s law is no evidence of fabrication or falsification in this situation.

For example, from the MRC trial, the pattern for the five variables considered is shown in the table:

First digit   Frequency       %   Benford’s %
1                 1,774   42.34          30.1
2                   131    3.13          17.6
3                     4    0.1           12.5
4                    87    2.08           9.7
5                   333    7.95           7.9
6                   537   12.82           6.7
7                   547   13.05           5.8
8                   431   10.29           5.1
9                   346    8.26           4.6

The requirement is that the data must have a range that covers at least two orders of magnitude. This often applies in financial data but only rarely, if ever, in medical research data. If a very large number of variables were taken together then the fit would be rather better, but the problem is that the selection of variables can affect the distribution of first digits even when many are chosen. The argument can always be that not enough variables have been considered for Benford’s law to be applicable.
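
The size of the departure in the MRC first-digit table can be quantified with a Pearson chi-square statistic against Benford’s expected proportions. The Python sketch below uses the first-digit frequencies reported in this response; the function name is an illustrative assumption, and the 5% critical value for 8 degrees of freedom (15.51) is a standard table value.

```python
import math

# First-digit frequencies for the five MRC trial variables, as reported above.
observed = {1: 1774, 2: 131, 3: 4, 4: 87, 5: 333, 6: 537, 7: 547, 8: 431, 9: 346}


def benford_chi_square(counts):
    """Return (n, Pearson chi-square) for first-digit counts against Benford's law."""
    n = sum(counts.values())
    stat = 0.0
    for digit, obs in counts.items():
        expected = n * math.log10(1 + 1 / digit)
        stat += (obs - expected) ** 2 / expected
    return n, stat


n, stat = benford_chi_square(observed)
CRITICAL_5PCT_8DF = 15.51  # chi-square critical value, 8 degrees of freedom
print(f"n = {n}, chi-square = {stat:.0f}, exceeds critical value: {stat > CRITICAL_5PCT_8DF}")
```

The statistic is far beyond the critical value even though these data are genuine, which is the point being made: lack of fit to Benford’s law is not in itself evidence of fabrication.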

Incidentally, as with many such things, Newcomb, not Benford, was the original discoverer of the law.

Competing interests: None declared

Re: Response on Benford's Law 8 August 2005
Dr Richard J C Brown,
Principal Research Scientist
National Physical Laboratory

I read with interest the above discussion of Benford's Law as a route to assessing fraud in clinical data. Whilst most data from clinical trials may not be suitable for assessment by Benford's Law (as Prof Evans notes), readers may be interested to know that a paper has recently been published showing that Benford's Law can be used to screen other types of analytical data: http://www.rsc.org/publishing/journals/AN/article.asp?doi=b504462f . Benford's Law has the great advantage over other statistical techniques, such as those involving the mean and standard deviation, that one has prior knowledge of what answer to expect from the test, i.e. the Benford distribution is a property of data in general, whilst other statistical distributions are properties of the particular data set. However, for Benford’s Law to be useful as a screening technique the data being examined must, as a rule of thumb, span at least four orders of magnitude, a criterion which I suspect would not be met by most sets of clinical data.

Competing interests: None declared

Re: Single Blind is Double Blind 9 August 2005
Nuri Schwarz,
Biology Research
SMC Beer Sheva 84000

This criticism of the Al-Marzouki et al analysis stems from a misunderstanding of the definition of a single-blind study.

Mr. Williams's "standard definitions" are taken from a textbook of Pharmacology, which gives the definitions for Pharmacological studies. These definitions are correct for an open (unblinded) study (the experimenters and the subjects are aware of the conditions) and for a double-blind study (both the experimenters and the subjects are unaware of the conditions). However, the definition of a single-blind study is not the full scientific definition.

A single blind study is one in which EITHER the experimenter OR the subjects are unaware of the conditions!

In Pharmacological single blind trials (such as clinical trials of new drugs) it is always the subjects who are "blind", since some receive a placebo. However, in diet studies (such as the one Al-Marzouki et al analysed), the subjects always know what they are eating, so in this case a single blind study means that the experimenters doing the various measurements do not know to which group (intervention diet or control) the person they are checking belongs. Thus, no bias of the data should have been expected.

In any case, even if the trial was known to be unblinded, this would still not explain the two other statistical tests. It is the combination of the differences in means, variances, and digit preference which strengthens the conclusion that data fabrication took place in the diet trial.

Competing interests: None declared

Benford's Law [very large data set number-analysis] not needed in approaches that work 9 August 2005
Eddie Vos,
maintains health-heart.org
Sutton (Qc) Canada J0E 2K0

Let this debate not become lost in statistical complexities; let us keep our focus on results.

The authors compared 2 trials. One, the MRC trial [MEDLINE 2861880], was wonderfully well done: in 85,572 years of observation with beta-blocker or diuretic in mildly hypertensive patients [diastolic 90-109 mm Hg], the difference in deaths at trial end vs. placebo was 5, or one death per roughly 8500 years of drug use. Everything, even in hindsight, was perfect, and that should have been the end of the 2 drugs if mortality is an endpoint in such patients. But was it?

The later trial of 1992 [MEDLINE 1586782] lacked statistical rigor but it was an intriguing study in only 406 patients with suspected acute myocardial infarction in which 17 (44%) fewer patients died on the intervention diet. There are questions but when many drugs don't save lives even in the best run mega-trials, one should repeat the trials that do find mortality benefit. Paraphrasing Harvard's Dr. Alexander Leaf in 1999 Circ. concerning an equally surprising diet trial with similar benefits, the Lyon Diet Heart Study: "first let's find an effect and then figure out what caused it".

When a prevention approach actually produces results in the time frame of a single human being, others should replicate such trials especially if it takes patient numbers only in the hundreds. This should have been the value of BMJ publishing this kind of study, preferably with an editorial, in the first place. vos{at}health-heart.org

Competing interests: None declared

Single Blind is Double Blind, Revisited 11 August 2005
John M. Williams,
business owner
Markanix Co., P. O. Box 2697, Redwood City, CA 94064

Mr. Schwarz brings up the idea of a "single-blind" study as one in which EITHER the human subject(s) or the experimenter(s) making the clinical measurements are unaware of the trial conditions at the time of measurement.

If indeed this was the meaning intended both in the criticized diet study and the criticism of it, then the term may not have been misused by Al-Marzouki et al, depending on which meaning was intended in the diet study.

But, this raises other questions: If the subjects know the conditions, why blind the experimenters? What use could this serve? The experimenters merely could ask the subjects during the trials some question revealing the group assigned. "How have you been feeling since going on the fruit diet? Step onto the scale." If Mr. Schwarz's definition was intended, then unblinded bias again would be as likely as fraud as an explanation of some of the bias found in the statistical averages.

Measurements of weight or heart rate are objective, so the Schwarz use of "single-blind" would seem intended primarily to prevent intentional falsification rather than unconscious bias. But, as just pointed out, an INTENTION to fabricate would sidestep this kind of "single-blind" easily.

If this is a real difference in usage, it might be worthwhile for BMJ or its interested readers to try to come to agreement on terminology on this issue. Mr. Schwarz cites no reference for his definition of "single-blind"; a reference would have made it possible to examine this question in more detail.

It would appear that the usage proposed by Mr. Schwarz is ambiguous and thus liable to be misused. What about a diet study in which the food looks the same to the subjects, or an exercise study in which the subjects can tell what they are doing, but do not know otherwise the purpose of the study? This reverses the meaning of "single-blind" (experimenter vs. subject) with no hint of this reversal to the reader.

I tried a search for "Single Blind" on Yahoo and on Google. Only one of the first few dozen pages returned agreed with Mr. Schwarz's definition (http://www.medterms.com). The NCI and other sites, which could be considered authoritative, all unequivocally define "single-blind" the way it is used in the reference I cited previously. Unfortunately, my old copy of the Merck Manual does not address experimentation of this kind.

It is unclear to me whether Mr. Schwarz should have referred to his definition as "scientific". It would be useful to know the context in which Mr. Schwarz's definition has been used in science, and by whom (other than the apparently erroneous usage by Al-Marzouki et al pointed out in my previous posting).

Competing interests: None declared

Wider application of these methods 12 August 2005
Irene M Stratton,
Senior Medical Statistician, statistical advisor to Diabetic Medicine
Diabetes Trials Unit,
Oxford Centre for Diabetes, Endocrinology and Metabolism, Churchill Hospital, Headington, Oxford OX3 7LJ

The methodologies suggested in this paper have further applications in the reviewing process: checking for data errors that may have arisen not only through fraud but also through faulty data entry. In a recent paper under review I found a very large s.d. in the BMI field for one arm of a clinical trial. This was shown to be due to a misplaced decimal point: the BMI should have been ~30 but was ~300, because the decimal point had been omitted in the weight field.

I am not sure that comparing the s.d. is always the correct comparison. In the example in this paper there is a shift in location between the two trials, as one trial had hypertensive patients only. In this context the coefficient of variation (c.v.) may be a better measure of dispersion.
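
A small worked example may make the distinction clearer. In the Python sketch below, the second set of values is simply the first scaled by 1.5, so its s.d. is larger but its c.v. is identical. The numbers are hypothetical and chosen only to illustrate the arithmetic; they are not taken from either trial.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical systolic pressures (mm Hg); the second group is the first scaled by 1.5.
general_population = [110, 120, 130, 140, 150]
hypertensive_only = [165, 180, 195, 210, 225]

for label, values in [("general", general_population), ("hypertensive", hypertensive_only)]:
    print(
        f"{label}: mean={statistics.mean(values):.1f} "
        f"s.d.={statistics.stdev(values):.2f} "
        f"c.v.={coefficient_of_variation(values):.4f}"
    )
```

A comparison of s.d. alone would suggest different dispersion in the two groups, whereas the c.v. shows the spread is the same relative to the mean.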

Data from other papers about the same disease group might serve as better comparators than data of the same type from a general population. Consequently, as I review, it may prove useful to maintain a database of the mean and s.d. of key variables; for me this would be in populations of subjects with diabetes.

Finding outliers when comparing s.d.'s and means might also mean that the data entry (as in my example above) was singly punched late at night by a junior researcher with no validation checks. Double-entry data systems with validation should be required by review and ethics committees.

Competing interests: None declared

What is a single blind trial? 12 August 2005
Douglas G Altman,
Professor of Statistics in Medicine
Centre for Statistics in Medicine, Oxford OX2 6UD

There is no official definition of many terms used in randomised trials, including double blind, single blind, intention to treat, and so on. The term randomised does have precise technical meaning but it is often misused. Labels are valuable only if they have a unique meaning and are only used in the correct way.

John Williams queries the definition of a single blind trial. One publication in the BMJ states that in a single blind trial “either only the investigator or only the patient is blind to the allocation”.[1] The term is thus unhelpful without clarification. Double blind trials are just as confusing as single blind trials. A survey of physicians and a review of textbooks and reports revealed numerous interpretations of the designation “double-blind.”[2] Of key importance in both single and double blind trials is whether the outcome assessor is blinded.

Hence the CONSORT Statement avoids labels and asks for specific information: “Whether or not participants, those administering the interventions, and those assessing the outcomes were blinded to group assignment. If done, how the success of blinding was evaluated.”[3]

Likewise, in an article in which we tried to clarify the various “terminological tangles” associated with blinding, we wrote: “we urge that authors explicitly state what steps were taken to keep whom blinded. If they choose to use terminology such as single-, double-, or triple- blinding in reporting randomized controlled trials, they should explicitly define those terms.”[4]

Arguments about the correct meaning of “single-blind” are pointless.

1 Day SJ, Altman DG. Blinding in clinical trials and other studies. BMJ 2000;321:504.

2 Devereaux PJ, Manns BJ, Ghali WA, Quan H, Lacchetti C, Montori VM, et al. Physician interpretations and textbook definitions of blinding terminology in randomized controlled trials. JAMA 2001;285:2000-3.

3 Moher D, Schulz KF, Altman D for the CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 2001;285:1987-91. [see also www.consort-statement.com]

4 Schulz KF, Chalmers I, Altman DG. The landscape and lexicon of blinding in randomized trials. Ann Intern Med 2002;136:254-9.

Competing interests: None declared

Letter to the British Medical Journal (BMJ) 24 August 2005
David Wolfson,
Professor
Department of Mathematics and Statistics, McGill University, Montreal, Quebec, Canada H3A 2K6,
Nora Bohossian

The authors use "conventional statistical significance tests" to compare the baseline characteristics of the two randomised groups, treating the two trials separately. The use of tests such as the t-test or F-test is questionable here.

There is no indication that the initial diet-trial group of subjects from which the two subgroups were randomly selected was itself selected from a well-defined population. Therefore, it is reasonable to assume that the implied model is what Lehmann [1, page 5] calls a "Randomization Model." In this model, there are no populations, and, consequently, there are no population means or variances and tests about such parameters are meaningless.

Further, since the only source of randomness in the diet trial and other similar trials is the purported randomization, the only purpose of testing for baseline covariate balance is to check whether there has truly been random allocation to the two groups. As the randomization is actually in question in the diet trial, the use of formal tests is justified here.

When there is no underlying population (distribution), one is forced to use distribution-free procedures with vague hypotheses [see Lehmann [1], pages 22-24 and 31-32]. The issue is not one of robustness against departures from Normality but one of the nonexistence of an underlying distribution.
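
One distribution-free procedure consistent with the randomization model is a permutation test, in which the group labels are reshuffled many times and the observed baseline difference is compared with the reshuffled differences. The Python sketch below is a minimal illustration under that assumption; the function name, the choice of the absolute difference in means as the statistic, and the number of permutations are ours and are not taken from Lehmann's text.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for the absolute difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-allocate subjects to groups at random
        if abs_mean_diff(pooled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value here says only that the observed allocation is unusual among random allocations of these same subjects, which is the question at issue when the randomization itself is in doubt.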

References:

[1] Lehmann EL, D'Abrera HJM. Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, 1975.

Competing interests: None declared