Rapid Responses to:

ANALYSIS:
Catherine E Hewitt, Natasha Mitchell, and David J Torgerson
Listen to the data when results are not significant
BMJ 2008; 336: 23-25

Rapid Responses published:

Nature's data
David JR Hutchon   (5 January 2008)
Cholesterol - a case where the data was not heeded
Evan L Lloyd   (9 January 2008)
Re: Cholesterol - a case where the data was not heeded
Raymond G Holder   (10 January 2008)
About p-values and confidence intervals: habitual misconceptions.
Luis C Silva   (12 January 2008)
Listen to all the evidence when results are not significant
Michael C. Watson, Denise Kendrick, Carol Coupland, Deborah Futers, Jean Robinson   (16 January 2008)

Nature's data 5 January 2008
David JR Hutchon,
Consultant Obstetrician
Darlington Memorial Hospital. DL3 8QZ

There are plenty of examples showing that trialists are rarely neutral about their research. The findings of a study need to be judged on the likelihood that they are true. Using Bayes' principle, the evidence that an intervention improves the outcome of a whole population needs to be much stronger than the evidence that it does not.

Nature does nothing uselessly.

For example, it is current common practice to interfere with the normal transition from the fetal to the adult pattern of circulation by clamping the cord immediately after birth. Why should it be necessary to carry out a randomised controlled trial in order to prove that such an intervention is harmful? In principle it should only be necessary to prove that it is not beneficial, and much weaker evidence is required to reach such a conclusion. There is already substantial evidence that immediate cord clamping is harmful to the neonate, yet the practice continues.

When will we start to heed data with significant results?

Competing interests: None declared

Cholesterol - a case where the data was not heeded 9 January 2008
Evan L Lloyd,
Retired
72 Belgrave Road, Edinburgh EH12 6NQ

The paper by Hewitt, Mitchell & Torgerson (1) is fascinating and valuable, and I hope some changes will result from their analyses. They suggest that negative or nil results may not be accepted because the investigators have invested a lot of intellectual capital (and professional standing?) into the idea. There may also be pressure because businesses may have funded the study, or may see financial opportunities in a positive outcome. I would like to suggest that this scenario happened with the dietary fat/heart disease hypothesis.

During the 1970s, despite a large body of clinical and scientific knowledge and evidence to the contrary, there was a growing number of ‘experts’ who claimed that dietary fat produced cholesterol, which in turn caused Coronary Heart Disease (CHD), including death. To confirm the theory, long-term controlled studies were set up that tried to lower the dietary fat content in the study group. After 10 years, the first study, MRFIT (Multiple Risk Factor Intervention Trial), reported its results in October 1982 (2). Despite the very stringent dietary restrictions (fat in the diet reduced by 25%), the cholesterol showed only a small (5%) fall, and there was no difference between the groups in the incidence of CHD or deaths. The study therefore failed to support the theory. By some strange timing, a World Health Organisation (WHO) committee of experts had produced a report in the summer of 1982 (3) calling for a fundamental change, i.e. a reduction of fat intake, in the Western diet. This recommendation was therefore made before the MRFIT results were available to the scientific and medical community. It seems likely that the ‘experts’ on the WHO committee were the same ‘experts’ involved in the MRFIT trial, and, being aware of the disappointing outcome of the trial, were trying to pre-empt the final decision. A similar European trial (4) also reported disappointing results. The proponents of the fat theory rationalised the results by saying that the members of the trial groups had not been trying hard enough and that the advice to reduce fat intake should be rolled out to everyone. I can personally remember Prof M Oliver (Professor of Cardiology at Edinburgh University) saying that, if all the effort put into the trials had not produced the desired result, there was no possibility that the public at large could be ‘persuaded’ to alter their diet further than the trial subjects.

Despite the negative evidence from these trials, a conference (5) in 1984 decided that the advice to lower cholesterol levels should be applied to all, even those with NORMAL cholesterol levels. This decision was based on the results of the Lipid Research Clinic Programme (LRC-CPPT) trial (6), where the ‘successful’ outcome was announced at a press conference before the results were published. In the study (6), people who had a genetic protein abnormality which resulted in VERY HIGH cholesterol levels, and a VERY HIGH risk of cardiac death, had their cholesterol levels reduced to normal levels by chemical means. The outcomes were that the risk of heart disease, and death, returned to normal levels. (An opportunity for the pharmaceutical giants?) Prof M Oliver disagreed with the conclusions of the conference and noted that the panel making the decision was ‘packed’ with supporters of the theory (7). The title of his letter (7), “Consensus or Nonsensus conference on Coronary Heart Disease”, was very apt. Since then pronouncements on diet, all supporting the ‘consensus’, seem to be made by ‘panels of experts’ without reference to research findings.

However, an analysis of nine MRFIT-type studies (8) showed that none had been effective, making a total of over 170,000 subjects studied with no ‘positive’ results. In a town on the South Coast of England there was a far higher intake of saturated fat, but a much lower level of CHD deaths, than in a town in the North of England (9). An analysis of worldwide figures (10) shows that climate is a much better predictor of cholesterol levels and CHD deaths than diet. Moreover, though the incidence of CHD deaths is now falling in many countries (11, 12), including Scotland, in all these countries the levels of fat consumption remained the same throughout the period of the rise and subsequent fall (12). The already low incidence of CHD deaths in Japan has been falling further despite the level of fat in the diet rising steadily (12).

The fatty diet/cholesterol/CHD thesis has produced a vast army of “worried well”, who keep going to their doctor for checks on their normal cholesterol levels. The tests cost money, personal or state, and more funds (£3bn per year) go to the firms who provide the cholesterol lowering drugs. On the ‘real’ evidence, is this a waste of money?

It is important to remember that ‘consensus’ doesn’t always mean the stated facts are true. After all at one time there was a consensus that the world was flat. Isn’t it time this subject was reviewed dispassionately?

References

1. Hewitt C, Mitchell N, Torgerson D. Heed the data when results are not significant. BMJ 2008; 336: 23-5.

2. MRFIT Research Group. Multiple risk factor intervention trial. JAMA 1982; 248: 1465-77.

3. WHO. Prevention of Coronary Heart Disease, WHO Technical Report Series, No. 678, 1982, WHO, Geneva.

4. WHO European Collaborative Group. Multifactorial trial in the prevention of heart disease, incidence and mortality results. Eur Heart J 1983; 4: 141-7.

5. Consensus Conference. Lowering blood cholesterol to prevent heart disease. JAMA 1985; 253:2080-7.

6. Lipid Research Clinic Programme. LRC-CPPT results. JAMA 1984; 251: 351-6.

7. Oliver M. Consensus or nonsensus conference on coronary heart disease. Lancet 1985; I: 1087.

8. Ebrahim S. Systematic review of randomised controlled trials of multiple risk factor interventions for preventing coronary heart disease. BMJ 1997; 314: 1666-73.

9. Cade JE, Barker DJP, Margetts BM, et al. Diet and inequalities of health in three English towns. BMJ 1988; i: 1359-62.

10. Lloyd EL. The role of cold in ischaemic heart disease: a review. Public Health 1991; 105: 205-15.

11. Walker WJ. Coronary mortality – What is going on? JAMA 1974; 227: 1045-6.

12. le Fanu J. Eat Your Heart Out. The fallacy of the healthy diet. 1987: Macmillan, London. p 109.

Competing interests: None declared

Re: Cholesterol - a case where the data was not heeded 10 January 2008
Raymond G Holder,
Retired engineer
Bh9 3NF

Following on from Evan Lloyd's response, might I suggest that the public interest would best be served if it became obligatory for details of the source of funding for trials and studies, and that of their sponsoring bodies, to be clearly set out within their reports.

Perhaps a Government Health warning should then be printed at the foot, so that all might see if science or commerce was the driving force behind the work. At present it almost appears that a pre-printed form would be required.

Is there still any totally independent medical research body in the country? The MRC was intended to be such a body, but little is heard from it.

Competing interests: Statin damaged patient

About p-values and confidence intervals: habitual misconceptions. 12 January 2008
Luis C Silva,
Senior researcher
National Center for Medical Science Information, Havana, CUBA

Hewitt, Mitchell & Torgerson (1) are right when they state that “authors may claim that the non-significant result is due to lack of power rather than lack of effect”, or that “no firm conclusions can be drawn because of the modest sample size”. As a matter of fact, it is quite easy to find thousands of examples of the type “we have not found significance, but with a larger sample size we probably would have” in current research reports. The main problem with such a statement is that it is true, or, to be more precise, that it is always true.

This is one of the basic deficiencies of the statistical test of hypothesis based on p-values: the magnitude of p depends on the sample size. Everyone knows that, given a large enough sample size, you will be able to reject the null hypothesis. This is closely related to the fact that the null hypothesis cannot be true, because it represents only one point in an infinitude of points on a line, so its probability is zero. These and several other weaknesses of conventional Null Hypothesis Significance Testing (NHST) have been pointed out repeatedly over decades (2, 3, 4, 5, 6); consequently, it is not possible to share the affirmation that “Statistical significance is important … to guide us in the interpretation of a study’s results”.
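
The dependence of p on sample size can be illustrated with a short calculation. What follows is only a sketch under stated assumptions, not anything taken from the paper under discussion: it assumes a two-arm comparison of means with a known standard deviation and a z test, and the function name `two_sided_p` and the tiny true difference of 0.05 standard deviations are chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(effect, sd, n_per_arm):
    """Two-sided p value of a z test comparing two arm means (SD assumed known)."""
    # Standard error of the difference between two independent arm means.
    se = sd * sqrt(2.0 / n_per_arm)
    z = effect / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical scenario: the observed difference is always 0.05 SD;
# only the number of participants per arm changes.
for n in (50, 500, 5000, 50000):
    print(n, two_sided_p(0.05, 1.0, n))
```

With the observed difference held fixed, the printed p value falls steadily as the number of participants per arm grows, which is the sense in which "with a larger sample we would have found significance" is always true.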

The double standard used to maintain such an incongruous, impoverished, and potentially misleading procedure becomes obvious when one notes that nobody says “we have found significance, but with a smaller sample size, we probably would not have found it”, which is always true as well.

One of the explanations for the almost universal application of frequentist inference is that most people make a ritual use of it, thinking that these procedures do what they actually do not: a lot of researchers think that p is a measure of the probability that the null is true (7, 8, 9) or that a 95% confidence interval of a difference contains the true effect with probability equal to 0.95. One may recall Cohen’s well-known remark: “What's wrong with NHST? Well, among many other things, it does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe that it does!” (10). The same thing can be said about confidence intervals.

Hewitt, Mitchell & Torgerson themselves misunderstand the frequentist nature of a confidence interval. They erroneously say that the 51% confidence interval “shows where, more often than not, the true treatment estimate will lie”. The true treatment difference is a constant that either lies between the extremes of this interval or does not lie there; it is not a number that sometimes is within this range and sometimes is not. The claims that “each value within the confidence interval is not equally plausible” and “values that are close to the point estimate are more likely to correspond to the true value than estimates towards the extreme of the confidence interval” reflect a commonplace misconception. Concerning this specific interval, one can only be sure that it has been obtained using a procedure that 51% of the time would produce intervals that contain the (constant) value of the difference. The quoted texts would be valid only for a probability interval (not a confidence one) obtained by means of a Bayesian approach.
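
The frequentist reading of the interval can be checked by simulation. The sketch below is illustrative only and rests on assumptions that are not in the paper: normally distributed outcomes with a known standard deviation, a single-sample mean as the estimate, and a hypothetical helper named `coverage`. The true difference is fixed once; what varies from trial to trial is the interval.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage(true_diff, sd, n, level, trials, seed=0):
    """Fraction of simulated trials whose interval contains the fixed true value."""
    rng = random.Random(seed)
    # Critical value for a two-sided interval at the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = sd / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Each trial draws a fresh sample and therefore a fresh interval.
        estimate = mean(rng.gauss(true_diff, sd) for _ in range(n))
        if estimate - z * se <= true_diff <= estimate + z * se:
            hits += 1
    return hits / trials


for level in (0.51, 0.67, 0.95):
    print(level, coverage(true_diff=0.3, sd=1.0, n=25, level=level, trials=4000))
```

Under these assumptions the proportion of intervals that contain the true value should come out close to the nominal level in each case. That is a statement about the long-run behaviour of the procedure, and it says nothing about where within any single interval the constant is more likely to lie.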

References

1. Hewitt C, Mitchell N, Torgerson D. Heed the data when results are not significant. BMJ 2008; 336: 23-25.

2. Bakan D. The test of significance in psychological research. Psychological Bulletin 1966; 66: 423-437.

3. Gardner MJ, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. BMJ 1986; 292: 746-750.

4. Hunter JE. Needed: A ban on the significance test. Psychological Science 1997; 8: 3-7.

5. Goodman SN. Toward evidence-based medical statistics (I): The p value fallacy. Annals of Internal Medicine 1999; 130: 995-1004.

6. Matthews RA. Facts versus Factions: the use and abuse of subjectivity in scientific research. European Science and Environment Forum Working Paper; reprinted in Rethinking Risk and the Precautionary Principle. Ed: Morris J. 2000; Oxford: Butterworth.

7. Lecoutre MP, Poitevineau J, Lecoutre B. Even statisticians are not immune to misinterpretations of Null Hypothesis Significance Tests. International Journal of Psychology 2000; 38: 337-345.

8. Haller H, Krauss S. Misinterpretations of significance: A problem students share with their teachers? Methods of Psychological Research Online 2002; 7: 1-20.

9. Gigerenzer G, Krauss S, Vitouch O. The null ritual: What you always wanted to know about significance testing but were afraid to ask. In: Kaplan D (ed). The Handbook of Methodology for the Social Sciences. 2004 (Ch. 21).

10. Cohen J. The earth is round (p < .05). American Psychologist 1994; 49: 997-1003.

Competing interests: None declared

Listen to all the evidence when results are not significant 16 January 2008
Michael C. Watson,
Lecturer in Public Health
University of Nottingham, NG7 2HA,
Denise Kendrick, Carol Coupland, Deborah Futers, Jean Robinson

Dear Editor,

This interesting paper by Hewitt et al (1) discusses an important issue in relation to research methodology. However, it is unfortunate that the length of their article precluded a less selective and more balanced representation of our work.

Hewitt et al seem to believe that our overall conclusion was that the intervention should be used; however, basing this solely on the ‘What this study adds’ box, as they did, is unreasonable given the more detailed discussion and interpretation in the article text. We stated five times in the article that the intervention was not associated with a reduction in injuries, and three times that it was associated with an increase in the primary care injury attendance rate. The “what this study adds” box also included a statement that “larger differences in safety practices may be required to affect injury rates”. Unfortunately Hewitt et al fail to mention these points in their discussion of our paper.

Hewitt et al claim that we “seem to use proxy measures of outcome as justification for the intervention”. However, these measures, which include safety equipment possession and use and parental satisfaction, were defined as secondary outcome measures in our paper, and we clearly stated in the first sentence of our section on “interpretation of the findings” that “the increased possession and use of safety equipment among families in the intervention arm did not translate into a lower injury rate”. It is unfortunate that in table 2 they report the results from the analysis of our primary outcome measure (any medically attended injury) but include our interpretation from the analysis of secondary outcome measures (safety equipment possession and use), and not our interpretation of our analysis of our primary outcome measure. Hewitt et al include a quote from our paper in which we were positive about safety equipment schemes such as those organised by SureStart. We feel it would have been more balanced if they had also included our adjoining sentence:

“However, our findings also highlight the importance of rigorously evaluating the widespread provision of equipment not only in terms of safety practices but also in terms of injury outcomes and uptake of schemes by those most at risk.”

Hewitt et al argue that we noted that it was unlikely that the intervention would not reduce injury rates because "several observational studies have shown a lower risk of injury among people with a range of safety practices." They also state that “it is, surprising to seek reassurance from non-randomised data when a randomised trial shows the "wrong" result.” Hewitt et al clearly took exception to our reference to observational studies here. Yet, despite our pointing out in the article that there are very few RCTs in this area measuring injury outcomes (all of which were underpowered to detect a plausible reduction in medically attended injury rates), they fail to appreciate that the majority of evidence in this area comes from observational studies. Are they really suggesting that all such evidence should be ignored?

Our analyses of primary outcome measures at the child level demonstrated that the increase in injury rates for any medically attended injury was confined to primary care attendances. Secondary care attendances (IRR 1.02, 95% CI 0.90 to 1.13) and hospital admissions (IRR 1.02, 95% CI 0.70 to 1.40) were not increased by the intervention. We stated that several explanations are possible for the higher attendance rate in primary care among intervention arm children and argued that this may have resulted either from increased awareness amongst intervention arm parents, with subsequent increased reporting of minor injuries, or from risk compensation, whereby parents feel safer because of having the safety equipment and consequently change other behaviours that are protective against injury. It is plausible that raising parental awareness might increase primary care attendances for more minor injuries but not secondary care attendances or hospital admissions for more severe injuries. We believe this is less likely to be the case for risk compensation, since if parents change other protective behaviours this would be unlikely to affect only minor injuries. We state that “further work is required to explore these hypotheses further”, but Hewitt et al fail to include this in their discussion of our paper.

It is also possible that some minor injuries occurred in the intervention arm as a direct result of having the safety equipment, e.g. children trapping fingers in stair gates. However, families were informed by both the equipment fitters and the health visitors that if they encountered problems with the equipment then they should contact their health visitor as soon as possible, and although a very small number of parents did so in relation to the refitting of some equipment, none of the families reported injuries involving the equipment. This suggests that this mechanism does not explain the higher primary care attendance rate seen in the intervention arm.

We agree with Hewitt et al that the decision to use a P value of 0.05 or a 95% confidence interval to determine statistical significance is arbitrary but widely accepted, but consider their use of 67% and 51% to be equally arbitrary but without wide acceptance. Wider use of these limits would greatly increase the chances of detecting both beneficial and harmful effects of new interventions when no such effects exist, leading to unnecessary costs and public concern.
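
The size of this increase can be gauged with a small simulation. It is a sketch under simplifying assumptions of our own, not an analysis from either paper: the intervention is taken to have no effect at all, the test statistic is treated as standard normal, and the helper name `false_positive_rate` is hypothetical.

```python
import random
from statistics import NormalDist


def false_positive_rate(level, trials=20000, seed=1):
    """Share of null trials whose interval at the given level excludes 'no effect'."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(0.5 + level / 2.0)
    # With no true effect, the standardised estimate is a standard normal draw;
    # the interval excludes zero whenever its absolute value exceeds the critical value.
    excluded = sum(abs(rng.gauss(0.0, 1.0)) > critical for _ in range(trials))
    return excluded / trials


for level in (0.95, 0.67, 0.51):
    print(level, false_positive_rate(level))
```

Under these assumptions the proportion of null trials flagged as showing an effect should sit near one minus the confidence level, so roughly one in three at 67% and about one in two at 51%, compared with one in twenty at 95%.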

We believe that further research is urgently needed to examine the protective effect of specific items of equipment on the injuries they could potentially prevent. Hewitt et al argue that safety advice and safety equipment should not be given to families with young children because it increases the risk of harm and cost. Is this a reasonable conclusion bearing in mind that the increased primary care attendance rate could be due to increased parental reporting of injury, the limited ability of our study to demonstrate reductions in injury due to the higher than expected baseline prevalence of safety equipment and the lower 95% confidence intervals including the possibility that the intervention reduces secondary care attendances by 10% and hospital admissions by 30%? Throwing the baby out with the bath water seems premature under these circumstances.

Reference: 1. Hewitt C, Mitchell N, Torgerson D. Heed the data when results are not significant. BMJ 2008; 336: 23-25.

Competing interests: We are authors of one of the articles discussed in the paper by Hewitt et al.