Commentary: Summary statistics of poor quality studies must be treated cautiously

BMJ 1997; 314 doi: (Published 01 February 1997) Cite this as: BMJ 1997;314:337
Jesse A Berlin, associate professor of biostatistics
University of Pennsylvania School of Medicine, Center for Clinical Epidemiology and Biostatistics, 423 Guardian Drive, Philadelphia, PA 19104-6021, USA


    Apart from the lack of adherence to modern standards of trial design, a striking feature of the studies considered by Frank Shann is the extreme degree of heterogeneity of the findings. Shann argues that this combination of heterogeneity and low quality precludes the confident estimation of a summary measure of treatment effectiveness against development of pneumonia or sepsis. I will try to show that exploring sources of heterogeneity in situations such as this can provide clinical insights and generate hypotheses.

    Unfortunately, given the uniformly suboptimal quality of the existing studies, we cannot examine quality of the studies as a source of heterogeneity. At the same time, because quality is uniformly poor, it is not likely to explain the heterogeneity of the findings. For illustrative purposes, however, suppose that we summarise separately the three studies with the highest risks of infection in the control groups. The exact stratified summary odds ratio for those studies is 0.04 (95% confidence interval 0.00 to 0.18), suggesting a substantial benefit from antibiotic prophylaxis when the baseline risk is high, with little evidence of heterogeneity between the studies (P = 1.00 for test of common odds ratio). The high rates of infection in the control groups of the three studies might be “real” and rooted in the epidemiological aspects of measles in the populations studied, or they might be due to chance.
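    For readers unfamiliar with stratified summaries, the sketch below shows how a pooled odds ratio across several 2x2 tables can be computed. It uses the Mantel-Haenszel estimator, which is a simpler technique than the exact stratified method quoted above and will not reproduce the figure of 0.04. The function name and the counts are hypothetical and are not taken from the studies Shann reviewed.

```python
from typing import List, Tuple

# Each stratum (one study) is a 2x2 table given as (a, b, c, d):
#   a = treated, infected      b = treated, not infected
#   c = control, infected      d = control, not infected
Table = Tuple[int, int, int, int]


def mantel_haenszel_or(tables: List[Table]) -> float:
    """Mantel-Haenszel summary odds ratio across strata.

    This is NOT the exact (conditional) estimator referred to in the
    text; it is shown only to illustrate what stratified pooling does.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("summary odds ratio is undefined for these tables")
    return numerator / denominator


# HYPOTHETICAL counts, invented for illustration only.
hypothetical_tables = [
    (1, 49, 10, 40),
    (2, 98, 15, 85),
    (1, 29, 8, 22),
]

print(round(mantel_haenszel_or(hypothetical_tables), 3))
```

    Each study contributes to the summary in proportion to its size, so patients are compared only with controls from the same study. The point estimate alone says nothing about the quality of the contributing studies, which is the caution raised in this commentary.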

    These same three studies also happen to be the three most recently conducted studies. The collinearity of date of publication and baseline risk makes interpretation of the result for this subgroup somewhat ambiguous. This ambiguity stems from the possibility that other clinically relevant factors might have varied with time: for example, improvements in subtle aspects of study design or changes in the clinical characteristics of patients. Nevertheless, the stratification and summarisation, although not the optimal statistical approach in this situation,1 have raised a clinical hypothesis that might not otherwise have been raised.

    What are we to conclude from this exercise? We might conclude that antibiotic prophylaxis is effective, on the basis that an effect as large as the one observed is unlikely to have been produced either by chance or by bias. The danger with such an argument, and one reason not to summarise these studies, is that many readers will assign a high degree of validity to a quantitative summary of poor quality studies simply because it is quantitative and in spite of any caveats offered by the meta-analyst. My view is that the combined odds ratio is simply a summary from which we can learn about the data. Without more studies of higher quality we are unlikely to solve our dilemma.

    Thus, one message from the above, and from Shann's paper, is that the decision about whether to calculate a quantitative summary of the data is not always straightforward, and different investigators could legitimately arrive at different decisions. From a clinical perspective, the above summaries may not, in fact, tell the whole story. In the same three studies that showed large reductions in infection rates, there were no deaths. Overall, there was no evidence of a reduction in mortality associated with antibiotic prophylaxis. Given our ability to treat infected patients and the risks and costs of antibiotic resistance, the practice of antibiotic prophylaxis for children with measles does not seem justified no matter what the value (either numeric or otherwise) of the summary odds ratio.

