BMJ 1998;316:690-693 ( 28 February )

Education and debate

Interpreting treatment effects in randomised trials

Gordon H Guyatt, professor,a Elizabeth F Juniper, professor,a Stephen D Walter, professor,a Lauren E Griffith, research biostatistician,a Roger S Goldstein, professor b

a Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada L8N 3Z5, b Division of Respiratory Medicine, University of Toronto, Toronto, Ontario, Canada

Correspondence to: Dr Guyatt guyatt{at}fhs.csu.mcmaster.ca

The need to measure the impact of treatments on health related quality of life has led to a rapid increase in the variety of instruments available and in their use as measures of outcome in clinical trials. One limitation of instruments that purport to measure health related quality of life is difficulty interpreting their results. In the past decade, investigators have progressed in making these questionnaire results interpretable. For example, we have shown that when questionnaires present response options in the form of seven point scales with verbal descriptions for each option (see box), the smallest difference that patients consider important is often approximately 0.5 per question. A moderate difference corresponds to a change of approximately 1.0 per question, and changes of greater than 1.5 can be considered large. Thus, for example, in a domain with four items, patients will consider a 1 point change in two or more items as important. This finding applies across different areas of function, including dyspnoea, fatigue, and emotional function in patients with chronic airflow limitation1; and symptoms, emotional function, and activity limitations in adults2 and children3 with asthma, parents of children with asthma,4 and adults with rhinoconjunctivitis.5 Initially, we used comparisons in the same patient to establish this difference, but more recently we have replicated this finding using differences between patients.6
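The thresholds described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the exact cut points between bands (for instance, treating a change of exactly 1.0 as moderate) are assumptions, since the text gives only approximate values.

```python
def mean_change_per_question(item_changes):
    """Average change per question across the items in a domain."""
    return sum(item_changes) / len(item_changes)


def classify_change(change_per_question):
    """Label the size of a change on a seven point scale.

    Band boundaries are assumed from the approximate values in the text:
    about 0.5 is the smallest important change, about 1.0 is moderate,
    and more than 1.5 is large.
    """
    size = abs(change_per_question)
    if size < 0.5:
        return "trivial"
    if size < 1.0:
        return "small but important"
    if size <= 1.5:
        return "moderate"
    return "large"


# A domain with four items in which two items change by 1 point each.
four_item_domain = [1, 1, 0, 0]
change = mean_change_per_question(four_item_domain)
print(change, classify_change(change))
```

A 1 point change in two of four items averages 0.5 per question, which is why patients consider it important.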

Summary points


Several questionnaires on quality of life related to health are available, but interpreting their results may be difficult

For some questionnaires, we now know that the smallest change in score that patients consider important is approximately 0.5 per question

Even if the mean difference between a treatment and a control is appreciably less than the smallest change that is important, treatment may have an important impact on many patients

A method for estimating the proportion of patients who benefit from a treatment when the outcome is a continuous variable has been developed

The method is outlined using two examples, one a crossover trial and the other a parallel group design

This approach emphasises the need to establish ranges of health related changes that represent trivial, small but important, moderate, and large changes in addition to mean differences

    Assumptions

Clinicians and investigators tend to assume that if the mean difference between a treatment and a control is appreciably less than the smallest change that is important, then the treatment has a trivial effect. This may not be so. Let us assume that a randomised clinical trial shows a mean difference of 0.25 in a questionnaire in which the minimal important difference is 0.5. It might be concluded that the difference is unimportant and that the result does not support giving the treatment. This interpretation assumes that every patient treated scored 0.25 better than they would have done had they received the control and ignores the possibility that treatment might have a heterogeneous effect. Depending on the true distribution of results, the appropriate interpretation might be different.

Consider a situation in which 25% of the treated patients improved by a magnitude of 1.0, while the other 75% did not improve at all (mean change of 0). This would mean that 25% of those treated obtained a moderate benefit from the intervention. Using the number needed to treat, a method recently developed for interpreting the size of treatment effects, investigators have found that doctors often treat 25 to 50 patients, and even as many as 100, in order to prevent a single adverse event.7 8 Thus, the hypothetical treatment with a mean difference of 0.25 and a number needed to treat of 4 proves to have a powerful effect.
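The contrast between the two interpretations can be checked with a short Python sketch. The individual effects below are the hypothetical ones from this example, not trial data, and the function name and the use of 0.5 as the threshold argument are illustrative assumptions.

```python
def summarise(individual_effects, mid=0.5):
    """Return the mean effect, net proportion benefiting, and number needed to treat.

    `mid` is the minimal important difference; an effect of `mid` or more
    counts as important benefit, and `-mid` or less as important harm.
    """
    n = len(individual_effects)
    mean_effect = sum(individual_effects) / n
    benefit = sum(e >= mid for e in individual_effects) / n
    harm = sum(e <= -mid for e in individual_effects) / n
    net = benefit - harm
    nnt = 1 / net if net > 0 else float("inf")
    return mean_effect, net, nnt


# Every patient improves by 0.25 (the usual implicit assumption).
homogeneous = [0.25] * 100
# 25% improve by 1.0 and 75% do not improve at all.
heterogeneous = [1.0] * 25 + [0.0] * 75

print(summarise(homogeneous))    # same mean, nobody reaches the threshold
print(summarise(heterogeneous))  # same mean, one in four benefits
```

Both scenarios have a mean difference of 0.25, yet only the heterogeneous one yields a number needed to treat of 4.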

We have developed a method for estimating the proportion of patients who benefit from a treatment when the outcome is a continuous variable. We outline this method using two examples, one a crossover trial and the other a parallel group design.

Seven point scale with verbal descriptors

The following options were given for response to the question "How short of breath have you felt during the last two weeks while climbing stairs?"

  • 1: extremely short of breath
  • 2: very short of breath
  • 3: quite a bit short of breath
  • 4: moderate shortness of breath
  • 5: some shortness of breath
  • 6: a little shortness of breath
  • 7: not at all short of breath

In the seven point scales used in this study, 7 represents the best possible function, and 1 the worst possible function.

    Crossover trial

To complete the asthma quality of life questionnaire, patients rate the impairments they have experienced during the previous 14 days and respond to 32 questions on seven point scales similar to that in the box.9 In a multicentre double blind crossover randomised trial lasting 12 weeks, 140 patients received salmeterol (50 µg, twice daily), salbutamol (200 µg, four times daily) or placebo plus salbutamol (to be opened as needed). Each patient received all three regimens and used the questionnaire to rate their quality of life in relation to their asthma at the end of each study period.10

                              
Table 1 Differences between groups given different treatments for asthma

The mean differences between salmeterol and salbutamol, and between salmeterol and placebo, met conventional criteria for significance. In the current analysis, we examined and compared the distributions of differences in scores between the salmeterol, salbutamol, and placebo periods. We reasoned that the number of patients who had obtained important benefit from treatment would be the number with a difference of 0.5 or more favouring the treatment period, minus the number with a difference of 0.5 or more favouring the control period. This measure is analogous to the conventional risk difference, with 1 divided by the difference in risk being the number needed to treat.
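A minimal sketch of this calculation for a crossover design follows. The scores are invented for illustration and are not data from the salmeterol trial; the function name is hypothetical.

```python
def crossover_nnt(treatment_scores, control_scores, mid=0.5):
    """Net proportion benefiting and number needed to treat in a crossover trial.

    Each position holds one patient's score in the treatment period and in
    the control period. A within-patient difference of `mid` or more counts
    as favouring that period.
    """
    diffs = [t - c for t, c in zip(treatment_scores, control_scores)]
    n = len(diffs)
    favour_treatment = sum(d >= mid for d in diffs)
    favour_control = sum(d <= -mid for d in diffs)
    net = (favour_treatment - favour_control) / n
    nnt = 1 / net if net > 0 else float("inf")
    return net, nnt


# Hypothetical scores for 10 patients (illustration only).
treatment = [5.5, 4.0, 6.0, 3.5, 5.0, 4.5, 6.5, 5.0, 4.0, 5.5]
control = [4.5, 4.0, 5.0, 4.25, 5.0, 4.25, 5.5, 5.25, 4.0, 5.0]

net, nnt = crossover_nnt(treatment, control)
print(net, round(nnt, 1))
```

In this invented sample, four patients favour treatment by 0.5 or more and one favours control, giving a net proportion of 0.3 and a number needed to treat of about 3.3.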

The figure shows the distribution of differences between the salmeterol and salbutamol treatment periods in the activity domain of the asthma quality of life questionnaire and the difference in the proportion of the distribution in the important benefit compared with the important deterioration ranges. The distribution is approximately normal.

Table 1 shows that for both comparisons, differences between treatments failed to reach the threshold of the minimal important difference for the activity limitation section of the asthma quality of life questionnaire. In the symptom section of the questionnaire, the difference between salmeterol and salbutamol bordered on the minimal important difference. The only comparison in which the minimal important difference was clearly exceeded was that between salmeterol and placebo in the symptom section of the questionnaire.

In contrast to these mean differences, many patients had scores that were more than 0.5 better for salmeterol compared with salbutamol treatment for both symptoms and activity limitations. Fewer had scores that were 0.5 or more better for salbutamol compared with salmeterol. The difference in the proportions is even greater for the comparison between salmeterol and placebo (table 1).

Comparing salmeterol and salbutamol, clinicians would need to treat 4.5 patients for one patient to gain important benefit in the activity domain (or 45 for 10 to benefit). However, the number needed to treat for salmeterol compared with placebo in the activity domain is 2.9. 


Difference in the activity domain of the asthma quality of life questionnaire between periods of treatment with salmeterol and salbutamol

    Parallel group trial

The chronic respiratory questionnaire, which includes 20 items measuring dyspnoea, fatigue, emotional function, and mastery (the extent to which patients feel in control), was developed for use in patients with moderate or severe chronic airflow limitation, and uses seven point scale response options.11 Seventy eight patients with chronic airflow limitation were randomly allocated to a six month programme of respiratory rehabilitation or to conventional community care. In the current analysis, we used the differences between patients' chronic respiratory questionnaire scores at baseline and after 24 weeks, as reported in the primary analysis of the trial.12 Mean differences between treatment and control for three domains reached significance.

The analysis of the parallel group trial provides additional challenges beyond those of the crossover trial. In theory, to calculate the proportion who improved on treatment we would have needed to know how rehabilitation patients would have fared had they received standard care, and how the standard care patients would have fared had they received rehabilitation. However, we could not observe these data directly because patients received only one treatment or the other. We do, however, know the proportion who improved, remained the same, and deteriorated relative to their baseline status in both treatment and control groups (table 2).

                              
Table 2 Calculating the proportion of patients who benefited from receiving rehabilitation in a parallel group trial*

Table 2 shows the proportion of patients in the rehabilitation and control groups whose dyspnoea scores increased by more than 0.5 (improved), changed between -0.5 and 0.5 (unchanged), and fell by more than 0.5 (deteriorated). We can refer to the proportions improved, unchanged, and deteriorated in the two groups as the "marginals." Given these marginals, there is, in general, no single way of filling in the individual cells in table 2; indeed, there are many possibilities. We have assumed that treatment and control responses are independent. Making this assumption, we obtain estimates of the individual cell values by multiplying the corresponding marginals (for instance, in table 2 we obtain the value for cell ax by multiplying the proportion improved in the rehabilitation group by the proportion improved in the standard care group). In table 2, cells ax, by, and cz represent patients whose outcome is the same irrespective of treatment. Patients in cells ay, az, and bz fared better receiving standard care than rehabilitation, and patients in cells bx, cx, and cy fared better receiving rehabilitation than standard care. Thus, the net proportion who received benefit from treatment is (bx+cx+cy)-(ay+az+bz), which in this case is (0.24+0.11+0.10)-(0.12+0.03+0.05)=0.25 (0.24 without rounding error). The number needed to treat value is therefore 1/0.24, or 4.2.
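The independence calculation can be sketched in Python as follows. The six cell values in the first part are those quoted in the text; the marginals in the second part are hypothetical, chosen only to show the mechanics, and are not the trial's marginals. The function and dictionary names are illustrative.

```python
# Part 1: reproduce the arithmetic from the cell values quoted in the text.
better_on_rehabilitation = 0.24 + 0.11 + 0.10  # cells bx, cx, cy
better_on_standard_care = 0.12 + 0.03 + 0.05   # cells ay, az, bz
net_rounded = round(better_on_rehabilitation - better_on_standard_care, 2)
nnt_reported = round(1 / 0.24, 1)  # the text uses 0.24, the unrounded value
print(net_rounded, nnt_reported)

# Part 2: general calculation from marginals, assuming independence.
RANK = {"improved": 2, "unchanged": 1, "deteriorated": 0}


def net_benefit_independent(treated, control):
    """Net proportion faring better on treatment than on control.

    `treated` and `control` map each outcome category to its proportion.
    Each cell is the product of the two marginals (independence assumption).
    """
    better = worse = 0.0
    for t_state, t_prop in treated.items():
        for c_state, c_prop in control.items():
            cell = t_prop * c_prop
            if RANK[t_state] > RANK[c_state]:
                better += cell
            elif RANK[t_state] < RANK[c_state]:
                worse += cell
    return better - worse


# Hypothetical marginals (illustration only, not taken from the trial).
treated = {"improved": 0.5, "unchanged": 0.4, "deteriorated": 0.1}
control = {"improved": 0.3, "unchanged": 0.5, "deteriorated": 0.2}
net = net_benefit_independent(treated, control)
print(round(net, 2), round(1 / net, 1))
```

Independence is only one of many ways to fill in the cells; other assumptions about how a patient's response to one treatment relates to their response to the other would give different cell values for the same marginals.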

Table 3 gives the full results and shows that the mean difference between treatment and control groups exceeded the minimal important difference in two of the four domains. However, for all four domains, the difference in the proportion improved compared with deteriorated in the two treatment groups was similar, leading to consistent number needed to treat values of between 2.5 and 4.4. 

                              
Table 3 Differences between patients with chronic airflow limitation who were receiving rehabilitation and patients given conventional care

    Interpretation of treatment effects

The notion of taking a continuous variable, specifying a threshold that defines an important difference, and examining the proportions of patients who reach that threshold is not new. In considering the treatment of hypertension, Rose emphasised the difference between mean differences in populations and the impact these differences might have on individuals. In one specific example, Duffy argues persuasively that knowledge of mean changes in alcohol consumption in a population does not allow one to estimate change in the proportion of heavy drinkers. Rather, ascertaining the proportion of heavy drinkers requires direct measurement.13 Another good example of this approach comes from a recent controlled trial of tissue plasminogen activator treatment in patients with acute stroke.14 In reporting the results of this study, the authors presented both mean values of functional measures and differences in the proportions of patients who reached a threshold level of function.

What we have done that is new is to anchor the threshold difference using the smallest difference that patients consider important---the minimal important difference. We have shown how the method can be applied in both crossover and parallel group trials, how to generate the number needed to treat for one patient to benefit from therapy, and how superficial examination of mean differences can produce very misleading conclusions.


Ascertaining the proportion of heavy drinkers requires direct measurement

When mean differences fall below the minimal important difference, clinicians may intuitively conclude that the treatment has a small, and possibly unimportant, effect. Similarly, doctors who observe a mean difference that is appreciably greater than the minimal important difference may be ready to assume that each patient benefits. This is not necessarily the case. For example, we found a mean difference of 0.7 in the mastery domain of the chronic respiratory questionnaire between those who received and did not receive rehabilitation. Despite this substantial difference, the number needed to treat was 2.5. This means that for every five patients who complete a rehabilitation programme, only two will be better off---a result that may have major implications for the cost effectiveness of the intervention.
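The step from a number needed to treat to "two in five" is a simple division, sketched here with a hypothetical helper function. The inputs are the values reported in the text.

```python
def expected_to_benefit(patients_treated, nnt):
    """Expected number of patients gaining important net benefit."""
    return patients_treated / nnt


# Mastery domain, rehabilitation v conventional care: NNT of 2.5.
print(expected_to_benefit(5, 2.5))   # 2.0
# Activity domain, salmeterol v salbutamol: NNT of 4.5.
print(expected_to_benefit(45, 4.5))  # 10.0
```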

Our approach is not restricted to health related quality of life or functional status measures, but applies to any clinical variable. For instance, changes in pulmonary function, exercise capacity, or renal or cardiac function could all be analysed in this way. For these variables, however, the concept of the minimal important difference may be questioned. If renal failure requires dialysis or if cardiac function deteriorates to the point that a heart transplant is necessary, the importance for the patient is clear. Smaller changes in physiological function are important not in themselves, but rather through their effects on the patient's function and health related quality of life. When considering differences that are important to patients, it may be more appropriate to measure function and health related quality of life directly rather than physiological variables.

Our approach is a way of making data more interpretable---we do not advocate its use as the only analysis. Power may be lost when converting continuous variables to dichotomous or categorical variables. We believe the initial analysis should examine whether differences in continuous variables meet criteria for significance. Once investigators have excluded chance as an explanation for differences between groups they can examine the proportions of patients who have deteriorated, remained the same, or improved as an aid in interpreting the importance of the results.

This approach emphasises the need to establish ranges of health related quality of life, symptoms, and functional status questionnaire changes that represent trivial, small but important, moderate, and large changes. When they understand these ranges, investigators reporting clinical trials should present not only mean differences but also the difference in the proportion of patients who experience important improvement, and the associated number needed to treat. Presenting the results in both ways will reduce the risk of important misinterpretation of randomised trials that directly measure aspects of living that are important to patients.

    Acknowledgments

Funding: Supported in part by a grant from the Medical Research Council of Canada.

Conflict of interest: None.

    References

  1. Jaeschke R, Guyatt G, Keller J, Singer J. Measurement of health status: ascertaining the meaning of a change in quality-of-life questionnaire score. Controlled Clin Trials 1989; 10: 407-415.
  2. Juniper EF, Guyatt GH, Willan A, Griffith LE. Determining a minimal important change in a disease-specific quality of life questionnaire. J Clin Epidemiol 1994; 47: 81-87.
  3. Juniper EF, Guyatt GH, Feeny DH, Ferrie PJ, Griffith LE, Townsend M. Measuring quality of life in children with asthma. Quality Life Res 1996; 5: 36-46.
  4. Juniper EF, Guyatt GH, Feeny DH, Ferrie PJ, Griffith LE, Townsend M. Measuring quality of life in parents of children with asthma. Quality Life Res 1996; 5: 27-34.
  5. Juniper EF, Guyatt GH, Griffith LE, Ferrie PJ. Interpretation of rhinoconjunctivitis quality of life questionnaire data. J Allergy Clin Immunol 1996; 98: 843-845.
  6. Redelmeier DA, Goldstein RS, Guyatt GH. Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol 1996; 49: 1215-1219.
  7. Laupacis A, Sackett DL, Roberts RS. An assessment of clinically useful measures of the consequences of treatment. N Engl J Med 1988; 318: 1728-1733.
  8. Jaeschke R, Guyatt G, Shannon H, Walter SD, Cook DJ, Heddle N. Basic statistics for clinicians. III. Assessing the effects of treatment: measures of association. Can Med Assoc J 1995; 152: 351-357.
  9. Juniper EF, Guyatt GH, Epstein RS, Ferrie PJ, Jaeschke R, Hillers TK. Evaluation of health-related quality of life in asthma: development of a questionnaire for use in clinical trials. Thorax 1992; 47: 76-83.
  10. Juniper EF, Johnston PR, Borkhoff CM, Guyatt GH, Boulet LP, Haukioja A. Quality of life in asthma clinical trials: comparison of salmeterol and salbutamol. Am J Respir Crit Care Med 1995; 151: 66-70.
  11. Guyatt GH, Berman LB, Townsend M, Pugsley SO, Chambers LW. A measure of quality of life for clinical trials in chronic lung disease. Thorax 1987; 42: 773-778.
  12. Goldstein RS, Gort EH, Guyatt GH, Stubbing D, Avendano MA. Prospective randomised controlled trial of respiratory rehabilitation. Lancet 1994; 344: 1394-1397.
  13. Duffy JC. Alcohol consumption and control policy. J R Stat Soc A 1991; 156: 225-230.
  14. The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med 1995; 333: 1581-1587.

(Accepted 5 October 1997)


© BMJ 1998
