Gordon H Guyatt
a Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada L8N 3Z5
b Division of Respiratory Medicine, University of Toronto, Toronto, Ontario, Canada
Correspondence to: Dr Guyatt guyatt{at}fhs.csu.mcmaster.ca
The need to measure the impact of treatments on health
related quality of life has led to a rapid increase in the variety of
instruments available and in their use as measures of outcome in
clinical trials. One limitation of instruments that purport to measure
health related quality of life is difficulty interpreting their
results. In the past decade, investigators have progressed in making
these questionnaire results interpretable. For example, we have shown
that when questionnaires present response options in the form of seven
point scales with verbal descriptions for each option (see box), the
smallest difference that patients consider important is often
approximately 0.5 per question. A moderate difference corresponds to a
change of approximately 1.0 per question, and changes of greater than
1.5 can be considered large. Thus, for example, in a domain with four
items, patients will consider a 1 point change in two or more items as
important. This finding applies across different areas of function,
including dyspnoea, fatigue, and emotional function in patients with
chronic airflow limitation1; and symptoms, emotional
function, and activity limitations in adults2 and
children3 with asthma, parents of children with
asthma,4 and adults with
rhinoconjunctivitis.5 Initially, we used comparisons in
the same patient to establish this difference, but more recently we
have replicated this finding using differences between
patients.6
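The thresholds described above can be expressed as a small classifier. This is only a sketch: the function names `mean_change_per_item` and `classify_change`, and the way values exactly on a boundary are handled, are assumptions made for illustration. The text gives approximate values (0.5, 1.0, 1.5) rather than strict cut points.

```python
def mean_change_per_item(item_changes):
    """Average change per question across the items of one domain."""
    return sum(item_changes) / len(item_changes)


def classify_change(mean_change):
    """Label the size of a mean change per question on a seven point scale.

    Cut-offs follow the approximate values in the text (0.5, 1.0, 1.5).
    The treatment of values exactly on a boundary is an assumption.
    """
    size = abs(mean_change)
    if size < 0.5:
        return "trivial"
    if size < 1.0:
        return "small but important"
    if size <= 1.5:
        return "moderate"
    return "large"


# A four-item domain in which two items change by 1 point and two do not change.
changes = [1, 1, 0, 0]
print(mean_change_per_item(changes))                   # 0.5
print(classify_change(mean_change_per_item(changes)))  # small but important
```

The example reproduces the four-item case from the text: a 1 point change in two of four items gives a mean change of 0.5 per question, which is the smallest difference patients consider important.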
Summary points
Several questionnaires on quality of life related to health are available, but interpreting their results may be difficult
For some questionnaires, we now know that the smallest change in score that patients consider important is 0.5
Even if the mean difference between a treatment and a control is appreciably less than the smallest change that is important, treatment may have an important impact on many patients
A method for estimating the proportion of patients who benefit from a treatment when the outcome is a continuous variable has been developed
The method is outlined using two examples, one a crossover trial and the other a parallel group design
This approach emphasises the need to establish ranges of health related changes that represent trivial, small but important, moderate, and large changes in addition to mean differences
| |
Assumptions |
|---|
Clinicians and investigators tend to assume that if the mean difference between a treatment and a control is appreciably less than the smallest change that is important, then the treatment has a trivial effect. This may not be so. Let us assume that a randomised clinical trial shows a mean difference of 0.25 in a questionnaire in which the minimal important difference is 0.5. It might be concluded that the difference is unimportant and that the result does not support giving the treatment. This interpretation assumes that every patient treated scored 0.25 better than they would have done had they received the control and ignores the possibility that treatment might have a heterogeneous effect. Depending on the true distribution of results, the appropriate interpretation might be different.
Consider a situation in which 25% of the treated patients improved by a magnitude of 1.0, while the other 75% did not improve at all (mean change of 0). This would mean that 25% of those treated obtained a moderate benefit from the intervention. Using the method that has recently been developed for interpreting the size of treatment effects (the number needed to treat), investigators have found that doctors often treat 25 to 50 patients, even as many as 100, in order to prevent a single adverse event.7 8 Thus, the hypothetical treatment with a mean difference of 0.25 and a number needed to treat value of 4 proves to have a powerful effect.
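The arithmetic behind this hypothetical case can be written out, assuming (as the example does) that no patient is made worse by treatment. The symbols $p$ (the proportion of patients who benefit) and $\Delta$ (the size of their improvement) are introduced here only for illustration.

```latex
\begin{align*}
\text{mean difference} &= p\,\Delta + (1 - p) \times 0 = 0.25 \times 1.0 + 0.75 \times 0 = 0.25,\\
\text{number needed to treat} &= \frac{1}{p} = \frac{1}{0.25} = 4.
\end{align*}
```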
We have developed a method for estimating the proportion of patients who benefit from a treatment when the outcome is a continuous variable. We outline this method using two examples, one a crossover trial and the other a parallel group design.
|
Seven point scale with verbal descriptors
The following options were given for response to the question "How short of breath have you felt during the last two weeks while climbing stairs?"
In the seven point scales used in this study, 7 represents the best possible function, and 1 the worst possible function. |
| |
Crossover trial |
|---|
To complete the asthma quality of life questionnaire, patients rate the impairments they have experienced during the previous 14 days and respond to 32 questions on seven point scales similar to that in the box.9 In a multicentre double blind crossover randomised trial lasting 12 weeks, 140 patients received salmeterol (50 µg, twice daily), salbutamol (200 µg, four times daily) or placebo plus salbutamol (to be opened as needed). Each patient received all three regimens and used the questionnaire to rate their quality of life in relation to their asthma at the end of each study period.10
|
The mean differences between salmeterol and salbutamol, and between salmeterol and placebo, met conventional criteria for significance. In the current analysis, we examined and compared the distribution of different scores in the salmeterol, salbutamol, and placebo periods. We reasoned that the number of patients who had obtained important benefit from treatment would be the number with a difference of 0.5 or more favouring the treatment period, minus the number with a difference of 0.5 or more favouring the control period. This measure is analogous to the conventional risk difference, with 1 divided by the difference in risk being the number needed to treat.
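The calculation described in the preceding paragraph might be sketched as follows. The function name `crossover_nnt`, the choice to return `None` when there is no net benefit, and the ten difference scores are all assumptions for illustration. The data are made up and are not taken from the trial.

```python
def crossover_nnt(differences, mid=0.5):
    """Estimate the net proportion with important benefit and the NNT.

    differences: per-patient score in the treatment period minus the score in
        the control period (higher scores mean better function).
    mid: minimal important difference (0.5 per question in the text).
    """
    n = len(differences)
    better = sum(1 for d in differences if d >= mid)   # favours treatment
    worse = sum(1 for d in differences if d <= -mid)   # favours control
    net = (better - worse) / n                         # analogous to a risk difference
    nnt = 1 / net if net > 0 else None                 # undefined without net benefit
    return better / n, worse / n, net, nnt


# Made-up difference scores for ten patients (illustration only).
diffs = [1.0, 0.75, 0.5, 0.25, 0.0, 0.0, -0.25, 0.5, -0.5, 1.25]
p_better, p_worse, net, nnt = crossover_nnt(diffs)
print(f"{p_better:.1f} {p_worse:.1f} {net:.1f} {nnt:.1f}")  # 0.5 0.1 0.4 2.5
```

In this made-up sample, five patients improve by at least 0.5 and one deteriorates by at least 0.5, so the net proportion benefiting is 0.4 and the number needed to treat is 2.5.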
The figure shows the distribution of differences between the salmeterol and salbutamol treatment periods in the activity domain of the asthma quality of life questionnaire and the difference in the proportion of the distribution in the important benefit compared with the important deterioration ranges. The distribution is approximately normal.
Table 1 shows that for both comparisons, differences between treatments failed to reach the threshold of the minimal important difference for the activity limitation section of the asthma quality of life questionnaire. In the symptom section of the questionnaire, the difference between salmeterol and salbutamol bordered on the minimal important difference. The only comparison in which the minimal important difference was clearly exceeded was that between salmeterol and placebo in the symptom section of the questionnaire.
In contrast to these mean differences, many patients had scores that were more than 0.5 better for salmeterol compared with salbutamol treatment for both symptoms and activity limitations. Fewer had scores that were 0.5 or more better for salbutamol compared with salmeterol. The difference in the proportions is even greater for the comparison between salmeterol and placebo (table 1).
Comparing salmeterol and salbutamol, clinicians would need to treat 4.5 patients for one patient to gain important benefit in the activity domain (or 45 for 10 to benefit). However, the number needed to treat for salmeterol compared with placebo in the activity domain is 2.9.
|
| |
Parallel group trial |
|---|
The chronic respiratory questionnaire, which includes 20 items measuring dyspnoea, fatigue, emotional function, and mastery (the extent to which patients feel in control), was developed for use in patients with moderate or severe chronic airflow limitation, and uses seven point scale response options.11 Seventy eight patients with chronic airflow limitation were randomly allocated to a six month programme of respiratory rehabilitation or to conventional community care. In the current analysis, we used the differences between the patients' chronic respiratory questionnaire scores at baseline and after 24 weeks, as reported in the primary analysis of the trial results.12 Mean differences between treatment and control for three domains reached significance.
The analysis of the parallel group trial provides additional challenges beyond those of the crossover trial. In theory, to calculate the proportion who improved on treatment we would have needed to know how rehabilitation patients would have fared had they received standard care, and how the standard care patients would have fared had they received rehabilitation. However, we could not observe these data directly because patients received only one treatment or the other. We do, however, know the proportion who improved, remained the same, and deteriorated relative to their baseline status in both treatment and control groups (table 2).
|
Table 3 shows the proportion of patients in the rehabilitation and control groups whose dyspnoea scores increased by more than 0.5 (improved), changed between −0.5 and 0.5 (unchanged), and fell by more than 0.5 (deteriorated). We can refer to the proportions improved, unchanged, and deteriorated in the two groups as the "marginals." Given these marginals, there is, in general, no single way of filling in the individual cells in table 2; indeed, there are many possibilities. We have assumed that treatment and control responses are independent. Making this assumption, we obtain estimates of the individual cell values by multiplying the corresponding marginals (for instance, in table 2 we obtain the value for cell ax by multiplying the proportion improved in the rehabilitation group by the proportion improved in the standard care group). In table 2, cells ax, by, and cz represent patients whose outcome is the same irrespective of treatment. Patients in cells ay, az, and bz fared better receiving standard care than rehabilitation, and patients in cells bx, cx, and cy fared better receiving rehabilitation than standard care. Thus, the proportion who received benefit from treatment is (bx+cx+cy) − (ay+az+bz), which in this case is (0.24+0.11+0.10) − (0.12+0.03+0.05) = 0.25 (0.24 without rounding error). The number needed to treat value is therefore 1/0.24, or 4.2.
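The independence calculation can be sketched in a few lines. The function name `net_benefit_under_independence` and the marginal proportions for the two groups are hypothetical, chosen only to show the mechanics, and are not the trial's data. The final lines reuse the six cell values quoted in the text.

```python
from itertools import product

OUTCOMES = ("improved", "unchanged", "deteriorated")
RANK = {"improved": 2, "unchanged": 1, "deteriorated": 0}


def net_benefit_under_independence(treatment, control):
    """Fill the 3x3 table by multiplying marginals, then sum the cells.

    treatment, control: dicts mapping each outcome to the proportion of that
    group with the outcome. Independence of the two responses is assumed.
    """
    better_on_treatment = 0.0
    better_on_control = 0.0
    for t, c in product(OUTCOMES, OUTCOMES):
        cell = treatment[t] * control[c]      # estimated cell proportion
        if RANK[t] > RANK[c]:
            better_on_treatment += cell
        elif RANK[t] < RANK[c]:
            better_on_control += cell
    return better_on_treatment, better_on_control


# Hypothetical marginals (illustration only, not the trial's data).
rehab = {"improved": 0.5, "unchanged": 0.3, "deteriorated": 0.2}
care = {"improved": 0.3, "unchanged": 0.4, "deteriorated": 0.3}
gain, loss = net_benefit_under_independence(rehab, care)
print(f"{gain - loss:.2f} {1 / (gain - loss):.1f}")  # 0.21 4.8

# Cell values quoted in the text for the dyspnoea domain.
net_text = (0.24 + 0.11 + 0.10) - (0.12 + 0.03 + 0.05)
print(f"{net_text:.2f}")  # 0.25
```

Ranking the outcomes avoids hard-coding cell labels: any cell in which the treatment outcome ranks above the control outcome counts as benefit, and the reverse counts as harm.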
Table 3 gives the full results and shows that the mean difference between treatment and control groups exceeded the minimal important difference in two of the four domains. However, for all four domains, the difference in the proportion improved compared with deteriorated in the two treatment groups was similar, leading to consistent number needed to treat values of between 2.5 and 4.4.
|
| |
Interpretation of treatment effects |
|---|
The notion of taking a continuous variable, specifying a threshold that defines an important difference, and examining the proportions of patients who reach that threshold is not new. In considering the treatment of hypertension, Rose emphasised the difference between mean differences in populations and the impact these differences might have on individuals. In one specific example, Duffy argues persuasively that knowledge of mean changes in alcohol consumption in a population does not allow one to estimate change in the proportion of heavy drinkers. Rather, ascertaining the proportion of heavy drinkers requires direct measurement.13 Another good example of this approach comes from a recent controlled trial of tissue plasminogen activator treatment in patients with acute stroke.14 In reporting the results of this study, the authors presented both mean values of functional measures and differences in the proportions of patients who reached a threshold level of function.
What we have done that is new is to anchor the threshold difference
using the smallest difference that patients consider important: the
minimal important difference. We have shown how the method can be
applied in both crossover and parallel group trials, how to generate
the number needed to treat for one patient to benefit from therapy, and
how superficial examination of mean differences can produce very
misleading conclusions.
|
When mean differences fall below the minimal important difference,
clinicians may intuitively conclude that the treatment has a small, and
possibly unimportant, effect. Similarly, doctors who observe a mean
difference that is appreciably greater than the minimal important
difference may be ready to assume that each patient benefits. This is
not necessarily the case. For example, we found a mean difference of
0.7 in the mastery domain of the chronic respiratory questionnaire
between those who received and did not receive rehabilitation. Despite
this substantial difference, the number needed to treat was 2.5. This
means that for every five patients who complete a rehabilitation
programme, only two will be better off, a result that may have major
implications for the cost effectiveness of the intervention.
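As a quick check of this arithmetic, a number needed to treat (NNT) of 2.5 corresponds to a net proportion benefiting of 0.4, or two patients in five:

```latex
\[
\text{net proportion benefiting} = \frac{1}{\text{NNT}} = \frac{1}{2.5} = 0.4 = \frac{2}{5}.
\]
```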
Our approach is not restricted to health related quality of life or functional status measures, but applies to any clinical variable. For instance, the interpretation of changes in pulmonary function, exercise capacity, or renal or cardiac function could all be analysed in this way. For these variables, however, the concept of the minimal important difference may be questioned. If renal failure requires dialysis or if cardiac function deteriorates to the point that a heart transplant is necessary, the importance for the patient is clear. Smaller changes in physiological function are important not in themselves, but rather through their effects on patient function and her or his health related quality of life. When considering differences that are important to patients, it may be more appropriate to measure function and health related quality of life directly rather than physiological variables.
Our approach is a way of making data more interpretable; we do not
advocate its use as the only analysis. Power may be lost when
converting continuous variables to dichotomous or categorical variables. We believe the initial analysis should examine whether differences in continuous variables meet criteria for significance. Once investigators have excluded chance as an explanation for differences between groups they can examine the proportions of patients
who have deteriorated, remained the same, or improved as an aid in
interpreting the importance of the results.
This approach emphasises the need to establish ranges of health related quality of life, symptoms, and functional status questionnaire changes that represent trivial, small but important, moderate, and large changes. When they understand these ranges, investigators reporting clinical trials should present not only mean differences but also the difference in the proportion of patients who experience important improvement, and the associated number needed to treat. Presenting the results in both ways will reduce the risk of important misinterpretation of randomised trials that directly measure aspects of living that are important to patients.
| |
Acknowledgments |
|---|
Funding: Supported in part by a grant from the Medical Research Council of Canada.
Conflict of interest: None.
(Accepted 5 October 1997)