Published 22 July 2009, doi:10.1136/bmj.b2688
Cite this as: BMJ 2009;339:b2688
David Arnold, medical student (1); Alan Girling, senior research fellow (2); Andrew Stevens, professor of public health (2); Richard Lilford, professor of clinical epidemiology (2)
(1) Cardiff University Medical School, Cardiff; (2) School of Health and Population Sciences, University of Birmingham, Birmingham B15 2TT
Correspondence to: R Lilford r.j.lilford{at}bham.ac.uk
Design: Review of studies providing both a direct and an indirect utility estimate.
Selection criteria: Papers reporting comparisons of utilities obtained directly (standard gamble or time trade off) and indirectly (European quality of life 5D [EQ-5D], short form 6D [SF-6D], or health utilities index [HUI]) from the same patient.
Data sources: PubMed and Tufts database of utilities.
Statistical methods: Sign test for paired comparisons between direct and indirect utilities; least squares regression to describe average relations between the different methods.
Main outcome measures: Mean utility scores (or median if means unavailable) for each method, and differences in mean (median) scores between direct and indirect methods.
Results: We found 32 studies yielding 83 instances where direct and indirect methods could be compared for health states experienced by adults. The direct methods used were standard gamble in 57 cases and time trade off in 60 (34 used both); the indirect methods were EQ-5D (67 cases), SF-6D (13), HUI-2 (5), and HUI-3 (37). Mean utility values were 0.81 (standard gamble) and 0.77 (time trade off) for the direct methods; for the indirect methods: 0.59 (EQ-5D), 0.63 (SF-6D), 0.75 (HUI-2) and 0.68 (HUI-3).
Discussion: Direct methods of estimating utilities tend to result in higher health ratings than the more widely used indirect methods, and the difference can be substantial. Use of indirect methods could have important implications for decisions about resource allocation: for example, non-lifesaving treatments are favoured relative to lifesaving interventions to a greater extent than when direct methods are used.
Measurement of utilities is a tricky and controversial area. Broadly, two groups of methods exist (fig 1). The first is based on mapping preferences directly onto the utility scale. This can be done by means of a trade off (standard gamble or time trade off) or by a visual analogue scale, which is less favoured.2 3 We refer to these as direct measures of utility.
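The arithmetic that turns a trade off into a utility is simple. The sketch below shows it on the scale where death = 0 and full health = 1. The function names are our own and are for illustration only. The sketch covers only states judged better than death, and it assumes the respondent's indifference point has already been elicited.

```python
def standard_gamble_utility(p_indifference: float) -> float:
    """Utility of a state judged better than death, from a standard gamble.

    The respondent is indifferent between the state for certain and a gamble
    offering full health (utility 1) with probability p or death (utility 0)
    with probability 1 - p. The gamble's expected utility is p, so u = p.
    """
    if not 0.0 <= p_indifference <= 1.0:
        raise ValueError("indifference probability must lie in [0, 1]")
    return p_indifference


def time_trade_off_utility(years_full_health: float, years_in_state: float) -> float:
    """Utility of a state judged better than death, from a time trade off.

    The respondent judges x years in full health equivalent to t years in the
    state, so u * t = 1 * x and therefore u = x / t.
    """
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0.0 <= years_full_health <= years_in_state:
        raise ValueError("years_full_health must lie between 0 and years_in_state")
    return years_full_health / years_in_state


# Hypothetical respondent: indifferent at p = 0.9 in the gamble, and willing to
# give up 10 years in the state for 8 years in full health.
print(standard_gamble_utility(0.9))   # 0.9
print(time_trade_off_utility(8, 10))  # 0.8
```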
The second is based on mapping preferences onto the utility scale indirectly via a generic health related quality of life questionnaire (such as European quality of life five dimensions [EQ-5D], short form six dimensions [SF-6D], and health utilities index mark 2 [HUI-2] and mark 3 [HUI-3]). Questionnaire responses are converted to utilities by means of "tariffs" or "weights." These tariffs come from separate, earlier exercises in which a sample of the general population valued various possible health states by means of a trade off method. We refer to this group of methods as indirect methods of utility measurement.4 5 6 7
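The sketch below illustrates how a tariff converts questionnaire responses into a utility. It uses an additive form loosely modelled on EQ-5D-style tariffs: full health scores 1, and any other state loses a constant plus one decrement per dimension. Real tariffs may include further terms. Every number here is an invented round figure and is NOT a published EQ-5D, SF-6D, or HUI tariff. The dimension names and the function are hypothetical.

```python
# HYPOTHETICAL tariff: the constant and decrements are invented for illustration
# and do not correspond to any published EQ-5D, SF-6D or HUI tariff.
HYPOTHETICAL_CONSTANT = 0.1  # subtracted from any state other than perfect health
HYPOTHETICAL_DECREMENTS = {
    "mobility":           {1: 0.0, 2: 0.1, 3: 0.3},
    "self_care":          {1: 0.0, 2: 0.1, 3: 0.2},
    "usual_activities":   {1: 0.0, 2: 0.05, 3: 0.1},
    "pain_discomfort":    {1: 0.0, 2: 0.1, 3: 0.4},
    "anxiety_depression": {1: 0.0, 2: 0.1, 3: 0.2},
}


def tariff_utility(responses: dict) -> float:
    """Convert questionnaire levels (1 = no problems, 3 = severe problems) into a
    utility: 1 minus a constant minus one decrement per dimension."""
    if set(responses) != set(HYPOTHETICAL_DECREMENTS):
        raise ValueError("a level is needed for every dimension")
    if all(level == 1 for level in responses.values()):
        return 1.0  # perfect health is anchored at 1
    disutility = HYPOTHETICAL_CONSTANT + sum(
        HYPOTHETICAL_DECREMENTS[dim][level] for dim, level in responses.items()
    )
    return 1.0 - disutility


# Moderate mobility problems and moderate pain, no other problems.
state = {"mobility": 2, "self_care": 1, "usual_activities": 1,
         "pain_discomfort": 2, "anxiety_depression": 1}
print(round(tariff_utility(state), 3))  # 0.7
```

Because the trade off work is embedded in the tariff table, each new study only has to collect questionnaire levels and look up the decrements.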
Indirect methods bypass the time consuming process of asking respondents to trade health states for different risks of death (or years of remaining life) each time a study is carried out; such trade offs need be done only once and the results obtained can thereafter be used to derive utilities using a simple questionnaire. The indirect methods encountered in this study are summarised in table 1.
It is well known that different methods of utility estimation yield systematically different values.8 9 10 11 12 13 14 Standard gamble, time trade off, and visual analogue scale have all been compared across studies in a systematic way.15 16 17 18 19 20 21 However, the relation between direct and indirect utility measures has not been systematically documented across studies, despite the fact that differences between them have been identified in particular examples. Such examples have contributed to a widely held impression among health economists that direct methods tend to yield higher utilities (reflecting better reported health) for given health states than do indirect methods, irrespective of the type of direct or indirect method used (for example, time trade off versus standard gamble or EQ-5D versus SF-6D). The aims of the current study were therefore: to systematically confirm or refute the impression that direct methods tend to yield higher utility values than indirect methods; to quantify the magnitude of any such differences; and to describe the relation between direct and indirect measures.
Data were collected by searching systematically for studies in which the same respondents completed a direct trade off and a generic quality of life questionnaire leading to an indirect utility assessment. In view of the acknowledged differences among the various direct and indirect methods, we examined our results overall and by subgroups based on the specific methods used.
Previous studies have shown that health state valuations obtained directly from affected populations are often higher (that is, health is rated as better) than those obtained from unaffected populations asked to make hypothetical judgments.22 23 For this reason we supplemented overall comparisons of direct and indirect methods with an analysis stratified by the type of population.
As a further check that our search strategy had not missed important studies, we searched PubMed (using the strings shown in table 3) for studies reporting any utility or preference ratings in four conditions for which utility measures are commonly made: asthma, diabetes, rheumatoid arthritis, and stroke. We then scrutinised these studies to find out whether they compared direct and indirect methods in the same group of people.
Data extraction
Each eligible study was scrutinised by two authors (DA and AG) and the following information was extracted: title, authors, date, disease topic, the methods used to elicit direct utilities (standard gamble, time trade off, or both), and the generic questionnaires used (EQ-5D, SF-6D, HUI-2, HUI-3). Mean utility values for each method, with sample sizes, were extracted for the disease states in each included study. Medians were extracted when means were not reported. Standard deviations were recorded if they were available or could be deduced by working back from cited confidence intervals. Each study was classified according to whether the respondents were actual patients with direct experience of a certain condition (we refer to these as "current patients") or were asked to imagine the experience of the condition ("hypothetical patients").
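Working back from a confidence interval to a standard deviation can be done in several ways, and the exact procedure used here is not stated. The sketch below shows one common approach under an assumption: the interval is a symmetric normal-approximation interval, mean ± z·SD/√n. The function name and the example figures are hypothetical.

```python
import math


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """Back-calculate the standard deviation of individual utilities from a
    confidence interval for their mean.

    Assumes a symmetric interval of the form mean +/- z * SD / sqrt(n), i.e. a
    normal approximation (z = 1.96 for 95%). For small samples a t quantile
    with n - 1 degrees of freedom would be a better choice of z.
    """
    if n < 2:
        raise ValueError("need at least two respondents")
    if upper <= lower:
        raise ValueError("upper limit must exceed lower limit")
    standard_error = (upper - lower) / (2 * z)
    return standard_error * math.sqrt(n)


# Hypothetical report: mean utility 0.74 (95% CI 0.70 to 0.78) from 100 patients.
print(round(sd_from_ci(0.70, 0.78, 100), 3))  # 0.204
```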
Statistical methods
Many of the studies contained data on more than one health or disease state. Where the same state had been assessed by separate groups of respondents (for example, baseline assessments in the two arms of a randomised controlled trial) these groups retained their identity in the analysis. In studies where the same group of respondents had provided utilities for more than one state, these states were combined (within a study) by pooling the data to ensure independence between different groups contributing to the analysis. This process entailed calculating "average medians" for studies that did not cite means. The end result was that each of the independent groups contributed a single mean (or median) observation on each of several utility methods. Differences between the utility observations obtained in this way cannot be relied on to follow a constant distributional form, even if they could be attributed to chance alone. Thus the sign test was used for paired comparisons of direct and indirect methods.
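A minimal sketch of the two steps just described follows. First, states valued by the same group are pooled. Second, an exact two-sided sign test is applied to the resulting paired group-level values. The pooling step assumes every respondent in a group valued every state, in which case the pooled mean equals the simple average of the state means. The function names and data are hypothetical.

```python
from math import comb
from statistics import fmean


def pool_states(state_means):
    """Combine one group's mean utilities for several states into one value.

    Assumes every respondent in the group valued every state, so the pooled
    mean over all observations equals the simple average of the state means.
    """
    return fmean(state_means)


def sign_test(direct, indirect):
    """Exact two-sided sign test on paired group-level utilities.

    Counts pairs where direct > indirect and direct < indirect, drops ties, and
    refers the smaller count to a Binomial(n, 0.5) distribution.
    Returns (n_direct_higher, n_indirect_higher, p_value).
    """
    if len(direct) != len(indirect):
        raise ValueError("direct and indirect must be paired")
    n_pos = sum(1 for d, i in zip(direct, indirect) if d > i)
    n_neg = sum(1 for d, i in zip(direct, indirect) if d < i)
    n = n_pos + n_neg
    if n == 0:
        return n_pos, n_neg, 1.0
    k = min(n_pos, n_neg)
    one_tail = sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * one_tail)


# Hypothetical groups; the first group valued three states, which are pooled.
direct = [pool_states([0.85, 0.80, 0.75]), 0.90, 0.70, 0.88, 0.65, 0.79]
indirect = [pool_states([0.60, 0.62, 0.58]), 0.70, 0.72, 0.61, 0.50, 0.66]
print(sign_test(direct, indirect))  # (5, 1, 0.21875)
```

The sign test uses only the direction of each difference. For that reason it needs no assumption about the distributional form of the differences.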
Average relations between direct and indirect utilities across the independent groups were fitted using least squares regression lines, with the constraint that all methods assign the value 1 to perfect health. In each case the scores from the direct method were regressed on the scores from the indirect method, to test the hypothesis that direct utilities could be predicted by applying a linear correction to the indirect utilities. The predictive value of the fitted lines was assessed using an approximate method. Residuals about the fitted line were calculated and standardised with respect to variation between subjects (in both utilities). The method used only those studies that contributed means and standard deviations to the data set. The latter were converted to standard errors. Where a group had been formed (as described above) by pooling values over different health states, average standard errors were used; these are guaranteed to be no smaller than the true standard errors in the presence of (unknown) correlations between states. For most studies it was not possible to estimate residual standard deviations accurately because the correlations between direct and indirect utilities were not available. Instead, a conservative approach was employed, leading to underestimates of the absolute magnitude of the standardised residuals. For a fitted line $y = a + bx$ the standardised residual for the point $(x', y')$ with standard errors $(s_x, s_y)$ was estimated as $(y' - a - bx')/(s_y + |b|\,s_x)$. The denominator of this expression is guaranteed to be no smaller than the true residual standard deviation. We applied $\chi^2$ tests to the sums of squares of the estimated standardised residuals. The approximations described here mean that these tests are conservative: they understate the discrepancy between the reported utility values and the fitted lines. Small P values indicate that such discrepancies cannot reasonably be attributed to variation between the respondents contributing to the same study.
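The sketch below illustrates the three steps described in the preceding paragraph. It uses only the standard library and is not the authors' code.

1. Fit the constrained line. Because the line must pass through (1, 1), it satisfies a + b = 1. The slope is therefore a no-intercept least squares slope on data shifted by 1.
2. Form the conservative standardised residuals.
3. Refer their sum of squares to a $\chi^2$ distribution.

The degrees of freedom (number of groups minus one, for the single fitted slope) are our assumption, since the text does not state them. All data and function names are hypothetical.

```python
import math


def fit_through_perfect_health(indirect, direct):
    """Least squares fit of direct = a + b * indirect, constrained so that
    perfect health maps to itself (a + b = 1).

    Substituting a = 1 - b gives direct - 1 = b * (indirect - 1), a regression
    through the origin on shifted data, so b = sum(XY) / sum(X^2).
    """
    xs = [x - 1.0 for x in indirect]
    ys = [y - 1.0 for y in direct]
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all indirect utilities equal 1; slope undefined")
    b = sum(x * y for x, y in zip(xs, ys)) / sxx
    return 1.0 - b, b


def standardised_residuals(indirect, direct, se_indirect, se_direct, a, b):
    """Conservative standardised residuals (y' - a - b x') / (s_y + |b| s_x)."""
    return [
        (y - a - b * x) / (sy + abs(b) * sx)
        for x, y, sx, sy in zip(indirect, direct, se_indirect, se_direct)
    ]


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df, using
    closed forms of the regularised upper incomplete gamma function."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0
    h = stat / 2.0
    if df % 2 == 0:  # Q(m, h) = exp(-h) * sum_{i<m} h^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: start from Q(1/2, h) = erfc(sqrt(h)) and step up by one each time.
    total = math.erfc(math.sqrt(h))
    for j in range((df - 1) // 2):
        total += math.exp(-h) * h ** (j + 0.5) / math.gamma(j + 1.5)
    return total


# Hypothetical group-level data: means and standard errors (SD / sqrt(n)).
indirect = [0.55, 0.62, 0.70, 0.48, 0.80]
direct = [0.78, 0.80, 0.86, 0.70, 0.90]
se_indirect = [0.03, 0.04, 0.03, 0.05, 0.02]
se_direct = [0.03, 0.03, 0.02, 0.04, 0.02]

a, b = fit_through_perfect_health(indirect, direct)
r = standardised_residuals(indirect, direct, se_indirect, se_direct, a, b)
stat = sum(ri * ri for ri in r)
p = chi2_sf(stat, df=len(r) - 1)  # df = n - 1 (one fitted slope) is our assumption
print(f"a={a:.3f} b={b:.3f} chi2={stat:.2f} P={p:.3f}")
```

The residual denominator adds the two error terms rather than combining them in quadrature with a correlation. As the text notes, this makes each standardised residual, and hence the test, conservative.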
This left 28 studies for analysis covering a wide range of diseases (including rheumatic, neuropsychiatric, vascular, respiratory, gastrointestinal, renal, and infectious diseases) and encompassing data from 4688 respondents. Some studies reported utilities for several different states of health or disease. Altogether there were 83 instances in which direct and indirect methods could be compared. Sixty-eight of the 83 comparisons (from 25 studies) were based on current patients, and 15 (from three studies) were based on hypothetical patients (one group from the general public, two groups from the healthcare community).
The full list of papers, including those involving children, is given in appendix 1, with summary data from each paper in appendix 2 (see Web Extra material). A breakdown of the clinical topics investigated (and the number of disease states per topic) is given in table 4.
Limitations
As with all systematic reviews, the results are constrained by the published literature which, in this case, gave access to 83 comparisons across 28 studies. The studies covered many different diseases and several different methods of utility elicitation allowing a general trend to be identified—that indirect methods tend to yield lower utilities than direct methods. However, the heterogeneity of methods and disease states in the review precludes any generalisable numerical summary of effect size that would apply to a different spectrum of patients or methods.
Explaining the findings
This study was not designed to elucidate the reasons for the differences in results between indirect and direct methods. Nevertheless, there are several possible explanations that could contribute to an understanding of the results.
One potential explanation is that the generic questionnaires used to obtain the indirect utilities impose constraints: respondents are forced to encapsulate their potentially complex condition within five to eight dimensions, depending on the questionnaire used. The questionnaires do not allow respondents to report, for example, potentially positive aspects of their situations that would boost utility values. There is a risk, therefore, that the people contributing tariffs for the various states form a different impression of those states from the one held by the patients who complete the questionnaire.
Secondly, it is likely that the respondents who contribute trade off values for indirect utility elicitation differ systematically from the patients who participate in direct methods. The general population used to obtain tariffs for indirect utility estimation is spread across a wider age range than patient populations typically used in direct estimation. Young people in good health and people with diseases may have very different perspectives on the relative merits of remaining in a given health state and making a trade off involving death.
Those participating in elicitation of indirect tariffs are always asked to make a hypothetical choice whereas direct utilities are usually based on the experience of people who actually have the condition, who are aware not only of the distress of living with a condition, but also of mitigating factors. They may derive benefits from family, friends, work, and recreation on a daily basis and tend, over time, to adapt to their condition.26 Members of the general public contributing tariffs for health states are at a considerable emotional distance—they lack personal experience of the disease, and the descriptions given in generic questionnaires are rather bland and decontextualised.25 However, the difference between direct and indirect utilities seems to remain even when values are elicited from an (admittedly small) sample of hypothetical patients. This may be because the verbal "pictures" of disease used were somewhat richer when applying direct methods to hypothetical patients than when eliciting tariffs for the calibration of indirect methods.25
Finally, modifications to the methodology for deriving indirect utilities from quality of life questionnaires have recently been suggested.27 If implemented these would tend to increase the valuations attached to mild health states and eliminate some of the differences observed here.
Direct or indirect?
Simply observing a systematic and substantial difference between methods, as we have done, does not resolve the question of which method to choose. Moreover, there is no universally accepted theoretical basis for choosing between direct and indirect methods.1 Some people think that utilities should be derived from patients (ideally at stages throughout their illness) on the grounds that only patients really know what the condition is like. Others think that the citizens' perspective is more relevant. Various reasons have been given for this opinion: it is argued that citizens are the natural (and democratic) locus for a decision about use of society's resources, that citizens can be asked to factor in societal objectives (such as the pursuit of equity), and that the necessarily hypothetical choices that citizens make conform to "the axioms of utility theory", whereby decisions are made prospectively under uncertainty. However, a preference for direct or indirect utilities does not necessarily follow from these considerations: direct utilities derived hypothetically from citizens and indirect utilities calibrated through a survey of the general population can both provide the advantages of the citizens' perspective.
Given the absence of a gold standard for the elicitation of utilities, it may be informative to consider the practical implications of using one method or the other.
Implications for resource allocation
Had we found little or no difference between direct and indirect methods, then the latter could be recommended on grounds of efficiency since, once the tariffs have been set, indirect utilities are far simpler to elicit. However, it seems that indirect methods give consistently lower levels of utility than direct methods. This means that there is more headroom for utility gain with indirect methods. The utility of death, however, is fixed at zero. Thus, in comparison with direct methods, indirect methods will favour the allocation of healthcare resources away from interventions that prevent or delay death in favour of those that alleviate non-fatal conditions. It could be argued that the popularity of indirect methods for informing rationing decisions simply expresses a legitimate societal attitude in favour of non-lethal over lethal conditions. However, it could equally be argued that the public would rather give more weight to delaying death. In that case indirect methods might risk undervaluing both personal and societal preferences. In England, the National Institute for Health and Clinical Excellence (NICE) has noted that "society places great value on extending the lives of people with life-threatening conditions" and has issued new guidance for "life-extending treatments that are above the usual range of cost effectiveness,"28 in the wake of decisions that denied life extending treatments.
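The "headroom" argument can be made concrete with a small worked example. It uses two hypothetical utilities for the same health state: 0.8 stands in for a direct estimate and 0.6 for an indirect one. The example assumes a constant utility over one year and no discounting. A year of life gained in the state is worth u QALYs. A year spent cured rather than in the state is worth 1 − u QALYs. The function names are illustrative.

```python
def qaly_gain_life_extension(utility: float, extra_years: float) -> float:
    """QALYs from extra years lived in the health state (death has utility 0)."""
    return utility * extra_years


def qaly_gain_cure(utility: float, years: float) -> float:
    """QALYs from moving someone from the state to full health for `years` years."""
    return (1.0 - utility) * years


# Hypothetical utilities for the same health state (illustrative only).
for label, u in [("direct", 0.8), ("indirect", 0.6)]:
    extend = qaly_gain_life_extension(u, 1.0)
    cure = qaly_gain_cure(u, 1.0)
    print(f"{label}: life extension {extend:.2f}, cure {cure:.2f}, ratio {cure / extend:.2f}")
```

This prints a cure-to-life-extension ratio of 0.25 under the direct value and 0.67 under the indirect value. The lower utility therefore shifts the relative value sharply towards the non-lifesaving intervention.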
We are not the first investigators to suspect that many of the utilities used in health resource decisions seem too low, particularly at the upper end of the scale.29 If this is so, then the data presented here show that the tendency applies especially to indirectly elicited utilities (see box).
Implications for decision makers
Those who prefer direct methods, but who wish to exploit the convenience of indirect methods, might propose using a simple correction to map indirect utilities onto the putatively more valid direct utilities. Our results show that linear adjustments of indirect utilities can achieve only a partial conversion to direct utility scores. They cannot be used to accurately predict direct utilities for particular health states. However, such an adjustment may be better than nothing, because the adjusted utilities are well calibrated on average. Further development in the science of utility estimation could make a contribution by identifying the characteristics of disease states for which the average adjustment is inappropriate.
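The methods require every fitted line to pass through perfect health, so $a + b = 1$. Under that constraint, a linear adjustment reduces to a single rescaling of disutility. The short derivation below is our own restatement. Here $b$ stands for whichever fitted slope applies, and no particular value is implied.

```latex
\hat{u}_{\mathrm{direct}} = a + b\,u_{\mathrm{indirect}}, \qquad a + b = 1
\;\Longrightarrow\;
1 - \hat{u}_{\mathrm{direct}} = 1 - a - b\,u_{\mathrm{indirect}}
                              = b\,\bigl(1 - u_{\mathrm{indirect}}\bigr)
```

With $b < 1$, every indirect disutility is shrunk by the same proportion. Because one factor applies to all states, the adjustment cannot capture state-specific departures from the average relation.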
This paper adds weight to the recommendation to be cautious when using utilities of any type. In the construction of health economic models it may be prudent to extend the range of uncertainty beyond the confines of statistical confidence limits and conduct a sensitivity analysis. In that case, the results presented here suggest that indirect utilities might best be used to populate only the lower limits of such an analysis.
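The sketch below shows what such a sensitivity analysis might look like for a life-extending intervention. The indirect utility sets the lower limit of the range, and the direct utility sets the upper limit. The costs, durations, utilities, the evenly spaced grid, and the function names are all hypothetical. A real economic model would also include discounting and other uncertain inputs.

```python
def cost_per_qaly_life_extension(incremental_cost, extra_years, utility):
    """Incremental cost per QALY for an intervention that adds `extra_years`
    of life lived in a health state with the given utility (no discounting)."""
    if utility <= 0 or extra_years <= 0:
        raise ValueError("utility and extra_years must be positive")
    return incremental_cost / (utility * extra_years)


def one_way_sensitivity(lower, upper, steps, model):
    """Evaluate `model` at evenly spaced utility values from lower to upper."""
    if steps < 2:
        raise ValueError("need at least two steps")
    grid = [lower + (upper - lower) * i / (steps - 1) for i in range(steps)]
    return [(u, model(u)) for u in grid]


# Hypothetical inputs: an indirect estimate of 0.6 sets the lower utility limit
# and a direct estimate of 0.8 the upper limit.
results = one_way_sensitivity(
    0.6, 0.8, 3, lambda u: cost_per_qaly_life_extension(12_000, 2.0, u)
)
for u, icer in results:
    print(f"utility {u:.2f}: cost per QALY {icer:,.0f}")
```

For a life-extending intervention, the lower (indirect) utility yields the least favourable cost per QALY. Reporting the whole range makes the choice of utility method visible to decision makers.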
Because direct and indirect methods can lead to such noticeable differences in elicited utilities, priority setting institutions should avoid using a mixture of methods for different decisions, otherwise a motivated choice of method might be used to distort the outcome in a preferred direction. NICE has certainly had to deal with inconsistent use of methods.30 It may be that a general counsel against the mixing of different methods is the most important implication of this work.
Contributors: RJL conceived the idea for the paper, DA performed the literature reviews and data extraction, and AG repeated the data extraction and carried out statistical analyses. All authors prepared the manuscript and RL is guarantor.
Funding: This work was supported through the MATCH programme (EPSRC Grant GR/S29874/01), although the views expressed are entirely the authors' own.
Competing interests: None declared.
Ethical approval: Not required.
© Arnold et al 2009
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.