Comparison of direct and indirect methods of estimating health state utilities for resource allocation: review and empirical analysis
BMJ 2009; 339 doi: http://dx.doi.org/10.1136/bmj.b2688 (Published 22 July 2009) Cite this as: BMJ 2009;339:b2688
- David Arnold, medical student (1)
- Alan Girling, senior research fellow (2)
- Andrew Stevens, professor of public health (2)
- Richard Lilford, professor of clinical epidemiology (2)
- (1) Cardiff University Medical School, Cardiff
- (2) School of Health and Population Sciences, University of Birmingham, Birmingham B15 2TT
- Correspondence to: R Lilford
- Accepted 18 February 2009
Background and objective Utilities (values representing preferences) for healthcare priority setting are typically obtained indirectly by asking patients to fill in a quality of life questionnaire and then converting the results to a utility using population values. We compared such utilities with those obtained directly from patients or the public.
Design Review of studies providing both a direct and indirect utility estimate.
Selection criteria Papers reporting comparisons of utilities obtained directly (standard gamble or time trade off) or indirectly (European quality of life 5D [EQ-5D], short form 6D [SF-6D], or health utilities index [HUI]) from the same patient.
Data sources PubMed and Tufts database of utilities.
Statistical methods Sign test for paired comparisons between direct and indirect utilities; least squares regression to describe average relations between the different methods.
Main outcome measures Mean utility scores (or median if means unavailable) for each method, and differences in mean (median) scores between direct and indirect methods.
Results We found 32 studies yielding 83 instances where direct and indirect methods could be compared for health states experienced by adults. The direct methods used were standard gamble in 57 cases and time trade off in 60 (34 used both); the indirect methods were EQ-5D (67 cases), SF-6D (13), HUI-2 (5), and HUI-3 (37). Mean utility values were 0.81 (standard gamble) and 0.77 (time trade off) for the direct methods; for the indirect methods: 0.59 (EQ-5D), 0.63 (SF-6D), 0.75 (HUI-2) and 0.68 (HUI-3).
Discussion Direct methods of estimating utilities tend to result in higher health ratings than the more widely used indirect methods, and the difference can be substantial. Use of indirect methods could have important implications for decisions about resource allocation: for example, non-lifesaving treatments are relatively more favoured in comparison with lifesaving interventions than when using direct methods.
For resources to be allocated fairly, according to benefit gained per unit cost, between different patient groups, health economists and policy makers use a common currency of benefit. This standardisation is usually done by attaching different utilities (also known as preference weightings or health state values) to different health states.1 Utilities are captured on a scale where 1 represents perfect health, 0 represents death, and states worse than death have negative values.
Measurement of utilities is a tricky and controversial area. Broadly, two groups of methods exist (fig 1). The first is based on mapping preferences directly onto the utility scale. This can be done by means of a trade off (standard gamble or time trade off) or visual analogue scale, which is less favoured.2 3 We refer to these as direct measures of utility.
The second is based on mapping preferences onto the utility scale indirectly via a generic health related quality of life questionnaire (such as European quality of life five dimensions [EQ-5D], short form six dimensions [SF-6D], and health utilities index mark 2 [HUI-2] and mark 3 [HUI-3]). Questionnaire responses are converted to utilities by means of “tariffs” or “weights.” These tariffs are available as a result of separate and previous exercises in which various possible health states have been calibrated by means of a trade off method from a sample of the general population. We refer to this group of methods as indirect methods of utility measurement.4 5 6 7
Indirect methods bypass the time consuming process of asking respondents to trade health states for different risks of death (or years of remaining life) each time a study is carried out; such trade offs need be done only once and the results obtained can thereafter be used to derive utilities using a simple questionnaire. The indirect methods encountered in this study are summarised in table 1.
The time trade off and standard gamble interviews must be implemented by a trained interviewer who explains the concept clearly without distressing or leading the patient. By comparison, questionnaires such as EQ-5D are simple, require little explanation, and stratify data into a number of different dimensions that the direct methods do not register (such as mobility and pain).
It is well known that different methods of utility estimation yield systematically different values.8 9 10 11 12 13 14 Standard gamble, time trade off, and visual analogue scale have all been compared across studies in a systematic way.15 16 17 18 19 20 21 However, the relation between direct and indirect utility measures has not been systematically documented across studies, despite the fact that differences between them have been identified in particular examples. Such examples have contributed to a widely held impression among health economists that direct methods tend to yield higher utilities (reflecting better reported health) for given health states than do indirect methods, irrespective of the type of direct or indirect method used (for example, time trade off versus standard gamble or EQ-5D versus SF-6D). The aims of the current study were therefore: to systematically confirm or refute the impression that direct methods tend to yield higher utility values than indirect methods; to quantify the magnitude of any such differences; and to describe the relation between direct and indirect measures.
Data were collected by searching systematically for studies in which the same respondents completed a direct trade off and a generic quality of life questionnaire leading to an indirect utility assessment. In view of the acknowledged differences among the various direct and indirect methods, we examined our results overall and by subgroups based on the specific methods used.
Previous studies have shown that health state valuations obtained directly from affected populations are often higher (that is, health is rated as better) than those obtained from unaffected populations asked to make hypothetical judgments.22 23 For this reason we supplemented holistic comparisons of direct and indirect methods with an analysis stratified by the type of population.
We obtained studies from two sources. As the primary method, the search strings in table 2 were used to look for papers in the PubMed database (covering all diseases) that mentioned a direct and an indirect method in the title or abstract. We then looked at each article to identify those in which the same group of patients had contributed direct utilities and had completed a health related quality of life questionnaire, the results of which had been translated into indirect utilities by means of pre-existing tariff values.
To verify the primary method, we also reviewed the Tufts database of utilities24 for any studies that compared the utilities within the same group of respondents.
As a further check that our search strategy had not missed important studies, we searched PubMed (using the strings shown in table 3) for studies mentioning all utility/preference ratings in four conditions for which utility measures are commonly made: asthma, diabetes, rheumatoid arthritis, and stroke. We then scrutinised studies to find out whether they compared direct and indirect methods in the same group of people.
The methodologies used to obtain “tariffs” (weightings to convert the results of health related quality of life questionnaires to utilities) for the four indirect methods included among the comparative studies identified by our search have already been reviewed.25 The methods used were time trade off in the case of EQ-5D, standard gamble in the case of SF-6D, and a visual analogue scale that was transformed into a standard gamble (using a conversion factor obtained in other studies) in the case of HUI.
Each eligible study was scrutinised by two authors (DA and AG) and the following information was extracted: title, authors, date, disease topic, the methods of elicitation of direct utilities (standard gamble, time trade off, or both); the generic questionnaires used (EQ-5D, SF-6D, HUI-2, HUI-3). Mean utility values for each method, with sample sizes, were extracted for the disease states in each included study. Medians were extracted when means were not reported. Standard deviations were recorded if they were available or could be deduced by working back from cited confidence intervals. Each study was classified according to whether the respondents were actual patients with direct experience of a certain condition (we refer to these as “current patients”) or were asked to imagine the experience of the condition (“hypothetical patients”).
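The step of working back from a cited confidence interval to a standard deviation can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a symmetric 95% confidence interval for a mean based on the normal approximation, and the function name `sd_from_ci` and the example numbers are hypothetical.

```python
from math import sqrt


def sd_from_ci(lower, upper, n, z=1.96):
    """Recover a standard deviation from a confidence interval for a mean.

    Assumes the interval is mean +/- z * SE with SE = SD / sqrt(n),
    which is an assumption of this sketch (z = 1.96 for a 95% interval).
    """
    standard_error = (upper - lower) / (2 * z)
    return standard_error * sqrt(n)


# Hypothetical example: mean utility reported with 95% CI 0.70 to 0.80, n = 62
print(round(sd_from_ci(0.70, 0.80, 62), 3))
```

If a study reports an interval built from a t distribution or on a transformed scale, the same back-calculation would need the corresponding multiplier or transformation.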
Many of the studies contained data on more than one health or disease state. Where the same state had been assessed by separate groups of respondents (for example, baseline assessments in the two arms of a randomised controlled trial) these groups retained their identity in the analysis. In studies where the same group of respondents had provided utilities for more than one state, these states were combined (within a study) by pooling the data to ensure independence between different groups contributing to the analysis. This process entailed calculating “average medians” for studies that did not cite means. The end result was that each of the independent groups contributed a single mean (or median) observation on each of several utility methods. Differences between the utility observations obtained in this way cannot be relied on to follow a constant distributional form, even if they could be attributed to chance alone. Thus the sign test was used for paired comparisons of direct and indirect methods.
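As an illustration of the paired comparison described above, the sketch below implements a two-sided exact sign test on group-level utilities. The function name `sign_test` and the example values are hypothetical and are not data from the review; ties are dropped, which is a common convention rather than something the text specifies.

```python
from math import comb


def sign_test(direct, indirect):
    """Two-sided exact sign test for paired observations.

    Returns (number of positive differences, number of negative
    differences, p value). Pairs with zero difference are ignored.
    """
    diffs = [d - i for d, i in zip(direct, indirect)]
    n_pos = sum(1 for x in diffs if x > 0)
    n_neg = sum(1 for x in diffs if x < 0)
    n = n_pos + n_neg
    if n == 0:
        return n_pos, n_neg, 1.0
    k = min(n_pos, n_neg)
    # Probability of k or fewer "successes" under Binomial(n, 0.5)
    tail = sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)


# Hypothetical group means, for illustration only
direct = [0.82, 0.91, 0.75, 0.88, 0.79, 0.66, 0.93, 0.71]
indirect = [0.57, 0.73, 0.70, 0.69, 0.81, 0.52, 0.85, 0.60]
n_pos, n_neg, p_value = sign_test(direct, indirect)
print(n_pos, n_neg, round(p_value, 4))
```

The test uses only the direction of each difference, so it does not require the differences to share a common distributional form.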
Average relations between direct and indirect utilities across the independent groups were fitted using least squares regression lines, with the constraint that all methods assign the value 1 to perfect health. In each case the scores from the direct method were regressed on the scores from the indirect method, to test the hypothesis that direct utilities could be predicted by applying a linear correction to the indirect utilities. The predictive value of the fitted lines was assessed using an approximate method. Residuals about the fitted line were calculated and standardised with respect to variation between subjects (in both utilities). The method used only those studies that contributed means and standard deviations to the data set. The latter were converted to standard errors. Where a group had been formed (as described above) by pooling values over different health states, average standard errors were used; these are guaranteed to be greater than the true standard errors in the presence of (unknown) correlations between states. For most studies it was not possible to estimate residual standard deviations accurately because the correlations between direct and indirect utilities were not available. Instead, a conservative approach was employed, leading to underestimates of the absolute magnitude of the standardised residuals. For a fitted line y = a + bx, the standardised residual for the point (x′, y′) with standard errors (s_x, s_y) was estimated as (y′ − a − bx′) / (s_y + |b|s_x). The denominator of this expression is guaranteed to be greater than the true residual standard deviation. We applied χ² tests to the sums of squares of the estimated standardised residuals. The approximations described here mean that these tests are conservative: they understate the discrepancy between the reported utility values and the fitted lines. Small P values indicate that such discrepancies cannot reasonably be attributed to variation between the respondents contributing to the same study.
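The fitting and standardisation steps can be sketched as below. This is one reading of the description rather than the authors' code: the constraint is taken to mean that the line passes through the point (1, 1), so that y − 1 = b(x − 1) and a = 1 − b. The function names and data are hypothetical. The denominator is an upper bound because the variance of y − bx is s_y² + b²s_x² − 2bρs_x s_y, which cannot exceed (s_y + |b|s_x)² for any correlation ρ.

```python
def fit_through_perfect_health(x, y):
    """Least squares line y = a + b*x constrained to pass through (1, 1).

    x: indirect utilities, y: direct utilities (group means).
    Assumes at least one x differs from 1.
    """
    num = sum((xi - 1) * (yi - 1) for xi, yi in zip(x, y))
    den = sum((xi - 1) ** 2 for xi in x)
    b = num / den
    return 1 - b, b


def conservative_std_residual(x, y, sx, sy, a, b):
    """Residual divided by an upper bound on its standard deviation."""
    return (y - a - b * x) / (sy + abs(b) * sx)


# Hypothetical group means and standard errors, for illustration only
indirect = [0.50, 0.60, 0.80, 0.70]
direct = [0.78, 0.80, 0.88, 0.90]
se_indirect = [0.02, 0.03, 0.02, 0.04]
se_direct = [0.03, 0.02, 0.02, 0.03]

a, b = fit_through_perfect_health(indirect, direct)
residuals = [
    conservative_std_residual(x, y, sx, sy, a, b)
    for x, y, sx, sy in zip(indirect, direct, se_indirect, se_direct)
]
sum_of_squares = sum(r ** 2 for r in residuals)
print(round(a, 3), round(b, 3), round(sum_of_squares, 2))
```

The sum of squares would then be referred to a χ² distribution; the degrees of freedom are not stated in the text, so that step is left out of the sketch.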
The main search for studies that contained direct and indirect methods of comparison returned 32 studies. The Tufts database search did not yield any studies comparing direct and indirect utilities in the same population that had not been retrieved by the original search. The same was true for the disease specific search (table 3). Four of the 32 studies were about health states experienced by children. In three of these studies, parents had stood as proxies to provide preference values for their own children. The other study asked young children to rate their own health, but the investigators acknowledged methodological difficulties that could have compromised the results. The four studies of children were excluded from further consideration.
This left 28 studies for analysis covering a wide range of diseases (including rheumatic, neuropsychiatric, vascular, respiratory, gastrointestinal, renal, and infectious diseases) and encompassing data from 4688 respondents. Some studies reported utilities for several different states of health or disease. Altogether there were 83 instances in which direct and indirect methods could be compared. Sixty-eight of the 83 comparisons (from 25 studies) were based on current patients, and 15 (from three studies) were based on hypothetical patients (one group from the general public, two groups from the healthcare community).
The full list of papers, including those involving children, is given in appendix 1, with summary data from each paper in appendix 2 (see Web Extra material). A breakdown of the clinical topics investigated (and the number of disease states per topic) is given in table 4.
In some instances more than one method was used to determine utilities. For direct utilities, standard gamble alone was used in 23 of the 83 comparisons, time trade off alone in 26, and both in 34. The most popular indirect methods were EQ-5D (n=67) and HUI-3 (n=37), one or both of which figured in all but four of the 83 comparisons. Table 5 shows a breakdown.
The utility values reported by the individual studies are averages over samples of respondents, with sample sizes ranging from three to 1011 (median 62). Means and standard deviations of utility values were available for 26 studies (covering 72 of the 83 available comparisons), with the remainder reporting medians only. Table 6 summarises the distribution of the reported mean or median utility values, presented by type of respondent.
More detailed analyses were undertaken for current and hypothetical comparisons that reported EQ-5D, HUI-3, or both. Figure 2 shows the differences between direct and indirect methods for independent groups of participants.
In figure 2, all but nine of 86 points lie above the horizontal line, reflecting the tendency for direct utilities to exceed indirect utilities. However, the spread of the differences diminishes for states approaching perfect health, a consequence of the fact that utility estimates cannot be greater than 1. The direct methods (time trade off, standard gamble) produced significantly higher values than the indirect methods (EQ-5D, HUI-3) in every case where a statistical comparison was feasible (table 7). By contrast, the differences in utility values between each of the direct methods and between each of the indirect methods were not statistically significant.
The discrepancy between individual direct and indirect measures is reflected in figure 3. These plots are constructed using information taken directly from the studies, without aggregation into independent groups. If direct and indirect methods gave the same results, then the points would be distributed equally above and below the 45° line in each panel. The great majority of points in all panels, however, fell above this line. In each panel, the broken line represents the predicted direct utility score from a regression on the indirect score, as computed from the “current patient” comparisons. Table 7 shows the slopes for these lines and those based on hypothetical comparisons. The lines represent average relations only, with statistically significant (P<0.05) departures from the line in all but one instance, which was based on very low sample numbers. This finding suggests that the variation between participants within studies was not sufficient to account for the discrepancies between the plotted points and the fitted lines. At best, these lines characterise “average” relations between direct and indirect methods across a collection of different health states; it cannot be assumed that they will produce accurate conversions from one type of utility to another.
Our findings show that the versatile and convenient indirect methods of utility measurement yield different results from those obtained by the direct methods. The indirect methods yielded systematically lower utility values than direct methods for a wide range of diseases (we give two examples in the box). The differences in utility values between the direct and indirect groups of methods were sometimes substantial, and tended to be greater than the differences between the individual methods making up the group.
Examples in which indirect methods produce considerably lower utilities than direct methods
For the condition of intermittent claudication an indirect (EQ-5D) utility of 0.57 was elicited. This value implies that a person would forgo over 40% of their remaining life or run slightly over a 40% risk of death to avoid the condition. Yet the same respondents gave direct utilities of 0.82 (time trade off) and 0.91 (standard gamble) for the condition. So, when asked directly, they would only accept a much lower risk (of around 10%) to avoid the condition.
For patients with chronic obstructive pulmonary disease, respondents gave a utility of 0.91 for the direct method (time trade off), accepting a 9% reduction in lifespan to avoid the condition. The EQ-5D utility from the same patients was 0.73, suggesting a willingness to forgo a quarter of their remaining lifespan. Readers may wish to consider whether a preventative treatment with a mortality of 25% would really be acceptable for most patients destined to develop this disease.
In both these cases, the utilities were taken from current patients.
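The arithmetic behind the box can be made explicit with a short sketch. It assumes the usual reading of a utility u on the 0 to 1 scale: a person is indifferent between living in the state and giving up a fraction 1 − u of remaining life (time trade off), or accepting a risk of death of 1 − u (standard gamble), to be restored to full health. The function name `implied_trade_off` is hypothetical; the utilities are those quoted in the box.

```python
def implied_trade_off(utility):
    """Fraction of remaining life, or risk of death, implied by a utility."""
    return 1 - utility


# Utilities quoted in the box (current patients)
examples = {
    "intermittent claudication": {
        "EQ-5D": 0.57,
        "time trade off": 0.82,
        "standard gamble": 0.91,
    },
    "chronic obstructive pulmonary disease": {
        "EQ-5D": 0.73,
        "time trade off": 0.91,
    },
}

for condition, utilities in examples.items():
    for method, u in utilities.items():
        share = implied_trade_off(u)
        print(f"{condition}, {method}: utility {u:.2f} implies {share:.0%}")
```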
As with all systematic reviews, the results are constrained by the published literature which, in this case, gave access to 83 comparisons across 28 studies. The studies covered many different diseases and several different methods of utility elicitation allowing a general trend to be identified—that indirect methods tend to yield lower utilities than direct methods. However, the heterogeneity of methods and disease states in the review precludes any generalisable numerical summary of effect size that would apply to a different spectrum of patients or methods.
Explaining the findings
This study was not designed to elucidate the reasons for the differences in results between indirect and direct methods. Nevertheless, there are several possible explanations that could contribute to an understanding of the results.
One potential explanation is that the generic questionnaires used to obtain the indirect utilities impose constraints: respondents are forced to encapsulate their potentially complex condition within five to eight categories, depending on the questionnaire used. The questionnaires do not allow respondents to report, for example, potentially positive aspects of their situations that would boost utility values. There is a risk, therefore, that people contributing tariffs for the various states form a different impression of a state from the one held by the patients who complete the questionnaire.
Secondly, it is likely that the respondents who contribute trade off values for indirect utility elicitation differ systematically from the patients who participate in direct methods. The general population used to obtain tariffs for indirect utility estimation is spread across a wider age range than patient populations typically used in direct estimation. Young people in good health and people with diseases may have very different perspectives on the relative merits of remaining in a given health state and making a trade off involving death.
Those participating in elicitation of indirect tariffs are always asked to make a hypothetical choice, whereas direct utilities are usually based on the experience of people who actually have the condition and who are aware not only of the distress of living with it but also of mitigating factors. They may derive benefits from family, friends, work, and recreation on a daily basis and tend, over time, to adapt to their condition.26 Members of the general public contributing tariffs for health states are at a considerable emotional distance: they lack personal experience of the disease, and the descriptions given in generic questionnaires are rather bland and decontextualised.25 However, the difference between direct and indirect utilities seems to remain even when values are elicited from an (admittedly small) sample of hypothetical patients. This may be because the verbal “pictures” of disease used were somewhat richer when applying direct methods to hypothetical patients than when eliciting tariffs for the calibration of indirect methods.25
Finally, modifications to the methodology for deriving indirect utilities from quality of life questionnaires have recently been suggested.27 If implemented these would tend to increase the valuations attached to mild health states and eliminate some of the differences observed here.
Direct or indirect?
Simply observing a systematic and substantial difference between methods, as we have done, does not resolve the decision of which method to choose, and there is no universally accepted theoretical basis for choosing direct or indirect methods.1 Some people think that utilities should be derived from patients (ideally at stages throughout their illness) on the grounds that only patients really know what the condition is like. Others think that the citizen’s perspective is more relevant. Various reasons have been given for this opinion: it is argued that citizens are the natural (and democratic) locus for a decision about use of society’s resources, that citizens can be asked to factor in societal objectives (such as the pursuit of equity), and that the necessarily hypothetical choices that citizens make conform to “the axioms of utility theory”, whereby decisions are made prospectively under uncertainty. However, a preference for direct or indirect utilities does not necessarily result from these considerations: direct utilities derived hypothetically from citizens and indirect utilities calibrated through a survey of the general population can both provide the advantages of the citizen’s perspective.
Given the absence of a gold standard for the elicitation of utilities, it may be informative to consider the practical implications of using one method or the other.
Implications for resource allocation
Had we found little or no difference between direct and indirect methods, then the latter could be recommended on grounds of efficiency since, once the tariffs have been set, indirect utilities are far simpler to elicit. However, it seems that indirect methods give consistently lower levels of utility than direct methods. This means that there is more headroom for utility gain with indirect methods. The utility of death, however, is fixed at zero. Thus, in comparison with direct methods, indirect methods will shift healthcare resources away from interventions that prevent or delay death and towards those that alleviate non-fatal conditions. It could be argued that the popularity of indirect methods for informing rationing decisions simply expresses a legitimate societal attitude in favour of treating non-lethal over lethal conditions. However, it could equally be argued that the public would rather give more weight to delaying death. In that case indirect methods might risk undervaluing both personal and societal preferences. In England, the National Institute for Health and Clinical Excellence (NICE) has noted that “society places great value on extending the lives of people with life-threatening conditions” and has issued new guidance for “life-extending treatments that are above the usual range of cost effectiveness,”28 in the wake of decisions that denied life extending treatments.
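The headroom argument can be illustrated with a rough calculation. The sketch below is a simplification under stated assumptions: benefit is measured in quality adjusted life years without discounting, the time horizon of 10 years is arbitrary, and the two utilities are the overall means for time trade off (0.77) and EQ-5D (0.59) reported in the results, used here only as stand-ins for "the same state valued by a direct and an indirect method". The function names are hypothetical.

```python
def gain_from_relief(utility, years):
    """Benefit of restoring full health (utility 1) for `years`
    to someone who would otherwise live those years in the state."""
    return (1 - utility) * years


def gain_from_survival(utility, years):
    """Benefit of extending life by `years` spent in the state,
    compared with death (utility 0)."""
    return utility * years


YEARS = 10  # arbitrary horizon, for illustration only
for label, u in [("direct (0.77)", 0.77), ("indirect (0.59)", 0.59)]:
    relief = gain_from_relief(u, YEARS)
    survival = gain_from_survival(u, YEARS)
    print(f"{label}: relief {relief:.1f}, survival {survival:.1f}, "
          f"ratio {relief / survival:.2f}")
```

Under these assumptions the lower indirect utility raises the measured benefit of relieving the condition and lowers the measured benefit of prolonging life in it, so the ratio of the two moves in favour of the non-lifesaving intervention.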
We are not the first investigators to suspect that many of the utilities used in health resource decisions seem too low, particularly at the upper end of the scale.29 If this is so, then the data presented here show that the tendency applies especially to indirectly elicited utilities (see box).
Implications for decision makers
Those who prefer direct methods, but who wish to exploit the convenience of indirect methods, might propose using a simple correction to map indirect utilities onto the putatively more valid direct utilities. Our results show that linear adjustments of indirect utilities can achieve only a partial conversion to direct utility scores. They cannot be used to accurately predict direct utilities for particular health states. However, such an adjustment may be better than nothing, because the adjusted utilities are well calibrated on average. Further development in the science of utility estimation could make a contribution by identifying the characteristics of disease states for which the average adjustment is inappropriate.
This paper adds weight to the recommendation to be cautious when using utilities of any type. In the construction of health economic models it may be prudent to extend the range of uncertainty beyond the confines of statistical confidence limits and conduct a sensitivity analysis. In that case, the results presented here suggest that indirect utilities might best be used to populate only the lower limits of such an analysis.
Because direct and indirect methods can lead to such noticeable differences in elicited utilities, priority setting institutions should avoid using a mixture of methods for different decisions, otherwise a motivated choice of method might be used to distort the outcome in a preferred direction. NICE has certainly had to deal with inconsistent use of methods.30 It may be that a general counsel against the mixing of different methods is the most important implication of this work.
What is known
Health state utilities play a crucial role in the allocation of healthcare resources
Utilities may be obtained directly (usually from patients) or, more often, indirectly, by using a quality of life questionnaire, the results of which are converted to utilities using “weights” (tariffs) obtained from the general public
Different direct and indirect methods yield different utility values
What this paper adds
Indirect methods as a group produce consistently lower utilities (worse recorded health) than the direct group of methods
This difference may be larger than the differences between methods within each group
Reliance on indirect methods will result in fewer resources being allocated to life saving treatments than if direct methods were used
Conversion of indirect utilities to direct utilities is only partly successful
We thank Stirling Bryan and Tracey Roberts for helpful discussion. We also thank the referees Steven McPhail and Cindy Lam for their detailed and perceptive comments.
Contributors: RJL conceived the idea for the paper, DA performed the literature reviews and data extraction, and AG repeated the data extraction and carried out statistical analyses. All authors prepared the manuscript and RL is guarantor.
Funding: This work was supported through the MATCH programme (EPSRC Grant GR/S29874/01) although the views expressed are entirely the authors’ own.
Competing interests: None declared.
Ethical approval: Not required.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.