Guidelines for authors and peer reviewers of economic submissions to the BMJ
BMJ 1996; 313 doi: https://doi.org/10.1136/bmj.313.7052.275 (Published 03 August 1996) Cite this as: BMJ 1996;313:275
- M F Drummond, chair of working party (chedir{at}york.ac.uk) (a)
- T O Jefferson, secretary of working party (ak15{at}dial.pipex.com), the BMJ Economic Evaluation Working Party (b)
- (a) Centre for Health Economics, University of York, York YO1 5DD
- (b) Ministry of Defence, Army Medical Directorate 5, Keogh Barracks, Ash Vale, Hampshire GU12 5RR
- Members of the working party are listed at the end of the paper.
- Correspondence to: Dr Jefferson.
- Accepted 11 July 1996
Over the past decade interest in the economic evaluation of health care interventions has risen.1 Reviews of published studies have, however, shown gaps in the quality of work.2 3 4 5 As far back as 1974 Williams listed the essential elements of economic evaluations,6 and more recently Drummond and colleagues set out the methodological areas generally agreed among economists.7 Guidelines for economic evaluations have been promulgated and reviewed by many bodies,8 9 10 11 12 13 14 but few medical journals have explicit guidelines for peer review of economic evaluations or consistently use economist reviewers for economic papers even though they are a major publication outlet for economic evaluations.15 16 17 In January 1995 the BMJ set up a working party on economic evaluation to improve the quality of submitted and published economic articles.
It was not our intention to be unduly prescriptive or stifle innovative methods; our emphasis is on improving the clarity of economic evaluations. We also did not address those issues of conduct that have been emphasised in other guidelines.13 14 15 16 17 18
The working party's methods
The working party's objectives were to improve the quality of submitted and published economic evaluations by agreeing acceptable methods and their systematic application before, during, and after peer review. Its task was to produce: (a) guidelines for economic evaluation, together with a comprehensive supporting statement which could be easily understood by both specialist and non-specialist readers; (b) a checklist for use by referees and authors; and (c) a checklist for use by editors.
In producing the guidelines the working party has concentrated on full economic evaluations comparing two or more health care interventions and considering both costs and consequences.19 Articles sent to the BMJ and other medical journals are often more broadly based “economic submissions,”20 which comprise essentially clinical articles that report approximate cost estimates or make statements that a given treatment was “cost effective.”
We took the view that submissions reporting partial evaluations, such as a costing study or an estimate of the value to individuals of improved health, should adhere to the relevant sections of the guidelines given below, as should anecdotal reports or commentaries drawing economic conclusions about alternative forms of care. In addition to a referees' (and authors') checklist, therefore, the working party has produced shorter checklists to help BMJ editors distinguish between full economic evaluations and other types of economic submission and to help them decide which articles should be sent to referees. The main checklist and the editors' checklists are given in the boxes and a flow chart explaining their use is given in figure 1. The checklists do not replace the need for an overall judgment on the suitability of a paper.
Drafts of the guidelines and their supporting statement and the checklists have been circulated to health economists and journal editors and were debated at the biannual meeting of the UK Health Economists' Study Group in January 1996. A survey of members attending the meeting was used to identify those items of the full referees' checklist that should be used by editors.
The final document reflects a broad consensus among the working party. Any differences reflect different perspectives on the role of economic evaluation and the extent of members' interests in particular aspects of methodology rather than basic differences over the need to improve standards of reporting.
Finally, in drafting the guidelines, the working party recognised that authors may not be able to address all the points in the published version of their paper. This being so, they may care to submit supplementary documents (containing, for example, the details of any economic model used) or refer the reader to other published sources.
Guidelines for submission of economic evaluations
The guidelines are given below, grouped in 10 sections under three headings: study design, data collection, and analysis and interpretation of results. Under each section is a commentary outlining the reasons for the requirements and the main unresolved methodological issues and explaining why firm guidelines cannot be given in some cases. The guidelines are designed to be read in conjunction with other more general guidance to authors from the BMJ and the existing BMJ guidelines on statistical methods.21
Study design
(1) STUDY QUESTION
The economic importance of the research question should be outlined.
The hypothesis being tested, or question being addressed, in the economic evaluation should be clearly stated.
The viewpoint(s)—for example, health care system, society—for the analysis should be clearly stated and justified.
The research question, or hypothesis, needs to satisfy three criteria.
Firstly, the question should be economically important (in terms of its resource implications) and be relevant to the choices facing the decision maker. The question “Is health promotion worthwhile?” does not meet this criterion because it fails to specify alternatives—worthwhile compared with what? Furthermore, any alternatives need to be realistic. An option of “doing nothing,” or maintaining the status quo, should be included when appropriate.
Secondly, the question should be phrased in a way that considers both costs and outcomes. The research question “Is drug X more costly than the existing therapy?” will provide incomplete information because the decision maker also needs to consider comparative effectiveness.
Thirdly, the research question should clearly state the viewpoint of the economic evaluation, and this should be justified. Possible viewpoints include those of the provider institution, the individual clinician or professional organisation, the patient or patient group, the purchaser of health care (or third party payer), and society itself. For example, hospital and other providers may need information to help in making procurement and related technology management decisions; individual clinicians to inform patient care decisions; health insurers or purchasers to support decisions on whether to pay for a procedure or which services to develop; and patients to know the level of costs they may incur in travelling to hospital or providing informal nursing care at home. The viewpoint chosen will in turn influence both the costs included in the evaluation—for example, whether to limit these to a given department, hospital, or locality and whether patient costs are included—and the types of outcome measured—for example, disease specific outcomes or generic measures of patients' quality of life.
Health economists generally advocate adopting the broader societal viewpoint when possible. This is because data can usually be disaggregated and the analysis carried out from a number of viewpoints. Also, the additional cost of adopting a broader perspective at the outset of a study is probably less than the cost of attempting to gather additional information later. Researchers should therefore identify key potential decision makers (government, purchaser, or provider) at the outset and be able to show that the research question posed will meet the needs of all key groups.
(2) SELECTION OF ALTERNATIVES
The rationale for choice of the alternative programmes or interventions for comparison should be given.
The alternative interventions should be described in sufficient detail to enable the reader to assess the relevance to his or her setting—that is, who did what, to whom, where, and how often.
The choice of the alternative must be designed to help get as close a measure as possible of the opportunity cost of using the new treatment. In principle the comparator should be the most cost effective alternative intervention currently available. In practice the comparator is usually the most widely used alternative treatment. Unless current practice is “doing nothing,” it is usually best not to use placebo as the comparator. Such a study could, however, if well conducted and reported, provide information for use in conjunction with studies of other treatments also compared with placebo.
The alternatives being compared should be described in enough detail to enable the reader to relate the information on costs and outcomes to the alternative courses of action. The use of decision trees and other decision analytic techniques (discussed in section 7) can help to clarify the alternative treatment paths being followed and provide a framework for incorporating cost and outcome data. Clear exposition of alternative treatment paths and the probabilities, cost, and outcomes linked to them should enable decision makers to use those parts of the analysis that are relevant to their viewpoint.
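The way a decision tree links treatment paths to probabilities, costs, and outcomes can be sketched as follows. This is a minimal illustration only: the two alternatives, the path probabilities, the costs, and the life years are invented figures, and `expected_values` is a hypothetical helper rather than a method prescribed by the guidelines.

```python
# Hypothetical decision tree: each alternative has mutually exclusive
# paths, each with a probability, a cost, and an outcome (life years).
# All figures are invented for illustration.
tree = {
    "new treatment": [
        {"path": "success", "p": 0.8, "cost": 5000.0, "life_years": 10.0},
        {"path": "failure", "p": 0.2, "cost": 9000.0, "life_years": 4.0},
    ],
    "current practice": [
        {"path": "success", "p": 0.6, "cost": 3000.0, "life_years": 10.0},
        {"path": "failure", "p": 0.4, "cost": 7000.0, "life_years": 4.0},
    ],
}


def expected_values(paths):
    """Return (expected cost, expected outcome) over one alternative's paths."""
    total_p = sum(branch["p"] for branch in paths)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("path probabilities must sum to 1")
    cost = sum(branch["p"] * branch["cost"] for branch in paths)
    outcome = sum(branch["p"] * branch["life_years"] for branch in paths)
    return cost, outcome


for name, paths in tree.items():
    cost, outcome = expected_values(paths)
    print(f"{name}: expected cost {cost:.0f}, expected life years {outcome:.2f}")
```

Keeping the paths, probabilities, costs, and outcomes as separate entries lets a reader substitute the values relevant to his or her own setting.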
(3) FORM OF EVALUATION
The form(s) of evaluation used—for example, cost minimisation analysis, cost effectiveness analysis—should be stated.
A clear justification should be given for the form(s) of evaluation chosen in relation to the question(s) being addressed.
There are two types of question which require the use of different forms of evaluation (see box).
The first is: “Is it worth achieving this goal?” or “How much more or how much less of society's resources should be allocated to pursuing this goal?” Such questions can be answered formally only by the use of cost-benefit analysis. Looking at one intervention alone, cost-benefit analysis addresses the question of whether its benefits are greater than its costs, where the costs are the benefits forgone in the best alternative use of the resources. When several competing interventions are being considered the costs and benefits of each should be examined and the combination that maximises benefits chosen.
The main practical problem with cost-benefit analysis is that of valuing benefits, such as the saving of life or relief of pain, in money units. However, if we are to examine whether more or less should be spent on health care, we need to find a way of comparing the costs (benefits forgone elsewhere) with the benefits of improved health and any other resulting benefits. Even when all benefits cannot be measured in terms of money, cost-benefit analysis provides a useful framework for structuring decision making problems.
The second type of question is: “Given that a goal is to be achieved, what is the most efficient way of doing so?” or “What is the most efficient way of spending a given budget?” Such questions are addressed by cost effectiveness analysis, which can take one of two forms. In the first the health effects of the alternatives are known to be equal, so only the costs need to be analysed, and the least costly alternative is the most efficient. This type of analysis is often referred to as cost minimisation analysis. In the second the alternatives may differ in both cost and effect, and a cost effectiveness ratio (cost per unit of health effect) is calculated for each. For example, given a fixed budget for dialysis, the modality (home dialysis, hospital dialysis, or continuous ambulatory peritoneal dialysis) with the lowest cost per life year saved would, if implemented, maximise the number of life years produced by the dialysis programme. In practice, however, the selection of the most efficient mix of programmes, given a budget constraint, is more complicated: it depends on whether alternative programmes are mutually exclusive and whether the scale of programmes can be changed without changing their incremental cost effectiveness ratios.
The concept “within a given budget” is also crucial. Often authors produce a ratio of extra costs per extra unit of health effect for one intervention over another and argue that a low cost effectiveness ratio, relative to other existing health care programmes, implies that a given intervention should be provided. However, judgment is still required as the resources to meet such extra costs would inevitably come from another programme, from within or outside health care. (This point is returned to in section 10.)
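The ratio of extra costs per extra unit of health effect described above can be written out as a short calculation. The sketch below is illustrative only: the function name and all figures are assumptions made for the example, not values from any study.

```python
def incremental_cost_effectiveness_ratio(cost_new, effect_new, cost_old, effect_old):
    """Extra cost per extra unit of health effect of the new option over the old."""
    extra_cost = cost_new - cost_old
    extra_effect = effect_new - effect_old
    if extra_effect == 0:
        # Equal effects: only costs need comparing (cost minimisation analysis).
        raise ValueError("effects are equal; compare costs only")
    return extra_cost / extra_effect


# Invented figures: cost per patient and life years gained per patient.
icer = incremental_cost_effectiveness_ratio(
    cost_new=12000.0, effect_new=6.5, cost_old=8000.0, effect_old=6.0
)
print(f"Extra cost per extra life year: {icer:.0f}")
```

The ratio alone does not show whether the extra cost can be met within a given budget, which is why judgment is still required.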
The third category of evaluation, cost-utility analysis, lies somewhere between cost effectiveness and cost-benefit analysis. It can be used to decide the best way of spending a given treatment budget or the health care budget. The basic outcome of cost-utility analysis is “healthy years.” Years of life in states less than full health are converted to healthy years by the use of health state preference values, resulting in generic units of health gain, such as quality adjusted life years (QALYs) or healthy years equivalents.22 (These approaches are discussed in section 5.)
Data collection
(4) EFFECTIVENESS DATA
If the economic evaluation is based on a single effectiveness study—for example, a clinical trial—details of the design and results of that study should be given—for example, selection of study population, method of allocation of subjects, whether analysed by intention to treat or evaluable cohort, effect size with confidence intervals.
If the economic evaluation is based on an overview of a number of effectiveness studies details should be given of the method of synthesis or meta-analysis of evidence—for example, search strategy, criteria for inclusion of studies in the overview.
Economic evaluation of interventions relies on the assessment of their clinical effectiveness. The data can come from a single clinical study, a systematic overview of several studies, or an ad hoc synthesis of several sources. Any limitations which weaken the assessment of effectiveness weaken any economic evaluation based on it. The gold standard for assessing the efficacy of interventions is the randomised, double blind controlled trial. This design has the highest internal validity—that is, freedom from bias.
In most clinical trials the primary assessment is based on an intention to treat analysis, which assesses the clinical outcomes of all randomised patients, whether or not they completed their allocated treatment. Other analyses serve as secondary or exploratory analyses in clinical studies and should be justified if used as the primary analysis for the economic evaluation.
Clinical trials may include active or placebo controls. In active controlled studies the appropriate comparator for economic analysis is the most cost effective available therapy, or the most widely used therapy. In placebo controlled studies the economic analysis should indicate whether there are active comparators that could be considered as alternative therapies.
The generalisability of the study population is important in assessing the results of clinical trials and hence their suitability for economic evaluations. Factors that can limit generalisability include: differences across countries or health systems; costs and benefits resulting only from the trial protocol but which would not arise in practice; unrealistically high compliance rates; or the appropriateness of usual practice in clinical studies that compare a therapy with best usual care. Clinical data from studies employing a “pragmatic” protocol are often more generalisable and hence preferable for economic evaluation.
In a pragmatic trial subjects are still randomised to treatment groups, but the patient and doctor may not necessarily be blind to the treatments. The treatment protocol is also kept as close to normal care as possible and monitoring kept to a minimum. Such trials are attractive for economic analysis since they reflect what may happen in practice, but the results apply only to similar settings. Unfortunately many clinical studies are still performed under fairly restrictive conditions, so some adjustments may be required for economic evaluation (discussed below).
Clinical data can also be generated from overviews or syntheses of clinical literature. Before the data from any such overview are used in economic assessments the methods used for the overview, including the search strategy and the criteria for inclusion and exclusion of studies, need reporting.
Effectiveness data from overviews have the advantage that the confidence interval around the point estimate of clinical effect is usually narrower than that from an individual trial and the result may be more generalisable.23 Typically the economic analyst would take the point estimate of effect from the overview as the base case value and use the confidence interval as the relevant range for sensitivity analysis (see section 9).
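Using the point estimate as the base case and the confidence interval as the range for sensitivity analysis can be sketched as below. The extra cost, the effect estimate, and its confidence limits are invented, and the function name is a hypothetical choice for this example.

```python
def cost_per_unit_of_effect(extra_cost, extra_effect):
    """Extra cost divided by extra effect for one value of the effect estimate."""
    if extra_effect <= 0:
        raise ValueError("this simple sketch assumes a positive extra effect")
    return extra_cost / extra_effect


extra_cost = 2000.0        # invented extra cost per patient
effect_point = 0.10        # invented point estimate from an overview
effect_ci = (0.05, 0.20)   # invented confidence interval around that estimate

base_case = cost_per_unit_of_effect(extra_cost, effect_point)
# A larger effect gives a lower ratio, so the upper confidence limit gives
# the most favourable result and the lower limit the least favourable.
most_favourable = cost_per_unit_of_effect(extra_cost, effect_ci[1])
least_favourable = cost_per_unit_of_effect(extra_cost, effect_ci[0])

print(f"Base case: {base_case:.0f}")
print(f"Range: {most_favourable:.0f} to {least_favourable:.0f}")
```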
Sometimes clinical trial data may be insufficient for economic evaluation because some of the relevant endpoints have not been measured, patients have not been followed for long enough, or the design was not pragmatic. In such cases it may be possible to adjust or supplement the data by modelling.
Ad hoc synthesis of effectiveness data from several sources, including expert opinion, is justifiable when no relevant well controlled clinical studies have been performed.24 In many cases the economic evaluation may be based on a previously published clinical trial or systematic overview. In such a case it would be sufficient to provide a brief summary, addressing the points in the guidelines, and to refer the reader to the published source.
(5) BENEFIT MEASUREMENT AND VALUATION
The primary outcome measure(s) for the economic evaluation should be clearly stated—for example, cases detected, life years, quality adjusted life years (QALYs), willingness to pay.
If health benefits have been valued details should be given of the methods used—for example, time trade off, standard gamble, contingent valuation—and the subjects from whom valuations were obtained—for example, patients, members of the general public, health care professionals.
If changes in productivity (indirect benefits) are included they should be reported separately and their relevance to the study question discussed.
In cost effectiveness analysis benefits are usually measured in natural units. For programmes whose main effect is to extend life the usual measure is life years gained. When the main effect is on quality of life a disease specific or generic quality of life index might be used.
Sometimes the benefit measure may be an intermediate marker rather than a final outcome. For example, in comparing programmes for preventing coronary heart disease reductions in blood pressure might be used. Similarly, if two antenatal screening programmes are being compared cases detected might be chosen. Such intermediate endpoints need to be justified, however, as they may be poor surrogates for final outcomes.
Only a single measure can be used in the calculation of a given cost effectiveness ratio. It cannot reflect the effects of a particular intervention on both quantity and quality of life; nor can more than one aspect of quality of life be expressed. This restriction is the main limitation of cost effectiveness analysis, as other important benefits may be overlooked. Nevertheless, several cost effectiveness ratios could be calculated relating to different outcomes—but this may lead to problems of interpretation. Authors using cost effectiveness analysis should explain why they have chosen a particular outcome measure for calculation of the ratio and reassure the reader that important outcomes are not being overlooked.
In cost-utility analysis the outcome is healthy years. Quality adjusted life years measure healthy years by combining data on the life years gained by programmes with a value (usually obtained from samples of patients or the population in general) reflecting the quality of those years. Two years of life in a health state judged to be halfway between death and full health would be equivalent to one year in full health. Incremental health gain is given by the difference in quality adjusted life years produced by one intervention as compared to another.
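The arithmetic of quality adjusted life years can be illustrated with a short sketch. It assumes the usual scale on which full health is valued at 1 and death at 0; the health profiles for the two interventions are invented, and `qalys` is a hypothetical helper name.

```python
def qalys(health_profile):
    """Sum of years multiplied by preference value over a sequence of health states.

    health_profile is a list of (years, value) pairs, with value 1.0 for
    full health and 0.0 for death (an assumed scale for this sketch).
    """
    return sum(years * value for years, value in health_profile)


# The example in the text: two years in a state halfway between death
# and full health count as one year in full health.
two_years_at_half = qalys([(2.0, 0.5)])

# Invented profiles for two interventions being compared.
intervention_a = [(3.0, 0.9), (2.0, 0.6)]
intervention_b = [(4.0, 0.7)]
incremental_gain = qalys(intervention_a) - qalys(intervention_b)

print(f"Two years at 0.5: {two_years_at_half:.1f} QALY")
print(f"Incremental gain of A over B: {incremental_gain:.1f} QALYs")
```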
Rather than obtaining valuations for each health state and then multiplying by the time spent in each, the use of healthy years equivalents requires a scenario of a specified sequence of health states and their duration. Respondents are asked how many healthy years of life this scenario is equivalent to—hence the term “healthy years equivalents.”
Most methods of measuring quality adjusted life years and healthy years equivalents are based on the notion of sacrifice. In economics something is not of value unless one is prepared to give up something else in order to get it. For example, using a time trade off a respondent is asked how many years of life in a health state he or she would be prepared to give up to be in full health. Using a “standard gamble” the respondent is asked to choose between a certain health state and a gamble with two possible outcomes (one worse and the other better than the health state being valued).
Estimates obtained by time trade off methods reflect respondents' attitudes to time as well as their attitudes to the health state being valued. Likewise, estimates obtained by standard gamble methods reflect respondents' attitudes to risk as well as their attitudes to the health state being valued. Economists are still debating which approach is most desirable.
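As a rough illustration of how responses to the two techniques are commonly turned into a health state value, the sketch below uses the conventional scoring, assuming full health is valued at 1 and death at 0. The function names and the responses are invented for the example; actual elicitation protocols vary.

```python
def time_trade_off_value(years_in_state, years_given_up):
    """Value of a health state from a time trade off response.

    The respondent is indifferent between years_in_state in the health
    state and (years_in_state - years_given_up) in full health.
    """
    if years_in_state <= 0 or not 0 <= years_given_up <= years_in_state:
        raise ValueError("years given up must lie between 0 and years in state")
    return (years_in_state - years_given_up) / years_in_state


def standard_gamble_value(p_better, value_better=1.0, value_worse=0.0):
    """Value of a health state from a standard gamble response.

    p_better is the probability of the better outcome at which the
    respondent is indifferent between the gamble and the certain state.
    """
    if not 0 <= p_better <= 1:
        raise ValueError("probability must lie between 0 and 1")
    return p_better * value_better + (1 - p_better) * value_worse


# Invented responses.
tto = time_trade_off_value(years_in_state=10.0, years_given_up=3.0)
sg = standard_gamble_value(p_better=0.85)
print(f"Time trade off value: {tto:.2f}")
print(f"Standard gamble value: {sg:.2f}")
```

The two techniques need not give the same value for the same state, since one reflects attitudes to time and the other attitudes to risk.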
Another cheaper approach is to include in the clinical trial a generic health state preference instrument, such as the EuroQoL (EQ5D)25 or McMaster health utilities index.26 The responses from patients to a simple questionnaire can then be expressed as a health state preference value by reference to pre-scaled responses (obtained by standard gamble or time trade off) from a relevant reference group.
Values can be provided by the population at large or by a sample of patients with the condition for which the treatment is being evaluated. The choice depends on the perspective of the study. If the issue is allocating resources between competing programmes the former might be used; if it is deciding the best way to treat a given condition the latter might be used. In reporting their results authors should explain why a particular source of values has been used.
In cost-benefit analysis the benefits of health care are traditionally valued in money terms by using either the human capital approach or the willingness to pay approach. The former values a health improvement on the basis of future productive worth to society from being able to return to work. Values have to be imputed for activities such as homemaking, so the human capital approach suffers from problems of how to value health improvements for retired and unemployed people.27 This fairly narrow view of the value of improved health is rarely used nowadays.
Debate continues about whether productivity gains from improved health (“indirect benefits”) should be included alongside other measures of the value of improved health. Some analysts argue that it introduces inequalities between those interventions that are aimed at individuals who could potentially return to productive activity and those that are not. Other researchers are concerned about the potential for double counting if indirect benefits are calculated alongside another method of valuing improved health. Finally, some researchers are concerned about the standard method of measuring productivity gains, which values work days lost by gross earnings. Koopmanschap et al have proposed an approach for measuring productivity changes, called the friction cost method, which recognises that the amount of production lost due to disease depends on the time an organisation needs to restore the initial production level.28 Whatever estimation method is used, indirect benefits should be reported separately so that readers can decide whether or not they should be included in the overall result of the study.
The other approach values health improvement (or types of health care) on the basis of people's willingness to pay for them—usually associated with individuals' ability to pay. If diseases affect rich and poor in different proportions, and if richer people tend to have different preferences from poor people, then treatment of diseases of the rich may appear to be “valued” more highly. A willingness to pay value will, to an extent, reflect ability to pay as well as strength of preference. It is the latter (strength of preference) which reflects “values,” so when using willingness to pay a check is needed for its association with income and social class.
Willingness to pay has advantages over techniques like quality adjusted life years since the latter focuses on valuation of health gains only, while willingness to pay permits respondents to take into account other factors (such as the value they attach to the process of care). In some cases health gain is not even an issue. For example, two different ways of screening may simply provide information in different ways to those screened,29 and respondents will still have preferences which can be assessed by use of willingness to pay. Also, in some situations individuals other than the patient may be willing to pay for improved health, for example in the case of communicable diseases.
(6) COSTING
Quantities of resources should be reported separately from the prices (unit costs) of those resources.
Methods for the estimation of both quantities and prices (unit costs) should be given.
The currency and price date should be recorded and details of any adjustment for inflation, or currency conversion, given.
Costing involves estimating the resources used—for example, days in hospital—and their prices (unit costs). These estimates must be reported separately to help the reader judge their relevance to his or her setting. When there are many cost items reporting should concentrate on the main costs.
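Reporting quantities separately from unit costs might look like the sketch below. The resource items, quantities, and unit costs are invented, and the layout is only one possible way of presenting them.

```python
# Each entry: (resource item, quantity per patient, unit cost).
# All items and figures are invented for illustration.
resource_use = [
    ("inpatient day", 4, 250.0),
    ("outpatient visit", 2, 60.0),
    ("drug course", 1, 180.0),
]

print(f"{'Resource':<18}{'Quantity':>10}{'Unit cost':>12}{'Cost':>10}")
for item, quantity, unit_cost in resource_use:
    print(f"{item:<18}{quantity:>10}{unit_cost:>12.2f}{quantity * unit_cost:>10.2f}")

# The total is quantity times unit cost, summed over all items.
total_cost = sum(quantity * unit_cost for _, quantity, unit_cost in resource_use)
print(f"{'Total':<40}{total_cost:>10.2f}")
```

With quantities and unit costs shown side by side, a reader can recalculate the total with the prices that apply in his or her own setting.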
When economic evaluations are undertaken alongside clinical trials data on physical quantities may be gathered as part of the trial. The interpretation of resource use resulting from the trial protocol may, however, prove difficult. One view is that everything done to a patient during a clinical trial could potentially influence outcome, so the costs of all procedures should be included. On the other hand, procedures such as clinic visits solely for data collection would not take place in regular clinical care and may seem unlikely to affect outcome. Authors should consider whether the procedures followed in the trial are typical of normal clinical practice and should justify any adjustments they make to the actual observed resource use.
Outside the context of a trial, estimates of resource quantities should be based on data on real patients, collected either prospectively or retrospectively from medical records. The use of physician “expert panels” to estimate resource quantities, while common, runs the risk that respondents may give inaccurate estimates or specify the resources required for ideal care, rather than that provided in practice.
Prices of resources can be obtained from the finance departments of particular institutions or from national statistics, but charges (or fees) can differ from real costs. The authors of studies should comment on the extent to which the use of charges may bias their estimates.
Guidelines on economic appraisal rarely discuss in detail whether the interventions being compared should be costed at marginal or average cost. Marginal costs are the additional costs of changes in the production of a service. Some authors claim the superiority of marginal costing over average costing, but this choice can be related to context and timeframe. In the short run few costs may be variable if a change in treatment is introduced, whereas over longer periods all resources, including buildings, can be switched to other uses.
Thus if the study relates to a decision of a hospital manager the short run marginal costs of the various options in his or her hospital may be the relevant costs in the current budget period. If the decision relates to a matter of national policy, however, average costs may be more appropriate as these reflect the true variable costs when many services are provided in a large number of facilities across the country.
Finally, the dates of both the estimates of resource quantities and prices should be recorded, along with details of any adjustments to a more recent price level. Also, attention should be paid to the generalisation of cost estimates, since relative prices and the opportunities to redeploy resources may differ from place to place.30 Currency conversions should, when possible, be based on real purchasing power, rather than financial exchange rates, which fluctuate according to money market changes.31 32
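The two adjustments mentioned here, to a more recent price level and to another currency, can be sketched as below. The price index values and the purchasing power parity factor are invented, and the function names are hypothetical. The parity factor is assumed to be expressed as units of local currency per unit of the target currency.

```python
def adjust_to_price_year(cost, index_at_cost_date, index_at_target_date):
    """Re-express a cost at the price level of the target date using a price index."""
    if index_at_cost_date <= 0:
        raise ValueError("price index must be positive")
    return cost * index_at_target_date / index_at_cost_date


def convert_with_ppp(cost_local, ppp_local_per_target_unit):
    """Convert a local cost using a purchasing power parity factor.

    Assumes the factor is local currency units per one unit of the
    target currency, so the local cost is divided by it.
    """
    if ppp_local_per_target_unit <= 0:
        raise ValueError("parity factor must be positive")
    return cost_local / ppp_local_per_target_unit


# Invented figures: a cost of 100 recorded when the index stood at 120,
# re-expressed at an index of 150, then converted at a factor of 0.625.
cost_at_recent_prices = adjust_to_price_year(100.0, 120.0, 150.0)
cost_in_target_currency = convert_with_ppp(cost_at_recent_prices, 0.625)
print(f"Cost at recent prices: {cost_at_recent_prices:.2f}")
print(f"Cost in target currency: {cost_in_target_currency:.2f}")
```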
(7) MODELLING
Details should be given of any modelling used in the economic study—for example, decision tree model, epidemiology model, regression model.
Justification should be given of the choice of the model and the key parameters.
Modelling techniques enable an evaluation to be extended beyond what has been observed in a single set of direct observations. The model will necessarily be simplified, and the extent to which the simplification is appropriate will be a matter of judgment. Modelling may involve explicit and recognised statistical or mathematical techniques. It may, however, simply bring together data from a variety of sources into a formal prespecified conceptual framework, such as a decision analysis model incorporating best available evidence from a wide variety of sources. It may be “what if” modelling, exploring what values for particular uncertain parameters would be needed for a treatment to be cost effective.
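A simple form of “what if” modelling is to ask how large an uncertain parameter would have to be for a treatment to be judged cost effective. The sketch below does this for the extra health effect. The ceiling ratio (the highest extra cost per unit of effect a decision maker would accept) is a purely hypothetical input, as are the function name and the figures.

```python
def minimum_effect_for_ceiling(extra_cost, ceiling_ratio):
    """Smallest extra effect at which extra_cost / effect does not exceed the ceiling."""
    if extra_cost < 0 or ceiling_ratio <= 0:
        raise ValueError("extra cost must be non-negative and the ceiling positive")
    return extra_cost / ceiling_ratio


extra_cost = 3000.0       # invented extra cost per patient
ceiling_ratio = 15000.0   # hypothetical acceptable cost per life year gained

required_effect = minimum_effect_for_ceiling(extra_cost, ceiling_ratio)
print(f"Extra life years needed per patient: {required_effect:.2f}")
```

The reader can then judge whether an effect of that size is plausible given the available evidence.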
Modelling may be required (a) to extrapolate the progression of clinical outcomes (such as survival) beyond that observed in a trial—for example, the progression of disease in patients with asymptomatic AIDS33; (b) to transform final outcomes from intermediate measures—for example, survival and coronary heart disease events from cholesterol concentrations34; (c) to examine the relation between inputs and outputs in production function models to estimate or apportion resource use—for example, in a cost analysis of neonatal intensive care35; (d) to use data from a variety of sources to undertake a decision analysis—for example, of screening options for prostate cancer36; (e) to use evidence from trials, or systematic reviews of trials, to reflect what might happen in a different clinical setting or population—for example, treatments for respiratory distress syndrome in preterm infants.37
The key requirements are that the modelling should be explicit and clear. The authors should explain which of the reported variables/parameters have been modelled rather than directly observed in a particular sample; what additional variables have been included or excluded; what statistical relations have been assumed or derived; and what evidence supports these assumptions or derivations.
All this information may not be included in the published paper, but it should be available to the reviewer. The overall aim of published reports should be to ensure transparency so that the importance and applicability of the methods can be clearly judged (see section 9).
Analysis and interpretation of results
(8) ADJUSTMENTS FOR TIMING OF COSTS AND BENEFITS
The time horizon over which costs and benefits are considered should be given.
The discount rate(s) should be given and the choice of rate(s) justified.
If costs or benefits are not discounted an explanation should be given.
The time horizon should be long enough to capture all the differential effects of the options. It should often extend to the whole life of the treated individuals and even to future generations. If the time horizon is shortened for practical reasons this decision should be justified and an estimate made of any possible bias introduced. Justifying a short time horizon on the grounds of the duration of the available empirical evidence may be fallacious.38 If the relevant horizon for the decision is long term additional assumptions may need to be made.
In health care there is still a debate on discounting.39 40 Most analysts agree that costs should be discounted in any study having a time horizon longer than one year. At present most recommendations seem to vary between 3 and 6%, and a common rate in the literature is 5% per year. Certainly the analyst should use the government recommended rate, probably as the baseline value, and provide a sensitivity analysis with other discount rates. It is also helpful to provide the undiscounted data to allow the reader to recalculate the results using any discount rate.
Most analysts argue that health benefits should be discounted at the same rate as costs in the baseline analysis, even if they are expressed in non-monetary units, such as life years or quality adjusted life years. A zero discount rate—or one lower than that used for costs—can be introduced in the sensitivity analysis. A lower rate is advocated so as not to penalise preventive programmes and also because the results of some studies seem to suggest it.39
However, there is no a priori economic reason to favour preventive programmes and the comparisons may be between them. Imagine two programmes having the same discounted costs and the same total (undiscounted) amount of benefits, say 100 life years, but programme A obtains these benefits between years 2 and 3 and programme B between years 52 and 53. Not discounting health benefits would result in both programmes having the same cost effectiveness ratio, which seems absurd. Moreover, if the absolute benefits of programme B were 100 years and 1 day, it would be preferred—again absurdly.
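The comparison of programmes A and B can be checked numerically. The sketch below is illustrative only: it assumes, for simplicity, that all of programme A's benefits accrue at year 3 and all of programme B's at year 53, and the common discounted cost of 1 000 000 is a hypothetical figure.

```python
def present_value(amount: float, year: float, rate: float) -> float:
    """Discount an amount received in a given year back to year 0."""
    return amount / (1.0 + rate) ** year


LIFE_YEARS = 100.0            # same undiscounted benefit for both programmes
DISCOUNTED_COST = 1_000_000.0  # hypothetical; identical for A and B
YEAR_A = 3                    # assumption: A's benefits credited at year 3
YEAR_B = 53                   # assumption: B's benefits credited at year 53
RATES = [0.0, 0.03, 0.05, 0.06]  # zero plus the range quoted in the text


def cost_per_life_year(benefit_year: int, rate: float) -> float:
    """Cost effectiveness ratio with benefits discounted at the given rate."""
    return DISCOUNTED_COST / present_value(LIFE_YEARS, benefit_year, rate)


for rate in RATES:
    ratio_a = cost_per_life_year(YEAR_A, rate)
    ratio_b = cost_per_life_year(YEAR_B, rate)
    print(f"rate {rate:.0%}: A = {ratio_a:,.0f}  B = {ratio_b:,.0f} per life year")
```

At a zero rate the two ratios are identical; at any positive rate programme B's ratio is higher, and the gap widens as the rate rises.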
It is doubtful if there is enough empirical evidence on which to base a decision on the appropriate discount rate. Moreover, if the empirical argument is accepted it should also be applied to the discounting of costs. In favour of a single discount rate for costs and benefits are, firstly, consistency between cost effectiveness and cost-benefit analysis and, secondly, the idea that it is always possible to transform wealth (resources) into health at any point in time. Then, if resources are discounted, why should health not be discounted?
Given the current debates about discounting, the main emphasis should be on transparency in reporting the methods used.
(9) ALLOWANCE FOR UNCERTAINTY
When stochastic data are reported details should be given of the statistical tests performed and the confidence intervals around the main variables.
When a sensitivity analysis is performed details should be given of the approach used—for example, multivariate, univariate, threshold analysis—and justification given for the choice of variables for sensitivity analysis and the ranges over which they are varied.
A recent review suggested that one in four published economic evaluations failed to consider uncertainty at all, and only one in eight handled it well. Without proper consideration of uncertainty the reader may be unable to judge whether conclusions are meaningful and robust.41
At least three broad types of uncertainty are recognised.42
Uncertainty relating to observed data inputs—When observed data have been sampled from an appropriate population standard statistical methods should be used. Typically, confidence intervals might be presented. When both costs and effects have been derived from a single set of individual patient data a stochastic approach may be used to present confidence intervals around the cost effectiveness ratio.43 44 45 When data come from a sample attention should also be given to sample size and power. In many studies alongside clinical trials sample size may have been determined entirely by clinical endpoints. In some cases a subsample is assumed to be adequate for collecting data on resource use, but in many cases the variability in resource use data is greater than for clinical parameters, and the distribution of values is often non-normal. Attention must be paid to whether sample sizes are adequate for the economic analyses. Ideally power calculations should be presented.
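One possible stochastic approach is a non-parametric bootstrap with a percentile interval, sketched below. This is offered as an assumption about how such an interval might be computed, not as the method of the cited references. The patient level data are simulated and hypothetical, with skewed (log-normal) costs, and the percentile method is known to behave poorly when the difference in effects is close to zero.

```python
import random
from statistics import mean


def simulate_arm(rng, n, log_cost_mean, effect_mean):
    """Hypothetical (cost, effect) pairs: skewed costs, roughly normal effects."""
    return [
        (rng.lognormvariate(log_cost_mean, 0.5), rng.gauss(effect_mean, 0.5))
        for _ in range(n)
    ]


def icer(treated, control):
    """Difference in mean costs divided by difference in mean effects."""
    d_cost = mean(c for c, _ in treated) - mean(c for c, _ in control)
    d_effect = mean(e for _, e in treated) - mean(e for _, e in control)
    return d_cost / d_effect


def bootstrap_icer_interval(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the cost effectiveness ratio."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        # Resample whole patients so each cost stays paired with its effect.
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        replicates.append(icer(t, c))
    replicates.sort()
    lower = replicates[round((alpha / 2) * n_boot)]
    upper = replicates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


data_rng = random.Random(42)
treated = simulate_arm(data_rng, 200, log_cost_mean=8.0, effect_mean=2.0)
control = simulate_arm(data_rng, 200, log_cost_mean=7.5, effect_mean=1.5)

point = icer(treated, control)
lower, upper = bootstrap_icer_interval(treated, control)
print(f"ICER {point:,.0f} (95% interval {lower:,.0f} to {upper:,.0f})")
```

Resampling patients rather than costs and effects separately preserves any correlation between the two within each arm.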
Uncertainty relating to extrapolation—When data have been extrapolated or modelled (see section 7) the uncertainty inherent in that process is best handled by appropriate sensitivity analysis.
Uncertainty relating to analytical methods—Uncertainties may stem from the existence of alternative analytical methods. Some issues will be avoided by an explicit statement of the approach to be adopted, but others may be usefully handled by using sensitivity analysis—for example, to present results for different discount rates, or with and without indirect costs.
Except for sampled data, uncertainty is usually handled using some form of sensitivity analysis. Simple sensitivity analysis (one way or multi-way), threshold analysis, analysis of extremes, and probabilistic sensitivity analysis may each be appropriate in particular circumstances.42 The ranges of values tested need to be justified and ideally should be based on evidence or logic.
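The difference between a simple one way sensitivity analysis and a threshold analysis can be sketched as follows. The costs, effects, the range tested, and the ceiling of 20 000 per life year are all hypothetical values chosen for illustration; in a real study the range would need the justification described above.

```python
def icer(cost_new, cost_old=1000.0, effect_new=2.0, effect_old=1.5):
    """Incremental cost per life year gained (hypothetical baseline values)."""
    return (cost_new - cost_old) / (effect_new - effect_old)


def one_way(values):
    """One way sensitivity analysis: vary a single input, hold the rest fixed."""
    return [(v, icer(v)) for v in values]


def threshold_cost(ceiling, low, high, tol=1e-6):
    """Threshold analysis by bisection.

    Finds the highest cost of the new treatment at which the ratio stays
    at or below the ceiling, assuming the ratio increases with that cost
    and that the answer lies between low and high.
    """
    while high - low > tol:
        mid = (low + high) / 2.0
        if icer(mid) <= ceiling:
            low = mid
        else:
            high = mid
    return low


for cost, ratio in one_way([3000.0, 5000.0, 8000.0]):
    print(f"cost of new treatment {cost:,.0f}: {ratio:,.0f} per life year")

limit = threshold_cost(ceiling=20_000.0, low=1000.0, high=20_000.0)
print(f"ratio reaches the ceiling at a cost of about {limit:,.0f}")
```

The one way analysis shows how the result moves across a stated range; the threshold analysis instead reports the value at which the conclusion would change.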
Authors and reviewers should pay particular attention to whether the important question is the precision of the quantitative results or the robustness of the conclusions drawn from them. Firm conclusions may be shown to hold despite considerable uncertainty; on the other hand, relatively tight estimates of parameters may still leave substantial uncertainty about the policy implications of the study.
(10) PRESENTATION OF RESULTS
An incremental analysis—for example, incremental cost per life year gained—should be reported, comparing the relevant alternatives.
Major outcomes—for example, impact on quality of life—should be presented in a disaggregated as well as aggregated form.
Any comparisons with other health care interventions—for example, in terms of relative cost effectiveness—should be made only when close similarity in study methods and settings can be demonstrated.
The answer to the original study question should be given; any conclusions should follow clearly from the data reported and should be accompanied by appropriate qualifications or reservations.
The main emphasis in the reporting of study results should be on transparency. The main components of cost and benefit—for example, direct costs, indirect costs, life years gained, improvements in quality of life—should be reported in a disaggregated form before being combined in a single index or ratio.
The results of economic evaluations are usually presented as a summary index such as a cost effectiveness or cost-utility ratio. When two or more interventions are being compared in a given study, the relevant ratio is the one that relates the additional (or incremental) benefits to the additional costs. Reporting disaggregated data allows the reader to calculate other ratios that he or she sees fit.
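A small sketch may help show why disaggregated reporting is useful. Given the components for each option, a reader can compute the incremental ratio against whichever comparator is relevant, with or without indirect costs. The three options and all figures are hypothetical.

```python
# Hypothetical disaggregated results per patient for three options.
OPTIONS = {
    "no treatment": {"direct_cost": 0.0, "indirect_cost": 0.0, "life_years": 10.0},
    "current care": {"direct_cost": 4000.0, "indirect_cost": 1000.0, "life_years": 12.0},
    "new treatment": {"direct_cost": 9000.0, "indirect_cost": 500.0, "life_years": 12.5},
}


def total_cost(name, include_indirect=True):
    """Sum the cost components reported for one option."""
    option = OPTIONS[name]
    cost = option["direct_cost"]
    if include_indirect:
        cost += option["indirect_cost"]
    return cost


def incremental_ratio(new, comparator, include_indirect=True):
    """Additional cost per additional life year of `new` over `comparator`."""
    d_cost = total_cost(new, include_indirect) - total_cost(comparator, include_indirect)
    d_effect = OPTIONS[new]["life_years"] - OPTIONS[comparator]["life_years"]
    return d_cost / d_effect


print(incremental_ratio("new treatment", "current care"))         # 9000.0
print(incremental_ratio("new treatment", "no treatment"))         # 3800.0
print(incremental_ratio("new treatment", "current care", False))  # 10000.0
```

The same intervention yields quite different ratios depending on the comparator and on whether indirect costs are included, which is why the components should be reported before they are combined.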
Beyond the individual study the reporting and interpretation of cost effectiveness ratios need to be handled with care. For example, authors often compare the cost effectiveness ratios generated in their own study with those for other interventions evaluated in previous studies in “league tables,” where rankings are produced, ranging from the intervention with the lowest cost per life year (or cost per quality adjusted life year) gained to the one with the highest.
Study design
(1) The research question is stated
(2) The economic importance of the research question is stated
(3) The viewpoint(s) of the analysis are clearly stated and justified
(4) The rationale for choosing the alternative programmes or interventions compared is stated
(5) The alternatives being compared are clearly described
(6) The form of economic evaluation used is stated
(7) The choice of form of economic evaluation is justified in relation to the questions addressed
Data collection
(8) The source(s) of effectiveness estimates used are stated
(9) Details of the design and results of effectiveness study are given (if based on a single study)
(10) Details of the method of synthesis or meta-analysis of estimates are given (if based on an overview of a number of effectiveness studies)
(11) The primary outcome measure(s) for the economic evaluation are clearly stated
(12) Methods to value health states and other benefits are stated
(13) Details of the subjects from whom valuations were obtained are given
(14) Productivity changes (if included) are reported separately
(15) The relevance of productivity changes to the study question is discussed
(16) Quantities of resources are reported separately from their unit costs
(17) Methods for the estimation of quantities and unit costs are described
(18) Currency and price data are recorded
(19) Details of currency of price adjustments for inflation or currency conversion are given
(20) Details of any model used are given
(21) The choice of model used and the key parameters on which it is based are justified
Analysis and interpretation of results
(22) Time horizon of costs and benefits is stated
(23) The discount rate(s) is stated
(24) The choice of rate(s) is justified
(25) An explanation is given if costs or benefits are not discounted
(26) Details of statistical tests and confidence intervals are given for stochastic data
(27) The approach to sensitivity analysis is given
(28) The choice of variables for sensitivity analysis is justified
(29) The ranges over which the variables are varied are stated
(30) Relevant alternatives are compared
(31) Incremental analysis is reported
(32) Major outcomes are presented in a disaggregated as well as aggregated form
(33) The answer to the study question is given
(34) Conclusions follow from the data reported
(35) Conclusions are accompanied by the appropriate caveats
Two sets of objections may be raised to such rankings. Firstly, different studies may have used different methods. Differences in cost per quality adjusted life year could arise from differences in methodological approach, rather than real differences in the interventions themselves.46 Secondly, a simplistic interpretation of league tables may be misleading. For example, each cost effectiveness or cost-utility ratio in the league would have been generated by reference to a comparison programme. In some cases this would have been doing nothing; in others it would have been current care. The incremental ratio will therefore vary in relation to the comparison chosen, which may not itself be an efficient intervention.
Birch and Gafni argue that, in deciding whether or not to adopt a particular intervention, the decision maker needs to assess the opportunity cost for the health care budget.47 Whether or not the total health care budget should grow is a question for cost-benefit analysis, not cost effectiveness or cost-utility analysis. On the other hand, Johannesson argues that cost effectiveness analysis is best viewed as a subset of cost-benefit analysis and that interpreting and using cost effectiveness analysis as a tool to maximise the health effects for one specified real world budget would be inconsistent with a societal perspective and likely to lead to major problems of suboptimisation.48
In practice, the answer may lie in the way the results of economic evaluations are interpreted. Published data are inevitably specific to a context and will need some reinterpretation by decision makers in other settings. Transparency in reporting can help decision makers generalise results from one setting to another.
Finally, apart from being modest about the generalisability of their results, authors should ensure that their analysis is relatively conservative. Sensitivity analysis plays an important part here, and enough results should be presented to enable the reader to assess the robustness of the study conclusions.
Evaluating the guidelines
We intend to evaluate the guidelines. The options are still under discussion, but the evaluation will probably focus on four questions:
Do the guidelines help BMJ editors filter out unpublishable economic studies at an early stage? This has two components: (a) distinguishing full economic evaluations from other types of economic submissions and (b) avoiding wasting time refereeing papers that are fundamentally flawed. This question could be answered by undertaking a study of economic submissions before and after the publication of the guidelines.
How satisfied are editors, reviewers, and authors with their respective checklists? This question could be answered by assessing the checklists with a questionnaire.
Do the guidelines improve the quality of referees' reports on economic evaluations? This question could be answered by a prospective study to compare reports from reviewers who had and had not been asked to apply the referees' checklist.
Do the guidelines improve the quality of the economic evaluations that are eventually published? This is probably the most difficult question to answer, since it requires a view to be taken about the methodological principles of economic evaluation. However, the evaluation might focus on the transparency of reporting of results, since the main objective of the guidelines is to improve this. Again, a prospective evaluation would be required, comparing the version of economic evaluations submitted to the BMJ with the version eventually published. We foresee two practical problems with this component of the evaluation. Firstly, the BMJ currently receives only a limited number of full economic evaluations,20 so a prospective study might take some time. Secondly, it will be difficult to separate out the distinctive contribution of the guidelines from the benefits of the peer review process more generally.
Members of the working party were: M Buxton, London; V Demicheli, Pavia, Italy; C Donaldson, Aberdeen; M Drummond (chair), York; S Evans, London; TO Jefferson (secretary), Aldershot, UK; B Jonsson, Stockholm; M Mugford, Oxford; D Rennie, Chicago; J Rovira, Barcelona; F Rutten, Rotterdam; K Schulman, Washington DC; R Smith (editor, BMJ), London; A Szczepura, Warwick, UK; A Tonks (assistant editor, BMJ), London; G Torrance, Hamilton, Canada; A Towse, London.
We thank Vanessa Windass and Gaby Shockley for secretarial help and an anonymous referee for helpful comments.
Footnotes
- Funding: Since no funding was sought for this project we thank our own institutions for bearing the cost.
- Conflict of interest: None.