Evidence based purchasing: understanding results of clinical trials and systematic reviews
BMJ 1995; 311 doi: https://doi.org/10.1136/bmj.311.7012.1056 (Published 21 October 1995) Cite this as: BMJ 1995;311:1056
- T Fahey, senior registrar in public health medicine (a)
- S Griffiths, director of public health medicine (a)
- T J Peters, senior lecturer in medical statistics (b)
- (a) Department of Public Health Medicine and Health Policy, Oxfordshire Health, Oxford OX3 9DZ
- (b) Department of Social Medicine, University of Bristol, Bristol BS8 2PR
- Correspondence to: Dr T Fahey, Department of Social Medicine, University of Bristol, Bristol BS8 2PR.
- Accepted 25 July 1995
Abstract
Objective: To assess whether the way in which the results of a randomised controlled trial and a systematic review are presented influences health policy decisions.
Design: A postal questionnaire to all members of a health authority within one regional health authority.
Setting: Anglia and Oxford regional health authorities.
Subjects: 182 executive and non-executive members of 13 health authorities, family health services authorities, or health commissions.
Main outcome measures: The average score from all health authority members in terms of their willingness to fund a mammography programme or cardiac rehabilitation programme according to four different ways of presenting the same results of research evidence—namely, as a relative risk reduction, absolute risk reduction, proportion of event free patients, or as the number of patients needed to be treated to prevent an adverse event.
Results: The willingness to fund either programme was significantly influenced by the way in which data were presented. Results of both programmes when expressed as relative risk reductions produced significantly higher scores when compared with other methods (P<0.05). The difference was more extreme for mammography, for which the outcome condition is rarer.
Conclusions: The method of reporting trial results has a considerable influence on the health policy decisions made by health authority members.
Key messages
Previous studies have shown that doctors' and patients' preferences for treatment are strongly influenced by the method used to present results of trials and reviews
In this study the same results from a clinical trial and a systematic review were presented by using four different methods to health authority purchasers
Health authority members' willingness to purchase services was influenced by the methods used to present results
A full understanding of the ways in which the results of trials and reviews are presented is essential for informed purchasing and health policy decision making
Introduction
The randomised controlled trial is regarded as the gold standard in the assessment of healthcare interventions.1 Its explanatory power permits qualitative conclusions about whether a treatment works and a quantitative assessment of the extent to which it works.2 Purchasing organisations are now being exhorted to make health policy and purchasing decisions in terms of evidence of clinical effectiveness.3 4 A large proportion of such evidence comes in the form of randomised controlled trials. Research has shown that the way in which the results from controlled trials are expressed has a significant influence on physicians' willingness to prescribe drugs5 6 7 8 9 10 and patients' perception of benefit from treatment.11 Whether or not the framing of the results of controlled trials has an influence on purchasing and health policy decisions has not been examined.
There are at least four ways of reporting outcomes of clinical trials: as a relative risk reduction, absolute risk reduction, proportion of event free patients, and the number of patients who need to be treated to prevent an event.12 Unfortunately, all too often when the results of randomised controlled trials are reported only one summary measurement of efficacy is used, most commonly the relative risk reduction. The problem with this approach is that relative risk measurements give the reader no idea of the background event rate—that is, the susceptibility of the population to the outcome of interest. Thus, when relative benefits are considerable and the outcome of interest rare, the relative risk reduction may remain high, although the absolute value of treatment will fall substantially. Quite often this problem is confounded in secondary reports and subsequent editorials that also emphasise relative differences at the expense of absolute benefits. As Feinstein states “clinicians are much impressed by the bigger numbers of relative changes than by the smaller magnitudes of the absolute changes for the same results.”13
With the advent of evidence based medicine, understanding the quantitative methods of data presentation from clinical trials is of major importance in terms of informed decision making in health policy.2 Our aim was to assess whether health policy decisions are influenced by the way in which the results of clinical trials and systematic reviews are presented.
Methods
We sent a postal questionnaire to the 182 executive and non-executive members of health authorities, health commissions, and family health services authorities within the Anglia and Oxford regional health authorities. These authorities are responsible for the setting of health policy for their local populations and are made up of people with a broad range of skills. Executive members have responsibility for purchasing, financial, and personnel management, and directors of public health have specific training in epidemiology and include in their responsibilities the assessment of the effectiveness of medical interventions and the needs of their local population. Non-executive members, most of whom are non-medical, are appointed by the chairperson of the regional health authority on behalf of the Secretary of State to act as public representatives for their local population.
The questionnaire was constructed to present data from a single clinical trial on the efficacy of breast cancer screening14 and of a single systematic review on the efficacy of cardiac rehabilitation.15 For each programme, data were presented in four different ways in the following order: as a relative risk reduction, an absolute risk reduction, as the proportion of event free patients, and as the number of patients who needed to be treated to prevent one death.
The absolute risk reduction (absolute difference) is (x - y), where x is the proportion of patients suffering the outcome of interest in the control group and y is the proportion of patients suffering the same outcome in the treatment group.16 The outcome in the control group, x, thus defines the background risk. The number of patients who need to be treated to prevent one death is the reciprocal of the absolute risk difference, 1/(x - y), and the relative risk reduction (relative difference), expressed as a percentage, is [(x - y)/x] × 100. The proportion of event free patients is the percentage of patients in each group expressed in terms of survival rather than death, that is (1 - x) for the control group and (1 - y) for the treatment group. Table I shows a worked example from a systematic review that examined the efficacy of coronary artery bypass grafting17 in terms of survival benefit to illustrate the various methods of data presentation.
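As a rough illustration of these definitions, the following Python sketch computes the four measures from the rounded survival figures quoted in the cardiac rehabilitation questionnaire in the appendix (84% and 87% survival, that is x = 0.16 and y = 0.13). The function name `summarise` and the dictionary keys are hypothetical and not part of the study. Because the inputs are rounded, the outputs only approximate the 20% and 31 quoted in the appendix.

```python
def summarise(control_risk, treated_risk):
    """Return the four summary measures for one two-group comparison.

    control_risk is x and treated_risk is y, both given as proportions.
    """
    arr = control_risk - treated_risk  # absolute risk reduction, x - y
    return {
        "relative_risk_reduction_pct": 100 * arr / control_risk,
        "absolute_risk_reduction_pct": 100 * arr,
        "event_free_control_pct": 100 * (1 - control_risk),
        "event_free_treated_pct": 100 * (1 - treated_risk),
        "number_needed_to_treat": 1 / arr,
    }


# Rounded figures from the cardiac rehabilitation questionnaire (assumed inputs).
cardiac = summarise(control_risk=0.16, treated_risk=0.13)
for name, value in cardiac.items():
    print(f"{name}: {value:.2f}")
```

All four numbers come from the same pair of proportions, which is the point the questionnaire relies on: only the framing differs.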
Each set of data was presented on the questionnaire as the result of a different “trial” (appendix). Respondents were asked to mark an “X” along a linear scale marked from 0 (“I would not support the purchasing of this service”) to 10 (“I would strongly support the purchasing of this service”) in terms of a health authority decision to fund a breast screening programme and a cardiac rehabilitation programme. Respondents were not asked to compare the two programmes with each other. It was stated clearly that the decision to fund each programme should be taken only on the strength of the results presented from each “trial” and that costs of each alternative were the same. Respondents were asked to disregard any national or local policies with regard to the funding of either programme.
Mean scores (95% confidence interval) were calculated for each alternative presentation of results. For each programme, repeated measures analysis of variance was performed with the MANOVA procedure of SPSS for Windows.18 The global test for differences between the four options was carried out followed by pairwise multiple comparisons with the Bonferroni correction. All these analyses make full allowance for within person effects.
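The multiple comparison step can be sketched as follows. This is not the SPSS procedure used in the study, only the arithmetic behind a Bonferroni correction, and it assumes an overall significance level of 0.05. The method labels and variable names are illustrative.

```python
from itertools import combinations

# The four ways of presenting the same result (labels are illustrative).
methods = [
    "relative risk reduction",
    "absolute risk reduction",
    "proportion event free",
    "number needed to treat",
]

# Every unordered pair of methods is compared once.
pairs = list(combinations(methods, 2))

# Bonferroni: divide the overall significance level by the number of comparisons.
overall_alpha = 0.05  # assumed overall level
per_comparison_alpha = overall_alpha / len(pairs)

print(f"{len(pairs)} pairwise comparisons")
print(f"per-comparison threshold: {per_comparison_alpha:.4f}")
```

Four presentation methods give six pairwise comparisons, so under this assumption each pairwise P value would be judged against roughly 0.0083 rather than 0.05.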
Results
Overall, 140 questionnaires were returned, three people stating that they could not give an opinion because they required further information. The response rate was therefore 75%, 79% for executive and 76% for non-executive members, and ranged from 68% to 94% across different health authorities.
Table II gives the descriptive statistics for the six pairwise comparisons between the options. The decision to fund either programme was significantly influenced by the way in which the data were presented (table III). The overall F test for differences between the four methods of data presentation was highly significant (P<0.001). Reporting as a relative risk reduction produced the highest mean score to support investment in both programmes, and this was significantly higher when compared with all other methods of data presentation. In addition, the number needed to treat was significantly higher than the absolute risk reduction and the percentage event free for mammography and higher than just the latter for cardiac rehabilitation (table III). Only three respondents (all non-executive members claiming no training in epidemiology) stated that they realised that all four sets of data summarised the same results.
Discussion
This study shows that interpretation of clinical trials and systematic reviews is influenced by the method used to summarise the results. Health policy makers, like clinicians,5 6 7 8 9 are more impressed by measurements that report relative benefits compared with those entailing an index of absolute benefit. The mean estimate obtained with the relative risk reduction for both scenarios was distinctly higher than the mean obtained with all other methods. Furthermore, such differences are most extreme when the background susceptibility—in this example, death from breast cancer—is lowest (table III).
The two scenarios were chosen deliberately to reflect the fact that absolute benefits of interventions can differ considerably, even though the relative benefits may be roughly similar. The background risk of death is 85 times higher in patients who have already suffered a myocardial infarct (background risk of death 16% over three years) compared with middle aged women who are eligible to enter a breast screening programme (background risk of death from breast cancer 0.19% over seven years). Such large absolute differences in benefit are not reflected in the results of trials when relative differences are reported alone.
Consequently, it is essential that quantitative data presented in trials and systematic reviews reflect absolute as well as relative benefits.19 This is particularly important in health policy, where background risks vary considerably according to whether interventions are aimed at populations with or without established disease. The number needed to treat possesses useful properties as a versatile way of presenting data. Calculation is relatively straightforward, background risk is incorporated into the calculation, and methods are now available to adjust it according to background risk in the target population.20 Estimation of numbers needed to treat is also useful when the efficacy of different forms of interventions, such as therapeutic trials, surgical procedures, and screening programmes are compared.12 Associated efforts, costs, and side effects of different forms of interventions can then be measured in a common currency. Furthermore, in the absence of information about the costs of alternative programmes the number needed to treat is more realistic as a proxy for cost effectiveness than is relative risk reduction.
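One simple way to see how the number needed to treat depends on background risk is to assume that the relative risk reduction stays constant across populations. That assumption is made here for illustration only and is not necessarily the adjustment method of reference 20. The function name is hypothetical, and the inputs are the background risks and relative risk reductions quoted in the text and appendix.

```python
def nnt_for_background_risk(background_risk, relative_risk_reduction):
    """Number needed to treat, assuming a constant relative risk reduction.

    Both arguments are proportions (0.16 means 16%). The absolute risk
    reduction is background_risk * relative_risk_reduction, and the number
    needed to treat is its reciprocal.
    """
    absolute_risk_reduction = background_risk * relative_risk_reduction
    return 1 / absolute_risk_reduction


# (label, background risk, relative risk reduction) as quoted in the article.
scenarios = [
    ("cardiac rehabilitation", 0.16, 0.20),
    ("breast screening", 0.0019, 0.34),
]

for label, risk, rrr in scenarios:
    print(f"{label}: NNT about {nnt_for_background_risk(risk, rrr):.0f}")
```

With a 16% background risk and a 20% relative risk reduction this gives about 31, in line with the cardiac rehabilitation questionnaire. The breast screening figure comes out near 1550 rather than the 1592 in the appendix, presumably because the published percentages are rounded.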
Other commentators have suggested that the results of randomised trials and meta-analyses may require different forms of presentation when benefits at a health policy level compared with an individual level are assessed. For example, a systematic review of survival benefits from coronary artery bypass grafting on patients with stable angina expressed results in two ways.17 Firstly, as an aggregate mean survival benefit of 4.26 months at 10 years for all patients randomised to bypass grafting; and, secondly, as a disaggregated benefit (results broken down by subgroups within those randomised to bypass grafting) in which 16% of patients gained between 1.5 and 3.5 years whereas the remaining 84% had no change in life expectancy.21 Again, there is evidence to suggest that doctors are more impressed when the associated gain in life expectancy is expressed in a disaggregated form rather than as an equivalent average figure.22 What should be recognised from all these examples5 6 7 8 9 10 22 is that informed clinical and health policy decisions should be made on the basis of a full understanding of the data presented and their limitations. Such decisions depend on scientific description and substantive perception, not on statistical probability.13
Purchasing authorities are being exhorted to increase investment in clinically effective interventions and reduce investment in interventions that evidence has shown to be ineffective.23 With the establishment of the NHS research and development programme an increasing quantity of research is likely to be presented to policy makers in the form of clinical trials and meta-analyses.4 In particular, Cochrane Collaboration systematic reviews and publications from the NHS Centre for Reviews and Dissemination require an understanding of the quantitative and qualitative methods used in summarising evidence. Understanding the advantages and limitations of data presentation is fundamental to the proper interpretation of such research.
This study has shown that understanding on the part of policy makers about how results of controlled trials and systematic reviews are presented needs to be enhanced. The critical appraisal skills for purchasers (CASP) project aims to help purchasers and other decision makers develop their skills in critically appraising evidence about effectiveness.24 Results from this study suggest that health authorities need to consider ways in which the critical appraisal skills of their purchasers might be improved.
We would like to thank David Sackett, professor of clinical epidemiology, Centre for Evidence Based Medicine, John Radcliffe Hospital, University of Oxford, and Dr Ruairidh Milne, consultant in public health medicine, Anglia and Oxford Regional Health Authority, for their helpful suggestions on earlier drafts.
Appendix
MAMMOGRAPHY QUESTIONNAIRE
There is a proposal to offer a breast screening programme to women aged 50-64 in your health authority. There are doubts about the effectiveness of such a screening programme. We will give you four statements derived from four different randomised controlled trials published in medical journals. On the basis of each statement you should indicate how likely you are to agree to the implementation of a breast screening programme. Assume that the costs of each programme are the same. Each result was deemed to be significant.
During a seven year follow up
Programme A reduced the rate of deaths from breast cancer by 34%
Programme B produced an absolute reduction in deaths from breast cancer of 0.06%
Programme C increased the rate of patients surviving breast cancer from 99.82% to 99.88%
Programme D meant that 1592 women needed to be screened to prevent one death from breast cancer.
CARDIAC REHABILITATION QUESTIONNAIRE
There is a proposal to offer a cardiac rehabilitation programme to people who have suffered a heart attack (myocardial infarct) in your health authority. There are doubts about the effectiveness of such a rehabilitation programme. We will give you four statements derived from four different randomised controlled trials published in medical journals. On the basis of each statement you should indicate how likely you are to agree to the implementation of a cardiac rehabilitation programme. Assume that the costs of each programme are the same. Each result was deemed to be significant.
During a three year follow up
Programme A reduced the rate of deaths by 20%
Programme B produced an absolute reduction in deaths of 3%
Programme C increased the rate of patient survival from 84% to 87%
Programme D meant that 31 people needed to enter a rehabilitation programme to prevent one death.
Footnotes
- Source of funding: None.
- Conflict of interest: None.