Using hospital mortality rates to judge hospital performance: a bad idea that just won’t go away
BMJ 2010; 340 doi: http://dx.doi.org/10.1136/bmj.c2016 (Published 20 April 2010) Cite this as: BMJ 2010;340:c2016
- Richard Lilford, professor of clinical epidemiology1,
- Peter Pronovost, anaesthesiologist and critical care physician2
- 1Public Health, Epidemiology and Biostatistics, University of Birmingham, Edgbaston, Birmingham B15 2TT
- 2Department of Anesthesiology and Critical Care Medicine, Quality and Safety Research Group, Johns Hopkins University School of Medicine, 1909 Thames St, Baltimore, MD 21231, USA
- Correspondence to: R Lilford
- Accepted 2 April 2010
Death is the most tractable outcome of care: it is easily measured, is of undisputed importance to everyone, and is common in hospital settings. Mortality rates, especially overall hospital mortality rates, have therefore become the natural focus for measurement of clinical quality. In England a high death rate “attracted the attention of the [Healthcare Commission] (HCC) and caused it to launch its investigation” into the Mid Staffordshire NHS Foundation Trust.1
So what is the problem with measuring clinical performance by comparing hospital mortality rates and what alternatives can we offer?
Hospital mortality as a measure of quality: scientific issues
The problem stems from the low ratio of signal (preventable deaths) to noise (deaths from other causes). A common but naive response is to argue that risk adjustment to produce a standardised mortality ratio (SMR) solves this problem. However, the idea that a risk adjustment model separates preventable from inevitable deaths is wrong for two reasons.
Firstly, risk adjustment can only adjust for factors that can be identified and measured accurately.2 Randomised trials are preferable to observational studies with statistical controls for this reason. The error of attributing differences in risk adjusted mortality to differences in quality of care is the “case-mix adjustment fallacy”.3
Secondly, risk adjustment can exaggerate the very bias that it is intended to reduce. This counterintuitive effect is called the “constant risk fallacy” and it arises when the risk associated with the variable on which adjustment is made varies across the units being compared.4 For example, if diabetes is a more powerful prognostic factor in Glasgow than in Four Oaks, then adjusting for the average effect of diabetes will deflate expected diabetic deaths in Glasgow and inflate them in Four Oaks. Depending on the distribution of diabetes, this effect could tilt the playing field against Glasgow. The extent to which comorbidities are recorded in hospital statistics (the coding depth) is a potent source of this form of bias.5
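The constant risk fallacy can be illustrated with a small numerical sketch. The figures are hypothetical and chosen only for arithmetic convenience: both hospitals are assumed to admit 800 non-diabetic and 200 diabetic patients, the death risk for non-diabetic patients is assumed to be 4% in both, and diabetes is assumed to carry a 12% risk in Glasgow but a 6% risk in Four Oaks. Each hospital's observed deaths are set equal to its true local risk, so by construction no death in the example is attributable to quality of care. The class and function names are illustrative and are not taken from any published risk adjustment model.

```python
from dataclasses import dataclass


@dataclass
class Hospital:
    name: str
    n_diabetic: int
    n_other: int
    risk_diabetic: float  # hypothetical true local death risk with diabetes
    risk_other: float     # hypothetical true local death risk without diabetes

    def observed_deaths(self) -> float:
        # Deaths follow the true local risk: no quality-related deaths by construction.
        return self.n_diabetic * self.risk_diabetic + self.n_other * self.risk_other


def pooled_risks(hospitals):
    """Average death risk per patient group across all hospitals (the adjustment model)."""
    diabetic_deaths = sum(h.n_diabetic * h.risk_diabetic for h in hospitals)
    diabetic_patients = sum(h.n_diabetic for h in hospitals)
    other_deaths = sum(h.n_other * h.risk_other for h in hospitals)
    other_patients = sum(h.n_other for h in hospitals)
    return diabetic_deaths / diabetic_patients, other_deaths / other_patients


def smr(hospital, pooled_diabetic, pooled_other) -> float:
    """Observed deaths divided by deaths expected under the pooled (average) risks."""
    expected = hospital.n_diabetic * pooled_diabetic + hospital.n_other * pooled_other
    return hospital.observed_deaths() / expected


hospitals = [
    Hospital("Glasgow", n_diabetic=200, n_other=800, risk_diabetic=0.12, risk_other=0.04),
    Hospital("Four Oaks", n_diabetic=200, n_other=800, risk_diabetic=0.06, risk_other=0.04),
]
# Pooled diabetic risk is 9%: below Glasgow's true 12%, above Four Oaks's true 6%.
pooled_diabetic, pooled_other = pooled_risks(hospitals)

smrs = {h.name: smr(h, pooled_diabetic, pooled_other) for h in hospitals}
for name, value in smrs.items():
    print(f"{name}: SMR = {value:.2f}")
# Glasgow: SMR = 1.12
# Four Oaks: SMR = 0.88
```

Under these assumptions the adjusted comparison places Glasgow 12% above expectation and Four Oaks 12% below it, although neither hospital has any preventable deaths. Changing the share of diabetic patients in either hospital would change the size of the distortion.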
Moreover, SMRs vary by about 60% across UK hospitals.6 The proposal that variance of this magnitude can be attributed to differences in the quality of care is not clinically intuitive and does not align well with the following observations:
(1) Mant and Hicks showed that differences in the quality of care could explain only half the observed variance in heart attack mortality even in a fictitious scenario where adherence to the tenets of good care varied from 0% to 100%.7
(2) The famous Harvard malpractice study found that 0.25% of admissions resulted in avoidable death.8 Assuming an overall hospital death rate of about 5%, this implies that around one in 20 inpatient deaths is preventable, while 19 in 20 are unavoidable. We have corroborated this figure in a study of the quality of care in 18 English hospitals (submitted for publication). Quality of care accounts for only a small proportion of the observed variance in mortality between hospitals. To put this another way, it is not sensible to look for differences in preventable deaths by comparing all deaths.
(3) Little or no correlation exists between how well a hospital performs on one standard of safe and effective care and how well it performs on another; differences in the quality of care within hospitals are much greater than differences between hospitals.9 10 This finding does not support the prevailing notion of large scale systematic differences in quality at the institutional level and suggests that while commercial organisations such as Enron fail corporately, hospitals are more likely to fail on the specifics—pathology in Liverpool;11 paediatric cardiac surgery in Bristol;12 radiation therapy in Missouri.13
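The arithmetic behind point (2) can be made explicit. The sketch below uses only the two figures quoted there (0.25% of admissions ending in avoidable death, and an assumed overall death rate of about 5%). The function name and the scenarios of doubling or eliminating avoidable deaths are illustrative assumptions, not results from the cited studies.

```python
AVOIDABLE_DEATH_RATE = 0.0025  # 0.25% of admissions, as quoted from the Harvard study
OVERALL_DEATH_RATE = 0.05      # about 5% of admissions, the assumption stated in the text

# Share of all inpatient deaths that are avoidable: 0.0025 / 0.05, i.e. one in 20.
preventable_share = AVOIDABLE_DEATH_RATE / OVERALL_DEATH_RATE


def relative_change_in_mortality(multiplier: float) -> float:
    """Relative change in overall mortality when the avoidable death rate is
    multiplied by `multiplier` and unavoidable deaths stay the same."""
    unavoidable = OVERALL_DEATH_RATE - AVOIDABLE_DEATH_RATE
    new_overall = unavoidable + multiplier * AVOIDABLE_DEATH_RATE
    return new_overall / OVERALL_DEATH_RATE - 1


doubled = relative_change_in_mortality(2.0)     # hypothetical: avoidable deaths double
eliminated = relative_change_in_mortality(0.0)  # hypothetical: avoidable deaths vanish

print(f"preventable share of deaths: {preventable_share:.0%}")
print(f"change in overall mortality if avoidable deaths double: {doubled:+.0%}")
print(f"change in overall mortality if avoidable deaths are eliminated: {eliminated:+.0%}")
```

On these figures a hospital would have to double its avoidable deaths to raise its overall mortality by about 5%, which is small beside the variation of about 60% in SMRs noted above.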
In view of these findings, it is not surprising that a systematic review of empirical studies showed very little correlation between measured quality of care and SMRs.14 In short, hospital mortality rates are a poor diagnostic test for quality and SMRs do not identify preventable deaths.15 16 Given the fragility of SMRs, they should not be used to calculate excess deaths resulting from poor care, yet Mid Staffordshire hospital was blamed for 400 excess deaths on this precarious basis.17
Consequences of using mortality rate to (mis)judge quality of care
Mortality rates, like knives and nuclear particles, are neutral; it is the use to which they are put that has moral salience and that will determine the balance of benefits and harms. We believe that it is not collection of mortality rates per se that is wrong, but rather the use of mortality rates as a criterion for “performance management” (that is, as the basis for sanction or reward). Instituting penalties (or withholding rewards) on the basis of hospital mortality rates is correctly perceived as unjust. Moreover, hospital mortality rates are silent about where any problem might lie. This combination of unfairness and non-specificity is a toxic mix, inducing what has been called “institutional stigma”—a feeling of helpless frustration.3 Human beings are strongly motivated by stigma and will take the shortest route to avoid it,18 even if to do so involves “gaming” the system—for example, by upgrading risk assessments.19 Furthermore, a focus on hospital mortality may lead to overly aggressive care, which is inhumane and drives up costs.20
Defining the principled use of mortality rates
There is an argument for use of hospital mortality rates not as the basis for judgment leading to sanction or reward, but as a signal to identify where further investigation is necessary. This is a seductive argument because at one level it is self evidently correct; there are good arguments for keeping mortality rates under review, so that interesting differences can be investigated. Comparing practices in different places has provided crucial insights, from Semmelweis’s discovery of antisepsis to more recent evidence about the optimal age for surgery to repair congenital mitral valve disease.21
However, controlled studies of this type are entirely different from single site investigations called down on organisations by regulatory bodies. In Mid Staffordshire “concerns about mortality” prompted first the HCC investigation,22 then two reviews commissioned by the Department of Health,23 24 and finally an independent inquiry set up by the secretary of state for health.1 Use of mortality rates to prompt such examinations is not politically neutral; the investigation is itself a sanction. The potential injustice arises from the questionable veracity of a public review carried out with subjective tools in the expectation of finding faults under an atmosphere of escalating emotion.25 Notably, although the majority of people who came forward to give evidence to the Mid Staffordshire inquiry made negative comments, the hospital was certainly not an outlier on the survey of patient opinion that is carried out in all English hospitals every two years.26
Public inquiries may be needed from time to time, but like other interventions they have their own dangers. Our point is not that the Mid Staffordshire hospital was blameless, but that other unidentified hospitals might have been found equally deficient had they been investigated to the same degree in response to the same initial trigger. The signal that initiates an inquiry is therefore of crucial concern. Once a public inquiry is instigated it takes on a life of its own. Far from vindicating the use of SMRs, as some have claimed, finding problems becomes an almost self fulfilling prophecy.27 Thus a twofold risk exists. Firstly, a management team that is no worse than many others will be unjustly singled out by a process that, like the persecution of witches, was well intentioned but informed by the wrong theory.28 Secondly, a myopic focus on a single institution is a distraction from the types of systematic inquiry that may lead to pervasive improvement across a service, such as reducing bloodstream infections in all hospitals.29
Why is use of mortality statistics for performance monitoring so enduring?
Few, if any, of the above arguments are original. Yet the monster survives like the mythical Hydra, which sprouted new heads as quickly as the tireless Hercules could hack them off. We hypothesise that the practice is kept alive by well meaning decision makers who want the idea that mortality reflects quality to be true. Partly this is because stopping people from dying evokes a clear sense of moral purpose, and partly because mortality is such a convenient parameter to collect—for example, it does not require questionnaires for which psychometric properties must be tested, or expensive review of case notes.
If not mortality, then what?
We are not arguing that healthcare providers are exempt from accountability or that patients should not be protected. On the contrary, the search for robust measurements should not be impeded by fixing prematurely on a parameter that offers false hope. Two broad types of measurement should be considered: outcomes (other than hospital mortality) and clinical processes.
A few outcome measures appear to be sensitive to quality. Firstly, mortality rates associated with high risk procedures that are heavily dependent on technical skill, such as intrauterine transfusion and heart surgery, seem to have strong signal to noise ratios.30 Use of outlier status to identify poor practice may be credible in these circumstances but the signal will become progressively less informative (random error will make up a larger part of the variance) as poor practice is weeded out.30 Secondly, a limited number of non-mortality outcomes are heavily influenced by the quality of care—rates of hospital acquired bloodstream infection, for example.
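The observation that random error makes up a larger part of the variance as poor practice is weeded out follows from a simple variance decomposition. The sketch below is a minimal illustration, assuming a binomial model for deaths and holding the mean risk constant for simplicity. All numbers (a 4% mean risk, 400 procedures per unit, and the three values for the true spread between units) are hypothetical.

```python
def random_error_share(true_sd: float, mean_risk: float, cases_per_unit: int) -> float:
    """Fraction of the between-unit variance in observed death rates due to chance.

    Observed variance is approximated as the true between-unit variance plus
    the binomial sampling variance, mean_risk * (1 - mean_risk) / cases_per_unit.
    """
    sampling_var = mean_risk * (1 - mean_risk) / cases_per_unit
    true_var = true_sd ** 2
    return sampling_var / (sampling_var + true_var)


# Hypothetical true spread between units, shrinking as poor practice is removed.
true_sds = (0.02, 0.01, 0.005)
shares = [random_error_share(sd, mean_risk=0.04, cases_per_unit=400) for sd in true_sds]

for sd, share in zip(true_sds, shares):
    print(f"true SD {sd:.3f}: random error is {share:.0%} of observed variance")
```

With these assumed values the share of variance due to chance rises from roughly a fifth to roughly four fifths as the true spread between units narrows, so outlier status carries progressively less information about practice.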
However, the responsible use of outcomes in performance management is limited to the few circumstances in which they are valid proxies for quality. We strongly advocate direct measurement of quality by observing clinical process (error rates). That is to say, adherence to the tenets of evidence based safe care should be audited, as recommended by the House of Commons Select Committee.31 Such a policy has two significant advantages over hospital mortality rates as the basis for quality improvement.
(1) The action that should follow is inherent in the criterion itself. For example, if audit of clinical processes shows that anticoagulants are not being given before hip replacement surgery, then it is clear where the hospital should direct its attention. The stigmatising effect is minimal, because the organisation knows what to do to improve.9 10
(2) Since the contingent action is usually clear cut, process audit can provoke improvements wherever there is headroom for better performance, including where the hospital is an average or above average performer. For example, repeated audit and feedback has resulted in improved care for patients with acute heart attack, such that some sort of treatment to unblock the coronary artery is now used in nearly 100% of cases in England.32 Process audit can therefore shift distributions of performance across many criteria and across many organisations. Shifting the mean in this way yields more health gain than can be realised simply by truncating the tail of a distribution.3
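The comparison between shifting the mean and truncating the tail can be sketched numerically. The example is purely illustrative. It assumes 100 hypothetical hospitals of equal size whose process failure rates are spread evenly from 100 to 298 per 1000 eligible patients, a tail strategy that brings the five worst hospitals down to the overall average, and a shift strategy in which every hospital improves by 10 per 1000. The result depends on these assumed sizes, which can be varied through the function arguments.

```python
def gain_from_truncating_tail(rates, n_worst):
    """Total reduction in failure rate if the n_worst units are brought to the overall mean."""
    mean_rate = sum(rates) / len(rates)
    worst = sorted(rates)[len(rates) - n_worst:]
    return sum(r - mean_rate for r in worst)


def gain_from_shifting_mean(rates, improvement):
    """Total reduction if every unit improves by `improvement` (never below zero)."""
    return sum(min(r, improvement) for r in rates)


# Hypothetical failure rates per 1000 eligible patients: 100, 102, ..., 298.
rates = list(range(100, 300, 2))

tail_gain = gain_from_truncating_tail(rates, n_worst=5)      # 475.0
shift_gain = gain_from_shifting_mean(rates, improvement=10)  # 1000
print(tail_gain, shift_gain)
```

Under these assumptions a modest improvement everywhere yields about twice the total gain of correcting the five worst performers, which is the sense in which shifting the distribution can outweigh truncating its tail.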
Where management and statistical theories collide
Process monitoring is not a panacea. Process is generally more expensive to monitor than outcome, since numerator and denominator data have to be harvested, often from case notes (this process should become easier as electronic records become more widespread). The continuous evolution of evidence requires measures to be constantly updated, and process audit is subject to many types of measurement error, discussed in detail elsewhere.33 Comparative measurement of clinical process will be biased if observers are party to information that may prejudice them, if different observers review different organisations, or if organisations are reviewed in series where learning and fatigue effects may manifest. In short, scientific principles must be followed to avoid bias.
Bias due to differences in measurement in different institutions is of little or no consequence when data are used in bottom-up, staff driven, quality improvement exercises, as in the Veterans’ Administration QUERI system.34 However, this sort of bias is crucially important when the results of audit are used as the basis for reward (such as a financial payment) or sanction (such as an external inquiry).
We incline towards a bottom-up agenda for quality improvement and would advocate performance management at one remove, by ensuring that clinical teams have systems in place to monitor quality rather than collecting large amounts of poorly calibrated information centrally. However, the issue of how best to improve quality for a given unit of resource and how best to be accountable to the public remains open. We do not pretend to have all the answers; the science needs to mature, not only to improve the measurement of quality, but also to learn how to use the (inevitably imperfect) measurements so that they do more good than harm. Deeper understanding will depend not on statistical or organisational studies carried out in isolation, but on synthesis of both subjects, in the tradition of Shewhart and Deming over half a century ago.16 While the optimal solution remains elusive, we have argued that certain suboptimal solutions can be identified; performance management of medical care by hospital mortality belongs in this latter category.
We thank Alan Girling (University of Birmingham), Cor Kalkman (UMC Utrecht), and John Wright (Bradford Teaching Hospitals NHS Foundation Trust) for their comments, and Peter Chilton (University of Birmingham) for help in preparation of the manuscript.
Contributors and sources: RL produced the first draft of the paper, with subsequent iterations between both RL and PP. RL is guarantor. This article was written as a result of the authors’ longstanding interest in safety and quality of care and the provocation arising from the publication of the Mid Staffordshire report in the week commencing 24 February 2010, and the subsequent media coverage—for example, the BBC Panorama programme of 8 March 2010.
Competing interests: All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare that: (1) RL had financial support for methodological work from the National Institute for Health Research (NIHR) Collaborations for Leadership in Applied Health Research and Care (CLAHRC) for Birmingham and Black Country, and the Multidisciplinary Assessment of Technology Centre for Healthcare (MATCH) project, although views expressed are his own. PP had no financial support for the submitted work from anyone other than his employer; (2) No financial relationships with commercial entities that might have an interest in the submitted work; (3) No spouses, partners, or children with relationships with commercial entities that might have an interest in the submitted work; (4) PP receives speaking honorariums from not-for-profit hospitals for speaking on safety and quality measurement. RL has no non-financial interests that may be relevant to the submitted work. No other conflicts of interest declared by either author.
Provenance and peer review: Not commissioned, externally peer reviewed.