Developing a summary hospital mortality index: retrospective analysis in English hospitals over five years
BMJ 2012; 344 doi: http://dx.doi.org/10.1136/bmj.e1001 (Published 01 March 2012) Cite this as: BMJ 2012;344:e1001
- Michael J Campbell, professor of medical statistics,
- Richard M Jacques, research associate,
- James Fotheringham, PhD student,
- Ravi Maheswaran, reader in public health,
- Jon Nicholl, professor of health services research
- Correspondence to: M J Campbell
- Accepted 1 February 2012
Objectives To develop a transparent and reproducible measure for hospitals that can indicate when deaths in hospital or within 30 days of discharge are high relative to other hospitals, given the characteristics of the patients in that hospital, and to investigate those factors that have the greatest effect in changing the rank of a hospital, whether interactions exist between those factors, and the stability of the measure over time.
Design Retrospective cross sectional study of admissions to English hospitals.
Setting Hospital episode statistics for England from 1 April 2005 to 30 September 2010, with linked mortality data from the Office for National Statistics.
Participants 36.5 million completed hospital admissions in 146 general and 72 specialist trusts.
Main outcome measures Deaths within hospital or within 30 days of discharge from hospital.
Results The predictors that were used in the final model comprised admission diagnosis, age, sex, type of admission, and comorbidity. The percentage of people admitted who died in hospital or within 30 days of discharge was 4.2% for males and 4.5% for females. Emergency admissions comprised 75% of all admissions, and 5.5% of these patients died, in contrast to 0.8% who died after an elective admission. Of those with a Charlson comorbidity score of 0, 2% died, in contrast with 15% of those with a score greater than 5. Given these variables, the relative standardised mortality rates of the hospitals were not noticeably changed by adjusting for area level deprivation and the number of previous emergency visits to hospital. There was little evidence that including interaction terms changed the relative values by any great amount. Using these predictors the summary hospital mortality index (SHMI) was derived. For 2007/8 the model had a C statistic of 0.911 and accounted for 81% of the variability in between hospital mortality. A random effects funnel plot was used to identify outlying hospitals. The hospitals identified as outliers by the SHMI over the period 2005-10 had previously been identified as having high mortality by other mortality indicators.
Conclusion The SHMI is a relatively simple tool that can be used in conjunction with other information to identify hospitals that may need further investigation.
About 60% of deaths occur in hospital.1 Although a large proportion of these are inevitable, avoidance of unnecessary death is an important objective for health services. Several methods are used within the United Kingdom’s health service to identify trusts with high in-hospital mortality, the most widely publicised being the standardised mortality ratio (a ratio of observed to expected deaths), which is calculated from a statistical model.
The hospital standardised mortality ratio (HSMR)2 produced by Dr Foster, a provider of healthcare information based at Imperial College, London, has been used by the Department of Health for several years to identify failing hospitals.3 Concerns and criticism over the methodology and interpretation of standardised mortality ratios have, however, been raised both in academic settings and by the media.4 5 6 7
Arguments against hospital standardised mortality ratios are that factors unrelated to care, such as differential measurement error and inconsistent proxy measures of risk, may affect a hospital’s ranking4 and that they are open to gaming.5 For example, the primary reason for admission is given a diagnostic code, which is used in a model to predict the likelihood of death of that patient. If one hospital codes differently from another, the expected number of deaths in that hospital and hence the hospital standardised mortality ratio can be affected. Some have suggested there is little evidence of a correlation between the quality of care a hospital provides and its hospital standardised mortality ratio.6
A national steering group was established in 2010 to develop a consensus view of the key methodological requirements for a practical hospital standardised mortality ratio.8 Advocating an indicator that was transparent, reproducible, and gave a more complete picture of mortality in hospital, the group specified a new measure, the summary hospital mortality index (SHMI). Table 1 compares the specifications of the index with those of the hospital standardised mortality ratio. The index was to be calculated as the ratio of the number of deaths among patients admitted to a particular hospital in a year to the number of expected deaths, calculated using a model adjusted for the variables given in table 1. The index reported on all NHS acute trusts and was designed to cover deaths relating to all admissions, including deaths occurring in hospital or within 30 days after discharge.
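Written as formulas, the index is observed over expected deaths, where the expected count sums each admission's modelled probability of death. The notation below is ours and is only a sketch; the authors' own formulas are in their web extra appendix 2. Here O_h is the number of observed deaths for hospital h, \mathcal{A}_{hg} is the set of its admissions in diagnostic group g, \mathbf{x}_i holds the covariates of admission i, and \hat\beta_{0g} and \hat{\boldsymbol\beta}_g are the coefficients of the logistic model fitted for group g:

```latex
\mathrm{SHMI}_h = \frac{O_h}{E_h},
\qquad
E_h = \sum_{g} \sum_{i \in \mathcal{A}_{hg}} \hat{p}_i,
\qquad
\hat{p}_i = \frac{1}{1 + \exp\!\left(-\hat{\beta}_{0g} - \mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}}_{g}\right)}
```

A SHMI of 1 means the hospital had exactly as many deaths as the model predicts for its case mix. Values above 1 mean more deaths than expected.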
In February 2011 the University of Sheffield was commissioned to carry out a sensitivity analysis of the suggested variables to determine if their inclusion in the model had an important impact on this new performance indicator. A report was produced in April 2011.9 The specification was accepted by the Department of Health in May and was implemented on 27 October 2011. We investigated the properties of the SHMI, why particular variables have been included, and how the index should be used.
The Department of Health supplied us with a dataset comprising all admissions to English hospitals obtained from the hospital episode statistics data warehouse for episodes that ended between 1 April 2005 and 30 September 2010. After discussion with the Department of Health, we excluded maternity admissions. The records for admissions comprised episodes of care, that is, continuous periods of care administered within a particular consultant led specialty at a single hospital provider. An admission could comprise one or more episodes in one hospital. We used information from the first episode for each admission. We excluded day cases, and we excluded private and community hospitals that, based on their hospital episode statistics provider codes, were unlikely to accept acute admissions. We examined the effect of including or excluding admissions with zero length of stay. The conclusions were unaffected either way, and these admissions are included in the results reported here.
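The reduction from episodes to one record per admission can be sketched as follows. The record layout, the field names, and the set of excluded provider codes are all hypothetical. The sketch also assumes that maternity and day case status are read from the first episode, which the text does not state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Episode:
    # Hypothetical fields; real hospital episode statistics extracts use other names and codes.
    admission_id: str
    episode_order: int  # 1 = first consultant episode of the admission
    provider: str
    is_maternity: bool
    is_day_case: bool

def first_episode_admissions(episodes, excluded_providers=frozenset()):
    """Keep the first episode of each admission, then drop excluded admissions."""
    first = {}
    for ep in episodes:
        kept = first.get(ep.admission_id)
        if kept is None or ep.episode_order < kept.episode_order:
            first[ep.admission_id] = ep
    return [
        ep for ep in first.values()
        if not ep.is_maternity
        and not ep.is_day_case
        and ep.provider not in excluded_providers  # e.g. private or community providers
    ]
```

Using one record per admission keeps the denominator at the admission level. The price is that information recorded only in later episodes is ignored.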
We linked data on date of death supplied by the Office for National Statistics to the admission dataset and identified deaths within 30 days of discharge, which we assigned to the last hospital to which the patient was admitted. To include all admissions in the analysis, we created categories for all variables, including one for missing values. We split age into five year age bands, except for infants aged 0-1 and preschool children aged 1-4. A comorbidity score was derived by converting secondary diagnosis codes into the 19 clinical conditions identified in the Charlson comorbidity index,10 with contemporary weights for the presence of individual conditions contributing to the overall score.2 Hospital episode statistics reported the index of multiple deprivation rank (an area level deprivation measure derived from the patient’s postcode) grouped by fifths, with missing values grouped separately. To examine the interaction between age and comorbidity we assessed whether the risk of death at different levels of comorbidity differed between patients aged over 65 and younger patients.
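The categorisation rules above, with a separate category for missing values, might look like the sketch below. The open top age band at 90 is an assumption because the paper does not give its upper band. The deprivation function only illustrates grouping a rank into fifths, since hospital episode statistics already supply the fifths. It also assumes rank 1 is the most deprived area.

```python
def age_band(age, top=90):
    """Age in whole years -> band label; None -> 'missing'.

    Bands: '0' (infants), '1-4', then five year bands from 5. The open top band
    starting at `top` is an assumption, not a value stated in the paper."""
    if age is None:
        return "missing"
    if age < 1:
        return "0"
    if age < 5:
        return "1-4"
    if age >= top:
        return f"{top}+"
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"

def deprivation_fifth(imd_rank, n_areas):
    """Deprivation rank (1 = most deprived, assumed) -> fifth '1'..'5'; None -> 'missing'."""
    if imd_rank is None:
        return "missing"
    return str((imd_rank - 1) * 5 // n_areas + 1)
```

Treating "missing" as an ordinary category means no admission is dropped. The model then simply estimates a separate coefficient for it.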
In the first diagnosis field we used ICD-10 (international classification of diseases, 10th revision) codes to identify the reasons for admission, and collapsed these into the grouping schemes given by the Agency for Healthcare Research and Quality11 and National Centre for Health Outcomes Development.12 As the scheme used by the National Centre for Health Outcomes Development excludes deaths from cancer, we created new categories to include these. Initially we took a 10% random sample of the whole dataset and combined some diagnoses into clinically coherent diagnostic groups to ensure a minimum of about 100 deaths within each group, so that sufficient events were available to get robust models. This translates into a minimum of about 200 deaths in any calendar year. A few groups did not have logical partners and so had fewer deaths, but none had fewer than 50 in the 10% sample. A total of 138 diagnostic groups were used (see web extra appendix 1 for details and number of deaths in the 10% sample).
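The threshold check on the 10% sample can be sketched as below. Admissions are hypothetical (group, died) pairs, and each admission is kept independently with probability 0.1, so the sample is about 10% rather than exactly 10%. The merging itself was a clinical judgement in the study, so the code only lists the groups that fall short.

```python
import random
from collections import Counter

def sample_deaths_by_group(admissions, fraction=0.1, seed=2011):
    """Deaths per diagnostic group in a Bernoulli random sample of (group, died) pairs."""
    rng = random.Random(seed)
    counts = Counter()
    for group, died in admissions:
        # The draw is made for every admission, so selection does not depend on the outcome.
        if rng.random() < fraction and died:
            counts[group] += 1
    return counts

def groups_below_minimum(counts, all_groups, minimum=100):
    """Groups with too few sampled deaths: candidates for merging with a clinical partner."""
    return sorted(g for g in all_groups if counts[g] < minimum)

def implied_annual_deaths(sample_deaths, fraction=0.1, years=5.5):
    """Scale sampled deaths to the full dataset, then spread them over the study period."""
    return sample_deaths / fraction / years
```

With 100 sampled deaths and the 5.5 year study period (April 2005 to September 2010), the scaling gives about 180 deaths a year. This is roughly in line with the "about 200" quoted above.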
We estimated the probability of death for each admission in one financial year by fitting logistic regression models using the SHMI covariates within each diagnostic group, and then summed these probabilities over diagnostic groups for each hospital to obtain the expected number of deaths in that hospital for the year (see formulas in web extra appendix 2). The method is equivalent to indirect standardisation13 and is similar to that used by Dr Foster.3 14 We used individual case logistic regression, which does not require aggregation of categories for a model to fit. We used the same categorical predictor variables for each diagnostic group, which meant that we allowed different coefficients for the predictors in each diagnostic group but also that some models would be over-parameterised. For example, for many diseases no deaths occur at young ages and so the young age categories will be redundant; similarly, for ovarian cancer the expected value for men will be zero. For large datasets, however, parsimony is not a priority, and the advantage of using the same model structure in every diagnostic group is that a hospital could calculate its own SHMI by using a standard set of covariates and the weights provided by the logistic regression.
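As the last sentence suggests, a hospital holding published coefficients could compute its own index. The sketch below assumes a hypothetical layout in which each diagnostic group's model is an intercept plus a dictionary keyed by (variable, category), with reference categories omitted. It converts each admission's linear predictor to a probability and then sums observed and expected deaths per hospital.

```python
import math
from collections import defaultdict

def inverse_logit(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def predicted_risk(intercept, coefs, covariates):
    """Probability of death from one diagnostic group's logistic model.

    coefs maps (variable, category) -> coefficient (hypothetical layout);
    reference categories are absent and so contribute 0."""
    eta = intercept + sum(coefs.get((var, cat), 0.0) for var, cat in covariates.items())
    return inverse_logit(eta)

def shmi_by_hospital(admissions, models):
    """Observed / expected deaths per hospital (indirect standardisation).

    admissions: dicts with 'hospital', 'group', 'died', 'covariates'.
    models: {group: (intercept, coefs)}, one model per diagnostic group."""
    observed = defaultdict(int)
    expected = defaultdict(float)
    for adm in admissions:
        intercept, coefs = models[adm["group"]]
        expected[adm["hospital"]] += predicted_risk(intercept, coefs, adm["covariates"])
        observed[adm["hospital"]] += int(adm["died"])
    return {h: observed[h] / expected[h] for h in expected}
```

For example, with an intercept of 0 and a coefficient of log(1/3) for females, a male admission has risk 0.5 and a female admission has risk 0.25. A hospital with one death among these two admissions therefore has a SHMI of 1/0.75, or about 1.33.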
The principle we used in choosing a model is that a parameter is unnecessary in the model if it does not noticeably change either the relative or the absolute magnitude of the performance indicators of the hospitals. A variable may be a statistically significant predictor of mortality, but if the distribution of the variable is similar across hospitals, then adjusting for it will not change the values of the hospitals’ relative SHMI. A variable would have no value in discriminating between hospitals if the SHMI from a model with that additional covariate had a high correlation with the SHMI from the model without it. Therefore, using rank correlation, we considered a covariate for inclusion in the model when adding it gave a relatively low correlation between the resulting expected values and those from the existing model. We used the diffsum plot (see web extra appendix 2) to compare the absolute magnitude of change between SHMIs under different models. This plot shows those hospitals that would experience a change to their index if a covariate was included in the model. Given two models, the plot shows the difference in expected number of deaths in a hospital between model 1 and model 2 against the mean of the two expected values. We also added two straight lines, which show where the index would be expected to change by 5% if model 2 were adopted rather than model 1. Points above the top line are those trusts with an index that would be expected to increase by at least 5%, and points below the bottom line are those trusts with an index that would be expected to decrease by at least 5%, if model 2 were adopted rather than model 1.
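The diffsum comparison can be sketched from the two sets of expected deaths alone. Because SHMI = O/E, switching from model 1 to model 2 multiplies a hospital's index by E1/E2, whatever its observed deaths. A 5% change therefore corresponds to a straight line through the origin on the (mean, difference) axes. The function and its flag labels are hypothetical.

```python
def diffsum(expected_1, expected_2, threshold=0.05):
    """Per hospital: (mean of expected deaths, E1 - E2, flag) for model 1 versus model 2.

    Under SHMI = O/E the index changes by the factor E1/E2 when model 2 replaces
    model 1, so the flag marks changes of at least `threshold` in either direction."""
    points = {}
    for h, e1 in expected_1.items():
        e2 = expected_2[h]
        ratio = e1 / e2
        if ratio >= 1 + threshold:
            flag = "up"      # index rises by at least 5% under model 2
        elif ratio <= 1 - threshold:
            flag = "down"    # index falls by at least 5% under model 2
        else:
            flag = ""
        points[h] = ((e1 + e2) / 2, e1 - e2, flag)
    return points
```

Plotting the first two elements of each tuple reproduces the layout described in the text. The flag gives the same classification as the reference lines.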
To look at outliers from the model we used a funnel plot, in which the observed SHMI for each trust is plotted against the expected number of deaths.15 We used a random effects model16 to draw control limits around the target outcome, that is, a SHMI of 1. This incorporates an over-dispersion parameter to allow for unexplained variation between trusts. If all the trusts were included in the estimate, then truly outlying trusts would inflate the over-dispersion parameter unduly and might not appear as outliers. Thus in estimating the over-dispersion parameter we excluded a proportion of the outlying hospitals. To calculate the over-dispersion parameter for the log SHMI we adopted a trimming approach,16 computing z scores (a scaled difference between the observed and expected values) and omitting the top and bottom 10% of trusts according to their z score. If no true outliers existed then the estimate would not be affected much by this procedure.
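A minimal sketch of trimmed over-dispersion and funnel limits follows, under stated assumptions. The standard error of the log SHMI is approximated by 1/sqrt(E), a common Poisson approximation. The between-trust variance uses the additive random-effects moment estimator that Spiegelhalter describes for funnel plots, although he winsorises rather than trims. The exact formulas the authors used may differ.

```python
import math

def overdispersion_tau2(observed, expected, trim=0.10):
    """Additive between-trust variance for log SHMI after trimming extreme z scores.

    observed, expected: equal-length lists of deaths per trust (observed > 0 assumed)."""
    y = [math.log(o / e) for o, e in zip(observed, expected)]
    s2 = [1.0 / e for e in expected]                  # assumed: var(log SHMI) ~ 1/E
    z = [yi / math.sqrt(v) for yi, v in zip(y, s2)]   # target log SHMI is 0
    order = sorted(range(len(z)), key=lambda i: z[i])
    k = int(len(z) * trim)                            # trusts dropped at each end
    keep = order[k:len(z) - k]
    w = [1.0 / s2[i] for i in keep]
    q = sum(z[i] ** 2 for i in keep)
    denom = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(keep) - 1)) / denom)

def funnel_limits(expected, tau2, n_se):
    """Lower and upper SHMI control limits at a given expected number of deaths."""
    half_width = n_se * math.sqrt(1.0 / expected + tau2)
    return math.exp(-half_width), math.exp(half_width)
```

A trust whose SHMI lies above the upper limit returned by `funnel_limits(E, tau2, 3)` would be flagged as an outlier. Setting `trim=0.0` lets the extreme trusts inflate `tau2`, which widens the limits.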
We derived the model for the financial year 2007/8 then repeated the procedure for 2009/10 to validate the model. We then fitted the final model to the five financial years 2005/6 to 2009/10.
In total 146 general (acute) trusts and 72 (71 in 2005/6) specialist trusts were included. As no formal definition of general or specialist status exists, the definition of general trusts was taken from lists reported by other providers of mortality indicators. For general trusts during 2009/10, the median number of admissions was 52 798 (range 12 188-155 809) and the median number of deaths was 1675 (554-4475). In the same year the corresponding values for specialist trusts were 2912 (14-231 088) and 30 (0-575).
Table 2 shows the numbers of admissions and deaths by age and sex for the analysis dataset, after excluding the admissions detailed in the specification (see table 1) and maternity admissions. In this period, 36 488 693 admissions in England were available for analysis.
The total number of female admissions was similar to the total number of male admissions. At every age over 14, the mortality rate for females admitted was lower than that for males. Note the apparent Simpson’s paradox: the overall proportion of deaths was higher for females than for males, because there are many more admissions of older women, and most deaths occur at older ages.
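The paradox can be reproduced with a toy example. The counts below are invented for illustration and are not taken from table 2. Women have the lower death rate in each age stratum, but because more of their admissions fall in the high risk stratum, their crude rate is higher.

```python
# Hypothetical (admissions, deaths) by sex and age stratum; not figures from the paper.
COUNTS = {
    ("M", "under 65"): (1000, 10), ("F", "under 65"): (500, 4),
    ("M", "65 and over"): (500, 50), ("F", "65 and over"): (1000, 90),
}

def stratum_rate(sex, stratum):
    """Death rate within one age stratum."""
    admissions, deaths = COUNTS[(sex, stratum)]
    return deaths / admissions

def overall_rate(sex):
    """Crude death rate pooled over strata, so it is weighted by each stratum's admissions."""
    admissions = sum(a for (s, _), (a, _d) in COUNTS.items() if s == sex)
    deaths = sum(d for (s, _), (_a, d) in COUNTS.items() if s == sex)
    return deaths / admissions
```

Here the stratum rates are 0.8% against 1.0% and 9% against 10% in favour of women, yet the crude rates are about 6.3% for women against 4.0% for men. Standardising for age removes this artefact.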
Table 3 shows that method of admission, the Charlson comorbidity score, and the index of multiple deprivation are all predictive of death. Mortality increased with increasing deprivation up to the highest category, in which mortality was slightly lower than in the second highest category. The number of previous emergency admissions was a predictor of mortality, but this relation was rather complicated because admissions, not patients, were being analysed; for example, a patient with four previous emergency admissions who died during the fifth admission would be included as a survivor for each of the previous four emergency admissions. The percentage of patients scoring zero on the Charlson comorbidity score depended on the coding depth, the lowest being 71.1% when all secondary codes were used. Because of the strong skew, we examined the Charlson comorbidity score as a potential predictor in several ways: with different depths of coding, as a continuous variable, and as a categorical variable in three groups (0, 1-5, >5). The greatest variation in the model was explained by including comorbidity score as a categorical variable and using all available secondary diagnosis codes.
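The score and its three-group categorisation can be sketched as below. The condition names and weights are placeholders, not the contemporary weights (reference 2) that the study applied to the 19 Charlson conditions. Mapping ICD-10 secondary diagnosis codes to conditions is assumed to have been done already.

```python
# Placeholder condition names and weights for illustration only; the study's
# contemporary weights for the 19 Charlson conditions are not reproduced here.
HYPOTHETICAL_WEIGHTS = {
    "congestive_heart_failure": 2,
    "diabetes": 1,
    "metastatic_solid_tumour": 6,
}

def charlson_score(conditions, weights=HYPOTHETICAL_WEIGHTS):
    """Sum the weights of the distinct Charlson conditions found among secondary diagnoses."""
    return sum(weights.get(c, 0) for c in set(conditions))

def charlson_category(score):
    """Three groups used as a categorical predictor: '0', '1-5', '>5'."""
    if score <= 0:
        return "0"
    return "1-5" if score <= 5 else ">5"
```

Counting each condition once, via `set`, means a condition coded twice in different secondary fields does not inflate the score.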
The basic model included the covariates age and sex. Different models were then fitted, and correlations and diffsum plots were used to choose additional covariates to add to the model. The lowest correlation between the SHMI allowing for age and sex only and the same index from a model with one additional covariate was for type of admission (elective, non-elective, and missing), with a correlation of 0.904. With age, sex, and type of admission as the basic model, the next covariate to give a low correlation was the categorical Charlson comorbidity score (all diagnoses), at 0.951. Adding further covariates to the model containing these four factors did not produce correlations below 0.95. In all, 15 models were considered, using different ways of coding comorbidity and including varying numbers of covariates.
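The stepwise procedure can be read as a greedy loop. At each step, the loop adds the candidate whose inclusion gives the lowest rank correlation with the current model's SHMIs, and it stops once every remaining candidate gives a correlation of at least 0.95. This is our interpretation of the text, not the authors' code. `shmi_for` is a hypothetical callback that refits the model and returns hospital SHMIs in a fixed order. The Spearman correlation below uses average ranks for ties.

```python
def ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def forward_select(base, candidates, shmi_for, stop_at=0.95):
    """Greedy covariate selection by lowest rank correlation with the current model."""
    chosen, remaining = list(base), list(candidates)
    while remaining:
        current = shmi_for(chosen)
        rho, best = min((spearman(current, shmi_for(chosen + [c])), c) for c in remaining)
        if rho >= stop_at:
            break  # no remaining covariate reorders hospitals enough to matter
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Note that a low correlation is the reason to add a covariate here, because it means the covariate changes how hospitals rank against each other.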
Figure 1 shows the diffsum plot for a model containing only age and sex compared with the model that also contains type of admission and Charlson comorbidity score as a categorical variable. If the extra covariates were included, for about 12 hospitals with an expected number of deaths greater than 100 the SHMI would increase by at least 5%, and for about 23 the index would decrease by at least 5%. No hospital’s SHMI would change by more than 5% as a result of fitting the covariates deprivation score and number of previous emergency admissions, or of fitting an interaction between age and comorbidity (fig 1). The best model therefore included the main effects age, sex, comorbidity, and type of admission. Based on 2007/8 data the mean C statistic over diagnostic groups for a model with age and sex was 0.763 (range 0.515-0.958). (A C statistic measures how well a model discriminates between those who do and do not have an event, and is interpreted as the probability that a randomly selected individual who has the event has a higher predicted risk than a randomly selected individual who does not.) The mean C statistic for the model that included Charlson comorbidity score and type of admission was 0.830 (range 0.534-0.970). Therefore, by including the two hospital derived covariates, the area under the receiver operating characteristic curve increased by about 0.07. The mean does not, however, account for some diagnostic groups being larger than others, and the overall C statistic of the model was 0.911.
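The definition of the C statistic given above translates directly into a pairwise count, with ties counted as one half. This brute force version is a sketch for small illustrations. For millions of admissions a rank-based formula would be used instead. Pooling all admissions across diagnostic groups into one call would give an overall C statistic, which is expected to exceed the mean of the per-group values because diagnosis itself separates high and low risk admissions.

```python
def c_statistic(risks, outcomes):
    """Probability that a randomly chosen death had a higher predicted risk than a
    randomly chosen survivor; tied risks count one half."""
    deaths = [r for r, y in zip(risks, outcomes) if y]
    survivors = [r for r, y in zip(risks, outcomes) if not y]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))
```

For risks [0.9, 0.8, 0.3, 0.1] with outcomes [1, 0, 1, 0], three of the four death-survivor pairs are correctly ordered, so the C statistic is 0.75.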
A linear regression model was used to determine how much of the variability between hospital trusts was explained by the final model, with the observed crude death rate as the dependent variable and the expected death rate from the SHMI model as the independent variable. The coefficient of determination R² showed that for 2007/8 data the final model accounted for 81% of the variability.
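For a straight line fitted by least squares with an intercept, R² equals the squared Pearson correlation between the two variables. In the sketch below, x would be each trust's expected death rate from the SHMI model and y its observed crude death rate. Fitting the regression unweighted, one point per trust, is an assumption, because the text does not say whether trusts were weighted.

```python
def r_squared(x, y):
    """Coefficient of determination for the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)
```

An R² of 0.81 therefore means that the spread of expected rates across trusts mirrors 81% of the spread of their crude rates.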
The same conclusion was reached by repeating this modelling procedure for the year 2009/10—namely, that a simple model with age, sex, comorbidity score, and type of admission was sufficient to explain hospital variation in crude mortality rates.
Figure 2 gives funnel plots with control limits at 2 and 3 standard errors for the final SHMI model for the years 2005/6 to 2009/10. Graphs of this kind could be used to monitor hospital deaths.17 An outlier is defined as a point above the upper 3 standard error limit. Table 4 gives the outlying hospital trusts by year. These hospitals have all been identified previously by other measures as having high mortality, and some hospitals appear in more than one year.
The summary hospital mortality index (SHMI) is a transparent and reproducible indicator of hospital associated mortality, capturing deaths in all admissions except maternity admissions up to 30 days after discharge. The index includes palliative care and emergency admissions with zero length of stay. A model fitted separately for each of 138 reasons for admission and adjusting for age, sex, type of admission, and comorbidity performed well, with an overall C statistic of 0.911. This is comparable to models derived in other settings; for example, one study18 obtained a C statistic of 0.8 using as predictors age, sex, ICD score, nursing home residency, and drug use. Fitting a separate model for each admission diagnosis is also the approach used in mortality indicators employed internationally.
Case mix variables in SHMI model
At different ages women are more or less likely than men to be admitted to hospital. However, women who are admitted are less likely to die than men at every age over 14 years. These facts mean that prediction models for hospital mortality usually have to include both age and sex. Having a planned or unplanned admission is also universally recognised as an important predictor of outcome. Our model was also improved by adjusting for comorbidity, but we recognise that the Charlson comorbidity score that we used may not be the perfect tool. This score was originally calibrated in 1984,10 since when updated weights have been calculated2 and were used in our study. It would, however, have been preferable to calibrate the weights on the dataset being scrutinised.19 20 More advanced scores differentiate between secondary diagnoses for conditions present on admission and newly acquired conditions present on discharge,21 a facility not currently available in the hospital episode statistics. An alternative method for calculating hospital standardised mortality ratios is to include hospital as a term in the logistic regression model, together with age, sex, type of admission, and comorbidities. In that way, the estimated coefficients for each hospital indicate their relative performance. We could not try this because we estimated the model separately for each diagnosis group. We did not include hospital as a predictor of deaths in the model as this may remove part of the variability associated with poor care when comparing the observed number of deaths with the expected number. As a consequence we could not test whether the coefficient for a risk factor differed by hospital (the “constant risk fallacy”).22
Deaths in hospital and within 30 days of discharge
The SHMI includes a follow-up time of 30 days after discharge. This raises two important questions. Firstly, is 30 days the right time frame and, secondly, should time be measured from admission or from discharge?
In response to the question of the time frame there is clearly a need to balance two contrary needs. Firstly, there is a need to use a short time frame so that it is more likely that the outcome is connected to the intervention being evaluated (in this case the quality of hospital care) rather than other later interventions such as social and community care after discharge. Secondly, there is a need to use a long time frame to catch all the late effects of care. On balance 30 days seems to be a reasonable compromise.
The Scottish hospital standardised mortality ratio also uses 30 days, but it measures mortality from admission, not from discharge. One advantage of using time from admission is that it defines a fixed window that is the same for all patients and hence for all hospitals, whatever their discharge policies or opportunities. For example, hospitals treating more deprived, socially isolated populations may find it harder to discharge elderly patients than hospitals treating less deprived populations. Using deaths within 30 days after discharge as a measure of performance could disadvantage hospitals treating deprived populations. Time from admission is also conceptually clearer, since the SHMI would then measure outcomes for a well defined cohort of admissions all receiving their care in the same period. With a window of 30 days after discharge, the SHMI is based on an ill defined group of admitted patients who were discharged or died in hospital during the year but who may have been cared for in previous periods. If time from admission is used, however, then some patients who die in hospital (those who die more than 30 days after admission) will be categorised as survivors in the analyses of the SHMI. These patients are still receiving hospital care, and hence a SHMI calculated from admission date would not reflect the totality of hospital care, only the initial phase of care, and this could create an incentive for hospitals to shift resources away from patients needing long term care. For this reason it was decided to use 30 days after discharge as the time frame in the SHMI.
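The difference between the two windows shows up clearly for a long stay, as in the sketch below. The function name is hypothetical, and counting the 30th day as inside the window is an assumption. For an in-hospital death the death date is on or before the discharge date, so the rule counted from discharge always includes it.

```python
from datetime import date

def counts_as_death(admitted, discharged, death_date, days=30, from_discharge=True):
    """Outcome flag for one admission under two possible follow-up windows.

    from_discharge=True: death in hospital or within `days` of discharge (the SHMI rule).
    from_discharge=False: death within `days` of admission (a fixed window).
    Treating the boundary day as inside the window is an assumption."""
    if death_date is None:
        return False
    start = discharged if from_discharge else admitted
    # An in-hospital death gives a zero or negative difference under the SHMI rule.
    return (death_date - start).days <= days

if __name__ == "__main__":
    # A patient admitted on 1 January who dies in hospital on day 45.
    stay = (date(2010, 1, 1), date(2010, 2, 15), date(2010, 2, 15))
    print(counts_as_death(*stay), counts_as_death(*stay, from_discharge=False))
```

In the example, the death counts under the SHMI rule but counts as a survival under a window measured from admission, which is exactly the concern raised above.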
The proper treatment of missing values in risk prediction models is particularly important. We took the pragmatic decision of putting missing values into a separate category. This meant that no data were discarded and we could tabulate the proportion of missing values by hospital, which may itself be a reflection of hospital quality. A better statistical procedure might have been some form of imputation. This works well when several continuous variables are correlated. In our case the categorical variables were not obviously associated, and it is hard to see how imputation could be done routinely. Furthermore, in practice the proportion of missing values was low. The variable with the greatest proportion of missing values was the index of multiple deprivation, which is derived from the patient’s postcode and was missing in 6.9% of cases. This proportion was not evenly distributed between hospitals, as was shown in our report.9 This variable, however, added little to the discriminatory ability of the model (fig 1) and was not included in the final index. Of the variables included in the final model, none had more than 0.2% missing values. Different methods of handling the missing values would have a negligible impact on the results. We would, however, recommend that hospitals routinely report the proportion of missing values in the variables used in calculating the index.
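The recommended report, the proportion of missing values per hospital and variable, is a simple tabulation. In the sketch below, records are hypothetical dictionaries in which `None` marks a missing value.

```python
from collections import defaultdict

def missing_proportions(records, variables):
    """Proportion of admissions with a missing value, per (hospital, variable).

    records: iterable of dicts with a 'hospital' key; None marks a missing value."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        h = rec["hospital"]
        totals[h] += 1
        for v in variables:
            if rec.get(v) is None:
                missing[(h, v)] += 1
    return {(h, v): missing[(h, v)] / totals[h] for h in totals for v in variables}
```

A hospital whose missing proportion for a variable stands well above the others would be worth querying before its SHMI is interpreted.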
The Department of Health specified that deaths within 30 days of discharge should be attributed to the last admitting hospital. This assigns responsibility for mortality to the hospital that most recently cared for the patient. Theoretical concerns about this method exist. Admissions spanning several hospitals may mean care in an earlier hospital increases the risk of death in the last hospital. Poor quality care may lead to an emergency transfer to another hospital, with the potential for death in the receiving hospital. The specification could promote premature uncoordinated discharges from a hospital, as subsequent admissions may occur in a different hospital, where the patient eventually dies. Methods to account for these scenarios exist, including that used by Dr Foster, which assigns a death to all hospitals involved in an admission that spans several hospitals. In reality, however, admissions spanning several hospitals account for less than 1% of admissions. In addition, as most hospitals serve a geographical area, readmissions are often to the same hospital. This raises another problem, since a hospital that admits and discharges a patient who is then readmitted and subsequently dies within 30 days of the first admission to that hospital will have reduced its death rate, because it will have increased the number of admissions for each death. As the death rate was only 4%, however, increasing the denominator of the rate will only change the death rate by a small amount.
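The attribution rule can be sketched for a single patient as follows. The admission tuples and the function name are hypothetical. The sketch assigns the death to the most recent admission that began on or before the death date, and only if the death occurred in hospital or within 30 days of that admission's discharge. Under this rule, the patient's earlier admissions count as survivals.

```python
from datetime import date

def attribute_death(admissions, death_date, days=30):
    """Hospital to which one patient's death is attributed, or None.

    admissions: list of (hospital, admitted, discharged) tuples for one patient."""
    if death_date is None:
        return None
    prior = [a for a in admissions if a[1] <= death_date]
    if not prior:
        return None
    hospital, _admitted, discharged = max(prior, key=lambda a: a[1])
    return hospital if (death_date - discharged).days <= days else None

if __name__ == "__main__":
    stays = [("H1", date(2010, 1, 1), date(2010, 1, 5)),
             ("H2", date(2010, 1, 10), date(2010, 1, 12))]
    print(attribute_death(stays, date(2010, 1, 20)))
```

In the example, the death is attributed to H2 even though the patient was treated at H1 only days earlier, which illustrates the transfer concern discussed above.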
Identifying poorly performing hospitals
Although we investigated the properties of the SHMI, this should not be taken to suggest that we unequivocally endorse the use of hospital mortality indicators to monitor quality of care. The main question has not been addressed: does high standardised mortality imply poor care? Although some hospitals with high mortality indicators have been shown to have poor quality of care, such as Mid-Staffordshire NHS Foundation Trust, one would also need to investigate hospitals with a low SHMI and determine if they had high quality of care. It is possible that poor standards of care exist in some areas in all hospitals, and so simply targeting some hospitals at one end of the spectrum and finding care lacking does not mean that it is not lacking at the other end of the spectrum. There are many other questions, such as whether hospital mortality can ever be a good quality indicator owing to the large proportion of unavoidable deaths7; only a limited set of candidate case mix variables are available for adjustment and some important ones may be missing (the so called case mix fallacy7); standardisation depends on how reason for admission and secondary diagnoses are coded, which can lead to artificial differences between hospitals4 as well as creating the potential for gaming14; and indirectly standardised measures are used, which are not strictly comparable.23
Nevertheless, it is essential to have some method, however uncertain, to flag up potentially poorly performing hospitals that may warrant further investigation to safeguard the lives of patients. Using hospital mortality measures such as the SHMI to do this also means that the play of chance needs to be ruled out, because in any league table someone has to come top. We used funnel plots to show the variation between hospitals, and included two warning lines to highlight outlying hospitals. The calculation of these warning lines requires some choices, such as which hospitals to exclude or trim when estimating that part of the variability of the SHMI that occurs owing to the play of chance. The choice of a 10% trim is arbitrary. Reducing the percentage of hospitals excluded would increase the width of the warning lines and identify fewer outlying hospitals.
Given these uncertainties, what should be done when a hospital is identified as having an outlying SHMI? We think that there are several questions that should be asked before more detailed investigations ensue (box).
Questions to be asked of outlying hospitals
Does the outlying performance persist over time?
Is this performance sensitive to the methods used? For example, is it sensitive to how the standardisation is carried out or the weightings used?
Is it sensitive to how the control limits are calculated? Is any change in a hospital’s SHMI the result of a change in the observed death rate or the expected death rate? If the expected rate has changed are there changes in the variables used for standardisation?
Is there any corroborating evidence from related quality of care indicators?
What is already known on this topic
Several mortality indices exist, including Dr Foster’s hospital standardised mortality ratio (HSMR)
These indices are, however, based on in-hospital mortality and a subset of admissions
What this study adds
A new summary hospital mortality index (SHMI) was adopted by the NHS in October 2011 and is derived using all admissions and deaths in hospital and within 30 days of discharge
This index is transparent and based on a limited number of predictors: diagnosis on admission, age, sex, type of admission, and Charlson comorbidity score
The SHMI identified the hospitals that had already been highlighted as having a high mortality by Dr Foster
We thank members of the HSMR Technical Group who provided guidance and advice during this project, and the Department of Health Information Centre who provided the data.
Contributors: All authors designed the study. MJC, RMJ, and JF carried out the statistical analysis. JF recoded the diagnosis groups. MJC wrote the first draft of the manuscript. All authors were involved in editing consecutive drafts of the manuscript, interpreted the findings, and approved the final draft. MJC, RMJ, and JF are the guarantors.
Funding: This study was funded by the Department of Health. JF is funded by Kidney Research UK to develop performance indicators in renal services.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare that: MJC, RJ, RM, and JN had support from the Department of Health for the submitted work; no financial relationships with any other organisations that might have an interest in the submitted work in the previous three years; and no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: No additional data available.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.