Gareth J Parry a Medical Care Research Unit, School of Health and
Related Research, University of Sheffield, Sheffield S3 7XL, b Sheffield Health Economics Group, School of Health
and Related Research, c Department of Child Health, Ninewells
Hospital and Medical School, Dundee DD1 9SY
Correspondence to: Mr Parry
g.parry{at}sheffield.ac.uk
Abstract
Objective: To assess whether crude league tables of
mortality and league tables of risk adjusted mortality accurately
reflect the performance of hospitals.
Design: Longitudinal study of mortality occurring in
hospital.
Setting: 9 neonatal intensive care units in the
United Kingdom.
Subjects: 2671 very low birth weight or preterm
infants admitted to neonatal intensive care units between 1988 and
1994.
Main outcome measures: Crude hospital mortality and
hospital mortality adjusted using the clinical risk index for babies
(CRIB) score.
Results: Hospitals had wide and overlapping
confidence intervals when ranked by mortality in annual league tables;
this made it impossible to discriminate between hospitals reliably. In
most years there was no significant difference between hospitals, only
random variation. The apparent performance of individual hospitals
fluctuated substantially from year to year.
Conclusions: Annual league tables are not reliable
indicators of performance or best practice; they do not reflect
consistent differences between hospitals. Any action prompted by the
annual league tables would have been equally likely to have been
beneficial, detrimental, or irrelevant. Mortality should be compared
between groups of hospitals using specific criteria (such as
differences in the volume of patients, staffing policy, training of
staff, or aspects of clinical practice) after adjusting for risk. This
will produce more reliable estimates with narrower confidence
intervals, and more reliable and rapid conclusions.
Introduction
Publication of the United Kingdom's patient's charter1 has led to an increase in the direct comparisons of institutional performance using league tables. 2 3 The principle behind league tables, as formulated by the Department of Health, is that "performances in the public sector should be measured, and that the public have a right to know how their services are performing. ...Publication [of league tables] acts as a lever for change and a spur to better performance."4
The British government is committed to using league tables of hospital outcomes as a method of monitoring services and the implementation of best practices.5 For league tables to be used in comparing performance they must discriminate reliably and rapidly between hospitals that perform well and those that perform poorly. To act as levers for effective change, differences identified by league tables must be sufficiently stable or definitive to represent a credible argument for change.
We studied mortality in nine neonatal intensive care units over 6 years. Earlier work has shown that comparisons of hospital mortality may be unreliable unless adjustments are made for clinical risk and the severity of illness.6 We adjusted for these factors using the clinical risk index for babies (CRIB) score. 7 8 We wanted to test whether ranking hospitals by their crude or risk adjusted mortality was reliable in indicating performance.
Subjects and methods
The cohort comprised infants younger than 31 weeks' gestation or weighing less than 1500 g at birth who were admitted to one of nine neonatal intensive care units in the United Kingdom between 1988 and 1994. Gestation, birth weight, congenital malformations, and routine physiological data from the first 12 hours after birth were abstracted from clinical notes by trained researchers. Congenital malformations which were not inevitably lethal were scored according to the clinical risk index for babies as either acutely life threatening or not acutely life threatening by a consultant paediatrician (WOT-M) who was unaware of which hospital treated the infant. The clinical risk index for babies score ranges from 0 to 23; higher scores indicate increasing clinical risk and severity of illness.
All infants admitted to neonatal intensive care units were included in the analysis unless they had an inevitably lethal congenital malformation or were more than 28 days old. Infants who died before 12 hours of age had clinical risk index for babies scores calculated from physiological data recorded up until the time of death. The outcomes of infants who had been transferred between hospitals were attributed to the hospital that had provided most of the care between 12 and 72 hours after birth. Infants who died in the labour ward or in transit, before admission to a neonatal intensive care unit, were excluded from the study. Mortality was defined as death occurring in any hospital before the infant was discharged. The validity of the score in adjusting for risk was assessed using the area under the curve of the receiver operating characteristic9 and the Hosmer-Lemeshow goodness of fit test.10
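The discrimination check mentioned above can be sketched as follows. The area under the receiver operating characteristic curve equals the probability that a randomly chosen infant who died had a higher score than a randomly chosen survivor, with ties counted as one half. The function name and the scores below are illustrative assumptions, not study data.

```python
def roc_auc(scores_died, scores_survived):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for d in scores_died:
        for s in scores_survived:
            if d > s:
                wins += 1.0   # the infant who died had the higher score
            elif d == s:
                wins += 0.5   # ties count one half
    return wins / (len(scores_died) * len(scores_survived))


# Hypothetical CRIB scores, for illustration only
died = [9, 12, 7, 15]
survived = [1, 3, 7, 2, 5, 0]
auc = roc_auc(died, survived)
```

A value of 0.5 would mean the score discriminates no better than chance, and 1.0 would mean perfect separation of deaths from survivors.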
Crude league tables were formed without adjusting for case mix. League tables of risk adjusted mortality took account of the infant's initial risk of mortality in each hospital using the clinical risk index for babies score. In both crude and risk adjusted league tables the difference between observed mortality and expected mortality for every 100 infants admitted to the hospital (W score) was used as an indicator of hospital performance.11 (Information on the W score can be found in the Appendix .) A negative W score indicated that a hospital had mortality that was lower than expected, and a positive W score indicated that a hospital had higher than expected mortality. Hospitals were ranked for each year according to their W score. The mortality of the whole cohort was applied to each hospital to give the expected number of deaths for the crude league tables. For the risk adjusted league tables the values for expected mortality came from a logistic regression model that related the score on the clinical risk index for babies to hospital mortality in the whole cohort.
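As a minimal sketch of the crude calculation, the W score can be computed from the all-years counts reported in the Results for hospitals 6 and 9. The function names are illustrative assumptions; the whole-cohort mortality of 527/2671 is taken from the text.

```python
def w_score(observed_deaths, expected_deaths, admissions):
    """Excess deaths per 100 admissions; negative means fewer deaths than expected."""
    return 100.0 * (observed_deaths - expected_deaths) / admissions


COHORT_MORTALITY = 527 / 2671  # whole-cohort crude mortality, about 19.7%


def crude_w(observed_deaths, admissions):
    """Crude W score: expected deaths come from the whole-cohort mortality."""
    expected = COHORT_MORTALITY * admissions
    return w_score(observed_deaths, expected, admissions)


w_hospital_6 = crude_w(17, 111)  # lowest crude mortality over all years
w_hospital_9 = crude_w(76, 270)  # highest crude mortality over all years
```

Hospital 6 comes out negative and hospital 9 positive, matching the sign convention described above. A risk adjusted version would replace the expected deaths with the sum of each infant's predicted probability of death.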
To compare performance it was necessary to determine whether outcomes were significantly different between hospitals and which hospitals performed significantly better or worse than expected. Logistic regression was used in both the crude and adjusted tables to indicate whether significant differences existed in hospital mortality by including a term for hospital. For the risk adjusted league tables, the clinical risk index for babies score was included before the term for hospital in the regression. In years where the term for hospital was significant, hospitals embodying best practice were identified as those where W scores had 95% confidence intervals wholly less than 0.11 Hospitals where W scores had 95% confidence intervals wholly more than 0 were identified as performing poorly.
To test whether performance at individual hospitals was consistent over time, W scores were included in a two way analysis of variance with random time and hospital effects.12 Performance was consistent if the variation of W scores between hospitals was greater than variation within a hospital. Hospitals were numbered according to their rankings in the first of the risk adjusted league tables.
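The idea of comparing variation between hospitals with variation within a hospital can be sketched with a balanced two way analysis of variance (one W score per hospital per year). This is a simplification under stated assumptions: the study had missing years for one hospital and its exact model is not reproduced here, the variance components use the simple method of moments, and the W scores below are hypothetical.

```python
def two_way_anova(w):
    """w[h][t] is the W score of hospital h in year t (balanced, no replication)."""
    n_hosp, n_year = len(w), len(w[0])
    grand = sum(sum(row) for row in w) / (n_hosp * n_year)
    hosp_means = [sum(row) / n_year for row in w]
    year_means = [sum(w[h][t] for h in range(n_hosp)) / n_hosp for t in range(n_year)]

    ss_hosp = n_year * sum((m - grand) ** 2 for m in hosp_means)
    ss_year = n_hosp * sum((m - grand) ** 2 for m in year_means)
    ss_total = sum((x - grand) ** 2 for row in w for x in row)
    ss_resid = ss_total - ss_hosp - ss_year

    ms_hosp = ss_hosp / (n_hosp - 1)
    ms_year = ss_year / (n_year - 1)
    ms_resid = ss_resid / ((n_hosp - 1) * (n_year - 1))

    # Method-of-moments variance components, truncated at zero
    var_hosp = max(0.0, (ms_hosp - ms_resid) / n_year)
    var_year = max(0.0, (ms_year - ms_resid) / n_hosp)
    total = var_hosp + var_year + ms_resid
    return {
        "f_hospital": ms_hosp / ms_resid if ms_resid > 0 else float("inf"),
        "share_between_hospitals": var_hosp / total if total > 0 else 0.0,
    }


# Hypothetical W scores: 3 hospitals (rows) by 4 years (columns)
example = [[-2, 1, 0, -3], [1, -1, 2, 0], [4, 2, 5, 1]]
result = two_way_anova(example)
```

A small share of variation between hospitals means that a hospital's W score in one year says little about its W score in the next.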
Results
Data were obtained for 2671 infants admitted between 1 July 1988 and 30 June 1994. Infants were assigned to six annual groups beginning in July 1988. No data were available for hospital 8 in years 5 and 6. Mean birth weight was 1164 (SD 295) g, and mean gestation was 29 (SD 3) weeks (table 1). The total number of annual admissions for all hospitals combined ranged from a minimum of 389 in year 1 to a maximum of 490 in year 3 (table 2). Crude mortality for all hospitals in the whole cohort was 19.7% (527/2671). For all years crude hospital mortality ranged from a low of 15.3% (17/111) for hospital 6 to a high of 28.1% (76/270) for hospital 9.
League tables of crude mortality
The overall mortality rate of 19.7% was assigned as the expected
mortality in calculating W scores for the league tables of crude
mortality (fig 1). The term for hospital was significantly associated
with mortality only in years 3 and 5. In years 1, 3, 4, and 5 at least
one hospital performed either worse or better than expected. In the
random effects model the term for hospital was significant in
explaining crude W scores over the whole period
($F_{8,43}=2.164$, P=0.0498). Variation between hospitals
accounted for only 17% (95% confidence interval 0% to 42%) of the
total variation. Most of the variation in crude mortality was accounted
for within, but not between, hospitals. Crude W scores for each
hospital were unstable over time. When all years were combined the term
for hospital was significantly associated with mortality, and hospital
9 performed worse than expected.
Risk adjusted league tables
When the clinical risk index for babies score was fitted to
all infants seen over the six year period the logistic regression model
gave the probability of mortality (p) as:
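$$p = \frac{e^{G}}{1+e^{G}}$$
where $G = -3.492 + 0.372 \times \text{CRIB}$, CRIB is the clinical risk index for babies score, and $e$ is the exponential constant.
The Hosmer-Lemeshow goodness of fit test gave a χ² value of 15.7 (df=9, P=0.074), providing
evidence of a satisfactory fit.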
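The fitted model can be evaluated directly for any score. The sketch below takes the intercept as negative (−3.492), the sign that gives a low predicted mortality at a score of 0; the function and constant names are illustrative assumptions.

```python
import math

INTERCEPT = -3.492  # assumed negative; a positive value would imply ~97% mortality at score 0
SLOPE = 0.372       # change in log odds of death per CRIB point


def mortality_probability(crib_score):
    """Predicted probability of hospital death for a given CRIB score (0 to 23)."""
    g = INTERCEPT + SLOPE * crib_score
    return math.exp(g) / (1.0 + math.exp(g))


p_lowest = mortality_probability(0)    # lowest possible score
p_highest = mortality_probability(23)  # highest possible score
```

Under these coefficients the predicted risk rises from about 3% at a score of 0 to more than 99% at a score of 23.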
In the annual risk adjusted league tables the 95% confidence intervals
were wide and overlapping (fig 2). Risk adjusted mortality differed
between hospitals only for year 1, when hospital 9 bordered on higher
than expected hospital mortality, and year 3, when hospital 1 had lower
than expected mortality. The 95% confidence intervals for the
apparently best performing and worst performing hospitals overlapped in
every year except year 3. When all years were combined there was
significant variation in risk adjusted outcomes between hospitals, and
hospital 1 had lower than expected risk adjusted hospital
mortality.
Discussion
Problems with annual league tables
There are three fundamental problems with compiling annual league
tables of the performance of individual hospitals. The first problem is
the need to make accurate adjustments for differences in case mix. The
second is the uncertainty that occurs no matter how accurate the
adjustment for case mix when estimates of outcome are made for
hospitals that treat relatively small numbers of patients each
year.6 7 13 14 The third problem is the lack of
consistency in the apparent performance of hospitals over time.
Goldstein and Spiegelhalter have described the problems of accounting
for case mix and estimates of outcome in hospitals that see small
numbers of patients15; our study confirms their results
and illustrates the inconsistency in the apparent performance of
hospitals over time.
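The effect of small numbers can be illustrated with the crude standard error given in the Appendix. The sketch below computes the half-width of the 95% confidence interval for a crude W score at several unit sizes; the sizes are hypothetical, although roughly 50 admissions a year is what 389 to 490 annual admissions shared among nine units would imply on average.

```python
import math


def crude_ci_half_width(admissions, p=0.197):
    """Half-width of the 95% CI for a crude W score, in deaths per 100 admissions."""
    se = 100.0 * math.sqrt(admissions * p * (1.0 - p)) / admissions
    return 1.96 * se


# Hypothetical numbers of admissions contributing to one W score
for n in (50, 100, 300):
    print(n, round(crude_ci_half_width(n), 1))
```

Because the half-width falls only with the square root of the number of admissions, an interval based on about 50 infants spans more than 20 deaths per 100 admissions in total, which is wider than the gap between the best and worst crude mortality observed over all years.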
Alternatives to annual league tables
Instead of outcomes, it may be more useful to compare the
implementation of measures of process which have been proved to be
accurate in randomised controlled trials. Mant and Hicks showed that
comparing the use of proved treatments for myocardial infarction would
substantially reduce the number of patients and the time required to
identify significant differences in hospital
performance.17 However, comparisons of the proportions of
patients treated are only valid if patients in different hospitals are
equally eligible to receive the treatments under comparison at all
hospitals.18
Mortality could also be compared between groups of hospitals using
specific criteria (such as the volume of
patients, the staffing levels, or the training and expertise of
staff) after adjusting for risk.21-23 This approach,
which has been adopted in the UK Neonatal Staffing
Study,22 23 would identify more reliably and rapidly the
organisational characteristics likely to improve outcome. This would
allow institutions to become more accountable to the public by showing
that their policies are based on reliable evidence. Our findings may
apply to other areas where the use of mortality league tables is being
considered,5 such as the ranking of individual hospitals
by annual rates of infection, surgical complications, or surgical
mortality. Given that league tables did not seem effective in
evaluating neonatal intensive care units, those who support the use of
mortality league tables must now show why they might be useful
elsewhere.5 25
Acknowledgments
We thank all staff who participated in the study for their support and dedicated care; and Janet Tucker, Jon Nicholl, and Ian Crombie for their comments.
The Medical Research Council cited the clinical risk index for babies (CRIB) score as a scientific achievement in 1994.
Contributors: GJP planned and performed the analyses, and coordinated the writing of the paper. CRG assisted with the collection, management, and analysis of data; and contributed to writing the paper. CJM advised on the analysis of the data and contributed to writing the paper. WOT-M was grant holder, coordinated the data collection, advised on clinical analysis and interpretation, and contributed to writing the paper. All authors are guarantors for the paper.
Funding: This work was funded by Action Research, the Wellcome Trust, the Chief Scientist Organisation and the Clinical Resource and Audit Group at the Scottish Office, the Medical Research Council, and the NHS Executive Mother and Child Health Programme.
Conflict of interest: None.
Appendix: The W score
The use of the W score has become established in the
presentation of comparisons between observed and expected mortality
after trauma.11 In this study the W score is calculated as
follows:
$$W = 100 \times \frac{\text{observed deaths} - \text{expected deaths}}{\text{No of admissions}}$$
For the crude tables, expected deaths are 19.7% of the annual number of admissions in each hospital. For the adjusted tables, expected deaths are predicted by a logistic regression model based on the score on the clinical risk index for babies.
The 95% confidence interval for the W score is given as:
$$W \pm 1.96\,\mathrm{SE}(W), \quad \text{where} \quad \mathrm{SE}(W) = \frac{100}{n}\sqrt{\sum p_i(1-p_i)}$$
$n$ = No of admissions; $\sum$ implies the sum over all infants in each hospital in each year; and $p_i$ = probability of death in hospital for infant $i$.
For infant $i$ in the crude tables: $p_i = 0.197$. For infant $i$ with a score on the clinical risk index for babies (CRIB) of $\text{CRIB}_i$, in the adjusted tables:
$$p_i = \frac{e^{G_i}}{1+e^{G_i}}$$
where $G_i = -3.492 + 0.372 \times \text{CRIB}_i$ and $e$ = exponential constant.
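The Appendix formulas can be combined into one short sketch that returns a W score with its 95% confidence interval and labels the hospital according to whether the interval lies wholly below or above 0. The function names and labels are illustrative assumptions; the example uses the crude all-years counts for hospitals 9 and 6 from the Results, with every infant assigned a probability of death of 0.197. The Methods apply this labelling only in years where the term for hospital was significant, a check this sketch does not perform.

```python
import math


def w_confidence_interval(observed_deaths, death_probs):
    """Return (W, lower, upper) given each admitted infant's probability of death."""
    n = len(death_probs)
    expected = sum(death_probs)
    w = 100.0 * (observed_deaths - expected) / n
    se = 100.0 * math.sqrt(sum(p * (1.0 - p) for p in death_probs)) / n
    return w, w - 1.96 * se, w + 1.96 * se


def classify(lower, upper):
    """Label a hospital by whether its 95% CI for W excludes 0."""
    if upper < 0:
        return "lower than expected mortality"
    if lower > 0:
        return "higher than expected mortality"
    return "not distinguishable from expected"


# Crude tables, all years combined: p_i = 0.197 for every infant
w9, low9, high9 = w_confidence_interval(76, [0.197] * 270)
w6, low6, high6 = w_confidence_interval(17, [0.197] * 111)
label9 = classify(low9, high9)
label6 = classify(low6, high6)
```

For the adjusted tables, the list of probabilities would instead hold each infant's value of $p_i$ from the logistic model.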