Surgeon specific mortality in adult cardiac surgery: comparison between crude and risk stratified data
BMJ 2003; 327 doi: http://dx.doi.org/10.1136/bmj.327.7405.13 (Published 03 July 2003) Cite this as: BMJ 2003;327:13
- Ben Bridgewater, consultant cardiac surgeon1,
- Anthony D Grayson, regional clinical information analyst2,
- Mark Jackson, head of clinical governance2,
- Nicholas Brooks, consultant cardiologist1,
- Geir J Grotte, consultant cardiac surgeon3,
- Daniel J M Keenan, consultant cardiac surgeon3,
- Russell Millner, consultant cardiothoracic surgeon4,
- Brian M Fabri, consultant cardiac surgeon2,
- Mark Jones, consultant cardiothoracic surgeon1
- 1South Manchester University Hospital, Manchester M23 9LT
- 2Cardiothoracic Centre, Liverpool L14 3PE
- 3Manchester Royal Infirmary, Manchester M13 9WL
- 4Blackpool Victoria Hospital, Blackpool FY3 8NR
- Correspondence to: Ben Bridgewater
- Accepted 17 June 2003
Objective As a result of recent failures in clinical governance the government has made a commitment to bring individual surgeons' mortality data into the public domain. We have analysed a database to compare crude mortality after coronary artery bypass surgery with outcomes that were stratified by risk.
Design Retrospective analysis of prospectively collected data.
Setting All NHS centres in the geographical north west of England that undertake cardiac surgery in adults.
Participants All patients undergoing isolated bypass graft surgery for the first time between April 1999 and March 2002.
Main outcome measures Surgeon specific postoperative mortality and predicted mortality by EuroSCORE.
Results 8572 patients were operated on by 23 surgeons. Overall mortality was 1.7%. Observed mortality between surgeons ranged from 0% to 3.7%; predicted mortality ranged from 2% to 3.7%. Eighty five per cent (7286) of the patients had a EuroSCORE of 5 or less; 49% of the deaths were in this lower risk group. A large proportion of the variability in predicted mortality between surgeons was due to a small but differing number of high risk patients.
Conclusions It is possible to collect risk stratified data on all patients undergoing coronary bypass surgery. For most the predicted mortality is low. The small proportion of high risk patients is responsible for most of the differences in predicted mortality between surgeons. Crude comparisons of death rates can be misleading and may encourage surgeons to practise risk averse behaviour. We recommend a comparison of death rates that is stratified by risk and based on low risk cases as the national benchmark for assessing consultant specific performance.
There is an unstoppable momentum towards the publication of surgeon specific mortality as part of the initiative to generate greater accountability and transparency in the NHS. This has been triggered by failures of clinical governance in health care and is tied in with political initiatives about patients' choice. The planned date for publication of surgeon specific data in the United Kingdom is 2004, and although it has been accepted by the secretary of state for health that any such data should be robust, validated, and stratified for case mix to allow meaningful comparisons to be made, this type of dataset does not yet exist for all hospitals and surgeons.1
The Society of Cardiothoracic Surgeons of the United Kingdom and Ireland had been planning to undertake an analysis on low risk patients, but because of a lack of an appropriate dataset it is now planning to publish individual surgeons' crude mortality data later this year.2 Two possible datasets could be analysed in the United Kingdom: hospital episode statistics, which are known to be inaccurate at the level of individual clinicians, and returns of crude mortality that have been made to the Society of Cardiothoracic Surgeons of the United Kingdom and Ireland's annual register on the basis of individual surgeons since 1997. Neither dataset has been subjected to rigorous validation, and the society's returns have no mechanism for risk stratification. Although hospital episode statistics can be partly adjusted for case mix by age, sex, and urgency, these are known to be only a few of the many patient specific factors that contribute to predicted operative mortality.3–5 The Society of Cardiothoracic Surgeons has collected a more comprehensive dataset since 1996,1 but this has been voluntary, and not all hospitals and surgeons have contributed. It cannot be used for a comprehensive comparative analysis in the United Kingdom.
We have a long track record of collecting cardiac surgery audit data and validating risk prediction models in Manchester.6–8 Recently we have collected a full dataset on all patients undergoing adult cardiac surgery in NHS institutions in the north west of England since April 1997.8 9 We have analysed this database to explore differences between crude mortality and risk stratified results for surgeon specific publication.
The northwest quality improvement programme in cardiac interventions is a regional consortium involving all four NHS centres that perform adult cardiac surgery and percutaneous coronary interventions in the northwest of England (Blackpool Victoria Hospital, the Cardiothoracic Centre in Liverpool, Manchester Royal Infirmary, and South Manchester University Hospital). The aim of the group is to improve continuously the quality of care for patients receiving cardiac interventions by using a multicentred approach based on studies of institutional systems.9
We collected data prospectively on a total of 8572 consecutive patients undergoing isolated coronary artery bypass graft surgery for the first time between 1 April 1999 and 31 March 2002 in the north west of England. Data collection methods and definitions have been described in detail previously.8 9 Each patient had a dataset collected, which included data from before and after the operation, to enable a predicted mortality to be calculated. Data were collected in each institution and returned to a central source for analysis. Each centre conducted validation of activity and analysis. Mortality was defined as any in-hospital death. Every patient's record contained an anonymised identifier for each consultant surgeon. Data were analysed for all consultants who were operating in the region on 1 April 2003.
The specific questions we addressed were:
What was the overall mortality?
What was the distribution of patients according to predicted operative risk?
Were there differences in predicted mortality between surgeons?
Were there differences in observed mortality between surgeons?
Could the dataset be used to stratify according to case mix to allow meaningful comparisons to be made?
After appropriate analysis did the death rates between surgeons differ significantly?
Categorical data are shown as a percentage whereas continuous data are shown as a mean with a range. We determined crude mortality for each surgeon. We calculated predicted mortality for each patient by using the additive EuroSCORE,10 a scoring system derived from an analysis of 19 000 patients throughout Europe that was reported in 1999. The EuroSCORE ascribes additive points to several risk factors related to the patient and the procedure, to generate a predicted mortality for each patient. It has been shown to be a good overall predictor of mortality for both European and North American surgery.11 12 If a patient related factor necessary to calculate the EuroSCORE was missing in the record that factor was assumed to be absent (this occurred in less than 2% of cases). We examined the distribution of patients in each EuroSCORE group and compared the predicted mortality for each surgeon. Owing to the non-normal distribution of predicted mortality between surgeons, we determined variability by the interquartile range. We used the EuroSCORE to determine low (≤ 5) and high (> 5) risk groups10 and compared observed mortality and variability between the two groups. We calculated the C statistic (equivalent to the area under the receiver operating characteristic curve) to assess the performance of the EuroSCORE. A C statistic of greater than 0.7 indicates a reasonable ability to discriminate between patients who died and those who did not.13 14 We calculated the C statistic for the total population and for the low and high risk groups separately. We examined surgeon specific mortality in the low risk group by comparing each surgeon's death rates, plotted with 95% confidence intervals, against the mean performance for the region in this group of patients. We analysed the effect of volume of cases on mortality in the low risk group by the χ2 test for trend after rank ordering surgeons and categorising them as either low, middle, or high volume thirds.
We used SAS for Windows version 8.2 to perform all analyses.
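As an illustration of what the C statistic measures, the following is a minimal sketch in Python rather than the SAS procedure used for the study. It computes the proportion of (died, survived) patient pairs in which the patient who died had the higher score, counting ties as half. The function name and the example scores are hypothetical and are not data from the study.

```python
from itertools import product

def c_statistic(scores, died):
    """Concordance: chance that a patient who died scored higher than a survivor.

    Tied scores count as half a concordant pair.
    """
    dead = [s for s, d in zip(scores, died) if d]
    alive = [s for s, d in zip(scores, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for score_dead, score_alive in product(dead, alive):
        if score_dead > score_alive:
            concordant += 1.0
        elif score_dead == score_alive:
            concordant += 0.5
    return concordant / (len(dead) * len(alive))

# Hypothetical additive scores and outcomes (1 = died); not study data.
example_scores = [1, 2, 3, 3, 5, 6, 8, 9]
example_died = [0, 0, 0, 1, 0, 0, 1, 1]
print(c_statistic(example_scores, example_died))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is why a threshold of 0.7 is read as reasonable discrimination.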
Description of the patients
A total of 8572 patients were included in the study. A summary of the incidence of risk factors is given in table 1. Altogether 144 patients died, which is a death rate of 1.7%. The average number of cases per consultant was 372 (range 158 to 598). Not all surgeons were operating throughout the three year period, which accounts for much of this variability. Figure 1 shows observed mortality for the surgeons, which ranged from 0% to 3.7%.
Predicted and observed mortality
Figure 2 shows the number of patients in each EuroSCORE group. Predicted mortality in most patients was low. Figures 3 and 4 show the number of deaths and observed percentage mortality in each EuroSCORE group. A large number of patients were in the low EuroSCORE groups, but the percentage mortality was low (less than 2%). In general, observed mortality increased with increasing EuroSCORE. For lower risk patients the EuroSCORE overpredicted observed mortality. In EuroSCORE groupings of 14 and above, the observed mortality was substantially in excess of the EuroSCORE. The EuroSCORE was a good predictor of overall mortality as shown by a C statistic of 0.75. The mean predicted mortality was 3.0% (range 2.0% to 3.7%), indicating a difference of nearly 100% between surgeons at the outer limits of the group. The overall variability between surgeons was high, as shown by an interquartile range of 0.64.
Comparison of low and high risk patients
Eighty five per cent of the total number of patients had a EuroSCORE of 5 or less. Almost half of all observed deaths were in the low risk group (49%). The remaining 51% of deaths were in the 15% of cases in the high risk group (EuroSCORE > 5). The proportion of individual surgeons' practices that are high risk ranged from 5.6% to 23.9%. For the low risk group the observed mortality was 1.0% (range 0% to 2.9% between surgeons), with a mean predicted mortality of 2.3% (range 1.7% to 2.7% between surgeons). For the high risk group the observed mortality was 5.7% (range 0% to 13.6% between surgeons), with a mean predicted mortality of 7.4% (range 6.6% to 8.3% between surgeons). The C statistic indicating predictive ability of the EuroSCORE for the low risk and high risk groups was 0.72 and 0.62, respectively, indicating a satisfactory predictive ability for low risk patients but an unsatisfactory ability for those having high risk surgery.13 14
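The group death rates above can be checked against the totals reported elsewhere in the paper. The short Python sketch below does this arithmetic. The patient and death totals are taken from the text, but the number of deaths in each group is an assumption: it is back-calculated from the rounded 49% share and is not reported directly.

```python
# Totals reported in the paper.
total_patients = 8572
low_risk_patients = 7286
total_deaths = 144

# Assumption: group death counts are reconstructed from the rounded 49% share.
low_risk_deaths = round(0.49 * total_deaths)
high_risk_deaths = total_deaths - low_risk_deaths
high_risk_patients = total_patients - low_risk_patients

low_rate = 100 * low_risk_deaths / low_risk_patients
high_rate = 100 * high_risk_deaths / high_risk_patients
print(f"low risk: {low_risk_deaths} deaths, {low_rate:.1f}%")
print(f"high risk: {high_risk_deaths} deaths, {high_rate:.1f}%")
```

Under this assumption the reconstructed rates round to the reported 1.0% and 5.7%.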
The variability in predicted mortality between surgeons according to the interquartile range in the low and high risk patients was 0.32 and 0.67, respectively, showing that the low risk patients in each surgeon's practice are a relatively homogeneous group, but there is much greater variation between surgeons in the high risk population. Figure 5 shows for the low risk patients that the 95% confidence intervals around mortality for each surgeon operating in the north west overlap the mean mortality for the region, indicating no surgeon is experiencing mortality results that are different from the peer group. We found a strong univariate association between the volume of operations that each surgeon had performed and observed mortality in the low risk group (P < 0.001) (table 2).
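The comparison of each surgeon's confidence interval with the regional mean can be sketched as follows. The paper does not state which interval method was used, so the Wilson score interval here is an assumption, chosen because it behaves sensibly when a surgeon has few or no deaths. The function names and the surgeon's counts are hypothetical.

```python
import math

def wilson_interval(deaths, cases, z=1.96):
    """Wilson score interval for a proportion; z = 1.96 gives about 95% coverage."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    p = deaths / cases
    denom = 1 + z * z / cases
    centre = (p + z * z / (2 * cases)) / denom
    half = z * math.sqrt(p * (1 - p) / cases + z * z / (4 * cases * cases)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def overlaps_mean(deaths, cases, regional_rate):
    """True if the surgeon's interval contains the regional mean rate."""
    lower, upper = wilson_interval(deaths, cases)
    return lower <= regional_rate <= upper

# Hypothetical surgeon: 3 deaths in 300 low risk cases, against a 1.0% regional rate.
print(wilson_interval(3, 300))
print(overlaps_mean(3, 300, 0.010))
```

With a few hundred low risk cases per surgeon the intervals are wide relative to a mean of about 1%, so overlap with the regional mean is expected unless a surgeon's rate is several times higher.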
It is possible routinely to collect risk stratified data on all patients undergoing surgery in a defined geographical area in the United Kingdom. Crude mortality analyses may be misleading as variations in the proportion of individual surgeons' practices that are high risk are marked. An accepted risk prediction model is poor at predicting mortality in this high risk population. The low risk group was relatively homogeneous between surgeons, and we recommend a comparative analysis based on low risk cases without the need for further risk adjustment.
Strengths and weaknesses of the study
Our study has been conducted on a large population of patients undergoing surgery over three years. The average number of cases was 372 per surgeon, which is a reasonable size to allow comparisons to be made. The study has been conducted in the north west of England and includes all patients undergoing surgery in NHS hospitals in a defined geographical area.8 9 This is about one eighth of all cardiac surgical activity in the United Kingdom. The data have the confidence of clinicians, which should reassure patients that benchmarking between surgeons is meaningful, helps surgeons believe any differences that emerge, and encourages changes in practice to be made where necessary. This project shows that where there is clinical and management commitment, collecting robust, comprehensive data is possible and useful.
The dataset we have used undergoes local validation in each centre but has not been subjected to external validation, which is a weakness of our study. It has been shown previously that some problems arise with the completeness and reliability of this type of data.15 We have addressed issues of incomplete data by assuming that any risk factor that has a missing field is negative for that risk factor. The incidence of missing data in our study was less than 2%, but this would lead to a small overall underestimate of predicted risk.
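The direction of this bias can be shown with a small Python sketch of an additive score in which a missing field is treated as absent. The weights below are illustrative only and are not the published EuroSCORE table. The field names and patient records are also hypothetical.

```python
# Illustrative weights only; these are NOT the published EuroSCORE weights.
ILLUSTRATIVE_WEIGHTS = {
    "female": 1,
    "unstable_angina": 2,
    "poor_lv_function": 3,
}

def additive_score(record, weights=ILLUSTRATIVE_WEIGHTS):
    """Sum the points for factors recorded as present.

    A field that is missing or None adds nothing, mirroring the
    'missing means absent' rule, so gaps can only lower the score.
    """
    return sum(points for factor, points in weights.items()
               if record.get(factor) is True)

# Same hypothetical patient, with and without one field recorded.
complete = {"female": True, "unstable_angina": False, "poor_lv_function": True}
with_gap = {"female": True, "unstable_angina": False, "poor_lv_function": None}

print(additive_score(complete), additive_score(with_gap))
```

Because every weight is positive, a record with gaps can never score higher than the complete record, which is why the rule can only understate predicted risk.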
We have used low and high risk groupings to allow meaningful comparisons to be made10; the low risk group contains most patients and has a low variability in predicted risk between surgeons. Using the low risk group for mortality analysis and benchmarking excludes higher risk patients from comparisons. Clinically, high risk patients are a heterogeneous group, ranging from stable patients with multiple comorbidities to patients who come to surgery as emergencies, often directly from the cardiac catheter laboratory. We have not compared surgeons' death rates in the high risk group as predicted mortality differs between surgeons, the proportion of individual surgeons' patients who are high risk varies, and the EuroSCORE is a poor predictor for this population. However, half of the deaths in the population of patients are seen in this high risk group, and politicians and the public may be wary of excluding this many deaths from comparative analysis. It is also possible that by not analysing the high risk group we may be losing important messages about performance, which may be useful for improving quality.
Strengths and weaknesses compared with other studies
Healthcare outcomes can be benchmarked in several different ways. An approach that has been suggested and used elsewhere is to allocate a predicted mortality for each surgeon on their total practice of coronary artery surgery by using a mortality prediction tool, and comparing predicted with observed mortality to generate an adjusted death rate.16 In our study patients in the highest EuroSCORE groups (14 and above) have an observed mortality in excess of 50% (fig 4), and a small number of these patients in a surgeon's practice would affect their adjusted mortality adversely. This number varies markedly between surgeons. Because the EuroSCORE is a poor predictor in the high risk group as shown by the C statistic, we think that using adjusted death rates may produce erroneous conclusions.
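To show how a few underpredicted high risk patients can shift an adjusted death rate, the sketch below uses one common formulation: observed deaths divided by expected deaths, scaled by the regional rate. This is an assumption about the general approach and is not necessarily the exact method of the cited work. All patient numbers are hypothetical, and an additive score of 14 is treated as a 14% predicted risk.

```python
def risk_adjusted_rate(observed_deaths, predicted_risks, regional_rate):
    """Observed over expected deaths, scaled by the regional death rate."""
    expected_deaths = sum(predicted_risks)
    if expected_deaths <= 0:
        raise ValueError("expected deaths must be positive")
    return observed_deaths / expected_deaths * regional_rate

REGIONAL_RATE = 0.017  # overall mortality reported in the study

# Hypothetical surgeon: 300 patients at 2% predicted risk, 6 deaths (as expected).
base_risks = [0.02] * 300
base_rate = risk_adjusted_rate(6, base_risks, REGIONAL_RATE)

# Add 4 hypothetical patients scored 14 (predicted 14% each), 2 of whom die.
with_high_risk = base_risks + [0.14] * 4
shifted_rate = risk_adjusted_rate(8, with_high_risk, REGIONAL_RATE)

print(f"{base_rate:.4f} {shifted_rate:.4f}")
```

In this example four extra patients out of 304 raise the adjusted rate by about a fifth, because half of them die when the score predicted about one in seven.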
Although the EuroSCORE is generally regarded as a good overall predictor of mortality for patients undergoing heart surgery,10–12 it has been noticed previously that it underpredicts risk in high risk patients,17 but the effects of this observation on the publication of surgeon specific mortality have not been described. The EuroSCORE working group have addressed underprediction of the additive score by producing a logistic regression model,17 the logistic EuroSCORE, which may be a better predictor in high risk patients, but this has not yet been fully validated and is not widely used. We studied the widely used additive model for our investigation, but failure to examine all available predictive scoring systems is a further limitation of this work.
Several studies have looked at outcomes of individual surgeons or institutions and their relation to volume of surgery.18–20 Some of these have been on crude mortality data and others corrected for case mix. Although we have observed a strong association between volume and outcome in our data, we did not design our investigation to look for this as a primary end point. We believe that this observation should be treated with caution as there are numerous possible effects, including time and learning curve effects, which were not controlled for by our study design.
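For readers unfamiliar with the test behind the volume association, a χ2 test for linear trend across ordered groups can be sketched in Python as below. This is an illustration under stated assumptions and not the SAS analysis used in the study: the groups are scored 0, 1, 2 for low, middle, and high volume thirds, and the counts are hypothetical.

```python
import math

def chi2_trend(deaths, cases, scores=None):
    """Chi-square test for linear trend in proportions (1 degree of freedom)."""
    if scores is None:
        scores = list(range(len(cases)))  # equally spaced group scores
    n_total = sum(cases)
    d_total = sum(deaths)
    p = d_total / n_total  # pooled death rate
    if p in (0.0, 1.0):
        raise ValueError("trend is undefined when all or no patients died")
    # Score-weighted sum of observed minus expected deaths.
    t = sum(x * (d - n * p) for x, d, n in zip(scores, deaths, cases))
    sum_nx = sum(n * x for n, x in zip(cases, scores))
    sum_nx2 = sum(n * x * x for n, x in zip(cases, scores))
    variance = p * (1 - p) * (sum_nx2 - sum_nx ** 2 / n_total)
    chi2 = t * t / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical low risk cases and deaths in low, middle, and high volume thirds.
cases_by_third = [1000, 2000, 3000]
deaths_by_third = [20, 20, 15]
print(chi2_trend(deaths_by_third, cases_by_third))
```

The test is univariate: it does not adjust for time in post or learning curve effects, which is the reason for the caution expressed above.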
What is already known on this topic
The release of surgeon specific mortality data for coronary artery bypass surgery in the United Kingdom is planned for 2004
Outcomes after surgery are known to depend on several patient related factors
Currently no dataset is available to allow an appropriately risk stratified comparison of all surgeons in the United Kingdom
Proposed analyses would be undertaken on crude mortality data
What this study adds
It is possible to collect risk stratified mortality data on all patients undergoing coronary artery bypass surgery in a defined geographical area in a multicentre study
Most patients have a low predicted mortality
Predicted mortality differs between surgeons, which is largely due to differing proportions of high risk patients
An accepted risk tool is not good at predicting mortality in the high risk group
Crude mortality comparisons can be misleading and may encourage surgeons to practise risk averse behaviour
Risk stratified analyses should be encouraged as the basis for assessing consultant specific performance
Meaning of the study
We believe that publishing surgeon specific, crude mortality data,2 as is planned in the United Kingdom, is not in the best interests of patients, and our study shows that surgeons cannot be compared fairly in this way. Cardiac surgeons already work in a stressful environment, and the perception that a “bad run” might jeopardise their career or result in suspension and investigation may lead to a tendency to turn down high risk cases. The easiest way to obtain low mortality is to do only straightforward operations—so called risk averse behaviour. This has already been identified as a potential problem after a survey of all cardiac surgeons in the United Kingdom in 2000, where 94% of responders agreed that high risk patients were being turned down for surgery.1 Death rates in these patients often approach 100% if the patients are denied surgery and patients at heightened risk from surgery are, in general, those who have the most to gain from a successful operation.21 Our recommendation of benchmarking only low risk patients seems scientifically justified and pragmatic and should help to prevent risk averse behaviour.
Unanswered questions and future research
Some evidence from North America sheds light on the effects of publication of surgeon specific data on patients, cardiologists, and surgeons,1 22 23 but we do not know to what extent initiatives to publish crude mortality data for individual surgeons will actually deny operations to high risk patients, and what implications this will have on patients' survival, quality of life, and use of healthcare resources. This is an important area for future studies. Further investigations are also needed on high risk patients, to improve the quality of risk prediction in this group, and to understand variability in outcomes following high risk surgery for quality improvement purposes.
This study has been conducted on behalf of the North West Quality Improvement Programme in Cardiac Interventions, and the participating consultant surgeons are listed as follows: John Au, Ben Bridgewater, Colin Campbell, John Carey, John Chalmers, Walid Dhimis, Abdul Deiraniya, Andrew Duncan, Brian Fabri, Elaine Griffiths, Geir Grotte, Ragheb Hasan, Tim Hooper, Mark Jones, Daniel Keenan, Neeraj Mediratta, Russell Millner, Nick Odom, Brian Prendergast, Mark Pullan, Abbas Rashid, Paul Waterworth, Nizar Yonan. We would like to acknowledge the assistance of the audit officers working in each centre for their hard work in collecting and validating the data.
Contributors BB had the idea for the study and with ADG and MJ was responsible for the study design. Data analysis was performed by ADG and MJ. The manuscript was prepared by BB and ADG. All authors contributed to writing the paper, which was written on behalf of the North West Quality Improvement Programme in Cardiac Interventions. BB will act as guarantor.
Funding All primary care trusts in the north west of England.
Competing interests None declared.
Ethical approval The project was conducted on routinely collected prospective data. All patient identifiers were anonymised. The study therefore did not need ethical approval.