The legacy of Bristol: public disclosure of individual surgeons' results
BMJ 2004; 329 doi: https://doi.org/10.1136/bmj.329.7463.450 (Published 19 August 2004) Cite this as: BMJ 2004;329:450
- Bruce Keogh, president elect1,
- David Spiegelhalter, statistical adviser1,
- Alan Bailey, data coordinator, UK cardiac surgical register2,
- James Roxburgh, secretary1,
- Patrick Magee, president1,
- Colin Hilton, immediate past president1
- 1 Society of Cardiothoracic Surgeons of Great Britain and Ireland, Royal College of Surgeons of England, London WC2A 3PE
- 2 The Limes, Charfield, Wotton-under-Edge GL12 8SR
- Correspondence to: B Keogh
- Accepted 24 June 2004
After the General Medical Council hearings and the subsequent Bristol Royal Infirmary Inquiry into paediatric cardiac deaths, cardiac surgeons expected a stinging attack on British cardiac surgical practice. What emerged instead, in 2001, was a comprehensive report highlighting many of the difficulties facing frontline clinicians and managers in the NHS.1
The story of the paediatric cardiac surgical service in Bristol is not an account of bad people. Nor is it an account of people who did not care, nor of people who wilfully harmed patients. It is an account of people who cared greatly about human suffering, and were dedicated and well-motivated. Sadly, some lacked insight and their behaviour was flawed. Many failed to communicate with each other, and to work together effectively for the interests of their patients. There was a lack of leadership, and of teamwork. It is an account of healthcare professionals who were victims of a combination of circumstances which owed as much to general failings in the NHS at the time than to any individual failing.1
The report included 198 recommendations, of which two stated that patients must be able to obtain information on the relative performance of the trust and of consultant units within the trust. This led to an increasing belief that the interests of the public and patients would be served by publication of individuals' surgical performance in the form of postoperative mortality. A precedent for this existed in the United States, where in 1990, the New York Department of Health published mortality statistics for coronary surgery for all hospitals in the state, and has published comparable data each year since.2 A newspaper, Newsday, successfully sued the department under the state's Freedom of Information Law to gain access to surgeon specific data on mortality, which the newspaper published in December 1991, evoking a hostile response from surgeons. New Jersey and Pennsylvania states have also started publishing mortality data, but the practice has not yet spread to any other state or country.
Cardiac surgeons had seen this coming, so during the Bristol Royal Infirmary Inquiry the Society of Cardiothoracic Surgeons of Great Britain and Ireland tried to redress perceived deficiencies in surgeons' approach to national data collection and audit3 by producing unambiguous guidelines on data collection and clinical audit in cardiac surgical units (see http://www.scts.org/) and by debating how to measure their clinical performance.
After detailed discussion, the society agreed to institute the collection of data on surgeon specific activity and in-hospital mortality for several index procedures and to use a stringent set of limits to initiate an internal assessment. An annual mortality of greater than 2 SD above the mean was set as the trigger for a review by local clinical governance. This was intended to be a constructive process, not a trigger for criticism, blame, or ill considered actions. The problem with this approach is that there will always be 2.5% of consultants under review.
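The figure of roughly 2.5% follows from the tail area of a normal distribution. The snippet below is a minimal sketch, assuming that surgeons' annual mortality rates are approximately normally distributed around the mean; that assumption and the function name `fraction_above` are illustrative and not taken from the society's method.

```python
from statistics import NormalDist


def fraction_above(sd_threshold: float) -> float:
    """One-sided tail area of a standard normal beyond sd_threshold SDs above the mean."""
    return 1.0 - NormalDist().cdf(sd_threshold)


# Share of surgeons expected to cross a one-sided trigger by chance alone,
# under the normality assumption stated above.
print(f"2 SD trigger:    {fraction_above(2.0):.4f}")
print(f"1.96 SD trigger: {fraction_above(1.96):.4f}")
```

Under this assumption a 2 SD trigger flags a little over 2% of surgeons each year even if every surgeon performs identically; the conventional 2.5% corresponds to a cut-off of 1.96 SD.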
In-hospital mortality was chosen as a performance measure because it was understandable, easy to measure, could be validated, and included all patients who died in hospital (not just those within a certain time frame). Furthermore, it was used by all public reporting systems in the United States.
Index procedures of isolated, first time coronary surgery, lobectomy for lung cancer,4 and correction of aortic coarctation or isolated ventricular septal defect repair were identified. But the collected data were only for activity and mortality, which did not allow for casemix adjustment.
In 1998, when the decision to collect these data was taken, cardiac surgeons were anxious. They were fearful that in the shadow of what had happened at Bristol chief executives would have a low threshold for suspension, which could unjustly derail the careers of perfectly competent surgeons. Nevertheless, such was the recognition of the importance of this venture that voluntary compliance among consultant surgeons for individual data submission has been 100% from that time.
From individual surgeons' data and subsequent reviews, two things have been learnt. Firstly, the relation between volume and mortality is weak. Detailed statistical analysis shows a statistically significant but small volume effect: a 20% increase in workload is associated with a reduction of one twentieth in operative mortality (5% relative reduction, 95% confidence interval 2% to 8%). This translates to a reduction in operative mortality from 2% to 1.9%, which is negligible in practical terms. Secondly, when surgeons have been reviewed, issues of process and organisation, rather than technical surgical ability, have usually been the underlying problem.
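The step from a 5% relative reduction to an absolute change from 2% to 1.9% can be checked with a few lines of arithmetic. This is a sketch only: the function name is hypothetical, and applying the confidence interval bounds to the same 2% baseline is an illustration, not an analysis reported in the text.

```python
def mortality_after_reduction(baseline: float, relative_reduction: float) -> float:
    """Apply a relative reduction (for example 0.05 for 5%) to a baseline mortality."""
    return baseline * (1.0 - relative_reduction)


baseline = 0.02  # 2% operative mortality, as quoted in the text

# Point estimate and the two ends of the quoted 95% confidence interval.
for label, reduction in [("2% relative", 0.02), ("5% relative", 0.05), ("8% relative", 0.08)]:
    new_rate = mortality_after_reduction(baseline, reduction)
    absolute_change = baseline - new_rate
    print(f"{label}: new mortality {new_rate:.2%}, absolute change {absolute_change:.2%}")
```

Even at the upper end of the interval the absolute change stays below a fifth of a percentage point, which is why the effect is described as negligible in practice.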
Why publish results on individual surgeons?
A detailed analysis by the Nuffield Trust has shown that the arguments for and against publication are finely balanced.5 The reason for publication determines the way such data are presented. The two key reasons are either to facilitate patient choice or to demonstrate safety. Publishing for patient choice requires detailed, risk adjusted tables of outcome published in a comparative fashion. Publishing to indicate whether a surgeon is safe or not requires agreeing a threshold of unacceptable mortality and then showing where each individual surgeon's results lie relative to that threshold. This is analogous to the blood alcohol level test for driving—a driver is either above or below the agreed or legal limit.
The comparative cardiac surgery reporting programmes in Pennsylvania, New Jersey, and New York have been well publicised. The claims are that these systems are transparent and that in New York the associated scrutiny has resulted in a demonstrable reduction in postoperative mortality.6–9 Counter claims suggest that this reduction in mortality is no greater than that seen across the rest of the United States and that in a litigious climate the data require protracted, detailed auditing and validation, with the result that, when finally published three years later, they are no longer relevant. Furthermore, there is a feeling in the US cardiac surgery community that an unintended negative consequence of public disclosure is that surgeons may be protecting their results by avoiding higher risk cases if they feel that their results are drifting into a range that might attract unnecessary yet easily avoidable scrutiny.10–13 The improvement in mortality is easy to show. The avoidance of high risk surgery is less easy to show because of the subjective and immeasurable nature of the clinical decision making process in these complex patients. This is a real irony because the evidence suggests that patients are the one group who pay little attention to these data. What they really want is an operation in a hospital close to home and as soon as possible.14–16
Although the surgeon plays an important role in surgical outcome, so does the anaesthetist, the intensive care physician, and the intensive care nurse. Surgical results are also influenced by the socioeconomic status of the local population; severity of cardiac illness; prevalence of comorbidities; threshold of referral from both the general practitioner and the cardiologist; threshold of acceptance by the surgeons; standards of anaesthesia, surgery, and intensive care; adequacy of facilities and staffing levels; attitude to training; interpersonal relationships between staff; and the geographical layout of the unit (for example, in some units the wards are so far from the theatre and intensive care unit that surgeons have no time to check up on ward patients between surgery cases). So the concept of blaming the surgeon was perceived as unfair.17
These concerns have been reflected in the decision by the Veterans Administration (the biggest US healthcare provider) to discourage the generation of surgeon specific outcomes. The administration believes the performance of a surgeon cannot be separated from that of his or her institution as quality is highly dependent on institutional systems.18 19 Others argue that it is the doctors who are best placed to change institutional processes that influence outcome and they are therefore a logical target.20
We thought carefully about ways to present the data in the United Kingdom to avoid some of the pitfalls of the US models. We agreed we would base any risk adjusted comparative analyses on lower risk cases alone, leaving surgeons able to tackle more complex and difficult cases without unnecessary apprehension. The wisdom of this strategy was recently highlighted by a study in the BMJ confirming that risk stratification systems that may be good at predicting risk in large institutional groups of patients are much less reliable in high risk cases at the level of an individual surgeon because they tend to “under-predict” for higher risk groups. More importantly this study defined the level of predicted risk above which we should exclude patients from comparative analyses.23
The national service framework for coronary heart disease
The national service framework for coronary heart disease, launched in early 2000, included clear recommendations for comparative audit based on the Society of Cardiothoracic Surgeons' clinical dataset.24 The framework led to a national coronary heart disease information strategy,25 which released funds and mandated collection of this dataset through the National Clinical Audit Support Programme under the jurisdiction of the Commission for Health Improvement (now the Healthcare Commission).26 The vision was to harmonise data collection between cardiology, cardiac surgery, and other administrative systems so that everyone had ownership of, and was working from, the same base dataset and the same definitions.
Since 1996 the society has also been collecting comprehensive data on anonymised individual patients from an increasing number of units throughout the United Kingdom that would allow for casemix adjustment. But the data are not yet good enough to allow for meaningful comparisons of units, let alone surgeons. The Nuffield Trust (United Kingdom) and Rand (United States) did a rigorous, independent review of the quality of data in the clinical databases of 10 units in England, and this showed serious but remediable deficiencies in data quality. The review has led to a series of recommendations on data collection, including the requirement for a “permanent cycle of independent external monitoring” and “validation by an independent source” before release.27 28
As part of the national service framework, data collection in England would shift from the Society of Cardiothoracic Surgeons to the central cardiac audit database, part of the National Clinical Audit Support Programme in the NHS Information Authority. The added value would be that this system would provide mortality tracking through the Office for National Statistics. This would enable the society to start analysing and understanding the factors influencing long term survival rather than focusing solely on early postoperative mortality. This is particularly relevant given the observation that the hazard of early death after coronary artery surgery remains raised for 60-90 days.29 This should lead to an understanding of which kinds of patient benefit most from which operation and so contribute substantially to the overall quality of care and more specifically to the basis of informed consent.
The price the surgical community had to pay for these long term benefits was the publication of individual surgeons' results: the first set of results would be released in some form by the end of 2004. But to retain the confidence of all parties—surgeons, the public, and the healthcare regulators—the project would be overseen jointly by the surgical community, the then Commission for Health Improvement, and the Department of Health.
This was an ambitious programme. The society's dataset had to be changed to accommodate standards on NHS data; units had to be connected to the central cardiac audit database through secure connections; and the transmission specifications for the clinical data required standardisation and testing. Locally, data managers were appointed, and networked computer systems were put in place. The first data trickled into the central cardiac audit database in October 2003, too late for the production of validated, risk adjusted, surgeon specific results in 2004.
So the society began to consider other options. In October 2002, it had published unadjusted mortality for coronary and aortic valve surgery for every unit in the United Kingdom. But it had also been collecting individual surgeons' unadjusted mortality data for some years as part of its quality assurance programme. Could it analyse and present these data constructively?
Can crude mortality be usefully presented?
Tables 1 and 2 show that most deaths in coronary surgery occur in high risk patients, however they are stratified. Because of this casemix influence it would not be sensible to publish unadjusted mortality by surgeon. But crude mortality could be used to show that surgeons lie within or outside a certain predefined limit.
It is reasonable that the threshold should be considerably higher when risk adjustment is not used than when it is. So how has the society set the limits? In industry, 99.9% confidence limits (3 SD) are commonly used for quality control processes for manufacturing, where there is control of raw materials. Sadly, this level of standardisation does not hold for cardiac surgery patients, who can be very heterogeneous. So the limits were widened to 99.99% (4 SD) to take this additional, inherent variation into consideration. The society proposes to use these limits as its basis for publication of individual surgeons' results. So, for the purposes of safety, it will consider that any surgeon whose mortality is within the 99.99% (4 SD) limits over an aggregated three year period will have met transparent and defined standards. This means that any outlier is likely to be real: there is less than a 1 in 10000 chance that the society would assert that any particular surgeon with average case mix did not meet its standard. The deficiency is that these limits become very wide at lower volumes, opening the way to accusations of professional protectionism for surgeons with lower volumes.
Surgeons who have been in post for fewer than three years will be analysed similarly for one or two years. Those whose mortality lies within 99.99% confidence limits will be said to “meet” the society's standards; those whose mortality lies below and outside these limits will be said to “exceed” the standards; and those whose mortality lies above these limits will be described as “not meeting” the standards (figure, p 450).
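The 4 SD limits and the three categories can be sketched in code. The text does not state how the limits are calculated, so this sketch assumes a normal approximation to the binomial around a national mean rate; the 2% mean, the function names, and the example surgeons are all hypothetical.

```python
import math
from statistics import NormalDist


def control_limits(mean_rate: float, n_ops: int, z: float = 4.0) -> tuple[float, float]:
    """Lower and upper limits at z SDs, using a normal approximation (an assumption)."""
    se = math.sqrt(mean_rate * (1.0 - mean_rate) / n_ops)
    return max(0.0, mean_rate - z * se), min(1.0, mean_rate + z * se)


def classify(deaths: int, n_ops: int, mean_rate: float, z: float = 4.0) -> str:
    """Return the category used in the text for a surgeon's crude mortality."""
    lower, upper = control_limits(mean_rate, n_ops, z)
    observed = deaths / n_ops
    if observed > upper:
        return "not meeting"
    if observed < lower:
        return "exceed"
    return "meet"


# One-sided chance of lying beyond 4 SD, under the normal approximation.
print(f"tail beyond 4 SD: {1.0 - NormalDist().cdf(4.0):.6f}")

# The limits narrow as the three year caseload grows (hypothetical 2% mean).
for n_ops in (100, 300, 900):
    lower, upper = control_limits(0.02, n_ops)
    print(f"n={n_ops}: limits {lower:.3%} to {upper:.3%}")

print(classify(deaths=10, n_ops=300, mean_rate=0.02))
print(classify(deaths=20, n_ops=300, mean_rate=0.02))
```

Under these assumptions the lower limit stays at zero until the caseload is large, so “exceed” is reachable only by high volume surgeons, and the upper limit for a low volume surgeon sits several times above the mean. This is the widening at lower volumes that the text identifies as a deficiency.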
The Healthcare Commission is taking a similarly cautious statistical approach in its clinical indicator of “deaths following a heart bypass operation,” which contributes to the balanced scorecard component of the annual star ratings for hospitals. From this year the indicator will be based on three years' data derived from hospital episode statistics, with possible expansion of the control limits to allow for any observed “overdispersion” arising from inadequate risk adjustment.30
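One common way to expand control limits for overdispersion is to estimate a multiplicative factor from the spread of standardised scores across units and to widen the limits by its square root. The sketch below illustrates that general idea only; it is not presented as the commission's actual procedure, and the function names, z scores, mean rate, and caseload are hypothetical.

```python
import math


def overdispersion_factor(z_scores: list[float]) -> float:
    """Mean squared z score across units; values above 1 suggest overdispersion."""
    return sum(z * z for z in z_scores) / len(z_scores)


def expanded_limits(mean_rate: float, n_ops: int, z: float, phi: float) -> tuple[float, float]:
    """Normal approximation limits, widened by sqrt(phi) when phi exceeds 1."""
    se = math.sqrt(mean_rate * (1.0 - mean_rate) / n_ops)
    half_width = z * se * math.sqrt(max(1.0, phi))
    return max(0.0, mean_rate - half_width), min(1.0, mean_rate + half_width)


# Hypothetical standardised scores for six hospitals.
hospital_z = [0.5, -1.8, 2.6, -0.4, 3.1, -2.2]
phi = overdispersion_factor(hospital_z)

print(f"estimated factor: {phi:.2f}")
print("unadjusted limits:", expanded_limits(0.02, 500, 3.0, 1.0))
print("expanded limits:  ", expanded_limits(0.02, 500, 3.0, phi))
```

When the factor is 1 or less the limits are left unchanged, so the adjustment only ever makes the indicator more cautious about labelling a hospital an outlier.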
The use of data that are not risk adjusted is still very controversial, but their value is being increasingly recognised. The use of a single risk adjusted number to summarise a surgeon's results runs the risk of lending a level of spurious credibility to an analysis that does not take into account the impact of influences that are not patient related. To many, the number will simply represent the final analysis.31 32 On the other hand, data that are not risk adjusted simply say, “Take a closer look at the bigger picture.” Such data inevitably invoke a review process of which detailed, risk stratified analysis is only a part. No conclusions can be drawn until a full review has taken place.
This sort of data cannot contribute to patient choice. So patient choice will be driven by other considerations, but patients will know the society is constantly reviewing its results without fear or favour.
When these results are published later this year, medicine in the United Kingdom will have crossed a threshold into a new era. Cardiothoracic surgeons will have shown that it is possible for a surgical specialty to review its own performance at an individual clinician level by professional consensus. This system is not perfect; it is a first step, which, in the words of Alan Milburn in 2003, when he was secretary of state for health, has “opened a door which other branches of medicine will need to enter.” Most importantly, cardiac surgeons will have opened a more general debate that will revolve around the balance between the relative influence of individual physicians and institutional influences on patient outcomes and how this relation translates to transparent public accountability.
The final question is whether, with transparent systems in place to maintain standards, it is necessary to publish a list of names, or whether the public good can be served just as well by the knowledge that appropriate mechanisms are in place and independently regulated.
Summary points
- Measurement of outcomes from medical or surgical interventions is part of good practice
- Knowing where those outcomes lie with respect to others is an individual professional responsibility
- Professional bodies can help by providing benchmarking
- Publication of national and institutional results is right and proper
- Publication of individuals' results remains controversial because of the potential, unintended negative effects and increasing recognition that individuals' results are strongly influenced by institutional influences that may impinge differently on different individuals
- The utility of such publications depends on the relation of the outcome to quality of care, the ability to cater for casemix, and whether the publication is designed to facilitate patient choice or show consistency of standards
BK is consultant cardiothoracic surgeon at Queen Elizabeth Hospital, Birmingham, and coordinator of the national adult cardiac surgical database. DS is a senior statistician at the MRC Biostatistics Unit, Institute of Public Health, Cambridge. JR is consultant cardiothoracic surgeon at St Thomas' Hospital, London. PM is consultant cardiothoracic surgeon at the London Chest Hospital, London. CH is consultant cardiothoracic surgeon at the Freeman Hospital, Newcastle upon Tyne.
Contributors: BK drafted the manuscript, managed the collection of the surgeon specific data, and helped with analysis. DS gave statistical advice on how to analyse the surgeon specific data. AB merged and checked the data and helped with analysis. JR, PM, and CH helped design the methodology of analysis and the presentation of surgeon specific data and refine the manuscript. BK is the guarantor.
Funding: This work was funded entirely through membership subscription to the Society of Cardiothoracic Surgeons of Great Britain and Ireland.
Competing interests: BK is a Commissioner on the Healthcare Commission and coordinator for the National Adult Cardiac Surgical Database and UK Cardiac Surgical Register for the Society of Cardiothoracic Surgeons of Great Britain and Ireland. DS is a statistical adviser to the Healthcare Commission.