Rapid Responses to:

PAPERS:
Paris P Tekkis, Peter McCulloch, Adrian C Steger, Irving S Benjamin, and Jan D Poloniecki
Mortality control charts for comparing performance of surgical units: validation study using hospital mortality data
BMJ 2003; 326: 786-788

Rapid Responses published:

What determines the standard?
John Robson (14 April 2003)
Only one standard: 0% morbidity and mortality
Richard G Fiddian-Green (16 April 2003)
Assessment of outcome is complex
Frank A. Frizelle, John Frye (17 April 2003)
Why Not to Judge Performance with Statistical Process Control
Anthony P Morton (26 April 2003)
Control charts could mislead
Chris Sherlaw-Johnson (2 May 2003)
Data validation and interpretation in POSSUM
George A Khoury (13 May 2003)

What determines the standard? 14 April 2003
John Robson,
general practitioner/senior lecturer
Queen Mary University of London

Paris Tekkis and colleagues give a further example of the use of adjusted operative mortality rates to compare institutional performance, using binomial limits to judge deviance from the mean. The interpretation of these data depends on the level at which the bar is set. In this case the mean of all institutions is the standard against which comparison is made. This is fine if the question is ‘do we deviate substantially from the mean of all our peers?’. That is a question about clinical governance, and it was the question asked of paediatric surgical units to identify possible deviant practice.

But if the issue is to encourage better performance, then the question could also be ‘how far do we deviate from units with better or best practice?’. In that case an external reference value could be chosen, if one were available. Alternatively, the mean of the top 50% or top 20% of the distribution could be chosen as the standard of comparison. Raising the bar would mean that the exceptionally good practice of units 3 and 33 is likely to fall within the limits, and that more of the units with the highest mortality are likely to fall outside them. Is there any objection to changing the standard in this way, and has it been used in other settings?
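
The comparison described above can be sketched in a few lines. This is a minimal sketch, assuming plain exact binomial limits around a chosen standard rate; the paper's own limits and risk adjustment may differ. The unit data are hypothetical. `top_fraction_rate` is one illustrative way of defining a "best practice" standard: the pooled rate of the lowest-mortality fraction of units.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binom_quantile(q, n, p):
    """Smallest death count k with P(X <= k) >= q."""
    for k in range(n + 1):
        if binom_cdf(k, n, p) >= q:
            return k
    return n


def control_limits(n, p0, coverage=0.95):
    """Two-sided limits (in deaths) for n operations if the true rate is p0."""
    tail = (1 - coverage) / 2
    return binom_quantile(tail, n, p0), binom_quantile(1 - tail, n, p0)


def pooled_rate(units):
    """Pooled mortality over (deaths, operations) pairs."""
    return sum(d for d, _ in units) / sum(n for _, n in units)


def top_fraction_rate(units, fraction):
    """Pooled mortality of the lowest-mortality `fraction` of units (hypothetical standard)."""
    ranked = sorted(units, key=lambda u: u[0] / u[1])
    k = max(1, round(fraction * len(ranked)))
    return pooled_rate(ranked[:k])


# Hypothetical (deaths, operations) for five units.
units = [(2, 40), (5, 50), (9, 60), (12, 55), (3, 45)]
for label, p0 in [("mean of all units", pooled_rate(units)),
                  ("best 40% of units", top_fraction_rate(units, 0.4))]:
    low, high = control_limits(50, p0)
    print(f"{label}: standard {p0:.1%}, limits for 50 operations: {low}-{high} deaths")
```

Lowering the standard rate shifts both limits down. A unit with a given number of deaths is therefore more likely to fall above the upper limit. Whether that is a fair basis for comparison is the question posed above.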

Competing interests:   None declared

Only one standard: 0% morbidity and mortality 16 April 2003
Richard G Fiddian-Green,
None
None

There can be only one standard: 0% morbidity and 0% mortality. Anything short of that should be interpreted as suboptimal performance. That is not the least bit unrealistic, even in patients with "co-morbidities". Dr Foster's report conceals the degree to which the standard of surgery in the NHS is, by this definition, suboptimal (1).

Applying Dr Foster's methodology to the report on oesophageal surgery in this issue of the BMJ, for example, the median or mode of 100 translates into an operative mortality of 12% (range 0% to 50%) in the 29 hospitals, which included 1042 patients (2,3). Surgeons in other countries, including myself and some of those with whom I have worked, have performed large numbers of equally major operations, including oesophagectomies, splenectomies, hepatic lobectomies and pancreatico-duodenectomies, without mortality (1,4,5,6). What is more, the 0% mortality has for the most part been obtained by surgeons assisting residents who performed most of the operations. All have been general surgeons.

1. Jacobson B, Mindell J, McKee M. Hospital mortality league tables. BMJ. 2003;326:777-8.

2. Tekkis PP, McCulloch P, Steger AC, Benjamin IS, Poloniecki JD. Mortality control charts for comparing performance of surgical units: validation study using hospital mortality data. BMJ. 2003;326:786-8.

3. Fiddian-Green RG. Playing Russian roulette. bmj.com, 11 Apr 2003. Rapid response to: Interactive case report. J Bligh, R Farrow, Ruth, Richard Farrow, Linda Hands, Natasha Kapur, Malcolm H A Rustin, John Benson, and Ed Peile. BMJ. 2003;326:804-7.

4. Coon WW. Splenectomy in the treatment of hemolytic anemia. Arch Surg. 1985 May;120(5):625-8.

5. Jarnagin WR, Gonen M, Fong Y, DeMatteo RP, Ben-Porat L, Little S, Corvera C, Weber S, Blumgart LH. Improvement in perioperative outcome after hepatic resection: analysis of 1,803 consecutive cases over the past decade. Ann Surg. 2002 Oct;236(4):397-406; discussion 406-7.

6. Yeo CJ, Cameron JL, Sohn TA, Lillemoe KD, Pitt HA, Talamini MA, Hruban RH, Ord SE, Sauter PK, Coleman J, Zahurak ML, Grochow LB, Abrams RA. Six hundred fifty consecutive pancreaticoduodenectomies in the 1990s: pathology, complications, and outcomes. Ann Surg. 1997 Sep;226(3):248-57; discussion 257-60.

Competing interests:   None declared

Assessment of outcome is complex 17 April 2003
Frank A. Frizelle,
Professor of colorectal surgery
Christchurch Hospital, New Zealand,
John Frye

Editor: We read with interest the analysis of two databases by these authors. It appears to improve on existing models of risk stratification, a subject made especially topical by the recently published annual assessments of the Dr Foster group (1).

The authors highlight the difficulties in data collection for this type of analysis. The analysed databases contain patient information given on a voluntary basis. Approximately one quarter of the data are retrospective, and a substantial amount of data was not available for analysis, which will hamper direct comparison with other UK units.

In other fields of surgery, for example colorectal surgery, subspecialists attract not only a more elective practice, which might lower operative mortality figures, but also referrals of a more complex nature, which could be expected to increase them (2-5). Case mix influences the results of surgery. Surgeons who specialise in colorectal surgery undertake a disproportionate number of elective (low risk) cases, and as such their results may appear superficially better. Murray et al 1995 (2) have shown that adjustment for case mix can lead to a substantial change in the relative performance of surgeons. Sagar et al 1994 (3) have shown that adjusting for patient differences may in fact reverse the initial appearance of the data. These referral practices are hard to “control for” by examining only pre-operative risks and mortality outcomes.

We agree with the comment in the accompanying editorial that “hospitals are complex systems that are part of larger systems and also contain subsystems”. Variability in outcome has previously been attributed to the interplay of multiple factors, including surgical ability, surgical technique, case mix, case volume, institutional influences, peri-operative care and anaesthetic care (4,5). Units with more experience in their particular field may have a wider range of operative and non-operative approaches available than less experienced units. They may also have more subspecialist resources within the unit to improve decision making pre- and post-operatively. Units with better support from auxiliary surgical and medical services may also show better figures, reflecting the multidisciplinary nature of modern surgery.

Ideally, surgical performance should be monitored prospectively. It should be assessed not only by operative mortality but also by post-operative morbidity and quality of life measures, with allowance for case mix in any comparison. Until then, this paper does appear to improve on current methods of evaluating the performance of surgical units.

1. Tekkis P, McCulloch P, Steger AC, Benjamin IS, Poloniecki JD. Mortality control charts for comparing performance of surgical units: validation study using hospital mortality data. BMJ 2003;326:786-788.

2. Murray GD, Hayes C, Fowler S, Dunn DC. Presentation of comparative audit data. Br J Surg 1995;82:329-332.

3. Sagar PM, Hartley MN, Mancey-Jones B, Sedman PC, May J, Macfie J. Comparative audit of colorectal resection with POSSUM scoring system. Br J Surg 1994;81:1492-1494.

4. Houghton A. Variation in outcome of surgical procedures. Br J Surg 1994;81:653-660.

5. Unhi SS, Kent SJS. Which surgeons in a district general hospital should treat patients with carcinoma of the rectum? J R Coll Surg Edinb 1995;40:52-54.

Competing interests:   None declared

Why Not to Judge Performance with Statistical Process Control 26 April 2003
Anthony P Morton,
Consultant, Infection Management Services, Princess Alexandra Hospital, Brisbane
Princess Alexandra Hospital, Brisbane, Queensland 4102, Australia

The paper by Tekkis, McCulloch, Steger, Benjamin and Poloniecki (2003) raises two important issues. The first is the role of statistical process control (SPC) in hospitals, and the second is the mechanism and purpose of risk adjustment.

SPC was developed in the 1920s by Walter Shewhart. Its use has since been honed by a number of quality experts, the most prominent of whom has perhaps been Edwards Deming (Salsburg 2001). SPC was developed so that people manning industrial production lines could learn about the quality of their product and take early corrective action if that quality began to deteriorate. Deming (1993) emphasises again and again that the crucial element is the system and its constituent processes. In his view SPC is useful only in a secondary role, as part of the Deming cycle of fixing the system followed by monitoring, analysis and feedback; the primary function in quality improvement is analysing and optimising the system, not SPC. Nowhere is it advocated that SPC be used to judge and compare institutions, and Deming in fact repeatedly warns against this behaviour.

When we see SPC methods employed on hospital data, there is often total disregard for the principles of quality improvement enunciated by such people as Edwards Deming. SPC is used to judge performance and compare institutions. Usually there is nothing said about analysing and optimising the underlying system and its constituent processes. This is a terrible misuse of SPC. Sanai (2003) describes in graphic detail what happens when we ignore the system.

When SPC is used to judge, control limits must be set relatively wide (that is, at a stringent significance level) to avoid false positive signals. Sensitivity therefore suffers, so there can be considerable delay in detecting genuine signals, and during this time unnecessary patient injury can occur. In hospitals, the “cost” of false negatives is of particular importance: it is the problem that no one knows about, or that is ignored, that can be the most dangerous. Thus, employing SPC in a judgmental way can destroy its ability to give needed early warning. Consider instead a learning environment, where a department has first carefully analysed and optimised its systems and then employed SPC to monitor them. There, significance levels can be set to give high sensitivity, so that genuine changes are detected much more promptly. This means that occasional false positives need to be tolerated. Provided there is sufficient specificity to prevent tampering, this is of little consequence, because it is usually easy for a suitably qualified medical specialist to recognise the occasional false positive. Viewed in this way, an SPC signal does not indicate that there is a problem; it indicates that there is sufficient evidence to search for a possible or probable problem. Judgment is appropriate only if such a search is not performed after a signal, or if a problem is identified and then not addressed.
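
The trade-off between false alarms and early detection can be made concrete in a deliberately simple setting. The sketch below uses a single binomial test on one period's results rather than a sequential chart. All figures are hypothetical: 100 operations, a baseline risk `p0` of 5% and a deteriorated risk `p1` of 10%. It shows that demanding a smaller false-alarm probability raises the number of deaths needed to signal. That in turn lowers the probability of detecting a genuine deterioration.

```python
from math import comb


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def signal_threshold(n, p0, alpha):
    """Smallest death count whose upper-tail probability under p0 is <= alpha."""
    k = 0
    while upper_tail(k, n, p0) > alpha:
        k += 1
    return k


def operating_characteristics(n, p0, p1, alpha):
    """Return (threshold, false-alarm probability, detection probability)."""
    k = signal_threshold(n, p0, alpha)
    return k, upper_tail(k, n, p0), upper_tail(k, n, p1)


# Hypothetical: 100 operations, baseline risk 5%, deteriorated risk 10%.
for alpha in (0.001, 0.01, 0.05):
    k, false_alarm, detection = operating_characteristics(100, 0.05, 0.10, alpha)
    print(f"alpha={alpha}: signal at >= {k} deaths, "
          f"false alarm {false_alarm:.3f}, detection {detection:.2f}")
```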

It is also important to consider other consequences of employing SPC in a judgmental manner. First, because the primacy of the system is so often ignored, the emphasis falls on blame rather than on finding and correcting the causes of problems. These causes usually lie in systems over which ward staff may have little or no say, so morale is damaged and the likelihood of medical error increases. Furthermore, staff begin to game data in subtle ways that make them useless for improving quality. Finally, staff use data to justify what they do instead of using them to learn how to improve (Jacobson, Mindell and McKee 2003).

When used appropriately to learn, SPC is an extraordinarily useful adjunct to systems analysis and optimisation for quality improvement. When misused to judge, the ability of SPC to improve quality is destroyed and its potential to do harm becomes very real.

Substandard performance always has systems problems at its basis. Dealing with substandard performance requires management that is knowledgeable in understanding systems and that has the courage and resources to correct systems problems. For example, a surgeon who does one or two complex surgical procedures per year is likely to have inferior results. The systems problem is the performance of small amounts of complex surgery, and a hospital administration that allows this to occur is failing to analyse and optimise its systems. As mentioned above, Sanai (2003) gives an eloquent and moving account of the consequences of ignoring the health of the system. Rothwell, Warlow and the European Carotid Surgery Trialists’ Collaborative Group (1999) and Carter (2003) describe how difficult it is to use statistical methods to judge performance.

It is also important to consider the mechanism and role of risk adjustment, for which the authors have performed a careful statistical analysis. However, the POSSUM method they use figures prominently in another recent publication by this group (Tekkis, Kessaris, Kocher, Poloniecki, Lyttle and Windsor 2003), which found that for colorectal surgery it can be poorly calibrated. In addition, it requires a good deal of patient information, some of which may not be available for all patients. It is worth reflecting that Iezzoni, Ash, Shwartz, Daley, Hughes and Mackiernan (1995) show that “predicting who dies depends on how severity is measured”. Thomas and Hofer (1999), after a careful analysis, conclude that “reports that measure quality using risk adjusted mortality rates misinform the public”. Perhaps the most comprehensive risk adjustment method currently in use is the APACHE system, yet anomalies occur not infrequently when it is applied to differing populations. One problem is that random variation can be very large compared with variation due to substandard performance. In addition, the myriad patient characteristics that may exist can mean that precise and reliable risk adjustment is a mirage.
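
The point about random variation can be illustrated with a small simulation. Suppose, hypothetically, that 20 units all share exactly the same true risk of 10% and each performs 40 operations. Chance alone then spreads their observed mortality rates widely, even though none of them differs in performance. The unit count, workload, risk and random seed are all arbitrary assumptions chosen for illustration.

```python
import random


def simulate_unit_rates(n_units, operations, true_risk, seed=0):
    """Observed mortality rates for units that all share the same true risk."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    rates = []
    for _ in range(n_units):
        deaths = sum(rng.random() < true_risk for _ in range(operations))
        rates.append(deaths / operations)
    return rates


rates = simulate_unit_rates(n_units=20, operations=40, true_risk=0.10)
print(f"lowest {min(rates):.0%}, highest {max(rates):.0%} (true risk 10% in every unit)")
```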

However, when control charts and similar methods are used appropriately by individual institutions on their own data sequentially, complex risk adjustment is almost certainly unnecessary, even if it were reliable enough to be advocated. Lawrance, Dorsch, Sapsford, Mackintosh, Greenwood, Jackson, Morrell, Robinson and Hall (2001) have described a simple risk adjustment tool for myocardial infarction patients that works well when used in this manner in control charts. In addition, Sutton, Bann, Brooks and Sarin (2002) have described a simple surgical risk adjustment method that should be adequate for use by institutions monitoring their own systems and processes sequentially. Simple tools that work are always superior to complex ones with spurious precision and reliability that have the capacity to mislead.
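
To illustrate sequential monitoring of an institution's own data with simple risk adjustment, the sketch below computes one widely used form of risk-adjusted chart. This is the variable life-adjusted display: cumulative expected deaths minus observed deaths, plotted patient by patient. It is not claimed to be the method of Lawrance et al or Sutton et al. The three risk bands and their predicted risks are hypothetical placeholders for whatever simple score a unit adopts.

```python
from itertools import accumulate

# Hypothetical predicted risks for a simple three-band score (not published values).
RISK_BY_BAND = {"low": 0.02, "medium": 0.08, "high": 0.25}


def vlad(predicted_risks, died):
    """Cumulative (expected - observed) deaths after each patient, in order.

    A rising line means fewer deaths than predicted; a falling line means more.
    """
    if len(predicted_risks) != len(died):
        raise ValueError("need one outcome per predicted risk")
    return list(accumulate(p - int(d) for p, d in zip(predicted_risks, died)))


# Hypothetical sequence of patients: (risk band, died?)
patients = [("low", False), ("high", True), ("medium", False),
            ("high", False), ("low", False), ("medium", True)]
risks = [RISK_BY_BAND[band] for band, _ in patients]
curve = vlad(risks, [died for _, died in patients])
print([round(x, 2) for x in curve])
```

The design choice is deliberate: the chart needs only one predicted risk per patient and one outcome. This matches the argument above that simple adjustment is adequate when a unit tracks its own results over time.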

References

Tekkis P, McCulloch P, Steger A, Benjamin I and Poloniecki J “Mortality Control Charts for Comparing Performance of Surgical Units: Validation Study Using Hospital Mortality Data” BMJ 2003;326:786-791.

Salsburg D “The Lady Tasting Tea. How Statistics Revolutionised Science in the Twentieth Century” New York W.H.Freeman and Co. 2001.

Deming W.E. “The New Economics for Industry, Government and Education” Cambridge, Massachusetts Institute of Technology 1993.

Sanai L The Sunday Times January 26th 2003.

Jacobson B, Mindell J and McKee M “Hospital Mortality League Tables” BMJ 2003;326:777-778.

Rothwell P, Warlow C, and the European Carotid Surgery Trialists’ Collaborative Group “Interpretation of Operative Risks of Individual Surgeons” Lancet 1999;353:1325.

Carter D “The Surgeon as a Risk Factor” BMJ 2003;326:832-833.

Tekkis P, Kessaris N, Kocher H, Poloniecki J, Lyttle J and Windsor A “Evaluation of POSSUM and P-POSSUM Scoring Systems in Patients Undergoing Colorectal Surgery” British Journal of Surgery 2003;90:340-345.

Iezzoni L, Ash A, Shwartz M, Daley J, Hughes J and Mackiernan Y “Predicting Who Dies Depends on How Severity is Measured: Implications for Evaluating Patient Outcomes” Annals of Internal Medicine 1995;123:763-770.

Thomas W and Hofer T “Accuracy of Risk-Adjusted Mortality Rate as a Measure of Hospital Quality of Care” Medical Care 1999;37:83-92.

Lawrance R, Dorsch M, Sapsford R, Mackintosh A, Greenwood D, Jackson B, Morrell C, Robinson M and Hall A “Use of Cumulative Mortality Data in Patients with Acute Myocardial Infarction for Early Detection of Variation in Clinical Practice” British Medical Journal 2001;323:324-327.

Sutton R, Bann S, Brooks M and Sarin S “The Surgical Risk Scale as an Improved Tool for Risk-adjusted Analysis in Comparative Surgical Audit” British Journal of Surgery 2002;89:763-768.

Competing interests:   None declared

Control charts could mislead 2 May 2003
Chris Sherlaw-Johnson,
Senior Research Fellow
Clinical Operational Research Unit, University College London, Gower St, London WC1E 6BT

The paper by Tekkis et al illustrates the difficulties of trying to compare outcomes for several units when case mix is variable. In their analysis, mortality rates are adjusted in an attempt to level the playing field, but my concern is that this can give rise to misleading results.

I shall illustrate this with a hypothetical example. Consider a Unit that has undertaken 50 procedures with a mean predicted risk of 8%, rather below average, but in which 8 deaths were recorded (a mortality rate of 16%). Before any adjustment, each unit will have its own set of confidence limits depending on its case mix. In this case, calculating exact confidence limits based on the binomial distribution, the upper 90% limit (two-tailed) is at 17%. With this limit as an 'alert' boundary, the observed mortality rate should not give cause for concern.

The adjusted mortality rate is derived from the ratio of observed to expected mortality, multiplied by the pooled mean. (This is stated in the full text of the paper on the BMJ web site.) For the hypothetical Unit the adjusted rate is (16%/8%) times 12%, which equals 24%. After adjustment, a Unit's position relative to its own confidence limits should ideally be preserved relative to the new limits used for the control charts in figures 3 to 5. So this hypothetical Unit, which lies within its own 90% limit, should after adjustment lie within the 90% limit on the control chart. Where does it actually lie? On the control chart, the upper 90% limit after 50 procedures corresponds to a rate of 22% and the upper 95% limit to a rate of 24%. So the Unit lies on the upper 95% limit and hence could signal a warning unnecessarily. Admittedly this example is somewhat engineered, and a mean risk of 8% after 50 procedures may be unrealistic, but it does highlight the dangers of using this kind of plot for control charting. In an environment in which a unit is to be judged by whether or not it lies outside a particular limit, the errors it generates could be critical.
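
The adjustment described above is simple arithmetic: observed over expected deaths, multiplied by the pooled mean. The sketch below reproduces the hypothetical Unit's figures (50 procedures, mean predicted risk 8%, 8 deaths, pooled mean 12%). It does not attempt to reproduce the control-chart limits, which depend on how the chart is constructed. The function name is an illustrative assumption.

```python
def adjusted_rate(observed_deaths, expected_deaths, pooled_mean):
    """Risk-adjusted mortality: (observed / expected deaths) x pooled mean rate."""
    if expected_deaths <= 0:
        raise ValueError("expected deaths must be positive")
    return observed_deaths / expected_deaths * pooled_mean


operations = 50
expected = 4.0   # 50 procedures x 8% mean predicted risk
observed = 8     # crude mortality 8/50 = 16%
rate = adjusted_rate(observed, expected, pooled_mean=0.12)
print(f"crude {observed / operations:.0%}, adjusted {rate:.0%}")  # crude 16%, adjusted 24%
```

A unit whose deaths exactly match its expected deaths is mapped to the pooled mean. A unit with twice its expected deaths is mapped to twice the pooled mean, whatever its own case mix.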

Competing interests:   None declared

Data validation and interpretation in POSSUM 13 May 2003
George A Khoury,
Consultant Surgeon
Conquest Hospital, East Sussex NHS Trust, The Ridge, St Leonards-on-Sea TN37 7RD

Mortality Control Charts for Comparing Performance of Surgical Units: validation study using hospital mortality data. Tekkis PP et al, BMJ 2003;326:786-788

Dear Sir

The paper reports a mean mortality of almost 10% for elective and 30% for emergency gastro-oesophageal surgery in 29 units, with some units approaching 50%. It concludes that none had under-performed, and it sets these high mortality figures as the benchmark against which performance is judged. The multifactorial model is fitted to this dataset. It artificially creates a high minimum risk by using the original POSSUM (1) instead of P-POSSUM (2), which greatly overpredicts mortality in low risk patients. This appears to be compounded by the inclusion of identical risk factors twice within the multifactorial model: malignancy category, mode of surgery and age are already incorporated within POSSUM.
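
The "high minimum risk" point can be seen by evaluating both equations at the lowest possible scores: a physiological score of 12 and an operative severity score of 6. The coefficients below are the POSSUM (1) and P-POSSUM (2) mortality equations as they are usually quoted, and they should be checked against the original papers before any real use. The sketch ignores the additional risk factors in the paper's model.

```python
from math import exp

MIN_PHYSIOLOGICAL_SCORE = 12  # 12 physiological variables, each scoring at least 1
MIN_OPERATIVE_SCORE = 6       # 6 operative variables, each scoring at least 1


def logistic(logit):
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + exp(-logit))


def possum_mortality(ps, os_score):
    # ln(R / (1 - R)) = -7.04 + 0.13 * PS + 0.16 * OS
    return logistic(-7.04 + 0.13 * ps + 0.16 * os_score)


def p_possum_mortality(ps, os_score):
    # ln(R / (1 - R)) = -9.065 + 0.1692 * PS + 0.1550 * OS
    return logistic(-9.065 + 0.1692 * ps + 0.1550 * os_score)


ps, os_score = MIN_PHYSIOLOGICAL_SCORE, MIN_OPERATIVE_SCORE
print(f"POSSUM floor:   {possum_mortality(ps, os_score):.2%}")
print(f"P-POSSUM floor: {p_possum_mortality(ps, os_score):.2%}")
```

At the lowest possible scores, POSSUM predicts a mortality of about 1.1% and P-POSSUM about 0.2%. Summed over many low risk patients, that difference inflates expected deaths under POSSUM.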

Whilst it is agreed that accurate and reliable data are essential, it is acknowledged that the data in the study are limited. It is unclear whether physiological scores were derived from data collated on admission or, as intended (1), after resuscitation. It is also unclear whether the original definitions of the operative parameters were consistently followed. An emergency operation is defined as one performed within 24 hours of admission, yet many surgeons disregard this and include cases beyond 24 hours and returns to theatre with complications. In our institution, 70% of patients classed as emergencies did not meet the criteria, yet their classification increased the overall POSSUM score. Ambiguity also exists as to the number of procedures performed during an operation. Non-clinical audit officers are unable to validate POSSUM data without clear guidelines. The occurrence of peritoneal soiling or intra-operative blood loss serves only to increase the expected mortality, condoning an otherwise high mortality. The institutions in the study were not randomly selected; this is a significant flaw which may account for the high mortality. The control charts excuse high mortality at low workloads, justifying it even though guidelines preclude small numbers being undertaken.

The POSSUM formula (1) was derived from a general surgical population, so its application to gastro-oesophageal surgery is suspect, even though adjustments were made. A new formula may be appropriate. It would need fine-tuning of specific operative parameters, based on prospectively validated data from recognised centres with adequate case numbers. Bias is inevitable if curative and palliative surgery, or oesophageal and gastric surgery, are combined when numbers are small. It is compounded when surgeons have widely differing operability rates, which POSSUM ignores. Even within these subgroups it is unwise to compare outcomes for trans-hiatal oesophagectomy with those of a three-phase McKeown procedure with radical lymphadenectomy (3). The former carries a lower operative mortality but is likely to result in worse survival. The current POSSUM operative parameters do not capture true operative severity, as they equate the severity of a partial gastrectomy with that of an oesophagectomy. POSSUM also ignores obesity and diabetes. Yet it condones poor judgement when surgery is undertaken in advanced cases: the score reflects the advanced disease but does not question the wisdom of operating.

The purpose of a professionally led system of quality control is the objective assessment of outcome, taking into account case mix, co-morbidity and operative severity, but such systems should be sound. The POSSUM system is neither simple nor practical. Validation of data, internal or external, is fraught with difficulties. The observed to expected mortality ratio generated by POSSUM or its derivatives can be misleading, since both components require critical analysis even when the overall ratio is favourable. Only then can a judgement be made on whether performance truly meets standards. The Surgical Risk Scale (SRS) is preferred (4), since it is independent of the surgeon and its data can easily be examined. Alternatively, the pre-operative physiological scores alone could be employed, in a manner similar to the ruptured aortic aneurysm (RAAA-POSSUM) equation, which has been reported (5) to be effective in sub-specialty subgroup analysis.

George A Khoury MS, FRCS
Consultant General & Colorectal Surgeon
Conquest Hospital, East Sussex NHS Trust, The Ridge, St Leonards-on-Sea TN37 7RD

REFERENCES

(1) Copeland GP, Jones D, Walters M. POSSUM: a scoring system for surgical audit. Br J Surg 1991;78:355-360

(2) Whiteley MS, Prytherch DR, Higgins B, Weaver PC, Prout WG. An evaluation of the POSSUM surgical scoring system. Br J Surg 1996;83:812-815

(3) Khoury GA. Oesophageal surgery under Akiyama. Lancet 1989;1:91-92

(4) Sutton R, Bann S, Brooks M, Sarin S. The Surgical Risk Scale as an improved tool for risk-adjusted analysis in comparative surgical audit. Br J Surg 2002;89:763-768

(5) Neary WD, Crow P, Foy C, Prytherch D, Heather BP, Earnshaw JJ. Comparison of POSSUM scoring and the Hardman Index in selection of patients for repair of ruptured abdominal aortic aneurysm. Br J Surg 2003;90:421-425

Competing interests:   None declared