

Detection of changes in mortality after heart surgery

BMJ 1998; 317 doi: (Published 21 November 1998) Cite this as: BMJ 1998;317:1453

Control limits failed to account for case mix

  1. Steve Gallivan, Director, Clinical Operational Research Unit,
  2. Jocelyn Lovegrove, Research fellow,
  3. Christopher Sherlaw-Johnson, Senior research fellow
  1. Department of Mathematics, University College London, London WC1E
  2. Clinical Operational Research Unit, University College London
  3. Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE

    EDITOR—We are concerned about the graphical technique described by Poloniecki et al in their analysis of perioperative mortality rates associated with cardiac surgery.1 Figure 2 shows three traces: observed mortality performance bracketed by control limits, plotted against the number of successive cases performed. The interpretation of the middle trace is straightforward, since it is simply a variable life adjusted display, which has been described previously and will be familiar to many cardiac surgeons in the United Kingdom.2 The use of control limits, on the other hand, is new, and both their usefulness and their validity are unclear. As the authors themselves note, their analysis does not amount to a formal test of significance since the control limits have not been corrected for multiple testing; this is a major deficiency. The use of 99% rather than 95% control limits presumably increases their separation and makes them more forgiving, but it is not clear which level of significance should be used, a difficulty compounded by the fact that the limits are not based on formal significance testing.
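    The multiple-testing point can be illustrated with a short simulation (our sketch, not part of the original correspondence; the 10% risk level, 500-case series, and normal-approximation limits are illustrative assumptions). When cumulative mortality is compared with pointwise 99% limits after every case, the chance that a series with constant risk breaches the limits at least once is well above 1%.

```python
import math
import random

def ever_breaches(n_cases, risk, z=2.576, start=30, rng=random):
    """Return True if the cumulative death count ever strays outside
    pointwise normal-approximation control limits (mean +/- z*sd),
    checked after every case from `start` onwards, even though the
    true risk never changes."""
    deaths = 0
    for i in range(1, n_cases + 1):
        if rng.random() < risk:
            deaths += 1
        if i < start:
            continue  # the normal approximation is poor for very short series
        sd = math.sqrt(i * risk * (1 - risk))
        if abs(deaths - i * risk) > z * sd:
            return True
    return False

rng = random.Random(1)
n_series = 2000
flagged = sum(ever_breaches(500, 0.10, rng=rng) for _ in range(n_series))
rate = flagged / n_series
print(f"Series ever breaching pointwise 99% limits: {rate:.1%}")
```

Although each individual comparison has roughly a 1% false positive rate, the repeated checks along the series push the overall rate of at least one spurious breach several times higher, which is why uncorrected control limits do not amount to a formal significance test.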

    If we understand correctly, these control limits have been calculated using a χ2 distribution. However, this fails to take into account case mix and heterogeneity of risk, the very things for which variable life adjusted display plots are used. The following example illustrates the danger of ignoring case mix when estimating ranges of variability. Consider operations on two sequences of 1000 patients with different underlying mortality risks that have been assessed preoperatively (table). Based on the given mortality risks, there is a 99% probability that the number of deaths that actually occur would fall in the range shown in the last column. These ranges are derived from exact calculations based on the binomial expansion. Using a χ2 distribution would give a range (16 to 44) close to the exact values obtained for the patients in sequence 2, for whom no heterogeneity of risk is present, but would substantially overestimate the range for the patients in sequence 1, for whom risks are heterogeneous.
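    The arithmetic behind such exact ranges can be sketched as follows (our illustration; the risk values are assumptions chosen to mimic the table, not the table's actual figures). Both groups below have about 30 expected deaths per 1000 operations, but in one group the risk is concentrated in a minority of patients. The exact distribution of the total number of deaths under heterogeneous risks is built up by convolving one patient at a time.

```python
def death_count_pmf(risks):
    """Exact distribution of the total number of deaths when patient i
    dies independently with probability risks[i]."""
    pmf = [1.0]
    for p in risks:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1 - p)   # patient survives
            nxt[k + 1] += q * p     # patient dies
        pmf = nxt
    return pmf

def central_range(pmf, coverage=0.99):
    """Smallest and largest death counts bracketing the central
    `coverage` probability mass."""
    tail = (1 - coverage) / 2
    cdf, lo, hi = 0.0, 0, len(pmf) - 1
    for k, q in enumerate(pmf):
        if cdf <= tail < cdf + q:
            lo = k
        cdf += q
        if cdf >= 1 - tail:
            hi = k
            break
    return lo, hi

# Homogeneous: 1000 patients, each with a 3% risk.
homogeneous = [0.03] * 1000
# Heterogeneous: similar expected total, risk concentrated in 100 patients.
heterogeneous = [0.29] * 100 + [0.001] * 900

for label, risks in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    lo, hi = central_range(death_count_pmf(risks))
    print(f"{label}: 99% range {lo} to {hi} deaths")
```

The heterogeneous group's exact 99% range is narrower than the homogeneous group's, because concentrating risk reduces the variance of the total for a given expected number of deaths; a χ2 or binomial approximation keyed only to the expected total would overstate it.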

    Hypothetical operations on two groups of patients with different mortality risks


    When examining surgical mortality, it is important to take case mix into account. However, this should be done not only when estimating the expected mortality but also when estimating the likely variability. Any overestimation of likely ranges of variability might well lead to undue complacency.


    Author's reply

    1. J D Poloniecki, Lecturer in statistics
    1. Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE

      EDITOR—To avoid generating confusion when referring to the cumulative risk adjusted mortality chart, I suggest that Gallivan et al stick to the original name for this plotting technique, the name I gave it in 1995. Further details of the precedent are set out at the end of our paper.

      Unfortunately Gallivan et al are not alone in the practice of claiming that a surgeon is worse than his or her colleagues, or that a colleague's performance has deteriorated (and then improved), without any statistical basis for the assertion—that is, without consideration of the rate of false positives.1

      The potential usefulness of control limits is no doubt clear to Gallivan et al. In their paper they state: “Some work on the statistical approach to this question has been done (Jan Poloniecki, unpublished observations).13” Their haste to submit the same data with the same plotting technique to the same journal at the same time may be responsible for the fact that reference 13 has been omitted from the list of references published in the Lancet.1

      Gallivan and his colleagues at University College London are right in thinking that nominal 99% control limits will give wider confidence intervals, and therefore fewer false positive results, than 95% limits based on the same test. If, as we have suggested should happen, a formal internal inquiry is launched whenever the statistical control limits are breached, then the confidence limits must be wide enough to ensure that this does not occur so often as to be unmanageable. For our series, we found that the control limits for the cumulative risk adjusted mortality were breached at most twice in nearly four years. The second occasion was particularly transient—that is, self-correcting—and might not have occurred at all if any of the parameters had been reset after the first occasion.

      Gallivan et al suggest that the test could be based on a multinomial distribution. Both 0 and 100% are valid Parsonnet scores, and with these risk estimates the multinomial confidence limits have a width of 0. None the less, they could try their suggestion on, for example, the St George's data, which they have, to find out how often the control limits for the cumulative risk adjusted mortality chart are breached.

