Control limits failed to account for case mix
EDITOR: We are concerned about the graphical technique described by
Poloniecki et al in their analysis of perioperative mortality rates
associated with cardiac surgery.1 Figure 2 shows three traces: observed
mortality performance bracketed by control limits, plotted against the
number of successive cases performed. The middle trace is straightforward
to interpret, since it is simply a variable life adjusted display, a plot
that has previously been described and will be familiar to many cardiac
surgeons in the United Kingdom.2 The use of control limits, on the other
hand, is new, and their usefulness, and indeed their validity, is not
clear. As the authors themselves note, their analysis does not amount to
a formal test of significance because the control limits have not been
corrected for multiple testing; this is a major deficiency. Using 99%
rather than 95% control limits presumably increases their separation and
makes them more forgiving. It is not clear which level of significance
should be used, a difficulty compounded by the fact that the limits are
not based on formal significance testing.

If we understand correctly, these control limits have been calculated
using a χ² distribution. However, this fails to take into
account case mix and heterogeneity of risk, the very things for which
variable life adjusted display plots are used. The following example
illustrates the danger in ignoring case mix when estimating ranges of
variability. Consider operations on two sequences of 1000 patients with
different underlying mortality risks that have been assessed
preoperatively (table). Given these risks, there is a 99% probability
that the number of deaths that actually occur will fall within the range
shown in the last column. These ranges were derived from exact
calculations based on the binomial expansion.
Using a χ² distribution would give a range (16 to 44) close to the exact
values obtained for the patients in sequence 2, for whom no heterogeneity
of risk is present, but would substantially overestimate the range for
the patients in sequence 1, for whom risks are heterogeneous.
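The contrast can be reproduced with a short calculation. The sketch below (Python, standard library only) computes the exact distribution of the number of deaths when each patient has his or her own predicted risk (a Poisson binomial distribution, built by adding one patient at a time) and compares its 99% range with a normal approximation, equivalent to a one degree of freedom χ² test, that uses only the average risk. The risks in the table are not reproduced here, so both sequences are hypothetical: sequence 2 is assumed to be 1000 patients at 3% each, which is consistent with the quoted 16 to 44 range (30 ± 2.576 × √(1000 × 0.03 × 0.97)), and sequence 1 is assumed to mix 960 patients at 0.5% with 40 at 63%, giving the same 30 expected deaths. The function names are illustrative only.

```python
from statistics import NormalDist


def poisson_binomial_pmf(risks):
    """Exact distribution of the number of deaths when patient i dies
    independently with probability risks[i] (Poisson binomial distribution),
    built by adding one patient at a time."""
    pmf = [1.0]  # before any operation, P(0 deaths) = 1
    for p in risks:
        nxt = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            nxt[k] += prob * (1.0 - p)  # this patient survives
            nxt[k + 1] += prob * p      # this patient dies
        pmf = nxt
    return pmf


def exact_range(risks, level=0.99):
    """Equal-tailed range [lo, hi] with P(lo <= deaths <= hi) >= level."""
    tail = (1.0 - level) / 2.0
    cdf, lo = 0.0, None
    for k, prob in enumerate(poisson_binomial_pmf(risks)):
        cdf += prob
        if lo is None and cdf >= tail:
            lo = k
        if cdf >= 1.0 - tail:
            return lo, k
    return lo, len(risks)


def naive_range(risks, level=0.99):
    """Normal approximation (equivalent to a 1-df chi-squared test) that uses
    only the average risk and so ignores heterogeneity between patients."""
    n = len(risks)
    p_bar = sum(risks) / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * (n * p_bar * (1.0 - p_bar)) ** 0.5
    return round(n * p_bar - half_width), round(n * p_bar + half_width)


# Hypothetical risk profiles; the letter's table is not reproduced here.
sequence_1 = [0.005] * 960 + [0.63] * 40  # heterogeneous, 30 expected deaths
sequence_2 = [0.03] * 1000                # uniform 3%, 30 expected deaths

for name, seq in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
    print(name, "exact:", exact_range(seq), "approximate:", naive_range(seq))
```

The approximation returns 16 to 44 for both sequences because it sees only the average risk. The exact variance of the number of deaths is Σp(1 − p), which is never larger than n·p̄(1 − p̄) and is strictly smaller whenever risks differ, so the exact range for the mixed sequence comes out noticeably narrower. In the limiting case where every risk is 0 or 1, Σp(1 − p) = 0, which is the zero-width situation raised in the reply.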
When examining surgical mortality, it is important to take case mix
into account. However, this should be done not only when estimating the
expected mortality but also when estimating the likely variability. Any
overestimation of likely ranges of variability might well lead to undue
complacency.
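The multiple-testing concern raised at the start of this letter can also be illustrated by simulation. The construction of the published limits is not given here, so the sketch below simply assumes that cumulative observed minus expected deaths is compared after every case with pointwise limits ±z√Σp(1 − p). It simulates series in which the predicted risks are exactly right, so every breach is a false alarm, and reports the proportion of series that breach at least once. The 3% risk, the 1000-case series length and the function names are hypothetical.

```python
import random
from statistics import NormalDist


def breach_flags(risks, levels, rng):
    """Simulate one series in which each patient dies with exactly the
    predicted risk; for each confidence level, report whether cumulative
    observed-minus-expected deaths ever left the pointwise limits
    +/- z * sqrt(sum of p(1 - p)) checked after every case."""
    zs = [NormalDist().inv_cdf(1.0 - (1.0 - lv) / 2.0) for lv in levels]
    breached = [False] * len(levels)
    o_minus_e = 0.0
    variance = 0.0
    for p in risks:
        died = 1.0 if rng.random() < p else 0.0
        o_minus_e += died - p
        variance += p * (1.0 - p)
        sd = variance ** 0.5
        for i, z in enumerate(zs):
            if abs(o_minus_e) > z * sd:
                breached[i] = True
    return breached


def false_alarm_rates(risks, levels=(0.95, 0.99), n_series=500, seed=1):
    """Proportion of simulated series that breach at least once, per level.
    Both levels are checked on the same simulated paths."""
    rng = random.Random(seed)
    counts = [0] * len(levels)
    for _ in range(n_series):
        for i, hit in enumerate(breach_flags(risks, levels, rng)):
            counts[i] += hit
    return {lv: count / n_series for lv, count in zip(levels, counts)}


risks = [0.03] * 1000  # hypothetical series: 1000 cases at 3% predicted risk
print(false_alarm_rates(risks))
```

Because the 99% band contains the 95% band and both are checked on the same paths, widening the limits can only lower the proportion of series that breach. In this hypothetical setting, however, checking after every case still gives a proportion well above the nominal 1%, which is what uncorrected multiple testing means in practice.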
Steve Gallivan
Department of Mathematics, University College London, London WC1E 6BT
Jocelyn Lovegrove
Christopher Sherlaw-Johnson
Clinical Operational Research Unit, University College London

Author's reply
EDITOR: Unfortunately Gallivan et al are not alone in the practice of
claiming that a surgeon is worse than his or her colleagues, or that a
colleague's performance has deteriorated (and then improved), without
any statistical basis for the assertion (that is, without consideration
of the rate of false positives).1 The potential usefulness of control
limits is no doubt clear to Gallivan et al. In their paper they state:
"Some work on the statistical approach to this question has been done
(Jan Poloniecki, unpublished observations).13" Their haste to submit the
same data with the same plotting technique to the same journal at the
same time may be responsible for the fact that reference 13 has been
omitted from the list of references published in the Lancet.1

Gallivan and his colleagues at University College London are right in
thinking that nominal 99% control limits will give wider confidence
intervals, and therefore fewer false positive results, than 95% limits
based on the same test. If, as we have suggested should happen, a formal
internal inquiry is launched whenever the statistical control limits are
breached, then the confidence limits must be wide enough to ensure that
this does not occur so often as to be unmanageable. For our series, we
found that the control limits for the cumulative risk adjusted mortality
chart were breached at most twice in nearly four years. The second
occasion was particularly transient (that is, self-correcting) and might
not have occurred at all if any of the parameters had been reset after
the first occasion.

Gallivan et al suggest that the test could be based on a multinomial
distribution. Both 0 and 100% are valid Parsonnet scores, and with these
risk estimates the multinomial confidence limits have a width of 0. None
the less, they could try their suggestion on, for example, the St
George's data, which they have, to find out how often the control limits
for the cumulative risk adjusted mortality chart are breached.

To avoid confusion when referring to the cumulative risk adjusted
mortality chart, I suggest that Gallivan et al stick to the original name
for this plotting technique, which I gave it in 1995. Further details of
the precedent are set out at the end of our paper.

Department of Public Health Sciences, St George's Hospital Medical
School, London SW17 0RE
© BMJ 1998