BMJ  2005;330:929 (23 April), doi:10.1136/bmj.38377.675440.8F (published 15 April 2005)

Paper

Was Rodney Ledward a statistical outlier? Retrospective analysis using routine hospital data to identify gynaecologists' performance

Mike Harley, director1, Mohammed A Mohammed, senior research fellow2, Shakir Hussain, statistician3, John Yates, professor1, Abdullah Almasri, visiting statistician3

1 Inter-Authority Comparisons and Consultancy, Health Services Management Centre, University of Birmingham, Birmingham B15 2RT, 2 Department of Public Health and Epidemiology, University of Birmingham, Birmingham B15 2TT, 3 Department of Primary Care and General Practice, University of Birmingham

Correspondence to: M Harley M.J.Harley@bham.ac.uk

Abstract

Objectives To investigate whether routinely collected data from hospital episode statistics could be used to identify the gynaecologist Rodney Ledward, who was suspended in 1996 and was the subject of the Ritchie inquiry into quality and practice within the NHS.

Design A mixed scanning approach was used to identify seven variables from hospital episode statistics that were likely to be associated with potentially poor performance. A blinded multivariate analysis was undertaken to determine the distance (known as the Mahalanobis distance) in the seven indicator multidimensional space that each consultant was from the average consultant in each year. The change in Mahalanobis distance over time was also investigated by using a mixed effects model.

Setting NHS hospital trusts in two English regions, in the five years from 1991-2 to 1995-6.

Population Gynaecology consultants (n = 143) and their hospital episode statistics data.

Main outcome measure Whether Ledward was a statistical outlier at the 95% level.

Results The proportion of consultants who were outliers in any one year (at the 95% significance level) ranged from 9% to 20%. Ledward appeared as an outlier in three of the five years. Our mixed effects (multi-year) model identified nine high outlier consultants, including Ledward.

Conclusion It was possible to identify Ledward as an outlier by using hospital episode statistics data. Although our method found other outlier consultants, we strongly caution that these outliers should not be overinterpreted as indicative of "poor" performance. Instead, a scientific search for a credible explanation should be undertaken, but this was outside the remit of our study. The set of indicators used means that cancer specialists, for example, are likely to have high values for several indicators, and the approach needs to be refined to deal with case mix variation. Even after allowing for that, the interpretation of outlier status is still as yet unclear. Further prospective evaluation of our method is warranted, but our overall approach may be potentially useful in other settings, especially where performance entails several indicator variables.

Introduction

The Ritchie report was based on one of the most detailed inquiries yet undertaken into the clinical practice of an individual gynaecologist, Rodney Ledward.1 The criticisms made and subsequently substantiated against Ledward included lack of care and judgment preoperatively, failings in surgical skills, inappropriate delegation to junior staff, and poor postoperative care and judgment.

In common with many other inquiries, little use was made of comparative data regarding the performance of individual consultants or surgical teams. For over 20 years, routine data sources such as hospital episode statistics have been widely perceived as being of little value because of problems with completeness and accuracy. Much is of variable quality and equally variable relevance to the quality and outcomes of the care that the NHS provides.2

Despite these concerns, hospital episode statistics data were used in the Bristol inquiry,3 which concluded unequivocally: hospital episode statistics "was [sic] not recognised as a valuable tool for analysing the performance of hospitals. It is now, belatedly." This paper compares the performance of 142 gynaecology consultants with the performance of Ledward over a period of five years, to determine if Ledward was a statistical outlier according to hospital episode statistics data.

Methods

Using the review of the Ritchie report, other reports of alleged malpractice, a general review of literature on performance failures, and discussions with a practising gynaecologist, we compiled a provisional list of 11 variables that could be indicative of poor performance and could be derived from hospital episode statistics. We refined this list by eliminating those with high inter-correlations. We produced a list of seven indicator variables (table). Nevertheless, we emphasise that, for each indicator, valid reasons may exist that could explain performance occurring in the high end of that indicator distribution. Much less likely is that the same team would display extreme performance across a basket of indicators.


Table 1 Seven clinically relevant indicator variables from hospital episode statistics

 

We obtained complications by scanning all seven diagnostic fields of hospital episode statistics. We then calculated each indicator for each of the years from 1991-2 to 1995-6 for Ledward, his three colleagues in the same hospital, and all the gynaecologists in one other region, the West Midlands.

We undertook a retrospective desktop statistical analysis to determine whether Ledward could be identified as a statistical outlier. We assigned a study code to all consultants, and the two analysts were blinded to the code of Ledward. The analysis proceeded in three stages.

Stage 1
Exploratory data analysis—Of the 143 consultants, 68 appeared in all five years. See bmj.com for the number of consultants in each year and the numbers excluded because of missing data. The pattern of missing data was consistent with data missing at random (P < 0.0005).

Stage 2
We carried out a multivariate analysis to detect outliers, based on the computation of a robust Mahalanobis distance4 for each consultant in each year. The statistical details are provided on bmj.com. For each year we computed, from the variable space of the seven indicators, a Mahalanobis distance for each consultant. The Mahalanobis distance is a measure of the "distance" between the origin in the seven indicator variable space and a given data point. So a consultant with average values for each variable will have a Mahalanobis distance of zero, and this represents the origin. Consultants who are furthest away from the origin will have relatively larger distances. For each Mahalanobis distance we also derived an approximate 95% confidence interval, using computer simulation techniques.
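The distance calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: it uses the ordinary (non-robust) mean and sample covariance, whereas the study used a robust estimator whose details are on bmj.com. The function names and the two-indicator data are hypothetical, and the sketch assumes that MD in the text denotes the quadratic form, so the function returns its square root.

```python
import math

def mean_vector(rows):
    """Column means of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows, means):
    """Sample covariance matrix (divisor n - 1)."""
    n, k = len(rows), len(means)
    cov = [[0.0] * k for _ in range(k)]
    for row in rows:
        dev = [x - m for x, m in zip(row, means)]
        for i in range(k):
            for j in range(k):
                cov[i][j] += dev[i] * dev[j]
    return [[c / (n - 1) for c in r] for r in cov]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(a[i]) + [b[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (m[i][k] - s) / m[i][i]
    return x

def root_mahalanobis(rows):
    """Return sqrt(d' S^-1 d) for each row, d being its deviation from the mean."""
    means = mean_vector(rows)
    cov = covariance_matrix(rows, means)
    out = []
    for row in rows:
        dev = [x - m for x, m in zip(row, means)]
        z = solve(cov, dev)  # z = S^-1 d, without forming the inverse
        out.append(math.sqrt(sum(d * zi for d, zi in zip(dev, z))))
    return out

# Hypothetical data: 5 consultants x 2 indicators (the study used 7 indicators).
data = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [10.0, 12.0]]
distances = root_mahalanobis(data)
```

In this toy data the fifth consultant lies furthest from the centre. A single extreme point also inflates the ordinary covariance estimate and so shrinks its own distance, which is one reason a robust estimator is preferred for outlier detection.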

The square root of the Mahalanobis distance (√MD) is known to follow approximately a √χ² distribution with k degrees of freedom (k being equal to the number of indicator variables, seven in our case),4 and so we used the mean of the √χ², which is given by √k (√7 = 2.66), to define outliers.4 Consultants whose 95% intervals lay wholly above the 2.66 threshold were deemed to be outliers. We report the number of outlier consultants for each year.
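The outlier rule above reduces to a comparison between the lower limit of each consultant's interval and the threshold. The following sketch assumes the intervals have already been obtained (the simulation used to derive them is not described here); the study codes, interval values, and function name are hypothetical.

```python
import math

K_INDICATORS = 7
SQRT_K = math.sqrt(K_INDICATORS)

# Threshold as reported in the text; math.sqrt(7) itself evaluates to about 2.646.
THRESHOLD = 2.66

def is_outlier(lower, upper, threshold=THRESHOLD):
    """True when the whole interval (lower, upper) lies above the threshold."""
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return lower > threshold

# Hypothetical (lower, upper) 95% intervals for root-MD, keyed by study code.
intervals = {"A": (1.8, 2.9), "B": (2.5, 3.4), "C": (2.9, 4.1)}
flagged = sorted(code for code, (lo, hi) in intervals.items() if is_outlier(lo, hi))
```

Consultant B has a point estimate that may well exceed the threshold, but the interval crosses it, so only C is flagged under this rule.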

Stage 3
We also investigated the change in MD over the five years, using hierarchical analyses for repeated measurements. We constructed a two level hierarchical model, with consultant at level 1 (highest level) and their respective Mahalanobis distances at level 2 (lowest level). We used the standardised residual output from this model (see figure 1) to identify outliers beyond 2 standard deviations.
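The final flagging step can be illustrated as follows. This sketch covers only the ±2 rule applied to standardised residuals; it does not fit the hierarchical model, and it assumes one standardised residual per consultant, which the text does not specify. The codes and values are hypothetical.

```python
def split_outliers(std_residuals, limit=2.0):
    """Return (high, low) lists of codes whose residual lies beyond +/- limit."""
    high = sorted(c for c, r in std_residuals.items() if r > limit)
    low = sorted(c for c, r in std_residuals.items() if r < -limit)
    return high, low

# Hypothetical standardised residuals keyed by study code.
residuals = {"A": 0.3, "B": 2.4, "C": -2.1, "D": 1.9, "E": -0.7}
high, low = split_outliers(residuals)
```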



Fig 1 Fitted values versus the standardised residuals from statistical model. Consultants with standardised residuals outside the ±2 standardised residuals envelope are deemed to be outliers. Ledward is the larger filled circle

 

Results

See bmj.com for the robust √MD for each consultant for each year, and a summary of the number of outlier consultants. Ledward seemed to be an outlier in three out of five consecutive years.

We also constructed a model to investigate the variation in √MD over time (see bmj.com for further details), which reached significance (P = 0.0043). Figure 1 shows standardised residuals from the model. From this figure, we identified nine high outlier consultants and three low outlier consultants.

After these two analyses, MH revealed the consultant code and confirmed that Ledward was a statistical outlier. Figure 2 shows the variable values for Ledward. Several other consultants were outliers. Two consultants were outliers in all five years, two consultants were outliers in four years, and seven consultants (including Ledward) were outliers in three years.



Fig 2 Histograms for the seven indicator variables, the total number of episodes per consultant, and the square root of the Mahalanobis distance for all years combined. Coloured boxes show the values for Ledward for each of the five years (1991-2 to 1995-6), respectively

 

Discussion

Our study shows a robust statistical method for detecting outlier consultant firms, using a limited set of indicators derived from hospital episode statistics. Ledward was an outlier in three out of five consecutive years, and also when we considered the sequence of his Mahalanobis distances over time. Other outliers should be regarded as signals meriting a scientific search for a credible explanation.5

Potential limitations of the study
The measurement of poor clinical performance in the NHS has no gold standard with which to compare this or any other statistical method.6 Recognising the limitations of statistics in this type of work is therefore important.6 Furthermore, the degree of statistical refinement applied to such problems must be weighed against the more fundamental limitations of the datasets available, their quality, and the role of human judgment in selecting the indicators.

The issue of what to do with subjects who have missing data is important. We excluded these subjects, but this creates the inappropriate impression that consultants with missing data may not be subject to a monitoring process. Although missing or poor quality data can hamper all analyses, they may not, as shown in the Bristol analysis,7 radically alter the ability to detect outliers. One statistical strategy to deal with missing data is imputation, although a more fundamental solution is to improve data collection methods.6

Hospital episode statistics contain a limited number of variables, of which only a portion are potentially useful indicators of quality of care or factors relating to the case mix of patients.

Furthermore, one can easily reduce or increase the number of statistical outlier signals by shortening or widening the intervals of uncertainty, or by using non-robust statistical methods, but this is not a purely statistical question. We must also consider the costs and benefits (including findings) of subsequent investigations. For example, after simulation to determine individual intervals of uncertainty, Ledward was an outlier in three of the five years, but his √MD was above the 95th centile (3.75) in four out of five years, indicating that it may be prudent to review consultants with large Mahalanobis distance (say, above the 95th centile) even though the individual interval of uncertainty crosses (only just) the expected mean.
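The 3.75 figure quoted above agrees with the theoretical 95th centile of a √χ² distribution with seven degrees of freedom (the 95th centile of χ² with 7 df is about 14.07, and √14.07 ≈ 3.75), although the text does not say whether the value was theoretical or taken from the observed distances. The sketch below checks the theoretical value by simulation, assuming seven independent standard normal components; the function name, number of draws, and seed are arbitrary choices for illustration.

```python
import math
import random

def simulated_root_chi2_centile(k, centile, draws=50_000, seed=1):
    """Estimate a centile of sqrt(chi-squared with k df) by Monte Carlo."""
    rng = random.Random(seed)
    values = sorted(
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)))
        for _ in range(draws)
    )
    index = min(draws - 1, int(centile * draws))
    return values[index]

# Seven indicators, 95th centile; the estimate should be close to 3.75.
q95 = simulated_root_chi2_centile(k=7, centile=0.95)
```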

Proposed framework for investigation
The pyramid model of investigation8 is based on the premise that the bulk of failure is attributable to the system and not the individual. The pyramid prescribes a check of the following variables in the order listed: check the data, check the patient case mix, check the structure, check the process of care, and, finally, carefully check the carers involved.

Careful handling is essential
The presence of substantial criticism in the media, and even appearance before the General Medical Council, does not guarantee that those so accused are actually guilty of poor performance. Once an individual has been publicly identified, the stigma remains,9 and we cannot undo what has been done. These issues are especially important if the explanation for the poor performance is outside the gift of the individual carer.5


What is already known on this topic

Routine hospital episode statistics have now been used to investigate mortality after cardiac surgery at hospital level (for example, in the Bristol inquiry)

The use of hospital episode statistics data to identify broader suboptimal performance where death is a rare event remains less explored, especially at consultant level

What this study adds

A robust new method has been identified for scanning multi-indicator, multi-year data from hospital episode statistics to identify outlier consultants in gynaecology

The method was able to identify Rodney Ledward, who was the subject of the Ritchie inquiry


Useful methods for monitoring performance
Although scanning methods6 such as ours will never have complete diagnostic certainty, they could be used to reliably identify signals from noise,7 which need to be systematically and sensitively examined, perhaps confidentially, by peers. Prevention is preferable but this presents an altogether different challenge—engineering the safety of patients into the process of care by design.


Statistical details are on bmj.com

This is the abridged version of an article that was posted on bmj.com on 15 April 2005: http://bmj.com/cgi/doi/10.1136/bmj.38377.675440.8F

We thank J Duffy for his statistical advice at initial stages of this project; R Penketh, consultant gynaecologist, for his advice on indicators; and R Holder for his advice regarding the limits of uncertainty. We are grateful to S Evans and R Lilford for their critical comments on earlier drafts of the manuscript. Thanks are also due to the Kings Fund for funding the initial part of this work. AA is supported by the Swedish Foundation for International Cooperation in Research and Higher Education.

Contributions: See bmj.com

Funding: The Kings Fund funded the initial stages of this project.

Competing interests: None declared.

References

  1. Department of Health. The report of the inquiry into quality and practice within the National Health Service arising from the actions of Rodney Ledward. (The Ritchie report.) London: Stationery Office, 2000.
  2. Department of Health. An organisation with a memory: report of an expert group on learning from adverse events in the NHS chaired by the chief medical officer. London: Stationery Office, 2000.
  3. Department of Health. Report of the public inquiry into children's heart surgery at the Bristol Royal Infirmary 1984-1995: Learning from Bristol. (The Kennedy report.) London: Stationery Office, 2001.
  4. Rousseeuw PJ, Leroy AM. Robust regression and outlier detection. New York: Wiley, 1987.
  5. Lilford RJ, Mohammed MA, Spiegelhalter D, Thomson R. Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma. Lancet 2004;363:1147-54.
  6. Spiegelhalter D, Murray G, McPherson K, Macfarlane A, Evans S, Curnow R, et al. Monitoring clinical performance: a statistical perspective. Submission to the Bristol Inquiry, 2002.
  7. Aylin P, Alves B, Best N, Cook A, Elliot P, Evans SJ, et al. Comparison of UK paediatric cardiac surgical performance by analysis of routinely collected data 1984-96: was Bristol an outlier? Lancet 2001;358:181-7.
  8. Mohammed MA, Rathbone A, Myers P, Patel D, Onions H, Stevens A. An investigation into general practitioners associated with high patient mortality flagged up through the Shipman inquiry: retrospective analysis of routine data. BMJ 2004;328:1474-7.
  9. BBC News Online. The second surgeon: Janardan Dhasmana. http://news.bbc.co.uk/1/hi/health/1136419.stm (accessed July 2004).
(Accepted 20 January 2005)







