BMJ 2005;330:929 (23 April), doi:10.1136/bmj.38377.675440.8F (published 15 April 2005)
Mike Harley, director1, Mohammed A Mohammed, senior research fellow2, Shakir Hussain, statistician3, John Yates, professor1, Abdullah Almasri, visiting statistician3
1 Inter-Authority Comparisons and Consultancy, Health Services Management Centre, University of Birmingham, Birmingham B15 2RT, 2 Department of Public Health and Epidemiology, University of Birmingham, Birmingham B15 2TT, 3 Department of Primary Care and General Practice, University of Birmingham
Correspondence to: M Harley M.J.Harley{at}bham.ac.uk
Design A mixed scanning approach was used to identify seven variables from hospital episode statistics that were likely to be associated with potentially poor performance. A blinded multivariate analysis was undertaken to determine the distance (known as the Mahalanobis distance) in the seven indicator multidimensional space that each consultant was from the average consultant in each year. The change in Mahalanobis distance over time was also investigated by using a mixed effects model.
Setting NHS hospital trusts in two English regions, in the five years from 1991-2 to 1995-6.
Population Gynaecology consultants (n = 143) and their hospital episode statistics data.
Main outcome measure Whether Ledward was a statistical outlier at the 95% level.
Results The proportion of consultants who were outliers in any one year (at the 95% significance level) ranged from 9% to 20%. Ledward appeared as an outlier in three of the five years. Our mixed effects (multi-year) model identified nine high outlier consultants, including Ledward.
Conclusion It was possible to identify Ledward as an outlier by using hospital episode statistics data. Although our method found other outlier consultants, we strongly caution that these outliers should not be overinterpreted as indicative of "poor" performance. Instead, a scientific search for a credible explanation should be undertaken, but this was outside the remit of our study. The set of indicators used means that cancer specialists, for example, are likely to have high values for several indicators, and the approach needs to be refined to deal with case mix variation. Even after allowing for that, the interpretation of outlier status is as yet unclear. Further prospective evaluation of our method is warranted, but our overall approach may be useful in other settings, especially where performance entails several indicator variables.
In common with many other inquiries, little use was made of comparative data regarding the performance of individual consultants or surgical teams. For over 20 years, routine data sources such as hospital episode statistics have been widely perceived as being of little value because of problems with completeness and accuracy. Much is of variable quality and equally variable relevance to the quality and outcomes of the care that the NHS provides.2
Despite these concerns, hospital episode statistics data were used in the Bristol inquiry,3 which concluded unequivocally: hospital episode statistics "was [sic] not recognised as a valuable tool for analysing the performance of hospitals. It is now, belatedly." This paper compares the performance of 142 gynaecology consultants with the performance of Ledward over a period of five years, to determine if Ledward was a statistical outlier according to hospital episode statistics data.
We obtained complications by scanning all seven diagnostic fields of hospital episode statistics. We then calculated each indicator for each of the years from 1991-2 to 1995-6 for Ledward, his three colleagues in the same hospital, and all the gynaecologists in one other region, the West Midlands.
We undertook a retrospective desktop statistical analysis to determine whether Ledward could be identified as a statistical outlier. We assigned a study code to all consultants, and the two analysts were blinded to the code of Ledward. The analysis proceeded in three stages.
Stage 1
Exploratory data analysis
Of the 143 consultants, 68 appeared in all five years. See bmj.com for the number of consultants in each year and the numbers excluded because of missing data. The pattern of missing data was consistent with data missing at random (P < 0.0005).
Stage 2
We carried out a multivariate analysis to detect outliers, based on the computation of a robust Mahalanobis distance4 for each consultant in each year. The statistical details are provided on bmj.com. For each year we computed, from the variable space of the seven indicators, a Mahalanobis distance for each consultant. The Mahalanobis distance is a measure of the "distance" between the origin in the seven indicator variable space and a given data point. So a consultant with average values for each variable will have a Mahalanobis distance of zero, and this represents the origin. Consultants who are furthest away from the origin will have relatively larger distances. For each Mahalanobis distance we also derived an approximate 95% confidence interval, using computer simulation techniques.
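As a rough illustration of the distance described above, the following sketch computes, for each consultant in one year, the Mahalanobis distance from the average consultant. It is not the authors' implementation: it uses the ordinary sample mean and covariance rather than the robust estimator cited in the text, the data are synthetic, and the function name `mahalanobis_distances` is hypothetical.

```python
import numpy as np

def mahalanobis_distances(X):
    """Distance of each row of X from the column means, scaled by the sample covariance.

    Assumption: classical (non-robust) mean and covariance are used here;
    the study itself used a robust variant.
    """
    X = np.asarray(X, dtype=float)
    centre = X.mean(axis=0)            # the "average consultant"
    cov = np.cov(X, rowvar=False)      # k x k covariance of the indicators
    diff = X - centre
    # Solve cov @ z = diff.T instead of forming the inverse explicitly.
    z = np.linalg.solve(cov, diff.T).T
    squared = np.einsum("ij,ij->i", diff, z)
    return np.sqrt(squared)

# Synthetic example: 50 consultants, 7 indicators, one shifted on every indicator.
rng = np.random.default_rng(0)
indicators = rng.normal(size=(50, 7))
indicators[0] += 4.0
distances = mahalanobis_distances(indicators)
```

A consultant whose indicators all equal the averages has a distance of zero, and the shifted synthetic consultant in row 0 receives the largest distance.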
The square root of the Mahalanobis distance (√MD) is known to follow approximately a χ² distribution with k degrees of freedom (k being equal to the number of indicator variables, seven in our case),4 and so we used the mean of the χ², which is given by the √k degrees of freedom (√7 = 2.66) to define outliers.4 Consultants with 95% intervals above the 2.66 threshold were deemed to be outliers. We report the number of outlier consultants for each year.
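The outlier rule above can be sketched as follows, assuming that an interval has already been obtained for each consultant (the simulation used to derive the intervals is not described here, so it is not reproduced). The consultant codes, the interval values, and the function name `flag_outliers` are hypothetical.

```python
# Threshold stated in the text for k = 7 indicators.
REPORTED_THRESHOLD = 2.66

def flag_outliers(intervals, threshold=REPORTED_THRESHOLD):
    """Return the codes whose whole interval lies above the threshold.

    intervals maps a consultant code to a (lower, upper) pair.
    """
    return sorted(
        code for code, (lower, upper) in intervals.items() if lower > threshold
    )

# Made-up intervals for four consultants in a single year.
example_intervals = {
    "C001": (1.8, 2.9),   # crosses the threshold, so not flagged
    "C002": (2.7, 4.1),   # entirely above the threshold
    "C003": (3.0, 5.2),   # entirely above the threshold
    "C004": (0.9, 2.0),   # entirely below the threshold
}
flagged = flag_outliers(example_intervals)
```

Under this rule a consultant whose interval only just crosses the threshold (such as `C001`) is not flagged, however large the point estimate.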
Stage 3
We also investigated the change in MD over the five years, using hierarchical analyses for repeated measurements. We constructed a two level hierarchical model, with consultant at level 1 (highest level) and their respective Mahalanobis distances at level 2 (lowest level). We used the standardised residual output from this model (see figure 1) to identify outliers beyond 2 standard deviations.
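The idea of flagging consultants beyond 2 standard deviations can be illustrated with a deliberately simplified stand-in. The sketch below does not fit a hierarchical model: it averages each consultant's yearly distances, standardises those averages across consultants, and applies the ±2 cutoff. The data and the names `standardised_consultant_means` and `classify` are hypothetical.

```python
import numpy as np

def standardised_consultant_means(md_by_consultant):
    """Standardise each consultant's mean yearly distance across consultants.

    Assumption: a plain z score of consultant means replaces the
    standardised residuals of the two level model used in the study.
    """
    codes = sorted(md_by_consultant)
    means = np.array([np.mean(md_by_consultant[c]) for c in codes])
    z = (means - means.mean()) / means.std(ddof=1)
    return dict(zip(codes, z))

def classify(scores, cutoff=2.0):
    """Split consultants into high and low outliers at +/- cutoff."""
    high = sorted(c for c, z in scores.items() if z > cutoff)
    low = sorted(c for c, z in scores.items() if z < -cutoff)
    return high, low

# Made-up data: ten unremarkable consultants with two years each,
# and one consultant with consistently large distances over three years.
data = {f"C{i:02d}": [2.0 + 0.05 * i, 2.1 + 0.05 * i] for i in range(1, 11)}
data["C11"] = [4.8, 5.2, 5.0]

high, low = classify(standardised_consultant_means(data))
```

Consultants may contribute different numbers of years, as in the study, where only 68 of the 143 consultants appeared in all five years.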
√MD for each consultant for each year, and summary of the number of outlier consultants. Ledward seemed to be an outlier in three out of five consecutive years.
We also constructed a model to investigate the variation in √MD over time (see bmj.com for further details), which reached significance (P = 0.0043). Figure 1 shows standardised residuals from the model. From this figure, we identified nine high outlier consultants and three low outlier consultants.
After these two analyses, MH revealed the consultant code and confirmed that Ledward was a statistical outlier. Figure 2 shows the variable values for Ledward. Several other consultants were outliers. Two consultants were outliers in all five years, two consultants were outliers in four years, and seven consultants (including Ledward) were outliers in three years.
Potential limitations of the study
The measurement of poor clinical performance in the NHS has no gold standard with which to compare this or any other statistical method.6 Recognising the limitations of statistics in this type of work is therefore important.6 Furthermore, the degree of statistical refinement applied to such problems must be weighed against the more fundamental limitations of the datasets available, their quality, and the role of human judgment in selecting the indicators.
The issue of what to do with subjects who have missing data is important. We excluded these subjects, but this creates the inappropriate impression that consultants with missing data may not be subject to a monitoring process. Although missing or poor quality data can hamper all analyses, they may not, as shown in the Bristol analysis,7 radically alter the ability to detect outliers. One statistical strategy to deal with missing data is imputation, although a more fundamental solution is to improve data collection methods.6
Hospital episode statistics contain a limited number of variables, of which only a portion are potentially useful indicators of quality of care or factors relating to the case mix of patients.
Furthermore, one can easily reduce or increase the number of statistical outlier signals by shortening or widening the intervals of uncertainty, or by using non-robust statistical methods, but this is not a purely statistical question. We must also consider the costs and benefits (including findings) of subsequent investigations. For example, after simulation to determine individual intervals of uncertainty, Ledward was an outlier in three of the five years, but his √MD was above the 95th centile (3.75) in four out of five years, indicating that it may be prudent to review consultants with large Mahalanobis distance (say, above the 95th centile) even though the individual interval of uncertainty crosses (only just) the expected mean.
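The alternative review rule suggested above, based on the 95th centile of the distances rather than on individual intervals, might look like the sketch below. The values are invented, the function name `above_centile` is hypothetical, and the centile is computed with NumPy's default linear interpolation, which may differ from whatever definition produced the 3.75 quoted in the text.

```python
import numpy as np

def above_centile(md_values, centile=95.0):
    """Return the empirical centile cutoff and the codes whose distance exceeds it.

    md_values maps a consultant code to that consultant's distance in one year.
    """
    cutoff = float(np.percentile(list(md_values.values()), centile))
    flagged = sorted(code for code, value in md_values.items() if value > cutoff)
    return cutoff, flagged

# Made-up distances for 20 consultants; the last one is unusually large.
values = {f"C{i + 1:02d}": 1.0 + 0.1 * i for i in range(20)}
values["C20"] = 4.5

cutoff, for_review = above_centile(values)
```

Because the cutoff is taken from the same year's data, roughly 5% of consultants are selected for review every year regardless of how unusual they are, which is one reason any such signal needs careful follow-up rather than automatic interpretation.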
Proposed framework for investigation
The pyramid model of investigation8 is based on the premise that the bulk of failure is attributable to the system and not the individual. The pyramid prescribes a check of the following variables in the order listed: check the data, check the patient case mix, check the structure, check the process of care, and, finally, carefully check the carers involved.
Careful handling is essential
The presence of substantial criticism in the media, and even appearance before the General Medical Council, does not guarantee that those so accused are actually guilty of poor performance. Once an individual has been publicly identified, the stigma remains,9 and we cannot undo what has been done. These issues are especially important if the explanation for the poor performance is outside the gift of the individual carer.5
Useful methods for monitoring performance
Although scanning methods6 such as ours will never have complete diagnostic certainty, they could be used to reliably identify signals from noise,7 which need to be systematically and sensitively examined, perhaps confidentially, by peers. Prevention is preferable, but this presents an altogether different challenge: engineering the safety of patients into the process of care by design.
This is the abridged version of an article that was posted on bmj.com on 15 April 2005: http://bmj.com/cgi/doi/10.1136/bmj.38377.675440.8F
We thank J Duffy for his statistical advice at initial stages of this project; R Penketh, consultant gynaecologist, for his advice on indicators; and R Holder for his advice regarding the limits of uncertainty. We are grateful to S Evans and R Lilford for their critical comments on earlier drafts of the manuscript. Thanks are also due to the Kings Fund for funding the initial part of this work. AA is supported by the Swedish Foundation for International Cooperation in Research and Higher Education.
Funding: The Kings Fund funded the initial stages of this project.
Competing interests: None declared.