BMJ 2007;334:1044 (19 May), doi:10.1136/bmj.39168.496366.55 (published 23 April 2007)
Paul Aylin, clinical senior lecturer1, Alex Bottle, lecturer1, Azeem Majeed, professor of primary care and social medicine2
1 Dr Foster Unit, Imperial College London, London EC1A 9LA, 2 Department of Primary Care and Social Medicine, Imperial College London
Correspondence to: P Aylin p.aylin{at}imperial.ac.uk
Design Analysis of inpatient hospital episode statistics. Predictive model developed using multiple logistic regression.
Setting NHS hospital trusts in England.
Patients All patients admitted to an NHS hospital within England for isolated coronary artery bypass graft (CABG), repair of abdominal aortic aneurysm, and colorectal excision for cancer from 1996-7 to 2003-4.
Main outcome measures Deaths in hospital. Performance of models assessed with receiver operating characteristic (ROC) curve scores measuring discrimination (<0.7=poor, 0.7-0.8=reasonable, >0.8=good) and both Hosmer-Lemeshow statistics and standardised residuals measuring goodness of fit.
Results During the study period 152 523 cases of isolated CABG with 3247 deaths in hospital (2.1%), 12 781 repairs of ruptured abdominal aortic aneurysm (5987 deaths, 46.8%), 31 705 repairs of unruptured abdominal aortic aneurysm (3246 deaths, 10.2%), and 144 370 colorectal resections for cancer (10 424 deaths, 7.2%) were recorded. The power of the complex predictive model was comparable with that of models based on clinical datasets with ROC curve scores of 0.77 (v 0.78 from clinical database) for isolated CABG, 0.66 (v 0.65) and 0.74 (v 0.70) for repairs of ruptured and unruptured abdominal aortic aneurysm, respectively, and 0.80 (v 0.78) for colorectal excision for cancer. Calibration plots generally showed good agreement between observed and predicted mortality.
Conclusions Routinely collected administrative data can be used to predict risk with similar discrimination to clinical databases. The creative use of such data to adjust for case mix would be useful for monitoring healthcare performance and could usefully complement clinical databases. Further work on other procedures and diagnoses could result in a suite of models for performance adjusted for case mix for a range of specialties and procedures.
We examined mortality for three index procedures (coronary artery bypass graft, abdominal aortic aneurysm repair, and colectomy for bowel cancer) used in three large clinical datasets (the national adult cardiac surgical database, the national vascular database, and a colorectal cancer database collected by the Association of Coloproctology of Great Britain and Ireland). We compared risk adjustment models for mortality, based on administrative data, with published models based on data from the clinical databases and assessed the ability of each model to predict death.
Background
The Society of Cardiothoracic Surgeons has collected voluntary data from its members for over 25 years and individual patient level data since 1996 and in 2003 introduced the national cardiac surgical database (NCSD). Some 40 units contribute to the database, which contains information on over 210 000 individual records. The central cardiac audit database (CCAD) is now used for all cardiac procedures and will incorporate the national cardiac surgical database. The society has published outcomes using several different risk prediction scores including that of Parsonnet et al,4 the EuroSCORE,5 and scores from both simple and complex models.6 The score accepted by most UK clinicians is the EuroSCORE, which is based on age, sex, factors related to the patient (such as the presence of chronic pulmonary disease), cardiac factors (such as the presence of unstable angina), and factors related to the operation (such as whether or not the admission was an emergency). Adult cardiac surgery was one of the key performance indicators for the Healthcare Commission.7
The Vascular Surgical Society of Great Britain and Ireland (VSSGBI) runs the national vascular database (NVD), which collects data voluntarily from surgeons on three procedures: repair of abdominal aortic aneurysm, carotid endarterectomy, and infra-inguinal bypass. At the time of the 2004 report, 259 surgeons in 99 hospitals were contributing data and there were 12 389 records on the database. Information collected includes details of the operation performed, the surgical and anaesthetic staff involved, the patient's history and risk factors, biochemical and haematological parameters, and 30 day postoperative morbidity and mortality.8
The Association of Coloproctology of Great Britain and Ireland (ACPGBI) bowel cancer audit collects clinical data on patients with a diagnosis of bowel cancer, recorded either by consultant surgeons or dedicated audit staff. The database for April 2001 to March 2002 contained information from 93 healthcare trusts or hospitals with details of 10 613 cases of bowel cancer. Data from this audit have been used to create a model for predicting outcomes from colorectal cancer surgery.9 Models for predicting mortality include age, sex, the American Society of Anesthesiologists grade,10 Dukes's stage, urgency of the operation, and cancer excision.
Data on hospital activity have been collected since 1949 from all NHS hospitals in the UK.11 Hospital episode statistics (HES) were introduced in 1986 and measure all hospital inpatient and day surgery activity for England. The basic unit of activity is the finished consultant episode, covering the period a patient is under the care of one consultant. Every NHS hospital in England must submit HES data items electronically for each episode in every patient's stay in that hospital. The data items are entered from the patient's notes onto the hospital's patient administration systems by trained clinical coders. The items include date of birth, sex, home postcode, and clinical data such as primary and secondary diagnoses and dates and details of any operations performed within the patient's stay. Diagnoses are coded with ICD-10 (international statistical classification of diseases, tenth revision); procedures use the UK Office of Population Censuses and Surveys classification (OPCS4). Since 1991, HES has been used for contracting in the internal market and now contains some fourteen million records per financial year.
HES data are often regarded as unreliable by clinicians because of considerable problems in the early years after their inception in 1986. McKee et al summed up the poor reputation of routine data in 1994: "Many clinicians have concluded that, despite a massive investment in technology, routinely collected data still fail . . . and that separate systems are still required."12 Data quality has since improved considerably,13 14 and, if suitable predictive models could be developed using this routinely collected information source, they would be a valuable tool for generating measures of performance adjusted for case mix.
Operations were classified as elective (admission method (ADMIMETH) 11 to 13) or non-elective (all other ADMIMETH values) as HES does not have an "urgent" category, unlike US admissions data or those from the Society of Cardiothoracic Surgeons. Age was divided into five year bands up to 85 and over, but with those aged <45 combined. We used secondary diagnosis fields to create the comorbidity variables that make up the Charlson index.15 Further factors specific to each index procedure group were also considered (tables 1 and 2).
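The two recoding rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the band labels are hypothetical, and the top band is assumed to be "85 and over".

```python
# Hypothetical helper names; only the ADMIMETH values 11 to 13 and the
# five year banding with <45 combined come from the text.
ELECTIVE_ADMIMETH = {11, 12, 13}


def is_elective(admimeth: int) -> bool:
    """Elective if the admission method is 11, 12, or 13; otherwise non-elective."""
    return admimeth in ELECTIVE_ADMIMETH


def age_band(age: int) -> str:
    """Return a five year age band, with <45 combined and an assumed 85+ top band."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 45:
        return "<45"
    if age >= 85:
        return "85+"
    lower = (age // 5) * 5  # start of the five year band, e.g. 67 -> 65
    return f"{lower}-{lower + 4}"
```

Each band would then enter the logistic regression as its own indicator variable, which is what makes the age adjustment finer than a handful of broad age groups.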
The two variables we used that were not adjusted for in the models from the clinical databases were financial year and socioeconomic deprivation. Our measure of deprivation was the index of multiple deprivation for 2004 at super output area, linked through the patient's postcode.
We compared these HES based models with the best published predictive risk model based on data from the clinical databases. For CABG and abdominal aortic aneurysms we used the most recent society reports available.6 8 For colorectal resection we used the published model in the report on risk adjusted outcomes from the Association of Coloproctology of Great Britain and Ireland.9 16 We compared models using receiver operating characteristic (ROC) curve scores (c statistics). The c statistic is the probability of assigning a greater risk of death to a randomly selected patient who died compared with a randomly selected patient who survived. A value of 0.5 suggests that the model is no better than random chance in predicting death. A value of 1.0 suggests perfect discrimination. In general, values less than 0.7 are considered to show poor discrimination, values of 0.7-0.8 can be described as reasonable, and values above 0.8 suggest good discrimination. The models were calibrated by plotting observed versus predicted numbers of deaths by tenth based on risk. A model that closely fits the observed outcome is desirable, and this can be tested using a χ² type statistic developed by Hosmer and Lemeshow measuring goodness of fit.17 This test compares the number of observed cases with the number of predicted cases for each tenth of risk. As the performance of this test depends on sample size, we also inspected the proportion of residuals whose absolute values were greater than 1.96 (5% are expected to be greater than this value). We also checked for influential data points, which have Cook's statistic values greater than 1.18
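To make the two measures concrete, the following Python sketch computes a c statistic directly from its pairwise definition and a Hosmer-Lemeshow type statistic over tenths of risk. It is an illustration under stated assumptions rather than the method used in the study: the function names are hypothetical, ties in predicted risk count as half, and every predicted risk is assumed to lie strictly between 0 and 1.

```python
def c_statistic(risks, died):
    """Probability that a randomly chosen death was given a higher predicted
    risk than a randomly chosen survivor (ties count as half)."""
    dead = [r for r, d in zip(risks, died) if d]
    alive = [r for r, d in zip(risks, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for risk_dead in dead:
        for risk_alive in alive:
            if risk_dead > risk_alive:
                score += 1.0
            elif risk_dead == risk_alive:
                score += 0.5
    return score / (len(dead) * len(alive))


def hosmer_lemeshow(risks, died, groups=10):
    """Chi-square type statistic comparing observed and predicted deaths
    within groups of patients ordered by predicted risk."""
    pairs = sorted(zip(risks, died))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(d for _, d in chunk)   # deaths recorded in the group
        expected = sum(r for r, _ in chunk)   # deaths predicted by the model
        statistic += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return statistic
```

The pairwise loop is quadratic in the number of patients, so for national datasets a rank based calculation would be preferable; the loop is kept here because it mirrors the definition in the text.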
Tables 1 and 2 show the odds ratios for all the variables for each index procedure.
As expected, patient's age was a strong predictor of mortality but many of the other variables in HES were also significant predictors of mortality (for example, deprivation and comorbidity). Models derived from the training and validation datasets gave similar odds ratios and c statistics. We also trained the models on operations from 1996-7 to 2001-2, testing them on 2002-3 to 2003-4 so that the latter two years represent a "future" dataset to the training set. The c statistics differed by at most 0.02 (with the test set having the higher values for each procedure). Figure 1 shows the ROC c statistics for the three HES based models and published models based on clinical databases. For repairs of abdominal aortic aneurysm and colorectal excision for cancer, the model based on HES had better discrimination than that based on the clinical database. For isolated CABG, the c statistic was similar (0.768 in HES v 0.783 from the national cardiac surgical database).
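The "future" dataset check described above amounts to splitting episodes by financial year rather than at random. A minimal sketch, assuming each episode is a dictionary with a hypothetical "year" field holding the calendar year in which the financial year starts (1996 for 1996-7):

```python
def temporal_split(episodes, first_test_year):
    """Split episodes into a training set (earlier years) and a test set
    (first_test_year onwards). The 'year' key is an assumed field name."""
    train = [e for e in episodes if e["year"] < first_test_year]
    test = [e for e in episodes if e["year"] >= first_test_year]
    return train, test


# Example: train on 1996-7 to 2001-2, test on 2002-3 and 2003-4.
episodes = [{"year": y, "died": 0} for y in range(1996, 2004)]
train, test = temporal_split(episodes, first_test_year=2002)
```

Fitting on the earlier years and scoring on the later ones is a stricter check than a random split, because it also exposes any drift in coding practice or case mix over time.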
Discrimination
HES data lack many clinical variables and have been criticised for being inadequate for monitoring performance, but for the index procedures examined in our study, the ROC curves were comparable with those from clinical datasets. Other than lacking clinical variables, the HES models differed in several ways from the clinical datasets: they included the year, area level fifth of deprivation, narrower age bands, and information derived from previous emergency admissions. The degree to which the non-HES models would improve with the use of five year age bands is unknown. We could not apply our HES age groups to the clinical datasets but the clinical models are validated and considered by the relevant surgical bodies to be the best currently available. A US study developed a model based on an administrative dataset (Veterans Affairs patient treatment file) for mortality after cardiac bypass surgery with a c statistic of 0.70 compared with a value of 0.76 from a clinical dataset model (clinical improvement in cardiac surgery programme).19 In a similar study looking at predicting mortality after non-cardiac surgery, the performance of the model ranged from good to fair (0.83 for orthopaedic surgery to 0.65 for thoracic surgery).20
Simplified models of risk prediction might be as effective in predicting outcome as some complex models currently in use.21 22 The authors of the US study derived their own clinical groups to adjust for comorbidity after excluding conditions that might have arisen as a complication after surgery, and recognised that in so doing they may also have excluded comorbid diseases that were important for some patients.19 The Charlson index, which we used, also tries to exclude potential complications.15 For example, it excludes acute renal failure (ICD10 N17) and includes chronic renal failure (N18); however, it also includes unspecified renal failure (N19), which in practice will include some acute cases, of which some will be complications. We also fitted the components of the Charlson index as dummy variables instead of one continuous variable and obtained the same c statistic. Exclusion of the renal disease variable from our complex model reduced the c statistic for CABG marginally from 0.76 to 0.75.
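The difference between fitting Charlson components as dummy variables and as a single score can be illustrated with a short Python sketch. The mapping below is a deliberately small, illustrative subset and is not the mapping used in the study: the component names, the helper functions, and the choice of three components are assumptions, and the weights follow the original Charlson weighting as commonly cited. Only the renal codes (N17 excluded; N18 and N19, chronic and unspecified renal failure in ICD-10, included) reflect the example in the text.

```python
# Illustrative subset only: component -> (ICD-10 code prefixes, weight).
CHARLSON_SUBSET = {
    "myocardial_infarction": (("I21", "I22"), 1),
    "congestive_heart_failure": (("I50",), 1),
    "renal_disease": (("N18", "N19"), 2),  # N17 (acute) deliberately absent
}


def comorbidity_flags(secondary_diagnoses):
    """Return one 0/1 dummy variable per component from secondary diagnosis codes."""
    codes = [c.upper().replace(".", "") for c in secondary_diagnoses]
    return {
        name: int(any(code.startswith(prefixes) for code in codes))
        for name, (prefixes, _weight) in CHARLSON_SUBSET.items()
    }


def charlson_score(flags):
    """Collapse the dummy variables into a single weighted score."""
    return sum(CHARLSON_SUBSET[name][1] * present for name, present in flags.items())
```

A model can take either the individual flags or the single score as covariates; the flags let each condition carry its own coefficient, whereas the score fixes the relative weights in advance.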
Goodness of fit
When we used the method developed by Hosmer and Lemeshow17 the goodness of fit of at least one of the complex models seemed to be poor. For small samples, the test is known to have poor power to detect badly fitting models and the resulting P value may differ between software packages.23 For large samples, which we clearly have for national data, even small (clinically unimportant) differences between observed and predicted numbers will seem significant. The calibration plots show good agreement between observed and predicted numbers of deaths and examination of the residuals suggests that the small P value from the Hosmer-Lemeshow statistic is because of the large sample size. A better method for testing the goodness of fit in such cases might be to examine the residuals and check for influential points. With these criteria, all the complex models exhibit good fit.
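The sample size effect can be seen from the form of the statistic itself. For one risk group of n patients with predicted risk p and observed death rate q, the contribution is n(q - p)²/(p(1 - p)), so a fixed discrepancy between q and p is penalised in proportion to n. The sketch below uses invented figures (a predicted risk of 2.0% against an observed rate of 2.1%) purely for illustration; the function name is hypothetical.

```python
def hl_group_contribution(n, predicted_risk, observed_rate):
    """Hosmer-Lemeshow type contribution of one risk group of n patients."""
    observed = n * observed_rate      # deaths recorded
    expected = n * predicted_risk     # deaths predicted
    return (observed - expected) ** 2 / (expected * (1 - predicted_risk))


# Same 0.1 percentage point discrepancy, two very different sample sizes.
small = hl_group_contribution(1_000, 0.020, 0.021)
large = hl_group_contribution(150_000, 0.020, 0.021)
```

With identical calibration error the larger group contributes 150 times as much to the statistic, which is why a small P value on national data need not indicate a clinically important lack of fit.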
Data quality
Concerns remain about the quality of HES data.13 The overall percentage of admissions with missing or invalid data on age, sex, admission method, or dates of admission or discharge was 2.4% in 2003. For the remaining admissions, 47.9% in 1996 and 41.6% in 2003 had no secondary diagnosis recorded (41.9% and 37.1%, respectively, if day cases are excluded). In contrast to some of the clinical databases, if no information on comorbidity is recorded, we cannot tell whether there is no comorbidity present or if comorbidity has not been recorded. Despite these deficiencies, our predictive models are still good. In the most recent report of the Society of Cardiothoracic Surgeons, 30% of records had missing EuroSCORE variables.6 Within the Association of Coloproctology database, 39% of patients had missing data for the risk factors included in their final model.9 A comparison of numbers of vascular procedures recorded within HES and the national vascular database found four times as many cases recorded within HES.24
A comparison of analyses based on HES and the Society of Cardiothoracic Surgeons' own data concluded that statistical correlation was good, although counts of operations were consistently lower within HES.6 This was probably because of our stricter definition of what constitutes an isolated CABG in HES. For complex specialist procedures, the OPCS4 coding system may not be suitable for monitoring outcomes,25 but a revised version (v4.3) of the system is now available that should improve the recording of newer types of procedures. With the introduction of a system based on reimbursement for providers of health care for each individual case treated ("payment by results"26), there are financial incentives to record diagnoses more thoroughly,27 28 which may help to improve completeness and accuracy in data abstraction and coding within the NHS even further.
Like the clinical databases in this study, HES does not capture deaths out of hospital, which will lower recorded in-hospital mortality for trusts that discharge patients early. We were able to capture most deaths occurring after transfers to other NHS hospitals and so were missing only deaths after discharge home or to residential homes. National mortality data are now linked to HES, which will allow longer term outcomes to be monitored.
We compared the performance of different models on different databases and it is important to remember that the performance of any model is also a reflection of the quality of the database and the type of patients it covers. HES and the other databases are not strictly comparable because of the high proportion of missing data in the Association of Coloproctology database and because the national vascular database and the Society of Cardiothoracic Surgeons' database include some hospitals outside England, unlike HES. Even if our comparisons of model prediction do not strictly compare like with like, they are still important because they reflect the reality of the two sets of databases in their present form. The question we have asked is: which one is currently the best predictor of death?
Implications for practice
Clinical databases are expensive to compile and maintain. An exercise to look at the utility of electronic health data to assess new health technologies estimated costs per record ranging from around £10 (UK cardiac surgical register) to £60 (Scottish hip fracture audit) compared with £1 per record for HES.29 Despite these costs, mortality adjusted for case mix by unit or surgeon is still not in the public domain from any of the three databases covered in our report, with the recent exception of unit level mortality adjusted for case mix for heart surgery published by the Healthcare Commission.30
We selected our three index procedures a priori because they were common and because the models for risk prediction derived from clinical databases were published and easily accessible. Although work needs to be carried out on other procedures and diagnoses, we have shown the potential utility of administrative data for performance monitoring with adequate adjustment for case mix. There may, of course, be other clinical specialties where it is not possible to generate comparable risk models from routinely collected data (see www.icnarc.org/). Clinical databases also exist for reasons other than performance monitoring, including audit, case finding, and research. Our findings suggest that for monitoring outcomes, administrative databases may be as good as clinical databases. Administrative databases also have the advantage that they are available for the entire NHS and do not depend on voluntary participation by individual clinicians and providers. Hence, they can be used to generate performance measures on all relevant provider units, adjusted for case mix and other relevant variables. These adjusted measures of performance are likely to be fairer and more accurate measures of the performance of clinicians and providers than the cruder measures generally available now. Furthermore, as the content of administrative databases in different countries is often broadly similar, methods of using these databases to generate outcome measures may be applicable in healthcare systems in many developed countries.
Funding: Dr Foster Intelligence.
Competing interests: The Dr Foster Unit at Imperial College is funded by a grant from Dr Foster Intelligence (an independent health service research organisation).
Ethical approval: We have approval under Section 60 granted by the Patient Information Advisory Group (PIAG) to hold patient identifiable data and analyse them for research purposes. We also have approval from St Mary's local research ethics committee.