Development and validation of QMortality risk prediction algorithm to estimate short term risk of death and assess frailty: cohort study
BMJ 2017; 358 doi: https://doi.org/10.1136/bmj.j4208 (Published 20 September 2017) Cite this as: BMJ 2017;358:j4208
- Julia Hippisley-Cox, professor of clinical epidemiology and general practice,
- Carol Coupland, professor of medical statistics in primary care
- Correspondence to: J Hippisley-Cox
- Accepted 7 September 2017
Objectives To derive and validate a risk prediction equation to estimate the short term risk of death, and to develop a classification method for frailty based on risk of death and risk of unplanned hospital admission.
Design Prospective open cohort study.
Participants Routinely collected data from 1436 general practices contributing data to QResearch in England between 2012 and 2016. 1079 practices were used to develop the scores and a separate set of 357 practices to validate the scores. 1.47 million patients aged 65-100 years were in the derivation cohort and 0.50 million patients in the validation cohort.
Methods Cox proportional hazards models in the derivation cohort were used to derive separate risk equations in men and women for evaluation of the risk of death at one year. Risk factors considered were age, sex, ethnicity, deprivation, smoking status, alcohol intake, body mass index, medical conditions, specific drugs, social factors, and results of recent investigations. Measures of calibration and discrimination were determined in the validation cohort for men and women separately and for each age and ethnic group. The new mortality equation was used in conjunction with the existing QAdmissions equation (which predicts risk of unplanned hospital admission) to classify patients into frailty groups.
Main outcome measure The primary outcome was all cause mortality.
Results During follow-up 180 132 deaths were identified in the derivation cohort arising from 4.39 million person years of observation. The final model included terms for age, body mass index, Townsend score, ethnic group, smoking status, alcohol intake, unplanned hospital admissions in the past 12 months, atrial fibrillation, antipsychotics, cancer, asthma or chronic obstructive pulmonary disease, living in a care home, congestive heart failure, corticosteroids, cardiovascular disease, dementia, epilepsy, learning disability, leg ulcer, chronic liver disease or pancreatitis, Parkinson’s disease, poor mobility, rheumatoid arthritis, chronic kidney disease, type 1 diabetes, type 2 diabetes, venous thromboembolism, anaemia, abnormal liver function test result, high platelet count, visited doctor in the past year with either appetite loss, unexpected weight loss, or breathlessness. The model had good calibration and high levels of explained variation and discrimination. In women, the equation explained 55.6% of the variation in time to death (R²), and had very good discrimination: the D statistic was 2.29, and Harrell’s C statistic value was 0.85. The corresponding values for men were 53.1%, 2.18, and 0.84. By combining predicted risks of mortality and unplanned hospital admissions, 2.7% of patients (n=13 665) were classified as severely frail, 9.4% (n=46 770) as moderately frail, 43.1% (n=215 253) as mildly frail, and 44.8% (n=223 790) as fit.
Conclusions We have developed new equations to predict the short term risk of death in men and women aged 65 or more, taking account of demographic, social, and clinical variables. The equations had good performance on a separate validation cohort. The QMortality equations can be used in conjunction with the QAdmissions equations, to classify patients into four frailty groups (known as QFrailty categories) to enable patients to be identified for further assessment or interventions.
NHS England (the commissioning body for the English National Health Service) recently announced that from July 2017 all general practices in England will be contractually obliged to identify patients with moderate and severe frailty as part of the new General Medical Service contract. This is particularly challenging because frailty is a relatively new concept that does not have an agreed definition. Current approaches to defining frailty involve identifying patients with a collection of diagnoses, symptoms, and social factors.12 These factors may be combined into a frailty score. This score is then used to identify patients at risk of important or preventable outcomes such as unplanned hospital admissions or death in the near future.
Although recent guidance from the National Institute for Health and Care Excellence (NICE) on multiple morbidities3 has recommended tools to predict risk of unplanned hospital admissions,45 NICE was unable to identify any equations to reliably predict all cause mortality. NICE identified 41 studies that validated an equation to predict all cause mortality, all of which had major limitations. For example, some equations had been developed for purposes other than to predict all cause mortality.2 Other limitations were omitting key determinants of death, such as age and sex; giving equal weighting to all component factors within an equation (for example, wearing glasses could have equal weighting to ischaemic heart disease); using small unrepresentative samples; inappropriate handling of missing data; and poor reporting and poor performance of the tool in predicting death. The NICE guideline therefore recommended that research should be undertaken to develop new robust equations to identify patients with reduced life expectancy so that relevant assessments and interventions can be targeted appropriately.
We aimed to address the NICE research recommendation by developing a new equation to predict risk of death over a one year period among people aged 65 and older using a large validated medical research database of representative patients in primary care. Our secondary objective was to develop a definition of frailty directly based on risk of outcomes. Instead of creating a frailty index in the hope that it would predict unplanned admissions and all cause mortality, we decided to work the other way round. Starting with principled estimators of unplanned admissions and all cause mortality, we decided to develop a new classification of frailty, known as QFrailty. This would group people into four categories—severely frail, moderately frail, mildly frail, or fit—based on their absolute risks of an unplanned hospital admission or death within a year. This could then provide an outcomes based classification to improve on the electronic frailty index recommended by NHS England.2
Study design and data source
We undertook a cohort study in a large population of primary care patients in England who were registered with practices contributing to the QResearch database (version 42). All practices had to have used the EMIS computer system for at least a year. We randomly allocated three quarters of the practices to the derivation dataset and the remainder to a validation dataset. We identified an open cohort of patients aged 65-100 years registered with practices between 1 January 2012 and 30 September 2016. Exclusions were patients who did not have a valid National Health Service number and those who did not have a postcode related Townsend score (eg, patients had moved to newly built houses with new postcodes not yet linked to deprivation data or patients were homeless or did not have a permanent residence). We determined an entry date to the cohort for each patient, which was the latest of his or her 65th birthday, date of registration with the practice plus one year, date on which the practice computer system was installed plus one year, and beginning of the study period (1 January 2012). Patients were censored at the earliest date of death, de-registration with the practice, last upload of computerised data, or the study end date (30 September 2016).
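As an illustration only (this is not the authors' code), the entry and censoring rules described above could be sketched in Python as follows. The function and argument names are hypothetical, and the handling of 29 February birthdays is an assumption that the text does not address.

```python
from datetime import date
from typing import Optional

STUDY_START = date(2012, 1, 1)   # beginning of the study period
STUDY_END = date(2016, 9, 30)    # study end date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February maps to 28 February (assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def entry_date(birth: date, registered: date, system_installed: date) -> date:
    """Latest of 65th birthday, registration + 1 year, installation + 1 year, study start."""
    return max(
        add_years(birth, 65),
        add_years(registered, 1),
        add_years(system_installed, 1),
        STUDY_START,
    )


def exit_date(death: Optional[date], deregistered: Optional[date],
              last_upload: date) -> date:
    """Earliest of death, de-registration, last data upload, and study end."""
    candidates = [d for d in (death, deregistered, last_upload, STUDY_END)
                  if d is not None]
    return min(candidates)
```

A patient born on 10 February 1950 and registered since 2000 would, under this sketch, enter the cohort on 10 February 2015, the date of the 65th birthday.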
Our primary outcome was all cause mortality, using the date of death recorded on the QResearch database. We chose to evaluate risk of death at one year for comparability with other studies and to meet the requirements of the research recommendation in the NICE guidelines. The QResearch database is linked at individual patient level to the hospital admissions data and to mortality records obtained from the Office for National Statistics. The records are linked using a project specific pseudonymised NHS number. The recording of NHS numbers is valid and complete for 99.8% of QResearch patients, 99.9% for ONS mortality records, and 98% for hospital admissions records.46
We examined several predictor variables (see box 1) based on established risk factors already included in the QAdmissions equation4 (which predicts risk of unplanned hospital admissions) and variables highlighted in the related literature.2378910
Box 1 Predictor variables
Age (continuous variable)
Geographical region in England (10 regions)
Townsend deprivation score. This is an area level continuous score based on the patients’ postcode.11 Originally developed by Townsend,11 the score includes unemployment (as a percentage of those aged 16 or more who are economically active), non-car ownership (as a percentage of all households), non-home ownership (as a percentage of all households), and household overcrowding. These variables are measured for a given area of approximately 120 households, through the 2011 census, and combined to give a Townsend score for that area. A higher Townsend score implies a greater level of deprivation
Ethnic group (nine categories)
Alcohol intake (<1 unit/day, 1-2 units/day, 3-6 units/day, 7-9 units/day, ≥9 units/day) (see www.nhs.uk/Livewell/alcohol/Pages/alcohol-units.aspx)
Smoking status (non-smoker; former smoker; light, moderate, or heavy smoker)
Body mass index (continuous variable)
Unplanned admissions in past 12 months (0, 1, 2, or ≥3) as recorded on the linked hospital data
Poor mobility (poor mobility, housebound, confined to chair, bedridden, requires home visit, receives mobility allowance)
Lives in a care home (nursing home or residential care)
Congestive heart failure
Cardiovascular disease (myocardial infarction, angina, stroke, or transient ischaemic attack)
Valvular heart disease
Peripheral vascular disease
Treated hypertension (hypertension and current antihypertensive treatment)
Chronic kidney disease (stages 4 or 5)
Diabetes (none, type 1, type 2)
Chronic liver disease or pancreatitis
Malabsorption (including Crohn’s disease, ulcerative colitis, coeliac disease, steatorrhea, blind loop syndrome)
Peptic ulcer (gastric or duodenal ulcer, simple or complicated ulcer)
Asthma or chronic obstructive airways disease
Fragility fracture (hip, spine, shoulder, or wrist fracture)
Parkinson’s disease or syndrome
Bipolar disorder or schizophrenia
Depression in past 12 months
Anaemia (haemoglobin <110 g/L)
Abnormal liver function test result (bilirubin, alanine aminotransferase, or γ glutamyltransferase more than three times the upper limit of normal)
High platelet count (>480×10⁹/L)
Leg ulcer (leg, shin, ankle or foot ulcer, ischaemic neuropathic, arterial, or venous ulcer)
Blindness (registered blind or partially sighted or visual impairment)
Appetite loss in past 12 months
Weight loss in past 12 months (unexplained or abnormal weight loss)
Urinary incontinence in past 12 months
Nocturia in past 12 months
Urinary retention in past 12 months (acute or chronic retention)
Syncope (vasovagal symptom, faint, collapse, “funny turn,” drop attack) in past 12 months
Dizziness in past 12 months
Insomnia in past 12 months
Dyspnoea in past 12 months (breathless at rest or on exertion, paroxysmal nocturnal dyspnoea)
Hearing impairment or deafness in past 12 months
Loneliness in past 12 months
Use of anticoagulants (≥2 prescriptions in past six months)
Use of antidepressants (≥2 prescriptions in past six months)
Use of antipsychotics (≥2 prescriptions in past six months)
Use of corticosteroids (≥2 prescriptions in past six months)
Non-steroidal anti-inflammatory drugs (≥2 prescriptions in past six months)
The number of unplanned hospital admissions in the previous 12 months was derived from information on the linked hospital records. All predictor variables were based on the latest coded information recorded in the general practice record before entry to the cohort.
Derivation and validation of the models
We developed and validated the risk prediction equations using established methods.1012131415 To replace missing values for body mass index, smoking status, and alcohol intake we used multiple imputation with chained equations and used these values in our main analyses.16171819 We carried out five imputations as this has relatively high efficiency20 and was a pragmatic approach accounting for the size of the datasets and capacity of the available servers and software. We included all predictor variables in the imputation model, along with age interaction terms, the Nelson-Aalen estimator of the baseline cumulative hazard, and the outcome indicator.
Cox’s proportional hazards models estimated the coefficients for each risk factor in men and women separately. We used Rubin’s rules to combine the results across the imputed datasets.21 We used fractional polynomials22 to model non-linear risk relations with continuous variables (age and body mass index) using data from patients with recorded values to derive the fractional polynomial terms. Initially we fitted full models. We retained variables if they had an adjusted hazard ratio of <0.90 or >1.10 (for binary variables) and were statistically significant at the 0.01 level. We examined interactions between predictor variables and age at study entry and included statistically significant interactions in the final models.
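The pooling step can be illustrated with a minimal sketch of Rubin's rules for a single coefficient estimated in each imputed dataset. The function name and the input values in the usage note are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient (for example a log hazard ratio) from each dataset
    variances: the squared standard error of that coefficient from each dataset
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)             # average of the m estimates
    within = mean(variances)             # average within-imputation variance
    between = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

With five hypothetical log hazard ratios of 0.40, 0.42, 0.38, 0.41, and 0.39, each with variance 0.0004, the pooled estimate is 0.40 and the pooled standard error is about 0.026.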
For each variable from the final model we used the regression coefficients as weights, which we combined with the baseline survivor function at one year to derive risk equations.23 We estimated the baseline survivor function based on zero values of centred continuous variables, with all binary predictor values set to zero.
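A minimal sketch of how such a risk equation is typically evaluated is given below, assuming the standard Cox form in which the one year risk equals 1 minus the baseline survivor function raised to the power of the exponentiated linear predictor. The coefficient names, centring values, and baseline survival in the usage note are hypothetical, and the fractional polynomial transformations and age interactions of the published model are omitted.

```python
import math


def one_year_risk(baseline_survival, coefficients, values, centring=None):
    """Absolute one year risk from a Cox model.

    baseline_survival: baseline survivor function at one year, S0(1)
    coefficients: dict of predictor name -> regression coefficient (log hazard ratio)
    values: dict of predictor name -> patient value (0 or 1 for binary predictors)
    centring: dict of continuous predictor name -> value subtracted before weighting
    """
    centring = centring or {}
    linear_predictor = sum(
        beta * (values[name] - centring.get(name, 0.0))
        for name, beta in coefficients.items()
    )
    return 1.0 - baseline_survival ** math.exp(linear_predictor)
```

For a patient whose binary predictors are all zero and whose continuous predictors sit at their centring values, the linear predictor is zero and the risk reduces to 1 minus the baseline survival; for example, a hypothetical baseline survival of 0.98 gives a risk of 2%.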
Validation of the models
In the validation cohort we used multiple imputation to replace missing values for body mass index, smoking status, and alcohol intake. Five imputations were done. We applied the risk equations for men and women obtained from the derivation cohort to the validation cohort and calculated measures of discrimination. As in previous studies,24 we calculated R² values (explained variation where higher values indicate a greater proportion of variation in survival time explained by the model25), D statistic26 (a measure of discrimination that quantifies the separation in survival between patients with different levels of predicted risk, where higher values indicate better discrimination), and Harrell’s C statistic at one year and combined these across datasets using Rubin’s rules. Harrell’s C statistic27 is a measure of discrimination (separation) that quantifies the extent to which those with earlier events have higher risk scores. It is similar to the receiver operating characteristic statistic but takes account of the censored nature of the data. Higher values of Harrell’s C indicate better performance of the model for predicting the relevant outcome. A value of 1 indicates the model has perfect discrimination. A value of 0.5 indicates that the model discrimination is no better than chance. We also evaluated these performance measures in five age groups and nine ethnic groups. We calculated 95% confidence intervals for the performance statistics to allow comparisons with alternative models for the same outcome and across different subgroups.28
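To make the definition of Harrell's C concrete, the sketch below counts concordant pairs directly. It is a simplified illustration: it is quadratic in the number of patients, it skips pairs with tied follow-up times, and it does not truncate follow-up at one year as the analysis described here does. The function name is hypothetical.

```python
def harrell_c(times, events, risks):
    """Harrell's C: share of comparable pairs in which the patient who died
    earlier also had the higher predicted risk.

    A pair is comparable when the patient with the shorter follow-up time
    had the event (events[i] is truthy). Tied risks count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A risk score that ranks every earlier death above every later survivor returns 1, a score that is identical for all patients returns 0.5, and a perfectly reversed score returns 0.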
We assessed calibration of the mortality score by comparing the mean predicted risks evaluated at one year with the observed risks by 10th of predicted risk. The observed risks were obtained using the Kaplan-Meier estimates evaluated at one year for men and women. We also evaluated performance by calculating Harrell’s C statistics in individual general practices and combined the results using meta-analytical techniques.29
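The calibration comparison can be sketched as follows: patients are ranked by predicted risk, split into tenths, and the mean predicted risk in each tenth is set against the Kaplan-Meier estimate of observed risk at one year. This is an illustrative implementation with hypothetical function names, written for clarity rather than speed.

```python
def kaplan_meier_risk(times, events, horizon):
    """Observed risk by `horizon`: 1 minus the Kaplan-Meier survival estimate."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - deaths / at_risk
    return 1.0 - survival


def calibration_by_tenth(predicted, times, events, horizon=1.0, groups=10):
    """Return (group, mean predicted risk, observed risk) per tenth of predicted risk.

    Group 1 holds the lowest predicted risks and group 10 the highest.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue  # fewer patients than groups
        mean_predicted = sum(predicted[i] for i in idx) / len(idx)
        observed = kaplan_meier_risk(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        rows.append((g + 1, mean_predicted, observed))
    return rows
```

In a well calibrated model the second and third elements of each row are close to each other across all ten groups.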
We also applied the latest version of the QAdmissions score to the validation cohort and calculated measures of discrimination for unplanned hospital admissions over one year.
Decision curve analysis
We used decision curve analysis in the validation cohort to evaluate the net benefits of the mortality score.303132 This method assesses the benefits of correctly identifying people who will have an event compared with the harms from a false positive classification (which could, for example, lead to unnecessary distress or interventions). The net benefit of a risk equation at a given risk threshold is given by calculating the difference between the proportion of true positives and the proportion of false positives multiplied by the odds of the risk threshold. We calculated the net benefits across a range of threshold probabilities and compared these with alternative strategies of “intervention in everyone” and “intervention in no one.” In general, the strategy with the highest net benefit at any given risk threshold is considered to have the most clinical value.
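The net benefit calculation described above can be written out as a short sketch. The function names are hypothetical; the formula is the one stated in the text, with the proportions taken over all patients, and "intervention in no one" has a net benefit of zero by definition.

```python
def net_benefit(predicted, died, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    n = len(predicted)
    true_pos = sum(1 for p, d in zip(predicted, died) if p >= threshold and d)
    false_pos = sum(1 for p, d in zip(predicted, died) if p >= threshold and not d)
    odds = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(died, threshold):
    """Net benefit of the 'intervention in everyone' strategy."""
    prevalence = sum(1 for d in died if d) / len(died)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds
```

Evaluating both functions over a grid of thresholds and plotting the results against the threshold gives the decision curve; the model is preferred wherever its curve lies above both the "everyone" curve and zero.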
Development of frailty categories
Since there is no currently accepted threshold for classifying high risk of death, we examined the distribution of predicted risks and calculated a series of centile values. For each centile threshold, we calculated the sensitivity, specificity, and positive and negative predictive values of death over a one year follow-up period. Using the latest version of the QAdmissions score we also examined the distribution of patients by their risk of unplanned hospital admission over one year.4 We identified unplanned admissions using the hospital episode statistics linked to QResearch as in the original paper.4 We then classified patients into four frailty groups based on a combination of their predicted risk of unplanned admission and their predicted risk of death over the next 12 months such that the proportion of patients in each group was broadly similar to that published elsewhere for the “electronic frailty index” (EFI) based on a similar English population.2 In the internal validation of the EFI score, 3% of the validation cohort were categorised as having severe frailty, 12% as having moderate frailty, 35% as having mild frailty, and 50% as fit. The corresponding values for the EFI external validation cohort were 4%, 16%, 37%, and 43%.
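The centile based evaluation can be illustrated with the sketch below, which finds the risk cut-off that selects a given top fraction of patients and then computes the four classification measures at that cut-off. It treats the outcome as a simple yes or no within one year and ignores censoring, which is a simplification; the function names are hypothetical.

```python
def top_fraction_threshold(predicted, top_fraction):
    """Predicted risk at or above which the top `top_fraction` of patients fall."""
    ranked = sorted(predicted, reverse=True)
    k = max(1, round(top_fraction * len(ranked)))
    return ranked[k - 1]


def classification_measures(predicted, died, threshold):
    """Sensitivity, specificity, and predictive values for risk >= threshold.

    Assumes at least one patient in each of: died, survived, flagged, not flagged.
    """
    tp = sum(1 for p, d in zip(predicted, died) if p >= threshold and d)
    fp = sum(1 for p, d in zip(predicted, died) if p >= threshold and not d)
    fn = sum(1 for p, d in zip(predicted, died) if p < threshold and d)
    tn = sum(1 for p, d in zip(predicted, died) if p < threshold and not d)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

Repeating the calculation for the top 2%, 10%, and 50% of predicted risk gives one row of measures per threshold.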
We repeated some analyses, restricting the validation cohort to those with two or more medical conditions who would meet the NICE broad definition of having multiple morbidities. The supplementary tables present the results. To maximise the power and also generalisability of the results we used all the relevant patients on the database. STATA (version 14) was used for all analyses. We adhered to the TRIPOD statement for reporting.33
No patients were involved in setting the research question or the outcome measures, nor were they involved in the design or implementation of the study. Patient representatives from the QResearch advisory board have written the information for patients on the QResearch website about the use of the database for research. They have also advised on dissemination of the results, including the use of lay summaries describing the research and its results.
Overall study population
Derivation cohort—overall, 1436 QResearch practices in England met our inclusion criteria, of which 1079 were randomly assigned to the derivation dataset, with the remainder (n=357) assigned to the validation cohort. We identified 1 471 558 patients in the derivation cohort aged 65-100 years. Of these, we excluded 2550 (0.2%) who did not have a valid NHS number and a further 2410 (0.2%) who did not have a recorded Townsend score, leaving 1 466 598 for the derivation analysis.
Validation cohort—we identified 500 816 patients in the validation cohort aged 65-100 years. Of those, we excluded 505 (0.1%) who did not have a valid NHS number and a further 833 (0.2%) who did not have a recorded Townsend score, leaving 499 478 for the validation analysis.
Table 1 shows the baseline characteristics of men and women in the derivation and validation cohorts. In the derivation cohort, self assigned ethnicity was recorded in 80.3% (n=1 177 596), smoking status in 99.0% (n=1 451 343), alcohol intake in 92.0% (n=1 349 728), and body mass index in 90.2% (n=1 322 929). Overall, 86.8% (n=1 273 310) had complete data for smoking status, alcohol intake, and body mass index. The mean age was 75.3 years, and 12.0% (n=175 915) of patients had one or more unplanned hospital admissions in the past 12 months, 42.8% (n=628 106) had treated hypertension, 21.0% (n=307 499) had cardiovascular disease, 15.1% (n=220 886) had type 2 diabetes, 18.8% (n=276 001) were prescribed antidepressants, and 18.3% (n=268 821) were prescribed non-steroidal anti-inflammatory drugs (NSAIDs). The corresponding results for the validation cohort were similar.
Supplementary table 1 shows the patients’ number of medical conditions. In the derivation cohort, 17.3% (n=253 585) did not have any of the 29 conditions listed, 23.9% (n=350 994) had one condition, and 58.8% (n=862 019) had two or more conditions.
Incidence of death
Table 2 shows the number of patients who died during the study period, the person years of follow-up, and the death rates by age and sex. Overall in the derivation cohort, 180 132 deaths arose from 4.39 million person years of follow-up. In the validation cohort, 61 446 deaths arose from 1.49 million person years of follow-up. In the derivation and validation cohorts 581 702 and 197 834 people, respectively, had five or more years of follow-up.
Table 3 shows the adjusted hazard ratios for the final models for women and men in the derivation cohort. The final model included the variables: fractional polynomial terms for age, fractional polynomial terms for body mass index, Townsend score, ethnic group, smoking status, alcohol intake, unplanned hospital admissions in the past 12 months, atrial fibrillation, antipsychotics, cancer, asthma or chronic obstructive pulmonary disease, living in a care home, congestive heart failure, corticosteroids, cardiovascular disease, dementia, epilepsy, learning disability, leg ulcer, chronic liver disease or pancreatitis, Parkinson’s disease, poor mobility, rheumatoid arthritis, chronic kidney disease, type 1 diabetes, type 2 diabetes, venous thromboembolism, anaemia, abnormal liver function test result, high platelet count, and visits to a general practitioner in the past 12 months with either appetite loss, unexplained weight loss, or dyspnoea (breathlessness). The other variables tested did not meet the criteria for inclusion in the final model.
The graphs in figures 1 and 2 show the adjusted hazard ratios in women and men, respectively, for age interaction terms that were statistically significant (see footnote in table 3). For each of these interactions, hazard ratios for the predictors were higher at younger ages compared with older ages.
Table 4 shows the performance of the QMortality score in the validation cohort for women and men at one year. Overall, the values for the R², D, and C statistics were higher in women than in men indicating that the score performed better in women than in men. The mortality equation in women explained 55.6% of the variation in time to death (R²), the D statistic was 2.29, and Harrell’s C statistic was 0.85. The corresponding values for men were 53.1%, 2.18, and 0.84. Table 4 also shows results for the latest version of the QAdmissions equation for predicting unplanned admissions (based on 160 217 unplanned admissions in the validation cohort over a one year period). Table 4 shows that the performance of the QMortality score for predicting deaths was better than the performance of the QAdmissions score for predicting unplanned admissions.
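The reported explained variation and D statistics can be related to each other with a short worked calculation. This sketch assumes that the explained variation reported here is the measure derived from Royston and Sauerbrei's D statistic for a Cox model, in which D squared is rescaled by 8/π and compared with the variance π²/6; the text does not state the formula, so this is an assumption, and the function name is hypothetical.

```python
import math


def explained_variation_from_d(d):
    """Explained variation implied by the D statistic (assumed Royston form).

    R^2 = (D^2 / kappa^2) / (sigma^2 + D^2 / kappa^2),
    with kappa^2 = 8 / pi and sigma^2 = pi^2 / 6.
    """
    kappa_sq = 8.0 / math.pi
    sigma_sq = math.pi ** 2 / 6.0
    scaled = d * d / kappa_sq
    return scaled / (sigma_sq + scaled)


women = explained_variation_from_d(2.29)  # close to the reported 55.6%
men = explained_variation_from_d(2.18)    # close to the reported 53.1%
```

Under that assumption the rounded D statistics of 2.29 and 2.18 give values within about 0.1 percentage points of the reported 55.6% and 53.1%; the small residual difference is consistent with D having been rounded to two decimal places.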
Supplementary table 2 shows the results for the mortality scores by age group and ethnic group. Performance tended to be better in the younger age groups but was similar across all ethnic groups.
Figure 3 shows plots of Harrell’s C statistic for each general practice in the validation cohort against the number of deaths in each practice in women and men separately. The summary (average) C statistic for women was 0.854 (95% confidence interval 0.850 to 0.859) from a random effects meta-analysis. The I² value (ie, the percentage of total variation in C statistic owing to heterogeneity between practices) was 63.2%. The approximate 95% prediction interval for the true C statistic in women in a new practice was 0.80 to 0.91. For men, the summary C statistic was 0.844 (95% confidence interval 0.839 to 0.849). The I² value was 70.3%. The approximate 95% prediction interval for the true C statistic in men in a new practice was 0.76 to 0.92.
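For readers unfamiliar with these quantities, the sketch below shows one common way to obtain a random effects summary, the I² value, and an approximate prediction interval from practice level estimates. It uses the DerSimonian-Laird estimator of between practice variance and a normal approximation for the prediction interval; both are assumptions made for illustration, since the text does not say which estimator or scale was used. The function name is hypothetical.

```python
import math


def random_effects_summary(estimates, std_errors):
    """DerSimonian-Laird random effects summary of per-practice estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau_sq = max(0.0, (q - df) / c)          # between-practice variance
    re_weights = [1.0 / (se ** 2 + tau_sq) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))
    i_sq = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    half_width = 1.96 * math.sqrt(tau_sq + pooled_se ** 2)  # normal approximation
    return {
        "pooled": pooled,
        "pooled_se": pooled_se,
        "tau_sq": tau_sq,
        "i_sq": i_sq,
        "prediction_interval": (pooled - half_width, pooled + half_width),
    }
```

The prediction interval is wider than the confidence interval because it adds the between practice variance to the uncertainty in the summary estimate, which is why a narrow confidence interval can coexist with a wide range of plausible values in a new practice.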
Supplementary table 3 shows the validation statistics for the mortality score among patients with two or more morbidities (as required by the NICE guideline3).
Figure 4 displays the observed risks and mean predicted risks of death across each 10th of the predicted risk score (1 representing the lowest risk and 10 the highest risk). This shows that the equation was well calibrated. Supplementary figure 1a-e shows the calibration within each age group. Supplementary table 5 shows overall calibration by age group and ethnic group and for the top 2%, 10%, and 50% of predicted risk. The results were generally good except for over-prediction in Chinese women and under-prediction in black African women, although numbers of deaths were relatively small in these subgroups.
Decision curve analysis
Figure 5 displays the net benefit curves for the mortality equations at one year in men and women. These show that the prediction equations had higher net benefit than did strategies based on considering either no patients or all patients for intervention for risk thresholds up to around 50%.
Sensitivity and specificity
Table 5 shows the sensitivity, specificity, and positive and negative predictive values for the mortality equation at one year for various thresholds based on patients in the validation cohort.
Table 6 shows that the risk threshold for the top 2% at highest risk of death in the next year was 47.0%, for the top 10% was 20.3%, and for the top 50% was 2.9%. With a risk threshold of 20.3% over one year to identify the 10% of patients with the highest risk of death, the sensitivity for identifying deaths was 37.4%, specificity 97.3%, positive predictive value 46.0%, and negative predictive value 91.4%. Supplementary table 4 shows the results restricted to patients with two or more medical conditions.
The corresponding thresholds for risk of unplanned hospital admission over one year were 60.7% to identify the top 2%, 34.0% for the top 10%, and 10.0% for the top 50% (results not shown).
Classification of frailty
Table 7 shows the characteristics of patients from the validation cohort split into four QFrailty groups that are broadly equivalent to the proportion of patients reported to be in the four categories according to the EFI.2
Group 1 represents severe frailty. This category includes 13 665 patients (ie, 2.74% of 499 478) who are either in the top 2% at highest risk of death in the next year or in the top 2% at highest risk of unplanned hospital admission.
Group 2 represents moderate frailty. This category includes 46 770 patients (ie, 9.36% of 499 478) who are either in the top 10% at highest risk of death in the next year or in the top 10% at highest risk of unplanned hospital admission (excluding those in the severe category).
Group 3 represents mild frailty. This category includes 215 253 patients (ie, 43.1% of 499 478) who are either in the top 50% at highest risk of death in the next year or in the top 50% at highest risk of unplanned hospital admission (excluding those in the severe and moderate categories).
Group 4 represents being “fit.” This category includes 223 790 patients (ie, 44.80% of 499 478) not in the above three categories.
For example, for those in the severe frailty category, the mean age is 86.1 years, 98.5% (n=13 460) have multimorbidity, 60.8% (n=8312) have poor mobility, 61.3% (n=8373) have cardiovascular disease, 50.5% (n=6895) have had falls, 46.2% (n=6310) have treated hypertension, 40.0% (n=5461) are taking antidepressants, 35.6% (n=4864) have atrial fibrillation, 34.8% (n=4756) have asthma or chronic obstructive pulmonary disease, 32.8% (n=4484) have dementia, 28.7% (n=3925) have type 2 diabetes, 25.1% (n=3436) have a diagnosis of cancer, 24.9% (n=3399) have anaemia, 19.9% (n=2724) have dyspnoea, 18.5% (n=2526) are taking anticoagulants, and 18.4% (n=2519) have peptic ulcer disease.
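The rule used to form groups 1 to 4 can be sketched as a simple cascade. The cut-offs below are the validation cohort risk thresholds reported above for the top 2%, 10%, and 50% of mortality risk and of admission risk. Treating a risk exactly equal to a cut-off as being above it is an assumption, and the constant and function names are hypothetical; this is an illustration, not the authors' implementation.

```python
# One year risk thresholds (as proportions) from the validation cohort.
MORTALITY_CUTS = {"severe": 0.470, "moderate": 0.203, "mild": 0.029}
ADMISSION_CUTS = {"severe": 0.607, "moderate": 0.340, "mild": 0.100}


def qfrailty_group(mortality_risk, admission_risk):
    """Assign a frailty group from the two predicted one year risks.

    A patient falls in the most severe group for which either risk
    reaches that group's cut-off; otherwise the patient is 'fit'.
    """
    for group in ("severe", "moderate", "mild"):
        if (mortality_risk >= MORTALITY_CUTS[group]
                or admission_risk >= ADMISSION_CUTS[group]):
            return group
    return "fit"
```

Checking groups from most to least severe reproduces the exclusions in the text: a patient already classed as severe is never counted again as moderate or mild. A patient with a 10% predicted risk of death and a 35% predicted risk of unplanned admission would be classed as moderately frail on the strength of the admission risk alone.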
Recent NICE guidance on multiple morbidities3 highlighted the need to develop new robust equations to identify patients in primary care with reduced life expectancy so that relevant assessments and interventions can be targeted appropriately. Existing equations to predict risk of death are based on biased samples, are insufficiently powered, fail to handle missing data appropriately, are poorly reported, or have poor performance to the extent that NICE has been unable to make a positive recommendation for any of the 41 models included in the review.3 We therefore developed and validated equations to predict absolute risk of death over the next year in men and women aged 65-100 years. The QMortality equations performed well on a separate validation cohort, with good levels of discrimination and calibration, improving on other equations used to predict all cause mortality.2334 The final model has good face validity as it includes demographic and clinical variables that clinicians would expect to affect mortality risk such as age, body mass index, deprivation, ethnicity, smoking status, alcohol intake, unplanned admissions in the past 12 months; atrial fibrillation, antipsychotics, cancer, asthma or chronic obstructive pulmonary disease, living in a care home, congestive heart failure, corticosteroids, cardiovascular disease, dementia, epilepsy, learning disability, leg ulcer, chronic liver disease or pancreatitis, Parkinson’s disease, poor mobility, rheumatoid arthritis, chronic renal disease, type 1 diabetes, type 2 diabetes, venous thromboembolism, anaemia, abnormal liver function test result, high platelet count, and visits to a doctor in the past year with either appetite loss, unexpected weight loss, or breathlessness.
Although the QMortality equation contains many variables, it is intended to be integrated into general practice computer systems where the extraction of data and risk calculation can be automated. We considered whether to develop a more parsimonious model with fewer predictors for use in other clinical settings but decided it would be preferable to have one model and for the user to select default values on the understanding that there may be a degree of under-estimation or over-estimation of risk depending on the predictor in question.
Potential uses of the frailty classification and mortality index
In this study we have described a specific novel use for mortality estimates, which is to classify patients into four frailty categories. This has been achieved by combining the one year predicted risk of death with the one year predicted risk of unplanned hospital admission to help identify the most severely frail patients for enhanced care packages to meet the immediate requirements of the UK General Medical Services contract. The most severe frailty category will identify patients with particularly high levels of morbidity who are at highest risk of death or unplanned hospital admission. This group of patients is likely to reflect elderly patients who are the most severely frail and who can be identified for focused assessment and intervention as part of the new General Medical Services contract in England. This includes falls assessment and drug review. The QMortality score could be used in conjunction with the QAdmissions score to allocate patients to one of four QFrailty categories. It could also be used recurrently to build and maintain practice based lists of patients with different levels of frailty or mortality risks over time. This could be done as an automated procedure using electronic health records.
The models can also be used in a face-to-face consultation between the patient and clinician with the intention of sharing the information with the patient to assess management options. The decision curve analysis shows there is a higher net benefit for the prediction models than strategies based on considering either no patients or all patients for intervention for risk thresholds up to around 50%. Mortality estimates including cancer stage and grade are already used to help patients with cancer to weigh up the risks and benefits of surgery, chemotherapy, and radiotherapy.35 Patients with a high risk of death in the near future may choose to decline aggressive treatments or defer preventive treatments, screening interventions, or interventions for asymptomatic conditions.36 Mortality estimates could also be used to help guide the introduction and addition of palliative care to help plan end of life care.37 For example, six month mortality estimates in the United States are used to trigger Advance Care Planning and also to determine access to hospice services under the Medicare scheme.38 They are also used to improve self awareness of health status; to measure, monitor, and compare outcomes between different healthcare providers36; and are used by governments to decrease the burden of certain risk factors at a population level.34
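The net benefit compared in a decision curve analysis can be illustrated with the standard definition: the proportion of true positives minus the proportion of false positives, weighted by the odds of the risk threshold. The sketch below applies that definition to a small made-up dataset; the function names and the data are assumptions for illustration and do not reproduce the analysis in this study.

```python
def net_benefit(risks, died, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(risks)
    true_pos = sum(1 for r, d in zip(risks, died) if r >= threshold and d)
    false_pos = sum(1 for r, d in zip(risks, died) if r >= threshold and not d)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(died, threshold):
    """Net benefit of considering every patient for intervention."""
    prevalence = sum(1 for d in died if d) / len(died)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Considering no patients for intervention has a net benefit of zero by definition.
NET_BENEFIT_TREAT_NONE = 0.0

if __name__ == "__main__":
    # Made-up predicted risks and observed one year outcomes (1 = died).
    risks = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]
    died = [0, 0, 0, 1, 0, 1]
    for pt in (0.1, 0.3, 0.5):
        print(pt, net_benefit(risks, died, pt), net_benefit_treat_all(died, pt))
```

A model is preferred at a given threshold when its net benefit exceeds both the treat-all and the treat-none strategies.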
We see an important distinction between factors that are included in a risk equation to ensure that the risk estimates are as accurate as possible and how the risk equation is then used in guidelines and clinical practice to ensure ethical, effective, and equitable access to services for everyone. The primary purpose of our paper is to report on the development and validation of new risk equations rather than to produce national policy or clinical guidance, although we recognise the results may be used by policy makers and clinicians. All clinical decisions about the beneficial and safe use of these risk equations necessarily remain the responsibility of the attending clinician. However, there are ethical issues to consider about how the tools might be used. We have analysed this within the “four ethical principles” framework, which is widely used in medical decision making. The four principles are autonomy, beneficence, justice, and non-maleficence.39 The new risk equations, when implemented in clinical software, are designed to provide more accurate information for patients and clinicians on which to base decisions, thereby promoting shared decision making and patient autonomy. They are intended to result in clinical benefit by identifying where changes in management are likely to benefit patients, thereby promoting the principle of beneficence. Justice can be achieved by ensuring that the use of the risk equations results in fair and equitable access to health services that are commensurate with the patients’ level of risk. Lastly, the risk assessment must not be used in a way that causes harm either to the individual patient or to others (for example, by introducing or withdrawing treatments where this is not in the patients’ best interest) thereby supporting the non-maleficence principle. 
How this applies in clinical practice will naturally depend on many factors, especially the patient’s wishes, the evidence base for any interventions, the clinician’s experience, national priorities, and the available resources. The risk assessment equations are therefore intended to supplement clinical decision making, not to replace it.
Comparison with the other risk scores
A recent review of 41 mortality risk scores reported in 24 research papers failed to identify any that could be reliably used to predict mortality in a community setting.3 Of the studies reviewed, the Charlson comorbidity index, which consists of 23 variables, achieved the best C statistic, with a value of 0.77 in the internal validation cohort and 0.80 in the external validation cohort.340 Other studies have used risk scores to predict mortality, such as the Johns Hopkins Aggregated Diagnostic Groups (ADG) score41 and the Hospital-patient One-year Mortality Risk (HOMR) score.36 The HOMR score consists of 12 patient variables and eight hospital admission factors and was designed to predict one year mortality risk in adults aged 18-100 years admitted to hospital. It includes fractional polynomial terms for continuous variables and interactions between statistically significant predictors. The HOMR score has excellent calibration and discrimination, with a C statistic of 0.92, although this may reflect the much wider age range in the HOMR study. The ADG score consists of 30 variables and has been validated using a community based sample. However, the C statistic was lower (0.81) than the values for the QMortality score (0.84 in men and 0.85 in women), and the ADG equation is not published or freely available.41
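The C statistics compared above measure how often, among pairs of patients whose order of death can be determined, the patient who died first had the higher predicted risk. The sketch below is a simple quadratic-time implementation of Harrell’s C for right censored data; the function name and the example data are assumptions for illustration, and established statistical software should be preferred for real analyses.

```python
def harrell_c(times, events, risks):
    """Harrell's C statistic for right censored survival data.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    risks  : predicted risk (higher means earlier death is expected)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(n):
            # The pair is comparable if patient i died before patient j's time.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Made-up data: four patients, the third is censored at time 3.
    print(harrell_c([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1]))
```

A value of 0.5 corresponds to predictions no better than chance and a value of 1.0 to perfect ranking.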
The electronic frailty index (EFI) is a simple unweighted count of the number of “deficits” a patient has out of a total of 36, where a deficit is a physical disability or social vulnerability as identified by a consensus panel.2 The EFI has also been used to predict mortality in a UK community based population, although performance (based on standard definitions42) was poor or fair, with a C statistic of 0.66 on an internal validation cohort and 0.76 on an external validation cohort.2 The EFI also had extremely low levels of explained variation in time to death of 0.02-0.04%,2 whereas the QMortality scores explained 53% and 55% of the variation in men and women, respectively. The EFI equation has not been published and it does not appear to include continuous variables such as age. The QMortality and QAdmissions equations include all the factors in the EFI, where these predicted either risk of death or risk of unplanned hospital admission. Unlike the EFI, our equations include further key determinants of death and unplanned admissions, such as age, sex, ethnic group, smoking status, alcohol intake, deprivation, and previous unplanned admissions, and also include major conditions—cancer, epilepsy, serious mental illness, chronic liver disease, inflammatory bowel disease, learning disability, specific drug treatments—which are all relevant to risk of outcomes and for which patients are likely to need ongoing careful assessment. Our multivariable analysis has allowed us to attribute appropriate weights to each factor and incorporate interactions between age and different medical conditions. This means, for example, that a patient who is 65 years old with three medical conditions will have a different absolute risk of death or unplanned hospital admission than a patient with the same conditions but who is aged 95 years.
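The contrast between an unweighted count and a weighted model with age interactions can be sketched as follows. The example converts a linear predictor into a one year absolute risk using the usual Cox model relation, risk = 1 - S0^exp(linear predictor). Every number here (the baseline survival, the coefficients, and the interaction terms) is hypothetical, and age is modelled linearly for simplicity; none of it is taken from the published QMortality equations.

```python
import math

# All values below are hypothetical and chosen only for illustration.
BASELINE_SURVIVAL_1Y = 0.98      # S0(1 year) for a 65 year old with no conditions
BETA_AGE_PER_DECADE = 0.80       # log hazard ratio per decade of age over 65
CONDITION_BETAS = {              # log hazard ratio for each condition at age 65
    "dementia": 0.90,
    "heart_failure": 0.70,
    "cancer": 0.80,
}
AGE_INTERACTION_PER_DECADE = {   # change in each log hazard ratio per decade over 65
    "dementia": -0.15,
    "heart_failure": -0.10,
    "cancer": -0.20,
}


def deficit_count(conditions):
    """Unweighted count: every condition contributes one, whatever the age."""
    return len(set(conditions))


def one_year_risk(age, conditions):
    """Absolute one year risk from a Cox-type model with age interactions."""
    decades = (age - 65) / 10.0
    linear_predictor = BETA_AGE_PER_DECADE * decades
    for condition in set(conditions):
        linear_predictor += CONDITION_BETAS[condition]
        linear_predictor += AGE_INTERACTION_PER_DECADE[condition] * decades
    return 1.0 - BASELINE_SURVIVAL_1Y ** math.exp(linear_predictor)


if __name__ == "__main__":
    conditions = ["dementia", "heart_failure", "cancer"]
    for age in (65, 95):
        print(age, deficit_count(conditions), round(one_year_risk(age, conditions), 3))
```

Both hypothetical patients have the same count of three conditions, but the weighted model gives them different absolute risks because age enters the linear predictor directly and through the interaction terms.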
The methods to derive and validate these models are broadly the same as for a range of other clinical risk prediction tools derived from the QResearch database.78124344 The strengths and limitations of the approach have already been discussed in detail.71443454647 In summary, key strengths include size, duration of follow-up, representativeness, and lack of selection, recall, and respondent bias. UK general practices have good levels of accuracy and completeness in recording clinical diagnoses and prescribed drugs.48 We think our study has good face validity since it has been conducted in the setting where most patients in the UK are assessed, treated, and followed up. Limitations of our study include the lack of formal adjudication of diagnoses, information bias, and potential for bias owing to missing data. Our database has linked hospital admissions data and is therefore likely to have picked up the majority of hospital admissions, thereby minimising ascertainment bias. We focused on two hard outcomes to identify frail patients (unplanned admissions and mortality) rather than admission to a nursing home or decline in function, as both of these are more difficult to measure using electronic health records. Also, for simplicity we grouped all cancers together as a single variable rather than distinguish between different types of cancer and account for grade and stage. This was a pragmatic decision, partly driven by the lack of information in general practice records about grade and stage of cancer and the availability of existing purpose designed tools such as the QCancer prognostic scores.49 QMortality will therefore tend to under-estimate mortality risk in patients with late stage cancer, for example, and over-estimate it in patients with early stage cancer. We excluded patients without a deprivation score since this group may represent a more transient population where follow-up could be unreliable or unrepresentative. Their deprivation scores are unlikely to be missing at random so we did not think it would be appropriate to impute them.
We have presented sensitivity and specificity values for death at a range of centile values and combined predicted risks of death and unplanned hospital admissions into frailty categories that can be used to identify patients who are most severely frail based on their risk of clinically important outcomes. The present validation has been done on a separate set of practices and individuals from those that were used to develop the score, although the practices all use the same general practice clinical computer system (EMIS, used by 55% of UK general practitioners). An independent validation study would be a more stringent test and should be done, but when such independent studies have examined other risk equations46475051 they have shown similar performance compared with the validation in the QResearch database.124345 We have not been able to undertake direct comparisons between the QMortality score and the ADG, EFI, and HOMR scores since these are not publicly available. For transparency, we have published the source code of the QMortality equation on the QAdmissions website (www.qadmissions.org) alongside the QAdmissions equation. The rationale for this is to ensure that those interested in reviewing or using the open source software will be able to find the latest available version as the score continues to be updated. Lastly, our study was not designed to compare the performance of QMortality scores against clinical judgment alone, although we have provided sufficient information to enable other researchers to undertake such a study. Freund et al found that predictive modelling software was more effective at identifying patients at increased risk of hospital admission and death than clinical judgment alone.52 However, clinicians may be more effective at identifying those for whom preventive services may have a better impact.5253
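The sensitivity and specificity at a centile cut-off, as referred to at the start of the preceding paragraph, can be computed by flagging a fixed top fraction of patients ranked by predicted risk and comparing the flag with the observed outcome. The sketch below uses made-up data; the function name and the handling of ties (broken by input order) are assumptions for illustration.

```python
def sens_spec_at_top_fraction(risks, died, top_fraction):
    """Sensitivity and specificity when the top fraction by risk is flagged.

    risks        : predicted one year risk for each patient
    died         : 1 if the patient died within one year, else 0
    top_fraction : for example 0.10 flags the top 10% of predicted risks
    """
    n = len(risks)
    n_flagged = max(1, round(n * top_fraction))
    # Rank patients from highest to lowest predicted risk (ties keep input order).
    ranked = sorted(range(n), key=lambda i: risks[i], reverse=True)
    flagged = set(ranked[:n_flagged])

    true_pos = sum(1 for i in flagged if died[i])
    false_pos = len(flagged) - true_pos
    false_neg = sum(1 for i in range(n) if died[i] and i not in flagged)
    true_neg = n - true_pos - false_pos - false_neg

    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one death and one survivor")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up predictions and outcomes for ten patients.
    risks = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    died = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(sens_spec_at_top_fraction(risks, died, 0.2))
```

Lowering the cut-off flags more patients, which raises sensitivity at the cost of specificity.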
We have developed a new equation to quantify absolute risk of death within the next year in people aged 65 or more, taking account of demographic, social, and clinical variables. The equation provides a valid measure of absolute risk of death in the general population of patients aged 65 or more as shown by the performance in a separate validation cohort. The equation can be used in conjunction with the QAdmissions equation to classify patients into four QFrailty groups to enable their identification for focused assessments and interventions.
What is already known on this topic
Recent NICE guidance on multiple morbidities has highlighted the need to develop new robust equations to identify patients in primary care with reduced life expectancy so that relevant assessments and interventions can be targeted appropriately
Existing equations to predict risk of death are based on biased samples, are insufficiently powered, fail to handle missing data appropriately, are poorly reported, or have poor performance to the extent that NICE has been unable to make a positive recommendation on any tool
What this study adds
A new equation (QMortality) quantified absolute risk of death within the next one year in people aged 65 or more, taking account of demographic, social, and clinical variables
QMortality provides a valid measure of absolute risk of death in the general population of older patients, as shown by its performance in a separate validation cohort
QMortality can be used in conjunction with the QAdmissions equation for unplanned hospital admissions to classify patients into four QFrailty groups to enable identification for focused assessments and interventions
A simple web calculator implementing the QMortality algorithm (http://qmortality.org) will be publicly available alongside the paper, and the open source software can be downloaded from the same site. A web calculator that combines QMortality and QAdmissions to derive the four frailty categories is available at http://qfrailty.org.
We acknowledge the contribution of EMIS practices that contribute to the QResearch database, and EMIS and the University of Nottingham for expertise in establishing, developing, and supporting the QResearch database. The hospital episodes statistics data used in this analysis are re-used by permission from NHS Digital, which retains the copyright. We thank the Office for National Statistics for providing the mortality data. ONS and NHS Digital bear no responsibility for the analysis or interpretation of the data.
Contributors: JHC initiated the study; developed the research question; undertook the literature review, data extraction, data manipulation, and primary data analysis; and wrote the first draft of the paper. CC contributed to the refinement of the research question, design, analysis, interpretation, and drafting of the paper. JHC is the guarantor for this study.
Funding: There was no external funding for this study.
Competing interests: Both authors have completed the uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: JHC is codirector of QResearch, a not-for-profit organisation, which is a joint partnership between the University of Nottingham and Egton Medical Information Systems (leading commercial supplier of IT for 55% of general practices in the UK). JHC is also a paid director of ClinRisk, which produces open and closed source software to ensure the reliable and updatable implementation of clinical risk equations within clinical computer systems to help improve patient care. CC is a paid consultant statistician for ClinRisk. This work and any views expressed within it are solely those of the authors and not of any affiliated bodies or organisations.
Ethical approval: This study was approved by the East Midlands Derby Research Ethics Committee (reference 03/4/021).
Data sharing: The equations presented in this paper will be released as open source software under the GNU Lesser General Public License, version 3, which allows use without charge. Closed source software can be licensed at a fee.
Transparency: The lead author (JHC) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/.