- Gary S Collins, senior medical statistician,
- Douglas G Altman, director and professor
- Centre for Statistics in Medicine, Wolfson College Annexe, University of Oxford, Oxford OX2 6UD, UK
- Correspondence to: G S Collins
- Accepted 14 May 2012
Objective To evaluate the performance of the QRISK2-2011 score for predicting the 10 year risk of cardiovascular disease in an independent UK cohort of patients from general practice and to compare it with earlier versions of the model and a National Institute for Health and Clinical Excellence version of the Framingham equation.
Design Prospective cohort study to validate a cardiovascular risk score with routinely collected data between June 1994 and June 2008.
Setting 364 practices from the United Kingdom contributing to The Health Improvement Network (THIN) database.
Participants Two million patients aged 30 to 84 years (11.8 million person years) with 93 564 cardiovascular events.
Main outcome measure First diagnosis of cardiovascular disease (myocardial infarction, angina, coronary heart disease, stroke, and transient ischaemic attack) recorded in general practice records.
Results Results from this independent and external validation of QRISK2-2011 indicate good performance when compared with the NICE version of the Framingham equation. QRISK2-2011 had better ability to identify those at high risk of developing cardiovascular disease than did the NICE Framingham equation. QRISK2-2011 is well calibrated, with reasonable agreement between observed and predicted outcomes, whereas the NICE Framingham equation seems to consistently over-predict risk in men by about 5% and shows poor calibration in women.
Conclusions QRISK2-2011 seems to be a useful model, with good discriminative and calibration properties when compared with the NICE version of the Framingham equation. Furthermore, based on current high risk thresholds, concerns exist about the clinical usefulness of the NICE version of the Framingham equation for identifying women at high risk of developing cardiovascular disease. At current thresholds the NICE version of the Framingham equation has no clinical benefit in either men or women.
Cardiovascular disease is an important health concern, accounting for nearly one third of deaths worldwide in 2008.1 In the United Kingdom almost 200 000 deaths annually are attributed to diseases of the heart and circulatory system, with more than one in three deaths associated with cardiovascular disease (www.heartstats.org). Targeting interventions to reduce the risk of cardiovascular disease in high risk patients is now a key component of national policies.2 3 Risk prediction models, including the Framingham risk score,4 the Reynolds risk score,5 6 and QRISK,7 8 9 10 11 are tools used to identify people who are at high risk (≥20%)3 of developing cardiovascular disease over 10 years and who could benefit from intervention.
In February 2010 the National Institute for Health and Clinical Excellence withdrew its recommendation that the Framingham risk equation be used to predict the risk of someone developing cardiovascular disease over the next 10 years.12 This was made in light of the emergence of a new cardiovascular risk prediction tool called QRISK,11 which was shown to have greater predictive ability than the Framingham risk equation.7 8 9 10 11 Despite increasing evidence to suggest that the Framingham equation is not well suited to the United Kingdom and that QRISK may be more suitably tailored, no firm recommendation exists on what model to use. Instead, NICE no longer recommends any single risk score, leaving healthcare professionals free to choose the model they consider the most appropriate.
Work has continued on QRISK since its introduction in 2007. In 2008, QRISK2 (QRISK2-2008) was developed, which was subsequently updated in 2010 (QRISK2-2010)13 and again in 2011 (QRISK2-2011) (www.qrisk.org) to capture improvements in data quality. QRISK2 (currently QRISK2-2011) now extends the age range to 30 to 84 years (from 35 to 74 years in QRISK2-2008). However, the most noticeable modification in QRISK2-2011 is the change in how smoking status is captured and included in the model. Earlier versions of QRISK2 included smoking status as a binary variable (current smoker versus not a current smoker), whereas QRISK2-2011 now defines smoking status as a five level categorical variable: non-smoker, former smoker, light smoker (<10 cigarettes/day), moderate smoker (10-19 cigarettes/day), and heavy smoker (≥20 cigarettes/day). Finally, the regression coefficients used to calculate the 10 year risk of developing cardiovascular disease using QRISK2 have been recalculated using more up to date and complete data in the QRESEARCH database (see www.qrisk.org for more details). QRISK2-2011 includes the risk factors of age (years), smoking status (non-smoker, former smoker, light smoker, moderate smoker, and heavy smoker), self assigned ethnicity (white or not recorded, Indian, Pakistani, Bangladeshi, other Asian, black African, black Caribbean, Chinese, other including mixed race), systolic blood pressure (mm Hg), ratio of total serum cholesterol to high density lipoprotein, body mass index (kg/m²), family history of coronary heart disease in a first degree relative, Townsend deprivation score, treated hypertension, and diagnosis of rheumatoid arthritis, atrial fibrillation, type 2 diabetes, and chronic renal disease. Table 1 provides a description of the predictors in the QRISK and Framingham risk scores.
A major feature of QRISK2-2011 is the ability, using the web calculator (www.qrisk.org) and the recently developed iPhone app, to calculate an individual’s risk when one or more risk factors are missing. QRISK2-2011 includes a mechanism that replaces missing data with values from predefined reference values and predictor algorithms. By contrast, calculating the risk using the NICE version of the Framingham equation requires complete information on all risk factors.
Any revisions and updates to a risk prediction model should be subject to continual evaluation (validation) to show that its usefulness for routine clinical practice has not deteriorated, or indeed to show that its performance has improved owing to refinements to the model.14 15 We describe the results from an independent evaluation assessing the performance of QRISK2-2011 on a large dataset of general practice records in the United Kingdom, comparing its performance with earlier versions of QRISK27 8 9 10 13 and the NICE adjusted version of the Framingham risk prediction model.3 4
Participants were patients registered between 27 June 1994 and 30 June 2008 and recorded on The Health Improvement Network (THIN) database (www.thin-uk.com). We excluded patients if they had a previous diagnosis of cardiovascular disease, were registered for less than 12 months with the general practice, had invalid dates, were aged under 30 years, were aged 85 years or over, had missing Townsend scores (social deprivation), or were prescribed statins at baseline.
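The exclusion criteria above can be read as a simple eligibility filter. The following is a minimal sketch only, assuming a hypothetical `Patient` record; the field names are illustrative and do not correspond to the actual THIN data structure or to the code used in this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # All field names are hypothetical and chosen for illustration only.
    age: int                    # age in years at baseline
    months_registered: int      # time registered with the general practice
    prior_cvd: bool             # previous diagnosis of cardiovascular disease
    on_statin: bool             # prescribed statins at baseline
    townsend: Optional[float]   # Townsend deprivation score (None if missing)
    valid_dates: bool           # registration and event dates are valid


def is_eligible(p: Patient) -> bool:
    """Apply the exclusion criteria listed in the text."""
    if p.prior_cvd or p.on_statin:
        return False
    if p.months_registered < 12 or not p.valid_dates:
        return False
    if p.age < 30 or p.age >= 85:
        return False
    if p.townsend is None:
        return False
    return True
```

For example, a 45 year old registered for 24 months with no exclusions would pass the filter, whereas an otherwise identical 85 year old would not.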
The primary outcome measure was the first diagnosis of cardiovascular disease (myocardial infarction, angina, coronary heart disease, stroke, and transient ischaemic attack) recorded on the general practice’s clinical computer system.
To derive smoking status we combined two risk factors: whether the patient was a non-smoker, former smoker, or current smoker, and number of cigarettes smoked a day, defined as light (<10), moderate (10-19), or heavy (≥20).
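The derivation of the five level smoking variable can be sketched as follows. This is an illustration under stated assumptions: the status codes `"non"`, `"former"`, and `"current"` and the function name are hypothetical, and returning `None` for a current smoker with no recorded cigarette count is our assumption about how such a value would be left for imputation.

```python
from typing import Optional


def smoking_category(status: str, cigs_per_day: Optional[int]) -> Optional[str]:
    """Combine smoking status and daily cigarette count into five levels.

    Returns None when the category cannot be determined (treated as missing).
    """
    if status == "non":
        return "non-smoker"
    if status == "former":
        return "former smoker"
    if status == "current":
        if cigs_per_day is None:
            return None  # assumption: left missing, to be imputed later
        if cigs_per_day < 10:
            return "light smoker"
        if cigs_per_day < 20:
            return "moderate smoker"
        return "heavy smoker"
    return None  # unrecognised or missing status
```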
We calculated the 10 year estimated risk of cardiovascular disease for every patient in the THIN cohort using the QRISK2-2011 risk score. Observed 10 year cardiovascular risks were obtained using the Kaplan-Meier method by 10th of predicted risk and by age group. To replace missing values for smoking status and body mass index we carried out multiple imputation using all predictors plus the outcome variable. This involves creating multiple copies of the data and imputing the missing values in each copy with plausible values randomly selected from their predicted distribution. We generated 10 imputed datasets and combined the results of the analyses on each imputed dataset using Rubin’s rules to produce estimates and confidence intervals that incorporate the uncertainty of the imputed values.16
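The pooling step of Rubin’s rules can be illustrated with a short sketch. It assumes each imputed dataset has already yielded a point estimate and its variance; the function names are hypothetical, and the confidence interval uses a normal approximation for simplicity rather than the t reference distribution with adjusted degrees of freedom that Rubin’s rules formally specify.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and variances using Rubin's rules.

    Returns the pooled estimate and its total variance, where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


def normal_ci(estimate, total_var, z=1.96):
    """Approximate 95% interval (simplifying assumption: normal reference)."""
    se = math.sqrt(total_var)
    return estimate - z * se, estimate + z * se
```

The between-imputation term is what carries the extra uncertainty due to the missing data: if all imputed datasets gave the same estimate, the total variance would reduce to the average within-imputation variance.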
We assessed the predictive performance of the QRISK2-2011 risk score on the THIN cohort by examining measures of calibration and discrimination. Calibration refers to how closely the predicted 10 year cardiovascular risk agrees with the observed 10 year cardiovascular risk. We assessed this separately for men and for women by calculating the ratio of predicted to observed cardiovascular risk within each 10th of predicted risk (giving 10 equally sized groups) and within each five year age band. We also assessed calibration by plotting observed proportions against predicted probabilities and by calculating the calibration slope.
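As an illustration of calibration by 10th of predicted risk, the sketch below estimates the observed 10 year risk in each group with a basic Kaplan-Meier estimator and divides the mean predicted risk by it. It is a simplified sketch, not the analysis code used here: the function names are hypothetical, follow-up times are assumed to be in years, and no confidence intervals or imputation are handled.

```python
def kaplan_meier_risk(times, events, horizon=10.0):
    """Observed risk by `horizon`, estimated as 1 - S(horizon).

    times  : follow-up time for each patient
    events : 1 if the patient had the event at that time, 0 if censored
    """
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1 - failed / at_risk
    return 1 - survival


def calibration_by_tenth(predicted, times, events, horizon=10.0, groups=10):
    """Return (group, mean predicted, observed, predicted/observed) per group."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = kaplan_meier_risk(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        ratio = mean_pred / observed if observed > 0 else float("nan")
        rows.append((g + 1, mean_pred, observed, ratio))
    return rows
```

A ratio close to 1 in a group indicates good agreement; values above 1 indicate over-prediction and values below 1 under-prediction.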
Discrimination is the ability of the risk score to differentiate between patients who do and do not experience an event during the study period. This measure is quantified by calculating the area under the receiver operating characteristic curve statistic; a value of 0.5 represents chance and 1 represents perfect discrimination.17 We also calculated the D statistic and R2 statistic, which are measures of discrimination and explained variation, respectively, and are tailored towards censored survival data.18 19 Higher values for the D statistic indicate greater discrimination, where an increase of 0.1 over other risk scores is a good indicator of improved prognostic separation.19
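To make the interpretation of the area under the receiver operating characteristic curve concrete, the sketch below computes it as the proportion of (event, non-event) pairs in which the patient with the event has the higher predicted risk, counting ties as one half. This is an illustration for a binary outcome only; it ignores censoring, and the function name is hypothetical.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    scores   : predicted risks
    outcomes : 1 for patients with an event, 0 for those without
    """
    with_event = [s for s, y in zip(scores, outcomes) if y == 1]
    without_event = [s for s, y in zip(scores, outcomes) if y == 0]
    if not with_event or not without_event:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in with_event:
        for q in without_event:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5  # ties count as half
    return concordant / (len(with_event) * len(without_event))
```

With identical scores for everyone the function returns 0.5 (chance), and when every patient with an event is scored above every patient without one it returns 1.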
We used decision curve analysis (accounting for censored observations) to describe and compare the clinical effects of QRISK2-2011 and the NICE Framingham equation.20 21 22 A model is considered to have clinical value if it has the highest net benefit across the range of thresholds for which an individual would be designated at high risk. Briefly, the net benefit of a model is the difference between the proportion of true positives and the proportion of false positives weighted by the odds of the selected threshold for high risk designation. At any given threshold, the model with the higher net benefit is the preferred model.
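The net benefit calculation described above can be sketched for the simple case of a binary outcome with complete follow-up. This is a simplifying assumption: the analysis reported here accounted for censored observations, which this sketch does not. The function names are hypothetical.

```python
def net_benefit(predicted, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    n = len(predicted)
    true_pos = sum(1 for p, y in zip(predicted, outcomes) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted, outcomes) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the reference strategy that treats everyone."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

Multiplying a difference in net benefit between two strategies by 1000 gives the additional number of true cases identified per 1000 people at no increase in unnecessary treatment, which is how differences between models can be expressed at a given threshold.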
Between 27 June 1994 and 30 June 2008, 2 084 445 eligible patients, aged between 30 and 84 years, from 364 general practices in the United Kingdom were registered in the THIN database. These patients contributed 11 862 381 person years of observation, during which 93 564 incident cases of cardiovascular disease occurred. The median follow-up was 5.75 years (interquartile range 2.48-8.49) and 292 928 patients (14.1%) were followed up for 10 years or more. Table 2 details the characteristics of the eligible patients.
In patients aged between 30 and 84 years the 10 year observed risk of cardiovascular disease in women (42 224 incident cases of cardiovascular disease) was 6.57% (95% confidence interval 6.50% to 6.64%) and in men (51 340 incident cases of cardiovascular disease) was 8.66% (8.58% to 8.75%).
Complete data on smoking status, number of cigarettes smoked daily, systolic blood pressure, total serum cholesterol to high density lipoprotein ratio, and body mass index were available for 19.6% of women (n=208 570) and 19.0% of men (n=193 825). Most patients (n=1 221 873; 58.6%) had no or only one missing risk factor (table 3). Considerably more data were missing for total serum cholesterol to high density lipoprotein ratio (77.9% for women and 77.7% for men) than for the remaining risk factors. For the other risk factors, 20.6% of women and 29.5% of men had missing data on body mass index, 8.0% and 18.1% on systolic blood pressure, 7.3% and 14.3% on smoking status, and 6.2% and 11.5% on number of cigarettes smoked daily.
The mean absolute difference between QRISK2-2011 and QRISK2-2010 was −0.16% for women and −0.38% for men. About 95.6% of women and 98.3% of men had QRISK2-2011 scores within 2% and 3% of the QRISK2-2010, respectively. The mean absolute difference between QRISK2-2011 and the original QRISK2-2008 predicted risks was −0.40% and −0.73% for women and men, respectively, and about 93% and 97% of all women and men will have QRISK2-2011 scores within 2% and 3% of the QRISK2-2008 scores, respectively.
Discrimination and calibration
Figure 1 shows the calibration plots for the three versions of QRISK2 and the NICE version of the Framingham equation. The current version of QRISK2 and its predecessors show much better agreement between the observed risk and the predicted risk grouped by 10th of risk than does the NICE Framingham equation. All three versions of the QRISK2 prediction models show good calibration in all 10ths of risk, with the exception of the final 10th in both men and women (calibration slope, range 0.92-0.95). Similarly, figure 2 shows the agreement between observed risk and predicted risk by age group for each of the QRISK2 prediction models and the NICE Framingham equation. All the QRISK2 prediction models show good agreement across the age groups, with a small divergence observed in the oldest age groups (75 to 84 years). The NICE Framingham equation is, however, clearly miscalibrated, most noticeably for men, with a near constant over-prediction of about 5% across all age ranges (35-74 years). In women, the observed risk shows a non-linear trend increasing with age, whereas the risks predicted by the NICE Framingham equation show a linear trend, suggesting that the equation does not adequately capture the effect of age in this cohort of UK women.
Table 4 presents performance data for the QRISK prediction models and the NICE Framingham equation. The R2 statistic (percentage of explained variation) is similar for QRISK2-2011 and QRISK2-2010 in men and women aged 30 to 84 years, indicating no change in performance of the newer QRISK2 model. Values for R2, restricted to those aged 35 to 74 years to enable comparison with the NICE Framingham equation, are about 4% to 5% higher for the QRISK2 models. The D statistic, for which a higher value denotes better discrimination, is between 0.14 and 0.19 higher for the QRISK2 models than for the NICE Framingham equation, indicating improved prognostic separation. Finally, the area under the receiver operating characteristic curve is about 0.02 higher for QRISK2 in both men and women (restricted to those aged 35 to 74 years), whereas little difference is observed between QRISK2-2011 and QRISK2-2010 in those aged 30 to 84 years.
Decision curve analysis
Table 5 shows how many of 1000 people would be identified as being at high risk (based on thresholds of 10%, 15%, and 20%) using either QRISK2-2011 or the NICE Framingham equation, and how many of these go on to experience a cardiovascular event compared with a strategy where all individuals are deemed at high risk. For women aged between 35 and 74 years there seems to be little difference between the two models, with both identifying similar numbers of women who experience a cardiovascular event. However, many more women would be incorrectly flagged as being at increased risk using the NICE Framingham equation. Table 5 also substantiates our earlier finding that in men aged between 35 and 74 years the NICE Framingham equation over-predicts the risk of developing cardiovascular disease by about 5%. The table shows that the numbers of men identified as being at high risk, and the numbers who go on to experience a cardiovascular event, are similar when the NICE Framingham equation at 20% is compared with QRISK2-2011 at 15%, or when the NICE Framingham equation at 15% is compared with QRISK2-2011 at 10%.
Figure 3 displays the net benefit curves for QRISK2-2011, QRISK2-2008, and the NICE Framingham equation for people aged between 35 and 74 years. At the traditional threshold of 20% used to designate an individual at high risk of developing cardiovascular disease, QRISK2-2011 identified five more cases per 1000 men than the NICE Framingham equation without increasing the number treated unnecessarily. For women, QRISK2-2011 at a 20% threshold identified two more cases per 1000 than did using no model (or the NICE Framingham equation). There seems to be no net benefit in using the 20% threshold for the NICE Framingham equation for identifying women who are at an increased risk of developing cardiovascular disease over the next 10 years. Both QRISK2 models perform similarly and clearly show greater net benefit across a range of thresholds compared with the NICE Framingham equation.
We carried out an independent evaluation of the performance of QRISK2-2011 on a large cohort of general practice patients using The Health Improvement Network (THIN) database, comprising two million patients contributing 11.86 million person years of observation. The performance data presented in this article provide strong evidence for use of the updated QRISK2-2011 over the NICE Framingham equation. QRISK2-2011 performs noticeably better than the NICE version of the Framingham equation in discrimination, calibration, and clinical utility. The performance of QRISK2-2011 is comparable to that of its predecessor QRISK2-2010, with no suggestion of deterioration in performance.
The NICE Framingham model for men is clearly miscalibrated, over-predicting the 10 year risk of developing cardiovascular disease by about 5%. For women, the equation seems to be performing poorly, with evidence to suggest that the model is inadequately capturing age. Furthermore, we have shown that the NICE Framingham equation has no clinical utility at the current threshold to identify those who are at an increased risk of developing cardiovascular disease. If the Framingham equation is to continue to be used and doctors advised to treat patients if their predicted risk is 20% or higher, then it is necessary for it to be recalibrated and updated to reflect current characteristics of the UK population. Without recalibration we urge caution in using the Framingham equation to identify high risk patients in the United Kingdom.
Strengths and limitations of the study
A major strength of this study is the size and representativeness of the cohort, which includes a large number of general practices using the EMIS computer system. A limitation of this study is the considerable amount of missing data for total serum cholesterol to high density lipoprotein ratio, both in the derivation and in the external validation of QRISK2-2011. Despite the large amounts of missing data, information on all risk factors was available for 400 000 people, and 800 000 people had none or only one missing risk factor. Moreover, we used currently recommended multiple imputation approaches to reduce the biases that occur when omitting people with incomplete data.25 26
We have provided an independent and external validation of QRISK2-2011 on a large cohort of general practice patients in the United Kingdom to predict the 10 year risk of developing cardiovascular disease. We have shown that the updated QRISK2-2011 model has not incurred any deterioration in performance and shows good potential clinical utility in predicting the risk of cardiovascular disease in those aged between 30 and 84 years. Furthermore, we have shown that QRISK2-2011 continues to be a considerable improvement over the NICE modification of the Framingham equation. For the Framingham equation to be even considered alongside QRISK2-2011 for predicting the 10 year risk of developing cardiovascular disease, at a minimum we recommend that it should be updated and calibrated to the UK population. Finally, the current high risk threshold of 20% adopted by NICE to designate those at high risk of cardiovascular disease may need to be revisited, and there are likely to be different thresholds for women and men.
What is already known on this topic
Until recently cardiovascular risk prediction in the United Kingdom has been based on a NICE adjusted version of the US Framingham model, which has been shown to over-predict risk
QRISK2 was developed using a large cohort of UK patients and published in 2008 and updated in 2010 and 2011
Updated risk prediction models need to be independently and externally validated to objectively evaluate performance
What this study adds
Independent evaluation of QRISK2-2011 showed an improvement in performance over the NICE Framingham equation in a large external cohort of UK patients
Using current thresholds (20%) to designate those at high risk, the NICE Framingham equation has been shown to have no clinical usefulness in men or women
The NICE Framingham equation has been shown to consistently over-predict the 10 year risk of cardiovascular disease in men by about 5%
Cite this as: BMJ 2012;344:e4181
Contributors: GSC carried out the analysis and prepared the first draft, which was revised according to comments and suggestions from DGA. GSC is guarantor for the paper.
Funding: This research received no specific grant from any funding agency in the public, commercial, or not for profit sectors.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; and no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: This study was approved by Trent multicentre research ethics committee.
Data sharing: No additional data available.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.