Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2
BMJ 2008; 336 doi: http://dx.doi.org/10.1136/bmj.39609.449676.25 (Published 26 June 2008) Cite this as: BMJ 2008;336:1475
- Julia Hippisley-Cox, professor of clinical epidemiology and general practice1,
- Carol Coupland, senior lecturer in medical statistics1,
- Yana Vinogradova, research fellow in medical statistics1,
- John Robson, senior lecturer in general practice2,
- Rubin Minhas, coronary heart disease lead3,
- Aziz Sheikh, professor of primary care research and development4,
- Peter Brindle, research and development strategy lead5
- 1Division of Primary Care, Tower Building, University Park, Nottingham NG2 7RD
- 2Centre for Health Sciences, Queen Mary’s School of Medicine and Dentistry, London E1 2AT
- 3Medway Primary Care Trust, Unit 2, Gillingham, Kent ME7 0NJ
- 4Division of Community Health Sciences: GP Section, University of Edinburgh, Edinburgh EH8 9DX
- 5Avon Primary Care Research Collaborative, Bristol Primary Care Trust, Bristol BS2 8EE
- Correspondence to: J Hippisley-Cox
- Accepted 28 May 2008
Objective To develop and validate version two of the QRISK cardiovascular disease risk algorithm (QRISK2) to provide accurate estimates of cardiovascular risk in patients from different ethnic groups in England and Wales and to compare its performance with the modified version of Framingham score recommended by the National Institute for Health and Clinical Excellence (NICE).
Design Prospective open cohort study with routinely collected data from general practice, 1 January 1993 to 31 March 2008.
Setting 531 practices in England and Wales contributing to the national QRESEARCH database.
Participants 2.3 million patients aged 35-74 (over 16 million person years) with 140 000 cardiovascular events. Overall population (derivation and validation cohorts) comprised 2.22 million people who were white or whose ethnic group was not recorded, 22 013 south Asian, 11 595 black African, 10 402 black Caribbean, and 19 792 from Chinese or other Asian or other ethnic groups.
Main outcome measures First (incident) diagnosis of cardiovascular disease (coronary heart disease, stroke, and transient ischaemic attack) recorded in general practice records or linked Office for National Statistics death certificates. Risk factors included self assigned ethnicity, age, sex, smoking status, systolic blood pressure, ratio of total serum cholesterol:high density lipoprotein cholesterol, body mass index, family history of coronary heart disease in first degree relative under 60 years, Townsend deprivation score, treated hypertension, type 2 diabetes, renal disease, atrial fibrillation, and rheumatoid arthritis.
Results The validation statistics indicated that QRISK2 had improved discrimination and calibration compared with the modified Framingham score. The QRISK2 algorithm explained 43% of the variation in women and 38% in men compared with 39% and 35%, respectively, by the modified Framingham score. Of the 112 156 patients classified as high risk (that is, ≥20% risk over 10 years) by the modified Framingham score, 46 094 (41.1%) would be reclassified at low risk with QRISK2. The 10 year observed risk among these reclassified patients was 16.6% (95% confidence interval 16.1% to 17.0%), that is, below the 20% treatment threshold. Of the 78 024 patients classified at high risk on QRISK2, 11 962 (15.3%) would be reclassified at low risk by the modified Framingham score. The 10 year observed risk among these patients was 23.3% (22.2% to 24.4%), that is, above the 20% threshold. In the validation cohort, the annual incidence rate of cardiovascular events among those with a QRISK2 score of ≥20% was 30.6 per 1000 person years (29.8 to 31.5) for women and 32.5 per 1000 person years (31.9 to 33.1) for men. The corresponding figures for the modified Framingham equation were 25.7 per 1000 person years (25.0 to 26.3) for women and 26.4 per 1000 person years (26.0 to 26.8) for men. At the 20% threshold, the population identified by QRISK2 was at higher risk of a cardiovascular event than the population identified by the Framingham score.
Conclusions Incorporating ethnicity, deprivation, and other clinical conditions into the QRISK2 algorithm for risk of cardiovascular disease improves the accuracy of identification of those at high risk in a nationally representative population. At the 20% threshold, QRISK2 is likely to be a more efficient and equitable tool for treatment decisions for the primary prevention of cardiovascular disease. As the validation was performed in a similar population to the population from which the algorithm was derived, it potentially has a “home advantage.” Further validation in other populations is therefore advised.
Cardiovascular disease is the leading cause of premature death and a major cause of disability in the United Kingdom.1 Evidence from randomised controlled trials supports the effectiveness of statins in reducing cardiovascular risk and the National Institute for Health and Clinical Excellence (NICE) has lowered the threshold for intervention for primary prevention with statins from a 10 year risk of cardiovascular disease of 40% to 20%.2 3 In April 2008, the UK government announced a new major initiative to reduce the risk of vascular disease.4 This will build on new NICE guidelines for lipid modification to be published later this year.5 It is important that this major public health programme targets those at greatest risk and reduces, rather than exacerbates, existing persistent and widening ethnic and social inequalities in risk of cardiovascular disease.6 7 8 A broader approach to preventative cardiovascular medicine is required that recognises the evidence for the role of the biological, socioeconomic, and ethnic determinants of health9 10 and is responsive to changes in secular trends in the incidence of coronary heart disease.11
Recent advances in the development of models to assess risk of cardiovascular disease mean that these can now recognise and take account of the increased risk associated with social deprivation in the UK.12 13 Rates of cardiovascular disease, however, vary considerably between ethnic groups, which might reflect increased susceptibility and differential exposure to risk factors.14 15 16 17 While several risk prediction scores derived from prospective studies can be used to identify and prioritise people for risk reducing interventions,12 13 18 they do not include a variable for self assigned ethnicity. One cross sectional study used ethnicity specific levels of risk of cardiovascular disease and risk factors to estimate 10 year risk. This tool, however, excluded diabetes, lacked precision because of small numbers, and has not been validated.19
In May 2008, NICE recommended multiplying the results of a modified version of the US Framingham score (“modified Framingham”) by a correction factor of 1.4 for south Asian men in the UK.5 This does not reflect the heterogeneity in risk of cardiovascular disease between south Asian populations, the increased risk in women, confounding by deprivation,20 and the possibility of double counting through adjustments for both ethnicity and family history. An appropriate estimation of risk by ethnic group is important to improve cardiovascular outcomes, avoid the potential for further deterioration in health inequalities,21 and ensure the efficient allocation of resources used to support cardiovascular disease prevention programmes.
Because of the lack of prospective outcome data on black and minority ethnic groups,22 a contemporary and specific algorithm is needed to accurately quantify risk among such patients and to identify the independent or interacting contributions of factors including deprivation, family history, and diabetes.23 24 25 The QRESEARCH database contains longitudinal data, individual risk factors, demographic data, measures of social deprivation, and, increasingly, records of self assigned ethnicity, which provide a unique opportunity to model all these factors together.
We built on our previous risk prediction algorithm (QRISK1)12 to develop a revised algorithm that incorporates self assigned ethnicity as well as a range of other potentially relevant conditions associated with cardiovascular risk such as type 2 diabetes, treated hypertension, rheumatoid arthritis, renal disease, and atrial fibrillation (QRISK2). By including an increased range of potential risk factors, we hypothesised that we would be better able to personalise risk to the individual patient.
Study design and data source
We conducted a prospective cohort study in a large UK primary care population using a similar method to our original analysis.12 We used version 19 of the QRESEARCH database (www.qresearch.org). This is a large validated primary care electronic database containing the health records of 11 million patients registered from 551 general practices using the Egton Medical Information System (EMIS) computer system.12 Practices and patients on the database are nationally representative26 and similar to those on other primary care databases that use other clinical software systems.27
The QRESEARCH database now contains information on the cause of death as recorded on the patient’s Office for National Statistics (ONS) death certificate. This data linkage, which is based on NHS number, has now been successfully completed back to 1993. A recorded cause of death is now linked for over 97% of patients on the QRESEARCH database who have died.
We included all QRESEARCH practices in England and Wales once they had been using their current EMIS system for at least a year (to ensure completeness of recording of morbidity and prescribing data), randomly allocating two thirds of practices to the derivation dataset and one third to the validation dataset. We used the simple random sampling utility in Stata to assign practices to the derivation or validation cohort.
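The allocation step can be illustrated with a minimal Python sketch. It is not the Stata utility the authors used; the function name, the seed, and the use of integer practice identifiers are assumptions made for illustration only.

```python
import random

def allocate_practices(practice_ids, derivation_fraction=2 / 3, seed=2008):
    """Randomly split practice IDs into derivation and validation sets.

    Hypothetical helper: the seed is fixed only so the sketch is reproducible.
    """
    rng = random.Random(seed)
    ids = sorted(practice_ids)
    n_derivation = round(len(ids) * derivation_fraction)
    derivation = set(rng.sample(ids, n_derivation))  # simple random sample without replacement
    validation = set(ids) - derivation               # every remaining practice
    return derivation, validation

derivation, validation = allocate_practices(range(531))
print(len(derivation), len(validation))
```

Allocating whole practices, rather than individual patients, keeps the validation cohort in practices that contributed nothing to the derivation of the model.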
We identified an open cohort of patients aged 35-74 at the study entry date, drawn from patients registered with eligible practices during the 15 years from 1 January 1993 to 31 March 2008. We used an open cohort design as this allows patients to enter the population throughout the whole study period rather than requiring registration on 1 January 1993, thus better reflecting the realities of routine general practice.
We excluded patients with a prior recorded diagnosis of cardiovascular or cerebrovascular disease, temporary residents, patients with interrupted periods of registration with the practice, and those who did not have a valid Townsend deprivation score. We also excluded patients who were taking statins at baseline.
For each patient we determined an entry date to the cohort, which was the latest of the following dates: 35th birthday, date of registration with the practice, date on which the practice computer system was installed plus one year, and the beginning of the study period (1 January 1993). In addition we included patients in the analysis only once they had a minimum of one year’s complete data in their medical record.
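The entry date rule is simply the latest of four candidate dates. A minimal Python sketch, assuming hypothetical field names and treating "plus one year" as a calendar year shift, might look like this:

```python
from datetime import date

STUDY_START = date(1993, 1, 1)  # beginning of the study period

def add_years(d, years):
    """Shift a date by whole calendar years (29 February maps to 28 February)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def cohort_entry_date(date_of_birth, registration_date, system_install_date):
    """Latest of: 35th birthday, registration, system install plus one year, study start."""
    return max(
        add_years(date_of_birth, 35),
        registration_date,
        add_years(system_install_date, 1),
        STUDY_START,
    )

# Illustrative patient (invented dates): the install date plus one year is the latest.
entry = cohort_entry_date(date(1960, 6, 15), date(1990, 3, 1), date(1994, 9, 30))
print(entry)  # 1995-09-30
```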
Coding of ethnicity
We used Read codes for self assigned ethnicity. The codes were grouped into the NHS standard 16+1 categories28 for the initial descriptive analysis. The 16+1 categories were then further grouped into the final nine reporting groups to ensure sufficient numbers of events to enable a meaningful analysis. The white ethnic group was combined with the group where ethnicity was not recorded since, assuming the study population is comparable with the UK population, 93% or more of people without ethnicity recorded would be expected to be from a white ethnic group. The category “other including mixed” comprised white and black Caribbean, white and black African, white and Asian, other mixed, other black, and other ethnic group. The “white or not recorded” category comprised British, Irish, and other white background as well as not recorded. This was designated as the reference category. The category of other Asian included Read codes for east African Asian, Indo-Caribbean, Punjabi, Kashmiri, Sri Lankan, Tamil, Sinhalese, Caribbean Asian, British Asian, mixed Asian, or Asian unspecified.
Cardiovascular disease outcomes
The primary outcome measure was the first recorded diagnosis of cardiovascular disease recorded on the general practice clinical computer system or their linked ONS death certificate during the study period. For this study, we included coronary heart disease (angina and myocardial infarction), stroke, or transient ischaemic attacks in the term cardiovascular disease but not peripheral vascular disease.
The Read codes used for case identification on the computer record were nationally agreed ones used in the quality and outcomes framework for general practice for coronary heart disease and cerebrovascular disease. The ICD-10 codes used for case identification on the ONS death certificate were: angina pectoris (I20); acute myocardial infarction (I22); complications following acute myocardial infarction (I23); other acute ischaemic heart disease (I24); chronic ischaemic heart disease (I25); cerebral infarction (I63); and stroke, not specified as haemorrhage or infarction (I64).
Risk factors for cardiovascular disease
We included variables in our analysis that are known or thought to affect cardiovascular risk (box). We used the value closest to the entry date to the cohort for each patient, imputing missing values where necessary, as described below.
Self assigned ethnicity (white/not recorded, Indian, Pakistani, Bangladeshi, other Asian, black African, black Caribbean, Chinese, other including mixed)
Sex (males v females)
Smoking status (current smoker, non-smoker (including ex-smoker))
Systolic blood pressure18 (continuous)
Ratio of total serum cholesterol/high density lipoprotein cholesterol18 (continuous)
Body mass index (BMI)12 (continuous)
Family history of coronary heart disease in first degree relative under 60 years12 (yes/no)
Townsend deprivation score12 (output area level 2001 census data evaluated as a continuous variable)
Treated hypertension12 (diagnosis of hypertension and at least one current prescription of at least one antihypertensive agent)
Rheumatoid arthritis29 (yes/no)
Chronic renal disease30 (yes/no)
Type 2 diabetes18 (yes/no)
Model derivation and development
We calculated crude incidence rates of cardiovascular disease according to age, ethnic group, and deprivation in fifths. We directly age standardised the incidence rates by ethnic group and deprivation using the age distribution in five year bands of the entire derivation cohort as the standard population. We also age standardised the means of continuous variables and proportions with risk factors by ethnic group using the same method.
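Direct age standardisation weights each age band's rate by that band's share of a standard population. The following Python sketch shows the calculation with invented numbers and two age bands only; the function name and the data are assumptions, not values from the study.

```python
def directly_standardised_rate(events, person_years, standard_population):
    """Age standardised rate per 1000 person years.

    Each argument is a dict keyed by age band. The rate in each band is
    weighted by that band's proportion of the standard population.
    """
    total_standard = sum(standard_population.values())
    rate = 0.0
    for band, standard_count in standard_population.items():
        band_rate = events[band] / person_years[band]
        rate += band_rate * standard_count / total_standard
    return 1000 * rate

# Invented example data for one ethnic group
events = {"35-39": 10, "40-44": 30}
person_years = {"35-39": 5000, "40-44": 5000}
standard = {"35-39": 600, "40-44": 400}  # stands in for the whole derivation cohort
print(round(directly_standardised_rate(events, person_years, standard), 2))  # 3.6
```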
We used Cox proportional hazards models in the derivation dataset to estimate the coefficients and hazard ratios associated with each potential risk factor for the first ever recorded diagnosis of cardiovascular disease for men and women separately. As in our previous paper, we compared models using the Bayesian information criterion (BIC).33 We used fractional polynomials to model non-linear risk relations with continuous variables where appropriate.34 35 We tested for interactions between each variable and age and between diabetes and deprivation and included significant interactions in the final model. Continuous variables were centred for analysis.
Our main analyses used multiple imputation to replace missing values for systolic blood pressure, cholesterol/HDL ratio, smoking status, and body mass index. Our final model was fitted based on multiply imputed datasets using Rubin’s rules to combine effect estimates and estimate standard errors to allow for the uncertainty caused by missing data.35 36 Multiple imputation is a statistical technique designed to reduce the biases that can occur in “complete case” analysis along with a substantial loss of power and precision.37 38 39 40 Multiple imputation allows patients with incomplete data to still be included in analyses and makes full use of all the available data, increasing power and precision.41 The imputation technique involves creating multiple copies of the data and replacing missing values with imputed values based on a suitable random sample from their predicted distribution. We used the ICE procedure in Stata42 to obtain five imputed datasets (further details are available from the corresponding author).
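Rubin's rules pool a coefficient across imputed datasets by averaging the estimates and combining within-imputation and between-imputation variance. The Python sketch below illustrates the standard formulas for a single coefficient; the function name and the five estimates are invented, and this is not the Stata code used in the study.

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, standard_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)                           # pooled point estimate
    within = mean([se ** 2 for se in standard_errors])  # average within-imputation variance
    between = variance(estimates)                       # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between              # total variance
    return pooled, sqrt(total)

# Invented log hazard ratios from five imputed datasets
estimate, se = pool_rubin([0.50, 0.52, 0.48, 0.51, 0.49], [0.02] * 5)
print(round(estimate, 3), round(se, 4))
```

The between-imputation term is what carries the extra uncertainty caused by the missing data into the pooled standard error.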
We took the log of the hazard ratio for each variable from the final model and used these as weights for the new cardiovascular disease risk equations. We combined these weights with the baseline survivor function centred on the means of continuous risk factors to derive a risk equation for 10 years’ follow-up.
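For a Cox model, a risk equation of this kind generally takes the form risk = 1 − S0(t)^exp(Σ β(x − mean)), where S0(t) is the baseline survivor function at time t and each β is a log hazard ratio. The Python sketch below illustrates that general form only. The baseline survival, hazard ratios, and means are invented placeholders, not QRISK2 coefficients, and the sketch ignores the age transformation and interaction terms of the published model.

```python
from math import exp, log

def ten_year_risk(baseline_survival, weights, means, values):
    """Ten year risk from a Cox-type equation.

    baseline_survival: survivor function at 10 years for a patient at the means
    weights: log hazard ratio for each risk factor
    means: centring value for each continuous factor (binary factors default to 0)
    values: the patient's risk factor values
    """
    linear_predictor = sum(
        weights[name] * (values[name] - means.get(name, 0.0)) for name in weights
    )
    return 1.0 - baseline_survival ** exp(linear_predictor)

# All numbers below are hypothetical placeholders
weights = {"smoker": log(2.0), "sbp": log(1.01)}
means = {"sbp": 130.0}
risk = ten_year_risk(0.95, weights, means, {"smoker": 1, "sbp": 130.0})
print(round(risk, 4))  # 0.0975
```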
Validation of new equation
We tested the performance of the new model (QRISK2) in the validation dataset and compared it against both the original model (QRISK1) and the modified Framingham equation recommended by NICE.43 This modified equation is based on one of the original Anderson equations18 and is used to derive separate risks for coronary heart disease and stroke for an individual. The two risks are then added together (if the combined risk exceeds 100%, it is set to 100%). For south Asian men, NICE advises multiplying the resulting Framingham score by 1.4. For people with a family history of coronary heart disease in a first degree relative, the risk is multiplied by 1.5. For south Asian men with a family history of coronary heart disease, both multipliers are applied.
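The adjustment steps for the modified Framingham score can be sketched in Python as follows. The function takes the separate coronary heart disease and stroke risks as inputs (the Anderson equations themselves are not reproduced here), and the final cap at 100% after the multipliers is an assumption of this sketch rather than something stated in the text.

```python
def modified_framingham(chd_risk, stroke_risk, south_asian_man=False, family_history=False):
    """Combine separate CHD and stroke risks (as proportions) into one score."""
    risk = min(chd_risk + stroke_risk, 1.0)  # add the two risks, capped at 100%
    if south_asian_man:
        risk *= 1.4                          # NICE multiplier for south Asian men
    if family_history:
        risk *= 1.5                          # multiplier for family history of CHD
    return min(risk, 1.0)                    # assumption: cap again after multipliers

# Invented inputs: 8% CHD risk and 2% stroke risk
print(round(modified_framingham(0.08, 0.02), 3))              # 0.1
print(round(modified_framingham(0.08, 0.02, True, True), 3))  # 0.21
```

In this invented example, applying both multipliers moves the patient from 10% to 21%, across the 20% threshold, which illustrates the concern about double counting ethnicity and family history.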
We calculated the 10 year estimated risk of cardiovascular disease for each patient in the validation dataset using multiple imputation to replace missing values as in the derivation dataset.
We calculated the mean predicted and observed cardiovascular disease risk at 10 years12 and compared these by 10th of predicted risk for each score. The observed risk at 10 years was obtained by using the 10 year Kaplan-Meier estimate. We calculated the Brier score (a measure of goodness of fit where lower values indicate better accuracy)44 using the censoring adjusted version adapted for survival data,45 D statistic (a measure of discrimination where higher values indicate better discrimination),46 and an R2 statistic. The R2 statistic is a measure of explained variation where higher values indicate more explained variation.47 We also calculated the area under the receiver operating characteristic (ROC) curve, where higher values indicate better discrimination.
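The observed risk at 10 years is one minus the Kaplan-Meier survival estimate at 10 years, which allows for patients whose follow-up ends before 10 years. A plain Python sketch of the product-limit calculation, with invented follow-up data and a hypothetical function name, is shown below; it is not the Stata routine used in the study.

```python
def kaplan_meier_risk(times, events, horizon=10.0):
    """Observed risk (1 - Kaplan-Meier survival) at `horizon` years.

    times: follow-up time for each patient
    events: True/1 if follow-up ended with an event, False/0 if censored
    """
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            survival *= 1 - n_events / at_risk  # product-limit step at each event time
        at_risk -= n_leaving                    # events and censored patients leave the risk set
    return 1 - survival

# Invented follow-up for five patients (years, event flag)
print(round(kaplan_meier_risk([2, 4, 4, 6, 12], [1, 0, 1, 0, 0]), 3))  # 0.4
```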
We calculated the proportion of patients in the validation sample with an estimated 10 year risk of cardiovascular disease of 20% or more by age, sex, ethnicity, and deprivation according to the QRISK2 algorithm compared with the modified Framingham score. We determined the proportion of patients who would be reclassified into a higher or lower risk category using the new risk equations at the 20% thresholds and determined the observed 10 year risks among those patients who would be reclassified.
As we used all the available data on the QRESEARCH database we did not calculate required sample size before the study. Analyses were conducted using Stata (version 10), with a significance level of 0.01 (two tailed).
Derivation and validation datasets
Practices and patients
Overall, 531 UK practices met our inclusion criteria, of which 355 were randomly assigned to the derivation dataset and 176 to the validation dataset. We excluded 20 practices that did not have complete data for the relevant study period (four practices) or were from Scotland (seven practices) or Northern Ireland (nine practices).
We studied 2.29 million patients with over 16 million person years and 140 115 cardiovascular events. There were 1 591 209 patients in the derivation cohort, of whom 55 626 had cardiovascular disease before the start of the study leaving 1 535 583 patients (773 291 women, 50.4%) aged 35-74 and free of cardiovascular disease. Table 1 shows the numbers of patients in each ethnic group.
Baseline characteristics of derivation and validation cohort
Table 1 compares the characteristics of eligible patients in both cohorts. Ethnicity was recorded in 209 214 (27.1%) women and 181 110 (23.8%) men. Among patients with ethnicity recorded 89.3% were from a white ethnic group. The mean follow-up was 7.3 years for women and 6.9 for men. Some 437 676 patients (232 306 women and 205 370 men) had more than 10 years of follow-up data.
While this validation cohort was drawn from an independent group of practices, the baseline characteristics were similar to those for the derivation cohort.
Incidence of cardiovascular disease
Table 2 shows the incidence rates of cardiovascular disease by age, sex, deprivation, and ethnicity in the derivation cohort. There were 96 709 incident cases of cardiovascular disease (41 042 in women) during the study period from 10.9 million person years of observation. Of all events, 7.4% in women and 7.8% in men were identified with the ONS linked death data (that is, were not identified with the general practice data alone). The crude incidence rate for cardiovascular disease was slightly higher than in our original study with a rate of 7.3 per 1000 person years for women and 10.5 per 1000 person years for men. In the validation dataset there were 750 232 eligible patients aged 35 to 74, and, of these, 50.1% were women and the incidence rates were similar to the derivation dataset (data not shown, but available from the corresponding author).
The crude and age standardised incidence of cardiovascular disease varied widely between ethnic groups (table 2). The age standardised rates for the white reference group were 10.5 per 1000 person years (95% confidence interval 10.4 to 10.6) for men and 7.3 per 1000 person years (7.2 to 7.3) for women. The highest age standardised rates were among south Asian groups; for example, for Bangladeshi people the rate was 24.4 per 1000 person years (19.8 to 29.0) for men and 11.3 per 1000 person years (8.5 to 14.1) for women. Age standardised rates were also high for Indian and Pakistani men and women compared with the white reference group. They were also higher for black Caribbean women and men from the “other Asian” group. In contrast, black African, Chinese, and black Caribbean men tended to have lower rates, as did black African women (table 2).
Characteristics of events
Table 3 shows the characteristics of events among men and women by ethnic group. Overall, 30.8% of events were stroke or transient ischaemic attacks, but this varied between ethnic groups. For example, in the derivation dataset, 48.9% of first events among black Caribbean men and 36.4% among black African men were stroke or transient ischaemic attacks; the corresponding figures for women were 33.5% and 24.2%.
Prevalence of risk factors by ethnicity
Table 4 shows the distribution of risk factors, standardised for age, among each of the main ethnic groups. There was substantial heterogeneity across the ethnic groups in risk factors for cardiovascular disease and this also differed between men and women within an ethnic group. The notable results include differences in the age standardised prevalence of smoking among men of Bangladeshi (53.2%, 50.2% to 56.2%), Caribbean (40.6%, 38.9% to 42.4%), Pakistani (32.9%, 30.8% to 35.1%), white/not recorded (32.2%, 32.1% to 32.3%), Chinese (28.0%, 24.6% to 31.4%), Indian (23.7%, 22.3% to 25.1%), and black African (16.6%, 15.1% to 18.2%) origin. Current smoking rates were all lower for women in each ethnic group compared with men but varied widely between women from different groups.
There were also substantial differences in the age standardised prevalence of type 2 diabetes between ethnic groups with highest rates among Bangladeshis (14.4% women, 16.8% men), Pakistanis (14.2% women, 12.0% men), and Indians (11.7% women, 13.3% men) and lowest among the white reference group (1.5% women, 2.1% men).
Treated hypertension was highest among Caribbean and black African men and women. Recorded family history of coronary heart disease in a first degree relative was highest among Indian men and women and lowest among black African men and women.
Table 5 shows the results of the Cox regression analysis for the QRISK2 model. We used a log transformation for age but otherwise fitted variables as linear terms as this provided a better fit with the data according to the fractional polynomial analysis. The table shows variables that had significant interactions with age and these indicate increased hazard ratios for the risk factors among younger patients compared with older patients (fig 1).
Calibration and discrimination of QRISK2
The QRISK2 model was marginally superior to the original QRISK1 equation and both models were superior to the modification of the Framingham score for the D statistic, ROC statistic, and the R2 value for both men and women (table 6). For example, the QRISK2 algorithm explained 43% of the variation in women and 38% in men. The figures for modified Framingham score were 39% and 35%, respectively. Also, as an example, the D statistic was 1.79 (1.77 to 1.82) in women and 1.62 (1.59 to 1.64) in men for the QRISK2 model compared with 1.63 (1.61 to 1.66) and 1.50 (1.47 to 1.52) for the modified Framingham score. All three scores performed better in women than in men.
Figure 2 compares predicted and observed risks of a cardiovascular disease event at 10 years across each 10th of predicted risk (first 10th representing the lowest risk). This shows that the QRISK2 model is better calibrated than the modified Framingham score.
Predictions with age, sex, deprivation, and ethnicity
Table 7 shows the breakdown of patients by age and sex with a predicted 10 year risk of 20% or more with the QRISK2 model and the modified Framingham score. Overall, the QRISK2 model would predict 10.4% of patients as high risk compared with 14.9% for the modified Framingham score.
Figure 3 shows the proportion of patients estimated to be at high risk with QRISK2 and the Framingham score within each ethnic group. QRISK2 would identify 14.2% (11.5% to 17.0%) of Bangladeshi and 10.1% (8.8% to 11.3%) of Indian women at high estimated risk compared with 7.2% (5.1% to 9.3%) and 4.6% (3.7% to 5.5%) with the Framingham score, respectively. QRISK2 would identify 14.0% (13.9% to 14.1%) of white men at high risk compared with 22.0% (21.9% to 22.1%) with the Framingham score.
Of the 112 156 patients classified as high risk (risk of ≥20% over 10 years) with the Framingham score, 46 094 (41.1%) would be reclassified at low risk with QRISK2. The 10 year observed risk among these reclassified patients was 16.6% (16.1% to 17.0%)—that is, below the 20% threshold for high risk.
Of the 78 024 patients classified at high risk with QRISK2, 11 962 (15.3%) would be reclassified as low risk with the Framingham score. The 10 year observed risk among these patients predicted to be at high risk with QRISK2 was 23.3% (22.2% to 24.4%)—that is, above the 20% threshold for high risk.
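The reclassification percentages follow directly from the counts reported above, and the two sets of counts can be cross-checked against each other: the number of patients classified as high risk by both scores should be the same whichever score is taken as the starting point. A short Python check (the variable names are ours):

```python
def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

framingham_high = 112_156    # high risk on modified Framingham
framingham_only = 46_094     # of these, low risk on QRISK2
qrisk2_high = 78_024         # high risk on QRISK2
qrisk2_only = 11_962         # of these, low risk on modified Framingham

print(percent(framingham_only, framingham_high))  # 41.1
print(percent(qrisk2_only, qrisk2_high))          # 15.3

# High risk on both scores, computed from either side
print(framingham_high - framingham_only)  # 66062
print(qrisk2_high - qrisk2_only)          # 66062
```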
The annual incidence rate of cardiovascular events among those with a QRISK2 score of ≥20% was 30.6 per 1000 person years (95% confidence interval 29.8 to 31.5) for women and 32.5 per 1000 person years (31.9 to 33.1) for men. Both these figures are higher than the annual incidence rate for patients identified as high risk with the modified Framingham score, which was 25.7 per 1000 person years (25.0 to 26.3) for women and 26.4 per 1000 person years (26.0 to 26.8) for men. In other words, at the 20% threshold, the population identified by QRISK2 was at higher risk of a cardiovascular event than the population identified by the Framingham score.
Table 8 shows some clinical examples for patients from different ethnic groups who would be reclassified with QRISK2 compared with the modified Framingham score. We have calculated 95% confidence intervals around the QRISK2 score. For example, a 64 year old Indian woman from a moderately deprived area with a systolic blood pressure of 130 mm Hg, BMI of 23.1, treated hypertension, and cholesterol/HDL ratio of 5.3 would have a modified Framingham score of 12.0%, but a QRISK2 score of 24.7% (24.4% to 25.0%) at 10 years. A 54 year old Bangladeshi man who is a non-smoker and has treated hypertension, a systolic blood pressure of 142 mm Hg, a BMI of 27.0, and a cholesterol ratio of 4.2 and lives in one of the most deprived areas would have a modified Framingham score of 17.0% (including the adjustment for being south Asian) but a 10 year QRISK2 score of 23.5% (22.8% to 24.1%).
We developed and validated a cardiovascular risk algorithm that simultaneously takes account of ethnicity and deprivation. The algorithm has face validity in the setting in which it will be used and had good discrimination and calibration. There are three main reasons why this study is likely to make an important impact on the decisions of doctors, patients, and commissioners. Firstly, in this prospective study we developed and validated a risk prediction algorithm that provides an individualised estimate of cardiovascular risk and includes the independent contributions of ethnicity and deprivation. This permits identification of those individuals and groups likely to be most disadvantaged by use of existing treatment algorithms. Such patients include south Asian women, who would otherwise be less likely to be identified. This information will, if acted on, help to reduce health inequalities.
Secondly, it extends and improves on our original equation for cardiovascular risk12 by incorporating important additional clinical conditions (such as rheumatoid arthritis, chronic kidney disease, and atrial fibrillation), allowing more accurate quantification of risks for individual patients. This information should be considered in the context of specific treatment guidelines. Knowledge of cardiovascular risk might be useful in assessing response efficacy and concordance with recommended healthcare interventions for these specific conditions.
Thirdly, it also allows better quantification of risk of cardiovascular disease for patients with type 2 diabetes, which is especially prevalent among south Asian patients. Though there are alternative cardiovascular risk algorithms for patients with diabetes,18 48 none is based on a large nationally representative primary care cohort, has large numbers of incident events, and also simultaneously takes account of other important risk factors such as deprivation and ethnicity. Although current guidelines might indicate statins for people with diabetes, knowledge of cardiovascular risk can be useful in helping to identify patients at particularly low risk for whom a statin might not be needed.
Strengths and limitations
We included more sophisticated modelling of the effect of age on risk factors, which results in greater weighting of some risk factors in younger patients, such as smoking status, family history of coronary heart disease, type 2 diabetes, systolic blood pressure, treated hypertension, BMI, deprivation, and atrial fibrillation. This also has the effect that in people without the risk factors the increase in risk with age will be steeper than with QRISK1. The inclusion of patients with type 2 diabetes in the main study population will have tended to increase the overall level of risk in the study population and this will also have tended to increase the risk for an individual, as can be seen from the hazard ratios (table 5).
We updated the analysis to include data until March 2008, increasing the number of patients with at least 10 years of follow-up data to almost 440 000 patients. We have furthermore included the linked cause of death as recorded by the Office for National Statistics (ONS). Death linkage increased cases of cardiovascular disease by about 7% across the entire study period, as the data from ONS were not available for the full study period at the time of the original study.27
We used self assigned ethnicity as reported by the patient to their general practice; this has advantages over analyses where ethnicity is assigned by an informant rather than the patient or is imputed geographically or is related to country of birth. The latter is particularly problematic with increasing numbers of people from ethnic minorities now being born in the UK.49 We also disaggregated the south Asian groups and reported on them separately, which addresses concerns with studies that tend to combine them into one group when there are differences in exposure to risk factors and rates and outcomes of diseases.25 Though only a quarter of patients had self assigned ethnicity recorded, we think it is reasonable to assume that where patients have self assigned ethnicity recorded as Bangladeshi (for example) that this is accurate and the patient was indeed Bangladeshi. Misclassification would most affect the reference category of “white or not recorded,” but because of the mix of the populations of England and Wales less than 10% of such patients were probably from a non-white ethnic group. This misclassification would therefore, if anything, tend to underestimate the relative effect of ethnicity on cardiovascular risk.
Just fewer than 3% of our total sample were classified as belonging to a minority ethnic group compared with the national proportion in this age group of 6.6% (based on projections for 200650). The comparison, however, is not “like for like” as national estimates are for 2006 and migration patterns and population demographics have probably changed over the 15 year period of our study. None the less, the lower percentage of patients from minority groups raises concerns about the possible under-representativeness of practices from ethnically diverse inner city areas or misclassification error, or both. We think under-representativeness of practices from ethnically diverse areas is unlikely as QRESEARCH practices are drawn from across England and Wales and have been shown to be similar to practices nationally for a range of measures.26 In fact, QRESEARCH has proportionately more practices in areas of greater ethnic diversity such as the East Midlands, Yorkshire, and Humberside (fig 4). Also, table 1 shows that among patients from both cohorts, when ethnicity was recorded 11.7% were from a minority group. This is higher than from census estimates for 2006, indicating either over-representation of practices from ethnically diverse areas or that practices in ethnically diverse areas are more likely to record ethnicity, or both. Therefore, the apparent under-representation of people from black and minority ethnic groups has probably arisen because we combined the not recorded and the white groups. This combined group will contain additional patients from groups classified as other than white. This would, if non-differential, result in a bias towards the null hypothesis of no difference in risk between ethnic groups. The net consequence of this would be, if anything, to underestimate hazard ratios in the minority ethnic populations in question rather than generate spurious associations.
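The direction of this bias can be illustrated with a short worked calculation. The sketch below is not part of the study's analysis: it assumes a simple ratio of event rates rather than a fitted survival model, and the function name `observed_rate_ratio`, the true ratio of 1.5, and the misclassified fractions are illustrative assumptions only.

```python
def observed_rate_ratio(true_ratio: float, misclassified_fraction: float) -> float:
    """Rate ratio observed when the reference group is contaminated.

    true_ratio: event rate in the minority group divided by the rate in a
        purely white reference group (hypothetical value).
    misclassified_fraction: share of the "white or not recorded" group who
        actually belong to the minority group (hypothetical value).
    """
    if not 0.0 <= misclassified_fraction <= 1.0:
        raise ValueError("misclassified_fraction must lie between 0 and 1")
    # With the white rate scaled to 1, the contaminated reference rate is a
    # weighted average of the two true rates.
    reference_rate = (1.0 - misclassified_fraction) + misclassified_fraction * true_ratio
    return true_ratio / reference_rate


if __name__ == "__main__":
    for fraction in (0.0, 0.05, 0.10):
        ratio = observed_rate_ratio(1.5, fraction)
        print(f"{fraction:.0%} misclassified: observed ratio {ratio:.3f}")
```

Under these assumptions, whenever the true ratio is above 1 the observed ratio moves towards 1 as the misclassified fraction grows, which is the attenuation towards the null described above.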
With a number of policy and legislative drivers co-aligning, ethnicity coding is likely to improve exponentially in the UK, and this evolving picture will therefore allow us to continue to monitor the impact of incorporating more complete ethnicity data into our models. But for the present, even though it is imperfect, incorporating ethnicity into our disease risk algorithm has, we believe, clearly been an important advance in understanding risk of disease in ethnically diverse populations. Furthermore, it is unlikely that a better estimate could be obtained for England and Wales given the difficulties of assembling a sufficiently large prospective cohort for follow-up over 10 or more years.
Another potential limitation of our study is that we have assumed that the absence of a recorded diagnosis of diabetes (or family history, for example) is equivalent to the person not having that factor. This is probably valid for diabetes as there have been consistent efforts in general practice over the past 15 years to develop and validate diabetes registers (including comparisons against prescribed medication for diabetes), though we accept there will additionally be large numbers of cases not yet diagnosed by clinicians. Recording of family history is less systematic in primary care and might be more susceptible to recording bias. As recording of risk factors becomes more complete over time, then better estimates of the relevant hazard ratios will be possible.
Also relevant is that we have calculated 95% confidence intervals around the QRISK2 scores to give a better idea of precision. We have improved on the method for validation by using multiple imputation for missing values in the validation set rather than mean values by age and sex derived from the derivation dataset as in our original study and independent validation.12 27 One important limitation, though, is that while we have validated the results in a physically discrete group of practices, these practices all use the same EMIS clinical system and hence there is a potential “home advantage” that might reduce the generalisability to other systems, although, conversely, it is ideally suited for use in the EMIS system. In other words, any comparison done in the one third sample of practices in QRESEARCH will tend to favour QRISK2 compared with other prognostic scores. Our previous study27 was additionally validated in a database (THIN, “The Health Improvement Network”) derived from a set of practices using a different clinical system (In Practice Systems) and gave similar results (apart from the prevalence of family history, which was lower in the THIN database). This suggests that our findings are probably generalisable to the 20% of practices in England and Wales that use In Practice Systems in addition to the 60% of practices that already use the EMIS clinical system from which the equation is derived. Further validation of QRISK2 is not currently possible on the THIN database as the database does not have the linked ONS death certificate data and recording of ethnicity is too low (personal communication, THIN, 2008). The validation we have presented constitutes the best currently possible given the extent and nature of comparable datasets. The results should generalise to at least 80% of practices nationally.
None the less, it is important that QRISK2 is validated by another team on external populations and an international version of QRISK2 is being developed to allow this and will be reported in due course. In particular, we are working with another primary care database (THIN) to link their data to ONS death certificate data so that this can be used as a data source for further validation. Ethnicity recording could be improved on primary care databases by linkage of individual level data on self assigned ethnicity from the 2001 census, and this will be undertaken and reported, assuming access to these data is granted.
Comparisons with the modified Framingham score
This study improves on our original equation for cardiovascular risk in terms of its potential application as outlined above and also because the more complex model has slightly better discrimination (that is, greater ability to separate patients at high and low risk) than our original model. The QRISK1 equation improved on other equations in use in the UK by including additional readily available risk factors such as deprivation, family history, BMI, and blood pressure treatment. With QRISK2, the improvement in discrimination and calibration compared with the modified Framingham score remains significant, although this is probably partly because the modelling was undertaken on a more contemporaneous population from England and Wales and we used a more sophisticated approach for modelling and included additional variables. We have not compared QRISK2 with the most recently published Framingham score as this uses a much broader definition of cardiovascular disease that is less relevant to UK guidelines.51 QRISK2 seems to improve on the Framingham score based Ethrisk,19 perhaps because of its greater precision, larger sample, and prospective study design.
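To make the idea of discrimination concrete, the sketch below computes a simple pairwise concordance for binary outcomes: the proportion of (case, non-case) pairs in which the person who had an event was given the higher predicted risk. This is an illustration only, and not necessarily the statistic used in this study, which analysed time to event data; the function name `concordance` and the example values in the comments are assumptions.

```python
from itertools import product
from typing import Sequence


def concordance(risks: Sequence[float], events: Sequence[bool]) -> float:
    """Share of case/non-case pairs ranked correctly by predicted risk.

    A value of 1.0 means every person with an event had a higher predicted
    risk than every person without; 0.5 is no better than chance.
    """
    if len(risks) != len(events):
        raise ValueError("risks and events must have the same length")
    cases = [r for r, e in zip(risks, events) if e]
    non_cases = [r for r, e in zip(risks, events) if not e]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for case_risk, other_risk in product(cases, non_cases):
        if case_risk > other_risk:
            score += 1.0   # pair ranked correctly
        elif case_risk == other_risk:
            score += 0.5   # tie counts as half
    return score / (len(cases) * len(non_cases))
```

Calibration is a separate property: a score can rank patients well while still predicting risks that are systematically too high or too low.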
In contrast to our previous study, we compared QRISK2 with the modified Framingham risk score recently recommended by NICE. The modified score, in common with the risk equation advocated by the Joint British Societies, involves summing risks from two risk equations for coronary heart disease and stroke, which is mathematically incorrect because these are not independent outcomes and therefore will give an invalid result. This addition of the two separate and non-independent risks results in some patients having an estimated risk of more than 100% and would also result in overestimation of risk for other individuals at lower estimates of risk. This might have accounted for some of the overprediction. The inflation factors of 1.4 for south Asian men and 1.5 for those with a family history of coronary heart disease, which have been developed by consensus rather than a mathematical model based on individual patient data, might also have accounted for some of the overprediction, although this was still present on our previous analysis where the inflation factors had not been applied.12 27
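A short numerical sketch shows why adding two separately estimated risks is problematic. The probability of at least one of two events equals the sum of the two probabilities minus the probability of both occurring, so the plain sum can only equal or exceed the true combined risk and is not capped at 100%. The figures below are invented for illustration and are not outputs of the Framingham equations; the function names are hypothetical.

```python
from typing import Tuple


def summed_risk(p_chd: float, p_stroke: float) -> float:
    """Naive combined risk: add the two separately estimated risks."""
    return p_chd + p_stroke


def risk_bounds(p_chd: float, p_stroke: float) -> Tuple[float, float]:
    """Range in which the true risk of at least one event must lie.

    The lower bound applies when one outcome always accompanies the other;
    the upper bound applies only when the two outcomes never occur in the
    same person.
    """
    lower = max(p_chd, p_stroke)
    upper = min(1.0, p_chd + p_stroke)
    return lower, upper


if __name__ == "__main__":
    for p_chd, p_stroke in [(0.65, 0.45), (0.10, 0.05)]:
        low, high = risk_bounds(p_chd, p_stroke)
        print(f"CHD {p_chd:.2f}, stroke {p_stroke:.2f}: "
              f"sum {summed_risk(p_chd, p_stroke):.2f}, "
              f"true combined risk between {low:.2f} and {high:.2f}")
```

In the first invented example the sum exceeds 1; in the second it stays below 1 but still sits at the very top of the possible range, which corresponds to the overestimation at lower levels of risk described above.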
Comparisons with the literature
We found substantial heterogeneity between risk factors within south Asian populations and our prevalence figures for risk factors are comparable with the literature,19 20 which increases the face validity of our findings. For example, as others have found, Bangladeshi men have higher rates of smoking but lower mean systolic blood pressure levels than Pakistani or Indian men.20 Indian and Pakistani men and women have higher mean BMI than Bangladeshis.20 Prevalence of type 2 diabetes was higher in Bangladeshis and Pakistanis than Indians.20 Similarly, cholesterol/HDL ratio was higher among each of the south Asian groups compared with the white reference category.20 Our findings also confirm Nazroo’s observations52 and the findings of the Whitehall II study53 of the independent effects of both ethnicity and deprivation. Overall, the results of our study add to a growing body of evidence that combining people of south Asian origin into one category is potentially misleading.
The magnitude of the increased cardiovascular risk among south Asians compared with white patients seems to be higher than the 40% previously thought in the absence of prospective incidence data.22 24 For example, in our study, compared with the white reference group the adjusted risk is 45% higher (29% to 63%) among Indian men, 67% higher (40% to 101%) among Bangladeshi men, and 97% higher (70% to 129%) among Pakistani men, even after adjustment for multiple confounders including deprivation and diabetes. Similarly, the adjusted risks for Indian, Pakistani, and Bangladeshi women are all increased compared with the white reference population. Our results also suggest that the increased cardiovascular risks observed for Pakistani men are significantly higher than those for Indian men. The difference between these two groups for women is similar, although of borderline significance when a direct comparison is made, probably because of a lack of power.
There were also differences in the proportion of events that were stroke or transient ischaemic attacks rather than coronary heart disease. For example, a high proportion of first events among black Caribbean and black Africans was stroke or transient ischaemic attacks, which is consistent with the literature.54 55 Other studies have found differences in mortality between different ethnic groups, such as the unexplained persistent higher mortality among Bangladeshis.56 This deserves further study as to the underlying causes and potential missed opportunities for care.
QRISK2 has been designed to estimate cardiovascular risk for an entire population of patients in primary care by using data already collected within the patient’s electronic health record and by using default values for body mass index, cholesterol concentration, and systolic blood pressure where these data have not been recorded in the past five years. Computer generated risk scores have been integrated within routine clinical use of computers in UK primary care for the past 10 years, and, with QRISK2 embedded within computer applications, a rank ordered recall list can be generated so that those at greatest clinical need can be recalled first. Once such patients have been recalled, the individual can have a full clinical cardiovascular check to calculate an actual QRISK2 based on the most up to date data that are then used to guide decisions about treatment.
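The recall workflow described above can be sketched as follows. This is a minimal illustration and not the EMIS implementation: the record fields, the default values, and the `estimate_risk` callable are hypothetical placeholders, a value of `None` stands for "not recorded in the past five years", and the toy scoring function mentioned in the usage note is not the QRISK2 equation.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class PatientRecord:
    patient_id: str
    bmi: Optional[float] = None             # None = no recent recording
    chol_hdl_ratio: Optional[float] = None
    systolic_bp: Optional[float] = None


# Hypothetical default values; a real system would supply its own.
DEFAULTS = {"bmi": 26.0, "chol_hdl_ratio": 4.0, "systolic_bp": 130.0}


def fill_defaults(record: PatientRecord) -> PatientRecord:
    """Return a copy with default values substituted for missing fields."""
    missing = {name: value for name, value in DEFAULTS.items()
               if getattr(record, name) is None}
    return replace(record, **missing)


def recall_list(records: List[PatientRecord],
                estimate_risk: Callable[[PatientRecord], float]) -> List[str]:
    """Patient identifiers ordered from highest to lowest estimated risk."""
    scored = [(estimate_risk(fill_defaults(r)), r.patient_id) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [patient_id for _, patient_id in scored]
```

Any function that maps a completed record to a number can be passed as `estimate_risk`, for example a toy score such as `lambda r: 0.001 * r.systolic_bp + 0.01 * r.chol_hdl_ratio + 0.002 * r.bmi`. The estimated scores only set the order of recall; the actual score is then recalculated from measurements taken at the clinical check.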
The only item in QRISK2 that is not already routinely collected and recorded electronically is the Townsend deprivation score, which is linked to an individual postcode. This score has already been integrated into the EMIS clinical system and linked to the records of over 32 million patients. The mapping of postcode to deprivation score will also be made available, together with the supporting reference tables and algorithm itself. QRISK2 can then be integrated within clinical management systems so that it can be used on an ongoing basis to generate an estimated score based on existing data. QRISK2 will be updated as improved analytical techniques are developed for application to the QRESEARCH database. QRISK will evolve as data quality and completeness improve and population characteristics change (obesity is increasing, while incidence of cardiovascular disease is falling, for example). This will ensure that future versions of QRISK remain well calibrated to the population of England and Wales and make best use of technical developments. Lastly, the NHS’ electronic health record (NHS Care Record Service) is central to the NHS Connecting for Health’s national programme for information technology and this will, within a relatively short space of time, result in electronic health records replacing paper based records in hospitals in England.57 The plan is for these eventually to incorporate computerised decision support tools and so this will allow disease risk algorithms such as QRISK2 to be largely automatically populated with routine electronically coded data as is already possible in primary care in the UK.
These estimates, like any predictive score, are an aid to, but not a replacement for, judgment in individual clinical circumstances. We have specifically identified atrial fibrillation and rheumatoid arthritis for consideration as both are known to be associated with increased risk31 32 58 59 and knowledge of them might inform clinical management for an individual patient. We recognise that the likely age and comorbidity of these individuals, however, might place them at high risk of cardiovascular disease and therefore not appropriate for a primary prevention tool such as QRISK2. Nevertheless, if we had omitted rheumatoid arthritis and atrial fibrillation, the effect would be to underestimate risk for individuals with either of these two conditions who did not yet have concurrent cardiovascular disease. The prevalence of rheumatoid arthritis and atrial fibrillation is low so this will have a minimal impact on the overall precision of the model or its application at a population level, but we believe the additional complexity of the model is justified as no additional data entry will be required from most users, while it also provides relevant information to the individual patient with either or both of these conditions and their clinicians.
QRISK2 provides a mechanism for estimating absolute risk among individuals. Use of this information, however, should be tightly coupled with suitable guidelines. There are some patients in whom a QRISK2 score should not be calculated, including those with pre-existing cardiovascular disease (who we excluded from this study). Risk estimation should not be used for people with conditions such as peripheral vascular disease, heart failure, familial hypercholesterolaemia, or other conditions not specifically identified in the algorithm that are known to be associated with high risks of cardiovascular events.5 We have not added further to the exclusions in this dataset as to do so would have added complexity with no appreciable gain in precision for people in whom we do not recommend the use of this score.
Clinical impacts and health inequalities
A risk prediction algorithm that does not include deprivation or ethnicity is likely to result in the inequitable definition of risk for affluent and deprived communities and also substantially underestimate the risk in south Asian people, especially women, in whom, like men, it is the commonest cause of premature death. Primary prevention programmes that do not take these variables into account risk exacerbating rather than reducing existing health inequalities,6 7 8 especially as the evidence suggests that health inequalities naturally widen at the start of new health initiatives.21 Other research highlights additional difficulties with accessing effective health promotion, including lack of risk awareness, influences of culture and lifestyle, time restrictions, and language difficulties60 and this needs to be addressed once patients have been identified to improve clinical outcomes.
The QRISK2 algorithm, like its predecessor, has better calibration and is a better discriminator of risk of cardiovascular disease than the modified Framingham score. A major advantage of QRISK2 is the ability of the algorithm to be updated as population demographics, ethnic composition, prevalence of risk factors, and incidence of cardiovascular disease change. It also demonstrates the utility of linked electronic data for research to develop tools that can help doctors to make better decisions. The marked gradient with deprivation has already been demonstrated with QRISK1. The further identification of ethnicity as an independent factor additional to deprivation is an important consideration, particularly for south Asian women at high risk. A broader range of important clinical conditions included in QRISK2 but not in the modified Framingham score make it a more clinically relevant tool. Highlighting risks of conditions including type 2 diabetes and chronic renal disease supports further integration of vascular strategies and informs individual assessment.
The modified Framingham score underestimates risk in south Asian women. Like the earlier version, QRISK2 includes BMI and treatment for hypertension, neither of which are included in the Framingham score; in QRISK2, family history contributes an important additional weighting particularly at younger ages. The clinical relevance, superior performance, and equitable assignment of QRISK2 make it an appropriate tool to assist in the delivery of public health programmes that recognise the broader determinants of cardiovascular health, such as ethnicity and deprivation. This has particular relevance to equity of delivery of health care to the UK’s south Asian communities and might help to reduce widening health inequalities.
What is already known on this topic
A 10 year cardiovascular disease risk threshold of 20% is recommended for intervention with statins for the primary prevention of cardiovascular disease
Current algorithms for risk of cardiovascular disease do not adequately account for the combined effect of socioeconomic status and ethnicity, leading to an underestimate of risk in high risk populations that might potentially exacerbate existing health inequalities
What this study adds
Compared with a white reference population, there is a substantially increased risk of cardiovascular disease in south Asian men and women that is independent of social deprivation, diabetes, and family history
The results of the calibration and discrimination statistics for QRISK2 were significantly better than those for the modified Framingham score in the validation sample
At the 10 year risk threshold of 20%, the population identified by QRISK2 was at higher risk of a CV event than the population identified by the modified algorithm
We acknowledge the contribution of David Stables (EMIS) and EMIS practices contributing to the QRESEARCH database. In particular we acknowledge his contribution in linking the ONS death certificate data to individual records held within EMIS clinical systems so that it could be extracted on to the QResearch database and used for this project. We thank Aneez Esmail (University of Manchester), Ruthie Birger and Chris Millett (Imperial College London), and Nadeem Qureshi (University of Nottingham) for ethnicity coding.
Contributors: JH-C initiated and designed the study, obtained approvals, prepared the data, undertook the analysis and interpretation, and wrote the first draft paper. CC and YV contributed to the development of the protocol, design, and analysis and interpretation and drafting of the paper. CC also undertook some of the primary analyses with JHC. JR and PB contributed to the conception, design, analysis, interpretation, and drafting of article and approved the final draft. RM and AS contributed to suggestions for analysis, drafting, interpretation, and approved the final draft. JH-C is the guarantor.
Funding: No external funding. The authors were funded as part of their clinical or academic positions and meeting expenses were met by the University of Nottingham.
Competing interests: JR chaired and PB and RM were members of the NICE guideline development group on cardiovascular risk assessment. JHC is codirector of QRESEARCH, a not for profit organisation that is a joint partnership between the University of Nottingham and EMIS. EMIS is the leading commercial supplier of IT systems for 56% of general practices in England and Wales and it is likely to implement QRISK2 into its clinical management system. EMIS is likely to also distribute the software package for those using it for academic research or other organisations interested in implementing QRISK2 into practice (www.qresearch.org/Public/qriskInformationforClinicians.aspx). RM is a 2008 Harkness Fellow in healthcare policy and practice and is the chair of the cardiovascular working group of the South Asian Health Foundation (SAHF), which receives unrestricted funding from the Department of Health and BHF and unrestricted grants from the pharmaceutical industry. AS chairs the equality and diversity forum of the National Clinical Assessment Service. AS is PI on NHS Connecting for Health’s evaluation of the implementation of the NHS Care Record Service. QRESEARCH undertakes analyses for the Department of Health and other government organisations.
Ethical approval: Trent multicentre research ethics committee.
Provenance and peer review: Not commissioned; externally peer reviewed.