Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study

Objectives To develop and validate updated QRISK3 prediction algorithms to estimate the 10 year risk of cardiovascular disease in women and men accounting for potential new risk factors. Design Prospective open cohort study. Setting General practices in England providing data for the QResearch database. Participants 1309 QResearch general practices in England: 981 practices were used to develop the scores and a separate set of 328 practices were used to validate the scores. 7.89 million patients aged 25-84 years were in the derivation cohort and 2.67 million patients in the validation cohort. Patients were free of cardiovascular disease and not prescribed statins at baseline. Methods Cox proportional hazards models were used in the derivation cohort to derive separate risk equations in men and women for evaluation at 10 years. Risk factors considered included those already in QRISK2 (age, ethnicity, deprivation, systolic blood pressure, body mass index, total cholesterol: high density lipoprotein cholesterol ratio, smoking, family history of coronary heart disease in a first degree relative aged less than 60 years, type 1 diabetes, type 2 diabetes, treated hypertension, rheumatoid arthritis, atrial fibrillation, chronic kidney disease (stage 4 or 5)) and new risk factors (chronic kidney disease (stage 3, 4, or 5), a measure of systolic blood pressure variability (standard deviation of repeated measures), migraine, corticosteroids, systemic lupus erythematosus (SLE), atypical antipsychotics, severe mental illness, and HIV/AIDS). We also considered erectile dysfunction diagnosis or treatment in men. Measures of calibration and discrimination were determined in the validation cohort for men and women separately and for individual subgroups by age group, ethnicity, and baseline disease status. Main outcome measures Incident cardiovascular disease recorded on any of the following three linked data sources: general practice, mortality, or hospital admission records.
Results 363 565 incident cases of cardiovascular disease were identified in the derivation cohort during follow-up arising from 50.8 million person years of observation. All new risk factors considered met the model inclusion criteria except for HIV/AIDS, which was not statistically significant. The models had good calibration and high levels of explained variation and discrimination. In women, the algorithm explained 59.6% of the variation in time to diagnosis of cardiovascular disease (R², with higher values indicating more explained variation), the D statistic was 2.48, and Harrell’s C statistic was 0.88 (both measures of discrimination, with higher values indicating better discrimination). The corresponding values for men were 54.8%, 2.26, and 0.86. Overall performance of the updated QRISK3 algorithms was similar to that of the QRISK2 algorithms. Conclusion Updated QRISK3 risk prediction models were developed and validated. The inclusion of additional clinical variables in QRISK3 (chronic kidney disease, a measure of systolic blood pressure variability (standard deviation of repeated measures), migraine, corticosteroids, SLE, atypical antipsychotics, severe mental illness, and erectile dysfunction) can help doctors identify those at most risk of heart disease and stroke.


Introduction
The first QRISK model to estimate 10 year risk of cardiovascular disease was published in 2007. 1 It was followed by an updated model (QRISK2) in 2008, which included ethnic origin and additional risk factors (type 2 diabetes, rheumatoid arthritis, atrial fibrillation, and chronic renal disease). Since then, QRISK2 has been updated annually and recalibrated to the latest version of the QResearch database. 2 The age range across which it applies has also been extended from 35-74 years to 25-84 years, type 1 diabetes has been included as a separate variable, smoking is assessed at five levels instead of two, and the Townsend score has been updated using the most recent values from the 2011 census. These updates help to ensure that the algorithms reflect changes in population characteristics (such as changes in the prevalence of smoking, body mass index, or the declining incidence of cardiovascular disease) and improvements in data quality (such as improved recording of risk factors and data linkage to Hospital Episode Statistics, 3 which has increased ascertainment of cardiovascular events 4 ). The QRISK algorithms have been validated by ourselves and others in independent groups of patients using UK primary care databases such as QResearch, 4 Clinical Practice Research Datalink (CPRD), 4 and The Health Improvement Network (THIN), [5][6][7][8][9] in clinical cohorts, [10][11][12] and in international populations. 13 14 Their use has been evaluated in observational studies, 15 cost effectiveness evaluations, 16 and clinical trials. 17 18 QRISK2 is now used across England's health service (NHS England) and recommended in the NHS Quality and Outcomes Framework, 19 guidance from the National Institute for Health and Care Excellence, 20 and NHS Health Check. 21 QRISK2 is also used in occupational health settings and internationally, with over two million hits on the QRISK website (www.qrisk.org).
A new NICE guideline on lipid modification and cardiovascular risk assessment was published in 2014. 20 This guideline highlighted a number of conditions associated with increased cardiovascular risk that may not be fully captured by QRISK2, including HIV/AIDS, stage 3 kidney disease, systemic lupus erythematosus (SLE), severe mental illness, and use of atypical antipsychotics or corticosteroids. 20 These conditions are not specifically identified within QRISK2, which may result in underestimation of risk in the relevant patient groups. In addition, recently published research has highlighted increased cardiovascular risk and potential prognostic importance for erectile dysfunction, [22][23][24] migraine, 25 and blood pressure variability. 26 We therefore derived and validated a new version of the algorithms, QRISK3, to determine whether these factors should be incorporated into the algorithms to improve estimation of cardiovascular risk for these patients.

What is already known on this topic
Methods are needed to identify patients at increased risk of cardiovascular disease (CVD) for whom interventions or more frequent assessment may be required.
QRISK2 algorithms are widely used to estimate the 10 year risk of CVD in people aged 25-84 years, using information that is recorded in primary care electronic records and that patients can also provide themselves.

What this study adds
Updated algorithms (QRISK3) quantify the absolute risk of CVD in people aged 25-84 years and include established and new risk factors.
New factors are an expanded definition of chronic kidney disease (stage 3, 4, or 5), migraine, corticosteroid use, systemic lupus erythematosus, atypical antipsychotic use, severe mental illness, erectile dysfunction, and a measure of blood pressure variability (standard deviation of repeated measures).
The updated risk algorithms provide valid measures of absolute risk in the general population of patients, as shown by their performance in a separate validation cohort.
Methods
Study design and data source
Using the QResearch database (version 41) we undertook a cohort study in a large population of primary care patients. We included all practices in England that had been using the EMIS computer system for at least one year and randomly allocated three quarters of practices to the derivation dataset and the remainder to a validation dataset. We identified an open cohort of patients aged 25-84 years registered with the practices between 1 January 1998 and 31 December 2015. Patients were excluded if they had no postcode related Townsend score (usually because they had moved to newly built houses whose postcodes were not yet linked to deprivation data, or because they were homeless or had no permanent residence), had pre-existing cardiovascular disease (on general practice records or linked hospital records), or were prescribed statins at cohort entry. We determined an entry date to the cohort for each patient, which was the latest of the following: 25th birthday, date of registration with the practice plus one year, date on which the practice computer system was installed plus one year, or the study start date (1 January 1998). Patients were followed up until the earliest of the following: date of diagnosis of cardiovascular disease, death, deregistration with the practice, last upload of computerised data, or the study end date (31 December 2015).
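The entry and follow-up rules above amount to taking a maximum and a minimum over candidate dates. The Python sketch below illustrates this under stated assumptions: the function and argument names are hypothetical, a 29 February date is moved to 28 February when whole years are added, and a cardiovascular diagnosis on the same day as another exit reason is counted as an event. It is not the QResearch extraction code.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_START = date(1998, 1, 1)
STUDY_END = date(2015, 12, 31)


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February falls back to 28 February (assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def cohort_entry(birth: date, registration: date, system_installed: date) -> date:
    """Entry date: the latest of the four candidate dates described in the text."""
    return max(
        add_years(birth, 25),            # 25th birthday
        add_years(registration, 1),      # registration with the practice plus one year
        add_years(system_installed, 1),  # practice computer system installed plus one year
        STUDY_START,                     # study start date
    )


def cohort_exit(cvd_diagnosis: Optional[date], death: Optional[date],
                deregistration: Optional[date], last_upload: date) -> Tuple[date, bool]:
    """Exit date: the earliest candidate date; the flag is True when the exit is a CVD event."""
    others = [d for d in (death, deregistration, last_upload, STUDY_END) if d is not None]
    censor = min(others)
    if cvd_diagnosis is not None and cvd_diagnosis <= censor:
        return cvd_diagnosis, True
    return censor, False
```

For example, a patient born on 15 June 1980 who registered in March 2004 enters the cohort on their 25th birthday, because that is the latest of the candidate dates.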

Outcomes
Our outcome was cardiovascular disease, which was defined as a composite outcome of coronary heart disease, ischaemic stroke, or transient ischaemic attack. The QResearch database is linked at individual patient level to hospital admissions data (Hospital Episode Statistics), and mortality records obtained from the Office for National Statistics. The records are linked using a pseudonymised NHS number specific to the QResearch database. The recording of NHS numbers is valid and complete for 99.8% of patients with data on QResearch, 99.9% for ONS mortality records, and 98% for hospital admissions records. 3 27 We classified patients as having cardiovascular disease if there was a record of the relevant clinical code in either their general practice record, their linked hospital record, or their linked mortality record. We used Read codes to identify cardiovascular disease cases from the general practice record. The Read codes are listed in table 1 of the web appendix. We used ICD-10 (international classification of diseases, 10th revision) clinical codes to identify cases from hospital and mortality records except for the three years between 1 January 1998 and 31 December 2000, when ICD-9 was in use for mortality records. The ICD-10 codes used were G45 (transient ischaemic attack and related syndromes), I20 (angina pectoris), I21 (acute myocardial infarction), I22 (subsequent myocardial infarction), I23 (complications after myocardial infarction), I24 (other acute ischaemic heart disease), I25 (chronic ischaemic heart disease), I63 (cerebral infarction), and I64 (stroke not specified as haemorrhage or infarction). The corresponding ICD-9 codes used were 410, 411, 412, 413, 414, 434, and 436. General practice and linked mortality and Hospital Episode Statistics data were available until 31 December 2015. We used the earliest recorded date of cardiovascular disease on any of the three data sources as the outcome date.
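The outcome definition can be expressed as a code lookup followed by taking the earliest qualifying date across sources. The sketch below uses the ICD-10 and ICD-9 categories listed above. It assumes, for illustration only, that general practice events have already been matched against the Read code list in the web appendix (not reproduced here), that codes are matched on their first three characters, and that hospital records are ICD-10 coded throughout. All names are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

ICD10_CVD = {"G45", "I20", "I21", "I22", "I23", "I24", "I25", "I63", "I64"}
ICD9_CVD = {"410", "411", "412", "413", "414", "434", "436"}
ICD10_MORTALITY_FROM = date(2001, 1, 1)  # ICD-9 used for mortality records 1998-2000


def is_cvd_code(code: str, source: str, recorded: date) -> bool:
    """True if a hospital or mortality code falls in the CVD categories listed in the text."""
    category = code.replace(".", "").strip().upper()[:3]  # three character category
    if source == "mortality" and recorded < ICD10_MORTALITY_FROM:
        return category in ICD9_CVD
    return category in ICD10_CVD


def first_cvd_date(linked: Iterable[Tuple[str, str, date]],
                   gp_cvd_dates: Iterable[date]) -> Optional[date]:
    """Earliest CVD date across GP, hospital, and mortality sources (None if no event).

    linked: (source, code, date) tuples from hospital or mortality records.
    gp_cvd_dates: dates of GP records already matched to the Read code list.
    """
    dates = list(gp_cvd_dates)
    dates += [when for source, code, when in linked if is_cvd_code(code, source, when)]
    return min(dates) if dates else None
```

Haemorrhagic stroke codes such as I61 fall outside these categories and are therefore not counted.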

Predictor variables
We examined the predictor variables in box 1 based on established risk factors already included in the current version of QRISK2 and new candidate variables highlighted in the literature or National Institute for Health and Care Excellence guidelines.
From the general practice record we extracted data for demographic factors, clinical diagnoses, and clinical values. For clinical values (systolic blood pressure and body mass index) and smoking status we obtained the most recent values recorded before the baseline date. We selected the closest value to cohort entry for total cholesterol: high density lipoprotein cholesterol ratio, restricting values after the baseline date to those before the patient had a diagnosis of cardiovascular disease or was censored, and before any statin prescriptions. To assess variability in systolic blood pressure, we identified all systolic blood pressure values recorded in the five years before study entry and calculated the standard deviation where there were two or more recorded values. Use of drugs at baseline was defined as at least two prescriptions, with the most recent one no more than 28 days before the date of entry to the cohort. All other predictor variables were based on the latest information recorded in the general practice record before entry to the cohort.
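The two derived baseline variables described above, blood pressure variability and drug exposure, can be sketched as follows. Several details are assumptions not stated in the text: five years is approximated as 5 × 365 days, readings on the entry date itself are excluded, the sample standard deviation (n − 1 denominator) is used, and prescriptions after the entry date are ignored. Function names are hypothetical.

```python
from datetime import date, timedelta
from statistics import stdev
from typing import Iterable, Optional, Tuple

FIVE_YEARS = timedelta(days=5 * 365)  # assumption: five years approximated in days


def sbp_variability(readings: Iterable[Tuple[date, float]], entry: date) -> Optional[float]:
    """Sample SD of systolic BP readings in the five years before entry; None if fewer than two."""
    values = [mmhg for when, mmhg in readings if entry - FIVE_YEARS <= when < entry]
    return stdev(values) if len(values) >= 2 else None


def on_drug_at_baseline(prescribed_on: Iterable[date], entry: date) -> bool:
    """At least two prescriptions, the most recent no more than 28 days before entry."""
    before = sorted(d for d in prescribed_on if d <= entry)
    return len(before) >= 2 and (entry - before[-1]).days <= 28
```

Readings older than the five year window are simply dropped, so a patient with one recent reading and several old ones gets no variability value.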
Derivation and validation of the models
We developed and validated the risk prediction algorithms using established methods 1 5 8 10 28 and performed an initial analysis restricted to patients with complete data. For our main analyses, we used multiple imputation with chained equations to replace missing values for body mass index, systolic blood pressure, standard deviation of systolic blood pressure, serum cholesterol, high density lipoprotein cholesterol, and smoking status. [29][30][31][32] Continuous variables that were not normally distributed were log transformed for inclusion in the imputation model so that the imputed values would better match the distribution of observed values.
Five imputations were carried out as this has a relatively high efficiency 33 and was a pragmatic approach accounting for the size of the datasets and capacity of the available servers and software. In the imputation model we included all predictor variables, along with age interaction terms, the Nelson-Aalen estimator of the baseline cumulative hazard, and the outcome indicator. Cox's proportional hazards models were used to estimate the coefficients for each risk factor in women and men separately. We used Rubin's rules to combine the results across the imputed datasets. 34 Fractional polynomials 35 were used to model non-linear risk relations with continuous variables using data from patients with recorded values to derive the fractional polynomial terms. We fitted full models initially. For consistency, we included variables from existing QRISK2 models and then retained additional variables if they had an adjusted hazard ratio of less than 0.90 or greater than 1.10 (for binary variables) and were statistically significant at the 0.01 level. We developed three main models. Model A contains the same variables as the latest version of QRISK2-2017. Model B includes the additional variables that met our inclusion criteria but not the standard deviation of serial systolic blood pressure values. Model C is the same as model B except that it includes the standard deviation of serial systolic blood pressure values. We examined interactions between new predictor variables and age at study entry and included significant interactions in models B and C along with interactions already included in QRISK2.
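Rubin's rules and the inclusion criterion for binary candidate variables can each be written in a few lines. The sketch below pools per-imputation estimates (for example log hazard ratios) and their variances. The relative efficiency helper uses Rubin's approximation (1 + γ/m)⁻¹, where γ is the fraction of missing information and m the number of imputations, which is one way to see why five imputations can be adequate. Function names and example values are illustrative, not taken from the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m imputed estimates with Rubin's rules; returns (estimate, total variance)."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (n - 1 denominator)
    return q_bar, within + (1 + 1 / m) * between


def relative_efficiency(missing_info_fraction, m):
    """Rubin's approximate efficiency of m imputations relative to infinitely many."""
    return 1 / (1 + missing_info_fraction / m)


def retain_binary_variable(hazard_ratio, p_value):
    """Inclusion rule from the text: HR < 0.90 or > 1.10 and significant at the 0.01 level."""
    return (hazard_ratio < 0.90 or hazard_ratio > 1.10) and p_value < 0.01
```

With γ = 0.3, for instance, five imputations give an efficiency of 1/1.06, or about 94%.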
From the final models we used the regression coefficients for each variable as weights, which we combined with the baseline survivor function evaluated up to 15 years to derive risk equations over a period of 15 years of follow-up. 36 This enabled us to derive risk estimates for each year of follow-up, with a specific focus on 10 year risk estimates. We estimated the baseline survivor function based on zero values of centred continuous variables, with all binary predictor values set to zero.
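Combining the coefficients with the baseline survivor function gives the standard Cox model risk equation, risk(t) = 1 − S₀(t)^exp(Σ βᵢ(xᵢ − x̄ᵢ)), where S₀(t) is the baseline survivor function at zero values of the centred continuous predictors, with binary predictors set to zero. The Python sketch below evaluates this at a single time point. The coefficients, centring values, and baseline survival in the example are invented for illustration and are not QRISK3 values.

```python
import math


def absolute_risk(baseline_survival, coefficients, values, means):
    """Risk at time t: 1 - S0(t) ** exp(linear predictor).

    baseline_survival: S0(t) for the chosen horizon (e.g. 10 years).
    coefficients: {name: log hazard ratio}.
    values: {name: patient value}; binary predictors are coded 0/1.
    means: {name: centring value} for continuous predictors; names absent
           from `means` (binary predictors) are centred at 0.
    """
    linear_predictor = sum(
        beta * (values[name] - means.get(name, 0.0))
        for name, beta in coefficients.items()
    )
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical example: all numbers are made up for illustration.
coefs = {"sbp": 0.01, "smoker": 0.5}
centres = {"sbp": 125.0}
reference = absolute_risk(0.98, coefs, {"sbp": 125.0, "smoker": 0}, centres)
smoker = absolute_risk(0.98, coefs, {"sbp": 125.0, "smoker": 1}, centres)
```

A patient at the centring values with no binary risk factors has risk 1 − S₀(t), which is 2% in this toy example.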
Box 1
Risk factors already included in QRISK2
Age at study entry (baseline)
Ethnic origin (nine categories)
Deprivation (as measured by the Townsend score, where higher values indicate higher levels of material deprivation)
Systolic blood pressure
Body mass index
Total cholesterol: high density lipoprotein cholesterol ratio
Smoking status (non-smoker, former smoker, light smoker (1-9/day), moderate smoker (10-19/day), or heavy smoker (≥20/day))
Family history of coronary heart disease in a first degree relative aged less than 60 years
Diabetes (type 1, type 2, or no diabetes)
Treated hypertension (diagnosis of hypertension and treatment with at least one antihypertensive drug)
Rheumatoid arthritis (diagnosis of rheumatoid arthritis, Felty's syndrome, Caplan's syndrome, adult onset Still's disease, or inflammatory polyarthropathy not otherwise specified)
Atrial fibrillation (including atrial fibrillation, atrial flutter, and paroxysmal atrial fibrillation)
Chronic kidney disease (stage 4 or 5) and major chronic renal disease (including nephrotic syndrome, chronic glomerulonephritis, chronic pyelonephritis, renal dialysis, and renal transplant)

New or amended risk factors considered
Expanded definition of chronic kidney disease (to include general practitioner recorded diagnosis of chronic kidney disease stage 3 in addition to stages 4 and 5 as well as major chronic renal disease)
Measure of systolic blood pressure variability (standard deviation of repeated measures)
Diagnosis of migraine (including classic migraine, atypical migraine, abdominal migraine, cluster headaches, basilar migraine, hemiplegic migraine, and migraine with or without aura)
Corticosteroid use (British National Formulary (BNF) chapter 6.3.2 including oral or parenteral prednisolone, betamethasone, cortisone, depo-medrone, dexamethasone, deflazacort, efcortesol, hydrocortisone, methylprednisolone, or triamcinolone)
Systemic lupus erythematosus (including diagnosis of SLE, disseminated lupus erythematosus, or Libman-Sacks disease)
Second generation "atypical" antipsychotic use (including amisulpride, aripiprazole, clozapine, lurasidone, olanzapine, paliperidone, quetiapine, risperidone, sertindole, or zotepine)
Diagnosis of severe mental illness (including psychosis, schizophrenia, or bipolar affective disease)
Diagnosis of HIV or AIDS
Diagnosis of erectile dysfunction or treatment for erectile dysfunction (BNF chapter 7.4.5 including alprostadil, phosphodiesterase type 5 inhibitors, papaverine, or phentolamine)

Validation of the models
In the validation cohort we used multiple imputation to replace missing values for body mass index, systolic blood pressure, standard deviation of systolic blood pressure, serum cholesterol, high density lipoprotein cholesterol, and smoking status. We carried out five imputations. The risk equations for women and men obtained from the derivation cohort for models A, B, and C were applied to the validation cohort and measures of discrimination were calculated. As in previous studies, 4 we calculated R² values (explained variation, where higher values indicate that a greater proportion of the variation in time to cardiovascular disease diagnosis is explained by the model 37 ), the D statistic 38 (a measure of discrimination where higher values indicate better discrimination), and Harrell's C statistic at 10 years, and combined these across imputed datasets using Rubin's rules.
Harrell's C statistic 39 is a measure of discrimination that is similar to the area under a receiver operating characteristic curve but takes account of the censored nature of the data.
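Harrell's C can be computed directly from its definition by counting comparable and concordant pairs. Separately, the R² values reported in this paper appear consistent with Royston and Sauerbrei's R²_D, which depends on D alone: R²_D = (D²/κ²)/(π²/6 + D²/κ²) with κ² = 8/π. For example, D = 2.48 gives about 59.5% and D = 1.94 gives about 47.3%, matching the validation figures reported in the results. The sketch below is O(n²), skips pairs with tied follow-up times, and truncates follow-up at an optional horizon; these are simplifying assumptions, not the Stata implementation used in the study.

```python
import math


def harrell_c(times, events, risks, horizon=None):
    """Harrell's C for right-censored data by direct pair counting (O(n^2)).

    A pair is comparable when the subject with the shorter follow-up had an event;
    it is concordant when that subject also has the higher predicted risk, and
    ties in predicted risk count one half. Pairs with tied times are skipped.
    If `horizon` is given, follow-up is administratively censored at that time.
    """
    if horizon is not None:
        events = [bool(e) and t <= horizon for t, e in zip(times, events)]
        times = [min(t, horizon) for t in times]
    comparable = concordant = 0.0
    for t_i, e_i, r_i in zip(times, events, risks):
        if not e_i:
            continue
        for t_j, r_j in zip(times, risks):
            if t_i < t_j:
                comparable += 1
                if r_i > r_j:
                    concordant += 1
                elif r_i == r_j:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def r2_from_d(d):
    """Royston-Sauerbrei R^2_D implied by a D statistic (Cox model, kappa^2 = 8/pi)."""
    scaled = d ** 2 / (8 / math.pi)
    return scaled / (math.pi ** 2 / 6 + scaled)
```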
We assessed calibration by comparing the mean predicted risks at 10 years with the observed risks within each tenth of predicted risk. The observed risks were obtained using Kaplan-Meier estimates evaluated at 10 years. We also evaluated performance in each age group (<40, 40-59, ≥60 years), in each ethnic subgroup, and in each comorbidity and treatment subgroup. Performance was also evaluated by calculating Harrell's C statistics in individual general practices and combining the results using meta-analytical techniques, for comparison with a previous study of QRISK2. 9
Reclassification statistics
In line with current NICE guidelines, 20 we classified patients as being at high risk of cardiovascular disease if their 10 year risk was 10% or greater. We compared predicted risks for our final models (QRISK3) with the latest version of QRISK2-2017 to determine the percentage of patients who would be reclassified at this threshold according to each model. Among the reclassified patients we also calculated the observed risks of cardiovascular disease at 10 years using the Kaplan-Meier method.
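Reclassification at the 10% threshold reduces to cross-tabulating each patient's risk category under two models. The sketch below counts movements in each direction and expresses them as a percentage of the patients originally classified as high or low risk. The function name and toy data are hypothetical, and the Kaplan-Meier observed risk among reclassified patients is not included.

```python
def reclassify(risk_old, risk_new, threshold=0.10):
    """Count patients crossing `threshold` between two models' predicted risks."""
    old_high = [r >= threshold for r in risk_old]
    new_high = [r >= threshold for r in risk_new]
    high_to_low = sum(a and not b for a, b in zip(old_high, new_high))
    low_to_high = sum(b and not a for a, b in zip(old_high, new_high))
    n_high = sum(old_high)
    n_low = len(old_high) - n_high
    return {
        "high_to_low": high_to_low,
        "pct_of_old_high": 100 * high_to_low / n_high if n_high else 0.0,
        "low_to_high": low_to_high,
        "pct_of_old_low": 100 * low_to_high / n_low if n_low else 0.0,
    }
```

Note that the two percentages use different denominators, so a similar number of patients moving in each direction can give very different percentages when most patients are low risk.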
To maximise the power and generalisability of the results we used all the relevant patients on the database. STATA (version 14) was used for all analyses. The study adhered to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement for reporting. 40

Patient involvement
Over the past 10 years since the original publication of QRISK 1 there has been extensive discussion about methods for assessment of cardiovascular risk. This has included a series of public stakeholder consultations in relation to updates of NICE guidance on lipid modification, 20 the NHS Quality and Outcomes Framework, and NHS Health Check. 21 We therefore decided to focus on issues highlighted in NICE guidance and the literature rather than to consult patient or professional groups. We decided it would be more transparent and effective to discuss the addition of new variables once the paper was published and the relative contribution of individual risk factors had been quantified. Given the widespread implementation of QRISK2 across the NHS and its inclusion in guidelines, this would give time for feedback from a range of stakeholders (including patient groups and charities) as to which changes would be most beneficial and how improvements might be implemented.

Results
Study population
Overall, 1309 practices contributing to the QResearch database in England met our inclusion criteria. Of these, 981 were randomly assigned to the derivation dataset and the remainder (n=328) to a validation cohort. For the derivation cohort we identified 8 602 833 patients aged 25-84 years. We excluded 31 433 (0.4%) with no recorded Townsend score, 344 669 (4.0%) with a diagnosis of cardiovascular disease at baseline recorded on the general practice or Hospital Episode Statistics record, and 336 928 (3.9%) prescribed statins at baseline. Overall, 7 889 803 patients were included in the derivation analysis.
For the validation cohort we identified 2 918 082 patients aged 25-84 years. We excluded 13 862 (0.5%) with no recorded Townsend score, 118 057 (4.0%) with a diagnosis of cardiovascular disease recorded on the general practice or Hospital Episode Statistics record, and 114 865 (3.9%) prescribed statins at baseline. In total, 2 671 298 patients were included in the validation analysis.
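The exclusion counts for both cohorts can be checked with simple arithmetic. The sketch below recomputes the included totals and the rounded exclusion percentages from the figures given above; the function name is hypothetical.

```python
def cohort_flow(identified, exclusions):
    """Return (included count, {reason: % of identified, 1 decimal place})."""
    included = identified - sum(exclusions.values())
    shares = {reason: round(100 * n / identified, 1) for reason, n in exclusions.items()}
    return included, shares


derivation = cohort_flow(8_602_833, {"no Townsend score": 31_433,
                                     "CVD at baseline": 344_669,
                                     "statins at baseline": 336_928})
validation = cohort_flow(2_918_082, {"no Townsend score": 13_862,
                                     "CVD at baseline": 118_057,
                                     "statins at baseline": 114_865})
```

Both recomputed totals (7 889 803 and 2 671 298) agree with the numbers included in the analyses.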
Baseline characteristics
Table 1 shows the baseline characteristics of men and women in the derivation and validation cohorts. In the derivation cohort, self assigned ethnic origin was recorded for 64.9% of women and 59.7% of men, smoking status for 85.0% and 77.7%, respectively, systolic blood pressure for 82.8% and 68.3%, respectively, body mass index for 72.8% and 64.0%, respectively, and total cholesterol: high density lipoprotein cholesterol ratio for 39.8% and 37.9%, respectively. Complete information for smoking status, systolic blood pressure, body mass index, and total cholesterol: high density lipoprotein cholesterol ratio was available for 28.5% of women and 24.6% of men. At least two systolic blood pressure values, from which the standard deviations were calculated, were recorded for 77.7% of women and 64.0% of men. These values were similar to the corresponding values for both sexes in the validation cohort (table 1). Table 1 also shows comorbidities at study entry. For the new variables of interest, severe mental illness was recorded for 6.8% of women and 4.3% of men, migraine for 6.4% and 2.7%, respectively, chronic kidney disease (stage 3, 4, or 5) for 0.5% and 0.3%, respectively, prescriptions for atypical antipsychotics for 0.5% of both women and men, and prescriptions for corticosteroids for 2.4% and 1.5%, respectively; 2.3% of men had a diagnosis of or treatment for erectile dysfunction. SLE was recorded for 0.1% of women and less than 0.1% of men, and HIV/AIDS for 0.1% of women and 0.2% of men. The mean of the most recent systolic blood pressure values was 123.2 mm Hg in women and 129.2 mm Hg in men, and the mean of the standard deviations of repeated systolic blood pressure values was 9.3 mm Hg in women and 9.9 mm Hg in men.
Incidence rates of cardiovascular disease
Table 2 shows the numbers of patients with a new diagnosis of cardiovascular disease during follow-up by age group (five year intervals) in women and men in the derivation cohort, based on the linked general practice, hospital, and Office for National Statistics mortality records. In the derivation cohort, we identified 363 565 incident cases of cardiovascular disease arising from 50.8 million person years of observation. The incidence of cardiovascular disease increased steeply with age and was higher in men than in women in every age group. Table 2 in the web appendix shows a similar breakdown by nine ethnic groups. For example, 4758 events occurred in Indian women and men arising from 8 819 177 person years of observation and 417 events in Chinese women and men arising from 210 267 person years of observation. Table 3 in the web appendix shows the source of the data that first identified the incident event by type of event in the derivation cohort. It also shows the number and percentage of cases that were identified only using general practice data with no subsequent evidence of cardiovascular disease on hospital or mortality records.

Predictor variables
Table 3 shows the adjusted hazard ratios for women in the derivation cohort and table 4 shows the corresponding values for men. Of the new risk factors, all met our model inclusion criteria except for HIV/AIDS, which was associated with a 25% increased risk in women and a 17% increased risk in men, but these associations were not statistically significant at the 0.01 level. In tables 3 and 4, model A is the latest version of QRISK2 (2017), model B includes the additional variables that met our inclusion criteria, and model C is the same as model B except that it also includes the standard deviation of serial systolic blood pressure values.
The supplementary figure shows graphs of the adjusted hazard ratios for model B for the fractional polynomial terms for age and body mass index, as well as the interaction terms between age and relevant predictor variables listed in the footnotes of tables 3 and 4. For the new variables of interest in model B, migraine was associated with a 36% increased risk of cardiovascular disease in women and a 29% increased risk in men; corticosteroids with an 82% increased risk in women and a 58% increased risk in men; SLE with a 115% increased risk in women and a 55% increased risk in men; atypical antipsychotics with a 29% increased risk in women and a 15% increased risk in men; and severe mental illness with a 14% increased risk in women and a 13% increased risk in men. Erectile dysfunction was associated with a 25% increased risk in men. Where there were age interactions, these values relate to risks evaluated at the mean age. The full list of age interactions is shown in the footnotes to tables 3 and 4. For the new variables, there were statistically significant interactions between age and migraine and between age and corticosteroid use in both sexes. In women, there was also a statistically significant interaction between age and SLE, and in men, between age and erectile dysfunction. For each of these interactions, hazard ratios were higher at younger ages than at older ages, except for erectile dysfunction in men, where hazard ratios were highest at around 45 years of age and then declined gradually with increasing age.
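Because several new predictors interact with age, the hazard ratios quoted above apply at the mean age. A minimal sketch, assuming a single linear interaction with centred age on the log hazard scale (a simplification: the study also used fractional polynomial terms for age), shows how a hazard ratio of 1.36 at the mean age would vary at other ages. The interaction coefficient and mean age are hypothetical, and a single linear term cannot reproduce the rise and fall reported for erectile dysfunction.

```python
import math


def hazard_ratio_at_age(log_hr_at_mean, interaction_per_year, age, mean_age):
    """HR for a binary predictor when its log HR varies linearly with centred age."""
    return math.exp(log_hr_at_mean + interaction_per_year * (age - mean_age))


# Hypothetical values: HR 1.36 at an assumed mean age of 45, declining with age.
LOG_HR = math.log(1.36)
INTERACTION = -0.01
for age in (30, 45, 60, 75):
    print(age, round(hazard_ratio_at_age(LOG_HR, INTERACTION, age, 45), 2))
```

Centring age means the main coefficient alone gives the hazard ratio at the mean age, which is why the quoted values are evaluated there.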
For model C, the standard deviation of systolic blood pressure values was included in the model in addition to the single most recent systolic blood pressure value. Overall a 10 unit increase in the standard deviation of systolic blood pressure was associated with an 8% increased risk of cardiovascular disease in women (table 3 ) and an 11% increased risk in men (table 4).
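Hazard ratios quoted per 10 unit increase assume a log-linear effect, so they can be rescaled to any difference in standard deviation. The sketch below (function name hypothetical) shows, for example, that the reported 1.08 per 10 units in women corresponds to about 1.04 for a 5 unit difference.

```python
import math


def rescale_hazard_ratio(hr_per_10_units, difference):
    """Hazard ratio for an arbitrary difference, given an HR per 10 unit increase."""
    log_hr_per_unit = math.log(hr_per_10_units) / 10
    return math.exp(log_hr_per_unit * difference)


women_5_units = rescale_hazard_ratio(1.08, 5)   # about 1.04
men_20_units = rescale_hazard_ratio(1.11, 20)   # 1.11 squared, about 1.23
```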
Tables 4 and 5 in the web appendix show the results of complete case analyses for models B and C for women and men, respectively (ie, the results based on patients with complete data). The hazard ratios associated with total cholesterol: high density lipoprotein cholesterol ratio, systolic blood pressure, and standard deviation of systolic blood pressure were similar to those obtained in the main models using multiply imputed data.
Validation
Discrimination
Table 5 shows the performance of each algorithm in the validation cohort for women and men for each of models A, B, and C. For model B in women, the algorithm explained 59.5% of the variation in time to diagnosis of cardiovascular disease (R²), the D statistic was 2.48, and Harrell's C statistic was 0.88. The corresponding values for men were 54.8%, 2.26, and 0.86. Measures of performance were similar for all three models. Table 6 in the web appendix shows the validation statistics for model B in various subgroups, including three age groups, ethnic groups, and people with specific comorbidities. The highest performance values by ethnic origin were in Chinese women (R²=64.7%; D=2.77; Harrell's C=0.91) and the lowest values were in Caribbean women (R²=51.6%; D=2.11; Harrell's C=0.85). Performance values were highest in the youngest age group (25-39 years) and lowest in the oldest age group (60-84 years).
Table 4 (excerpt): standard deviation of systolic blood pressure (per 10 unit increase): model A, NA; model B, NA; model C, 1.11 (1.09 to 1.12). NA=not applicable; HDL=high density lipoprotein. *Model A includes chronic kidney disease (stage 4 or 5), fractional polynomial terms for age (age⁻¹ and age³) and body mass index (BMI⁻² and BMI⁻² ln(BMI)), and interactions with age for body mass index, systolic blood pressure, Townsend score, family history of coronary heart disease, treated hypertension, atrial fibrillation, type 1 diabetes, type 2 diabetes, chronic kidney disease, and smoking status. †Same as model A with chronic kidney disease (stage 3, 4, or 5), extra variables listed in table, and additional age interactions for: migraine, corticosteroid use, and erectile dysfunction. ‡Same as model B but with standard deviation of systolic blood pressure. §Interaction with age; hazard ratios evaluated at mean age.

For the subgroup of women with type 1 diabetes the R² was 47.3%, D statistic was 1.94, and Harrell's C statistic was 0.82. The corresponding values for men with type 1 diabetes were 45.6%, 1.87, and 0.80. For the subgroup of women with type 2 diabetes the R² was 25.2%, D statistic was 1.19, and Harrell's C statistic was 0.70. The corresponding values for men with type 2 diabetes were 22.9%, 1.12, and 0.70.
Figure 1 shows the funnel plots of Harrell's C statistic for model B across the 328 practices in the validation cohort. The funnel plots show Harrell's C statistic for each general practice versus the number of cardiovascular events in each practice in women and men separately. Practices with fewer cardiovascular events had wider variation in the C statistic than practices with more events. The summary (average) C statistic for women was 0.874 (95% confidence interval 0.869 to 0.880) from a random effects meta-analysis. The I² value (ie, the percentage of total variation in C statistics owing to between practice heterogeneity) was 93.3%. The approximate 95% prediction interval for the true C statistic in women in a new practice was 0.79 to 0.96. The summary C statistic for men was 0.851 (95% confidence interval 0.847 to 0.855) from a random effects meta-analysis. The I² value was 84.2%. The approximate 95% prediction interval for the true C statistic in men in a new practice was 0.79 to 0.91.
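The practice-level summaries can be reproduced in outline with a random effects meta-analysis, which yields a pooled C statistic, I², and an approximate prediction interval for a new practice. The text does not state the estimator, whether C statistics were transformed before pooling, or which quantile was used for the prediction interval. This sketch assumes the DerSimonian-Laird estimator on the untransformed scale and a normal quantile, which is close to the t quantile when there are hundreds of practices. Inputs are per-practice C statistics and their standard errors.

```python
import math
from statistics import NormalDist


def random_effects_meta(estimates, std_errors, level=0.95):
    """DerSimonian-Laird pooling; returns (pooled, CI, I^2, approximate prediction interval)."""
    k = len(estimates)
    w = [1 / se ** 2 for se in std_errors]                          # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))   # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau_sq = max(0.0, (q - (k - 1)) / c)                            # between-practice variance
    i_sq = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0            # share due to heterogeneity
    w_re = [1 / (se ** 2 + tau_sq) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (pooled - z * se_pooled, pooled + z * se_pooled)
    half_width = z * math.sqrt(tau_sq + se_pooled ** 2)             # new-practice interval
    return pooled, ci, i_sq, (pooled - half_width, pooled + half_width)
```

The prediction interval adds the between-practice variance to the uncertainty of the pooled mean, which is why it is much wider than the confidence interval when I² is high, as it is here.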

Calibration
In women, the mean 10 year predicted risk was 4.7% for models A, B, and C. The observed 10 year risk was 5.8% (95% confidence interval 5.8% to 5.9%). In men, the mean 10 year predicted risk was 6.4% for models A, B, and C. The observed 10 year risk was 7.5% (7.5% to 7.6%). Figure 2 shows the mean predicted risks and observed risks at 10 years by tenth of predicted risk, applying each algorithm to all women and men in the validation cohort and to separate age groups (25-39, 40-59, and 60-84 years). There was close correspondence between the mean predicted risks and the observed risks within each tenth of predicted risk, overall and in each age group, in both women and men, indicating that the algorithms were well calibrated. The exception was in those aged 25-39 years, in whom mean predicted risks were slightly higher than observed risks. Using model A, 308 130 patients (11.5%) had a 10 year risk score of 15% or more and 214 451 (8.0%) had a risk of 20% or more. The corresponding numbers for models B and C were similar.
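The calibration check reported above compares mean predicted risk with Kaplan-Meier observed risk within tenths of predicted risk. A minimal sketch follows. It writes out a simple product-limit estimator, splits patients into ten equal-sized groups after sorting by predicted risk, and treats follow-up in years. Names are hypothetical and the approach is a simplification of the study's analysis, which also used multiply imputed data.

```python
def km_observed_risk(times, events, horizon):
    """1 minus the Kaplan-Meier survival estimate at `horizon`."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1 - n_events / at_risk
    return 1 - survival


def calibration_by_tenth(predicted, times, events, horizon=10.0, groups=10):
    """Rows of (group, mean predicted risk, observed KM risk) by tenth of predicted risk."""
    order = sorted(range(len(predicted)), key=lambda k: predicted[k])
    n = len(order)
    rows = []
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        if not members:
            continue
        mean_predicted = sum(predicted[k] for k in members) / len(members)
        observed = km_observed_risk([times[k] for k in members],
                                    [events[k] for k in members], horizon)
        rows.append((g + 1, mean_predicted, observed))
    return rows
```

Plotting the second column of each row against the third gives the kind of calibration plot shown in figure 2; points on the diagonal indicate good calibration.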
Of 458 263 patients with a 10 year predicted risk score of 10% or more using model A, 10 948 (2.4%) would be reclassified as low risk (predicted risk <10% over 10 years) using model B. The 10 year observed risk among these reclassified patients was 10.3% (95% confidence interval 9.6% to 11.1%), just above the 10% threshold. Conversely, […] Of the 458 869 patients with a 10 year predicted risk score of 10% or more using model B, 9102 (2.0%) would be reclassified as low risk using model C. The 10 year observed risk among these reclassified patients was 9.6% (95% confidence interval 8.9% to 10.5%), marginally below the 10% threshold. Conversely, of the 2 213 429 patients with a 10 year predicted risk score of less than 10% using model B, 9101 (0.4%) would be reclassified as high risk using model C. The 10 year observed risk among these reclassified patients was 10.7% (9.9% to 11.6%), marginally above the 10% threshold. Table 6 shows clinical examples where use of model A, B, or C would result in reclassification above or below the 10% threshold. Figures 3 and 4 show screenshots of the updated web calculator with a clinical example; the calculator can be found at www.qrisk.org.
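The reclassification figures above compare, for the same patients, which side of the 10% threshold each model places them on. This is a minimal sketch. It assumes risks are given as proportions and, as in the text, that "high risk" means a predicted 10 year risk of 10% or more. The function name and toy data are hypothetical.

```python
def reclassification(risk_old, risk_new, threshold=0.10):
    """Count patients who cross the treatment threshold when switching
    from one model (risk_old) to another (risk_new)."""
    if len(risk_old) != len(risk_new):
        raise ValueError("both models must score the same patients")
    pairs = list(zip(risk_old, risk_new))
    high_old = [(a, b) for a, b in pairs if a >= threshold]
    low_old = [(a, b) for a, b in pairs if a < threshold]
    down = sum(1 for _, b in high_old if b < threshold)  # high -> low
    up = sum(1 for _, b in low_old if b >= threshold)    # low -> high
    return {
        "high_old": len(high_old),
        "reclassified_low": down,
        "pct_reclassified_low": 100 * down / len(high_old) if high_old else 0.0,
        "low_old": len(low_old),
        "reclassified_high": up,
        "pct_reclassified_high": 100 * up / len(low_old) if low_old else 0.0,
    }


print(reclassification([0.12, 0.09, 0.15, 0.05], [0.08, 0.11, 0.16, 0.04]))
```

The percentages in the text use the same arithmetic. For example, `round(100 * 10948 / 458263, 1)` gives 2.4.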

Discussion
We have developed and validated updated algorithms (QRISK3) to predict 10 year risk of cardiovascular disease in women and men aged 25-84 years. The algorithms incorporate established predictor variables from QRISK2 as well as new variables associated with increased risk of cardiovascular disease. These include an expanded definition of chronic kidney disease that adds chronic kidney disease stage 3, migraine, corticosteroid use, SLE, atypical antipsychotic use, severe mental illness, erectile dysfunction, and a measure of blood pressure variability (standard deviation of repeated values). We have produced three main final models:

- Model A includes the same variables and coefficients as the current version of QRISK2 (QRISK2-2017).
- Model B includes the new variables and the latest systolic blood pressure value. It is for use where only the current reading is available.
- Model C, our preferred model, additionally includes a measure of blood pressure variability. It may be more suitable for integration into general practice computer systems, where longitudinal repeated values are likely to be available.

In population terms the overall performance of the three models is similar. However, for people with one or more of the conditions included in the newer models, taking the additional risk into account could make the difference between taking and not taking risk reducing treatment. The increased complexity is unlikely to affect uptake of the new models, as they are designed to be calculated automatically from the electronic patient record.

Comparisons with the literature
The hazard ratios of the new risk variables included in our final models are similar in both magnitude and direction to those reported in other studies. 25

Migraine
Sufficient pathophysiological and epidemiological evidence has now accumulated for some experts to propose that migraine should be included as a marker for future cardiovascular disease. 41 Our results support this: migraine was associated with a 36% increased risk of cardiovascular disease in women and a 29% increased risk in men (model B). This is consistent with the 42% increased risk in 27 840 women aged 45 and over in the Women's Health Study 42 […] 42 and might reflect differences in cohort selection, clinical setting, consulting patterns, diagnostic criteria, or recording of diagnoses. For example, our study is based on routinely collected health records and uses diagnoses recorded by clinicians before entry to the cohort. In contrast, the Nurses' Health Study II used self report questionnaires at three time points over a six year period. Our study, which also includes men, is much larger than previous studies. 25 42 Our study may be more representative of the general population than patients recruited to a trial. However, it is also susceptible to ascertainment bias, which would arise if not all patients with migraine consult their general practitioner or not all such diagnoses are recorded. Conversely, the Nurses' Health Study II and the Women's Health Study may be subject to recall bias, because they used self reported questionnaires about historical diagnoses. Also, our definition of migraine included a range of subtypes, so it is not possible to say which subtypes account for the additional risk associated with a migraine diagnosis. For example, the bulk of the risk could come from people with migraine with aura rather than other subtypes. 43 Although the increased risk associated with migraine is relatively small at the individual level, it is important at the population level because migraine is so prevalent. 41 Hence there is good justification for including clinician recorded diagnoses of migraine in our new models.
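The population level argument can be made concrete with Levin's formula for the population attributable fraction, PAF = p(RR − 1) / (1 + p(RR − 1)), where p is the prevalence of the exposure. This is a sketch under stated assumptions:

- the hazard ratio is treated as an approximation to the relative risk;
- the association is assumed to be causal;
- the 10% prevalence in the usage line is hypothetical, not a figure from this study.

```python
def population_attributable_fraction(prevalence, relative_risk):
    """Levin's formula: fraction of cases in the population attributable to an exposure."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a proportion")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical 10% prevalence, with the hazard ratio of 1.36 reported for women
# treated as a relative risk.
print(round(population_attributable_fraction(0.10, 1.36), 3))  # 0.035
```

Under these assumptions, a modest relative risk in a common condition still accounts for several per cent of all cases.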

Corticosteroids and antipsychotics
The National Institute for Health and Care Excellence (NICE) guidance states that cardiovascular disease risk scores will underestimate cardiovascular risk among people who are taking medicines that cause dyslipidaemia, such as antipsychotic drugs or corticosteroids. 20 In line with other studies, 44 we found evidence of increased risk with corticosteroids even after adjustment for lipid levels. Current corticosteroid use was defined as ≥2 prescriptions, with the most recent within the 28 days before study entry. It applied to 2.4% of women and 1.5% of men and was associated with an 82% increased cardiovascular risk in women and a 58% increased risk in men. This is similar to the increased risks with corticosteroids found in other studies. 45 46 However, our definition was relatively simple (which means it could be used in clinical practice) and did not account for dose or duration of use. It therefore encompasses substantial heterogeneity in the indications for steroid use, and the effect may not apply equally to people with different levels of exposure. Similarly, atypical antipsychotic drugs were prescribed to 0.5% of men and women and were associated with a 29% increased cardiovascular risk in women and a 15% increased risk in men. Given the magnitude of the risk and the potential numbers of patients affected, both corticosteroids and atypical antipsychotics seem to be clinically important variables to include in QRISK.
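The definition of current corticosteroid use is simple enough to express directly. This minimal sketch makes two assumptions that the text does not settle: a prescription issued on the entry date counts, and there is no look-back limit for the earlier prescription. The function name is hypothetical.

```python
from datetime import date, timedelta


def current_corticosteroid_use(prescription_dates, entry_date, window_days=28):
    """Flag 'current' corticosteroid use: at least two prescriptions on or before
    entry, with the most recent within `window_days` of entry.

    Assumptions (not specified in the text): prescriptions on the entry date
    count, and there is no look-back limit for the earlier prescription.
    """
    before_entry = sorted(d for d in prescription_dates if d <= entry_date)
    if len(before_entry) < 2:
        return False
    most_recent = before_entry[-1]
    return entry_date - most_recent <= timedelta(days=window_days)


entry = date(2015, 4, 1)
print(current_corticosteroid_use([date(2015, 1, 5), date(2015, 3, 20)], entry))  # True
print(current_corticosteroid_use([date(2015, 3, 20)], entry))  # False: only one prescription
```

This simplicity makes the flag easy to implement in clinical systems, but the flag carries no information about dose or duration, which is the limitation noted above.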

Severe mental illness
The NICE guidance highlights the increased cardiovascular risk associated with severe mental illness, 20 although a recent systematic review and meta-analysis failed to find sufficient evidence to support this conclusion. 47 In our study, 6.8% of women and 4.3% of men had a diagnosis of severe mental illness. This was associated with a 14% increased risk of cardiovascular disease in women and a 13% increased risk in men (model B). This association was independent of the risk associated with atypical antipsychotics. Both factors have therefore been included separately, as they will have a compound effect on cardiovascular risk. Clinicians will now be able to give these patients better information, both about interventions to reduce cardiovascular risk and about the potential effects of atypical antipsychotics.
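The compound effect can be shown arithmetically. In a Cox model with no interaction between two terms, log hazard ratios add, so hazard ratios multiply. This minimal sketch assumes no such interaction and uses the hazard ratios quoted in the text for women (1.14 for severe mental illness, 1.29 for atypical antipsychotics). The function name is hypothetical.

```python
import math


def combined_hazard_ratio(*hazard_ratios):
    """Combine hazard ratios for terms that enter a Cox model additively on the
    log hazard scale (ie, with no interaction between them)."""
    return math.exp(sum(math.log(hr) for hr in hazard_ratios))


# Women: severe mental illness (1.14) plus atypical antipsychotics (1.29).
print(round(combined_hazard_ratio(1.14, 1.29), 2))  # 1.47, ie about 47% higher hazard
```

The combined effect (about 47%) is larger than either factor alone, which is why both variables are kept in the models.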

SLE
The NICE guidance on lipid modification 20 highlights the increased cardiovascular risk associated with SLE. The excess risk is thought to be driven largely by inflammation and an active immunological response. 48 Reducing risk in patients with SLE may require modification of both SLE specific factors, such as disease activity and drug therapy, and traditional cardiovascular risk factors, although the role of anti-inflammatory treatments is not yet clear. 48 We found that a diagnosis of SLE was associated with a 115% increased risk in women and a 55% increased risk in men. SLE is relatively uncommon, affecting 0.1% of women and rarely affecting men. However, the increased risk is large (substantially larger than for rheumatoid arthritis, for example), particularly at younger ages: hazard ratios were >2 for ages ≤45 years. This makes it an important risk factor for these patients and is consistent with other studies examining cardiovascular outcomes in patients with these conditions. 48

Chronic kidney disease
The NICE guidance 20 states "do not use a risk assessment tool in people with an estimated glomerular filtration rate (eGFR) of less than 60 mL/min/1.73 m² and/or albuminuria. These people are at increased risk of cardiovascular disease . . . Atorvastatin should be offered to people with CKD [chronic kidney disease]." Our expanded definition of chronic kidney disease now includes chronic kidney disease stage 3 (eGFR 30-59 mL/min/1.73 m²) in addition to stages 4 and 5, in line with other published studies. 49 QRISK3 can therefore be used in these patients. It will give them better information to inform their choice about statins and potentially other non-drug interventions to reduce their cardiovascular risk. It will also "encourage the person to participate in reducing their risk", in line with the recommendations for other patients.
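The expanded definition covers stages 3 to 5. The standard eGFR bands are stage 3 for 30-59, stage 4 for 15-29, and stage 5 for below 15 mL/min/1.73 m². A hypothetical helper mapping a single eGFR value to these stages might look like the sketch below. It is an illustration only: a diagnosis of chronic kidney disease requires reduced kidney function persisting for more than three months. The text also does not say how the variable was derived from the records.

```python
def ckd_stage_3_to_5(egfr):
    """Map an eGFR (mL/min/1.73 m^2) to CKD stage 3, 4, or 5; None if >= 60.

    Illustrative only: one reading does not establish chronic kidney disease.
    """
    if egfr < 0:
        raise ValueError("eGFR cannot be negative")
    if egfr < 15:
        return 5
    if egfr < 30:
        return 4
    if egfr < 60:
        return 3
    return None  # stages 1-2 need other markers of kidney damage, not covered here


# QRISK2 counted only stages 4-5; the expanded definition adds stage 3.
for value in (72, 45, 22, 10):
    print(value, ckd_stage_3_to_5(value))
```

The lower bound of each band is included in that band, so an eGFR of exactly 30 falls in stage 3 and exactly 15 in stage 4.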

Type 1 diabetes
Although the NICE guidance on lipid modification 20 recommends the use of QRISK2 in patients with type 2 diabetes, it states "do not use a risk assessment tool to assess CVD [cardiovascular disease] risk in patients with type 1 diabetes." Instead it recommends that "statin treatment is offered to all patients with type 1 diabetes who are older than 40 years or have had diabetes for more than 10 years or have established nephropathy or have other CVD risk factors." The current QRISK2 model and the models presented in this paper allow calculation of cardiovascular risk for patients with type 1 diabetes, and performance among these patients is good (see table 6 in the web appendix). We can see no reason why patients with type 1 diabetes should not have similar discussions to other patients regarding the risks and benefits of interventions. Using the calculator in these patients is intended to allow better information to be shared with them about their cardiovascular risk profile. It may identify patients with a risk under 10% who may not want to take statins. It may also support a discussion of a range of interventions to reduce risk, including weight loss, blood pressure control, and smoking cessation. The performance of the models in patients with type 2 diabetes was lower than in patients with type 1 diabetes. For example, in men with type 2 diabetes Harrell's C was 0.70 and R² was 22.9%, compared with a Harrell's C of 0.80 and an R² of 45.6% in men with type 1 diabetes.
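The R² values reported for the diabetes subgroups appear consistent with Royston and Sauerbrei's R²_D, which is derived from the D statistic under a proportional hazards model:

R²_D = (D²/κ²) / (π²/6 + D²/κ²), with κ = √(8/π).

This check is our reconstruction rather than a method stated in this section. Recomputing from the rounded D statistics in the results gives, for example, 47.3% for D=1.94. The other subgroups differ by about 0.1 to 0.15 percentage points, which is consistent with D having been rounded to two decimal places. The function name is hypothetical.

```python
import math


def r2_from_d(d_statistic):
    """Royston-Sauerbrei R^2_D implied by a D statistic (proportional hazards)."""
    kappa_sq = 8.0 / math.pi       # kappa = sqrt(8/pi)
    sigma_sq = math.pi ** 2 / 6.0  # variance of the standard extreme value distribution
    explained = d_statistic ** 2 / kappa_sq
    return explained / (sigma_sq + explained)


# D statistics quoted in the results for the diabetes subgroups.
for label, d in [("women, type 1", 1.94), ("men, type 1", 1.87),
                 ("women, type 2", 1.19), ("men, type 2", 1.12)]:
    print(f"{label}: R2_D = {100 * r2_from_d(d):.1f}%")
```

Because R²_D rises steeply with D, the lower D statistics in type 2 diabetes translate into roughly halved R² values.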

Blood pressure variability
Recent studies have suggested that higher blood pressure variability is associated with increased risks of stroke 26 and other cardiovascular events. 50 This may be independent of mean blood pressure values, 50 although the increased risk of cardiovascular events associated with blood pressure variability in the recent meta-analysis by Stevens et al was based on one study of 8811 patients aged more than 55 years with type 2 diabetes. 51 In our study, both the most recent value at baseline and the standard deviation of systolic blood pressure were independently associated with increased risk of cardiovascular disease, although the addition of the standard deviation to the model did not improve discrimination or calibration. It may be difficult to implement the model with blood pressure variability in a setting where there is no historical information on blood pressure available, such as with a web calculator. While the performance and reclassification statistics suggest that its inclusion will not make a major difference at a population level, there may be some benefit from taking this factor into account for those patients with highly variable blood pressure.
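Model C needs the standard deviation of repeated systolic readings, which is why it cannot be computed from a single value entered into a web calculator. This minimal sketch assumes the sample standard deviation (n − 1 denominator) and a minimum of two readings. The text does not specify the look-back window or the minimum number of readings, and the function name is hypothetical.

```python
from statistics import stdev


def sbp_variability(readings_mm_hg):
    """Sample standard deviation of repeated systolic blood pressure readings.

    Returns None when fewer than two readings are available (eg, a web
    calculator that only collects the current value).
    """
    readings = list(readings_mm_hg)
    if len(readings) < 2:
        return None
    return stdev(readings)


print(sbp_variability([120, 130, 140]))  # 10.0
print(sbp_variability([135]))            # None
```

Returning None rather than 0 for a single reading keeps "no variability data" distinct from "no variability".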

Erectile dysfunction
The true prevalence of erectile dysfunction is difficult to determine, and estimates range from 1% to 100% depending on the age of the population and how the diagnosis was made. 24 Our study indicated that erectile dysfunction affected 2.3% of men. This is likely to be an underestimate, as it includes only men who presented to their doctor with the condition and had the diagnosis or treatment recorded in their electronic record. Our results indicate that erectile dysfunction is likely to be an independent risk factor for cardiovascular disease. It was associated with a 25% increased risk of cardiovascular disease (at the mean age), which is compatible with the findings of a meta-analysis of the association between erectile dysfunction and cardiovascular disease risk in 13 studies. 23 Although the overall relative risk estimate from these studies was 1.44, the 95% confidence interval was broad (1.27 to 1.63) and there was substantial heterogeneity across the studies. The association was reduced to 1.34 (1.17 to 1.54) when only high quality studies were included. Our definition, like those of other studies, provides only a summary effect. The causes of erectile dysfunction are usually a combination of physiological and psychological factors, and men with vascular causes are likely to be at higher risk of cardiovascular disease than those for whom the cause is largely psychological.

HIV/AIDS
Large cohorts have reported that people infected with HIV have approximately 50% greater risk of acute myocardial infarction and stroke than those without HIV, 52 which may be related to antiretroviral treatment. 53 We found a tendency towards an increased risk of cardiovascular disease among people with HIV/AIDS. However, this did not reach statistical significance at the 0.01 level, so HIV/AIDS was not included in the final models. These results may reflect the relatively small numbers of people with HIV/AIDS recorded on the general practice clinical system.
Also, people with HIV/AIDS tend to be younger and so have low absolute event rates. They also have shorter periods of follow-up with an individual general practice, which may lead to underestimation of the long term association. People with HIV/AIDS may receive healthcare (and prescriptions for antiretroviral treatment) from specialist clinics rather than general practices, which may explain why few prescriptions for antiretroviral treatment are recorded on the QResearch database. Over time, recording of HIV/AIDS and of antiretroviral prescribing may increase. It is therefore important to reassess periodically whether HIV/AIDS should be included in QRISK3, so that affected people have accurate cardiovascular risk assessments.

Comparison with the original version of QRISK2, 2008
Our new models are well calibrated when applied to a separate validation cohort and have high levels of discrimination. All three models performed better than the original 2008 version of QRISK2, 28 although some of this improvement is likely to be owing to the wider age range (25-84 compared with 35-74 years). Since 2008, improvements have been made to the underlying QResearch database used to derive the QRISK algorithm. These may have improved the performance of the algorithm beyond the gains from extending the age range and adding variables. Ascertainment of cardiovascular events has improved with the linkage of the QResearch database to both Office for National Statistics mortality data and Hospital Episode Statistics since 1998. The number of practices contributing to the database has more than doubled, from 531 in 2008 to over 1300. The size of the derivation cohort has increased about fivefold, with 363 565 cardiovascular events arising from 50.8 million person years of observation compared with 96 709 events arising from 10.9 million person years in 2008. Recording of self assigned ethnic origin has increased from 25% in 2008 to 62% in the current derivation cohort. As a result of these factors, there are many more events within each ethnic group; for example, there has been a 10-fold increase in the number of cardiovascular events for non-white ethnic groups compared with 2008. This is reflected in more precise hazard ratios with tighter confidence intervals and improved performance statistics.

Strengths and limitations of this study
The methods used to derive and validate these models are broadly the same as for a range of other clinical risk prediction tools derived from the QResearch database. 28 54-57 The strengths and limitations of the approach have already been discussed in detail. 8 54 57-60 In summary, key strengths include size, duration of follow-up, representativeness, and lack of selection, recall, and respondent bias. UK general practices have good levels of accuracy and completeness in recording clinical diagnoses and prescribed drugs. 61 We think our study has good face validity since it has been conducted in the setting where most patients in the UK are assessed, treated, and followed up. Limitations of our study include the lack of formal adjudication of diagnoses, information bias, and potential for bias owing to missing data. Our database has linked hospital and mortality records for nearly all patients and is therefore likely to have picked up the majority of cardiovascular events, thereby minimising ascertainment bias. We excluded patients using statins at baseline, as in previous versions of QRISK and QRISK2. Over the past decade, a change in guidelines is likely to have led to a higher proportion of at-risk patients being prescribed statins in the absence of established cardiovascular disease. Removing these higher risk patients will tend to lower overall event rates. We excluded patients without a valid deprivation score, since this group may represent a more transient population in whom follow-up could be unreliable or unrepresentative. Their deprivation scores are unlikely to be missing at random, so we did not think it appropriate to impute them. Given the number of interaction terms tested for inclusion, there may be some overfitting.
We have continued to use the well recognised total cholesterol:high density lipoprotein cholesterol ratio as a predictor rather than low density lipoprotein cholesterol values alone. The ratio improved prediction in earlier versions of QRISK and QRISK2, and it is based on directly measured values, whereas low density lipoprotein cholesterol is calculated.
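The ratio is formed directly from two laboratory measurements. In contrast, low density lipoprotein (LDL) cholesterol is commonly estimated with the Friedewald equation, which in mmol/L is LDL = total cholesterol − HDL − triglycerides/2.2. This brings in a third measurement, and the equation is not valid when triglycerides exceed 4.5 mmol/L. The text does not say which LDL method applies in QResearch, and the function names are hypothetical.

```python
def tc_hdl_ratio(total_cholesterol, hdl):
    """Total cholesterol:HDL cholesterol ratio (both in the same units)."""
    if hdl <= 0:
        raise ValueError("HDL cholesterol must be positive")
    return total_cholesterol / hdl


def friedewald_ldl_mmol(total_cholesterol, hdl, triglycerides):
    """Estimated LDL cholesterol (mmol/L) by the Friedewald equation."""
    if triglycerides > 4.5:
        raise ValueError("Friedewald estimate is unreliable when triglycerides > 4.5 mmol/L")
    return total_cholesterol - hdl - triglycerides / 2.2


print(round(tc_hdl_ratio(5.0, 1.2), 2))              # 4.17
print(round(friedewald_ldl_mmol(5.0, 1.2, 1.1), 2))  # 3.3
```

Any error in the triglyceride measurement propagates into the estimated LDL value, whereas the ratio depends only on the two lipids it names.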
The present validation has been done on a separate set of practices and individuals from those used to develop the score, although all the practices use the same general practice clinical computer system (EMIS, used by 55% of UK general practices). An independent validation study would be a more stringent test and should be done. However, independent studies of QRISK2 and other risk algorithms 6 7 59 60 have shown performance comparable to that in the QResearch validation. 28 54 58 To enable accurate implementation of QRISK3, we have published the source code on the QRISK website (www.qrisk.org), together with earlier versions of the score from previous annual updates. This ensures that anyone interested in reviewing or using the open source code can find the current version as the score continues to be updated.

Conclusion
We have developed updated algorithms (QRISK3) to quantify absolute risks of cardiovascular disease in people aged 25-84 years, which include established risk factors and new risk factors: expanded definition of chronic kidney disease (stage 3, 4, or 5), migraine, corticosteroid use, SLE, atypical antipsychotic use, severe mental illness, erectile dysfunction, and a measure of blood pressure variability (standard deviation of repeated measures). The updated risk algorithms provide valid measures of absolute risk in the general population of patients, as shown by the performance in a separate validation cohort.
A simple web calculator to implement the QRISK3 algorithms can be accessed at www.qrisk.org. Open source software is also available for download.
We thank the EMIS practices that contribute to QResearch, and EMIS and the University of Nottingham for expertise in establishing, developing, and supporting the QResearch database, and the Office for National Statistics for providing the mortality data. The Hospital Episode Statistics data in this analysis are reused by permission from NHS Digital, which retains the copyright. ONS and NHS Digital bear no responsibility for the analysis or interpretation of the data.
Contributors: JHC initiated the study, developed the research question, undertook the literature review, extracted and manipulated the data, performed the primary data analysis, and wrote the first draft of the paper. CC contributed to the refinement of the research question, design, analysis, interpretation, and drafting of the paper. PB contributed to the development of the research question, design, interpretation, and drafting of the paper.
Funding: No external funding was received for this study.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: JHC is professor of clinical epidemiology at the University of Nottingham and codirector of QResearch, a not-for-profit organisation that is a joint partnership between the University of Nottingham and Egton Medical Information Systems (a leading commercial supplier of IT for 55% of general practices in the UK). JHC is also a paid director of ClinRisk, which produces open and closed source software to ensure the reliable and updatable implementation of clinical risk algorithms within clinical computer systems to help improve patient care. CC is associate professor of medical statistics at the University of Nottingham and a paid consultant statistician for ClinRisk. PB is partly funded by the National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care West (NIHR CLAHRC West), Bristol Clinical Commissioning Group, and the West of England Academic Health Science Network. This work and any views expressed within it are solely those of the authors and not of any affiliated bodies or organisations.
Ethical approval: The study was reviewed in accordance with the QResearch agreement with East Midlands-Derby Research Ethics Committee (reference 03/4/021).
Data sharing: The algorithms presented in this paper will be released as open source software under the GNU Lesser General Public License version 3 (LGPL v3), which allows use without charge. Closed source software can be licensed at a fee.
Transparency: The lead author (JHC) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained. This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/.