Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study
BMJ 2017; 357 doi: https://doi.org/10.1136/bmj.j2099 (Published 23 May 2017) Cite this as: BMJ 2017;357:j2099
All rapid responses
Dear Editor
We would like to provide additional information in response to the letter from Professor Osborn and Dr Walters in relation to our recent paper published in the BMJ describing the derivation and validation of QRISK3 [1]. Of the 10,561,100 patients in our QRISK3 derivation and validation cohorts, 593,738 (5.62%) were coded as having severe mental illness. Our definition was based on a combination of the Quality and Outcomes Framework (QOF) definition of severe mental illness plus a subset of the codes from the QOF definition of depression (having excluded those codes indicating mild depression).
Overall, across both derivation and validation cohorts, we identified 110,798 patients (1.05% of 10,561,100) with a coded diagnosis of schizophrenia, bipolar affective disorders or other psychoses. The remaining 482,940 (4.57%) had a coded diagnosis of depression that was recorded as moderate or severe, or was not identified as mild. We based our definition of depression on Read codes indicating moderate or severe depression, for example severe depression, major depression, recurrent depression, psychotic depression, depressive disorder and endogenous depression. We think we should have made this clearer in Box 1 of the original paper.
Overall, across the validation and derivation cohorts, we identified 52,128 patients (0.49% of 10,561,100) who were prescribed atypical antipsychotics at study entry. Of these, 35,452 (68.01%) had a diagnosis of severe mental illness (using the broader definition above) and of these, 24,394 (68.81%) had a diagnosis of bipolar affective disorders or schizophrenia. There were 16,676 (31.99%) of 52,128 patients on atypical antipsychotics, who did not have a recorded diagnosis of schizophrenia, bipolar affective disorders or moderate/severe depression. Most atypical antipsychotic drugs will be initiated in secondary care with the ongoing prescriptions generally issued by primary care.
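The percentages quoted above use different denominators (the whole cohort, the patients on atypical antipsychotics, and the subset of those with a severe mental illness code). The following is a minimal sketch that simply recomputes them from the counts given in this letter; the variable names and the helper `pct` are illustrative and not taken from the paper.

```python
# Counts quoted in the letter; names are illustrative only.
TOTAL = 10_561_100            # derivation + validation cohorts
SMI_BROAD = 593_738           # broad severe mental illness definition
PSYCHOSES = 110_798           # schizophrenia, bipolar or other psychoses
DEPRESSION = 482_940          # moderate/severe (or not mild) depression
ANTIPSYCHOTIC = 52_128        # on atypical antipsychotics at study entry
ANTIPSYCHOTIC_SMI = 35_452    # of those, with a broad SMI diagnosis
ANTIPSYCHOTIC_BIPOLAR_SCZ = 24_394  # of those, bipolar or schizophrenia
ANTIPSYCHOTIC_NO_SMI = 16_676       # on antipsychotics, no SMI code


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` made up by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# The two diagnostic groups add up to the broad SMI total.
assert PSYCHOSES + DEPRESSION == SMI_BROAD
# Antipsychotic users split into those with and without an SMI code.
assert ANTIPSYCHOTIC_SMI + ANTIPSYCHOTIC_NO_SMI == ANTIPSYCHOTIC

print(pct(SMI_BROAD, TOTAL))                          # share of whole cohort
print(pct(ANTIPSYCHOTIC_SMI, ANTIPSYCHOTIC))          # share of antipsychotic users
print(pct(ANTIPSYCHOTIC_BIPOLAR_SCZ, ANTIPSYCHOTIC_SMI))  # share of those with SMI
```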
Stevens et al may have misinterpreted how we calculated the standard deviation of systolic blood pressure, so we would like to clarify that. In our Methods section, we stated the following: “To assess variability in systolic blood pressure, we identified all systolic blood pressure values recorded in the five years before study entry and calculated the standard deviation where there were two or more recorded values”. Of those who had a standard deviation calculated, this was based on two values for 33.9% of patients and on three or more values for the remaining 66.1% of patients. So, if there were only two values, these would have been used, but if there had been 20 values in the preceding five years for an individual patient, then all 20 would have been used to calculate the standard deviation. In other words, we used all the available values to calculate the standard deviation and, in so doing, developed an approach which could be implemented back into the GP computer systems where all such values will be recorded. Regarding the number of imputations, because we calculated variability over two or more readings, we feel that five imputations remain a pragmatic choice in view of the volume of data (nearly 8 million patients in the derivation cohort). Given the magnitude and significance of the coefficients in our models, any imputation variability will have little substantive impact on the precision of estimates and selection of variables based on tests of significance [2].
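The rule described above (use every available reading, and only compute a standard deviation when there are at least two) can be sketched as follows. This is an illustration only: the function name `sbp_variability` is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the letter does not say which form was used.

```python
from statistics import stdev
from typing import Optional, Sequence


def sbp_variability(readings: Sequence[float]) -> Optional[float]:
    """Standard deviation of all systolic readings in the look-back window.

    Returns None when fewer than two readings exist, in which case the
    value would be treated as missing. Assumes the sample (n - 1) form.
    """
    if len(readings) < 2:
        return None
    return stdev(readings)


# Two readings: the minimum for which an SD is defined.
two_values = sbp_variability([120, 140])
# Several readings: all of them contribute, not just the latest two.
many_values = sbp_variability([118, 126, 131, 122, 140, 135])
# A single reading: no SD can be calculated.
one_value = sbp_variability([128])

print(two_values, many_values, one_value)
```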
In our study, we reported an adjusted hazard ratio of 1.08 (95% CI 1.07 to 1.09) in women and 1.11 (1.09 to 1.12) in men associated with a 10-unit increase in standard deviation of systolic blood pressure. This is lower than the 1.18 (95% CI 1.07 to 1.30) reported by Stevens et al [3], but their hazard ratio relates to a standardised measure of systolic blood pressure variability on a different scale to our values (“blood pressure variability divided by its sample standard deviation”). Although Stevens et al undertook a meta-analysis for other outcomes, they were only able to identify a single eligible study for the cardiovascular disease outcome [4]. This study consisted of a highly selected group of 8811 patients with type 2 diabetes recruited to a clinical trial, among whom there were 404 cardiovascular events, and who are unlikely to be representative of the general population eligible for cardiovascular disease risk assessment. Their cohort was considerably older (mean age 66 compared with 43 in QRISK3) and had substantially higher systolic blood pressure values than the QRISK3 population. For example, the mean systolic blood pressure was 137 mmHg compared with 123 mmHg in women and 129 mmHg in men in our study. Furthermore, the blood pressure measurements in the trial were made at specific follow-up times using standardised equipment, which does not reflect the situation in a primary care setting where the QRISK3 risk prediction models are intended to be used.
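To illustrate why hazard ratios quoted per different units of exposure cannot be compared directly, the sketch below rescales a hazard ratio from one unit size to another. It assumes a log-linear effect (as in a Cox model with a linear term), and the function name `rescale_hazard_ratio` is hypothetical. Converting to the standardised scale used by Stevens et al would additionally need the sample standard deviation of the variability measure, which is not given in this letter.

```python
import math


def rescale_hazard_ratio(hr: float, from_units: float, to_units: float) -> float:
    """Re-express a hazard ratio per `from_units` as one per `to_units`.

    Assumes the log hazard is linear in the exposure, so the log hazard
    ratio scales in proportion to the size of the increment.
    """
    if hr <= 0 or from_units <= 0:
        raise ValueError("hr and from_units must be positive")
    return math.exp(math.log(hr) * to_units / from_units)


# Hazard ratio of 1.08 per 10-unit increase (women), re-expressed per
# 1-unit and per 20-unit increase in the same measure.
per_1_unit = rescale_hazard_ratio(1.08, from_units=10, to_units=1)
per_20_units = rescale_hazard_ratio(1.08, from_units=10, to_units=20)

print(round(per_1_unit, 4), round(per_20_units, 4))
```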
Stevens et al correctly state that patients using antihypertensive medication were included in the cohort of patients used to develop QRISK3. The use of antihypertensive medication in patients with a diagnosis of hypertension was included as a parameter in the risk equation in a similar way to earlier versions of QRISK [5]. Given the purpose of QRISK3 (which is to assess CVD risk at a point in time based on information which is already available), we decided to assess each predictor at baseline based on information that was already available, not on information which might change at a future point. Whilst it would be possible to model changes in medication during follow up as a time varying exposure, it is conceptually difficult to see how such a risk equation could be used in clinical practice as the information would not be known to either the doctor or the patient at the time of assessment. This is also relevant to the interesting point that Peek et al make regarding interventions which may occur during follow-up.
Stevens et al also highlight the similarities in the validation statistics between the models with and without the standard deviation of blood pressure. Whilst we felt that, on average, the models had the same overall performance, for those patients who do have higher levels of blood pressure variability it seemed reasonable to choose the model that could represent that increased risk to some degree. It is possible that the risk associated with blood pressure variability has been underestimated, but choosing a model without any variable for this would lead to even more underestimation of the true risk in people with variable blood pressure. Further versions of QRISK3 could seek to improve how blood pressure variability is represented in the model.
Lastly, whilst independent external validation of risk assessment tools is the gold standard, numerous validation studies of the QRISK2 cardiovascular risk prediction algorithms (including external studies) have shown that the results in our independent validation practices closely match those obtained in other similar databases, both in the UK [6-10] and internationally [11] [12]. We have no reason to think this study should be different, but we also look forward to future validation studies to confirm our results.
Julia Hippisley-Cox
Carol Coupland
Peter Brindle
References
1. Hippisley-Cox J, Coupland C, Brindle P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ 2017;357 doi: 10.1136/bmj.j2099
2. Bodner TE. What Improves with Increased Missing Data Imputations? Structural Equation Modeling: A Multidisciplinary Journal 2008;15(4):651-75. doi: 10.1080/10705510802339072
3. Stevens SL, Wood S, Koshiaris C, et al. Blood pressure variability and cardiovascular disease: systematic review and meta-analysis. BMJ 2016;354:i4098. doi: 10.1136/bmj.i4098
4. Hata J, Arima H, Rothwell PM, et al. Effects of visit-to-visit variability in systolic blood pressure on macrovascular and microvascular complications in patients with type 2 diabetes mellitus: the ADVANCE trial. Circulation 2013;128(12):1325-34. doi: 10.1161/circulationaha.113.002717 [published Online First: 2013/08/09]
5. Hippisley-Cox J, Coupland C, Vinogradova Y, et al. Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2. BMJ 2008:bmj.39609.449676.25. doi: 10.1136/bmj.39609.449676.25
6. Collins GS, Altman DG. Predicting the 10 year risk of cardiovascular disease in the United Kingdom: independent and external validation of an updated version of QRISK2. BMJ 2012;344:e4181. doi: 10.1136/bmj.e4181
7. Collins GS, Altman DG. An independent and external validation of QRISK2 cardiovascular disease risk score: a prospective open cohort study. BMJ 2010;340:c2442. doi: 10.1136/bmj.c2442
8. Hippisley-Cox J, Coupland C, Brindle P. The performance of seven QPrediction risk scores in an independent external sample of patients from general practice: a validation study. BMJ Open 2014;4(8):e005809. doi: 10.1136/bmjopen-2014-005809
9. Hippisley-Cox J, Coupland C, Vinogradova Y, et al. Performance of the QRISK cardiovascular risk prediction algorithm in an independent UK sample of patients from general practice: a validation study. Heart 2008;94:34-39. doi: 10.1136/hrt.2007.134890
10. Riley RD, Ensor J, Snell KIE, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ 2016;353 doi: 10.1136/bmj.i3140
11. Arts EEA, Popa C, Den Broeder AA, et al. Performance of four current risk algorithms in predicting cardiovascular events in patients with early rheumatoid arthritis. Annals of the Rheumatic Diseases 2015;74(4):668-74. doi: 10.1136/annrheumdis-2013-204024
12. Pike MM, Decker PA, Larson NB, et al. Improvement in Cardiovascular Risk Prediction with Electronic Health Records. Journal of Cardiovascular Translational Research 2016:1-9. doi: 10.1007/s12265-016-9687-z
Competing interests: see competing interests for the original paper at bmj.com
We note with interest the inclusion of a measure of blood pressure (BP) variability in the new QRISK3 algorithm by Hippisley-Cox and colleagues. As a recognised novel risk factor for cardiovascular disease (CVD),[1] the impact of including measures of long-term blood pressure variability in cardiovascular risk equations is the subject of ongoing work by ourselves and colleagues in the Clinical Practice Research Datalink.
Our experience leads us to question several of the decisions made in the current analysis:
Firstly, we question whether two repeat measurements – the bare minimum with which standard deviation (SD) can be calculated – are adequate to measure BP variability. Our experience is similar to that of Rothwell and colleagues, who found very different and increasing hazard ratios as more measurements were included in their measure of variability.[2] The current analysis reports significantly lower hazard ratios than would have been expected from our previous meta-analysis.[1]
Missing data is a significant issue in the utilisation of routine healthcare records and the investigators used five imputations as a “pragmatic” choice reflecting volume of data and availability of computing power. Our experience is that as many as 50 imputations are necessary when dealing with BP variability over two measures, in order to reduce the Monte Carlo error to adequate levels (e.g. Monte Carlo error of a coefficient test statistic should be approximately 0.1).[3]
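As a rough illustration of why the number of imputations matters, the sketch below estimates the Monte Carlo error of a pooled coefficient as the square root of B/m, where B is the between-imputation variance and m the number of imputations. The coefficient values are made up for illustration, and the function names are hypothetical. Note that this is the Monte Carlo error of the coefficient itself, not of its test statistic, which is the quantity the rule of thumb above refers to.

```python
import math
from statistics import variance
from typing import Sequence


def monte_carlo_error(estimates: Sequence[float]) -> float:
    """Monte Carlo error of the pooled estimate: sqrt(B / m).

    `estimates` holds one coefficient per imputed dataset; B is their
    sample variance (the between-imputation variance).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    return math.sqrt(variance(estimates) / m)


def mc_error_for_m(between_variance: float, m: int) -> float:
    """Expected Monte Carlo error for a given B if m imputations are used."""
    return math.sqrt(between_variance / m)


# Hypothetical coefficients from five imputed datasets.
coefficients = [0.10, 0.12, 0.08, 0.11, 0.09]
error_m5 = monte_carlo_error(coefficients)

# Holding B fixed, compare five imputations with fifty.
b = variance(coefficients)
ratio = mc_error_for_m(b, 5) / mc_error_for_m(b, 50)

print(error_m5, ratio)
```

Under this approximation the error shrinks with the square root of m, so moving from 5 to 50 imputations reduces it by a factor of about 3.2 rather than 10.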
Those using antihypertensive medication were included in the development of the risk algorithm by Hippisley-Cox and colleagues and this may have confounded the observed relationship between BP variability and outcomes.[4] Patterns of adherence to medication could also be expected to affect observed variation in blood pressure,[5] as could changing medications over time. Both of these are likely to be associated with cardiovascular risk, but the authors do not give details of the timing of measurements with respect to medication change.
The authors conclude that the model incorporating variability is “preferred” even though the reported c-statistics for models with and without variability are identical to three decimal places, and the few individuals whose risk would be classified differently are those whose risk is very close to the decision threshold according to either model.
Finally, the true performance of these new models cannot be reliably determined until they have undergone independent external validation in other populations. We look forward to future validation studies to confirm the results presented by Hippisley-Cox and colleagues and further studies that more carefully consider the utility of BP variability in cardiovascular risk prediction.
Yours faithfully,
Sarah L Stevens and Richard J McManus
Competing interests: SS and the work described in this rapid response are funded by the National Institute for Health Research School for Primary Care Research (NIHR SPCR). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health. RM has received grants and personal fees from Omron and grants from Lloyds Pharmacy, outside the described work.
1 Stevens SL, Wood S, Koshiaris C, et al. Blood pressure variability and cardiovascular disease: systematic review and meta-analysis. BMJ 2016;354:i4098. doi:10.1136/bmj.i4098
2 Rothwell PM, Howard SC, Dolan E, et al. Prognostic significance of visit-to-visit variability, maximum systolic blood pressure, and episodic hypertension. Lancet 2010;375:895–905. doi:10.1016/S0140-6736(10)60308-X
3 White IR, Royston P, Wood AM. Multiple imputation using chained equations: Issues and guidance for practice. Stat Med 2011;30:377–99.
4 Rothwell PM, Howard SC, Dolan E, et al. Effects of β blockers and calcium-channel blockers on within-individual variability in blood pressure and risk of stroke. Lancet Neurol 2010;9:469–80. doi:10.1016/S1474-4422(10)70066-1
5 Muntner P, Levitan EB, Joyce C, et al. Association Between Antihypertensive Medication Adherence and Visit-to-Visit Variability of Blood Pressure. J Clin Hypertens 2013;15:112–7.
Hari Seldon was one of Isaac Asimov’s science fiction characters, famed for developing Psychohistory, an algorithmic way to predict society’s future through statistical ‘laws’ derived from ‘big data’. Using his algorithms, Seldon predicted the future of the Galactic Empire, provided that two conditions were met. Firstly, the population whose behaviour was modeled had to be sufficiently large; secondly, citizens should not be told the results of their psychohistorical analyses so as to prevent "Prediction Paradox" (predictions influencing behaviours that in turn invalidate predictions). So, Seldon has become a cautionary icon of Big Data research [1].
In the real world, ‘big data’ are widely used to predict the risks of adverse health events over the life courses of patients. The risk models are typically developed using data from dedicated cohort studies (e.g. Framingham [2]) or naturalistic cohorts derived from electronic health records (e.g. QRISK from QResearch [3–5]). Such models are used to support decisions about: the care of individual patients; the management and funding of healthcare systems; and the prevention of disease in populations.
Last month witnessed the publication of QRISK3, the third in a series of cardiovascular risk prediction algorithms [5]. The first QRISK model was published in 2007 and was followed by an updated model (QRISK2) in 2008 which included additional risk factors. Since then, QRISK2 has been updated annually and recalibrated to the latest version of the QResearch database. QRISK2 is used across England’s health service (NHS England) and recommended in the NHS Quality and Outcomes Framework, in guidance from the National Institute for Health and Care Excellence, and in the NHS Health Check.
The newly developed QRISK3 includes new risk factors such as an expanded definition of chronic kidney disease; migraine; corticosteroid use; systemic lupus erythematosus; atypical antipsychotic use; severe mental illness; erectile dysfunction; and a measure of blood pressure variability. It was also derived from a larger dataset than its predecessor, describing 7.89 million patients across 1309 English general practices. While all new risk factors proved to be statistically significant contributors to risk prediction, no improvement in either model discrimination or explained variation was found.
Patients were included in the derivation dataset if they were registered with the practices between 1 January 1998 and 31 December 2015, free of cardiovascular disease, and not prescribed statins at baseline. Interestingly, these 18 years have witnessed dramatic improvements in primary prevention of cardiovascular disease due to increased awareness among clinicians in both primary and secondary care; legislation banning smoking in enclosed public places and the workplace; financial incentivisation through the Quality and Outcomes Framework; introduction of the NHS Health Check; lower treatment thresholds; and widespread use of preventative treatments such as statins. Both the incidence of cardiovascular events and the associated mortality have dropped substantially in this period.[6] A major contribution to these improvements has come from the use of the QRISK2 model in primary care. As such, QRISK has created its own Prediction Paradox.
The study population included in the QRISK3 development will not have been naïve with respect to CVD interventions. Patients classified as high risk in this population will include predominantly individuals for whom risk was not adequately recognised in the past and so not treated. Except for the new risk factors (~2% of population) this is likely to be a small and diminishing group: QRISK2 has been routinely used from 2008. It may also include patients for whom risk was recognised and treated at some point in time, but they failed to respond to treatment (or were non-adherent). Conversely, individuals from this population will be classified as low risk if their risk was adequately recognised and treated in the past, and they responded to treatment. This includes the increasing group of patients in whom risk was identified with QRISK2 during 2008-2015 and were subsequently treated. So, when QRISK3 is implemented in the future similar patients might be excluded from the treatment that is required to bring about their low risk. Importantly, excluding patients taking statins at baseline does not overcome this issue, since it does not adjust for ‘treatment drop-in’ – where patients commence the statins during follow-up.
A dataset of 2.67 million patients from 328 separate practices, collected over the same time frame as the derivation set, was used to validate the QRISK3 model. This validation shows how the model would have performed if it had existed and had been used between 1 January 1998 and 31 December 2015. It provides no insights into the effects of the Prediction Paradox because it is caught in its own circular reasoning. Nor does it provide evidence that the model performs well in contemporary practice, a situation exacerbated by the index date definition meaning that the most common index date is 1 January 1998 – over 19 years ago. The effects of the Prediction Paradox, and performance of QRISK3 in today’s patients, could only be shown in a prospective validation.
We caution against a potential “Asimov scenario” whereby a series of clinical predictive models inherit the effects of their predecessors in a singularity of uncertainty.
References
1 Boellstorff T. Making big data, in theory. First Monday 2013;18. doi:10.5210/fm.v18i10.4869
2 Wilson P, D’Agostino R, Levy D, et al. Prediction of coronary heart disease using risk factor categories. Circulation 1998.
3 Hippisley-Cox J, Coupland C, Vinogradova Y, et al. Derivation and validation of QRISK, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study. BMJ 2007;335:136. doi:10.1136/bmj.39261.471806.55
4 Hippisley-Cox J, Coupland C, Vinogradova Y, et al. Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2. BMJ 2008;336.
5 Hippisley-Cox J, Coupland C, Brindle P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ 2017;357.
6 Bhatnagar P, Wickramasinghe K, Williams J, et al. The epidemiology of cardiovascular disease in the UK 2014. Heart 2015;101:1182–9. doi:10.1136/heartjnl-2015-307516
Competing interests: No competing interests
We welcome the publication of the QRisk3 score which includes variables for both severe mental illness (SMI) and antipsychotics. We have published a range of work regarding SMI and cardiovascular disease, including research showing that existing cardiovascular risk scores may perform less well for people with SMI (1). This is important since people with SMI continue to suffer elevated rates of CVD compared to the general population.
QRisk3 defines SMI similarly to our own work and to NICE, to include schizophrenia, bipolar disorder and other psychoses. However, we are interested in the high prevalence of SMI in the QRisk3 cohort, which is 4.3% for men and 6.8% for women. This is far higher than the usual community rates of SMI, the SMI rates in other UK primary care database studies or the rates quoted in NICE indicators (namely 0.5-2%) (1-3).
Conversely, the levels of second generation antipsychotic prescribing in the QRisk3 cohort (0.5%) are more in keeping with published SMI rates, although it is of note that around half of UK antipsychotics are prescribed to people without SMI diagnoses (4).
It would be helpful if the authors clarified the diagnoses included within their SMI category, so that the correct conditions can be included when applying and assessing the new QRisk3 score.
Prof David Osborn
Dr Kate Walters
UCL
1. Osborn, D. P., Hardoon, S., Omar, R. Z., Holt, R. I., King, M., Larsen, J., . . . Petersen, I. (2015). Cardiovascular risk prediction models for people with severe mental illness: results from the prediction and management of cardiovascular risk in people with severe mental illnesses (PRIMROSE) research program. JAMA Psychiatry, 72 (2), 143-151. doi:10.1001/jamapsychiatry.2014.2133
2. Hardoon, S., Hayes, J. F., Blackburn, R., Petersen, I., Walters, K., Nazareth, I., & Osborn, D. P. J. (2013). Recording of Severe Mental Illness in United Kingdom Primary Care, 2000-2010. PLoS One.
3. National Institute for Clinical Excellence. Standards and Indicators. NM120. (2015)
https://www.nice.org.uk/standards-and-indicators/qofindicators/the-perce... (accessed May 24th 2017).
4. Marston, L., Nazareth, I., Petersen, I., Walters, K., & Osborn, D. P. (2014). Prescribing of antipsychotics in UK primary care: a cohort study. BMJ Open, 4 (12), e006135. doi:10.1136/bmjopen-2014-006135
Competing interests: DO and KW have received grant funding from the NIHR and MRC related to the assessment and management of cardiovascular risk in people with SMI.
Qrisk: are we overstating patients' risk of "heart attack or stroke" by 50%?
Dear Editor
J Hippisley-Cox and colleagues helpfully provide an appendix of the Read codes forming the basis for the Qrisk calculation [1], essentially comprising MI, CVA, angina and TIA.
In view of these codes, it is puzzling to see that Qrisk is widely described as giving a risk of "heart attack or stroke", disregarding the somewhat less consequential diagnoses of angina and TIA.
Examples of the incomplete description include the Qrisk website itself [2]; publications by NICE [3], Public Health England [4] and the lay press [5]; and numerous UK GP surgery websites.
I don't have access to the Qrisk data, but a glance at UK incidence figures would imply a "best guess" that a Qrisk of 10% would comprise approximately:
- 5% incidence of MI
- 2% incidence of CVA
- 2% incidence of angina
- 1% incidence of TIA
If so, a Qrisk of 10% implies a risk of "heart attack or stroke" of around 7%. Patients therefore seem to be receiving an estimate of their "heart attack or stroke" risk which is relatively overstated by up to 50%.
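The arithmetic behind this estimate can be sketched as follows. The component figures are the "best guess" values listed above, not Qrisk data, and the names used are illustrative.

```python
# Best-guess split of a 10% Qrisk score, in percentage points,
# as listed in the letter (not derived from Qrisk data).
components = {"MI": 5, "CVA": 2, "angina": 2, "TIA": 1}

# Total risk across all four coded outcomes.
total_risk = sum(components.values())

# Risk restricted to "heart attack or stroke" (MI and CVA only).
heart_attack_or_stroke = components["MI"] + components["CVA"]

# How much larger the quoted figure is, relative to the narrower risk.
relative_overstatement = (total_risk - heart_attack_or_stroke) / heart_attack_or_stroke

print(total_risk, heart_attack_or_stroke, round(100 * relative_overstatement, 1))
```

With these figures the relative overstatement is 3/7, or roughly 43%, which sits within the "up to 50%" bound stated above.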
This suggests a widespread need to improve communication of Qrisk results.
References
1. https://www.bmj.com/content/bmj/suppl/2017/05/23/bmj.j2099.DC1/hipj03651...
2. https://qrisk.org/three/
3. https://www.nice.org.uk/news/article/nice-recommends-wider-use-of-statin...
4. https://www.healthcheck.nhs.uk/seecmsfile/?id=1687
5. https://www.theguardian.com/society/2022/jan/17/nhs-pilots-genetic-testi...
Competing interests: No competing interests