Hari Seldon, QRISK3, and the Prediction Paradox
Hari Seldon is one of Isaac Asimov’s science fiction characters, famed for developing Psychohistory, an algorithmic way to predict society’s future through statistical ‘laws’ derived from ‘big data’. Using his algorithms, Seldon predicted the future of the Galactic Empire, provided that two conditions were met. First, the population whose behaviour was modelled had to be sufficiently large; second, its members must not be told the results of the psychohistorical analyses, so as to prevent the ‘Prediction Paradox’ (predictions influencing behaviours that in turn invalidate the predictions). Seldon has thus become a cautionary icon of big data research [1].
In the real world, ‘big data’ are widely used to predict the risks of adverse health events over patients’ life courses. The risk models are typically developed using data from dedicated cohort studies (e.g. Framingham [2]) or from naturalistic cohorts derived from electronic health records (e.g. QRISK from QResearch [3–5]). Such models support decisions about the care of individual patients, the management and funding of healthcare systems, and the prevention of disease in populations.
Last month saw the publication of QRISK3, the third in a series of cardiovascular risk prediction algorithms [5]. The first QRISK model was published in 2007 and was followed in 2008 by an updated model (QRISK2), which included additional risk factors. Since then, QRISK2 has been updated annually and recalibrated to the latest version of the QResearch database. QRISK2 is used across England’s health service (NHS England) and is recommended in the NHS Quality and Outcomes Framework, in guidance from the National Institute for Health and Care Excellence, and in the NHS Health Check.
The newly developed QRISK3 includes new risk factors such as an expanded definition of chronic kidney disease, migraine, corticosteroid use, systemic lupus erythematosus, atypical antipsychotic use, severe mental illness, erectile dysfunction, and a measure of blood pressure variability. It was also derived from a larger dataset than its predecessor, comprising 7.89 million patients from 1309 English general practices. Although all the new risk factors were statistically significant predictors, they produced no improvement in either model discrimination or explained variation.
Patients were included in the derivation dataset if they were registered with the practices between 1 January 1998 and 31 December 2015, were free of cardiovascular disease, and were not prescribed statins at baseline. Interestingly, these 18 years have witnessed dramatic improvements in the primary prevention of cardiovascular disease, owing to greater awareness among clinicians in both primary and secondary care; legislation banning smoking in enclosed public places and workplaces; financial incentives through the Quality and Outcomes Framework; the introduction of the NHS Health Check; lower treatment thresholds; and widespread use of preventive treatments such as statins. Both the incidence of cardiovascular events and the associated mortality have fallen substantially over this period [6]. The use of the QRISK2 model in primary care has made a major contribution to these improvements. In this sense, QRISK has created its own Prediction Paradox.
The study population used to develop QRISK3 will not have been naïve to cardiovascular interventions. Patients classified as high risk in this population will predominantly be individuals whose risk was not adequately recognised in the past, and who were therefore not treated. Apart from those identified through the new risk factors (~2% of the population), this is likely to be a small and diminishing group, because QRISK2 has been in routine use since 2008. The high-risk group may also include patients whose risk was recognised and treated at some point but who failed to respond to treatment (or were non-adherent). Conversely, individuals in this population will be classified as low risk if their risk was adequately recognised and treated in the past and they responded to treatment. This includes the growing group of patients whose risk was identified with QRISK2 during 2008–2015 and who were subsequently treated. Consequently, when QRISK3 is implemented, similar patients might be excluded from the very treatment that produced their low risk. Importantly, excluding patients taking statins at baseline does not overcome this problem, because it does not adjust for ‘treatment drop-in’, that is, patients starting statins during follow-up.
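The mechanism above can be made concrete with a toy calculation. This is purely illustrative: the 25% relative risk reduction, the 10% treatment threshold, the 80% drop-in share and the names `Stratum`, `observed_risk` and `recommend_treatment` are hypothetical assumptions, not QRISK3 parameters. The sketch also assumes an idealised model that learns exactly the average risk observed in each stratum.

```python
from dataclasses import dataclass

# All parameter values below are hypothetical, chosen only for illustration.
RELATIVE_RISK_REDUCTION = 0.25  # assumed effect of statin treatment
TREATMENT_THRESHOLD = 0.10      # assumed 10-year risk threshold for offering treatment


@dataclass
class Stratum:
    untreated_risk: float  # true 10-year risk if nobody in the stratum is treated
    drop_in_share: float   # share who start a statin after baseline (treatment drop-in)


def observed_risk(s: Stratum) -> float:
    """Average 10-year risk actually seen in the data, after drop-in treatment."""
    treated = s.drop_in_share * s.untreated_risk * (1 - RELATIVE_RISK_REDUCTION)
    untreated = (1 - s.drop_in_share) * s.untreated_risk
    return treated + untreated


def recommend_treatment(predicted_risk: float) -> bool:
    """Hypothetical decision rule: treat when predicted risk reaches the threshold."""
    return predicted_risk >= TREATMENT_THRESHOLD


# A stratum that genuinely needs treatment, observed in an era of widespread drop-in.
stratum = Stratum(untreated_risk=0.12, drop_in_share=0.8)
learned = observed_risk(stratum)  # what an idealised model fitted to these outcomes would estimate

print(f"untreated risk {stratum.untreated_risk:.3f}, learned risk {learned:.3f}")
print("treat on untreated risk:", recommend_treatment(stratum.untreated_risk))
print("treat on learned risk:", recommend_treatment(learned))
```

Under these assumptions the learned risk is 0.096, below the assumed threshold, so the rule would withhold the treatment that produced the low observed risk. Because the drop-in happens after baseline, excluding baseline statin users would leave this stratum unchanged.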
A dataset of 2.67 million patients from 328 separate practices, collected over the same period as the derivation dataset, was used to validate QRISK3. This validation shows how the model would have performed had it existed and been used between 1 January 1998 and 31 December 2015. It provides no insight into the effects of the Prediction Paradox, because the validation cohort shares the derivation cohort’s treatment history and the reasoning is therefore circular. Nor does it provide evidence that the model performs well in contemporary practice. This limitation is exacerbated by the definition of the index date, under which the most common index date is 1 January 1998, over 19 years ago. The effects of the Prediction Paradox, and the performance of QRISK3 in today’s patients, can only be established through prospective validation.
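A similar toy calculation suggests why a same-era validation can look well calibrated while prospective use would not. It rests on hypothetical assumptions: a 25% relative risk reduction, an 80% drop-in share in the historical data, and, once the model guides care, no treatment started below an assumed 10% threshold. The ratio of observed to expected risk (O/E) serves as a simple calibration summary; the function names are illustrative only.

```python
# Toy calibration check: every number here is a hypothetical assumption.
RELATIVE_RISK_REDUCTION = 0.25  # assumed effect of statin treatment
TREATMENT_THRESHOLD = 0.10      # assumed risk threshold for starting treatment


def observed_risk(untreated_risk: float, drop_in_share: float) -> float:
    """Average risk seen when a share of the stratum starts treatment during follow-up."""
    return untreated_risk * (1 - drop_in_share * RELATIVE_RISK_REDUCTION)


def observed_to_expected(untreated_risk: float, drop_in_share: float, predicted: float) -> float:
    """O/E ratio: 1.0 means the prediction matches the risk actually observed."""
    return observed_risk(untreated_risk, drop_in_share) / predicted


UNTREATED_RISK = 0.12
HISTORICAL_DROP_IN = 0.8

# Model learned from historical outcomes, in which most of this stratum was treated.
predicted = observed_risk(UNTREATED_RISK, HISTORICAL_DROP_IN)

# Same-era validation: identical treatment history, so calibration looks perfect.
same_era = observed_to_expected(UNTREATED_RISK, HISTORICAL_DROP_IN, predicted)

# Prospective use: the model now guides care, so below the threshold nobody starts a statin.
future_drop_in = HISTORICAL_DROP_IN if predicted >= TREATMENT_THRESHOLD else 0.0
prospective = observed_to_expected(UNTREATED_RISK, future_drop_in, predicted)

print(f"predicted risk {predicted:.3f}")
print(f"O/E same era: {same_era:.2f}, O/E prospective: {prospective:.2f}")
```

With these assumed values the script reports an O/E of 1.00 in the same-era validation but 1.25 prospectively: the under-prediction only becomes visible once the model itself changes who is treated.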
We caution against a potential ‘Asimov scenario’, whereby a series of clinical prediction models inherits the effects of its predecessors in a singularity of uncertainty.
1 Boellstorff T. Making big data, in theory. First Monday 2013;18. doi:10.5210/fm.v18i10.4869
2 Wilson P, D’Agostino R, Levy D, et al. Prediction of coronary heart disease using risk factor categories. Circulation 1998;97:1837–47.
3 Hippisley-Cox J, Coupland C, Vinogradova Y, et al. Derivation and validation of QRISK, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study. BMJ 2007;335:136. doi:10.1136/bmj.39261.471806.55
4 Hippisley-Cox J, Coupland C, Vinogradova Y, et al. Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2. BMJ 2008;336:1475–82.
5 Hippisley-Cox J, Coupland C, Brindle P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ 2017;357:j2099. doi:10.1136/bmj.j2099
6 Bhatnagar P, Wickramasinghe K, Williams J, et al. The epidemiology of cardiovascular disease in the UK 2014. Heart 2015;101:1182–9. doi:10.1136/heartjnl-2015-307516
Competing interests: No competing interests