
Rapid response to:

Research Methods & Reporting

TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods

BMJ 2024; 385 doi: https://doi.org/10.1136/bmj-2023-078378 (Published 16 April 2024) Cite this as: BMJ 2024;385:e078378

Rapid Response:

Prognostic models for decision support need to report their targeted treatments and the expected changes in treatment decisions

Dear Editor,

TRIPOD+AI ensures that researchers report sufficient information for readers to verify whether a published prediction model is fit for purpose, doing the field a tremendous service by improving standards and reducing research waste. For prognostic models whose primary use is to support clinical decision making, however, we propose that researchers also report on two crucial considerations related to the role of treatments.

First, TRIPOD+AI requires specifying which clinical decisions a model is intended to support (item 3b). We refer to these as the targeted treatment(s). We propose that researchers should additionally report how the predictions depend on these targeted treatment(s). This is crucial because, in the development and evaluation data, some patients typically received the targeted treatment(s), which in turn affected their outcomes.

For example, QRISK is a prediction model for the 10-year risk of cardiovascular disease that is used in UK primary care to support decisions on starting preventive statin therapy [1, 2]. In the data used to develop QRISK, many patients initiated statin therapy during follow-up, which lowered their risk of adverse outcomes. A patient may thus be classified as ‘low risk’ by the model partly because similar patients initiated statins at some point during follow-up. Withholding statins from such patients because they are classified as ‘low risk’ would lead to more adverse outcomes [3-5].
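To make this mechanism concrete, the toy calculation below uses purely hypothetical numbers that are not taken from QRISK or its development data: an assumed 25% relative risk reduction from statins, an assumed 10% decision threshold, and two hypothetical patient groups with the same untreated 10-year risk but different chances of starting statins during follow-up. It computes the risk a model fitted to such historical data would estimate for each group, as a mixture of patients who did and did not start statins.

```python
from dataclasses import dataclass

# All numbers below are hypothetical and chosen only for illustration.
STATIN_RRR = 0.25  # assumed relative risk reduction from statins
THRESHOLD = 0.10   # assumed 10-year risk threshold for starting statins


@dataclass
class PatientGroup:
    name: str
    untreated_risk: float    # 10-year risk if statins are never started
    p_statin_drop_in: float  # probability of starting statins during follow-up


def observed_risk(group: PatientGroup, rrr: float = STATIN_RRR) -> float:
    """Risk a model fitted to historical follow-up data would estimate for
    this group: a mixture of patients who did and did not start statins."""
    treated_risk = group.untreated_risk * (1 - rrr)
    p = group.p_statin_drop_in
    return p * treated_risk + (1 - p) * group.untreated_risk


groups = [
    PatientGroup("often started statins", untreated_risk=0.12, p_statin_drop_in=0.8),
    PatientGroup("rarely started statins", untreated_risk=0.12, p_statin_drop_in=0.1),
]

for g in groups:
    est = observed_risk(g)
    label = "low risk" if est < THRESHOLD else "high risk"
    print(f"{g.name}: untreated risk {g.untreated_risk:.3f}, "
          f"estimated risk {est:.3f} -> {label}")
```

Under these assumptions both groups have the same untreated risk (12%), yet the group that often started statins receives an estimate of about 9.6% and is labelled low risk, while the other group's estimate of about 11.7% is not. The difference reflects historical treatment, not underlying risk.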

TRIPOD+AI states that “any treatments received …” and “how they were handled … (if relevant)” should be reported (item 6c), but it does not distinguish targeted treatments from other treatments. It is pivotal that this distinction be made; otherwise, researchers could misunderstand this item as simply a call to describe the target population. Making the role of targeted treatments explicit allows readers to verify whether the study design and analysis make it valid to use the prediction model for clinical decision making as intended, and thereby lowers the risk of unintended errors such as those that may occur with QRISK.

Second, since the clinical impact of a prognostic model is determined by how it changes clinical decisions, authors should also describe for which types of patients treatment decisions are likely to change when using the model, and in what way.

As an example of how things can go wrong, Cooper and colleagues described an algorithm proposed to support the decision on which patients with pneumonia could safely be treated as outpatients and which should be admitted to hospital, based on their predicted mortality risk [6]. Crucially, patients with asthma historically had lower mortality because of the effective treatment they received in hospital. The model trained on these historical data therefore suggested that patients with asthma could safely be treated as outpatients, a potentially unsafe recommendation [7]. The model’s detrimental impact could have been spotted if the authors had been required to report how the targeted treatment (hospitalization) was allocated in the past, and how it would be re-allocated (patients with asthma would be less likely to be hospitalized) if the model were used for decision support.
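The kind of reporting we propose can be sketched as a simple tabulation. All inputs below are hypothetical and are not taken from the data of Cooper and colleagues [6]: we assume, for patients with and without asthma, a mortality under outpatient care, a historical admission rate, a relative mortality under hospital care, and a 5% admission threshold. For each group, the sketch shows the historical admission rate, the risk a model fitted to the historical data would learn, the decision a threshold rule on that risk would make, and the mortality expected once care follows the model.

```python
from dataclasses import dataclass

# Hypothetical inputs for illustration only.
ADMIT_THRESHOLD = 0.05  # assumed: admit if predicted mortality >= 5%


@dataclass
class Group:
    name: str
    outpatient_mortality: float     # mortality if treated as an outpatient
    p_admitted_historically: float  # share admitted in the historical data
    hospital_relative_risk: float   # mortality multiplier when admitted


def learned_mortality(g: Group) -> float:
    """Mortality a model fitted to historical data would learn for the group,
    averaging over patients who were and were not admitted."""
    admitted = g.outpatient_mortality * g.hospital_relative_risk
    p = g.p_admitted_historically
    return p * admitted + (1 - p) * g.outpatient_mortality


def mortality_under_policy(g: Group, admit: bool) -> float:
    """Expected mortality if everyone in the group follows the model's advice."""
    if admit:
        return g.outpatient_mortality * g.hospital_relative_risk
    return g.outpatient_mortality


groups = [
    Group("asthma", outpatient_mortality=0.15,
          p_admitted_historically=1.0, hospital_relative_risk=0.2),
    Group("no asthma", outpatient_mortality=0.08,
          p_admitted_historically=0.5, hospital_relative_risk=0.6),
]

for g in groups:
    risk = learned_mortality(g)
    admit = risk >= ADMIT_THRESHOLD
    advice = "admission" if admit else "outpatient care"
    print(f"{g.name}: admitted historically {g.p_admitted_historically:.0%}, "
          f"learned mortality {risk:.3f}, model advises {advice}, "
          f"mortality under model-guided care {mortality_under_policy(g, admit):.3f}")
```

Under these assumed numbers, patients with asthma have the lowest learned mortality (3%) only because all of them were admitted historically. A threshold rule would move them to outpatient care, where their assumed mortality is 15%. Reporting the historical allocation of the targeted treatment next to its expected re-allocation is what makes this shift visible.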

We believe our proposed additions to the checklist, with an initial formulation of the items provided in a supplement (https://doi.org/10.5281/zenodo.11193841), would cement the role of TRIPOD+AI as a cornerstone of transparent model reporting. More broadly, incorporating causal thinking into the development and evaluation of prediction models could further mitigate the problems raised above by rigorously addressing the role of the targeted treatments both before and after model deployment [8-12].

References
[1] Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, May M, Brindle P. Derivation and validation of QRISK, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study. BMJ. 2007;335(7611):136.

[2] Hippisley-Cox J, Coupland CA, Bafadhel M, Russell RE, Sheikh A, Brindle P, Channon KM. Development and validation of a new algorithm for improved cardiovascular risk prediction. Nature Medicine. 2024:1–8.

[3] Peek N, Sperrin M, Mamas M, van Staa T, Buchan I. Hari Seldon, QRISK3, and the prediction paradox. BMJ. 2017;357:2099.

[4] Sperrin M, Martin GP, Pate A, Van Staa T, Peek N, Buchan I. Using marginal structural models to adjust for treatment drop-in when developing clinical prediction models. Statistics in Medicine. 2018;37:4142–4154.

[5] Xu Z, Arnold M, Stevens D, et al. Prediction of cardiovascular disease risk accounting for future initiation of statin treatment. American Journal of Epidemiology. 2021;190:2000–2014.

[6] Cooper GF, Aliferis CF, Ambrosino R, Aronis J, Buchanan BG, Caruana R, et al. An evaluation of machine-learning methods for predicting pneumonia mortality. Artificial Intelligence in Medicine. 1997;9(2):107–138.

[7] Caruana R, Lou Y, Gehrke J, Koch P, Sturm M, Elhadad N. Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. August 2015:1721–1730.

[8] van Amsterdam WAC, van Geloven N, Krijthe JH, Ranganath R, Cinà G. When accurate prediction models yield harmful self-fulfilling prophecies. arXiv:2312.01210 [preprint]. 2024. https://doi.org/10.48550/arXiv.2312.01210

[9] van Amsterdam WAC, de Jong PA, Verhoeff JJC, Leiner T, Ranganath R. From algorithms to action: improving patient care requires causality. BMC Medical Informatics and Decision Making. 2024;24(1). https://doi.org/10.1186/s12911-024-02513-3

[10] Dickerman BA, Dahabreh IJ, Cantos KV, Logan RW, Lodi S, Rentsch CT, Justice AC, Hernán MA. Predicting counterfactual risks under hypothetical treatment strategies: an application to HIV. European Journal of Epidemiology. 2022;37(4):367–376. https://doi.org/10.1007/s10654-022-00855-8

[11] van Geloven N, Swanson SA, Ramspek CL, Luijken K, van Diepen M, Morris TP, et al. Prediction meets causal inference: the role of treatment in clinical prediction models. European Journal of Epidemiology. 2020;35:619–630.

[12] van Geloven N, Keogh RH, van Amsterdam W, Cinà G, Krijthe JH, Peek N, et al. Causal blind spots when using prediction models for treatment decisions. arXiv:2402.17366 [preprint]. 2024.

Competing interests: No competing interests

15 May 2024
Wouter A.C. van Amsterdam
Assistant Professor
Giovanni Cinà, Vanessa Didelez, Ruth Keogh, Niels Peek, Matthew Sperrin, Andrew Vickers, Nan van Geloven, Uri Shalit
University Medical Center Utrecht
Heidelberglaan 100, Utrecht