Commentary: Prognostic models: clinically useful or quickly forgotten?
BMJ 1995;311:1539 doi: https://doi.org/10.1136/bmj.311.7019.1539 (Published 9 December 1995)
- a Medical Informatics, Biomedical Informatics Unit, Imperial Cancer Research Fund, PO Box 123, London WC2A 3PX
- b ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute for Health Sciences, Oxford OX3 7LF
We are all familiar with using single items of patient data such as age or smoking history to help in making difficult clinical decisions.1 Prognostic models are more complex tools for helping decision making that combine two or more items of patient data to predict clinical outcomes.2 They are of potential value when doctors are making difficult clinical decisions (such as ordering invasive tests or selecting which patients should benefit from scarce resources),3 conducting comparative audit,4 or selecting uniform groups of patients for clinical trials.5 Another prognostic model appears in this week's BMJ6 to join the hundreds published every year.7 However, apart from exceptions such as the Glasgow coma scale8 and APACHE III,9 few of these models are routinely used to inform difficult clinical decisions.
It might be argued that doctors never prognosticate, working always in the present, but studies of medical decision making show this is untrue.10 11 Some doctors might claim that they can foretell patient outcomes better than any statistical model, but again there is contrary evidence.9 12 A journal editor's view might be that most published models reflect preliminary work and need further research before clinical adoption.13 Finally, some models predict events that are of no clinical relevance or do not generate predictions in time to inform clinical decisions, suggesting that their developers wished merely to publish journal articles, not build clinically useful tools.14
We believe that the main reasons why doctors reject published prognostic models are lack of clinical credibility and lack of evidence that a prognostic model can support decisions about patient care (that is, evidence of accuracy, generality, and effectiveness). We examine each of these issues in turn.
Clinical credibility of the model
However accurate a model is in statistical terms, doctors will be reluctant to use it to inform their patient management decisions unless they believe in the model and its predictions. Some prerequisites for clinical credibility include:
All clinically relevant patient data should have been tested for inclusion in the model. For example, it would be foolish to omit smoking history from the variables screened when constructing a model to predict the probability of acute myocardial infarction
It should be simple for doctors to obtain all the patient data required, reliably and without expending undue resources, in time to generate the prediction and guide decisions. Data should be obtainable with high reliability, particularly in those patients for whom the model's predictions are most likely to be needed. This is not always the case15
Model builders should try to avoid arbitrary thresholds for continuous variables. For example, it is unlikely that the prognosis for a woman with an ulcerated melanoma 3.9 mm thick would be very different if it were 4 mm thick. (Aitchison et al discuss this issue in their paper.6)
The model's structure should be apparent and its predictions should make sense to the doctors who will rely on them, as only then will the law treat users of the model as “learned intermediaries” in a case of alleged negligence.16 This requirement means that the statistical modelling method must be correctly applied and not transgress the method's assumptions; these were checked in only a fifth of one series of published models.17 It also makes “black box” models such as artificial neural networks less suitable for clinical applications18
It should be simple for doctors to calculate the model's prediction for a patient. Thus, a model which takes the form of a printed tree6 or clinical algorithm19 will probably be used more often than one that requires data to be entered into a computer for complex calculations.20
Evidence of accuracy
A prognostic model is unlikely to be useful unless its predictions are at least as accurate as those of the doctors who would use it. A low error rate alone, however, is not enough, as there are two kinds of errors with different clinical consequences. Thus, the model should rarely fail to predict an event that will occur (have a low false negative rate) and also seldom mistakenly predict it when it will not occur (have a low false positive rate). These error rates should be checked on a large test set of cases in which occurrence or absence of the event has already been reliably determined21 and which has not been used to derive the model.22 “Large” means that there should be at least five22 to ten23 test cases in which the outcome being predicted (such as death) occurs per item of clinical data used to predict the outcome.
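To make the two error rates and the events-per-variable rule concrete, here is a minimal Python sketch on invented data; the test set, the predictions, and the choice of three predictors are illustrative assumptions, not taken from any published model.

```python
# A minimal sketch, on invented data, of the two error rates and the
# events-per-variable rule discussed above.

def error_rates(observed, predicted):
    """observed, predicted: lists of 0/1 values (1 = event occurred / event predicted)."""
    events = sum(observed)
    non_events = len(observed) - events
    false_negatives = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
    false_positives = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    return false_negatives / events, false_positives / non_events

def events_per_predictor(observed, n_predictors):
    """Outcome events in the test set per candidate predictor (the text suggests 5-10)."""
    return sum(observed) / n_predictors

# Invented held-out test set: 1 = patient died, 0 = survived.
observed  = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

fnr, fpr = error_rates(observed, predicted)
print(f"false negative rate = {fnr:.2f}, false positive rate = {fpr:.2f}")
print(f"events per predictor = {events_per_predictor(observed, n_predictors=3):.1f}")
```

In practice such rates would, of course, be estimated on an independent test set of the size the text describes, not on ten invented cases.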
If the model predicts an outcome with a probability (for example, “40% mortality”), the probability should also be accurate, meaning that about 40% of the patients for whom such a prediction is made do in fact die.24 Few models have been shown to be both accurate and “well calibrated” in this way (examples include Franklin et al,19 Murray et al,25 and the MRC Antiepileptic Drug Withdrawal Study Group26), but both kinds of accuracy are necessary if doctors are to rely on predictions when making difficult decisions.
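Calibration can be checked in a similarly simple way. The sketch below, again on invented data, groups hypothetical predicted probabilities of death into broad risk bands and compares the mean prediction in each band with the observed death rate; the cut points and the data are assumptions for illustration only.

```python
# A minimal sketch, on invented data, of a calibration check: within each risk band
# the mean predicted probability should roughly match the observed event rate.
from collections import defaultdict

def band(prob):
    """Assign a predicted probability to a broad risk band (illustrative cut points)."""
    if prob < 0.3:
        return "low (<30%)"
    if prob < 0.6:
        return "medium (30-60%)"
    return "high (>=60%)"

# Invented (predicted probability of death, actually died?) pairs.
cases = [(0.10, 0), (0.15, 0), (0.40, 1), (0.42, 0), (0.38, 0),
         (0.45, 1), (0.80, 1), (0.85, 1), (0.78, 0), (0.90, 1)]

groups = defaultdict(list)
for prob, died in cases:
    groups[band(prob)].append((prob, died))

for name, group in groups.items():
    mean_predicted = sum(p for p, _ in group) / len(group)
    observed_rate = sum(d for _, d in group) / len(group)
    print(f"{name}: mean predicted {mean_predicted:.0%}, observed {observed_rate:.0%}, n={len(group)}")
```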
Evidence of generality
Some doctors believe that no prognostic model derived from one population can be generalised to patients drawn from another,27 in the same way that some deny that clinical trials or overviews can inform individual decisions about treatment. We and others21 22 believe that model predictions cannot be safely applied to other patients unless:
There has been separate testing of the model, at another time and place, on a new test set. In view of the established difficulty of transferring prognostic models,28 it is surprising that such follow up testing is seldom performed. Use of statistical techniques such as shrinkage of the regression coefficients29 to correct for overoptimism in the model (for example, MRC Antiepileptic Drug Withdrawal Study Group26) may help to make models more transferable (a brief sketch of shrinkage follows this list)
Each item of data used as input to the model has been defined clearly using definitions that reflect widespread use.1 Even apparently obvious items like age need clarification: does this mean the patient's age at the start of symptoms, at diagnosis, or when the prognosis is required?
The model has been derived and validated using a defined “inception cohort” of patients21 whose selection criteria describe the patients to which doctors may safely apply the model. This implies that both development and testing of models should take place prospectively according to a protocol,30 not retrospectively using existing databases with all their biases.31
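As an illustration of the shrinkage mentioned in the first point of this list, the sketch below applies a uniform heuristic shrinkage factor, in the spirit of van Houwelingen and le Cessie, to a set of invented logistic regression coefficients; the numbers and the factor are assumptions for illustration, not the method used by any of the cited models.

```python
# A minimal sketch, with invented numbers, of uniform shrinkage of regression
# coefficients to correct for over-optimism before a model is applied elsewhere.

# Logistic regression coefficients from a hypothetical development data set.
coefficients = {"age (per year)": 0.040, "ulceration": 0.90, "thickness (per mm)": 0.35}

# Heuristic shrinkage factor: (model chi-square - number of predictors) / model chi-square.
model_chi_square = 30.0           # invented likelihood ratio chi-square for the model
n_predictors = len(coefficients)
shrinkage = (model_chi_square - n_predictors) / model_chi_square

shrunken = {name: value * shrinkage for name, value in coefficients.items()}
print(f"shrinkage factor = {shrinkage:.2f}")
for name, value in coefficients.items():
    print(f"{name}: {value:.3f} -> {shrunken[name]:.3f}")
# The intercept would then be re-estimated so that average predictions remain calibrated.
```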
Evidence of clinical effectiveness
No one would prescribe drugs on the basis of in vitro testing alone: clinical trials are vital to evaluate safety and efficacy. Even in the case of an accurate, credible prognostic model, doctors should demand empirical evidence from well conducted clinical trials that the model is clinically effective. These trials should be controlled studies in which the effects of providing predictions on actual clinical practices or patient outcomes are measured. The studies must be carefully designed to eliminate biases such as the checklist effect, Hawthorne effect, and contamination.32 Although such studies require considerable resources, they are as necessary for prognostic models as are phase 3 randomised trials for drugs.33 34 There have been few trials of prognostic models (for example, Murray et al35 and Goldman et al36). There have, however, been at least 28 trials of computerised decision support systems, a similar technology, and an overview of these trials showed that many had a beneficial impact on clinical practice or patient outcome.2
Conclusions
Since only a handful of prognostic models have yet accrued adequate evidence of accuracy, generality, and effectiveness, doctors' poor uptake of these tools to aid difficult decisions may result from healthy scepticism of undocumented new technology. A recent development that may help to improve clinical acceptability is combining the prognosis generated by the model with the doctor's own estimate of prognosis, since this may improve on the performance of the model alone.37 Another is modification of the model in the light of clinical experience.38
Given the increasing complexity of the methods used to develop prognostic models7 and their potential power to influence clinical decisions,2 it is reassuring that the model described in this week's BMJ was developed and evaluated by statisticians working in close collaboration with doctors.6 This pattern of collaboration and adherence to the principles described in this paper seem likely to yield models that will be useful to doctors making difficult decisions.
JCW acknowledges the many useful insights resulting from collaboration with David Spiegelhalter of the MRC Biostatistics Unit, Cambridge.
Footnotes
- Funding: No additional funding.
- Conflict of interest: None.