BMJ 1995;311:1539-1541 (9 December)

Papers

Commentary: Prognostic models: clinically useful or quickly forgotten?

Jeremy C Wyatt, consultant (a), Douglas G Altman, head (b)

(a) Medical Informatics, Biomedical Informatics Unit, Imperial Cancer Research Fund, PO Box 123, London WC2A 3PX; (b) ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute for Health Sciences, Oxford OX3 7LF

We are all familiar with using single items of patient data such as age or smoking history to help in making difficult clinical decisions.1 Prognostic models are more complex tools that support decision making by combining two or more items of patient data to predict clinical outcomes.2 They are of potential value when doctors are making difficult clinical decisions (such as ordering invasive tests or selecting which patients should benefit from scarce resources),3 conducting comparative audit,4 or selecting uniform groups of patients for clinical trials.5 Another prognostic model appears in this week's BMJ6 to join the hundreds published every year.7 However, apart from exceptions such as the Glasgow coma scale8 and APACHE III,9 few of these models are routinely used to inform difficult clinical decisions.

It might be argued that doctors never prognosticate, working always in the present, but studies of medical decision making show this is untrue.10 11 Some doctors might claim that they can foretell patient outcomes better than any statistical model, but again there is contrary evidence.9 12 A journal editor's view might be that most published models reflect preliminary work and need further research before clinical adoption.13 Finally, some models predict events that are of no clinical relevance or do not generate predictions in time to inform clinical decisions, suggesting that their developers wished merely to publish journal articles, not build clinically useful tools.14

We believe that the main reasons why doctors reject published prognostic models are lack of clinical credibility and lack of evidence that a prognostic model can support decisions about patient care (that is, evidence of accuracy, generality, and effectiveness). We examine each of these issues in turn.

Clinical credibility of the model

However accurate a model is in statistical terms, doctors will be reluctant to use it to inform their patient management decisions unless they believe in the model and its predictions. Some prerequisites for clinical credibility include:

* All clinically relevant patient data should have been tested for inclusion in the model. For example, it would be foolish to omit smoking history from the variables screened when constructing a model to predict the probability of acute myocardial infarction

* It should be simple for doctors to obtain all the patient data required, reliably and without expending undue resources, in time to generate the prediction and guide decisions. Reliability matters most in those patients for whom the model's predictions are most likely to be needed, and this is not always achieved15

* Model builders should try to avoid arbitrary thresholds for continuous variables. For example, it is unlikely that the prognosis for a woman with an ulcerated melanoma 3.9 mm thick would be very different if it were 4 mm thick. (Aitchison et al discuss this issue in their paper.6)

* The model's structure should be apparent and its predictions should make sense to the doctors who will rely on them, as only then will the law treat users of the model as "learned intermediaries" in a case of alleged negligence.16 This requirement means that the statistical modelling method must be correctly applied and not transgress the method's assumptions; these were checked in only a fifth of one series of published models.17 It also makes "black box" models such as artificial neural networks less suitable for clinical applications18

* It should be simple for doctors to calculate the model's prediction for a patient. Thus, a model which takes the form of a printed tree6 or clinical algorithm19 will probably be used more often than one requiring data entry to a computer to make complex calculations.20

Evidence of accuracy

A prognostic model is unlikely to be useful unless its predictions are at least as accurate as those of the doctors who would use it. A low error rate alone, however, is not enough, as there are two kinds of errors with different clinical consequences. Thus, the model should rarely fail to predict an event that will occur (have a low false negative rate) and also seldom mistakenly predict it when it will not occur (have a low false positive rate). These error rates should be checked on a large test set of cases in which occurrence or absence of the event has already been reliably determined21 and which has not been used to derive the model.22 "Large" means that there should be at least five22 to ten23 test cases in which the outcome being predicted (such as death) occurs per item of clinical data used to predict the outcome.

If the model predicts an outcome with a probability (for example, "40% mortality") the probability should also be accurate, meaning that 40% of the patients in whom such a prediction is issued do in fact die.24 Few models have been shown to be both accurate and "well calibrated" in this way (exceptions include Franklin et al,19 Murray et al,25 and the MRC Antiepileptic Drug Withdrawal Study Group26), but both kinds of accuracy are necessary if doctors are to rely on predictions when making difficult decisions.
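Calibration can be checked by grouping patients according to their predicted probability and comparing the average prediction in each group with the proportion who actually had the event. The sketch below assumes ten equal-width probability bands; the function name `calibration_table` and the choice of bands are illustrative assumptions, and other groupings (for example, deciles of predicted risk) would serve equally well.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed event rate, number of patients)
    for each non-empty equal-width probability band, lowest band first."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be the same length")
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        # A prediction of exactly 1.0 goes into the top band.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append(i)
    table = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(probs[i] for i in members) / len(members)
        observed = sum(outcomes[i] for i in members) / len(members)
        table.append((mean_pred, observed, len(members)))
    return table


# Hypothetical example: 10 patients given "40% mortality" (4 die)
# and 10 given "10% mortality" (1 dies).
probs = [0.4] * 10 + [0.1] * 10
outcomes = [1] * 4 + [0] * 6 + [1] + [0] * 9
for mean_pred, observed, n in calibration_table(probs, outcomes):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={n}")
```

In a well calibrated model the predicted and observed columns should be close in every band, as they are in this constructed example; a model can discriminate well between patients and still be poorly calibrated.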

Evidence of generality

Some doctors believe that no prognostic model derived from one population can be generalised to patients drawn from another,27 in the same way that some deny that clinical trials or overviews can inform individual decisions about treatment. We and others21 22 believe that model predictions cannot be safely applied to other patients unless:

* There has been separate testing, in time and place, of the model on a new test set. In view of the established difficulty of transferring prognostic models,28 it is surprising that such follow-up testing is seldom performed. Use of statistical techniques such as shrinkage of the regression coefficients29 to correct for overoptimism in the model (for example, MRC Antiepileptic Drug Withdrawal Study Group26) may help to make models more transferable

* Each item of data used as input to the model has been defined clearly using definitions that reflect widespread use.1 Even apparently obvious items like age need clarification: does this mean the patient's age at the start of symptoms, at diagnosis, or when the prognosis is required?

* The model has been derived and validated using a defined "inception cohort" of patients21 whose selection criteria describe the patients to whom doctors may safely apply the model. This implies that both development and testing of models should take place prospectively according to a protocol,30 not retrospectively using existing databases with all their biases.31
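As one illustration of shrinkage, the sketch below uses the heuristic shrinkage factor of van Houwelingen and le Cessie, (model likelihood ratio chi-square minus its degrees of freedom) divided by the model chi-square, and multiplies each regression coefficient by it. This is a swapped-in example of the general technique, not necessarily the method used in the studies cited above. The function names and the coefficient names in the example are hypothetical, and the degrees of freedom are assumed to be the number of predictors (strictly, every candidate predictor examined during model building should be counted).

```python
from typing import Dict


def heuristic_shrinkage_factor(model_chi2: float, n_predictors: int) -> float:
    """(model chi-square - degrees of freedom) / model chi-square.
    Values well below 1 suggest the fitted coefficients are overoptimistic.
    Assumption: degrees of freedom = number of candidate predictors."""
    if model_chi2 <= 0:
        raise ValueError("model chi-square must be positive")
    # Clamp at 0: a model no better than chance gets all coefficients shrunk to zero.
    return max(0.0, (model_chi2 - n_predictors) / model_chi2)


def shrink_coefficients(coefs: Dict[str, float], factor: float) -> Dict[str, float]:
    """Multiply each regression coefficient (excluding the intercept) by the factor."""
    return {name: factor * beta for name, beta in coefs.items()}


# Hypothetical model: likelihood ratio chi-square of 50 on 10 candidate predictors.
factor = heuristic_shrinkage_factor(50.0, 10)  # (50 - 10) / 50 = 0.8
shrunk = shrink_coefficients({"thickness_mm": 0.5, "ulceration": 1.2}, factor)
print(factor, shrunk)
```

After shrinking the coefficients, the intercept would normally be re-estimated so that the average predicted risk still matches the observed event rate in the development data; the sketch leaves that step out.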

Evidence of clinical effectiveness

No one would prescribe a new drug routinely on the basis of in vitro testing alone: clinical trials are vital to evaluate safety and efficacy. Even in the case of an accurate, credible prognostic model, doctors should demand empirical evidence from well conducted clinical trials that the model is clinically effective. These trials should be controlled studies in which the effects of providing predictions on actual clinical practice or patient outcomes are measured. The studies must be carefully designed to eliminate biases such as the checklist effect, Hawthorne effect, and contamination.32 Although such studies require considerable resources, they are as necessary for prognostic models as are phase 3 randomised trials for drugs.33 34 There have been few trials of prognostic models (for example, Murray et al35 and Goldman et al36). There have, however, been at least 28 trials of computerised decision support systems, a similar technology, and an overview of these trials showed that many had a beneficial impact on clinical practice or patient outcome.2

Conclusion


Since only a handful of prognostic models have yet accrued adequate evidence of accuracy, generality, and effectiveness, doctors' poor uptake of these tools to aid difficult decisions may result from healthy scepticism of undocumented new technology. A recent development that may help to improve clinical acceptability is combining the prognosis generated by the model with the doctor's own estimate of prognosis, since this may add to the model's performance.37 Another is modification of the model in the light of clinical experience.38

Given the increasing complexity of the methods used to develop prognostic models7 and their potential power to influence clinical decisions,2 it is reassuring that the model described in this week's BMJ was developed and evaluated by statisticians working in close collaboration with doctors.6 This pattern of collaboration and adherence to the principles described in this paper seem likely to yield models that will be useful to doctors making difficult decisions.

JCW acknowledges the many useful insights resulting from collaboration with David Spiegelhalter of the MRC Biostatistics Unit, Cambridge.

Funding: None.

Conflict of interest: None.

  1. Wyatt JC. Clinical data systems. Part 1: data and medical records. Lancet 1994;344:1543-7.
  2. Johnston ME, Langton KB, Haynes RB, Matthieu D. A critical appraisal of research on the effects of computer-based decision support systems on clinician performance and patient outcomes. Ann Intern Med 1994;120:135-42.
  3. Holmes L, Loughead K, Treasure T, Gallivan S. Which patients will not benefit from further intensive care after cardiac surgery? Lancet 1994;344:1200-2.
  4. Rowan KM, Kerr JH, McPherson K, Short A, Vessey MP. Intensive Care Society's APACHE II study in Britain and Ireland--II: Outcome comparisons of intensive care units after adjustment for case mix by the American APACHE II method. BMJ 1993;307:977-81.
  5. Riethmuller G, Schneider-Gadicke W, Schlimok G, Schmiegel W, Raab R, Hoffken K, et al. Randomised trial of monoclonal antibody for adjuvant therapy of resected Dukes' C colorectal carcinoma. Lancet 1994;343:1177-83.
  6. Aitchison TC, Sirel JM, Watt DC, MacKie RM. Prognostic trees to aid prognosis in patients with cutaneous malignant melanoma. BMJ 1995;311:1536-9.
  7. Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Ann Intern Med 1993;118:201-10.
  8. Teasdale G, Jennett B. Assessment of coma and impaired consciousness: a practical scale. Lancet 1974;ii:81-4.
  9. Knaus WA, Wagner DP, Lynn J. Short term mortality predictions for critically ill hospitalised adults: science and ethics. Science 1991;254:389-94.
  10. Elstein AS, Shulman LS, Sprafka SA. Medical problem solving. Cambridge, MA: Harvard University Press, 1978.
  11. Patel V, Evans D, Groen G. Biomedical knowledge and clinical reasoning. In: Evans D, Patel V, eds. Cognitive science in medicine. Cambridge, MA: MIT Press, 1989:53-112.
  12. Lee KL, Pryor DB, Harrell FE, Califf RM, Behar VS, Floyd PL, et al. Predicting outcome in coronary disease: statistical models vs. expert clinicians. Am J Med 1986;80:553-60.
  13. Haynes RB. Loose connections between peer-reviewed clinical journals and clinical practice. Ann Intern Med 1990;113:724-8.
  14. Altman DG. The scandal of poor medical research. BMJ 1994;308:283-4.
  15. Rowley G, Fielding K. Reliability and accuracy of the Glasgow coma scale with experienced and inexperienced users. Lancet 1991;337:535-8.
  16. Brahams D, Wyatt J. Decision-aids and the law. Lancet 1989;ii:632-4.
  17. Coste J, Fermanian J, Venot A. Methodological and statistical problems in the construction of composite measurement scales: a survey of six medical and epidemiological journals. Stat Med 1995;14:331-45.
  18. Hart A, Wyatt J. Evaluating black boxes as medical decision-aids: issues arising from a study of neural networks. Med Inf (Lond) 1990;15:229-36.
  19. Franklin RCG, Spiegelhalter DJ, Macartney F, Bull K. Evaluation of a diagnostic algorithm for heart disease in neonates. BMJ 1991;302:935-9.
  20. Heathfield HA, Wyatt JC. Philosophies for the design and development of clinical decision-support systems. Methods Inf Med 1993;32:1-8.
  21. Sackett D, Haynes R, Guyatt G, Tugwell P. Clinical epidemiology: a basic science for clinical medicine. 2nd ed. Boston, MA: Little Brown, 1991:173-85.
  22. Wasson JH, Sox HC, Neff RK, Goldman L. Clinical prediction rules: applications and methodological standards. N Engl J Med 1985;313:793-9.
  23. Harrell FE, Lee KL, Matchar DB, Reichert TA. Regression models for prognostic prediction: advantages, problems and suggested solutions. Cancer Treat Rep 1985;69:1071-7.
  24. Wyatt J, Spiegelhalter D. Evaluating medical expert systems: what to test and how? Med Inf (Lond) 1990;15:205-17.
  25. Murray GD, Murray LS, Barlow P, Teasdale GM, Jennett WB. Assessing the performance and clinical impact of a computerised prognostic system in severe head injury. Stat Med 1986;5:403-10.
  26. Medical Research Council Antiepileptic Drug Withdrawal Study Group. Prognostic index for recurrence of seizures after remission of epilepsy. BMJ 1993;306:1374-8.
  27. Pilkington SN. APACHE scoring and prediction of survival in intensive care. BMJ 1995;310:1197.
  28. Centor RM, Yarbrough B, Wood JP. Inability to predict relapse in acute asthma. N Engl J Med 1984;310:577-80.
  29. Phillips AN, Thompson SG, Pocock SJ. Prognostic scores for detecting a high risk group: estimating the sensitivity when applied to new data. Stat Med 1990;9:1189-98.
  30. Arnbjornsson E. Scoring system for computer-aided diagnosis of acute appendicitis: the value of prospective versus retrospective studies. Ann Chir Gynaecol 1985;74:159-66.
  31. Wyatt JC. Acquisition and use of clinical data for audit and research. J Eval Clin Pract 1995;1:15-27.
  32. Wyatt J, Spiegelhalter D. Field trials of medical decision-aids: potential problems and solutions. In: Clayton P, ed. Proceedings of the 15th symposium on computer applications in medical care, Washington 1991. New York: McGraw Hill, 1991:3-7.
  33. Spiegelhalter DJ. Evaluation of medical decision-aids, with an application to a system for dyspepsia. Stat Med 1983;2:207-16.
  34. Simon R, Altman DG. Statistical aspects of prognostic factor studies in oncology. Br J Cancer 1994;69:979-85.
  35. Murray LS, Teasdale GM, Murray GD, Jennett B, Miller JD, Pickard JD, et al. Does prediction of outcome alter patient management? Lancet 1993;341:1487-91.
  36. Goldman L, Cook EF, Brand DA, Lee TH, Rouan GW, Weisberg MC, et al. A computer protocol to predict myocardial infarction in emergency room patients. N Engl J Med 1988;318:797-803.
  37. Knaus WA, Harrell FE, Lynn J, Goldman L, Phillips RS, Connors AF, et al. The SUPPORT prognostic model: objective estimates of survival for seriously ill, hospitalised patients. Ann Intern Med 1995;122:191-203.
  38. Van Houwelingen H, Thorogood J. Construction, validation and updating of a prognostic model for kidney graft survival. Stat Med 1995;14:1999-2008.
