Prognosis and prognostic research: Developing a prognostic model

BMJ 2009; 338 doi: 10.1136/bmj.b604 (Published 31 March 2009)
Cite this as: BMJ 2009;338:b604


  1. Patrick Royston, professor of statistics (1)
  2. Karel G M Moons, professor of clinical epidemiology (2)
  3. Douglas G Altman, professor of statistics in medicine (3)
  4. Yvonne Vergouwe, assistant professor of clinical epidemiology (2)

Affiliations:
  (1) MRC Clinical Trials Unit, London NW1 2DA
  (2) Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, Netherlands
  (3) Centre for Statistics in Medicine, University of Oxford, Oxford OX2 6UD

Correspondence to: P Royston pr{at}ctu.mrc.ac.uk

Accepted 6 October 2008

In the second article in their series, Patrick Royston and colleagues describe different approaches to building clinical prognostic models

The first article in this series reviewed why prognosis is important and how it is practised in different medical settings [1]. We also highlighted the difference between multivariable models used in aetiological research and those used in prognostic research and outlined the design characteristics for studies developing a prognostic model. In this article we focus on developing a multivariable prognostic model. We illustrate the statistical issues using a logistic regression model to predict the risk of a specific event. The principles largely apply to all multivariable regression methods, including models for continuous outcomes and for time to event outcomes.

Summary points

Models with multiple variables can be developed to give accurate and discriminating predictions

In clinical practice simpler models are more practicable

There is no consensus on the ideal method for developing a model

Methods to develop simple, interpretable models are described and compared

The goal is to construct an accurate and discriminating prediction model from multiple variables. Models may be a complicated function of the predictors, as in weather forecasting, but in clinical applications considerations of practicality and face validity usually suggest a simple, interpretable model (as in box 1).

Box 1 Example of a prognostic model

Risk score from a logistic regression model to predict the risk of postoperative nausea or vomiting (PONV) within the first 24 hours after surgery [2]:

Risk score = −2.28 + (1.27 × female sex) + (0.65 × history of PONV or motion sickness) + (0.72 × non-smoking) + (0.78 × postoperative opioid use)

where all variables are coded 0 for no or 1 for yes.

The value −2.28 is called the intercept and the other numbers are the estimated regression coefficients for the predictors, which indicate their mutually adjusted relative contribution to the outcome risk. The regression coefficients are log(odds ratios) for a change of 1 unit in …
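As a minimal sketch, the risk score in box 1 can be evaluated for a patient and converted to a predicted probability. The intercept and coefficients are copied from box 1. The function and variable names (`ponv_risk_score`, `ponv_risk`, `COEFFICIENTS`) are hypothetical, and the conversion through the inverse logit, risk = 1/(1 + exp(−score)), is the standard one for a logistic regression model rather than something quoted from this excerpt.

```python
import math

# Intercept and regression coefficients as printed in box 1.
# The dictionary keys are illustrative names, not taken from the article.
INTERCEPT = -2.28
COEFFICIENTS = {
    "female_sex": 1.27,
    "history_ponv_or_motion_sickness": 0.65,
    "non_smoking": 0.72,
    "postoperative_opioid_use": 0.78,
}


def ponv_risk_score(**predictors):
    """Return the risk score (log odds); each predictor is 0 for no, 1 for yes.

    Predictors that are not supplied are treated as 0 (no).
    """
    unknown = set(predictors) - set(COEFFICIENTS)
    if unknown:
        raise ValueError(f"unknown predictors: {sorted(unknown)}")
    return INTERCEPT + sum(
        coef * predictors.get(name, 0) for name, coef in COEFFICIENTS.items()
    )


def ponv_risk(**predictors):
    """Convert the risk score to a probability with the inverse logit.

    Assumption: the usual logistic regression back-transformation applies.
    """
    return 1.0 / (1.0 + math.exp(-ponv_risk_score(**predictors)))


def odds_ratios():
    """Exponentiate each coefficient to obtain the adjusted odds ratio."""
    return {name: math.exp(coef) for name, coef in COEFFICIENTS.items()}


if __name__ == "__main__":
    # Patient with no risk factors versus a patient with all four.
    print(round(ponv_risk(), 3))
    print(round(ponv_risk(**{name: 1 for name in COEFFICIENTS}), 3))
    print({name: round(value, 2) for name, value in odds_ratios().items()})
```

Under these assumptions a patient with none of the four predictors has a predicted risk of roughly 0.09, and a patient with all four has a predicted risk of roughly 0.76.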
