Guide to presenting clinical prediction models for use in clinical settings
BMJ 2019; 365 doi: https://doi.org/10.1136/bmj.l737 (Published 17 April 2019) Cite this as: BMJ 2019;365:l737
- Laura J Bonnett, tenure track fellow1,
- Kym I E Snell, research fellow in biostatistics2,
- Gary S Collins, professor of medical statistics3,
- Richard D Riley, professor of biostatistics2
- 1Department of Biostatistics, University of Liverpool, Liverpool L69 3GL, UK
- 2Centre for Prognosis Research, Research Institute for Primary Care and Health Sciences, Keele University, Keele, UK
- 3Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Correspondence to: L J Bonnett
- Accepted 8 February 2019
Presentation format for clinical prediction models deemed suitable for use is important but receives relatively limited attention in the literature
Clear presentation of a prediction model is fundamental to ensure other researchers can independently validate the model, and that healthcare professionals and others can implement it within healthcare
Presentation of the full model equation is essential. There are many ways to present prediction models for end users, which range from points score systems and nomograms, to websites and mobile apps
The best presentation is user and environment specific, and it is preferably determined through engagement of stakeholders, including patients
If presentation requires a simplified version of the full model to be generated, then the predictive performance of this simplified model should also be validated and compared with that of the full model
Clinical prediction models estimate the risk of existing disease (diagnostic prediction model) or future outcome (prognostic prediction model) for an individual, which is conditional on the values of multiple predictors (prognostic or risk factors) such as age, sex, and biomarkers.1 A large number of prediction models are published in the medical literature each year,2 and most are developed using a regression framework such as logistic or Cox regression (box 1). Prediction models are also known as risk scores, prognostic indices, or prognostic scores. Examples include the Framingham risk score, which predicts 10 year risk of coronary heart disease,3 and the APACHE (acute physiology and chronic health evaluation) scores for mortality after admission to an intensive care unit.45
The transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement provides guidance on key information for authors to report when developing or validating prediction models.67 Although the TRIPOD statement highlights the importance of the presentation of models, relatively little information or practical guidance exists on how to actually present a prediction model for use after development—for example, to aid implementation in clinical settings, if appropriate. A key resource is chapter 18 in the book by Steyerberg.8
When choosing the format to present a prediction model, researchers should carefully consider the intended users, settings, and timing of use. It is helpful to ask: “who will be accessing the model in this format, and when and in what setting will they use it?” The presentation can then be formatted accordingly. Fundamentally, the full model equation should always be presented in the journal article67; this is essential to enable independent external validation. However, additional presentation formats might be required, perhaps outside of the journal article, to enable healthcare professionals to use the model in a particular clinical setting (eg, when access to computers or mobile devices is limited). Similarly, the format might need tailoring for lay people when using the model at home (eg, patients with asthma deciding on appropriate management of their condition using Asthma UK’s asthma attack risk checker9) to improve shared decision making.10 User groups can help guide the best presentation choices in each situation, including healthcare professionals, patients, and the public. Patient and public involvement groups and focus groups are useful arenas for this; they highlight the need for participant and public information and engagement within health research.
Alongside the full model equation, a range of presentation formats might be required that differ according to several factors: the medium by which they are presented (paper versus electronic), the setting in which the models are to be applied (eg, clinic, bedside, or at home), the level of detail required in the predictions (eg, approximate or rounded risk estimates, or exact risk estimates), and user friendliness (simple to complex formats).8 In this paper, we summarise four key ways of presenting clinical prediction models that could aid their use in clinical practice, if appropriate. We outline how to create each format and describe their advantages and disadvantages in relation to the clinical context of use, and the intended user. Table 1 provides an overview of the different presentation formats.
Our article is not about how to develop or validate a prediction model,811 or how to decide if it is fit for clinical use112; rather, we assume a model has been developed and has been deemed potentially useful for clinical practice, and so the researcher needs to consider how to present the model to aid implementation. For this purpose we use the model shown in box 2 to illustrate how to predict mortality risk over time in patients with a diagnosis of primary biliary cirrhosis. This survival model is used throughout the article, but the presentation formats described also apply to other risk prediction models developed using regression, such as those derived using logistic regression. Many of these formats are also relevant for prediction of a continuous outcome (eg, using linear regression).
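As a generic sketch of the two regression frameworks named above (not the specific model in box 2), the predicted risk can be written in terms of a linear predictor LP. The symbols used here (α for the intercept, β_j for the regression coefficients, x_j for the predictor values, and S_0(t) for the baseline survival at time t) are notation assumed for illustration.

```latex
\begin{align}
\mathrm{LP} &= \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p \\
\text{Logistic regression:}\quad \Pr(\text{outcome}) &= \frac{1}{1 + \exp\{-(\alpha + \mathrm{LP})\}} \\
\text{Cox regression:}\quad \Pr(\text{event by time } t) &= 1 - S_0(t)^{\exp(\mathrm{LP})}
\end{align}
```

Publishing the full model equation therefore means reporting every β_j together with α (logistic regression) or S_0(t) at each time point of interest (Cox regression).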
Presentation using a points score system
Within points score systems, points are assigned based on the predictor values for an individual. The total points score is then mapped to a corresponding risk of event or survival probability.21 The intended users for points score systems are healthcare professionals and patients. These systems can be presented on a screen (eg, monitor, tablet) as part of a consultation, printed off as a take home sheet for patients, or used on the wards as a reference guide.
How to derive a points score system
To produce a points score system, a prediction model is first developed (eg, using logistic or Cox regression; see boxes 1 and 2), and then the regression coefficients of included predictors are assigned integer scores, which can be negative or positive. Unfortunately, any continuous predictors need to be categorised, and so some predictive accuracy is sacrificed. Categories do not need to be of equal size, and by having unequal categories non-linearity can be more appropriately handled. The steps to develop a points score system can be summarised:
1. Organise the continuous predictors into categories and determine the midpoint for each category.
2. Choose a reference category for each predictor (continuous, binary, and categorical).
3. For continuous variables, determine how far each category is from the reference category, and then multiply each difference by the regression coefficient for that predictor to determine the difference in “regression units.” For binary and categorical variables, the “regression unit” is simply the regression coefficient for that predictor.
4. Define the number of regression units that will correspond to 1 point in the points scoring system; this definition is usually based on clinician preference.
5. Determine the points (rounded to the nearest integer) associated with each of the categories of the predictors.
6. Determine the minimum and maximum possible points totals.
7. Calculate the risk estimate for each points total across the range by using the original model with the points scores (rounded) to get the predicted probabilities. This estimate is essentially a new risk prediction model that approximates the full model.
Note that sometimes scores are derived based on hazard ratios or odds ratios rather than the corresponding regression coefficients. This approach is mathematically inappropriate as Cox (or logistic) regression models assume additivity of the log hazard (or log odds) ratios.22
Alongside the points score system, it is important to also present the accompanying table of probabilities (absolute risk predictions) to allow the points score to be translated to a predicted risk. Decisions such as low, intermediate, or high risk that are based only on a points total are uninformative unless it is clear how these decisions are defined on the predicted absolute risk scale.
A full worked example is provided in the appendix. Also, see Sullivan and colleagues for further details, including the mathematical formula associating the risk of outcome with each possible total points score for logistic and survival regression models.21
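The seven steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the worked example from the appendix: it uses a logistic model for simplicity, and the intercept, coefficients, age bands, and the choice of a 10 year age increase as one point are all hypothetical.

```python
import math

# Hypothetical logistic model: every number here is illustrative only.
INTERCEPT = -5.0
BETA_AGE = 0.04        # log odds ratio per year of age
BETA_DIABETES = 0.70   # log odds ratio for a binary predictor

# Step 1: categorise the continuous predictor and record category midpoints.
AGE_MIDPOINTS = {"40-49": 44.5, "50-59": 54.5, "60-69": 64.5, "70-79": 74.5}

# Step 2: choose reference categories (youngest age band, no diabetes).
REF_AGE = AGE_MIDPOINTS["40-49"]

# Step 4: one point = the regression units of a 10 year increase in age.
UNITS_PER_POINT = 10 * BETA_AGE


def age_points(category):
    # Step 3: distance from the reference midpoint, in regression units.
    units = BETA_AGE * (AGE_MIDPOINTS[category] - REF_AGE)
    # Step 5: convert to points and round to the nearest integer.
    return round(units / UNITS_PER_POINT)


def diabetes_points(has_diabetes):
    # For a binary predictor the regression unit is the coefficient itself.
    return round(BETA_DIABETES / UNITS_PER_POINT) if has_diabetes else 0


def risk_for_points(total_points):
    # Step 7: linear predictor at the reference profile plus the points
    # converted back to regression units, then the logistic transformation.
    lp = INTERCEPT + BETA_AGE * REF_AGE + UNITS_PER_POINT * total_points
    return 1 / (1 + math.exp(-lp))


# Step 6: minimum and maximum possible points totals.
MIN_POINTS = 0
MAX_POINTS = max(age_points(c) for c in AGE_MIDPOINTS) + diabetes_points(True)

# Table of absolute risks to present alongside the points table.
RISK_TABLE = {p: round(risk_for_points(p), 3)
              for p in range(MIN_POINTS, MAX_POINTS + 1)}
```

Note that Python's `round` resolves exact ties to the nearest even integer, so a coefficient that falls exactly halfway between two point values might need an explicit rounding rule. For a Cox model, the last step would use the baseline survival at the chosen time point instead of the logistic transformation.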
Advantages and disadvantages
Points score systems are easy to understand after an initial explanation or demonstration, and so instructions on their use should be given alongside these systems. Depending on the complexity of the model, and the number of included predictors, paper based points score systems can enable risks to be estimated more quickly than directly inputting patient values into an online calculator or published model. However, the predictions of risk (or survival) are only approximations of the actual predicted risk from the full model; this is because information on continuous predictor values is discarded by categorisation, and regression units are rounded. Researchers must always check that the predictive performance of the simplified model based on a points score system is similar to (and has the same potential clinical impact as) the original full model.
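One way to begin the comparison recommended above is to compute a discrimination measure for both the full model and the points score in the same data. The sketch below does this with the c statistic for a binary outcome; the data are simulated, the coefficients are hypothetical, and a real comparison would use validation data and also assess calibration (and, for a survival model, a time to event version of the c statistic).

```python
import math
import random


def c_statistic(scores, outcomes):
    """Share of (event, non-event) pairs where the event has the higher score.

    Tied scores count as half a concordant pair.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Simulated cohort from a hypothetical one-predictor logistic model.
random.seed(1)
INTERCEPT = -4.0
BETA_AGE = 0.06

ages = [random.uniform(40, 80) for _ in range(500)]
full_lp = [INTERCEPT + BETA_AGE * age for age in ages]
outcomes = [1 if random.random() < 1 / (1 + math.exp(-lp)) else 0
            for lp in full_lp]

# Simplified model: age grouped into 10 year bands scored 0 to 3 points.
points = [min(3, int((age - 40) // 10)) for age in ages]

c_full = c_statistic(full_lp, outcomes)
c_points = c_statistic(points, outcomes)
print(f"c statistic, full model:   {c_full:.3f}")
print(f"c statistic, points score: {c_points:.3f}")
```

A noticeably lower c statistic for the points score would suggest that categorisation and rounding have discarded useful information.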
Tables 3a and b illustrate a points score system for the primary biliary cirrhosis example. In table 3a, scores are assigned to categories of each predictor, which have been scaled according to a 15 year increase in age (that is, the regression coefficient for age multiplied by 15); table 3b presents probabilities of the outcome that correspond to the points total. For example, an individual aged 55 (0 points), with cirrhosis (3 points), albumin of 34.4 g/dL (0 points), and central cholestasis (5 points) has a points total of 8; this corresponds to a probability of death of 0.46 at one year and 0.90 at three years. These figures are similar to the equivalent estimates of 0.44 and 0.89 for the risk of death at one and three years, respectively, directly obtained from the full model equation for this individual. These data suggest that the simplification led to only small changes in risk predictions from the points score system for this individual.
Presentation using graphical score chart
Graphical score charts are highly simplified, colour coded versions of points score systems. Similar to the points score system, a graphical score chart is a presentation format for a prediction model, when the intended users are healthcare professionals and patients. The chart can be used either on screen or as a print out. An example of using this approach is the SCORE (systematic coronary risk evaluation) model for predicting cardiovascular disease.23
How to derive a graphical score chart
The probability of the outcome must be calculated for each relevant combination of predictors as described in box 1, and is based on the average value of the category or the group of individuals in that category in the development data. Probabilities can then be tabulated and colour coded based on clinically important categories of risk. For example, those with higher risk predictions (near to 1) could be coded as red, and those with low risk predictions (near to 0) could be coded as yellow.
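The tabulation and colour coding described above can be sketched as follows. This is an illustration under assumptions rather than the chart in figure 1: the logistic model, the category midpoints, the risk bands, and the colour names are all hypothetical.

```python
import math
from itertools import product

# Hypothetical logistic model: all numbers are illustrative only.
INTERCEPT = -5.0
BETA_AGE = 0.05
BETA_SMOKER = 0.8

# Category midpoints standing in for the average value in each category.
AGE_MIDPOINTS = {"40-49": 44.5, "50-59": 54.5, "60-69": 64.5, "70-79": 74.5}

# Upper limit of each risk band and its colour (hypothetical thresholds).
BANDS = [(0.1, "yellow"), (0.2, "orange"), (0.3, "dark orange"), (1.0, "red")]


def predicted_risk(age_midpoint, smoker):
    lp = INTERCEPT + BETA_AGE * age_midpoint + BETA_SMOKER * smoker
    return 1 / (1 + math.exp(-lp))


def colour_for(risk):
    # Return the colour of the first band whose upper limit covers the risk.
    for upper, colour in BANDS:
        if risk <= upper:
            return colour
    raise ValueError("risk must lie between 0 and 1")


# One chart cell per combination of predictor categories.
chart = {}
for (age_band, midpoint), smoker in product(AGE_MIDPOINTS.items(), (0, 1)):
    risk = predicted_risk(midpoint, smoker)
    chart[(age_band, smoker)] = (round(risk, 2), colour_for(risk))

for cell, (risk, colour) in chart.items():
    print(cell, risk, colour)
```

For a survival model, the same loop would be repeated for each time point of interest, giving one chart per time point.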
Advantages and disadvantages
Graphical score charts are easy to understand and the colour coding can increase ease of use compared with points score systems.14 Additionally, decision guidelines can be coupled to the predictions; for example, dark red implies referral to intensive care. This coupling enables the rapid stratification of patients. Choosing decision thresholds needs careful thought and evaluation; in particular, arbitrary cut-off values should be avoided.24
This presentation usually requires some simplification of the model because it can only accommodate a limited number of predictors and requires continuous predictors to be presented as categories. Loss of information about predicted risks also occurs because the results are typically presented as ranges of predicted risks rather than specific values. Each time point of interest requires its own graphical score chart, which is a further disadvantage. Similar to the points score system, the simplified model based on a score chart should be checked for its predictive performance (at each time point of interest) compared with the full model.
Figure 1 shows a graphical score chart for the primary biliary cirrhosis example. The chart was created using the points score system shown in tables 3a and b. Bands of predicted risk of death of width 0.1, up to a risk of 0.3, might be considered clinically meaningful, for example, and were thus chosen to define the four colour categories. According to this chart, risk of death at one year is 0.46 for a patient who is aged 55, has cirrhosis, central cholestasis, and albumin of 34.4 g/dL, and has been treated with azathioprine. This predicted risk is similar to the value of 0.44 estimated from the full model.
Presentation using a nomogram
Nomograms are another graphical presentation format for a clinical prediction model (fig 2). Similar to the points score system, points are assigned based on the predictor values for a particular individual, which are then equated to a risk of event or survival probability.25 The intended users of a nomogram are healthcare professionals. Nomograms are best used as reference guides, potentially on wards or during consultations. Similar to graphical score charts, nomograms can be colour coded to aid interpretation.26
How to derive a nomogram
The steps to build a nomogram are:
1. For each predictor, calculate the maximum change in the developed model’s linear predictor by multiplying the predictor’s regression coefficient by the difference between the maximum and minimum value of the predictor in the dataset. Order the predictors by their calculated maximum change.
2. Assign up to 100 points for each predictor. First assign 100 points to the predictor with the largest maximum change as identified from step 1. Call this predictor A. Then provide a points score for the other predictors equal to 100×(maximum change for predictor/maximum change for predictor A).
3. Calculate the minimum and maximum possible total points based on all possible combinations of predictors; project the points onto the probability scale by fitting a prediction model as in box 1 with total points as the only predictor.
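The three steps above can be sketched numerically as below. The logistic model, coefficients, and predictor ranges are hypothetical. Note one substitution: instead of refitting a model with total points as the only predictor, as step 3 describes, this sketch converts total points back to the linear predictor algebraically, which should give the same mapping when points are not rounded.

```python
import math

# Hypothetical logistic model: (coefficient, minimum, maximum) per predictor.
INTERCEPT = -3.0
PREDICTORS = {
    "age": (0.05, 30.0, 80.0),
    "albumin": (-0.08, 20.0, 50.0),
    "cirrhosis": (1.0, 0.0, 1.0),
}


def max_change(coef, low, high):
    # Step 1: largest possible change in the linear predictor.
    return abs(coef) * (high - low)


# The predictor with the largest maximum change spans the full 0-100 scale.
LARGEST_CHANGE = max(max_change(*spec) for spec in PREDICTORS.values())


def zero_point_value(coef, low, high):
    # The predictor value with the lowest contribution scores 0 points.
    return low if coef >= 0 else high


def points(name, value):
    # Step 2: points proportional to the change in the linear predictor.
    coef, low, high = PREDICTORS[name]
    reference = zero_point_value(coef, low, high)
    return 100 * coef * (value - reference) / LARGEST_CHANGE


# Linear predictor for an individual who scores 0 points on every predictor.
LP_AT_ZERO = INTERCEPT + sum(
    coef * zero_point_value(coef, low, high)
    for coef, low, high in PREDICTORS.values()
)


def risk_from_total(total_points):
    # Step 3 (algebraic version): map total points to a probability.
    lp = LP_AT_ZERO + total_points * LARGEST_CHANGE / 100
    return 1 / (1 + math.exp(-lp))


patient = {"age": 55.0, "albumin": 34.4, "cirrhosis": 1.0}
total = sum(points(name, value) for name, value in patient.items())
print(round(total, 1), round(risk_from_total(total), 3))
```

Because no rounding is applied here, the risk read from the total points matches the risk from the model equation; a printed nomogram loses some of this precision when users read values off the scales.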
Advantages and disadvantages
The main advantages of nomograms over the other presentation formats are that continuous predictors do not need to be categorised, and multiple time points can be included in a single nomogram by incorporating multiple probability scales based on the possible total points. Additionally, the relative importance of predictors can be judged by the length of the lines within the nomogram. Furthermore, interaction and non-linear terms can be well handled.8 Complex models, for example those with time dependent predictors, can also be presented in this way.29 Nomograms can easily be applied away from a computer, especially when a model includes only a small number of predictors.
Nomograms can, however, appear relatively complex at first sight and they require an explanation as to how they should be used (highlighted in the TRIPOD statement6). Additionally, they can be inaccurate depending on the size and resolution of the published figure; the larger the number of predictors included in the model, the more challenging the nomogram is to interpret. Rounding of coefficients might also be required.
Figure 2 shows the nomogram associated with the primary biliary cirrhosis example from box 2. To determine the survival probability at a specified time point, the user identifies the points score associated with each predictor value by reading up from that predictor value to the points scale at the top. Once a score has been assigned to each predictor value, a total points score is calculated. Translation from total points to the probability of the outcome is then made by reading down to the associated probability of the outcome from the total points scale. Therefore, using figure 2, it can be seen that an individual aged 55 (24 points), with cirrhosis (42 points), albumin of 34.4 g/dL (65 points), and central cholestasis (62 points), and who has been treated with azathioprine (0 points) has a total points score of 193. This equates to a one year and three year probability of death of 0.40 and 0.85, respectively, which are again similar to the estimates of 0.44 and 0.89 obtained directly from the full regression model. If the individual had not been treated (32 points), but all other characteristics were unchanged, the total points score would be 225 (193 + 32), which equates to higher one year and three year probabilities of death than for the treated individual.
Presentation within websites and mobile apps
Increasingly, prediction models are being made available through a website calculator or within an app for a tablet or smartphone device. These calculators and apps are generally interactive graphical user interfaces that provide individualised risk estimates from an underlying prediction model, which are conditional on the user’s inputted predictor values. Often access is free, but sometimes a fee is charged.
How to develop a website
Websites are developed using a building platform or content management system; they also require a domain name and web host. A variety of website building platforms exist, including specific tools that enable statistical software packages to run web apps—for example, Shiny for R and SWire for Stata.3031 Websites and apps are available to healthcare professionals and the public, and they can be used by interested individuals from anywhere in the world. These websites and apps can also be designed for use in specific circumstances, such as requiring log in details from registered users, or through a National Health Service server to ensure that the information is delivered to the patient through a healthcare professional.
Websites need to be explicit about the target user and target population, and any website or app should clearly state how to use the model. References to manuscripts describing the model development and subsequent validation (and potentially clinical impact evaluation) should also be provided. The website or app calculator should be checked to ensure that the predicted probability agrees with the predictions from the underlying regression model. For models with continuous predictors, entry of values outside the range of the development dataset should be restricted to avoid extrapolation, or should at least trigger a warning to the user.
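The two checks described above (agreement with the underlying model, and handling of out of range inputs) can be sketched as the calculation layer behind a calculator. The model, the development ranges, and all function names here are hypothetical, and the sketch is independent of any particular web framework.

```python
import math
import warnings

# Hypothetical logistic model and development-data ranges (illustrative only).
INTERCEPT = -5.0
COEFFICIENTS = {"age": 0.04, "albumin": -0.05}
DEVELOPMENT_RANGES = {"age": (25.0, 80.0), "albumin": (20.0, 50.0)}


class OutOfRangeError(ValueError):
    """Raised when an input lies outside the development data range."""


def check_ranges(values, strict=True):
    # Block (strict) or warn about (non-strict) extrapolation.
    for name, (low, high) in DEVELOPMENT_RANGES.items():
        value = values[name]
        if not low <= value <= high:
            message = (f"{name}={value} is outside the development "
                       f"range {low} to {high}")
            if strict:
                raise OutOfRangeError(message)
            warnings.warn(message)


def calculator_risk(values, strict=True):
    # Validate the inputs, then apply the full model equation unchanged.
    check_ranges(values, strict)
    lp = INTERCEPT + sum(COEFFICIENTS[name] * values[name]
                         for name in COEFFICIENTS)
    return 1 / (1 + math.exp(-lp))


# Agreement check against a hand calculation from the model equation.
by_hand = 1 / (1 + math.exp(-(-5.0 + 0.04 * 55 - 0.05 * 34.4)))
assert abs(calculator_risk({"age": 55, "albumin": 34.4}) - by_hand) < 1e-12
```

In practice such agreement checks would be run over many test profiles, including values at the limits of each range, every time the calculator or the model is updated.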
Advantages and disadvantages
A major advantage is that the full model equation can be embedded behind the scenes, and thus no approximation is required, and any complexity is “hidden” from the end user. Websites and apps can provide a user friendly interface in front of complex statistical models, which include large numbers of predictors, non-linear terms, and interactions. Additionally, much of the data input can be automated; for example, in general practice the age and sex of patients are probably already recorded in the medical centre’s computer system. Prediction models can be implemented within electronic health records to provide real time feedback to clinicians, although missing data and implausible values might be problematic. Digital applications easily enable switching between units for laboratory results and anthropometrics, such as height recorded in either metres or feet.
Because anyone can create a web calculator, there is currently no assurance that the underlying model is appropriate for use, has been developed adequately, or is validated for the relevant populations accessing the website.32 Additionally, it is often difficult to know how the model has been translated into the graphical tool. Furthermore, the target user (eg, healthcare professionals or patients) might not be clear, and public access websites could lead to overuse or access by people for whom the model is not intended.
Clearly, access to the internet is required. Data privacy and storage can be a concern, particularly if a website is designed to collect and present data; this concern should be clearly signposted on the website or app. Furthermore, the choice of accompanying graphical presentations that show the proportion of people predicted to have the outcome (such as colour coded stick men, smiley faces, or similar) depends on the target user and needs to be carefully considered.3334 Besides graphical presentations, many other metrics derived from prediction models can be used in risk communication. An example is the heart age metric recommended by the European Society of Cardiology.3536 Finally, the model might be updated over time to reflect changes in the underlying population characteristics. The web address could also change. Sometimes these changes might not have been tracked. Version control is therefore vital, and the reasons for any model update should be recorded. These changes should be clearly signposted on the website.
An example of a website is the Your Heart Forecast tool from New Zealand.37 This tool provides a graphic design that compares a patient’s predicted cardiovascular disease risk with that of the healthy population of the same age. Because risk can be hard for patients to understand, the model provides a graphical depiction of heart age and a future projection that depends on whether the individual does or does not modify their predictors. Other examples include GRACE (for acute coronary events),38 ASCVD Plus (for atherosclerotic cardiovascular disease),39 and Predict (for breast cancer).10 Two examples of websites for primary biliary cirrhosis, but with different predictors than those in this article’s running example, are UK-PBC40 and GLOBE.41
The format of presentation is an important consideration when a clinical prediction model is deemed suitable for use in clinical practice. In addition to providing the full equation (which is essential), there are many ways to present models to aid clinical use, ranging from points score systems and nomograms, to websites and mobile apps. If a model is to be presented in a reduced format (eg, predictors based on categorised values even though they were originally continuous in the full model), this reduced model should undergo the same validation process as the full model before it can be deemed suitable for clinical use.
The best format is user and environment specific, with bedside tools for healthcare professionals requiring different options from those for patients at home on a computer or tablet. For this reason, means of presentation are best determined through stakeholder engagement, including healthcare professionals and patients. Empirical evidence is now required to determine whether certain formats promote better uptake, use, or understanding. In due course, a similar guide will be required for models developed using advanced or alternative modelling techniques, such as landmarking and machine learning.
The authors thank The BMJ manuscript committee and peer reviewers for their assistance with the development of this manuscript.
Contributors: GSC had the idea for the article. His research interests are focused on methodological aspects of prediction model development and validation, and he led an international collaboration to produce the TRIPOD consensus guidance. The article was written by LJB, with feedback and revisions from all the authors. LJB has extensive experience of developing and validating prediction models for people with chronic conditions, and her models have informed the Driver and Vehicle Licensing Agency’s guidelines for time off driving for people with epilepsy. LJB is the guarantor for this work. RDR leads the prognosis research strategy that seeks to improve the standards of prognosis research. KIES works in the field of prognosis research and leads a training course on statistical methods for risk prediction and prognostic models. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding: This article is independent research arising from a postdoctoral fellowship (LJB: PDF-2015-08-044) supported by the National Institute for Health Research (www.nihr.ac.uk/). GSC was supported by the NIHR Biomedical Research Centre, Oxford. RDR and KIES are supported by funding from the Evidence Synthesis Working Group, which is funded by the National Institute for Health Research School for Primary Care Research (NIHR SPCR; project No 390). KIES is also supported by an NIHR SPCR launching fellowship. The views expressed in this publication are those of the authors and not necessarily those of the NHS, the National Institute for Health Research, or the Department of Health.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: LJB had financial support from the National Institute for Health Research for the submitted work; all authors declare no other financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.