Clinical prediction rules

BMJ 2012;344 (Published 16 January 2012)
Cite this as: BMJ 2012;344:d8312

Recent rapid responses

Rapid responses are electronic letters to the editor. They enable our users to debate issues raised in published articles. Although a selection of rapid responses will be included as edited readers' letters in the weekly print issue of the BMJ, their first appearance online means that they are published articles. If you need the url (web address) of an individual response, perhaps for citation purposes, simply click on the response headline and copy the url from the browser window.


Statistical modelling techniques should not be confused with formats for presentation

Adams and Leveson describe ‘five main methods to develop clinical prediction models: scoring systems derived from univariate analysis; multivariable regression analysis; nomograms; neural networks and CART analysis’.1 This list is a mix of statistical techniques to develop models and formats to present models to the user. We believe that such a list may confuse rather than help doctors become familiar with decision making tools. Model development and model presentation are two separate steps: first, an adequate statistical technique is chosen for model development; then, given the developed model, an easy-to-use presentation format can be chosen.2
Univariate and multivariable regression modelling are statistical techniques to assess the strength of predictive effects. Multivariable models are preferable, since they account for correlation between predictors. Neural networks and CART analyses are alternatives to regression analysis. A major disadvantage of the latter two is that the derived rules may describe the development data well but often perform poorly in new patients.3 Once the model has been developed with the chosen statistical technique, the analyst may present it as a scoring system, nomogram, or decision tree. The format should be based on the intended application. Decision trees can be produced by CART analysis, but have the disadvantage of providing only a limited number of predicted risks, as the authors mention.
Prediction models may be underused in clinical practice because of inappropriate model development (e.g. in small datasets with relatively many predictors and poor handling of missing data), lack of validation, and absence of impact analysis. Prediction modelling is an area of expertise in itself. The upcoming guidelines for the reporting of risk prediction models, under the wings of the EQUATOR network, will be an important step forward in prediction research.
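The distinction between development and presentation can be sketched in code. In this illustrative example (the coefficients, predictors, and scoring scale are entirely hypothetical, not taken from any real study), the same fitted logistic regression model is shown in two presentation formats: as a predicted probability and as a simplified integer point score.

```python
import math

# Hypothetical coefficients from a (fictional) model development step:
# log-odds = intercept + b_age*age + b_smoker*smoker
INTERCEPT, B_AGE, B_SMOKER = -5.0, 0.05, 0.8

def predicted_risk(age, smoker):
    """Full model output: a predicted probability from the regression equation."""
    log_odds = INTERCEPT + B_AGE * age + B_SMOKER * smoker
    return 1 / (1 + math.exp(-log_odds))

def point_score(age, smoker):
    """Presentation format: the same coefficients rounded into an
    easy-to-use score (here, one point per 0.5 units of log-odds)."""
    return round(B_AGE * age / 0.5) + round(B_SMOKER * smoker / 0.5)

risk = predicted_risk(age=60, smoker=1)   # about 0.23
score = point_score(age=60, smoker=1)     # 8 points
```

The statistical technique fixes the model; the score is only a rounded restatement of it, chosen for ease of use at the bedside.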

1. Adams ST, Leveson SH. Clinical prediction rules. BMJ 2012;344:d8312 (16 January).

2. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York: Springer, 2009.

3. Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol 1996;49:1225-31.

Competing interests: None declared

Yvonne Vergouwe, Clinical epidemiologist

Ben van Calster, Ewout W. Steyerberg

Department of Public Health, Erasmus MC, PO Box 2040, 3000 CA Rotterdam, Netherlands



We greatly appreciated Adams and Leveson's article on clinical prediction rules.1 They describe perfectly the decision-making process that takes place in clinicians’ minds and translate it into statistical concepts. Across all the models described, one constant should be highlighted: from scoring systems to the most complex neural networks or decision trees, models rely on probability calculation. However, medical decisions are essentially binary (treat or not; pursue diagnostic investigations or not), whereas scores, probabilities, and biomarker levels are continuous data. On the one hand, a tricky point is to transform a continuous measure into a binary decision by choosing an adapted threshold. The discriminative capacity of every prediction tool then depends on the cut-off chosen, so this choice is of critical importance. As the authors point out, sensitivity and specificity vary in opposite directions (as one rises, the other falls), so that the cut-off choice is a trade-off between the two, guided by helpful tools such as ROC curves. However, other methods exist to visualise this trade-off and to support decision making about the most appropriate cut-off: Pencina2 and Vickers3 have developed tools that assess a model's discriminative ability not at a single cut-off but across a continuum of thresholds.

On the other hand, when developing a prediction tool, the objective is not to replace the gold standard test for diagnosis, or to offer a perfect vision of the future for prognosis; it is to offer an intermediate strategy between doing nothing, performing the gold standard test in everyone, and treating everyone as if at the highest risk of disease. Depending on the relative weights of missing a diseased patient (false negative) and of performing the standard test in non-diseased patients (false positive), defined as the benefit/harm balance, the trade-off between sensitivity and specificity will vary, and clinicians are best placed to guide the choice of the most appropriate trade-off. Just as clinicians must become more familiar with prediction tools and the underlying statistical principles, epidemiologists and statisticians must familiarise themselves with medical decision making and its clinical constraints in order to work more effectively with physicians. Because of the ever-increasing time and cost constraints imposed on clinicians, prediction algorithms are promising tools to help make the most appropriate decisions in the face of such constraints. Nevertheless, prediction tools are currently seldom validated or evaluated for their impact, and are therefore seldom used. Prediction tools should not be considered rigid rules but supporting tools that give clinicians complementary data to aid clinical judgment and that incorporate physicians’ preferences (in terms of cut-off, for example) into the decision-making process. Closer collaboration between statisticians, epidemiologists, and clinicians would bridge methodological concepts and prediction tools, making them both more useful and more used in daily clinical practice.
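The continuum-of-thresholds idea can be illustrated with a minimal sketch of decision curve analysis, using the net benefit formula from Vickers and Elkin's paper (reference 3); the patient counts below are invented purely for illustration.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit at a given risk threshold (decision curve analysis):
    true positives are credited, false positives are penalised by the
    odds implied by the threshold."""
    return true_pos / n - (false_pos / n) * (threshold / (1 - threshold))

# Invented example: among 1000 patients, treating those above a 10% risk
# threshold yields 80 true positives and 150 false positives.
nb = net_benefit(true_pos=80, false_pos=150, n=1000, threshold=0.10)

# Evaluating net_benefit across a range of thresholds, rather than at a
# single cut-off, traces out the model's decision curve.
```

The threshold here plays exactly the role the letter describes: it encodes the clinician's benefit/harm balance between false negatives and false positives.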

1 Adams ST, Leveson SH. Clinical prediction rules. BMJ 2012;344:d8312.
2 Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008;27:157-72.
3 Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006;26:565-574.

Competing interests: None declared

Sandrine Leroy, epidemiologist and pediatrician

Paul Landais, professor of biostatistics and nephrologist, Paris Descartes University, APHP, Department of Biostatistics, Necker hospital, EA4472, Paris, France

Pasteur Institute, Epidemiology Unit for Emerging Diseases, Paris, France


In their excellent review article on clinical prediction rules, Adams and Leveson (1) outline the concepts underlying the development of quantitative clinical decision making and how databases and artificial intelligence can help a clinician. We would like to discuss two issues.

1. External validity of a clinical prediction rule should be a major concern. Testing the rule in a separate population to see whether it remains reliable may be disappointing (2). Subsequent updating of the model requires revalidation. Thus, clinicians need to be cautious when applying results to their practice. Patients should meet the inclusion criteria of the study which generated the predictive model (i.e. the case mix); otherwise the estimated post-test probability may be biased. Numerous factors external to the study may affect these estimates. We propose that studies developing clinical prediction rules comply with the STARD statement (3). We would be interested to know the authors' position.

2. Data quality is important for the development of prediction rules (4). At the individual clinical level, data quality is equally important for patients who are to be diagnosed or treated according to the prediction rule (there must be a sort of clinical GIGO rule). A thorough medical history and clinical examination performed by a scrupulous and experienced physician is probably what is required. High empathy could help; something computers are not very good at.

1. Adams ST, Leveson SH. Clinical prediction rules. BMJ 2012;344:d8312
2. Toll DB, Janssen KJ, Vergouwe Y, Moons KG. Validation, updating and impact of clinical prediction rules: a review. J Clin Epidemiol. 2008;61:1085-94.
3. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Moher D, Rennie D, de Vet HC, Lijmer JG; Standards for Reporting of Diagnostic Accuracy. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann Intern Med 2003;138:W1-12.
4. Simel DL, Rennie D, Bossuyt PM. The STARD statement for reporting diagnostic accuracy studies: application to the history and physical examination. J Gen Intern Med 2008;23:768-74.

Competing interests: None declared

Olivier Nardi, MD, MPH

Anne-Valérie Gainet, MD Clinique de Goussonville 78930 Goussonville, France

Raymond Poincaré Hospital, AP-HP, 104 Bd Raymond Poincaré 92380 Garches, France


Peter Bourdillon's second rapid response is gracious. And there's an important message here. Statistical information about tests is not provided in a form that's most helpful to clinicians.(1)

Suppose we're doing a test for a disease and we're given the test's sensitivity and specificity. If sensitivity or specificity is high, a positive result indicates disease. If either is too low, a positive result indicates the absence of disease. But we can't tell what level is 'too low' without doing some arithmetic: we need to calculate the relationship between sensitivity and specificity. The likelihood ratio gives us that relationship.

If the positive likelihood ratio for a test is greater than one, then a positive result makes the disease more likely. If the test's positive likelihood ratio is less than one, then a positive result makes the disease less likely.
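The arithmetic is small enough to spell out. A minimal sketch (the sensitivity and specificity figures here are illustrative only, not drawn from any study):

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

# Illustrative figures: sensitivity 0.9, specificity 0.8.
lr_pos, lr_neg = likelihood_ratios(0.9, 0.8)
# lr_pos = 4.5   -> greater than one: a positive result makes disease more likely
# lr_neg = 0.125 -> less than one: a negative result makes disease less likely
```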

1. Treasure W. Diagnosis and Risk Management in Primary Care. London: Radcliffe, 2011.

Competing interests: I've written a book dealing with this and published for profit - Diagnosis and Risk Management in Primary Care

Wilfrid Treasure, Salaried GP

Whalsay Health Centre, Symbister, ZE2 9AE


I thank Wilfrid Treasure and Pelham Barton for drawing attention to the error in my earlier letter. The mistake resulted from a miscalculation of the post-test probability of a negative test.

Competing interests: None declared

Peter J Bourdillon, Honorary Senior lecturer

Imperial College, ECG Department, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London W12 0HS


I do not understand the basis on which Peter Bourdillon makes the statement "a test with a sensitivity numerically larger than its corresponding specificity produces an unexpected result: the post-test probability of the outcome is greater with a “negative” test than with a “positive” test."

Using the definitions as I have always understood them, and as given in the original paper by Adams and Leveson, take an example with pre-test probability (A+C) 0.3, sensitivity [A/(A+C)] 0.9, and specificity [D/(B+D)] 0.8.

Then we have A = 0.27, C = 0.03, B+D = 0.7 so D = 0.56 and B = 0.14. The overall probability of a positive test result is A+B = 0.41, and of a negative test result C+D = 0.59.

Then the probability of the outcome following a positive test result is 0.27/0.41 (approximately 0.66); following a negative test result it is 0.03/0.59 (approximately 0.05), much lower than for a positive test result.
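This 2x2-table arithmetic can be reproduced directly; the short sketch below simply restates the calculation from pre-test probability, sensitivity, and specificity.

```python
def post_test_probabilities(pretest, sensitivity, specificity):
    """Post-test probability of disease after a positive and after a
    negative result, from the 2x2 table (A, B, C, D as in the paper)."""
    a = pretest * sensitivity          # true positives
    c = pretest - a                    # false negatives
    d = (1 - pretest) * specificity    # true negatives
    b = (1 - pretest) - d              # false positives
    return a / (a + b), c / (c + d)

p_positive, p_negative = post_test_probabilities(0.3, 0.9, 0.8)
# p_positive ~ 0.66, p_negative ~ 0.05: the positive result carries
# the far higher post-test probability.
```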

Competing interests: None declared

Pelham M Barton, Reader in Mathematical Modelling

University of Birmingham, Public Health Building, University of Birmingham, B15 2TT


The likelihood ratio(1) is more useful to the clinician than sensitivity and specificity.

Peter Bourdillon points out that if sensitivity is greater than specificity a positive result makes the diagnosis less likely. The positive likelihood ratio makes this clear: if it's greater than unity the diagnosis is more likely; if it's less than unity the diagnosis is less likely.

Huw Llewelyn notes that some results are best considered not as normal or abnormal but as a range. In this situation there can be a range of likelihood ratios corresponding to the range of results. For instance, in the tricky situation of estimating the risk of someone having prostate cancer, there is a different likelihood ratio for each prostate specific antigen level.

It's a pity that the literature doesn't routinely provide us with likelihood ratios. However, likelihood ratios can easily be calculated from sensitivity and specificity.
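A range of results can be handled with interval likelihood ratios: the ratio of the proportion of diseased to non-diseased patients whose result falls in a given band. The PSA counts below are wholly invented for illustration.

```python
# Invented counts, for illustration only: patients grouped by PSA band
# (ng/ml), with and without prostate cancer (100 in each group).
cancer    = {"<4": 20, "4-10": 30, ">10": 50}
no_cancer = {"<4": 70, "4-10": 20, ">10": 10}

def interval_lr(band):
    """Likelihood ratio for a result falling in a given band:
    P(band | disease) / P(band | no disease)."""
    p_disease = cancer[band] / sum(cancer.values())
    p_no_disease = no_cancer[band] / sum(no_cancer.values())
    return p_disease / p_no_disease

# Each band carries its own likelihood ratio, rather than a single
# positive/negative dichotomy at one cut-off.
lrs = {band: interval_lr(band) for band in cancer}
```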

1. Treasure W. Diagnosis and Risk Management in Primary Care: words that count, numbers that speak. London: Radcliffe, 2011.

Competing interests: My book 'Diagnosis and Risk Management in Primary Care: words that count, numbers that speak' is published by Radcliffe.

Wilfrid Treasure, GP

Whalsay Health Centre, Symbister, ZE2 9AE


Adams and Leveson (1) explain that the choice of sensitivity and specificity for a test is arbitrary and depends on several factors such as the severity of the outcome and the potential negative consequences of the test. They did not mention that a test with a sensitivity numerically larger than its corresponding specificity produces an unexpected result: the post-test probability of the outcome is greater with a “negative” test than with a “positive” test.

Was NICE aware of this when it cited two such examples in deriving the guidance “Do not use exercise ECG to diagnose or exclude angina for people without known coronary artery disease” (2)? NICE assumed a sensitivity of 67% and a specificity of 69% for the exercise ECG. NICE’s alternative strategy recommended using either calcium scoring, for which NICE assumed a sensitivity of 89% and a specificity of 43%, or myocardial perfusion scintigraphy, for which NICE assumed a sensitivity of 86% and a specificity of 64% (3).

1. Adams ST, Leveson SH. Clinical prediction rules. BMJ 2012;344:d8312
2. Cooper A, Timmis A, Skinner J. Assessment of recent onset chest pain or discomfort of suspected cardiac origin: summary of NICE guidance. BMJ 2010;340:c1118
3. National Institute for Health and Clinical Excellence. Chest pain of recent onset: assessment and diagnosis of recent onset chest pain or discomfort of suspected cardiac origin. Appendix F: economic models for stable chest pain, table 1. Accessed 7 February 2012.

Competing interests: None declared

Peter J Bourdillon, Honorary Senior Lecturer

Imperial College, ECG Department, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London W12 0HS


Adams and Leveson provide a useful review of clinical prediction rules. However, clinical prediction rules and other tests should not only be assessed in terms of sensitivity and specificity. These indices were originally designed to assess the way that single tests will detect single diagnoses in well defined populations (e.g. using mammography to screen for breast cancer in women in a geographical area). In screening tests, the sensitivity is usually set to be high and the specificity allowed to be low so that a positive result only suggests a differential diagnosis to be investigated later by a clinician.

In the clinical setting, tests and clinical prediction rules often generate a differential diagnosis [1]. To be useful in the differential diagnosis of another finding, a test result must occur commonly in at least one differential diagnosis and rarely in at least one other. This is a ratio of sensitivities [1].

It can also be misleading to use cut-off points to designate a result as high, normal, or low. Experienced doctors will use the actual result. For example, haemoglobins of 10, 6, and 2 g/dl are all ‘low’, but each level has its own differential diagnosis. The actual value of the MCV (and not a ‘high’, ‘normal’, or ‘low’ MCV) can also be used to differentiate between some of these differential diagnoses.

Test results and clinical prediction rules can also be used as diagnostic and treatment selection criteria. For example, although an albumin excretion rate (AER) of 30 mcg/min is abnormal, the number needed to treat (NNT) with an angiotensin receptor blocker to stop one patient developing diabetic nephropathy within two years is about 100. However, if the AER is 60 mcg/min the NNT is about 12 [2]. In other words, even if the test result is above some cut-off point, the actual value still has to be interpreted.

Diagnostic tests, and clinical prediction rules, are not assessed properly in the clinical setting at present, including by NICE. This is probably leading to huge waste from inappropriate use of bad tests and failure to use good tests as diagnostic leads, diagnostic differentiators and as diagnostic and treatment selection criteria. Failure to assess treatment selection criteria also calls into question the validity of the related randomised clinical trials.

At present, tests are only being assessed consistently and properly for use as screening tests. Perhaps we will have to wait until some of the current young readers of the Oxford Handbook of Clinical Diagnosis are in a position to change things!


1. Llewelyn H, Ang AH, Lewis K, Abdullah A. The Oxford Handbook of Clinical Diagnosis, 2nd edition. Oxford University Press, Oxford 2009

2. Llewelyn DEH, Garcia-Puig J. How different urinary albumin excretion rates can predict progression to nephropathy and the effect of treatment in hypertensive diabetics. JRAAS 2004;5:141-5.

Competing interests: None declared

Huw Llewelyn, General Physician and Endocrinologist

Nevill Hall Hospital, Brecon Road, Abergavenny NP7 7EG
