Prediction models for diagnosis and prognosis in Covid-19

BMJ 2020; 369 doi: (Published 14 April 2020) Cite this as: BMJ 2020;369:m1464

Authors:
  1. Matthew Sperrin, senior lecturer in health data science (affiliation 1)
  2. Stuart W Grant, academic clinical lecturer in cardiothoracic surgery (affiliation 2)
  3. Niels Peek, professor of health informatics (affiliation 1)

Affiliations:
  1. Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
  2. Division of Cardiovascular Sciences, University of Manchester, Manchester, UK

Correspondence to: matthew.sperrin{at}

All models are wrong but data sharing and better reporting could improve this

The covid-19 pandemic is a rapidly developing global emergency. Healthcare providers are facing critical, time sensitive decisions about patients and their treatment, and these decisions are made more difficult by a lack of robust, evidence based decision support tools.

Decision support tools are commonly underpinned by clinical prediction models. These models use patient data to calculate a predicted probability of either existing disease (diagnostic model) or future outcome (prognostic model).1 2 Both elements are highly relevant in responding to the pandemic, and a linked article by Wynants and colleagues (doi:10.1136/bmj.m1328) reports a systematic review of clinical prediction models for diagnosis and prognosis of patients with covid-19.3
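To make the idea of a predicted probability concrete, the sketch below shows how a logistic regression model, one common form of clinical prediction model, might turn patient data into a risk estimate. The predictor names, intercept, and coefficients are hypothetical values chosen for illustration; they are not taken from any model in the review and must not be used clinically.

```python
import math


def predicted_probability(intercept, coefficients, patient):
    """Return the predicted risk from a logistic model.

    risk = 1 / (1 + exp(-(intercept + sum of coefficient * predictor value)))
    """
    linear_predictor = intercept + sum(
        coefficients[name] * patient[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical model: every number below is an illustrative assumption.
INTERCEPT = -5.0
COEFFICIENTS = {"age_decades": 0.5, "crp_per_50_mg_l": 0.4}

# Hypothetical patient: aged 70 years with a C reactive protein of 100 mg/L.
patient = {"age_decades": 7.0, "crp_per_50_mg_l": 2.0}

risk = predicted_probability(INTERCEPT, COEFFICIENTS, patient)
print(f"Predicted risk: {risk:.2f}")
```

The same calculation serves a diagnostic model (probability that disease is present now) or a prognostic model (probability of a future outcome); only the definition of the outcome changes.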

In just over three months from the start of the pandemic to the most recent search, the authors identified 27 studies describing 31 models. This number shows the potential of the academic community to respond quickly to this healthcare crisis. It also highlights the importance of publishing the systematic review as a living review—continually updated as evidence mounts.4

Unfortunately, the review demonstrates that the quality of the identified models is uniformly poor and none can be recommended for clinical use. Why is this the case? One might argue that the urgent situation means that methodological shortcuts and poor adherence to guidelines are justified to make decision support tools available as quickly as possible. However, models developed in such a way could well do more harm than good. If a model is used to facilitate decisions such as whether a patient should be offered mechanical ventilation then it should be robustly developed and as accurate as possible.

Developing a clinical prediction model is a science and an art. The objective, intended population, predictors, and outcome must be clinically relevant and clearly described. A balance needs to be struck between the ability to apply the model widely in similar patient cohorts and optimising statistical performance in the development cohort. As identified in the review, developers often focus solely on the discriminatory ability of the model, as measured by the C statistic, to the detriment of other components that are essential for a useful model.
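The limitation of judging a model on discrimination alone can be illustrated with a small sketch. It computes the C statistic (the proportion of event and non-event pairs in which the patient with the event received the higher predicted risk) alongside one simple calibration check, the ratio of observed to expected events. Calibration is used here as one example of an "other component"; the data are made up for illustration, and the function names are our own rather than those of any particular library.

```python
def c_statistic(predictions, outcomes):
    """Proportion of (event, non-event) pairs ranked correctly; ties count as half."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    non_events = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("Need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(predictions, outcomes):
    """Observed events divided by the sum of predicted risks (ideal value is 1)."""
    return sum(outcomes) / sum(predictions)


# Made-up predicted risks and outcomes (1 = event occurred) for six patients.
predictions = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
outcomes = [0, 0, 1, 0, 1, 1]

# A second model that halves every predicted risk keeps the same ranking.
halved = [p / 2 for p in predictions]

c_original = c_statistic(predictions, outcomes)
c_halved = c_statistic(halved, outcomes)
oe_original = observed_expected_ratio(predictions, outcomes)
oe_halved = observed_expected_ratio(halved, outcomes)

print(f"C statistic: original {c_original:.3f}, halved {c_halved:.3f}")
print(f"Observed/expected: original {oe_original:.2f}, halved {oe_halved:.2f}")
```

Because the C statistic depends only on how patients are ranked, halving every predicted risk leaves it unchanged, yet the second model now underestimates the number of events by a factor of about two. A model selected on the C statistic alone could therefore still give misleading absolute risks.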

Even in ideal circumstances, so-called perfect clinical prediction models do not exist. George Box, the eminent British statistician, said that “all models are wrong, but some are useful.”5 Wynants and colleagues conclude that all clinical prediction models for covid-19 to date are wrong and none are useful. How then do we develop models that are both needed and useful in a timely manner? It is certainly feasible that, with the right data analysis pipelines and expertise, this can be achieved while still maintaining high methodological standards and validity.

Research reporting guidelines such as TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis)6 could be extended to cover model development when limited data are available. This extension could include recommending when and how it is appropriate to use historical data from similar populations. An issue that frequently hampers the development of useful clinical prediction models is inadequate sample size,7 and Wynants and colleagues rightly call for individual patient data on patients with covid-19 to be urgently shared to deal with this.

Unfortunately, even in the face of a healthcare crisis, incentives for sharing data are not well established despite various initiatives8 and available platforms.9 Why would a research group share data when working towards a high impact original publication? Some responsibility lies with journals that publish poor quality predictive models, and more must be done to ensure that reporting checklists such as TRIPOD are routinely applied. However, the research community as a whole needs to acknowledge that failure to develop good quality models based on large data collaborations is the path of least resistance.

The preponderance of poor quality clinical prediction models is not unique to covid-19, but the current situation brings the issue into acute focus. Academic leaders should ensure that there are incentives for data sharing and infrastructure to facilitate high quality model development and, while some initiatives are under way, more needs to be done. Establishing frameworks for the development of high quality clinical prediction models will benefit patients in all areas of healthcare.

As no covid-19 clinical prediction models can currently be recommended, clinicians will have to rely on their clinical acumen and shared experiences of best practice for now. We recommend regularly consulting this living systematic review to identify when useful clinical prediction models do become available.3


This article is made freely available for use in accordance with BMJ's website terms and conditions for the duration of the covid-19 pandemic or until otherwise determined by BMJ. You may use, download and print the article for any lawful, non-commercial purpose (including text and data mining) provided that all copyright notices and trade marks are retained.