
CC BY Open access

Clinical prediction models for mortality in patients with covid-19: external validation and individual participant data meta-analysis

BMJ 2022; 378 doi: (Published 12 July 2022) Cite this as: BMJ 2022;378:e069881
  1. Valentijn M T de Jong, assistant professor1 2 3,
  2. Rebecca Z Rousset, masters student1,
  3. Neftalí Eduardo Antonio-Villa, research assistant4 5,
  4. Arnoldus G Buenen, emergency physician6 7,
  5. Ben Van Calster, associate professor8 9 10,
  6. Omar Yaxmehen Bello-Chavolla, assistant professor4,
  7. Nigel J Brunskill, professor11 12,
  8. Vasa Curcin, reader13,
  9. Johanna A A Damen, assistant professor1 2,
  10. Carlos A Fermín-Martínez, doctoral candidate4 5,
  11. Luisa Fernández-Chirino, professor and research assistant4 14,
  12. Davide Ferrari, doctoral student13 15,
  13. Robert C Free, research fellow16 17,
  14. Rishi K Gupta, senior mentor18,
  15. Pranabashis Haldar, clinical senior lecturer16 17 19,
  16. Pontus Hedberg, doctoral candidate20 21,
  17. Steven Kwasi Korang, medical doctor22,
  18. Steef Kurstjens, medical resident23,
  19. Ron Kusters, professor23 24,
  20. Rupert W Major, honorary associate professor11 25,
  21. Lauren Maxwell, senior researcher26,
  22. Rajeshwari Nair, research faculty27 28,
  23. Pontus Naucler, associate professor20 21,
  24. Tri-Long Nguyen, assistant professor1 29 30,
  25. Mahdad Noursadeghi, infectious diseases consultant31,
  26. Rossana Rosa, infectious diseases consultant32,
  27. Felipe Soares, doctoral candidate33,
  28. Toshihiko Takada, associate professor1 34,
  29. Florien S van Royen, doctoral candidate1,
  30. Maarten van Smeden, associate professor1,
  31. Laure Wynants, assistant professor7 35,
  32. Martin Modrák, postdoctoral researcher36,
  33. the CovidRetro collaboration,
  34. Folkert W Asselbergs, professor37 38 39,
  35. Marijke Linschoten, medical doctor and doctoral candidate37,
  36. CAPACITY-COVID consortium,
  37. Karel G M Moons, professor1 2,
  38. Thomas P A Debray, assistant professor1 2
  1. Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  2. Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Netherlands
  3. Data Analytics and Methods Task Force, European Medicines Agency, Amsterdam, Netherlands
  4. Dirección de Investigación, Instituto Nacional de Geriatría, Mexico City, Mexico
  5. MD/PhD (PECEM) Program, Faculty of Medicine, National Autonomous University of Mexico, Mexico City, Mexico
  6. Maxima MC, Veldhoven, the Netherlands
  7. Bernhoven, Uden, Netherlands
  8. Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  9. Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, Netherlands
  10. EPI-centre, KU Leuven, Leuven, Belgium
  11. Department of Cardiovascular Sciences, College of Life Sciences, University of Leicester, Leicester, UK
  12. John Walls Renal Unit, University Hospitals of Leicester NHS Trust, Leicester, UK
  13. School of Population Health and Environmental Sciences, King’s College London, London, UK
  14. Faculty of Chemistry, Universidad Nacional Autónoma de México, México City, Mexico
  15. Centre for Clinical Infection and Diagnostics Research, School of Immunology and Microbial Sciences, King’s College London, London, UK
  16. Department of Respiratory Sciences, College of Life Sciences, University of Leicester, Leicester, UK
  17. NIHR Leicester Biomedical Research Centre, University of Leicester, Leicester, UK
  18. Institute for Global Health, University College London, London, UK
  19. Department of Respiratory Medicine, University Hospitals of Leicester NHS Trust, Leicester, UK
  20. Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
  21. Division of Infectious Diseases, Department of Medicine Solna, Karolinska Institute, Stockholm, Sweden
  22. Copenhagen Trial Unit, Centre for Clinical Intervention Research, Department 7812, Rigshospitalet, Copenhagen University Hospital, Denmark
  23. Laboratory of Clinical Chemistry and Haematology, Jeroen Bosch Hospital, Den Bosch, Netherlands
  24. Department of Health Technology and Services Research, Technical Medical Centre, University of Twente, Enschede, Netherlands
  25. Department of Cardiovascular Sciences, College of Life Sciences, University of Leicester, Leicester, UK
  26. Heidelberger Institut für Global Health, Universitätsklinikum Heidelberg, Germany
  27. University of Iowa Carver College of Medicine, Iowa City, IA, USA
  28. Centre for Access and Delivery Research Evaluation Iowa City Veterans Affairs Health Care System, Iowa City, IA, USA
  29. Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
  30. Department of Pharmacy, University Hospital Centre of Nîmes, Nîmes, France
  31. Division of Infection and Immunity, University College London, London, UK
  32. Infectious Diseases Service, UnityPoint Health-Des Moines, Des Moines, IA, USA
  33. Industrial Engineering Department, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
  34. Department of General Medicine, Shirakawa Satellite for Teaching And Research (STAR), Fukushima Medical University, Fukushima, Japan
  35. Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
  36. Institute of Microbiology of the Czech Academy of Sciences, Prague, Czech Republic
  37. Department of Cardiology, Division of Heart and Lungs, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  38. Health Data Research UK and Institute of Health Informatics, University College London, London, UK
  39. Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, London, UK
  1. Correspondence to: V M T de Jong V.M.T.deJong-2{at}
  • Accepted 25 May 2022


Objective To externally validate various prognostic models and scoring rules for predicting short term mortality in patients admitted to hospital for covid-19.

Design Two stage individual participant data meta-analysis.

Setting Secondary and tertiary care.

Participants 46 914 patients across 18 countries, admitted to a hospital with polymerase chain reaction confirmed covid-19 from November 2019 to April 2021.

Data sources Multiple (clustered) cohorts in Brazil, Belgium, China, Czech Republic, Egypt, France, Iran, Israel, Italy, Mexico, Netherlands, Portugal, Russia, Saudi Arabia, Spain, Sweden, United Kingdom, and United States previously identified by a living systematic review of covid-19 prediction models published in The BMJ, and through PROSPERO, reference checking, and expert knowledge.

Model selection and eligibility criteria Prognostic models were identified by the living systematic review and through contacting experts. Models were excluded a priori if they had a high risk of bias in the participant domain of PROBAST (prediction model study risk of bias assessment tool) or if their applicability was deemed poor.

Methods Eight prognostic models with diverse predictors were identified and validated. A two stage individual participant data meta-analysis of the estimated model concordance (C) statistic, calibration slope, calibration-in-the-large, and observed to expected ratio (O:E) was performed across the included clusters.

Main outcome measures 30 day mortality or in-hospital mortality.

Results Datasets included 27 clusters from 18 different countries and contained data on 46 914 patients. The pooled estimates ranged from 0.67 to 0.80 (C statistic), 0.22 to 1.22 (calibration slope), and 0.18 to 2.59 (O:E ratio) and were prone to substantial between study heterogeneity. The 4C Mortality Score by Knight et al (pooled C statistic 0.80, 95% confidence interval 0.75 to 0.84, 95% prediction interval 0.72 to 0.86) and clinical model by Wang et al (0.77, 0.73 to 0.80, 0.63 to 0.87) had the highest discriminative ability. On average, 29% fewer deaths were observed than predicted by the 4C Mortality Score (pooled O:E 0.71, 95% confidence interval 0.45 to 1.11, 95% prediction interval 0.21 to 2.39), 35% fewer than predicted by the Wang clinical model (0.65, 0.52 to 0.82, 0.23 to 1.89), and 4% fewer than predicted by Xie et al’s model (0.96, 0.59 to 1.55, 0.21 to 4.28).

Conclusion The prognostic value of the included models varied greatly between the data sources. Although the Knight 4C Mortality Score and Wang clinical model appeared most promising, recalibration (intercept and slope updates) is needed before implementation in routine care.


Covid-19 has had a major impact on global health and continues to disrupt healthcare systems and social life. Millions of deaths have been reported worldwide since the start of the pandemic in 2019.1 Although vaccines are now widely deployed, the incidence of SARS-CoV-2 infection and the burden of covid-19 remain extremely high. Many countries do not have adequate resources to effectively implement vaccination strategies. Also, the timing and sequence of vaccination schedules are still debatable, and virus mutations could yet hamper the future effectiveness of vaccines.2

Covid-19 is a clinically heterogeneous disease of varying severity and prognosis.3 Risk stratification tools have been developed to target prevention and management or treatment strategies, or both, for people at highest risk of a poor outcome.4 Risk stratification can be improved by estimating the absolute risk of unfavourable outcomes in individual patients, which requires prediction models that combine information from multiple variables (predictors). Predicting the risk of mortality with covid-19 could help to identify those patients who require the most urgent help or those who would benefit most from treatment. This would facilitate the efficient use of limited medical resources and reduce the impact on the healthcare system, especially intensive care units. Furthermore, knowing a patient’s risk of a poor outcome at hospital admission could help with planning the use of scarce resources. In a living systematic review (update 3, 12 January 2021), 39 prognostic models for predicting short term (mostly in-hospital) mortality in patients with a diagnosis of covid-19 were identified.5

Despite many ongoing efforts to develop covid-19 related prediction models, their performance when validated in external cohorts or countries is largely unknown. Prediction models often perform worse than anticipated and are prone to poor calibration when applied to new individuals.678 Clinical implementation of poorly performing models leads to incorrect predictions, which could prompt unnecessary interventions or the withholding of important ones. Both result in potential harm to patients and inappropriate use of medical resources. Therefore, prediction models should always be externally validated before clinical implementation.9 Validation studies quantify the performance of a prediction model across different settings and populations and can thus be used to identify the potential usefulness and effectiveness of these models for medical decision making.78101112 We performed a large scale international individual participant data meta-analysis to externally validate the most promising prognostic models for predicting short term mortality in patients admitted to hospital with covid-19.


Review to identify covid-19 related prediction models

We used the second update (21 July 2020) of an existing living systematic review of prediction models for covid-19 to identify multivariable prognostic models and scoring rules for assessing short term (at 30 days or in-hospital) mortality in patients admitted to hospital with covid-19.5 During the third update of the living review (12 January 2021),13 additional models were found that also met the study eligibility criteria of this individual participant data meta-analysis, which we also included for external validation.

We considered prediction models to be eligible for the current meta-analysis if they were developed using data from patients who were admitted to a hospital with laboratory confirmed SARS-CoV-2 infection. In papers that reported multiple prognostic models, we considered each model for eligibility. As all the prognostic models for covid-19 mortality in the second update (21 July 2020) of the living systematic review had a high risk of bias in at least one domain of PROBAST (prediction model study risk of bias assessment tool),78 we excluded only models that had a high risk of bias in the participant domain, models for which applicability was deemed poor, and imaging based algorithms (see fig 1).

Fig 1

Flowchart of inclusion of prognostic models. The second update took place on 21 July 2020. ICU=intensive care unit

Review to identify patient level data for model validation

We searched for individual studies and registries containing data from routine clinical care (electronic healthcare records), and data sharing platforms with individual patient data of those admitted to hospital with covid-19. We further identified eligible data sources through the second update (21 July 2020) of the living systematic review.513 In addition, we consulted the PROSPERO database, references of published prediction models for covid-19, and experts in prognosis research and infectious diseases.

Data sources were eligible for model validation if they contained data on mortality endpoints for consecutive patients admitted to hospital with covid-19. We included only patients with a polymerase chain reaction confirmed SARS-CoV-2 infection. We excluded patients with no laboratory data recorded in the first 24 hours of admission. In each data source, we adopted the same eligibility criteria for all models that we selected for validation. For the scoring rule by Bello-Chavolla et al,14 we used 30 day mortality when available; otherwise, we used in-hospital mortality (see table 1).

Statistical analyses

For external validation and meta-analysis we used a two stage process.1516 The first stage consisted of imputing missing data and estimating performance metrics in individual clusters. For datasets that included only one hospital (or cohort) we defined the cluster level as the individual hospital (or cohort). In the CAPACITY-COVID dataset,17 which contains data from multiple countries, we considered each country as a cluster. For the data from UnityPoint Hospitals in Iowa, United States, we considered each hospital as a cluster. We use the term cluster throughout the paper. In the second stage we performed a meta-analysis of the performance metrics.1819 We did not perform an a priori sample size calculation, as we included all data that we found through the review and that met the inclusion criteria.

Stage 1: Validation

We imputed sporadically missing data 50 times using multiple imputation (see supplementary material B). Using each of the eight models, we calculated the mortality risk or mortality score of all participants in clusters where the respective model’s predictors were measured in at least some of the participants. Subsequently, we calculated the concordance (C) statistic, observed to expected ratio (O:E ratio), calibration slope, and calibration-in-the-large for each model in each imputed cluster.11 The C statistic estimates the probability of correctly identifying the patient with the outcome in a pair of randomly selected patients, of whom one has developed the outcome and one has not.20 The O:E ratio is the number of observed outcomes divided by the number of outcomes expected by the prediction model. The calibration slope estimates the correction factor by which the prediction model coefficients need to be multiplied to obtain coefficients that are well calibrated to the validation sample.1121 The calibration-in-the-large estimates the (additive) correction to the prediction model’s intercept, while keeping the prediction model’s coefficients fixed.1121 Supplementary material B provides details of the model equations.
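To make the four stage 1 measures concrete, the sketch below computes them for a single complete dataset in plain Python. It is a minimal illustration, not the authors' code (they used R packages including pROC and metamisc). It assumes outcomes coded 0/1 (1 = died), predicted risks strictly between 0 and 1, and no missing data, so the 50-fold multiple imputation is not reproduced; all function names are hypothetical. The calibration slope and calibration-in-the-large are obtained by logistic regression of the outcome on the logit of the predicted risk (the linear predictor), fitted by Newton-Raphson.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    # Numerically stable inverse logit
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)


def c_statistic(y, p):
    """Share of (death, survivor) pairs where the death had the higher predicted risk; ties count 1/2."""
    deaths = [pi for yi, pi in zip(y, p) if yi == 1]
    survivors = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for p_death in deaths:
        for p_surv in survivors:
            if p_death > p_surv:
                concordant += 1.0
            elif p_death == p_surv:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


def oe_ratio(y, p):
    """Observed number of deaths divided by the expected number (sum of predicted risks)."""
    return sum(y) / sum(p)


def calibration_in_the_large(y, p, tol=1e-10, max_iter=100):
    """Intercept a in logit P(y=1) = a + LP, with LP = logit(p) as an offset (slope fixed at 1)."""
    lp = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(max_iter):
        mu = [expit(a + x) for x in lp]
        step = sum(yi - m for yi, m in zip(y, mu)) / sum(m * (1 - m) for m in mu)
        a += step
        if abs(step) < tol:
            break
    return a


def calibration_slope(y, p, tol=1e-10, max_iter=100):
    """Fit logit P(y=1) = a + b * LP by Newton-Raphson; returns (a, b), b being the calibration slope."""
    lp = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and Fisher information [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            m = expit(a + b * x)
            w = m * (1 - m)
            g0 += yi - m
            g1 += (yi - m) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01  # zero if all predicted risks are identical
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b


if __name__ == "__main__":
    # Hypothetical outcomes and predicted risks for one cluster (not data from this study)
    died = [0, 0, 1, 0, 1, 1, 0, 1]
    risk = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    print("C statistic:", c_statistic(died, risk))
    print("O:E ratio:", oe_ratio(died, risk))
    print("calibration-in-the-large:", calibration_in_the_large(died, risk))
    print("calibration slope:", calibration_slope(died, risk)[1])
```

With multiple imputation, functions like these would be applied to each imputed version of a cluster; how the results are combined across imputations is not shown here.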

Stage 2: Pooling performance

In the second stage of the meta-analysis, we pooled the cluster specific logit C statistic, calibration slope, and log O:E ratios from stage 1.22 We used restricted maximum likelihood estimation and the Hartung-Knapp-Sidik-Jonkman method to derive all confidence intervals.2324 To quantify the presence of between study heterogeneity, we constructed approximate 95% prediction intervals, which indicated probable ranges of performance expected in new clusters.25 We performed the analysis in R (version 4.0.0 or later, using packages mice, pROC, and metamisc) and we repeated the main analyses in STATA.2627282930 This study is reported following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist for prediction model validation (see supplementary material C).3132
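The pooling step can be sketched for one performance measure on its transformed scale, for example logit C statistics with their within-cluster variances. The plain Python sketch below estimates the between cluster variance τ² by restricted maximum likelihood using a standard fixed-point iteration, forms a Hartung-Knapp-Sidik-Jonkman confidence interval with k−1 degrees of freedom, and builds an approximate prediction interval with k−2 degrees of freedom. That last choice is common but may differ in detail from the metamisc defaults the authors used. The function names and example numbers are hypothetical, and the Student t quantile is computed numerically only to avoid a SciPy dependency.

```python
import math


def t_quantile(prob, df, n_steps=2000):
    """Quantile of Student's t for prob in (0.5, 1): bisection on a Simpson-integrated CDF."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(t):
        return const * (1 + t * t / df) ** (-(df + 1) / 2)

    def cdf(x):  # valid for x >= 0; n_steps must be even for Simpson's rule
        h = x / n_steps
        total = density(0.0) + density(x)
        total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, n_steps))
        return 0.5 + total * h / 3

    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reml_tau2(y, v, tol=1e-10, max_iter=1000):
    """REML estimate of the between cluster variance (fixed-point iteration, truncated at 0)."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new = max(0.0, num / sum(wi ** 2 for wi in w) + 1 / sum(w))
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2


def pool_random_effects(y, v, level=0.95):
    """Random effects summary with HKSJ confidence interval and approximate prediction interval."""
    k = len(y)
    tau2 = reml_tau2(y, v)
    w = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Hartung-Knapp-Sidik-Jonkman: scale the variance of mu by the weighted residual variance
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)
    se = math.sqrt(q / sum(w))
    prob = 1 - (1 - level) / 2
    half_ci = t_quantile(prob, k - 1) * se
    result = {"mu": mu, "tau2": tau2, "ci": (mu - half_ci, mu + half_ci), "pi": None}
    if k >= 3:  # the prediction interval needs at least 3 clusters
        half_pi = t_quantile(prob, k - 2) * math.sqrt(tau2 + se ** 2)
        result["pi"] = (mu - half_pi, mu + half_pi)
    return result


def expit(x):
    return 1 / (1 + math.exp(-x))


if __name__ == "__main__":
    # Hypothetical cluster C statistics and standard errors (not results from this study)
    c_stats = [0.78, 0.82, 0.74, 0.80, 0.85]
    c_ses = [0.02, 0.03, 0.025, 0.02, 0.04]
    y = [math.log(c / (1 - c)) for c in c_stats]  # logit C
    v = [(se / (c * (1 - c))) ** 2 for c, se in zip(c_stats, c_ses)]  # delta method variance
    res = pool_random_effects(y, v)
    print("pooled C:", round(expit(res["mu"]), 3))
    print("95% CI:", [round(expit(b), 3) for b in res["ci"]])
    print("95% PI:", [round(expit(b), 3) for b in res["pi"]])
```

Pooling on the logit (or log) scale and back-transforming the bounds keeps the summary C statistic within 0 and 1 and the summary O:E ratio positive.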

Sensitivity analysis

None of the datasets contained all predictors, so the models could not all be validated in a single dataset, which hampered direct comparison between models. Therefore, for each performance measure separately, we performed a meta-regression of all performance estimates with country (rather than cluster, to save degrees of freedom) and model as predictors (both as dummy variables); this analysis was not prespecified in our protocol. We then used these meta-models to predict the performance (and 95% confidence intervals) of each prediction model in each included country, thereby allowing a fairer comparison of performance between models. All R code is available from
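A meta-regression with country and model coded as dummy variables can be sketched as weighted least squares. In this simplified version the residual between cluster variance `tau2` is passed in as a known value (in practice it would be estimated, for example by REML), the first country and the first model listed act as reference levels, and the 95% confidence interval for a predicted country-model combination uses a normal approximation. All names are hypothetical; this is not the authors' R code.

```python
import math
from statistics import NormalDist


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (inputs are not modified)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_row(country, model, countries, models):
    """Intercept plus dummy variables; countries[0] and models[0] are the reference levels."""
    row = [1.0]
    row += [1.0 if country == c else 0.0 for c in countries[1:]]
    row += [1.0 if model == m else 0.0 for m in models[1:]]
    return row


def fit_meta_regression(records, countries, models, tau2=0.0):
    """records: (country, model, estimate, within-cluster variance); weights are 1 / (variance + tau2)."""
    x = [design_row(c, m, countries, models) for c, m, _, _ in records]
    w = [1 / (var + tau2) for _, _, _, var in records]
    y = [est for _, _, est, _ in records]
    p = len(x[0])
    xtwx = [[sum(wi * xi[r] * xi[c] for wi, xi in zip(w, x)) for c in range(p)] for r in range(p)]
    xtwy = [sum(wi * xi[r] * yi for wi, xi, yi in zip(w, x, y)) for r in range(p)]
    return {"beta": solve(xtwx, xtwy), "xtwx": xtwx, "countries": countries, "models": models}


def predict(fit, country, model, level=0.95):
    """Predicted performance for one country-model combination with a normal-approximation CI."""
    x = design_row(country, model, fit["countries"], fit["models"])
    est = sum(xi * bi for xi, bi in zip(x, fit["beta"]))
    z = solve(fit["xtwx"], x)  # (X'WX)^-1 x, so var(est) = x' z
    se = math.sqrt(sum(xi * zi for xi, zi in zip(x, z)))
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return est, (est - crit * se, est + crit * se)


if __name__ == "__main__":
    # Hypothetical logit C statistics; the combination (B, M2) was never observed
    records = [
        ("A", "M1", 1.0, 0.01), ("A", "M2", 0.8, 0.01), ("A", "M3", 0.5, 0.01),
        ("B", "M1", 1.3, 0.01), ("B", "M3", 0.8, 0.01),
    ]
    fit = fit_meta_regression(records, ["A", "B"], ["M1", "M2", "M3"])
    print(predict(fit, "B", "M2"))
```

The additive structure is what allows a prediction for a model in a country where it was never validated, which is the point of this sensitivity analysis; it also means the predictions inherit the assumption that model and country effects do not interact.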

Patient and public involvement

Patients and members of the public were not directly involved in this research owing to lack of funding, staff, and infrastructure to facilitate their involvement. Several authors were directly involved in the treatment of patients with covid-19, have been in contact with hospital patients with covid-19, or have had covid-19.


Review of covid-19 related prediction models

We identified six prognostic models and two scoring rules that met the inclusion criteria (fig 1). Table 1 summarises the details of the models and scores. The score developed by Bello-Chavolla et al predicted 30 day mortality,14 whereas the other score and the six models predicted in-hospital mortality.3334353637

Table 1

Overview of selected models for predicting short term mortality in patients admitted to hospital with SARS-CoV-2 infection


The six prognostic models were estimated by logistic regression. The Bello-Chavolla score and Knight et al 4C Mortality Score were (simplified) scoring rules that could be used to stratify patients into risk groups. The Bello-Chavolla score was developed with Cox regression, whereas the 4C Mortality Score was developed with lasso logistic regression and its weights were rescaled and rounded to integer values.
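Turning regression coefficients into an integer scoring rule can be illustrated with one common approach: divide each coefficient by the value chosen to be worth one point, then round. This is a hypothetical sketch, not the exact procedure used for the 4C Mortality Score or the Bello-Chavolla score. It assumes predictors are coded as binary indicators (for example one indicator per age band), and the coefficient names and values in the example are invented.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, halves away from zero (Python's round() rounds halves to even)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def integer_points(coefs, unit=None):
    """Convert coefficients (log odds or log hazard ratios) to integer points.

    `unit` is the coefficient value worth one point; by default the smallest absolute non-zero coefficient.
    """
    if unit is None:
        unit = min(abs(b) for b in coefs.values() if b != 0)
    return {name: round_half_away(b / unit) for name, b in coefs.items()}


def total_score(points, patient):
    """Add up the points of the binary predictors present in a patient."""
    return sum(points[name] for name, present in patient.items() if present)


if __name__ == "__main__":
    # Invented coefficients for illustration only (not those of any published score)
    coefs = {"age_70_79": 0.62, "age_80_plus": 1.15, "high_urea": 0.41, "low_oxygen_saturation": 0.83}
    points = integer_points(coefs)
    print(points)
    print(total_score(points, {"age_80_plus": True, "high_urea": True, "low_oxygen_saturation": False}))
```

Rounding makes a score easy to compute at the bedside but discards some information, which is one reason a separate mapping from total score to absolute risk (such as the online calculator mentioned for the 4C Mortality Score) is needed for risk predictions.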

Although the 4C Mortality Score itself does not provide absolute risks, these were available through an online calculator. As the authors promoted the use of the online calculator, we used these risks in our analysis. For the two models by Wang et al (clinical and laboratory), no intercepts were available, so these were approximated.38

Review of patient level data for model validation

We identified 10 data sources, including four through living systematic reviews, one through a data sharing platform, and five by experts in the specialty (fig 2). The obtained datasets included 27 clusters from 18 different countries and contained data on 46 701 patients, 16 418 of whom died (table 2). Study recruitment was between November 2019 and April 2021. Most clusters included all patients with polymerase chain reaction confirmed covid-19, although in some clusters only patients admitted through a specific department were included (see supplementary material A, table S1). Mean age ranged from 45 to 71 years.

Fig 2

Flowchart of data sources

Table 2

Characteristics of the included external validation cohorts and clusters


External validation and meta-analysis

All results are presented in supplementary material D. The Wang clinical model could be validated in 24 clusters (see supplementary table S2), followed by Hu et al’s model, which was validated in 16 clusters (see supplementary table S3). The remaining models were less often validated, as predictor measurements were available in fewer datasets: the Bello-Chavolla score was validated in seven clusters (see supplementary table S4), the model by Xie et al in nine clusters (see supplementary table S5), the DCS model by Zhang et al in six clusters (see supplementary table S6), the DCSL model by Zhang et al in six clusters (see supplementary table S7), the 4C Mortality Score in six clusters (see supplementary table S8), and the Wang laboratory model in three clusters (see supplementary table S9).


The 4C Mortality Score showed the highest discrimination, with a pooled C statistic of 0.80 (95% confidence interval 0.75 to 0.84, fig 3 and see supplementary fig S4). The heterogeneity of discrimination of this model across datasets (95% prediction interval 0.72 to 0.86) was low compared with that of the other models. The next best discriminating model was the Wang clinical model, with a pooled C statistic of 0.77 (0.73 to 0.80, fig 3), and a greater heterogeneity (95% prediction interval 0.63 to 0.87). Two other models attained a summary C statistic >0.70: the Xie model with a C statistic of 0.75 (0.68 to 0.80, 95% prediction interval 0.58 to 0.86, fig 3), and the Hu model with a C statistic of 0.74 (0.66 to 0.80, 95% prediction interval 0.41 to 0.92, fig 3). The summary C statistic estimates for the remaining models were <0.70 (see supplementary fig S1).

Fig 3

Pooled C statistic estimates with corresponding 95% confidence interval and approximate 95% prediction intervals for four models (see supplementary file for full data). The Knight et al 4C Mortality Score had a C statistic of 0.786 (95% confidence interval 0.78 to 0.79) in the development data and 0.767 (0.76 to 0.77) in the validation data in the original publication. The Wang et al clinical model had a C statistic of 0.88 (0.80 to 0.95) in the development data and 0.83 (0.68 to 0.93) in the validation data in the original publication. The Xie et al model had a C statistic of 0.89 (0.86 to 0.93) in the development data, 0.88 after optimism correction, and 0.98 (0.96 to 1.00) in the validation data in the original publication. The Hu et al model had a C statistic of 0.90 in the development data and 0.88 in the validation data in the original publication. UCLH=University College London Hospitals; DGAE=General Directorate of Epidemiology; KCH=King’s College Hospital

Calibration: observed to expected

The O:E ratio of the Xie model was the closest to 1, with a meta-analysis summary estimate of 0.96 (95% confidence interval 0.59 to 1.55, 95% prediction interval 0.21 to 4.28, fig 4), indicating that, on average, the number of predicted deaths agreed with the number of observed deaths. However, the wide 95% confidence interval and 95% prediction interval indicate uncertainty in this estimate and substantial between cluster heterogeneity. The 4C Mortality Score attained an O:E ratio of 0.71 (0.45 to 1.11, 95% prediction interval 0.21 to 2.39, fig 4). The Hu model attained an O:E ratio of 0.61 (0.42 to 0.87, 95% prediction interval 0.15 to 2.48, fig 4). The Wang clinical model attained an O:E ratio of 0.65 (0.52 to 0.82, 95% prediction interval 0.23 to 1.89, fig 4). Supplementary figure S2 shows the O:E ratios of the other models and supplementary table S10 the calibration-in-the-large values.
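The O:E ratio and the "fewer deaths than predicted" phrasing used in the abstract are linked by simple arithmetic, written out below with the pooled estimates reported in this paper. Here $y_i$ is 1 if patient $i$ died and $\hat{p}_i$ is the model's predicted risk. The last line shows why ratios are pooled and plotted on the log scale: an O:E ratio of 0.5 and one of 2 are equally far from perfect agreement.

```latex
\begin{aligned}
\frac{O}{E} &= \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} \hat{p}_i},
  &\qquad \frac{E - O}{E} &= 1 - \frac{O}{E} \\
\text{4C Mortality Score: } 1 - 0.71 &= 0.29 && \text{(29\% fewer deaths than predicted)} \\
\text{Wang clinical model: } 1 - 0.65 &= 0.35 && \text{(35\% fewer)} \\
\text{Xie model: } 1 - 0.96 &= 0.04 && \text{(4\% fewer)} \\
\log 0.5 = -0.693 &= -\log 2 && \text{(symmetric about } \log 1 = 0\text{)}
\end{aligned}
```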

Fig 4

Pooled observed to expected ratio estimates with corresponding 95% confidence interval and approximate 95% prediction interval for four models. Estimates are presented on the log scale. See supplementary file for full data. UCLH=University College London Hospitals; DGAE=General Directorate of Epidemiology; KCH=King’s College Hospital

Calibration: slope

Supplementary material D figures S3 and S4 show the forest plots for all calibration slopes. The estimate for the calibration slope was closest to 1 for the 4C Mortality Score (1.22, 95% confidence interval 0.92 to 1.52, 95% prediction interval 0.63 to 1.80). The Wang clinical model had a calibration slope of 0.50 (0.44 to 0.56, 95% prediction interval 0.34 to 0.66). The calibration slope for the Xie model was 0.45 (0.27 to 0.63, 95% prediction interval −0.07 to 0.96) and for the Hu model was 0.32 (0.15 to 0.49, 95% prediction interval −0.34 to 0.98). Supplementary material D presents the calibration slopes of the remaining models.

Sensitivity analyses—meta-regression

In the meta-regression, where all performance estimates were regressed on the country and the prediction model, the point estimate of the discrimination was highest for the 4C Mortality Score (reference) and lowest for the Wang laboratory model. By country, the point estimate was highest in Israel and lowest in Mexico (see supplementary material D for point estimates and supplementary table S11 for predicted C statistics for each country).

The results for the predicted O:E ratios were less straightforward: for each model except the Wang laboratory model (see supplementary table S12), the predicted values were greater than 1 in some countries and smaller than 1 in others. The same was true for the predicted calibration slopes of all models (see supplementary table S13). This implies that none of the included models was well calibrated to the data from all included countries. Supplementary table S14 shows the predicted calibration-in-the-large estimates.


In our individual participant data meta-analysis, we found that previously identified prediction models varied in their ability to discriminate between patients admitted to hospital with covid-19 who will die and those who will survive. The 4C Mortality Score, the Wang clinical model, and the Xie model achieved the highest discrimination on average in our study and could therefore serve as starting points for implementation in clinical practice. However, the 4C Mortality Score could be validated in only six clusters because its predictors were available in few datasets, which might limit its usefulness in clinical practice. Whereas the discrimination of both Wang models and the Xie model was lower than in their respective development studies, the discrimination of the 4C Mortality Score was similar to that in its development study.

Although the summary estimates of discrimination performance are rather precise owing to the large number of included patients, some are prone to substantial between cluster heterogeneity. Discrimination varied greatly across hospitals and countries for all models, but least for the 4C Mortality Score. For some models the 95% prediction interval of the C statistic included 0.5, which implies that in some countries these models might not be able to discriminate between patients with covid-19 who survive or die during hospital admission.

All models were prone to calibration issues. Most models tended to over-predict mortality on average, meaning that the actual death count was lower than predicted. The Xie model achieved O:E ratios closest to 1, but this model’s predicted risks were often too extreme: too high for high risk patients and too low for low risk patients, as quantified by the calibration slope, which was less than 1. The calibration slope was closest to 1 for the 4C Mortality Score, and this was the only model for which the 95% confidence interval included 1. All other summary calibration slopes were less than 1. This could be due to overfitting in the model development process. All the models were prone to substantial between cluster heterogeneity. This implies that local revisions (such as country specific or even centre specific intercepts) are likely necessary to ensure that risk predictions are sufficiently accurate.
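A country or centre specific intercept update of the kind suggested here can be sketched as follows. Within each local cluster, the model's coefficients are kept fixed and only an additive shift on the logit scale is estimated, so that the updated risks reproduce the observed number of deaths in that cluster (an O:E ratio of 1). An optional `slope` argument shows the fuller intercept-and-slope recalibration. This is a minimal sketch assuming complete local data with 0/1 outcomes; the cluster names, data, and function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    # Numerically stable inverse logit
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)


def update_intercept(y, p, tol=1e-10, max_iter=100):
    """Logit-scale shift making the updated risks sum to the observed deaths (coefficients fixed)."""
    lp = [logit(pi) for pi in p]
    shift = 0.0
    for _ in range(max_iter):
        mu = [expit(shift + x) for x in lp]
        step = sum(yi - m for yi, m in zip(y, mu)) / sum(m * (1 - m) for m in mu)
        shift += step
        if abs(step) < tol:
            break
    return shift


def local_intercepts(clusters):
    """clusters maps a cluster name to (outcomes, predicted risks); returns one shift per cluster."""
    return {name: update_intercept(y, p) for name, (y, p) in clusters.items()}


def recalibrate(p, intercept=0.0, slope=1.0):
    """Updated risks expit(intercept + slope * logit(p)); slope != 1 also corrects extreme predictions."""
    return [expit(intercept + slope * logit(pi)) for pi in p]


if __name__ == "__main__":
    # Hypothetical local data: (1 = died, predicted risk from an existing model)
    clusters = {
        "centre_A": ([0, 0, 1, 0, 0, 1, 0, 0], [0.2, 0.3, 0.6, 0.4, 0.5, 0.7, 0.3, 0.4]),
        "centre_B": ([1, 0, 1, 1, 0, 1], [0.3, 0.2, 0.4, 0.5, 0.1, 0.6]),
    }
    shifts = local_intercepts(clusters)
    for name, (y, p) in clusters.items():
        updated = recalibrate(p, intercept=shifts[name])
        print(name, round(shifts[name], 3),
              "O:E before", round(sum(y) / sum(p), 2), "after", round(sum(y) / sum(updated), 2))
```

A negative shift corresponds to a model that over-predicts locally, as most models did on average in this study. Estimating a local slope as well would use the same logistic regression on the linear predictor that defines the calibration slope.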

Implementing existing covid-19 models in routine care is challenging because of the evolution and changing management of SARS-CoV-2 infection, and the consequences of changes to the virus over time and across geographical areas. In addition, the studied models were developed and validated using data collected during earlier periods of the pandemic, and clinical practice might have subsequently changed. As a result, baseline risk estimates of existing prediction models (eg, the intercept term) might be less generalisable than anticipated and might require regular updating, as shown in this meta-analysis. As predictor effects might also change over time or across geographical regions, a subsequent step might be to update these as well.40 Since most data originate from electronic health record databases, hospital registries offer a promising source for dynamic updating of covid-19 related prediction models.414243 As data from new individuals become available, the prognostic models should be updated and their performance re-evaluated in external validation sets.414243

Limitations of this meta-analysis

All the models we considered were developed and validated using data from the first waves of the covid-19 pandemic, up to April 2021, mostly before vaccination was implemented widely. Since the gathering of data, treatments for patients with covid-19 have improved and new options have been introduced. These changes are likely to reduce the overall risk of short term mortality in patients with covid-19. Prediction models for covid-19 for which adequate calibration has previously been established may therefore still yield inaccurate predictions in contemporary clinical practice. This further highlights the need for continual validation and updates.43

An additional concern is that prediction models are typically used to decide on treatment strategies but do not indicate to what extent patients benefit from individualised treatment decisions. Although patients at high risk of death could be prioritised for receiving intensive care, it would be more practical to identify those patients who are most likely to benefit from such care. This individualised approach towards patient management requires models to predict (counterfactual) patient outcomes for all relevant treatment strategies, which is not straightforward.4445 These predictions of patients’ absolute risk reduction require estimation of the patients’ short term risk of mortality with and without treatment, which might require the estimation of treatment effects that differ by patient.45

As new variants of the virus emerge, new treatments are developed, and the disease is better managed, predictor effects and the incidence of mortality due to covid-19 may change, thereby potentially limiting the predictive performance of the models we investigated.

We only considered models for predicting mortality in patients with covid-19 admitted to hospital, as outcomes such as clinical deterioration might increase the risk of heterogeneity from variation in measurements and differences in definitions. Mortality, in contrast, is commonly recorded in electronic healthcare systems, with limited risk of misclassification. Furthermore, it is an important outcome that is often considered in decision making.

We had to use the reported nomograms to recover the intercepts for two prediction models from one group.3132 Ideally, authors would have adhered to the TRIPOD guidelines, which would have facilitated the evaluation of their models.


In this large international study, we found considerable heterogeneity across countries in the performance of the prognostic models for predicting short term mortality in patients admitted to hospital with covid-19. Caution is therefore needed in applying these tools for clinical decision making in each of these countries. On average, the observed number of deaths was closest to the number predicted by the Xie model. The 4C Mortality Score and Wang clinical model showed the highest discriminative ability of the validated models. Although they appear most promising, local and dynamic adjustments (intercept and slope updates) are needed before implementation in routine care. The usefulness of the 4C Mortality Score may be limited by the restricted availability of its predictor variables.

What is already known on this topic

  • Numerous prognostic models for predicting short term mortality in patients admitted to hospital with covid-19 have been published

  • These models need to be compared head-to-head in external patient data

What this study adds

  • On average, the 4C Mortality Score by Knight et al and the clinical model by Wang et al showed the highest discriminative ability to predict short term mortality in patients admitted to hospital with covid-19

  • In terms of calibration, all models require local updating before implementation in new countries or centres
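Local updating is commonly done by logistic recalibration. The outcome in the new setting is regressed on the original model's linear predictor, lp = logit(p), which gives an updated intercept a and calibration slope b. A slope below 1 suggests that the original predictions are too extreme in the new setting. The sketch below fits a and b by Newton-Raphson in plain Python. The function names and toy data are hypothetical; in practice, one would use a standard GLM routine and many more patients.

```python
import math


def _inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def recalibrate(y, p, max_iter=50, tol=1e-10):
    """Fit logit(P(death)) = a + b * logit(p) by maximum likelihood (Newton-Raphson)."""
    lp = [_logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start from the original model, i.e. no update
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for yi, xi in zip(y, lp):
            mu = _inv_logit(a + b * xi)
            w = mu * (1.0 - mu)
            g_a += yi - mu             # score for the intercept
            g_b += (yi - mu) * xi      # score for the slope
            h_aa += w                  # information matrix entries
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2 x 2 system (information) * step = score
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if max(abs(step_a), abs(step_b)) < tol:
            break
    return a, b


def updated_risk(p: float, a: float, b: float) -> float:
    """Recalibrated risk for an original predicted risk p."""
    return _inv_logit(a + b * _logit(p))


# Hypothetical local data: original model predictions and observed deaths
original_risks = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.15, 0.35, 0.65]
local_deaths = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0]
a_hat, b_hat = recalibrate(local_deaths, original_risks)
print(round(a_hat, 3), round(b_hat, 3))
```

After fitting, the updated risks sum to the observed number of deaths in the local data. This is the calibration-in-the-large property of the maximum likelihood intercept.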

Ethics statements

Ethical approval

Medisch Ethische Toetsingscommissie (METC) Utrecht (protocol No 21-225) waived ethical approval.

Data availability statement

The data from Tongji Hospital, China that support the findings of this study are available from … Data collected within CAPACITY-COVID are available on reasonable request (see …). Data for the CovidRetro study are available on request from MM or the secretariat of the Institute of Microbiology of the Czech Academy of Sciences (contact via …) for researchers who meet the criteria for access to confidential data. These data are not publicly available owing to privacy restrictions imposed by the ethics committee of General University Hospital in Prague and the General Data Protection Regulation (GDPR) of the European Union. We can arrange to run any analytical code locally and share the results, provided that neither the code nor the results reveal personal information. The remaining data that support the findings of this study are not publicly available.


We thank the patients whose data support the findings in this study.

This study is supported by ReCoDID (Reconciliation of Cohort data in Infectious Diseases), COVID-PRECISE (Precise Risk Estimation to optimise COVID-19 Care for Infected or Suspected patients in diverse sEttings), and CAPACITY-COVID. Together, we form a large, unique group of researchers and medical care providers who are actively involved in the collection, management, sharing, and analysis of covid-19 related data.

We thank the participating sites and researchers, part of the COVID-PRECISE consortium and the CAPACITY-COVID collaborative consortium. CAPACITY-COVID acknowledges the following organisations for assistance in the development of the registry and/or coordination regarding the data registration in the collaborating centres: partners of the Dutch CardioVascular Alliance, the Dutch Association of Medical Specialists, and the British Heart Foundation Centres of Research Excellence. In addition, the consortium is grateful for the endorsement of the CAPACITY-COVID initiative by the European Society of Cardiology, the European Heart Network, and the Society for Cardiovascular Magnetic Resonance. The consortium also appreciates the endorsement of CAPACITY-COVID as a flagship research project within the National Institute for Health and Care Research/British Heart Foundation Partnership framework for covid-19 research. The views expressed in this paper are the personal views of the authors and may not be understood or quoted as being made on behalf of or reflecting the position of the regulatory agency/agencies or organisations with which the authors are employed or affiliated.


  • Contributors: FvR, JD, MvS, TT, KM, VdJ, TD, BVC, and LW were responsible for the systematic review and design of the study. VdJ and TD were responsible for the statistical analysis plan and R code. FWA, OB-C, VC, RZR, FS, YY, TT, PN, PH, SK, RK, ML, RKG, MN, LFCM, AB, CAPACITY-COVID consortium (see supplementary material E), and CovidRetro collaboration (see supplementary material F) were responsible for primary data collection. RZR, DF, MM, PH, RKG, RN, PN, MN, and ML were responsible for the primary data analysis. RZR and SKK were responsible for the meta-analysis. VdJ and RZR were responsible for the sensitivity analysis. VdJ and RZR were responsible for the initial draft of the manuscript. TD, TT, TLN, ML, FWA, LM, JD, BVC, LW, and KM revised the initial draft. RZR was responsible for the supplementary material on data and results (supplementary material A and D). VdJ and TT were responsible for the supplementary material on models (B). All authors contributed to the critical revision of the manuscript, approved the final version of the manuscript and agree to be accountable for the content. VdJ and RZR contributed equally. VdJ, TD, and KM are the guarantors of this manuscript.

  • Funding: This project received funding from the European Union’s Horizon 2020 research and innovation programme under ReCoDID grant agreement No 825746. This research was supported by the National Institute for Health and Care Research (NIHR) Leicester Biomedical Research Centre. RKG is supported by the NIHR. MN is supported by the Wellcome Trust (207511/Z/17/Z) and by NIHR Biomedical Research Funding to University College London and University College London Hospital. MM is supported by ELIXIR CZ research infrastructure project (MEYS grant No LM2018131), including access to computing and storage facilities. The CAPACITY-COVID registry is supported by the Dutch Heart Foundation (2020B006 CAPACITY), ZonMw (DEFENCE 10430102110006), the EuroQol Research Foundation, Novartis Global, Sanofi Genzyme Europe, Novo Nordisk Nederland, Servier Nederland, and Daiichi Sankyo Nederland. The Dutch Network for Cardiovascular Research, a partner within the CAPACITY-COVID consortium, received funding from the Dutch Heart Foundation (2020B006 CAPACITY) for site management and logistic support in the Netherlands. ML is supported by the Alexandre Suerman Stipend of the University Medical Centre Utrecht. FWA is supported by CardioVasculair Onderzoek Nederland 2015-12 eDETECT and by NIHR University College London Hospital Biomedical Research Centre. LW and BVC are supported by the COPREDICT grant from the University Hospitals KU Leuven, and by Internal Funds KU Leuven (C24M/20/064). The funders had no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication. We operated independently from the funders.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form and declare: funding from the European Union’s Horizon 2020 research and innovation programme. ML and FWA have received grants from the Dutch Heart Foundation and ZonMw; FWA has received grants from Novartis Global, Sanofi Genzyme Europe, EuroQol Research Foundation, Novo Nordisk Nederland, Servier Nederland, and Daiichi Sankyo Nederland, and MM has received grants from the Czech Ministry of Education, Youth and Sports for the submitted work; RKG has received grants from the National Institute for Health and Care Research; FS has received an AWS DDI grant and grants from the University of Sheffield and DBCLS; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; TD works with the International Society for Pharmacoepidemiology Comparative Effectiveness Research Special Interest Group (ISPE CER SIG) on methodological topics related to covid-19 (non-financial); no other relationships or activities that could appear to have influenced the submitted work.

  • The manuscript’s guarantors (VdJ, TD, and KM) affirm that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned (and, if relevant, registered) have been explained. All authors had access to statistical reports and tables. Authors did not have access to all data, for privacy, ethical and/or legal reasons. Authors listed under “Primary data collection” in the contributorship section had access to data and take responsibility for the integrity of the data. Authors listed under the analysis bullets in the contributorship section take responsibility for the accuracy of the respective data analyses. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

  • Dissemination to participants and related patient and public communities: We plan to share the results of this study on multiple social media platforms, including Twitter and LinkedIn. Copies of the manuscript will be sent to contributing centres, as well as being shared on the ReCoDID and COVID-PRECISE websites.

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited.