Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score

Abstract
Objective: To develop and validate a pragmatic risk score to predict mortality in patients admitted to hospital with coronavirus disease 2019 (covid-19).
Design: Prospective observational cohort study.
Setting: International Severe Acute Respiratory and emerging Infections Consortium (ISARIC) World Health Organization (WHO) Clinical Characterisation Protocol UK (CCP-UK) study (performed by the ISARIC Coronavirus Clinical Characterisation Consortium, ISARIC-4C) in 260 hospitals across England, Scotland, and Wales. Model training was performed on a cohort of patients recruited between 6 February and 20 May 2020, with validation conducted on a second cohort of patients recruited after model development between 21 May and 29 June 2020.
Participants: Adults (age ≥18 years) admitted to hospital with covid-19 at least four weeks before final data extraction.
Main outcome measure: In-hospital mortality.
Results: 35 463 patients were included in the derivation dataset (mortality rate 32.2%) and 22 361 in the validation dataset (mortality rate 30.1%). The final 4C Mortality Score included eight variables readily available at initial hospital assessment: age, sex, number of comorbidities, respiratory rate, peripheral oxygen saturation, level of consciousness, urea level, and C reactive protein (score range 0-21 points). The 4C Score showed high discrimination for mortality (derivation cohort: area under the receiver operating characteristic curve 0.79, 95% confidence interval 0.78 to 0.79; validation cohort: 0.77, 0.76 to 0.77) with excellent calibration (validation: calibration-in-the-large=0, slope=1.0). Patients with a score of at least 15 (n=4158, 19%) had a 62% mortality (positive predictive value 62%) compared with 1% mortality for those with a score of 3 or less (n=1650, 7%; negative predictive value 99%). Discriminatory performance was higher than 15 pre-existing risk stratification scores (area under the receiver operating characteristic curve range 0.61-0.76), with scores developed in other covid-19 cohorts often performing poorly (range 0.63-0.73).
Conclusions: An easy-to-use risk stratification score has been developed and validated based on commonly available parameters at hospital presentation. The 4C Mortality Score outperformed existing scores, showed utility to directly inform clinical decision making, and can be used to stratify patients admitted to hospital with covid-19 into different management groups. The score should be further validated to determine its applicability in other populations.
Study registration: ISRCTN66726260


Supplementary material
Appendix 1. ISARIC WHO CCP-UK risk stratification score derivation and validation protocol

Background
Patients hospitalised with covid-19 are at high risk of mortality. Stratification of patients on admission may aid clinicians in making immediate management decisions (home discharge, ward-level care, escalation to ICU) and in choosing medical treatment. Novel covid-19 risk stratification tools carry a high risk of bias, having been derived from small cohorts in limited geographical areas with potential for over-fitting. Many of these scores are also complex to calculate, which limits their clinical utility.


Objectives
- Develop a risk stratification score using the largest known hospitalised cohort of covid-19 patients
- The risk stratification score must have high clinical utility (defined here as the ability to be calculated without a complex equation or algorithm)
- Following derivation, determine discriminatory performance in validation cohorts and compare with existing risk stratification tools (for pneumonia, influenza and covid-19)

Primary outcome
In-hospital mortality, with a minimum of 28 days of follow-up.

Patient inclusion
- All adult patients (≥18 years old on admission)
- Index admission (readmission episodes excluded)
- Completed index admission

Identification
The systematic literature search (see below) will identify potential predictor variables for mortality, disease severity and/or critical care requirement in pneumonia, influenza or covid-19 patients.

Score development
All patients within the database on 20 May 2020 will be included in the derivation cohort.
With the overall aim to derive a risk stratification score with high clinical utility, an a priori decision has been made to categorise final included predictor variables for ease of calculation in a clinical environment. However, to avoid loss of information through categorisation, generalised additive models (GAMs) will be used first to identify final predictor variables prior to categorisation.
All candidate predictor variables remaining after application of the exclusion criteria (availability in database, missingness) will be entered into a GAM. These variables will then be removed one at a time and the GAM refitted, recording the explained deviance and unbiased risk estimator (UBRE; essentially a scaled AIC) after exclusion of each variable. The final variables for the risk stratification score will then be selected by explained deviance, R² and UBRE.
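As an illustration of the explained deviance criterion, the sketch below computes the binomial deviance of a set of predicted probabilities and the fraction of the null deviance that a model explains. It is a minimal sketch in plain Python rather than the protocol's own implementation; the function names and the toy predictions for a "full" and a "reduced" model are hypothetical.

```python
import math


def binomial_deviance(y_true, p_pred, eps=1e-12):
    """Binomial deviance: -2 * log-likelihood of 0/1 outcomes."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -2.0 * total


def explained_deviance(y_true, p_pred):
    """Fraction of the null (intercept-only) deviance explained by the model."""
    prevalence = sum(y_true) / len(y_true)
    null_dev = binomial_deviance(y_true, [prevalence] * len(y_true))
    return 1.0 - binomial_deviance(y_true, p_pred) / null_dev


# Hypothetical toy data: outcomes and predictions from two candidate models.
outcomes = [0, 0, 1, 1]
p_full = [0.1, 0.1, 0.9, 0.9]      # model with every candidate variable
p_reduced = [0.4, 0.4, 0.6, 0.6]   # same model with one variable removed

# A large drop suggests the removed variable carries useful information.
drop = explained_deviance(outcomes, p_full) - explained_deviance(outcomes, p_reduced)
print(round(drop, 3))
```

Repeating this comparison once per candidate variable gives a ranking of variables by how much explained deviance is lost when each is excluded.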
GAM curves will then be plotted for each continuous variable among the final included variables, and cut-offs determined based on outcome risk. Once categorised, the final variables will be entered into a least absolute shrinkage and selection operator (LASSO) regression to confirm that every final variable is retained in the risk score and to reduce the risk of over-fitting. Shrunk coefficients will be converted to produce an index score.
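One common way to turn shrunk coefficients into an index score is to divide each coefficient by a reference value and round to the nearest integer. The sketch below shows that approach only as an assumption; the protocol does not specify the scaling used, and the variable names and coefficient values are hypothetical, not those of the 4C Mortality Score.

```python
def coefficients_to_points(coefficients, scale=None):
    """Convert log-odds coefficients to integer points.

    By default the smallest non-zero absolute coefficient is worth one point.
    """
    nonzero = [abs(b) for b in coefficients.values() if b != 0]
    if not nonzero:
        raise ValueError("all coefficients were shrunk to zero")
    if scale is None:
        scale = min(nonzero)
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return {name: round(b / scale) for name, b in coefficients.items()}


def total_score(points, patient_categories):
    """Sum the points for the categories that apply to one patient."""
    return sum(points[name] for name in patient_categories)


# Hypothetical shrunk coefficients for categorised predictors.
shrunk = {
    "age_60_69": 1.40,
    "age_50_59": 0.70,
    "male_sex": 0.35,
    "resp_rate_20_29": 0.36,
    "dropped_by_lasso": 0.0,  # a coefficient shrunk to zero scores no points
}

points = coefficients_to_points(shrunk)
print(points)
print(total_score(points, ["age_60_69", "male_sex"]))
```

Integer points trade a little discrimination for bedside usability, which is why the protocol also reports discrimination of the LASSO model using exact coefficients.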
In parallel, a machine learning (ML) model will be derived using extreme gradient boosted trees (XGBoost), representing a 'best-in-class' model. This will include all candidate predictor variables entered into the GAM.

Statistical analysis
Model performance will be determined using the AUROC, with calibration and Brier score calculated for each final model (derived risk score and ML model).
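For reference, the two headline metrics can be computed directly from outcomes and predictions. The sketch below is a minimal plain-Python illustration with hypothetical function names and toy data: AUROC is calculated as the probability that a randomly chosen patient who died received a higher prediction than a randomly chosen survivor (ties count as half), and the Brier score as the mean squared difference between predicted probability and outcome.

```python
def auroc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
died = [0, 0, 1, 1]
predicted_risk = [0.1, 0.4, 0.35, 0.8]

print(auroc(died, predicted_risk))        # 3 of 4 pairs correctly ordered
print(brier_score(died, predicted_risk))
```

The pairwise loop is quadratic in cohort size, so a rank-based implementation would be preferable for cohorts of tens of thousands of patients.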
To determine the impact of missingness, a missing data analysis will be performed. Multivariate imputation by chained equations (MICE) will be used for all candidate predictor variables (except those with high levels of missingness). It will be assumed that variables are missing at random, and the primary outcome will be included in the imputation models for the derivation dataset. Ten imputed datasets will be created, each using ten iterations. Model performance will again be determined as detailed above, with Rubin's Rules used to combine model parameter estimates.
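Rubin's Rules pool an estimate across imputed datasets by averaging the point estimates and combining within-imputation and between-imputation variance. The sketch below is a minimal illustration using only the Python standard library; the function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's Rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching inputs from at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical coefficient estimates from three imputed datasets.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, total_variance)
```

The (1 + 1/m) factor inflates the between-imputation variance to reflect that only a finite number of imputed datasets were drawn.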

Validation
All patients entered into the database after the specified derivation cohort cut-off will be included, with the same patient inclusion criteria applied.

Existing risk stratification tool identification
Risk stratification scores created and/or validated for pneumonia, influenza and covid-19 will be included. These will be identified through the systematic literature search (see below) and do not have to have been peer-reviewed for inclusion.
Only risk stratification scores for which all predictor variables are available will be considered for inclusion. Where one variable is missing, the decision to include a risk stratification score will be made on a case-by-case basis by consensus within the study group. If the missing variable is deemed a key contributor to risk prediction within the tool, the tool will be excluded.

Statistical analysis
Discriminatory performance (AUROC) and other performance metrics (sensitivity, specificity, PPV and NPV) will be calculated for all included risk stratification tools and compared with the derived risk score and ML model in each of the validation cohorts. Calibration and Brier score will also be determined for the derived risk score in each validation cohort.
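The threshold-based metrics follow from a two-by-two table of observed outcome against whether a score meets a chosen cut-off. The sketch below is a minimal plain-Python illustration; the function name, the toy scores and the cut-off of 15 are hypothetical choices for the example.

```python
def threshold_metrics(y_true, scores, cutoff):
    """Sensitivity, specificity, PPV and NPV for 'score >= cutoff' as high risk."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        high_risk = s >= cutoff
        if high_risk and y == 1:
            tp += 1
        elif high_risk and y == 0:
            fp += 1
        elif not high_risk and y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined when a row or column of the 2x2 table is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy cohort: 1 = died in hospital, with integer risk scores.
died = [1, 1, 0, 0, 1, 0, 0]
risk_scores = [16, 9, 3, 15, 17, 2, 5]

metrics = threshold_metrics(died, risk_scores, cutoff=15)
print(metrics)
```

PPV and NPV depend on the mortality rate of the cohort, so values obtained in one validation cohort will not transfer directly to a population with a different baseline risk.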

Appendix 6. Discrimination of models in imputed validation dataset. Missing data patterns were analysed (finalfit package) and data were considered missing at random (as opposed to missing completely at random). Multiple imputation of missing values was performed (mice package) with 10 iterations to create 10 imputed sets using the 28 predictor variables plus outcome for the derivation cohort, and 28 predictors without outcome in the validation cohort. Imputation methods were continuous variables: predictive mean matching; 2-level factors: binary logistic regression; and >2-level factors: polytomous regression (all considered unordered). Distributions of imputed variables were inspected across iterations.
The generalised additive model (GAM) included continuous predictors as L2-penalised thin-plate splines, with comorbidities considered as a continuous count.
Gradient boosting tree (XGBoost) models included all continuous and categorical predictors, including individual comorbidities. Two models were trained. The first used the multiply imputed datasets. The second used non-imputed data, with missingness handled within the model building process.
The penalised logistic regression (LASSO) model used categorised variables, with discrimination determined using exact coefficients.
The 4C Mortality Score is the prognostic index developed from the model building process.

Appendix 7. Sensitivity analysis: discrimination of models with complete case data.

[Table fragment]
Neutrophil count (10⁹/L): 5.7 (4.9); 6.0 (5.0)
Lymphocyte count (10⁹/L): 0.9 (0.7); 0.9 (0.7)

Appendix 14. Sensitivity analysis of discriminatory performance for risk stratification scores after stratification of validation cohort by geography to predict inpatient mortality in patients hospitalised with covid-19.

[Table fragment]
Predictors and regression coefficients of the final developed model, including intercept, are fully reported and correspond with the 4C Mortality Score index values: +, N/A, +
*ROB: risk of bias; + indicates low ROB/low concern regarding applicability; − indicates high ROB/high concern regarding applicability; and ? indicates unclear ROB/unclear concern regarding applicability.