Risk of clinical sequelae after the acute phase of SARS-CoV-2 infection: retrospective cohort study
BMJ 2021; 373 doi: https://doi.org/10.1136/bmj.n1098 (Published 19 May 2021) Cite this as: BMJ 2021;373:n1098
- Sarah E Daugherty, senior researcher1,
- Yinglong Guo, director1,
- Kevin Heath, national medical director for clinical intelligence and physician2,
- Micah C Dasmariñas, data scientist1,
- Karol Giuseppe Jubilo, data scientist1,
- Jirapat Samranvedhya, principal data scientist1,
- Marc Lipsitch, professor3,
- Ken Cohen, executive director of translational research and clinician2 4
- 1OptumLabs at UnitedHealth Group, Minneapolis, MN, USA
- 2OptumCare, Minneapolis, MN, USA
- 3Harvard T H Chan School of Public Health, Boston, MA, USA
- 4OptumLabs at UnitedHealth Group, Golden, CO, USA
- Correspondence to: S E Daugherty
- Accepted 26 April 2021
Objective To evaluate the excess risk and relative hazards for developing incident clinical sequelae after the acute phase of SARS-CoV-2 infection in adults aged 18-65.
Design Retrospective cohort study.
Setting Three merged data sources from a large United States health plan: a large national administrative claims database, an outpatient laboratory testing database, and an inpatient hospital admissions database.
Participants Individuals aged 18-65 with continuous enrollment in the health plan from January 2019 to the date of a diagnosis of SARS-CoV-2 infection. Three comparator groups, matched by propensity score, to individuals infected with SARS-CoV-2: a 2020 comparator group, an historical 2019 comparator group, and an historical comparator group with viral lower respiratory tract illness.
Main outcome measures More than 50 clinical sequelae after the acute phase of SARS-CoV-2 infection (defined as the date of first SARS-CoV-2 diagnosis (index date) plus 21 days) were identified using ICD-10 (international classification of diseases, 10th revision) codes. Excess risk in the four months after acute infection and hazard ratios with Bonferroni corrected 95% confidence intervals were calculated.
Results 14% of adults aged ≤65 who were infected with SARS-CoV-2 (27 074 of 193 113) had at least one new type of clinical sequelae that required medical care after the acute phase of the illness, which was 4.95% higher than in the 2020 comparator group. The risk for specific new sequelae attributable to SARS-CoV-2 infection after the acute phase, including chronic respiratory failure, cardiac arrhythmia, hypercoagulability, encephalopathy, peripheral neuropathy, amnesia (memory difficulty), diabetes, liver test abnormalities, myocarditis, anxiety, and fatigue, was significantly greater than in the three comparator groups (2020, 2019, and viral lower respiratory tract illness groups) (all P<0.001). Significant risk differences because of SARS-CoV-2 infection ranged from 0.02 to 2.26 per 100 people (all P<0.001), and hazard ratios ranged from 1.24 to 25.65 compared with the 2020 comparator group.
Conclusions The results indicate the excess risk of developing new clinical sequelae after the acute phase of SARS-CoV-2 infection, including specific types of sequelae less commonly seen in other viral illnesses. Although individuals who were older, had pre-existing conditions, and were admitted to hospital because of covid-19 were at greatest excess risk, younger adults (aged ≤50), those with no pre-existing conditions, or those not admitted to hospital for covid-19 also had an increased risk of developing new clinical sequelae. The greater risk for incident sequelae after the acute phase of SARS-CoV-2 infection is relevant for healthcare planning.
Emerging data suggest that the sequelae of infection with SARS-CoV-2 and the disease it causes, covid-19, could vary in presentation and extend beyond the typical postviral recovery period. Hence epidemiologic interest in morbidity after the acute infection in survivors is growing. Some survivors experience serious complications during the acute phase of the illness, affecting pulmonary, cardiovascular, hepatic, renal, cognitive, and neurologic function.1234 Survivors also report a range of persistent symptoms adversely affecting physical, mental, and social wellbeing.5678 At least some of these complications might occur independent of the severity of covid-19.9 Although individuals admitted to hospital with community acquired (non-covid-19) pneumonia or influenza are at risk of cardiovascular, cerebrovascular, and neurological complications,1011 the degree of increased risk resulting from SARS-CoV-2 infection is unclear. Longitudinal studies on survivors of other coronaviruses (Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome (SARS)) suggest long term physical and mental sequelae are not uncommon.121314
Most published studies so far have been small and mainly focused on clinical sequelae in patients admitted to hospital for covid-19.15 Hence many of these studies might not apply to the larger population of individuals infected with SARS-CoV-2. Little is known of the incidence of clinical sequelae caused by SARS-CoV-2 infection after the acute phase of the illness in adults aged 18-65 who might be considered to have a lower risk of severe covid-19. Also, few studies have been powered to evaluate whether factors such as age, sex, pre-existing conditions, and admission to hospital because of covid-19 modify the risk of clinical sequelae after the acute infection.
We estimated the excess risk and hazard ratios of new clinical sequelae attributable to SARS-CoV-2 in adults aged 18-65 after the acute phase of covid-19. Our analysis included a large generalizable sample of commercially insured adults; determined objective outcomes with the valid ICD-10 (international classification of diseases, 10th revision) codes in claims; and had the power to detect rare diagnoses and evaluate associations across subgroups.
We conducted a retrospective cohort analysis with three data sources within the UnitedHealth Group Clinical Discovery Database: de-identified administrative outpatient and inpatient claims, outpatient laboratory results for SARS-CoV-2, and a hospital admissions database updated daily with patients admitted with a primary, secondary, or tertiary diagnosis of covid-19. Quality control efforts for data were applied (supplementary appendix A). All data were for commercially insured patients enrolled with one large national health insurance provider in the United States.
Individuals diagnosed as having SARS-CoV-2
This group consisted of individuals aged 18-65 with continuous enrollment in the health plan from 1 January 2019 to the index date. We defined the index date as the date of the first occurrence of any of these events: a primary, secondary, or tertiary diagnosis of covid-19, identified by administrative claims with ICD-10 code U07.1, or with either B34.2 or B97.29 before 1 April 2020; documentation of a positive polymerase chain reaction (PCR) test in an outpatient laboratory dataset; or admission to hospital for covid-19, identified from a hospital admissions database with a diagnosis code on admission or a primary, secondary, or tertiary diagnosis code of U07.1 or U07.2.
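The index date rule above amounts to taking the earliest of up to three qualifying dates. A minimal sketch in Python, assuming each source has already been reduced to one date per person; the function and argument names are hypothetical and are not taken from the study code:

```python
from datetime import date
from typing import Optional


def index_date(first_claim_dx: Optional[date],
               first_positive_pcr: Optional[date],
               first_covid_admission: Optional[date]) -> Optional[date]:
    """Return the earliest qualifying event date, or None if no event occurred."""
    candidates = [d for d in (first_claim_dx, first_positive_pcr, first_covid_admission)
                  if d is not None]
    return min(candidates) if candidates else None


# Illustrative person: positive PCR three days before the first coded diagnosis.
example_index = index_date(date(2020, 4, 12), date(2020, 4, 9), None)
```

A person with no qualifying event gets no index date and would fall outside the SARS-CoV-2 group.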
We included ICD-10 codes B34.2 and B97.29 because many physicians used these codes early in the pandemic to clinically diagnose infection before the US Centers for Disease Control and Prevention recommended U07.1 as the primary code for clinical diagnosis on 1 April 2020. The U07.1 code accounted for 98.2% of all clinically diagnosed patients in this study.
For individuals with a positive PCR test identified in our database (n=77 273), 45% also had a clinical diagnosis code of U07.1. We assumed that the remaining individuals with a positive PCR test without diagnostic codes were asymptomatic or had mild symptoms and did not seek medical care.
For patients admitted to hospital for covid-19 (n=21 746), over half (51.3%) were coded as U07.1 (covid-19, virus identified) and the other half were coded with the World Health Organization emergency code for clinically diagnosed covid-19 with no laboratory confirmed infection, U07.2 (covid-19, virus not identified). Although U07.2 was not officially adopted by the US, many clinicians used this code to identify patients with suspected covid-19.
We excluded individuals with a positive SARS-CoV-2 antibody serology without documented infection (n=28 810) because index dating of the illness was not possible. We also excluded individuals with a diagnosis code of B34.2 or B97.29 on or after 1 April 2020 (n=24 865) and individuals admitted to hospital for suspected covid-19 but missing diagnosis code U07.1 or U07.2 in the primary, secondary, or tertiary position (n=1247). Figure 1 shows a flowchart with details of population sampling.
2020 comparator group
The 2020 comparator group consisted of individuals aged 18-65 who, in 2020, did not have a clinical diagnosis related to covid-19 or a positive PCR test and were not admitted to hospital for covid-19. Continuous enrollment in the health plan was required from 1 January 2019 to a randomly assigned index date drawn from the SARS-CoV-2 infection group.
2019 comparator group
We created this historical comparison group to account for possible ascertainment bias because of reduced use of healthcare services during the 2020 pandemic. Individuals aged 18-65 were required to have continuous enrollment in the health plan from 1 January 2018 to a randomly assigned month and day in 2019, drawn from the SARS-CoV-2 infection group.
Viral lower respiratory tract illness comparator group
We created this historical comparison group to evaluate the clinical sequelae specific to SARS-CoV-2 infection because many serious viral illnesses have a risk of morbidity after the acute illness. The viral lower respiratory tract illness group included individuals aged 18-65 who developed influenza (J09, J10, J11), non-bacterial pneumonia (J12, J18.9), acute bronchitis (J20), acute lower respiratory infection (J22), or chronic obstructive pulmonary disease with acute lower respiratory infection (J44.0), between 1 January 2017 and 31 October 2017, 1 January 2018 and 31 October 2018, or 1 January 2019 and 31 October 2019. We included chronic obstructive pulmonary disease exacerbation because instances are typically induced by a virus and identified only with code J44.0. We defined the index date as the date of the first diagnosis of viral lower respiratory tract illness in the corresponding year of the cohort. Individuals were required to have continuous enrollment in the health plan from 1 January 2016, 1 January 2017, or 1 January 2018 to the index date, respectively.
We used ICD-10 codes to identify new clinical diagnoses from the administrative claims data between 1 January 2020 and 31 October 2020. eTable 1 shows the ICD-10 classification details. We created domain clusters based on clinically similar diagnoses, and we included atopic dermatitis as a negative control.16
We used administrative claims between 1 January 2019 and 30 days before the index date to determine length of hospital stay, previous clinical conditions, history of comorbidities derived from the Charlson Index and Elixhauser score, and previous visits to a primary care physician, cardiologist, or nephrologist. Demographic, clinical, and testing data were obtained from administrative claims between 1 January 2020 and 31 October 2020. We derived socioeconomic status scores specific to zip codes and proportions of white, black, and Hispanic populations by zip code.17 Missing values for zip code derived variables (socioeconomic status scores n=11 713; race n=444 080) were imputed with the median of non-missing values for these variables. Because all other variables were derived from administrative claims and the continuous enrollment criteria were applied, clinical events without a claim were considered to have not occurred and were given a value of zero.
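The median imputation described for the zip code derived variables can be illustrated with a short Python sketch. The values below are invented for illustration, and missing entries are represented as None, which is an assumption about the data layout:

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical socioeconomic status scores with two missing zip codes.
ses_scores = [52.0, None, 47.5, 60.0, None, 55.0]
ses_imputed = impute_median(ses_scores)
```

Median imputation keeps every individual in the propensity model but does not reflect uncertainty in the imputed values.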
Follow-up periods for the primary analysis for individuals infected with SARS-CoV-2 and comparator groups started at the index date plus 21 days and continued up to a diagnostic event, disenrollment from the insurance plan (because of death or withdrawal), or the end of the study period (31 October 2020 or 31 October of the corresponding year for the historical comparator groups), whichever occurred first. We performed a secondary analysis evaluating rates by month of follow-up starting 30 days before the index date and ending after six months of follow-up.
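The follow-up window for the primary analysis can be sketched as below. This is an illustrative reading of the rule, not the study code: the function name is hypothetical, and treating a diagnosis on or before the start date as belonging to the acute phase is an assumption.

```python
from datetime import date, timedelta

ACUTE_PHASE_DAYS = 21
STUDY_END = date(2020, 10, 31)


def post_acute_follow_up(index, event=None, disenrolled=None, study_end=STUDY_END):
    """Return (start, end, had_event), or None if the person is not at risk.

    Follow-up starts at index date plus 21 days and ends at the first of:
    diagnostic event, disenrollment, or end of the study period.
    """
    start = index + timedelta(days=ACUTE_PHASE_DAYS)
    # Assumption: a diagnosis on or before the start date counts as acute phase.
    if event is not None and event <= start:
        return None
    end = min(d for d in (event, disenrolled, study_end) if d is not None)
    if end <= start:
        return None  # fewer than 21 days of follow-up after the index date
    return start, end, event is not None and event == end


with_event = post_acute_follow_up(date(2020, 6, 1), event=date(2020, 8, 15))
too_late = post_acute_follow_up(date(2020, 10, 20))
censored = post_acute_follow_up(date(2020, 6, 1), disenrolled=date(2020, 7, 31))
```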
Propensity score matching
We used matching by propensity score to create three cohorts with similar baseline characteristics and relevant confounders.18 We constructed a propensity score for every individual based on 108 variables (age, number; sex, male yes/no; socioeconomic status scores specific to zip codes, number; proportion of white, black, and Hispanic populations in a zip code, number; state, 10 binary variables in the top 10 states of SARS-CoV-2 diagnosis; index month, January-March, April, May, June, July, August, and September yes/no; pre-existing comorbidities, yes/no; total length of stay as an inpatient in the previous year, number in days; previous number of visits to a primary care physician, cardiologist, or nephrologist, number; and previous clinical conditions, yes/no) by logistic regression with ridge penalty.1920
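To illustrate what a propensity score from logistic regression with a ridge penalty looks like, the sketch below fits a tiny model by gradient descent using only the Python standard library. It is a toy under stated assumptions: the single binary covariate, the penalty scaling (penalty strength divided by sample size, intercept unpenalized), the learning rate, and all names are hypothetical, and the study itself reports using libraries such as scikit-learn and glmnet rather than hand-written optimization.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_ridge_logistic(X, y, l2=1.0, lr=0.1, n_iter=2000):
    """Minimize mean log-loss + (l2 / (2n)) * ||w||^2 by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # The ridge term shrinks the weights; the intercept is left unpenalized.
        w = [wj - lr * (gj + l2 * wj) / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def propensity_score(w, b, x):
    """Estimated probability of being in the SARS-CoV-2 group given covariates x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data: one binary covariate; y = 1 marks membership of the infected group.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [1, 0, 0, 0, 1, 1, 1, 0]
w_plain, b_plain = fit_ridge_logistic(X, y, l2=0.0)
w_ridge, b_ridge = fit_ridge_logistic(X, y, l2=4.0)
```

With the penalty switched on, the fitted weight is pulled toward zero, which is the property that helps stabilize a model with 108 covariates.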
Because of the large study population (about 10 million), we did not perform conventional one-to-one nearest neighbor matching. We generated 40 000 bins based on the propensity scores of the SARS-CoV-2 infected group. In each small bin, we created equal sized sets of individuals with SARS-CoV-2 and in the comparator group. We matched within the bins and then combined all sets to form the propensity matched groups. This process was repeated for each of the comparator groups (2020, 2019, and viral lower respiratory tract illness groups). We achieved three balanced cohort groups with this approach (eTable 1a and eFigure 1a-c).
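The binned matching procedure can be sketched in Python as follows. The text does not specify how bin edges were placed or how individuals were selected within a bin, so this sketch assumes quantile edges computed from the infected group's scores and simple random sampling; all names and the toy scores are hypothetical.

```python
import random
from bisect import bisect_right
from collections import defaultdict


def quantile_edges(case_scores, n_bins):
    """Interior bin edges at evenly spaced quantiles of the case propensity scores."""
    s = sorted(case_scores)
    return [s[(i * len(s)) // n_bins] for i in range(1, n_bins)]


def match_within_bins(cases, comparators, n_bins, seed=0):
    """cases, comparators: dicts of id -> propensity score. Returns matched id pairs."""
    rng = random.Random(seed)
    edges = quantile_edges(list(cases.values()), n_bins)
    bins = defaultdict(lambda: ([], []))
    for pid, score in cases.items():
        bins[bisect_right(edges, score)][0].append(pid)
    for pid, score in comparators.items():
        bins[bisect_right(edges, score)][1].append(pid)
    pairs = []
    for b in sorted(bins):
        case_ids, comp_ids = bins[b]
        k = min(len(case_ids), len(comp_ids))  # equal sized sets within the bin
        pairs.extend(zip(rng.sample(case_ids, k), rng.sample(comp_ids, k)))
    return pairs


cases = {"c1": 0.10, "c2": 0.12, "c3": 0.40, "c4": 0.45}
comparators = {"k1": 0.05, "k2": 0.11, "k3": 0.13, "k4": 0.42, "k5": 0.44, "k6": 0.90}
pairs = match_within_bins(cases, comparators, n_bins=2)
```

Binning avoids a pairwise nearest neighbor search over roughly 10 million people, at the cost of matching only to within the width of a bin.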
We evaluated demographic and clinical factors with the t test and the Pearson χ2 test for the unmatched population to compare numeric and categorical data, respectively. We used a paired t test and the McNemar test for evaluating numeric and categorical factors in the matched populations.
We calculated the proportion of individuals in the matched populations with a follow-up time of at least 21 days after the index date who did not develop new clinical sequelae, or developed one or more than one new type of clinical sequelae after the acute infection. No events that occurred before or during the acute phase were counted. Symptoms such as fatigue, myalgia, and anosmia, measured by ICD-10 codes, were included in this assessment of clinical sequelae after the acute infection.
When we calculated the risk for a specific incident outcome after the acute infection, individuals with the diagnosis of interest before the index date (in the SARS-CoV-2 and respective comparator group) or individuals (and their comparator match) who experienced the diagnosis of interest during the acute phase were removed from the post-acute calculation. Individuals with less than 21 days of follow-up after the index date and their matched pair were also excluded from the analysis. Therefore, the denominator for the incidence of each outcome included only those at risk for the diagnosis of interest at the index date plus 21 days. We reported the risk difference as the difference between the cumulative incidence calculated by the Kaplan-Meier estimator at day 120 from the time origin (index date plus 21 days) in the SARS-CoV-2 group and the comparator group, multiplied by 100. We calculated the 95% confidence intervals and P values with a pairwise bootstrap method, with a one sided test. We tested the proportional hazards assumption with Schoenfeld residuals.21 Hazard ratios and 95% confidence intervals were estimated by fitting Cox proportional hazards models using a robust variance estimator with clustering for matched pairs. We used the Wald test to evaluate significance at the 0.05 level. Individuals were censored based on the outcome of interest, disenrollment from the insurance plan, or the end of the study period (31 October 2020).
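The risk difference defined above (difference in Kaplan-Meier cumulative incidence at day 120, multiplied by 100) can be illustrated with a small standard library sketch. The durations and event flags are invented, the function names are hypothetical, and the bootstrap confidence intervals are not shown:

```python
def km_cumulative_incidence(durations, events, horizon):
    """1 minus the Kaplan-Meier survival estimate at `horizon` days.

    durations[i] is days from the time origin (index date plus 21 days);
    events[i] is True for a diagnosis and False for censoring.
    """
    surv = 1.0
    event_times = sorted({t for t, e in zip(durations, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        n_events = sum(1 for d, e in zip(durations, events) if e and d == t)
        surv *= 1.0 - n_events / at_risk
    return 1.0 - surv


def risk_difference_per_100(exposed, comparator, horizon=120):
    """exposed, comparator: (durations, events) tuples for the two matched groups."""
    ci_exposed = km_cumulative_incidence(*exposed, horizon)
    ci_comparator = km_cumulative_incidence(*comparator, horizon)
    return 100.0 * (ci_exposed - ci_comparator)


exposed = ([30, 60, 90, 120, 120], [True, False, True, False, False])
comparator = ([45, 120, 120, 120, 120], [True, False, False, False, False])
rd = risk_difference_per_100(exposed, comparator)
```

In the toy data the person censored at day 60 leaves the risk set before the day 90 event, which is why the estimate differs from a simple proportion.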
In a secondary analysis, we created seven one month time intervals (one month before the index date to six months after the index date). Within each time interval, we only included the matched pairs who were both at risk for the diagnosis of interest at the beginning of each time interval. We estimated the hazard ratios and confidence intervals by fitting Cox proportional hazards models using a robust variance estimator with clustering for matched pairs. Individuals were censored based on the diagnosis of interest, disenrollment from the insurance plan, or the end of the time interval.
Because of the large number of comparisons, we applied the Bonferroni correction for all P values and confidence intervals, by multiplying P values by N and estimating (1−(0.05÷N))×100% confidence intervals, where N=51 and is the number of clinical sequelae we tested. For all data analyses, we used Python with scikit-learn, statsmodels, lifelines, and scipy libraries and R with survminer,22 survival,23 glmnet,24 and stats libraries.25
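The Bonferroni adjustment described above is simple arithmetic and can be sketched in Python. The helper names are hypothetical, and the Wald interval for a hazard ratio assumes a normal approximation on the log scale with a two sided interval; the log hazard ratio and standard error in the example are invented.

```python
import math
from statistics import NormalDist

N_TESTS = 51   # number of clinical sequelae tested
ALPHA = 0.05


def bonferroni_p(p, n=N_TESTS):
    """Multiply the P value by the number of tests, capped at 1."""
    return min(1.0, p * n)


def bonferroni_ci_level(alpha=ALPHA, n=N_TESTS):
    """Confidence level (1 - alpha / n) used for the corrected intervals."""
    return 1.0 - alpha / n


def corrected_hr_interval(log_hr, se, alpha=ALPHA, n=N_TESTS):
    """Two sided Wald interval for a hazard ratio at the corrected level."""
    z = NormalDist().inv_cdf(1.0 - alpha / (2 * n))
    return math.exp(log_hr - z * se), math.exp(log_hr + z * se)


level = bonferroni_ci_level()
low, high = corrected_hr_interval(math.log(1.8), 0.15)
```

With N=51, the corrected intervals are roughly 99.9% intervals, so they are noticeably wider than conventional 95% intervals for the same estimate.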
We performed four stratified analyses (age (18 to ≤34, >34 to ≤50, and >50), sex (male/female), any pre-existing clinical comorbidity (yes/no), and admitted to hospital for covid-19 (yes/no)) with the SARS-CoV-2 and 2020 comparator group. We constructed a propensity score, and performed matching by propensity score within each subgroup for age, sex, and pre-existing condition status before calculating the risk difference or hazard ratio within each stratum. The SARS-CoV-2 and 2020 comparator groups, matched for propensity score, were stratified based on whether individuals with SARS-CoV-2 were admitted to hospital for covid-19. To test for interaction, we evaluated the significance of the fitted coefficient for the interaction term in a model including the main effect variables, with a Bonferroni correction (N=51×number of levels in each subgroup analysis).
We evaluated the robustness of the definition of the post-acute phase by varying the cut-off points from the index date at 14 and 28 days. We also considered a period effect because of changes in the availability of testing and advancements in management and treatment over time. We evaluated differences by two periods (January-June and July-October). Finally, we evaluated the number of visits to a primary care physician after the acute infection by type of diagnostic method (PCR positive, clinical diagnosis, and admitted to hospital) and among the three comparator groups.
Patient and public involvement
This retrospective analysis did not directly involve patients or the public in the development of the research question or conduct of the analysis because of funding and training restrictions for retrospective analyses. A patient reviewer provided insightful comments and contributed to the expansion of the outcomes reported in this manuscript.
Among 9 247 505 individuals meeting the study criteria in 2020, we identified 266 586 (2.9%) with SARS-CoV-2 infection. After matching by propensity score, we identified 266 586 matched pairs for the primary (2020) and secondary (2019) comparison groups (100% of SARS-CoV-2 individuals matched). We identified 244 276 matched pairs for the viral lower respiratory tract illness comparison group (91.6% of SARS-CoV-2 individuals matched).
Individuals with SARS-CoV-2 infection were more likely than their unmatched 2020 and 2019 comparators to be younger, to be women, to have a lower socioeconomic status score index, to live in a zip code with a higher proportion of black or Hispanic individuals, to have a pre-existing comorbidity or at risk condition, to have been admitted to hospital with a longer length of stay during the preceding year, to have visited a primary care physician or other specialists more often, and to live in the northeast or southern US (areas of high incidence during the study period; all P<0.05) (table 1). Among those individuals with SARS-CoV-2, 8.2% were admitted to hospital and 1.1% were admitted to the intensive care unit. We found different patterns between individuals with SARS-CoV-2 and the viral lower respiratory tract illness comparison group, who were more likely to be older, to be women, to have a comorbidity, to smoke, to visit a primary care physician or cardiologist more often, and to live in the southern or midwestern US. After matching by propensity score, most of these differences were resolved, although some small differences between the matched SARS-CoV-2 infected group and the viral lower respiratory tract illness comparator were significant (all P<0.05) (eTable 1a). Despite these minor differences, balance was achieved overall, as shown by the standardized mean differences between matched groups by key variables at less than 0.10 (eFig 1a-c).
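The balance criterion mentioned above (standardized mean difference below 0.10) can be illustrated as follows. The text does not give the formula used, so this sketch assumes the common definition with a pooled standard deviation; the ages are invented and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(a) + variance(b)) / 2.0)
    return (mean(a) - mean(b)) / pooled_sd if pooled_sd > 0 else 0.0


# Hypothetical ages in a matched infected group and its comparator group.
age_infected = [40, 45, 50, 55, 60]
age_comparator = [41, 44, 51, 54, 61]
smd_age = standardized_mean_difference(age_infected, age_comparator)
balanced = abs(smd_age) < 0.10
```

Unlike a P value, the standardized mean difference does not shrink or grow with sample size, which makes it a convenient balance check in very large matched cohorts.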
Among the matched individuals infected with SARS-CoV-2 identified in our study with a follow-up time of at least 21 days from the index date (n=193 113), 85.98% had no new clinical sequelae after the acute infection that required medical care during their follow-up, 10.01% had one type of new sequelae that required medical care, and 4.01% had more than one type of new sequelae (table 2). The proportion of individuals diagnosed as having any new clinical sequelae after the acute phase was higher in the SARS-CoV-2 infected group than in the three comparator groups, although the differences were smallest compared with the viral lower respiratory tract illness group.
Estimates of risk difference by type of new clinical sequelae were calculated for individuals infected with SARS-CoV-2 who were still at risk 21 days after the index date (n=193 113 for 2020 comparison). Follow-up time was considered up to day 141 after the index date (120 days from the start of the post-acute period and 73.5 centile of the follow-up distribution, with a median of 87 days (interquartile range 45-124 days)). Figure 2 summarizes the most common new clinical outcomes in the SARS-CoV-2 group (incidence ≥0.1%) and eTable 2a the less common outcomes. Symptoms are included separately in eTable 2b and are not represented in figure 2. Overall, the excess risk attributable to SARS-CoV-2 infection was low for incident diagnoses (0.02-2.26 per 100 individuals) four months after the acute phase (fig 2 and eTable 2a-b). The increased risk, however, was consistently seen for many outcomes across all three comparison groups (2020, 2019, and viral lower respiratory tract illness groups; eTable 2a).
Despite the small absolute risk attributable to SARS-CoV-2 infection, the hazard ratios for individuals infected with SARS-CoV-2 and the 2020 comparator group after the acute infection were large (significant hazard ratios of 1.24-25.65; all P<0.001 except for atopic dermatitis (P=0.033); eTable 2c-d and fig 2). When we evaluated rates over time, hazard ratios were highest in the first month of the index date but were substantially elevated up to six months for some common events, such as hypertension (hazard ratio 1.81 (95% confidence interval 1.10 to 2.96)), diabetes (2.47 (1.14 to 5.38)), sleep apnea (2.31 (1.23 to 4.32)), and fatigue (2.20 (1.48 to 3.27)), suggesting the hazard for some new clinical sequelae was sustained months after the initial SARS-CoV-2 infection (eTable 3). An increased risk 30 days before the index date was likely because of a delay in testing or documentation of a confirmed diagnosis in symptomatic individuals. Figure 3 shows select graphs of the cumulative hazards for the most common or most severe clinical sequelae.
Excess risk for developing many new outcomes after the acute infection increased significantly (most P<0.001) with age (fig 4 and eTable 4a). The risk for clinical sequelae was greatest in individuals aged >50 but the absolute risk in young adults aged 18-34 was significantly elevated, albeit modestly so, for some conditions including, but not limited to, hypertension, arrhythmia, hypercoagulability, amnesia, diabetes, and fatigue (all P<0.001). The risk of developing any mental health outcome was significantly increased regardless of age (P for interaction=0.35).
Excess risk for new clinical sequelae after acute covid-19 rarely differed between men and women, apart from fatigue and anosmia (more commonly diagnosed in women) and myocarditis, hypercoagulability, deep vein thrombosis, kidney injury, and sleep apnea (more commonly diagnosed in men) (fig 4 and eTable 4b). With a few exceptions, individuals with pre-existing conditions (fig 4 and eTable 4c) and individuals admitted to hospital with covid-19 (fig 4 and eTable 4d) had a greater excess risk of developing new clinical sequelae because of SARS-CoV-2 infection.
We did not find a significant period effect for most outcomes (eTable 4e). In our sensitivity analysis, the risk differences increased when the post-acute phase was shortened to an index date plus 14 days (eTable 5a). Similar risk differences were seen for the index date plus 21 days and the index date plus 28 days (eTable 5b), suggesting an index date plus 21 days is a reasonable start to the post-acute phase of the illness.
Our retrospective study conducted in a large administrative database evaluated the excess risk of developing a wide range of clinical sequelae after the acute phase of SARS-CoV-2 infection in commercially insured adults aged 18-65. We found that 14% of individuals aged ≤65 who were infected with SARS-CoV-2 developed at least one new type of clinical sequelae that required medical care after the acute phase of SARS-CoV-2 infection, which was 4.95% higher than the 2020 comparator group and 1.65% higher than individuals diagnosed as having viral lower respiratory tract illness. This finding suggests that the SARS-CoV-2 virus is not unique in causing clinical sequelae after the acute infection. Our results confirmed an excess risk for specific types of sequelae in the four months after the acute phase (index date plus 21 days). Our analysis also showed that although the risk increased with age, pre-existing conditions, and admission to hospital for covid-19, younger adults (aged ≤50), those with no pre-existing conditions, and individuals infected with SARS-CoV-2 not admitted to hospital were also at risk for new clinical sequelae after the acute infection. Finally, our results suggested that the risk for some clinical sequelae, such as mental health diagnoses, was increased regardless of age and pre-existing condition.
Comparison with other studies
When we considered the risk attributable to SARS-CoV-2 infection, several clinical sequelae were increased in survivors after the acute infection regardless of comparison group (2020, 2019, or viral lower respiratory tract illness group). These outcomes included chronic respiratory failure, cardiac irregularities, such as tachycardia and arrhythmia, hypercoagulability in the form of pulmonary embolism and deep vein thrombosis, anxiety, encephalopathy, peripheral neuropathy, amnesia, diabetes, liver test abnormalities, myocarditis, and fatigue. Many of these outcomes have been previously reported in case studies or observational studies during the acute phase of covid-19 (including tachycardia,26 hypercoagulability,27 mental health outcomes,28 encephalopathy,2930 diabetes,31 and amnesia32). A few studies have also highlighted persistent symptoms or new clinical diagnoses after the acute infection,32333435363738 although few have reported on a full range of new clinical diagnoses across multiple organ systems in such a large population.
Studies of individuals infected with other coronaviruses have shown arrhythmia and tachycardia sequelae in survivors of SARS,3940 and central and peripheral nervous system sequelae in survivors of SARS and MERS.41 The proportion of individuals infected with SARS-CoV-2 with a new diagnosis of encephalopathy (0.23) and peripheral neuropathy (0.31) in our study (encephalopathy risk difference 0.09-0.19 and peripheral neuropathy risk difference 0.12-0.16) was closer to the higher rates of central and peripheral nervous system sequelae seen with MERS.41 In another study, anxiety was reported to be the most common type of mental health sequelae in individuals with a diagnosis of covid-19 (hazard ratio 1.59-2.62) at 14-90 days after SARS-CoV-2 infection.42 Our results on mental health outcomes (anxiety; hazard ratio 1.24-1.54) were similar, although absolute risks were lower in our study population.
We also identified excess risk for clinical sequelae that were not unique to SARS-CoV-2 and are commonly seen with other serious viral infections. The magnitude of the relative risk for these incident sequelae (eg, hypertension, stroke, and kidney injury), however, was nearly twice that typically seen in the general population in a normal year. Because of the scale of the SARS-CoV-2 pandemic in the US, these findings suggest more planning for healthcare resources is needed to look at the health complications in survivors.
Strengths and limitations
Our study had the power to quantify the small, but not trivial, risk among younger, healthier adults. In our study population of adults aged ≤65, more than 90% of individuals recovered at home. Younger individuals might experience complications caused by covid-19, especially if they are admitted to hospital and have pre-existing conditions,43 but few studies have reported new outcomes after the acute phase of the illness in individuals with milder symptoms.
Fatigue is often the most reported symptom after the acute infection, with self-reported estimates in surveys ranging from 13.6%36 to 77.7%44 depending on whether individuals were admitted to hospital and length of follow-up. Although fatigue was also the most common diagnosis after the acute infection in this cohort (4.64%), our estimates only reflected fatigue that was reported to and noted by the physician. Most symptoms, when associated with a primary viral illness such as covid-19, are frequently not coded by the clinician because they are presumed to be part of the infectious process. Also, many individuals with covid-19 might not seek medical care unless the symptoms are unusually long lasting or severe. ICD-10 codes have been shown to be valid for many clinical diagnoses45 but are inaccurate and unreliable for determining symptoms.46 Therefore, we intentionally did not evaluate a broad list of symptoms in this study as we expect that the true incidence of symptoms was not accurately reflected when determined by ICD-10 codes and would be best established through patient reported surveys.
We could not determine race or ethnicity at the individual level in the commercial claims. More research is needed to better understand how race and ethnicity modify the risk for long term clinical sequelae. Moreover, our population required continuous enrollment from January 2019 until the index date, and so individuals who might not have been insured during this period were not included in our sample.
Death might be an important competing risk. We did not perform a competing risk analysis, however, for two reasons. First, we could not identify mortality as an outcome in our commercial claims database. Disenrollment from the insurance plan might be used as a proxy but the reason for disenrollment was not available. Therefore, we could not distinguish between withdrawal because of loss or change in employment and withdrawal because of death. Second, less than 5% of people disenrolled from the insurance plan during the follow-up period in our matched cohorts. Also, the difference in the number of disenrollments between individuals infected with SARS-CoV-2 and the three comparison groups was minimal (range 0.03-0.43%), suggesting that disenrollment was not an informative censoring event.
We might have misclassified individuals because of the retrospective nature of our study and the inherent limitations of using claims to define variables. First, we relied on a database of laboratory results to identify individuals with a positive PCR test. Individuals infected with SARS-CoV-2 might have been misclassified into the 2020 comparator group if they tested positive outside of our network and did not have enough symptoms to require medical care. This misclassification would likely have biased our findings toward the null. Second, because we required continuous enrollment for study eligibility, we captured clinical diagnoses and comorbidities that occurred in the previous year and in the year of the index date up until 30 days before the index date. An incident or new diagnosis in the post-acute phase for a specific condition might have been an exacerbation of a pre-existing condition that did not receive medical care within the 13-22 month window created by our continuous enrollment criteria. Third, in the group with no pre-existing comorbidities, we might have misclassified some individuals, potentially inflating the incidence in that subgroup. This misclassification is likely minimal, however, because most medical conditions and comorbidities require individuals to participate in at least annual check-ups with primary care physicians or specialists. Fourth, we might not have captured a comprehensive list of all ICD-10 codes for each outcome, although the most common ones were included. Finally, because SARS-CoV-2 is a new virus, we believe that physicians might have underestimated the clinical significance of some outcomes, especially early in the pandemic. Our sensitivity analysis, however, does not support this temporal lack of documentation because we found no significant difference in risk by period.
Our small excess risks might be a result of increased medical care after SARS-CoV-2 infection, such that the ascertainment (diagnosis) rather than the incidence of a new diagnosis was triggered by SARS-CoV-2 infection. This ascertainment bias is unlikely to fully explain our results, however, because individuals with viral lower respiratory tract illness should receive similar medical care after the acute illness but have fewer visits to primary care physicians than individuals infected with SARS-CoV-2, regardless of the method of diagnosis (eFigure 2).
This retrospective analysis was strengthened by its large sample size that powered the assessment of multiple and rare outcomes simultaneously overall and across multiple subgroups. We used valid ICD-10 codes to determine outcomes, and included a broad definition of SARS-CoV-2 infection (positive PCR results or clinical diagnosis), producing a generalizable sample of individuals.
With almost 70 million individuals infected with SARS-CoV-2 worldwide and rising, the number of survivors with potential sequelae after covid-19 will continue to grow. To manage these patients effectively, understanding the incidence and natural history of these sequelae is important. Our results provide clinicians with a comprehensive understanding of the excess risk for over 50 clinical morbidities across multiple organ systems affecting adults aged ≤65 after the acute phase of SARS-CoV-2 infection. Knowing the magnitude of risk for rare and common clinical sequelae might improve the diagnosis and management of individuals infected with SARS-CoV-2. Also, our results could help providers and other key stakeholders anticipate the scale of future health complications and improve planning on the use of healthcare resources.
We found that 14% of individuals with SARS-CoV-2 infection developed at least one new type of clinical sequelae that required medical care after the acute phase of the illness, which was 4.95 percentage points higher than in the 2020 comparator group. An increased and sustained risk for clinical sequelae was seen during the four months after the acute illness, particularly, but not exclusively, in individuals with pre-existing conditions or admitted to hospital for covid-19. More follow-up is needed to determine resolution of risk over time.
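As a rough arithmetic check on these headline figures, the following minimal sketch recomputes the 14% from the counts reported in the summary box (27 074 of 193 113). It assumes that the reported 4.95% is an absolute risk difference (percentage points), as the excess risk framing suggests. The implied comparator risk is derived here for illustration only and is not a figure reported in the study; all variable names are hypothetical.

```python
# Counts reported in the summary box of the paper.
infected_with_sequelae = 27_074
infected_total = 193_113

# Excess risk versus the 2020 comparator group.
# Assumption: the reported 4.95% is an absolute difference in percentage points.
risk_difference_pp = 4.95

# Risk of at least one new clinical sequela among infected individuals, in percent.
risk_infected_pct = 100 * infected_with_sequelae / infected_total

# Comparator risk implied by the two figures above (illustrative, not reported).
implied_comparator_pct = risk_infected_pct - risk_difference_pp

print(f"Risk in infected group: {risk_infected_pct:.2f}%")
print(f"Implied risk in 2020 comparator group: {implied_comparator_pct:.2f}%")
```

Under that assumption, the infected group's risk works out to about 14.02%, and the comparator group's risk to roughly 9%.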
What is already known on this topic
Small observational studies of patients admitted to hospital have shown that some covid-19 survivors had short term and long term clinical sequelae
Few studies have characterized the excess risk of clinical sequelae attributable to SARS-CoV-2 after the acute infection in adults aged ≤65 in a large generalizable sample
What this study adds
14% of individuals aged ≤65 who were infected with SARS-CoV-2 (27 074 of 193 113) developed at least one new type of clinical sequelae that required medical care after the acute phase of the illness, which was 4.95 percentage points higher than in the 2020 comparator group
An increased risk of specific clinical sequelae after the acute infection was noted across a range of organ systems, including cardiovascular, neurologic, kidney, respiratory, and mental health complications
The risk for incident sequelae increased with age, pre-existing conditions, and admission to hospital for covid-19, but in adults aged ≤50 and those with no pre-existing conditions or not admitted to hospital for covid-19, the risk for some clinical sequelae was still elevated
We thank Andrea Rotnitzky and Jamie Robins for reviewing and providing feedback on our methods.
Contributors: SED, YG, KH, JS, MCD, KGJ, ML, and KC played an active role in all aspects of the development of the research, including design, conduct, and interpretation of the data; preparation, review, and approval of the manuscript; and decision to submit the manuscript for publication. A description of the data source is included in supplementary appendix A. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. SED and YG serve as joint guarantors who accept full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish.
Funding: The study was funded by OptumLabs, the research and development arm of UnitedHealth Group, and the authors SED, KC, KH, YG, MCD, KGJ, and JS are full time employees at UnitedHealth Group. UnitedHealth Group, the organization, was not involved in determining the details of the study design, analysis plan, interpretation of data, or writing of the report.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: support from OptumLabs for the submitted work and no other organization or company provided support for this work; SED, YG, KH, JS, MCD, and KGJ declare that they have had no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work. KC declares consulting from Pfizer. ML declares honoraria/consulting from Merck, Sanofi-Pasteur, Bristol Myers-Squibb, and Antigen Discovery; research funding (institutional) from Pfizer, and unpaid scientific advice to Janssen, Astra-Zeneca, One Day Sooner, and Covaxx (United Biomedical).
Ethical approval: This research was determined to be exempt from human research regulations by the UnitedHealth Group Office of Human Research Affairs (action ID: 2020-0076-01).
Data sharing: The data are proprietary and are not available for public use but, under certain conditions, might be made available to editors and their approved auditors under a data use agreement to confirm the findings of this study.
The lead author, SED, affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Dissemination to participants and related patient and public communities: The content of this paper will be summarized and made publicly available on the United In Research (unitedinresearch.com) platform sponsored by OptumLabs.
Provenance and peer review: Not commissioned; externally peer reviewed.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.