Electronic health record data for identifying COVID-19 long-term sequelae must be interpreted cautiously
I read with interest the study by Xie et al. on the neuropsychiatric sequelae of COVID-19 (1). In related publications (2), the authors reported an increased incidence of cardiovascular (3), metabolic (4), and renal (5) long-term conditions among COVID-19 survivors. While these reports are important, open questions regarding the data analysis cast serious doubt on the authors' main conclusions.
It is of utmost importance to understand that electronic health record (EHR) data reflect not only the patients' health but also their interactions with the healthcare system (6). For example, surveillance bias implies that patients who are followed up more closely are more likely to receive EHR diagnoses (8,9). In other words, an EHR diagnosis should not be interpreted solely as a proxy for disease, because it may also be a proxy for diagnostic intensity. Closely related, informed presence bias implies that patients with more frequent healthcare encounters receive more EHR diagnoses, because each of these repeated encounters requires a diagnostic code (10). In fact, some researchers have argued that the underlying healthcare processes may carry even more predictive value than the pathophysiological processes themselves (6).
I reassessed the data of this study and found evidence of differential healthcare utilization between the exposure and control groups. The authors report that 50.95% of COVID-19 patients had three or more encounters with the healthcare system, compared with 27.83% of the control group. The authors did include "frequency of outpatient encounters" as a covariate and achieved balance after matching. Nonetheless, according to the data reported in the Nature study (2), COVID-19 patients received significantly more EHR codes for "medical examinations", "encounters for administrative purposes" and receipt of "diagnostic agents". The latter implies a more extensive diagnostic work-up of COVID-19 patients. This might, of course, reflect poorer general health among COVID-19 patients, but at the very least the possible role of these biases and their impact on the measured findings must be acknowledged as a major limitation of all conclusions in this study.
The authors' published reports also contradict one another. In the Nature study (2), substance use disorders are not associated with COVID-19, apart from a positive trend for alcohol-related disorders; in the BMJ study (1), they are. Furthermore, the baseline characteristics of COVID-19 patients differ markedly between reports. For example, after matching, the proportion of COVID-19 patients with type 2 diabetes mellitus is 23.95% in the BMJ study (1), 29.27% in the Nature study (2), 24.39% in the Nature Medicine study (3), and 39.4% in the JASN report (5). In the authors' defense, the studies differ in statistical adjustment and selection of covariates, but the degree of model dependence is concerning.
Strikingly, in the Nature study (2), the authors show that COVID-19 is inversely associated with tobacco- and cannabis-related disorders, which is implausible and strongly suggests residual confounding. Most concerning, however, is that the BMJ report (1) presents results for substance use disorders with the exception of tobacco-related disorders, an outcome that had indicated uncontrolled confounding in the Nature study. The authors provide no rationale for omitting the data on tobacco-related disorders.
Lastly, the authors used traumatic events and neoplasms as negative-outcome controls. The authors state that "if there were biases in the analytical approach, this would extend to the chosen negative-outcome controls". However, the authors chose only outcomes that align with their hypothesis. According to the data reported in the Nature study (2), COVID-19 is also associated with "refractive errors" and "acquired foot deformities". It remains unclear why these were not considered negative-outcome controls, as they are clearly not "plausibly associated to COVID-19 infections".
The conclusions of this group of authors are hampered by severe uncontrolled confounding and need careful reevaluation. To optimally assess the epidemiological burden of long-term sequelae of COVID-19, we must not rely on self-reported or observational data, but on gold-standard prospective, double-blinded cohort studies (14).
I declare no competing interests.
The opinions expressed here do not necessarily represent the opinions of my affiliations.
1. Xie, Y., Xu, E. & Al-Aly, Z. Risks of mental health outcomes in people with covid-19: cohort study. BMJ e068993 (2022). doi:10.1136/bmj-2021-068993.
2. Al-Aly, Z., Xie, Y. & Bowe, B. High-dimensional characterization of post-acute sequelae of COVID-19. Nature 594, 259–264 (2021).
3. Xie, Y., Xu, E., Bowe, B. & Al-Aly, Z. Long-term cardiovascular outcomes of COVID-19. Nature Medicine 28, 583–590 (2022).
4. Xie, Y. & Al-Aly, Z. Risks and burdens of incident diabetes in long COVID: a cohort study. The Lancet Diabetes & Endocrinology 10, 311–321 (2022).
5. Bowe, B., Xie, Y., Xu, E. & Al-Aly, Z. Kidney Outcomes in Long COVID. Journal of the American Society of Nephrology 32, 2851–2862 (2021).
6. Agniel, D., Kohane, I. S. & Weber, G. M. Biases in electronic health record data due to processes within the healthcare system: retrospective observational study. BMJ k1479 (2018).
7. Hripcsak, G., Albers, D. J. & Perotte, A. Parameterizing time in electronic health record studies. Journal of the American Medical Informatics Association 22, 794–804 (2015).
8. Haut, E. R. Surveillance Bias in Outcomes Reporting. JAMA 305, 2462 (2011).
9. Pierce, C. A. et al. Surveillance Bias and Deep Vein Thrombosis in the National Trauma Data Bank: The More We Look, The More We Find. Journal of Trauma: Injury, Infection & Critical Care 64, 932–937 (2008).
10. Goldstein, B. A., Bhavsar, N. A., Phelan, M. & Pencina, M. J. Controlling for Informed Presence Bias Due to the Number of Health Encounters in an Electronic Health Record. American Journal of Epidemiology 184, 847–855 (2016).
11. Vai, B. et al. Mental disorders and risk of COVID-19-related mortality, hospitalisation, and intensive care unit admission: a systematic review and meta-analysis. The Lancet Psychiatry 8,
12. Mena, G. E. et al. Socioeconomic status determines COVID-19 incidence and related mortality in Santiago, Chile. Science 372 (2021).
13. Lipsitch, M., Tchetgen Tchetgen, E. & Cohen, T. Negative Controls. Epidemiology 21, 383–388
14. Sneller, M. C. et al. A Longitudinal Study of COVID-19 Sequelae and Immunity: Baseline Findings. Annals of Internal Medicine 175, 969–979 (2022).