Diagnostic accuracy of serological tests for covid-19: systematic review and meta-analysis
BMJ 2020;370:m2516. doi: https://doi.org/10.1136/bmj.m2516 (Published 01 July 2020)
- Mayara Lisboa Bastos, postdoctoral fellow12,
- Gamuchirai Tavaziva, research assistant1,
- Syed Kunal Abidi, medical student1,
- Jonathon R Campbell, postdoctoral fellow16,
- Louis-Patrick Haraoui, assistant professor3,
- James C Johnston, clinical associate professor4,
- Zhiyi Lan, consultant1,
- Stephanie Law, postdoctoral fellow5,
- Emily MacLean, doctoral student6,
- Anete Trajman, visiting researcher12,
- Dick Menzies, full professor16,
- Andrea Benedetti, associate professor16,
- Faiz Ahmad Khan, associate professor16
- 1Respiratory Epidemiology and Clinical Research Unit, Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, Canada
- 2Social Medicine Institute, State University of Rio de Janeiro, Rio de Janeiro, Brazil
- 3Department of Microbiology and Infectious Diseases, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, Canada
- 4University of British Columbia, Vancouver, Canada
- 5Department of Global Health and Social Medicine, Harvard Medical School, Boston, MA, USA
- 6Departments of Epidemiology, Biostatistics and Occupational Health, and Medicine, McGill University, Montreal, Canada
- Correspondence to: F Ahmad Khan
- Accepted 23 June 2020
Objective To determine the diagnostic accuracy of serological tests for coronavirus disease 2019 (covid-19).
Design Systematic review and meta-analysis.
Data sources Medline, bioRxiv, and medRxiv from 1 January to 30 April 2020, using subject headings or subheadings combined with text words for the concepts of covid-19 and serological tests for covid-19.
Eligibility criteria and data analysis Eligible studies measured sensitivity or specificity, or both, of a covid-19 serological test compared with a reference standard of viral culture or reverse transcriptase polymerase chain reaction. Studies with fewer than five participants or samples were excluded. Risk of bias was assessed using quality assessment of diagnostic accuracy studies 2 (QUADAS-2). Pooled sensitivity and specificity were estimated using random effects bivariate meta-analyses.
Main outcome measures The primary outcome was overall sensitivity and specificity, stratified by method of serological testing (enzyme linked immunosorbent assays (ELISAs), lateral flow immunoassays (LFIAs), or chemiluminescent immunoassays (CLIAs)) and immunoglobulin class (IgG, IgM, or both). Secondary outcomes were stratum specific sensitivity and specificity within subgroups defined by study or participant characteristics, including time since symptom onset.
Results 5016 references were identified and 40 studies included. 49 risk of bias assessments were carried out (one for each population and method evaluated). High risk of patient selection bias was found in 98% (48/49) of assessments and high or unclear risk of bias from performance or interpretation of the serological test in 73% (36/49). Only 10% (4/40) of studies included outpatients. Only two studies evaluated tests at the point of care. For each method of testing, pooled sensitivity and specificity were not associated with the immunoglobulin class measured. The pooled sensitivity of ELISAs measuring IgG or IgM was 84.3% (95% confidence interval 75.6% to 90.9%), of LFIAs was 66.0% (49.3% to 79.3%), and of CLIAs was 97.8% (46.2% to 100%). In all analyses, pooled sensitivity was lower for LFIAs, the potential point-of-care method. Pooled specificities ranged from 96.6% to 99.7%. Of the samples used for estimating specificity, 83% (10 465/12 547) were from populations tested before the epidemic or not suspected of having covid-19. Among LFIAs, pooled sensitivity of commercial kits (65.0%, 49.0% to 78.2%) was lower than that of non-commercial tests (88.2%, 83.6% to 91.3%). Heterogeneity was seen in all analyses. Sensitivity was higher at least three weeks after symptom onset (ranging from 69.9% to 98.9%) compared with within the first week (from 13.4% to 50.3%).
Conclusion Higher quality clinical studies assessing the diagnostic accuracy of serological tests for covid-19 are urgently needed. Currently, available evidence does not support the continued use of existing point-of-care serological tests.
Study registration PROSPERO CRD42020179452.
Accurate and rapid diagnostic tests will be critical for achieving control of coronavirus disease 2019 (covid-19), a pandemic illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Diagnostic tests for covid-19 fall into two main categories: molecular tests that detect viral RNA, and serological tests that detect anti-SARS-CoV-2 immunoglobulins. Reverse transcriptase polymerase chain reaction (RT-PCR), a molecular test, is widely used as the reference standard for diagnosis of covid-19; however, limitations include potential false negative results,12 changes in diagnostic accuracy over the disease course,3 and precarious availability of test materials.4 Serological tests have generated substantial interest as an alternative or complement to RT-PCR in the diagnosis of acute infection, as some might be cheaper and easier to implement at the point of care. A clear advantage of these tests over RT-PCR is that they can identify individuals previously infected by SARS-CoV-2, even if they never underwent testing while acutely ill. As such, serological tests could be deployed as surveillance tools to better understand the epidemiology of SARS-CoV-2 and potentially inform individual risk of future disease.
Many serological tests for covid-19 have become available in a short period, including some marketed for use as rapid, point-of-care tests. The pace of development has, however, exceeded that of rigorous evaluation, and important uncertainty about test accuracy remains.5 We undertook a systematic review and meta-analysis to assess the diagnostic accuracy of serological tests for SARS-CoV-2 infection. Our objectives were to evaluate the quality of the available evidence, to compare pooled sensitivities and specificities of different test methods, and to identify study, test, and patient characteristics associated with test accuracy.
Search strategy and selection criteria
Our systematic review and meta-analysis is reported according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines6 (see supplementary file). We searched Ovid-Medline for studies published in 2020, with no restrictions on language. Subject headings/subheadings (when applicable) combined with text words were used for the concepts of covid-19 (or SARS-CoV-2) and serological tests. The supplementary file provides the complete search strategy, run on 6 April 2020 and repeated on 30 April 2020. To identify preprint studies (not yet peer reviewed), we searched the entire list of covid-19 preprints from medRxiv and bioRxiv (https://connect.medrxiv.org/relate/content/181) initially on 4 April 2020, and again on 28 April 2020. We also considered articles referred to us by colleagues or identified in the references of included studies.
Eligible studies were randomised trials, cohort or case-control studies, and case series, reporting the sensitivity or specificity, or both of a serological test for covid-19. We excluded review articles, editorials, case reports, modelling or economic studies, articles with sample sizes less than five, and studies that only reported analytical sensitivity (ie, dilutional identification of detection limits).7 Three investigators (MB, GT, FAK) independently screened titles and abstracts, and two (MB, GT) independently screened full text papers. We used a sensitive screening strategy at the title or abstract level wherein selection by a single reviewer was sufficient for a study to undergo full text review. A third reviewer (FAK) resolved disagreements between reviewers at the full text stage. In the systematic review and meta-analyses, we included studies when sensitivity or specificity, or both of at least one covid-19 serological test was measured against a reference standard of viral culture or RT-PCR.
In our primary analysis, we estimated pooled sensitivity and specificity by method of serological test. We expected that accuracy would be associated with the immunoglobulin class being measured, as is the case for other coronaviruses.8910 As such, we stratified the primary results by class of immunoglobulin detected.
One investigator (MB) extracted aggregate study level data using a piloted standardised electronic data entry form. For each study, a second reviewer (ZL or EM) verified all entered data. No duplicate data were identified. We collected information on study characteristics (location, design), study populations (age, sex, clinical severity, sources of populations used for estimating specificity), the timing of specimen collection in relation to onset of symptoms, and methodological details about index and reference tests. We categorised the tests by method: enzyme linked immunosorbent assays (ELISAs), lateral flow immunoassays (LFIAs), or chemiluminescent immunoassays (CLIAs). In several studies, investigators assessed the accuracy of more than one test method (eg, ELISA and LFIA) or more than one particular index test (eg, one study evaluated nine different LFIAs). For each particular index test performed in a study, we extracted the numbers needed to construct 2×2 contingency tables. Each evaluation of a particular index test was considered its own study arm. For example, a study that assessed nine LFIAs and two ELISAs on the same set of patients would contribute 11 study arms.
Two reviewers independently assessed risks of bias and applicability concerns using the quality assessment of diagnostic accuracy studies 2 (QUADAS-2) tool, for the domains of patient selection, performance of the index test, performance of the reference test, and flow and timing (for risk of bias only).11 Conflicts were resolved through consensus. We performed a quality assessment for each test method and population. For example, an article that assessed nine LFIAs and two ELISAs on the same set of patients would have two QUADAS-2 assessments (one for the LFIAs and one for the ELISAs).
The main summary measures were pooled sensitivity and pooled specificity, with 95% confidence intervals estimated using bivariate generalised linear mixed models. We specified random effects at the level of the particular study and of the particular test. The study level random effect accounted for correlation of results that could arise from study level factors, such as using the same set of samples to evaluate more than one test in a study. The test level random effect was added to account for differences arising from characteristics of individual tests. When models with two random effects did not converge, we used only the test level random effect.
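The random effects idea described above can be illustrated with a simplified sketch. The review itself fitted bivariate generalised linear mixed models, which require specialised statistical software; the code below instead applies the simpler univariate DerSimonian-Laird estimator to study-arm sensitivities on the logit scale. It is an illustration of random effects pooling under that stated simplification, not a reproduction of the authors' bivariate method, and the function name and continuity correction are assumptions for the example.

```python
import math

def pool_sensitivity_dl(arms):
    """Univariate DerSimonian-Laird random effects pooling on the logit scale.

    `arms` is a list of (true positives, total with disease) tuples, one per
    study arm; at least two arms are expected. A 0.5 continuity correction
    avoids zero cells. Illustration only: the review pooled sensitivity and
    specificity jointly with bivariate generalised linear mixed models.
    """
    y, v = [], []
    for tp, n in arms:
        p = (tp + 0.5) / (n + 1.0)                          # corrected proportion
        y.append(math.log(p / (1.0 - p)))                   # logit sensitivity
        v.append(1.0 / (tp + 0.5) + 1.0 / (n - tp + 0.5))   # within-arm variance
    w = [1.0 / vi for vi in v]                              # fixed effect weights
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0  # between-arm variance
    w_star = [1.0 / (vi + tau2) for vi in v]                # random effects weights
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return 1.0 / (1.0 + math.exp(-mu))                      # back-transform to a proportion
```

For identical arms the pooled estimate reduces to the continuity corrected observed sensitivity; heterogeneous arms inflate the between-arm variance and pull the weights towards equality.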
We first estimated pooled sensitivity and specificity by test method (ELISA, LFIA, CLIA) and immunoglobulin class detected (IgM or IgG, or both). Separately, we reported results from studies evaluating serological tests that measured IgA or total immunoglobulin levels and without meta-analyses owing to small numbers. To describe heterogeneity, we constructed summary receiver operating characteristic (ROC) curves with 95% prediction regions, estimated using bivariate meta-analysis with a test level random effect only, and forest plots. As our models were bivariate, we did not use the I2 statistic. Studies that did not report both sensitivity and specificity were excluded from bivariate meta-analyses.
To assess prespecified variables as potential determinants of diagnostic accuracy, we compared pooled sensitivity and specificity across several subgroups according to: peer review status; reporting of data at the level of patients or samples; the type of SARS-CoV-2 antigen used; whether testing was by commercial kit or an in-house assay; whether the population used to estimate specificity consisted of samples collected before the emergence of SARS-CoV-2, individuals without suspected covid-19 tested during the epidemic, individuals with suspected covid-19, or individuals with other viral infections; and the timing of sample collection in relation to the onset of symptoms (during the first week, during the second week, or after the second week). In these analyses, to maximise sample size we pooled data regardless of immunoglobulin class. To do so, we used the combined IgG and IgM result when available, otherwise we used the separate IgG and IgM results. For tests that had a 2×2 table for IgM and another 2×2 table for IgG, both contributed arms, sharing the same test level and study level random effects. Because data were not available to study the association between the timing of sampling and specificity, this analysis was done with univariate models and included studies that only reported sensitivity.
Patient and public involvement
Patients were not involved in the development of the research question or its outcome measures, conduct of the research, or preparation of the manuscript.
Figure 1 shows the selection of studies. Overall, 5014 records (4969 unique) were identified through database searches and two full text articles through hand searches. In total, 4696 records were excluded based on screening of titles or abstracts, and a further 235 after full text review. Forty studies totalling 73 study arms (references 15-54) met the inclusion criteria. Table 1 summarises the studies by test method; the sum of the number of studies exceeds 40 because some evaluated more than one method. Seventy per cent (28/40) of the studies were from China (references 16-35, 38-41, and 45-48), 8% (3/40) from Italy (references 15, 36, and 43), and the remainder from the United States (3/40; references 42, 50, and 52), Denmark (1/40; reference 51), Spain (1/40; reference 37), Sweden (1/40; reference 53), Japan (1/40; reference 44), the United Kingdom (1/40; reference 49), and Germany (1/40; reference 54). Both sensitivity and specificity were reported in 80% (32/40) of the studies, sensitivity alone in 18% (7/40), and specificity alone in 3% (1/40; reference 33). Among included studies, 50% (20/40) were not peer reviewed. Eighty per cent (32/40) of studies used a case-control design for selecting the study population, and 10% (4/40) included outpatient populations. Disease severity was reported in 40% (16/40), and sensitivity stratified by time since symptom onset was reported in 45% (18/40). Several studies used samples rather than individual patients to estimate accuracy; in these studies, one patient could have contributed multiple samples for estimating sensitivity or specificity, or both. Approaches to estimating specificity included using specimens collected before the emergence of covid-19; specimens collected during the epidemic from individuals not suspected of having covid-19, or specimens from individuals with covid-19 symptoms and a negative RT-PCR result for SARS-CoV-2; or specimens from individuals with laboratory confirmed infection with other viruses (respiratory or non-respiratory).
Supplementary tables S1 and S2 report the characteristics of each individual study.
Table 2 provides information about the serological (index) and reference tests that were used in the included studies. Supplementary table S3 provides details for each study. Most of the studies evaluated commercial serology test kits (see supplementary table S4 for names). Studies varied in the immunoglobulin class measured and the antigen target. Among the 17 studies that evaluated potential point-of-care tests (LFIAs), only two performed testing at the point of care. Direct testing on whole blood specimens—as would be done at the point of care—was performed in 6/17 (35%) studies of LFIAs, and outcomes of such testing were available for 44 patients across all study arms (2% of LFIAs performed). All 39 studies that reported sensitivity used RT-PCR as the reference standard to rule in SARS-CoV-2 infection, but the type and number of specimens varied.
Figure 2 summarises the QUADAS-2 assessment, and supplementary figure S1 displays each of the 49 individual QUADAS-2 evaluations. For the patient selection domain, a high or unclear risk of bias was seen in 98% (48/49) of QUADAS-2 assessments, mostly related to a case-control design and not using consecutive or random sampling. For the index test domain, 73% (36/49) of assessments concluded a high or unclear risk of bias because it was not clear whether the serological test was interpreted blind to the reference standard or whether the cut-off values for classifying results as positive, negative, or indeterminate were prespecified. For LFIAs (18 of the QUADAS-2 assessments), when test results are subjectively interpreted by a human reader (eg, appearance of a line), a description of the number of readers and assessment of reliability were provided in 17% (3/18) of assessments. For the reference standard domain, we judged the risk of bias as unclear in 94% (46/49) of assessments owing to inadequate details about specimens used for RT-PCR or use of specimens other than nasopharyngeal swabs. We also classified the risk as unclear if fewer than two RT-PCRs were used to rule out infection, or if the number was not reported. Risk of bias from flow and timing was high or unclear in 67% (33/49) owing to missing information or results not stratified by the timing of sample collection in relation to symptom onset. Major applicability concerns for the index test were seen in 29% (14/49) of assessments, mostly owing to LFIA being performed in laboratories and not using point-of-care type specimens.
Table 3 enumerates within study and pooled sensitivity stratified by test type and immunoglobulin class. Within each test method (CLIA, ELISA, LFIA), point estimates were similar between the different types of immunoglobulins, and confidence intervals overlapped. Within each class of immunoglobulin, sensitivity was lowest for the LFIA method. Table 4 reports on specificity. Pooled specificities ranged from 96.6% (95% confidence interval 94.3% to 98.2%) for LFIAs measuring IgM and IgG, to 99.7% (99.0% to 100%) for ELISAs measuring IgM. Pooled specificity for CLIA tests that measured IgM and IgG (n=2) could not be estimated because of non-convergence. For all test methods and immunoglobulin classes, visual inspection of summary ROC curves (supplementary figure S2) and of forest plots (supplementary figure S3) showed important heterogeneity.
Supplementary table S5 provides sensitivity and specificity reported in three studies that used serological test methods other than ELISAs, LFIAs, or CLIAs. Sensitivity or specificity, or both were low for all, with the exception of an IgM enzyme immunoassay in one arm of 16 patients. Supplementary table S6 reports sensitivity and specificity of serological tests that measured IgA (one ELISA, one CLIA)4751 and those measuring total immunoglobulin levels (three ELISAs, one CLIA, one LFIA).203051 All four studies were classified as high risk of bias from patient selection, and unclear risk of bias from performance of the reference standard, and three had high or unclear risk of bias in the domains of index test performance and flow and timing (supplementary figure S1). Sensitivity ranged from 93.1% to 98.6%, and specificity from 93.3% to 100%.
Table 5 reports stratified meta-analyses for evaluating potential sources of heterogeneity in sensitivity and specificity. Peer review was not associated with accuracy. For ELISAs and LFIAs, accuracy estimates at the sample level (ie, in studies when it was possible for patients to contribute more than one sample to the analysis) were similar to estimates using only one sample for each patient. For CLIAs, specificity was higher from studies reported at the sample level. Point estimates for pooled sensitivity and specificity were higher when both surface and nucleocapsid proteins were used, although confidence intervals overlapped. Point estimates of pooled sensitivity were lower for commercial kits versus in-house assays, for all three methods, with the strongest difference seen for LFIAs, where the sensitivity of commercial kits was 65.0% (49.0% to 78.2%) and that of non-commercial tests was 88.2% (83.6% to 91.3%). For all three test methods, pooled specificity was high when measured in populations where covid-19 was not suspected, regardless of whether the sampling had been done before or during the epidemic. For both LFIAs and CLIAs, pooled specificity was lower among individuals with suspected covid-19 compared with other groups; similar data were not available for ELISAs. For LFIAs, specificity was lower when estimated in individuals with other viral infections, but this was not the case for ELISAs or CLIAs.
Table 6 shows pooled sensitivity stratified by the timing of sample collection in relation to symptom onset. Regardless of immunoglobulin class or test method, pooled sensitivity was lowest in the first week of symptom onset and highest in the third week or later. Data on specificity stratified by timing were not available.
Table 7 provides a summary of our main findings, with examples of hypothetical testing outcomes for 1000 people undergoing serological testing in settings with a prevalence of SARS-CoV-2 of 5%, 10%, or 20%. For example, in a population with a true SARS-CoV-2 prevalence of 10%, for every 1000 people tested with an LFIA, among those who had covid-19, 66 will test positive and 34 will be incorrectly classified as uninfected; among those without covid-19, 869 will test negative and 31 will be incorrectly classified as having antibodies to SARS-CoV-2.
In this systematic review and meta-analysis, existing evidence on the diagnostic accuracy of serological tests for covid-19 was found to be characterised by high risks of bias, heterogeneity, and limited generalisability to point-of-care testing and to outpatient populations. We found sensitivities were consistently lower with the LFIA method compared with ELISA and CLIA methods. For each test method, the type of immunoglobulin being measured—IgM, IgG, or both—was not associated with diagnostic accuracy. Pooled sensitivities were lower with commercial kits and in the first and second week after symptom onset compared with the third week or later. Pooled specificities of each test method were high. However, stratified results suggested specificity was lower in individuals with suspected covid-19, and that other viral infections could lead to false positive results for the LFIA method. These observations indicate important weaknesses in the evidence on covid-19 serological tests, particularly those being marketed as point-of-care tests.
Meaning of the study
The utility of a low cost, rapid, and accurate point-of-care test55 has spurred the development and marketing of several covid-19 LFIA serological tests.56 We found only two studies where LFIA had been performed at the point of care. The low sensitivity of LFIA is of particular concern given that most studies used sample preparation steps that are likely to increase sensitivity compared with the use of whole blood as would be done at the point of care. These observations argue against the use of LFIA serological tests for covid-19 beyond research and evaluation purposes and support interim recommendations issued by the World Health Organization.57
Cautious interpretation of specificity estimates is warranted for several reasons. Importantly, few data were available from people who were tested because of suspected SARS-CoV-2 infection; hence our overall pooled estimates might not be generalisable to people who need testing because of covid-19 symptoms. For CLIAs, the lower specificity among people with suspected covid-19 could be a spurious finding from a false negative RT-PCR result, given that the specificity for CLIAs was high among people with confirmed other viral infections. By contrast, for LFIAs, other viral infections could have contributed to the lower specificity in suspected covid-19.
Our time stratified analyses suggest that current serological tests for covid-19 have limited utility in the diagnosis of acute covid-19. For example, of those tested for covid-19 within one week of symptom onset, on average 44% to 87% will be falsely identified as not having infection. And while sensitivity estimates were higher in the third week or later, even at this time point we found important false negative rates. For example, in people with covid-19 who are tested three weeks after symptom onset, ELISA IgG will misclassify 18% as not having been infected and LFIA IgG will misclassify 30%.
Overall, the poor performance of existing serological tests for covid-19 raises questions about the utility of using such methods for medical decision making, particularly given time and effort required to do these tests and the challenging workloads many clinics are facing. Our findings should also give pause to governments that are contemplating the use of serological tests—in particular, point-of-care tests—to issue immunity “certificates” or “passports.” For example, if an LFIA is applied to a population with a true SARS-CoV-2 prevalence of 10%, for every 1000 people tested, 31 who never had covid-19 will be incorrectly told they are immune, and 34 people who had covid-19 will be incorrectly told that they were never infected.
Strengths and limitations of this review
Our review has several strengths. We used sensitive search strategies and included pre-peer reviewed literature, and although our use of studies published as preprints might be criticised, we found that the peer reviewed literature also had biases. Moreover, preprints have taken on an unprecedentedly large role58 in discussions and policy making around covid-19—hence the importance of subjecting pre-peer reviewed literature to critical appraisal. Another strength of our review was that two independent reviewers systematically assessed potential sources of bias. Finally, a second investigator verified all data extraction.
Our study also has some limitations. Most importantly, we compared pooled estimates between different study populations. As such, the possibility of confounding exists (eg, from differences in timing of sampling between studies), explaining differences in sensitivity or specificity.59 This approach was taken because few studies performed head-to-head comparisons. We did not perform metaregression as many studies would have been excluded owing to limited reporting of covariates. Another limitation is that as we did not search Embase we might have missed some published studies.
Conclusion and future research
Future studies to evaluate serological tests for covid-19 should be designed to overcome the major limitations of the existing evidence base. This can be readily accomplished by adhering to the fundamentals of the design for diagnostic accuracy studies: a well defined use-case (ie, specific purpose for which the test is being used); consecutive sampling of the target population within the target use-case; performance of the index test in a standardised and blinded manner using the same methods that will be applied in the intended use-case; and ensuring the reference test is accurate, performed on all participants, and interpreted blind to the results of the index test. To reduce the likelihood of misclassification, the reference standard should consist of RT-PCR performed on at least two consecutive specimens, and, when feasible, include viral cultures. To reduce variability in estimates and enhance generalisability, sensitivity and specificity should be stratified by setting (outpatient versus inpatient), severity of illness, and the number of days elapsed since symptom onset.
In summary, we have found major weaknesses in the evidence base for serological tests for covid-19. The evidence does not support the continued use of existing point-of-care serological tests for covid-19. While the scientific community should be lauded for the pace at which novel serological tests have been developed, this review underscores the need for high quality clinical studies to evaluate these tools. With international collaboration, such studies could be rapidly conducted and provide less biased, more precise, and more generalisable information on which to base clinical and public health policy to alleviate the unprecedented global health emergency that is covid-19.
What is already known on this topic
Serological tests to detect antibodies against severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) could improve diagnosis of coronavirus disease 2019 (covid-19) and be useful tools for epidemiological surveillance
The number of serological tests has rapidly increased, and many are being marketed for point-of-care use
The evidence base supporting the diagnostic accuracy of these tests, however, has not been formally evaluated
What this study adds
The available evidence on the accuracy of serological tests for covid-19 is characterised by risks of bias and heterogeneity, and as such, estimates of sensitivity and specificity are unreliable and have limited generalisability
Evidence is particularly weak for point-of-care serological tests
Caution is warranted if using serological tests for covid-19 for clinical decision making or epidemiological surveillance
Current evidence does not support the continued use of existing point-of-care tests
We thank Geneviève Gore, McGill University Academic Librarian, for assistance in developing the search strategy for Ovid-Medline and the preprint literature, and Coralie Gesic for designing the forest plots. The study protocol is available at www.crd.york.ac.uk/prospero/display_record.php?RecordID=179452.
Contributors: MLB and FAK (equally) and AB, DM, and JC conceived the study and study design. MLB drafted the initial search strategy and executed the search. MLB, GT, and FAK screened the studies. MLB, GT, SL, EM, AT, ZL, and SA extracted data and performed the quality assessment. MLB, AB, FAK, and ZL analysed the data. MLB and FAK wrote the first draft of the manuscript. FAK is the guarantor. All authors interpreted the data and wrote and critically reviewed the manuscript and all revisions. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding: This study was funded (publication costs) by a grant (ECRF-R1-30) from the McGill Interdisciplinary Initiative in Infection and Immunity (MI4). MB is supported by the Canadian Institutes of Health Research (award #FRD143350). JRC is supported by Fonds de Recherche Sante Quebec (FRSQ award #258907 and #287869). SL holds a research training award from the FRSQ. AB holds a research salary award from the FRSQ. AT is supported by Conselho Nacional de Ensino, Pesquisa e Desenvolvimento Tecnológico (award #303267/2018-6). FAK receives salary support from the McGill University Department of Medicine. MI4 and these agencies had no input into the study design, data collection, data analysis or interpretation, report writing, or the decision to submit the paper for publication.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: SL reports personal fees from Carebook Technologies, outside the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: Data can be requested from the corresponding author.
The study guarantor (FAK) affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Dissemination to participants and related patient and public communities: The results of the meta-analysis will be disseminated to patients, providers, and policy makers through social media and academic and institutional networks.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.