SARS-CoV-2 lateral flow assays for possible use in national covid-19 seroprevalence surveys (React 2): diagnostic accuracy study

Abstract

Objective
To evaluate the performance of new lateral flow immunoassays (LFIAs) suitable for use in a national coronavirus disease 2019 (covid-19) seroprevalence programme (real time assessment of community transmission 2, React 2).

Design
Diagnostic accuracy study.

Setting
Laboratory analyses were performed in the United Kingdom at Imperial College, London and university facilities in London. Research clinics for finger prick sampling were run in two affiliated NHS trusts.

Participants
Sensitivity analyses were performed on sera stored from 320 previous participants in the React 2 programme with confirmed previous severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Specificity analyses were performed on 1000 prepandemic serum samples. 100 new participants with confirmed previous SARS-CoV-2 infection attended study clinics for finger prick testing.

Interventions
Laboratory sensitivity and specificity analyses were performed for seven LFIAs on a minimum of 200 serum samples from participants with confirmed SARS-CoV-2 infection and 500 prepandemic serum samples, respectively. Three LFIAs were found to have a laboratory sensitivity superior to the finger prick sensitivity of the LFIA currently used in React 2 seroprevalence studies (84%). These LFIAs were then further evaluated through finger prick testing on participants with confirmed previous SARS-CoV-2 infection: two LFIAs (Surescreen, Panbio) were evaluated in clinics in June-July 2020 and the third LFIA (AbC-19) in September 2020. A spike protein enzyme linked immunoassay and hybrid double antigen binding assay were used as laboratory reference standards.

Main outcome measures
The accuracy of LFIAs in detecting immunoglobulin G (IgG) antibodies to SARS-CoV-2 compared with two reference standards.

Results
The sensitivity and specificity of seven new LFIAs that were analysed using sera varied from 69% to 100%, and from 98.6% to 100%, respectively (compared with the two reference standards). Sensitivity on finger prick testing was 77% (95% confidence interval 61.4% to 88.2%) for Panbio, 86% (72.7% to 94.8%) for Surescreen, and 69% (53.8% to 81.3%) for AbC-19 compared with the reference standards. For matched clinical samples tested on AbC-19, sensitivity was significantly higher with serum than with finger prick, at 92% (80.0% to 97.7%, P=0.01). Antibody titres varied considerably among cohorts. The numbers of positive samples identified by finger prick in the lowest antibody titre quarter varied among LFIAs.

Conclusions
One new LFIA was identified with clinical performance suitable for potential inclusion in seroprevalence studies. However, none of the LFIAs tested had clearly superior performance to the LFIA currently used in React 2 seroprevalence surveys, and none showed sufficient sensitivity and specificity to be considered for routine clinical use.


Introduction
A detailed understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) seroprevalence is key to public health policy and in anticipating the course of the coronavirus disease 2019 (covid-19) pandemic. In contrast to routine serology assays, the use of lateral flow immunoassays (LFIAs) does not require the support of central laboratories and offers a rapid, scalable, and affordable method of testing. This approach has been used in Spain 1 and in our own React 2 (real time assessment of community transmission 2) study in England 2 to conduct national seroprevalence studies 2 and to monitor the persistence of SARS-CoV-2 antibodies. 3 The React 2 programme consists of participants self-administering LFIAs at home. Participants complete an online questionnaire and read the results by using uploaded test images. 2 The first round of the programme consisted of more than 100 000 participants and was completed on 13 July 2020. The results showed a SARS-CoV-2 antibody prevalence of approximately 6% nationally, with ethnic minority groups, healthcare workers, and care home workers disproportionately affected. 2 Two subsequent rounds of surveillance have now been completed, all using the same assay (Fortress Diagnostics, Northern Ireland), which was selected after rigorous clinical and laboratory evaluation 4 and engagement with the public and participants. 5 Diagnostics development continues at pace internationally, with over 200 LFIAs commercialised. 6 Despite this effort, no LFIA evaluated to date meets the Medicines and Healthcare products Regulatory Agency criteria for approval for individual testing in the United Kingdom, 7 which require the sensitivity (proportion of people with SARS-CoV-2 infection with a positive test result) and specificity (proportion of people without SARS-CoV-2 infection with a negative test result) to exceed 98%. When used in population studies, analyses can adjust for the performance characteristics of tests that do not meet such stringent criteria.
However, to ensure these adjustments are accurate, evaluation in the intended setting of use is required. At present, the evaluation of LFIAs has focused on performance in the laboratory. 8 9 Rigorous clinical evaluation of SARS-CoV-2 LFIAs of different population subgroups (such as people admitted to hospital compared with those not admitted to hospital, or people with severe symptoms compared with those without symptoms) is urgently needed to enhance generalisability and reduce variability in sensitivity and specificity estimates. 10 In this study, we continue our programme of evaluating LFIAs that have been prioritised from published evaluations and that have the potential for large scale application. The primary objective was to establish the sensitivity and specificity of new LFIAs, and to identify the most suitable candidate for deployment in future rounds of the React 2 study and potentially for individual use. A new LFIA would be considered to replace the Fortress LFIA in future rounds of React 2 seroprevalence surveys if it showed significantly superior sensitivity on finger prick testing and the capacity to be procured rapidly and at scale.
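The adjustment of population estimates for imperfect test performance, mentioned above, can be illustrated with one widely used correction, the Rogan-Gladen estimator. This is a minimal sketch only: the text does not state which adjustment is applied in React 2, and the function name and the input values below are illustrative assumptions, not study results.

```python
def rogan_gladen(observed_positive_fraction, sensitivity, specificity):
    """Adjust an observed positive fraction for imperfect test accuracy.

    Rogan-Gladen estimator:
        (observed + specificity - 1) / (sensitivity + specificity - 1)
    Assumed here for illustration; not necessarily the method used in React 2.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    adjusted = (observed_positive_fraction + specificity - 1.0) / denominator
    # Clamp to [0, 1]: sampling noise can push the raw estimate out of range.
    return min(1.0, max(0.0, adjusted))


# Hypothetical inputs: 5% of tests read positive, 84% sensitivity, 98.6% specificity.
adjusted_prevalence = rogan_gladen(0.05, sensitivity=0.84, specificity=0.986)
print(f"adjusted prevalence: {adjusted_prevalence:.1%}")
```

The estimate is only as good as the sensitivity and specificity supplied to it, which is why performance measured in the intended setting of use matters.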

Methods
The methods reported here describe round 2 of our LFIA evaluation (React 2, study 1) and are based on the same principles as the round 1 methods. 4 However, in contrast to round 1, LFIAs were only evaluated on capillary blood in the clinic if they showed equivalent or superior sensitivity and specificity on testing of sera in the laboratory compared with the Fortress LFIA currently being used for the React 2 seroprevalence studies. A comprehensive React protocol has been published and describes the study design, sampling size and strategy, and data collection and analysis of the various ongoing React 1 and React 2 studies. 11

Laboratory assessment of sensitivity and specificity using sera
Initial assessment of LFIA sensitivity used sera stored from 320 participants from round 1 with a previous positive test for SARS-CoV-2 by reverse transcription polymerase chain reaction (RT-PCR) on nasopharyngeal swab. Supplementary figure i presents a detailed summary of the flow of participants during sensitivity and specificity analysis in the laboratory. Sera were stored at −80°C, subjected to a maximum of two freeze-thaw cycles, and brought to room temperature before testing. Research technicians blinded to the reference standard assay results performed LFIA testing in the laboratory. The two reference standards used in this study, the in-house SARS-CoV-2 spike protein enzyme linked immunoassay (S-ELISA) and a hybrid double antigen binding assay (hybrid DABA), have been shown to have high sensitivity and specificity; these reference standards have been described previously. 4 The viral antigen used in the S-ELISA is the SARS-CoV-2 spike protein whereas the hybrid DABA uses the SARS-CoV-2 spike protein and receptor binding domain.
This variation results in inherent differences in the detection of antibody classes between the two in-house ELISAs; therefore, a composite outcome of a positive result on S-ELISA or hybrid DABA was used as the benchmark for sensitivity analysis throughout this study. Research technicians performing LFIA testing in the laboratory were aware of whether they were testing LFIAs for sensitivity or specificity, but were blinded to clinical information about participants and the reference standard assay. Assessors of the reference standard were blinded to clinical information and LFIA test results, and repeated any borderline positive or negative samples to give a determinate result.
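The composite benchmark described above (reference positive if either S-ELISA or hybrid DABA is positive) can be sketched as follows. The record layout, function names, and sample values are hypothetical and chosen only to show how the denominator for sensitivity is formed.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    lfia_positive: bool      # LFIA IgG band read as positive
    s_elisa_positive: bool   # S-ELISA result
    daba_positive: bool      # hybrid DABA result


def composite_positive(sample):
    # Reference positive if either in-house assay is positive.
    return sample.s_elisa_positive or sample.daba_positive


def sensitivity_vs_composite(samples):
    """Return (true_positives, reference_positives, sensitivity)."""
    reference_positives = [s for s in samples if composite_positive(s)]
    if not reference_positives:
        raise ValueError("no reference-positive samples")
    true_positives = sum(s.lfia_positive for s in reference_positives)
    return true_positives, len(reference_positives), true_positives / len(reference_positives)


# Made-up records for illustration only.
toy_samples = [
    Sample(True, True, True),
    Sample(True, False, True),    # hybrid DABA positive only: still reference positive
    Sample(False, True, False),   # reference positive but missed by the LFIA
    Sample(False, False, False),  # reference negative: not in the denominator
]
tp, n_ref, sens = sensitivity_vs_composite(toy_samples)
print(f"{tp}/{n_ref} detected, sensitivity {sens:.0%}")
```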
Sensitivity in the laboratory for detection of SARS-CoV-2 antibodies was estimated for each LFIA using a minimum of 200 sera and compared with positive results on the S-ELISA or hybrid DABA. When testing the AbC-19 LFIA, a scoring card supplied by the manufacturer was used to grade the intensity of immunoglobulin G (IgG) bands on a scale of 1-10 (supplementary table i). All other LFIA results were interpreted as either IgG positive or negative. Specificity analysis was performed on prepandemic sera collected as part of the Airwave Health Monitoring Study before August 2019. 12 Round 2 used a different cohort of 500 prepandemic sera (Airwave2) from that used previously (Airwave1). 4

Test selection for clinic
In round 1 of the React 2 programme, 4 five LFIAs were initially evaluated using finger prick testing in the clinic. These LFIAs also underwent sensitivity and specificity analyses on sera from the assembled cohort and 500 prepandemic sera, respectively. A further six LFIAs underwent sensitivity analysis and four of these achieved sufficient sensitivity to proceed to specificity testing on 500 prepandemic sera. In round 1, specificity testing was performed on LFIAs that showed a sensitivity of more than 80%. The two best performing LFIAs, which also showed high specificity scores of 99.8% (Surescreen, Panbio), were identified for potential clinic evaluation in round 2 (fig 1).
In this study, we report the results of round 2 of the study in which seven further LFIAs were initially selected for evaluation based on the manufacturer's performance and published data, if available (supplementary table ii), before undergoing sensitivity analysis. LFIAs with a laboratory sensitivity greater than the sensitivity of the Fortress LFIA (>84%) proceeded to specificity testing on 500 prepandemic sera. The AbC-19 LFIA was selected for clinic evaluation because it showed the best performance in the laboratory and it has the potential to be procured at scale. Altogether, three LFIAs were selected for testing in the clinic in round 2: Panbio (Abbott 13 ), AbC-19 (TT3, Abingdon rapid test consortium 14 ), and Surescreen. 15

Participant recruitment in clinic
Clinical recruitment of participants took place over two periods. Round 2a ran from 17 June to 2 July 2020 and tested the Panbio and Surescreen tests in parallel in the same participants. Round 2b ran from 4 to 21 September 2020 and tested the AbC-19 LFIA. People who worked in one of five NHS hospitals in two NHS trusts were invited to participate. Additionally, participants who had been involved in previous rounds were invited to reattend to test new LFIAs. No participants tested the same LFIA more than once. Supplementary figure ii presents a detailed summary of the flow of participants during sensitivity evaluation in the clinic.
Eligibility criteria were broadened in round 2 (supplementary fig iii). People who were known to be or had been seropositive based on formal laboratory antibody testing performed before attending the study clinic, or people with PCR confirmed SARS-CoV-2 infection (or both) were included. Additionally, family members of staff could participate provided they had also received a positive PCR result or were previously confirmed to be seropositive on formal laboratory antibody testing. Finally, participants who were admitted to hospital with covid-19 and were previously excluded from the study could also take part. Participants were enrolled after a minimum of 21 days had passed from symptom onset or positive PCR result (whichever occurred earlier). People considered seropositive for SARS-CoV-2 antibodies on previous finger prick LFIA only were excluded.
Clinic procedure
Study clinics were run at two sites in round 2a and at a single site in round 2b. Participants were required to provide evidence of the result and date of a previous positive SARS-CoV-2 PCR or laboratory based antibody test, and to complete a questionnaire. The questionnaire included demographic information, medical history, and information detailing the timing, duration, and severity of illness caused by SARS-CoV-2 infection.
Each participant performed one or two LFIAs using capillary blood through finger prick, under the supervision of a research nurse or practitioner, before analysis against reference standards. Participants followed the protocol provided by the manufacturer and verbal instructions from trained research staff in the clinic to ensure that the test was performed correctly. Interpretation of the LFIA result by the participant and trained observer was recorded independently and photographs of the completed tests were obtained. At each attendance, a venous blood sample was taken for laboratory testing. Invalid or failed tests, where the control line was absent, were excluded from the analysis and participants repeated the test.
To enable direct comparison of performance with capillary blood in clinic and sera in the laboratory, one LFIA (AbC-19) was retested by a research technician using matched sera from clinic participants, according to the manufacturer's protocol (supplementary table i).

Sample size
Sample size was calculated assuming 90% power, SARS-CoV-2 infection prevalence of 100%, and expected test sensitivity of 85%. To evaluate sensitivity with a two sided delta of 10%, a target sample size of 153 participants was calculated. For specificity, a sample size of 361 was calculated based on an expected specificity of 98% with a lower limit of 95%.

Performance analysis
Statistical analysis was performed as previously described. 4 The primary outcome of the study was the sensitivity and specificity of each LFIA in detecting SARS-CoV-2 IgG antibodies. Sensitivity analysis included performance on finger prick self-testing (participant interpretation), finger prick self-testing (observer interpretation), and serum in the laboratory. Two comparisons were made: against confirmed previous SARS-CoV-2 infection (by PCR swab or previous laboratory antibody test) and against confirmed positive results by S-ELISA or hybrid DABA from venous samples taken at the study clinic appointment. As previously described, specificity was calculated as the proportion of known negative samples that were negative on the LFIAs. Data are presented using a binomial confidence interval of 95% and significance was denoted by a P value less than 0.05.
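As an illustration of the binomial 95% confidence intervals mentioned above, the sketch below computes an exact (Clopper-Pearson) interval with the Python standard library. That the exact method was the one used is an assumption; the text only says "binomial". The count of 499 negatives out of 500 is likewise inferred from a reported specificity of 99.8% on 500 sera rather than stated.

```python
import math


def _binom_pmf(k, n, p):
    # Binomial probability computed in log space to stay stable for large n.
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def _bisect(func, target, increasing, iterations=60):
    # Solve func(p) == target on [0, 1] for a function that is monotonic in p.
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two sided binomial confidence interval for successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    if successes == 0:
        lower = 0.0
    else:
        # P(X >= successes) increases with p; the lower limit makes it alpha / 2.
        upper_tail = lambda p: sum(_binom_pmf(i, n, p) for i in range(successes, n + 1))
        lower = _bisect(upper_tail, alpha / 2, increasing=True)
    if successes == n:
        upper = 1.0
    else:
        # P(X <= successes) decreases with p; the upper limit makes it alpha / 2.
        lower_tail = lambda p: sum(_binom_pmf(i, n, p) for i in range(successes + 1))
        upper = _bisect(lower_tail, alpha / 2, increasing=False)
    return lower, upper


# Assumed counts: 499 of 500 prepandemic sera negative (99.8% specificity).
low, high = clopper_pearson(499, 500)
print(f"{499 / 500:.1%} ({low:.1%} to {high:.1%})")
```

Under these assumptions the interval comes out at roughly 98.9% to 100%, which agrees with the specificity interval reported for the Surescreen and Panbio LFIAs.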
For comparison of clinic and laboratory performance of individual LFIAs, agreement was assessed using the κ statistic with the following interpretation: less than 0, poor agreement; 0.00-0.20, slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, good agreement; and higher than 0.80, almost perfect agreement. 16 Analysis of antibody concentrations is presented using quantitative S-ELISA data for round 2a and 2b. S-ELISA titres showed a skewed distribution and were log 10

Patient and public involvement
Public involvement and participant feedback have been central to the design of the React 2 programme. There has been extensive involvement from patient panels, and rigorous evaluation of the usability of LFIAs included in the React 2 studies has been undertaken. 5 User expressed feedback during clinics, and formal evaluation of instruction materials provided to manufacturers through patient panels, have been incorporated in reports to companies.

Results
(supplementary fig ii). Table 3 and supplementary table iv present their baseline characteristics. The sensitivities of the Surescreen and Panbio LFIAs, prioritised from round 1, were 88% (82.5% to 92.2%) and 91% (85.5% to 94.3%), respectively, on testing of sera, and both had 99.8% specificity (98.9% to 100%).
The Surescreen LFIA tested in the clinic attained a sensitivity of 86% (72.7% to 94.8%) compared with the benchmark ELISAs, which was not significantly different (P=0.8) from the sensitivity it had shown against round 1 sera in the laboratory and was not significantly higher (P=0.772) than that of the Fortress LFIA. By contrast, the AbC-19 and Panbio tests showed a lower sensitivity on finger prick capillary blood in the clinic compared with sera in the laboratory (table 1, table 2, and fig 2). The sensitivity of Panbio dropped significantly (P=0.018) from 91% (85.5% to 94.3%) in the laboratory with round 1 sera to 77% (61.4% to 88.2%) on finger prick blood in clinic versus S-ELISA or hybrid DABA.
Although the AbC-19 LFIA attained the highest sensitivity (100%, 98.1% to 100%) during testing with stored sera in the laboratory, its sensitivity with capillary blood through finger prick also fell significantly (P<0.001) to 69% (53.8% to 81.3%) versus S-ELISA or hybrid DABA. Supplementary table v includes additional sensitivity estimates for each LFIA by method of SARS-CoV-2 diagnosis (PCR v previous formal laboratory antibody testing) and by symptom severity.
Given the possibility that the cohort tested on AbC-19 had low antibody titres, matched serum samples from the participants evaluated in the clinic were tested in the laboratory on the AbC-19 LFIA and a sensitivity of 92% (80.0% to 97.7%) was found. This result was significantly higher than the sensitivity on finger prick testing in the clinic performed on the same participants (P=0.007). Concordance between the sensitivity for finger prick and for sera from these matched samples, determined by κ score, was slight (0.07, 95% confidence interval −0.14 to 0.28) in contrast to moderate (0.56, 0.25 to 0.86) for matched samples previously tested on the Fortress LFIA.
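The κ values reported above can be related to the interpretation bands given in the methods with a short sketch. Cohen's κ for two binary readings is assumed here; the 2x2 counts are made up for illustration, and the handling of values that fall between the published bands (for example between 0.20 and 0.21) is also an assumption.

```python
def cohen_kappa(both_pos, first_only, second_only, both_neg):
    """Cohen's kappa for two binary readings of the same samples (2x2 table)."""
    n = both_pos + first_only + second_only + both_neg
    if n == 0:
        raise ValueError("empty table")
    observed = (both_pos + both_neg) / n
    first_pos = (both_pos + first_only) / n
    second_pos = (both_pos + second_only) / n
    # Agreement expected by chance if the two readings were independent.
    expected = first_pos * second_pos + (1 - first_pos) * (1 - second_pos)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    # Bands as listed in the methods; each upper bound is treated as inclusive.
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "almost perfect"


# Made-up counts: finger prick v serum readings on the same 50 participants.
kappa = cohen_kappa(both_pos=20, first_only=5, second_only=10, both_neg=15)
print(f"kappa = {kappa:.2f} ({interpret_kappa(kappa)})")
print(interpret_kappa(0.07), interpret_kappa(0.56))  # the two values reported above
```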
Because of low rates of new SARS-CoV-2 infection locally at the time of round 2 clinic testing, we

Discussion
This study shows that LFIA sensitivity is variable on serum and finger prick testing, and often differs from that stated by the manufacturer. Specificity of all LFIAs that underwent this analysis was high. One further LFIA (Surescreen) is identified as suitable for use in seroprevalence studies because it showed comparable performance to the LFIA currently used in the React 2 seroprevalence studies (Fortress). However, the performance of Surescreen was not significantly better than Fortress; as a result, Fortress is still considered the most suitable candidate for ongoing rounds of the React 2 programme. LFIAs remain an important tool in the assessment of population seroprevalence of SARS-CoV-2 infection. Evaluation of new LFIAs often relies on performance using sera in the laboratory with only a minority of studies evaluating whole blood or capillary finger prick testing, which ultimately is the intended use. 17 An accurate assessment of the performance of LFIAs with capillary blood is key to interpreting large scale seroprevalence studies, and before clinical implementation, given the reduced sensitivity compared with laboratory analysis for some tests. 4 18 Laboratory testing is an essential component of LFIA evaluation. 15 18 In this study, as in previous work, we have shown that most LFIAs evaluated had lower sensitivity or specificity than reported in preliminary results by the manufacturers. Only three LFIAs in round 2, and eight of 18 LFIAs evaluated in the React 2 programme to date showed sufficient sensitivity and specificity during analysis on sera to justify progression to clinic testing.
Of the three LFIAs tested in the clinic in this study, two showed a significant difference in sensitivity between serum and finger prick testing. AbC-19, the best performing test in the laboratory (100% sensitivity, 95% confidence interval 98.1% to 100.0%; and 99.8% specificity, 98.9% to 100.0%), showed the lowest sensitivity (69%, 53.8% to 81.3%) compared with S-ELISA or hybrid DABA upon finger prick testing in the clinic. Of the remaining two tests, only the Surescreen LFIA showed a marginally higher sensitivity than the Fortress LFIA, which has been used in previous rounds of the React national seroprevalence study. Several potential reasons could explain differences in test performance between the laboratory and clinic. A possibility is that with sequential testing, participants recruited later after acute infection are likely to have lower antibody titres. Time elapsed post symptom onset was considerably different between all three rounds. Additionally, by broadening the inclusion criteria to include those with positive serology only (as opposed to PCR positivity), later cohorts might represent milder disease or be more likely to contain participants with false positive results. However, these more varied presentations of previous SARS-CoV-2 infection are a more accurate reflection of the population for which the LFIAs are intended to be used than healthcare employees testing positive on previous PCR alone.
Distribution of antibody titres was substantially different across all three periods of testing. A rapid fall in new SARS-CoV-2 infections locally after the first wave of the pandemic, and the timing at which new LFIAs became available for evaluation meant that there was a considerable difference in time since symptom onset in rounds 1, 2a, and 2b. Therefore, we considered whether a possible threshold effect relating to antibody concentration could account for a drop off in sensitivity observed between laboratory and clinic testing. However, on evaluation of the lowest quarter of serum antibody concentration from samples across all three rounds, the Fortress LFIA detected a numerically higher proportion of samples with a low antibody titre on finger prick testing than AbC-19. Additionally, the median antibody titre in round 2a was higher than that of round 1; despite this, the Panbio LFIA also showed a (non-significant) difference in sensitivity between serum and finger prick testing. Finally, the considerable difference in sensitivity on the AbC-19 LFIA between finger prick and sera testing on matched samples from round 2b suggests other factors in test design and use could be more important.

Study limitations
Several limitations are acknowledged in this study. Important differences were found between cohorts and corresponding antibody titres. Additionally, while laboratory staff were blinded to the results of LFIA interpretation on finger prick testing, participants and research staff in the clinic were aware that all participants had evidence of previous SARS-CoV-2 infection. Therefore, LFIA interpretation in the clinic was not blinded, which could have led to an overestimation of LFIA sensitivity on self-testing. In round 2a, participants were evaluated on two LFIAs in the same appointment and it is possible that the result of the test evaluated first could have influenced the interpretation of the second test.

Conclusions
This study confirms the importance of assessing LFIAs in the intended population because laboratory results might not accurately predict performance in the clinic. We have shown that analyses should account for changes in antibody levels over time and the comparison of tests on a consistent cohort of laboratory sera remains an important part of the evaluation. Through a robust approach to LFIA evaluation, we characterised the performance of nine LFIAs, and identified one new LFIA with performance comparable to the Fortress LFIA, which has been used successfully in large seroprevalence studies. For our React 2 programme, a new LFIA would have to perform considerably better than the Fortress LFIA to outweigh the scientific value in repeating seroprevalence surveys using the same assay. At this time, no LFIA offers enough improvement in performance to merit inclusion in subsequent React 2 seroprevalence studies. Additionally, no LFIA has reached the standard set by the Medicines and Healthcare products Regulatory Agency for individual use.

We thank all the participants who volunteered for finger prick testing to help with this study. We extend our gratitude to Margaret-Anne Bevan, Helen Stockmann, Danielle Davy, Chloe Wood, Billy Hopkins, Miranda Cowen, Norman Madeja, Nidhi Gandhi, Vaishali Dave, and Narvada Jugnee who ran the React 2 antibody testing clinics. We are grateful for the support from the Imperial National Institute for Health