Accuracy of urinary human papillomavirus testing for presence of cervical HPV: systematic review and meta-analysis
BMJ 2014; 349 doi: https://doi.org/10.1136/bmj.g5264 (Published 16 September 2014) Cite this as: BMJ 2014;349:g5264
- Neha Pathak, academic foundation year 2 doctor (1),
- Julie Dodds, senior clinical trials manager (1),
- Javier Zamora, senior lecturer in perinatal statistics (1, 2),
- Khalid Khan, professor of women’s health and clinical epidemiology (1)
- (1) Women’s Health Research Unit, Centre for Primary Care and Public Health, Blizard Institute, Barts and The London School of Medicine and Dentistry, London E1 2AB, UK
- (2) Clinical Biostatistics Unit, Hospital Ramon y Cajal (IRYCIS) and CIBER Epidemiologia y Salud Publica, Madrid, Spain
- Correspondence to: J Zamora
- Accepted 15 August 2014
Objective To determine the accuracy of testing for human papillomavirus (HPV) DNA in urine in detecting cervical HPV in sexually active women.
Design Systematic review and meta-analysis.
Data sources Searches of electronic databases from inception until December 2013, checks of reference lists, manual searches of recent issues of relevant journals, and contact with experts.
Eligibility criteria Test accuracy studies in sexually active women that compared detection of urine HPV DNA with detection of cervical HPV DNA.
Data extraction and synthesis Data on patient characteristics, study context, risk of bias, and test accuracy were extracted. 2×2 tables were constructed and synthesised by bivariate mixed effects meta-analysis.
Results 16 articles reporting on 14 studies (1443 women) were eligible for meta-analysis. Most used commercial polymerase chain reaction methods on first void urine samples. Urine detection of any HPV had a pooled sensitivity of 87% (95% confidence interval 78% to 92%) and specificity of 94% (95% confidence interval 82% to 98%). Urine detection of high risk HPV had a pooled sensitivity of 77% (68% to 84%) and specificity of 88% (58% to 97%). Urine detection of HPV 16 and 18 had a pooled sensitivity of 73% (56% to 86%) and specificity of 98% (91% to 100%). Metaregression revealed an increase in sensitivity when urine samples were collected as first void compared with random or midstream (P=0.004).
Limitations The major limitations of this review are the lack of a strictly uniform method for the detection of HPV in urine and the variation in accuracy between individual studies.
Conclusions Testing urine for HPV seems to have good accuracy for the detection of cervical HPV, and testing first void urine samples is more accurate than random or midstream sampling. When cervical HPV detection is considered difficult in particular subgroups, urine testing should be regarded as an acceptable alternative.
Human papillomavirus (HPV) is one of the commonest sexually transmitted infections. Up to 80% of sexually active women are infected at some point in their lives and 10-20% develop persistent infection.1 Infection with specific strains of HPV has been associated with the development of cervical cancer,2 a preventable and treatable disease for which routine screening using a cervical cytology based method is employed to detect precancerous cervical intraepithelial neoplasia (CIN). Despite screening, cervical cancer is still the most common malignancy in women aged less than 35, and there has been a downward trend in coverage of screening in this population.3 4 This may partly be because the current screening by cervical cytology sampling is invasive, is time consuming, and requires a clinician.
The detection of HPV in the cervix is being piloted as a new method of cervical cancer screening and is recommended for secondary prevention.5 This recommendation is based on four randomised controlled trials and a pooled analysis of these, which showed that HPV based screening provides greater protection against grade 3 CIN and invasive cervical cancer than current screening methods.6 7 8 9 However, cervical HPV detection shares many of the problems of current cytology based screening programmes: it is still invasive and time consuming, and it is unlikely to resolve the problem of poor screening uptake. Detection of HPV in urine would offer a more accessible and acceptable method.10 It could also be used in post-vaccination HPV surveillance programmes, where pelvic examination is not practical. Published reviews assessing detection of urine HPV conclude that urine sampling is a feasible alternative to cervical sampling; however, they do not include a meta-analysis of test accuracy.11 12 We conducted a systematic review and meta-analysis to determine the accuracy of detection of HPV in urine compared with the cervix in sexually active women.
A prospective protocol was registered on PROSPERO (identification number CRD42013006928).13 This review was performed using recommended methods and reported in accordance with the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement.14 15
We searched several electronic sources from inception to December 2013: Medline, Embase, the Cochrane Library, Web of Science, BIOSIS, DARE, and SIGLE. We combined MeSH and free text terms using Boolean logic, with the following search terms: urin*, self, home, test*, detect*, screen*, diagnos*, DNA, deoxyribonucleic acid, polymerase chain reaction, NAAT, NAT, nucleic acid test, nucleic acid amplification test, HPV, human papillomavir*, cervical cancer, and cervical pre-cancer. We manually searched recent issues of relevant publications and the reference lists of included texts and relevant articles. Experts were contacted for additional studies and data. There were no language restrictions.
Eligible studies were test accuracy studies that compared the detection of HPV DNA in urine with its detection in the cervix in sexually active women concerned about HPV infection or the development of cervical cancer. We excluded studies that used a different reference standard or none. We included studies in the meta-analysis if 2×2 tables could be constructed from published or requested data. Certain study designs can overestimate the diagnostic value of a test;16 we therefore excluded studies from the meta-analysis if they used a case-control design, tested only patients with cervical cancer, or included no non-infected participants.
Study selection and data extraction
We screened all titles and abstracts for relevant studies. Two reviewers (NP and JD) independently reviewed full texts for final selection. They documented reasons for exclusion.
We developed a data extraction sheet, piloted it on randomly selected studies, and refined it appropriately. Two reviewers (NP and JD) extracted the following data independently: study characteristics (authors, year of publication, country, context and purpose of testing), patient characteristics (including mean age and range, HIV status, cytology and biopsy results), characteristics of the index test (urine sample type, sample volume, storage temperature, DNA extraction method, DNA amplification method, timing of test in relation to reference standard), and accuracy of results into 2×2 tables of urine positivity versus cervical swab positivity for any HPV, high risk HPV, and HPV 16 and 18. We considered the following HPV strains to be high risk: 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 68, 73, and 82. We emailed study authors for missing data.
We discussed all discrepancies and involved a third independent reviewer (KK) if the discrepancy could not be resolved.
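To make the construction of the three 2×2 tables concrete, a minimal Python sketch follows. It assumes, for illustration only, that each woman's paired result is available as the set of genotypes detected in urine and in the cervical sample; the function names and data layout are hypothetical and do not represent the extraction sheet used in this review. A sample is counted as positive for a group if any detected genotype belongs to that group, which ignores type-specific concordance (HPV 16 in urine and HPV 18 in the cervix would count as agreement for the HPV 16 and 18 table).

```python
"""Sketch: 2x2 tables of urine (index test) versus cervical (reference) HPV.

Assumption (illustrative only): each woman's result is a pair of sets of
genotypes detected in urine and in the cervical sample.
"""

# High risk genotypes as listed in the review's methods
HIGH_RISK = {16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 68, 73, 82}
HPV_16_18 = {16, 18}


def is_positive(genotypes: set, group: str) -> bool:
    """True if any detected genotype falls within the requested group."""
    if group == "any":
        return len(genotypes) > 0
    if group == "high_risk":
        return bool(genotypes & HIGH_RISK)
    if group == "hpv16_18":
        return bool(genotypes & HPV_16_18)
    raise ValueError(f"unknown group: {group}")


def two_by_two(pairs, group: str):
    """Count (TP, FP, FN, TN) with the cervical sample as reference standard."""
    tp = fp = fn = tn = 0
    for urine, cervix in pairs:
        u, c = is_positive(urine, group), is_positive(cervix, group)
        if u and c:
            tp += 1
        elif u:
            fp += 1
        elif c:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical paired results for four women
pairs = [({16}, {16}), (set(), {6}), ({6}, set()), (set(), set())]
print(two_by_two(pairs, "any"), two_by_two(pairs, "high_risk"))
```

With these hypothetical pairs, the low risk HPV 6 result counts towards the "any HPV" table but not the high risk table, which is why the same women can contribute different cells to each of the three analyses.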
Assessment of study quality
We applied the QUADAS-2 tool to all studies.17 Quality assessment involved scrutinising patient selection, conduct of the index test, conduct of the reference standard, and patient flow. We considered a study to be high quality if it used an appropriate patient spectrum, it used consecutive or random recruitment of participants, all participants used the same reference standard, the index and reference standard were performed within two weeks, and the majority of recruited participants were included in analyses.
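The definition of a high quality study above is a conjunction of five criteria. The sketch below encodes it under two interpretive assumptions that the text does not spell out: "within two weeks" is read as 14 days or fewer, and "the majority" as more than half of recruited participants. The `StudyQuality` record is hypothetical and is not part of the QUADAS-2 tool itself.

```python
from dataclasses import dataclass


@dataclass
class StudyQuality:
    """Hypothetical record of the quality criteria named in the methods."""
    appropriate_spectrum: bool      # not restricted to a high prevalence group
    consecutive_or_random: bool     # consecutive or random recruitment
    same_reference_standard: bool   # all participants had the same reference test
    days_between_tests: int         # interval between urine and cervical samples
    recruited: int
    analysed: int


def is_high_quality(study: StudyQuality) -> bool:
    """True only if every criterion holds (thresholds are assumptions)."""
    return (
        study.appropriate_spectrum
        and study.consecutive_or_random
        and study.same_reference_standard
        and study.days_between_tests <= 14        # "within two weeks", read as inclusive
        and study.analysed > study.recruited / 2  # "majority", read as more than half
    )


# Hypothetical study: same day testing, 94 of 100 recruited women analysed
print(is_high_quality(StudyQuality(True, True, True, 0, 100, 94)))
```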
The following were considered to be inappropriate patient spectrums that introduced bias as a result of a higher prevalence of HPV: populations comprising only patients with HIV, cervical cancer, or high grade CIN, or whose age was below current screening recommendations. We did not consider lack of blinding to test results as posing a high risk of bias, as the HPV test is objective. We assessed publication bias by regressing the log diagnostic odds ratio, log(DOR), on the inverse square root of the effective sample size. However, this result should be interpreted cautiously given the low statistical power of this test and the absence of consensus on adequate methods to detect publication bias.15 18
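The publication bias test described above corresponds to the funnel plot asymmetry test proposed by Deeks and colleagues for diagnostic accuracy reviews. The plain Python sketch below regresses ln(DOR) on 1/√ESS with ESS as regression weights, where ESS = 4·n1·n0/(n1 + n0) for n1 infected and n0 non-infected women. It illustrates the calculation and is not the software used in this review; the 0.5 continuity correction for zero cells is an assumption, and converting the t statistic into a P value needs the t distribution with k - 2 degrees of freedom (for example `scipy.stats.t.sf`), which is left out to keep the example dependency free.

```python
import math


def effective_sample_size(n_infected: int, n_uninfected: int) -> float:
    """ESS = 4 * n1 * n0 / (n1 + n0)."""
    return 4 * n_infected * n_uninfected / (n_infected + n_uninfected)


def log_dor(tp: int, fp: int, fn: int, tn: int) -> float:
    """Natural log of the diagnostic odds ratio (TP*TN)/(FP*FN).

    Assumption: 0.5 is added to every cell when any cell is zero.
    """
    cells = (tp, fp, fn, tn)
    if 0 in cells:
        cells = tuple(c + 0.5 for c in cells)
    tp, fp, fn, tn = cells
    return math.log((tp * tn) / (fp * fn))


def deeks_test(tables):
    """Weighted least squares of ln(DOR) on 1/sqrt(ESS), weighted by ESS.

    tables: list of (TP, FP, FN, TN) with cervical HPV as reference standard.
    Returns (slope, t statistic, degrees of freedom); needs at least 3 studies.
    """
    if len(tables) < 3:
        raise ValueError("at least three studies are needed")
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        xs.append(1 / math.sqrt(ess))
        ys.append(log_dor(tp, fp, fn, tn))
        ws.append(ess)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    df = len(tables) - 2
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / df / sxx)
    t_stat = slope / se if se > 0 else float("nan")
    return slope, t_stat, df
```

A slope close to zero, with a small t statistic, is consistent with no funnel plot asymmetry; as the text notes, a non-significant result is weak evidence given the low power of the test.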
Data synthesis was performed according to a priori hypotheses outlined in the protocol. We constructed 2×2 tables for detection of any HPV, high risk HPV, and HPV 16 and 18. For each of these three groups we fitted a bivariate mixed effects logistic regression model. From the estimates we derived a summary receiver operating characteristic curve and the following summary accuracy measures with 95% confidence intervals: sensitivity (true positive rate), specificity (true negative rate), positive likelihood ratio, and negative likelihood ratio. Where studies used more than one method of urine HPV testing, we included the method that was most similar to that of other studies in this review.
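The bivariate mixed effects logistic regression used here models logit sensitivity and logit specificity jointly and allows for their correlation across studies; fitting it requires a mixed model routine and is beyond a short example. As a simpler stand-in, the sketch below pools a single proportion (for example sensitivity, from true positives among women with cervical HPV) with the univariate DerSimonian-Laird random effects method on the logit scale. This is a different technique from the one used in the review: it ignores the correlation between sensitivity and specificity and would not necessarily reproduce the reported estimates. It only shows how between study variance (tau²) widens a pooled confidence interval.

```python
import math


def expit(x: float) -> float:
    """Inverse logit."""
    return 1 / (1 + math.exp(-x))


def pool_proportions(counts, z: float = 1.96):
    """DerSimonian-Laird random effects pooling of proportions on the logit scale.

    counts: list of (events, total), e.g. (true positives, women with cervical
    HPV) for sensitivity or (true negatives, women without) for specificity.
    Assumption: 0.5 is added to both cells of a study only when it has 0 or
    `total` events. Returns (pooled, lower, upper, tau2).
    """
    ys, vs = [], []
    for events, total in counts:
        a, b = events, total - events
        if a == 0 or b == 0:
            a, b = a + 0.5, b + 0.5
        ys.append(math.log(a / b))      # logit of the study proportion
        vs.append(1 / a + 1 / b)        # approximate variance of that logit
    w = [1 / v for v in vs]             # fixed effect (inverse variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(counts) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]  # random effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(y_re), expit(y_re - z * se), expit(y_re + z * se), tau2
```

Identical studies give tau² = 0 and a pooled value equal to the common proportion; the more the study logits disagree, the larger tau² and the wider the interval, which mirrors (in one dimension) the wide prediction regions reported in the results.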
To visually explore heterogeneity, we generated forest plots for test sensitivity (true positive rate) and test specificity (true negative rate) with 95% confidence intervals for individual studies. To investigate sources of heterogeneity for both sensitivity and specificity, we included in the bivariate mixed effects models the following planned covariates: purpose of testing (HPV surveillance versus cervical cancer screening and follow-up of CIN), mean age, HIV status (positive versus negative for antibodies to HIV), prevalence of low grade or worse intraepithelial lesions on cytology, prevalence of grade 2 or worse CIN on biopsy, urine sampling method (first void urine versus random and midstream urine), HPV detection method (real time polymerase chain reaction (PCR) and nested PCR versus conventional PCR), use of non-commercial versus commercial DNA extraction methods, use of non-commercial versus commercial DNA amplification methods, and low versus high risk of bias as a result of patient selection. Owing to the restricted number of studies, we entered only one covariate in each analysis. We also did a sensitivity analysis to investigate the effect of studies including a narrow patient spectrum.
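Each row of a forest plot shows one study's sensitivity and specificity with a 95% confidence interval. The review does not state which interval method was used; the sketch below assumes the Wilson score interval, a common choice that stays within 0 to 1 even when a study has no false negatives or false positives. The study label and counts are hypothetical.

```python
import math


def wilson_interval(x: int, n: int, z: float = 1.96):
    """Wilson score interval for the binomial proportion x / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = x / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def forest_rows(studies):
    """Yield (label, sens, sens_lo, sens_hi, spec, spec_lo, spec_hi).

    studies: iterable of (label, (TP, FP, FN, TN)).
    """
    for label, (tp, fp, fn, tn) in studies:
        sens_lo, sens_hi = wilson_interval(tp, tp + fn)   # among cervical HPV positive
        spec_lo, spec_hi = wilson_interval(tn, tn + fp)   # among cervical HPV negative
        yield (label, tp / (tp + fn), sens_lo, sens_hi,
               tn / (tn + fp), spec_lo, spec_hi)


for row in forest_rows([("Study A (hypothetical)", (8, 1, 2, 9))]):
    print(row)
```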
Statistical analyses were performed using the metandi and midas commands in Stata (version 13.0) and the METADAS macro in SAS (version 9.3).
Figure 1 summarises the identification and selection of studies. Of the 1373 potential records, 23 articles reporting on 21 studies (2277 sexually active women) were included in the systematic review.10 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 Of these, 16 articles reporting on 14 studies (1535 women recruited, 1443 women analysed) were included in the meta-analysis.19 20 21 22 24 27 28 29 30 31 34 35 36 37 38 39
Description of studies
Supplementary appendix 1 details the characteristics of individual studies. Twelve out of 21 study populations were recruited from gynaecology or colposcopy outpatient clinics and seven from genitourinary medicine or HIV clinics. For most study populations the purpose of testing was for cervical cancer screening (15/21). The remainder were for HPV surveillance (5/21) or follow-up of CIN (1/21). Four out of the 21 populations were positive for HIV. Of the 11 populations with reported cytology results, 35.9% (304/847) of women had low grade dysplasia or worse. Of the 10 populations with reported biopsy results, 54.1% (385/712) of women had grade 2 or worse CIN and 17.0% (121/712) had biopsy proved cervical cancer. Most of the studies used conventional PCR (18/21), but testing methods were not uniform. Two of the 21 studies used nested PCR and one out of the 21 used PCR based DNA microarray.32 33 34 Three studies evaluated quantitative real time PCR and hybrid capture in addition to conventional PCR.10 23 36 In these cases, only the results for conventional PCR were included in the meta-analysis. The majority of urine sampling was first void (12/21). Other sampling methods included random (2/21), midstream (2/21), morning (1/21), and not specified (4/21). Urine storage temperature ranged from −70°C to 4°C. Sixteen studies used commercial DNA extraction kits and 11 used commercial amplification platforms. The remainder used in-house methods. The reference standard in all studies was a cervical sample taken by a clinician to test for HPV DNA.
Quality of studies
Figure 2 outlines the quality assessment of studies included in the meta-analysis. All included studies avoided case-control designs, and most studies (9/14) used consecutive or random recruitment of participants. Six studies had a high risk of bias for patient selection owing to narrow patient spectrums: four articles reported on three studies of only patients with HIV,27 28 35 39 two studies reported on only adolescents,21 36 and one study reported on only patients with high grade CIN.24 All studies had a low risk of bias for patient flow and timing; 13/14 analysed all recruited participants and one analysed 94% of recruited participants. The interval between tests was appropriate, with 8/14 studies completing both tests on the same day and taking urine samples before cervical samples. All studies had a low risk of bias for the conduct of the reference standard. Five of the 14 studies used in-house methods for the index test and did not specify a threshold; these were rated as having an unclear risk of bias. The remaining nine studies were rated as having a low risk of bias because they used a prespecified index test threshold. Only one study reported blinding to test results,37 although DNA testing is objective and should not result in bias. Regarding applicability of studies to the review question, there were no concerns about the index test or reference standard. Most studies (19/21) were rated as having low concern about applicability of patient selection to the review question. The two rated as having high concern were studies of adolescents only, because they included patients who would not normally be screened.21 36 We found no significant asymmetry in the funnel plot (P=0.62) and hence no evidence of publication bias.
Supplementary appendix 2(a-c) illustrates the variation in sensitivity and specificity between individual studies for urine detection of any HPV (14 studies), high risk HPV (11 studies), and HPV 16 and 18 (11 studies). For urine detection of any HPV, individual sensitivities ranged from 53% (ref 19) to 99% (ref 35) and specificities from 38% (ref 27) to 99% (ref 31). For urine detection of high risk HPV, individual sensitivities ranged from 50% (ref 19) to 98% (ref 30) and specificities from 17% (ref 35) to 99% (ref 22). For urine detection of HPV 16 and 18, individual sensitivities ranged from 23% (ref 20) to 97% (ref 30) and specificities from 56% (refs 28, 39) to 99% (ref 31).
Figure 3 summarises the pooled sensitivities and specificities as summary receiver operating characteristic curves for the same three groups. Urine detection of any HPV had a pooled sensitivity of 87% (95% confidence interval 78% to 92%) and specificity of 94% (95% confidence interval 82% to 98%). Urine detection of high risk HPV had a pooled sensitivity of 77% (68% to 84%) and specificity of 88% (58% to 97%). Urine detection of HPV 16 and 18 had a pooled sensitivity of 73% (56% to 86%) and specificity of 98% (91% to 100%). The 95% prediction regions consistently occupy the whole upper left quadrant of the receiver operating characteristic plots in figure 3, indicating high heterogeneity between studies. For detection of HPV 16 and 18, the 95% prediction region shows the most heterogeneity, occupying most of the plot in figure 3. For detection of high risk HPV, between study variance was higher for specificity (6.0, 95% confidence interval 1.8 to 19.7) than for sensitivity (0.4, 95% confidence interval 0.1 to 2.2; fig 3). For detection of any HPV, the positive likelihood ratio was 15.22 (95% confidence interval 4.56 to 50.81) and the negative likelihood ratio was 0.14 (95% confidence interval 0.10 to 0.20). For detection of high risk HPV, the positive likelihood ratio was 6.33 (1.48 to 27.00) and the negative likelihood ratio was 0.26 (0.16 to 0.41). For detection of HPV 16 and 18, the positive likelihood ratio was 36.97 (6.77 to 201.91) and the negative likelihood ratio was 0.27 (0.15 to 0.49).
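The likelihood ratios follow from sensitivity and specificity: LR+ = sensitivity/(1 - specificity) and LR- = (1 - sensitivity)/specificity. Applying these formulas to the rounded pooled point estimates gives values close to, but not identical with, those reported above, because the review derived its likelihood ratios from the fitted bivariate model. The short Python check below uses the figures for any HPV.

```python
def likelihood_ratios(sens: float, spec: float):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sens / (1 - spec)   # P(positive | infected) / P(positive | not infected)
    lr_neg = (1 - sens) / spec   # P(negative | infected) / P(negative | not infected)
    return lr_pos, lr_neg


# Pooled point estimates for urine detection of any HPV (sensitivity 87%, specificity 94%)
lr_pos, lr_neg = likelihood_ratios(0.87, 0.94)
print(round(lr_pos, 1), round(lr_neg, 2), round(1 / lr_neg, 1))  # 14.5 0.14 7.2
```

The naive LR+ of about 14.5 is close to the model based 15.22, and the naive LR- matches the reported 0.14. The reciprocal of LR- (about 7) is the basis for the statement that negative results are roughly seven times more likely in non-infected than in infected women.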
Sources of heterogeneity
The table summarises the results of the bivariate metaregression based on planned covariates. There was a 22-fold increase in overall accuracy when samples were collected as first void urine compared with random or midstream urine samples (relative diagnostic odds ratio 21.7, 95% confidence interval 1.3 to 376). However, this difference in accuracy was driven entirely by a significant increase in the sensitivity of first void urine (relative sensitivity 1.2, 95% confidence interval 1.06 to 1.37, P=0.004). Specificity was not affected by the urine sampling method (P=0.46). Purpose of testing, mean age of participants, HIV status, cytology and biopsy results, detection methods, use of commercial methods, and risk of bias as a result of patient selection did not explain heterogeneity in accuracy between studies.
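The diagnostic odds ratio combines both dimensions of accuracy: DOR = [sensitivity/(1 - sensitivity)] × [specificity/(1 - specificity)] = LR+/LR-. When two subgroups share the same specificity, the specificity terms cancel and the relative DOR reduces to the ratio of the odds of sensitivity, which is how a change in sensitivity alone can drive the relative DOR. The sketch below uses hypothetical subgroup values, not data from the review; the reported relative DOR of 21.7 comes from the bivariate metaregression and cannot be reproduced with this arithmetic.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1 - p)


def diagnostic_odds_ratio(sens: float, spec: float) -> float:
    """DOR = odds(sensitivity) * odds(specificity), equivalently LR+ / LR-."""
    return odds(sens) * odds(spec)


def relative_dor(sens_a: float, spec_a: float, sens_b: float, spec_b: float) -> float:
    """DOR of subgroup a divided by DOR of subgroup b."""
    return diagnostic_odds_ratio(sens_a, spec_a) / diagnostic_odds_ratio(sens_b, spec_b)


# Hypothetical subgroups (not review data): equal specificity of 0.94 and
# sensitivity 0.90 versus 0.75, i.e. a relative sensitivity of 1.2.
# Specificity cancels, so the result is odds(0.90) / odds(0.75) = 9 / 3.
print(round(relative_dor(0.90, 0.94, 0.75, 0.94), 6))  # 3.0
```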
Pooled sensitivity and specificity for detection of any HPV in urine were similar when studies with a narrow spectrum of patients were excluded.21 24 27 28 35 36 39 Sensitivity was 80% (95% confidence interval 71% to 88%) and specificity was 98% (95% confidence interval 89% to 100%).
Our review shows that detection of human papillomavirus (HPV) DNA in urine has good accuracy for the presence of cervical HPV. Sensitivity was moderate for detection of any HPV, high risk HPV, and HPV 16 and 18. The specificity for detection of HPV in urine was especially high for any HPV and for the most oncogenic strains, HPV 16 and 18.
Strengths and weaknesses of this study
The strength of this review lies in its methodology. We followed guidelines for the conduct and reporting of systematic reviews to ensure high quality study selection and data extraction.14 15 17 We undertook an extensive literature search using all relevant electronic databases, and we manually searched through references and journals. Two reviewers independently reviewed all titles, abstracts, and full texts, and extracted data, with no restriction on language.
Our meta-analysis included 14 studies and a large sample of women. The quality of included studies was generally high. The main deficiency in included studies was the use of narrow patient spectrums in six studies, which included only participants with HIV, adolescents, or participants with high grade cervical intraepithelial neoplasia (CIN).21 24 27 28 35 36 39 Such populations have a high prevalence of HPV, which could bias estimates of test accuracy.16 41 However, metaregression analysis did not reveal any variation in accuracy when we used HIV status, mean age, or biopsy results as covariates. Although we could not perform a multivariable metaregression analysis owing to the limited number of studies available, a sensitivity analysis excluding the six studies with narrow patient spectrums had little impact on point estimates of pooled accuracy measures.
Although most of the studies used conventional polymerase chain reaction (PCR), allowing us to pool results, we did identify heterogeneity in testing methods. Studies were conducted in diverse settings, including primary and secondary care, with different HPV testing platforms and conditions. One previous review states that such heterogeneity makes pooled sensitivities and specificities redundant.11 However, we argue that confirming a high accuracy, despite variations in testing methods, makes the test worthy of further investigation and standardisation. We also performed a metaregression to identify whether this variation in testing methods affected results. Only the urine sampling method was identified as a source of heterogeneity, with a 22-fold reduction in accuracy when samples were collected as random or midstream samples rather than as first void. This is an important and expected finding, as first void urine samples contain higher levels of DNA, making them more amenable to PCR. They are the sample of choice for viral DNA detection, and our findings show that they should be the sample of choice for urine HPV detection.42
A major limitation of this meta-analysis is the between study variation in pooled sensitivities and specificities, which is evident in figure 3. This means that all results must be interpreted with caution, as they may have been overestimated or underestimated.
Comparison with existing literature
Three reviews have been published on the detection of HPV DNA in urine. The first concluded that urine HPV detection was worse than cervical HPV detection at predicting CIN.43 The second focused on surveillance in adolescents rather than in women at an age to be included in cervical cancer screening programmes.12 The third appraised the potential importance of variations in urine sampling, storage, and testing methods.11 The latter two reviews concluded that urine HPV detection could be an adequate tool in women, but none of the three reviews included a meta-analysis to support their conclusions.
Our review provides meta-analysis and metaregression demonstrating the accuracy of detection of HPV in urine for the presence of cervical HPV. We also update the literature by reporting on four additional studies (476 women) published since previous reviews.28 35 37 39 44 We agree with previous reviews that heterogeneous methods of urine testing affect the interpretation of pooled accuracy measures and that a uniform method for detection of HPV in urine must be developed. However, by providing quantitative evidence of accuracy in urine HPV detection and establishing that urine sampling affects accuracy, our review can drive the prioritisation of efforts to standardise urine HPV testing.
Implications for clinical practice and future research
The detection of HPV in urine is non-invasive, easily accessible, and acceptable to women,10 and a test with these qualities could considerably increase uptake. Urine based testing has been successful for the detection of common sexually transmitted infections, including Chlamydia trachomatis and Neisseria gonorrhoeae.45 We have shown that testing urine for HPV could accurately replace cervical testing for HPV in this context. In particular, the high specificity and large positive likelihood ratio produce substantial changes in the probability of infection for a woman with a positive test result: our review predicts that positive test results are 15 times more likely to occur in infected women than in non-infected women. This is a major strength of the testing method, as false positive results would lead to women undergoing unnecessary invasive investigations, including cytology, colposcopy, or biopsy, to rule out disease. This would generate increased anxiety and costs, which could be reduced by urine based testing. The high specificity of this test makes this scenario less likely and could thereby increase trust and uptake.
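What a likelihood ratio means for an individual woman depends on her pre-test probability of infection. Using Bayes' theorem in odds form (post-test odds = pre-test odds × likelihood ratio), the sketch below converts the pooled likelihood ratios for any HPV into post-test probabilities. The 20% pre-test probability is an arbitrary value chosen for illustration, not an estimate from this review.

```python
def post_test_probability(pre_test: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical pre-test probability of 20% (illustrative only); the likelihood
# ratios are the pooled values for any HPV reported in the results.
for label, lr in (("positive", 15.22), ("negative", 0.14)):
    print(label, round(post_test_probability(0.20, lr), 3))
# positive 0.792
# negative 0.034
```

Under this assumed prevalence, a positive urine result raises the probability of cervical HPV from 20% to about 79%, and a negative result lowers it to about 3%.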
However, our results must be interpreted with caution for several reasons. Firstly, sensitivity was not as high as specificity, and negative test results are only seven times more likely to occur in non-infected women than in infected women. Secondly, there was wide variation in accuracy between individual studies, and pooled sensitivities and specificities may have been overestimated or underestimated. The consequences of overestimation are especially important as they can lead to unacceptable morbidity and mortality. False negative results would lead to missing cases of precancerous or cancerous lesions, and false positive results would lead to over-investigation and anxiety. Both scenarios could easily result in a lack of trust in HPV testing. To confidently adopt the test into current practice, test methods must be more consistent and reproducible. As the consequences of false negative results are serious (missing cervical precancer and cancer), the test could be done more frequently than current screening methods. This would improve the chances of minimising false negative results. Our metaregression identified that the current variation between study results is partly explained by different urine sampling techniques. No other explanations of heterogeneity were identified by our metaregression, including analysis of the method for HPV detection and commercial versus non-commercial extraction and amplification methods. However, these analyses are based on a limited number of studies, and therefore we cannot exclude the presence of other associations. We therefore recommend the standardisation of methods for urine testing to minimise variation before incorporating urinary detection of HPV into guidelines for cervical cancer screening. The World Health Organization HPV Laboratory Network is active in this domain, and our results should drive further prioritisation of urine testing.46
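The suggestion that more frequent testing could offset the lower sensitivity can be illustrated with simple arithmetic. If a woman is considered positive when any of several tests is positive, sensitivity of the strategy rises as 1 - (1 - sensitivity)^n, while its specificity falls as specificity^n. The sketch assumes that test errors are independent between rounds and that infection status does not change between them; both assumptions are optimistic (a woman whose infection is missed once may be more likely to be missed again), so it probably overstates the gain in sensitivity.

```python
def repeated_testing(sens: float, spec: float, rounds: int):
    """Sensitivity and specificity of a strategy that calls a woman positive
    if any of `rounds` urine tests is positive.

    Assumptions (illustrative only): errors are independent between rounds
    and infection status does not change between them.
    """
    strategy_sens = 1 - (1 - sens) ** rounds  # missed only if every test misses
    strategy_spec = spec ** rounds            # negative only if every test is negative
    return strategy_sens, strategy_spec


# Pooled estimates for any HPV (0.87, 0.94): one versus two rounds of testing
for n in (1, 2):
    s, p = repeated_testing(0.87, 0.94, n)
    print(n, round(s, 4), round(p, 4))
```

Under these assumptions, two rounds raise sensitivity from 0.87 to about 0.983 but lower specificity from 0.94 to about 0.884, so the benefit for false negatives comes at the cost of more false positives.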
Finally, our review focused on the accuracy of detecting HPV in the cervix, so that where cervical testing for HPV is being considered for guidelines on cervical cancer screening, urine based testing can be considered instead. Published guidance on recommending diagnostic tests emphasises that test accuracy is only a surrogate measure for patient important outcomes, which must also be considered before adopting a screening method.47 In the case of cervical cancer screening, patient important outcomes that must be considered include acceptability of testing, prediction of CIN or invasive cancer, management of positive test results, and safe intervals for testing between negative test results. Although acceptability of urine HPV testing has already been shown in published literature,10 the remaining outcomes have not. We were unable to do a meta-analysis on accuracy of CIN prediction as too few studies reported adequate data. Two studies included in our meta-analysis reported relevant data. The first had a sensitivity of 62.7% and a specificity of 47.1%,37 and the second a sensitivity of 80.8%.24 We were unable to calculate a valid estimate of specificity for the second study because it included only patients with high grade CIN, so there were no participants without disease. Future test accuracy studies must report paired results for detection of HPV in urine and cytology or biopsy outcomes so that these can be meta-analysed. To our knowledge, pathways for the management of positive and negative test results have not been reported in the literature. New studies on the detection of HPV in urine must assess the feasibility and costs of these pathways.48
Our review demonstrates the accuracy of detection of HPV in urine for the presence of cervical HPV. When cervical testing for HPV is sought, urine based testing should be an acceptable alternative to increase coverage for subgroups that are hard to reach. However, results must be interpreted with caution owing to variation between individual studies for participant characteristics, lack of standardised methods of urine testing, and the surrogate nature of cervical HPV for cervical disease.
What is already known on this topic
Human papillomavirus (HPV) is a common sexually transmitted infection, with an established causal link to cervical cancer
Cervical testing for HPV is being considered as a more accurate method of screening, but as an invasive test it shares many problems of the current screening programme
Urine HPV detection is a more acceptable, non-invasive alternative that could improve uptake
What this study adds
This meta-analysis shows that testing urine for HPV has good accuracy for detecting cervical HPV
This accuracy remains high when testing specifically for oncogenic strains of HPV
Accuracy is improved if first void urine samples are used; however, heterogeneity in accuracy measures was identified in the studies
These findings should drive the further investigation and standardisation of urine testing as an acceptable method of HPV detection
Contributors: NP designed the protocol and data extraction form, selected eligible texts, extracted data, wrote the statistical analysis plan, cleaned and analysed the data, and drafted and revised the paper. She is guarantor. JD selected eligible texts, extracted data, and revised the paper. JZ wrote the statistical analysis plan and drafted and revised the paper. KK resolved discrepancies between reviewers on text selection and data extraction and revised the draft paper.
Funding: This study did not receive any funding.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: Additional data, including the dataset and statistical code, are available from the corresponding author at.
Transparency: The lead author (NP), the manuscript’s guarantor, affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and registered) have been explained.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.