- J Hodgkinson, research fellow1,
- J Mant, professor2,
- U Martin, clinical reader in clinical pharmacology3,
- B Guo, research fellow4,
- F D R Hobbs, professor1,
- J J Deeks, professor4,
- C Heneghan, reader in evidence based medicine5,
- N Roberts, information specialist6,
- R J McManus, professor1
- 1Primary Care Clinical Sciences, University of Birmingham, Edgbaston, Birmingham B15 2PP
- 2General Practice and Primary Care Research Unit, University of Cambridge, Cambridge CB2 0SR
- 3School of Clinical and Experimental Medicine, University of Birmingham, Edgbaston
- 4Public Health, Epidemiology and Biostatistics, University of Birmingham, Edgbaston
- 5Department of Primary Health Care, University of Oxford, Headington, Oxford OX3 7LF
- 6Bodleian Health Care Libraries, Knowledge Centre, ORC Medical Research Building, Oxford OX3 7DQ
- Correspondence to: R J McManus
- Accepted 17 April 2011
Objective To determine the relative accuracy of clinic measurements and home blood pressure monitoring compared with ambulatory blood pressure monitoring as a reference standard for the diagnosis of hypertension.
Design Systematic review with meta-analysis with hierarchical summary receiver operating characteristic models. Methodological quality was appraised, including evidence of validation of blood pressure measurement equipment.
Data sources Medline (from 1966), Embase (from 1980), Cochrane Database of Systematic Reviews, DARE, Medion, ARIF, and TRIP up to May 2010.
Eligibility criteria for selecting studies Eligible studies examined diagnosis of hypertension in adults of all ages using home and/or clinic blood pressure measurement compared with those made using ambulatory monitoring that clearly defined thresholds to diagnose hypertension.
Results The 20 eligible studies used various thresholds for the diagnosis of hypertension, and only seven studies (clinic) and three studies (home) could be directly compared with ambulatory monitoring. Compared with ambulatory monitoring thresholds of 135/85 mm Hg, clinic measurements over 140/90 mm Hg had mean sensitivity and specificity of 74.6% (95% confidence interval 60.7% to 84.8%) and 74.6% (47.9% to 90.4%), respectively, whereas home measurements over 135/85 mm Hg had mean sensitivity and specificity of 85.7% (78.0% to 91.0%) and 62.4% (48.0% to 75.0%).
Conclusions Neither clinic nor home measurement had sufficient sensitivity or specificity to be recommended as a single diagnostic test. If ambulatory monitoring is taken as the reference standard, then treatment decisions based on clinic or home blood pressure alone might result in substantial overdiagnosis. Ambulatory monitoring before the start of lifelong drug treatment might lead to more appropriate targeting of treatment, particularly around the diagnostic threshold.
High blood pressure is a key risk factor for the development of cardiovascular disease1 and is a major cause of morbidity and mortality worldwide.2 Hypertension is the commonest chronic disorder seen in primary care, with around one in eight of all people receiving antihypertensive treatment.3 4
Initial management of hypertension conventionally requires a diagnosis based on several clinic or office blood pressure measurements.5 6 7 National and international guidelines recommend similar strategies, although the thresholds of blood pressure for diagnosis and risk vary.5 6 7 8 9 Ambulatory blood pressure monitoring, however, estimates “true” mean blood pressure more accurately than clinic measurement because multiple readings are taken; it has also been shown to have better correlation with a range of cardiovascular outcomes and end organ damage.10 11 12 13 14 15 Ambulatory blood pressure monitoring is typically used when there is uncertainty in diagnosis, resistance to treatment, irregular or diurnal variation, or concerns about variability and the “white coat” effect.16 17 18 It has therefore arguably become the reference standard for the diagnosis of hypertension.
Home blood pressure monitoring, which provides multiple readings over several days, is also better correlated with end organ damage than clinic measurement.19 20 It seems to be a better prognostic indicator with respect to stroke and cardiovascular mortality21 22 23 and can identify white coat and masked hypertension. It could provide an appropriate alternative to ambulatory monitoring in terms of diagnosis, particularly in primary care, where ambulatory monitoring might not be immediately available or might be deemed too costly, or when patients find it inconvenient or uncomfortable. Home monitoring has a smaller evidence base than ambulatory monitoring but has gained acceptance over recent years as data accumulate and accurate equipment becomes more widely available.24 25
If guidelines are to retain clinic measurement as a standard diagnostic tool, it is important to assess its performance in the light of ambulatory measurement. Similarly, for home measurements to be considered as an alternative to ambulatory measurements, their test performance needs to be evaluated. We conducted a systematic review of the test performance of clinic measurement and home monitoring in the diagnosis of hypertension, compared with the reference standard of ambulatory monitoring.
We had various criteria for inclusion.
Types of study—Studies had to have extractable data for diagnoses of hypertension made with home and/or clinic blood pressure measurement compared with those made with ambulatory measurement. There was no restriction on language or year of publication.
Types of participants in studies—We included adult patients of all ages. Studies were excluded if participants were pregnant, in hospital, or receiving treatment at the time of the comparison, unless these groups could be excluded from other data within a paper. Although we aimed to derive data relevant to primary care, no restriction was placed on setting other than excluding patients in hospital.
Types of outcome measures—We extracted data into 2×2 tables for comparisons of the diagnosis of hypertension provided that clearly defined thresholds for the diagnosis of hypertension were used. Studies from which 2×2 tables could not be derived were excluded.
Reference standard—We chose ambulatory monitoring as the reference standard, with 135/85 mm Hg as the internationally accepted threshold for diagnosis on mean daytime readings.7 Among the various indirect methods of measuring blood pressure, ambulatory monitoring shows the strongest relation with clinical outcome and estimates blood pressure more accurately because multiple readings are taken.10 11 12 13 14 15 It thus represents the most appropriate choice of reference standard. Some studies have suggested that night time average blood pressure is superior to daytime at predicting cardiovascular outcomes,26 but there is greater consensus over the threshold to use for daytime averages than night time averages.5 7
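As an illustration of how a 2×2 table of the kind described above yields the two accuracy measures used throughout this review, the following minimal Python sketch computes sensitivity and specificity from four cell counts. The class name `TwoByTwo` and the counts are hypothetical and chosen for illustration only; they are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts for an index test (clinic or home) against ambulatory monitoring."""
    tp: int  # index test hypertensive, ambulatory hypertensive
    fp: int  # index test hypertensive, ambulatory normotensive
    fn: int  # index test normotensive, ambulatory hypertensive
    tn: int  # index test normotensive, ambulatory normotensive

    def sensitivity(self) -> float:
        # Proportion of reference-standard hypertensives detected by the index test
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-standard normotensives correctly classified
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only
example = TwoByTwo(tp=60, fp=15, fn=20, tn=45)
print(f"sensitivity = {example.sensitivity():.1%}")
print(f"specificity = {example.specificity():.1%}")
```

With these definitions, a study in which no participant is normotensive on both the index test and ambulatory monitoring (that is, with no true negatives) has a specificity of 0%.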
We searched Medline (from 1950 onwards), Embase (from 1980 onwards), the Cochrane Database of Systematic Reviews, DARE, Medion (www.mediondatabase.nl), ARIF (www.arif.bham.ac.uk), and the TRIP database (www.tripdatabase.com) up to May 2010, using a search strategy designed to capture all studies evaluating the test performance characteristics of different methods of diagnosing hypertension in primary care.
The search strategy was based on the diagnostic filters developed by Haynes et al27 and Montori et al.28 To improve sensitivity in the search,29 however, we combined three separate search strategies using Medline and Embase (the full Medline search strategy is shown in appendix 2 on bmj.com): we combined keywords for hypertension, blood pressure monitoring, outpatient setting, and diagnosis; we limited MeSH terms for hypertension to the diagnosis subheading and combined this with keywords for blood pressure monitoring and outpatient setting; and we combined keywords for hypertension, blood pressure monitoring, and outpatient setting and limited the results using the diagnosis search filter.
Selection of studies
Two reviewers (JH and RJMcM) independently reviewed the titles and abstracts of articles identified by the search strategy for potential relevance to the research question. After this process, the full text of each potentially eligible paper was assessed.
Data management and extraction
Two of four reviewers (JH, RJMcM, UM, JM) carried out data extraction from included papers in duplicate (the data extraction form template is in appendix 3 on bmj.com). Differences in data extraction were resolved by consensus. When necessary we contacted the authors of the primary studies to obtain additional information.
Assessment of methodological quality
We additionally collected information on recognised sources of bias in diagnostic test accuracy studies using a version of the QUADAS (Quality Assessment of Diagnostic Accuracy Studies) checklist,30 adapted for this study. The box lists the quality criteria considered.
Selection criteria of participants: both the inclusion criteria and how participants were selected (consecutively, randomly, or other clear justifiable process) should be described
Time period between the different methods of measurements: four weeks or less between any measurements compared
Blinding of those performing tests to previous measurements
Reporting of uninterpretable results: reporting where recording was incomplete
Reference standard: the whole sample had to receive the same comparison measurement tests, regardless of the result of the index test
Attrition: information provided on any loss of subjects during the study
Adequate checking of self monitored readings
Equipment validation: evidence that all measurement equipment was clinically validated
We extracted estimates of sensitivity and specificity from each study for all reported threshold combinations of clinic or home measurement and ambulatory measurement. We identified the subset of studies that shared the common reference threshold (ambulatory monitoring 135/85 mm Hg) and carried out a meta-analysis using hierarchical summary receiver operating characteristic (HSROC) models that accounted for sampling variability, unexplained heterogeneity, and covariation between sensitivity and specificity.29 Models were fitted to estimate and compare the sensitivity and specificity for diagnosis of hypertension made at the most common thresholds (140/90 mm Hg for clinic measurement, 135/85 mm Hg for home measurement). Differences between the tests were expressed as relative sensitivities and specificities to ascertain if there was a significant difference in the relative performance of the tests compared with ambulatory measurement. In a final analysis all studies were included to explore the effect of different diagnostic thresholds. Models were fitted with the SAS Metadas code31 32 and graphics produced with RevMan 5.33 When there were not enough studies available for fitting, we simplified the full models by assuming a symmetric receiver operating characteristic curve and fitting a fixed rather than random effects model. Sensitivity analyses considered the effect of varying the diagnostic thresholds. They also assessed test performance in populations with mean clinic blood pressure at or above the diagnostic threshold, to consider separately those studies in which the population had been recruited entirely from a typical screening population (and so excluding any studies where an additional group of normotensive people were included as “controls”). Further analyses were planned with other population characteristics, methodological quality of the studies, and methods of monitoring.
Our search identified 2914 studies (excluding duplicates), and we reviewed the full text of 115 papers for eligibility (fig 1⇓). Of these, 20 contained extractable data; three were not written in English (one each in French, Spanish, and Dutch). The 20 studies included 5863 individuals with mean age of 48.8 and mean proportion of women of 57%.
Table 1⇓ gives details of the population of each study34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 and table 2 gives details of their methodological quality⇓. The studies differed markedly in terms of age (mean age ranged from <33 to 60), sex (percentage of men ranged from 16% to 69%), sample size (from 16 to 2370), and whether a primary care or specialist population was used. All the studies had some degree of methodological weakness (or lack of clarity in what was reported): only 11 out of 20 studies used validated devices for all methods of monitoring, and only six provided evidence of blinding of those conducting the monitoring to previous blood pressure results. All studies avoided both partial and differential verification bias (that is, all patients in the studies received the same comparison measurement tests, regardless of initial results); reporting of attrition and selection criteria of participants was good.
There was marked diversity between studies in terms of mean baseline blood pressure of the population, number of measurements for clinic (2-18), home (18-56), and ambulatory monitoring (24-111), period of ambulatory measurement, and blood pressure thresholds used (tables 3 and 4⇓ ⇓). Similar diversity was seen in the range of sensitivity and specificity values for individual studies (tables 5 and 6)⇓ ⇓. Two studies reported very low specificities: Denolle37 (specificity 0%) and Elijovich et al38 (18%), both of which had small sample sizes. The study by Denolle included a total sample of 16 patients, and none was normotensive according to both clinic and ambulatory classifications. In Elijovich et al, only three out of a total sample of 72 patients were normotensive according to both clinic and ambulatory classifications.38
We pooled studies with the same thresholds for the reference and index tests and included them in a meta-analysis. Eight studies used a threshold of 135/85 mm Hg for ambulatory blood pressure monitoring and 140/90 mm Hg for clinic blood pressure monitoring to diagnose hypertension,39 43 46 48 49 50 51 52 while three used a threshold of 135/85 mm Hg for both ambulatory and home diagnosis.34 36 49 One of the clinic comparison studies,43 however, used the mean of the full 24 hour ambulatory blood pressure monitoring rather than the mean of daytime readings and was therefore not comparable with the others. Only one study provided proportions diagnosed as hypertensive using all three methods of blood pressure monitoring.49 Figure 2 provides forest plots of the sensitivity and specificity of eligible studies, with performance with either home or clinic measurement compared with ambulatory monitoring.
Figure 3⇓ provides a summary receiver operating characteristic plot for the seven clinic comparison studies (mean age 47.1; mean proportion of women 57%). Most studies were within the 95% confidence interval of the summary point,46 48 49 50 52 or at least close to the receiver operating characteristic curve,51 showing some consistency in results across the studies. The remaining outlier study had a small sample size compared with the others, had a younger age profile with a lower mean blood pressure, and used an unvalidated monitor for clinic measurements.39
Figure 4⇓ plots the three home comparison studies (mean age 52.5; mean proportion of women 55%) on a summary receiver operating characteristic plot. Despite having quite different mean blood pressures and settings, two of the three studies were similar in terms of sensitivity and specificity.36 49 With so few studies in this group, however, we could not plot a confidence interval or assess the statistical homogeneity.
A receiver operating characteristic (ROC) curve for a single study shows the relation between the true positive rate (sensitivity) and the false positive rate (100−specificity) for different cut-off points. In a meta-analysis, the points represent different studies, and the fitted summary ROC curves depict trade-offs between sensitivity and specificity that arise because of differences between the studies. Where the studies combined have different thresholds, the pattern might reflect variation with threshold seen in a single study. Where the studies combined share a threshold, the pattern will reflect trade-offs caused by the other differences between the studies.
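The relation described above for a single study can be sketched in a few lines of Python: sweeping the cut-off applied to an index measurement and recording, for each cut-off, the false positive rate and the true positive rate against the reference standard. The function name `roc_points`, the readings, and the reference classifications are all hypothetical and serve only to illustrate the mechanics; this is not the HSROC model used in the meta-analysis.

```python
def roc_points(readings, reference_positive, thresholds):
    """Return (false positive rate, true positive rate) for each threshold.

    A reading above the threshold counts as a positive index test.
    """
    positives = sum(reference_positive)
    negatives = len(reference_positive) - positives
    points = []
    for threshold in thresholds:
        pairs = list(zip(readings, reference_positive))
        tp = sum(1 for reading, ref in pairs if reading > threshold and ref)
        fp = sum(1 for reading, ref in pairs if reading > threshold and not ref)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical clinic systolic readings (mm Hg) and whether each person is
# hypertensive on the reference standard; illustration only.
readings = [118, 126, 131, 137, 139, 142, 146, 151, 158, 165]
reference_positive = [False, False, False, True, False,
                      False, True, True, True, True]

for threshold, (fpr, tpr) in zip((130, 140, 150),
                                 roc_points(readings, reference_positive, (130, 140, 150))):
    print(f"threshold {threshold}: sensitivity {tpr:.0%}, 100-specificity {fpr:.0%}")
```

In this toy example, raising the cut-off lowers both the true positive rate and the false positive rate, which is the trade-off a ROC curve traces.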
Table 7 shows the pooled sensitivity and specificity for home blood pressure measurement and clinic blood pressure measurement⇓. Compared with ambulatory blood pressure monitoring of 135/85 mm Hg, clinic measurement over 140/90 mm Hg had a mean sensitivity of 74.6% (95% confidence interval 60.7% to 84.8%) and specificity of 74.6% (47.9% to 90.4%), whereas home measurement over 135/85 mm Hg had a mean sensitivity of 85.7% (78.0% to 91.0%) and specificity of 62.4% (48.0% to 75.0%). Neither the difference in sensitivity (relative sensitivity 1.15, 0.95 to 1.39) nor specificity (0.79, 0.40 to 1.55) between the home and clinic measurements was significant.
We explored trade-offs between sensitivity and specificity with variation in blood pressure thresholds for home and clinic measurements (table 8). Increases in specificity and decreases in sensitivity with increasing threshold (and the converse for decreasing threshold) were significant for home measurements but not for clinic measurements.
We could not carry out the planned sensitivity analyses evaluating methodological quality, population characteristics, or monitoring methods because of the small number of included studies. The removal of the outlying study,39 which used an unvalidated monitor, resulted in marginal changes (because of the small sample size of the excluded study) in sensitivity of clinic measurement from 74.6% (60.7% to 84.8%) to 72.6% (56.7% to 84.2%) and in specificity from 74.6% (47.9% to 90.4%) to 77.9% (49.1% to 92.8%).
Sensitivity analysis of clinic comparisons including only those with mean blood pressures close to or above the diagnostic threshold found a sensitivity of 85.6% (81.0% to 89.2%) and specificity of 45.9% (33.0% to 59.3%) for clinic blood pressure. As all three included studies of home monitoring comparisons used a typical general practice screening population with no control group of normotensive people, we did not perform a further sensitivity analysis.
Summary of findings
This review has shown that neither clinic nor home measurements of blood pressure are sufficiently specific or sensitive in the diagnosis of hypertension. We included 20 studies with 5683 patients that compared different methods of diagnosing hypertension in diverse populations with a range of thresholds applied. In the nine studies that used similar diagnostic thresholds and were included in the meta-analysis (two comparing home with ambulatory measurement only, six comparing clinic with ambulatory measurement only, and one study comparing all three methods), neither clinic nor home measurement could be unequivocally recommended as a single diagnostic test. Clinic measurement, the current reference in most clinical work and guidelines, performed poorly in comparison with ambulatory measurement, and, given that clinic measurements are also least predictive in terms of cardiovascular outcome, this is not reassuring for daily practice.10 11 12 16 17 18 Home monitoring provided better sensitivity and might be suitable for ruling out hypertension given its relative ease of use and availability compared with ambulatory monitoring. In the case of clinic measurement, the removal of studies with a mean blood pressure in the normotensive range reduced specificity still further. This has profound implications for the management of hypertension, suggesting that ambulatory monitoring might lead to more appropriate targeting of treatment rather than starting patients on lifelong antihypertensive treatment on the basis of clinic measurements alone, as currently recommended.5 In clinical practice, this will be particularly important near the threshold for diagnosis, where most errors in categorisation will occur if ambulatory monitoring is not used.
Strengths and limitations of study
We used a comprehensive search strategy in multiple databases and all languages and are unlikely to have missed important numbers of relevant papers. While we did apply quality measures, we did not use a total measure of quality assessment to limit included papers as it is recognised that combining different shortcomings can generate distinct magnitudes of bias, even in opposing directions.29 54
The main weakness of our study is the paucity of data available. Only one study compared all three methods of measurement. Because of a lack of consensus internationally, a plethora of different thresholds was used, which meant that fewer than half of the studies could be combined in the meta-analysis.
The planned sensitivity analyses based on methodological quality, population characteristics, and monitoring schedule could not be performed because of the small number of studies and the methodological weaknesses inherent in included studies that would have made interpretation of such a subgroup analysis speculative. The number of measurements used, however, varied between two and 18 for clinic measurements (though only one study used more than six) compared with 18 to 42 for home measurements. These differences will have contributed to the observed heterogeneity and could explain the poor performance of clinic measurements, albeit that this is typical in clinical practice. The mean age of the population in the clinic comparison studies (47.1) was over five years younger than the mean age in the home comparison studies (52.5) and younger than a typical population of patients with hypertension in primary care (mid-60s).55
It was often not clear whether studies used validated measurement equipment, and even when it was mentioned, several studies provided validation citations on only some of the sphygmomanometers used. Given the shortage of literature on the subject, poor performance of a particular machine might conceivably lead to biased overall conclusions. We included in the meta-analysis only one study that used an unvalidated monitor,39 and exclusion of this study had a minimal effect on the results.
The findings clearly depend on the choice of the reference standard, and the three types of measurement are sufficiently different that whichever one is chosen as the reference, the other two will perform relatively badly. The comparability of the performance of home monitoring to clinic measurement, rather than to ambulatory monitoring as might have been expected, could also reflect a relative paucity of relevant data: there were only three home comparison studies, with wide confidence intervals for specificity (48% to 75%), the measure on which home monitoring performed particularly poorly. Ambulatory monitoring, while providing the best correlation with outcome of the methods evaluated, nevertheless in general represents a single 24 hour period in an individual’s life; hence it is important that a “normal” day is chosen, typically a working day. A study of the long term reproducibility of ambulatory measurements taken three times over a two year period found that daytime ambulatory blood pressure provided a reproducible estimate in 54 people with borderline hypertension (correlation coefficient 0.70 for systolic blood pressure).56
Finally, we cannot consider the implications for clinical practice in terms of the best method of monitoring treatment effects as our research question focused solely on diagnostic studies.
Comparisons with other studies
We could not find a previous study that combined literature on the diagnosis of hypertension with different methods of measurement. Guidelines to date have tended to recommend the use of clinic measurement with ambulatory blood pressure monitoring and, to a lesser extent, home monitoring as secondary methods in special cases such as white coat hypertension.5 6 7 8 9 24 Our results suggest that while this is a pragmatic approach supported by the results of treatment studies, more widespread use of ambulatory blood pressure monitoring for the diagnosis of hypertension, particularly around the thresholds, might result in more appropriately targeted treatment.
The poor specificity of both clinic and home measurement and poor sensitivity of clinic monitoring mean some people will be treated who would be defined as normotensive on the basis of ambulatory blood pressure monitoring. How big a proportion this is of the total number of people labelled as hypertensive will depend on the prevalence of hypertension in the population being studied. This can be seen in the sensitivity analysis where specificity drops as prevalence increases.
The positive and negative likelihood ratios were 2.07 and 0.25 for home compared with ambulatory measurement (across three comparison studies), respectively, and 2.94 and 0.34 for clinic compared with ambulatory measurement (across seven comparison studies), respectively. This suggests some correlation between the results of home or clinic measurement and ambulatory monitoring, but the correlation is not strong (positive likelihood ratios of over 10 and negative likelihood ratios of less than 0.1 would indicate a strong relation57). To help interpret this for clinical practice,58 if the prevalence of hypertension was as low as 10% (for example, in people under 40), then out of every four positive diagnoses provided by clinic measurement, close to three would be incorrect as judged by the reference standard of ambulatory measurement. If half of the population were hypertensive (such as those over 65), this would be reversed, and three out of every four positive diagnoses provided by clinic measurement would be correct with ambulatory measurement. When prevalence is 50%, however, it might be more accurate to use the results of the sensitivity analysis where mean blood pressure in studies was close to or above the diagnostic threshold, and here only 61% of diagnoses after clinic measurements would be correct (table 9⇓).
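The arithmetic behind these figures for clinic measurement can be reproduced with the short Python sketch below. It assumes that the likelihood ratios and the probability that a positive diagnosis is correct follow from the standard formulas applied to the pooled point estimates reported in the Results (sensitivity 74.6% and specificity 74.6% for the main analysis; 85.6% and 45.9% for the sensitivity analysis). The function and constant names are illustrative, and the sketch ignores the uncertainty around the pooled estimates.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive index test agrees with the reference standard."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Pooled point estimates for clinic measurement as reported in the Results
CLINIC = (0.746, 0.746)
# Sensitivity analysis: studies with mean blood pressure close to or above threshold
CLINIC_NEAR_THRESHOLD = (0.856, 0.459)

lr_pos, lr_neg = likelihood_ratios(*CLINIC)
print(f"clinic LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")

for prevalence in (0.10, 0.30, 0.50):
    ppv = positive_predictive_value(*CLINIC, prevalence)
    print(f"prevalence {prevalence:.0%}: {ppv:.0%} of positive clinic diagnoses correct")

ppv_near = positive_predictive_value(*CLINIC_NEAR_THRESHOLD, 0.50)
print(f"sensitivity analysis, prevalence 50%: {ppv_near:.0%} correct")
```

Under these assumptions the sketch gives about one correct positive diagnosis in four at 10% prevalence, about three in four at 50% prevalence, and about 61% at 50% prevalence when the sensitivity analysis estimates are used.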
Many people with a current diagnosis of hypertension might not in fact have hypertension. This has important implications, both for the effect of labelling itself on otherwise healthy people59 60 61 62 and for the cost effectiveness of treatment.63 An approach using clinic (or home) measurements as a screening test, followed by ambulatory blood pressure monitoring for blood pressures that are within 10 mm Hg of the threshold, might be appropriate before definitive treatment. Arguably, however, wider use of ambulatory monitoring would be needed both to avoid overtreatment of white coat hypertension and to detect masked cases.
As we did not have sufficient studies that used a high threshold, we cannot determine the relevance of ambulatory monitoring in people with high clinic readings. White coat hypertension, however, can manifest with very high clinic readings,64 and, in the absence of a clinical indication for immediate treatment (such as the signs and symptoms of accelerated hypertension65), clinicians might want to organise an urgent ambulatory measurement rather than treat on the basis of limited clinic measurements.
Our study suggests that if ambulatory blood pressure monitoring is taken as the reference standard for the detection of hypertension, then treatment decisions based on clinic or home blood pressure alone, using thresholds of 140/90 mm Hg, result in substantial overdiagnosis. Ambulatory monitoring might lead to more appropriate targeting of treatment before the start of lifelong drug treatment, particularly around the diagnostic threshold. Considering the relative expense of ambulatory monitoring equipment, cost effectiveness analyses are essential before wholesale changes to the diagnosis of hypertension can be recommended.
What is already known on this topic
Hypertension is traditionally diagnosed after measurement of blood pressure in a clinic, but ambulatory and home measurements correlate better with outcome
What this study adds
Compared with ambulatory monitoring, neither clinic nor home measurements have sufficient sensitivity or specificity to be recommended as a single diagnostic test
If the prevalence of hypertension in a screened population was 30%, there would only be a 56% chance that a positive diagnosis with clinic measurement would be correct compared with using ambulatory measurement
More widespread use of ambulatory blood pressure for the diagnosis of hypertension would result in more appropriately targeted treatment
Cite this as: BMJ 2011;342:d3621
Contributors: CH, FDRH, and RJMcM had the original idea and gained the funding. NR did the electronic searches; JH, RJMcM, JM, and UM extracted the data. BG, JH, and JJD undertook the analyses. All authors contributed to the manuscript and approved the final version. RJMcM is guarantor.
Funding: This work forms part of a larger programme on monitoring in primary care supported by the National Institute for Health Research. The views and opinions expressed are those of the authors and do not necessarily reflect those of the NHS, NIHR, or the Department of Health. RJMcM holds an NIHR career development fellowship.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: Dataset available from the corresponding author. The dataset includes only anonymised material already in the public domain.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.