BMJ 1996;313:342-343 (10 August)

Papers

Use of the capture-recapture technique to evaluate the completeness of systematic literature searches

Pat Spoor, information manager; Mark Airey, senior research fellow; Cathy Bennett, research fellow; Julie Greensill, research assistant; Rhys Williams, professor of epidemiology and public health

Division of Public Health, Nuffield Institute for Health, University of Leeds, Leeds LS2 9PL

Correspondence to: Pat Spoor. p.a.spoor{at}leeds.ac.uk.

Capture-recapture methods were pioneered in ecology and derive their name from censuses of wildlife in which several animals are captured, marked, released, and subject to recapture. In epidemiology the technique examines the degree of overlap between two (or more) methods of ascertainment and uses a simple formula to estimate the total size of the population. When the number already identified is subtracted from this estimate the number of cases not ascertained by either (or any) of the methods can then be calculated. It has been suggested that studies which attempt to ascertain all cases of a given disease in a population should use this method to estimate the number of missing cases.1 2

There are direct parallels between epidemiological studies which attempt to ascertain all available cases and systematic literature searches which attempt to identify all publications on a given topic: both should incorporate estimates of the number of cases or publications they fail to identify. Our study compared, for one journal, the results of searching an electronic literature database with those of hand searching, both carried out for the Cochrane collaborative review group on diabetes.3

Methods and results

The Medline database was searched from January 1984 to October 1994 for articles in Diabetic Medicine likely to describe randomised controlled trials, as defined by Dickersin et al and using their search strategy.4 Independently, the journal was hand searched for the same period with the same aim.

The maximum likelihood estimator, N = M(n/m), was used to estimate the total population size,2 where M is the number of publications identified by Medline, n the number identified by hand searching, and m the number identified by both sources. The estimated number unidentified by either method was calculated by subtraction. The maximum likelihood estimator is biased for small samples, for which Chapman's method is more appropriate.5 This estimates the total population size as N = (M + 1)(n + 1)/(m + 1) - 1. The variance of N is estimated as Var(N) = (M + 1)(n + 1)(M - m)(n - m)/((m + 1)^2 (m + 2)), from which 95% confidence intervals can be constructed.
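
The two estimators and the variance formula above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' own code: the function and field names are hypothetical, and the 95% interval uses a normal approximation (estimate plus or minus 1.96 standard errors), which is an assumption, since the text does not state how the interval was constructed. The example counts at the bottom are invented for illustration.

```python
import math
from typing import NamedTuple


class ChapmanResult(NamedTuple):
    """Container for Chapman's estimate (field names are illustrative)."""
    total: float      # estimated total population size N
    variance: float   # estimated variance of N
    ci_low: float     # lower 95% limit (normal approximation, an assumption)
    ci_high: float    # upper 95% limit (normal approximation, an assumption)
    missed: float     # estimated number found by neither method


def ml_estimate(M: int, n: int, m: int) -> float:
    """Maximum likelihood estimator N = M * n / m."""
    if m <= 0:
        raise ValueError("m (found by both methods) must be positive")
    return M * n / m


def chapman_estimate(M: int, n: int, m: int, z: float = 1.96) -> ChapmanResult:
    """Chapman's estimator with its variance, as given in the text.

    M -- number found by the first method (here, Medline)
    n -- number found by the second method (here, hand searching)
    m -- number found by both
    z -- normal quantile for the interval (1.96 for 95%; an assumption)
    """
    if m < 0 or m > min(M, n):
        raise ValueError("m must lie between 0 and min(M, n)")
    total = (M + 1) * (n + 1) / (m + 1) - 1
    variance = ((M + 1) * (n + 1) * (M - m) * (n - m)
                / ((m + 1) ** 2 * (m + 2)))
    half_width = z * math.sqrt(variance)
    observed = M + n - m  # articles found by at least one method
    return ChapmanResult(total, variance, total - half_width,
                         total + half_width, total - observed)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the calls.
    print(ml_estimate(40, 50, 20))
    print(chapman_estimate(40, 50, 20))
```

The number found by neither method is obtained by subtracting the observed count, M + n - m, from the estimated total, which mirrors the subtraction step described above.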

Table 1 shows the number of publications identified by each method and the overlap. The articles missed by the hand search are attributed to human error; those not identified by the Medline search were improperly indexed, either because until recently no appropriate methodological subject heading existed or because the abstract failed to describe the study design. For our data both the maximum likelihood estimator and Chapman's method gave the same estimate of total population size (160, 95% confidence interval 158 to 164) rounded to the nearest whole number. The number of articles "missed" was 2 (0 to 6).


Table 1--Extent of overlap in the number of publications
found by Medline search of "Diabetic Medicine" (January
1984 to October 1994) and by hand searching
-----------------------------------------------------------------
                                     Medline search
-----------------------------------------------------------------
                                  Found         Not found
-----------------------------------------------------------------
              Found                115              35
Hand search
              Not found             8                2*
-----------------------------------------------------------------
*Estimated by capture-recapture technique, rounded to the nearest
whole number.
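
As a check on the arithmetic, the counts in table 1 can be substituted into the two estimators and the variance formula. The working below is a reconstruction from the published counts, not taken from the paper; it assumes a normal approximation for the 95% interval and requires the amsmath package for the align* environment.

```latex
\begin{align*}
M &= 115 + 8 = 123, \qquad n = 115 + 35 = 150, \qquad m = 115 \\
\hat{N}_{\mathrm{ML}} &= \frac{Mn}{m} = \frac{123 \times 150}{115} \approx 160.4 \\
\hat{N}_{\mathrm{Chapman}} &= \frac{(M+1)(n+1)}{m+1} - 1
  = \frac{124 \times 151}{116} - 1 \approx 160.4 \\
\widehat{\mathrm{Var}}(\hat{N}) &= \frac{(M+1)(n+1)(M-m)(n-m)}{(m+1)^{2}(m+2)}
  = \frac{124 \times 151 \times 8 \times 35}{116^{2} \times 117} \approx 3.33 \\
\hat{N} \pm 1.96\sqrt{\widehat{\mathrm{Var}}(\hat{N})} &\approx 160.4 \pm 3.6
  \quad (156.8 \text{ to } 164.0) \\
\text{missed} &= \hat{N} - (115 + 35 + 8) \approx 160.4 - 158 \approx 2
\end{align*}
```

The reported lower limit of 158 equals the number of articles actually identified, which suggests, though the paper does not say so, that the interval was truncated at the observed count. The interval for missed articles (0 to 6) then follows by subtracting 158 from each limit.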

Comment

A caveat to the application of these methods is that if there is positive dependency between the two sources (that is, if an article identified by hand searching is more likely to be ascertained in Medline than one not so identified) then the estimates will underestimate the true population. If, however, Medline and the hand search are negatively dependent then the estimates will overestimate the true population.2 Log-linear modelling offers an alternative approach to modelling dependency among data, where it is present.
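
The direction of this bias can be illustrated with a small Python sketch that plugs expected counts into N = M(n/m). Everything here is hypothetical: the function name, the true total of 1000 articles, and the detection probabilities are invented for illustration, and the calculation ignores sampling variation.

```python
def expected_ml_estimate(total: float, p_medline: float,
                         p_hand: float, p_both: float) -> float:
    """Plug expected counts into N = M * n / m.

    total     -- true number of relevant articles (hypothetical)
    p_medline -- probability that an article is found by the Medline search
    p_hand    -- probability that an article is found by hand searching
    p_both    -- probability that an article is found by both methods
    """
    lowest = max(0.0, p_medline + p_hand - 1.0)
    highest = min(p_medline, p_hand)
    if p_both <= 0 or not lowest <= p_both <= highest:
        raise ValueError("p_both is not a feasible joint probability")
    M = total * p_medline   # expected number found by Medline
    n = total * p_hand      # expected number found by hand searching
    m = total * p_both      # expected number found by both
    return M * n / m


if __name__ == "__main__":
    TRUE_TOTAL = 1000  # hypothetical
    # Independent sources: p_both equals 0.6 * 0.5.
    print(expected_ml_estimate(TRUE_TOTAL, 0.6, 0.5, 0.30))
    # Positive dependency: overlap larger than independence predicts.
    print(expected_ml_estimate(TRUE_TOTAL, 0.6, 0.5, 0.40))
    # Negative dependency: overlap smaller than independence predicts.
    print(expected_ml_estimate(TRUE_TOTAL, 0.6, 0.5, 0.20))
```

Under independence the expected estimate recovers the true total. With these invented probabilities, the larger overlap gives roughly 750 (an underestimate) and the smaller overlap gives roughly 1500 (an overestimate).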

The term capture-recapture is not so appropriate for the technique's use in epidemiology or literature searches since, while cases and publications may be said to be "captured," nothing is being "recaptured." As applied in epidemiology the method has been termed "ascertainment intersection."2 However, we suggest the more informative descriptor "comparison of multiple methods of ascertainment" (or COMMA) for this useful technique, which we advocate for all systematic literature searches.

We thank Mr Alan S Rigby for statistical advice.

Funding: Northern and Yorkshire Regional Health Authority and the British Diabetic Association.

Conflict of interest: None.

  1. LaPorte R. Assessing the human condition: capture-recapture techniques. BMJ 1994;308:5-6.
  2. Hook EB, Regal RR. The value of capture-recapture methods even for apparent exhaustive surveys. The need for adjustment for sources of ascertainment intersection in attempted complete prevalence studies. Am J Epidemiol 1992;135:1060-7.
  3. Airey CM, Williams DRR. Cochrane Collaborative Review Group: Diabetes. Diabet Med 1995;12:375-6.
  4. Dickersin K, Scherer R, Lefebvre C. Identifying relevant studies for systematic reviews. BMJ 1994;309:1286-91.
  5. Chapman DG. Some properties of the hypergeometric distribution with applications to zoological sample censuses. University of California Publications in Statistics 1951;1:131-60.
(Accepted 6 March 1996)





