Education And Debate

Cluster analysis and disease mapping—why, when, and how? A step by step guide

BMJ 1996;313:863 (Published 05 October 1996)
  1. Sjurdur F Olsen, senior research fellow (a),
  2. Marco Martuzzi, research fellow (b),
  3. Paul Elliott, professor (b)

  (a) Danish Epidemiology Science Centre, Statens Seruminstitut, DK-2300 Copenhagen S, Denmark, and Department of Public Health and Policy, London School of Hygiene and Tropical Medicine, London WC1E 7HT
  (b) Department of Epidemiology and Public Health, Imperial College School of Medicine at St Mary's, London W2 1PG

  Correspondence to: Dr Olsen, Copenhagen.
  Accepted 26 June 1996

Growing public awareness of environmental hazards has led to an increased demand for public health authorities to investigate geographical clustering of diseases. Although such cluster analysis is nearly always ineffective in identifying causes of disease, it often has to be used to address public concern about environmental hazards. Interpreting the resulting data is not straightforward, however, and this paper presents a guide for the non-specialist. One pitfall is that cluster analyses are usually done post hoc rather than to test a prior hypothesis. This is particularly true of investigations prompted by reported clusters, which carry the inherent danger of overestimating the disease rate through “boundary shrinkage” of the population from which the cases are assumed to have arisen. In disease surveillance, the problem of making multiple comparisons can be overcome by testing for clustering and autocorrelation. When rates of disease are illustrated in disease maps, undue focus on areas where random fluctuation is greatest can be minimised by smoothing techniques. Although cluster analyses rarely prove fruitful in identifying causation, they may, like single case reports, have the potential to generate new knowledge.
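To make the point about smoothing concrete, the sketch below shrinks raw standardised ratios (observed divided by expected cases) towards the overall mean, with areas that have few expected cases shrunk the most. It is only an illustration: the text above does not say which smoothing technique is meant, so a simple method-of-moments empirical Bayes estimator is assumed here, and the function name `smooth_ratios` and the counts are hypothetical.

```python
def smooth_ratios(observed, expected):
    """Return (raw, smoothed) ratios of observed to expected cases per area.

    Assumption: a method-of-moments empirical Bayes shrinkage, chosen for
    illustration only; it is not taken from the article.
    """
    raw = [o / e for o, e in zip(observed, expected)]
    total_expected = sum(expected)
    overall = sum(observed) / total_expected  # overall ratio across all areas

    # Expected-weighted variance of the raw ratios around the overall ratio.
    spread = sum(e * (r - overall) ** 2 for r, e in zip(raw, expected)) / total_expected
    # Remove the share of that variance attributable to Poisson noise;
    # what remains estimates genuine variation between areas.
    mean_expected = total_expected / len(expected)
    between_area = max(spread - overall / mean_expected, 0.0)

    smoothed = []
    for r, e in zip(raw, expected):
        # Weight near 0 for small areas (noisy), near 1 for large areas (stable).
        weight = between_area / (between_area + overall / e) if between_area > 0 else 0.0
        smoothed.append(overall + weight * (r - overall))
    return raw, smoothed


# Hypothetical counts: two small areas with extreme raw ratios, two larger ones.
observed = [3, 12, 95, 0]
expected = [1.0, 10.0, 100.0, 1.5]
raw, smoothed = smooth_ratios(observed, expected)
for r, s, e in zip(raw, smoothed, expected):
    print(f"expected={e:6.1f}  raw={r:5.2f}  smoothed={s:5.2f}")
```

In this made-up example the first area has a raw ratio of 3 based on a single expected case; after smoothing it sits close to the overall ratio, whereas the area with 100 expected cases moves much less in relative terms.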

Public awareness about potential hazards in our environment is growing. With the advent of powerful computing techniques that can be applied to routinely collected mortality and morbidity data, the demand on public health authorities to undertake investigations into geographical patterns of disease has increased. Nevertheless, several basic epidemiological and statistical issues may present obstacles to the satisfactory handling of such data.1 Although texts are available that cover recent developments,2 3 there is no obvious resource for the generalist reader covering methods for investigating disease clusters and clustering and for interpreting disease maps. This paper is intended to fill this gap by presenting a step by step guide to these problems for …
