
Endgames Statistical Question

Stratified cluster sampling

BMJ 2013; 347 doi: https://doi.org/10.1136/bmj.f7016 (Published 22 November 2013) Cite this as: BMJ 2013;347:f7016
Philip Sedgwick, reader in medical statistics and medical education
Centre for Medical and Healthcare Education, St George’s, University of London, London, UK
p.sedgwick{at}sgul.ac.uk

Researchers investigated the suitability of a newly developed famine scale as an international definition of famine to guide humanitarian response, funding, and accountability.1 The scale had been proposed by Howe and Devereux, and it defined famine on the basis of intensity and magnitude.2 The scale was applied retrospectively to the humanitarian crisis during 2005 in Niger, west Africa, to determine whether famine had occurred. A cross sectional study design was used. Households were recruited using a stratified two stage cluster sampling method. Niger was stratified into its eight regions. Within each region, 26 villages were randomly selected, with the probability of selection proportional to the size of the village. Within each village, 20 households were systematically randomly selected. A census of the entire household was undertaken by administering a questionnaire to the head of each selected household.

The researchers concluded that on the basis of the famine scale developed by Howe and Devereux, most regions in Niger in 2005 experienced food crisis conditions, and some areas approached famine. Furthermore, it was suggested that the scale afforded more objective criteria than did previous approaches while providing early warning systems that might help guide the level of response in future situations.

Which of the following statements, if any, are true?

  • a) Cluster sampling meant that resources could be concentrated in a limited number of areas of the country

  • b) The stratified two stage cluster sampling approach constituted a multistage sampling method with three stages

  • c) Systematic random sampling of households in each village required the construction of a sampling frame

Answers

Statement a is true, whereas b and c are false.

The aim of the study was to assess whether the famine scale proposed by Howe and Devereux provided a suitable definition of famine to guide future humanitarian response, funding, and accountability. A cross sectional study design was applied.

The scale and severity of the humanitarian crisis in Niger in 2005 would probably have varied across the country. It was therefore imperative that any sample was representative of the population of Niger. Simple random sampling across the country could have been used to recruit households. However, simple random sampling would have produced a representative sample only if enough households were recruited. The population of Niger was geographically diverse. Therefore, random sampling of households across the country would have been impractical and too expensive. A stratified two stage cluster sampling approach was therefore used to ensure the resulting sample was representative of the country, while concentrating resources in fewer areas (a is true).

The stratified cluster sampling approach incorporated a combination of stratified and cluster sampling methods. Firstly, Niger was stratified by region. The country consists of eight regions—seven rural ones plus the capital, Niamey. Within each region a simple random sample could have been taken to ensure that each region was adequately represented. However, the population of each region was geographically diverse. Therefore, simple random sampling within each region or stratum would have been impractical and expensive. To concentrate resources in fewer places, a two stage cluster sampling process was performed within each stratum.

A cluster is a natural grouping of people, for example towns, villages, schools, streets, or households. The sampling of clusters in the above study was a two stage process. The first stage involved a random sample of 26 villages within each stratum or region. The probability of selection was proportional to the population size of the village; that is, larger villages had a greater probability of being selected than smaller ones. Within each chosen village, a fixed number of 20 households were selected using systematic random sampling. The household was the unit of analysis, with a census of each household achieved through a questionnaire.
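The size-proportional selection of villages can be sketched in a few lines of Python. This is an illustration only: the village names and sizes are invented, and the sketch draws villages with replacement using the standard library's `random.choices`, which is one of several ways to sample with probability proportional to size. The study does not state which procedure was used.

```python
import random

def select_villages_pps(village_sizes, n_villages, rng):
    """Draw villages with probability proportional to size, with replacement.

    village_sizes: dict of village name -> size measure (hypothetical data).
    Drawing with replacement is an assumption made for simplicity, so a
    large village can appear more than once in the result.
    """
    names = list(village_sizes)
    weights = [village_sizes[name] for name in names]
    return rng.choices(names, weights=weights, k=n_villages)

rng = random.Random(2005)  # fixed seed so the sketch is repeatable
sizes = {"village A": 5000, "village B": 500, "village C": 50}  # invented sizes
print(select_villages_pps(sizes, 26, rng))
```

With these invented sizes, village A should dominate the draws, which is the practical meaning of "larger villages had a greater probability of being selected".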

The two stage cluster sampling process described above is referred to as a multistage cluster sampling approach, or simply multistage sampling. In multistage sampling, the resulting sample is obtained in two or more stages, with the nested or hierarchical structure of the members within the population being taken into account. Population members are arranged in clusters. The method is based on the random sampling of clusters at each stage, with the sampled clusters nested within the clusters sampled at the previous stage. In the example above, a two stage multistage sampling approach was used. The first stage involved random sampling of 26 villages within each region. The second stage involved the systematic random sampling of 20 households in each chosen village. The division of the country into regions was seen as stratification and not the first stage of a multistage sampling process (b is false). This is because all regions in the country were included and no random sampling of the regions took place.
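The difference between a stratum and a sampling stage shows up clearly in code: strata are looped over in full, whereas each stage involves a random draw. The following is a minimal sketch on an invented population, not the study's procedure. For brevity, stage 1 uses an equal-probability random sample of villages rather than the size-proportional selection used in the study, and all names and the function `stratified_two_stage_sample` are hypothetical.

```python
import random

def stratified_two_stage_sample(population, n_villages, n_households, rng):
    """population: dict of region -> dict of village -> list of household ids."""
    sample = {}
    for region, villages in population.items():
        # Stratification: every region is visited, so nothing is sampled here.
        # Stage 1: random sample of villages (equal probability in this sketch).
        chosen = rng.sample(sorted(villages), n_villages)
        sample[region] = {}
        for village in chosen:
            households = villages[village]
            # Stage 2: systematic sample with a random start and fixed interval.
            interval = len(households) // n_households
            start = rng.randrange(interval)
            sample[region][village] = households[start::interval][:n_households]
    return sample

# Invented population: 8 regions, 40 villages each, 100 households per village.
population = {
    f"region {r}": {
        f"village {r}-{v}": [f"household {r}-{v}-{h}" for h in range(100)]
        for v in range(40)
    }
    for r in range(1, 9)
}
sample = stratified_two_stage_sample(population, 26, 20, random.Random(1))
total = sum(len(hh) for villages in sample.values() for hh in villages.values())
print(len(sample), total)  # 8 regions, 8 * 26 * 20 = 4160 households
```

The sketch assumes every village has at least as many households as are requested. Only two calls to the random number generator appear per village path, one for each stage, which matches the count of two sampling stages.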

The cluster sampling of villages within each stratum involved the construction of a sampling frame, that is, a list of all villages within each region. However, presumably it was not possible to list all the households in each chosen village. Therefore, households in a village were selected using systematic random sampling, which does not depend on a sampling frame (c is false). This involved selecting a single household in a village at random, with households then chosen at regular intervals thereafter, for example every fifth household. Systematic sampling is typically considered to be a random sampling method, as long as the starting point is random and the sampling interval is determined before sampling takes place.
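Systematic random sampling as described here, with a random start followed by a fixed interval, can be sketched as follows. The function name and the example of 100 households are hypothetical. The input is treated as a stream of households encountered in a fixed order, to reflect that no complete list is needed beforehand.

```python
import random

def systematic_sample(households, interval, rng):
    """Select every `interval`-th household after a random starting point.

    `households` is any iterable visited in a fixed order, so the complete
    list never has to be written down before sampling begins.
    """
    start = rng.randrange(interval)  # random start among the first `interval`
    return [h for position, h in enumerate(households) if position % interval == start]

# Hypothetical walk past 100 households, taking every fifth one.
picked = systematic_sample(range(1, 101), 5, random.Random(7))
print(len(picked))  # 20
```

The interval is fixed before the walk begins and only the starting point is random, which are the two conditions named above for treating the method as random sampling.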

There are two types of sampling methods—probability sampling (also known as random sampling) and non-probability sampling (also known as non-random sampling). By definition, probability sampling methods involve some form of random selection of the population members, with each population member having a known and typically equal probability of being selected. For a non-probability sampling method, the probability of selection for each population member is not known. Although it is debatable, the method of stratified cluster sampling used above is probably best described as a non-probability sampling method. The villages in each region, and the households in each village, were chosen at random. It was possible to count the number of households in each chosen village. However, the number of households in each of the clusters that were not selected was not known. Hence the probability of selection of a household in the population could not be determined. Samples resulting from non-probability sampling methods are generally considered not to be representative of the population. However, there is no reason to think that the sample in the study above was not representative of the population—the sampling approach ensured that the resulting sample was representative of each region.
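The role of the unknown cluster sizes can be made explicit with a worked expression. The following is a sketch under stated assumptions rather than the study's own calculation: the symbols are introduced here for illustration, villages are taken to be selected with probability proportional to a size measure, and the chance of a village being drawn more than once is ignored.

```latex
% Assumed notation (not from the original article):
%   n   = villages selected per region (26 in the study)
%   m   = households selected per village (20 in the study)
%   M_i = size measure of village i,  M = \sum_k M_k over ALL villages in the region
%   H_i = number of households in village i
\[
  \pi_{ij}
  = \underbrace{\frac{n\,M_i}{M}}_{\text{village } i \text{ selected}}
    \times
    \underbrace{\frac{m}{H_i}}_{\text{household } j \text{ selected within village } i}
  = \frac{26\,M_i}{M} \times \frac{20}{H_i}
\]
```

Evaluating this expression requires M, the total size across every village in the region, including those not selected. If the size measure is the number of households (M_i = H_i), the probability reduces to nm/M, which is the same for every household in the region but still depends on the household counts of the unselected villages.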

If a sampling approach involves only a single stage of sampling of clusters it is referred to as cluster sampling. A random sample of clusters from the population is obtained and all members of the selected clusters are included in the resulting sample. After the selection of clusters, no further sampling takes place. Cluster sampling is often used to select participants for a trial, in so called cluster trials. Cluster trials have been described in a previous question.3 However, in such trials the clusters are typically not selected at random from the population but by convenience sampling, that is, by selecting conveniently located clusters. Convenience sampling has been described in a previous question.4
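For contrast with the two stage design, single stage cluster sampling can be sketched as one random draw of clusters followed by inclusion of every member. The schools and pupils below are invented, and the function name `cluster_sample` is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Single stage cluster sampling: pick clusters, then keep every member."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    # No second stage: all members of each chosen cluster enter the sample.
    return {name: list(clusters[name]) for name in chosen}

# Invented example: five schools with differing numbers of pupils.
schools = {f"school {s}": [f"pupil {s}-{p}" for p in range(10 * s)] for s in range(1, 6)}
cluster_result = cluster_sample(schools, 2, random.Random(3))
for name, members in cluster_result.items():
    print(name, len(members))
```

Because whole clusters are taken, the final sample size is not fixed in advance; it depends on which clusters happen to be drawn.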

Footnotes

  • Competing interests: None declared.
