Endgames Statistical Question

Multistage sampling

BMJ 2015; 351 doi: https://doi.org/10.1136/bmj.h4155 (Published 31 July 2015) Cite this as: BMJ 2015;351:h4155
Philip Sedgwick, reader in medical statistics and medical education
Institute for Medical and Biomedical Education, St George’s, University of London, London, UK
Correspondence to: P Sedgwick p.sedgwick{at}sgul.ac.uk

Researchers investigated the views of the British public on the National Cancer Registry’s use of personal medical data for public health research and surveillance without individual consent. A cross sectional study with a face to face survey was performed by the Office for National Statistics. Participants were selected using multistage sampling of adults in the United Kingdom during March and April 2015. In each month a sample of postal districts was selected at random, with the probability of selection proportionate to size. Within each district, a sample of private households was chosen at random. In March, 2762 households were selected, with a further 1819 households in April. At the start of the interview, the interviewer determined the household composition and selected the respondent from among all those aged 16 or over. In households with more than one adult, one person was selected at random. If the person selected was unavailable or declined to be interviewed, this was recorded as a non-response. Face to face interviews were carried out with 1703 (62%) adults in March and 1252 (69%) adults in April. The data were combined for analysis.1

Of the 2955 respondents, 72% (95% confidence interval 70% to 74%) did not consider any of the following to be an invasion of their privacy by the National Cancer Registry: inclusion of postcode, inclusion of name and address, and the receipt of a letter inviting them to a research study on the basis of inclusion in the registry. It was concluded that most of the British public considers the confidential use of personal, identifiable patient information by the National Cancer Registry for the purposes of public health research and surveillance not to be an invasion of privacy.
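The reported response rates and confidence interval can be checked with a short calculation. The sketch below is illustrative only: the study description does not say how the interval was derived, so the normal approximation under simple random sampling is an assumption, and it ignores any weighting or design effect arising from the multistage design. The function names are hypothetical.

```python
import math

# Figures taken from the study description above
selected = {"March": 2762, "April": 1819}
interviewed = {"March": 1703, "April": 1252}


def response_rate(month):
    """Proportion of selected households that yielded an interview."""
    return interviewed[month] / selected[month]


def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion.

    Assumption: treats the respondents as a simple random sample.
    """
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


total_respondents = sum(interviewed.values())
lower, upper = wald_ci(0.72, total_respondents)

for month in selected:
    print(f"{month}: response rate {response_rate(month):.0%}")
print(f"n = {total_respondents}, 95% CI {lower:.1%} to {upper:.1%}")
```

Under these assumptions the interval rounds to 70% to 74%, in agreement with the published figures.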

Which of the following statements, if any, are true?

  • a) The sampling technique constituted a multistage sampling method with three stages

  • b) Multistage sampling meant that resources could be concentrated in a limited number of areas of the country

  • c) By definition, multistage sampling constitutes probability sampling

Answers

Statements a, b, and c are all true.

The aim of the study was to seek the views of the British public on the use of personal medical data by the National Cancer Registry for the purposes of public health research and surveillance without individual consent. A cross sectional study design, described in a previous question,2 was used. Cross sectional studies are observational in design; the investigators do not intervene in any way but simply record the health, behaviour, attitudes, or lifestyle choices of the study participants. As the name suggests, the purpose of using a cross sectional study design is to obtain a representative sample by taking a cross section of the population. The sample in the above study was obtained using multistage sampling of adults in the UK. The study was run in two successive months, and a different sample was obtained in each month.

Multistage sampling entails two or more stages of random sampling based on the hierarchical structure of natural clusters within the population. Clusters are natural groupings of people—for example, electoral wards, general practices, schools, or households. A different type of cluster is randomly sampled at each stage, with the clusters nested within each other at successive stages. The final stage of sampling involves choosing a random sample of people in the clusters selected at the penultimate stage. Multistage sampling methods can be used to recruit participants in experimental or observational studies.

In the above study, the sample was obtained using multistage sampling with three stages (a is true). The first stage involved a random sample of postal districts in the UK, with the probability of selection proportionate to size. Therefore, postal districts with a greater number of households had a larger probability of selection. In the second stage, a random sample of households nested within the selected postal districts was obtained. At the final stage, an adult was selected at random within each household and invited to take part in the study. The postal districts are referred to as the first stage units, the households as the second stage units, and the adults living within the households as the third stage units. It would have been necessary to construct a sampling frame of postal districts—a list of all postal districts in the UK—and then sampling frames listing all private households in each selected postal district. The researchers did not indicate how many postal districts were selected at the first stage, although the number would have been fixed in advance.
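The three stages described above can be sketched in code. This is a minimal illustration on an invented population, not the procedure used by the Office for National Statistics: the population, the numbers of districts and households, and the function names are all hypothetical. For simplicity, districts are drawn with replacement and duplicates are collapsed, which only approximates selection with probability proportionate to size.

```python
import random


def build_population(rng, n_districts=50):
    """Hypothetical population: district -> households -> adult identifiers."""
    population = {}
    for d in range(n_districts):
        n_households = rng.randint(20, 200)
        population[f"district_{d}"] = [
            [f"d{d}_h{h}_a{a}" for a in range(rng.randint(1, 4))]
            for h in range(n_households)
        ]
    return population


def multistage_sample(population, n_districts, n_households, rng):
    districts = list(population)
    sizes = [len(population[d]) for d in districts]
    # Stage 1: districts drawn with probability proportionate to the
    # number of households (with replacement; duplicates collapsed).
    chosen = set(rng.choices(districts, weights=sizes, k=n_districts))
    sample = []
    for d in sorted(chosen):
        # Stage 2: simple random sample of households within the district.
        k = min(n_households, len(population[d]))
        for adults in rng.sample(population[d], k):
            # Stage 3: one adult chosen at random within the household.
            sample.append((d, rng.choice(adults)))
    return sample


rng = random.Random(2015)
population = build_population(rng)
sample = multistage_sample(population, n_districts=10, n_households=15, rng=rng)
districts_in_sample = {d for d, _ in sample}
print(len(sample), "adults from", len(districts_in_sample), "districts")
```

Each tuple in `sample` pairs a first stage unit (the district) with a third stage unit (the adult); the second stage unit (the household) is implicit in the adult's identifier.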

Multistage sampling was used in preference to simple random sampling because the population was geographically dispersed. Simple random sampling would have involved obtaining a random sample of a fixed size from a sampling frame, that is, a list of all adults in the UK population. Each adult in the population would have had the same probability of selection. However, the selected adults would have been scattered across the whole country, making it impractical and too expensive to interview them face to face. Multistage sampling meant that resources and efforts could be concentrated in a limited number of areas of the country, because the sampling of households was undertaken only in those postal districts selected at the first stage (b is true).
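The practical advantage can be illustrated by counting how many districts interviewers would need to visit under each approach. The sketch below uses an invented population of 200 districts; all sizes and sample counts are hypothetical, and the household stage is omitted to keep the comparison short.

```python
import random

rng = random.Random(1)

# Hypothetical population: 200 districts of 50 to 500 adults each.
district_sizes = {d: rng.randint(50, 500) for d in range(200)}
adults = [(d, i) for d, size in district_sizes.items() for i in range(size)]

# Simple random sampling: every adult has the same probability of selection.
srs = rng.sample(adults, 300)
srs_districts = {d for d, _ in srs}

# Multistage alternative: up to 10 districts chosen with probability
# proportionate to size (with replacement, duplicates collapsed),
# then 30 adults chosen at random within each selected district.
districts = list(district_sizes)
weights = [district_sizes[d] for d in districts]
chosen = set(rng.choices(districts, weights=weights, k=10))
multistage = [
    (d, i)
    for d in sorted(chosen)
    for i in rng.sample(range(district_sizes[d]), 30)
]
multistage_districts = {d for d, _ in multistage}

print("districts visited, simple random sample:", len(srs_districts))
print("districts visited, multistage sample:", len(multistage_districts))
```

The simple random sample is spread over far more districts than the multistage sample, which is confined to the districts selected at the first stage.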

Multistage sampling and cluster sampling are often confused. As described above, multistage sampling is based on the hierarchical structure of natural clusters within the population. A different type of cluster is randomly sampled at each stage, with the clusters nested within each other at successive stages. The final stage of sampling is to choose a random sample of people in the clusters selected at the penultimate stage. Cluster sampling, described in a previous question,3 involves a random sample of clusters from the population. All those people in the selected clusters are then invited to be in the sample. Cluster sampling does not involve successive stages of sampling based on the hierarchical structure of natural clusters in the population. Cluster sampling can be time consuming, expensive, and impractical, not least because the clusters will be geographically diverse. Therefore, cluster sampling sometimes uses a random sample of clusters from a conveniently selected geographical region in the population. Cluster sampling is sometimes combined with stratified sampling, whereby a fixed number of clusters are selected at random from a stratum within the population. The strata are often based on, for example, geographical regions or age groups. An example is given in a previous question.4
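The distinction between the two methods can be made concrete with a small sketch. The general practices, patient lists, and function names below are hypothetical; the only point illustrated is that cluster sampling invites everyone in the selected clusters, whereas a two stage design takes a further random sample within them.

```python
import random

rng = random.Random(7)

# Hypothetical clusters: 40 general practices with 30 to 120 patients each.
practices = {
    f"practice_{p}": [f"p{p}_patient_{i}" for i in range(rng.randint(30, 120))]
    for p in range(40)
}


def cluster_sample(clusters, n_clusters, rng):
    """One stage: random sample of clusters; everyone in them is invited."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for c in chosen for person in clusters[c]]


def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Two stages: random sample of clusters, then of people within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [
        person
        for c in chosen
        for person in rng.sample(clusters[c], n_per_cluster)
    ]


cluster = cluster_sample(practices, 5, rng)
two_stage = two_stage_sample(practices, 5, 20, rng)
print("cluster sample size:", len(cluster))
print("two stage sample size:", len(two_stage))
```

The size of the cluster sample depends on which practices happen to be selected, whereas the two stage sample has a size fixed in advance (here 5 × 20 = 100).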

Two types of sampling methods can be used to recruit participants to a study—random sampling (sometimes called probability sampling) and non-random sampling (sometimes called non-probability sampling). Random sampling involves some form of random selection of the population members. Simple random sampling (sometimes referred to simply as random sampling), described above, is the most straightforward type of probability sampling. By definition, multistage sampling constitutes probability sampling (c is true). As described above, multistage sampling entails two or more stages of random sampling based on the hierarchical structure of natural clusters within the population. The final stage of sampling involves choosing a random sample of people in the clusters selected at the penultimate stage.
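One consequence of random selection at every stage is that each adult's overall probability of selection can be calculated as the product of the stage probabilities. The sketch below assumes a design like the one in the study, with the added assumption (not stated in the source) that a fixed number of households is taken in each selected district. All numbers are hypothetical, and the first stage probability uses the usual approximation for selection proportionate to size.

```python
from fractions import Fraction


def inclusion_probability(n_districts, district_households, total_households,
                          households_per_district, adults_in_household):
    """Overall selection probability for one adult under a three stage design."""
    # Stage 1: probability proportionate to size, approximated as
    # n * M_d / M (valid when this value does not exceed 1).
    p_district = Fraction(n_districts * district_households, total_households)
    # Stage 2: equal-probability sample of households within the district.
    p_household = Fraction(households_per_district, district_households)
    # Stage 3: one adult chosen at random within the household.
    p_adult = Fraction(1, adults_in_household)
    return p_district * p_household * p_adult


# Hypothetical figures: 100 districts selected from a frame of 1 000 000
# households, with 25 households taken in each selected district.
p_small = inclusion_probability(100, 2_000, 1_000_000, 25, 2)
p_large = inclusion_probability(100, 8_000, 1_000_000, 25, 2)
p_single = inclusion_probability(100, 2_000, 1_000_000, 25, 1)

print("two adult household, small district:", p_small)
print("two adult household, large district:", p_large)
print("single adult household, small district:", p_single)
```

Under these assumptions district size cancels out, so households in small and large districts are equally likely to be selected, while an adult living alone is twice as likely to be interviewed as an adult in a two person household.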

Footnotes

  • Competing interests: None declared.
