Cluster sampling

Philip Sedgwick

doi:10.1136/bmj.g1215

Endgames Statistical Question


BMJ 2014; 348 doi: https://doi.org/10.1136/bmj.g1215 (Published 31 January 2014) Cite this as: BMJ 2014;348:g1215


Re: Cluster sampling

Dr. Sedgwick says (1):

"Clusters are natural groupings of people—for example, electoral wards, general practices, and schools. Cluster sampling involves obtaining a random sample of clusters from the population, with all members of each selected cluster invited to participate. It is necessary to construct a sampling frame listing all clusters in the population. A sample of a fixed number of clusters is selected at random from this list. Each cluster has the same probability of being selected, independently of all others. However, if the size of clusters varies then the probability of selection may be proportional to the size of the cluster, with larger clusters having a larger probability of selection.1"

The last sentence, in my opinion, does not apply here in randomized cluster trials. Dr. Sedgwick himself says so in the sentence before it ("Each cluster has the same probability of being selected, independently of all others").

Here we list all the clusters in a sampling frame and select clusters from that list, as the quoted paragraph describes.

I object because, unless we select the clusters according to population size, the sampling will not be Probability Proportional to Size (PPS). An example of PPS is the WHO-designed "Two-Stage" 30-Cluster Sampling for evaluating EPI (Expanded Programme on Immunization) coverage, briefly described below:

First, a list is made of all the clusters (villages in the district), with the population size written against each village. In the next column, the cumulative population size is written.

The sampling interval for a district is calculated using the formula:
Sampling interval = Total cumulative population of all the listed clusters in the district / 30
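
As a minimal sketch of the listing step and the interval calculation, the snippet below builds the cumulative column and divides the total by 30. The village names and populations are hypothetical, chosen only for illustration, and the use of integer division is an assumption, since the description does not say how the interval is rounded.

```python
from itertools import accumulate

# Hypothetical village populations (illustrative values, not survey data).
villages = {"A": 1200, "B": 300, "C": 4500, "D": 800, "E": 2200}

N_CLUSTERS = 30  # the EPI design described here uses 30 clusters

# Cumulative population, in the order the villages are listed.
cumulative = list(accumulate(villages.values()))

# Sampling interval = total cumulative population / number of clusters.
# Integer division is an assumption; the text does not specify rounding.
sampling_interval = cumulative[-1] // N_CLUSTERS

for (name, pop), cum in zip(villages.items(), cumulative):
    print(f"{name}: population={pop}, cumulative={cum}")
print("sampling interval:", sampling_interval)
```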

A random number less than or equal to the sampling interval is then selected. It must have the same number of digits as the sampling interval: if the sampling interval is a five-digit number, the random number must also be written with five digits and lie between 00001 and the sampling interval. The first cluster is located in the first village whose cumulative population equals or exceeds the random number. The village in which Cluster 2 is located is identified by the number: Random number + Sampling interval. It is the first village whose cumulative population equals or exceeds this number. Likewise, the other clusters are identified using the formula provided below.
Number to identify Cluster 3 and onwards = Number which identifies the location of the previous cluster + Sampling interval
Using the above formula, the number for each cluster (1, 2, 3, 4, 5, etc.) is noted beside the first village whose cumulative population equals or exceeds the calculated number. A single village or town may contain more than one cluster. This is what makes the design PPS: successive numbers are one sampling interval apart, and a bigger village may have a population large enough to contain several of them.
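
The selection procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the WHO reference implementation: the function name `pps_systematic`, the integer rounding of the interval, and the village populations are all hypothetical.

```python
import random
from bisect import bisect_left
from collections import Counter
from itertools import accumulate


def pps_systematic(populations, n_clusters, rng=random):
    """Return, for each cluster, the index of the village that hosts it."""
    cumulative = list(accumulate(populations))
    # Integer division is an assumption; the text does not specify rounding.
    interval = cumulative[-1] // n_clusters
    if interval < 1:
        raise ValueError("total population is smaller than the number of clusters")
    # Random start between 1 and the sampling interval, inclusive.
    start = rng.randint(1, interval)
    picks = []
    for k in range(n_clusters):
        # Each number is the previous one plus the sampling interval.
        target = start + k * interval
        # First village whose cumulative population equals or exceeds target.
        picks.append(bisect_left(cumulative, target))
    return picks


# Hypothetical populations for five villages (illustrative only).
populations = [1200, 300, 4500, 800, 2200]
picks = pps_systematic(populations, 30, random.Random(0))
print(Counter(picks))
```

With these hypothetical figures the sampling interval is 300, so the largest village (4,500 people, index 2) hosts 15 of the 30 clusters whatever the random start. This shows how bigger villages receive proportionally more clusters under this scheme.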

Cluster sampling allows an adequate sample size and is cost-effective. All 30 clusters are surveyed within a restricted period of time, ideally within one week. This is necessary to ensure that they accurately represent the same population. The theory behind cluster sampling is statistically valid but complex.

References:

1. Sedgwick P. Cluster sampling. BMJ 2014;348:g1215.

Competing interests: No competing interests

21 May 2014

Neeru Gupta

Scientist E

Indian Council of Medical Research

Ansari Nagar, New Delhi-110029
