
Endgames Statistical Question

What is sampling error?

BMJ 2012; 344 doi: https://doi.org/10.1136/bmj.e4285 (Published 27 June 2012) Cite this as: BMJ 2012;344:e4285
Philip Sedgwick, senior lecturer in medical statistics
Centre for Medical and Healthcare Education, St George’s, University of London, Tooting, London, UK
p.sedgwick{at}sgul.ac.uk

Researchers evaluated the effectiveness of a newly developed behavioural intervention designed to prevent weight gain and improve health related behaviours in women with young children. A cluster randomised controlled trial was used. The intervention group received four interactive group sessions that involved simple health messages, strategies to change behaviour, and group discussion, in addition to monthly support using mobile telephone text messages for 12 months. The control group received one non-interactive information session based on population guidelines on diet and physical activity.1

In total, 250 women aged 25-51 years with a child attending one of 12 primary schools were recruited and randomised by school cluster to the intervention group (n=127) or control group (n=123). The main outcome was weight change from baseline at 12 months. After one year, women in the intervention group had lost on average 0.20 kg (standard deviation 3.66 kg, standard error 0.35 kg), compared with an average weight gain of 0.83 kg (standard deviation 3.69 kg, standard error 0.36 kg) for women in the control group. The researchers reported a significant difference between treatment groups in mean weight change at 12 months (intervention minus control; mean difference −1.13 kg, 95% confidence interval −2.03 to −0.24). The mean difference between treatment groups had been adjusted for cluster and weight at baseline.

The accuracy of the sample means as estimates of the population parameters may have been influenced by sampling error. Which of the following statements, if any, are true?

  • a) Sampling error would have occurred because a sample was taken from the population

  • b) Sampling error was quantified by the standard deviation of weight change at 12 months

  • c) Sampling error was quantified by the standard error of weight change at 12 months

Answers

Answers a and c are true, whereas b is false.

In the example above, the researchers evaluated the effectiveness of a newly developed behavioural intervention in preventing weight gain in women with young children. The intervention was compared with a control treatment. For each treatment group, the mean weight change at 12 months was an estimate of the population parameter for that treatment (intervention or control); that is, the mean weight change that would be observed in the population of all women with young children if they received that particular treatment. The sample mean difference in weight change between treatments at 12 months was an estimate of the population parameter of the difference in mean weight change between treatments.

Although it is hoped that a sample estimate is similar in size to the population parameter, the two are unlikely to be exactly the same. Sampling error is the difference between the sample estimate and the population parameter. It arises because the estimate is based on a sample of individuals rather than on the whole population (a is true). In general, sampling error gets smaller as the sample size increases because the sample more accurately represents the population. Sampling error is further controlled if, as well as being large, the sample is chosen at random from the population. In the example above, women were recruited using convenience sampling, with mothers identified through one of 12 primary schools that their children attended. It is therefore difficult to judge how representative these clusters of mothers were of the population. Random sampling across the entire population would be expected to produce a more representative sample.
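The relation between sample size and sampling error can be illustrated with a small simulation. This is a minimal sketch, not part of the trial analysis: the population is simulated (the mean of 0.3 kg and standard deviation of 3.7 kg are arbitrary assumptions), and the function name `mean_abs_sampling_error` is hypothetical. It draws repeated simple random samples and reports the average absolute gap between the sample mean and the known population mean.

```python
import random
import statistics


def mean_abs_sampling_error(sample_size, population, repeats=1000, seed=0):
    """Average absolute difference between sample mean and population mean
    over repeated simple random samples of the given size."""
    rng = random.Random(seed)
    population_mean = statistics.fmean(population)
    gaps = []
    for _ in range(repeats):
        # Simple random sample without replacement from the population.
        sample = rng.sample(population, sample_size)
        gaps.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(gaps)


# Hypothetical population of weight changes (kg); simulated, not trial data.
_rng = random.Random(42)
population = [_rng.gauss(0.3, 3.7) for _ in range(20000)]

for n in (25, 100, 400):
    print(n, round(mean_abs_sampling_error(n, population), 3))
```

In a real study the population mean is unknown, so this gap cannot be computed; the simulation only works because the population has been constructed.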

For each treatment group, the accuracy of the sample mean weight change at 12 months as an estimate of the population parameter was quantified by the standard error of the mean (c is true). Because the population parameters are unknown, sampling error cannot be measured directly and is a theoretical concept. The standard error of the mean (SEM), sometimes shortened to standard error (SE), is instead derived from the sample data. Generally, the larger the standard error, the less accurate the sample estimate. The standard error is obtained by dividing the sample standard deviation of the change in weight at 12 months by the square root of the sample size. Therefore, as sample size increases the standard error will generally decrease. This is intuitive because as the sample size approaches that of the population, the sample mean becomes closer in value to the population mean. Although not presented, the standard error for the sample difference between treatments in mean weight change at 12 months was derived; this was used to calculate the 95% confidence interval for the population parameter of the difference in mean weight change between treatments at 12 months. The 95% confidence interval, described in a previous question,2 quantifies the uncertainty in the sample difference between treatments in mean weight change at 12 months as an estimate of the population parameter.
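The arithmetic behind the standard error can be sketched as follows. The function names `standard_error` and `se_from_ci` are hypothetical. Two assumptions should be noted. First, the sample sizes used are the numbers randomised (127 and 123); the results come out slightly below the reported standard errors of 0.35 kg and 0.36 kg, and the text above does not say why (one possibility, which is only a guess, is that fewer women provided weight data at 12 months). Second, the standard error of the difference is back-calculated from the reported 95% confidence interval using a normal multiplier of 1.96; the trial's adjusted analysis may have used a different multiplier.

```python
import math


def standard_error(sd, n):
    """Standard error of a mean: sample standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)


def se_from_ci(lower, upper, z=1.96):
    """Back-calculate a standard error from a 95% confidence interval,
    assuming the interval is estimate +/- z * SE (normal approximation)."""
    return (upper - lower) / (2 * z)


# Standard deviations and group sizes as reported in the example above.
se_intervention = standard_error(3.66, 127)
se_control = standard_error(3.69, 123)

# Reported 95% CI for the adjusted mean difference: -2.03 to -0.24 kg.
se_difference = se_from_ci(-2.03, -0.24)

print(f"SE, intervention group: {se_intervention:.2f} kg")
print(f"SE, control group:      {se_control:.2f} kg")
print(f"Approximate SE of the difference: {se_difference:.2f} kg")
```

Because the sample size enters through its square root, quadrupling the sample size halves the standard error for the same standard deviation.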

Standard deviation and standard error are often confused. Standard deviation does not provide a measure of sampling error (b is false). The standard deviations presented describe the variation in weight change at 12 months between mothers in each treatment group. The sample standard deviation of weight change at 12 months can be used to calculate a series of ranges in weight change that contain certain proportions of the members of each treatment group.3
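The contrast between the two measures can be made concrete with the intervention group's figures. This is an illustrative sketch only: it assumes weight change is approximately normally distributed, so that the mean ± 1.96 standard deviations covers about 95% of group members, and neither interval below is reported in the example above. The helper name `interval` is hypothetical.

```python
def interval(centre, spread, z=1.96):
    """Return (lower, upper) for centre +/- z * spread."""
    return (centre - z * spread, centre + z * spread)


# Intervention group, as reported: mean change -0.20 kg, SD 3.66 kg, SE 0.35 kg.
mean_change = -0.20

# Range expected to contain about 95% of individual weight changes
# (uses the standard deviation; assumes approximate normality).
individual_range = interval(mean_change, 3.66)

# Approximate 95% confidence interval for the population mean
# (uses the standard error).
mean_ci = interval(mean_change, 0.35)

print("Individuals: {:.2f} to {:.2f} kg".format(*individual_range))
print("Mean:        {:.2f} to {:.2f} kg".format(*mean_ci))
```

The first range describes spread between women and does not narrow as more women are recruited; the second describes uncertainty in the mean and does.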

