Endgames Statistical Question

# Populations and samples

BMJ 2012; 344 (Published 02 May 2012) Cite this as: BMJ 2012;344:e3048
Philip Sedgwick, senior lecturer in medical statistics

Centre for Medical and Healthcare Education, St George’s, University of London, Tooting, London, UK

p.sedgwick@sgul.ac.uk

Researchers investigated the association between use of proton pump inhibitors and risk of hip fracture in postmenopausal women. In total, 79 899 women enrolled in the Nurses’ Health Study, a prospective cohort study initiated in the United States, provided data biennially from 2000 until June 2008. The primary outcome was time until first hip fracture. The incidence rate of hip fracture was higher among regular users of proton pump inhibitors than among non-users (2.02 versus 1.51 fractures per 1000 person years) [1].

Which of the following statements, if any, are true?

• a) The statistical population was the population of the United States.

• b) The incidence rates of hip fracture among regular users and non-users of proton pump inhibitors were point estimates.

• c) The incidence rates of hip fracture were subject to sampling error.

Statements b and c are true, while a is false.

In statistics, a population is the entire group of people that a study aims to investigate. The population can have a finite or infinite number of members, who will have at least one characteristic in common. Typically it is too costly and labour intensive or perhaps not possible to study the entire population, and therefore data about a sample of members are collected. So that inferences can be made about the population on the basis of the sample, it is essential that the characteristics of the sample members are representative of those in the population. There is often confusion as to whom the study population represents, most likely because the concept of a population has different meanings in statistics and general everyday usage.

The Nurses’ Health Study was established in the United States in 1976. All registered nurses aged 30 to 55 years living in one of the 11 most populous states were invited to participate. Of 170 000 nurses contacted, 121 700 responded and were enrolled into the cohort. Subsequently, cohort members were sent questionnaires every two years.

In the example above, the study population may not be readily identifiable. The association between use of proton pump inhibitors and risk of hip fracture in postmenopausal women was investigated. The study population would not have been the general population, and neither would it have been all women living in the United States (a is false). The study population would have been postmenopausal women, because the aim of the study was to make inferences about them. Although the original cohort was nurses living in the 11 most populous states, it is unlikely that their use of proton pump inhibitors or risk of hip fracture later in life would differ from those of postmenopausal women living elsewhere in the country. There is no obvious reason why the study population should be restricted geographically, and it would seem appropriate that it includes all postmenopausal women in the US. Whether the study population could be extended internationally depends in part on whether exposure to the risk factor (use of proton pump inhibitors) and the occurrence of hip fractures are similar in other countries. More generally, if the intention of the study was to make inferences about postmenopausal women outside the study period, then the study population is effectively a hypothetical one.

Incidence rates have been described in a previous question [2]. The incidence rates of hip fracture among regular users and non-users of proton pump inhibitors in the population during 2000 to 2008 are known as population parameters. The population parameters are effectively constant, yet unknown, and are estimated by the sample incidence rates. The sample estimates of the population parameters are also known as point estimates (b is true), as they take a single value, as opposed to a confidence interval, which is an interval estimate for a population parameter and takes a range of values.
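The distinction between a point estimate and an interval estimate can be illustrated with a short Python sketch. The event count and person years below are hypothetical values chosen for illustration, not data from the Nurses’ Health Study, and the interval uses a common large-sample approximation on the log scale, which is an assumption here rather than the method used in the study.

```python
import math


def incidence_rate_with_ci(events, person_years, per=1000, z=1.96):
    """Return (point estimate, lower limit, upper limit) for an incidence rate.

    The point estimate is a single value: events / person years, scaled to
    `per` person years. The interval is an approximate 95% confidence
    interval based on the log-scale standard error 1 / sqrt(events).
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must be positive")
    rate = per * events / person_years
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


# Hypothetical sample: 200 hip fractures over 100 000 person years of follow-up.
rate, lower, upper = incidence_rate_with_ci(events=200, person_years=100_000)
print(f"Point estimate: {rate:.2f} per 1000 person years")
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f} per 1000 person years")
```

The point estimate is one number, whereas the interval gives a range of values within which the population parameter plausibly lies.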

The sample of postmenopausal women followed up from 2000 to 2008 may not have been totally representative of the study population. Because only part of the population was studied, the sample estimates are subject to sampling error (c is true): they may not equal the population parameters exactly.
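Sampling error can be seen in a small simulation, sketched here in Python under simplifying assumptions. The population size, number of events, and sample size are hypothetical, and each member is assumed to contribute exactly one person year, so the rate reduces to the proportion with an event. The population parameter is fixed, yet each random sample gives a slightly different estimate of it.

```python
import random
import statistics


def simulate_sample_rates(pop_size, pop_events, sample_size, n_samples, seed=0):
    """Draw repeated random samples and return each sample's rate per 1000."""
    rng = random.Random(seed)
    # 1 = member had the event, 0 = member did not (hypothetical population).
    population = [1] * pop_events + [0] * (pop_size - pop_events)
    rates = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        rates.append(1000 * sum(sample) / sample_size)
    return rates


POP_SIZE = 100_000   # hypothetical population size
POP_EVENTS = 200     # hypothetical number of events in the population
true_rate = 1000 * POP_EVENTS / POP_SIZE  # the population parameter

rates = simulate_sample_rates(POP_SIZE, POP_EVENTS, sample_size=5_000, n_samples=200)
print(f"Population parameter: {true_rate:.2f} per 1000")
print(f"Sample estimates range from {min(rates):.2f} to {max(rates):.2f}")
print(f"Mean of sample estimates: {statistics.mean(rates):.2f}")
```

The individual sample estimates scatter around the population parameter, which is the sampling error described above. A larger sample size would narrow that scatter.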
