Analysis of longitudinal studies
BMJ 2013; 346 doi: https://doi.org/10.1136/bmj.f363 (Published 18 January 2013)
Cite this as: BMJ 2013;346:f363
- Philip Sedgwick, reader in medical statistics and medical education (1)
- Louise Marston, research statistician (2)
- (1) Centre for Medical and Healthcare Education, St George’s, University of London, Tooting, London, UK
- (2) Department of Primary Care and Population Health, University College London, Royal Free Campus, London, UK
A randomised controlled trial investigated whether a low glycaemic index diet in pregnancy reduced the incidence of macrosomic (large for gestational age) infants. Participants were an at-risk group: women without diabetes, in their second pregnancy, who had previously given birth to an infant weighing more than 4000 g. The intervention consisted of a low glycaemic index diet from early pregnancy; the control was no dietary intervention. The primary outcome measure was birth weight, and the secondary outcome was gestational weight gain from baseline (measured at 24, 28, 34, and 40 weeks). Treatment groups were compared at each gestational age using the independent samples t test.1
The researchers reported that a low glycaemic index diet in pregnancy did not reduce the incidence of large for gestational age infants in women at risk of fetal macrosomia. It did, however, have a significant effect on gestational weight gain. At each measured gestational age, except for 24 weeks, the intervention group gained significantly less weight than the control group.
Which of the following statements, if any, are true?
a) Measurements of gestational weight gain for each woman were independent of each other
b) When testing for differences between treatment groups in gestational weight gain at all four time points, the probability of a type I error occurring was 5%
c) It can be concluded that there was a significant difference between treatments in gestational weight gain across pregnancy
Statements a, b, and c are all false.
The primary aim of the trial was to investigate the effect of a low glycaemic index diet in pregnancy, compared with no intervention, on the incidence of macrosomia. Participants were women at risk of fetal macrosomia. The secondary outcome was the difference in gestational weight gain, measured at 24, 28, 34, and 40 weeks. Because women were followed throughout pregnancy, the trial had a longitudinal design, and the measurements of weight gain are called repeated measures or serial data. Treatment groups were compared at each time point using the independent samples t test, which has been described in a previous question.2 However, this approach presents statistical problems.
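To make the repeated measures structure concrete, the minimal Python sketch below stores serial data in "long" format, with one row per woman per visit. The `Visit` record, identifiers, and weight gain values are hypothetical and invented for illustration; they are not data from the trial.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record: one row per woman per antenatal visit ("long" format).
@dataclass(frozen=True)
class Visit:
    woman_id: str   # illustrative identifier, not from the trial
    group: str      # "low GI diet" or "control"
    week: int       # gestational age at measurement (weeks)
    gain_kg: float  # weight gain from baseline (made-up values)

visits = [
    Visit("w01", "low GI diet", 24, 3.1),
    Visit("w01", "low GI diet", 28, 5.0),
    Visit("w01", "low GI diet", 34, 7.9),
    Visit("w01", "low GI diet", 40, 10.2),
    Visit("w02", "control", 24, 4.0),
    Visit("w02", "control", 28, 6.4),
    Visit("w02", "control", 34, 9.6),
    Visit("w02", "control", 40, 12.5),
]

# Each woman contributes several rows, so the rows are serial data on the
# same person rather than independent observations.
rows_per_woman = Counter(v.woman_id for v in visits)
print(rows_per_woman)  # Counter({'w01': 4, 'w02': 4})
```

Counting rows per woman is a simple check that the data set contains repeated measures: eight rows here come from only two women.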
The serial observations on gestational weight gain would not have been independent of each other but would have been correlated with one another (a is false). If two measurements are independent of each other, knowing the value of one provides no information about the other. This would not have been true for gestational weight gain. A woman’s weight gain at, for example, 24 weeks would be correlated with her weight gain at 28 weeks; that is, her weight gain at 24 weeks would help predict her weight gain at 28 weeks.
Treatment groups were compared in gestational weight gain at 24, 28, 34, and 40 weeks using an independent samples t test. The purpose of each test was to make inferences about the population based on the sample. However, if the sample was not representative of the population, errors could have occurred in hypothesis testing. Two types of error were possible, type I and type II, described in a previous question.3 A type I error would have occurred if the null hypothesis was incorrectly rejected in favour of the alternative; that is, if a difference in gestational weight gain between treatment groups was seen in the trial but did not exist in the population. A type II error would have occurred if the null hypothesis was not rejected when it should have been; that is, if a difference between treatments in gestational weight gain existed in the population but was not seen in the trial. Type I and II errors arise because of sampling error, that is, because only a sample of the population was studied.
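The 5% type I error rate of a single test can be checked by simulation. The sketch below repeatedly draws two groups from the same population, so the null hypothesis is true and any observed difference is sampling error, and counts how often a two-sided independent samples t test at the 5% level rejects. To stay within the Python standard library it compares the pooled-variance t statistic with a normal critical value, an approximation that is close for the assumed 100 women per group; the population mean and standard deviation are invented.

```python
import math
import random
import statistics

def t_statistic(a, b):
    """Pooled-variance (Student) t statistic for two independent samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def type_i_error_rate(n_sims=1000, n_per_group=100, alpha=0.05, seed=2):
    rng = random.Random(seed)
    # Two-sided critical value from the normal distribution: an approximation
    # to the t distribution with 198 degrees of freedom (assumption).
    crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        # Both groups come from the same hypothetical population.
        a = [rng.gauss(10.0, 3.0) for _ in range(n_per_group)]
        b = [rng.gauss(10.0, 3.0) for _ in range(n_per_group)]
        if abs(t_statistic(a, b)) > crit:
            rejections += 1  # a type I error: rejecting a true null
    return rejections / n_sims

rate = type_i_error_rate()
print(f"estimated type I error rate: {rate:.3f}")
```

The estimated rate should land close to 0.05, illustrating that the significance level is the probability of a type I error for one test.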
As described in a previous question,3 the probability of a type I error for a single hypothesis test equals the significance level, here 0.05; however, when multiple hypothesis tests are performed, the probability that at least one of them produces a type I error increases (b is false). Care must be taken when research papers report a large number of statistical tests, because some of the significant results may be type I errors, and we cannot know which ones. The researchers in this example reported many tests comparing treatment groups, not just those for gestational weight gain at 24, 28, 34, and 40 weeks. Because the measurements of gestational weight gain were not independent of each other, the significance tests performed at each gestational age would not have been independent either. If one of the tests was significant, the tests at the other time points would probably be significant too, simply because the measurements of gestational weight gain were correlated; this would further increase the probability of a type I error.
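For a rough sense of scale, the sketch below computes the probability of at least one type I error when k tests are each performed at significance level alpha, using 1 - (1 - alpha)^k. This formula assumes the tests are independent, which was not the case for the correlated measurements in the trial, so the figures are a reference point only. With four tests at the 5% level it gives about 0.185.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one type I error across k independent tests,
    each performed at significance level alpha."""
    return 1 - (1 - alpha) ** k

# How the chance of at least one false positive grows with the number of tests.
for k in (1, 4, 10, 20):
    print(k, round(familywise_error(0.05, k), 3))
```

With 20 independent tests the chance of at least one spurious significant result rises to about 0.64.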
The fact that the measurements of gestational weight gain were correlated may have resulted in spurious significant differences between treatment groups at each of the gestational ages. Undertaking four separate statistical tests did not permit an overall comparison between treatment groups across pregnancy. Therefore, it cannot be concluded that there was a significant difference between treatments in gestational weight gain across pregnancy (c is false). A more sophisticated statistical approach, such as a longitudinal analysis using a multilevel model or generalised estimating equations, would have been needed to compare treatment groups in gestational weight gain across pregnancy using the measurements from all gestational ages together. Such analyses would investigate the change in gestational weight gain during pregnancy while accounting for the non-independence of the repeated measurements. Although the researchers in this example constructed a statistical linear model that examined total weight gain and controlled for weight at baseline, they presented the results of separate statistical tests at each gestational age, which may have been misleading because this ignored the statistical issues described above.
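As an illustration of the kind of longitudinal analysis described above, the sketch below fits a linear mixed (multilevel) model with a random intercept for each woman, using statsmodels on simulated data. The data-generating model, effect sizes, and column names (`woman`, `treatment`, `week`, `gain`) are hypothetical; this is not the trial's analysis. The random intercept accounts for the correlation between a woman's repeated measurements, and the `treatment:week` coefficient compares the rate of weight gain between groups across pregnancy in a single model rather than four separate tests.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
weeks = np.array([24, 28, 34, 40])
rows = []
for woman in range(200):
    treatment = woman % 2  # 1 = hypothetical low GI diet, 0 = control
    # Woman-specific offset shared by all her visits (induces correlation).
    woman_effect = rng.normal(0.0, 1.5)
    for week in weeks:
        # Assumed truth: gain rises with week, slightly less steeply on the diet.
        gain = (0.45 * (week - 12) - 0.05 * treatment * (week - 12)
                + woman_effect + rng.normal(0.0, 1.0))
        rows.append({"woman": woman, "treatment": treatment,
                     "week": int(week), "gain": gain})
df = pd.DataFrame(rows)

# Random intercept per woman; fixed effects for treatment, week and their interaction.
model = smf.mixedlm("gain ~ treatment * week", data=df, groups=df["woman"])
result = model.fit()
print(result.summary())
```

statsmodels also provides `smf.gee` for the generalised estimating equations approach mentioned above; which method suits a given trial depends on its design and questions.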
Competing interests: None declared.