Unit of observation versus unit of analysis
BMJ 2014; 348 doi: https://doi.org/10.1136/bmj.g3840 (Published 13 June 2014) Cite this as: BMJ 2014;348:g3840
- Philip Sedgwick, reader in medical statistics and medical education
Researchers investigated the effects of a school based educational programme aimed at reducing the consumption of carbonated drinks to prevent weight gain in children aged 7-11 years. A cluster randomised controlled trial study design was used. The programme, which was delivered over one school year, focused on promoting a healthy diet. The control group received no intervention. Six primary schools in southwest England were recruited and 29 classes were involved in the trial. Classes were randomised to treatment, with 15 classes allocated to the educational programme (325 children) and 14 to the control treatment (319 children).1
The outcome measures included consumption of carbonated drinks. Each child recorded the number of glasses (average size 250 mL) that he or she drank over a three day period at baseline and at the end of the trial, and the change from baseline was obtained. For each class (the cluster), the average change in consumption of carbonated drinks across all children was derived. The treatment groups were compared with regard to the average change within clusters. Over one school year, the consumption of carbonated drinks decreased by a mean of 0.6 glasses per cluster in the intervention group but increased by 0.2 glasses per cluster in the control group (mean difference 0.7, 95% confidence interval 0.1 to 1.3).
Which of the following statements, if any, are true?
a) The unit of observation was the class
b) The unit of analysis was the class
c) It can be assumed that the measurements for children within a class (cluster) were independent of each other
Statement b is true, whereas statements a and c are false.
The unit of observation and unit of analysis are often confused. The unit of observation, sometimes referred to as the unit of measurement, is defined statistically as the “who” or “what” for which data are measured or collected. The unit of analysis is defined statistically as the “who” or “what” for which information is analysed and conclusions are made.
The above trial used a cluster randomised controlled trial study design, described in a previous question.2 Participants were recruited through cluster sampling.3 Six primary schools in southwest England were recruited and 29 classes were involved in the trial. The class was the cluster, and all children in each selected cluster were invited to participate. The clusters rather than the children were randomised to treatment using cluster allocation.4 The class was therefore the unit of randomisation. The unit of randomisation is defined statistically as the “who” or “what” that is randomised to treatment in a trial.5 All the children in a cluster received the treatment—a school based educational programme or control (no intervention)—that their cluster had been allocated to. The cluster was therefore the unit of intervention, defined statistically as the “who” or “what” for which the intervention was delivered.
In the example above, data were recorded for each child. The consumption of carbonated drinks over a three day period at baseline and at the end of the trial was measured. The change in consumption of carbonated drinks from baseline was obtained for each child. Therefore, the unit of observation was the child (a is false). For each cluster, the mean change in consumption of carbonated drinks across the children in the class was derived. The treatment groups were compared with regard to the average change across clusters (15 for the intervention and 14 for the control). Therefore, the unit of analysis was the cluster (b is true).
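The distinction between the two units can be illustrated with a minimal sketch in Python. The records, class labels, and function names below are invented for illustration and are not the trial data; the sketch only shows child level changes (the unit of observation) being collapsed to one mean per class, with the treatment arms then compared using the class means (the unit of analysis).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child-level records: (class_id, arm, change in glasses).
# All values are invented for illustration; they are NOT the trial data.
records = [
    ("c1", "intervention", -1.0), ("c1", "intervention", -0.5),
    ("c2", "intervention", -0.5), ("c2", "intervention", 0.0),
    ("c3", "control", 0.5), ("c3", "control", 0.0),
    ("c4", "control", 0.25), ("c4", "control", 0.25),
]

def cluster_means(rows):
    """Collapse child-level changes (unit of observation) to one mean per class."""
    by_class = defaultdict(list)
    arm_of = {}
    for class_id, arm, change in rows:
        by_class[class_id].append(change)
        arm_of[class_id] = arm
    return {cid: (arm_of[cid], mean(vals)) for cid, vals in by_class.items()}

def arm_summary(per_class):
    """Average the class means within each arm (unit of analysis = class).

    Returns {arm: (number of classes, mean of class means)}.
    """
    by_arm = defaultdict(list)
    for arm, class_mean in per_class.values():
        by_arm[arm].append(class_mean)
    return {arm: (len(vals), mean(vals)) for arm, vals in by_arm.items()}

per_class = cluster_means(records)
summary = arm_summary(per_class)

# Difference between arms, computed from class means rather than from children.
difference = summary["control"][1] - summary["intervention"][1]
print(summary)
print("difference in mean change per cluster:", difference)
```

In this sketch each arm contributes two data points (the classes), not four (the children), which is the sense in which the class is the unit of analysis.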
In the example above, the unit of analysis was the cluster. Because each cluster provided only one measurement, the data were considered independent, so standard statistical tests could be used to compare treatment groups. Alternatively, the child could have been the unit of analysis. The treatment groups would then have been compared regarding the change in consumption of carbonated drinks averaged across the trial participants. However, if the child had been the unit of analysis, the probability of spurious significant findings and misleading conclusions would have increased. This is because a cluster randomised controlled trial design was used: children within a class would be more likely to respond in a similar manner to treatment and could not be assumed to be acting independently (c is false). Children within the same cluster would be more likely to experience similar outcomes than would those in other clusters, irrespective of treatment allocation. This lack of independence of measurements within clusters is usually assessed with the intraclass correlation coefficient (ICC), described in a previous question.6 If the child had been the unit of analysis, then comparison of treatment groups would have needed to account for the lack of independence between children within a class (the cluster).
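One common way to gauge how much the lack of independence matters is the design effect, 1 + (m − 1) × ICC, where m is the average cluster size. This approximation is not part of the trial report and assumes clusters of roughly equal size. The sketch below applies it to the numbers of children and classes given above; the ICC value of 0.05 is purely hypothetical, chosen only to show the arithmetic, and the function names are illustrative.

```python
def design_effect(mean_cluster_size, icc):
    """Approximate design effect, 1 + (m - 1) * ICC, for similar-sized clusters."""
    return 1.0 + (mean_cluster_size - 1.0) * icc

def effective_sample_size(n_individuals, mean_cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    return n_individuals / design_effect(mean_cluster_size, icc)

n_children = 325 + 319          # children in the two arms, from the trial description
n_classes = 15 + 14             # classes (clusters) in the two arms
m = n_children / n_classes      # average class size, about 22 children

assumed_icc = 0.05              # HYPOTHETICAL value; the ICC is not reported in the text

deff = design_effect(m, assumed_icc)
n_eff = effective_sample_size(n_children, m, assumed_icc)
print(f"average cluster size: {m:.1f}")
print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.0f}")
```

Under this assumed ICC the design effect is a little above 2, so treating the 644 children as independent would overstate the amount of information by roughly a factor of two. With an ICC of 0 the design effect is 1 and the child level analysis would be valid.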
Consideration of the unit of observation and unit of analysis is important in other study designs, such as ecological studies, and not just clinical trials. Ecological studies were described in a previous question,7 in which the example used investigated the association between child wellbeing and economic status in rich developed societies.8 Twenty three of the richest 50 countries in the world were included. Data were collected for the child and aggregated across the country. Therefore, the unit of observation was the child, whereas the unit of analysis was the country. Because the unit of observation and unit of analysis are different in ecological studies, results from such studies are prone to the ecological fallacy. The ecological fallacy is a term used when collected data are analysed at a group level and the results are assumed to apply to associations at the individual level.9
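The ecological fallacy can be illustrated with a small sketch in Python. The data are invented and have no connection with the child wellbeing study; the group names and the helper function `pearson` are hypothetical. The example is constructed so that the association between group means is perfectly positive while the association between individuals within every group is perfectly negative.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented individual-level data (unit of observation) for three groups.
# Each entry is (exposure values, outcome values) for the individuals in a group.
groups = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

# Aggregate to one pair of means per group (unit of analysis = group).
group_x = [mean(xs) for xs, _ in groups.values()]
group_y = [mean(ys) for _, ys in groups.values()]
between_r = pearson(group_x, group_y)

# Correlation among individuals within each group.
within_r = {name: pearson(xs, ys) for name, (xs, ys) in groups.items()}

print("correlation between group means:", between_r)
print("correlation within each group:", within_r)
```

Inferring from the group level result that individuals with higher exposure have higher outcomes would be wrong for every group in this constructed example.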
Competing interests: None declared.