Statistics Notes: Comparing several groups using analysis of variance
BMJ 1996; 312 doi: http://dx.doi.org/10.1136/bmj.312.7044.1472 (Published 08 June 1996) Cite this as: BMJ 1996;312:1472
- a ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, PO Box 777, Oxford OX3 7LF
- b Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
- Correspondence to: Mr Altman.
Many studies, including most controlled clinical trials, contrast data from two different groups of subjects. Observations which are measurements are often analysed by the t test, a method which assumes that the data in the different groups come from populations where the observations have a normal distribution and the same variances (or standard deviations). While the t test is well known, many researchers seem unaware of the correct method for comparing three or more groups. For example, table 1 shows measurements of galactose binding for three groups of patients. A common error is to compare each pair of groups using separate two sample t tests1 with the consequent problem of multiple testing.2 The correct approach is to use one way analysis of variance (also called ANOVA), which is based on the same assumptions as the t test. We compare the groups to evaluate whether there is evidence that the means of the populations differ. Why then is the method called analysis of variance?
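The scale of the multiple testing problem can be illustrated with a little arithmetic. The sketch below counts the pairwise comparisons among k groups and gives the chance of at least one falsely significant result if every test is done at the 5% level. It assumes the tests are independent, which is not strictly true for pairwise t tests that share groups, so the figure is only a rough guide. The function names are ours and purely illustrative.

```python
from math import comb


def pairwise_comparisons(k):
    """Number of distinct pairs that can be formed from k groups."""
    return comb(k, 2)


def familywise_error(m, alpha=0.05):
    """Chance of at least one false positive in m tests at level alpha.

    Assumes the m tests are independent (an approximation for
    pairwise comparisons, which share data between tests).
    """
    return 1 - (1 - alpha) ** m


m = pairwise_comparisons(3)   # three groups, as in table 1
risk = familywise_error(m)    # about 0.14 rather than the nominal 0.05
print(m, round(risk, 3))
```

With three groups there are three pairwise tests, and under this approximation the chance of at least one spurious "significant" difference rises from 5% to about 14%.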
We can partition the variability of the individual data values into components corresponding to within and between group variation. Table 2 shows the analysis of variance table for the data in table 1. Fuller details about the calculations can be found in textbooks3 (although a computer would generally be used). The first column shows the “sum of squares” associated with each source of variation; these add to give the total sum of squares. The second column shows the corresponding degrees of freedom. For the comparison of k groups there are k-1 degrees of freedom between groups, and n-k for the residual (within group) variation, where n is the total number of observations. The third column gives the sums of squares divided by the degrees of freedom, which are the variances associated with each component (perhaps confusingly called mean squares). When there are two groups the residual variance is the same as the pooled variance used in the two sample t test.
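The partition described above can be sketched in a few lines. The example computes the sums of squares, degrees of freedom, and mean squares for a one way layout. The data are hypothetical values chosen for easy arithmetic; they are not the galactose binding measurements of table 1, and the function name is ours.

```python
from statistics import mean


def anova_table(groups):
    """Sums of squares, degrees of freedom, and mean squares for k groups."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)

    # Between groups: spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups (residual): spread of values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total: spread of all values around the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in values)

    return {
        "ss_between": ss_between, "df_between": k - 1,
        "ms_between": ss_between / (k - 1),
        "ss_within": ss_within, "df_within": n - k,
        "ms_within": ss_within / (n - k),
        "ss_total": ss_total, "df_total": n - 1,
    }


# Hypothetical data, NOT the measurements in table 1
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
table = anova_table(groups)
for name, value in table.items():
    print(name, value)
```

For these made-up values the between and within sums of squares (38 and 6) add to the total (44), and the degrees of freedom (2 and 6) add to n-1 = 8.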
Analysis of variance assesses whether the variability of the group means (that is, the between group variance) is greater than would be expected by chance. Under the null hypothesis that all the population means are the same, the between and within group variances are expected to be the same, and so their ratio is expected to be close to 1. The test statistic is thus the ratio of the between and within group variances, denoted F in table 2. The larger the value of F the more evidence there is that the means of the groups differ. The observed value of F is compared with a table of values of the F distribution using the degrees of freedom for both the numerator and denominator; this value is sometimes written as F(2,39), where 2 and 39 are the numerator and denominator degrees of freedom. For the data in table 1 an F value greater than 3.24 would be significant with P<0.05. The observed value is far larger than this, giving strong evidence that the three populations of patients differ. With two groups one way analysis of variance is exactly equivalent to the usual two sample t test, and we have F = t².
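The equivalence between the F test and the two sample t test can be checked numerically. The sketch below computes the F ratio from the between and within group mean squares and the usual pooled variance t statistic for two hypothetical groups (invented values, not taken from table 1), and confirms that F equals the square of t. The function names are ours.

```python
from math import sqrt
from statistics import mean


def f_ratio(groups):
    """Ratio of the between group to the within group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ms_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n - k)
    return ms_between / ms_within


def pooled_t(a, b):
    """Two sample t statistic using the pooled variance."""
    ss = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    se = sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se


# Hypothetical data for two groups of unequal size
a = [4.0, 5.0, 6.0]
b = [6.0, 7.0, 8.0, 9.0]

f_value = f_ratio([a, b])
t_value = pooled_t(a, b)
print(f_value, t_value ** 2)  # the two numbers agree up to rounding error
```

The sign of t depends on the order of the groups, but its square does not, which is why the F test is inherently two sided.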
When the groups are significantly different we will often wish to explore further to see where the differences lie. When we compare more than two groups we need a clear idea of which comparisons we are interested in. Very often we are not equally interested in all possible comparisons. Many statistical procedures are available, their appropriateness depending on the question one wishes to answer. One simple method is to use the residual variance as the basis for modified t tests comparing each pair of groups. Here we get: group 1 v group 2, P=0.12; 1 v 3, P=0.0002; 2 v 3, P=0.06. The main difference is thus between groups 1 and 3, as can be seen from table 1. This procedure is an improvement on simply performing three two sample t tests in the first place because we proceed to comparing pairs of groups only if there is evidence of significant variability among all the groups, and also because we use a more reliable estimate of the variance within groups. Investigation of all pairs of groups often does not yield a simple interpretation, which is the price we pay for not having a specific hypothesis. When the overall F test is not significant it is generally unwise to explore differences between pairs of groups. If the groups have a natural ordering (for example, representing patients with different stages of a disease) it is preferable to examine directly evidence for a (linear) trend in means across the groups.1 We will consider such data in a subsequent statistics note.
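The modified t test for a pair of groups replaces the pooled variance of the two groups by the residual variance from the whole analysis. A minimal sketch follows, again with hypothetical data rather than the values in table 1 and with a function name of our own. It returns only the t statistics and the residual degrees of freedom; turning these into P values needs the t distribution, which is not in the Python standard library, so that step is left out.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def residual_t_tests(groups):
    """t statistic for every pair of groups, based on the residual variance.

    Returns a dict keyed by (group number, group number) and the
    residual degrees of freedom on which each t statistic is referred.
    """
    n = sum(len(g) for g in groups)
    df_residual = n - len(groups)
    residual_var = sum((x - mean(g)) ** 2 for g in groups for x in g) / df_residual

    t_stats = {}
    for i, j in combinations(range(len(groups)), 2):
        # Standard error uses the residual variance from ALL groups
        se = sqrt(residual_var * (1 / len(groups[i]) + 1 / len(groups[j])))
        t_stats[(i + 1, j + 1)] = (mean(groups[i]) - mean(groups[j])) / se
    return t_stats, df_residual


# Hypothetical data, NOT the measurements in table 1
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
t_stats, df_residual = residual_t_tests(groups)
for pair, t in t_stats.items():
    print(pair, round(t, 3), "on", df_residual, "degrees of freedom")
```

Each comparison is referred to the t distribution on the residual degrees of freedom (n-k) rather than on the degrees of freedom of the two groups alone, which is where the more reliable variance estimate comes from.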
This type of analysis can be extended to more complex data sets with two classifying variables, using two way analysis of variance, and so on. Analysis of variance is a special type of regression analysis, and most data sets for which analysis of variance is appropriate can be analysed by regression with the same results.
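The link with regression can be illustrated by coding group membership as indicator (dummy) variables and fitting an ordinary least squares regression. In the sketch below, which uses hypothetical data and helper names of our own, the first group is taken as the reference category. The regression F statistic then matches the one way analysis of variance F for the same data, and the coefficients are the reference group mean and the differences from it.

```python
from statistics import mean


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regression_on_dummies(groups):
    """Least squares fit of y on an intercept and k-1 group indicators."""
    k = len(groups)
    X, y = [], []
    for g_index, g in enumerate(groups):
        for value in g:
            # Intercept, then one indicator per group except the first
            X.append([1.0] + [1.0 if g_index == j else 0.0 for j in range(1, k)])
            y.append(value)
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ss_regression = sum((f - mean(y)) ** 2 for f in fitted)
    ss_residual = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    f_value = (ss_regression / (k - 1)) / (ss_residual / (len(y) - k))
    return beta, f_value


def anova_f(groups):
    """One way analysis of variance F ratio, for comparison."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    ms_between = sum(len(g) * (mean(g) - mean(values)) ** 2 for g in groups) / (k - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n - k)
    return ms_between / ms_within


# Hypothetical data, NOT the measurements in table 1
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
beta, f_regression = regression_on_dummies(groups)
print(beta)                           # reference mean, then differences from it
print(f_regression, anova_f(groups))  # the two F values agree
```

The choice of reference group changes the coefficients but not the F statistic, since the fitted values are the group means whichever coding is used.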