Education And Debate

Statistics Notes: The use of transformation when comparing two means

BMJ 1996; 312 doi: http://dx.doi.org/10.1136/bmj.312.7039.1153 (Published 04 May 1996) Cite this as: BMJ 1996;312:1153
J Martin Bland, professor of medical statistics (a), Douglas G Altman, head (b)
(a) Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
(b) ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, PO Box 777, Oxford OX3 7LF
Correspondence to: Professor Bland.

    The usual statistical technique used to compare the means of two groups is a confidence interval or significance test based on the t distribution. For this we must assume that the data are samples from normal distributions with the same variance. Table 1 shows the biceps skinfold measurements for 20 patients with Crohn's disease and nine patients with coeliac disease.
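The calculation behind such an interval can be sketched in a few lines. This is a minimal illustration, not the authors' own computation: the function name `pooled_t_interval` and the data are hypothetical (the values are not those of Table 1), and the critical value of t is passed in by the caller rather than computed.

```python
import math
import statistics


def pooled_t_interval(x, y, t_crit):
    """Difference in means (x minus y) with a pooled-variance t interval.

    t_crit is the two-sided critical value of the t distribution on
    len(x) + len(y) - 2 degrees of freedom, supplied by the caller.
    Returns (difference, lower limit, upper limit, t statistic).
    """
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # The pooled variance is where the "same variance" assumption enters.
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = statistics.mean(x) - statistics.mean(y)
    return diff, diff - t_crit * se, diff + t_crit * se, diff / se


# Hypothetical skinfold-like values in mm, NOT the data of Table 1.
group_a = [2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.5, 9.0]
group_b = [1.8, 2.2, 3.0, 4.0, 6.0]

# 2.201 is the 97.5th percentile of t on 8 + 5 - 2 = 11 degrees of freedom.
diff, lower, upper, t_stat = pooled_t_interval(group_a, group_b, t_crit=2.201)
print(f"difference = {diff:.2f} mm, 95% CI {lower:.2f} to {upper:.2f}")
```

For groups of 20 and nine patients there would be 27 degrees of freedom, for which the corresponding critical value is about 2.05.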

    Table 1

    Biceps skinfold thickness (mm) in two groups of patients

    The data have been put into order of magnitude, and it is fairly obvious that the distribution is skewed and far from normal. When, as here, the assumption of normality is wrong we can often transform the data to another scale where the assumption of normality is reasonable. The transformation which achieves a normal distribution should also give us similar variances [1]. Table 2 shows the results of analyses using the square root, logarithmic, and reciprocal transformations. The log transformation gives the most similar variances and so gives the most valid test of significance. It also gives a reasonable approximation to a normal distribution.
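One simple way to compare candidate transformations is to look at the ratio of the two sample variances on each scale; a ratio near 1 suggests the equal-variance assumption is more plausible there. The sketch below is an illustration under stated assumptions, not the analysis behind Table 2: `variance_ratio` is a hypothetical helper and the data are made up.

```python
import math
import statistics

# The transformations named in the text, plus the untransformed scale.
TRANSFORMS = {
    "none": lambda v: v,
    "square root": math.sqrt,
    "log": math.log,
    "reciprocal": lambda v: 1 / v,
}


def variance_ratio(x, y, transform):
    """Larger sample variance divided by the smaller, after transforming."""
    var_x = statistics.variance([transform(v) for v in x])
    var_y = statistics.variance([transform(v) for v in y])
    return max(var_x, var_y) / min(var_x, var_y)


# Hypothetical right-skewed positive values, NOT the data of Table 1.
group_a = [1.8, 2.4, 2.8, 3.6, 4.4, 6.0, 10.0]
group_b = [1.9, 2.0, 3.0, 4.2, 7.5]

ratios = {name: variance_ratio(group_a, group_b, f)
          for name, f in TRANSFORMS.items()}
for name, ratio in ratios.items():
    print(f"{name:12s} variance ratio = {ratio:.2f}")
```

Which scale gives the ratio closest to 1 depends on the data, so the screen has to be run on the actual measurements. The log and reciprocal transformations require strictly positive values.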

    Table 2

    Biceps skinfold thickness compared for two groups of patients, using different transformations

    Confidence intervals for transformed data are more difficult to interpret, however. Unlike the case of a single sample [2], the confidence limits for the difference between means cannot be transformed back to the original scale. If we try to do this the square root and reciprocal limits give ludicrous results. The lower limit for the square root transformation is negative. If we square this we get a positive lower limit and the confidence interval does not contain zero, even though the difference is not significant. If the observed difference were exactly zero the confidence limits would be equal in magnitude but opposite in sign. Transforming back by squaring would make them equal. For the reciprocal transformation the upper limit is very small (0.022) and transforming back by taking the reciprocal again gives 45.5. There is no way that the difference between mean skinfold in these two groups could be 45.5 mm. Thus the confidence interval for a difference cannot be interpreted on the untransformed scale for these transformations.
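The arithmetic of these failures is easy to check. In the sketch below only the reciprocal upper limit of 0.022 comes from the text; the square root limits of -0.4 and 0.4 and the means of 9 and 4 are hypothetical values chosen for illustration.

```python
import math

# Hypothetical limits on the square root scale for an observed
# difference of exactly zero: equal in magnitude, opposite in sign.
sqrt_lower, sqrt_upper = -0.4, 0.4
sq_lower, sq_upper = sqrt_lower ** 2, sqrt_upper ** 2
# Squaring discards the sign, so both "limits" coincide and the
# back-transformed interval has zero width and excludes zero.
print(sq_lower, sq_upper)

# Upper limit on the reciprocal scale, as quoted in the text.
recip_upper = 0.022
back_transformed = 1 / recip_upper
print(round(back_transformed, 1))  # 45.5

# The underlying reason: a difference of square roots, squared, is not
# the difference on the original scale (hypothetical means 9 and 4).
naive = (math.sqrt(9) - math.sqrt(4)) ** 2  # 1.0
true_diff = 9 - 4                           # 5
print(naive, true_diff)
```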

    Only the log transformation gives interpretable (and thus useful) results after we transform back. Using the antilog transformation, we get a confidence interval of 0.89 to 2.03, but these are not limits for the difference in millimetres. How could they be, for they do not contain zero, yet the difference is not significant? They are in fact the 95% confidence limits for the ratio of the geometric mean [2] for patients with Crohn's disease to the geometric mean for patients with coeliac disease. If there were no difference the expected value of this ratio would be 1, not 0, and so lie within the limits. This procedure works because when we take the difference between the logarithms of the two geometric means we get the logarithm of their ratio, not of their difference [3]. We thus have the logarithm of a pure number and we antilog this to give the dimensionless ratio of the two geometric means. The logarithmic transformation is strongly preferable to other transformations for this reason. Fortunately, for medical measurements it often achieves the desired effect.
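The back-transformation for the log scale can be sketched as follows. The mean of the logged values is the log of the geometric mean, so the difference between the two log-scale means is the log of the ratio of geometric means, and taking antilogs of the difference and its limits gives the ratio and its confidence interval. The function name `geometric_mean_ratio_interval` and the data are hypothetical, natural logarithms are assumed, and the critical value of t is supplied by the caller.

```python
import math
import statistics


def geometric_mean_ratio_interval(x, y, t_crit):
    """Ratio of geometric means (x over y) with a confidence interval.

    Runs a pooled-variance t interval on the log scale, then takes
    antilogs. t_crit is the two-sided critical value of t on
    len(x) + len(y) - 2 degrees of freedom.
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(log_x)
                  + (n2 - 1) * statistics.variance(log_y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Difference of mean logs = log of the ratio of geometric means.
    log_ratio = statistics.mean(log_x) - statistics.mean(log_y)
    return (math.exp(log_ratio),
            math.exp(log_ratio - t_crit * se),
            math.exp(log_ratio + t_crit * se))


# Hypothetical positive values in mm, NOT the data of Table 1.
group_a = [1.8, 2.4, 2.8, 3.6, 4.4, 6.0, 10.0]
group_b = [1.9, 2.0, 3.0, 4.2, 7.5]

# 2.228 is the 97.5th percentile of t on 7 + 5 - 2 = 10 degrees of freedom.
ratio, lower, upper = geometric_mean_ratio_interval(group_a, group_b, 2.228)
print(f"ratio of geometric means = {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The limits are read against 1 rather than 0: an interval that includes 1 is compatible with no difference between the groups.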

    References

    1. 1.
    2. 2.
    3. 3.