BMJ 1996;312:1153 (4 May)

Education and debate

Statistics Notes: The use of transformation when comparing two means

J Martin Bland, professor of medical statistics (a), Douglas G Altman, head (b)

(a) Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE; (b) ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, PO Box 777, Oxford OX3 7LF

Correspondence to: Professor Bland.

The usual statistical technique used to compare the means of two groups is a confidence interval or significance test based on the t distribution. For this we must assume that the data are samples from normal distributions with the same variance. Table 1 shows the biceps skinfold measurements for 20 patients with Crohn's disease and nine patients with coeliac disease.


Table 1--Biceps skinfold thickness (mm) in two groups of patients
----------------------------------------------------------
          Crohn's disease              Coeliac disease
----------------------------------------------------------
   1.8     2.8       4.2    6.2        1.8         3.8
   2.2     3.2       4.4    6.6        2.0         4.2
   2.4     3.6       4.8    7.0        2.0         5.4
   2.5     3.8       5.6   10.0        2.0         7.6
   2.8     4.0       6.0   10.4        3.0
----------------------------------------------------------
            Mean=4.72                    Mean=3.53
             SD=2.42                      SD=1.96

The data have been put into order of magnitude, and it is fairly obvious that the distribution is skewed and far from normal. When, as here, the assumption of normality is wrong we can often transform the data to another scale where the assumption of normality is reasonable. The transformation which achieves a normal distribution should also give us similar variances.[1] Table 2 shows the results of analyses using the square root, logarithmic, and reciprocal transformations. The log transformation gives the most similar variances and so gives the most valid test of significance. It also gives a reasonable approximation to a normal distribution.


Table 2--Biceps skinfold thickness compared for two groups of patients, using different transformations
--------------------------------------------------------------------------------
                 Two sample t test,   95% confidence interval    Variance ratio,
                       27 df          for difference on          larger/smaller
Transformation      t        P        transformed scale
--------------------------------------------------------------------------------
None, raw data     1.28     0.21      -0.71 mm to 3.07 mm             1.52
Square root        1.38     0.18      -0.140 to 0.714                 1.16
Logarithm          1.48     0.15      -0.114 to 0.706                 1.10
Reciprocal        -1.65     0.11      -0.203 to 0.022                 1.63
--------------------------------------------------------------------------------
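
The t statistics, confidence intervals, and variance ratios in Table 2 can be checked with a short calculation. The following is a minimal sketch in Python, assuming the usual pooled variance two sample t method and natural logarithms. The function name `compare_groups` and the constant `T_CRIT_27DF` (the tabulated two sided 5% point of the t distribution with 27 degrees of freedom, about 2.052) are illustrative choices and not part of the original analysis. P values are omitted because the Python standard library has no t distribution function.

```python
import math
from statistics import mean, variance

# Biceps skinfold thickness (mm), copied from Table 1
crohn = [1.8, 2.2, 2.4, 2.5, 2.8, 2.8, 3.2, 3.6, 3.8, 4.0,
         4.2, 4.4, 4.8, 5.6, 6.0, 6.2, 6.6, 7.0, 10.0, 10.4]
coeliac = [1.8, 2.0, 2.0, 2.0, 3.0, 3.8, 4.2, 5.4, 7.6]

# Assumption: tabulated two sided 5% point of t with 27 df (20 + 9 - 2).
# This value is only valid for these two sample sizes.
T_CRIT_27DF = 2.052


def compare_groups(x, y, transform=lambda v: v, t_crit=T_CRIT_27DF):
    """Pooled variance two sample t comparison on a transformed scale.

    Returns (t statistic, (lower, upper) 95% limits, variance ratio).
    """
    tx = [transform(v) for v in x]
    ty = [transform(v) for v in y]
    nx, ny = len(tx), len(ty)
    var_x, var_y = variance(tx), variance(ty)  # sample variances (n - 1)
    pooled = ((nx - 1) * var_x + (ny - 1) * var_y) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    diff = mean(tx) - mean(ty)
    limits = (diff - t_crit * se, diff + t_crit * se)
    ratio = max(var_x, var_y) / min(var_x, var_y)
    return diff / se, limits, ratio


transformations = {
    "None, raw data": lambda v: v,
    "Square root": math.sqrt,
    "Logarithm": math.log,  # natural logarithm
    "Reciprocal": lambda v: 1 / v,
}

results = {name: compare_groups(crohn, coeliac, f)
           for name, f in transformations.items()}

for name, (t, (lower, upper), ratio) in results.items():
    print(f"{name:15s} t={t:6.2f}  95% CI {lower:7.3f} to {upper:6.3f}"
          f"  variance ratio {ratio:.2f}")
```

The variance ratio is computed as larger over smaller so that it is always at least 1, matching the last column of Table 2.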

Confidence intervals for transformed data are more difficult to interpret, however. Unlike the case of a single sample,[2] the confidence limits for the difference between means cannot be transformed back to the original scale. If we try to do this the square root and reciprocal limits give ludicrous results. The lower limit for the square root transformation is negative. If we square this we get a positive lower limit and the confidence interval does not contain zero, even though the difference is not significant. If the observed difference were exactly zero the confidence limits would be equal in magnitude but opposite in sign. Transforming back by squaring would make them equal. For the reciprocal transformation the upper limit is very small (0.022) and transforming back by taking the reciprocal again gives 45.5. There is no way that the difference between mean skinfold in these two groups could be 45.5 mm. Thus the confidence interval for a difference cannot be interpreted on the untransformed scale for these transformations.
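
The arithmetic behind these failures is easy to reproduce. The snippet below is an illustrative sketch in Python that naively back-transforms the square root and reciprocal limits quoted in Table 2. The variable names are arbitrary, and the symmetric limits of -0.3 and 0.3 are hypothetical values chosen only to show what happens when the observed difference is exactly zero.

```python
# 95% confidence limits on the transformed scales, copied from Table 2
sqrt_limits = (-0.140, 0.714)
reciprocal_limits = (-0.203, 0.022)

# Squaring the square root limits discards the sign of the lower limit,
# so the back-transformed "interval" no longer contains zero.
squared_limits = tuple(limit ** 2 for limit in sqrt_limits)

# Hypothetical limits of equal size and opposite sign (observed difference
# exactly zero) collapse to a single value when squared.
symmetric_squared = tuple(limit ** 2 for limit in (-0.3, 0.3))

# Taking the reciprocal of the small upper limit gives an impossibly
# large "difference" in millimetres.
reciprocal_upper_back = 1 / reciprocal_limits[1]

print("squared square root limits:", squared_limits)
print("squared symmetric limits:  ", symmetric_squared)
print("reciprocal of upper limit: ", round(reciprocal_upper_back, 1))
```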

Only the log transformation gives interpretable (and thus useful) results after we transform back. Using the antilog transformation, we get a confidence interval of 0.89 to 2.03, but these are not limits for the difference in millimetres. How could they be, for they do not contain zero, yet the difference is not significant? They are in fact the 95% confidence limits for the ratio of the geometric mean[2] for patients with Crohn's disease to the geometric mean for patients with coeliac disease. If there were no difference the expected value of this ratio would be 1, not 0, and so lie within the limits. This procedure works because when we take the difference between the logarithms of the two geometric means we get the logarithm of their ratio, not of their difference.[3] We thus have the logarithm of a pure number and we antilog this to give the dimensionless ratio of the two geometric means. The logarithmic transformation is strongly preferable to other transformations for this reason. Fortunately, for medical measurements it often achieves the desired effect.
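
To see why the antilog gives a ratio, note that the mean of the logs is the log of the geometric mean, so the difference between two mean logs is the log of the ratio of the two geometric means. The following Python sketch checks this on the Table 1 data and back-transforms the log limits from Table 2. It assumes natural logarithms, which is what reproduces the quoted interval of 0.89 to 2.03; the variable names are illustrative only.

```python
import math
from statistics import geometric_mean, mean

# Biceps skinfold thickness (mm), copied from Table 1
crohn = [1.8, 2.2, 2.4, 2.5, 2.8, 2.8, 3.2, 3.6, 3.8, 4.0,
         4.2, 4.4, 4.8, 5.6, 6.0, 6.2, 6.6, 7.0, 10.0, 10.4]
coeliac = [1.8, 2.0, 2.0, 2.0, 3.0, 3.8, 4.2, 5.4, 7.6]

# Difference between the mean logs, as analysed on the log scale
diff_of_mean_logs = (mean([math.log(v) for v in crohn])
                     - mean([math.log(v) for v in coeliac]))

# Ratio of geometric means, computed directly on the original scale
ratio_of_geometric_means = geometric_mean(crohn) / geometric_mean(coeliac)

# 95% limits on the log scale, copied from Table 2, and their antilogs
log_limits = (-0.114, 0.706)
ratio_limits = tuple(math.exp(limit) for limit in log_limits)

print("antilog of difference in mean logs:", math.exp(diff_of_mean_logs))
print("ratio of geometric means:          ", ratio_of_geometric_means)
print("95% limits for the ratio:          ",
      [round(limit, 2) for limit in ratio_limits])
```

The first two printed values agree up to floating point rounding, and the back-transformed limits straddle 1, the value expected if the two groups did not differ.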

  1. Bland JM, Altman DG. Transforming data. BMJ 1996;312:770.
  2. Bland JM, Altman DG. Transformations, means, and confidence intervals. BMJ 1996;312:1079.
  3. Bland JM, Altman DG. Logarithms. BMJ 1996;312:700.
