BMJ 1996;312:1079 (27 April)

Statistics notes

Transformations, means, and confidence intervals

J Martin Bland, professor of medical statistics (a), Douglas G Altman, head (b)

(a) Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
(b) ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, PO Box 777, Oxford OX3 7LF

Correspondence to: Professor Bland.

When we use transformed data in analyses [1], this affects the final estimates that we obtain. Figure 1 shows some serum triglyceride measurements, which have a skewed distribution. A logarithmic transformation is often useful for data with positive skewness like this, and here it greatly improves the approximation to a normal distribution. For the untransformed data the mean is 0.51 mmol/l and the standard deviation 0.22 mmol/l. The mean of the log10 transformed data is -0.33 and the standard deviation is 0.17. If we take the mean on the transformed scale and back transform by taking the antilog, we get 10^(-0.33) = 0.47 mmol/l. We call the value estimated in this way the geometric mean. The geometric mean is always less than the arithmetic mean of the raw data, unless all the observations are identical.
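The calculation of a geometric mean can be sketched in a few lines of Python. The measurements below are hypothetical values chosen only to be positively skewed; they are not the cord blood data, and the function name is ours.

```python
import math
import statistics

# Hypothetical skewed measurements in mmol/l (illustrative, not the study data).
values = [0.25, 0.31, 0.38, 0.42, 0.47, 0.55, 0.68, 0.90, 1.40]


def geometric_mean(xs):
    """Antilog of the mean of the log10 values."""
    mean_log = statistics.mean(math.log10(x) for x in xs)
    return 10 ** mean_log


arithmetic = statistics.mean(values)
geometric = geometric_mean(values)
print(f"arithmetic mean = {arithmetic:.3f} mmol/l")
print(f"geometric mean  = {geometric:.3f} mmol/l")

# The summary value quoted in the text: antilog of a mean log10 of -0.33.
print(f"10 ** -0.33 = {10 ** -0.33:.2f} mmol/l")
```

For any set of positive values that are not all equal, the geometric mean printed here will be below the arithmetic mean.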



Fig 1--Serum triglyceride and log10 serum triglyceride concentrations in cord blood for 282 babies, with best fitting normal distribution

When triglyceride is measured in mmol/l the log of a single observation is the log of a measurement in mmol/l. The average of n such transformed measurements is also the log of a number in mmol/l, so the antilog is back in the original units, mmol/l.

The antilog of the standard deviation, however, is not measured in mmol/l. Calculation of the standard deviation of the log transformed data requires taking the difference between each log observation and the log geometric mean. The difference between the logs of two numbers is the log of their ratio [2]. As a ratio is a dimensionless pure number, the units in which serum triglyceride was measured would not matter; the standard deviation on the log scale would be the same. As a result, we cannot transform the standard deviation back to the original scale.
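A short numerical check may make this concrete. The sketch below uses hypothetical values and re-expresses them in micromol/l (multiplying by 1000), a change of units we have chosen purely for illustration. The mean of the logs shifts by log10(1000) = 3, but the standard deviation of the logs does not change.

```python
import math
import statistics

# Hypothetical measurements in mmol/l (illustrative, not the study data).
values_mmol = [0.25, 0.31, 0.38, 0.42, 0.47, 0.55, 0.68, 0.90, 1.40]

# The same measurements expressed in micromol/l.
values_umol = [1000 * x for x in values_mmol]

logs_mmol = [math.log10(x) for x in values_mmol]
logs_umol = [math.log10(x) for x in values_umol]

# Standard deviation on the log scale under each choice of units.
sd_mmol = statistics.stdev(logs_mmol)
sd_umol = statistics.stdev(logs_umol)

# The mean of the logs moves by log10 of the conversion factor.
mean_shift = statistics.mean(logs_umol) - statistics.mean(logs_mmol)

print(f"SD of logs (mmol/l)     = {sd_mmol:.4f}")
print(f"SD of logs (micromol/l) = {sd_umol:.4f}")
print(f"shift in mean of logs   = {mean_shift:.4f}")
```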

If we want to use the standard deviation or standard error it is easiest to do all calculations on the transformed scale and transform back, if necessary, at the end. For example, the 95% confidence interval for the mean on the log scale is -0.35 to -0.31. To get back to the original scale we antilog the confidence limits on the log scale, which gives a 95% confidence interval for the geometric mean (0.47 mmol/l) of 0.45 to 0.49 mmol/l. For comparison, the 95% confidence interval for the arithmetic mean using the raw, untransformed data is 0.48 to 0.54 mmol/l. These limits are wider than those for the geometric mean. This is because with highly skewed data the extreme observations have a large influence on the arithmetic mean, making it more prone to sampling error. Lessening this influence is one advantage of using transformed data.
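The intervals quoted above can be reproduced approximately from the summary statistics alone. The sketch below assumes a normal approximation with multiplier 1.96 and n = 282 (the number of babies given in the figure legend); the text does not say which multiplier was used, and because the inputs are already rounded the results agree only to about two decimal places. The function name is ours.

```python
import math


def normal_ci(mean, sd, n, z=1.96):
    """Approximate 95% confidence interval for a mean (normal approximation)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


n = 282  # babies in the cord blood sample

# Interval on the log10 scale, from the rounded summary statistics in the text.
log_low, log_high = normal_ci(-0.33, 0.17, n)

# Back transform the limits (not the standard error) to mmol/l.
geo_low, geo_high = 10 ** log_low, 10 ** log_high

# Interval for the arithmetic mean of the untransformed data.
raw_low, raw_high = normal_ci(0.51, 0.22, n)

print(f"log scale:       {log_low:.2f} to {log_high:.2f}")
print(f"geometric mean:  {geo_low:.2f} to {geo_high:.2f} mmol/l")
print(f"arithmetic mean: {raw_low:.2f} to {raw_high:.2f} mmol/l")
```

Note that only the confidence limits are antilogged; the standard error itself is never transformed back.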

If we use another transformation, such as the reciprocal or the square root [1], the same principle applies. We carry out all calculations on the transformed scale and transform back once we have calculated the confidence interval. This works for the sample mean and its confidence interval. Things become more complicated if we look at the difference between two means. We shall look at this in another Statistics Note.
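As a rough sketch of the same principle for other transformations, the hypothetical helper below takes a transformation and its inverse, computes the interval on the transformed scale, and back transforms the limits. The data are invented, and the multiplier 1.96 is a simplification for so small a sample (a t value would be more appropriate). The helper also assumes that both limits stay positive on the transformed scale, so that the inverse is monotonic there. With the reciprocal the order of the limits reverses on back transformation, so they are sorted before being returned.

```python
import math
import statistics


def back_transformed_ci(xs, forward, inverse, z=1.96):
    """Mean and approximate 95% CI found on a transformed scale, then back transformed.

    Assumes both limits on the transformed scale lie where `inverse` is monotonic
    (for example, both positive for the square root and the reciprocal).
    """
    t = [forward(x) for x in xs]
    mean_t = statistics.mean(t)
    se_t = statistics.stdev(t) / math.sqrt(len(t))
    centre = inverse(mean_t)
    # A decreasing inverse (such as the reciprocal) swaps the limits, so sort them.
    limits = sorted(inverse(mean_t + sign * z * se_t) for sign in (-1, 1))
    return centre, limits[0], limits[1]


# Hypothetical measurements in mmol/l (illustrative, not the study data).
values = [0.25, 0.31, 0.38, 0.42, 0.47, 0.55, 0.68, 0.90, 1.40]

c_sqrt, lo_sqrt, hi_sqrt = back_transformed_ci(values, math.sqrt, lambda y: y ** 2)
c_rec, lo_rec, hi_rec = back_transformed_ci(values, lambda x: 1 / x, lambda y: 1 / y)

print(f"square root: {c_sqrt:.2f} ({lo_sqrt:.2f} to {hi_sqrt:.2f}) mmol/l")
print(f"reciprocal:  {c_rec:.2f} ({lo_rec:.2f} to {hi_rec:.2f}) mmol/l")
```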

  1. Bland JM, Altman DG. Transforming data. BMJ 1996;312:770.
  2. Bland JM, Altman DG. Logarithms. BMJ 1996;312:700.
