- a Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
- b ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, PO Box 777, Oxford OX3 7LF
- Correspondence to: Professor Bland.
When we use transformed data in analyses,1 this affects the final estimates that we obtain. Figure 1 shows some serum triglyceride measurements, which have a skewed distribution. A logarithmic transformation is often useful for data which have positive skewness like this, and here the approximation to a normal distribution is greatly improved. For the untransformed data the mean is 0.51 mmol/l and the standard deviation 0.22 mmol/l. The mean of the log10 transformed data is -0.33 and the standard deviation is 0.17. If we take the mean on the transformed scale and back transform by taking the antilog, we get 10^-0.33 = 0.47 mmol/l. We call the value estimated in this way the geometric mean. The geometric mean will be less than the mean of the raw data.
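The calculation of a geometric mean can be sketched in a few lines of Python. This is a minimal illustration only: the eight values are hypothetical, chosen to be positively skewed, and are not the triglyceride measurements in the figure. The variable names are ours.

```python
import math
import statistics

# Hypothetical, positively skewed measurements in mmol/l
# (illustrative only; not the data shown in the figure).
values = [0.30, 0.35, 0.40, 0.45, 0.50, 0.60, 0.80, 1.20]

# Mean on the original scale.
arithmetic_mean = statistics.mean(values)

# Mean on the log10 scale, then back transform with the antilog.
log_values = [math.log10(v) for v in values]
mean_log = statistics.mean(log_values)
geometric_mean = 10 ** mean_log

print(f"arithmetic mean: {arithmetic_mean:.3f} mmol/l")
print(f"mean of log10 values: {mean_log:.3f}")
print(f"geometric mean: {geometric_mean:.3f} mmol/l")
```

For any set of positive values that are not all equal, the geometric mean printed here will be smaller than the arithmetic mean.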
Fig 1 Serum triglyceride and log10 serum triglyceride concentrations in cord blood for 282 babies, with best fitting normal distribution
When triglyceride is measured in mmol/l the log of a single observation is the log of a measurement in mmol/l. The average of n such transformed measurements is also the log of a number in mmol/l, so the antilog is back in the original units, mmol/l.
The antilog of the standard deviation, however, is not measured in mmol/l. Calculation of the standard deviation of the log transformed data requires taking the difference between each log observation and the log geometric mean. The difference between the logs of two numbers is the log of their ratio.2 As a ratio is a dimensionless pure number, the units in which serum triglyceride was measured would not matter; the standard deviation on the log scale would be the same. As a result, we cannot transform the standard deviation back to the original scale.
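A small numerical check makes the point about units concrete. The sketch below uses hypothetical values (not the data in the figure) and expresses them in mmol/l and in µmol/l, a change of units that multiplies every observation by 1000. The mean of the logs shifts by log10(1000) = 3, but the standard deviation of the logs does not change.

```python
import math
import statistics

# Hypothetical measurements in mmol/l (illustrative only).
values_mmol = [0.30, 0.35, 0.40, 0.45, 0.50, 0.60, 0.80, 1.20]

# The same measurements expressed in micromol/l (1 mmol = 1000 micromol).
values_umol = [1000 * v for v in values_mmol]

logs_mmol = [math.log10(v) for v in values_mmol]
logs_umol = [math.log10(v) for v in values_umol]

# The mean of the logs depends on the units: it shifts by log10(1000) = 3.
mean_shift = statistics.mean(logs_umol) - statistics.mean(logs_mmol)

# The standard deviation of the logs is built from differences of logs,
# that is, logs of ratios, so the change of units cancels out.
sd_log_mmol = statistics.stdev(logs_mmol)
sd_log_umol = statistics.stdev(logs_umol)

print(f"shift in mean of logs: {mean_shift:.3f}")
print(f"SD of logs (mmol/l):   {sd_log_mmol:.4f}")
print(f"SD of logs (umol/l):   {sd_log_umol:.4f}")
```

Because the standard deviation on the log scale carries no trace of the units, its antilog cannot be a quantity in mmol/l.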
If we want to use the standard deviation or standard error it is easiest to do all calculations on the transformed scale and transform back, if necessary, at the end. For example, the 95% confidence interval for the mean on the log scale is -0.35 to -0.31. To get back to the original scale we antilog the confidence limits on the log scale to give a 95% confidence interval for the geometric mean on the natural scale (0.47) of 0.45 to 0.49 mmol/l. For comparison, the 95% confidence interval for the arithmetic mean using the raw, untransformed data is 0.48 to 0.54 mmol/l. These limits are wider than those for the geometric mean. This is because with highly skewed data the extreme observations have a large influence on the arithmetic mean, making it more prone to sampling error. Lessening this influence is one advantage of using transformed data.
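The confidence intervals quoted above can be reproduced approximately from the summary statistics. The sketch below assumes the usual large sample interval, mean ± 1.96 standard errors, with n = 282 taken from the figure legend; the multiplier 1.96 is our assumption, since the calculation is not spelt out here. The inputs are the rounded means and standard deviations, so the results agree only to two decimal places.

```python
import math

n = 282   # number of babies, from the figure legend
z = 1.96  # assumed large sample multiplier for a 95% interval

# Summary statistics on the log10 scale.
mean_log, sd_log = -0.33, 0.17
se_log = sd_log / math.sqrt(n)
lower_log = mean_log - z * se_log
upper_log = mean_log + z * se_log

# Antilog the limits (not the standard error) to return to mmol/l.
lower_geo = 10 ** lower_log
upper_geo = 10 ** upper_log

# Interval for the arithmetic mean, calculated on the raw scale.
mean_raw, sd_raw = 0.51, 0.22
se_raw = sd_raw / math.sqrt(n)
lower_raw = mean_raw - z * se_raw
upper_raw = mean_raw + z * se_raw

print(f"log scale:       {lower_log:.2f} to {upper_log:.2f}")
print(f"geometric mean:  {lower_geo:.2f} to {upper_geo:.2f} mmol/l")
print(f"arithmetic mean: {lower_raw:.2f} to {upper_raw:.2f} mmol/l")
```

Note that the back transformed interval is not symmetric about the geometric mean, because the antilog stretches the upper limit more than the lower one.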
If we use another transformation, such as the reciprocal or the square root,1 the same principle applies. We carry out all calculations on the transformed scale and transform back once we have calculated the confidence interval. This works for the sample mean and its confidence interval. Things become more complicated if we look at the difference between two means. We shall look at this in another Statistics Note.
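The same recipe can be written once for any transformation that has an inverse. The sketch below is illustrative only: `back_transformed_ci` is a hypothetical helper of our own, the data are made up, and the multiplier 1.96 is a large sample simplification (for a sample this small a t distribution multiplier would be more appropriate). Because the reciprocal reverses the order of the limits, the helper sorts them after back transformation.

```python
import math
import statistics

def back_transformed_ci(values, forward, inverse, z=1.96):
    """Return (estimate, lower, upper) on the original scale.

    All calculations are done on the transformed scale; only the
    mean and the two confidence limits are transformed back.
    """
    transformed = [forward(v) for v in values]
    mean_t = statistics.mean(transformed)
    se_t = statistics.stdev(transformed) / math.sqrt(len(transformed))
    limits = [inverse(mean_t - z * se_t), inverse(mean_t + z * se_t)]
    # A decreasing transformation, such as the reciprocal, swaps the limits.
    return inverse(mean_t), min(limits), max(limits)

# Hypothetical, positively skewed measurements (illustrative only).
values = [0.30, 0.35, 0.40, 0.45, 0.50, 0.60, 0.80, 1.20]

sqrt_result = back_transformed_ci(values, math.sqrt, lambda t: t ** 2)
recip_result = back_transformed_ci(values, lambda v: 1 / v, lambda t: 1 / t)

for name, (est, lower, upper) in [("square root", sqrt_result),
                                  ("reciprocal", recip_result)]:
    print(f"{name}: {est:.2f} ({lower:.2f} to {upper:.2f})")
```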
