Endgames Statistical Question

# Skewed distributions

BMJ 2012; 345 doi: https://doi.org/10.1136/bmj.e7534 (Published 08 November 2012) Cite this as: BMJ 2012;345:e7534
Philip Sedgwick, reader in medical statistics and medical education

Centre for Medical and Healthcare Education, St George’s, University of London, London, UK

p.sedgwick{at}sgul.ac.uk

The effectiveness of early abdominopelvic computed tomography in patients with acute abdominal pain of unknown cause was evaluated using a randomised controlled trial study design. Computed tomography was delivered within 24 hours of admission. Control treatment was standard practice (radiological investigations as indicated). In total, 55 patients were randomised to early computed tomography and 55 to control treatment.1

The main outcome measures included length of hospital stay. The mean length of hospital stay was 6.6 (standard deviation 5.8) days for the early computed tomography arm and 9.2 (9.8) days for the standard practice arm.

Which of the following statements, if any, are true?

• a) The distribution of length of hospital stay for the intervention was skewed to the right

• b) The sample mean of length of hospital stay for the intervention was smaller than the sample median

• c) Treatment groups would be compared in length of hospital stay using the independent samples t test

Statement a is true, whereas b and c are false.

The spread of sample measurements for the outcome measure length of hospital stay can be described by the sample mean and standard deviation. As described in a previous question,2 about 68% or more of the observations of length of hospital stay will lie no further than one standard deviation away from the sample mean. In addition, about 95% or more will lie no further than two standard deviations away and about 99% or more no further than three standard deviations away from the sample mean. These inferences can be made regardless of the shape of the distribution of the sample measurements of length of hospital stay—that is, whether it is normal or skewed.

For the intervention group the mean length of hospital stay was 6.6 days with a sample standard deviation of 5.8 days. Therefore, about 68% or more of the intervention group would have had a length of hospital stay between (6.6−5.8) and (6.6+5.8) days, that is, between 0.8 and 12.4 days. About 95% or more of the intervention group would have had a length of hospital stay between (6.6−2(5.8)) and (6.6+2(5.8)) days, that is, between −5.0 and 18.2 days. Furthermore, about 99% or more of the intervention group would have had a length of hospital stay between (6.6−3(5.8)) and (6.6+3(5.8)) days, that is, between −10.8 and 24.0 days.
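The arithmetic behind these ranges can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `sd_range` is hypothetical and not part of the original article, and the only inputs are the mean and standard deviation reported for the early computed tomography arm.

```python
def sd_range(mean, sd, k):
    """Return (lower, upper) for mean ± k standard deviations, rounded to 1 decimal place."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))


# Summary statistics for the early computed tomography arm, as given in the text
mean_ct, sd_ct = 6.6, 5.8

# Ranges for one, two, and three standard deviations either side of the mean
ranges = {k: sd_range(mean_ct, sd_ct, k) for k in (1, 2, 3)}

for k, (lower, upper) in ranges.items():
    print(f"mean ± {k} SD: {lower} to {upper} days")
```

The lower limits for two and three standard deviations come out negative, which is not possible for a length of stay.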

Two of the derived ranges contain values that are not possible because length of hospital stay cannot be negative. Therefore, about 95% or more of the intervention group had a length of stay between zero and 18.2 days and about 99% or more had a length of stay between zero and 24.0 days. Because the sample mean (6.6 days) was smaller than the midpoint of each of these ranges (9.1 and 12.0 days, respectively), the distribution of length of stay was positively skewed (a is true).

Although not presented, the histogram of the spread of length of hospital stay would have a long tail to the right that incorporated some large values, with the bulk of the observations concentrated to the left but above zero. It was possible to detect that the distribution was positively skewed from the summary statistics; in particular, the sample mean was less than twice the value of the sample standard deviation. For the control treatment it was possible to detect that the distribution of length of hospital stay was positively skewed because the sample standard deviation was larger in value than the sample mean. Because length of hospital stay was positively skewed, the arithmetic mean would be disproportionately raised by a small number of high values in the right hand tail of the distribution. Therefore, the sample mean would be larger than the median for both treatment groups (b is false); the researchers reported that the median length of stay for both treatment groups was 5 days.
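The check described above can be sketched in Python. The function name `sd_suggests_positive_skew` is hypothetical, and the list `stays` is invented purely for illustration; it is not the trial data. The sketch assumes a variable that cannot take negative values, as is the case for length of hospital stay.

```python
from statistics import mean, median


def sd_suggests_positive_skew(sample_mean, sample_sd):
    """Heuristic for a variable that cannot be negative.

    If the mean is less than twice the standard deviation, then
    mean - 2*SD falls below zero, which suggests positive skew.
    A False result does NOT show that the data are normally distributed.
    """
    return sample_mean < 2 * sample_sd


# (mean, SD) in days for each arm, as reported in the text
arms = {"early CT": (6.6, 5.8), "standard practice": (9.2, 9.8)}
flags = {arm: sd_suggests_positive_skew(m, s) for arm, (m, s) in arms.items()}
print(flags)

# Hypothetical lengths of stay (days), invented for illustration only:
# most values are small, and one large value forms the right hand tail.
stays = [2, 3, 3, 4, 5, 5, 6, 7, 9, 30]
print("mean:", mean(stays), "median:", median(stays))
```

In the invented sample the single long stay pulls the mean above the median, which is the pattern expected for a positively skewed distribution.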

Negatively and positively skewed distributions have been described in a previous question.3 The left hand tail of a negatively skewed distribution would incorporate some low scores and is longer than the tail on the right. In a negatively skewed distribution the bulk of scores are concentrated to the right of the distribution.

The independent samples t test, sometimes known as Student’s t test, compares the mean of a variable measured on a continuous scale between two independent groups. Described in a previous question,4 it is a parametric test, and applying it requires assumptions to be made about the data. Parametric tests have been described in a previous question,5 and such tests assume that the variable to be compared is normally distributed in both groups. Because length of hospital stay was positively skewed in both groups, this assumption was not met, so the independent samples t test would not have been used to compare the treatment groups (c is false).

A non-parametric test could have been performed that did not make assumptions about the distribution of the data. The Wilcoxon rank sum test or Mann-Whitney U test, described in a previous question,6 could have been used. The researchers of the above trial reported that comparison of treatment groups using the Mann-Whitney test gave P=0.20, indicating that the treatment groups did not differ significantly in length of hospital stay. Alternatively, the observations of length of hospital stay could have been log transformed, as described in a previous question,7 to ascertain whether the transformed data met the assumptions of the independent samples t test.
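To make the rank-based comparison concrete, the following is a minimal sketch of the Mann-Whitney U statistic with a two-sided P value from the normal approximation. It is not the researchers' analysis: the function names are hypothetical, the two samples are invented for illustration, and the approximation here applies no tie or continuity correction, so it is rough for small samples.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided P) using the normal approximation.

    No tie correction or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p


# Hypothetical lengths of stay (days), invented for illustration only
early_ct = [1, 2, 3, 4, 5, 6, 8, 20]
standard = [2, 3, 5, 7, 9, 10, 15, 40]

u, p = mann_whitney_u(early_ct, standard)
print(f"U = {u}, two-sided P = {p:.3f}")
```

Because the test uses only the ranks of the observations, the large values in the right hand tail have no more influence than their rank position. In practice an established routine, such as `scipy.stats.mannwhitneyu` in SciPy, would normally be used instead of a hand-written version.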

Many statistical methods that compare an outcome measure between treatment groups assume that the data are normally distributed. However, a histogram of the outcome measures will not usually be presented, so it is not possible to verify whether distributional assumptions have been met. It is standard practice to present the mean and standard deviation for the outcome measures. Although it is not possible to ascertain from such information whether an outcome measure is normally distributed, it is sometimes possible, as illustrated above, to show that it is not.


## Footnotes

• Competing interests: None declared.
