BMJ 1996;313:1200 (9 November)

Education and debate

Statistics Notes: Detecting skewness from summary information

Douglas G Altman, head (a), J Martin Bland, professor of medical statistics (b)

(a) ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, PO Box 777, Oxford OX3 7LF
(b) Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE

Correspondence to: Mr Altman.

As we have noted before, many statistical methods of analysis assume that the data have a normal distribution [1]. When the data do not, they can often be transformed to make them more nearly normal [2]. Readers of published papers may wish to be reassured that the authors have carried out an appropriate analysis. When authors present data in the form of a histogram or scatter diagram, readers can see at a glance whether the distributional assumption is met. If, however, only summary statistics are presented, as is often the case, this is much more difficult. If the summary statistics include the range of the data then some idea of the distribution may be gained. For example, a range from 7 to 41 around a mean of 15 suggests that the data have positive skewness. However, as the range is based on the two most extreme (and hence atypical) values, this inference is not reliable. Similar asymmetry affecting the lower and upper quartiles [3] would be much more convincing evidence of a skewed distribution. Usually, however, the only summary statistics presented are the mean and either the standard deviation or standard error. Such information cannot show that the data are near to a normal distribution, but it can sometimes show that they are not.
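The arithmetic behind the range example can be sketched in a few lines of Python. This is only an illustration of the comparison described above; the function name range_asymmetry is hypothetical, and, as the text warns, a result based on the two extreme values is suggestive rather than reliable.

```python
def range_asymmetry(minimum, mean, maximum):
    """Compare how far the minimum and the maximum lie from the mean.

    Returns the distance below the mean, the distance above it, and
    their ratio. A ratio well above 1 hints at positive skewness.
    """
    lower = mean - minimum
    upper = maximum - mean
    return lower, upper, upper / lower


# The example from the text: a range from 7 to 41 around a mean of 15.
lower, upper, ratio = range_asymmetry(7, 15, 41)
print(f"below mean: {lower}, above mean: {upper}, ratio: {ratio:.2f}")
```

The maximum lies about three times as far from the mean as the minimum does, which is the asymmetry the example points to.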

There are two useful tricks. The first uses the fact that the normal distribution extends beyond two standard deviations either side of the mean. It follows that for measurements which must be positive (like most of those encountered in medicine), if the mean is smaller than twice the standard deviation then a normal distribution would include impossible negative values, so the data are likely to be skewed. Table 1 shows urinary cotinine levels related to number of cigarettes smoked daily. Clearly the data must be highly skewed, as the mean is smaller than the standard deviation in each group. This aspect of the data was not apparent in the original paper, which gave just the means and standard errors. (We added the standard deviations, derived simply as standard error × √n, where n is the number in the group.) As a consequence, the use of t tests was not easily seen to be incorrect.


Table 1: Urinary cotinine excretion (µg/mg creatinine)
related to number of cigarettes smoked daily [4]
---------------------------------------------------------
Cigarettes        No in
smoked per day    group   Mean      SE         SD
---------------------------------------------------------
1-9                 25    0.31     0.08       0.40
10-19               57    0.42     0.10       0.75
20-29               99    0.87     0.19       1.89
30-39               38    1.03     0.25       1.54
>40                 28    1.56     0.57       3.02
Unspecified         25    0.56     0.16       0.80
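
The first check can be reproduced from the published means and standard errors alone. The following Python sketch recovers each standard deviation as standard error × √n, tests whether the mean is smaller than twice the standard deviation, and, as an added illustration that is not part of the original analysis, computes the proportion of values that a normal distribution with that mean and standard deviation would place below zero. The function names are hypothetical; the numbers are copied from Table 1.

```python
import math

# (group, number in group, mean, standard error), copied from Table 1
TABLE_1 = [
    ("1-9", 25, 0.31, 0.08),
    ("10-19", 57, 0.42, 0.10),
    ("20-29", 99, 0.87, 0.19),
    ("30-39", 38, 1.03, 0.25),
    (">40", 28, 1.56, 0.57),
    ("Unspecified", 25, 0.56, 0.16),
]


def sd_from_se(se, n):
    """Recover the standard deviation from the standard error of the mean."""
    return se * math.sqrt(n)


def normal_fraction_below_zero(mean, sd):
    """Proportion of a normal distribution with this mean and SD lying below 0."""
    return 0.5 * (1.0 + math.erf(-mean / (sd * math.sqrt(2.0))))


def check_group(n, mean, se):
    """Apply the 'mean smaller than twice the SD' check to one group."""
    sd = sd_from_se(se, n)
    return {
        "sd": sd,
        "mean_below_2sd": mean < 2 * sd,
        "implied_negative": normal_fraction_below_zero(mean, sd),
    }


results = {label: check_group(n, mean, se) for label, n, mean, se in TABLE_1}

for label, r in results.items():
    print(
        f"{label:12s} SD={r['sd']:.2f}  mean < 2 SD: {r['mean_below_2sd']}  "
        f"normal model implies {100 * r['implied_negative']:.0f}% below zero"
    )
```

For a measurement that cannot be negative, a sizeable implied proportion below zero is another way of seeing that the normal model cannot fit these data.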

The second indicator of skewness can be used when, as in table 1, there are data for several groups of individuals. As we have noted [2], deviations from the normal distribution and a relation between the standard deviation and mean across groups often go together. If the standard deviation increases as the mean increases then this is a good indication that the data are positively skewed, and specifically that a log transformation may be needed [2]. There is a clear relation between mean and standard deviation for the cotinine data: the group with the smallest mean (0.31) has the smallest standard deviation (0.40), and the group with the largest mean (1.56) has the largest (3.02). In such cases log transformation often removes skewness and makes the standard deviations more similar.
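
The relation between mean and standard deviation can also be summarised numerically. The sketch below uses the means and standard deviations from Table 1 and computes a Pearson correlation across the six groups, together with the ratio of standard deviation to mean in each group. The choice of a correlation coefficient is an assumption made here for illustration, not a method proposed in this note, and with only six groups it is descriptive at best.

```python
import math

# Group means and standard deviations, copied from Table 1
MEANS = [0.31, 0.42, 0.87, 1.03, 1.56, 0.56]
SDS = [0.40, 0.75, 1.89, 1.54, 3.02, 0.80]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(MEANS, SDS)

# SD divided by mean in each group; values above 1 repeat the first check
ratios = [sd / mean for mean, sd in zip(MEANS, SDS)]

print(f"correlation between group mean and group SD: {r:.2f}")
print("SD/mean by group:", ", ".join(f"{ratio:.2f}" for ratio in ratios))
```

A strong positive correlation across groups is the pattern described above, in which the standard deviation increases with the mean.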

In this example we can detect skewness from summary statistics, but we cannot tell what the effect of log transformation would have been. That requires the raw data.

  1. Altman DG, Bland JM. The normal distribution. BMJ 1995;310:298.
  2. Bland JM, Altman DG. Transforming data. BMJ 1996;312:770.
  3. Altman DG, Bland JM. Quartiles, quintiles, centiles and other quantiles. BMJ 1994;309:996.
  4. Matsukura S, Taminato T, Kitano N, Seino Y, Hamada H, Uchihashi M, et al. Effects of environmental tobacco smoke on urinary cotinine excretion in nonsmokers. N Engl J Med 1984;311:828-32.




