Statistics notes: The normal distribution
BMJ 1995; 310 doi: https://doi.org/10.1136/bmj.310.6975.298 (Published 04 February 1995) Cite this as: BMJ 1995;310:298
I enjoyed reading the statistics notes by Douglas G Altman and J Martin
Bland, and I understand their aim of clarifying and simplifying some
important topics in statistics for medical researchers. However, there
are certain aspects of statistics that have to be stated correctly and
precisely. One of them is the single most important result in
statistics, the Central Limit Theorem (CLT). Unfortunately, the CLT is
not properly explained in many introductory books on statistics.
In the statistics note "The normal distribution", BMJ 1995; 310: 298,
it is written that the meaning of the CLT is that "the means of random
samples from any distribution will themselves have a normal
distribution". However, this is not correct and contains two mistakes.
The CLT actually states that the distribution of the means of random
samples approaches a normal distribution as the sample size increases,
provided certain conditions are met.
For example, one version of the CLT requires that the underlying
population have a finite mean and standard deviation. The reason for
imposing this condition is that there are some distributions (for
example, the heavy-tailed Cauchy distribution, whose mean and variance
are undefined) for which the CLT does not hold. Therefore, it is not
true that the CLT is valid for ANY population.
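The Cauchy case mentioned above can be checked with a short simulation. The following is a minimal Python sketch, not a formal proof: it assumes a standard Cauchy population generated by the inverse-CDF method, and the helper names (cauchy_sample_mean, iqr, iqr_of_means), the seed and the number of replications are hypothetical choices made for illustration. Because the Cauchy distribution has no variance, the spread of the sample means is measured by the interquartile range, which equals 2 for a single standard Cauchy observation.

```python
import math
import random
import statistics


def cauchy_sample_mean(n, rng):
    """Mean of n standard Cauchy draws, generated by the inverse-CDF method."""
    return sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)) / n


def iqr(values):
    """Interquartile range: a spread measure that exists even for Cauchy data."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def iqr_of_means(n, replications=4000, seed=1):
    """Simulate many sample means of size n and return their interquartile range."""
    rng = random.Random(seed)
    return iqr([cauchy_sample_mean(n, rng) for _ in range(replications)])


# If the CLT applied, the spread of the means would shrink as n grows.
for n in (1, 10, 100):
    print(f"n = {n:3d}   IQR of sample means = {iqr_of_means(n):.2f}")
```

Under these assumptions the interquartile range of the means should stay close to 2 for every sample size, because the mean of n standard Cauchy observations is itself standard Cauchy. Averaging therefore brings no concentration and no approach to normality.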
The second issue that needs to be clarified concerns the true
distribution of the sample means (in statistics this is called the
sampling distribution). This distribution will NEVER be exactly
normal, unless the parent population is strictly normal! According to the
CLT we can only APPROXIMATE this distribution with the normal one.
Additionally, the approximation works better as the sample size increases.
Typically, researchers are advised to use large samples (over 30 units),
since in the majority of practical situations the CLT can be applied in
these cases.
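The point that the normal approximation improves with sample size, without ever becoming exact, can be illustrated numerically. The following is a minimal Python sketch under stated assumptions: the parent population is taken to be exponential with rate 1 (chosen here purely for illustration because it is strongly skewed), and the helper names skewness and skewness_of_means, the seed and the number of replications are hypothetical. A normal distribution has skewness 0, whereas the mean of n exponential observations has theoretical skewness 2/sqrt(n), which shrinks with n but never reaches zero.

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based sample skewness (third standardized moment)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def skewness_of_means(n, replications=10000, seed=2):
    """Simulate sample means of size n from Exponential(1); return their skewness."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]
    return skewness(means)


# Compare the simulated skewness with the theoretical value 2 / sqrt(n).
for n in (2, 10, 30, 100):
    simulated = skewness_of_means(n)
    theoretical = 2 / math.sqrt(n)
    print(f"n = {n:3d}   simulated = {simulated:.2f}   theoretical = {theoretical:.2f}")
```

At n = 30 the theoretical skewness is still about 0.37, so the sampling distribution remains visibly asymmetric for this parent population. The "over 30" rule is a convention that works in many practical situations, not a guarantee.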
Finally, even when the sample size tends to infinity, the
distribution of the sample means will not become normal; it will
degenerate into a single spike (since the sample will then contain all
the information on the population, and the sample mean will become
exactly the same as the population mean).
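The collapse into a single spike can also be seen in a short simulation. The following is a minimal Python sketch, assuming a uniform(0, 1) parent population chosen purely for illustration; the helper name spread_of_means, the seed and the number of replications are hypothetical. It reports the standard deviation of the simulated sample means and, for comparison, the same quantity multiplied by sqrt(n), which is the rescaling under which formal statements of the CLT are usually made.

```python
import math
import random
import statistics


def spread_of_means(n, replications=3000, seed=3):
    """Return (SD of the sample means, that SD multiplied by sqrt(n))
    for samples of size n drawn from a uniform(0, 1) population."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.random() for _ in range(n)])
        for _ in range(replications)
    ]
    raw_sd = statistics.pstdev(means)
    return raw_sd, math.sqrt(n) * raw_sd


# The population standard deviation of uniform(0, 1) is sqrt(1/12).
print(f"population SD = {math.sqrt(1 / 12):.3f}")
for n in (4, 64, 256):
    raw_sd, rescaled_sd = spread_of_means(n)
    print(f"n = {n:3d}   SD of means = {raw_sd:.4f}   sqrt(n) * SD = {rescaled_sd:.3f}")
```

Under these assumptions the standard deviation of the raw sample means shrinks roughly in proportion to 1/sqrt(n), which is the spike described above, while the rescaled spread stays near the population standard deviation. The normal approximation therefore concerns the shape of the sampling distribution at a given finite sample size, not a limit of the raw sample mean.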
Competing interests: No competing interests