Note on the correct interpretation of the Central Limit Theorem
I enjoyed reading the Statistics Notes by Douglas G Altman and J Martin
Bland, and I understand their aim of clarifying and simplifying some
important topics in statistics for medical researchers. However, certain
aspects of statistics have to be stated correctly and precisely. One of
them is the single most important statement in statistics, the Central
Limit Theorem (CLT). Unfortunately, the CLT is not properly explained in
many introductory books on statistics.
In the Statistics Note "The normal distribution" (BMJ 1995; 310: 298), it
is written that the meaning of the CLT is that "the means of random samples
from any distribution will themselves have a normal distribution".
However, this statement is not correct: it contains two mistakes.
The CLT actually states that the distribution of the means of random
samples approaches a normal distribution as the sample size increases,
provided that certain conditions are met.
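To make the statement concrete, one common version (the classical Lindeberg-Lévy form) can be sketched in symbols. The notation is introduced here for illustration and does not appear in the Statistics Note: X_1, ..., X_n are assumed to be independent draws from the same population with finite mean \mu and finite, non-zero standard deviation \sigma, and \Phi is the standard normal cumulative distribution function.

```latex
\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i ,
\qquad
\lim_{n\to\infty}
P\!\left( \frac{\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr)}{\sigma} \le z \right)
= \Phi(z)
\quad \text{for every real } z .
\]
```

The limit concerns the standardised mean, and it is a statement about convergence as n grows, not about exact normality at any fixed n.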
For example, one version of the CLT requires that the underlying
population has a finite mean and a finite standard deviation. This
condition is imposed because there are some distributions (for example,
the extremely dispersed Cauchy distribution, for which neither a finite
mean nor a finite standard deviation exists) for which the CLT does not
hold. Therefore, it is not true that the CLT is valid for ANY population.
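The Cauchy case can be checked with a small simulation. The following is a minimal sketch, assuming NumPy is available; the function names, the number of replications and the seed are illustrative choices, not part of the original argument. It compares the interquartile range of simulated sample means for a standard Cauchy population and a standard normal population.

```python
import numpy as np


def iqr_of_sample_means(draw, n, reps=20_000, seed=0):
    """Interquartile range of `reps` simulated sample means, each from n draws."""
    rng = np.random.default_rng(seed)
    means = draw(rng, (reps, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return q75 - q25


def draw_cauchy(rng, size):
    # Standard Cauchy population: no finite mean or standard deviation.
    return rng.standard_cauchy(size)


def draw_normal(rng, size):
    # Standard normal population, used as a reference case.
    return rng.standard_normal(size)


for n in (1, 10, 100):
    print(n,
          round(iqr_of_sample_means(draw_cauchy, n), 2),
          round(iqr_of_sample_means(draw_normal, n), 2))
```

For the normal population the spread of the sample means shrinks as n grows, whereas for the Cauchy population it should stay close to 2 at every n, because the mean of n standard Cauchy draws is itself standard Cauchy. Averaging does not help, so no normal approximation emerges.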
The second issue that needs to be clarified concerns the true
distribution of the sample means (called the sampling distribution in
statistics). This distribution will NEVER be exactly normal unless the
parent population is itself exactly normal! According to the CLT we can
only APPROXIMATE this distribution with the normal one, and the
approximation works better as the sample size increases. As a rule of
thumb, researchers are advised to use large samples (over 30 units),
since in the majority of practical situations the CLT can be applied in
these cases.
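How the approximation improves with sample size can be illustrated by simulation. This is a rough sketch, assuming NumPy and a skewed exponential population chosen purely for illustration; the function name, replication count and seed are likewise assumptions. It estimates the skewness of the sampling distribution of the mean, which would be 0 for an exactly normal distribution.

```python
import numpy as np


def skewness_of_sample_means(n, reps=20_000, seed=0):
    """Estimated skewness of the mean of n exponential(1) draws."""
    rng = np.random.default_rng(seed)
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    centred = means - means.mean()
    # Third central moment divided by the cubed standard deviation.
    return (centred ** 3).mean() / (centred ** 2).mean() ** 1.5


for n in (5, 30, 200):
    print(n, round(skewness_of_sample_means(n), 3))
```

For this population the theoretical skewness of the sample mean is 2 divided by the square root of n, so it falls towards zero as n grows but is never exactly zero at a finite sample size. At n = 30 it is still about 0.37, which shows that the "over 30" rule is a convention and not a guarantee.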
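Finally, even when the sample size tends to infinity, the distribution
of the sample means does not become normal; it degenerates into a single
spike at the population mean (since the sample will then contain all the
information on the population, and the sample mean will become exactly
the same as the population mean). The normal limit in the CLT refers to
the standardised mean, that is, the difference between the sample mean
and the population mean divided by its standard error.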
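The difference between the raw sample mean, whose distribution collapses towards a spike, and the standardised mean, to which the normal limit applies, can be seen numerically. This is a minimal sketch, assuming NumPy and an exponential population with mean 1 and standard deviation 1; the function name and simulation settings are illustrative.

```python
import numpy as np


def spread_of_means(n, reps=20_000, seed=0):
    """Return (sd of raw sample means, sd of standardised sample means)."""
    rng = np.random.default_rng(seed)
    mu, sigma = 1.0, 1.0  # mean and sd of the exponential(1) population
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    standardised = np.sqrt(n) * (means - mu) / sigma
    return means.std(), standardised.std()


for n in (4, 64, 256):
    raw_sd, z_sd = spread_of_means(n)
    print(n, round(raw_sd, 3), round(z_sd, 3))
```

The standard deviation of the raw means shrinks in proportion to 1 over the square root of n, while that of the standardised means stays close to 1 at every sample size.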
Competing interests:
None declared