How to read a paper: Statistics for the non-statistician. I: Different types of data need different statistical tests
BMJ 1997; 315 doi: https://doi.org/10.1136/bmj.315.7104.364 (Published 09 August 1997) Cite this as: BMJ 1997;315:364
All rapid responses
When I was a postgraduate student of Pharmacology in 1997, understanding
and applying statistics to clinical research was a subject that most of my
tribe frequently struggled with. For every incremental "a-ha" moment of
greater understanding and wisdom, there were many more occasions of
self-acknowledged lack of knowledge and, sometimes, complete ignorance
(which, in hindsight, truly was bliss). I can recollect, on more than one
occasion, having passionate discussions on whether the Chi Square Test is
parametric or non-parametric. A decade later, as I read through the rapid
responses (and the straightforward, logical explanation of Ms Kathleen
M. Koehler) to Ms Trisha Greenhalgh's correction, I find myself almost
convinced and yet somewhat confused. In biomedical research, do
discontinuous counts and frequencies qualify as "data with distributions",
and is the Chi Square Test parametric after all? I believe that the jury
is still out on this, and I patiently await the verdict.
Dr. Aamir Shaikh (MD,DPBM)
aamirshaikh@assansa.com
Competing interests: No competing interests
In Table 1 of this article, the description of, and example for, the
method "One-way Analysis of Variance" are actually appropriate for
Repeated Measures Analysis of Variance. Standard one-way ANOVA is a
generalization of the independent-samples t-test (not the paired t-test).
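To see the relationship concretely, here is a minimal plain-Python sketch (the data and function names are hypothetical, written only for illustration). With exactly two independent groups, the one-way ANOVA F statistic equals the square of the equal-variance independent-samples t statistic, which is the sense in which one-way ANOVA generalizes that t-test to three or more groups.

```python
from statistics import mean


def pooled_t(a: list[float], b: list[float]) -> float:
    """Independent-samples t statistic assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def one_way_anova_f(*groups: list[float]) -> float:
    """F = mean square between groups / mean square within groups."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements from two independent groups (illustrative only).
group_a = [5.1, 4.8, 6.0, 5.5, 5.9]
group_b = [6.2, 6.8, 5.9, 7.1, 6.5]

t = pooled_t(group_a, group_b)
f = one_way_anova_f(group_a, group_b)
print(f"t^2 = {t * t:.4f}, F = {f:.4f}")  # the two values agree
```

The same `one_way_anova_f` accepts three or more groups, where no single t-test applies.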
Later in this otherwise well-written article, the author equates a
residual with the perpendicular distance from the line. A residual is, in
fact, the vertical distance from the observed point to the fitted line.
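A short sketch shows the difference. Assuming an ordinary least-squares fit (the data and the helper name below are hypothetical), each residual is the observed y minus the fitted y at the same x. The perpendicular distance from the point to the same line is smaller by a factor of sqrt(1 + slope^2), so the two coincide only when the line is horizontal.

```python
from statistics import mean


def least_squares_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares intercept and slope (minimises squared vertical residuals)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = least_squares_line(x, y)

# Residual: observed y minus fitted y at the same x (a vertical distance).
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
# Perpendicular (orthogonal) distance from each point to the same line.
perpendicular = [abs(r) / (1 + slope ** 2) ** 0.5 for r in residuals]

for r, p in zip(residuals, perpendicular):
    print(f"residual {r:+.3f}   perpendicular distance {p:.3f}")
```

Least squares minimises the sum of squared vertical residuals; minimising squared perpendicular distances instead gives orthogonal (total least squares) regression, which is a different method.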
Competing interests: No competing interests
Possibly, a correction is needed for the published correction to this
article. The correction states that both the chi-squared test and
Fisher’s exact test are non-parametric. However, I believe the correct
statement is that they are both parametric.
The definitions of parametric and non-parametric tests, taken from the
original BMJ article, are given below. The description in the BMJ table of
chi-squared and Fisher’s exact test is: “Tests the null hypothesis that the
distribution of a discontinuous variable is the same in two (or more)
independent samples.” Thus both tests are based on a distribution, and
this is the definition of a parametric test. The chi-squared test is
based on a normal approximation to the binomial distribution; Fisher’s
exact test is based on the hypergeometric distribution. Fisher’s exact test
can be used when the sample size (and the expected count per cell) is too
small for a chi-squared test.
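The distributions mentioned here can be illustrated with a plain-Python sketch on a hypothetical 2×2 table. For a 2×2 table, the uncorrected Pearson chi-squared statistic equals the square of the pooled two-proportion z statistic, which is where the normal approximation enters. Fisher's exact test instead sums hypergeometric probabilities over all tables that share the observed margins. The two-sided rule used below, which sums the tables no more probable than the observed one, is one common convention and is assumed here rather than taken from the article. The function names are illustrative.

```python
from math import comb


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def two_proportion_z(a: int, b: int, c: int, d: int) -> float:
    """Pooled z statistic comparing a/(a+b) with c/(c+d) (normal approximation)."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    pooled = (a + c) / (n1 + n2)
    return (p1 - p2) / (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P: total hypergeometric probability of all tables
    with the observed margins that are no more probable than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # P(top-left cell = x) given the fixed row and column totals
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    # The small relative tolerance treats floating-point near-ties as ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


# Hypothetical 2x2 table: rows = groups, columns = outcome yes / no.
table = (8, 2, 1, 5)
chi2 = pearson_chi2_2x2(*table)
z = two_proportion_z(*table)
print(f"chi-squared = {chi2:.4f}, z^2 = {z * z:.4f}")
print(f"Fisher exact two-sided P = {fisher_exact_two_sided(*table):.4f}")
```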
The confusion probably arises from the second part of the test
description. Both of these tests are for discontinuous variables, that
is, counts. The fact that the tests are for discontinuous data may make
it seem as if the tests are non-parametric. However, the classification
as parametric or non-parametric does not relate to whether the data are
discontinuous, but to whether the test is based on a distribution. Since
these two tests are based on distributions, they are parametric.
Additional reference: Fundamentals of Biostatistics, 4th ed, by Bernard
Rosner, Duxbury Press, Belmont, CA, 1995.
Definitions from the original BMJ article:
All statistical tests are either parametric (that is, they assume
that the data were sampled from a particular form of distribution, such as
a normal distribution) or non-parametric (they make no such assumption).
In general, parametric tests are more powerful than non-parametric ones
and so should be used if possible.
Non-parametric tests look at the rank order of the values (which one
is the smallest, which one comes next, and so on) and ignore the absolute
differences between them. As you might imagine, statistical significance
is more difficult to show with non-parametric tests, and this tempts
researchers to use statistics such as the r value inappropriately. Not
only is the r value (parametric) easier to calculate than its non-
parametric equivalent but it is also much more likely to give (apparently)
significant results. Unfortunately, it will give a spurious estimate of
the significance of the result, unless the data are appropriate to the
test being used. More examples of parametric tests and their
non-parametric equivalents are given in table 1.
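The rank idea in the quoted passage can be shown with a small plain-Python sketch (function names and data are illustrative only). Spearman's rank correlation r_s is Pearson's r applied to the ranks. As a result, one extreme value that does not change the ordering leaves r_s at 1, while Pearson's r, which uses the absolute values, drops noticeably.

```python
from statistics import mean


def ranks(values: list[float]) -> list[float]:
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j share the mean rank
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of the raw values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rs(x: list[float], y: list[float]) -> float:
    """Spearman's r_s: Pearson's r computed on ranks instead of raw values."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 100]  # one extreme value, but the ordering is unchanged
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman r_s = {spearman_rs(x, y):.3f}")
```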
Kathleen M. Koehler, PhD, MPH
Epidemiologist
Food and Drug Administration,
Center for Food Safety and Applied Nutrition,
5100 Paint Branch Parkway,
College Park, MD 20740-3835
kkoehler@cfsan.fda.gov
Competing interests: No competing interests
How to read a paper: Statistics for the non-statistician. Comment on
Greenhalgh
Emili Garcia-Berthou, professor of biostatistics
Prof. Greenhalgh provided helpful guidance for the critical reading
of medical papers. Although the references given are excellent, her two
statistical articles need improvement. The most obvious corrections (see
also BMJ 1997;315:675) are detailed below.
- The parametric equivalent of the Wilcoxon paired-sample test is the
(two) paired-sample t test (to be distinguished from the one-sample t
test) (Table 1, p. 365).
- Spelling error: Kruskal-Wallis test (not Kruskall-Wallis) (Table
1).
- The one-way analysis of variance (ANOVA) and Kruskal-Wallis test
are the generalisations to three or more groups of the two
independent-sample tests (t and Mann-Whitney tests). The generalisations of
the paired-sample tests (t and Wilcoxon tests) are the randomised-block
designs (ANOVA and Friedman test, respectively) (Table 1). The example given
(to determine whether plasma glucose level is higher one, two, or three
hours after a meal) should be analysed as a randomised-block design or a
repeated-measures design. The example given for a two-way ANOVA is also
misleading because it involves blocks (people) and two factors (hour and
sex).
- The term "covariate" is usually restricted to the analysis of
covariance or regression analysis; "factor" is used for two-way ANOVA
(Table 1).
- The usual symbol for Spearman rank correlation coefficient is rs,
with s subscript (Table 1).
- The term "skew normal distribution" is not commonly used and should
be replaced with "skewed distribution". (p. 365)
- The residuals of (standard) regression analysis are the vertical
(not the perpendicular) distance from each point to the line. (p. 365)
- Although fixing a nominal level of significance is unnecessary and
the 5% level is arbitrary, a result is generally considered significant if
P ≤ 0.05 (including P = 0.05) (p. 423).
- The normality assumption was much emphasised throughout the
articles (p. 365, 422, 425) at the expense of the more relevant
assumptions of homoscedasticity (which was not even mentioned!) and (in
linear regression) linearity. Nor was it stated that nonparametric methods
also entail similar assumptions. This complex question was treated too
simplistically.
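The point that randomised-block designs generalise the paired tests can be checked numerically. In this plain-Python sketch (hypothetical glucose readings; function names chosen only for illustration), the treatment F from a randomised-block ANOVA with two treatments equals the square of the paired t statistic.

```python
from statistics import mean, stdev


def paired_t(first: list[float], second: list[float]) -> float:
    """Paired-sample t statistic: mean within-subject difference / its standard error."""
    diffs = [y - x for x, y in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)


def randomised_block_f(table: list[list[float]]) -> float:
    """Treatment F from a randomised-block ANOVA (two-way, no replication).

    table[i][j] is the response of block (subject) i under treatment j.
    """
    n, k = len(table), len(table[0])
    values = [v for row in table for v in row]
    grand = mean(values)
    block_means = [mean(row) for row in table]
    treatment_means = [mean([row[j] for row in table]) for j in range(k)]
    ss_treatment = n * sum((m - grand) ** 2 for m in treatment_means)
    ss_block = k * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_error = ss_total - ss_treatment - ss_block
    return (ss_treatment / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))


# Hypothetical plasma glucose readings (mmol/L) for five subjects at two times.
one_hour = [7.2, 8.1, 6.9, 7.8, 8.4]
two_hours = [6.4, 7.3, 6.6, 7.0, 7.7]
table = [[a, b] for a, b in zip(one_hour, two_hours)]

t = paired_t(one_hour, two_hours)
print(f"paired t^2 = {t * t:.4f}, randomised-block F = {randomised_block_f(table):.4f}")
```

Adding a third column, such as a three-hour reading, gives the randomised-block F with 2 and 2(n - 1) degrees of freedom. This is the analysis the comment recommends for the article's glucose example. The Friedman test is its rank-based counterpart.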
Competing interests: No competing interests
Re: How to read a paper: Statistics for the non-statistician. I: Different types of data need different statistical tests
As non-statisticians, all we could understand was that the t-test applies to continuous data (i.e. when means are compared) and the chi-square test to categorical or nominal data (i.e. when proportions are compared).
Competing interests: No competing interests