t-Tests and Rank Sum Tests
To t-Test or Not
The article by Thompson and Barber (BMJ 2000;320:1197-1200) discusses
the relative merits of t-Tests and rank sum tests. They demonstrate that
the former strictly requires approximately normally distributed data but
is very robust to departures from normality. The latter requires that the
distributions of the data in the samples are similar in shape (Conover 1980).
Otherwise the two methods are similar, the rank test being like a t-Test
on the ranks (Conover 1980). However, as the authors point out, when it
comes to using averages and their differences and the ranges of values and
differences that are supported by the data, the method that employs means
is clearly more useful in most situations.
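
The remark that the rank test behaves like a t-Test on the ranks can be checked numerically. The following is a minimal sketch in Python, assuming NumPy and SciPy are available; the two samples are simulated for illustration and are not data from the paper or from this letter, and the function name rank_vs_t is hypothetical.

```python
import numpy as np
from scipy import stats


def rank_vs_t(a, b):
    """Return (rank sum p-value, p-value of a t-test on the joint ranks)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Rank sum (Mann-Whitney) test using the normal approximation.
    p_rank = stats.mannwhitneyu(
        a, b, alternative="two-sided", method="asymptotic", use_continuity=False
    ).pvalue
    # Rank all observations together, then run an ordinary t-test on the ranks.
    ranks = stats.rankdata(np.concatenate([a, b]))
    p_t = stats.ttest_ind(ranks[: a.size], ranks[a.size:]).pvalue
    return p_rank, p_t


# Simulated, positively skewed samples (illustrative only).
rng = np.random.default_rng(0)
sample_a = rng.exponential(scale=5.0, size=60)
sample_b = rng.exponential(scale=7.0, size=80)

p_rank, p_t = rank_vs_t(sample_a, sample_b)
print(f"rank sum p = {p_rank:.4f}, t-test on ranks p = {p_t:.4f}")
```

With samples of this size the two P values should agree closely, which is the sense in which the two procedures are similar.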
It is good that these authors have brought to attention problems with
analysis of the sort of data people working in quality improvement in
hospitals have to deal with regularly. However, I think that the
discussion needs to go further if we are to learn what we can from our data.
The first figure in their paper shows costs of two methods of
treatment. It is clear that the two distributions differ in shape and
therefore one might expect the rank test to be inappropriate. However, if
we look at the first group there seems to be a subgroup that does very
well, at least from a cost perspective, and another group that does not.
It may be more useful to try to determine the characteristics of these
subgroups than to perform significance tests. If there are in fact two
distributions in the first group, it may not make sense to pool them
together and compare them with another group.
A further aspect can be illustrated by two groups of length of stay
data from the same diagnosis related group; there were 151 patients in the
first group and 210 in the second. A simple chart (not shown) revealed
that the two distributions had similar shapes but slightly different
locations; as is usual with these data they had a marked positive skew. In
addition, like most length of stay data there were many tied values at
most lengths of stay.
The difference between the means of the two groups was 0.102 days
(95% CI -2.074 to 2.304), and the average BCa bootstrap 95% CI limits,
over 5 runs of 1000 resamples each, were -2.084 to 2.208. The t-Test was
clearly not significant. However, the rank
sum test value was Z=2.164, P=0.03 (Bayes factor 1/10, see Goodman 1999),
suggesting that a difference may have existed. The difference between the
medians was one day.
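
The kind of comparison described above can be sketched in Python, assuming NumPy and SciPy. Everything here is illustrative: the length of stay values are simulated (skewed whole days with many ties), not the data analysed in this letter, the function name compare_groups is hypothetical, and the bootstrap interval is a plain percentile interval rather than the BCa interval reported above.

```python
import numpy as np
from scipy import stats


def compare_groups(x, y, n_boot=1000, seed=0):
    """Mean difference, percentile bootstrap 95% CI, and two P values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample each group with replacement and recompute the mean difference.
    boot = np.empty(n_boot)
    for i in range(n_boot):
        boot[i] = rng.choice(x, x.size).mean() - rng.choice(y, y.size).mean()
    lower, upper = np.percentile(boot, [2.5, 97.5])
    # Welch t-test (does not assume equal variances).
    t_p = stats.ttest_ind(x, y, equal_var=False).pvalue
    # Rank sum test; the normal approximation includes a correction for ties.
    rank_p = stats.mannwhitneyu(
        x, y, alternative="two-sided", method="asymptotic"
    ).pvalue
    return {
        "mean_diff": x.mean() - y.mean(),
        "ci": (lower, upper),
        "t_p": t_p,
        "rank_p": rank_p,
    }


# Simulated lengths of stay in whole days (illustrative only).
rng = np.random.default_rng(1)
los1 = 1 + rng.negative_binomial(1, 0.12, size=151)
los2 = 1 + rng.negative_binomial(1, 0.12, size=210)

result = compare_groups(los1, los2)
print(result)
```

Because the t-Test looks at means and the rank sum test at the ordering of observations, the two P values can differ noticeably on skewed data with long tails.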
Estimators for the rank sum test are available in CIA (Altman,
Machin, Bryant and Gardner 2000), and the 95% CI for both the binomial and
Wilcoxon methods was 0 to 2; this was also the bootstrap value. For these
data with many tied values, it may be better to think of them as being in
ordered categories. A suitable estimator for the rank sum test with
ordered categorical data is the RIDIT (Fleiss 1981). The mean RIDIT was
0.567 (95% CI 0.508 to 0.625) using group 2 as the reference group. This
means that a group 1 patient had odds of 0.567/(1-0.567) or 1.31 to 1 of
having a longer length of stay than a comparable group 2 patient (Fleiss
1981).
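
The mean RIDIT and its conversion to odds can be sketched as follows, assuming NumPy. The function names mean_ridit and ridit_odds are hypothetical, the small example values are made up, and the sketch uses the usual definition in which a category's ridit is the proportion of the reference group below that category plus half the proportion within it. No confidence interval is computed.

```python
import numpy as np


def mean_ridit(group, reference):
    """Mean ridit of `group` relative to `reference` (ordered categories)."""
    group = np.asarray(group)
    reference = np.asarray(reference)
    # Ordered categories are the distinct values seen in either group.
    cats = np.unique(np.concatenate([group, reference]))
    ref_prop = np.array([(reference == c).sum() for c in cats]) / reference.size
    # Ridit: proportion of the reference group below the category,
    # plus half the proportion falling in the category itself.
    ridits = np.cumsum(ref_prop) - ref_prop / 2
    grp_prop = np.array([(group == c).sum() for c in cats]) / group.size
    return float(np.sum(grp_prop * ridits))


def ridit_odds(r):
    """Odds that a group member exceeds a reference member, given mean ridit r."""
    return r / (1.0 - r)


# Made-up lengths of stay in days (illustrative only).
example_group = [2, 3, 3, 4]
example_reference = [1, 2, 2, 3]
example_ridit = mean_ridit(example_group, example_reference)
print(example_ridit, ridit_odds(example_ridit))

# Odds for the mean RIDIT quoted in the text.
print(round(ridit_odds(0.567), 2))
```

A mean RIDIT of 0.5 corresponds to odds of 1 to 1, that is, no tendency for either group to have the longer stay.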
This is a quite modest difference and it may be of no practical
importance. However, if the group 2 data had come from a hospital that
went to extra lengths with its discharge planning and the group 1 data
from a hospital that did not, and if this modest difference were likely to
occur in other high volume diagnosis related groups as well, the view of
its importance might change.
Thompson S and Barber J “How Should Cost Data in Pragmatic Randomised
Trials be Analysed?” British Medical Journal 2000;320:1197-1200.
Conover W “Practical Nonparametric Statistics” Wiley New York 2nd edition
1980.
Goodman S “Toward Evidence-Based Medical Statistics 2: The Bayes Factor”
Annals of Internal Medicine 1999;130:1005-1013.
Altman D, Machin D, Bryant T, and Gardner M “Statistics with Confidence”
2nd edition British Medical Journal London 2000.
Fleiss J “Statistical Methods for Rates and Proportions” John Wiley and
Sons New York 1981.
Competing interests: No competing interests