Endgames Statistical Question

# Independent samples t test

BMJ 2010; 340 (Published 02 June 2010) Cite this as: BMJ 2010;340:c2673
Philip Sedgwick, senior lecturer in medical statistics

Centre for Medical and Healthcare Education, St George’s, University of London, London

p.sedgwick{at}sgul.ac.uk

A new programme of goal oriented visits from a rheumatologist was devised for patients in primary care with osteoarthritis of the knee.1 The new programme was assessed using a randomised controlled trial to ascertain whether it provided benefits in terms of weight management and physical activity. The control treatment was usual care. A total of 154 patients were randomised to the new programme (three goal oriented standardised consultations) and 182 to usual care.

At four months, mean weight loss was greater for patients who received standardised consultations than for those receiving usual care (mean 1.11 (SD 2.49) kg v 0.37 (SD 2.39) kg; P=0.007). The two groups were compared using a two sided, independent samples t test with a critical level of significance of 0.05 (5%) (t=2.77, degrees of freedom=334).

Which of the following statements, if any, are true?

• a) Alternative hypothesis: in the total population of patients with knee osteoarthritis, standardised consultations will result in greater weight loss than will usual care

• b) The value of t is dependent on the magnitude of the difference between sample means

• c) The degrees of freedom are dependent on the sample size

• d) The P value is derived from the value of t and the degrees of freedom

Answers b, c, and d are true; a is false.

The independent samples t test, also known as Student’s t test, was used to establish whether the standardised consultations and usual care would result in equivalent mean weight change in the population. Statistical hypothesis testing has been described in previous questions.2 3

The above statistical hypothesis test was two sided. The null hypothesis would have specified no difference—that is, in the population from which the patients were selected, standardised consultations and usual care would result in the same mean weight change. The alternative hypothesis would have been two sided—that is, standardised consultations would result in greater, or smaller, mean weight change than usual care (a is false).

The P value represents the strength of the evidence in support of the null hypothesis; the smaller the P value, the weaker that support.4 The above P value was derived from the t test statistic (t) together with the degrees of freedom (df) (d is true). Generally, for a given variability and sample size, as the difference between the sample means of the two treatment groups becomes larger, the absolute value of the t test statistic (that is, its value with any negative sign ignored) increases, indicating increasing evidence against the null hypothesis (b is true). The degrees of freedom are equal to the total number of individuals in the two independent treatment groups minus two (c is true). In this example, therefore, the degrees of freedom are (154+182)−2=334.
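The reported test statistic and degrees of freedom can be reproduced from the summary statistics alone. The sketch below assumes the classic equal variance (pooled) form of the test, which the article does not state explicitly but which is consistent with degrees of freedom of n1+n2−2; the function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for an equal variance independent samples t test.

    Assumes both groups share a common population variance, estimated
    by pooling the two sample variances.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two sample means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Summary statistics reported in the trial (weight loss in kg)
t, df = pooled_t(1.11, 2.49, 154, 0.37, 2.39, 182)
print(f"t = {t:.2f}, df = {df}")  # t = 2.77, df = 334
```

Doubling the difference between the means while leaving the standard deviations and group sizes unchanged doubles t, which illustrates statement b.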

To appreciate the derivation of the P value, consider hypothetical repeated sampling under the null hypothesis. Suppose the above study were repeated an infinite number of times under the same conditions, each time with a sample of the same size selected at random from the entire population. The treatment groups would have different sample means and standard deviations in each sample; therefore, the t test statistics would differ between samples. The histogram of these test statistics is shown in the figure. Known as the t distribution, the graph is symmetrical about zero. Statistical hypothesis testing is based on the premise that, if the null hypothesis were true, we would reject it in favour of the alternative for 5% of these samples. For a two tailed t test, the null hypothesis would be rejected for those samples with the test statistics most extreme in magnitude, that is, the 2.5% of test statistics falling in each tail of the t distribution. For the distribution shown, these are the t test statistics with an absolute value greater than 1.97: 2.5% lie below −1.97 and 2.5% lie above 1.97.
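The repeated sampling argument can be imitated by simulation. The following is a minimal sketch rather than the procedure used to draw the figure: it assumes a normally distributed outcome with a common standard deviation of 2.4 kg and no true difference between groups (both illustrative choices), and it uses 2000 repetitions instead of 10 000 to keep the run time short.

```python
import math
import random

def t_statistic(a, b):
    """Pooled variance t statistic for two independent samples."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

rng = random.Random(2010)   # fixed seed so the run is repeatable
N_REPEATS = 2000            # assumption: fewer repeats than the figure's 10 000
stats = []
for _ in range(N_REPEATS):
    # Null hypothesis: both groups drawn from the same population
    new_programme = [rng.gauss(0.0, 2.4) for _ in range(154)]
    usual_care = [rng.gauss(0.0, 2.4) for _ in range(182)]
    stats.append(t_statistic(new_programme, usual_care))

# Proportion of simulated studies that would reject the null hypothesis
rejected = sum(abs(t) > 1.97 for t in stats) / N_REPEATS
print(f"Proportion with |t| > 1.97: {rejected:.3f}")
```

The printed proportion should be close to 0.05, split roughly evenly between the two tails, although it will vary a little with the seed and the number of repetitions.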

There is a unique t distribution for each value of the degrees of freedom. As the degrees of freedom increase, the tails of the t distribution become thinner and the central hump becomes taller because it incorporates more of the distribution; the curve approaches the standard normal distribution.
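The dependence of the shape on the degrees of freedom can be checked by evaluating the t density at the centre and in a tail. This is an illustrative sketch using the standard formula for the density; the function name `t_pdf` and the chosen degrees of freedom are assumptions made for the example.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

# Compare the height at the centre (x=0) and in the tail (x=3)
for df in (2, 10, 334):
    print(f"df={df:>3}: density at 0 = {t_pdf(0, df):.4f}, "
          f"at 3 = {t_pdf(3, df):.4f}")
```

As the degrees of freedom rise, the density at zero increases and the density at 3 decreases, so more of the distribution sits in the central hump.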

Histogram of t test statistics from 10 000 samples of exactly the same size as in the study described. If the number of samples were increased indefinitely, the histogram could be approximated by the curve shown, a t distribution for 334 degrees of freedom. For the t distribution shown, 2.5% of the t test statistics have a value less than −1.97, and 2.5% have a value greater than 1.97.

The alternative hypothesis for the above study is two sided and, therefore, a difference in the means in either direction (positive or negative) is tested. The absolute value of the t test statistic (t=2.77) is referenced against the distribution in the figure. The total area under the curve to the left of −2.77 and to the right of 2.77 equals the P value for the above hypothesis test. Because the distribution is symmetrical, half of the P value (P=0.007), that is 0.0035, is contained in each tail.
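The tail areas can be approximated numerically. The sketch below integrates the t density beyond |t| with Simpson's rule and doubles the result; the function names, the upper integration limit, and the number of steps are illustrative assumptions, and a statistical package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, upper=60.0, steps=20000):
    """Approximate P(|T| >= |t|) by Simpson's rule on [|t|, upper].

    `upper` truncates the infinite tail (the area beyond it is negligible
    for the df used here); `steps` must be even.
    """
    a = abs(t)
    h = (upper - a) / steps
    total = t_pdf(a, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    one_tail = total * h / 3
    return 2 * one_tail  # the two tails are equal by symmetry

p = two_sided_p(2.77, 334)
print(f"two sided P = {p:.4f}, each tail = {p / 2:.4f}")
```

With the rounded value t=2.77 this gives a P value of about 0.006, the same order as the published P=0.007; the article does not say how the published figure was calculated.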

## Footnotes

• Competing interests: None declared.
