Statistics notes: Calculating correlation coefficients with repeated observations: Part 1—correlation within subjects

J Martin Bland; Douglas G Altman

doi:10.1136/bmj.310.6977.446

General Practice

BMJ 1995; 310 doi: https://doi.org/10.1136/bmj.310.6977.446 (Published 18 February 1995) Cite this as: BMJ 1995;310:446

J Martin Bland, reader in medical statistics^a
Douglas G Altman, head^b

^a Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
^b Medical Statistics Laboratory, Imperial Cancer Research Fund, PO Box 123, London WC2A 3PX

Correspondence to: Dr Bland.

In an earlier Statistics Note^1 we commented on the analysis of paired data where there is more than one observation per subject, as shown in table I. We pointed out that it could be highly misleading to analyse such data by combining repeated observations from several subjects and then calculating the correlation coefficient as if the data were a simple sample. This note is a response to several letters about the appropriate analysis for such data.

TABLE I

Repeated measurements of intramural pH and PaCO₂ for eight subjects^2

The choice of analysis for the data in table I depends on the question we want to answer. If we want to know whether subjects with high values of intramural pH also tend to have high values of PaCO₂ we are interested in whether the average pH for a subject is related to the subject's average PaCO₂. We can use the correlation between the subject means, which we shall describe in a subsequent note. If we want to know whether an increase in pH within the individual was associated with an increase in PaCO₂ we want to remove the differences between subjects and look only at changes within.

To look at variation within the subject we can use multiple regression. We make one of our variables, pH or PaCO₂, the outcome variable and the other variable and the subject the predictor variables. Subject is treated as a categorical factor using dummy variables^3,4 and so has seven degrees of freedom. We use the analysis of variance table^3,4 for the regression (table II), which shows how the variability in pH can be partitioned into components due to different sources. This method is also known as analysis of covariance and is equivalent to fitting parallel lines through each subject's data (see figure). The residual sum of squares in table II represents the variation about these lines.

We remove the variation due to subjects (and any other nuisance variables which might be present) and express the variation in pH due to PaCO₂ as a proportion of what's left:

$$\frac{\text{sum of squares for PaCO}_2}{\text{sum of squares for PaCO}_2 + \text{residual sum of squares}}$$

The magnitude of the correlation coefficient within subjects is the square root of this proportion. For table II this is:

$$\sqrt{\frac{0.1153}{0.1153 + 0.3337}} = 0.51$$

The sign of the correlation coefficient is given by the sign of the regression coefficient for PaCO₂. Here the regression slope is -0.108, so the correlation coefficient within subjects is -0.51. The P value is found either from the F test in the associated analysis of variance table, or from the t test for the regression slope. It doesn't matter which variable we regress on which; we get the same correlation coefficient and P value either way.
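The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code. It relies on the fact that fitting a separate intercept for each subject (the dummy variable regression) gives the same common slope and sums of squares as centring each subject's values on that subject's own means. The function name `within_subject_correlation`, the dictionary keys, and the small data set are assumptions invented for the example; the data are not the values in table I. The last lines simply repeat the arithmetic for the table II figures quoted in the text.

```python
from collections import defaultdict
from math import copysign, sqrt


def within_subject_correlation(subjects, x, y):
    """Return the common slope, sums of squares and within-subject r.

    Hypothetical helper: equivalent to regressing y on x plus one
    dummy-coded intercept per subject (parallel lines).
    """
    # Group the paired observations by subject.
    groups = defaultdict(list)
    for s, xi, yi in zip(subjects, x, y):
        groups[s].append((xi, yi))

    # Accumulate within-subject sums of squares and products.
    sxx = syy = sxy = 0.0
    for pairs in groups.values():
        mean_x = sum(p[0] for p in pairs) / len(pairs)
        mean_y = sum(p[1] for p in pairs) / len(pairs)
        for xi, yi in pairs:
            dx, dy = xi - mean_x, yi - mean_y
            sxx += dx * dx
            syy += dy * dy
            sxy += dx * dy

    slope = sxy / sxx        # common slope of the parallel lines
    ss_x = sxy * sxy / sxx   # sum of squares for x, adjusted for subjects
    ss_resid = syy - ss_x    # variation about the parallel lines
    # Magnitude from the proportion of variation; sign from the slope.
    r = copysign(sqrt(ss_x / (ss_x + ss_resid)), slope)
    return {"slope": slope, "ss_x": ss_x, "ss_resid": ss_resid, "r": r}


# Made-up data for three subjects (NOT the data in table I).
subjects = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [3, 2, 1, 6, 4, 5, 8, 9, 7]

result = within_subject_correlation(subjects, x, y)
print(result)  # slope and r are both about -0.67 for these data

# Arithmetic check using the sums of squares and slope quoted from table II.
r_table2 = copysign(sqrt(0.1153 / (0.1153 + 0.3337)), -0.108)
print(round(r_table2, 2))  # -0.51
```

The sketch returns no P value. As the text says, that should come from the F test or the t test of the regression, with residual degrees of freedom that allow for the subject terms.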

TABLE II

Analysis of variance for the data in table I

pH against PaCO₂ for eight subjects, with parallel lines fitted for each subject

If we incorrectly calculate the correlation coefficient ignoring the fact that we have 47 observations on only 8 subjects, we get -0.07, P=0.7. Hence the correct analysis within subjects reveals a relation which the incorrect analysis misses.
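The contrast between the two analyses can be illustrated with a small constructed example. The sketch below is an assumption-laden illustration, not a reanalysis of table I. The helper names `pearson` and `centre_by_subject` and the data are invented, and the data were chosen so that subjects with higher x also have higher average y while y falls as x rises within each subject. The within-subject coefficient is obtained by correlating the subject-centred values, which gives the same coefficient as the regression with subject as a factor.

```python
from math import sqrt


def pearson(x, y):
    """Ordinary correlation coefficient, treating the pairs as a simple sample."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def centre_by_subject(subjects, values):
    """Subtract each subject's own mean from that subject's values."""
    totals, counts = {}, {}
    for s, v in zip(subjects, values):
        totals[s] = totals.get(s, 0.0) + v
        counts[s] = counts.get(s, 0) + 1
    return [v - totals[s] / counts[s] for s, v in zip(subjects, values)]


# Made-up data for three subjects (NOT the data in table I).
subjects = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
x = [1, 2, 3, 3, 4, 5, 5, 6, 7]
y = [5.0, 2.5, 3.0, 4.5, 5.0, 2.5, 5.5, 4.5, 3.5]

# Incorrect: pool all nine observations as if they were independent.
pooled_r = pearson(x, y)

# Within subjects: remove the differences between subjects first.
within_r = pearson(centre_by_subject(subjects, x),
                   centre_by_subject(subjects, y))

# Pooled r is 0 by construction; within-subject r is about -0.82.
print(round(pooled_r, 2), round(within_r, 2))
```

Only the coefficient should be taken from the centred values. A P value computed from them as if they were a simple sample would use too many degrees of freedom, because one degree of freedom per subject mean has already been spent.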

References

1. Bland JM, Altman DG. Correlation, regression, and repeated data. BMJ 1994;308:896.
2. Boyd O, Mackay CJ, Lamb G, Bland JM, Grounds RM, Bennett ED. Comparison of clinical information gained from routine blood-gas analysis and from gastric tonometry for intramural pH. Lancet 1993;341:142-6.
3. Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991.
4. Armitage P, Berry G. Statistical methods in medical research. 3rd ed. Oxford: Blackwell, 1994.
