BMJ 1995;310:633 (11 March)

Statistics notes

Calculating correlation coefficients with repeated observations: Part 2--correlation between subjects

J Martin Bland, reader in medical statistics (a), Douglas G Altman, head (b)

(a) Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
(b) Medical Statistics Laboratory, Imperial Cancer Research Fund, PO Box 123, London WC2A 3PX

Correspondence to: Dr Bland.

This is the thirteenth in a series of occasional notes on medical statistics.

In earlier Statistics Notes [1,2] we commented on the analysis of paired data where there is more than one observation per subject. It can be highly misleading to analyse such data by combining repeated observations from several subjects and then calculating the correlation coefficient as if the data were a simple sample [1]. The appropriate analysis depends on the question we wish to answer. If we want to know whether an increase in one variable within the individual is associated with an increase in the other, we can calculate the correlation coefficient within subjects [2]. If we want to know whether subjects with high values of one variable also tend to have high values of the other, we can use the correlation between the subject means, which we shall describe here.


Means of repeated measurements of intramural pH and Paco2 for eight subjects [3]
------------------------------------------------
Subject     pH       Paco2     Number of pairs
------------------------------------------------
1           6.49     4.04      4
2           7.05     5.37      4
3           7.36     4.83      9
4           7.33     5.31      5
5           7.31     4.40      8
6           7.32     4.92      6
7           6.91     6.60      3
8           7.12     4.78      8
------------------------------------------------

The table shows the mean pH and Paco2 for each of eight subjects, with the number of pairs of observations for each. The 47 pairs of measurements from which these means were calculated were given previously [2]. Here we are interested in whether the average pH for a subject is related to the subject's average Paco2.

We can calculate the usual correlation coefficient for the mean pH and mean Paco2. For the data in the table this gives r=0.09, P=0.8.
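
As a rough check on this figure, the calculation can be sketched in a few lines of Python using only the subject means from the table. The function name `pearson_r` and the variable names are illustrative choices, not part of the original note, and the means are the rounded values printed in the table.

```python
import math

# Subject means from the table: one (pH, Paco2) pair per subject.
ph = [6.49, 7.05, 7.36, 7.33, 7.31, 7.32, 6.91, 7.12]
paco2 = [4.04, 5.37, 4.83, 5.31, 4.40, 4.92, 6.60, 4.78]


def pearson_r(x, y):
    """Usual (unweighted) product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_unweighted = pearson_r(ph, paco2)
print(round(r_unweighted, 2))
```

Each subject contributes exactly one pair here, so a subject measured three times counts as much as one measured nine times.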

This analysis does not take into account the different numbers of measurements on each subject. Whether this matters depends on how different the numbers of observations are and whether the measurements within subjects vary much compared with the means between subjects. We can calculate a weighted correlation coefficient, using the number of observations as weights. Many computer programs will calculate this, but it is not difficult to do by hand.

We denote the mean pH and Paco2 for subject i by $x_i$ and $y_i$, the number of observations for subject i by $m_i$, and the number of subjects by n. It is fairly obvious [4] that the weighted mean of the $x_i$ is $\sum m_i x_i / \sum m_i$. In the usual case, where there is one observation per subject, the $m_i$ are all one and this formula gives the usual mean $\sum x_i / n$.
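
The weighted mean can be illustrated with the pH column of the table. This is a minimal sketch; `weighted_mean` is a hypothetical helper name, and the weights are the numbers of pairs per subject.

```python
# Subject mean pH and number of pairs of observations, from the table.
ph = [6.49, 7.05, 7.36, 7.33, 7.31, 7.32, 6.91, 7.12]
m = [4, 4, 9, 5, 8, 6, 3, 8]


def weighted_mean(values, weights):
    """Weighted mean: sum of weight * value divided by sum of weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Weighted by the number of observations per subject (weights sum to 47).
mean_weighted = weighted_mean(ph, m)

# With all weights equal to one, the formula reduces to the usual mean.
mean_usual = weighted_mean(ph, [1] * len(ph))

print(round(mean_weighted, 3), round(mean_usual, 3))
```

Subjects with more observations pull the weighted mean towards their own value, which is why the two results differ slightly.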

An easy way to calculate the weighted correlation coefficient is to replace each individual observation by its subject mean. Thus the table would yield 47 pairs of observations, the first four of which would each be pH=6.49 and Paco2=4.04, and so on. If we use the usual formula for the correlation coefficient on the expanded data we will get the weighted correlation coefficient. However, we must be careful when it comes to the P value. We have only 8 observations (n in general), not 47. We should ignore any P value printed by our computer program, and use a statistical table instead.
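
The expansion described above can be sketched as follows. The names `expanded_ph`, `expanded_paco2`, and `pearson_r` are illustrative assumptions; the data are the rounded subject means from the table, so the result may differ slightly from one based on unrounded means.

```python
import math

# Subject means and numbers of pairs of observations, from the table.
ph = [6.49, 7.05, 7.36, 7.33, 7.31, 7.32, 6.91, 7.12]
paco2 = [4.04, 5.37, 4.83, 5.31, 4.40, 4.92, 6.60, 4.78]
m = [4, 4, 9, 5, 8, 6, 3, 8]

# Replace each individual observation by its subject mean:
# subject i contributes m[i] identical pairs.
expanded_ph = [x for x, k in zip(ph, m) for _ in range(k)]
expanded_paco2 = [y for y, k in zip(paco2, m) for _ in range(k)]


def pearson_r(x, y):
    """Usual product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# The usual formula on the 47 expanded pairs gives the weighted coefficient.
r_expanded = pearson_r(expanded_ph, expanded_paco2)
print(len(expanded_ph), round(r_expanded, 2))
```

The coefficient is usable, but any significance test must still be based on the 8 subjects rather than the 47 expanded rows.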

The formula for the weighted correlation coefficient is:

$$r = \frac{\sum m_i x_i y_i - \sum m_i x_i \sum m_i y_i / \sum m_i}{\sqrt{\left(\sum m_i x_i^2 - \left(\sum m_i x_i\right)^2 / \sum m_i\right)\left(\sum m_i y_i^2 - \left(\sum m_i y_i\right)^2 / \sum m_i\right)}}$$

where all summations are from i=1 to n. When all the $m_i$ are equal they cancel out, giving the usual formula for a correlation coefficient.
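
A direct transcription of the weighted formula into Python might look like the sketch below. The function name `weighted_r` is an assumption made for illustration, and the inputs are the rounded subject means from the table.

```python
import math

# Subject means and numbers of pairs of observations, from the table.
ph = [6.49, 7.05, 7.36, 7.33, 7.31, 7.32, 6.91, 7.12]
paco2 = [4.04, 5.37, 4.83, 5.31, 4.40, 4.92, 6.60, 4.78]
m = [4, 4, 9, 5, 8, 6, 3, 8]


def weighted_r(x, y, w):
    """Weighted correlation coefficient, written as in the formula above."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swyy = sum(wi * yi * yi for wi, yi in zip(w, y))
    numerator = swxy - swx * swy / sw
    denominator = math.sqrt((swxx - swx ** 2 / sw) * (swyy - swy ** 2 / sw))
    return numerator / denominator


r_weighted = weighted_r(ph, paco2, m)

# Equal weights cancel out: any constant weight gives the usual coefficient.
r_ones = weighted_r(ph, paco2, [1] * len(ph))
r_threes = weighted_r(ph, paco2, [3] * len(ph))

print(round(r_weighted, 2), round(r_ones, 2))
```

The last two calls illustrate the cancellation: weights of all ones and all threes return the same value, which is the ordinary correlation between the subject means.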

For the data in the table the weighted correlation coefficient is r=0.08, P=0.9. There is no evidence that subjects with a high pH also have a high Paco2. However, as we have already shown [2], within the subject a rise in pH was associated with a fall in Paco2.
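
The P values quoted above rest on n=8 subjects, that is, on n-2=6 degrees of freedom. As an alternative to a statistical table, the sketch below assumes the usual t test for a correlation coefficient, t = r sqrt((n-2)/(1-r^2)), and obtains the two sided P value by integrating the t density numerically with Simpson's rule. The function names `t_pdf` and `two_sided_p` are illustrative, and the approach is one possible way of doing the calculation, not the one used in the original note.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(r, n, steps=2000):
    """Two sided P value for a correlation r based on n independent pairs."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the integral of the density
    # from 0 to t.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = 2 * (h / 3) * total  # probability of lying between -t and t
    return 1 - central


p_unweighted = two_sided_p(0.09, 8)   # correlation of the subject means
p_weighted = two_sided_p(0.08, 8)     # weighted correlation, n = 8 subjects
p_if_47 = two_sided_p(0.08, 47)       # what a program would report for 47 rows

print(round(p_unweighted, 1), round(p_weighted, 1), round(p_if_47, 1))
```

Treating the 47 expanded rows as independent observations gives a smaller P value than the analysis based on 8 subjects, which is why the program's printed P value should be ignored.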

  1. Bland JM, Altman DG. Correlation, regression and repeated data. BMJ 1994;308:896.
  2. Bland JM, Altman DG. Calculating correlation coefficients with repeated observations: Part 1--correlation within subjects. BMJ 1995;310:446.
  3. Boyd O, Mackay CJ, Lamb G, Bland JM, Grounds RM, Bennett ED. Comparison of clinical information gained from routine blood-gas analysis and from gastric tonometry for intramural pH. Lancet 1993;341:142-6.
  4. Armitage P, Berry G. Statistical methods in medical research. 3rd ed. Oxford: Blackwell, 1994:215.
