BMJ 1996;313:41-42 (6 July)

Education and debate

Statistics Notes: Measurement error and correlation coefficients

J Martin Bland, professor of medical statistics (a), Douglas G Altman, head (b)

(a) Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE; (b) ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, PO Box 777, Oxford OX3 7LF

Correspondence to: Professor Bland.

Measurement error is the variation between measurements of the same quantity on the same individual.[1] To quantify measurement error we need repeated measurements on several subjects. We have discussed the within-subject standard deviation as an index of measurement error,[1] which we like as it has a simple clinical interpretation. Here we consider the use of correlation coefficients to quantify measurement error.

A common design for the investigation of measurement error is to take pairs of measurements on a group of subjects, as in table 1. When we have pairs of observations it is natural to plot one measurement against the other. The resulting scatter diagram (see figure 1) may tempt us to calculate a correlation coefficient between the first and second measurement. There are difficulties in interpreting this correlation coefficient. In general, the correlation between repeated measurements will depend on the variability between subjects. Samples containing subjects who differ greatly will produce larger correlation coefficients than will samples containing similar subjects. For example, suppose we split this group in whom we have measured forced expiratory volume in one second (FEV1) into two subsamples, the first 10 subjects and the second 10 subjects. As table 1 is ordered by the first FEV1 measurement, both subsamples vary less than does the whole sample. The correlation for the first subsample is r = 0.63 and for the second it is r = 0.31, both less than r = 0.77 for the full sample. The correlation coefficient thus depends on the way the sample is chosen, and it has meaning only for the population from which the study subjects can be regarded as a random sample. If we select subjects to give a wide range of the measurement, the natural approach when investigating measurement error, this will inflate the correlation coefficient.
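The effect of between-subject spread can be illustrated with a short Python sketch. The data are hypothetical and are not taken from table 1: eight subjects, ordered by the first measurement, whose second measurement differs from the first by one unit in alternating directions. The helper `pearson` is written out by hand purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical repeated measurements (assumption: invented for this sketch).
first = [1, 2, 3, 4, 5, 6, 7, 8]
second = [2, 1, 4, 3, 6, 5, 8, 7]

r_full = pearson(first, second)          # all eight subjects
r_low = pearson(first[:4], second[:4])   # the four lowest subjects
r_high = pearson(first[4:], second[4:])  # the four highest subjects

print(f"{r_full:.2f} {r_low:.2f} {r_high:.2f}")  # 0.90 0.60 0.60
```

The size of the disagreement between the two measurements is the same in all three calculations. Only the spread of the subjects differs, and the correlation falls when that spread is reduced.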


Table 1--Pairs of measurements of FEV1 (litres) a few weeks apart
from 20 Scottish schoolchildren, taken from a larger study (D Strachan,
personal communication)
---------------------------------------------------------------------------
Subject          Measurement          Subject          Measurement
No              1st       2nd         No              1st        2nd
---------------------------------------------------------------------------
 1             1.19       1.37        11             1.54        1.57
 2             1.33       1.32        12             1.59        1.60
 3             1.35       1.40        13             1.61        1.53
 4             1.36       1.25        14             1.61        1.61
 5             1.38       1.29        15             1.62        1.68
 6             1.38       1.37        16             1.78        1.76
 7             1.38       1.40        17             1.80        1.82
 8             1.40       1.38        18             1.85        1.89
 9             1.43       1.38        19             1.94        2.10
10             1.43       1.51        20             2.10        2.20



Fig 1--Measurements from pairs of observations plotted against each other

The correlation coefficient between repeated measurements is often called the reliability of the measurement method. It is widely used in the validation of psychological measures such as scales of anxiety and depression, where it is known as the test-retest reliability. In such studies it is quoted for different populations (university students, psychiatric outpatients, etc) because the correlation coefficient differs between them as a result of differing ranges of the quantity being measured. The user has to select the correlation from the study population most like the user's own.

Another problem with the use of the correlation coefficient between the first and second measurements is that there is no reason to suppose that their order is important. If the order were important the measurements would not be repeated observations of the same thing. We could reverse the order of any of the pairs and get a slightly different value of the correlation coefficient between repeated measurements. For example, reversing the order of the even numbered subjects in table 1 gives r = 0.80 instead of r = 0.77. The intra-class correlation coefficient avoids this problem. It estimates the average correlation among all possible orderings of pairs. It also extends easily to the case of more than two observations per subject, where it estimates the average correlation between all possible pairs of observations.
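The dependence on ordering can be seen in a small Python sketch on hypothetical data (not table 1). Reversing the pairs for the even numbered subjects changes the ordinary correlation coefficient. One classical way to remove the dependence, assumed here for illustration rather than taken from this note, is double entry: each pair is entered in both orders before the correlation is calculated. The names `pearson` and `double_entry` are invented for this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def double_entry(pairs):
    """Correlation with every pair entered twice, once in each order."""
    x = [a for a, b in pairs] + [b for a, b in pairs]
    y = [b for a, b in pairs] + [a for a, b in pairs]
    return pearson(x, y)


# Hypothetical pairs of repeated measurements (assumption: invented data).
pairs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 8), (8, 7)]

# Reverse the order within the pair for even numbered subjects (2, 4, 6, 8).
swapped = [(b, a) if i % 2 == 1 else (a, b) for i, (a, b) in enumerate(pairs)]

r_original = pearson([a for a, b in pairs], [b for a, b in pairs])
r_swapped = pearson([a for a, b in swapped], [b for a, b in swapped])

print(f"{r_original:.2f} {r_swapped:.2f}")                      # 0.90 1.00
print(f"{double_entry(pairs):.2f} {double_entry(swapped):.2f}")  # 0.90 0.90
```

The ordinary coefficient moves from 0.90 to 1.00 when the order is changed, whereas the double-entry value is the same for both arrangements because it uses every pair in both orders.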

Few computer programs will calculate the intra-class correlation coefficient directly, but when the number of observations is the same for each subject it can be found from a one way analysis of variance table [2] such as table 2. We need the total sum of squares, SST, and the sum of squares between subjects, SSB.

Then

$$ r_I = \frac{m\,\mathrm{SSB} - \mathrm{SST}}{(m - 1)\,\mathrm{SST}} $$

where m is the number of observations per subject. For table 2, m = 2 and

$$ r_I = \frac{2 \times 1.52981 - 1.74651}{(2 - 1) \times 1.74651} = 0.75 $$


Table 2--One way analysis of variance for the data in table 1
-----------------------------------------------------------------------------------------------
                           Degrees of     Sum of        Mean        Variance      Probability
Source of variation         freedom      squares       square       ratio (F)        (P)
-----------------------------------------------------------------------------------------------
Children                       19         1.52981      0.08052       7.4          <0.0001
Residual                       20         0.21670      0.01086
-----------------------------------------------------------------------------------------------
Total                          39         1.74651
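The calculation from the analysis of variance table can be sketched in Python as follows. The function names are invented for this illustration. The first call uses the sums of squares printed in table 2. The second applies the same formula to a small hypothetical data set to show how SSB and SST might be obtained from raw observations when every subject has the same number of measurements.

```python
def sums_of_squares(data):
    """Return (SSB, SST) for a list of subjects, each a list of m observations."""
    m = len(data[0])
    if any(len(row) != m for row in data):
        raise ValueError("every subject needs the same number of observations")
    values = [v for row in data for v in row]
    grand_mean = sum(values) / len(values)
    # Total sum of squares: every observation about the grand mean.
    sst = sum((v - grand_mean) ** 2 for v in values)
    # Between-subjects sum of squares: subject means about the grand mean.
    ssb = sum(m * (sum(row) / m - grand_mean) ** 2 for row in data)
    return ssb, sst


def intraclass_correlation(ssb, sst, m):
    """Intra-class correlation: (m*SSB - SST) / ((m - 1)*SST)."""
    return (m * ssb - sst) / ((m - 1) * sst)


# Sums of squares as printed in table 2, with m = 2 observations per child.
r_table2 = intraclass_correlation(1.52981, 1.74651, 2)
print(f"{r_table2:.2f}")  # 0.75

# Hypothetical raw data (assumption: invented for this sketch).
data = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5], [7, 8], [8, 7]]
ssb, sst = sums_of_squares(data)
r_demo = intraclass_correlation(ssb, sst, 2)
print(ssb, sst, f"{r_demo:.2f}")  # 80.0 84.0 0.90
```

Because the subject means and the grand mean do not depend on the order of observations within a subject, this calculation gives the same answer however each pair is ordered.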

In practice, there will usually be little difference between r and rI for true repeated measurements. If, however, there is a systematic change from the first measurement to the second, as might be caused by a learning effect, rI will be much less than r. If there was such an effect the measurements would not be made under the same conditions and so we could not measure reliability.
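A short Python sketch on hypothetical data shows how a systematic change separates the two coefficients. Every second measurement here is exactly three units above the first, a shift chosen to be deliberately large relative to the spread of the subjects. The function names are invented for this illustration, and the intra-class correlation is computed from the sums of squares formula with m = 2.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def intraclass_pairs(x, y):
    """(m*SSB - SST) / ((m - 1)*SST) for m = 2 observations per subject."""
    values = list(x) + list(y)
    grand_mean = sum(values) / len(values)
    sst = sum((v - grand_mean) ** 2 for v in values)
    ssb = sum(2 * ((a + b) / 2 - grand_mean) ** 2 for a, b in zip(x, y))
    return (2 * ssb - sst) / sst


# Hypothetical data with a constant shift of +3 (assumption: invented data).
first = [1, 2, 3, 4]
second = [4, 5, 6, 7]

r = pearson(first, second)
r_i = intraclass_pairs(first, second)
print(f"{r:.2f} {r_i:.2f}")  # 1.00 -0.29
```

The ordinary correlation ignores the shift entirely and equals 1, while the intra-class correlation treats the shift as disagreement between the two measurements and falls well below it.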

The correlation coefficient can be used to compare measurements of different quantities, such as different scales for measuring anxiety. We could make repeated measurements of all the quantities on the same subjects and calculate intra-class correlations. The measures with the highest correlation between repeated measurements would discriminate best between individuals; in other words they would carry the most information. For most applications, however, we prefer the within-subjects standard deviation as an index of measurement error, as it has a more direct interpretation which can be applied to individual measurements.[1]

  1. Bland JM, Altman DG. Measurement error. BMJ 1996;312:1654.
  2. Altman DG, Bland JM. Comparing several groups using analysis of variance. BMJ 1996;312:1472-3.
