Your results may vary: the imprecision of medical measurements
BMJ 2020; 368 doi: https://doi.org/10.1136/bmj.m149 (Published 20 February 2020) Cite this as: BMJ 2020;368:m149
Dear Editor
Don’t blame the tools
Lord Byron’s paraphrased proverb “Good workmen never quarrel with their tools” aptly summarises the art of interpreting test results.[1]
If there were a perfect lab test for a given condition, patients could cut out the intermediary and go straight to the lab for a direct-to-consumer diagnostic test.
In practice, although laboratory tests may be imperfect, they complement other clinical information. Laboratory test results always need to be interpreted in the context of patient symptoms, medical history, clinical findings and radiology investigations.[2]
For example, prostate specific antigen (PSA) is used in screening for prostate cancer, but PSA levels need to be interpreted alongside other clinical information.[3,4]
A low normal PSA level of 2.0 does not rule out prostate cancer if the digital rectal examination of the prostate is abnormal.
An abnormally high PSA level of 100 is not of concern if the patient had urinary retention and a prostate biopsy has shown inflammation only.
A PSA level of 0.5 is not a good outcome in a patient who had radical prostatectomy but a PSA of 0.5 is an excellent outcome in a patient who had prostate radiation.
Even a 50% error rate in PSA levels in the above clinical scenarios is of minimal clinical consequence compared to the medical history of patients.
Busy clinicians already have a plethora of laboratory tests to contend with, and complexity should be added to existing test results only if high quality evidence suggests an improvement in clinical care.[5]
References
1 Lord Byron. Don Juan (Canto 1 Stanza 201). http://www.gutenberg.org/files/21700/21700-h/21700-h.htm#2H_4_0002 (accessed 28 Feb 2020).
2 Sundaram V, Rothnie K, Bloom C, et al. Impact of comorbidities on peak troponin levels and mortality in acute myocardial infarction. Heart Published Online First: 26 February 2020. doi:10.1136/heartjnl-2019-315844
3 Palsdottir T, Nordstrom T, Karlsson A, et al. The impact of different prostate-specific antigen (PSA) testing intervals on Gleason score at diagnosis and the risk of experiencing false-positive biopsy recommendations: a population-based cohort study. BMJ Open 2019;9:e027958. doi:10.1136/bmjopen-2018-027958
4 Tikkinen KAO, Dahm P, Lytvyn L, et al. Prostate cancer screening with prostate-specific antigen (PSA) test: a clinical practice guideline. BMJ 2018;362. doi:10.1136/bmj.k3581
5 McCormack JP, Holmes DT. Your results may vary: the imprecision of medical measurements. BMJ 2020;368. doi:10.1136/bmj.m149
Competing interests: No competing interests
One of the major problems with our imprecise diagnostic tests is the assumption that disease is either present or absent using man-made standards and interpretation of results.
Diagnostic testing that requires an interpretation of whether disease is present or absent reflects a fundamental problem with the yes/no testing approach used to determine whether someone has cancer, heart disease or some other problem (sensitivity) or does not (specificity) – and, just as importantly, whether our treatment for the patient is working or not [1].
For too long physicians and insurance companies have looked at people as either having a disease or not. Medicine is after all the practice of healthcare, not disease care. The entire approach leads to problems with coverage of pre-existing conditions, as well as the consequence of dealing with failures of that approach, when those told they do not have a health problem, later die from it [2,3].
Qualitative and semi-quantitative methods of looking for disease exclude the understanding that changes in the health of a patient occur across a continuum – a “Health-Spectrum” – and that patients do not suddenly wake up one day with cancer, heart disease, or some other health problem [4].
The ability to quantitatively measure changes in the Health-Spectrum of a patient is more accurate, consistent and reproducible; and it allows us to measure actual treatment results without guessing [5,6] - thereby decreasing healthcare costs and time while saving lives.
These measurements are only possible when the calibration of the equipment we use is made against a known standard, which does not flux – but is constant [5,6]. Only by using such constants, will these measurements be accurate, consistent and reproducible – across the Health-Spectrum of the patient – eliminating the “results may vary” error introduced using man-made standards and interpretation [6].
References:
1. Fleming RM, Fleming MR, Chaudhuri TK. Are we prescribing the right diets and drugs for CAD, T2D, Cancer and Obesity? Int J Nuclear Med Radioactive Subs 2019;2(2):000115.
2. https://www.areyoudense.org/
3. https://www.youtube.com/watch?v=RTHEtRtiB3k
4. Fleming RM, Fleming MR. The Importance of Thinking about and Quantifying Disease like Cancer and Heart Disease on a “Health-Spectrum” Continuum. J Compr Cancer Rep 2019;3(1):1-3 (Article ID 100011).
5. Fleming RM, Fleming MR, Chaudhuri TK, McKusick A. FMTVDM Quantitative Imaging Replaces Current Qualitative Imaging for Coronary Artery Disease and Cancer, Increasing Diagnostic Accuracy and Providing Patient- Specific, Patient-Directed Treatment. Cardio Open 2019;4(3).
6. Fleming RM, Fleming MR, Dooley WC, Chaudhuri TK. Invited Editorial. The Importance of Differentiating Between Qualitative, Semi-Quantitative and Quantitative Imaging – Close Only Counts in Horseshoes. Eur J Nucl Med Mol Imaging. DOI:10.1007/s00259-019-04668-y. Published online 17 January 2020 https://link.springer.com/article/10.1007/s00259-019-04668-y
Competing interests: FMTVDM issued to first author.
Dear Editor
We have been told to look at Diagnostic Tests in terms of Sensitivity and Specificity, and Positive (or Negative) Predictive Value. The framework here is Bayesian.
But this article, which seems very interesting, mentions none of the above. It points out the difference between analytical variation, in the laboratory, and biological variation, in the patient. The latter is hard to deal with, and so is the interaction between the two (which, from the article, appears to follow a Fisherian sum of squares model).
What we require is to mesh the maths and the medicine, which this article did not quite do. Not relating the biological/analytical division to the old sensitivity/specificity cleavage was a failing, for example.
The article did not mention the mathematical process of “regression to the mean”, whereby patients distant from the mean at a baseline measurement may, with the next measurement, approach the mean.
Also, there can be problems at times with multiple measurements linked to the level of significance, though I am not quite sure of the mathematical context here. Is there no one out there to address this issue of the significance level lowering when multiple measurements are taken?
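The concern in this last paragraph is the familiar multiple-comparisons problem: if each of k independent measurements is tested at significance level alpha, the chance of at least one spurious "significant" result is 1 − (1 − alpha)^k. A minimal illustrative sketch (the numbers are ours, not from the article):

```python
# Family-wise false-positive risk when k independent measurements are
# each tested at the same significance level alpha (illustrative values).
alpha = 0.05
for k in (1, 5, 10, 20):
    risk = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests: P(at least one false positive) = {risk:.2f}")

# The simple Bonferroni correction tests each measurement at alpha / k,
# which keeps the family-wise error rate at roughly alpha.
k = 10
per_test_alpha = alpha / k  # 0.005 per test for 10 measurements
```

With 10 measurements at the conventional 0.05 level, the chance of at least one false positive is roughly 40%, which is the "significance level lowering" issue the writer raises.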
Competing interests: No competing interests
Dear Editor
Measurement variation is profoundly important in clinical medicine and its implications largely overlooked.
A number of us at the University of Birmingham have an interest in measurement variation and its clinical implications, which include misdiagnosis and incorrect inferences about the effects of treatment during monitoring, leading to incorrect clinical decisions. There are some situations where chance variation in measured parameters may be the main influence on clinical decisions.
In response to the observation that there is little medical student teaching on measurement variation, I would like to point out that I do teach graduate entry students at the University of Birmingham about measurement variation. I use a simple exercise with a hypothetical patient to illustrate the effects of variation. This can easily be reproduced, and I would be happy to share the teaching materials.
Competing interests: No competing interests
McCormack & Holmes (BMJ 2020;368:m149) provide an important lesson in data interpretation. One cause of misinterpretation is that inaccurate results are presented to an impossibly high degree of precision. The illustration shows the relative weights of 1, 10 and 100, and the small difference made by one in the second or third place of decimals.
[Illustration: a chart drawn in columns of “x” characters comparing the heights of the values 111, 110, 101, 100, 11, 10 and 1 – a difference of one is obvious between 11 and 10, or between 1 and nothing, but barely discernible between 111 and 110, or between 101 and 100.]
To avoid spurious precision, it would be better to round results so that their precision is of the same order as the reference change value. For example, my alkaline phosphatase activity is reported as 64 units/L. A truer report would quote it as 60 units/L.
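The rounding rule proposed here can be sketched in a few lines of Python (the helper function and the ~20% reference change value for alkaline phosphatase are our illustrative assumptions, not figures from this response):

```python
import math

def round_to_rcv(result, rcv_fraction):
    """Round a result so its precision matches the order of the RCV."""
    step = abs(result) * rcv_fraction          # smallest meaningful change
    unit = 10 ** math.floor(math.log10(step))  # rounding unit of that order
    return round(result / unit) * unit

# Alkaline phosphatase reported as 64 units/L with an assumed RCV of ~20%:
# the meaningful change is ~13 units/L, so round to the nearest 10.
print(round_to_rcv(64, 0.20))  # → 60
```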
Robin Ferner
Honorary Professor of Clinical Pharmacology
r.e.ferner@bham.ac.uk
Competing interests: No competing interests
The papers cited in this article may give the impression that the “reference change value” (RCV) is a new concept. However, the statistical model that supports it was introduced in the 1970s by E. K. Harris and collaborators [1], and it is noteworthy that, more than 40 years later, such an interesting parameter has not yet moved into the routine interpretation of laboratory results.
Although practical difficulties in presenting this information properly to the clinician have been adduced – and they do exist – the real reason may be a lack of confidence in the robustness of this parameter.
To include this information in the laboratory report, the laboratory professional must start by selecting the value of biological variation (CVi) that best matches their patient population from among those published, which sometimes differ substantially, or assume that the median of all of them, or the value reported in a meta-analysis, represents it.
Next it should be assumed that this CVi is the same for all patients, which is generally not true. This means that patients with intra-individual variability lower than the value set for CVi will have a high risk that a significant change will go unnoticed, while patients with greater variability will have a high risk of detecting changes that have not taken place.
Finally, it should be assumed that the test results compared are independent. Although this may be considered true for many analytes if the time elapsed between determinations is sufficiently long, results tend to be correlated when sampling is frequent, as occurs with hospital patients. The existence of correlation implies that the probability of detecting a significant change will depend on the time elapsed between determinations.
Taking these conditions into account, the laboratory must assess whether the estimated RCV for a test is sufficiently reliable and can be safely applied in the patients investigated before flagging a result as different from the previous one in the laboratory report. The concept of the value of the reference change is attractive and can even be used as an instructive diversion but it can pose a risk to patients if it is not used rigorously.
Reference
1. Harris EK, Brown SS. Temporal changes in the concentrations of serum constituents in healthy men: Distributions of within-person variances and their relevance to the interpretation of differences between successive measurements. Annals of Clinical Biochemistry, 1979:16, 169–76
Competing interests: No competing interests
McCormack and Holmes's article clearly demonstrates how far the practice of medicine is from an exact science, and how much it remains an art, quite dependent on the performer.
It is not just the medical school teaching to treat the patient, not the test; it is also the fact that the numeric value of a test, even when repeated, could easily fail to demonstrate a trend that needs action and instead simply reflect physiological variation.
Furthermore, the other “half” of the patient record, the non-numeric data, is even worse. Clinicians are infrequent coders [1], and electronic health records abound in free text that fails to provide a mechanism for processing and comparing data, for analysing trends, for auditing, or for implementing supportive diagnostic tools.
Nowadays defensive medicine is becoming a process of collecting data, much of it unnecessary and badly stored in an oversized shared electronic health record. The initial product of this information management is the dismissal of most of the data available and the collection of more facts. The final result is not knowledge but confusion. There is much talk about improving outcomes and about population health management, but the lack of adequate data and the misinterpretation of information lead only to ignorance.
Medical professionals are failing to manage the vast amount of patient data available; they are failing to use adequately the information at their disposal; they are failing to gain the knowledge that could change their diagnoses and management plans; they are failing to acquire the wisdom that could make medical practice safer. In consequence they are set to fail the patient, failing to improve the final outcome.
Information is power, and as medicine is failing to control its vast collection of data, it is rather becoming powerless, incapable of progressing, of harnessing the benefits other sciences like informatics can provide.
References
1. Liaw, S. T., Chen, H. Y., Maneze, D., Taggart, J., Dennis, S., Vagholkar, S., & Bunker, J. Health reform: is routinely collected electronic information fit for purpose? Emergency Medicine Australasia, 2012; vol. 24(1), 57-63.
Competing interests: No competing interests
Dear Editor,
As a senior medical student, your article made me reflect on the lack of undergraduate teaching around understanding and communicating biological variability in relation to medical test measurements.
It strikes me that medical schools could have a key role in preparing newly qualified doctors to more objectively interpret test results. As McCormack and Holmes say, biological variability cannot be fixed, but an understanding of its effect can help us make informed decisions. By incorporating a taught session into clinical training, whole cohorts of newly qualified doctors would have a familiar framework to help judge how meaningful certain test results may be.
Junior doctors may also be well placed to help spread awareness of the issue within medical teams. Junior doctors order and follow up patient tests more frequently than anyone else in the hospital setting, supported by seniors (Ericksson et al., 2018). The surrounding discussions within teams may be an ideal opportunity to raise the issue of test reference change values. This might help to rationalise the significance of certain results and prompt more critical consideration of whether further testing is always required. Given current concern about laboratory overuse, this would surely be a welcome change (Zhi et al., 2013).
I think targeting such teaching to medical school communication skills sessions could be effective. In my centre we rehearse and receive feedback on simulated patient interactions over a variety of set scenarios. There are existing initiatives to prepare medical students to communicate with ‘e-patients’ (patients who have access to online information) by adding modules or new scenarios to existing curricula (Herrmann-Werner et al., 2019). It would be feasible to extend a task and practice discussing uncertainty or insignificance in test results.
By adequately preparing medical students to interpret and communicate uncertainty in results, we can expect better care as they progress to be junior doctors: more confident test interpretation, more appropriate test ordering and more effective patient interactions.
References
Ericksson, W., Bothe, J., Cheung, H., Zhang, K., and Kelly, S. (2018). Factors leading to overutilisation of hospital pathology testing: the junior doctor’s perspective. Aust. Health Rev. 42, 374.
Herrmann-Werner, A., Weber, H., Loda, T., Keifenheim, K.E., Erschens, R., Mölbert, S.C., Nikendei, C., Zipfel, S., and Masters, K. (2019). “But Dr Google said…” – Training medical students how to communicate with E-patients. Med. Teach. 41, 1434–1440.
Zhi, M., Ding, E.L., Theisen-Toupal, J., Whelan, J., and Arnaout, R. (2013). The Landscape of Inappropriate Laboratory Testing: A 15-Year Meta-Analysis. PLoS ONE 8, e78962.
Competing interests: No competing interests
Dear Editor
We read the article with great interest; it describes a very interesting diagnostic and predictive tool. We would like to raise a few important points.
The variability in measurement may be due not just to biological and analytical factors; it starts at a much more basic, sub-cellular level. There can be inter- and intra-observer variability, which can be genetic, environmental, epigenetic, or iatrogenic in origin, as well as variability from measurement bias, poor sampling technique, and faulty equipment.
Confidence intervals might not be the best way to drill down into the statistical aspects.
Even the timing of sample measurement, and the setting in which samples are measured, can lead to significant differences.
We would suggest at least an intraclass correlation coefficient, if not a Bland-Altman plot, for measuring the variability in a meaningful way.
Competing interests: No competing interests
Using reference change values (RCV) to assess changes in analyte concentrations – not as easy as it looks
McCormack and Holmes have developed an application to make it easier for medical practitioners and patients to assess changes in analyte concentrations when monitoring the same individual over time, an initiative we applaud. This is achieved by providing reference change values (RCV) which are based on estimates of biological variation data and analytical variation, assuming that pre-analytical variation is negligible. The biological variation data used as basis for this application are mostly taken from the EFLM Biological Variation Database (1), which delivers real-time updated biological variation data for numerous analytes, resulting from systematic reviews and appraisal of published biological variation studies by the Biological Variation Data Critical Appraisal Checklist (BIVAC) (2). However, we observe for some analytes inconsistencies between the estimates reported by McCormack and Holmes and those published in the EFLM Biological Variation Database, as exemplified for HDL cholesterol, at 7.5% vs 5.7%, respectively.
Furthermore, there are two additional caveats that must be taken into account when using this application. Firstly, an inappropriate formula for RCV is applied. Estimates of analytical and biological variation can be quantified as standard deviations (SD) or as coefficients of variation (CV). The concept of the RCV was introduced by Harris and Yasaka (3), calculated as: RCV(SD)=Z*√(2)*√(SDA^2 + SDI^2). This formula is only applicable to SDs, i.e. changes in units, with the presupposition that the expected change between samples follows a normal distribution. However, the difference in percent between normally distributed variables is not a normally distributed variable itself (4). The percent difference between measurements M1 and M2 is defined as (M2−M1)/M1 = M2/M1 − 1, and if M1 and M2 are normally distributed, M2/M1 is not. Thus, when using analytical (CVA) and within-subject biological variation (CVI) estimates quantified as CVs, the following RCV(CV) formulae should be applied:
SDA^2 = ln(CVA^2 +1)
SDI^2 = ln(CVI^2 +1)
RCV(CV)=100%*(exp(±Z*√2*√(SDA^2 +SDI^2) )-1)
To exemplify, using triglycerides for which McCormack and Holme in their application report:
RCV(SD)=±100%*1.64*√(2)*√(0.025^2 + 0.205^2)=±47.9%,
based on the biological and analytical variation estimates of 20.5% and 2.5%, respectively, at a 95% confidence level. If applying these in the RCV(CV) formula, as is appropriate, we get the following:
SDA^2 = ln(CVA^2 +1)=ln(0.025^2 +1)=0.0006248048
SDI^2 = ln(CVI^2 +1)=ln(0.205^2 +1)=0.04116594
RCV(CV)=100%*(exp(±1.64*√2*√(0.0006248048 +0.04116594))-1) = (-37.8%, 60.7%)
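Both calculations above can be checked with a short Python sketch (the function names are ours; CVA = 2.5%, CVI = 20.5% and Z = 1.64 are the values quoted in this response):

```python
import math

def rcv_sd_style(cv_a, cv_i, z=1.64):
    """Symmetric RCV, treating the CVs as if they were SDs."""
    return 100 * z * math.sqrt(2) * math.sqrt(cv_a**2 + cv_i**2)

def rcv_cv_lognormal(cv_a, cv_i, z=1.64):
    """Asymmetric RCV from the log-normal model appropriate for CVs."""
    var = math.log(cv_a**2 + 1) + math.log(cv_i**2 + 1)
    half_width = z * math.sqrt(2) * math.sqrt(var)
    fall = 100 * (math.exp(-half_width) - 1)
    rise = 100 * (math.exp(half_width) - 1)
    return fall, rise

# Triglycerides: CVA = 2.5%, CVI = 20.5%
print(rcv_sd_style(0.025, 0.205))      # symmetric, ≈ ±47.9%
print(rcv_cv_lognormal(0.025, 0.205))  # asymmetric, ≈ (-37.8%, +60.7%)
```

The asymmetry of the log-normal result is the point: a fall of 37.8% and a rise of 60.7% both correspond to the same underlying variation, which a symmetric ±47.9% interval misrepresents.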
Thus, when using these RCVs to monitor a patient, there is clearly a consequence of applying incorrectly rather than correctly calculated RCVs. The consequence of using CV-based estimates in the RCV(SD) formula is greatest when the analytical or biological estimates are high, i.e. over 10% as a rough guide.
Secondly, the calculator as a standard approach applies a two-sided Z-score. It is often assumed that a Z-score of 1.96 for P<0.05 (and sometimes also 2.58 for P<0.01) is appropriate for any clinical scenario. However, these Z-scores are bidirectional (two-tailed or two-sided), which implies that the difference between the two serial results can be either an increase or a decrease. In many clinical situations, however, the decision involves assessing a significant fall (for example, a reduction of HbA1c after treatment for diabetes mellitus) or a significant rise (for example, an increase in serum troponin after acute chest pain). Thus, unidirectional (one-tailed or one-sided) Z-scores must be used in such situations to facilitate correct interpretation; these are 1.64 for P<0.05 and 2.33 for P<0.01. Correct definition of the clinical decision-making context, and of the differences between the term “change” and “rise or fall” and their synonyms, is required for correct calculation of the appropriate RCV. In addition, clinical decisions are often taken using probabilities of less than 95%, and the chosen Z-scores must reflect this.
It is also important to note that the RCV addresses only how likely it is that a certain change can be explained by analytical and biological variation, not the probability that a real change in the patient's health state has occurred. It has been suggested that a better tool for understanding and interpreting measured differences in monitoring is needed, and that the concepts of sensitivity, specificity, likelihood ratios, and odds used in diagnostic test evaluation should also be applied to monitoring by substituting measured differences for measured concentrations (5).
References
1. Aarsand AK, Fernandez-Calle P, Webster C, Coskun A, Gonzales-Lao E, Diaz-Garzon J, Jonker N, Minchinela J, Simon M, Braga F, Perich C, Boned B, Roraas T, Marques-Garcia F, Carobene A, Aslan B, Barlett WA, Sandberg S. The EFLM Biological Variation Database. https://biologicalvariation.eu/. Accessed March 2020.
2. Aarsand AK, Røraas T, Fernandez-Calle P, Ricos C, Díaz-Garzón J, Jonker N, Perich C, González-Lao E, Carobene A, Minchinela J, Coşkun A, Simón M, Álvarez V, Bartlett WA, Fernández-Fernández P, Boned B, Braga F, Corte Z, Aslan B, Sandberg S. The Biological Variation Data Critical Appraisal Checklist: A Standard for Evaluating Studies on Biological Variation. Clin Chem 2018; 64:501-514
3. Harris EK, Yasaka T. On the calculation of a “reference change” for comparing two consecutive measurements. Clinical Chemistry. 1983;29:25–30.
4. Fokkema MR, Herrmann Z, Muskiet F a J, Moecks J. Reference change values for brain natriuretic peptides revisited. Clinical Chemistry. 2006;52:1602–3.
5. Petersen PH, Sandberg S, Iglesias N, et al. ‘Likelihood-ratio’ and ‘odds’ applied to monitoring of patients as a supplement to ‘reference change value’ (RCV). Clin Chem Lab Med 2008;46:157-164.
Competing interests: No competing interests