Rapid response to:

Research Methods & Reporting

GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 2: Clinical practice guidelines

BMJ 2016; 353 doi: https://doi.org/10.1136/bmj.i2089 (Published 30 June 2016) Cite this as: BMJ 2016;353:i2089

Rapid Response:

Re: GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 2: Clinical practice guidelines

The claim that “a point estimate is our best guess of the magnitude and direction of an effect” needs to be qualified, because a point estimate is an observed value of a random variable (the estimator), which often takes an infinite number of possible values; hence the need to quantify the uncertainty associated with the point estimate. This often motivates the use of Confidence Intervals (CIs), which aim to calibrate this uncertainty using coverage probabilities. Unfortunately, the coverage probability does not extend to the observed CI, despite numerous questionable attempts to assign probabilities to it. The reason is that, post-data, the observed CI either includes or excludes the true value of the effect, but it is unknown which is the case. Similarly, the p-value can be very misleading as an indicator of “the magnitude of the effect” because of its dependence on the sample size. It does, however, indicate the direction of the effect; see Mayo and Spanos (2006).
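The dependence of the p-value on the sample size can be illustrated with a minimal sketch. It assumes a one-sided test of H0: mu <= mu0 against H1: mu > mu0 for a Normal mean with known standard deviation; the function name and all numerical values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal, mean 0 and sd 1


def one_sided_p_value(xbar, mu0, sigma, n):
    """p-value for H0: mu <= mu0 vs H1: mu > mu0, Normal mean, known sigma."""
    d_obs = sqrt(n) * (xbar - mu0) / sigma  # observed test statistic
    return 1.0 - STD_NORMAL.cdf(d_obs)


# Hypothetical data: the SAME observed effect (xbar - mu0 = 0.2)
# evaluated at increasing sample sizes.
xbar, mu0, sigma = 0.2, 0.0, 2.0
sample_sizes = [25, 100, 400, 1600]
p_values = [one_sided_p_value(xbar, mu0, sigma, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:5d}   p-value = {p:.5f}")
```

The observed effect is held fixed throughout, yet the p-value shrinks as n grows, which is why it cannot by itself be read as a measure of the magnitude of the effect.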

The best way to establish “the magnitude and direction of an effect” is to use the post-data severity evaluation in conjunction with Neyman-Pearson (N-P) testing. This evaluation establishes the discrepancy from the null value warranted by the particular data. It can be shown that the post-data severity associated with the point estimate has probability .5, but one would prefer to establish warranted discrepancies from the null with much higher probability, say .9 or even .95. Moreover, observed CIs include values with very low severity; see Spanos (2014).
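As a rough illustration of the post-data severity evaluation, the sketch below again assumes the simplest case: a one-sided test of H0: mu <= mu0 against H1: mu > mu0 for a Normal mean with known standard deviation, where H0 has been rejected. In that case the severity of the claim mu > mu1 is the probability, evaluated under mu = mu1, of a test statistic no larger than the one observed. The function names and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal, mean 0 and sd 1


def severity_greater_than(mu1, xbar, sigma, n):
    """SEV(mu > mu1) after rejecting H0: mu <= mu0 (Normal mean, known sigma).

    Equals P(d(X) <= d(x0); mu = mu1) = Phi(sqrt(n) * (xbar - mu1) / sigma).
    """
    return STD_NORMAL.cdf(sqrt(n) * (xbar - mu1) / sigma)


def warranted_lower_bound(severity, xbar, sigma, n):
    """Largest mu1 for which the claim mu > mu1 attains the given severity."""
    return xbar - STD_NORMAL.inv_cdf(severity) * sigma / sqrt(n)


# Hypothetical data: observed mean 0.4, sigma 2, n = 100.
xbar, sigma, n = 0.4, 2.0, 100

# Severity of "mu > point estimate" is Phi(0) = 0.5 by construction.
sev_at_estimate = severity_greater_than(xbar, xbar, sigma, n)

# Discrepancies warranted with severity .9 and .95 lie below the estimate.
bound_90 = warranted_lower_bound(0.90, xbar, sigma, n)
bound_95 = warranted_lower_bound(0.95, xbar, sigma, n)

print(f"SEV(mu > {xbar}) = {sev_at_estimate:.2f}")
print(f"mu > {bound_90:.3f} is warranted with severity .90")
print(f"mu > {bound_95:.3f} is warranted with severity .95")
```

Under these assumptions, demanding higher severity moves the warranted claim further below the point estimate, so the estimate itself is only warranted with severity .5.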

This confirms the comment quoted by Ansari: “If the confidence interval was wider still, and included the null value of a difference of 0%, we will not have excluded the possibility that the treatment has any effect whatsoever, and would need to be even more skeptical in our conclusion.” The post-data severity evaluation is used to establish the warranted discrepancy from the null with high probability for particular data. It enables a practitioner to go beyond accept/reject decisions, p-values and observed CIs, and to arrive at an evidential interpretation relating to “the magnitude and direction of an effect”. Warranted discrepancies established with high severity can be used as reliable guides for Evidence-based Decision-making.

Mayo, D. G. and Spanos, A. (2006), “Severe testing as a basic concept in a Neyman–Pearson philosophy of induction,” British Journal for the Philosophy of Science, 57: 323–57.
Spanos, A. (2014), “Recurring controversies about P values and confidence intervals revisited,” Ecology, 95(3): 645–651.

Competing interests: No competing interests

01 September 2016
Aris Spanos
Professor
Virginia Tech
3025 Pamplin Hall, Virginia Tech, Blacksburg, VA 24061