Caution in interpretation needed
BMJ 1995; 310 doi: https://doi.org/10.1136/bmj.310.6980.667 (Published 11 March 1995) Cite this as: BMJ 1995;310:667
- Stephen Senn, Biometrician
- Rajesh Bakshi, Central medical adviser
- Nathalie Ezzet, Biometrician
Medical Department, Ciba-Geigy AG, Basle CH-4002, Switzerland
EDITOR,—Considerable caution must be exercised in interpreting the results of Lyn March and colleagues' n of 1 trials in osteoarthritis.1 For example, the authors observe that “seven of the eight patients who changed treatment from baseline changed to paracetamol (P=0.07, exact binomial).” Of the 20 patients analysed, 16 were taking a non-steroidal anti-inflammatory drug and only three were taking paracetamol. Therefore, at most three patients could have changed their treatment from paracetamol.
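As a rough check on the quoted figure, the sketch below recomputes a two sided exact binomial P value for seven changes out of eight under an even (p=0.5) null, which appears to be the calculation behind "P=0.07". It then shows how the picture shifts if the chance that a changer moves to paracetamol is instead tied to the baseline split of 16 to 3. That second null, and the function names, are illustrative assumptions and not part of the original letter or of the trial report.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_exact_p(successes: int, n: int, p: float = 0.5) -> float:
    """Two sided exact binomial P value: sum the probabilities of all
    outcomes that are no more likely than the observed one."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    observed = probs[successes]
    return sum(pr for pr in probs if pr <= observed * (1 + 1e-9))


def upper_tail(successes: int, n: int, p: float) -> float:
    """Probability of at least `successes` out of n."""
    return sum(binomial_pmf(k, n, p) for k in range(successes, n + 1))


# Even null, as the quoted test seems to assume.
p_value = two_sided_exact_p(7, 8)  # 18/256, about 0.07

# ASSUMPTION (illustrative only): a patient who changes treatment is on a
# non-steroidal anti-inflammatory drug with probability 16/19, the baseline
# share among the 19 patients whose baseline drug is stated in the letter.
share_nsaid = 16 / 19
tail = upper_tail(7, 8, share_nsaid)

print(f"two sided P under p=0.5: {p_value:.3f}")
print(f"P(at least 7 of 8 change to paracetamol) under baseline split: {tail:.2f}")
```

Under the baseline weighted null, seven or more changes to paracetamol out of eight is an unremarkable outcome, which is the point of the objection: the direction of change is largely dictated by what patients were taking to begin with.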
Furthermore, the authors implicitly assume that failure to find a significant difference in favour of diclofenac in the n of 1 trials considered constituted adequate grounds for preferring paracetamol. In drug development generally it is many years since failure to find a significant difference has been accepted as “proof” of equivalence. For fixed effects an n of 1 trial with three pairs of treatment periods has roughly the same power as a two period crossover with three patients; this is inadequate to establish treatment effects in osteoarthritis. We have reconstituted the variances of the individual estimates of the differences between treatments from the authors' table and find an average variance of 33 for pain scores. If the clinically relevant difference (between paracetamol and diclofenac) was 10 mm the power to detect a significant difference would be only 33%; thus, using a significance test under such circumstances one would, on average, incorrectly designate two thirds of patients as “adequately controlled” with paracetamol.
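The power figure can be approximated with a short calculation. The letter does not state which test was used, so the sketch below makes assumptions: a t test on three within patient differences (2 degrees of freedom), a one sided 5% significance level, and a standard error equal to the square root of the quoted variance of 33. With 2 degrees of freedom the power of the t test reduces to a one dimensional integral that can be evaluated with the standard library alone. All function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def t_critical_df2(p: float) -> float:
    """Quantile of Student's t with 2 degrees of freedom.
    For 2 df the CDF is 1/2 + t / (2 * sqrt(2 + t^2)), which inverts exactly."""
    q = 2.0 * p - 1.0
    return math.sqrt(2.0) * q / math.sqrt(1.0 - q * q)


def power_one_sided_df2(delta: float, variance: float, alpha: float = 0.05,
                        steps: int = 20000, u_max: float = 8.0) -> float:
    """Power of a one sided t test with 2 df (ASSUMED design, see lead-in).

    The statistic is (Z + ncp) / sqrt(W), with Z standard normal and
    W = chi-square(2) / 2, which is exponential with mean 1. Substituting
    u = sqrt(W) gives
        power = integral over u >= 0 of Phi(ncp - crit * u) * 2u * exp(-u^2),
    evaluated here with the trapezoidal rule (both end points contribute ~0).
    """
    ncp = delta / math.sqrt(variance)      # true difference in standard errors
    crit = t_critical_df2(1.0 - alpha)     # about 2.92 for alpha = 0.05
    h = u_max / steps
    total = 0.0
    for i in range(1, steps):
        u = i * h
        total += normal_cdf(ncp - crit * u) * 2.0 * u * math.exp(-u * u)
    return total * h


# Values quoted in the letter: 10 mm relevant difference, variance 33.
power = power_one_sided_df2(delta=10.0, variance=33.0)
print(f"approximate power: {power:.2f}")
```

Under these assumptions the result is roughly one third, in line with the figure quoted above; a two sided 5% test on the same 2 degrees of freedom would give lower power still.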
It is true that when the data are examined with a random effects model there is clear evidence of heterogeneity. Whether a random or fixed effects model is used, however, the overall treatment effect is significant and in favour of diclofenac (P=0.04 and P<0.0001, respectively).
“Peering at P values” is not an appropriate way to use data from n of 1 trials.2 Strategies for combining information from individuals with information from groups are needed.2 In this series of n of 1 trials flaws in the methodology will have biased downwards the estimate of the proportion of patients responding better to diclofenac.
We also consider that, whereas a small series of trials in selected patients (essentially a “convenience” sample) might be suitable for showing that one drug can prove superior to another (as is the case here for diclofenac), it could never, particularly when analysed in this way, support the generalisation that “in osteoarthritis many patients currently receiving or being considered for non-steroidal anti-inflammatory drugs may achieve adequate control with paracetamol.”