Cost effectiveness calculations and sample size
BMJ 2000;321:697 (Published 16 September 2000) doi: https://doi.org/10.1136/bmj.321.7262.697

Rapid responses
Cost-benefit studies analyze outcomes in monetary terms. This method
has always been a point of debate, because it is difficult to put a
monetary value on a given outcome. Perhaps for that reason, only a
small number of such studies are available.
With this type of analysis, sample sizes have increased tremendously,
because studies must be designed to detect very small but significant
differences between treatment choices. In designing two-arm studies,
what is termed the "clinically significant" difference is set by the
investigators on the basis of expert opinion. This value, which is
central to calculating the sample size for a study, often moves up or
down according to how many subjects can realistically be recruited, or
to the cost of the study.
Example: comparing two treatments, with the outcome measured as
proportions. The standard sample size formula per arm is

N = [Za*sqrt(2*P*Q) + Zb*sqrt(P1*Q1 + P2*Q2)]^2 / D^2

where P is the mean of the two proportions P1 and P2, Q = 1 - P
(likewise Q1 = 1 - P1 and Q2 = 1 - P2), Za and Zb are the standard
normal deviates for the type I and type II error rates, and
D = P1 - P2, the difference between the two proportions that the study
is designed to detect.
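This formula can be sketched in Python; the function name, the two-sided Za, and the 80% power default are illustrative assumptions, not part of the original letter:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Per-arm N for comparing two proportions (normal approximation).
    A sketch of the formula in the text, not the authors' own code."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # Za, two-sided type I error
    z_b = NormalDist().inv_cdf(power)           # Zb, for power = 1 - beta
    p_bar = (p1 + p2) / 2                       # P, mean of the two proportions
    d = abs(p1 - p2)                            # D, the detectable difference
    n = (z_a * sqrt(2 * p_bar * (1 - p_bar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d ** 2
    return ceil(n)

# e.g. detecting 50% vs 40% at 5% two-sided significance and 80% power
print(sample_size_per_arm(0.50, 0.40))  # 388 per arm
```

Note how halving D roughly quadruples N, which is why cost-driven small differences inflate sample sizes so sharply.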
However, in designing two-arm studies, D (the difference between the
two proportions) can be given an objective "clinically significant"
value by costing the treatment choices. If we assign a cost to the
treatment and to the outcome, e.g., treatment one costs P2000 and the
outcome prevented costs P5000, we can ask how many courses of
treatment one are equivalent to preventing one outcome.
Hence, D becomes the value which, when a statistically significant
difference of that size is found, will make us choose treatment one
over placebo.
Using the concept of number needed to treat (NNT), i.e., the number of
cases that must be treated to prevent one primary outcome, the desired
NNT can be derived from:

(cost of treatment one) x (number of cases treated) =
(cost of outcome) x (number of outcomes prevented)
Cost of treatment = P2000 (e.g., chronic antibiotic ear drops for
otitis media)
Cost of outcome = P5000 (e.g., a tympanomastoidectomy)
(P2000) x (number of cases treated) = (P5000) x (number of outcomes prevented)
(number of cases treated) / (number of outcomes prevented) = P5000 / P2000
NNT = 2.5
hypothetical ARI = 1 / NNT = 1 / 2.5 = 0.4
D = 0.4
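The break-even arithmetic above can be written out directly; the peso figures are the ones from the example, and the function name is only illustrative:

```python
def break_even_nnt(cost_of_treatment, cost_of_outcome):
    """NNT at which total treatment cost equals the cost of the
    outcomes prevented (the break-even point in the example)."""
    return cost_of_outcome / cost_of_treatment

nnt = break_even_nnt(2000, 5000)  # P2000 ear drops vs P5000 surgery
d = 1 / nnt                       # hypothetical ARI, the target difference D
print(nnt, d)                     # 2.5 0.4
```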
ARI = 0.4 is thus the desired difference: a difference exceeding it in
the study would indicate that the treatment is the better choice.
Given the sample size formula, one can then compute the sample needed
to detect a "clinically significant difference" based on costing.
Costing an outcome, however, will be controversial from one specialist
to another. An outcome such as a patient ending up in a surgical
procedure, confined for a given number of days, with a projected
earning and productivity value lost for each day, is relatively easy
to cost. Costing life, i.e., mortality, is a different matter.
However, in a country where resources are very limited, a perceived
"clinically significant difference" should include costing as an
important factor. A study designed to compare a novel treatment or
procedure with an established one must therefore account for costing.
A statistically significant difference of 0.0001, where the number
needed to treat is large (1/0.0001 = 10,000) and so is the cost, may
not be clinically significant if the cost of the outcome being
considered is relatively low.
It is true that larger sample sizes may be required to detect the
significant small differences involved in cost-effectiveness analysis.
However, a preliminary costing analysis can be used to determine a
minimum sufficient sample when designing a study. This provides a more
measurable basis for deciding what a "clinically significant
difference" is.
Competing interests: No competing interests
The article by Torgerson and Campbell [1] suggests using
the total cost break-even point as the hypothetical effect
size for the purpose of power and sample size calculations.
While this criterion appears as good as many others, it
could lead to the mistaken conclusion that a study designed
in this way can test cost effectiveness.
In fact, if the sample size is chosen as in [1] with a
type 1 error rate of 5%, then the power to detect lower
overall cost is only 2.5% at the same 5% level!
Take the example of comparing endometrial laser ablation
with transcervical endometrial resection, as discussed
in [1]. The proposed study design with 435 patients per arm
has a power of 80% to detect a significantly lower rate
of re-treatment after laser ablation than the 27% observed
for transcervical resection. However to establish lower
overall cost, we must show that the re-treatment rate is
significantly lower than the break-even point of 19%.
The probability of observing a rate significantly lower
than 19% under the hypothesis that the rate is 19% is,
unsurprisingly, very low. In fact it is precisely half
the type 1 error rate of 5%.
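This can be checked with a short normal-approximation calculation; n = 435 and the 19% break-even rate come from the example above, and the rest is a standard two-sided test sketch:

```python
from math import sqrt
from statistics import NormalDist

n, p0, alpha = 435, 0.19, 0.05           # per-arm size, break-even rate
z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided 5% critical value
se = sqrt(p0 * (1 - p0) / n)             # standard error when the rate is 19%
cutoff = p0 - z * se                     # largest rate "significantly below" 19%
# Probability of observing a rate below the cutoff when the true rate is 19%:
power_at_break_even = NormalDist(p0, se).cdf(cutoff)
print(round(power_at_break_even, 3))     # 0.025, i.e. half the type 1 error rate
```

By construction the rejection cutoff sits exactly z standard errors below the null rate, so the lower-tail probability is alpha/2 regardless of n.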
So the proposed cost-effectiveness criterion for selecting
a hypothetical effect size cannot be used for testing
cost effectiveness. This leaves its rationale looking
rather slight. In fact I support the "logistic" procedure
that the authors appear to denigrate in [1]: calculating
the effect size which yields a practical sample size.
This is an effective way of reducing the arcane logic
of power calculations to a parameter that a clinician
can use to decide if the trial should proceed.
I believe this is compatible with Goodman's agenda [2] to
incorporate clinical understanding into medical
statistics. It is quite different from using post hoc
power calculations to explain away negative results [3].
[1] Torgerson DJ, Campbell MK. Cost effectiveness calculations and
sample size. BMJ 2000;321:697.
[2] Goodman SN. Towards evidence-based medical statistics. 1: The
p-value fallacy. Ann Intern Med 1999;130:995-1004.
[3] Goodman SN. The use of predicted confidence intervals when
planning experiments and the misuse of power when interpreting the
results. Ann Intern Med 1994;121:200-206.
Competing interests: No competing interests