
Education And Debate

Alteplase for stroke: money and optimistic claims buttress the "brain attack" campaign
Commentary: Who pays the guideline writers?
Commentary: Thrombolysis in stroke: it works!

BMJ 2002; 324 doi: (Published 23 March 2002) Cite this as: BMJ 2002;324:723

The raw data of the NINDS trial should be made public

I appreciate Dr Grotta's willingness to submit a detailed response on
behalf of the NINDS study group, and welcome his provision of more
detailed subgroup data for further analysis. The study group's rapid
response letter is a testament to the value of the BMJ's rapid response
section, which can serve as a valuable forum for serious scientific
discussion. The readers of the BMJ (and the wider public) are much better
served if they can study and analyse the arguments of multiple
discussants with different points of view. All the discussants inevitably
"spin" their analysis of the data, and the independent-minded reader can
hopefully discern the likely "scientific truth" by carefully weighing the
competing arguments.

Dr Grotta stated: "Dr. Mann writes that in order to obtain valid
results, critical prognostic variables have to be prespecified, and
corrected for, in the design of any randomized controlled trial. This is
true, but Dr. Mann’s choice of prognostic variables is different than
ours. What we think is important is predicting who will respond to TPA.
Dr. Mann is most concerned with who will do well despite therapy." Dr
Grotta is entirely mistaken if he presumes that I am more concerned with
who will do well despite therapy. Surely both factors have to be
carefully considered when designing a tPA-for-stroke trial? I can
appreciate that the NINDS investigators mainly focused on predicting who
would most likely respond to tPA when designing their stroke trial --
along the practical lines suggested by David Sackett [1], who stated that
confidence in a trial's results is greater when the signal/noise ratio of
the trial is enhanced. According to Sackett, the "signal describes the
differences between the effects of the experimental and control
treatments". By deliberately choosing patients who would most likely have
a substantial response to tPA, the NINDS trialists would maximise the
"signal", which would increase one's confidence in tPA's efficacy if the
trial's results turned out to be positive. However, surely it is equally
important to decrease the "noise" in order to be confident in the
validity of the NINDS trial's results? Sackett defines "noise" as
follows: "noise (or uncertainty) in an RCT is the sum of all the factors
("sources of variation") that can affect the absolute risk reduction or
absolute difference". In tPA-for-stroke trials that are poorly balanced
for baseline stroke severity, the expected rate of a favorable stroke
outcome (due to the natural course of the disease) varies according to
the degree of imbalance in baseline stroke severity. Significant
imbalances in baseline stroke severity between treated and placebo
patients therefore create considerable "noise": they produce a chance
variability in the rate of a favorable stroke outcome that can magnify or
diminish the "true" efficacy of tPA, causing the "apparent" efficacy to
be greater or less than the "true" efficacy.
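The effect of such "noise" can be illustrated with a toy calculation. All the numbers below are purely illustrative assumptions (not trial data): a baseline severity imbalance between the arms shifts the apparent absolute difference away from the assumed true treatment effect.

```python
# Toy sketch of Sackett's "noise": assumed favorable-outcome rates for
# mild and severe strokes, and an assumed true treatment effect.
TRUE_EFFECT = 0.05                       # assumed true absolute risk reduction
P_FAV_MILD, P_FAV_SEVERE = 0.45, 0.15    # assumed natural-course rates

def arm_rate(frac_mild: float, treated: bool) -> float:
    """Favorable-outcome rate for an arm with the given case mix."""
    base = frac_mild * P_FAV_MILD + (1 - frac_mild) * P_FAV_SEVERE
    return base + (TRUE_EFFECT if treated else 0.0)

# Balanced arms: the apparent difference equals the true effect.
balanced = arm_rate(0.5, True) - arm_rate(0.5, False)
# Treated arm enriched with mild strokes: the apparent difference is inflated.
imbalanced = arm_rate(0.6, True) - arm_rate(0.5, False)
print(round(balanced, 2), round(imbalanced, 2))  # 0.05 0.08
```

A 10-point shift in the case mix of one arm turns an assumed 5% true effect into an apparent 8% difference, which is the sense in which severity imbalance acts as "noise" on the absolute risk reduction.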

Now that Dr Grotta has published the favorable stroke outcome results
for each subgroup, it is much easier to demonstrate the degree of "noise"
caused by stroke severity imbalances in the NINDS trial: one simply has
to review the results presented in table 1 of Dr Grotta's rapid response
letter. The NINDS investigators' own data show that the "apparent"
efficacy of tPA for patients treated between 91-180 minutes is 21% (46%
minus 25%), and only 14% (37% minus 23%) when the NIHSS 0-5 subgroup's
results are eliminated from consideration. That represents a one-third
reduction in tPA's "apparent" efficacy from eliminating the single
biggest source of confounding due to stroke severity imbalances between
treated and placebo patients in the NINDS trial -- the stroke outcome
results of the NIHSS 0-5 subgroups. The 7% absolute difference (due to
the recruitment of such a large percentage of very mild stroke patients)
was chalked up to tPA therapy, when it was obviously due to the natural
course of the disease. It is ironic that such a high proportion of very
mild stroke patients were recruited into the NINDS trial. Patients with
very mild strokes (NIHSS 0-5) represented about 20% of the total number
of tPA patients treated between 91-180 minutes. Recruiting patients with
very mild strokes is contrary to Sackett's basic principle of mainly
recruiting high risk patients, who would be more likely to show a
substantial response to tPA therapy. It was also contrary to the NINDS
investigators' own policy of discouraging the recruitment of patients
with a NIHSS score of <4 [2].
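The arithmetic above can be checked directly from the rates quoted from table 1 of Dr Grotta's letter (a minimal sketch; the four rates are those reported for the 91-180 minute cohort):

```python
# Favorable-outcome rates quoted from table 1 of Dr Grotta's letter.
tpa_all, placebo_all = 0.46, 0.25      # all patients, 91-180 minutes
tpa_trim, placebo_trim = 0.37, 0.23    # excluding the NIHSS 0-5 subgroup

apparent_all = round(tpa_all - placebo_all, 2)      # 21% apparent efficacy
apparent_trim = round(tpa_trim - placebo_trim, 2)   # 14% apparent efficacy
reduction = round((apparent_all - apparent_trim) / apparent_all, 2)
print(apparent_all, apparent_trim, reduction)  # 0.21 0.14 0.33
```

The 7-point drop from 21% to 14% is exactly the one-third reduction the letter describes.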

There may be another significant source of "noise" due to stroke
severity imbalances that could cause the "apparent" efficacy of tPA to
differ from the "true" efficacy -- "noise" that would be generated if the
placebo and tPA patients within EACH subgroup were not near-perfectly
balanced for baseline stroke severity (even though the total numbers of
placebo and tPA patients in each subgroup were near-equal). I explored
that particular issue at great length in a letter to the CMAJ [3], and I
wonder to what degree stroke severity imbalances within each of the NINDS
subgroups could be a confounding factor. The true answer to that question
will only become fully apparent when the NINDS investigators make all the
raw data from the NINDS trial publicly available -- so that the public
can much more accurately determine how well balanced EACH of the
subgroups was for baseline stroke severity. According to the TOAST graph
[4], a two-point difference in the "average" baseline NIHSS score between
treated and placebo patients in the NIHSS range of 16-20 can cause a 3-5%
absolute difference in the rate of a favorable stroke outcome, which
could alter the "apparent" efficacy of tPA for those patients by 30-50%.
The figure of 30-50% is obtained by simply examining the NIHSS 16-20
subgroups' results in table 1, which show that the "apparent" absolute
efficacy of tPA for those patients was 9% (27% minus 18%).
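The 30-50% figure follows from dividing the assumed TOAST-derived 3-5 point shift by the 9% apparent efficacy (strictly the ratio comes out at roughly a third to just over a half, which the letter rounds to 30-50%):

```python
# NIHSS 16-20 subgroup rates quoted from table 1.
apparent = round(0.27 - 0.18, 2)   # 9% apparent absolute efficacy
# TOAST graph [4]: a two-point baseline NIHSS imbalance in this range is
# taken to shift the favorable-outcome rate by 3-5 absolute points.
for shift in (0.03, 0.05):
    print(round(shift / apparent * 100))  # shift as % of apparent efficacy
```

This prints 33 and 56, i.e. the putative imbalance could account for roughly a third to a half of the subgroup's apparent efficacy.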

Another probable example of "noise" due to stroke severity imbalances
within the NINDS trial's subgroups can be seen by comparing the rates of
favorable stroke outcome for the NIHSS 11-15 and NIHSS 16-20 placebo
subgroups in table 1 of Dr Grotta's letter. The rate of favorable stroke
outcome for the NIHSS 11-15 placebo subgroup was 14%, which was less than
the figure of 18% for the NIHSS 16-20 placebo subgroup. That result is
obviously surprising, because patients with a baseline NIHSS score of
11-15 are naturally expected to have a much better stroke outcome than
patients with a baseline NIHSS score of 16-20. The figure of 14% seems
extraordinarily low for untreated patients in that stroke severity range,
and it is much less than would be predicted. It would be very informative
if the NINDS trialists would publish the rates of favorable stroke
outcome for the NIHSS 11-15 placebo subgroups from the 0-90 minute arm of
the NINDS trial, the ECASS trial, the ECASS II trial and the ATLANTIS
trial (including their "average" baseline stroke severity scores). It
would be extremely useful to know whether the "average" rate of favorable
stroke outcome of the NIHSS 11-15 placebo patients from those other
trials is closer to the 34% figure predicted by the TOAST graph [4], and
whether the comparable NINDS placebo result from the 91-180 minute cohort
is a statistical outlier that artefactually inflates the "apparent"
efficacy of tPA in that subgroup of patients. How much of an effect could
this particular imbalance have if the "true" rate of a favorable stroke
outcome for placebo patients in the NIHSS 11-15 subgroup was 30%? The
answer is that an additional 5 patients would have a favorable stroke
outcome.
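As a rough check on that figure (the NIHSS 11-15 placebo subgroup's size is not reproduced here, so the size below is an assumption chosen to make the reported numbers consistent):

```python
# Assumed subgroup size: roughly 31 placebo patients makes the letter's
# "5 additional patients" consistent with the 14% vs 30% rates.
n_placebo = 31                        # assumed, not reported in the letter
observed, hypothetical = 0.14, 0.30   # reported vs hypothesised rates
extra = round((hypothetical - observed) * n_placebo)
print(extra)  # 5 additional favorable outcomes
```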

Finally, there is another "noise" factor due to stroke severity
imbalances in the NINDS trial that should be considered. Note that there
were 18 more patients in the placebo group than in the tPA group in the
NIHSS >20 subgroup, and that those patients had only a 4% likelihood of a
favorable stroke outcome. If those 18 patients had been equally
distributed between the NIHSS 6-10, 11-15 and 16-20 subgroups, then an
additional 6 patients would have had a favorable stroke outcome. Adding
that figure of 6 patients to the 5 additional patients from the NIHSS
11-15 subgroup means that an additional 11 placebo patients would have
had a favorable stroke outcome. The computed figure for the placebo group
(excluding the NIHSS 0-5 subgroup) would then be 47/160 rather than
36/160, which translates to 29% rather than 23%. That means that the
"apparent" efficacy of tPA would be reduced by another 6%: it would be
only 8%, not 14%.
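That recomputation can be written out explicitly (the counts are as given in the letter; the two adjustment figures, +5 and +6, come from the preceding paragraphs):

```python
# Placebo favorable-outcome count excluding the NIHSS 0-5 subgroup.
baseline_favorable, n_placebo = 36, 160   # 36/160 = 23% as reported
adjusted = baseline_favorable + 5 + 6     # the letter's two adjustments -> 47
placebo_rate = adjusted / n_placebo       # 47/160
tpa_rate = 0.37                           # tPA rate excluding NIHSS 0-5
print(round(placebo_rate * 100),                # 29 (% placebo rate)
      round((tpa_rate - placebo_rate) * 100))   # 8 (% apparent efficacy)
```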

How useful is this post hoc conjecturing about the NINDS trial's
subgroup data? Dr Grotta stated: "It is also very important to remember
that these post-hoc analyses involving subgroups without sufficient
statistical power to answer the meaningful question should be considered
as “hypothesis generating” and providing a rationale for further study
only." I agree with Dr Grotta, and I think that my post hoc conjecturing
about the "noise" influence of stroke severity imbalances within the
NINDS trial's subgroups is simply a "hypothetical" explanation, which
only becomes a "realistic" explanation if the raw data support my
theory's basic tenets. That is why I have requested that the NINDS
investigators make the pooled raw data from all the tPA-for-stroke trials
publicly available [5], so that the raw data can be independently
examined. The pooled results from all the tPA-for-stroke trials should be
examined at EACH level of baseline stroke severity (from a baseline NIHSS
score of 1-25) for different times to treatment, so that the "noise" due
to stroke severity imbalances can be eliminated as a confounding factor.
By also examining the favorable stroke outcome results for different
times to treatment, it will immediately become clear to what degree
delays in time to treatment affect the "apparent" efficacy of tPA --
without having to depend on the hypothetical model constructed by the
NINDS investigators.

A number of other statements made by Dr Grotta deserve further
commentary. He stated: "Contrary to Dr. Mann’s assertion, NIH Stroke
Scale was a prespecified variable that was known to predict outcome and
it was corrected for in the usual way in the original publication." I
have parsed that original publication countless times, and I have never
read any statement implying that the NINDS investigators had corrected
for imbalances in baseline stroke severity between treated and placebo
patients. Hopefully, the NINDS investigators, or other BMJ readers, can
point out the particular statement or statements in the original
publication that I must have missed. Dr Grotta also stated: "The facts
are that the baseline imbalance of stroke severity DOES NOT explain the
entire results of the trial. The imbalance explains some of the
difference, but there is no question that there is still benefit from TPA
91-180 minutes after stroke onset." I wholeheartedly agree with Dr Grotta
that the imbalance only explains some of the difference. However, the
critical question is: how much of the difference is due to stroke
severity imbalances, and how much is due to the "true" efficacy of tPA?
That question has not yet been answered, and I strongly suspect that an
accurate answer will only become apparent when all of the NINDS trial's
raw data are made available to the public. Do the NINDS investigators
have a valid reason for not making the raw data available, considering
that the study was funded with public money through the NIH? There is a
disturbing dissonance between the refusal of the trial's investigators to
make patient-level data publicly available and the NIH's traditional
stance on the dissemination of the results of NIH-sponsored research.
Indeed, this stance is made explicit in a draft policy statement
concerning data sharing released on March 1, 2002 [7]. The statement
reads: "There are many reasons to share data from NIH-supported studies.
Sharing data reinforces open scientific inquiry, encourages diversity of
analysis and opinion, promotes new research, makes possible the testing
of new or alternative hypotheses and methods of analysis -----". In fact,
the NIH draft statement makes some definite recommendations, explicitly
stating: "The NIH will expect investigators supported by NIH funding to
make their research data available to the scientific community for
subsequent -----".

Finally, Dr Grotta also stated: "The results of the NINDS study have
been confirmed by numerous independent reports from both academic and
community hospitals." How is that possible if those studies did not have
a placebo arm? By what means could post-marketing tPA-for-stroke studies
determine the "true" efficacy of tPA if they did not have an absolute or
relative comparator? In the absence of an absolute comparator (an equally
balanced group of placebo patients in an RCT), one could theoretically
only determine that another study had a similar degree of efficacy to the
NINDS trial if that study's tPA patients had an identical stroke severity
distribution to those of the original NINDS trial. Does anyone know of
such a study? In the absence of one, I took the Multicentre Stroke
Survey's group of >1,000 tPA patients, who had an "average" rate of a
favorable stroke outcome of 33%, and calculated the likelihood of a
similar group of untreated stroke patients having a favorable stroke
outcome due to the natural course of the disease (using data from the
NINDS trial, not the TOAST study). The calculated results were reported
in my rapid response letter [3], and the estimated "average" figure was
31.7%. That figure suggests that tPA was probably not significantly
efficacious in those patients. Although the results are based only on a
relative comparison, which is not universally regarded as statistically
valid, I would be interested to know whether anyone has a better means of
demonstrating how the results of post-marketing studies (which are not
RCTs) can accurately confirm or refute the positive results of the NINDS
trial.

Jeffrey Mann.


1. Sackett DL. Why randomized controlled trials fail but needn't: 2.
Failure to employ physiological statistics, or the only formula a
clinician-trialist is ever likely to need (or understand!). CMAJ 2001;
165 (9): 1226-1237.
2. Comment by Patrick Lyden at the FDA Advisory Committee meeting -
June 6th, 1996. From the meeting's transcripts - lines 15-16 on page 183.

3. Mann J. To what degree do stroke severity imbalances affect the
"apparent" efficacy of tPA? CMAJ rapid response letter. 25 June 2002.

4. Adams HP, Davis PH, Leira EC, Chang KC, Bendixen BH, Clarke W, et
al. Baseline NIH stroke scale score strongly predicts outcome after
stroke: a report of the Trial of Org 10172 in Acute Stroke Treatment
(TOAST). Stroke 1999; 30 (11): 2496.

5. Mann J. An open letter to the stroke interventionist community. BMJ
rapid response letter. 19 May 2002.

6. Representative copy of figure 2 from the Marler article.

7. NIH announces draft statement on sharing research data. Release date:
1 March 2002.

Competing interests: No competing interests

08 July 2002
Jeffrey Mann
Salt Lake City, UT 84103