Rapid Responses to:

PAPERS:
Sara Schroter, Nick Black, Stephen Evans, James Carpenter, Fiona Godlee, and Richard Smith
Effects of training on quality of peer review: randomised controlled trial
BMJ 2004; 328: 673

Rapid Responses published:

Effect of training on quality of peer review
Robert West (26 March 2004)
Data-analysis errors
Eric N Grosch (12 April 2004)
Were there fatal order effects in this trial?
Trisha Greenhalgh (19 April 2004)

Effect of training on quality of peer review 26 March 2004
Robert West,
reader
UWCM, Cardiff

Re: Effect of training on quality of peer review

The 'in house' trial of training (Schroter et al, BMJ 2004;328:673) reports a significant, though small and editorially unimportant, improvement in review quality (with differences and 95% confidence intervals) soon after training. Perhaps the following sentence reveals the more important finding: no difference only six months later.

The trial tested the effect of brief training on the review of one type of study, the randomised trial, which is arguably the simplest to appraise, and the results emphasised the recorded error count. While an error count may be informative to editors, other, possibly more important, features such as originality were merged into one composite measure, and this showed very little difference (approximately 10%). The authors speculate that 'some reviewers may not have persisted in detecting all (planted) errors'. Indeed, could that alone explain the apparent difference between the recently trained reviewers and the controls? An important feature of training is explaining that inconsistencies in the manuscript are useful proxies for errors in the study. I, for one, have never attempted to list all the errors, inconsistencies, typos and instances of poor English I observe in manuscripts; instead I aim to give enough examples to provide editors with grounds for rejection, revision or acceptance.

There may also be some uncertainty over the magnitude and significance of the apparent improvement at test 2. The best scores for quality and (reported) errors were in the self-taught 'survivors', but there were many more drop-outs in this group (28% v 6%, table 1), and drop-outs had poorer scores at baseline (results, lines 2-4). It would not be at all surprising if those who were finding the exercise difficult fell by the wayside. Two methods of imputation for missing values are cited, but not the simple comparison of only those who completed, and the abstract and results present unadjusted differences.
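To illustrate how differential drop-out alone can inflate a group's apparent score, here is a minimal sketch. It uses ten hypothetical review-quality scores (not the trial's data) and the worst-case assumption that drop-outs are exactly the lowest scorers. The function name `complete_case_mean` is illustrative only.

```python
def complete_case_mean(scores, dropout_fraction):
    """Mean score among 'survivors' when the lowest scorers drop out.

    Worst-case illustration: assumes drop-outs are exactly the
    lowest-scoring participants (hypothetical, not the trial's data).
    """
    ranked = sorted(scores)
    n_drop = round(dropout_fraction * len(ranked))  # number who drop out
    survivors = ranked[n_drop:]                      # keep the higher scorers
    return sum(survivors) / len(survivors)


# Hypothetical scores, identical for both groups before any drop-out.
scores = [2.0, 2.2, 2.5, 2.8, 3.0, 3.1, 3.3, 3.5, 3.6, 3.8]
print(sum(scores) / len(scores))           # full-group mean
print(complete_case_mean(scores, 0.28))    # 28% drop-out
print(complete_case_mean(scores, 0.06))    # 6% drop-out
```

With these made-up scores the full-group mean is 2.98. Dropping 28% leaves a survivor mean of about 3.30, while dropping 6% leaves about 3.09, so two identical groups can appear to differ through drop-out alone.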

Archie Cochrane would have challenged any inference based on a trial (or study) with a drop-out rate of this magnitude (45% from randomisation to 6 month follow-up), a rate perhaps particularly disappointing among volunteering professionals.

Competing interests: none

Data-analysis errors 12 April 2004
Eric N Grosch,
private practitioner
Largo, FL, USA, 33774

Re: Data-analysis errors

Schroter et al present no original data, so the reader cannot assess the accuracy of their processed numerical results in table 2.

The RQI score and the number of errors identified are, in each case, ordinal data, for which the median is the appropriate measure of central tendency and the interquartile range the appropriate measure of dispersion, not the mean and standard deviation that the authors report.
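As a minimal sketch of the summary recommended above, the following uses Python's standard `statistics` module on hypothetical error counts. Quartiles are computed with `method="inclusive"`; other quartile conventions give slightly different values, and `ordinal_summary` is an illustrative name.

```python
import statistics


def ordinal_summary(scores):
    """Return (median, Q1, Q3) for a list of ordinal scores.

    Uses statistics.quantiles with method="inclusive"; other
    quartile conventions give slightly different Q1/Q3 values.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return statistics.median(scores), q1, q3


# Hypothetical numbers of errors identified by eight reviewers.
print(ordinal_summary([1, 2, 2, 3, 3, 3, 4, 5]))
```

For these hypothetical counts the result is a median of 3.0 with an interquartile range of 2.0 to 3.25.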

For time taken to review, in minutes, the data are parametric, so the mean and standard deviation might be appropriate if the data were not skewed. But they are skewed in every case: double the SD exceeds the mean in every instance, so that two standard deviations below the mean is less than zero, an absurd result for a measure of time that begins at zero. The distribution is therefore not Gaussian, and a Gaussian distribution is a necessary prerequisite for use of the mean and standard deviation. Accordingly, again, the median is the appropriate measure of central tendency and the interquartile range the appropriate measure of dispersion for the time data.[1]
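The 'double the SD exceeds the mean' check can be written out directly. This sketch assumes review times in minutes with a hard floor of zero. The times below are hypothetical and chosen to be right-skewed, and `describe_times` is an illustrative name.

```python
import statistics


def describe_times(times, floor=0.0):
    """Summarise review times (minutes) and flag implausible normality.

    If mean - 2*SD falls below the hard floor (0 minutes), a normal
    model would assign more than about 2% of reviews to impossible
    times, so the median and IQR are the safer summary.
    """
    mean = statistics.mean(times)
    sd = statistics.stdev(times)  # sample standard deviation
    q1, median, q3 = statistics.quantiles(times, n=4, method="inclusive")
    return {
        "mean": mean,
        "sd": sd,
        "below_floor": mean - 2 * sd < floor,
        "median": median,
        "iqr": (q1, q3),
    }


# Hypothetical, right-skewed review times in minutes.
print(describe_times([20, 25, 30, 35, 40, 60, 240]))
```

For these hypothetical times the mean is about 64 minutes with an SD of about 79, so the mean minus 2 SD is negative and the check fires. The median (35) and IQR (27.5 to 50) describe a typical review better.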

1. White SJ. Statistical errors in papers in the British Journal of Psychiatry. Br J Psychiatry 1979;135:336-42.

Competing interests: None declared

Were there fatal order effects in this trial? 19 April 2004
Trisha Greenhalgh,
Professor of Primary Care
University College London

Re: Were there fatal order effects in this trial?

I was one of the reviewers who participated in this trial. I was randomised to 'no intervention' and received three papers. I do a lot of reviewing for the BMJ, and several months passed between consenting and doing the reviews, so I was not aware that the particular papers I was reviewing were part of the trial. I reviewed the first paper carefully and (I later found) scored 4.21 out of a maximum of 5, i.e. I must have spotted most of the flaws. I remember commenting that I was surprised the paper had not been rejected at initial screening. When I got the second paper (again not consciously recalling that I had agreed to take part in this trial), I found it to be dreadfully flawed. I was busy, so I sent a short review pointing out the major flaws and offering to send a more detailed response if the editors needed one. I scored 2.64 for that review, i.e. I only "found" about half the flaws. When I got the third paper, I did the same, and scored 2.57.

My point here is this: I believe that all reviewers got all papers in the same order. I suspect that, as in my case, there was a clear attrition of input by experienced reviewers as they received yet another clearly unpublishable paper. This group would have done what experienced reviewers always do: find enough points to hole the paper below the water line, then stop looking. I also suspect that less experienced reviewers who received training might well have improved their scores as they learnt new skills, but that this would have been counterbalanced by the behaviour of the 'old hands'.

Could a statistician explore this hypothesis? I, for one, am concerned that the BMJ seems to have "proved" the counter-intuitive hypothesis that reviewers don't benefit from training.
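One way a statistician might start on this question is to estimate, for each reviewer, the least-squares slope of score against the position in which each paper was reviewed. The next step would be to compare average slopes between groups (for example, trained v untrained, or experienced v less experienced). This sketch assumes each reviewer's scores are listed in review order, and the function names are hypothetical. If every reviewer saw the papers in the same order, position cannot be separated from paper difficulty, so a fuller analysis would need to model reviewer and paper effects jointly.

```python
def order_slope(scores):
    """Least-squares slope of score on review position (1, 2, 3, ...).

    A negative slope means scores fell as the reviewer worked
    through the papers in sequence.
    """
    n = len(scores)
    positions = range(1, n + 1)
    mean_x = (n + 1) / 2
    mean_y = sum(scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, scores))
    sxx = sum((x - mean_x) ** 2 for x in positions)
    return sxy / sxx


def mean_slope_by_group(reviewers):
    """Average per-reviewer slope within each group label.

    reviewers: list of (group_label, [score_paper1, score_paper2, ...]).
    """
    slopes = {}
    for group, scores in reviewers:
        slopes.setdefault(group, []).append(order_slope(scores))
    return {g: sum(s) / len(s) for g, s in slopes.items()}


# The three scores reported in this response, in review order.
print(order_slope([4.21, 2.64, 2.57]))
```

Applied to the three scores reported above, `order_slope` gives about -0.82 points per paper.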

Competing interests: None declared