BMJ 2004;328:673 (20 March), doi:10.1136/bmj.38023.700775.AE (published 2 March 2004)
Sara Schroter, senior researcher (1), Nick Black, professor of health services research (2), Stephen Evans, professor of pharmacoepidemiology (2), James Carpenter, senior lecturer in medical statistics (2), Fiona Godlee, head of BMJ knowledge (1), Richard Smith, editor (1)
(1) BMJ Editorial Office, BMA House, Tavistock Square, London WC1H 9JR; (2) London School of Hygiene and Tropical Medicine, London WC1E 7HT
Correspondence to: S Schroter sschroter{at}bmj.com
Design Single blind randomised controlled trial with two intervention groups receiving different types of training plus a control group.
Setting and participants Reviewers at a general medical journal.
Interventions Attendance at a training workshop or receipt of a self taught training package focusing on what editors want from reviewers and how to critically appraise randomised controlled trials.
Main outcome measures Quality of reviews of three manuscripts sent to reviewers at four to six monthly intervals, evaluated using the validated review quality instrument; number of deliberate major errors identified; time taken to review the manuscripts; proportion recommending rejection of the manuscripts.
Results Reviewers in the self taught group scored higher in review quality after training than did the control group (score 2.85 v 2.56; difference 0.29, 95% confidence interval 0.14 to 0.44; P = 0.001), but the difference was not of editorial significance and was not maintained in the long term. Both intervention groups identified significantly more major errors after training than did the control group (3.14 and 2.96 v 2.13; P < 0.001), and this remained significant after the reviewers' performance at baseline assessment was taken into account. The evidence for benefit of training was no longer apparent on further testing six months after the interventions. Training had no impact on the time taken to review the papers but was associated with an increased likelihood of recommending rejection (92% and 84% v 76%; P = 0.002).
Conclusions Short training packages have only a slight impact on the quality of peer review. The value of longer interventions needs to be assessed.
We aimed to determine whether reviewers for the BMJ who underwent training would produce reviews of better quality than those who received no training; whether face to face training would be more beneficial than a self taught package; and whether any training effect would last at least six months.
Assessments and procedures
We selected three previously published papers, each describing a randomised controlled trial of alternative generic ways of organising and managing clinical work. We removed the names of the original authors, changed the titles of the manuscripts and any reference to study location, and introduced 14 deliberate errors (see bmj.com). We asked all consenting reviewers to review the first paper. After this baseline assessment one intervention group received a full day of face to face training, and we mailed the other intervention group a self taught training package. Two to three months after the intervention we sent the second paper to reviewers who had completed the first review, and approximately six months later we sent the third paper to those who completed the second review. We sent the manuscripts to the reviewers in a style similar to the standard BMJ review process, but we told them that these papers were part of the study, and we did not pay them.
Outcome measures
Review quality: The review quality instrument is an eight item validated instrument (see bmj.com) developed specifically for assessing the quality of reviews.8 Two editors independently rated the quality of each review. We used the mean score of the items averaged over the two ratings.
Number of deliberate major errors: Two researchers blind to the identity and study group of the reviewer independently assessed the number of major errors reported in each review. We used the total number of major errors identified averaged across the two raters.
Time taken and recommendation on publication: Reviewers recorded the time taken to review each paper and whether it should be published with no revision, published with minor or major revision, rejected, or other. Given the very poor quality of the papers, the most appropriate recommendation would have been rejection.
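The two rater-averaged measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's scoring code: the function names, the 1 to 5 item scale, and the example ratings are all assumptions made for the sake of the example.

```python
from statistics import mean


def review_quality_score(ratings_a, ratings_b):
    """Mean item score for one review, averaged over two editors' ratings."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both editors must rate the same items")
    # Average the items within each editor, then average the two editors.
    return mean([mean(ratings_a), mean(ratings_b)])


def errors_identified(count_a, count_b):
    """Number of major errors credited to a review, averaged over two raters."""
    return (count_a + count_b) / 2


# Hypothetical ratings on the eight items (a 1 to 5 scale is assumed here).
editor_1 = [3, 2, 3, 4, 2, 3, 3, 2]
editor_2 = [3, 3, 3, 3, 2, 3, 4, 2]

score = review_quality_score(editor_1, editor_2)
errors = errors_identified(3, 4)  # hypothetical counts from the two raters
```

Averaging within each editor first and then across editors gives the same result as pooling all sixteen ratings only because both editors rate the same number of items.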
Interventions
Face to face training: The full day of training covered what BMJ editors require from reviewers and techniques of critical appraisal for randomised controlled trials. Participants were also given written instructions and a CD Rom.
Self taught training: We created a self taught training package based on the materials used in the training workshops, including the CD Rom. We asked reviewers to complete a questionnaire indicating the training exercises they had completed and to evaluate the training materials.
Statistical analysis
We examined differences between the groups in scores on the review quality instrument by using analysis of covariance. We did an overall analysis comparing all three groups, and we report significant results only if the overall analysis was significant. Assessment of the impact of non-response used standard methods that assume the data are missing at random.9 We also investigated how much lower than the (observed) mean for responders the (unobserved) mean for non-responders would have to be, in order to remove any intervention effect.10
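For readers unfamiliar with analysis of covariance, the idea can be sketched as an ordinary least squares regression of the follow-up score on the baseline score plus a group indicator. The coefficient on the indicator is the between-group difference adjusted for baseline. The sketch below is a simplified two group version written in plain Python; the function names and the data are hypothetical, and it does not reproduce the three group overall analysis or the handling of missing data used in the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ancova_adjusted_difference(baseline, followup, treated):
    """Fit followup ~ intercept + baseline + treated by least squares.

    Returns the coefficient on `treated`, i.e. the difference between
    groups after adjustment for the baseline score.
    """
    rows = [[1.0, x, float(t)] for x, t in zip(baseline, treated)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, followup)) for i in range(3)]
    return solve_linear_system(xtx, xty)[2]


# Hypothetical scores for eight reviewers (0 = control, 1 = trained).
baseline = [2.4, 2.9, 3.1, 2.6, 2.5, 3.0, 2.8, 2.7]
followup = [2.5, 2.8, 3.0, 2.6, 2.9, 3.3, 3.0, 3.1]
treated = [0, 0, 0, 0, 1, 1, 1, 1]

adjusted = ancova_adjusted_difference(baseline, followup, treated)
```

Adjusting for baseline in this way removes chance imbalance in starting scores between the groups and usually narrows the confidence interval compared with a simple comparison of follow-up means.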
Evaluation of training interventions
One hundred and fifty eight reviewers attended training workshops, and 81% (114/141) anticipated that the quality of their reviews would improve. Most of the 120 recipients of the self taught package who completed review 2 reported having used the package (104 (87%) completed at least three of the five exercises, and 103 (86%) did all five), and 98 (82%) felt that the quality of their reviews would improve as a result.
Outcome measures
Agreement was good between pairs of raters assessing the quality of reviews and the number of deliberate major errors identified.
Review quality instrument scores: For review 1, the mean score for the whole sample was 2.71 (SD 0.73) and was similar across all three groups (table). For review 2, the difference between the self taught group and the control group was 0.29 (95% confidence interval 0.14 to 0.44; P = 0.0002), and that between the control group and the face to face group was 0.16 (0.02 to 0.3; P = 0.025). We found no significant difference between any of the groups for the third review (P = 0.204). The participants in the control group who did a third review showed a small but significant rise in their score (0.17, 0.09 to 0.26; P = 0.0001), which reduced the difference between them and the intervention groups.
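A reported difference, its 95% confidence interval, and its P value are linked, and the link can be checked with a short calculation. The sketch below assumes a normal approximation (a critical value of 1.96), which is an assumption of this illustration and not the method used in the analysis of covariance itself; the function name is hypothetical.

```python
import math


def p_value_from_ci(difference, lower, upper, z_crit=1.96):
    """Approximate two sided P value implied by an estimate and its 95% CI.

    Assumes the interval is symmetric and based on a normal distribution.
    """
    standard_error = (upper - lower) / (2 * z_crit)
    z = difference / standard_error
    # Two sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Review 2 comparisons with the control group, as reported above.
p_self_taught = p_value_from_ci(0.29, 0.14, 0.44)
p_face_to_face = p_value_from_ci(0.16, 0.02, 0.30)
```

Under this approximation the two comparisons give P values of roughly 0.0002 and 0.025, in line with the values quoted for review 2. Small discrepancies would be expected because the published analysis used regression based standard errors.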
Errors identified: The number of errors detected in the baseline reviews was similar in the three groups (table). However, the difference between the control group and each of the intervention groups was significant for review 2 and remained significant in the analysis of covariance. The differences observed for review 3 were slightly smaller but in a similar direction and were not significantly different after adjustment for baseline and multiple testing.
Time taken to review and recommendation: Generally, the mean time taken to review papers did not differ significantly between the groups (table). All three groups spent less time doing the third review than the previous two reviews. The proportion of reviewers recommending rejection of paper 1 was similar across the groups. The proportion recommending rejection of paper 2 was significantly lower for the control group than for the self taught group (76% v 92%; P < 0.0001), and the same pattern occurred for paper 3 (74% v 91%; P = 0.001).
Impact of non-responders
As the difference between responders and non-responders is unknown, the impact of non-response on the conclusions cannot be definitively determined. With a "missing at random" assumption, non-response has no effect on the statistical significance of the results. Alternatively, with a more conservative approach for the analysis of covariance comparison of review quality instrument scores between the self taught and control groups, we would have to reduce the mean for the non-responders by 0.46 for the difference to be no longer statistically significant (see bmj.com).11
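The logic of this kind of sensitivity analysis can be pictured with a simple mixture calculation. The sketch below is a deliberately simplified illustration and is not the method of reference 10: it assumes that only the intervention group's non-responders differ from its responders, it ignores sampling uncertainty, and the non-response fraction used in the example is hypothetical.

```python
def adjusted_difference(observed_diff, nonresponse_fraction, shift):
    """Between-group difference once non-responders are assumed to score lower.

    If a fraction f of the intervention group did not respond and their
    (unobserved) mean is `shift` below the responders' mean, the group mean
    falls by f * shift, and so does the difference from the control group.
    """
    return observed_diff - nonresponse_fraction * shift


def shift_to_remove_effect(observed_diff, nonresponse_fraction):
    """Shift in the non-responder mean at which the adjusted difference is zero."""
    if not 0 < nonresponse_fraction <= 1:
        raise ValueError("non-response fraction must be in (0, 1]")
    return observed_diff / nonresponse_fraction


# Observed review 2 difference (self taught v control) with a hypothetical
# 25% non-response in the self taught group.
needed = shift_to_remove_effect(0.29, 0.25)
remaining = adjusted_difference(0.29, 0.25, needed)
```

The smaller the non-response fraction, the larger the shift needed to cancel the observed effect, which is why a modest loss to follow up leaves the conclusions fairly robust. Replacing the observed difference with the lower confidence limit gives the smaller shift at which significance, not the whole effect, would be lost.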
The validity of the data may have been affected in several ways. Reviewers may have underperformed or overperformed, knowing they were taking part in a trial. Some reviewers may not have persisted in detecting all the errors after identifying enough to condemn a paper. These influences are likely to have affected each of the three randomised groups of reviewers equally.
As has been shown in areas outside the health sector, very short training has only a marginal impact. We cannot, therefore, recommend use of the intervention we studied. Previous studies have shown that voluntary attendance at a training session and written feedback by editors have no effect on quality of reviews.4 5 In contrast, previous observational research has shown that extended training in epidemiology and statistics is associated with better reviewing.7 An intermediate approach to enhancing peer review (somewhere between a one day workshop and a one year postgraduate level training course) may be feasible for journals to provide.
Test papers, descriptions of deliberate errors, and review quality instrument are on bmj.com
This is the abridged version of an article that was posted on bmj.com on 2 March 2004: http://bmj.com/cgi/doi/10.1136/bmj.38023.700775.AE
We thank all the reviewers who participated; the editors who assisted with the face to face training (Trish Groves, Sandy Goldbeck-Wood, Kamran Abbasi, Richard Smith, Fiona Godlee) and the Critical Appraisal Skills Programme (CASP) team; all the editors who rated the quality of reviews, especially Carole Mongin-Bulewski, Trish Groves, Rhona Macdonald, and Alison Tonks; Lyda Osorio for assessing the number of errors reported; and the authors of the original manuscripts for allowing us to use them.
Contributors: See bmj.com
Funding: This study was funded by the NHS London Regional Office Research and Development Directorate. The views and opinions expressed in this paper do not necessarily reflect those of NHSE (LRO) or the Department of Health.
Competing interests: RS is editor of the BMJ. SS, NB, and SE review for the BMJ.
Because members of BMJ staff were involved in the conduct of this research and writing the paper, assessment and peer review have been carried out entirely by external advisers. No member of BMJ staff has been involved in making the decision on the paper.
Ethical approval: The ethics committee of the London School of Hygiene and Tropical Medicine approved the study.