BMJ 2004;328:673 (20 March), doi:10.1136/bmj.38023.700775.AE (published 2 March 2004)

Paper

Effects of training on quality of peer review: randomised controlled trial

Sara Schroter, senior researcher1, Nick Black, professor of health services research2, Stephen Evans, professor of pharmacoepidemiology2, James Carpenter, senior lecturer in medical statistics2, Fiona Godlee, head of BMJ knowledge1, Richard Smith, editor1

1 BMJ Editorial Office, BMA House, Tavistock Square, London WC1H 9JR, 2 London School of Hygiene and Tropical Medicine, London WC1E 7HT

Correspondence to: S Schroter sschroter{at}bmj.com

Abstract

Objective To determine the effects of training on the quality of peer review.

Design Single blind randomised controlled trial with two intervention groups receiving different types of training plus a control group.

Setting and participants Reviewers at a general medical journal.

Interventions Attendance at a training workshop or receipt of a self taught training package focusing on what editors want from reviewers and how to critically appraise randomised controlled trials.

Main outcome measures Quality of reviews of three manuscripts sent to reviewers at four to six monthly intervals, evaluated using the validated review quality instrument; number of deliberate major errors identified; time taken to review the manuscripts; proportion recommending rejection of the manuscripts.

Results Reviewers in the self taught group scored higher in review quality after training than did the control group (score 2.85 v 2.56; difference 0.29, 95% confidence interval 0.14 to 0.44; P = 0.001), but the difference was not of editorial significance and was not maintained in the long term. Both intervention groups identified significantly more major errors after training than did the control group (3.14 and 2.96 v 2.13; P < 0.001), and this remained significant after the reviewers' performance at baseline assessment was taken into account. The evidence for benefit of training was no longer apparent on further testing six months after the interventions. Training had no impact on the time taken to review the papers but was associated with an increased likelihood of recommending rejection (92% and 84% v 76%; P = 0.002).

Conclusions Short training packages have only a slight impact on the quality of peer review. The value of longer interventions needs to be assessed.

Introduction

Many studies have illustrated the limitations of peer review in improving the quality of research papers.1 Few studies have evaluated interventions that try to improve peer review,2 and no randomised controlled trials have examined the effects of training.3 Training that would be feasible for reviewers to undergo and for a journal to provide would have to be short or provided at a distance. Although the effectiveness of short educational interventions is questionable, some brief interventions have been shown to be successful.4 5

We aimed to determine whether reviewers for the BMJ who underwent training would produce reviews of better quality than those who received no training; whether face to face training would be more beneficial than a self taught package; and whether any training effect would last at least six months.

Methods

Participants
We randomised consenting reviewers into three groups: two intervention groups and a control group. We ensured that the groups were similar in terms of factors known to influence the quality of reviews.6 7

Assessments and procedures
We selected three previously published papers, each describing a randomised controlled trial of alternative generic ways of organising and managing clinical work. We removed the names of the original authors, changed the titles of the manuscripts and any reference to study location, and introduced 14 deliberate errors (see bmj.com). We asked all consenting reviewers to review the first paper. After this baseline assessment one intervention group received a full day of face to face training, and we mailed the other intervention group a self taught training package. Two to three months after the intervention we sent the second paper to reviewers who had completed the first review, and approximately six months later we sent the third paper to those who completed the second review. We sent the manuscripts to the reviewers in a style similar to the standard BMJ review process, but we told them that these papers were part of the study, and we did not pay them.

Outcome measures
Review quality—The review quality instrument is an eight item validated instrument (see bmj.com) developed specifically for assessing the quality of reviews.8 Two editors independently rated the quality of each review. We used the mean score of the items averaged over the two ratings.
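
The scoring rule described above (mean item score, averaged over two raters) can be sketched in a few lines. This is an illustration only, not the authors' code: the function name, the example ratings, and the 1 to 5 item scale are assumptions, and the sketch simply treats all eight items equally.

```python
from statistics import mean


def review_quality_score(rater_a, rater_b):
    """Mean item score for one review, averaged over two raters.

    Each argument is a list of item scores from one editor. The 1-5 scale
    used in the example below is an assumption, not stated in the text.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same number of items")
    # Average the items within each rater, then average the two raters.
    return mean([mean(rater_a), mean(rater_b)])


# Hypothetical ratings of one review on the eight items of the instrument.
editor_1 = [3, 2, 3, 4, 2, 3, 3, 2]
editor_2 = [3, 3, 2, 4, 2, 3, 2, 2]
score = review_quality_score(editor_1, editor_2)
print(f"review quality score: {score:.2f}")
```

Because both raters score the same number of items, averaging within raters first gives the same result as averaging all sixteen item scores together.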

Number of deliberate major errors—Two researchers blind to the identity and study group of the reviewer independently assessed the number of major errors reported in each review. We used the total number of major errors identified averaged across the two raters.

Time taken and recommendation on publication—Reviewers recorded the time taken to review each paper and recommended whether it should be published with no revision, published with minor or major revision, rejected, or other. Given the very poor quality of the papers, the most appropriate recommendation would have been rejection.

Interventions
Face to face training—The full day of training covered what BMJ editors require from reviewers and techniques of critical appraisal for randomised controlled trials. Participants were also given written instructions and a CD Rom.

Self taught training—We created a self taught training package based on the materials used in the training workshops, including the CD Rom. We asked reviewers to complete a questionnaire indicating the training exercises they had completed and to evaluate the training materials.

Statistical analysis
We examined differences between the groups in scores on the review quality instrument by using analysis of covariance. We did an overall analysis comparing all three groups, and we report significant results only if the overall analysis was significant. Assessment of the impact of non-response used standard methods that assume the data are missing at random.9 We also investigated how much lower than the (observed) mean for responders the (unobserved) mean for non-responders would have to be, in order to remove any intervention effect.10
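
For readers unfamiliar with analysis of covariance, the sketch below shows the core idea for two groups: the raw difference in follow-up means is corrected for any baseline imbalance using the pooled within-group slope of follow-up score on baseline score. It is a minimal illustration under stated assumptions, not the analysis used in the trial: the function name and the data are invented, only two groups are compared, and no standard errors or overall three-group test are computed.

```python
from statistics import mean


def ancova_adjusted_difference(x_treat, y_treat, x_ctrl, y_ctrl):
    """Baseline-adjusted difference in follow-up means (treatment minus control).

    x_* are baseline scores and y_* are follow-up scores. Assumes a common
    slope of follow-up on baseline in both groups.
    """
    def centred_sums(x, y):
        mx, my = mean(x), mean(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        return sxy, sxx

    sxy_t, sxx_t = centred_sums(x_treat, y_treat)
    sxy_c, sxx_c = centred_sums(x_ctrl, y_ctrl)
    # Pooled within-group regression slope of follow-up on baseline.
    slope = (sxy_t + sxy_c) / (sxx_t + sxx_c)
    raw_difference = mean(y_treat) - mean(y_ctrl)
    adjusted = raw_difference - slope * (mean(x_treat) - mean(x_ctrl))
    return adjusted, slope


# Invented, noise-free data: follow-up = 1.0 + 0.5 * baseline + 0.3 if trained.
baseline_ctrl = [2.0, 2.5, 3.0, 3.5]
baseline_treat = [2.2, 2.6, 3.0, 3.8]
follow_ctrl = [1.0 + 0.5 * x for x in baseline_ctrl]
follow_treat = [1.3 + 0.5 * x for x in baseline_treat]

adjusted, slope = ancova_adjusted_difference(
    baseline_treat, follow_treat, baseline_ctrl, follow_ctrl
)
print(f"adjusted difference: {adjusted:.2f}, slope: {slope:.2f}")
```

In this toy example the trained group starts with slightly higher baseline scores, so the raw difference overstates the training effect and the adjustment recovers the value built into the data.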

Results

Participants
Of 1256 reviewers assessed, 609 (48%) eligible reviewers agreed to take part. The quality of the baseline reviews of those who did not complete follow up reviews was poorer than that of reviewers who did (review quality instrument score 2.60 v 2.73; P = 0.16), they detected fewer major errors (2.11 v 2.67; P = 0.01), and they recommended rejection less often (58% v 70%; P = 0.037).

Evaluation of training interventions
One hundred and fifty eight reviewers attended training workshops, and 81% (114/141) anticipated that the quality of their reviews would improve. Most of the 120 recipients of the self taught package who completed review 2 reported having used the package (104 (87%) completed three of the five exercises, and 103 (86%) did all five), and 98 (82%) felt that the quality of their reviews would improve as a result.

Outcome measures
Agreement was good between pairs of raters assessing the quality of reviews and the number of deliberate major errors identified.

Review quality instrument scores—For review 1, the mean score for the whole sample was 2.71 (SD 0.73) and was similar across all three groups (table). For review 2, the difference between the self taught group and the control group was 0.29 (95% confidence interval 0.14 to 0.44; P = 0.0002), and that between the control group and the face to face group was 0.16 (0.02 to 0.3; P = 0.025). We found no significant difference between any of the groups for the third review (P = 0.204). The participants in the control group who did a third review showed a small but significant rise in their score (0.17, 0.09 to 0.26; P = 0.0001), which reduced the difference between them and the intervention groups.
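
As a rough consistency check on figures like these, a standard error and an approximate two sided P value can be recovered from a reported difference and its 95% confidence interval. The sketch below assumes a normal approximation with a symmetric interval (critical value 1.96); the trial itself used analysis of covariance, so this is an approximation and the function name is illustrative only.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate standard error, z statistic, and two sided P value
    from an estimate and its confidence interval (normal approximation)."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two sided tail area of the normal
    return se, z, p


# Differences and intervals as reported in the text for review 2.
se_self, z_self, p_self = p_from_ci(0.29, 0.14, 0.44)
se_face, z_face, p_face = p_from_ci(0.16, 0.02, 0.30)
print(f"self taught v control: SE={se_self:.3f}, z={z_self:.2f}, P={p_self:.5f}")
print(f"face to face v control: SE={se_face:.3f}, z={z_face:.2f}, P={p_face:.3f}")
```

Under this approximation both P values come out close to the figures given in the text for review 2.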


Table. Review quality, errors detected, time taken, and proportion recommending rejection (based on data from all participants). Values are means (SDs) unless stated otherwise

 

Errors identified—The number of errors detected in the baseline reviews was similar in the three groups (table). However, the difference between the control group and each of the intervention groups was significant for review 2 and remained significant in the analysis of covariance. The differences observed for review 3 were slightly smaller but in a similar direction and were not significantly different after adjustment for baseline and multiple testing.

Time taken to review and recommendation—Generally, the mean time taken to review papers did not differ significantly between the groups (table). All three groups spent less time doing the third review than the previous two reviews. The proportion of reviewers recommending rejection of paper 1 was similar across the groups. The proportion recommending rejection of paper 2 was significantly lower for the control group than for the self taught group (76% v 92%; P < 0.0001), and the same pattern occurred for paper 3 (74% v 91%; P = 0.001).

Impact of non-responders
As the difference between responders and non-responders is unknown, the impact of non-response on the conclusions cannot be definitively determined. With a "missing at random" assumption, non-response has no effect on the statistical significance of the results. Alternatively, with a more conservative approach for the analysis of covariance comparison of review quality instrument scores between the self taught and control groups, the mean for the non-responders would have to be reduced by 0.46 for the difference to lose statistical significance (see bmj.com).11
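
The logic of this kind of sensitivity analysis can be illustrated with a simple tipping point calculation. The sketch rests on assumptions that are not taken from the paper: only the non-responders in the intervention arm are shifted, the standard error is unchanged by the shift, and the non-response rate in the example is made up. The full method is described in the technical report cited above.

```python
def tipping_point_shift(lower_ci_bound, non_response_rate):
    """Shift in the non-responder mean that moves the lower confidence
    limit of the between-group difference down to zero.

    If a fraction q of an arm did not respond and their mean is lower than
    the responders' mean by delta, the arm mean falls by q * delta. The
    difference loses significance once q * delta equals the lower limit.
    Hypothetical simplification: the standard error is held fixed.
    """
    if not 0 < non_response_rate <= 1:
        raise ValueError("non_response_rate must be in (0, 1]")
    return lower_ci_bound / non_response_rate


# 0.14 is the lower confidence limit reported for the self taught v control
# comparison; the 25% non-response rate is invented for illustration.
shift = tipping_point_shift(lower_ci_bound=0.14, non_response_rate=0.25)
print(f"non-responder mean would need to be lower by about {shift:.2f}")
```

The smaller the proportion of non-responders, the larger the shift needed to overturn the result, which is why such an analysis is informative only alongside the observed response rates.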

Discussion

This study has confirmed the limitations of peer review as witnessed by reviewers' failure to detect major methodological errors in three straightforward accounts of randomised controlled trials. With the exception of recommendations to the editor, improvements were slight and did not reach the a priori definition of editorial significance (a difference of 0.4 in review quality instrument score). The self taught package seemed to be more effective than the face to face training, although for the review quality instrument this result is only of borderline significance if non-responders are on average editorially significantly worse than responders. One possible reason for the differential response rate for the second review is that the non-responders in the self taught group had not used their training package. The power of the study was sufficient to detect important differences had they existed.

The validity of the data may have been affected in several ways. Reviewers may have underperformed or overperformed, knowing they were taking part in a trial. Some reviewers may not have persisted in detecting all the errors after identifying enough to condemn a paper. These influences are likely to have affected each of the three randomised groups of reviewers equally.

As has been shown in areas outside the health sector, very short training has only a marginal impact. We cannot, therefore, recommend use of the intervention we studied. Previous studies have shown that voluntary attendance at a training session and written feedback by editors have no effect on quality of reviews.2 3 In contrast, previous observational research has shown that extended training in epidemiology and statistics is associated with better reviewing.7 An intermediate approach to enhancing peer review (somewhere between a one day workshop and a one year postgraduate level training course) may be feasible for journals to provide.


What is already known on this topic

Many studies have illustrated the inadequacies of peer review and its limitations in improving the quality of research articles

Although short educational interventions generally have limited effect, no major studies have been done in the field of peer reviewer training

What this study adds

Our short training package had only a slight impact on the quality of peer review in terms of quality of reviews and detection of deliberate major errors

The training did, however, influence reviewers' recommendations to editors



Editorials by Davidoff, Schroter, and Groves

Test papers, descriptions of deliberate errors, and review quality instrument are on bmj.com

This is the abridged version of an article that was posted on bmj.com on 2 March 2004: http://bmj.com/cgi/doi/10.1136/bmj.38023.700775.AE

We thank all the reviewers who participated; the editors who assisted with the face to face training (Trish Groves, Sandy Goldbeck-Wood, Kamran Abbasi, Richard Smith, Fiona Godlee) and the Critical Appraisal Skills Programme (CASP) team; all the editors who rated the quality of reviews, especially Carole Mongin-Bulewski, Trish Groves, Rhona Macdonald, and Alison Tonks; Lyda Osorio for assessing the number of errors reported; and the authors of the original manuscripts for allowing us to use them.

Contributors: See bmj.com

Funding: This study was funded by the NHS London Regional Office Research and Development Directorate. The views and opinions expressed in this paper do not necessarily reflect those of NHSE (LRO) or the Department of Health.

Competing interests: RS is editor of the BMJ. SS, NB, and SE review for the BMJ.

Because members of BMJ staff were involved in the conduct of this research and writing the paper, assessment and peer review have been carried out entirely by external advisers. No member of BMJ staff has been involved in making the decision on the paper.

Ethical approval: The ethics committee of the London School of Hygiene and Tropical Medicine approved the study.

References

  1. Rennie D. Editorial peer review: its development and rationale. In: Godlee F, Jefferson T, eds. Peer review in health sciences. London: BMJ Books, 1999.
  2. Callaham ML, Knopp RK, Gallagher EJ. Effect of written feedback by editors on quality of reviews. JAMA 2002;287:2781-3.
  3. Callaham ML, Wears RL, Waeckerle JF. Effect of attendance at a training session on peer reviewer quality and performance. Ann Emerg Med 1998;32:318-22.
  4. Thomson O'Brien MA, Freemantle N, Oxman AD, Wolf F, Davis DA, Herrin J. Continuing education meetings and workshops: effects on professional practice and health care outcomes. Cochrane Database Syst Rev 2003;(4):CD003030.
  5. Beaulieu M, Rivard M, Hudon E, Beaudoin C, Saucier D, Remondin M. Comparative trial of a short workshop designed to enhance appropriate use of screening tests by family physicians. Can Med Assoc J 2002;167:1241-6.
  6. Evans AT, McNutt RA, Fletcher SW, Fletcher RH. The characteristics of peer reviewers who produce good-quality reviews. J Gen Intern Med 1993;8:422-8.
  7. Black N, van Rooyen S, Godlee F, Smith R, Evans S. What makes a good reviewer and a good review in a general medical journal. JAMA 1998;280:231-3.
  8. Van Rooyen S, Black N, Godlee F. Development of the review quality instrument (RQI) for assessing peer reviews of manuscripts. J Clin Epidemiol 1999;52:625-9.
  9. Schafer JL. Multiple imputation: a primer. Stat Methods Med Res 1999;8:3-15.
  10. Simon R. Bayesian subset analysis: applications to studying treatment-by-gender interactions. Stat Med 2002;21:2909-16.
  11. White I, Carpenter J, Evans S, Schroter S. Eliciting and using expert opinions about non-response bias in randomised controlled trials. Technical report: email James.Carpenter{at}lshtm.ac.uk

(Accepted 30 November 2003)

