Misleading, unscientific, and unjust: the United Kingdom's research assessment exercise
BMJ 1998; 316 doi: https://doi.org/10.1136/bmj.316.7137.1079 (Published 04 April 1998) Cite this as: BMJ 1998;316:1079
- Gareth Williams, professor of medicine
- Accepted 25 February 1998
Keen observers of Britain will know of our obsession with regularly occurring phenomena that involve large sums of money, balls, and disappointment. Two obvious examples are test match cricket and the national lottery. Another, just as parochial but with lessons for the global scientific community, is the research assessment exercise (RAE) run every four years by the Higher Education Funding Councils. The aim of the exercise is to measure research activity in British academic institutions and thus determine how the councils' research budget will be distributed among the country's universities.
The aim of the research assessment exercise is to evaluate research success and determine central funding for academic units in the United Kingdom
The assessment criteria used are restrictive, flawed, and unscientific and produce a distorted picture of research activity that can threaten the survival of active and productive research units
The assessment exercise is also unaccountable, inefficient, time consuming, and expensive
The assessment exercise should be made objective, by basing it solely on each unit's total published output during the survey period; each publication would be scored for quality (using agreed criteria) and the unit's share of the work done
With a computerised spreadsheet, data could be collected easily and each unit's submission for assessment continually updated. The assessment scores could determine the national ranking of groups in each specialty, as well as the distribution of central funds to each unit
The importance of the exercise
In the 1996 research assessment exercise each “unit of assessment” in each university was graded from 1 (research of little consequence) to 5 (research of international renown) and 5* (outstanding).1 In the “hospital based clinical medicine” unit of assessment (which includes all mainstream medical and surgical specialties) a grade 5* was awarded to two institutions, grade 5 to four, grade 4 to eight, grade 3a and 3b to eight and two respectively, and grade 2 to one institution. Units that did well will enjoy guaranteed funding from the Higher Education Funding Councils for the next four years, while poorly rated units are trying to limit the substantial damage of lost income from the councils. With the limited money available, it is logical for research funding to reflect research success; the need for research assessment is therefore incontrovertible.
Inevitably, the research assessment exercise has acquired enormous importance. It will determine who will receive the double blessing of money and prestige and whose research careers will wither on the vine. Given the seriousness of the consequences, the exercise should be accurate, just, and accountable. Unfortunately, it is none of these (see box). It gathers misleading data and assesses these unscientifically and unaccountably, using an inefficient procedure that is expensive and extremely wasteful of scientists' time and energy. Moreover, its limited focus is stifling other aspects of scholarship, notably teaching.
Major flaws in the research assessment exercise
Restrictive survey criteria
Researchers not funded by Higher Education Funding Councils are excluded
Total research output is not considered
Only peer reviewed papers are considered
Dubious research performance indicators
Best four papers of each council funded researcher
Number of PhD studentships
Grant income (amount and source)
Unit's assessment of its own performance
Loopholes and abuse
Selective submissions to enhance research image
A researcher's publications at one unit are transferred with the researcher to a new post
Using staff funded by councils to submit publications from ineligible researchers
Inefficiency and expense
Time wasted by units in preparing submissions for assessment
Time wasted by assessment panels
Delay of several months before results are announced
Four year cycle between assessments is too long
Process is subjective and unaccountable
Subjective assessment by panel of units' submissions
Selection of panel members is covert
No audit or peer review
Restrictions favour established groups with existing support from funding council
Process is uninformative
No information about individual groups in each unit
No information about national ranking of groups in each specialty
Damage to other aspects of scholarship
Teaching and teachers neglected
Other ineligible academic activities neglected:
Writing reviews and books
Reviewing papers and grants
Academic units in Britain are already expending vast amounts of time and effort in preparing a strong submission for the next research assessment exercise, in the year 2001. This process is unlikely to advance knowledge and will involve much window dressing and the discarding of groups and individuals that do not meet the assessment exercise's criteria of success. The assessment exercise must be completely rebuilt if it is to have any credibility as an indicator of research performance or as a basis for funding.
Operation and flaws of current research assessment exercise
At present, each academic unit declares those members of staff funded by the Higher Education Funding Councils who will be in post on the date of the assessment exercise and whose research it considers worth while. These individuals each cite their four best papers published in peer reviewed journals and all external grants and PhD studentships obtained since the previous assessment. Individuals' contributions are submitted together with the unit's assessment of its own research, highlighting its star groups and themes. All submissions for each unit of assessment (for example, “hospital based clinical subjects”) are reviewed by a panel of British scientists selected by the funding councils, which decides a lumped score for each institution. Groups considered to be at least two ranks higher than the institution's overall score (such as a grade 5 group in a 3a unit) are “flagged.”
This assessment of research activity is distorted and obscures the true productivity of many groups, mainly because of its idiosyncratic measures of research success.
The assessment exercise recognises only those individuals funded by the Higher Education Funding Councils in post on the assessment date, and so excludes researchers supported by short term external grants. These people form the backbone of many research groups in Britain, yet cannot be considered in their own right no matter how productive they might be. However, members of the group who are funded by the councils can put their own names on these papers and claim them among their four publications, which introduces a bias in favour of units with relatively more council funded staff.
Inexplicably, a researcher's publications since the previous assessment are regarded as the property of the unit where he or she is based at the time of the present assessment. This “transfer” rule is both unjust and illogical as a unit has no intellectual or moral claim on any work which was devised, funded, executed, and written up elsewhere. The assessment exercise in 1996 was preceded by a flurry of job advertisements, apparently to attract productive scientists and effectively buy their publications. The absurdity of this practice has been exposed by Bird's analogy with the football league, which would collapse if players transferring to a new team took their previous goals with them.2
Like many others, my own group has fallen foul of these anomalies. For the 1996 assessment, we could cite only four of the 30 peer reviewed papers published between 1992 and 1996 because I was the only worker eligible for assessment out of the 13 members of the group who were first named authors. One of these other members was a postdoctoral scientist who had moved to another unit, which, even though it does no animal based research, promptly claimed his papers on neuropeptides in the rat hypothalamus.
The existing rules are readily manipulated, notably by not declaring individuals funded by the Higher Education Funding Councils who will not enhance a unit's overall research profile. In the 1996 assessment exercise one university contrived to gain a grade 5 in hospital based clinical subjects by reincarnating three individuals as 1.5 full time equivalent researchers. The next research assessment exercise may penalise institutions that conceal too many underperforming staff who are council funded, but this will do nothing to tackle the system's fundamental and obvious flaws.
Dubious performance indicators
None of the assessment exercise's main indices of research success (eligible workers' four best papers, numbers of PhD students, external grant income, and the unit's self assessment) stands up to scrutiny.
Publications are generally the only sign of research activity visible internationally and its only enduring legacy. What else will be remembered in 10 years' time? A group's publication record must therefore be the core index of its research performance, but assessing only four papers of each council funded member is misleading and prejudicial. Surely the only valid measure of what a group has achieved is its total published output during the assessment period, regardless of how its members were funded or where they now work?
Grant income impresses assessment panels, especially funding obtained from the Medical Research Council, the Wellcome Trust, and major medical charities. Research obviously costs money, but grant income is no guarantee that the project will be successfully conducted or reach any worthwhile conclusions. There is no evidence that a project's measurable outcome (such as publications or the improved management of a disease) is related to the size of the grant that supported it. The source of grant income is an even weaker performance criterion: can it really be argued that a published paper would be more or less valuable if someone else had paid for the work?
Self assessment would be superfluous if the entire exercise were objective and thorough: the world class stature of a group should be self evident without others in the unit having to draw this to the panel's attention.
The assessment exercise is cumbersome, inefficient, and expensive
The preparation and assessment of submissions must waste thousands of hours of scientists' time. The true costs of this lost productivity and its contribution to the continuing decline of British science are incalculable, but there is no doubt that the time could be better spent.
The machinery of the assessment exercise also grinds unacceptably slowly: publication of the results of the 1996 assessment took about 20% of the time before the next assessment, and the four year cycle of the assessment exercise assumes that British science stands still during this time. Rationally, research performance should be assessed more frequently, but the slowness and expense of the current exercise clearly make this impossible.
The assessment exercise is unaccountable
The power and influence of the assessment exercise demand transparency and accountability. Unfortunately, the transformation of an institution's submission into its assessment grade is cloaked in a mystical opacity of which politicians could be proud. We simply do not know how different panels value the various performance criteria (grant income, publications) or how they assess subjective questions such as the stature of individuals (who may be working in specialties unfamiliar to panel members). At present, it is impossible to counter the reflex criticism of those disadvantaged by the assessment exercise, namely that the panel's judgment was flawed or biased, or both.
The assessment exercise is biased
The restrictions regarding who and what can be included in submissions favour established groups with staff funded by the Higher Education Funding Councils. However, such staff include those with long standing appointments, who do not necessarily fulfil present criteria of research excellence. Unless this bias is removed, there is a real risk that council funding will be progressively concentrated on a dwindling number of privileged units. New groups, especially those in units with low assessment grades, may never be able to clamber aboard this self perpetuating merry-go-round.
The assessment exercise is uninformative
The composite score for a unit of assessment may be distilled from the submissions of over 100 individuals and can conceal a wide scatter of quality. For example, a cardiology group in a grade 5 unit may not merit that grade itself; conversely, a good group can be embedded in a mediocre department without being flagged.
The assessment exercise damages scholarship
The growing obsession with the need to succeed in the research assessment exercise is damaging other crucial academic activities. There is an increasing tendency to abandon forms of scientific writing that are not included in submissions (reviews, editorials, and books) as well as the refereeing of manuscripts and grant proposals, teaching, and administration.
Individuals who devote much time to teaching may find themselves jettisoned from departments that are obsessed with the research assessment exercise and that hack back “dead wood” in favour of those active in research. This would be catastrophic for the next generation of medical students and for the new, problem based, undergraduate curricula. So far, there have been few attempts to integrate the conflicting yet complementary demands of the research assessment exercise and its teaching counterpart, the teaching quality assessment (TQA), which will take place later this year.
Can the assessment exercise be made to work?
The following suggestions are intended to stimulate debate rather than to provide a quick fix for the research assessment exercise's many ills. To fulfil its essential roles, the exercise must be
Based on accurate, complete, and valid measures of research success
Fair, transparent, and fully accountable
Quick, efficient, and cheap to operate
Evaluation of research success
Publications are the only universally accepted currency of research success; other measures, including grant income, have no proved value and should be scrapped. A group's submission should comprise its total published output since the previous assessment, irrespective of who funded its members or where they now work. This would correct the unjust exclusion criteria and the daft transfer rule, and would also avoid the need for more mathematical fudge factors to discourage abuse of the system.
A group's output should include all its substantive publications—including reviews, editorials, books, and chapters (perhaps with different mathematical weightings)—as well as peer reviewed papers. The output score must accurately reflect the quantity and quality of the group's work. The group's share of each publication could be quantified by an “attribution factor,” which would be 1.0 for work performed entirely within the group and appropriately smaller for collaborative efforts.
Quality is harder to assess objectively, especially soon after publication, and it is probably reasonable to gauge a paper's stature from that of the journal in which it is published. However, the commonly used impact factor is mathematically dubious (see box).3 Instead, general and specialist journals relevant to each subject could be assigned to categories (perhaps 1 to 5), decided by the assessment panel for each specialty in consultation with researchers in the subject. The categories could be published before each research assessment exercise to democratise the scoring process. This would also redress the imbalance between specialties whose journals have particularly high impact factors (such as cardiology) and those with low impact factors (such as public health).
Why journals' impact factors should not be used to measure quality of publications*
Impact factors can conceal large differences in citation rates for different articles in the same journal
Impact factors are determined by an arbitrary mathematical exercise that is unrelated to the scientific quality of individual papers
Impact factors of specialist journals depend on the specialty itself and on the proportions of basic research v clinical research in the subject
*Modified from Seglen3
A group's assessment score would simply be the sum of its publication scores (that is, the product of journal category and attribution factor for each publication). With computerised spreadsheets, a unit's submission for assessment could be continuously updated and the final submission easily checked and transmitted by email within a few hours.
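The scoring rule just described (journal category multiplied by attribution factor, summed over a group's publications) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Publication`, `publication_score`, and `group_score`, the example titles, and the example categories and attribution factors are all hypothetical, and the optional extra weightings for reviews, editorials, and books are left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Publication:
    """One item of a group's output (hypothetical record layout)."""
    title: str
    journal_category: int      # 1 to 5, as assigned by the specialty panel
    attribution_factor: float  # 1.0 if the work was done entirely within the group


def publication_score(pub: Publication) -> float:
    """Score one publication as journal category x attribution factor."""
    if not 1 <= pub.journal_category <= 5:
        raise ValueError("journal category must be between 1 and 5")
    if not 0.0 < pub.attribution_factor <= 1.0:
        raise ValueError("attribution factor must be in the range (0, 1]")
    return pub.journal_category * pub.attribution_factor


def group_score(publications: Iterable[Publication]) -> float:
    """A group's assessment score: the sum of its publication scores."""
    return sum(publication_score(p) for p in publications)


# Invented example data, for illustration only.
example_output = [
    Publication("Paper A", journal_category=5, attribution_factor=1.0),
    Publication("Paper B", journal_category=3, attribution_factor=0.5),
    Publication("Review C", journal_category=2, attribution_factor=1.0),
]
example_total = group_score(example_output)  # 5.0 + 1.5 + 2.0 = 8.5
```

Because each publication is scored independently, a unit could add a record as soon as a paper appears and recompute its total at any time, which is what would allow the submission to be continuously updated.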
New roles for the assessment panels
Assessment panels would need only to decide the categorisation of journals, audit randomly chosen submissions, and respond to any challenges about the outcome. Panel members could be selected at random from a pool of active researchers nominated by all relevant institutions in the United Kingdom, while panel chairs should be scientists of international standing from outside the country.
Information provided by assessment exercise
The proposed assessment exercise would readily yield two valuable measures of research success. Firstly, all groups in a given specialty (such as cardiology) throughout the country could be ranked into a national pecking order, information that is currently lacking. Secondly, each institution's total output within each unit of assessment would be a reasonable template for the distribution of funding by the Higher Education Funding Councils.
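Both measures could be derived mechanically from the assessment scores, as the following Python sketch suggests. It rests on assumptions that the proposal itself does not specify: the function names `rank_groups` and `allocate_budget` are hypothetical, the scores and budget are invented, and a strictly proportional split is only one possible reading of "a reasonable template" for distributing funds.

```python
from typing import Dict, List, Tuple


def rank_groups(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (group, score) pairs ordered from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def allocate_budget(scores: Dict[str, float], budget: float) -> Dict[str, float]:
    """Split a budget in proportion to each institution's total output score.

    Proportional allocation is an assumption made for this sketch.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("total score must be positive")
    return {name: budget * score / total for name, score in scores.items()}


# Invented group scores within one specialty, for illustration only.
specialty_scores = {"Group A": 12.0, "Group B": 30.5, "Group C": 7.0}
national_ranking = rank_groups(specialty_scores)

# Invented institutional totals within one unit of assessment.
institution_totals = {
    "University X": 60.0,
    "University Y": 30.0,
    "University Z": 10.0,
}
shares = allocate_budget(institution_totals, budget=1_000_000.0)
```

With these invented figures, Group B heads the specialty ranking and University X, which holds 60% of the total score, would receive 60% of the budget.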
The research assessment exercise is a dysfunctional juggernaut, lumbering on under its own momentum and threatening to crush research creativity, careers, and scientific integrity. Its credibility survives only because it is administered by members of the scientific establishment. It must now be replaced with a structure that is scientifically and morally beyond reproach.
Academic medicine should have nothing to fear from the radical changes that will be needed; the truly excellent will welcome the opportunity to prove that they can remain at the top whatever criteria are used.
In the research assessment exercise of 1996, the hospital based clinical subjects in Liverpool were graded 3b overall, with a flag for my group. The preclinical sciences (anatomy, physiology, and pharmacology) in Liverpool had the highest density of grades 5 and 5* in the United Kingdom.4
The views expressed here are my own. I am grateful to the many colleagues who read and commented on this article.
Conflict of interest: None.