- Katherine Woolf, lecturer in medical education1,
- Henry W W Potts, senior lecturer in health informatics2,
- I C McManus, professor of psychology and medical education1
- 1Academic Centre for Medical Education, UCL Division of Medical Education, London N19 5LW, UK
- 2Centre for Health Informatics and Multiprofessional Education, UCL Division of Population Health, London N19 5LW
- Correspondence to: K Woolf
- Accepted 29 December 2010
Objective To determine whether the ethnicity of UK trained doctors and medical students is related to their academic performance.
Design Systematic review and meta-analysis.
Data sources Online databases PubMed, Scopus, and ERIC; Google and Google Scholar; personal knowledge; backwards and forwards citations; specific searches of medical education journals and medical education conference abstracts.
Study selection The included quantitative reports measured the performance of medical students or UK trained doctors from different ethnic groups in undergraduate or postgraduate assessments. Exclusions were non-UK assessments, only non-UK trained candidates, only self reported assessment data, only dropouts or another non-academic variable, obvious sampling bias, or insufficient details of ethnicity or outcomes.
Results 23 reports comparing the academic performance of medical students and doctors from different ethnic groups were included. Meta-analyses of effects from 22 reports (n=23 742) indicated candidates of “non-white” ethnicity underperformed compared with white candidates (Cohen’s d=−0.42, 95% confidence interval −0.50 to −0.34; P<0.001). Effects in the same direction and of similar magnitude were found in meta-analyses of undergraduate assessments only, postgraduate assessments only, machine marked written assessments only, practical clinical assessments only, assessments with pass/fail outcomes only, assessments with continuous outcomes only, and in a meta-analysis of white v Asian candidates only. Heterogeneity was present in all meta-analyses.
Conclusion Ethnic differences in academic performance are widespread across different medical schools, different types of exam, and in undergraduates and postgraduates. They have persisted for many years and cannot be dismissed as atypical or local problems. We need to recognise this as an issue that probably affects all of UK medical and higher education. More detailed information to track the problem as well as further research into its causes is required. Such actions are necessary to ensure a fair and just method of training and of assessing current and future doctors.
In 1995, a BMJ news article reported that all the students who failed clinical finals at the University of Manchester the previous year had been men with Asian names.1 A systematic review of the predictors of medical school success published seven years later found that white ethnicity predicted good performance, but only one of the 14 included reports came from the United Kingdom.2
Two large studies of degree outcomes in UK higher education have since shown that, across all subjects, white students were more likely than students who categorised themselves as Asian, black, mixed, or Chinese/other to achieve first or upper second class degrees. Differences in attainment between Asian and black students mostly disappeared when socioeconomic status was taken into account. Even after adjustment for up to seven confounding variables, however, the white students still achieved higher degree classes than students from all the minority ethnic groups.3 4 5
Medicine was largely excluded from these studies because it is an unclassified degree (that is, students either pass or fail; they do not receive first, second, or third class degrees). There is therefore less certainty about ethnic differences in the attainment of UK medical students, who are particularly highly selected for academic excellence and often come from privileged socioeconomic backgrounds.6 7 8 In postgraduate terms, ethnic differences in the academic attainment of doctors have been explored mostly only in terms of country of primary medical qualification.9 10
A third of all UK medical students are from minority ethnic groups, 1.6 times the proportion on other undergraduate courses,11 with by far the largest minority ethnic group being the Indian group (11%), followed by the Pakistani group (5%) (table 1⇓). In 2009, 36% of newly qualified doctors and 52% of all other hospital doctors working in the NHS were from minority ethnic groups.12 13 The UK’s Race Relations Amendment Act 2000 places a duty on all public authorities, including universities and the National Health Service, to monitor admission and progress of students and the recruitment and career progression of staff by ethnic group to be able to address inequalities or disadvantage.8 14 15
We undertook a systematic review and meta-analysis of studies comparing the academic performance of UK trained doctors and medical students from white and minority ethnic, or non-white, groups.
This section gives details of the protocol for the review.
The concept of “ethnicity” is complex, politically charged, and context specific.16 As such, we explicitly state how we interpreted ethnicity and our subsequent choice of ethnicity variable. In defining ethnicity, we followed Senior and Bhopal, who wrote: “[ethnicity] implies one or more of the following: shared origins or social background; shared culture and traditions that are distinctive, maintained between generations, and lead to a sense of identity and group; and a common language or religious tradition.”17
The white/non-white comparison
As with all reviews, we were restricted in our analyses by the data collected and reported in the original studies. In particular, we were restricted in our comparisons between different ethnic groups because most of the literature compared white candidates with all other—that is, non-white—candidates. While putting all the minority ethnic groups into one category for comparison with a white group was obviously not ideal, we had two reasons for this approach.
Firstly, the white/non-white comparison was a pragmatic approach to the lack of data on ethnicity in most studies, often because the numbers of candidates from certain minority ethnic groups were too small to allow sensible statistical analysis. We therefore compared the ethnic group that was typically the largest—the white group—with all other groups combined—the non-white group. When information on the largest group after the white group—the Asian group—was available, we also performed that comparison. To an extent, all ethnic categories are essentially pragmatic because they can never take into account all the subtle variations between groups of people (for example, while the English census categories distinguish between people with their recent origins in India, Pakistan, and Bangladesh, they do not distinguish between those speaking Punjabi, Saraiki, Sindhi, Pashto, Urdu, Balochi, Kashmiri, etc).18
Secondly, the white/non-white comparison is scientifically justified by the evidence from UK higher education, which shows that the largest and least explained gap in attainment is between the white and non-white groups. This suggests that the white/non-white distinction is important in examination of the possible causes for this gap. Underlining this, the white/non-white distinction also seems important in other areas: for example, a recent report from the UK Government’s Department for Work and Pensions showed that, on average, children from all minority groups had higher levels of poverty than children in the white majority group.19
Types of reports, participants, and outcome measures
We included all published and unpublished quantitative reports on the academic performance of UK trained medical students or doctors that included a measure of candidate ethnicity.
All studies in our review included medical students undertaking formative or summative assessments at UK medical schools and UK trained doctors undertaking formative or summative UK postgraduate medical assessments.
The outcome measure can be encapsulated as “academic performance.” This includes pass/fail, attainment of other academic related specific goals (such as achieving a placement or not), and mean assessment scores.
Main comparison groups
The comparison was between white and non-white candidates. Where the data were available, we also compared white and Asian (Indian, Pakistani, and Bangladeshi) candidates.
Other important factors
We chose, a priori, to conduct separate meta-analyses for postgraduate and undergraduate assessments; machine marked written and practical clinical assessments; and pass/fail outcomes.
We excluded reports with data only from outside the UK; self reported examination performance; lack of information about ethnic group (country of primary medical qualification for doctors and fee status for undergraduates were insufficient); obvious sampling bias; lack of sufficient detail from which we could calculate an effect size and standard error; and outcome measures unrelated to academic attainment or that could be influenced solely by a non-academic factor (such as dropout).
Search strategy for identification of reports
KW had recently completed a PhD on ethnic differences in medical school performance,20 and we used her personal knowledge of reports as a starting point.21 We also included data from other projects the authors were working on or were asked to advise on. We searched the online databases PubMed, Scopus, and ERIC using the following search terms:
PubMed: (ethnic* OR race OR minority OR Asian) AND (“Education, Medical”[Mesh] OR “Educational Measurement”[Mesh]) AND “Great Britain”[Mesh]
Scopus: ((undergraduate OR postgraduate) AND (“medical education”) AND ((perform* OR assess* OR exam* OR score OR grade OR fail)) AND (AFFILCOUNTRY((“great britain” OR “united kingdom” OR “northern ireland” OR england OR scotland OR wales)) AND ((ethnic* OR race OR minority OR asian))
ERIC: ((Keywords: ethnic) or (Keywords: minority) and (Thesaurus Descriptors: “Physicians” OR Thesaurus Descriptors: “Medical Education” OR Thesaurus Descriptors: “Medical Students” OR Thesaurus Descriptors: “Medical Schools”) not (Keywords: american) not (Keywords: states) not (Keywords: gpa) not (Keywords: MCAT)
We conducted specific searches using the search terms (ethnic* OR race OR Asian OR minority) of the e-journal versions of Medical Education (1966-2010), Medical Teacher (1979-2010), Advances in Health Sciences Education (all volumes), and BMC Medical Education (all volumes), as well as available published abstracts of conference proceedings of the Annual Scientific Meeting of ASME (Association for the Study of Medical Education) and the AMEE (Association for Medical Education in Europe) annual conferences. We also used Google Scholar and Google to search the grey literature for government reports and similar documents. Finally, we used backwards and forwards citation searching.
Methods of the review
KW assessed reports for eligibility against previously agreed criteria and for methodological quality without consideration of their results. HWWP assessed those chosen reports. Reports were not assessed blind; we knew the authors’ names, affiliations, and the source of publication. We discussed any differences until these were resolved. KW and ICM extracted data from full text versions of all included papers. All authors double extracted data from a sample of randomly chosen sources and reconciled any differences. When reports had insufficient data for analysis, we contacted authors to ask for more complete data.
Synthesis of results and statistical analysis
To combine reports, we calculated an effect size (Cohen’s d) and standard error for each.22 For categorical outcome variables, we first calculated an odds ratio and its associated confidence interval (www.hutchon.net/ConfidOR.htm) and then followed Chinn’s method23 to convert these into Cohen’s d and standard errors.
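As an illustration of the conversion step, the following minimal sketch applies Chinn’s relation (d = ln(odds ratio) × √3/π, with the standard error of the log odds ratio scaled by the same factor). The function name is hypothetical, and the sketch assumes a 95% confidence interval that is symmetric on the log scale; it is not the authors’ own calculation. The inputs are the pooled pass/fail odds ratio and interval reported in the results, used here only as a plausibility check.

```python
import math

# Chinn's conversion factor between the log odds ratio and Cohen's d.
SQRT3_OVER_PI = math.sqrt(3) / math.pi  # about 0.5513


def odds_ratio_to_d(odds_ratio, ci_lower, ci_upper, z=1.96):
    """Convert an odds ratio and its confidence interval to (d, SE of d).

    Assumes the interval is symmetric on the log scale, so the standard
    error of ln(OR) can be recovered from its width (z=1.96 for 95%).
    """
    log_or = math.log(odds_ratio)
    se_log_or = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_or * SQRT3_OVER_PI, se_log_or * SQRT3_OVER_PI


# Odds ratio for failure of 2.92 (1.88 to 4.55), as reported in the results.
d, se_d = odds_ratio_to_d(2.92, 1.88, 4.55)
print(f"d = {d:.2f}, SE = {se_d:.3f}")
```

The sign of d depends on how the odds ratio is framed: an odds ratio above 1 for failure corresponds to a negative effect on performance of the same magnitude.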
When reports contained data from assessments taken at different points of the course by the same participants, we prioritised finals (year 5 assessments) over other undergraduate data as those examinations determine whether a medical student can become a doctor. Otherwise, we chose those taken by a larger number of candidates (larger sample size). When reports contained continuous measures of performance (such as exam score) and pass/fail data for the same examinations taken by the same participants, we prioritised continuous data because they are more sensitive; published categorical data, however, took precedence over unpublished continuous data.
When participants took more than one assessment in the same year, we calculated a mean score and standard deviation for all assessments, from which we calculated an effect size. When reports contained multivariate analyses, we prioritised simple effects; however if no simple effects were reported, we included outcome measures adjusted for other variables. In addition, we did a narrative summary of the effects of ethnicity adjusted for other variables.
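For continuous outcomes, an effect size can be derived from group means and standard deviations. The sketch below uses the pooled standard deviation form of Cohen’s d and a common large sample approximation for its standard error; the paper does not state which standard error formula was used, so that choice is an assumption, as are the function name and the example figures.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (d, SE) for group A relative to group B.

    d uses the pooled standard deviation; the standard error is the usual
    large sample approximation (an assumption, not taken from the paper).
    """
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    d = (mean_a - mean_b) / math.sqrt(pooled_var)
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b)))
    return d, se


# Hypothetical cohort: 100 non-white and 200 white candidates.
d, se = cohens_d(57.9, 5.0, 100, 60.0, 5.0, 200)
print(f"d = {d:.2f}, SE = {se:.3f}")
```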
When outcomes for separate minority ethnic groups were given, we back calculated the numbers who passed or the score for each group and combined groups as necessary to create a non-white category and to make the results more comparable with other reports. We also conducted a separate meta-analysis for white and Asian candidates. We used the definition of Asian given in the reports. When we had raw data, we defined Asian as Indian, Pakistani, or Bangladeshi (census categories).
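Combining separately reported minority ethnic groups into a single category can be sketched as follows. For scores, the combined variance needs both the within group and the between group sums of squares; for pass rates, the numbers passing are back calculated from each group’s size and rate. The function names and example figures are hypothetical, and this is one reasonable way to do the pooling rather than a record of the authors’ exact procedure.

```python
import math


def combine_scores(groups):
    """Merge (n, mean, sd) tuples into one (n, mean, sd) for the combined group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    within = sum((n - 1) * sd**2 for n, _, sd in groups)
    between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    combined_sd = math.sqrt((within + between) / (total_n - 1))
    return total_n, grand_mean, combined_sd


def combine_pass_rates(groups):
    """Merge (n, pass_rate) tuples into (total n, total passed)."""
    total_n = sum(n for n, _ in groups)
    # Back calculate whole numbers of candidates who passed in each group.
    total_passed = sum(round(n * rate) for n, rate in groups)
    return total_n, total_passed


# Hypothetical subgroup summaries, for illustration only.
print(combine_scores([(40, 58.0, 5.0), (60, 57.0, 6.0)]))
print(combine_pass_rates([(40, 0.90), (60, 0.85)]))
```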
Whenever possible, we analysed the performance of undergraduates with “home” (UK) or EU status only. In postgraduate examinations, we analysed the performance of UK graduates only.
Using MIX software (www.mix-for-meta-analysis.info/), we performed eight meta-analyses, one on all reports and seven on subsets of the data. We used random effects models and drew funnel plots for each to assess publication bias.
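To make the random effects approach concrete, the sketch below pools effect sizes with the DerSimonian-Laird estimator and reports the I2 statistic. The paper does not state which estimator MIX was configured to use, so the choice of DerSimonian-Laird is an assumption; the function name and the example data are hypothetical.

```python
import math


def random_effects(effects, std_errors, z=1.96):
    """Pool effect sizes with a DerSimonian-Laird random effects model."""
    weights = [1 / se**2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q measures the spread of effects around the fixed estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w**2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add the between study variance to each study.
    re_weights = [1 / (se**2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se_pooled = math.sqrt(1 / sum(re_weights))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "ci": (pooled - z * se_pooled, pooled + z * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical effect sizes (Cohen's d) and standard errors from four datasets.
result = random_effects([-0.30, -0.50, -0.45, -0.20], [0.10, 0.08, 0.12, 0.15])
print(result)
```

A funnel plot for the same data would plot each effect size against its standard error; asymmetry around the pooled estimate is the usual visual cue for publication bias.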
Figure 1⇓ shows the number of reports, their identified sources, and reasons for exclusions.
Before the start of searching, we knew of 26 reports to include. Of these, 18 were published in peer reviewed journals,1 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 two were KW’s PhD20 and ICM’s unpublished data that supplemented the data published in the study by McManus and Richards24 and McManus and colleagues,28 and one was published in the grey literature.9 The five remaining were retrieved from collaborations the authors were involved in around the time the review took place. These were Carroll and Mackenzie’s conference abstract,42 supplementary unpublished data (M Carroll, personal communication, 2010), data from a pilot assessment for selection into specialty training in England run by the Academy of Medical Royal Colleges (AoMRC, unpublished, 2010), a conference poster,43 and an interim report of 2008 nMRCGP (new Membership of the Royal College of General Practitioners) exam results.44
We retrieved 571 reports, including Brown,45 Ricketts,46 and Calvert,47 via online database searching. We found one report when we searched conference abstracts48 and one when we searched the journal Medical Teacher.49 We removed 58 duplicates, reviewed the abstracts and titles of the 541 remaining, and excluded 480. The full text versions of 61 reports were reviewed and another 38 excluded (20 did not provide data on UK trained participants; eight lacked sufficient details about outcomes; five lacked sufficient detail about participants’ ethnic group; four did not have an appropriate assessment outcome; and one did not distinguish between UK and non-UK trained doctors).
We eventually included 23 reports in the meta-analysis. Many reports contained more than one set of data, for either the same or different candidates. Details of the studies, including factors on which quality was assessed, are given in table 2⇓ (undergraduate prospective studies), table 3⇓ (undergraduate retrospective studies), and table 4⇓ (postgraduate). McManus (I C McManus, personal communication) provided the data referred to in the study by McManus and Richards.24 This study was excluded because it referred only in the text to an analysis by ethnicity but did not provide data on that analysis. McManus also provided supplementary data for another published study (McManus et al28).
Sixteen reports measured academic performance in undergraduates.
Design and sampling
One study was a cluster randomised controlled trial with ethnicity as one of the independent variables.39 All others were prospective or retrospective cohort studies. Most studies combined more than one cohort of students from the same medical school. The largest study had over 2000 participants30 and the smallest had 164.38 All studies in which the outcome measure was a continuous measure of exam score had to exclude participants without exam data. This was typically about 5% of students. Five studies gave reasons for candidates lacking exam data.30 32 36 38 40 Ricketts et al included only students who progressed normally throughout the course, excluding those who re-sat exams or dropped out.46
Overseas students are likely to be educationally different from UK students.28 Most reports differentiated between home students and overseas students, and two also adjusted for fee status in multivariate analyses.38 40 Lumb and Vail32 and Haq et al33 looked only at UK students’ performance. McManus et al performed separate tests for UK only students.28 Wass et al reported that only two students in their cohort were not educated in the UK.31 In the study by Yates and James, ethnicity was unknown for 96% of overseas students, so this group was largely excluded.36 Kilminster et al reported 7% of overseas students in their sample and said there were no differences in results between home and overseas students.41 Six studies20 30 37 39 42 46 and two sets of unpublished data (M Carroll, personal communication, 2010; McManus unpublished) did not distinguish between UK and overseas students in their analysis. The proportions of overseas students in these samples are probably small because most UK medical schools are allowed to take only about 7.5% overseas students (www.medschools.ac.uk/Students/Pages/FAQs.aspx#section8).
All of the studies used formal summative assessments as outcome measures. Most gave explanations of their examinations: the format, how they were marked, and the subject matter covered. Two studies gave details of how the validity of the assessments had been established.31 46 Five reported the psychometric reliability of their assessments.20 31 33 37 39 Ten published reports20 31 32 33 36 37 38 41 42 46 and Carroll (personal communication, 2010) included continuous measures of assessment, and five included examination failure (M Carroll, personal communication, 2010; McManus, unpublished).30 36 42 Eleven included written outcomes of machine marked assessments (M Carroll, personal communication, 2010; McManus, unpublished).20 33 36 37 38 39 41 42 46 Eleven included outcomes of practical assessments (M Carroll, personal communication, 2010; McManus, unpublished).20 31 32 33 36 37 38 39 41 In the study by Yates and James, the outcome for which an ethnic difference was reported (theme C) was assessed with a combination of written, online, oral presentation, and coursework.40
Most studies conducted multivariate statistical tests (that is, with more than one predictor) to adjust the outcomes for factors other than ethnicity. Three published studies31 33 42 and two unpublished (M Carroll, personal communication, 2010; McManus, unpublished) conducted only univariate analyses (looking solely at the effect of ethnicity on outcomes), although Haq et al restricted their sample to English speaking Asian UK students.33 Most of the studies considered P<0.05 to be significant, though three used P<0.01.31 40 41 Yates et al used P<0.001 as significant for univariate tests but did not state the significance level for multivariate tests.38
Seven reports measured postgraduate performance (table 4⇑).
Design and sampling
All postgraduate reports were retrospective cohort studies except for one, which was both prospective and retrospective.34 The two largest studies had over 2000 candidates27 35; the smallest had 53.49 The studies were all cross sectional and therefore it was not appropriate to report the candidates lost to follow-up. Candidates were often excluded if ethnicity data were missing (though Dewhurst et al analysed the 10% missing ethnicity data separately35). Two studies by Brown et al had some of the largest proportions of missing data, with 27% and 25% of their candidates being excluded because of missing data.45 49 However, this includes non-UK candidates whose data were not meta-analysed in this study.
Binary outcome measures (pass rates or selection success) were included in all reports except for the report by Wakeford et al27 and the unpublished 2010 report from the Academy of Medical Royal Colleges, both of which used mean assessment scores. The two studies by Brown et al also included continuous measures of candidates’ shortlisting, interview, or assessment scores.45 49 Three reports included data on written assessment performance,27 34 44 and four included data on practical assessment performance.27 34 35 44 All were formal summative assessments, except the unpublished 2010 report from the Academy of Medical Royal Colleges, which was a pilot test. Two studies restricted analyses to first attempts only.27 35
Meta-analysis 1: all reports
We included data on 23 742 candidates from 22 reports (36 datasets) in the meta-analysis of all reports. We excluded the second paper by Brown et al49 as it contained data on the same candidates as in their other paper.45 Overall, 17 172 candidates were white and 6570 non-white. The negative effect of non-white ethnicity on performance was significant (P<0.001) and of medium magnitude (d=−0.42; 95% confidence interval −0.50 to −0.34) (fig 2⇓). A funnel plot showed no obvious publication bias (fig 3⇓). There was heterogeneity in the sample (I2=72%). Of the 36 datasets, 35 showed a negative effect of non-white ethnicity and 25 of those showed a significantly negative effect. One showed no effect. None showed a positive effect of non-white ethnicity.
Meta-analysis 2: undergraduate assessments
We included data on 13 193 undergraduates from 16 reports (27 datasets). The negative effect of non-white ethnicity on performance remained significant (P<0.001), with the same effect size (d=−0.42, −0.49 to −0.35). There was less heterogeneity in this subset than in the meta-analysis of all results (I2=50%) (fig 4⇓). The funnel plot showed no sign of publication bias.
Meta-analysis 3: postgraduate assessments
We included data on 10 549 postgraduate candidates from six reports (nine datasets). We excluded the 2001 paper by Brown et al49 as it contained data on the same candidates as in their 2003 paper.45 The negative effect of non-white ethnicity on performance remained significant (P<0.001), with a similar effect size (d=−0.38, −0.60 to −0.17). Heterogeneity was present (I2=89%) (fig 5⇓). The funnel plot showed no sign of publication bias.
Meta-analyses 4-7
In the meta-analysis of machine marked assessments, we included data on 20 415 candidates from 14 reports (26 datasets). The negative effect of non-white ethnicity on performance remained significant (P<0.001; d=−0.35, −0.44 to −0.26; I2=81%).
In the meta-analysis of practical assessments we included data on 16 038 candidates from 15 reports (27 datasets). The negative effect of non-white ethnicity on performance remained significant (P<0.001; d=−0.42, −0.52 to −0.33; I2=76%).
In the pass/fail meta-analysis we included data on 10 990 candidates from nine reports (10 datasets). As the paper by Carroll42 and the extra data provided by Carroll (personal communication, 2010) included overlapping data, we included only the former in the meta-analysis. We conducted two meta-analyses of pass/fail outcomes. The first used the odds ratios converted to effect sizes for inclusion in meta-analysis 1, and showed a significant negative effect of non-white ethnicity on performance (P<0.001, d=−0.59, −0.84 to −0.35; I2=83%). The second used the original odds ratios reported in the studies, to check for bias in the conversion process, and again showed a significant negative effect of non-white ethnicity on performance (P<0.001, odds ratio 2.92, 1.88 to 4.55; I2=83%).
In the continuous outcomes meta-analysis we included data on 12 174 candidates from 13 reports (30 datasets). The negative effect of non-white ethnicity on performance was significant (P<0.001; d=−0.38, −0.46 to −0.30; I2=64%).
Where the data were available, we conducted a separate analysis comparing white and Asian candidates only. The data on 13 843 candidates (10 974 white and 2675 Asian) came from 10 reports, comprising 16 datasets,27 33 35 42 44 46 and included raw data from Lumb and Vail32 and three papers by Woolf et al.20 37 39 The results were similar (d=−0.40, −0.51 to −0.28; I2=80%).
None of the funnel plots for these meta-analyses showed any sign of publication bias.
Summary of adjusted effects
Thirteen of the 23 reports gave details of figures adjusted for various other factors (table 5⇓). Eleven of the 13 showed unadjusted significant effects of ethnicity on outcomes. Of the 11, only Ricketts et al found that adjusting for covariates removed a previously significant effect of ethnicity on outcomes.46 The covariates in that study were sex, disability, year (their outcome measure was a progress multiple choice test taken by students across all five years), and three interaction terms. The 10 other reports that showed simple effects of ethnicity also found significant effects of ethnicity on outcomes after adjustment for sex. Kilminster et al reported no significant interaction between the effects of sex and ethnicity.41
The following studies all found a significant effect of ethnicity on outcomes: five studies that adjusted for previous exam performance,32 34 35 36 40 one study in UK candidates speaking English as a first language,33 two studies that adjusted for school type,32 40 and one study that adjusted for socioeconomic group.32
This meta-analysis shows that doctors and medical students of non-white ethnicity underperform academically compared with their white counterparts. The effect was significant (P<0.001) and, using the terminology of Cohen,50 was of medium magnitude (d=−0.42). It should be remembered in interpreting such effects that Cohen’s d describes differences in terms of the means, whereas proportions in the tails of a distribution can be much larger. To give an example, on a typical exam with a mean score in white candidates of 60 (SD 5), an effect size of d=−0.42 would mean that non-white candidates would score an average of 57.9. If the pass mark were 50, then, under certain statistical assumptions, 2.3% of white and 5.6% of non-white candidates would fail, making the odds of failure in non-white candidates 2.5 times higher than for white candidates. For those studies reporting pass/fail outcomes, we found an overall odds ratio of 2.92 (P<0.001).
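The worked example above can be reproduced approximately with a short calculation. The sketch assumes that scores in both groups are normally distributed with the same standard deviation, which may not be exactly the set of assumptions the example relied on; the function name is hypothetical. Under those assumptions the failure rates and odds ratio come out close to the rounded figures quoted.

```python
from statistics import NormalDist


def failure_comparison(mean_white=60.0, sd=5.0, d=-0.42, pass_mark=50.0):
    """Translate a difference in means (Cohen's d) into tail proportions.

    Assumes normally distributed scores with equal SD in both groups.
    Returns (non-white mean, white fail rate, non-white fail rate, odds ratio).
    """
    mean_non_white = mean_white + d * sd
    fail_white = NormalDist(mean_white, sd).cdf(pass_mark)
    fail_non_white = NormalDist(mean_non_white, sd).cdf(pass_mark)

    def odds(p):
        return p / (1 - p)

    return (mean_non_white, fail_white, fail_non_white,
            odds(fail_non_white) / odds(fail_white))


mean_nw, fail_w, fail_nw, odds_ratio = failure_comparison()
print(f"non-white mean = {mean_nw:.1f}")
print(f"fail rates: white {fail_w:.1%}, non-white {fail_nw:.1%}")
print(f"odds ratio for failure = {odds_ratio:.2f}")
```

The point of the exercise is that a shift of less than half a standard deviation in the mean more than doubles the proportion falling below a pass mark set two standard deviations under the white mean.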
Separate meta-analyses of undergraduate assessments (d=−0.42; P<0.001), postgraduate assessments (d=−0.38; P<0.001), machine marked written assessments (d=−0.35; P<0.001), practical clinical assessments (d=−0.42; P<0.001), assessments with pass/fail outcomes (d=−0.59; P<0.001), and assessments with continuous outcomes (d=−0.38; P<0.001) all showed similar effects in the same direction. The comparison of white and Asian candidates only also showed a similar result (d=−0.40; P<0.001). Though there were varying amounts of heterogeneity in the meta-analyses, it is clear that the finding of an ethnic difference in assessment outcomes is both consistent and persistent. Of 36 datasets included in the meta-analysis of all studies, 35 showed a negative effect of non-white ethnicity and in 25 the effect was significant. None showed a significant positive effect of non-white ethnicity; one showed no difference between the white and non-white groups.
Strengths and weaknesses
Our meta-analysis contained data from nearly 24 000 candidates. It provides evidence that ethnic differences in postgraduate attainment exist independently of the known lower performance in postgraduate examinations of overseas candidates.9 51 52 The separate analysis of machine marked written assessments and practical assessments allowed us to investigate possible effects of examiner bias and verbal communication skills on ethnic differences in attainment. That an ethnic attainment gap was found in both machine marked and face to face assessments suggests that those factors are unlikely to be primarily responsible, although effects might still be present. Our summary of adjusted results enables us to begin to look at some of the possible confounders and showed that differences in attainment are unlikely to be more prominent in men or women. The fact that our study includes data from postgraduate as well as undergraduate assessments highlights that this effect is not restricted to medical students but affects practising doctors.
Funnel plots showed no apparent publication bias. As we have a particular interest in this issue we are often asked to analyse unpublished assessment data to check for ethnic differences, and, when possible, we included those data to reduce publication bias. This approach can introduce a different type of selection bias, and so we suggest that all UK medical schools and Royal Colleges should analyse their assessment data for ethnic differences and publish the results.
Interpretation of the results is somewhat tempered by heterogeneity, particularly in the postgraduate sample. The meta-analysis of undergraduate results had an I2 of 50%, which can be considered as “moderate” and, according to Higgins and Thompson’s tentative estimations, is unlikely to be of much concern.53 The postgraduate data, however, had a high heterogeneity of I2=89%, and there is clearly unexplained variation in the effect size here. Forest plots did not suggest any simple explanations, and, while we were unable to conduct a meta-analysis of effects adjusted for confounders (because of a lack of reported data and variation in the confounders adjusted for), our summary of adjusted effects in table 5 does not point to a single explanatory variable. The heterogeneity in the postgraduate studies could have been because of varying assessment formats or reliabilities. For example, the shortlisting scores reported in Brown et al45 are different in many ways from the results of a tightly controlled machine marked MRCGP exam.44 It might also have reflected differences in the comparison groups between studies. One of the largest studies in the meta-analyses compared Asians with non-Asians.27 While the largest proportion of the non-Asian group was probably white, that group would also have included non-white candidates; removal of this study from the analyses, however, did not significantly alter the findings.
There are other possible reasons for heterogeneity in both undergraduate and postgraduate samples. The proportion of white and non-white students was different in the older studies compared with the newer ones. To explore whether this affected the heterogeneity of the results, we conducted a post hoc meta-analysis of 14 studies (23 datasets) from candidates who can reasonably be expected to have entered medical school in or after 1997, when the number of medical students in the UK expanded rapidly.54 It showed similar results (d=−0.44, −0.55 to −0.35; P<0.001; I2=72%). This also shows that the ethnic gap is a feature of current medical education.
The proportions of candidates from different minority ethnic groups might also have varied between reports, and it is possible that differential performance between minority ethnic groups (as is found in school children, where, for example, Indians achieve higher grades than Bangladeshis55 56) might have resulted in heterogeneity. It is also important to consider how the variation in methods for obtaining ethnic data (such as self report, use of name, and photograph; in three studies the method was unknown), as well as the exclusion of participants whose ethnicity was unknown, could have affected the reliability of the results.
Our meta-analysis comparing white and Asian candidates was not subtle enough to distinguish between different groups of Asians and showed similar results to the overall meta-analysis, including an I² of 80%. To tease out these differences would need studies with larger sample sizes. At present we know of only one UK study within medicine with a large enough sample size to look for differences between several minority ethnic groups, and it found no significant differences between them in terms of pass rates35; that study, however, was not able to distinguish between, say, Bangladeshis and Pakistanis, let alone any finer categorisations.
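To give a sense of why larger samples are needed, the sketch below estimates the number of candidates required in each of two groups to detect a standardised difference d, using the common normal approximation for a two sided comparison of means. The significance level, the power, and the function name `n_per_group` are assumptions chosen for illustration, and the smaller values of d stand in for hypothetical differences between minority ethnic groups rather than estimates from any study.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a standardised difference d."""
    if d == 0:
        raise ValueError("d must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    # Normal approximation: n = 2 * (z_alpha + z_beta)^2 / d^2, rounded up.
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Illustrative effect sizes only; smaller gaps need far larger groups.
for d in (0.50, 0.20, 0.10):
    print(f"d = {d:.2f}: about {n_per_group(d)} candidates per group")
```

Because the required number scales with 1/d², halving the difference to be detected roughly quadruples the number of candidates needed in each group.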
Another weakness of our study is that most of the undergraduate reports in this meta-analysis came from London and Nottingham medical schools (though the postgraduate studies probably included graduates of all UK medical schools). Medical schools vary in their curriculums, teaching methods, and proportions of minority ethnic students from various groups, and their graduates vary in postgraduate attainment.57 So, though the effect was clear in the five non-London/non-Nottingham schools in the study, care should be taken when generalising our findings outside England. Similarly, the postgraduate reports were mostly of general practitioners and physicians, and there were no studies of surgeons, for example. We could not therefore study possible variation in ethnic differences across specialties and grades.
While we were able to examine in a limited way the impact of various covariates (particularly sex) on the ethnic gap in attainment, we were not able to do a formal analysis of the relation between ethnicity, socioeconomic status, and attainment. While most medical students in the UK are from the highest socioeconomic groups,6 7 this might well vary as a function of ethnicity and medical school and therefore needs exploring.
Strengths and weaknesses in relation to other studies
To our knowledge, the only other systematic review of the evidence at undergraduate level was carried out by Ferguson and colleagues.2 Those authors also found that non-white ethnicity negatively predicted undergraduate performance, but their review contained only one UK study, the others coming mainly from the United States. In postgraduate terms, we do not know of any systematic reviews of qualified doctors’ academic performance in relation to ethnicity.
Most reports retrieved in our search that did not meet the inclusion criteria also found that undergraduates from ethnic minorities underperformed.1 25 47 48 In contrast, Arulampalam and colleagues found that Indian women and non-white non-Indian (“other”) students were less likely than white students to drop out of medicine in the first year.26 Plint and colleagues reported no significant ethnic differences in terms of GP placement success for UK graduates in 2004 and 2008, although white UK candidates did achieve higher scores in knowledge tests in 2009.43
The question of attainment before medicine is also important. In a national sample, McManus et al found that non-white students tended to enter medical school with slightly lower grades in school leaving exams (effect size d=−0.10).58 There is also evidence that applicants59 60 61 62 and entrants40 from ethnic minorities have lower results on the UKCAT (an aptitude test used to select medical students) than white students. Unlike A levels, however, which are known to be reasonably good predictors of medical school performance,63 the evidence so far suggests that UKCAT is not a good predictor.40 64 Our study was concerned with performance at medical school and beyond and therefore did not consider attainment before medicine. We did report the results of three studies from two medical schools that examined whether adjusting for previous grades removed the effect of ethnicity on attainment at medical school, and they found it did not.32 40 46 Once again, larger scale longitudinal studies would be required to examine this important issue further. The question of why medical students from ethnic minorities have, on average, slightly lower entry qualifications than white students is a related question that also requires attention.
Unanswered questions and further research
Ethnic differences in attainment seem to be a consistent feature of medical education in the UK, being present across medical schools, exam types, and undergraduate and postgraduate assessments, and have persisted for at least the past three decades. They cannot be dismissed as atypical or local problems. This is an uncomfortable finding, with good reason. While exam performance is by no means the only marker of good performance as a doctor or medical student, the fact remains that without passing finals, medical students cannot become doctors, and without passing postgraduate exams, it is much harder for doctors to progress in a medical career. That exam results vary by ethnicity is therefore extremely important and requires attention. Although ethnicity is clearly related to exam performance, what is not clear is why that might be.20 33 37 This meta-analysis allows us to move on from publishing the effects of ethnicity on exams in single medical schools or individual college membership exams to exploring the reasons for this gap in attainment and what might be done about it.
More detailed information is needed to track the attainment gap, and further research is needed into its causes, which, like ethnic differences in achievement in primary and secondary education, are probably complex and multifactorial.65 To begin to address the problem, it needs to be recognised as a shared problem. A proper approach will be for all medical schools and Royal Colleges to analyse their assessment results by ethnic group and place their results in the public domain and to encourage educational researchers to examine possible mechanisms, such as stereotype threat,66 and test interventions for improvements.39 Medical students and doctors from all ethnic groups will need to be involved in this process. Without these actions, it will be a struggle to ensure a fair and just method of training and assessing our future and current doctors.
What is already known on this topic
- A third of all UK medical students and junior doctors are from minority ethnic groups
- A 2002 review of the factors influencing medical school success found evidence of underperformance in minority ethnic candidates, but included just one report of UK data
- UK universities and the NHS are legally required to monitor the admission and progress of students and the recruitment and career progression of staff by ethnic group
What this study adds
- UK trained doctors and medical students from minority ethnic groups tend to underperform academically compared with their white counterparts
- Ethnic differences are unlikely to be primarily caused by examiner bias or candidate communication skills because similar effects are found in machine and examiner marked assessments
- Fairness and equality in training and assessment will be achieved only by acknowledging this is a shared problem, by collecting detailed data, and by conducting further research into its causes
Cite this as: BMJ 2011;342:d901
We thank Mark Carroll, David James, Sue Kilminster, Trudie Roberts, and Andy Lumb for sharing their raw and unpublished data with us.
Contributors: ICM conceived the idea for the study, which was further developed with KW and HWWP. KW and HWWP decided on inclusion and exclusion criteria. KW and ICM searched for reports and extracted data. All authors double extracted data and contributed to the statistical analysis and interpretation of results. KW prepared the first draft of the paper, and all authors contributed to subsequent drafts and agreed on the final report. All contributors had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis. KW is guarantor.
Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Competing interests: All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: The study was covered by the exemptions of the UCL research ethics committee http://ethics.grad.ucl.ac.uk/exemptions.php. Consent from participants was not obtained but the presented data are anonymised and risk of identification is low.
Data sharing: Datasets are available from the corresponding author.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.