Factors associated with success in medical school: systematic review of the literature
BMJ 2002; 324 doi: https://doi.org/10.1136/bmj.324.7343.952 (Published 20 April 2002) Cite this as: BMJ 2002;324:952
- Eamonn Ferguson, reader in health psychology (a)
- David James, professor of fetomaternal medicine (b)
- Laura Madeley, research associate (a)
- (a) School of Psychology, University of Nottingham, Nottingham NG7 2RD
- (b) School of Human Development, Faculty of Medicine, Queen's Medical Centre, Nottingham NG7 2UH
- Correspondence to: E Ferguson
- Accepted 7 November 2001
Selection of medical students in the United Kingdom has come under intense scrutiny in recent years. Some authors have claimed that discrimination occurs in favour of white applicants, female applicants, and applicants from independent schools.1 2 3 4 5,w1,w2 High profile cases, such as that of Laura Spence, have led to a public questioning of the selection, training, and validation of doctors. The process of selecting medical students is unsatisfactory from a logistical point of view (approximately 40 000 applications are allowed from 10 000 students for just 5000 places) and leads to chance playing a big part and to apparent unfairness.
The criteria medical schools use to select future doctors are similar across the country.4 They include academic ability, insight into medicine (including work experience), extracurricular activities and interests, personality, motivation, and linguistic and communication skills. But what is the evidence base for using these criteria?
The Committee of Deans and Heads of Medical Schools commissioned a systematic review of factors believed to be significant predictors of success in medicine. We report the results of that systematic review, which was carried out from June to August 2000. The review examines data on the predictive validity of the eight criteria that have been studied in relation to the selection of medical students: cognitive factors (previous academic ability), non-cognitive factors (personality, learning styles, interviews, references, personal statements), and demographic factors (sex, ethnicity). Previous academic ability, personal statements, references, and interviews are all traditionally used in selection, but how good are they at predicting future performance? Personality and learning styles are not traditionally used, but should they be?
Summary points
Previous academic performance is a good, but not perfect, predictor of achievement in medical training
It accounts for 23% of the variance in performance in undergraduate medical training and 6% of that in postgraduate competency
Long term prospective cohort studies or case-control studies are needed to examine predictors of success after qualification, and reliable, valid, and fair models of medical job competence need to be developed
Relatively little research has been done into the importance of learning styles, interviews, ethnicity, sex, personal statements, and references, but a strategic learning style, white ethnicity, and female sex are associated with success in medical training
We used three databases to conduct literature searches: Medline OVID citations, Web of Science, and PsycLIT. We used the search criteria “medical school” or “student admissions” or “selection” and “medical school student performance” and “career outcome.” We initially used combinations of the key words or phrases “medical school,” “admissions,” “selection,” “medical education,” “predictors,” and “medical student.” We conducted additional searches using combinations of the above key words with the key words “personality,” “interviews,” “learning styles,” “gender,” “references,” “resumes,” “personal statements,” and “ethnicity.”
On the basis of their propensity to generate hits, we examined three journals—Medical Education, Journal of Medical Education, and Academic Medicine—for further relevant articles. Finally, we scrutinised the reference sections of relevant articles identified by these search strategies for further relevant publications. We aimed to identify papers on the predictive validity of as many aspects as possible of the process of selecting medical students.
For the systematic review we used a mixture of traditional techniques of qualitative review and more quantitative methods of meta-analysis. We included studies in the review if they had a clear description of the predictors used and their quantification, a clear description of the outcome measures, and an acceptable statistical method of analysis of the relation between predictors and outcome measures. For indicators of previous academic performance, we examined only studies that used nationally or internationally accepted academic indicators (for example, GCSE grades, A level grades, grade point average (GPA) scores, medical college admission test (MCAT)). For other predictor measures, such as personality profiles, we explored only studies reporting data based on validated indices. From the studies thus identified, we selected only those directly relevant to medicine; we excluded studies relating to nursing and physiotherapy training, for example. Finally, we used meta-analysis only when a sufficient quantity of systematic data was available.
Medline produced 157 hits, Web of Science produced 550 hits, and PsycLIT produced 413 hits. Of the articles on Medline, 19% also appeared on Web of Science and 5% appeared on PsycLIT. Sixty two papers reported studies of previous academic performance,w3-w64 and 31 papers contained information on personality.w10,w13,w17,w18,w20,w24,w30,w38,w40,w48,w63,w65-w84 We found 16 papers on sex,w1,w2,w10,w27,w42,w59,w85-w94 and 14 papers related to ethnicity.w1,w34,w39,w42,w45,w46,w55,w66,w92,w94-w98 Eleven papers described studies on motivation or study habits,w1,w28,w91,w99-w106 and 16 papers examined the predictive validity of interviews.w27,w30,w72,w76,w88,w107-w117 We identified two papers on the predictive validity of personal statementsw10,w27 and one paper on the predictive validity of references.w110
Sufficient data were available on measures of previous academic performance for us to be able to perform a meta-analysis and to examine two broad areas of achievement in medical training (undergraduate and postgraduate). Studies relating admission criteria to undergraduate assessments included all the years of undergraduate training, whereas the studies of postgraduate performance mainly focused on internship ratings (that is, the first year after qualification). For the other predictors, either insufficient data were available for meta-analysis (ethnicity, sex, learning styles, personal statements) or a variety of different assessment tools were used (personality), making a systematic comparison across studies difficult.
The indicators of previous academic performance ranged widely in the types of assessment and the response formats used. However, it seemed reasonable to examine these assessments as a whole for three reasons. Firstly, all are used in the selection of medical students, and some assessment of their overall predictive power is important. Secondly, the meta-analysis examining undergraduate medical training was to be general, combining preclinical and clinical assessments. Different aspects of previous academic performance might be differentially predictive at different stages of training,w26 so combining all the indices seemed more appropriate. Finally, good evidence exists that diverse measures of cognitive ability are all statistically related to general intelligence.6
We conducted the quantitative analyses by using hierarchical linear modelling (see bmj.com).7 Level 1 variables were the correlation coefficients between predictors and outcomes, and level 2 variables were sample sizes within the individual studies.
Measures of previous academic performance and assessments in medical school are associated with some degree of unreliability for a variety of reasons related to the candidate and the assessor (for example, illness, tiredness, environmental factors). In addition, students entering medical school are likely to be at the top end of the potential range of scores for previous academic performance and are also likely to do well in their medical school training. Both these factors (unreliability and restriction of range) statistically limit the size of the correlations between predictors and outcomes.8 We therefore corrected the effect sizes reported in this paper, calculated using HLM-5 software, 7 9 for error due to unreliability and range restriction. We used conventional methods to compare the corrected effect size estimates with the uncorrected ones to determine the contribution of error to the effect size estimates. 8 10
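The two corrections described above can be sketched in Python using formulas commonly applied for this purpose: Spearman's correction for attenuation (unreliability), followed by the Thorndike Case II correction for direct range restriction. The function names, the reliability values, and the standard deviation ratio are illustrative assumptions; the paper does not report the values it used, and this sketch is not the HLM-5 procedure itself.

```python
import math


def correct_for_unreliability(r, rel_predictor, rel_outcome):
    """Spearman's correction for attenuation.

    Divides the observed correlation by the square root of the product
    of the reliabilities of the predictor and the outcome.
    """
    return r / math.sqrt(rel_predictor * rel_outcome)


def correct_for_range_restriction(r, sd_ratio):
    """Thorndike Case II correction for direct range restriction.

    sd_ratio is the unrestricted SD of the predictor divided by its
    restricted SD (1.0 means no restriction).
    """
    return (r * sd_ratio) / math.sqrt(1 + r * r * (sd_ratio ** 2 - 1))


# Hypothetical inputs, chosen only to show the direction of the corrections.
r_observed = 0.30
r_reliable = correct_for_unreliability(r_observed, rel_predictor=0.85, rel_outcome=0.85)
r_corrected = correct_for_range_restriction(r_reliable, sd_ratio=1.5)

print(f"observed: {r_observed:.2f}")
print(f"after unreliability correction: {r_reliable:.2f}")
print(f"after range restriction correction: {r_corrected:.2f}")
```

Both corrections can only increase the magnitude of a correlation, which is why corrected effect sizes are larger than uncorrected ones.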
We converted the level 1 variables (the correlation coefficients) by using Fisher's r to Z transform before entering them into the meta-analysis. We entered all level 1 variables described in the papers into the analysis. Several papers examined the relation between multiple predictors and multiple outcomes.w7,w15,w21,w23 Although neither the predictors nor the outcomes are likely to be statistically independent, complete independence is not necessary for the meta-analysis to be valid.11
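As a minimal illustration of the Fisher r to Z transform, the Python sketch below converts correlations to the z scale, averages them, and converts the result back. The pooling step uses simple weights of n - 3 (the inverse of the sampling variance of z); this is a simplified stand-in and not the hierarchical linear model fitted in the review. The function names and the example values are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's r to Z transform: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform from the z scale to a correlation."""
    return math.tanh(z)


def pooled_correlation(correlations, sample_sizes):
    """Weighted mean correlation computed on the z scale.

    Each z value is weighted by n - 3, the inverse of its sampling
    variance (a simple fixed-effect assumption).
    """
    weights = [n - 3 for n in sample_sizes]
    z_values = [fisher_z(r) for r in correlations]
    z_mean = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    return inverse_fisher_z(z_mean)


# Hypothetical correlations and sample sizes, not taken from the review.
example_rs = [0.22, 0.35, 0.41]
example_ns = [120, 300, 85]
pooled = pooled_correlation(example_rs, example_ns)
print(f"pooled r = {pooled:.3f}")
```

Working on the z scale is useful because the sampling distribution of z is approximately normal with a variance that depends only on sample size.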
We used Cohen's calibration for effect size to guide interpretation of the results reported here.12 Cohen argues that an effect size of 0.10 should be classed as “small,” 0.30 as “moderate,” and 0.50 or greater as “large.”
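The calibration can be expressed as a small lookup, sketched here in Python. The function name is hypothetical, and two choices are assumptions rather than part of Cohen's scheme as quoted above: values between the benchmarks take the label of the benchmark they have reached, and values below 0.10 are labelled "below small".

```python
def cohen_label(effect_size):
    """Label a correlation-type effect size using Cohen's benchmarks.

    Benchmarks: 0.10 small, 0.30 moderate, 0.50 or greater large.
    The sign is ignored, so negative correlations are labelled by magnitude.
    """
    magnitude = abs(effect_size)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "moderate"
    if magnitude >= 0.10:
        return "small"
    return "below small"  # assumption: Cohen's scheme names no category here


for value in (0.14, 0.24, 0.30, 0.48):
    print(value, cohen_label(value))
```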
Tests of previous academic performance
Tests measuring prior learning or previous academic performance included the medical college admission test, A levels, and grade point average. We entered 753 usable correlation coefficients into the meta-analyses for undergraduate performance, with a total sample size of 21 905 participants (mean 248.9, SD 265.06). Five studies explored admissions criteria in relation to postgraduate training, giving rise to 32 usable coefficients, with a total sample size of 2487 participants (mean 355.3, SD 566.8).w47,w50-w52,w64
In the prediction of undergraduate medical success, the average effect size was 0.30 (SE 0.016, range 0.22 to 0.74, 95% confidence interval 0.27 to 0.33, P<0.00001). This means that, on average, previous academic performance accounts for 9% of the variance in overall performance at medical school. Correction for unreliability in both the predictor (previous academic ability) and outcome (medical training success) variables increased the effect size correlation from 0.30 to 0.36 (95% confidence interval 0.31 to 0.39). Further correction for restriction of range increased the coefficient to 0.48 (0.40 to 0.51). This corrected coefficient indicates that 23% of variance in medical school performance can be explained by previous academic performance. The uncorrected correlation coefficient would be classed as moderate in size according to Cohen's calibration, and the final corrected coefficient approaches a large effect.12
In the prediction of postgraduate medical competence the average effect size was 0.14 (SE 0.05, range -0.34 to 0.41, 95% confidence interval 0.05 to 0.23, P<0.05). Thus, on average, previous academic performance accounts for less than 3% of the variance in postgraduate medical performance. Correction for unreliability increased the effect size correlation to 0.17 (95% confidence interval 0.06 to 0.27), and further correction for restriction of range increased it to 0.24 (0.08 to 0.37). This corrected coefficient indicates that 6% of variance in postgraduate performance can be explained by previous academic performance. Both the uncorrected and corrected coefficients are classed as small according to Cohen's calibration.12
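The percentages of variance quoted for the undergraduate and postgraduate analyses follow from squaring the correlation coefficient. The short Python check below reproduces that arithmetic for the four effect sizes reported in the text; the function name is an illustrative choice.

```python
def variance_explained(r):
    """Proportion of variance in the outcome accounted for by the predictor (r squared)."""
    return r * r


reported = [
    ("undergraduate, uncorrected", 0.30),
    ("undergraduate, corrected", 0.48),
    ("postgraduate, uncorrected", 0.14),
    ("postgraduate, corrected", 0.24),
]

for label, r in reported:
    print(f"{label}: r = {r:.2f}, variance explained = {variance_explained(r):.1%}")
```

Rounded to whole percentages, these values give 9%, 23%, 2% (that is, less than 3%), and 6%, matching the figures in the text.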
The 95% confidence intervals and ranges indicate a wide variability in effect sizes across the studies. This variability was not significantly associated with sample size for either the undergraduate analysis or the postgraduate analysis.
A meta-analysis of the personality measures was not possible owing to the wide variety of measures used, which included the California personality inventory, Rotter's “locus of control” scale, Cattell's 16PF, Eysenck's personality index, Minnesota multi-phasic personality inventory, Myers Briggs type indicator, state-trait anxiety inventory, and psychiatric interviews. The more consistent descriptive findings are summarised below.
The most commonly used test has been the California personality inventory. With this measure, eight subscales have emerged consistently as predictors of success in medical training: “dominance,” “tolerance,” “sociability,” “self acceptance,” “well being,” “responsibility,” “achievement via conformance,” and “achievement via independence.”w69,w79 Dominance has been shown to be correlated with undergraduate multiple choice question scores (uncorrected r 0.26), tolerance with the ability to use numerical data and make calculations (0.25), and well being and achievement via conformance with success in oral examinations (0.22 and 0.32).w79
Rotter's locus of control is a personality test that assesses the extent to which people feel that outcomes in their lives are contingent on their own behaviour (“internals”) in comparison with the influence of factors such as “fate” and “chance” (“externals”). Medical students with high preclinical and clinical grade point averages were, surprisingly, more likely to express an external orientation (0.51 and 0.31).w74 There is also some evidence that medical students express more external beliefs as they progress through medical school.w48 This seems to be at variance with studies showing that higher levels of internal beliefs are associated with academic success.13 One area deserving further examination is that in these studies the researchers may be tapping into what is referred to as “defensive external” beliefs.14 Defensive externals act much like internals but endorse an external orientation as a verbal defence against failure.
Results of state-trait anxiety studies have shown that state anxiety (anxiety in relation to a specific event, in this case examinations) is significantly, but weakly (3% of the variance), negatively associated with aspects of medical performance, but that trait anxiety (non-specific anxiety) is not significantly related to performance.w63,w84 Furthermore, levels of academic anxiety may show an inverted U shaped association with first year performance, in that students with extremes of anxiety tend to do worse than those in the mid-range.w48 This is consistent with arousal theory, which postulates that people perform best at an optimal level of arousal.15
Recent developments in personality theory have suggested that five factors underlie normal personality and that these can be found in previously reported measures of personality. 16 17 These factors, known as the “Big 5” or five factor model of personality, are “emotional stability-neuroticism” (high scores relate to anxiety, depression), “extraversion” (high scores relate to being outgoing, sociable), “openness to experience” (high scores relate to being creative, artistic), “agreeableness” (high scores relate to being cooperative, trusting), and “conscientiousness” (high scores relate to being methodical, organised, motivated by achievement). Some of the subscales of the California personality inventory, especially the achievement subscales, may relate to conscientiousness in the Big 5. The Big 5 offers a theoretical framework for the study of personality in medical selection and training. Conscientiousness has been shown in previous research to be related to success in a variety of occupational settings, and extraversion has been correlated with success in jobs that involve a social dimension (for example, sales).18 Within medicine, extraversion predicted success in paediatric objective examinations (0.51).w83 A recent study using the Big 5 has shown that conscientiousness is a positive predictor of preclinical achievement (standardised regression coefficient, β=0.58), even with control for previous academic performance (A level grades).w10
A consistent finding in the literature is that women tend to perform better than men in their medical trainingw1,w10,w27,w85,w91 and are more likely to attain an honours degree.w2 Women also tend to perform better in clinical assessments.w86,w87 Two studies suggested that men slightly outperformed women on early assessments (for example, National Board of Medical Examiners (NBME) part I) but that these differences disappeared later (NBME part II).w85,w86 However, these differences were small and reached significance only when the sample sizes were large. This raises the question of the practical relevance of these sex differences. For example, a significant difference was reported between men and women in NBME part II paediatrics scores, with men scoring 82.13 and women 82.70.w86
Are tests of previous academic performance equally accurate predictors for men and women? When the accuracy of a predictor such as the medical college admission test is examined, the difference between predicted outcome scores (for example, NBME part I) and the actual outcome scores can be calculated. If the actual score is higher than the predicted score the test underpredicts; if the converse is found then the test overpredicts. Some evidence indicates that the admission test underpredicts for women.w94
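The idea of underprediction and overprediction can be illustrated with a short Python sketch that averages the difference between actual and predicted scores within a group. The function names and the scores are hypothetical, and summarising a group by its mean prediction error is an assumption made for illustration, not the method of the cited study.

```python
from statistics import mean


def mean_prediction_error(actual_scores, predicted_scores):
    """Average of (actual - predicted) across the members of a group."""
    return mean(a - p for a, p in zip(actual_scores, predicted_scores))


def describe_bias(actual_scores, predicted_scores):
    """Classify the direction of prediction error for a group.

    Positive mean error: actual scores exceed predictions (underprediction).
    Negative mean error: predictions exceed actual scores (overprediction).
    """
    error = mean_prediction_error(actual_scores, predicted_scores)
    if error > 0:
        return "underpredicts"
    if error < 0:
        return "overpredicts"
    return "neither"


# Hypothetical outcome scores for one group of students.
actual = [205, 198, 214, 220]
predicted = [200, 195, 210, 212]
print(describe_bias(actual, predicted))
```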
A growing body of research explores whether different motivational, academic, and demographic factors influence the performance of men and women. Motivation seems to be important. For example, in one study, “service quality variables” (such as “helping others”) predicted women's clinical grades and “individual mastery variables” (such as “intellectual growth”) predicted men's clinical grades.w89
Some evidence indicates that in the United Kingdom, as well as in the United States, students from ethnic minority groups are more likely to fail a medical examination than are white students.w1,w55 However, non-UK ethnic minority students in the United Kingdom may perform better than white UK students.w1
A common finding across several studies is that traditional cognitive selection measures (medical college admission test, grade point average) show significant predictive power for ethnic minority groups.w34,w45,w46,w55,w96,w97 However, measures of previous academic performance tend to overpredict for ethnic minorities but to underpredict for white students.w94,w95 No studies have examined whether differential experiences of training in medical schools contribute to this difference.
Learning style covers both motivations for learning and the processes by which the student approaches the task of learning. Two general models of learning styles have been used (box).
Models of learning styles
The first model is based on three learning approaches: “deep,” “strategic,” and “surface.”19,w28 Deep learning is based on three motivational factors (intrinsic motivation, vocational interest, and personal understanding) and three learning processes (making links across material, searching for a deeper understanding of the material, and looking for general principles). Strategic learning is motivated by a desire to be successful and leads to patchy and variable understanding. Surface learning is motivated by fear of failure and a desire to complete a course, with students tending to rely on learning “by rote” and focusing on particular tasks.
The second model is based on Kolb's description of four approaches to learning—concrete experience (experiential learning), abstract conceptualisation (development of analytic strategies and theories), active experimentation (learning through action and risk taking), and reflective observation (viewing problems from multiple perspectives before deciding how to proceed).w100 These four approaches combine to produce four types of learner: “convergers” (emphasise the deductive method), “divergers” (use creative problem solving and view a problem from many perspectives before acting), “assimilators” (prefer an inductive approach), and “accommodators” (prefer hands-on experience as a way of learning).
The studies examining the tripartite model in medical students have shown a relatively consistent finding of a significant positive association between the use of strategic learning and final marks (uncorrected r 0.178 to 0.26)w28,w99,w103-w105; only one study failed to replicate this effect.w101 However, although some evidence shows that deep learning has a positive association with performance in examinations (0.157 to 0.262),w28,w104 other studies have failed to replicate this finding.w101,w103 Similarly, although a significant negative association has been reported between surface learning and examination performance (for example, -0.204),w28 several studies have failed to replicate this effect.w91,w101,w103
Results from studies using the Kolb model suggest that students with a “convergers” learning style tend to perform better than those with any other style.w99,w100 Adopting a strategic or converger learning style seems to be a useful strategy for students who wish to succeed. Surface, deep, and strategic learning styles seem to show some degree of trait stability (0.33 to 0.42). However, this is only a moderate effect, suggesting that learning styles can change.w28 It may therefore be useful for medical educational programmes to teach students how to use the more successful study skills. 20 21
Three types of study have explored the predictive power of interviews. The first type made one of two comparisons: students who were interviewed and accepted versus students who were accepted without interview,w113,w114 or students who were rejected by one medical school (Yale) but accepted at another, both on the basis of an interview, versus students who were accepted by Yale but chose to go to another medical school.w107 These studies showed no differences in performance and concluded that the interview added little to the selection process. However, the studies had methodological limitations, including the use of small numbers (cohort range 23-113), a failure to eliminate selection biases, and a limited range of outcome measures.
The second type of study related interviewers' ratings (for example, overall suitability for medicine) to the interviewees' early preclinical success, withdrawal, and drop out ratesw27,w30,w72,w76,w88,w111,w112,w115,w116 and overall rating of the graduate physicians' potential competency as doctors.w111 These studies reported evidence that interview scores were able to predict future success. For example, overall interview rating correlated with a Dean's letter of recommendation (0.33)w111 and grade point average (0.08 to 0.14).w117
Thirdly, one study compared the interview with other pre-admission criteria.w117 Interview ratings were independently associated with success in early training after controlling for grade point average (for example, 0.11).
Thus useful additional information that has predictive power for outcome can probably be collected from an interview. However, little is known about factors such as the impact of inter-interviewer variation, whether any systematic biases exist, and the effect of training for interviewers.w117
Personal statements and references
Two studies examined the predictive value of personal statements provided by candidates on their suitability to study medicine. One study analysed the content of candidates' actual statements and found no evidence that they predicted early preclinical success.w10 The other study used weighted proforma information about cultural skills (not candidates' actual statements) and found a small negative association with outcome (β=-0.184).w27 Thus too few data on personal statements are available to allow definitive conclusions to be drawn. More work is needed, especially into the relation between statements and clinical and postgraduate performance.
The only study on the value of references suggested that the academic reference had no predictive value in subsequent achievement.w110 This is consistent with the conclusions from studies of the value of references in other occupations.
Prediction of postgraduate clinical competence
Most studies of the predictive power of pre-admission cognitive and non-cognitive factors have focused on predicting success in undergraduate medical training. Fewer studies have examined pre-admission criteria as predictors of postgraduate medical competence. Several papers do, however, explore how cognitive factors (such as data gathering and analysis skills, knowledge, first to fourth year grade point average, and NBME parts I and II) and non-cognitive factors (such as interpersonal skills and attitudes) assessed during medical student training predict postgraduate clinical competence.22 23 24 25 26 27 These studies show that cognitive factors can account for up to 51% of the variance in NBME part III grade.26 Only two studies have compared the predictive power of both admissions criteria (grade point average and medical college admission test) and scores in medical school examinations in relation to postgraduate competence.w47,w64 The evidence from these comparative studies indicates that the pre-medical scores show a weak relation to internship competence. For example, Richards et al showed that 60% (9/15) of the associations between previous academic ability and undergraduate success were significant (r range 0.17 to 0.34) but that only 10% (one) of the associations between previous academic performance and intern performance rating were significant (0.20).w47 This pattern of findings is confirmed by our meta-analysis. More detailed longitudinal studies exploring the complex relations between admissions criteria (cognitive, non-cognitive, and demographic), medical school performance, and postgraduate medical competence are needed.
One of the main problems with studying postgraduate clinical performance is establishing a comparable scoring system for assessing competency in different specialties. This is known as the “criterion problem” and confronts the prediction of success in all jobs, not just medicine. 28 29 One solution to this problem has been to develop competency based models of core and specific skills, through detailed job analyses of individual medical specialties.30
Discussion and conclusions
Relatively few studies provide comparative analyses of the predictive power of the wide variety of factors used in combination for selecting medical students (interview, grade point average, learning styles, personality). The research that has been undertaken has mainly concentrated on measures of previous academic ability as a predictor of undergraduate achievement. More work is needed to identify selection criteria that predict postgraduate performance.
Consistent with reviews in other occupational areas, academic or cognitive ability was a moderate predictor of success in undergraduate medical training.29 The strength of this association before corrections was moderate (0.30) in terms of Cohen's calibration, becoming large (0.48) after correction.12 Previous academic performance, however, would be classified as a predictor with a small effect (0.14 uncorrected, 0.24 corrected) for postgraduate medical competence.
Few studies have examined the effects of learning styles, interviews, personal statements, and references in relation to achievement in medical training. These factors need to be explored in future studies. The evidence indicates that work on learning styles is likely to be fruitful. The academic reference seems to have no predictive power. Virtually no research has examined the predictive power of personal statements. This is an important area for future research, as the personal statement forms an important part of the current selection process in the United Kingdom. More sophisticated research into the value of the interview is also needed—to explore the structure of interviews, how they are conducted, the effects of training, whether different interviewers (for example, psychiatrists or surgeons) focus on different factors, and how the predictive power can be enhanced.
Sufficient preliminary data indicating an impact of personality on medical school progression exist to warrant further research. However, the research needs to be conducted in a more prospective and systematic fashion.w10 “Achievement striving,” “state anxiety,” and “conscientiousness” should be the focus in future studies.
Future research needs to take a more multivariate approach to studying predictors of success in medical training. Predictors are likely to be intercorrelated,31,w10 as are outcome measures. Furthermore, learning across the medical degree (and indeed postgraduate learning) occurs over time, and time series analyses and models that allow for prediction of change over time would also be a useful approach. The use of structural modelling procedures,5 as well as hierarchical structural models using structural and time series components, would be beneficial to developing our understanding of the prediction of performance.
Contributors: DJ and EF conceived the study. LM conducted the initial literature search and input of data. EF conducted additional literature searches and input and analysis of data. DJ and EF wrote the paper. EF is the guarantor.
Funding: Committee of Deans and Heads of Medical Schools.
Competing interests: None declared.
A list of references produced by the search and an explanation of hierarchical linear modelling can be found on bmj.com