Research Christmas 2021: Get Lucky

Giving science the finger—is the second-to-fourth digit ratio (2D:4D) a biomarker of good luck? A cross sectional study

BMJ 2021; 375 doi: (Published 15 December 2021) Cite this as: BMJ 2021;375:e067849
  1. James M Smoliga, professor of physiology1,
  2. Lucas K Fogaca, student1,
  3. Jessica S Siplon, student1,
  4. Abigail A Goldburt, student1,
  5. Franziska Jakobs, student1
  1. 1Department of Physical Therapy, High Point University, High Point, NC, USA
  1. Correspondence to: J M Smoliga jsmoliga{at}
  • Accepted 30 November 2021


Objectives To explore whether random chance, weak research methodology, or inappropriate reporting can lead to claims of statistically significant (yet, biologically meaningless) biomarker associations, using as a model the relation between a common surrogate of prenatal testosterone exposure, second-to-fourth digit ratio (2D:4D), and a random indicator of good luck.

Design Cross sectional study.

Setting University sports performance laboratory in the United States. Data were collected from May 2015 to February 2017.

Participants 176 adults (74 women, 102 men), including university students, faculty, and staff with no history of injuries, disease, or medical conditions that would affect digit length.

Main outcome measures 2D:4D, body composition parameters potentially influenced by androgens (bone mineral content, bone mineral density, body fat percentage), and good luck (using poker hands from randomly selected playing cards as a surrogate).

Results 2D:4D significantly correlated with select body composition parameters (Spearman’s rs range −0.26 to 0.23; P<0.05), but the correlations varied by sex, participant hand measured, and the method of measuring 2D:4D (by photocopy or radiography). However, the strongest correlation observed was between right hand 2D:4D in men measured by radiograph and poker hand rank (rs=0.28, P=0.004).

Conclusions Greater prenatal exposure to testosterone, as estimated by a lower 2D:4D, significantly increases good luck in adulthood, and also modulates body composition (albeit to a lesser degree). While these findings are consistent with a wealth of research reporting that 2D:4D is related to many seemingly disparate outcomes, they are not meant to provide confirmatory evidence that 2D:4D is a universal biomarker of nearly everything. Instead, the associations between 2D:4D and good luck are simply due to chance, and provide a “handy” example of the reproducibility crisis within medical and scientific research. Biologically sound hypotheses, pre-registration of trials, strong methodological and statistical analyses, transparent reporting of negative results, and unbiased interpretation of data are all necessary for biomarker studies and other areas of clinical research.


The ratio between the lengths of the second and fourth digits, referred to as the 2D:4D digit ratio (2D:4D), has received considerable attention in the scientific community (fig 1) and mainstream media12345 because of its apparent association with health and behavior. The digit ratio is often claimed to be a surrogate for prenatal androgen exposure, based on cross sectional human studies and experimental animal studies.6789 These studies generally postulate that a lower 2D:4D reflects greater testosterone exposure (or greater testosterone-to-estrogen ratio), which accounts for men having lower 2D:4D than women.10 Variations in 2D:4D are also speculated to be rooted in genetic polymorphisms that influence testosterone metabolism and sensitivity.1112 However, little evidence supports this hypothesis in humans,1314 because prospective studies have reported a lack of consistent associations between androgen concentrations in the amniotic fluid or umbilical cord blood and 2D:4D in childhood and adulthood.151617

Fig 1

Cumulative number of PubMed indexed papers about the second-to-fourth digit ratio (2D:4D). The following search was performed to identify papers published each year, with an example of 2020: (“2D:4D” or “digit ratio”) AND ((“2020”[date-publication]: “2020”[date-publication])). This search might not capture papers that refer to prenatal testosterone or similar concepts in the abstract but use 2D:4D in the methods
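The per-year search described in this legend can be generated programmatically. The following is a minimal sketch in Python, not the authors' script: the function names `pubmed_query` and `cumulative` are hypothetical, the query string copies the legend's syntax using straight quotes, and the yearly counts are assumed to be supplied by the reader after running each search.

```python
def pubmed_query(year: int) -> str:
    """Build the per-year search string quoted in the figure legend."""
    terms = '("2D:4D" or "digit ratio")'
    date_range = f'(("{year}"[date-publication]: "{year}"[date-publication]))'
    return f"{terms} AND {date_range}"


def cumulative(counts_by_year: dict[int, int]) -> dict[int, int]:
    """Turn yearly paper counts into a running total, in year order."""
    total = 0
    running = {}
    for year in sorted(counts_by_year):
        total += counts_by_year[year]
        running[year] = total
    return running


if __name__ == "__main__":
    # Print the query for the example year given in the legend.
    print(pubmed_query(2020))
```

Passing the counts returned by each yearly search to `cumulative` would give the kind of running total plotted in a cumulative publication curve.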

Despite a lack of solid physiological justification for studying 2D:4D, an abundance of studies claim that this anthropometric remnant of the prenatal hormonal environment relates to risk of disease in adulthood (that is, cancers18 and cardiometabolic disease192021), age of onset, prognosis, and treatment options.22 In 2020, 2D:4D was suggested to help “identify those for whom it would be advisable to exercise social distancing” to avoid contracting covid-19.23 However, thorough examination of the literature raises questions about the validity and reproducibility of 2D:4D research. It is implausible to think that one biomarker in utero not only predicts risk of myocardial infarction and age of onset,24 but also is associated with the likelihood of a person becoming a firefighter,25 having musical ability,2627 showing pro-environmental consumption behavior,28 having a sense of directionality,29 being successful at Sumo wrestling,30 being obsessed with celebrities,31 or making a specific choice of Coca-Cola products from a vending machine.32

Associations between 2D:4D and various outcomes are generally justified through (tenuous) biological explanations, but the possibility of spurious correlations is seldom considered. Digit ratio studies almost always include many comparisons, which increases the likelihood of false positive findings.33 The digit ratio is also easy to measure, facilitating its inclusion in larger studies, where attempts can be made to correlate it to many other metrics (eg, the BBC internet study34). This practice of including bonus factors without sufficient scientific justification is known to produce spurious associations in other areas of research.35 Even if most studies produce null results, selective reporting and publication bias can create the appearance of a consistent positive effect.36 Thus, research involving 2D:4D could provide a prime example of the reproducibility crisis in medicine and science, and of the perpetuation of research based on weak scientific hypotheses,37 non-rigorous methodology, and an over-dependence on confirming hypotheses from weak (potentially spurious) correlations.

With previous research determining that 2D:4D is related to a diversity of outcomes that seem to ultimately shape one’s decisions in life and fate, we aimed to explore the extent to which 2D:4D is associated with good luck (using a randomly drawn poker hand selected by each participant as a surrogate measure). To put this in the context of clinically relevant outcomes, we also sought to determine whether 2D:4D was related to body composition parameters, which could plausibly be related to prenatal androgen exposure. We hypothesized that, by random chance alone, 2D:4D would show a statistically significant relation with good luck, similar in magnitude to body composition parameters. We performed this study to demonstrate that random chance alone could produce seemingly convincing results, rather than to validate the use of 2D:4D as a biomarker.


Study design

An abridged methods section is presented here (full details are provided in appendix 1). This cross sectional study was approved by the institutional review board at High Point University (High Point, NC, USA), and was carried out from May 2015 to February 2017. A priori power analyses are not typically performed to determine sample size in 2D:4D research, so our study sample was based on the time and resources the principal investigator could allot to data collection. A total of 176 individuals gave written informed consent and enrolled in the study. Research participants visited the laboratory in person and underwent body composition testing, had their finger lengths measured using two different procedures, and performed a procedure designed to be a surrogate of good luck.

Study population

Adults aged over 18 years were recruited within a university setting, in a sports performance laboratory (High Point University), including students, faculty, and staff. Individuals with a history of any musculoskeletal or rheumatic diseases, injuries, or surgeries that influenced hands or fingers bilaterally were excluded from the study. Individuals with a history of unilateral hand or finger injury (eg, previous fracture) were allowed to participate, but data from the injured hand or individual fingers were excluded from analysis.

Body composition assessment

We measured bone mineral content, bone mineral density, and body fat percentage using dual energy x ray absorptiometry (DXA) on a Hologic Discovery W scanner (Hologic, Marlborough, MA). Calibration and scan procedures were performed in accordance with manufacturer recommendations (appendix 1).

Digit ratio data collection

Digit ratios were measured by two different procedures in accordance with best practice recommendations (appendix 1). Figure 2 shows example images. Participants were instructed to lightly place their hand on a standard photocopier (MX-3570N, Sharp Electronics, Montvale, NJ), and the researcher then scanned one hand at a time. Digital images were captured and delivered electronically to the research team. To minimize radiation exposure and optimize time efficiency, we also used DXA to obtain images of the phalangeal bones. We used the scanner’s lumbar spine analysis software to capture an image with sufficient detail to identify the landmarks of the phalangeal bones.

Fig 2

Example hand images used to measure the second-to-fourth digit ratio (2D:4D). Both images are the right hand of the same participant; yellow lines represent second and fourth digit measurements. (A) Photocopy image: 2D:4D=0.966. (B) Radiographic image: 2D:4D=0.943 (radiographic image has been horizontally flipped to be in the same orientation as the photocopy)

Good luck measurement

Participants were asked to select five cards from a deck of playing cards (United States Playing Card Company, Erlanger, KY). Cards were thoroughly shuffled by an investigator and then fanned out, face down, onto a smooth, flat surface. The participant was then requested to select any five of the face down cards and flip them over. The value and suit of the cards were recorded. This procedure was repeated a second time with a separate, shuffled deck of playing cards, which resulted in each participant having two separate, randomly selected, five card poker hands, drawn from two separate decks of cards.
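The card drawing procedure can be mimicked in software to see how such a "luck" measure behaves. The sketch below is an illustration only, not the physical procedure used in the study: the two-character card notation (rank then suit, eg, `TS` for the ten of spades) and the function names are assumptions, and a seeded pseudo-random generator stands in for the investigator's shuffle and the participant's choice.

```python
import random

RANKS = "23456789TJQKA"  # T = ten
SUITS = "CDHS"           # clubs, diamonds, hearts, spades


def new_deck() -> list[str]:
    """Return a standard 52 card deck in a fixed order."""
    return [rank + suit for rank in RANKS for suit in SUITS]


def draw_hand(rng: random.Random, size: int = 5) -> list[str]:
    """Shuffle a fresh deck, then pick any `size` of the face-down cards."""
    deck = new_deck()
    rng.shuffle(deck)
    return rng.sample(deck, size)


def draw_two_hands(rng: random.Random) -> tuple[list[str], list[str]]:
    """Two hands from two separately shuffled decks, as in the protocol."""
    return draw_hand(rng), draw_hand(rng)


if __name__ == "__main__":
    first, second = draw_two_hands(random.Random(2021))
    print(first, second)
```

Because each hand comes from its own deck, the same card can appear in both of a participant's hands, which matches the two-deck design described above.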

Poker hands were classified and ranked according to standard poker rules (that is, royal flush as the highest hand, single high card as lowest hand). Each individual’s highest ranking hand was then selected, and ranked in relation to all other participants at the completion of the study. For instance, the best poker hand in the study’s dataset (a nine high straight) was given the top rank of 1. In the event of a tie, both hands were given the same rank, and the ranking below was given a rank two units below those (eg, if two participants had identical hands, and both were ranked 37th, the next hand below them would be ranked 39th).
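The classification and tie handling described above can be sketched in code. This is one possible implementation under stated assumptions, not the procedure the investigators used: cards are written as rank plus suit (eg, `9C`), the helper names `hand_key` and `competition_ranks` are hypothetical, and hands that differ only in suit are treated as identical.

```python
from collections import Counter

RANK_VALUES = {r: v for v, r in enumerate("23456789TJQKA", start=2)}


def hand_key(cards: list[str]) -> tuple:
    """Return a sortable key for a five card hand; a larger key is a stronger hand."""
    values = sorted((RANK_VALUES[c[0]] for c in cards), reverse=True)
    counts = Counter(values)
    # Put the most repeated values first (pairs, trips), then higher values.
    ordered = sorted(counts, key=lambda v: (counts[v], v), reverse=True)
    shape = sorted(counts.values(), reverse=True)
    flush = len({c[1] for c in cards}) == 1
    straight = len(counts) == 5 and values[0] - values[4] == 4
    if values == [14, 5, 4, 3, 2]:  # ace-low straight
        straight, ordered = True, [5, 4, 3, 2, 1]
    if straight and flush:
        category = 8  # straight flush (a royal flush is the ace-high case)
    elif shape == [4, 1]:
        category = 7  # four of a kind
    elif shape == [3, 2]:
        category = 6  # full house
    elif flush:
        category = 5
    elif straight:
        category = 4
    elif shape == [3, 1, 1]:
        category = 3  # three of a kind
    elif shape == [2, 2, 1]:
        category = 2  # two pair
    elif shape == [2, 1, 1, 1]:
        category = 1  # one pair
    else:
        category = 0  # high card
    return (category, tuple(ordered))


def competition_ranks(keys: list[tuple]) -> list[int]:
    """Rank 1 is best; tied hands share a rank and the following ranks are skipped."""
    tally = Counter(keys)
    rank_of, seen = {}, 0
    for key in sorted(tally, reverse=True):
        rank_of[key] = seen + 1
        seen += tally[key]
    return [rank_of[k] for k in keys]
```

Under this sketch, a participant's best hand would be `max(hand_key(first), hand_key(second))`, and passing every participant's best key to `competition_ranks` reproduces the tie rule in the example above (two hands tied at 37th, the next hand 39th).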

Digit ratio measurement

All measurements were performed in Adobe Photoshop using the measure tool. We used mouse guided calipers to measure the second and fourth digits of each hand in accordance with previous recommendations.38 For photocopies, the center of the proximal skinfold nearest the metacarpophalangeal joint was the first point of measurement, and the center of the distal fingertip was the second point. For radiographic (DXA) images, the center of the base of the proximal phalanx was the first point of measurement, and the center of the distal tip of the distal phalanx was the second point.
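The two-point measurement reduces to a simple calculation: each digit length is the straight line distance between its proximal and distal landmarks, and 2D:4D is the ratio of the two lengths. The sketch below assumes landmarks are available as (x, y) pixel coordinates exported from the image; the function names and coordinate format are hypothetical and are not taken from the study.

```python
import math

Point = tuple[float, float]


def digit_length(proximal: Point, distal: Point) -> float:
    """Euclidean distance between the two landmarks of one digit."""
    return math.dist(proximal, distal)


def digit_ratio(second: tuple[Point, Point], fourth: tuple[Point, Point]) -> float:
    """2D:4D from (proximal, distal) landmark pairs for digits two and four."""
    return digit_length(*second) / digit_length(*fourth)
```

Because the result is a ratio of two lengths from the same image, the pixel scale cancels out, so no calibration to millimeters is needed for this step.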

To minimize risk of rater bias or erroneous measurements, each image was measured by at least two trained raters, each blinded to measurements by the other raters.39 2D:4D for each hand was computed by the photocopy and radiograph techniques. The percentage difference between raters for each 2D:4D measurement was then computed as follows: (rater 1−rater 2)÷(maximum value of rater 1 or rater 2)×100%. If a ≥2.0% difference was noted, a third blinded rater repeated the measurement on that image.
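The rater comparison above can be written as a short check. This is a sketch under one explicit assumption: the difference is taken as an absolute value so that the order of the raters does not matter, which the formula as printed does not state. The function names are hypothetical.

```python
def percent_difference(rater1: float, rater2: float) -> float:
    """Difference between two 2D:4D values as a percentage of the larger one."""
    # Absolute value is an assumption; the printed formula is rater 1 minus rater 2.
    return abs(rater1 - rater2) / max(rater1, rater2) * 100.0


def needs_third_rater(rater1: float, rater2: float, threshold: float = 2.0) -> bool:
    """True when the disagreement reaches the 2.0% threshold used in the study."""
    return percent_difference(rater1, rater2) >= threshold
```

For example, hypothetical ratings of 0.95 and 0.96 differ by about 1.04% and would pass, whereas 1.00 and 0.97 differ by 3.0% and would trigger a third measurement.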

Statistical analysis

Statistical analysis was performed by using the average 2D:4D from at least two raters for each hand and each procedure (photocopy and radiography). Significant relations have been reported between 2D:4D and various outcomes when male and female individuals are combined into one group.64041 In other instances, the relation between 2D:4D and an outcome is only statistically significant when men and women are analyzed separately.2742 Therefore, we performed all analyses both ways—with sexes combined and separated.

In our (facetious) effort to persuade less statistically savvy readers of the validity of our statistical analyses, we report as many P values as possible, even where they are not necessary (eg, descriptive statistics). All statistical analyses were performed in SPSS version 27.0 and a priori statistical significance was set at P≤0.05.

Inter-rater agreement

To determine reliability of the most consistent raters for a given 2D:4D, a one-way random intraclass correlation coefficient was computed for each hand using each technique. Additionally, the mean inter-rater percentage differences were computed for each measurement technique for each hand.
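For readers unfamiliar with the statistic, a one-way random intraclass correlation can be computed from a one-way analysis of variance across images. The sketch below shows the single-measure form, ICC = (MSB − MSW) ÷ (MSB + (k − 1) × MSW), where MSB and MSW are the between-image and within-image mean squares and k is the number of raters. Whether the study reported the single-measure or average-measure form is not stated here, so this choice is an assumption, and the code is not the SPSS routine used by the authors.

```python
def icc_one_way(ratings: list[list[float]]) -> float:
    """Single-measure one-way random ICC.

    `ratings` has one row per image and one column per rater.
    """
    n = len(ratings)      # number of images
    k = len(ratings[0])   # number of raters per image
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-image mean square
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-image (between-rater) mean square
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Values near 1 indicate that almost all variation in 2D:4D lies between images rather than between raters measuring the same image.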

To determine the relation between 2D:4D and sex, hand, and measurement technique, linear mixed effects models were computed by use of a scaled identity as the repeated measures covariance structure. Sex (male v female), hand (left v right), measurement technique (photocopy v radiograph), and all two way interactions between the three factors served as categorical predictors, and 2D:4D served as the dependent variable. A similar linear mixed effects model was also used to compare age, height, and body mass between sexes.

For correlations between 2D:4D and body composition parameters and good luck, preliminary analysis showed that the continuous body composition outcome variables were not normally distributed. Therefore, Spearman’s rs was computed for all correlations.
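Spearman’s rs is the Pearson correlation computed on ranks, which is why it suits outcomes that are not normally distributed (and ranked outcomes such as poker hand rank). The following is a minimal pure Python sketch with tied values given their average rank; it is not the SPSS procedure used in the study, it does not compute a P value, and the function names are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank from 1 (smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i : j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rs(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Any monotonic relation, linear or not, gives rs of 1 or −1, so the statistic captures ordering rather than the size of differences.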

Patient and public involvement

Conversations with members of the public inspired this study, because many indicated that they had seen 2D:4D mentioned in social media; some believed that scientific research had confirmed that 2D:4D had real life applications, while others expressed doubt that 2D:4D could predict anything. However, patients or the public were not directly involved in this study because of limited resources. A member of the public read this manuscript after submission.


The analysis included 176 individuals (102 men and 74 women). Men were significantly older (age +1.5 years (95% confidence interval 0.2 to 2.9), P=0.03), heavier (body mass +16.0 kg (12.9 to 19.1), P<0.001), and taller (height +0.16 m (0.10 to 0.22), P<0.001) than women. A total of 690 hand images (346 photocopy, 344 radiograph) were included in the analysis. The left hands from one female and one male participant were excluded from analysis, because of a history of broken digits on those hands. Four photocopies (two left hands for men, one left and right for a woman) and six radiographs (three left and three right for men) were not analyzed for technical reasons (that is, missing scan, digit landmarks not clearly visible).

Reliability and rater agreement

All intraclass correlation values were more than 0.90 and the mean percentage difference between raters was less than 1.0% for all 2D:4D measurements, indicating excellent agreement between raters (table 1).

Table 1

Reliability of rater measurement of the second-to-fourth digit ratio, by measurement technique and participant hand measured. Data are mean (95% confidence interval)


Comparison of 2D:4D by sex, hand, and measurement technique

A summary of 2D:4D data is provided in table 2. Some significant differences were seen: men had a lower 2D:4D than women (P<0.001), the left hand had a lower 2D:4D than the right hand (P=0.004; but a two way interaction showed this association was significant for the photocopy technique only), and the radiograph technique produced lower 2D:4D measurements than the photocopy technique (P<0.001). A significant interaction between measurement technique and hand measured (P=0.006) was noted, but the interactions between sex and measurement technique (P=0.70) and between sex and hand measured (P=0.92) were not significant. Despite significant differences in mean 2D:4D, the distribution of 2D:4D between sexes, hands, and techniques showed considerable overlap (fig 3).

Table 2

Summary of second-to-fourth digit ratios (2D:4D) by sex, measurement technique, and participant hand measured. Data are mean (95% confidence interval). 95% confidence intervals for radiograph ratios for men are equal to three decimal places

Fig 3

Histograms of second-to-fourth digit ratios (2D:4D) by participant hand measured, sex, and measurement technique. Despite statistically significant mean differences between men and women for each hand and each technique, the histograms had considerable overlap. (A) Left hand, photocopy technique; (B) right hand, photocopy; (C) left hand, radiograph; (D) right hand, radiograph

Relation between 2D:4D and outcome measures

Correlations between 2D:4D and body composition parameters and good luck ranking are presented in table 3, figure 4, and figure 5. Many statistically significant correlations were seen, with rs ranging in magnitude from 0.16 to 0.28; the strongest correlation was between the right hand 2D:4D measured by the radiograph technique and poker hand rank in men. However, results were not always consistent between measurement techniques. For instance, the correlation between right hand 2D:4D and bone mineral content in women was significant for the photocopy technique (rs=0.23, P=0.05), but not for the radiograph technique (rs=0.08, P=0.5). In some instances, only men showed a significant correlation (eg, between left and right hand 2D:4D measured using the radiograph technique with bone mineral density), while in others, correlations were only significant when sexes were combined (that is, left and right hand 2D:4D measured using the radiograph technique with body fat percentage).

Table 3

Relations between second-to-fourth digit ratios (2D:4D) and age, body composition parameters, and good luck (using poker hand rank as a surrogate)

Fig 4

Scatter plots of second-to-fourth digit ratios (2D:4D) versus bone mineral density by participant sex and hand measured. Only 2D:4D measured by radiographic technique are presented, because the ratios measured by photocopy did not show significant results. (A) left hand, women; (B) right hand, women; (C) left hand, men; (D) right hand, men

Fig 5

Scatter plots of second-to-fourth digit ratios (2D:4D) versus poker hand rank by participant hand measured and sex. Only 2D:4D measured by radiographic technique are presented, because the ratios measured by photocopy did not show significant correlations. (A) Left hand, women; (B) right hand, women; (C) left hand, men; (D) right hand, men


This study intended to explore whether researchers can get lucky in finding statistically significant associations between a biomarker and various outcomes of interest, and whether these relations might reflect random chance rather than biological cause and effect. Failure to recognize these common research pitfalls (eg, scientifically unjustified hypotheses, weak experimental and statistical methodology, and improper reporting; box 1) can allow false positive findings to masquerade as evidence to support unsound theories. We focus on the 2D:4D example, but we urge researchers and clinicians to be especially vigilant when interpreting data from biomarker association studies.

Box 1

So-called “pitfalls” to avoid in research on second-to-fourth digit ratios (satirical)

Pre-registration of study protocol

  • Substantially reduces flexibility for defining what the primary outcome is, and also creates rigidity in deciding on participant groups and statistical analyses

Performing or reporting a priori power analysis

  • Removes flexibility in determining or adjusting sample size and weakens claims of non-significant trends being meaningful

Detailed accounting for multiple comparisons

  • Could change statistically significant findings into non-significant ones, which makes for a less interesting (and perhaps less publishable) paper. If multiple comparisons are requested, simply state that they were done and avoid disclosing the denominator used and specifics of how it was determined

Reporting negative findings from other outcomes

  • Provides evidence that might contradict evidence otherwise supporting the hypothesis, and might also undermine the appearance of consistent findings within the literature


Readers unaware of our study’s intent could interpret our results as showing that prenatal testosterone influences body composition in men (maximum rs=0.26), but not as much as it influences good luck (rs=0.28). These results accord with much of what is reported in the 2D:4D literature, including similar magnitude correlations (that is, rs=0.15 to 0.35), lower 2D:4D associated with desirable metrics of performance (that is, better body composition and poker hand), a sex specific effect, and greater association for the right hand than for the left. If a study’s reported findings are similar to an existing body of research, it might be easy to overlook multiple fallacies and assume that statistically significant findings represent real effects. Thus, our findings could be used (inappropriately) to support theories claiming that prenatal testosterone exposure influences adulthood traits, and they could validate an unfounded hypothesis that 2D:4D might be predictive of future luck.

If we were to interpret these findings seriously, we might suggest that men with a low 2D:4D should participate in activities where good luck is an important contributor to success, while those with high 2D:4D abstain from purchasing lottery tickets. Although we report a lower 2D:4D to be associated with good luck, some studies report that low 2D:4D is also associated with some cancers,18 and mathematical analysis suggests they might be due to bad luck.4344 Thus, we might postulate that 2D:4D does not have a direct causative influence on one’s luck, but rather influences behaviors that modulate luck (that is, carrying a lucky rabbit’s foot, frequently interacting with black cats).45 This explanation is indeed ridiculous, but scientific “just so” stories are commonly used to explain chance findings and make them fit within an existing paradigm.46 We would also caution readers that the good luck described in our study does not necessarily translate to the act of “getting lucky,” although previous research indicates that 2D:4D is associated with sexual attractiveness in social situations.47

It is problematic when a small subset of positive findings from a larger pool of multiple comparisons is simply assumed to be physiological cause and effect, and spurious correlations are not considered a possibility. Sufficient multiple comparisons allow a reasonable likelihood of finding a difference in group means or a correlation with P<0.05, which makes it difficult to disentangle true relations from random chance, especially in the absence of a strong mechanistic hypothesis. Visual examination of figure 4 provides seemingly convincing evidence of a clear relation between 2D:4D and bone mineral density in men (physiologically plausible), but an equally convincing relation is apparent for good luck in figure 5 (clearly spurious). Such chance findings can partly explain why 2D:4D in one hand, but not the other, is significantly associated with the outcome of interest in many studies, and why the more predictive hand is inconsistent between studies. As an example, one meta-analysis relating 2D:4D to athletic performance concludes, “under some circumstances yet to be identified, left hand 2D:4D systematically out-predicts right hand 2D:4D whereas the opposite is true under other circumstances.”48
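The effect of multiple comparisons on false positives can be made concrete with a short calculation. The sketch below assumes the tests are independent and that every null hypothesis is true, which is a simplification (correlated outcomes, as in this study, would give a somewhat different figure); the function names are hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one P < alpha among n independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    for n in (1, 5, 14, 50):
        print(n, round(familywise_error_rate(n), 3), bonferroni_alpha(n))
```

Under these assumptions, 14 comparisons are already enough for the chance of at least one nominally significant result to exceed 50%, which is why the number of comparisons and any correction applied need to be reported.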

Strengths and weaknesses of the study

A strength of this study is that we intentionally performed this study in the same way that hundreds of other 2D:4D studies have been conducted. However, our study attempted to determine whether spurious relations (with poker hand) could easily occur using best practice 2D:4D measurements and how distinct they would be from seemingly physiologically plausible ones (with body composition).

This study had multiple limitations consistent with many 2D:4D studies, as well as medical research attempting to reach conclusions based on correlations or simplistic between-group differences. For example, we did not have a scientific justification for our sample size, which can facilitate misleading results. Although we report significant P values, separate sex analyses were actually underpowered (appendix 2). In underpowered studies, significant findings might be blindly accepted as real, even though they are more likely to be spurious.49

This trial was not pre-registered, which is not uncommon for cross sectional studies. Without predefined hypotheses, outcomes, and statistical analyses (including adjustments for covariates and multiple comparisons), readers cannot determine whether significant results were achieved through flexible methodology (eg, various forms of P hacking) and selective reporting. We actually performed different procedures that generated random numbers (appendix 2), and simply got lucky in that our most fun and interesting procedure (poker hand), had some significant P values (as did all others for at least one comparison). We purposefully omitted these details from the methodology to make our point—even ridiculous hypotheses can be confirmed with sufficient multiple comparisons, and the apparent validity of these results can be biased through selective reporting. Without pre-registration, reported results might be the endpoint of flexible analyses (see example in fig 6). Our results could have appeared even more convincing if we only reported on 2D:4D measured on radiograph, which would have seemingly provided fewer multiple comparisons. Even in registered clinical trials, selective reporting and outcome switching are not uncommon.5051

Fig 6

Hypothetical algorithm for identifying statistically significant relations between second-to-fourth digit ratios (2D:4D) and any outcome measure


Our results suggest that a lower 2D:4D, purportedly indicative of greater prenatal testosterone exposure, is associated with favorable body composition parameters and also good luck. When interpreted in the context of the 2D:4D literature, this finding provides further evidence that 2D:4D might be a universal biomarker of one’s fate. In reality, our statistically significant results are actually spurious, and raise the possibility that other claims regarding 2D:4D’s association with human health and behavior might also be false positive findings owing to weak experimental and statistical methodology.

The 2D:4D literature provides a valuable example of the necessity for research to have a physiologically sound justification, registered a priori hypotheses with detailed data analysis plans, and publicly available datasets (when feasible) to have a pathway toward clinical relevance. Appropriately powered replication studies and publication of non-significant findings are essential to ensure that poor quality research does not dominate a given field to provide an appearance of a strong body of evidence and spawn biologically unjustified medical recommendations. Before concluding that weak correlations confirm a hypothesis, researchers should consider the possible existence of false positive findings—a dangerous artifact of statistical good luck.

What is already known about this topic

  • Second-to-fourth digit ratios (2D:4D) are commonly used as a surrogate for prenatal exposure to testosterone, although the evidence for this association in humans is weak

  • Many studies have linked 2D:4D to various aspects of physical and mental health, with proponents of the measurement suggesting that it should be incorporated into clinical practice

What this study adds

  • A lower 2D:4D is associated with lower body fat percentage, greater bone mineral content, greater bone mineral density, and greater good luck, especially in men

  • Spurious associations (that is, false positive findings) are likely to account for statistically significant findings in situations where a weak physiological basis exists for a relation between a predictor and outcome

Ethics statements

Ethical approval

This study was approved by the High Point University Institutional Review Board (Protocols #201301-153 and #201505-374). All research participants provided written informed consent.

Data availability statement

The full dataset used for this study is available in the supplementary material.


JMS thanks his department for supporting him in investing his time doing this type of research.


  • Contributors: JMS conceived this idea after reviewing the digit ratio research and becoming suspicious that one biomarker could predict pretty much anything. JMS recruited LKF, JSS, AAG, and FJ to become involved in this project after convincing them that a study about fingers and playing cards was important science. JMS was the principal investigator and led the research design, statistical analysis, data visualization, and manuscript writing and revision. LKF, JSS, and AAG substantially contributed to investigation (data collection and subject recruitment), general project administration, and data curation, and contributed to writing aspects of this manuscript. FJ contributed to data curation and validation. All authors had full access to the data in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis. JMS is the guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

  • Funding: This study received no specific funding.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • The lead author affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

  • Dissemination to participants and related patient and public communities: The principal investigator will directly contact freelance journalists that he has worked with previously, who have written articles about his research for major media outlets; pitch a plain language summary for mainstream media websites where he has written previously (The Conversation or Scientific American blogs), as well as other similar websites; and provide a Twitter thread on the topic.

  • Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

