CCBYNC Open access
Research Christmas 2015: All in the Mind

Debunking the curse of the rainbow jersey: historical cohort study

BMJ 2015; 351 doi: http://dx.doi.org/10.1136/bmj.h6304 (Published 14 December 2015) Cite this as: BMJ 2015;351:h6304
  1. Thomas Perneger, clinical epidemiologist
  1. Division of Clinical Epidemiology, Geneva University Hospitals, and Faculty of Medicine, University of Geneva, CH-1211 Geneva, Switzerland
  1. Correspondence to: T Perneger thomas.perneger{at}hcuge.ch
  • Accepted 23 October 2015

Abstract

Objective To understand the underlying mechanism of the “curse of the rainbow jersey,” the lack of wins that purportedly affects the current cycling world champion.

Design Historical cohort study.

Setting On the road.

Participants Professional cyclists who won the World Championship Road Race or the Tour of Lombardy, 1965-2013.

Main outcome measures Number of professional wins per season in the year when the target race was won (year 0) and in the two following years (years 1 and 2; the world champion wears the rainbow jersey in year 1). The following hypotheses were tested: the “spotlight effect” (that is, people notice when a champion loses), the “marked man hypothesis” (the champion, who must wear a visible jersey, is marked closely by competitors), and “regression to the mean” (a successful season will be generally followed by a less successful one).

Results On average, world champions registered 5.04 wins in year 0, 3.96 in year 1, and 3.47 in year 2; meanwhile, winners of the Tour of Lombardy registered 5.08, 4.22, and 3.83 wins. In a regression model that accounted for the propensity to win of each rider, the baseline year accrued more wins than did the other years (win ratio 1.49, 95% confidence interval 1.24 to 1.80), but the year in the rainbow jersey did not differ significantly from other cycling seasons.

Conclusions The cycling world champion is significantly less successful during the year when he wears the rainbow jersey than in the previous year, but this is best explained by regression to the mean, not by a curse.

Introduction

Samuel Johnson chided doctors for believing that if a patient got better it was because they sent him to the waters, for mistaking “subsequence for consequence.”1 The alternative explanation—that patients consult when they feel poorly, and most get better regardless of treatment—requires a grasp of random variation. Mostly, we struggle with randomness.2

Doctors are not the only culprits. Consider professional cycling and the “curse of the rainbow jersey.”3 The “rainbow” jersey is worn by the current cycling world champion (it is an odd rainbow: the jersey is white, with bands of blue, red, black, yellow, and green across the chest). In 1965 British cyclist Tom Simpson won the World Championship Road Race, then broke his leg while skiing during the following winter and lost his 1966 season to this and other injuries. In the ensuing years, champion after champion encountered all manner of misery while wearing the jersey: injury, disease, family tragedy, doping investigations, even death, but especially a lack of wins.3 It soon became obvious that the rainbow jersey was cursed.

Several explanations can be entertained. One is that the world champion is as likely to encounter difficulties as anyone, but, as he is the champion, people notice more. This is the “spotlight effect.” Another explanation is that the world champion, very noticeable in the rainbow jersey, is marked more closely by rivals, which lowers his chances of winning. This is the “marked man hypothesis.” Finally, random variation in success rates ensures that a very successful season, such as one during which the rider has won a major race, is likely to be followed by a less successful season. This is the “regression to the mean” phenomenon.4 In this study, I explored to what extent these hypotheses are supported by racing results of cycling champions.

Methods

The study population included winners of the Union Cycliste Internationale men’s World Championship Road Race from 1965 to 2013 and, for comparison, the winners of the Tour of Lombardy of the same years. The latter race is of comparable importance—it is one of five “monuments” among classic one day races—and takes place at the end of the racing season, just like the World Championship.

The outcome variable was the number of individual wins in professional races during a given year, obtained from a publicly accessible database (www.procyclingstats.com). Win counts were obtained for three calendar years: year 0, at the end of which the rider won the target race (World Championship or Tour of Lombardy); year 1, during which the world champion wore the allegedly cursed jersey; and year 2, when all riders returned to curse-free status.

Study hypotheses

The hypothesised patterns for the average numbers of wins are (fig 1):

Figure1

Fig 1  Three hypotheses under consideration: expected average number of wins in year when race took place (year 0), following year (year 1), and year after that (year 2), for winner of World Championship Road Race (empty circles) and winner of Tour of Lombardy (full circles)

  • “Spotlight effect”—the problems of the world champion are apparent only because of increased media attention, so the numbers of wins remain at the same level for the three years.

  • “Marked man” hypothesis (indistinguishable from the rainbow curse)—a decrease in wins affects the current world champion, but this effect disappears in year 2 and does not affect the Lombardy winner.

  • “Regression to the mean”—year 0 is a high outlier, and the number of wins returns to a lower level in years 1 and 2. The pattern is identical for the Lombardy winner.

  • Combination of “marked man” and “regression to the mean.”

Statistical analysis

I tabulated the mean numbers of professional victories per rider and per year separately for winners of the World Championship and of the Tour of Lombardy. I used the Wilcoxon paired test for year to year comparisons.

I used mixed negative binomial regression to evaluate the hypotheses.5 The dependent variable was the annual number of wins. Each rider was afforded an individual tendency to win, represented below by the random intercept αi. The index “i” identified the rider and remained identical if a rider won more than one target race (for example, Eddy Merckx won five target races and contributed 15 data points). An annual win count appeared more than once if it counted towards more than one target win; for example, for a repeat champion, the win total for year 1 of the first title was also the win total for year 0 of the second title.

I built four models. The first (model 1) represented the “spotlight effect” and added to the random intercept a fixed effect for the race (World Championship=0, Tour of Lombardy=1): log(wins)=αi rideri1 Lombardy. The model of the “marked man” hypothesis (model 2) added a fixed effect for the year in the rainbow jersey (rainbow=1 for year 1 of the world champion, and=0 otherwise): log(wins)= αi rideri1 Lombardy+β2 rainbow. The model representing “regression to the mean” (model 3) included a fixed effect for the baseline year of both races (baseline=1 for year 0, and 0 for years 1 and 2): log(wins)= αi rideri1 Lombardy+β3 baseline. The fourth model (model 4) represented both the “marked man” and the “regression to the mean” hypotheses together: log(wins)= αi rideri1 Lombardy+β2 rainbow+β3 baseline. Regression coefficients β correspond to expected differences in logarithms of wins, and eβ express the ratio of wins. The a priori hypotheses put no constraint on β1 but required a negative β2 and a positive β3.

I used the Akaike information criterion to identify the best fitting model. The criterion equals 2k–2LL, where k is the number of parameters of each model and LL its log-likelihood.6 Each model included three parameters (two for the negative binomial distribution and one for the variance of the random intercept) in addition to parameters of the fixed effects. The analyses were run on Stata version 13.

Results

The dataset included annual win totals for 289 rider years: for each race, 49 results in year 0, 49 in year 1, and 46 (World Championship) or 47 (Tour of Lombardy) in year 2. Totals were lower in year 2 because winners in 2013 contributed only years 0 and 1 (the 2015 season was incomplete at the time of analysis), and three win totals were missing due to retirement of riders. Several riders won more than one target race, and 63 different riders contributed data: 40 riders had one target win, 14 had two wins, seven had three wins, one had four wins, and one had five (Merckx, triple world champion and double Lombardy winner). Six riders won both races in the same season.

Winners of both target races had similar annual numbers of wins: on average 4.18 (quartiles 1, 2.5, and 5) for world champions, and 4.37 (quartiles 1, 3, and 6) for Lombardy winners. Similarly, for winners of both races, the annual win total was higher in year 0 than in years 1 and 2 (table 1); the difference between year 0 and the following years was statistically significant, but the difference between years 1 and 2 was not.

Table 1

Mean number of professional racing wins for world champions and for Tour of Lombardy winners of preceding year

View this table:

The first regression model confirmed that the average number of annual wins did not differ significantly between world champions and Lombardy winners (table 2). Model 2 tested whether the year in the rainbow jersey was a special case; although the win ratio was less than 1, the reduction was small and statistically non-significant. Model 3 confirmed that the baseline year of both races was significantly more successful than the ensuing years. Model 4 confirmed that the rainbow year did not differ significantly from other years (this time the win ratio was above 1) but that the baseline year of either race was significantly more successful.

Table 2

Mixed negative binomial regression models with random rider specific intercept, and their goodness of fit statistics

View this table:

The comparison of goodness of fit statistics confirmed that models 3 and 4, which incorporated regression to the mean, were substantially better than models 1 or 2. The best fitting model was model 3, as it had the lowest value of the Akaike information criterion.

Discussion

The curse of the rainbow jersey probably does not exist. The current road racing world champion wins less on average than he did in the previous season, but this phenomenon is best explained by regression to the mean. The relative lack of success was not restricted to the season in the rainbow jersey but persisted in the following season and affected equally the winners of the Tour of Lombardy. There was nothing remarkable about the year spent wearing the rainbow jersey.

Nevertheless, this study may not rule out a curse entirely, as it tested only one facet of the curse—the decrease in wins. I found no good data about the personal problems of professional cyclists. Also, all wins were given even weight: if the world champion is cursed to winning only minor races, this analysis would have missed that. Finally, this analysis did not account for any changes in doping practices, for lack of reliable data. The possibility remains that cyclists dope until they win an important race and stop afterwards.

Regression towards the mean is unavoidable whenever the variable under study (here, sporting success) fluctuates over time, the correlation between consecutive observations is less than 1, and the baseline observation is defined by an arbitrarily high or low value (here, a season marked by an important win). Regression to the mean may explain, for instance, why patients who lose bone density in the first year are likely to reverse this trend at follow-up or why HIV related risk behaviours improve after enrolment into a prevention trial.7 8 This phenomenon occurs regularly in clinical medicine, research, and programme evaluation, as well as in other walks of life. For instance, some flight instructors believe that praising a pilot after a smooth landing is counterproductive but reprimanding a pilot after a rough landing leads to improvement.2 Their observation is correct—an extreme performance will be followed by a more average one—but the causal inference is not. Neither is this reaction particularly new. Quite possibly the proverb “Pride goeth before destruction” (King James Bible, Proverbs 16:18) should be credited with the first description of regression towards the mean, and not Francis Galton,9 who merely showed that chance and correlation, not the Lord or a large ego, were to blame.

What is already known on this topic

  • Professional cyclists, just like doctors, are prone to mistaking temporal sequence for causality

  • Cycling world champions seem to have a horrible year wearing the champion’s stripes (“the curse of the rainbow jersey”)

What this study adds

  • World champions win significantly less when they wear the rainbow jersey than during the previous year

  • However, this is no different from the following year and is similar to the experience of winners of the Tour of Lombardy

  • Regression towards the mean explains this pattern best

  • Yesterday’s winner is not cursed if he does not win again today (and, by analogy, the patient did not necessarily get better because the doctor prescribed mud baths)

Footnotes

  • Contributors: TP conducted the study, wrote the paper, and approved the version submitted for publication.

  • Funding: None.

  • Competing interests: The author has completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the author) and declares: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • Ethical approval: Not needed.

  • Transparency: The author affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

  • Data sharing: No additional data available.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.

References

View Abstract