Decline in sperm counts: an artefact of changed reference range of “normal”?
BMJ 1994; 309 doi: http://dx.doi.org/10.1136/bmj.309.6946.19 (Published 02 July 1994) Cite this as: BMJ 1994;309:19
- P Bromwich,
- J Cohen,
- I Steward,
- A Walker
- Midland Fertility Services, Court Parade, Aldridge WS9 8LT; Centre for Ecosystems Analysis and Management, University of Warwick, Coventry CV4 7AL; Nonlinear Systems Laboratory, Mathematics Institute, University of Warwick, Coventry CV4 7AL
- Correspondence to: Dr Cohen.
- Accepted 11 April 1994
Objective: To investigate a reported fall in sperm counts during 1940–90 in relation to the reduced lower reference value of “normal” during the same period by assuming the null hypothesis that no change had occurred in the probability distribution of the sperm concentration.
Design: Analysis by using various mathematical models of the probability distribution of sperm concentration, together with experimental data that supported a model employing a logarithmic distribution.
Subjects: 235 men presenting for stimulated in vitro fertilisation at Midland Fertility Services, Aldridge, in 1992 together with samples of 20 ejaculates from each of five men attending the same centre during 1992–3.
Results: The effect of the change in lower reference value for the “normal” sperm concentration (from 60×10⁹/l to 20×10⁹/l) depended on the probability distribution of the concentration in the population. If that distribution was normal or uniform, then very little of the reported decline was a consequence of the change in lower reference value. If it was heavily skewed, then most or all of the reported decline may have been a consequence of that change. The limited experimental data available indicate that the distribution was heavily skewed.
Conclusions: Depending on the actual distribution of sperm concentration in the population, the reported decline in concentration may have been accounted for entirely or in part by the change in lower reference value. The original evidence does not support the hypothesis that the sperm count declined significantly between 1940 and 1990.
Sperm concentrations in successive samples from one man, and aggregate data from many patients, are highly skewed and closer to a logarithmic distribution than a normal distribution
The evidence for a long term decline in sperm concentrations, based on historical data, is unconvincing
Lower reference values of normal (of 60×10⁹/l or 20×10⁹/l) should not be applied uncritically
The pattern of individual variability means that averages may be poor measures of fertility
Geometric means may be more appropriate clinical variables than arithmetic means but are unreliable and require validation
Carlsen et al performed an extensive analysis of historical data on human sperm concentrations.1 By using linear regression analysis on 61 different sets of data obtained between 1938 and 1990 they reported a significant (P<0.0001) decrease in mean sperm concentration, from 113×10⁹/l in 1940 to 66×10⁹/l in 1990, a decline of more than 40%. Their conclusion received widespread recognition, including coverage by the media.2 They also reported a marginal (P<0.03) decrease in mean seminal volume, from 3.4 ml in 1940 to 2.8 ml in 1990. As further support of the hypothesis of a decline in sperm quality, they noted that “the lower reference value for a ‘normal’ human sperm count has changed from 60×10⁶/ml in the 1940s to the present value of 20×10⁶/ml.” Note that their use of the term “sperm count” refers to concentration, not total numbers of sperms in an ejaculate, and in keeping with other authors we occasionally use it in this sense below.
In order to avoid bias, Carlsen et al restricted their study to men with proved fertility (39 studies) or “normal” men of unknown fertility (22 studies). However, the mean sperm concentration is very sensitive to the form of the probability distribution for sperm concentration, either in individuals or across the population. It is also sensitive to cut offs introduced by selection of subjects. Similar comments apply to other measures of fertility, such as the total number of sperms in an ejaculate, but in keeping with Carlsen et al we restrict attention to the concentration.
In particular, the reduction of the lower reference value for a normal sperm concentration is likely to lower the observed mean, because men with mean sperm concentrations between 20×10⁹/l and 60×10⁹/l would tend to be excluded in studies performed in the 1940s but be included in later studies.
To what extent might the apparent decline reported by Carlsen et al be a consequence of the change in the lower reference value? If lower sperm counts are more probable than high ones, then discarding subjects with low sperm counts has a disproportionate effect on the mean. Thus the effect of such a change depends on how the probability distribution of the sperm concentration interacts with the effects of averaging over accumulated results for individual patients or over accumulated results for a population.
The usual statistical models employed to analyse these questions are probability distributions, usually defined by a probability density function.3 The distribution is normal if its probability density function is the usual bell shaped curve4 and uniform if the probability density function is constant within some range and zero outside it. It is a power law distribution if the probability density function is proportional to x⁻ˢ for some positive constant s. When s=1 such a distribution is said to be logarithmic, because the cumulative probability distribution is given by a logarithm. Because the total probability must be 1, power law distributions must be reduced to zero at some upper cut off when s<2 or otherwise modified. Power law distributions, and in particular logarithmic ones, are heavily skewed towards lower values.
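The power law and logarithmic cases can be written out explicitly. The following is a sketch only: x denotes the sperm concentration, and the symbols a (lower cut off), b (upper cut off), and Z (normalising constant) are introduced here for illustration and are assumptions rather than notation taken from the analysis itself.

```latex
\[
\begin{aligned}
f(x) &= \frac{x^{-s}}{Z(s)}, \quad a \le x \le b,
  & Z(s) &= \int_a^b x^{-s}\,dx, \\[4pt]
\text{for } s = 1:\quad
F(x) &= \frac{\ln(x/a)}{\ln(b/a)},
  & \mu &= \frac{1}{\ln(b/a)} \int_a^b x \cdot x^{-1}\,dx
        = \frac{b-a}{\ln(b/a)} .
\end{aligned}
\]
```

Here F is the cumulative distribution, which is where the name “logarithmic” comes from, and μ is the mean of the truncated distribution. Because the density falls as x rises, lowering a adds a comparatively large amount of probability at small x and pulls μ down.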
Surprisingly little is known about the probability distributions of sperm concentrations, either for individuals or for populations. The effect on the mean of a change in the lower reference value depends sensitively on the underlying distribution. To illustrate this dependence we have considered four hypothetical cases: uniform, normal, logarithmic, and power law distributions.
For this analysis we assume a simple model as null hypothesis. This is that the probability distribution of sperm concentration across the male population remained unchanged between 1940 and 1990; that men with a concentration below 60×10⁹/l were rejected in the 1940s; but that only men with a concentration below 20×10⁹/l were rejected in the 1980s-90s. In practice the effect of exclusion would not be as clear cut, because individual variability implies that men who are included in a trial may subsequently produce ejaculates whose concentrations lie in the excluded range. This effect could be incorporated into a refined model. It would have a moderate quantitative effect on our calculations, but we would not expect it to change the main conclusions.
We analyse the effect of the change in lower reference value as follows. Given a probability distribution, we first cut it off at a lower limit of 60×10⁹/l and adjust parameters to reproduce the 1940 mean of 113×10⁹/l obtained by Carlsen et al. We call this the “assigned” mean. Leaving the parameters unchanged, we then include data values between 20×10⁹/l and 60×10⁹/l and recompute the mean. We refer to this as the “predicted” mean appropriate to the lower reference value adopted in 1990. The table gives the results. Mathematical details are available separately from JC.
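For the logarithmic case this procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the calculation used for the table: it assumes a density proportional to 1/x between a lower and an upper cut off, treats the upper cut off as the single adjustable parameter, and finds it by bisection. The function names are hypothetical, and all concentrations are in units of 10⁹/l.

```python
import math


def log_mean(lower, upper):
    """Mean of a density proportional to 1/x on [lower, upper]."""
    return (upper - lower) / math.log(upper / lower)


def fit_upper_cutoff(lower, assigned_mean, hi=10_000.0, tol=1e-9):
    """Find the upper cut off that reproduces the assigned mean.

    log_mean increases with the upper cut off, so bisection is enough.
    """
    lo = lower * (1 + 1e-9)  # just above the lower cut off
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_mean(lower, mid) < assigned_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Step 1: cut off at 60 and fit the parameter to the assigned 1940 mean of 113.
upper = fit_upper_cutoff(60.0, 113.0)

# Step 2: keep the parameter, lower the cut off to 20, and recompute the mean.
predicted = log_mean(20.0, upper)

print(f"upper cut off ~ {upper:.0f}, predicted mean ~ {predicted:.0f}")
```

Under these assumptions the fitted upper cut off comes out close to 190×10⁹/l and the predicted mean close to 76×10⁹/l.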
If the distribution is logarithmic, then the predicted 1990 mean is 76×10⁹/l, so that the change of the lower reference value accounts for almost all of the apparent decline in mean sperm concentration. With uniform or normal distributions the change in lower reference value cannot account for the apparent decline. For some power law distributions the change in the lower reference value leads to a bigger decline than that reported by Carlsen et al.
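The dependence on the exponent can be explored with a short sketch. It assumes a density proportional to x⁻ˢ between two cut offs, with the upper cut off fitted so that the mean above 60 equals 113 (units of 10⁹/l). The exponent s=2 is chosen here purely as an example and is not necessarily one of the cases in the table; the function names are hypothetical.

```python
import math


def power_law_mean(lower, upper, s):
    """Mean of a density proportional to x**(-s) on [lower, upper]."""

    def integral(p):
        # Integral of x**p over [lower, upper]; p = -1 gives a logarithm.
        if abs(p + 1.0) < 1e-12:
            return math.log(upper / lower)
        return (upper ** (p + 1.0) - lower ** (p + 1.0)) / (p + 1.0)

    return integral(1.0 - s) / integral(-s)


def predicted_mean(s, old_cut=60.0, new_cut=20.0, assigned=113.0):
    """Fit the upper cut off at the old lower cut off, then lower the cut off."""
    lo, hi = old_cut * (1 + 1e-9), 1e6
    if power_law_mean(old_cut, hi, s) < assigned:
        raise ValueError("assigned mean cannot be reached for this exponent")
    for _ in range(200):  # bisection on the upper cut off
        mid = 0.5 * (lo + hi)
        if power_law_mean(old_cut, mid, s) < assigned:
            lo = mid
        else:
            hi = mid
    return power_law_mean(new_cut, 0.5 * (lo + hi), s)


for s in (1.0, 2.0):
    print(f"s = {s}: predicted mean ~ {predicted_mean(s):.0f}")
```

With these assumptions s=1 gives a predicted mean of roughly 76×10⁹/l, and s=2 gives roughly 55×10⁹/l, which is below the 66×10⁹/l reported for 1990. For large exponents the assigned mean of 113 cannot be reached above a cut off of 60, which is why the sketch raises an error rather than returning a value.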
In general, any distribution that is heavily skewed towards smaller values will lead to a disproportionate decrease in the mean concentration in response to the change in the lower reference value without any actual change in the distribution of sperm concentration across the population. This is the main point of our argument, and it is not affected by fine details of models.
How are sperm concentrations actually distributed? Because of the widespread use of the mean sperm count (or concentration) as a clinical variable many papers have reported data on mean values. However, information on the distribution about the mean is scarce. Figure 2 in Carlsen et al (our fig 4 (left)) summarises how percentages of men with sperm concentrations in five ranges changed between 1930 and 1990. We show below that their data are consistent with our hypothesis that there has been no change in the distribution of sperm concentration in the population. Cohen considered possible distributions.5 The folklore observation that “twice as many sperms is as likely as half as many” suggests that the distribution is logarithmic. In order to offer (limited) experimental support for the skewness of sperm concentration data we shall discuss new data supplied by Midland Fertility Services, Aldridge. These data are of two kinds: (a) sperm concentrations for 235 men presenting for stimulated in vitro fertilisation at Midland Fertility Services, Aldridge, in 1992; (b) sperm concentrations in 20 ejaculates from each of five selected men attending the same centre during 1992–3.
Distribution for populations
Figure 1 shows the observed distribution of sperm concentrations. Figure 1 is drawn as a histogram, whose shape provides an approximation of the probability density function. To do this, the possible values of the concentration have been divided into intervals, or “bins,” and the number of men whose concentration falls in a given bin is plotted as the height of the corresponding vertical bar. The histogram clearly possesses a key feature of logarithmic and power law distributions. It is highly skewed towards low values.
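The binning step can be sketched as follows. The values in `sample` are hypothetical and were made up for illustration; they are not the Midland Fertility Services data, and the bin width of 50 is likewise an assumption.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count how many values fall in each bin [k*width, (k+1)*width)."""
    counts = Counter(int(v // bin_width) for v in values)
    n_bins = max(counts) + 1
    return [
        (k * bin_width, (k + 1) * bin_width, counts.get(k, 0))
        for k in range(n_bins)
    ]


# Hypothetical concentrations (10^9/l), for illustration only.
sample = [12, 18, 25, 31, 33, 47, 52, 58, 74, 96, 140, 210]

for low, high, n in histogram_counts(sample, 50):
    print(f"{low:>3}-{high:<3} {'#' * n}")
```

The bar heights fall away quickly as the concentration rises, which is the skew towards low values that a histogram of this kind is meant to reveal.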
Distribution for individuals
Figure 2 shows data consisting of 20 concentration measurements performed on each of five different men (identified by case numbers at Midland Fertility Services, Aldridge). These men were selected as potential donors and thus may have had higher mean sperm counts and concentrations than would be typical of the population as a whole. The distribution of sperm concentrations for individuals was highly skewed towards lower values. Because the number of observations was small the skewness was shown by the thinning out of bars at higher values as well as in the peaks at lower values. When the data were pooled (fig 3) the skewness became more apparent. For this analysis the relevant feature of figure 2 is that it shows considerable variability within individuals. The precise nature of that variability cannot be deduced with confidence from the limited data presented here, but it is likely to be skewed in the same general manner as the pooled data.
Figure 2 in Carlsen et al (our fig 4 (left)) shows how the percentages of men in their study whose sperm concentrations fell into one of five ranges varied between 1940 and 1990. The ranges were <20, 20–40, 41–60, 61–100, and >100 (all ×10⁹/l). There was a dramatic decline in the percentage for the highest range from 50% in 1940 to 16% in 1990. However, this apparent decline is consistent with the hypothesis that the distribution of sperm concentration had not changed but that the lower cut off had. If we assume a logarithmic distribution with an upper cut off of 190×10⁹/l as in the table, then figure 4 (right) shows how the percentages are predicted to change when the lower cut off is reduced from 60×10⁹/l to 20×10⁹/l. With a lower cut off of 60×10⁹/l the percentage in the highest range is 55%, but this drops to 28% when the lower cut off is changed to 20×10⁹/l. It is clear why: the inclusion in 1990 of large numbers of men with low sperm concentrations, who would have been excluded in 1940, drastically reduced the percentage in the higher ranges. There is an especially dramatic effect on the highest range, which is very broad.
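The predicted percentages can be approximated with a short sketch. It assumes a density proportional to 1/x between the lower cut off and an upper cut off of 190 (units of 10⁹/l), and it treats the range boundaries as the continuous values 20, 40, 60, and 100, which is a simplification of the integer ranges quoted above. The function names are hypothetical.

```python
import math


def log_cdf(x, lower, upper):
    """Cumulative probability of a 1/x density on [lower, upper]."""
    x = min(max(x, lower), upper)  # clamp to the support
    return math.log(x / lower) / math.log(upper / lower)


def range_fractions(lower, upper, edges=(20.0, 40.0, 60.0, 100.0)):
    """Fractions falling in <20, 20-40, 40-60, 60-100 and >100."""
    bounds = [0.0, *edges, math.inf]
    return [
        log_cdf(hi, lower, upper) - log_cdf(lo, lower, upper)
        for lo, hi in zip(bounds, bounds[1:])
    ]


labels = ["<20", "20-40", "40-60", "60-100", ">100"]
for cut in (60.0, 20.0):
    fractions = range_fractions(cut, 190.0)
    row = ", ".join(f"{name}: {100 * f:.0f}%" for name, f in zip(labels, fractions))
    print(f"lower cut off {cut:.0f}: {row}")
```

Under these assumptions the share in the highest range falls from a little over half to a little over a quarter, in line with the 55% and 28% quoted above; small differences are to be expected from how the range boundaries are handled.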
Our simple model assigns zero percentages to all ranges below the 1940 and 1990 cut offs. In contrast, figure 4 (left) shows non-zero percentages below these cut offs. The discrepancy can be explained in terms of individual variability, as discussed above. A man may on one or more occasions produce a sperm concentration high enough to be included in a study, even though his mean concentration is below the lower reference value that is in force. Some subsequent observations may then be lower than this value. The result would be to spread out the lower cut off assumed in our model.
These data suggest that logarithmic or power law distributions provide more appropriate models for sperm production than normal or uniform distributions. This break with tradition can to some extent be justified biologically. For example, we have found simple models of sperm recruitment to ejaculates that generate various power law distributions. Details of these are also available separately from JC.
Discussion and conclusions
The data from Midland Fertility Services, Aldridge, support the hypothesis that the probability distribution of sperm concentration, both in individuals and in populations, resembles a logarithmic distribution. In particular, they are heavily skewed towards lower values and do not resemble either a normal or a uniform distribution. The observed mean value for sperm concentration is therefore very sensitive to the choice of cut offs at lower values, that is, to the lower reference value for a normal sperm count (measured as a concentration). The analysis summarised in the table and figure 4 indicates that nearly all of the observed decline in mean sperm count may be a consequence of the reduction of the lower reference value and that the evidence presented by Carlsen et al for a decline in sperm quality is unconvincing.
Similar reasoning applies to any sufficiently skewed distribution so we would not expect improved data to change the general line of our argument. However, a decline that was considerably smaller than that reported by Carlsen et al could be consistent with our analysis and might be detectable with confidence, given better data. More extensive data are needed to establish with greater precision the probability distributions of sperm concentration in populations and in individuals.
It is standard to use arithmetic means of sperm counts and concentrations as clinical variables. However, if the hypothesis of near logarithmic distributions is confirmed, then the geometric mean would be a more appropriate statistic.
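The difference between the two statistics can be illustrated numerically. This sketch assumes a density proportional to 1/x between 20 and 190 (units of 10⁹/l) and represents it by evenly spaced quantiles rather than by patient data; the function name is hypothetical.

```python
import math
import statistics


def log_uniform_quantiles(lower, upper, n):
    """n evenly spaced quantiles of a 1/x density on [lower, upper]."""
    return [lower * (upper / lower) ** ((i + 0.5) / n) for i in range(n)]


values = log_uniform_quantiles(20.0, 190.0, 1000)

arithmetic = statistics.fmean(values)
geometric = statistics.geometric_mean(values)

print(f"arithmetic mean ~ {arithmetic:.1f}")
print(f"geometric mean  ~ {geometric:.1f}")
print(f"sqrt(20 * 190)  = {math.sqrt(20.0 * 190.0):.1f}")
```

For a distribution of this shape the geometric mean sits at the midpoint of the range on a logarithmic scale, the square root of the product of the two cut offs, whereas the arithmetic mean is pulled upwards by the sparse high values.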
The level of significance (P<0.0001) reported in the linear regression analysis of Carlsen et al represents only the confidence that the observed mean has changed. It does not indicate the cause of that change. It can be accounted for by a change in the lower reference value for normal sperm count, provided that the distribution for sperm production is sufficiently skewed towards lower values. In particular a change in sperm concentration from 113×10⁹/l to 76×10⁹/l can be entirely accounted for in this way by using a logarithmic distribution, which is supported by the available data. The remaining discrepancy between 76×10⁹/l and 66×10⁹/l is unlikely to be significant.
Instead of confirming the apparent decline in sperm count, as Carlsen et al assert, the change in lower reference value may well be responsible for it.