Falling sperm quality: fact or fiction? BMJ 1994; 309 doi: https://doi.org/10.1136/bmj.309.6946.1 (Published 02 July 1994) Cite this as: BMJ 1994;309:1
- S Farrow
Epidemiologists should be able to tell us whether sperm quality has changed over time, or at least whether the quantity has changed. In 1992 Carlsen and colleagues concluded that sperm concentration per unit volume had fallen by 40% over the previous 50 years.1 This finding led to much speculation about the cause: oestrogens or pesticides in meat or water were the popular culprits (Horizon, “Assault on the Male: a Horizon Special,” BBC, 1993 Oct 31). In an article in this week's journal Bromwich and colleagues argue that Carlsen et al applied the wrong form of analysis and that an artefact explains nearly all of the putative “fall” (p 19).2
Before we enter the modern debate it is worth being reminded that the hypothesis of falling sperm quality attracted most attention in the 1970s, but it was debated in the peaceful obscurity of the specialist journals.3 Macleod and Wang temporarily silenced that debate; on the basis of a 10 year study of over 15000 men they concluded that there had been no decline.4 They considered all the large scale studies and compared them with their unique series from New York, where men had been analysed in the same laboratory for 30 years, with the same selection criteria and the same analytical methods being used.
This time around, Carlsen and colleagues' paper in the BMJ attracted the attention of the non-medical media. The question, nevertheless, raises difficult issues for the armchair epidemiologist, not least because of the problem of gaining access to the specialist literature. A typical postgraduate medical library in Britain is likely to have less than 10% of the original 61 papers quoted by Carlsen et al. Few reviewers have the time, energy, or resources to retrieve the full list. Those of us who have read most of the papers find several grounds for criticising Carlsen and colleagues' paper.
The authors began by searching MEDLINE and Index Medicus. This method produces its own publication selection bias: it frequently fails to identify relevant articles in the scientific literature and leaves out books, reports, and other grey literature. The next problem was that of “patient” selection bias. Some men were examined before vasectomy; some were captured while their partners were attending antenatal clinics; some were volunteer donors participating in artificial insemination programmes (often medical students); and some were recruited as part of an occupational study in which a suspected hazard was being investigated. Others were recruited from infertility clinics but were included only if their partners subsequently became pregnant. Although Carlsen et al discussed selection bias, it would have been more helpful if the groups had been kept separate and analysed separately.
Bromwich et al are right to point out the theoretical problem of selection bias resulting from the changing definition of “normal” sperm counts, which may have led to some men being excluded from some of the earlier studies, particularly those of healthy “normal” volunteers. Given the number of such studies, however, this is unlikely to have substantially biased the results of Carlsen and colleagues' analysis.
Carlsen et al included studies irrespective of their sample size; many of them were so small that they would not normally be considered admissible as evidence. One study was of seven men5; 11 studies were of fewer than 20 men and 29 studies of fewer than 50 men. Carlsen et al weighted these studies according to the logarithm of the sample size, so that studies with small sample sizes were given greater weight than they deserved.
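The effect of the logarithmic weighting can be illustrated with a short sketch. The sample sizes below are hypothetical stand-ins chosen for illustration (only the seven-man and 4435-man figures come from the studies discussed above); the comparison weighting, proportional to sample size, is the usual inverse-variance-style choice for a mean:

```python
import math

# Hypothetical sample sizes (only 7 and 4435 appear in the editorial).
sample_sizes = [7, 20, 50, 500, 4435]

# Weighting by log(n), as in the pooled analysis, versus weighting
# proportional to n (roughly inverse variance of a sample mean).
log_weights = [math.log(n) for n in sample_sizes]
n_weights = [float(n) for n in sample_sizes]

def share(weights):
    """Each study's fraction of the total weight."""
    total = sum(weights)
    return [w / total for w in weights]

log_share = share(log_weights)
n_share = share(n_weights)

# Under log(n) weighting the seven-man study carries a far larger
# relative weight than under weighting proportional to sample size.
print(f"7-man study share, log(n) weighting: {log_share[0]:.3f}")
print(f"7-man study share, n weighting:      {n_share[0]:.4f}")
```

The point is qualitative: compressing sample sizes through a logarithm flattens the weights, so the smallest studies contribute far more to the pooled trend than their precision warrants.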
Although reference was made to the difference between the date of the study and the date of publication, no allowance was made when there were substantial discrepancies. For example, the largest study, with a sample size of 4435, was based on data collected between 11 and five years before publication.
Given the skewed distribution of the data, use of a geometric or logarithmic mean is clearly preferable to use of an arithmetic mean, a point that was made as early as 1979.4 Bromwich et al make much of this skewness but from a theoretical perspective.2 Given the plethora of research data, it is surprising that they chose to speculate rather than cite the literature.
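Why the choice of mean matters for skewed data can be shown with a minimal sketch; the counts below are invented for illustration, not drawn from any of the studies under discussion:

```python
import math

# A small, deliberately right-skewed set of hypothetical counts
# (millions/ml); one unusually high value drags the arithmetic mean up.
counts = [20, 40, 50, 60, 400]

arithmetic_mean = sum(counts) / len(counts)
# Geometric mean: exponentiate the mean of the logarithms.
geometric_mean = math.exp(sum(math.log(c) for c in counts) / len(counts))

print(f"arithmetic mean: {arithmetic_mean:.1f}")  # pulled up by the long tail
print(f"geometric mean:  {geometric_mean:.1f}")   # closer to the typical value
```

For right-skewed data such as semen analyses the arithmetic mean always exceeds the geometric mean, so pooling arithmetic means across decades can manufacture apparent differences that the geometric means would not show.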
Perhaps most puzzling of all is why Carlsen et al fitted a simple regression line to the data for sperm concentration and time. Many different ways of analysing time data exist, but rarely would a biological variable lend itself to simple regression. Why not try an exponential or logarithmic curve or even something cyclical? Indeed, several of the papers that Carlsen et al quoted applied cosine waves to several years of consecutive data with the conclusion that sperm concentration had an annual rhythm with peaks in the spring (March) and troughs in the autumn (September). Before deciding on which curve to apply it is useful to have some underlying hypothesis. Did an event in the past “cause” the decline, or is some factor still operating? It is also useful to have a purpose: to explain the past, for example, or to predict the future.
Several lessons should be learnt from the BMJ's saga of vanishing sperm. Researchers need to exercise great care in how they collect data and describe their methods and results in journals. This is not a call for a return to simple empiricism, in which data are seen as being supreme. We would do well to remember that the word “datum” means “given” and provides us with information based on what we choose to collect and what we choose to leave out. There is inherent bias in how we define the problem in the first place. Once measurement begins, bias and error are fundamental.
The misapplication of increasingly sophisticated statistical tests (of which regression is one of the simplest) is becoming commonplace. More rigorous application of legitimate methods of analysis is required, particularly when time series are being analysed. When inferences are being drawn over time we deserve more than simple analysis: we need to follow McKeown's example and seek corroborating evidence from a wide range of disciplines.6 Why not refer to the extensive data from veterinary research?
By the nature of their work epidemiologists erect hypotheses and invite others to test them to destruction. They run a constant occupational risk, that of being mistaken. Editors have their part to play in protecting this small occupational group from doing themselves damage. Alternatively, editors could consider printing the names of referees alongside articles. This would do nothing to allay anxieties, but it would certainly spread the blame.