Ways of measuring rates of recurrent events
BMJ 1996;312:364 (Published 10 February 1996). doi: https://doi.org/10.1136/bmj.312.7027.364
- Robert J Glynn, associate professor of medicine (biostatistics)a,
- Julie E Buring, associate professor of ambulatory care and preventiona
- a Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02215, USA
- Correspondence to: Dr Glynn.
- Accepted 11 October 1995
Recurrent events are common in medical research, yet the best ways to measure their occurrence remain controversial. Moreover, the correct statistical techniques to compare the occurrence of such events across populations or treatment groups are not widely known. In both observational studies and randomised clinical trials one natural and intuitive measure of occurrence is the event rate, defined as the number of events (possibly including multiple events per person) divided by the total person-years of experience. This is often a more relevant and clinically interpretable measure of disease burden in a population than considering only the first event that occurs. Appropriate statistical tests to compare such event rates among treatment groups or populations require the recognition that some individuals may be especially likely to experience recurrent events. Straightforward approaches are available to account for this tendency in crude and stratified analyses. Recently developed regression models can appropriately examine the association of several variables with rates of recurrent events.
Many diseases and other clinical outcomes may recur in the same patient. Examples include asthma attacks, skin cancers, myocardial infarctions, injuries, migraines, seizures in epileptics, and admissions to hospital. Another type of repeated event occurs when a disease can affect paired or multiple organs separately, such as cataract affecting a second eye or cavities in multiple teeth. What measures to use to quantify the occurrence of such conditions remains controversial in both clinical trials and observational studies.1 2 3 4 5 6 Inappropriate statistical approaches are often used to compare rates of recurrent events.6 Although statistical approaches based on sound principles exist, the methodological issues surrounding the study of recurrent events have received insufficient attention in the clinical and epidemiological literature.4 5
Recurrent events are not independent
Windeler and Lange presented several examples of clinical trials in which multiple events in the same participant were inappropriately treated as independent observations.6 They argued that the use of measures including repeated events should be abandoned: more meaningful clinical measures, they claimed, were the percentage of patients who experience a first event, or possibly the number of patients experiencing a first event divided by the total time in the population until these first events occur.
Although the use of naive statistical methods that treat recurrent events as independent observations will often produce misleading results, this does not justify discarding these events from consideration—which, in effect, is what an analysis of only first events does. In many settings an appropriate analysis of recurrent events can yield clinical insights that would be missed in an evaluation of only first events. Our aim here is to present examples where consideration of recurrent events reveals important information.
The challenge in comparing the rates of recurrent events between treatment groups or populations arises because some individuals are more prone to recurrences than others. Thus standard statistical approaches based on the binomial or Poisson distributions, which treat recurrent events as independent observations, are invalid. Straightforward alternative approaches are available, however, which account for the differing tendencies for disease recurrence across individuals. In this article we show the problems with analyses that treat recurrent events as independent and describe valid alternatives.
Definitions of rates
We illustrate alternative measures of rates of disease with data from a clinical trial of the effect of regular intake of a cranberry juice drink on bacteriuria and pyuria in elderly women.7 A total of 153 elderly women (mean age 78.5 years) were randomly assigned to consume 300 ml a day of a commercially available cranberry juice drink or a placebo drink that was indistinguishable in taste, appearance, and vitamin C content but lacked cranberry content. After randomisation six clean voided urine samples were collected at roughly monthly intervals. The primary outcome was bacteriuria (>=10^5 organisms/ml, regardless of organism) with pyuria in a given study month.
The proportions who had bacteriuria with pyuria at least once during the trial were comparable between groups: 43% in the cranberry group and 46% in the placebo group (table 1). It is preferable to compare the rates of first events, however, to account for varying times until women first had bacteriuria with pyuria and for varying intervals of follow up because of attrition (12 women in the cranberry group and 20 in the placebo group did not complete the trial). On that measure women randomised to the cranberry group had 10.2 first events per 100 person-months in the study, while women in the placebo group had 12.5 first events per 100 person-months. Thus the incidence rates per unit of person-time were also comparable between groups (P=0.40).
When all urine specimens collected throughout the study were considered, however, substantial differences between the groups emerged. To account for varying intervals of follow up, the rate of recurrent events in each treatment group was defined as the total number of urine samples with bacteriuria and pyuria divided by the total months of follow up. Overall, 15.0% of urine samples in the cranberry group were positive compared with 28.1% in the placebo group. After accounting for the repeated urine samples within participants by the method of Zeger and Liang,8 the odds of having a positive urine sample in the cranberry group in any given month were only 0.42 times those in the placebo group (95% confidence interval 0.23 to 0.76; P=0.004). The discrepancy between rates of first events and overall rates arose because women in the cranberry group were far more likely to recover from their bacteriuria and pyuria than women in the placebo group: the average one month probability of change from a bacteriuric-pyuric sample to a non-infected sample was 0.54 in the cranberry group and 0.28 in the placebo group (P=0.006). Thus restriction of analyses to only first events would have obscured important clinical differences in this trial.
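The two measures contrasted above can be sketched in a few lines of Python. The monthly sequences below are hypothetical, not the trial data; they only illustrate how the recurrent-event rate and the one month recovery probability are defined.

```python
# Sketch of the rate definitions used above, on hypothetical data.
# Each woman's follow up is a sequence of monthly urine results
# (1 = bacteriuria with pyuria, 0 = clear); lengths may differ
# because of attrition.
women = [
    [0, 1, 1, 0, 0, 0],  # six months of follow up
    [1, 0, 0, 1],        # dropped out after four months
    [0, 0, 0, 0, 0, 0],
]

# Recurrent-event rate: positive samples divided by total months observed.
positives = sum(sum(w) for w in women)
months = sum(len(w) for w in women)
event_rate = positives / months

# One month recovery probability: of the months that start in an
# infected state, the fraction followed by a non-infected sample.
infected_months = sum(1 for w in women for a, b in zip(w, w[1:]) if a == 1)
recoveries = sum(1 for w in women for a, b in zip(w, w[1:]) if a == 1 and b == 0)
recovery_prob = recoveries / infected_months

print(event_rate, recovery_prob)
```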
Event rates more informative
In health services research the event rate is generally more relevant than the rate of first events. Consider the question of whether rates of admission to hospital vary between areas. For example, among residents of Boston, Massachusetts aged 65-99 in 1989 the admission rate was 0.35 per person, and 22% of this population had at least one admission to hospital.5 The admission rate gives a clearer measure of overall use. In New Haven, Connecticut, the comparable admission rate was 0.20 per person, and 15% of this population had at least one admission. The admission rate in Boston was thus 1.75 times that in New Haven, whereas the percentage of elderly people in Boston with at least one admission was only 1.47 times that in New Haven. Comparison of rates of first events substantially underestimates the greater service use in Boston because a large component of this greater use is due to a much greater tendency to readmit previously admitted patients in Boston.
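The arithmetic behind this comparison is simple enough to sketch directly; the figures are those quoted above.

```python
# Event rate ratio versus first-event ratio for the Boston/New Haven
# comparison: the rate ratio uses all admissions, the first-event
# comparison only the percentage ever admitted, and the two diverge.
boston_rate, new_haven_rate = 0.35, 0.20          # admissions per person
boston_ever, new_haven_ever = 0.22, 0.15          # proportion with >=1 admission

rate_ratio = boston_rate / new_haven_rate         # 1.75
first_event_ratio = boston_ever / new_haven_ever  # ~1.47
print(round(rate_ratio, 2), round(first_event_ratio, 2))
```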
Cumming et al have similarly argued for the greater public health importance of event rates, rather than only first events, in their studies of falls.4 The risk of fracture increases with each fall; hence the number of falls is a more specific indicator of risk than whether one has fallen. A high rate of recurrent falls may especially increase the risk of injury. As with admissions to hospital, identification only of those who fall at least once can blur important distinctions between groups when one group has a raised probability of recurrence relative to the other.
In ophthalmology (and other specialties with paired organs) counting the number of affected eyes, rather than whether an individual is affected in either eye, can enhance interpretability. For many eye diseases the eye is the fundamental unit requiring treatment, so the costs of care are more directly related to the number of affected eyes than to the number of affected individuals. Clinically, a bilateral visual impairment is far worse than a unilateral impairment because a good fellow eye can, in many ways, compensate for an impaired eye.9 Thus counting the number of affected eyes can add important information about the burden of a disease on a person.
Statistical techniques for multiple events
EQUAL FOLLOW UP: NEGATIVE BINOMIAL DISTRIBUTION
When all participants are followed for the same length of time the event rate, including multiple events in a population or treatment group, is the average number of events in that population. Although the Poisson distribution is still used to describe and compare such rates of recurrent events, its limitations for doing so were described long ago. Table 2 shows data from one of the several examples presented by Greenwood and Yule to illustrate these limitations and to propose alternatives.1
These data summarise the pattern of accidents observed in a group of 414 machinists who were followed for three months. Among them 200 accidents occurred, an average rate of 0.483 accidents per machinist. If we assume that no machinist is more prone to accidents than any other and fit a Poisson distribution with this mean to the data, the second column displays the numbers of machinists expected to have each number of accidents. The Poisson distribution fits badly: it greatly overestimates the number with one accident and underestimates the number with four or more accidents. This typifies the poor performance of the Poisson distribution when events are more likely to recur in some individuals than in others.
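The expected Poisson counts can be computed from the observed mean rate alone (the observed counts in table 2 are not reproduced here):

```python
import math

# Expected numbers of machinists with k accidents if accidents were
# Poisson with the observed mean rate -- the fit criticised above.
n, mean_rate = 414, 200 / 414   # 414 machinists, 200 accidents

def poisson_expected(k):
    """Expected count of machinists with exactly k accidents."""
    return n * math.exp(-mean_rate) * mean_rate ** k / math.factorial(k)

expected = [poisson_expected(k) for k in range(6)]
print([round(e, 1) for e in expected])
```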
As an alternative, Greenwood and Yule recommended using the negative binomial distribution to describe the numbers of recurrent events. This distribution, further described by Fisher10 and Anscombe,11 is completely characterised by its mean and variance. The mean is estimated by the event rate in the population (here 0.483) and is thus the parameter of direct clinical interest. The variance of the distribution is estimated by the variance of the number of events in the population (here 1.01 compared to an expected variance of 0.483 under the Poisson distribution).
One important characteristic of the negative binomial distribution is that its variance is always greater than the variance of a Poisson distribution with the same mean. The attractiveness of the negative binomial distribution to describe recurrent events stems from its derivation. If each member of a population has recurrent events according to an individual Poisson event rate and the Poisson rates vary across the population (according to a specified distribution) then the distribution in the total population of the number of events for each member follows the negative binomial distribution. Thus the negative binomial distribution naturally accommodates the different propensities to events across members of the population. Among these machinists, as in several other examples considered by Greenwood and Yule, the negative binomial distribution fits the data far better than the Poisson distribution and, in particular, agrees with the observation that some individuals are especially accident prone and have five or more accidents.
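A method-of-moments sketch of this fit: matching the observed mean (0.483) and variance (1.01) determines both negative binomial parameters. The parameterisation below is a standard one; Greenwood and Yule's original fitting procedure is not reproduced here.

```python
import math

# Fit a negative binomial by matching the observed mean and variance
# of the machinists' accident counts.
m, v = 0.483, 1.01
r = m * m / (v - m)   # shape parameter: smaller r = more heterogeneity
p = m / v             # "success" probability

def neg_binom_pmf(k):
    """P(K = k) for the negative binomial with mean m and variance v.
    Computed on the log scale to avoid overflow in the gamma function."""
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_p)

# Sanity check: the fitted distribution reproduces the target mean.
fitted_mean = sum(k * neg_binom_pmf(k) for k in range(200))
print(round(fitted_mean, 3))
```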
The potential for incorrect inference that can arise from wrongly using the Poisson distribution when comparing rates of recurrent events across populations or treatment groups is illustrated with the data in table 3. During 1989 rates of admission to hospital among Medicare enrollees aged 65-99 in Castine and Jackman, Maine were 0.20 and 0.28, respectively.5 In both areas the Poisson distribution displays the same poor fit to the observations seen in the accident data, while the negative binomial distribution fits well.
Consider estimation of standard errors and confidence intervals for the mean difference of 0.08 between admission rates in these two areas. Under the Poisson distribution the standard error of this difference is √(0.20/181+0.28/162)=0.053, where 181 and 162 are the numbers of individuals in the respective areas. Under the negative binomial distribution the standard error of this difference is √(0.32/181+0.55/162)=0.072, where 0.32 and 0.55 are the sample variances of the numbers of admissions in the respective areas. An approximate 95% confidence interval for the mean difference based on the poorly fitting Poisson distribution (-0.03 to 0.18) is much narrower than the alternative confidence interval based on the negative binomial distribution (-0.06 to 0.22). The 35% higher standard error under the negative binomial distribution is typical of what is seen across many comparisons of hospitalisation rates.5 It indicates that appropriate confidence intervals will be much wider, P values higher, and significance tests more conservative than those based on the Poisson distribution.
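These calculations can be reproduced directly from the figures quoted above:

```python
import math

# Standard errors for the difference in admission rates between the
# two Maine areas, under the Poisson and negative binomial assumptions.
n1, n2 = 181, 162                 # Medicare enrollees in each area
rate1, rate2 = 0.20, 0.28         # admission rates per person
var1, var2 = 0.32, 0.55           # sample variances of admission counts
diff = rate2 - rate1              # 0.08

# Poisson assumption: the variance of a count equals its mean.
se_poisson = math.sqrt(rate1 / n1 + rate2 / n2)
# Negative binomial/empirical: use the observed sample variances.
se_negbin = math.sqrt(var1 / n1 + var2 / n2)

ci_poisson = (diff - 1.96 * se_poisson, diff + 1.96 * se_poisson)
ci_negbin = (diff - 1.96 * se_negbin, diff + 1.96 * se_negbin)
print(round(se_poisson, 3), round(se_negbin, 3))
```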
UNEQUAL FOLLOW UP: OBSERVED VARIABILITY IN INDIVIDUAL RATES OF OCCURRENCE
In clinical studies participants often have different durations of follow up, either because they are enrolled or last seen at different times or because they are lost to follow up. In such studies a meaningful measure of occurrence is often the total number of events divided by the total length of follow up. When each participant can have at most a single event the Poisson distribution is generally used to estimate the variance of such rates and as the basis for significance tests of differences in rates between groups. When events can recur, however, use of the Poisson distribution will usually lead to the same underestimation of variances and P values that occurs when the length of follow up is uniform.
One straightforward alternative is to use the observed variability in individual rates of occurrence to construct more accurate tests and confidence intervals.12 This is a natural extension of the approach used to estimate parameters in the negative binomial distribution for uniform follow up. It accounts for increased variability beyond that expected under the Poisson distribution because of the differing propensities for recurrent events within the population.
For example, in the study of the effect of cranberry juice intake on bacteriuria and pyuria, each woman had an individual event rate defined as her number of months with bacteriuria and pyuria divided by the total number of months she contributed urine samples. The variances of these rates about the overall population rates shown in table 1, weighted for the varying intervals of follow up among women,12 were 0.00076 in the cranberry group and 0.0017 in the placebo group. A simple significance test of the difference between the two groups in overall rate of bacteriuria with pyuria uses as standard error of this difference the square root of the sum of these two variances. It compares the observed rate difference of 0.131 divided by this standard error of 0.0495 to the normal distribution and obtains a P value of 0.008. Fisher et al also used this method to compare rates of admission to hospital between areas in cohorts followed for varying intervals.13
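The test can be sketched as follows, using the figures quoted above (the standard error differs from the reported 0.0495 in the last digit because the variances are rounded here):

```python
import math

# Significance test for the difference in overall rates of bacteriuria
# with pyuria, using the observed variability of individual rates
# (weighted for unequal follow up) rather than the Poisson variance.
rate_diff = 0.131                             # placebo rate minus cranberry rate
var_cranberry, var_placebo = 0.00076, 0.0017  # weighted variances of individual rates

se = math.sqrt(var_cranberry + var_placebo)   # ~0.0496
z = rate_diff / se                            # ~2.64

def normal_two_sided_p(z):
    """Two-sided P value from the standard normal distribution."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(round(se, 4), round(z, 2), round(normal_two_sided_p(z), 3))
```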
CONTROL OF CONFOUNDING: STANDARDISATION OR REGRESSION
Simple comparisons of rates of recurrent events between two populations may not be valid because the populations may differ on important confounding variables. One common strategy in the presence of confounding is to present standardised rates which are composed of weighted averages of stratum specific rates where the strata are formed by categories of the confounding variables. Whether individuals are followed for uniform lengths of time or for varying intervals, simple extensions of the approaches just described are available to obtain confidence intervals and significance values for differences or ratios of standardised rates of recurrent events.5 12
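Direct standardisation itself is a weighted average of stratum specific rates. The sketch below uses hypothetical age strata, weights, and rates; none of these numbers come from the cited studies.

```python
# Directly standardised recurrent-event rates: stratum specific rates
# averaged with weights from a common standard population.
# All strata and numbers here are hypothetical.
standard_weights = {"65-74": 0.55, "75-84": 0.33, "85+": 0.12}

area_a_rates = {"65-74": 0.15, "75-84": 0.30, "85+": 0.50}
area_b_rates = {"65-74": 0.18, "75-84": 0.28, "85+": 0.40}

def standardised_rate(rates):
    """Weighted average of stratum specific rates over the standard population."""
    return sum(standard_weights[s] * rates[s] for s in standard_weights)

diff = standardised_rate(area_a_rates) - standardised_rate(area_b_rates)
print(round(standardised_rate(area_a_rates), 4), round(diff, 4))
```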
Control of confounding through standardisation has the attractive features of simplicity and intuitiveness and it maintains the interpretability of rates. Nevertheless, regression approaches will often be more efficient, especially when several potential confounders have to be controlled for simultaneously, and they may also improve control for confounding by better modelling of relations with confounders. Regression models also need to account for different propensities for recurrent events across individuals.
Several authors have presented regression models for estimating relative event rates when events recur non-randomly.8 14 15 16 Analyses of the odds of bacteriuria with pyuria associated with ingestion of cranberry juice7 used the generalised estimating equation approach8 to account for recurrent events in the same women. This approach was also used to control for potential confounding variables, investigate possible modification of the effect over time, and measure the odds of transition from urine with bacteriuria and pyuria to a non-infected state over a one month period.
Other applications of these regression methods have included evaluation of recurrent biliary symptoms among patients in a clinical trial of treatments for gall stones,14 rates of recurrent tumours in carcinogenicity experiments in rats,17 and variability in rates of malpractice claims.18 Although the transfer of appropriate statistical methods to medical applications is sometimes slow, the paper by Zeger and Liang8 may be one of the most cited statistical papers of the past 10 years,19 because of its usefulness and ready applicability to recurrent events.
PAIRED DATA: ACCOUNTING FOR CORRELATION
Repeated events also occur in studies of eyes or other paired organs such as ears, kidneys, or hips. For most common eye diseases the presence of disease in one eye substantially increases the likelihood of contralateral disease. Hence independence of events is also an inappropriate assumption, and statistical approaches that treat events in the two eyes of a single person as independent observations are generally invalid.2 3
Two valid alternatives are generally used to address this problem. The simplest approach treats a person as the unit of analysis and classifies a disease as present whenever it occurs in either eye. Alternative approaches treat the eye as the unit of analysis and explicitly account for the correlation between fellow eyes. These latter approaches offer enhanced statistical power through distinction of individuals who are unilaterally and bilaterally affected.20 21 They are also more interpretable when the characteristics of an eye—for example, intraocular pressure—are evaluated as determinants of an outcome—for example, progression of glaucoma.
Simple approaches are available to correct t tests and χ2 tests when means or rates including data from both eyes are compared between groups.22 In some clinical trials the eye is the unit of randomisation, as one eye randomly receives one treatment and the other eye receives an alternative treatment.23 24 Under this design the eye must be treated as the unit of analysis. Several authors have developed extensions of the proportional hazards model that evaluate time to failure in each eye and explicitly model the association between failure times in fellow eyes.25 26 27
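A simple variance-inflation sketch in the spirit of those corrections, assuming a common intra-person correlation rho between fellow eyes; this is illustrative only, not the exact adjustment of reference 22, and the proportions below are hypothetical.

```python
import math

# With two eyes per person and intra-person correlation rho, the
# effective sample size of n eyes shrinks by the design effect 1 + rho.
def effective_n(n_eyes, rho):
    return n_eyes / (1 + rho)

def corrected_z(p1, n1_eyes, p2, n2_eyes, rho):
    """Two-sample z test on proportions of affected eyes, with sample
    sizes deflated to account for correlation between fellow eyes."""
    n1, n2 = effective_n(n1_eyes, rho), effective_n(n2_eyes, rho)
    p = (p1 * n1 + p2 * n2) / (n1 + n2)           # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# With rho = 0 this is the ordinary two-sample z test; positive rho
# widens the standard error and shrinks the test statistic.
z_naive = corrected_z(0.30, 200, 0.20, 200, rho=0.0)
z_adjusted = corrected_z(0.30, 200, 0.20, 200, rho=0.5)
print(round(z_naive, 2), round(z_adjusted, 2))
```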
The appropriateness of a measure of effect depends critically on the scientific question to be addressed. For example, primary prevention trials of myocardial infarction or cancer, such as the Physicians' Health Study,28 generally compare rates of first events because a first event may increase the risk of subsequent events and can also affect adherence to study treatments, as well as other behaviours and risk factors. In other settings, however, where events recur more frequently and the patients studied may already have had previous events, consideration of all observed events may be more relevant. Restriction of analyses to only first events in this setting discards much relevant information.
Comparison of rates of recurrent events requires consideration of the tendency for some individuals to have greater propensities for recurrent events than others. Several statistical approaches are available, although some of these are complex and require special statistical software. Identification of the model that best fits the data can be challenging,29 and expert statistical advice is recommended. Nevertheless, the appropriate analysis can often answer questions of greater clinical relevance than a comparison of rates of first events.
We thank Dr Jerry Avorn for his constructive comments as well as additional data from his cranberry juice study.
Funding: Supported by grants EY06633 and EY08103 from the National Eye Institute and CA40360 and CA47988 from the National Cancer Institute.
Conflict of interest: None.