The effect of omitted covariates on confidence interval and study power in binary outcome analysis: A simulation study

https://doi.org/10.1016/j.cct.2006.08.007

Abstract

Background/objectives

The consequence of omitted but balanced covariates for odds ratio point estimation is well known in the literature. When exposure or intervention has a non-null effect on disease outcome, omitting covariates leads to underestimation of that effect. However, the effect of omitted covariates on confidence intervals and study power is unknown.

Study design and setting

A simulation study is carried out to assess the effect of omitted covariates on confidence intervals and study power for a plausible range of scenarios. Coverage probability and study power are assessed systematically over a range of study sizes, types of omitted covariate and magnitudes of effect. A real-life example using a randomised experiment on the sexual activity and longevity of male fruit flies is provided.

Results

When a balanced covariate was omitted, coverage probability was lowered by 2.9–80%. Likewise, study power was reduced by as much as 58%. The impact becomes substantial when the covariate is continuous, has large variability and has a larger effect than the exposure or intervention. The result from the real-life example concurs with the simulation findings.

Conclusion

Omitting an important balanced covariate lowers both coverage probability and study power. This implies the need for thoughtful consideration of important covariates at the design as well as the analysis stages of a study.

Introduction

The consequence of omitting balanced covariates under various non-linear models was first demonstrated by Gail et al. [1]. By a balanced covariate we mean one whose distribution is comparable between the exposure/intervention groups. Asymptotically, i.e., as sample size increases, randomisation ensures well-balanced, comparable intervention groups and hence minimises the potential for confounding. However, in small-to-moderate studies, chance imbalance in the distribution of important covariates remains possible after randomisation unless stratified block randomisation is employed. The main finding of Gail et al. [1] was a downward bias, or underestimation, of the effect of the exposure or intervention of interest when important covariates are omitted. Hauck et al. [2] discussed the same issue in terms of the different definitions of confounding and illustrated how these definitions at times disagree. In addition, these authors pointed out an important distinction: the issue of omitted covariates is different from that of classical confounding [3]. The direction of the bias in the case of omitted covariates is predictable, i.e., always towards the null, whereas the bias from classical confounding can go either way. In particular, if the variable of main interest, such as exposure or intervention, has no effect on the outcome, then omitted covariates do not introduce bias while confounders do. Chao et al. [4] extended the general attenuation result to the case of correlated binary outcomes.
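The attenuation described above can be checked with a short exact calculation. The sketch below is illustrative only: it assumes a binary covariate that is independent of exposure with P(C=1) = 0.5, a logistic model for the outcome, and arbitrary values for the intercept and the covariate odds ratio. The function name `marginal_odds_ratio` and all parameter values are hypothetical and are not taken from the paper.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def odds(p):
    return p / (1.0 - p)

def marginal_odds_ratio(beta0, beta1, beta2, p_c=0.5):
    """Exposure odds ratio after averaging the risk over a binary covariate C.

    Assumes logit P(D|E,C) = beta0 + beta1*E + beta2*C, with C independent
    of E (balanced) and P(C=1) = p_c. All values are illustrative.
    """
    def risk(e):
        return ((1.0 - p_c) * expit(beta0 + beta1 * e)
                + p_c * expit(beta0 + beta1 * e + beta2))
    return odds(risk(1)) / odds(risk(0))

conditional_or = 2.0                  # exposure odds ratio within each stratum
beta1 = math.log(conditional_or)

# Non-null exposure effect: the marginal odds ratio moves towards 1
# as the omitted covariate becomes more strongly related to the outcome.
for covariate_or in (1.0, 2.0, 5.0, 10.0):
    m = marginal_odds_ratio(-1.0, beta1, math.log(covariate_or))
    print(f"covariate OR = {covariate_or:5.1f}  marginal exposure OR = {m:.3f}")

# Null exposure effect: omitting the covariate introduces no bias.
null_or = marginal_odds_ratio(-1.0, 0.0, math.log(10.0))
print(f"null case: marginal exposure OR = {null_or:.3f}")
```

Because the covariate is balanced across exposure groups, the shift towards the null here is not confounding. It arises because the odds ratio is not collapsible over strata.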

What is not known in the literature is the impact of omitted covariates on confidence intervals and study power. Hauck et al. [2] argue, indirectly, that omitted covariates lead to loss of efficiency, since omitting covariates is a form of model misspecification [5]. The goal of this paper is to investigate, using a simulation study, the effect of omitted but balanced covariates on confidence interval estimation and study power in an uncorrelated binary outcome setting.

Example of what is already known

Table 1 illustrates the effect of an omitted covariate on point estimation of odds ratio using a hypothetical study.

P(D|Ē) is the proportion with the outcome/disease among subjects without exposure/intervention and P(D|E) is the proportion with the outcome/disease among subjects with exposure/intervention. In addition, it is assumed that the probability of assignment to each stratum of the covariate is 0.5 and exposure within each stratum is balanced. Under this configuration, the stratum

Simulation study

We simulated data consisting of disease status (D), a binary exposure (E) and a covariate (C), satisfying independence between exposure and covariate. The exposure and the covariate are associated with disease status through the following model:

$$\operatorname{logit} P(D \mid E, C) = \beta_0 + \beta_1 E + \beta_2 C$$
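A minimal sketch of this kind of simulation is given below. It is not the authors' code. It assumes a standard normal covariate, equal group sizes, and illustrative values for the intercept, the covariate coefficient, the sample size and the number of replicates. The exposure odds ratio of 2.0 is the only value taken from the text. With a single binary exposure, the logistic model that omits C reduces to the 2x2 table, so the crude log odds ratio and its Wald interval are computed directly. Coverage is judged against the conditional log odds ratio β1, which is also an assumption of this sketch.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_table(n_per_arm, beta0, beta1, beta2, rng):
    """Simulate one study and return the 2x2 table with C ignored.

    Returns (exposed cases, exposed non-cases,
             unexposed cases, unexposed non-cases).
    """
    cells = {0: [0, 0], 1: [0, 0]}
    for e in (0, 1):
        for _ in range(n_per_arm):
            cov = rng.gauss(0.0, 1.0)   # assumption: standard normal covariate
            p = expit(beta0 + beta1 * e + beta2 * cov)
            diseased = rng.random() < p
            cells[e][0 if diseased else 1] += 1
    return cells[1][0], cells[1][1], cells[0][0], cells[0][1]

def crude_log_or_ci(a, b, c, d, z=1.96):
    """Crude log odds ratio with a 95% Wald interval (Woolf standard error)."""
    if min(a, b, c, d) == 0:
        # Add 0.5 to every cell only when a zero cell would break the formula.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or - z * se, log_or + z * se

def coverage_and_power(n_sims, n_per_arm, beta0, beta1, beta2, seed=1):
    """Empirical coverage of beta1 and power of the test of no exposure effect."""
    rng = random.Random(seed)
    covered = rejected = 0
    for _ in range(n_sims):
        lo, hi = crude_log_or_ci(*simulate_table(n_per_arm, beta0, beta1, beta2, rng))
        covered += lo <= beta1 <= hi          # interval contains the true value
        rejected += not (lo <= 0.0 <= hi)     # interval excludes the null
    return covered / n_sims, rejected / n_sims

beta1 = math.log(2.0)   # exposure odds ratio of 2.0, as in the text
# beta2 = 0 means there is no covariate effect to omit (reference case).
cov_ref, pow_ref = coverage_and_power(500, 300, 0.0, beta1, 0.0)
# beta2 = 2 is an illustrative strong covariate effect that the analysis omits.
cov_omit, pow_omit = coverage_and_power(500, 300, 0.0, beta1, 2.0)
print(f"no covariate effect: coverage={cov_ref:.3f}, power={pow_ref:.3f}")
print(f"covariate omitted:   coverage={cov_omit:.3f}, power={pow_omit:.3f}")
```

Changing `beta2`, the covariate distribution or `n_per_arm` gives a rough feel for how coverage and power respond to the strength and variability of the omitted covariate and to study size.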

The coefficients of E and C in the above model were chosen so as to provide a wide range of combinations between exposure and omitted covariate effects. For exposure, an odds ratio of 2.0 was considered throughout while

Results

In the absence of an exposure effect, omitting the covariate introduced no bias in the estimation of the odds ratio. This has been shown analytically [2]. Moreover, under this scenario, the 95% confidence interval had the correct coverage probability and the size of the test was also correct. The empirical coverage rate was within the range of 0.948–0.953, while the empirical Type I error was within the range of 0.045–0.052 (data not shown).

When the effect of the binary omitted

Example: sexual activity and longevity of male fruit flies

We give a real-life example using an interesting experimental data set that appeared previously in the literature [10], [11]. The design of the experiment has been described in detail elsewhere [10]. In short, 125 fruit flies were randomly divided into five groups of 25 to determine whether increased reproduction reduces the longevity of male flies. This effect is known to occur in female flies. Sexual activity of individual males was manipulated by providing each male in the first group with

Discussion

When the variable of interest, i.e., exposure/intervention, has a non-null effect on disease risk, the impact of an omitted but balanced covariate is to bias the odds ratio towards the null. This impact also extends to a shift of the corresponding 95% confidence interval towards the null, with coverage probability below the nominal level. In addition, study power will be reduced compared with the model that includes the covariate. The impact is more dramatic when the effect of the omitted

References (14)

  • W.W. Hauck et al. A consequence of omitted covariates when estimating odds ratios. J Clin Epidemiol (1991)
  • W.W. Hauck et al. Should we adjust for covariates in nonlinear regression analyses of randomized trials? Control Clin Trials (1998)
  • M.H. Gail et al. Biased estimates of treatment effect in randomized experiments with non-linear regression and omitted covariates. Biometrika (1984)
  • S. Greenland. Absence of confounding does not correspond to collapsibility of the rate ratio or rate difference. Epidemiology (1996)
  • W.-H. Chao et al. Effect of omitted confounders on the analysis of correlated binary data. Biometrics (1997)
  • S.W. Lagakos. Effects of mismodelling and mismeasuring explanatory variables on tests of their association with a response variable. Stat Med (1988)
  • A.J. Hanley et al. Statistical analysis using Generalized Estimating Equations (GEE): an orientation. Am J Epidemiol (2003)
There are more references available in the full text version of this article.
