Bias due to missing exposure data using complete-case analysis in the proportional hazards regression model

Stat Med. 2003 Feb 28;22(4):545-57. doi: 10.1002/sim.1340.

Abstract

We studied bias due to missing exposure data in the proportional hazards regression model when using complete-case analysis (CCA). Eleven missing data scenarios were considered: one missing completely at random (MCAR), four missing at random (MAR), and six non-ignorable, across a range of hazard ratios, censoring fractions, missingness fractions and sample sizes. When missingness was MCAR or dependent only on the exposure, the bias was negligible (2-3 per cent) and similar to the difference between the estimate from the full data set with no missing data and the true parameter. In contrast, substantial bias occurred when missingness was dependent on the outcome or on both the outcome and the exposure. For models with a hazard ratio of 3.5, a sample size of 400, 20 per cent censoring and 40 per cent missing data, the relative bias for the hazard ratio ranged between 7 per cent and 64 per cent. We observed important differences in the direction and magnitude of biases under the various missing data mechanisms. For example, the biases were notably different depending on whether missingness was associated with longer or with shorter follow-up, although both mechanisms are MAR. The hazard ratio was underestimated (with larger bias) when missingness was associated with longer follow-up and overestimated (with smaller bias) when it was associated with shorter follow-up. If it is known that missingness is associated with a less frequently observed outcome, or with both the outcome and the exposure, CCA may result in invalid inference and other methods for handling missing data should be considered.
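The contrast between MCAR and outcome-dependent missingness can be illustrated with a small simulation. The sketch below is not the authors' simulation design; it is a minimal illustration under assumptions chosen here: a single binary exposure with prevalence 0.5, exponential event times with a true hazard ratio of 3.5, independent exponential censoring tuned to give roughly 20 per cent censoring, and about 40 per cent missing exposure overall. All function names and the missingness probabilities (0.4 for everyone under MCAR; 0.3 for events and 0.9 for censored subjects in the outcome-dependent scenario) are hypothetical. The Cox model is fitted by Newton-Raphson on the partial likelihood, which is sufficient for one binary covariate with untied continuous times.

```python
import math
import random


def simulate_cohort(n, hazard_ratio, rng, censor_rate=0.35):
    """Return (time, event, exposure) rows; censor_rate is an assumed value."""
    beta = math.log(hazard_ratio)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t = rng.expovariate(math.exp(beta * x))  # baseline hazard fixed at 1
        c = rng.expovariate(censor_rate)         # independent censoring time
        rows.append((min(t, c), 1 if t <= c else 0, x))
    return rows


def complete_cases(rows, rng, p_missing_event, p_missing_censored):
    """Drop rows whose exposure is 'missing'; probability depends on event status."""
    kept = []
    for time, event, x in rows:
        p = p_missing_event if event else p_missing_censored
        if rng.random() >= p:
            kept.append((time, event, x))
    return kept


def fit_cox_log_hr(rows, max_iter=25, tol=1e-8):
    """Newton-Raphson estimate of the log hazard ratio for one binary covariate."""
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    b = 0.0
    for _ in range(max_iter):
        s0 = s1 = score = info = 0.0
        # Walking from the longest time down, the running sums cover the risk set.
        for _, event, x in ordered:
            w = math.exp(b * x)
            s0 += w
            s1 += w * x
            if event:
                mean_x = s1 / s0
                score += x - mean_x
                info += mean_x * (1.0 - mean_x)  # variance of a binary covariate
        if info == 0.0:
            break
        step = score / info
        b += step
        if abs(step) < tol:
            break
    return b


def run_simulation(replications=200, n=400, hazard_ratio=3.5, seed=1):
    """Mean estimated log hazard ratio for full data and two CCA scenarios."""
    rng = random.Random(seed)
    totals = {"full": 0.0, "mcar": 0.0, "outcome": 0.0}
    for _ in range(replications):
        rows = simulate_cohort(n, hazard_ratio, rng)
        totals["full"] += fit_cox_log_hr(rows)
        totals["mcar"] += fit_cox_log_hr(complete_cases(rows, rng, 0.4, 0.4))
        totals["outcome"] += fit_cox_log_hr(complete_cases(rows, rng, 0.3, 0.9))
    return {name: total / replications for name, total in totals.items()}


if __name__ == "__main__":
    for name, mean_log_hr in run_simulation().items():
        hr = math.exp(mean_log_hr)
        print(f"{name}: HR {hr:.2f}, relative bias {100 * (hr - 3.5) / 3.5:.1f}%")
```

Under these assumptions, the MCAR complete-case estimate should stay close to the full-data estimate, while the scenario that preferentially drops censored subjects should pull the hazard ratio below 3.5. The size of the bias depends entirely on the assumed missingness probabilities and should not be read as reproducing the figures reported in the abstract.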

MeSH terms

  • Alleles
  • Apolipoproteins E / genetics
  • Bias*
  • Biomedical Research
  • Cardiovascular Diseases / genetics
  • Data Collection / standards*
  • Data Collection / statistics & numerical data
  • Genotype
  • Humans
  • Male
  • Proportional Hazards Models*
  • United States

Substances

  • Apolipoproteins E