Imputation of missing covariates: when standard methods may fail


Our work is motivated by examples from two large cohort studies, the Generation R Study and the Rotterdam Study, in which the analysis models of interest involved non-linear effects or interaction terms, or had a longitudinal outcome. As is the case for most observational datasets, missing values in multiple variables complicated the analyses. The most popular method for dealing with missing values is multiple imputation using the fully conditional specification (FCS). In settings like our motivating examples, however, the imputation models specified by FCS are incompatible with the analysis model, which violates an important assumption of FCS and may result in severely biased estimates. Although many applied researchers have to deal with incomplete data, they are often unaware of the assumptions required to obtain valid results from the imputation methods implemented in standard software, or of the bias that may result from violating them.
In the present work, we briefly review the assumptions of FCS and the potential effects of violating them. We discuss previously proposed extensions that aim to reduce bias due to incompatibility, and contrast them with recent approaches that use the Bayesian framework to specify imputation models that ensure compatibility. Focusing on methods available in existing or newly developed R packages, we illustrate their application to generalized linear models and linear mixed models that involve non-linear effects or interaction terms, using relevant data from several recent studies.

Jul 11, 2017
38th Annual Conference of the International Society for Clinical Biostatistics
Vigo, Spain