How black-box use of imputation can cause bias


Missing values are a complication that many applied researchers need to deal with; however, handling them is usually not the focus of the research. As a consequence, standard imputation methods that are readily available in software, such as multiple imputation using chained equations (MICE), are often applied in a black-box fashion to fix the problem and move quickly to the analysis of interest, without considering whether the imputation models are appropriate. Furthermore, researchers are often unaware of the assumptions implied by the imputation method and of the bias that violating them can cause. Such violations can arise from misspecification of the imputation model, for example when associations between the outcome and covariates, or among covariates, are non-linear or involve interactions, or when incomplete continuous variables are not (conditionally) normally distributed. Complex outcomes, such as repeatedly measured or survival outcomes, also require special attention. In the present study, we evaluate the bias caused by several misspecifications of imputation models in MICE and in a fully Bayesian approach that has been shown to be superior to MICE in settings with complex outcomes. We further discuss possible extensions of the Bayesian approach with non- and semi-parametric methods, such as penalized splines, to increase the flexibility of the imputation models and thereby reduce the risk of misspecification. Additionally, we investigate whether, and how, posterior predictive checks can be used to automate the evaluation of imputation model fit and to alert users to potential misspecifications.
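As a rough illustration of how a non-linear association can be distorted by a misspecified imputation model, the following sketch simulates data where the outcome depends on the square of a covariate, deletes half of the covariate values completely at random, and imputes them with a linear normal regression on the outcome. This is a simplified, hypothetical setup, not the simulation design of this study: it uses a single stochastic regression imputation (the kind of conditional model one MICE step might use) rather than full multiple imputation, and the helper names `fit_quadratic` and `linear_normal_impute` are illustrative only.

```python
import numpy as np

def fit_quadratic(x, y):
    """OLS fit of y on [1, x, x^2]; returns the coefficient of x^2."""
    design = np.column_stack([np.ones_like(x), x, x ** 2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[2]

def linear_normal_impute(x, y, missing, rng):
    """Single stochastic regression imputation of x from y using a linear
    normal model (simplified: parameters are not drawn from a posterior)."""
    obs = ~missing
    design_obs = np.column_stack([np.ones(obs.sum()), y[obs]])
    beta, *_ = np.linalg.lstsq(design_obs, x[obs], rcond=None)
    sigma = (x[obs] - design_obs @ beta).std(ddof=2)  # residual SD
    design_mis = np.column_stack([np.ones(missing.sum()), y[missing]])
    x_imp = x.copy()
    # Imputed value = linear prediction + normal noise
    x_imp[missing] = design_mis @ beta + rng.normal(0.0, sigma, missing.sum())
    return x_imp

rng = np.random.default_rng(2024)
n = 20_000
x = rng.normal(size=n)
y = x ** 2 + rng.normal(scale=0.5, size=n)  # true quadratic coefficient is 1
missing = rng.random(n) < 0.5               # x missing completely at random

beta_full = fit_quadratic(x, y)
beta_imp = fit_quadratic(linear_normal_impute(x, y, missing, rng), y)
print(f"complete data: {beta_full:.2f}, after linear imputation: {beta_imp:.2f}")
```

Because x and y are essentially uncorrelated under this quadratic relation, the linear imputation model carries almost no information about the missing values, so the estimated quadratic effect is pulled towards zero even though the data are missing completely at random. A more flexible imputation model, for instance one that includes a spline or quadratic term, would be needed to preserve the association.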

Kiel, Germany