Imputation model misspecification: how robust are Bayesian methods?


Missing values complicate analyses in many studies. Nevertheless, the availability of methods such as Multiple Imputation (MI) in standard software has enabled researchers to perform statistical analyses that account for missing data. More recently, fully Bayesian approaches and extensions of MI have also become available in statistical packages. These have been shown to be superior to standard MI, particularly in settings with longitudinal data, non-linear effects, or interaction terms.
In all these approaches, missing values are imputed by draws from the (posterior) predictive distribution of an incomplete variable, conditional on (all) other variables. Therefore, an important requirement is that these predictive distributions fit the data well.
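To illustrate what drawing from a posterior predictive distribution involves, the following minimal sketch imputes one incomplete variable. It rests on assumptions that are not part of this abstract: a normal linear regression of the incomplete variable `x` on a single complete variable `z`, a non-informative prior, and hypothetical names (`impute_once`, `x`, `z`). It illustrates the general idea only and is not the method evaluated here.

```python
import math
import random


def impute_once(x, z, rng):
    """Return a copy of x in which each missing entry (None) is replaced by
    one draw from the posterior predictive distribution of a normal linear
    regression of x on z (flat prior on the coefficients and log-variance)."""
    obs = [i for i, v in enumerate(x) if v is not None]
    n = len(obs)
    z_bar = sum(z[i] for i in obs) / n
    x_bar = sum(x[i] for i in obs) / n
    szz = sum((z[i] - z_bar) ** 2 for i in obs)
    slope_hat = sum((z[i] - z_bar) * (x[i] - x_bar) for i in obs) / szz
    rss = sum((x[i] - x_bar - slope_hat * (z[i] - z_bar)) ** 2 for i in obs)

    # Step 1: draw the residual variance, sigma^2 | data ~ rss / chi2(n - 2).
    # A chi-square with k degrees of freedom is Gamma(shape=k/2, scale=2).
    sigma2 = rss / rng.gammavariate((n - 2) / 2, 2.0)

    # Step 2: draw the coefficients given sigma^2. With a centred predictor,
    # intercept and slope are independent a posteriori.
    intercept = rng.gauss(x_bar, math.sqrt(sigma2 / n))
    slope = rng.gauss(slope_hat, math.sqrt(sigma2 / szz))

    # Step 3: draw each missing value from the model given the drawn parameters.
    sd = math.sqrt(sigma2)
    return [
        v if v is not None else rng.gauss(intercept + slope * (z[i] - z_bar), sd)
        for i, v in enumerate(x)
    ]


# Simulated example data (purely illustrative).
rng = random.Random(1)
z = [0.1 * i for i in range(20)]
x_full = [2.0 + 3.0 * zi + rng.gauss(0.0, 0.5) for zi in z]
x = [None if i % 4 == 0 else v for i, v in enumerate(x_full)]

# Five imputed versions of x, as one would create in multiple imputation.
imputations = [impute_once(x, z, rng) for _ in range(5)]
```

Because the parameters are redrawn for every imputation, the imputed values reflect both residual variability and parameter uncertainty. If the assumed normal linear model fits the observed data poorly, the imputed values inherit that misfit.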
In the literature, relatively little work has been done to investigate the robustness of MI to imputation model misspecification, and the conclusions are inconsistent. In MI with chained equations, the predictive distributions are specified directly and can thus be evaluated directly; in practice, however, often no effort is made to check the validity of the postulated model.
Previously, we proposed a fully Bayesian approach that allows simultaneous analysis and imputation by specifying the joint distribution of the response and all incomplete variables as a sequence (i.e., a product) of conditional distributions, one of which is the analysis model of interest. The posterior predictive distribution (PPD) used to draw imputations is derived from this joint distribution and generally does not follow any known distribution. Hence, direct evaluation of its fit to the data is not possible.
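As a sketch of this factorisation, in notation that is assumed here rather than taken from the abstract (response $y$, incomplete covariates $x_1, \ldots, x_p$, completely observed covariates $x_c$, parameters $\theta$), the joint distribution and the resulting full conditional of a missing value of $x_k$ can be written as:

```latex
\begin{align*}
p(y, x_1, \ldots, x_p \mid x_c, \theta)
  &= p(y \mid x_1, \ldots, x_p, x_c, \theta_y)
     \prod_{k=1}^{p} p(x_k \mid x_1, \ldots, x_{k-1}, x_c, \theta_k), \\
p(x_k^{\mathrm{mis}} \mid \cdot)
  &\propto p(y \mid x_1, \ldots, x_p, x_c, \theta_y)\,
     p(x_k \mid x_1, \ldots, x_{k-1}, x_c, \theta_k)
     \prod_{l > k} p(x_l \mid x_1, \ldots, x_{l-1}, x_c, \theta_l).
\end{align*}
```

Every factor in which $x_k$ appears contributes to the distribution from which it is imputed, so the product is in general not a member of a standard family.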
In our current work, we hypothesise that, for this sequential imputation to provide valid results, all conditional distributions involved in the PPD must be specified correctly. We investigate whether the severity of the bias introduced by misspecification depends on the conditional distribution in which it occurs, e.g., the analysis model or the conditional distribution of the variable to be imputed. We consider misspecification of both the shape and the mean structure of a distribution, as may occur when skewness or multimodality is ignored, important interaction effects are omitted, or associations are wrongly assumed to be linear. Findings are contrasted with results on the robustness of MI, and recommendations for evaluating model fit are made.

Jul 12, 2018
Barcelona, Spain