Bayesian Imputation of Missing Covariates


Missing values are a pervasive problem in almost all kinds of studies. In large cohort studies, the type of study most often conducted in epidemiology, missing observations in covariates pose a major challenge. Because measurements are taken in an uncontrolled environment, many covariates typically need to be considered as potential confounders, to filter out unwanted influences that environmental factors may have on the estimates of interest. Because a large number of variables are measured, and measurement often relies on participants recalling and reporting detailed information, large proportions of missing data are common in these studies.

The research in this thesis therefore focuses on the analysis of incomplete cohort study data in which the missingness is in the covariates.

We describe a fully Bayesian approach to analysing and imputing data in this setting, and discuss a number of naive and more sophisticated approaches to imputing such data using multiple imputation with chained equations (MICE). The fully Bayesian approach is used in several applications from the field of epidemiology and is further extended to settings with time-varying covariates, in which additional challenges arise, such as the functional form of the association between outcome and covariate, and potential endogeneity.

Moreover, the implementation of the fully Bayesian approach in the R package JointAI is described and illustrated by means of various examples.

Read the HTML version (created with bookdown)