Preface

R packages

This practical uses R and a number of R packages. The versions used to generate the solutions are:

  • R version 3.6.1 (2019-07-05)
  • mice (version: 3.6.0)
  • JointAI (version: 0.6.0)
  • ggplot2 (version: 3.2.1)
  • reshape2 (version: 1.4.3)
  • ggpubr (version: 0.2.2)

Dataset

For this practical, we will use the NHANES3 data, another subset of the data we have already seen in the lecture slides and the previous practicals. It contains only those cases for which wgt is observed, and columns that are not needed have been excluded.

To load this dataset, you can use load(file.choose()). The command file.choose() opens the file explorer, lets you navigate to the location of the file NHANES3_for_practicals.RData on your computer, and returns the selected path, which load() then reads. If you know the path to the file, you can also use load("<path>/NHANES3_for_practicals.RData") directly.
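As a minimal sketch of how load() behaves, the following example saves a small toy data frame to a temporary file and loads it again. The object toy and the file name toy_example.RData are hypothetical stand-ins for NHANES3 and NHANES3_for_practicals.RData. The point to note is that load() restores objects under the names they were saved with and only returns those names, so the data should not be assigned from its return value.

```r
# Toy stand-in for the practical's file (hypothetical data, temporary file)
toy <- data.frame(wgt = c(70.2, 81.5), age = c(34, 57))
path <- file.path(tempdir(), "toy_example.RData")
save(toy, file = path)

# Remove the object so that we can see load() bring it back
rm(toy)

# load() restores the saved objects by name and returns their names
loaded <- load(path)
print(loaded)

# The data frame is available again under its original name
str(toy)
```

For the practical's file, the same check would presumably show that an object called NHANES3 has been created.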

Aim

The focus of this practical is the imputation of data with features that require special attention.

In the interest of time, we will focus on these features and abbreviate steps that are the same as in any imputation setting (e.g., getting to know the data or checking that imputed values are realistic). Nevertheless, these steps are of course required when analysing data in practice.

Our aim is to fit the following linear regression model for weight:

lm(wgt ~ gender + bili + age * (chol + HDL) + hgt)

We expect that the effects of cholesterol and HDL may differ with age and, hence, include interaction terms between age and each of chol and HDL.
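To see which terms the formula above implies, the following sketch expands it with terms() from base R. No data is needed for this. The object names fmla and term_labels are introduced here for illustration only.

```r
# Formula of the analysis model from the practical
fmla <- wgt ~ gender + bili + age * (chol + HDL) + hgt

# terms() expands the formula; "term.labels" lists every term in the model
term_labels <- attr(terms(fmla), "term.labels")
print(term_labels)

# The two interaction terms that need special attention during imputation
interactions <- grep(":", term_labels, value = TRUE, fixed = TRUE)
print(interactions)
```

The expansion contains the main effects of age, chol and HDL together with the interactions age:chol and age:HDL. These two products are the terms that the imputation has to take into account.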

Additionally, we want to include the other variables in the dataset as auxiliary variables.

Imputation using mice

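The Just Another Variable (JAV) approach, in which the interaction terms are calculated before imputation and then imputed like any other variable, can reduce bias in some settings. Alternatively, we can use passive imputation, i.e., re-calculate the interaction terms from the imputed values in each iteration of the MICE algorithm. Furthermore, predictive mean matching tends to lead to less bias than imputation with normal (linear regression) models.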
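One property that makes predictive mean matching attractive is that it only imputes values that were actually observed, whereas a normal model can produce values outside the plausible range. The following sketch illustrates this on simulated data. The toy data frame, its variables x and z, the seed and the number of deleted values are all assumptions made for this illustration and are not part of the NHANES3 data.

```r
library(mice)

set.seed(2019)
n <- 200

# Hypothetical toy data: a right-skewed, strictly positive variable x
# (loosely mimicking a lab value) and a complete covariate z
z <- rnorm(n)
x <- exp(0.5 * z + rnorm(n, sd = 0.5))
toy <- data.frame(x = x, z = z)
toy$x[sample(n, 60)] <- NA   # delete 60 values completely at random

# Same data, two imputation methods for x (z is complete, so no method)
imp_norm <- mice(toy, method = c("norm", ""), m = 5, maxit = 5,
                 seed = 2019, printFlag = FALSE)
imp_pmm  <- mice(toy, method = c("pmm", ""), m = 5, maxit = 5,
                 seed = 2019, printFlag = FALSE)

# Imputed values are stored in imp$imp$<variable>
# (rows: incomplete cases, columns: the m imputations)
range(unlist(imp_norm$imp$x))  # may extend below zero
range(unlist(imp_pmm$imp$x))   # stays within the observed range of x
range(toy$x, na.rm = TRUE)
```

This does not demonstrate the bias properties mentioned above. It only shows why pmm is a convenient choice for variables such as bili, where negative imputed values would not be realistic.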

Just Another Variable approach

Task 1

  • Calculate the interaction terms in the incomplete data.
  • Perform the setup-run of mice() without any iterations.

Solution 1

# calculate the interaction terms
NHANES3$agechol <- NHANES3$age * NHANES3$chol
NHANES3$ageHDL <- NHANES3$age * NHANES3$HDL

# setup run (requires the mice package)
library(mice)
imp0 <- mice(NHANES3, maxit = 0,
             defaultMethod = c('norm', 'logreg', 'polyreg', 'polr'))
imp0
## Class: mids
## Number of multiple imputations:  5 
## Imputation methods:
##      wgt   gender     bili      age     chol      HDL      hgt     educ     race      SBP   hypten 
##       ""       ""   "norm"       ""   "norm"   "norm"   "norm"   "polr"       ""   "norm" "logreg" 
##       WC  agechol   ageHDL 
##   "norm"   "norm"   "norm" 
## PredictorMatrix:
##        wgt gender bili age chol HDL hgt educ race SBP hypten WC agechol ageHDL
## wgt      0      1    1   1    1   1   1    1    1   1      1  1       1      1
## gender   1      0    1   1    1   1   1    1    1   1      1  1       1      1
## bili     1      1    0   1    1   1   1    1    1   1      1  1       1      1
## age      1      1    1   0    1   1   1    1    1   1      1  1       1      1
## chol     1      1    1   1    0   1   1    1    1   1      1  1       1      1
## HDL      1      1    1   1    1   0   1    1    1   1      1  1       1      1

Task 2

Apply the necessary changes to the imputation method and predictor matrix.

Since the interaction terms are calculated from the original variables, these interaction terms should not be used to impute the original variables.

Solution 2

meth <- imp0$method 
pred <- imp0$predictorMatrix

# change imputation for "bili" to pmm (to prevent negative values)
meth["bili"] <- 'pmm'
 
# changes in predictor matrix to prevent original variables being imputed based
# on the interaction terms
pred["chol", "agechol"] <- 0
pred["HDL", "ageHDL"] <- 0

meth
##      wgt   gender     bili      age     chol      HDL      hgt     educ     race      SBP   hypten 
##       ""       ""    "pmm"       ""   "norm"   "norm"   "norm"   "polr"       ""   "norm" "logreg" 
##       WC  agechol   ageHDL 
##   "norm"   "norm"   "norm"
pred
##         wgt gender bili age chol HDL hgt educ race SBP hypten WC agechol ageHDL
## wgt       0      1    1   1    1   1   1    1    1   1      1  1       1      1
## gender    1      0    1   1    1   1   1    1    1   1      1  1       1      1
## bili      1      1    0   1    1   1   1    1    1   1      1  1       1      1
## age       1      1    1   0    1   1   1    1    1   1      1  1       1      1
## chol      1      1    1   1    0   1   1    1    1   1      1  1       0      1
## HDL       1      1    1   1    1   0   1    1    1   1      1  1       1      0
## hgt       1      1    1   1    1   1   0    1    1   1      1  1       1      1
## educ      1      1    1   1    1   1   1    0    1   1      1  1       1      1
## race      1      1    1   1    1   1   1    1    0   1      1  1       1      1
## SBP       1      1    1   1    1   1   1    1    1   0      1  1       1      1
## hypten    1      1    1   1    1   1   1    1    1   1      0  1       1      1
## WC        1      1    1   1    1   1   1    1    1   1      1  0       1      1
## agechol   1      1    1   1    1   1   1    1    1   1      1  1       0      1
## ageHDL    1      1    1   1    1   1   1    1    1   1      1  1       1      0

Task 3

Run the imputation using the JAV approach and check the traceplot.

Solution 3

# run imputation with the JAV approach
impJAV <- mice(NHANES3, method = meth, predictorMatrix = pred,
                maxit = 10, m = 5)

plot(impJAV, layout = c(4, 6))
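A consequence of the JAV approach that is worth checking is that an imputed interaction term is generally not equal to the product of the imputed original variables. The sketch below illustrates the check on the small nhanes data that ships with mice, not on NHANES3. The variable agechl, the object names and the seed are assumptions made for this illustration.

```r
library(mice)

# Built-in example data from mice: age is complete, chl has missing values
dat <- nhanes
dat$agechl <- dat$age * dat$chl    # JAV: the product is NA wherever chl is NA

# Setup run, then prevent chl from being imputed based on its own product
imp0 <- mice(dat, maxit = 0)
pred <- imp0$predictorMatrix
pred["chl", "agechl"] <- 0

imp <- mice(dat, predictorMatrix = pred, m = 5, maxit = 10,
            seed = 2019, printFlag = FALSE)

# Compare the imputed product with the product of the imputed values
comp <- complete(imp, 1)
was_missing <- is.na(dat$chl)
mismatch <- abs(comp$agechl - comp$age * comp$chl) > 1e-8

# Cross-tabulate: mismatches can only occur in rows where chl was imputed
table(was_missing, mismatch)
```

The same comparison could be applied to impJAV by replacing agechl with agechol and ageHDL. Such inconsistencies are expected with JAV, because the interaction terms are treated as separate variables rather than as functions of the original ones.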