In this practical, a number of R packages are used. The packages used (with versions that were used to generate the solutions) are:

- R version 3.6.1 (2019-07-05)
- `mice` (version: 3.6.0)
- `JointAI` (version: 0.6.0)
- `ggplot2` (version: 3.2.1)
- `reshape2` (version: 1.4.3)
- `ggpubr` (version: 0.2.2)
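To compare your own setup with the versions listed above, a small sketch like the following can be used. It reports the installed version of each package and returns `NA` for packages that are not installed. The helper `installed_version` is a hypothetical name introduced only for this illustration.

```r
# packages used in this practical (taken from the list above)
pkgs <- c("mice", "JointAI", "ggplot2", "reshape2", "ggpubr")

# hypothetical helper: installed version as a string, or NA if not installed
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions <- vapply(pkgs, installed_version, character(1))
R.version.string  # the R version in use
versions
```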

For this practical, we will use the **NHANES3** data, another subset of the data we have already seen in the lecture slides and the previous practicals. It contains only those cases with an observed value of `wgt`, and some columns that are not needed were excluded.

To load this dataset, you can use the command `load(file.choose())`: `file.choose()` opens the explorer and allows you to navigate to the location of the file `NHANES3_for_practicals.RData` on your computer. If you know the path to the file, you can also use `load("<path>/NHANES3_for_practicals.RData")`.
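`load()` restores the objects stored in an `.RData` file into the workspace and invisibly returns their names. The sketch below illustrates this with a small stand-in data frame saved to a temporary file; the object `toy` and its values are hypothetical and are not the NHANES3 data.

```r
# create a small, hypothetical stand-in data set and save it to a temporary .RData file
toy <- data.frame(wgt = c(70.2, 81.5, 64.0), age = c(45, 60, 33))
path <- tempfile(fileext = ".RData")
save(toy, file = path)

# remove it from the workspace, then restore it with load()
rm(toy)
loaded <- load(path)  # invisibly returns the names of the restored objects
loaded
head(toy)
```

With the practical's file, the same call is made with the path returned by `file.choose()` or with the known path to `NHANES3_for_practicals.RData`.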

The focus of this practical is the imputation of data with features that require special attention.

In the interest of time, we will focus on these features and abbreviate steps that are the same as in any imputation setting (e.g., getting to know the data or checking that imputed values are realistic). **Nevertheless, these steps are of course required when analysing data in practice.**

Our aim is to fit the following **linear regression model for weight**:

We expect that the effects of cholesterol and HDL may differ with age and, hence, include **interaction terms** between `age` and `chol` and between `age` and `HDL`.
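In an R model formula, these interaction terms can be written compactly: `age * (chol + HDL)` expands to the main effects of `age`, `chol` and `HDL` plus the products `age:chol` and `age:HDL`. The sketch below only shows this expansion using `terms()`; the other covariates of the analysis model are deliberately left out.

```r
# formula containing only the interaction part of the analysis model
f <- wgt ~ age * (chol + HDL)

# list the terms R creates from this formula
term_labels <- attr(terms(f), "term.labels")
term_labels
```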

Additionally, we want to include the other variables in the dataset as auxiliary variables.

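Using the *Just Another Variable* (JAV) approach, i.e., imputing the interaction terms as if they were ordinary incomplete variables, can reduce bias in some settings. Alternatively, we can use *passive imputation*, i.e., recalculate the interaction terms from the current imputed values of their components in each iteration of the MICE algorithm. Furthermore, *predictive mean matching* tends to lead to less bias than imputation using normal linear models.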
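For comparison with the JAV approach used below, here is a minimal sketch of passive imputation in `mice`, assuming the package is installed. It uses the small `nhanes` data set that ships with `mice` (variables `age`, `bmi`, `hyp`, `chl`) as a stand-in for NHANES3; the interaction name `agechl` and the settings are illustrative. A method string starting with `~` tells `mice` to recompute the variable from the formula instead of imputing it.

```r
library(mice)

# mice's built-in example data as a small stand-in for NHANES3
dat <- nhanes
dat$agechl <- dat$age * dat$chl    # interaction; NA wherever chl is missing

# default methods (pmm for the incomplete numeric variables)
meth <- make.method(dat)
# passive imputation: agechl is recomputed from the current age and chl values
meth["agechl"] <- "~ I(age * chl)"

pred <- make.predictorMatrix(dat)
pred["chl", "agechl"] <- 0         # do not impute chl from its own interaction

imp_pas <- mice(dat, method = meth, predictorMatrix = pred,
                m = 2, maxit = 5, seed = 123, printFlag = FALSE)

# in the completed data the interaction equals the product of its components
cd <- complete(imp_pas, 1)
all.equal(cd$agechl, cd$age * cd$chl)
```

Passive imputation keeps the interaction consistent with its components, whereas with JAV the imputed interaction is generally not equal to the product of the imputed components.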

- Calculate the interaction terms in the incomplete data.
- Perform the setup-run of `mice()` without any iterations.

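```
library(mice)

# calculate the interaction terms (missing whenever one of the components is missing)
NHANES3$agechol <- NHANES3$age * NHANES3$chol
NHANES3$ageHDL <- NHANES3$age * NHANES3$HDL

# setup run: no iterations, only to obtain the default methods and predictor matrix
imp0 <- mice(NHANES3, maxit = 0,
             defaultMethod = c('norm', 'logreg', 'polyreg', 'polr'))
imp0
```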

```
## Class: mids
## Number of multiple imputations: 5
## Imputation methods:
## wgt gender bili age chol HDL hgt educ race SBP hypten
## "" "" "norm" "" "norm" "norm" "norm" "polr" "" "norm" "logreg"
## WC agechol ageHDL
## "norm" "norm" "norm"
## PredictorMatrix:
## wgt gender bili age chol HDL hgt educ race SBP hypten WC agechol ageHDL
## wgt 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## gender 1 0 1 1 1 1 1 1 1 1 1 1 1 1
## bili 1 1 0 1 1 1 1 1 1 1 1 1 1 1
## age 1 1 1 0 1 1 1 1 1 1 1 1 1 1
## chol 1 1 1 1 0 1 1 1 1 1 1 1 1 1
## HDL 1 1 1 1 1 0 1 1 1 1 1 1 1 1
```
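The setup run assigns an imputation method to `agechol` and `ageHDL` because a product is missing whenever one of its factors is missing. A minimal base-R illustration with hypothetical values:

```r
# hypothetical values with one missing chol and one missing HDL
age  <- c(40, 55, 62, 71)
chol <- c(5.2, NA, 6.1, 4.8)
HDL  <- c(1.3, 1.1, NA, 1.5)

agechol <- age * chol   # NA where chol is NA
ageHDL  <- age * HDL    # NA where HDL is NA

# the interaction terms are missing exactly where one of their components is missing
data.frame(age, chol, HDL, agechol, ageHDL)
colSums(is.na(cbind(agechol, ageHDL)))
```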

Apply the necessary changes to the imputation methods and the predictor matrix.

Since the interaction terms are calculated from the original variables, they should not be used to impute these original variables.

```
meth <- imp0$method
pred <- imp0$predictorMatrix

# change imputation for "bili" to pmm (to prevent negative values)
meth["bili"] <- 'pmm'

# changes in the predictor matrix to prevent the original variables from being
# imputed based on the interaction terms that contain them
pred["chol", "agechol"] <- 0
pred["HDL", "ageHDL"] <- 0

meth
pred
```

```
## wgt gender bili age chol HDL hgt educ race SBP hypten
## "" "" "pmm" "" "norm" "norm" "norm" "polr" "" "norm" "logreg"
## WC agechol ageHDL
## "norm" "norm" "norm"
```

```
## wgt gender bili age chol HDL hgt educ race SBP hypten WC agechol ageHDL
## wgt 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## gender 1 0 1 1 1 1 1 1 1 1 1 1 1 1
## bili 1 1 0 1 1 1 1 1 1 1 1 1 1 1
## age 1 1 1 0 1 1 1 1 1 1 1 1 1 1
## chol 1 1 1 1 0 1 1 1 1 1 1 1 0 1
## HDL 1 1 1 1 1 0 1 1 1 1 1 1 1 0
## hgt 1 1 1 1 1 1 0 1 1 1 1 1 1 1
## educ 1 1 1 1 1 1 1 0 1 1 1 1 1 1
## race 1 1 1 1 1 1 1 1 0 1 1 1 1 1
## SBP 1 1 1 1 1 1 1 1 1 0 1 1 1 1
## hypten 1 1 1 1 1 1 1 1 1 1 0 1 1 1
## WC 1 1 1 1 1 1 1 1 1 1 1 0 1 1
## agechol 1 1 1 1 1 1 1 1 1 1 1 1 0 1
## ageHDL 1 1 1 1 1 1 1 1 1 1 1 1 1 0
```
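Predictive mean matching prevents negative values for `bili` because it does not draw imputed values from a normal distribution; instead, it copies observed values from donors whose predicted values are close to the prediction for the incomplete case. The following is a simplified sketch of this idea in base R, using hypothetical data, a fixed number of donors `k` and no draw of the regression parameters, so it is not the exact algorithm implemented in `mice`.

```r
set.seed(1)

# hypothetical skewed, positive outcome (similar to bilirubin) and one complete predictor
n <- 200
x <- rnorm(n)
y <- exp(0.5 * x + rnorm(n, sd = 0.6))
y[sample(n, 40)] <- NA                 # make 40 values missing
mis <- is.na(y)

# linear regression on the observed cases and predictions for all cases
fit  <- lm(y ~ x, subset = !mis)
yhat <- predict(fit, newdata = data.frame(x = x))

# simplified pmm: for each missing case, pick one of the k observed cases with the
# closest predicted value and copy its observed y
k <- 5
obs_idx <- which(!mis)
y_imp <- y
for (i in which(mis)) {
  d <- abs(yhat[obs_idx] - yhat[i])
  donors <- obs_idx[order(d)[1:k]]
  y_imp[i] <- y[donors[sample.int(k, 1)]]
}

# all imputed values are observed values, hence never negative
range(y_imp[mis])
all(y_imp[mis] %in% y[!mis])
```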

Run the imputation using the **JAV approach** and check the traceplot.
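A minimal sketch of the JAV approach, again using the `nhanes` data that ships with `mice` as a stand-in for NHANES3 (the interaction `agechl` and the settings are illustrative assumptions): the interaction is imputed with `norm` like any other incomplete variable, and `plot()` applied to the resulting `mids` object draws the trace plots of the means and standard deviations of the imputed values across iterations.

```r
library(mice)

# stand-in data: mice's nhanes with an age x chl interaction, imputed as JAV
dat <- nhanes
dat$agechl <- dat$age * dat$chl

meth <- make.method(dat)
meth[c("bmi", "chl", "agechl")] <- "norm"   # JAV: the interaction is imputed directly
pred <- make.predictorMatrix(dat)
pred["chl", "agechl"] <- 0                  # do not impute chl from its interaction

imp_jav <- mice(dat, method = meth, predictorMatrix = pred,
                m = 5, maxit = 10, seed = 2019, printFlag = FALSE)

# trace plots to check convergence
plot(imp_jav)

# with JAV, the imputed interaction need not equal age * chl
cd <- complete(imp_jav, 1)
summary(cd$agechl - cd$age * cd$chl)
```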