In this practical, a number of R packages are used. The packages used (with the versions that were used to generate the solutions) are:
- mice (version: 3.6.0)
- JointAI (version: 0.6.0)

For this practical, we will use the NHANES2 dataset, a subset of the data we have seen in the lecture slides.
To load this dataset, you can use the command file.choose() inside load(), i.e. load(file.choose()). This opens the explorer and allows you to navigate to the location of the file NHANES2_for_practicals.RData on your computer. If you know the path to the file, you can also use load("<path>/NHANES2_for_practicals.RData"). RStudio users can also just click on the file in the “Files” pane/tab to load it.
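The loading step described above can be sketched as follows. This is only an illustration: the variable names data_file and loaded_objects are made up for this example, and "<path>" is the placeholder from the text that you need to replace with the actual folder on your computer.

```r
# "<path>" is a placeholder; replace it with the folder that contains the file.
data_file <- file.path("<path>", "NHANES2_for_practicals.RData")

if (file.exists(data_file)) {
  # load() restores the saved objects into the workspace and
  # (invisibly) returns their names as a character vector
  loaded_objects <- load(data_file)
  print(loaded_objects)
} else {
  # In an interactive session you could instead pick the file by hand:
  # load(file.choose())
  message("File not found: ", data_file)
}
```

Printing the names returned by load() is a quick way to check which objects were actually restored.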
Imputation needs to be tailored to the dataset at hand and, hence, using the function mice() well requires several arguments to be specified. To make the specification easier, it is useful to do a dry run, which will create the default versions of everything that needs to be specified.
These default settings can then be adapted to our data.
Do the set-up run of mice() with the NHANES2 data without any iterations (maxit = 0).
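One possible way to do such a set-up run is sketched below. The object names dat and imp0 are chosen for this example only. As an assumption to keep the sketch runnable on its own, it falls back to the nhanes2 data that ships with mice when NHANES2 has not been loaded; in the practical you would use NHANES2 itself.

```r
library(mice)

# Assumption: use the bundled mice::nhanes2 data as a stand-in
# if NHANES2 has not been loaded into the workspace.
dat <- if (exists("NHANES2")) NHANES2 else mice::nhanes2

# Set-up run: no iterations, only the default settings are created
imp0 <- mice(dat, maxit = 0)

# Default settings that can be adapted afterwards
imp0$method           # imputation method per variable ("" = nothing to impute)
imp0$predictorMatrix  # which variables are used as predictors for which
imp0$visitSequence    # order in which the variables are visited
```

The elements extracted from imp0 can be modified and passed back to mice() through its arguments of the same names.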
There are many imputation methods available in mice. You can find the list in the help page of the mice() function. We will focus here on the following ones:
name | variable type | description |
---|---|---|
pmm | any | Predictive mean matching |
norm | numeric | Bayesian linear regression |
logreg | binary | Logistic regression |
polr | ordered | Proportional odds model |
polyreg | unordered | Polytomous logistic regression |
The default imputation methods that mice() selects can be specified in the argument defaultMethod.
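As a minimal sketch of how defaultMethod could be used, the example below replaces the default for numeric variables with norm and leaves the defaults for factors unchanged. The names dat and imp_norm are illustrative, and the fallback to the bundled mice::nhanes2 data is an assumption made only so that the code runs without the practical's data file.

```r
library(mice)

# Assumption: use the bundled mice::nhanes2 data as a stand-in
# if NHANES2 has not been loaded into the workspace.
dat <- if (exists("NHANES2")) NHANES2 else mice::nhanes2

# defaultMethod takes four entries, in this order:
# numeric, binary factor, unordered factor (> 2 levels), ordered factor (> 2 levels)
imp_norm <- mice(dat, maxit = 0,
                 defaultMethod = c("norm", "logreg", "polyreg", "polr"))

# Methods that were selected per variable
imp_norm$method
```

Changing defaultMethod affects all variables of a given type at once; to change the method for a single variable, the method vector from the set-up run can be edited instead.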
If unspecified, mice will use
- pmm for numerical columns,
- logreg for factor columns with two categories,
- polyreg for columns with unordered factors with more than two categories, and
- polr for columns with ordered factors with more than two categories.

In the NHANES2 data we have the following variables: