Preface

R packages

In this practical, a number of R packages are used. The software used (with the versions that generated the solutions) is:

  • R version 3.6.1 (2019-07-05)
  • mice (version: 3.6.0)
  • (not essential) JointAI (version: 0.6.0)

Dataset

For this practical, we will use the NHANES2 dataset, a subset of the data we have seen in the lecture slides.

To load this dataset, you can use the command load(file.choose()). The call to file.choose() opens the file explorer, lets you navigate to the file NHANES2_for_practicals.RData on your computer, and returns its path, which load() then reads. If you know the path to the file, you can also use load("<path>/NHANES2_for_practicals.RData"). RStudio users can also just click on the file in the “Files” pane/tab to load it.
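As a quick check that loading worked, note that load() invisibly returns the names of the objects it restored. The sketch below illustrates this with a small stand-in file; the file name toy_example.RData and the object toy are hypothetical and only serve to make the example runnable without the practical's data.

```r
# Hypothetical stand-in for NHANES2_for_practicals.RData:
# write a toy .RData file to a temporary directory
tmp <- file.path(tempdir(), "toy_example.RData")
toy <- data.frame(x = c(1, NA, 3))
save(toy, file = tmp)
rm(toy)

# load() restores the saved objects into the workspace and
# (invisibly) returns their names as a character vector
loaded <- load(tmp)
print(loaded)

# The restored object is now available under its original name
str(toy)
```

With the real file, the same pattern (storing the result of load() and printing it) shows which objects, such as NHANES2, were added to the workspace.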

Preparing for imputation

Set-up run

Imputation needs to be tailored to the dataset at hand, so using the function mice() well requires several arguments to be specified. To make this specification easier, it is useful to do a dry run, which creates the default versions of everything that needs to be specified.

These default settings can then be adapted to our data.

Task

Do the set-up run of mice() with the NHANES2 data without any iterations (maxit = 0).

Solution

# Note: Apart from a possible warning about logged events, this command will not produce any output.
library(mice)
imp0 <- mice(NHANES2, maxit = 0)
## Warning: Number of logged events: 1
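The object returned by a set-up run holds the default settings that can later be adapted. The following is a minimal sketch of how to look at them. It uses the small nhanes2 data that ships with mice as a stand-in (an assumption, so that the example runs without the practical's file), and the name imp_demo is chosen only for illustration; with the practical's data you would inspect imp0 in the same way.

```r
library(mice)

# Stand-in data: nhanes2 from the mice package (NOT the practical's NHANES2)
imp_demo <- mice(nhanes2, maxit = 0)

# Default imputation method per variable ("" means nothing to impute)
imp_demo$method

# Default predictor matrix: rows are variables to be imputed,
# columns are the variables used as predictors (1 = used, 0 = not used)
imp_demo$predictorMatrix

# Order in which the variables are visited during imputation
imp_demo$visitSequence

# Logged events (NULL if mice found nothing to report)
imp_demo$loggedEvents
```

The warning about logged events in the solution above refers to the loggedEvents element, so printing imp0$loggedEvents is a natural first step for finding out what mice has flagged.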

Imputation method

There are many imputation methods available in mice. You can find the list in the help page of the mice() function. We will focus here on the following ones:

name      variable type   description
pmm       any             Predictive mean matching
norm      numeric         Bayesian linear regression
logreg    binary          Logistic regression
polr      ordered         Proportional odds model
polyreg   unordered       Polytomous logistic regression

The default imputation methods that mice() selects can be specified in the argument defaultMethod.

If unspecified, mice will use

  • pmm for numerical columns,
  • logreg for factor columns with two categories,
  • polyreg for unordered factor columns with more than two categories, and
  • polr for ordered factor columns with more than two categories.
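To illustrate how defaultMethod changes the automatically selected methods, here is a minimal sketch. It again uses the nhanes2 data shipped with mice as a stand-in for the practical's dataset (an assumption for the sake of a runnable example), and the object names imp_default and imp_norm are hypothetical.

```r
library(mice)

# Set-up run with the built-in defaults
# (pmm, logreg, polyreg, polr, in that order)
imp_default <- mice(nhanes2, maxit = 0)
imp_default$method

# Set-up run in which numeric columns default to Bayesian linear
# regression ("norm") instead of predictive mean matching ("pmm");
# the defaults for the three factor types are left unchanged
imp_norm <- mice(nhanes2, maxit = 0,
                 defaultMethod = c("norm", "logreg", "polyreg", "polr"))
imp_norm$method
```

The four entries of defaultMethod correspond, in order, to the four column types in the list above. Completely observed variables get an empty method (""), since nothing needs to be imputed for them.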

In the NHANES2 data we have the following variables:

par(mar = c(2,3,2,1), mgp = c(2, 0.6, 0))
JointAI::plot_all(NHANES2, nclass = 30)