This practical uses a number of R packages. The packages, with the versions that were used to generate the solutions, are:
mice (version: 3.6.0)
RColorBrewer (version: 1.1.2)
reshape2 (version: 1.4.3)
ggplot2 (version: 3.2.1)

For this practical, we will again use the NHANES2 dataset that we saw in the previous practical.
To load this dataset, you can use load(file.choose()): file.choose() opens a file browser that lets you navigate to the file NHANES2_for_practicals.RData on your computer and returns its path, which load() then reads. If you know the path to the file, you can also use load("<path>/NHANES2_for_practicals.RData"). RStudio users can also just click on the file in the “Files” pane/tab to load it.
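As a small illustration of how load() behaves, the sketch below saves a toy object to a temporary file and loads it back. The object name toy_data and the temporary file are made up for this example; with the course file you would pass its path (or file.choose()) instead.

```r
# Toy stand-in for an .RData file (hypothetical object, not the NHANES2 data)
toy_data <- data.frame(id = 1:3, x = c(2.5, NA, 4.1))
tmp <- tempfile(fileext = ".RData")
save(toy_data, file = tmp)

# Remove the object, then restore it from the file
rm(toy_data)
loaded <- load(tmp)   # load() returns the names of the restored objects

loaded                # names of the objects that were loaded
exists("toy_data")    # the object is back in the workspace
```

Because load() restores objects under the names they were saved with, it is worth checking which objects have appeared in the workspace after loading.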
The imputed data are stored in a mids object called imp that we created in the previous practical. If you are using RStudio, you can load it into your workspace by clicking on the file imps.RData. Alternatively, you can load this workspace using load("<path>/imps.RData"). You then need to run:
It is good practice to make sure that mice() has not done any processing of the data that was not planned or that you are not aware of. This means checking that the correct method, predictorMatrix and visitSequence were used.
Do these checks for imp.
## [1] TRUE
## [1] TRUE
## [1] TRUE
##             HDL            race            bili           smoke              DM          gender
##          "norm"              ""          "norm"          "polr"              ""              ""
##              WC            chol        HyperMed             alc             SBP             wgt
##          "norm"          "norm"              ""          "polr"          "norm"          "norm"
##          hypten          cohort           occup             age            educ            albu
##        "logreg"              ""       "polyreg"              ""          "polr"          "norm"
##           creat        uricacid             BMI         hypchol             hgt
##           "pmm"          "norm" "~I(wgt/hgt^2)"        "logreg"          "norm"
##          HDL race bili smoke DM gender WC chol HyperMed alc SBP wgt hypten cohort occup age educ
## HDL        0    1    1     1  1      1  1    1        0   1   1   0      1      0     1   1    1
## race       1    0    1     1  1      1  1    1        0   1   1   0      1      0     1   1    1
## bili       1    1    0     1  1      1  1    1        0   1   1   0      1      0     1   1    1
## smoke      1    1    1     0  1      1  1    1        0   1   1   0      1      0     1   1    1
## DM         1    1    1     1  0      1  1    1        0   1   1   0      1      0     1   1    1
## gender     1    1    1     1  1      0  1    1        0   1   1   0      1      0     1   1    1
## WC         1    1    1     1  1      1  0    1        0   1   1   0      1      0     1   1    1
## chol       1    1    1     1  1      1  1    0        0   1   1   0      1      0     1   1    1
## HyperMed   1    1    1     1  1      1  1    1        0   1   1   0      1      0     1   1    1
## alc        1    1    1     1  1      1  1    1        0   0   1   0      1      0     1   1    1
## SBP        1    1    1     1  1      1  1    1        0   1   0   0      1      0     1   1    1
## wgt        1    1    1     1  1      1  0    1        0   1   1   0      1      0     1   1    1
## hypten     1    1    1     1  1      1  1    1        0   1   1   0      0      0     1   1    1
## cohort     1    1    1     1  1      1  1    1        0   1   1   0      1      0     1   1    1
## occup      1    1    1     1  1      1  1    1        0   1   1   0      1      0     0   1    1
## age        1    1    1     1  1      1  1    1        0   1   1   0      1      0     1   0    1
## educ       1    1    1     1  1      1  1    1        0   1   1   0      1      0     1   1    0
## albu       1    1    1     1  1      1  1    1        0   1   1   0      1      0     1   1    1
## creat      1    1    1     1  1      1  1    1        0   1   1   0      1      0     1   1    1
## uricacid   1    1    1     1  1      1  1    1        0   1   1   0      1      0     1   1    1
## BMI        1    1    1     1  1      1  1    1        0   1   1   0      1      0     1   1    1
## hypchol    1    1    1     1  1      1  1    1        0   1   1   0      1      0     1   1    1
## hgt        1    1    1     1  1      1  1    1        0   1   1   1      1      0     1   1    1
##          albu creat uricacid BMI hypchol hgt
## HDL         1     1        1   1       1   0
## race        1     1        1   1       1   0
## bili        1     1        1   1       1   0
## smoke       1     1        1   1       1   0
## DM          1     1        1   1       1   0
## gender      1     1        1   1       1   0
## WC          1     1        1   1       1   0
## chol        1     1        1   1       0   0
## HyperMed    1     1        1   1       1   0
## alc         1     1        1   1       1   0
## SBP         1     1        1   1       1   0
## wgt         1     1        1   0       1   1
## hypten      1     1        1   1       1   0
## cohort      1     1        1   1       1   0
## occup       1     1        1   1       1   0
## age         1     1        1   1       1   0
## educ        1     1        1   1       1   0
## albu        0     1        1   1       1   0
## creat       1     0        1   1       1   0
## uricacid    1     1        0   1       1   0
## BMI         1     1        1   0       1   0
## hypchol     1     1        1   1       0   0
## hgt         1     1        1   0       1   0
## [1] "HDL" "race" "bili" "smoke" "DM" "gender" "WC" "chol"
## [9] "HyperMed" "alc" "SBP" "wgt" "hypten" "cohort" "occup" "age"
## [17] "educ" "albu" "creat" "uricacid" "hypchol" "hgt" "BMI"
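The method, predictorMatrix and visitSequence printed above are elements of the mids object and can be extracted with $. A minimal sketch, using the small nhanes2 dataset that ships with mice as a stand-in (an assumption made so that the example runs on its own; in the practical, the same three extractions are applied to imp):

```r
library(mice)

# Stand-in imputation on the built-in nhanes2 data (not the course's NHANES2)
imp_demo <- mice(nhanes2, m = 2, maxit = 2, seed = 2020, printFlag = FALSE)

imp_demo$method           # imputation method per variable ("" = not imputed)
imp_demo$predictorMatrix  # rows: variable being imputed, columns: predictors
imp_demo$visitSequence    # order in which the variables are visited
```

In the output for imp above, BMI has the passive method ~I(wgt/hgt^2) and is visited last, after wgt and hgt, so it is recomputed from their freshly imputed values.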
Checking the loggedEvents shows us whether mice() detected any problems during the imputation.
Check the loggedEvents for imp.
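A minimal sketch of this check, using the nhanes2 data shipped with mice as a stand-in for the course data (an assumption, so that the code runs on its own). If mice() logged nothing, loggedEvents is NULL; otherwise it is a data frame with one row per event.

```r
library(mice)

# Stand-in imputation; in the practical, inspect imp$loggedEvents instead
imp_demo <- mice(nhanes2, m = 2, maxit = 2, seed = 2020, printFlag = FALSE)

events <- imp_demo$loggedEvents
if (is.null(events)) {
  message("No logged events.")
} else {
  print(events)
}
```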
Let’s see what would have happened if we had not prepared the predictorMatrix, method and visitSequence before imputation.
Run the imputation without setting any additional arguments:
impnaive <- mice(NHANES2, m = 5, maxit = 10)
Take a look at the loggedEvents of impnaive.
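To illustrate what logged events can look like, the sketch below adds an artificial constant column to the nhanes2 data that ships with mice and runs mice() with default settings. The column const and the object names are hypothetical and only serve this example; the events logged for impnaive will differ.

```r
library(mice)

# Built-in example data plus a made-up constant column
dat <- nhanes2
dat$const <- 1

# Default settings; mice() would otherwise warn about the number of logged events
imp_const <- suppressWarnings(
  mice(dat, m = 2, maxit = 2, seed = 2020, printFlag = FALSE)
)

ev <- imp_const$loggedEvents
ev              # data frame with one row per logged event
table(ev$meth)  # number of events by type
```

In the practical, the equivalent check is impnaive$loggedEvents.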