Preface

R packages

In this practical, a number of R packages are used. If any of them are not installed you may be able to follow the practical but will not be able to run all of the code. The packages used (with versions that were used to generate the solutions) are:

  • R version 3.6.1 (2019-07-05)
  • mice (version: 3.6.0)
  • nlme (version: 3.1.140)
  • JointAI (version: 0.6.0)
  • splines (version: 3.6.1)
  • ggplot2 (version: 3.2.1)

Dataset

In this practical we will work with data from a trial on primary biliary cirrhosis (PBC) of the liver.

To get the pbclong data, load the file pbclong.RData. You can download it here. To load this dataset, you can use the command file.choose() which opens the explorer and allows you to navigate to the location of the downloaded file on your computer. If you know the path to the file, you can also use load("<path>/<pbclong.RData").

The variables contained in the dataset pbclong are:

id patient identifier
day continuously measured day of follow-up time (the time variable)
sex patients’ sex (f: female, m: male)
trt treatment group (0: D-penicillmain, 1: placebo)
age patients’ age at intake
ascites presence of ascites at baseline (0:no, 1:yes)
hepato presence of hepatomegaly or enlarged liver
bili serum bilirubin level at baseline
copper urine copper (ug/day)
albumin serum albumin level at follow-up (time-varying)

The variables have the following distributions and proportions of missing values:

The missing data pattern is:


The longitudinal outcome albumin shows relatively linear trajectories over time:

To analyse the trajectories of albumin we want to use the following linear mixed effects model with random intercept and slope (either using lme from the package nlme or using lmer from the package lme4):

# using package nlme
lme(albumin ~ day + sex + trt + age + ascites + bili + copper, random = ~day|id)


# using package lme4
lmer(albumin ~ day + sex + trt + age + ascites + bili + copper + (day|id))

Imputation using mice

For imputation of longitudinal data, the mice package provides special imputation methods that take into account the two-level structure of the data.

In the pbclong data, missing values occur in baseline covariates ascites and copper, and the time-varying variable hepato.

For imputation of baseline covariates, the imputation methods

  • 2lonly.pmm and
  • 2lonly.norm

are available.

Time-varying variables can be imputed with imputation methods such as 2l.norm or 2l.bin. Since the imputation of hepato with 2l.bin takes quite some time, we will omit it in this part of the practical.

The predictorMatrix requires some extra specification to identify the “id” variable (set to -2) and the random effects structure (set to 2).

Task 1

Run the setup imputation and perform the necessary changes in the imputation method and predictor matrix.

mice does not recognize automatically that the data are multil-level, hence the settings chosen by default are not correct. You need to specify imputation methods for all incomplete variables.

Solution 1

micelong0 <- mice(pbclong, maxit = 0)
meth_micelong <- micelong0$method
pred_micelong <- micelong0$predictorMatrix

# don't impute hepato
meth_micelong[c("hepato")] <- ""
# exclude hepato from predictor of other models (because incomplete)
pred_micelong[, c("hepato")] <- 0

meth_micelong[c("copper", "ascites")] <- "2lonly.pmm"

pred_micelong[, "id"] <- -2
pred_micelong[, "day"] <- 2


# check the imputation method
meth_micelong
##           id          trt          age          sex          day      ascites       hepato 
##           ""           ""           ""           ""           "" "2lonly.pmm"           "" 
##         bili      albumin       copper 
##           ""           "" "2lonly.pmm"
# check the predictor matrix
pred_micelong
##         id trt age sex day ascites hepato bili albumin copper
## id      -2   1   1   1   2       1      0    1       1      1
## trt     -2   0   1   1   2       1      0    1       1      1
## age     -2   1   0   1   2       1      0    1       1      1
## sex     -2   1   1   0   2       1      0    1       1      1
## day     -2   1   1   1   2       1      0    1       1      1
## ascites -2   1   1   1   2       0      0    1       1      1
## hepato  -2   1   1   1   2       1      0    1       1      1
## bili    -2   1   1   1   2       1      0    0       1      1
## albumin -2   1   1   1   2       1      0    1       0      1
## copper  -2   1   1   1   2       1      0    1       1      0

Task 2

  • run the imputation with the adapted imputation method meth_micelong and predictor matrix pred_micelong
  • analyse the imputed data
  • pool the results

Solution 2

micelong <- mice(pbclong, meth = meth_micelong, pred = pred_micelong,
                 maxit = 20, seed = 2019, printFlag = FALSE)

library(nlme)
micelong_mira <- with(micelong, lme(albumin ~ day + sex + trt + age + ascites + 
                                      bili + copper, random = ~day|id)
)

# alternative:
# library(lme4)
# micelong_mira <- with(micelong, lmer(albumin ~ day + sex + trt + age + ascites +
#                                        bili + copper + (day|id))
# )

summary(pool(micelong_mira), conf.int = TRUE)

Imputation using JointAI

To analyse incomplete longitudinal data using a linear mixed model the R package JointAI provides the function lme_imp(). The specification of the main model components is analogous to the function lme() from the nlme package.

Specification of longitudinal models:
When imputing variables in a longitudinal (or other multi-level) model and there are missing values in baseline (level-2) covariates, models need to be specified for all longitudinal covariates, even if they do not have missing values. Specifying no model would imply that the incomplete baseline covariates are independent of the complete longitudinal variable (see also here). Therefore, JointAI automatically specifies models for all longitudinal covariates in such a setting.

An exception may be the time variable: it is often reasonable to assume that the baseline covariates are independent of the measurement times of the outcome and longitudinal covariates. To tell JointAI not to specify a model for the time variable, the argument no_model can be used.

Model types for longitudinal covariates:
For longitudinal covariates the following model types are implemented:
name model variable type
lmm linear mixed model continuous variables
glmm_logit logistic mixed model factors with two levels
glmm_gamma gamma mixed model (with log-link) skewed continuous variables
glmm_poisson poisson mixed model count variables
clmm cumulative logit mixed model ordered factors with >2 levels

More info:
For the specification of the other arguments of lme_imp(), refer to

Task

Run the imputation (start with n.iter = 500; this will take a few seconds).

  • Remember to specify appropriate models for the incomplete covariates and longitudinal variables.
  • Prevent specification of a model for age.
  • Check convergence using a traceplot().
  • If you are satisfied by convergence and mixing of the chains, get the model summary().

Solution

library(JointAI)
JointAIlong <- lme_imp(albumin ~ day + sex + trt + age + ascites + 
                         bili + copper, random = ~day|id,
                       models = c(copper = 'lognorm'),
                       no_model = 'day', data = pbclong, n.iter = 500, seed = 2019)

traceplot(JointAIlong)
summary(JointAIlong)

## 
##  Linear mixed model fitted with JointAI 
## 
## Call:
## lme_imp(fixed = albumin ~ day + sex + trt + age + ascites + bili + 
##     copper, data = pbclong, random = ~day | id, n.adapt = 100, 
##     n.iter = 500, models = c(copper = "lognorm"), no_model = "day", 
##     seed = 2019)
## 
## Posterior summary:
##                  Mean       SD      2.5%     97.5% tail-prob. GR-crit
## (Intercept)  4.180254 1.25e-01  3.932808  4.420392     0.0000    1.23
## sexf        -0.115628 6.03e-02 -0.231414 -0.002686     0.0453    1.53
## trt1         0.004183 3.82e-02 -0.072827  0.076834     0.9227    1.06
## age         -0.007347 1.95e-03 -0.011078 -0.003536     0.0000    1.01
## ascites1    -0.180464 8.83e-02 -0.357131 -0.006822     0.0387    1.01
## copper      -0.000905 2.63e-04 -0.001443 -0.000372     0.0000    1.21
## day         -0.000197 1.41e-05 -0.000224 -0.000170     0.0000    1.01
## bili        -0.023937 2.46e-03 -0.028894 -0.019229     0.0000    1.01
## 
## Posterior summary of random effects covariance matrix:
##          Mean      SD   2.5%  97.5% tail-prob. GR-crit
## D[1,1] 0.0992 0.01078 0.0803 0.1214               1.01
## D[1,2] 0.0166 0.00525 0.0072 0.0278          0    1.01
## D[2,2] 0.0196 0.00455 0.0122 0.0297               1.05
## 
## Posterior summary of residual std. deviation:
##               Mean      SD  2.5% 97.5% GR-crit
## sigma_albumin 0.32 0.00594 0.309 0.332    1.01
## 
## 
## MCMC settings:
## Iterations = 101:600
## Sample size per chain = 500 
## Thinning interval = 1 
## Number of chains = 3 
## 
## Number of observations: 1945 
## Number of groups: 312

Additional exercise JointAI

We want to fit a logistic mixed model for the variable hepato and explore if the association is non-linear over time.

Task 1

  • Fit a logistic mixed model using the function glme_imp using the same covariates as before.
  • Specify a natural cubic spline with 3 degrees of freedom for day.
  • Check convergence using a traceplot().

When specifying a generalized (mixed) model remember to specify the model family and link function.

To use natural cubic splines use the function ns() from the package splines, i.e., ns(day, df = 3).

Solution 1

library(splines)
JointAIlong2 <- glme_imp(hepato ~ ns(day, df = 3) + sex + trt + age + ascites +
                           bili + copper, random = ~1|id, family = binomial(),
                         models = c(copper = 'lognorm'),
                         no_model = 'day', data = pbclong, n.iter = 1000,
                         seed = 2019)

traceplot(JointAIlong2)

Task 2

When the model has converged, we want to visualize the potentially non-linear association of day. To do that, we can create a new dataset containing information on an “average” subject, with different values for day.

The function predDF() creates such a dataset from an object of class JointAI. It sets reference values (i.e., the median for continuous variables and the reference category for categorical variables) for all variables other than the one specified in the argument var. The variable given in var will range across the range of values of that variable encountered in the data.

Use predDF() to create a dataset that allows visualization of the effect of day.

Solution 2

newdf <- predDF(JointAIlong2, var = 'day')

head(newdf)

Task 3

We can now predict the outcome of our model for our “average” subject using the function predict(). It takes a JointAI object and a data.frame containing the data to predict from as arguments. The argument quantiles can be used to specify which quantiles of the distribution of each fitted value are returned (default is 2.5% and 97.5%).

predict() returns a list with the following elements

  • dat: the data.frame provided by the user extended with the fitted values and 2.5% and 97.5% quantiles that form the credible interval for the fitted values
  • fit: a vector containing the fitted values (the mean of the distribution of the fitted value)
  • quantiles: a matrix containing the credible interval for each fitted value
  • Use predict() to obtain the fitted values and corresponding intervals
  • Visualize the result by plotting fitted values and quantiles (y-axis) over time (day; x-axis)

Solution 3

pred <- predict(JointAIlong2, newdata = newdf)

ggplot(pred$dat, aes(x = day, y = fit)) +
  geom_ribbon(aes(ymin = `2.5%`, ymax = `97.5%`), alpha = 0.3) +
  geom_line()

Note:
The fitted values and quantiles are on the scale of the linear predictor, i.e., obtained by multiplying the data in newdf (\(\mathbf x\)) with the samples of the posterior distribution of the parameters (\(\boldsymbol \beta\)).

For a logistic model it is more intuitive to present the fitted values on the probability scale.

\[ \log\left(\frac{\pi}{1-\pi}\right) = \mathbf x^\top\boldsymbol\beta \qquad \Rightarrow \pi = \frac{\exp(\mathbf x^\top\boldsymbol\beta)}{1 + \exp(\mathbf x^\top\boldsymbol\beta)} \]

The function plogis() does this transformation.

ggplot(pred$dat, aes(x = day, y = plogis(fit))) +
  geom_ribbon(aes(ymin = plogis(`2.5%`), ymax = plogis(`97.5%`)), alpha = 0.3) +
  geom_line() +
  ylab('probability of hepatomegaly')

 

© Nicole Erler