Joint Analysis and Imputation of Incomplete Data in R
The package JointAI provides joint analysis and imputation for the following model types, fitted on incomplete (covariate) data using the Bayesian framework:
- generalized linear regression models
- (ordinal) cumulative logit regression models
- generalized linear mixed models (for multi-level data)
- (ordinal) cumulative logit mixed models
- parametric (Weibull) survival models
- Cox proportional hazards survival models
Features include parallel computation, shrinkage of regression coefficients via ridge penalties, and the option to use user-specified hyperparameters.
Results can not only be summarized and printed, but also visualized using traceplots or density plots.
It is also possible to obtain predicted values (and corresponding intervals) from JointAI models.
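As a minimal sketch of this workflow, the following fits a linear regression model on the NHANES example data shipped with JointAI, then summarizes and plots the result. The choice of covariates, the number of iterations and the seed are illustrative assumptions, and running the code requires a working JAGS installation.

```r
library(JointAI)

# Linear regression with incomplete covariates on the NHANES example data.
# The formula, n.iter = 500 and the seed are illustrative choices only.
lm1 <- lm_imp(SBP ~ gender + age + WC + alc + educ + bili,
              data = NHANES, n.iter = 500, seed = 2020)

summary(lm1)    # posterior summary of the regression coefficients
traceplot(lm1)  # traces of the monitored parameters, to inspect mixing
densplot(lm1)   # posterior density estimates of the monitored parameters
```

A real analysis would typically use more iterations and check convergence before interpreting the summary.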
Evaluation of convergence and precision
Two criteria for evaluating convergence and precision of the posterior estimates are available:
the Gelman-Rubin criterion (‘potential scale reduction factor’) for convergence (Gelman and Rubin, 1992; Brooks and Gelman, 1998), and the Monte Carlo error, specifically the ratio of the Monte Carlo error to the parameter’s standard deviation.
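Both criteria can be computed from a fitted model. The sketch below assumes the NHANES example data shipped with JointAI and an arbitrary, illustrative model formula. It requires JAGS, and the Gelman-Rubin criterion needs more than one MCMC chain (three is the package default).

```r
library(JointAI)

# Illustrative model; covariates and n.iter are assumptions for this sketch.
lm1 <- lm_imp(SBP ~ gender + age + WC + alc, data = NHANES, n.iter = 500)

# Gelman-Rubin criterion: values close to 1 suggest the chains have converged.
gr <- GR_crit(lm1)
gr

# Monte Carlo error, including its ratio to the posterior standard deviation.
mce <- MC_error(lm1)
mce
plot(mce)  # visual check of the ratio for each parameter
```

Large values of either criterion are usually addressed by running more iterations.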
Extraction and visualization of imputed values
Imputed data can be extracted as multiple imputed datasets for further analyses in R, and exported directly to SPSS. Once extracted, the distribution of the imputed values can be compared visually to the distribution of the observed data.
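A sketch of this extraction step, again on the NHANES example data with an illustrative model, might look as follows. Note that the imputed values have to be monitored when the model is fitted, which is requested here via `monitor_params = c(imps = TRUE)`. The number of imputations and the seed are assumptions for illustration, and JAGS is required.

```r
library(JointAI)

# Monitor the imputed values so that they can be extracted afterwards.
mod <- lm_imp(SBP ~ gender + age + WC + alc + bili, data = NHANES,
              n.iter = 500, monitor_params = c(imps = TRUE))

# Extract 5 imputed datasets in long format, stacked with the original data.
MIs <- get_MIdat(mod, m = 5, include = TRUE, seed = 2020)

# Compare the distributions of observed and imputed values per variable.
plot_imp_distr(MIs, nrow = 1)
```

For export to SPSS, `get_MIdat()` offers the argument `export_to_SPSS`, together with arguments for the output directory and file name.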
Visualization of incomplete data
To learn about the data at hand and make better modelling choices, the package provides functions that calculate and plot the missing data pattern and the distribution of each variable in the dataset.
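As an illustration, the two visualization functions can be applied to the NHANES example data shipped with JointAI. This is a sketch that relies mostly on default arguments; the plot of the missing data pattern assumes that ggplot2 is installed.

```r
library(JointAI)

# Plot the missing data pattern of the example data.
md_pattern(NHANES)

# Return the pattern as a matrix instead of plotting it.
pat <- md_pattern(NHANES, pattern = TRUE, plot = FALSE)

# Histograms and bar plots of all variables in the data.
plot_all(NHANES)
```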
The package comes with the following vignettes, which give a more detailed explanation and demonstration of how to use JointAI:
- A minimal example demonstrating the use of the package's functions.
- Visualizing Incomplete Data: Demonstrations of the options in the functions for plotting histograms and bar plots of all variables in the data, and in md_pattern() (plotting or printing the missing data pattern).
- Explanation and demonstration of all parameters that are required or optional to specify the model structure in the main analysis functions. Among others, additional functions such as set_refcat() are used.
- Examples of how to select the parameters/variables/nodes to follow using the argument monitor_params, and the parameters/variables/nodes displayed in functions such as densplot().
- Examples demonstrating how to set the arguments controlling the settings of the MCMC sampling.
- Examples of the use of functions to be applied after the model has been fitted.
Gelman, A. and Rubin, D.B. (1992) Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-511.
Brooks, S.P. and Gelman, A. (1998) General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434-455.