Dealing with missing values in multivariate joint models for longitudinal and survival data


Background: Chronic hepatitis C is a severe and increasing public health issue. Although nowadays most patients can be cured, the infection is often undetected until symptoms of permanent liver damage become apparent, putting patients at a considerably higher risk for liver diseases and, as a consequence, liver-related mortality. Patients are therefore often monitored beyond the end of treatment, which provides us with both baseline and repeatedly measured data on patient and disease characteristics. Typically, joint models for longitudinal and time-to-event data are used to adequately model (and subsequently predict) the hazard of experiencing an adverse event utilizing time-varying covariate information. A potentially serious additional issue in the analysis of our retrospective hepatitis C cohort is the large amount of missing data for several important covariates (for up to 58% of patients), and the restriction of currently available methodologies and software to complete case analysis.
Objective: To prevent severe loss of power and to reduce the possibly large bias that a complete case analysis would produce, we present a fully Bayesian approach that jointly models longitudinal and survival outcomes in the presence of missing data.
Methods: We factorize the joint distribution of outcome and covariates into a sequence of univariate conditional distributions, which allows us to appropriately handle multiple incomplete baseline and time-varying covariates of different types. Moreover, non-linear associations between variables can be incorporated while ensuring compatibility of the models involved. The approach is valid under the missing at random (MAR) assumption, with the potential for further extension to non-random missingness, and is also applicable to other types of joint models that do not involve a time-to-event outcome.
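As an illustration of the sequential factorization described in the Methods, the posterior could be written along the following lines. This is only a sketch under assumed notation: the symbols, the ordering of the covariates, and the conditioning sets are not taken from the abstract and may differ from the authors' actual specification.

```latex
% Assumed notation (hypothetical, not from the abstract), for patient i = 1, ..., n:
%   (T_i, \delta_i)        observed event time and event indicator
%   y_i                    longitudinal measurements
%   x_{i1}, ..., x_{ip}    incomplete covariates (x^{mis}: missing, x^{obs}: observed values)
%   x_i^{c}                completely observed covariates
%   b_i                    random effects
%   \theta                 all model parameters
\begin{align*}
p(\theta, b, x^{\text{mis}} \mid T, \delta, y, x^{\text{obs}}, x^{c})
\;\propto\; p(\theta) \prod_{i=1}^{n}
  & \; p(T_i, \delta_i \mid y_i, b_i, x_{i1}, \dots, x_{ip}, x_i^{c}, \theta) \\
  & \times p(y_i \mid b_i, x_{i1}, \dots, x_{ip}, x_i^{c}, \theta)\; p(b_i \mid \theta) \\
  & \times p(x_{i1} \mid x_{i2}, \dots, x_{ip}, x_i^{c}, \theta)\;
    p(x_{i2} \mid x_{i3}, \dots, x_{ip}, x_i^{c}, \theta) \cdots
    p(x_{ip} \mid x_i^{c}, \theta)
\end{align*}
```

Each factor is a univariate model chosen to suit the type of the variable on its left-hand side (for instance, a linear, logistic, or mixed model), which is what would allow covariates of different types to be handled within one joint specification. In this sketch, missing values are treated as unknowns and sampled together with the parameters.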
Results: In our hepatitis C cohort of 490 patients, a complete case analysis would have been based on only 24% of the patients, of whom only 28 experienced an event, and was therefore not feasible. Using our approach, we successfully implemented a joint model with four longitudinal and eleven baseline covariates. This approach is currently being implemented in an R package.
Conclusion: Our proposed approach provides a flexible way to handle complex joint models in the presence of incomplete data. Simulation studies are needed to empirically confirm that results are unbiased.

Aug 25, 2020