BayesID_AFT {SemiCompRisks} | R Documentation |
Independent semi-competing risks data can be analyzed using AFT models that have a hierarchical structure. The proposed models can accomodate left-truncated and/or interval-censored data. An efficient computational algorithm that gives users the flexibility to adopt either a fully parametric (log-Normal) or a semi-parametric (Dirichlet process mixture) model specification is developed.
BayesID_AFT(Formula, data, model = "LN", hyperParams, startValues, mcmcParams, na.action = "na.fail", subset=NULL, path=NULL)
Formula |
a |
data |
a data.frame in which to interpret the variables named in |
model |
The specification of baseline survival distribution: "LN" or "DPM". |
hyperParams |
a list containing lists or vectors for hyperparameter values in hierarchical models. Components include,
|
startValues |
a list containing vectors of starting values for model parameters. It can be specified as the object returned by the function |
mcmcParams |
a list containing variables required for MCMC sampling. Components include,
|
na.action |
how NAs are treated. See |
subset |
a specification of the rows to be used: defaults to all rows. See |
path |
the name of directory where the results are saved. |
We view the semi-competing risks data as arising from an underlying illness-death model system in which individuals may undergo one or more of three transitions: 1) from some initial condition to non-terminal event, 2) from some initial condition to terminal event, 3) from non-terminal event to terminal event. Let T_{i1}, T_{i2} denote time to non-terminal and terminal event from subject i=1,...,n. We propose to directly model the times of the events via the following AFT model specification:
\log(T_{i1}) = x_{i1}^\topβ_1 + γ_i + ε_{i1}, T_{i1} > 0,
\log(T_{i2}) = x_{i2}^\topβ_2 + γ_i + ε_{i2}, T_{i2} > 0,
\log(T_{i2} - T_{i1}) = x_{i3}^\topβ_3 + γ_i + ε_{i3}, T_{i2} > T_{i1},
where x_{ig} is a vector of transition-specific covariates, β_g is a corresponding vector of transition-specific regression parameters and ε_{ig} is a transition-specific random variable whose distribution determines that of the corresponding transition time, g \in \{1,2,3\}. γ_i is a study participant-specific random effect that induces positive dependence between the two event times, thereby performing a role analogous to that performed by frailties in models for the hazard function. Let L_{i} denote the time at study entry (i.e. the left-truncation time). Furthermore, suppose that study participant i was observed at follow-up times \{c_{i1},…, c_{im_i}\} and let c_i^* denote the time to the end of study or to administrative right-censoring. Considering interval-censoring for both events, the times to non-terminal and terminal event for the i^{th} study participant satisfy c_{ij}≤q T_{i1}< c_{ij+1} for some j and c_{ik}≤q T_{i2}< c_{ik+1} for some k, respectively. Then the observed outcomes for the i^{th} study participant can be succinctly denoted by \{c_{ij}, c_{ij+1}, c_{ik}, c_{ik+1}, L_{i}\}.
For the Bayesian semi-parametric analysis, we proceed by adopting independent DPM of normal distributions for each ε_{ig}. More precisely, ε_{ig} is taken to be an independent draw from a mixture of M_g normal distributions with means and variances (μ_{gr}, σ_{gr}^2), for r \in \{1,…,M_g\}. Since the class-specific (μ_{gr}, σ_{gr}^2) are not known, they are taken to be draws from some common distribution, G_{g0}, often referred to as the centering distribution. Furthermore, since the ‘true’ class membership for any given study participant is not known, we let p_{gr} denote the probability of belonging to the r^{th} class for transition g and p_g = (p_{g1}, …, p_{gM_g}) the collection of such probabilities. Note, p_g is defined at the level of the population (i.e. is not study participant-specific) and its components add up to 1.0. In the absence of prior knowledge regarding the distribution of class memberships for the n individuals across the M_g classes, p_g is assumed to follow a conjugate symmetric Dirichlet(τ_g/M_g,…,τ_g/M_g) distribution, where τ_g is referred to as the precision parameter. The finite mixture distribution can then be succinctly represented as:
ε_{ig} | r_{i} \sim Normal(μ_{r_{i}}, σ_{r_{i}}^2),
(μ_{gr}, σ_{gr}^2) \sim G_{g0}, ~~for~ r=1,…,M_g,
r_{i}| p_g \sim Discrete(r_{i} | p_{g1},…,p_{gM_g}),
p_g \sim Dirichlet(τ_g/M_g, …, τ_g/M_g).
Letting M_g approach infinity, this specification is referred to as a DPM of normal distributions. In our proposed framework, we specify a Gamma(a_{τ_g}, b_{τ_g}) hyperprior for τ_g. For regression parameters, we adopt non-informative flat priors on the real line. For γ=\{γ_1, …, γ_n\}, we assume that each γ_i is an independent random draw from a Normal(0, θ) distribution. In the absence of prior knowledge on the variance component θ, we adopt a conjugate inverse-Gamma hyperprior, IG(a_θ, b_θ). Finally, We take the G_{g0} as a normal distribution centered at μ_{g0} with a variance σ_{g0}^2 for μ_{gr} and an IG(a_{σ_g}, b_{σ_g}) for σ_{gr}^2.
For the Bayesian parametric analysis, we build on the log-Normal formulation and take the ε_{ig} to follow independent Normal(μ_g, σ_g^2) distributions, g=1,2,3. For location parameters \{μ_1, μ_2, μ_3\}, we adopt non-informative flat priors on the real line. For \{σ_1^2, σ_2^2, σ_3^2\}, we adopt independent inverse Gamma distributions, denoted IG(a_{σ g}, b_{σ g}). For β_g, γ, and θ, we adopt the same priors as those adopted for the DPM model.
BayesID_AFT
returns an object of class Bayes_AFT
.
The posterior samples of γ are saved separately in working directory/path
.
For a dataset with large n, nGam_save
should be carefully specified considering the system memory and the storage capacity.
Kyu Ha Lee and Sebastien Haneuse
Maintainer: Kyu Ha Lee <klee15239@gmail.com>
Lee, K. H., Rondeau, V., and Haneuse, S. (2017),
Accelerated failure time models for semicompeting risks data in the presence of complex censoring, Biometrics, 73, 4, 1401-1412.
Alvares, D., Haneuse, S., Lee, C., Lee, K. H. (2018+),
SemiCompRisks: an R package for independent and cluster-correlated analyses of semi-competing risks data, submitted, arXiv:1801.03567.
initiate.startValues_AFT
, print.Bayes_AFT
, summary.Bayes_AFT
, predict.Bayes_AFT
## Not run: # loading a data set data(scrData) scrData$y1L <- scrData$y1U <- scrData[,1] scrData$y1U[which(scrData[,2] == 0)] <- Inf scrData$y2L <- scrData$y2U <- scrData[,3] scrData$y2U[which(scrData[,4] == 0)] <- Inf scrData$LT <- rep(0, dim(scrData)[1]) form <- Formula(LT | y1L + y1U | y2L + y2U ~ x1 + x2 + x3 | x1 + x2 | x1 + x2) ##################### ## Hyperparameters ## ##################### ## Subject-specific random effects variance component ## theta.ab <- c(0.5, 0.05) ## log-Normal model ## LN.ab1 <- c(0.3, 0.3) LN.ab2 <- c(0.3, 0.3) LN.ab3 <- c(0.3, 0.3) ## DPM model ## DPM.mu1 <- log(12) DPM.mu2 <- log(12) DPM.mu3 <- log(12) DPM.sigSq1 <- 100 DPM.sigSq2 <- 100 DPM.sigSq3 <- 100 DPM.ab1 <- c(2, 1) DPM.ab2 <- c(2, 1) DPM.ab3 <- c(2, 1) Tau.ab1 <- c(1.5, 0.0125) Tau.ab2 <- c(1.5, 0.0125) Tau.ab3 <- c(1.5, 0.0125) ## hyperParams <- list(theta=theta.ab, LN=list(LN.ab1=LN.ab1, LN.ab2=LN.ab2, LN.ab3=LN.ab3), DPM=list(DPM.mu1=DPM.mu1, DPM.mu2=DPM.mu2, DPM.mu3=DPM.mu3, DPM.sigSq1=DPM.sigSq1, DPM.sigSq2=DPM.sigSq2, DPM.sigSq3=DPM.sigSq3, DPM.ab1=DPM.ab1, DPM.ab2=DPM.ab2, DPM.ab3=DPM.ab3, Tau.ab1=Tau.ab1, Tau.ab2=Tau.ab2, Tau.ab3=Tau.ab3)) ################### ## MCMC SETTINGS ## ################### ## Setting for the overall run ## numReps <- 300 thin <- 3 burninPerc <- 0.5 ## Setting for storage ## nGam_save <- 10 nY1_save <- 10 nY2_save <- 10 nY1.NA_save <- 10 ## Tuning parameters for specific updates ## ## - those common to all models betag.prop.var <- c(0.01,0.01,0.01) mug.prop.var <- c(0.1,0.1,0.1) zetag.prop.var <- c(0.1,0.1,0.1) gamma.prop.var <- 0.01 ## mcmcParams <- list(run=list(numReps=numReps, thin=thin, burninPerc=burninPerc), storage=list(nGam_save=nGam_save, nY1_save=nY1_save, nY2_save=nY2_save, nY1.NA_save=nY1.NA_save), tuning=list(betag.prop.var=betag.prop.var, mug.prop.var=mug.prop.var, zetag.prop.var=zetag.prop.var, gamma.prop.var=gamma.prop.var)) ################################################################# ## Analysis of Independent Semi-competing risks data ############ ################################################################# ############### ## logNormal ## ############### ## myModel <- "LN" myPath <- "Output/01-Results-LN/" startValues <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2) ## fit_LN <- BayesID_AFT(form, scrData, model=myModel, hyperParams, startValues, mcmcParams, path=myPath) fit_LN summ.fit_LN <- summary(fit_LN); names(summ.fit_LN) summ.fit_LN pred_LN <- predict(fit_LN, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5)) plot(pred_LN, plot.est="Haz") plot(pred_LN, plot.est="Surv") ######### ## DPM ## ######### ## myModel <- "DPM" myPath <- "Output/02-Results-DPM/" startValues <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2) ## fit_DPM <- BayesID_AFT(form, scrData, model=myModel, hyperParams, startValues, mcmcParams, path=myPath) fit_DPM summ.fit_DPM <- summary(fit_DPM); names(summ.fit_DPM) summ.fit_DPM pred_DPM <- predict(fit_DPM, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5)) plot(pred_DPM, plot.est="Haz") plot(pred_DPM, plot.est="Surv") ## End(Not run)