research article

Semiparametric Likelihood Estimation with Clayton-Oakes Model for Multivariate Current Status Data

Yeqian Liu* and Hanyi Li

Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA

*Corresponding author: Yeqian Liu, Assistant ProfessorDepartment of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA

Received Date: 16 April, 2020; Accepted Date: 24 April, 2020; Published Date: 30 April, 2020

Citation: Liu Y, Li H (2020) Semiparametric Likelihood Estimation with Clayton-Oakes Model for Multivariate Current Status Data. J Biostat BiomJBSB-109. DOI: 10.29011/JBSB-109.100009

Abstract

In biomedical and public health studies, current status data arises when each subject is observed only once and the failure time is only known to be either smaller or larger than the observation time. This paper discusses regression analysis of multivariate current status failure time data which usually arise in epidemiological studies and animal carcinogenicity experiments. We propose an EM-algorithm to estimate the regression coefficients under Clayton-Oakes model for multivariate current status data. The Clayton-Oakes model is a very promising model for analysis of multivariate failure time data, it characterize the dependence of multiple failure time by using a gamma frailty. The simulation study indicated that the method works well for practical situations. An application from a tumorigenicity experiment is provided.

Keywords: Multivariate current status data; Clayton-Oaks model; Frailty model; EM algorithm; Maximum likelihood estimation

Introduction
Current status data arise when each subject in a survival study is observed only once and the failure time of interest is known only to be smaller or larger than the observation time. In other words, the failure time of interest is either left- or right-censored instead of being observed exactly. Current status data occur quite often in many applications, including tumorigenicity experiments and epidemiological investigations of the natural history of a disease.
Current status data is a special case of interval-censored data and is often referred to as case 1 interval-censored data [1]. Interval-censored failure time data occur when the failure time of interest is only observed to be bracketed by two adjacent examination times [2]. The interval-censored data reduce to current status data if all intervals include either zero of infinity. In this paper, we will deal with multivariate current status data, arising when there are multiple failure times of interest in a survival study and only current status data are available for each failure time.
We will focus on the regression analysis of multivariate current status data. In order to model the covariate effects on the failure times of interest as well as accounting for the potential dependence of the failure times, we propose to use the Clayton-Oakes (CO) model jointly and the proportional hazards model marginally. The CO model is one of parametric copula models which express the joint distribution of failure times as a function of the marginal distributions and a dependence parameter [3]. The proportional hazards model is one of the most commonly used semiparametric models in survival analysis. In this paper, we will consider the estimation of the CO model with marginal proportional hazards model based on multivariate current status data.
Glidden DV, et al. (1999) [4], considered the estimation of the CO model based on multivariate right-censored failure time data and they developed an approximate EM algorithm for the determination of maximum likelihood estimates. Unlike the right-censored data, the failure time of interest is never observed exactly under current status data. Therefore, the well-developed tools used for right-censored data cannot be readily generalized to fitting current status data. For example, the partial likelihood approach and the elegant martingale theory used in the estimation of proportional hazards model under right-censored data do not carry over to the case of current status data [5]. Thus, the regression analysis of current status data is not an easy task. Here we are trying to develop an EM algorithm for the estimation of a semiparametric regression model with multivariate current status data. This is even more challenging that the univariate case.
Sun T, et al. (2019) [3], pointed out that a reparametrization of the CO model is equivalent to one of gamma frailty models discussed in Wang N, et al. (2015) [6]. Therefore, after a reparametrization, we are essentially dealing with a gamma frailty model but the marginal hazard functions would become much more complicated that proportional hazards functions [6]. The CO model and the equivalent gamma frailty model will be given in section 2. The frailty model approach has been commonly used in the analysis of multivariate failure time data. This approach assumes that there exists a common but unobserved latent variable, called frailty that characterizes the dependence of the possibly correlated failure times. Chen MH, et al. (2009) [7], considered a frailty model approach for regression analysis of multivariate current status data.
The rest of this paper is organized as follows. In section 2, we present models, assumptions and the likelihood function. In section 3, we develop a sieve maximum likelihood approach for the estimation of the unknown parameters. The sieves are constructed based on piecewise linear functions. By using the sieve method, we avoid the estimation of infinite dimensional parameters. To determine the sieve maximum likelihood estimate, we propose an EM algorithm by treating the unobservable latent variable as missing values. In section 4, we conduct simulation studies to examine the finite sample performance of our proposed method. Section 5 applies the approach to NCTR’s tumorigenicity experiment. Section 6 contains some discussion and concluding remarks.
Assumptions, Models and the Likelihood Function
Suppose we are interested in K failure times in a survival study, denoted by T1, , Tk. Let Zk be the covariates that may affect Tk, k=1, …, k, respectively. We assume the proportional hazards model for each failure time, i.e. the marginal hazard function for Tk is given by

Where λ0k(t) is the baseline hazard function for Tk and β is the vector of regression parameters. Note that we assume a common vector of covariate effects. Also, we assume that the joint distribution of the failure times follows the Clayton-Oakes (CO) model, i.e., the joint survival function is given by

Where the parameter α specifies the CO model and  . When α = 0, the joint survival function becomes the product of the marginal survival functions and failures are independent. Increasing values of imply increasing positive association between failures [8].

The CO model is equivalent to the following gamma frailty model [4]. Suppose ξ is a gamma random variable with mean 1 and unknown variance α. Conditional on the frailty ξ, it is assumed that the failure times are independent with hazard functions:


Suppose that only current status data are available for each Tk, and the observed data for a single subject are   where ck denotes the observation time for th failure and τ denotes the length of study. Without loss of generality, we will assume that the  are the same, and the common observation time is denoted by c. Also, we assume that c is independent of Tk given Zk and the distribution of c does not depend on the unknown parameters i.e., the censoring mechanism is noninformative. Then, given ξ, the likelihood of observing δk is proportional to


This likelihood involves infinite-dimensional parameters Λ0k(t) which is difficult to deal with. We propose a sieve approach to estimate Λ0k(t), that is, to approximate it by a piecewise linear function


Define  and denote all unknown parameters as θ = (β,α,γ), then the likelihood function for a single subject is given by 


Where f(ξ;α) is the density function of the gamma distribution with mean 1 and variance α.

From (2.2), it is easy to show that the conditional density function of ξ given Ω* has the form

 

We will develop an EM algorithm to determine the sieve maximum likelihood estimate of θ in the next section.

Maximum Likelihood Estimation


We want to obtain the value of θ that maximizes L(θ;0), i.e. the maximum likelihood estimate . Since there is no closed form solution for the maximum likelihood estimate, we try to determine  using EM algorithm.In order to develop the EM algorithm, we consider the latent variables  to be unobserved values and develop the E-step and M-step iteratively for the EM-algorithm.

 In the E-step, we first derive the pseudo-complete data likelihood function. There are two components for the pseudo-complete data: the observed data O and the missing data ξ. The E-step computes the conditional expectation of the log-likelihood of the pseudo-complete data conditioning on ξ given the observed data O. It is very straightforward to write the log-likelihood function as



Then the conditional expectation of the log-likelihood has the form


With θ set to be θ(m) obtained at the mth iteration. Since the expectation has no closed form, we employ some numerical integral algorithms to compute such expectation


for any function h(ξi) of ξi 

In order to determine  we can rewrite it as


Where the expectations on the right-hand side are taken with respect to the gamma distribution with mean 1 and variance α(m), α(m) denotes the estimate of α obtained at the mth iteration, and


This suggests that for sufficiently large L, the expectation  can be approximated by


Wherel, l = 1, … , L} are i.i.d. samples from the gamma distribution with mean 1 and variance α(m).

Now, we will describe the M-step, which maximizes the conditional expectation l(θ;0) with replacing all expectations involving functions h(ξi) by their approximation  given in (3.1) to determine the updated estimate θ(m+1). We adopt the moment estimate for α in each iteration, that is, the estimate of α at the (m+1)th iteration is given by


For the maximum likelihood estimator of parameters β and γ, we have


Where


Where l, l = 1, … ,L} are i.i.d. samples from the gamma distribution with mean 1 and variance .Then for estimation of β and γ, we can solve the equation


It is obvious that it would not be easy to solve all p + K * J equations in (3.3) above simultaneously. For this, we suggest the following procedure for the (m+1)th iteration.

 

To guarantee that α is positive at each iteration, we reparameterize the frailty variance as  in the procedure. With a little abuse of notations, we still denote the parameter as α. The key advantage of the procedureis that, by using sieve method, one can avoid solving high dimensional or large number of equations and thus can easily apply well developed optimization algorithms in the M-step, such as Newton-Raphson algorithm. Moreover, the computation is more stable and efficient than that using nonparametric estimation of the cumulative baseline hazards functions.

 Note that the number of partitions J should increase as the sample size n increases. In general, larger J leads to a better estimation, and is more computational intensive. A common practice is to choose J to be an integer greater than n1/4 and the Ij’s such that each interval contains roughly equal numbers of observation time points [7]. In practice, it is recommended to try different choices for J and the Ij’s and find out the case that works the best.

A Simulation Study

This section carries out extensive simulation studies to examine the performance of the proposed method. Our model specification mostly follows that used by Glidden DV, et al. (1999) [4], where a Clayton-Oakes (CO) with gamma frailty was studied for multivariate survival times subject to right censoring. In this study, we consider the situation where k = 2, so there exists two failure times T1 and T2. Let the covariate z be one-dimensional and generated from a binomial distribution Bin (1,0.5). We assume the cumulative baseline hazard function for T1 and T2 to be Λ1(t) = 0.2t and Λ2(t) = 0.05t, respectively. Then the conditional hazard functions for T1 and T2 given ξ are


respectively. Here, ξ is the gamma frailty which is assumed to have a gamma distribution with mean 1 and variance α. The failure times T1 and T2 can be generated from the inverse function of the conditional cumulative distribution function given the generated ξ. In the study, we took  with J = 6, and the observation times cj’s are generated from the discrete uniform distribution over {s1, s2, … , sj}.  The results given below are based on 300 replications with L=1000 for the approximation and the sample size n=200 or 400.

To obtain the estimates  we consider four different cases. In the tables 1 and 2, we include the Averages (AVE) and the Sample Standard Deviations (SSE). The true value of β was taken as 0.5 or -0.5. And the frailty variance α was set to be 1.2 or 0.8. One can see from these tables that the proposed estimate  seems to relatively unbiased.

As mentioned in [7], one could estimate β based on the observed current status data on T1 and T2 by assuming that β in the model is the same for T1 and T2. However, this marginal approach is less efficient than the estimation procedure developed here. We also found the EM-algorithm developed is sensitive to the initial value of γk. To obtain good convergence rate and increase convergence speed, it is important and necessary to have good initial values.

An Application

Now we apply the proposed method to an animal tumorigenicity experiment conducted by the National Center for Toxicological Research (NCTR). It is a 2-year rodent carcinogenicity study of chloroprene consisting of F344/N rats and B6C3F1 rats with both sexes. There are 50 male and 50 female rats in both control and high dose group. Rats in the high dose group were exposed to chloroprene at the concentrations of 80 ppm, 6 h per day, 5 days per week for up to 2 years. The animal either died during the study or was sacrificed at the end of the study. At the death or sacrifice, the presence or occurrence of different types of tumors was determined through a pathological examination. Due to the nature of the pathological examination process, the tumor occurrence time was not exactly observed but instead known only to be smaller or greater than the death or sacrifice time. Therefore, the tumor occurrence time of lung and adrenal cancers is bivariate current status data. Following Dunson DB, et al. (2002) [9], the main objective of the study is to investigate the effect of choloroprene on lung and adrenal cancer by comparing the tumor growth rates between the control and high dose group.

Let T1 and T2 denote the occurrence times of adrenal and lung tumors, we model the joint distribution of T1 and T2 using the proposed Clayton-Oaks model with Gamma frailty. Application of the proposed EM-algorithm gave  with an estimated standard error of 0.2851. Therefore, the animals in the high dose group had significantly higher occurrence rate of both adrenal and lung tumors than those in the control group. To illustrate this occurrence rate difference, Figure 1 shows the estimated survivals functions of the lung tumor occurrence time for animals in the control and high dose groups.

To compare the performance of the proposed Clayton-Oaks model with traditional marginal proportional hazards model. The Mean Square Error (MSE) was calculated for both models. For the control group of the lung tumor, we obtained MSE=0.8741 under Clayton-Oaks model and 1.1681 under the marginal proportional hazards model. For the high dose group of the lung tumor, the corresponding values are 0.8254 and 1.3517, respectively. These results suggest that the proposed Clayton-Oaks model fits the data better than the marginal proportional hazard model for the lung tumor. The results for the adrenal tumor are similar.

Discussion and Concluding Remarks

In this paper, we proposed an EM-algorithm to estimate the regression coefficients under Clayton-Oakes model for multivariate current status data. The simulation study and real data application indicated that the method works well for practical situations. The Clayton-Oakes model is a very promising model for analysis of multivariate failure time data, it characterize the dependence of multiple failure time by using a gamma frailty. This model is different from the commonly used marginal Cox model and normal frailty Cox model. The analysis of CO model would benefit from methods which can diagnose the appropriateness of the model.

Actually, the gamma frailty assumption is not essential, and the proposed procedure can be carried out for other choices of frailty distributions, for example, log-normal frailty, positive stable frailty, or inverse Gaussian frailty. It is noteworthy that the proposed frailty model does not provide the marginal model directly, but the marginal model can be derived by integrating out the frailty.

The EM-algorithm developed in this paper can also be extended to case II interval censored data. However, the presented methodology has several limitations, which lead to some directions for future research. The first limitation is that one may have a hard time conducting model selection between our proposed CO model and the log-normal frailty Cox model proposed by Chen MH, et al. (2009) [7]. Moreover, in our model, we assume that for in each cluster i, the failure times {T1,…,Tk} are independent. In practice, this may not be true. However, to consider the dependence between k failure times, we have to introduce a multivariate gamma frailty which will significantly increase the computational difficulty.


Figure 1: Estimated marginal survival functions for time to lung tumor.

     

Parameters

n

J

True

AVE

BIAS

SSE

MSE

Case 1

α

200

6

0.8

0.877

0.077

0.126

0.132

 

β

 

 

0.5

0.473

-0.027

0.163

0.189 

Case 2

α

200

6

0.8

0.749

0.051

0.112

0.115 

 

β

 

 

-0.5

-0.554

-0.054

0.177

0.181 

Case 3

α

200

6

1.2

1.281

0.081

0.147

0.154 

 

β

 

 

0.5

0.435

0.065

0.181

0.185 

Case 4

α

200

6

1.2

1.131

-0.069

0.153

0.157 

 

β

 

 

-0.5

-0.578

-0.078

0.172

0.179 


Table 1: Results on estimations of the regression coefficients with n=200.

 

Parameters

n

J

True

AVE

BIAS

SSE

MSE

Case 1

α

400

6

0.8

0.826

0.026

0.086

0.093 

 

β

 

 

0.5

0.485

-0.015

0.116

0.129 

Case 2

α

400

6

0.8

0.773

0.027

0.082

0.086 

 

β

 

 

-0.5

-0.514

-0.014

0.137

0.125 

Case 3

α

400

6

1.2

1.245

0.045

0.116

0.107 

 

β

 

 

0.5

0.471

0.029

0.135

0.128 

Case 4

α

400

6

1.2

1.161

-0.039

0.108

0.119 

 

β

 

 

-0.5

-0.593

-0.007

0.126

0.131 


Table 2: Results on estimations of the regression coefficients with n=400.

  1. Sun J (2006) The statistical analysis of interval-censored data. Springer.
  2. Zeng D, Gao F, Lin D (2017) Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika 104: 505-525.
  3. Sun T, Ding Y (2019) Copula-based semiparametric regression method for bivariate data under general interval censoring. Biostatistics.
  4. Glidden DV, Self SG (1999) Semiparametric likelihood estimation in the Clayton-Oakes failure time model. Scand. J Statist 26: 363-372.
  5. Wen CC and Chen YH (2013) A frailty model approach for regression analysis of bivariate interval-censored survival data. Statistica Sinica 23: 383-408.
  6. Wang N, Wang L, McMahan CS (2015) Regression analysis of bivariate current status data under the gamma-frailty proportional hazards model using the EM algorithm. Computational Statistics & Data Analysis 83: 140-150.
  7. Chen MH, Tong X, Sun J (2009) A frailty model approach for regression analysis of multivariate current status data. Statistics in Medicine 2: 3424-3436.
  8. Nielsen GG, Gill RD, Andersen PK, Sorensen TIA (1992) A counting process approach to maximum likelihood estimation in frailty models. Scand J Statist 19: 25-43.
  9. Dunson DB, Dinse GE (2002) Bayesian models for multivariate current status data with informative censoring. Biometrics 58: 79-88.

© by the Authors & Gavin Publishers. This is an Open Access Journal Article Published Under Attribution-Share Alike CC BY-SA: Creative Commons Attribution-Share Alike 4.0 International License. With this license, readers can share, distribute, download, even commercially, as long as the original source is properly cited. Read More.

Journal of Biostatistics & Biometrics

akun gacor olympusrtp slot onlinejam gacor slot pg softtrik gacor slot aztecfitur scatter hitam slot mahjongsugar rush modal recehcheat apk engineslot mahjong gokil histerisinfo rtp harianslot starlight princessslot gacor pgsoftrtp mahjong untungcheat mahjong bandar rungkatmodal receh olympusslot online thailandpola jitu starlightscatter naga hitamrtp gacor banjir wildslot88 jackpot kalitrik pola x5000olympus x500depo dana modal recehpg soft mudah gacorrahasia menang slotrtp balik modalcandu menang slot mahjongslot deposit danatips ampuh bermain slot mahjong waystrik slot sugar rushakun pro mahjong gacorrtp slot terjituslot mahjong ways gacorcara dapetin maxwin olympuspancing scatter mahjong ways 1rekomendasi slot mahjong ways 2scatter mahjong terbarupola mahjong ways hari inimahjong ways modal recehcuan mahjong waysdemo slot pg softnaga awal julyrtp slot awal julymahjong bulan mudamodal receh slotlink slot mahjongwinrate tinggi rtpslot server filipinavolatility pg softwaktu tepat slot gacorjam gacor saldo bancarfitur bonus lucky neko4 simulasi jackpot mahjongtrik sepuh mantan napiamantotorm1131