August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
International Journal of Biomathematics Vol. 12, No. 5 (2019) 1950050 (19 pages) c World Scientific Publishing Company DOI: 10.1142/S1793524519500505
Bayesian empirical likelihood and variable selection for censored linear model with applications to acute myelogenous leukemia data
Chun-Jing Li∗,†,‡ , Hong-Mei Zhao†,§ and Xiao-Gang Dong†,¶ ∗School
of Mathematics, Jilin University Jilin Changchun 130012, P. R. China †School
of Mathematics and Statistics Changchun University of Technology Jilin Changchun 130012, P. R. China ‡
[email protected] §
[email protected] ¶
[email protected] Received 9 November 2018 Accepted 15 April 2019 Published 29 May 2019 This paper develops the Bayesian empirical likelihood (BEL) method and the BEL variable selection for linear regression models with censored data. Empirical likelihood is a multivariate analysis tool that has been widely applied to many fields such as biomedical and social sciences. By introducing two special priors to the empirical likelihood function, we find two obvious superiorities of the BEL methods, that is (i) more precise coverage probabilities of the BEL credible region and (ii) higher accuracy and correct identification rate of the BEL model selection using an hierarchical Bayesian model, vs. some current methods such as the LASSO, ALASSO and SCAD. The numerical simulations and empirical analysis of two data examples show strong competitiveness of the proposed method. Keywords: Bayesian empirical likelihood; censored linear regression; coverage probabilities; spike-and-slab prior. Mathematics Subject Classification 2010: 62N01, 62F15, 62J05
1. Introduction Censored data in recent years has seen a great deal of interest in the survival analysis in medical trials, its research was very extensive and meaningful. Such data are commonly encountered in disease research, for example, in leukemia studies, recovery following bone marrow transplantation which can be censored, may depend on several potential risk factors. The problem of analyzing these data arises in a ¶ Corresponding
author. 1950050-1
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
C.-J. Li, H.-M. Zhao & X.-G. Dong
number of applied fields, such as medicine, biology, public health, epidemiology, engineering, economics, and demography. Linear regression models are useful alternatives to the Cox proportional hazards models [6] for analyzing censored regression data due to their simple model setting and always result in a closed-form estimator. Buckley and James [3] developed estimating approach by modifying the sum of squares of residuals instead of modifying the normal equations. However, Buckley and James’s estimators use a complicated iterative algorithm. Koul et al. [12] suggested a data transformation approach based on the least square technique. Compared with Buckley and James estimators, this method is easy to be carried out, because no iterations are required and standard least squares routines can be used immediately once the observations of the response are transformed. To fix ideas, we shall restrict our attention here to the estimator proposed by Koul et al. [12]. Although the results shown here can be easily extended to include the more general class of estimators such as Zheng [25] and Leurgans [15]. Some simulation studies indicate that the empirical coverage probability of 100(1 − α)% nominal level built with the plug-in variance estimate can very often fall far short of the nominal level. A relatively recent development in this area is empirical likelihood by Owen [17], a nonparametric analogue of the usual likelihood. Qin and Jing [19] constructed confidence regions for the slope parameter vector in the censored linear regression model. Li and Wang [16] extended the adjusted empirical likelihood (ADEL) method to situations where auxiliary information on X is available. They take the empirical likelihood to incorporate linear regression into a pseudo Bayesian framework. For the first time, Lazar [13] proposed the Bayesian empirical likelihood (BEL) and discussed its validity. In recent years, the BEL inference that should be characterized by several highly correlated observed indicators from different perspectives have been common in many applications. Here, the likelihood function is often thought of as the means of updating prior beliefs about the parameters, in light of the data, to yield posterior inference. Schennach [20] considered Bayesian exponentially tilted empirical likelihood; the BEL method has been discussed by Chang and Mukerjee [4] and among others. Penalty likelihood method is commonly used in the processing of highdimensional data. It is generally assumed that the model is sparse. The purpose is to select a small number of independent variable subsets from a large number of independent variable sets, so as to reduce the dimensionality. The choice of variables is different according to different penalty functions. Here are some classical penalty functions. An earlier penalty likelihood method is a method of LASSO variable selection [23]. In this method, penalty function is used to compress the model coefficients, so that some regression coefficients which have very weak effects on dependent variables become small or even zero directly. Fan and Li [7] put forward a new penalized partial likelihood method: the smoothly clipped absolute deviation (SCAD). Zou [26] proposed an adaptive LASSO (ALASSO) variable selection method, which improves the LASSO penalty by applying a penalty factor 1950050-2
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
BEL and variable selection for censored linear model
to all regression coefficients. The ALASSO method has different penalty factors for different regression coefficients. Simultaneously, Park and Casella [18], Hans[10] put forward the LASSO method in the Bayesian framework. Then Leng et al. [14] presented Bayesian-based ALASSO method. Alhamzawi and Yu [1] applied Bayesian ALASSO method to quantile regression model. Feng et al. [9] put forward the application of interval censored data in the linear transformation model. Feng et al. [8] developed a Bayesian ALASSO procedure to conduct simultaneous estimation and variable selection. Tang et al. [21] and Tang et al. [22] addressed covariate selection in semiparametric joint models of longitudinal and survival data via the Bayesian LASSO approach. However, to our knowledge, there is little work done on the BEL inference on censored linear model with any prior distribution of parameters. This paper will address this issue. The Monte Carlo method is presented to make Bayesian inference on parameters by combining the Gibbs sampling and the Metropolis–Hastings algorithm. Further, variable selection on the high-dimensional model with spike-and-slab prior to a hierarchical Bayesian model has also been considered. Meanwhile, the paper introduces the alloauto data and the bmt data, both of which are about acute Myelogenous leukemia. Thus, the paper has theoretical research significance and practical reference value. The rest of this paper is organized as follows. Section 2 introduces the BEL for linear regression model. In Sec. 3, we develop nonparametric Bayesian variable selection to the BEL-based censored regression model with the spike-and-slab prior, we compared the proposed method with other methods, via Simulation studies in Sec. 4. Two examples are used to illustrate the proposed methods in Sec. 5. Some concluding remarks are given in Sec. 6. 2. BEL Estimation Suppose there are n subjects and their observations follow the linear regression model Ti = XiT β + εi ,
i = 1, . . . , n,
where εi s follow some unknown distribution with mean zero and finite variance, β ∈ Rp is an unknown parameter vector, and Xi s are p×1 observable random covariates or constant covariates. Let C denote censoring. The observed survival time Yi = min(Ti , Ci ), δi = I(Ti ≤ Ci ) and we get observed samples Di = (Yi , XiT , δi ), i = 1, . . . , n. Here, we assume that Ci s, are i.i.d. with a common distribution G and Ci s are independent of the sequence of independent random vectors (XiT , Ti )T s. Our interest lies in the estimation of the slope parameter vector β. Following Koul et al. [12], we define transformed data YiG =
δi Yi , 1 − G(Yi )
i = 1, . . . , n.
1950050-3
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
C.-J. Li, H.-M. Zhao & X.-G. Dong
It can be seen that
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
E[Xi (YiG − XiT β)] = 0,
i = 1, . . . , n.
(2.1)
When G(.) is unknown, a natural solution is to replace G by the Kaplan–Meier ˆ denoted by estimator G I{Y(i) ≤t,δ(i) =0} n n−i ˆ Gn (t) = 1 − , n−i+1 i=1 where Y(1) ≤ · · · ≤ Y(n) are the order statistics of the Y -sample, and δ(i) is δ associated with Y(i) . The Koul et al.’s estimator is given by n −1 n X T Xi XT Y ˆ . βˆ ˆ = (2.2) i
G
i
i=1
i=1
iG
Let Wi (β) = Xi (YiGˆ − XiT β). The empirical likelihood for β is defined by n n n L(β) = sup pi pi Wi (β) = 0, pi = 1, pi ≥ 0 . i=1
i=1
i=1
By the Lagrange multiplier technique, we can easily get pi λT (β)Wi (β)]−1 , where λ(β) is the solution of n i=1
=
n−1 [1 +
Wi (β) = 0. 1 + λT Wi (β)
Let D = {Di , i = 1, . . . , n}. The profile empirical likelihood function of β is given by L(β | D) = exp(l(β)) with l(β) = −
n
log[1 + λT (β)Wi (β)] − n log(n).
i=1
We combine L(β | D) with a specified prior π(β) on β via the Bayesian theorem to obtain a pseudo posterior n T π(β | D) = c(D) exp log[π(β)] − log[1 + λ (β)Wi (β)] , i=1
where c(D) is the normalizing constant such that π(β | D)dβ = 1. Thus, we have n T π(β | D) ∝ exp log[π(β)] − log[1 + λ (β)Wi (β)] , (2.3) i=1
where ∝ denotes equivalence up to some constants. The αth quantile tα of π(β | D) is determined by tα π(β | D)dβ = α. −∞
1950050-4
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
BEL and variable selection for censored linear model
The validity of posterior inferences based on Eq. (2.3) can be assessed through coverage probabilities of intervals (tα1 , tα2 ) on β. By using a standard Metropolis– Hasting algorithm for a given prior π(β), this can be used to selected values of α1 and α2 based on the Markov chain on β. Following the common practice, the prior distributions of π(β) is assigned as N (0, σ 2 ), where the σ 2 is hyperparameter. Following the existent literature, we assumed that η = σ −2 ∼ Γ(a, b), where a and b are hyperparameters whose values are predetermined. Thus, we get the posterior π(β, η | D) ∝ L(β | D, η)Γ(η, a, b), where Γ(η, a, b) is the density of Γ(a, b) evaluated at η. In simulation and real-data analysis, we set a = 0.1 and b = 0.1. Our simulation studies show the effectiveness of this prior. We propose a simple combination of the Metropolis–Hastings and Gibbs algorithms for the posterior sampling of π(β, η | D). The full conditional distribution of η is given by p βj2 η . π(η | D, β) ∝ η a+0.5p−1 exp − b + 0.5 j=1
The full conditional distribution of η is Γ(a + 0.5p, b + 0.5 ditional distribution of βj is
p
j=1
βj2 ). The full con-
π(βj | D, η) ∝ L(β | D)π(βj | η). We employ a M-H step for sampling from π(β | D, η). The algorithm is given as follows: (1) Let t = 0 and the initial values β (0) = βˆGˆ . (2) At the (t + 1)th iteration with a current value β (t) of β, draw a candidate value β ∗ from the proposal distribution pprop (· | β (t) ). (3) Accept β ∗ as β (t+1) with the probability π(β ∗ ) ni=1 [1 + λT (β (t) )Wi (β (t) )]pprop (β (t) | β ∗ ) min 1, . n π(β (t) ) i=1 [1 + λT (β ∗ )Wi (β ∗ )]pprop (β ∗ | β (t) ) (4) Repeating (2) until the convergence of the M-H sampler. Bayesian estimate βˆB of β is taken to be sample median of the generated observations. We use the modified Newton–Raphson algorithm proposed in Chen et al. [5] to calculate λ(β). The proposal distribution pprop is a normal distribution centered at β (t) and the variance Σ. The Σ is chosen to be have acceptance rate in the range of 0.1–0.4. 1950050-5
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
C.-J. Li, H.-M. Zhao & X.-G. Dong
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
3. Variable Selection Based on BEL For the well-studied problem of variable selection in multiple regression, the penalized likelihood approach to estimate β0 is given by p n (YiGˆ − XiT β)2 + n Pλ (βj ) , βˆ = arg min i=1
j=1
where Pλ (·) is a penalty function (indexed by a penalty parameter λ) prioritizing solutions that are suitably disciplined. Most notably, the LASSO penalty of Tibshirani [23] uses Pλ (βj ) = λ|βj |, the ALASSO penalization proposed by Zou [26] uses Pλ (βj ) = λωj |βj | and the SCAD penalty of Fan and Li [7] uses the SCAD. Those proposed estimation all involves the penalization parameter λ, which determines the sparseness of the resulting estimator. To perform Bayesian variable selection, we put the spike-and-slab prior on the coefficient parameters in linear regression and propose a hierarchical Bayesian model. For j = 1, . . . , p, we put a spike-and-slab prior π(βj | θj , τ 2 ) on the regression coefficient βj , where θj and τ 2 are hyperparameters in the prior. Specifically, we assume the following Bayesian hierarchical model: D | β ∼ L(β | D), βj | θj , τ 2 ∼ θj I(βj = 0) + (1 − θj )I(βj = 0)N (0, τ 2 ),
j = 1, . . . , p,
θj ∼ U (0, 1), η = τ −2 ∼ Γ(a, b),
a > 0, b > 0.
Then, we have the posterior π(β, θ, η | D) ∝ L(β | D)
p
π(βj | θj , η)I(0 < θj < 1)Γ(η, a, b).
j=1
Similar to Xi et al. [24], we use a Gibbs sampler to perform statistical inference based on the Bayesian model. The full conditional distribution of η is given by βj2 η , π(η | D, β, θ) ∝ η a+0.5h−1 exp −b + 0.5 j∈H
with H = {j : βj = 0} and h = H, where denotes the number of set elements. The full conditional distribution of η is Γ(a + 0.5h, b + 0.5 j∈H βj2 ). Let β−j and θ−j be the sub-vectors of θ and β by removing their jth element, respectively. The full conditional distribution of θj is π(θj | D, β, θ−j , η) ∝ π(βj | θj , η)π(θj ) η 2 exp −0.5βj η I(0 < θj < 1). ∝ θj I(βj = 0) + (1 − θj )I(βj = 0) 2π 1950050-6
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
BEL and variable selection for censored linear model
The full conditional distribution of θj is Beta(1 + I(βj = 0), 1 + I(βj = 0)). The full conditional distribution of βj is Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
π(βj | D, β−j , θ, η) ∝ L(β | D)π(βj | θj , η). We employ a Metropolis–Hastings step proposed in Sec. 2 in the Gibbs sampler for sampling from π(βj | D, β−j , θ, η). At the (t + 1)th iteration with a current value (t) βj of βj , the proposal distribution in the M-H step is chosen to be a normal (t)
distribution with mean βj
and variance vj2 , vj2 is estimated by the bootstrapping
suggested by Xi et al. [24]. The algorithm is given as follows: (1) Let t = 0 and the initial values β (0) = βˆGˆ . (2) At the (t + 1)th iteration with a current value β (t) of β, draw η (t+1) from the (t) 2 distribution Γ(a + 0.5h, b + 0.5 j∈H βj ), with H = {j : βj = 0} and h = H. (t) (t) (t+1) Draw θj from the distribution Beta 1 + I βj = 0 , 1 + I βj = 0 .
(3) For j = 1, . . . , p, draw a candidate value βj∗ from the proposal distribution (t) (t) (t+1) pprop (· | βj ). The pprop is a normal distribution N βj , vj2 . Accept βj∗ as βj with the probability (t) (t+1) (t+1) ∗ (t) ,η π βj∗ | θj L βj , β−j | D pprop βj | βj∗ . min 1, (t) (t+1) (t) π βj | θ j , η (t+1) L(β (t) | D)pprop βj∗ | βj (4) Repeating (2) and (3) until the convergence of the Gibbs sampler. Bayesian estimate βˆB of β is taken to be sample median of the generated observations.
4. Simulation Studies In this section, we shall conduct some simulations to illustrate the potential of BEL. 4.1. BEL estimator In this subsection, we use Monte Carlo simulations to investigate the performances of BEL and empirical likelihood methods, the normal approximation (NA) by coverage probability from the frequent viewpoint. Consider following two models: Model 1: Ti = β0 + β1 Xi + ei , Model 2: Ti = β0 + β1 Xi + 0.5(1 + Xi )ei , where Xi s are drawn from the uniform distribution U [0, 2] and ei s are generated from the χ220 −20. In Model 1, the true parameters (β0 , β1 ) = (5, 10) and in Model 2, (β0 , β1 ) = (1, 1). The first model is taken from Qin and Jing [19] and Model 2 covers the heteroscedastic errors. For both models, the censoring times Ci are generated from the exponential distribution with rate a. We vary a to attain different censoring proportion. 1950050-7
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
C.-J. Li, H.-M. Zhao & X.-G. Dong
We are mainly interested here in comparing the performances of BEL with the two empirical likelihood for the linear model with censored data: the estimated empirical likelihood (EEL) which was proposed by Qin and Jing [19], the ADEL method represented here by Li and Wang [16]. The NA is also being considered. For the Metropolis–Hastings algorithm in the MCMC sampling, we generate 5000 samples and burn in 1000 samples. The simulated coverage probabilities for the parameters using NA, EEL, ADEL and our BEL method are summarized in Tables 1 and 2. The sample size n has Table 1. Simulated coverage probabilities of BEL, EEL, ADEL and NA confidence regions for β in Model 1 based on 1000 replications. Censoring ≈ 0.34
≈ 0.42
≈ 0.49
Levels
Sample size
NA
EEL
ADEL
BEL
0.90
100 200 300 500
0.833 0.867 0.879 0.887
0.855 0.895 0.892 0.888
0.836 0.870 0.884 0.890
0.890 0.920 0.913 0.914
0.95
100 200 300 500
0.901 0.930 0.941 0.943
0.931 0.944 0.951 0.947
0.917 0.932 0.941 0.945
0.942 0.964 0.954 0.961
0.99
100 200 300 500
0.969 0.974 0.978 0.986
0.988 0.987 0.989 0.983
0.973 0.975 0.985 0.984
0.994 0.988 0.995 0.988
0.90
100 200 300 500
0.838 0.848 0.837 0.876
0.860 0.868 0.859 0.878
0.843 0.843 0.841 0.876
0.892 0.904 0.885 0.895
0.95
100 200 300 500
0.893 0.897 0.916 0.930
0.916 0.930 0.918 0.948
0.903 0.901 0.914 0.933
0.947 0.954 0.938 0.957
0.99
100 200 300 500
0.954 0.962 0.974 0.981
0.975 0.980 0.985 0.981
0.971 0.971 0.977 0.988
0.984 0.985 0.989 0.986
0.90
100 200 300 500
0.794 0.833 0.828 0.861
0.835 0.854 0.874 0.881
0.808 0.847 0.841 0.866
0.873 0.892 0.912 0.907
0.95
100 200 300 500
0.865 0.896 0.902 0.917
0.913 0.920 0.932 0.939
0.876 0.912 0.905 0.922
0.947 0.948 0.959 0.956
0.99
100 200 300 500
0.945 0.958 0.970 0.982
0.974 0.986 0.977 0.985
0.951 0.971 0.978 0.980
0.985 0.991 0.989 0.989
1950050-8
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
BEL and variable selection for censored linear model
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
Table 2. Simulated coverage probabilities of BEL, EEL, ADEL and NA confidence regions for β in Model 2 based on 1000 replications. Censoring
Nominal level
Sample size
Normal
EEL
ADEL
BEL
≈ 0.14
0.90
100 200 300 500
0.805 0.831 0.846 0.865
0.800 0.841 0.851 0.871
0.810 0.836 0.856 0.868
0.835 0.886 0.907 0.907
0.95
100 200 300 500
0.869 0.905 0.903 0.920
0.875 0.910 0.912 0.932
0.874 0.911 0.914 0.925
0.912 0.955 0.948 0.957
0.99
100 200 300 500
0.939 0.966 0.965 0.973
0.948 0.967 0.970 0.974
0.950 0.969 0.970 0.974
0.979 0.991 0.986 0.990
0.90
100 200 300 500
0.747 0.785 0.808 0.816
0.759 0.786 0.829 0.833
0.769 0.805 0.828 0.833
0.782 0.860 0.893 0.895
0.95
100 200 300 500
0.821 0.851 0.869 0.893
0.855 0.866 0.900 0.906
0.861 0.874 0.905 0.902
0.888 0.926 0.941 0.954
0.99
100 200 300 500
0.919 0.939 0.952 0.952
0.932 0.954 0.962 0.967
0.939 0.951 0.966 0.966
0.970 0.976 0.982 0.982
0.90
100 200 300 500
0.611 0.685 0.681 0.718
0.655 0.732 0.735 0.765
0.658 0.722 0.732 0.768
0.674 0.786 0.829 0.857
0.95
100 200 300 500
0.689 0.752 0.760 0.805
0.734 0.799 0.820 0.841
0.744 0.815 0.819 0.849
0.787 0.878 0.895 0.909
0.99
100 200 300 500
0.808 0.865 0.878 0.902
0.860 0.899 0.925 0.932
0.870 0.900 0.925 0.941
0.928 0.949 0.965 0.975
≈ 0.20
≈ 0.25
been chosen to be 100, 200, 300 and 500, respectively. Each entry in the table was computed based on 1000 replications. (1) From Table 1, we can compare different censoring and different nominal levels, the coverage probabilities of the four methods is very accurate with the increase of sample size. In contrast, the coverage of NA is the worst, the EEL and DEL outperform the NA method. The BEL outperforms EEL, ADEL. Particularly, the BEL performs much better than the NA method in high censoring case. 1950050-9
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
C.-J. Li, H.-M. Zhao & X.-G. Dong
(2) From Table 2, we get the similar conclusion to Table 1. The BEL outperforms EEL, ADEL and the NA method. Particularly, the BEL performs much better in high censoring case. It also shows that we are also good for heteroscedasticity. (3) At each nominal level and for BEL, EEL, ADEL and NA methods, the coverage accuracies for in all models decrease as the censoring proportions the sample size n increase. However, the coverage accuracies for the BEL are acceptable even in high censoring case. Overall, our simulation studies indicate that the BEL method is superior to the empirical likelihood and the NA-based method in the context of the censored linear regression model, particularly in high censoring proportions. 4.2. BEL variable selection In this subsection, we use Monte Carlo simulations to compare the BEL method with existing methods including the LASSO, ALASSO and SCAD for the censored linear regression. The data in the simulation studies are generated by T i = X T β0 + u i ,
i = 1, . . . , n,
where the random errors ui is generated from N (0, 1) and the rows of X are generated independently from N (0, Σ), where the (i, j)th element of Σ is 0.5|i−j| . For two setups, the censoring times Ci are generated from the exponential distribution with rate a. In BEL, the prior parameter are assumed a = 0.1 and b = 0.1. Two setups are considered: Setup 1: β0 = (1, 2, 0, 0, 0), Setup 2: β0 = (3, 1.5, 0, 0, 2, 0, 0, 0). In the following, we use two criterions to evaluate the performances of each algorithm. The first criterion is the mean of mean absolute deviation (MMAD) Table 3.
Comparison of several variables selection under setup 1.
n
OLS
LASSO
ALASSO
SCAD
BEL
50
MMAD TP FP
1.320 — —
0.822 1.271 0.004
0.786 2.057 0.032
0.800 1.964 0.031
0.766 2.723 0.101
100
MMAD TP FP
0.922 — —
0.590 1.211 0.004
0.537 2.163 0.032
0.530 2.021 0.031
0.471 2.824 0.101
150
MMAD TP FP
0.793 — —
0.507 1.139 0.000
0.467 2.101 0.001
0.453 1.958 0.000
0.384 2.835 0.008
200
MMAD TP FP
0.665 — —
0.430 1.145 0.000
0.390 2.130 0.000
0.376 2.030 0.000
0.312 2.880 0.002
1950050-10
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
BEL and variable selection for censored linear model Table 4.
Comparison of several variables selection under setup 2.
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
n
OLS
LASSO
ALASSO
SCAD
BEL
50
MMAD TP FP
4.340 — —
2.721 2.196 0.051
2.631 3.191 0.107
2.686 3.168 0.138
2.846 4.007 0.306
100
MMAD TP FP
3.137 — —
1.997 2.120 0.005
1.818 3.403 0.027
1.806 3.141 0.024
1.792 4.356 0.098
150
MMAD TP FP
2.525 — —
1.624 2.118 0.001
1.420 3.652 0.003
1.395 3.241 0.006
1.363 4.551 0.044
200
MMAD TP FP
2.218 — —
1.437 2.030 0.003
1.222 3.659 0.005
1.194 3.195 0.004
1.100 4.606 0.020
which is defined as pj=1 |βˆj −β0j |. The second criterion is the mean of true positives (TP) and false positives (FP) selected by each algorithm. TP records the average number of zero-coefficients returned by the noise variables. FP records the average number of zero coefficients returned by the signal variables. We give the simulation process of different penalty functions and get the value of MMAD in Tables 3 and 4. With the increase of sample size, the MMAD of different penalty functions are getting smaller. Under the same sample, the MMAD of BEL is smaller than other methods in most cases. For different methods, the TP of BEL is significantly higher than that of SCAD, ALASSO, LASSO. The BEL method is getting closer to the number of zero-coefficients. Under two setups, the results of Tables 3 and 4 are similar, we again see that generally BEL performs the best in most cases. We observe that the BEL have dominated the other method in terms of MMAD and TP in most cases.
5. Illustrative Examples 5.1. BEL estimator We illustrate the BEL method and compare it to the NA method using a real data set which is readily available in the R package KMsurv. The alloauto data consists of 101 patients with advanced acute myelogenous leukemia reported to the International Bone Marrow Transplant Registry. One purpose of the study was to compare of the effectiveness of these two methods of transplant as measured by the length of patients leukemia-free survival. To that end, the data contains information about time to death or relapse (in months), the type of transplant and Leukemia-free survival indicator (0 = alive without relapse, 1 = dead or relapse). Further information on the data may be found, for instance, in Sec. 1.9 in Klein and Moeschberger [11]. 1950050-11
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
C.-J. Li, H.-M. Zhao & X.-G. Dong
Fig. 1.
The sampling chain of β0 and β1 for the BEL in Model 1 with n = 200 and 34% censored.
ˆ and 97.5% Upper Bounds of β for the BEL in Fig. 2. The potential scale reduction factor R Model 1 with 3 chains (n = 200, 34% censored).
1950050-12
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
BEL and variable selection for censored linear model
Let T be the logarithm to time to death or relapse and X be the type of transplant (1 = allogeneic, 2 = autologous). We consider the model
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
T = β0 + β1 X + ε. As in Koul et al. [12], the OLS estimator is given by (2.2). For the M-H algorithm in the BEL, we generate 15000 samples and burn in 5000 samples. As in Brooks and Gelman [2], Fig. 1 shows the sampling chains of β0 and β1 for the BEL in Model 1, when n = 200 and 34% censored, the left subfigure shows that the trajectory is stable. The distribution of posterior density is given in the right subfigure. Figure 2 display a convergence diagnostic for Markov chains. The conclusions of Figs. 3 and 4 are similar result those of Figs. 1 and 2.
Fig. 3.
The sampling chain of β0 and β1 for the BEL in the alloauto data.
1950050-13
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
C.-J. Li, H.-M. Zhao & X.-G. Dong
Fig. 4.
ˆ and 97.5% Upper Bounds of β for the BEL with 3 chains in the alloauto data. The R
Table 5.
The estimator and confidence interval results for the alloauto data.
Method
Variable
Estimator
SDs
95% LCI
95% UCI
OLS
β0 β1
−1.077 1.609
0.867 0.811
−2.776 0.622
0.019 3.198
BEL (5000–1000)
β0 β1
−0.373 0.984
0.637 0.559
−2.041 0.362
0.497 2.426
BEL (10,000–3000)
β0 β1
−0.451 1.050
0.723 0.638
−2.451 0.374
0.406 2.927
BEL (15,000–5000)
β0 β1
−0.402 1.015
0.729 0.650
−2.357 0.352
0.503 2.849
We display the results in Table 5. There is a difference in the treatment failure rates for the two types of transplants. BEL obtains the estimator, sds, LCI and UCI in the 5000 (burn in 1000), 10,000 (burn in 3000), 15,000 (burn in 5000) samples separately. The confidence interval of Autologous is obviously smaller than that of allogeneic, and different samples are the same. Autologous showed more positive effects for the survival time. 5.2. BEL variable selection In this subsection, we applied the proposed variable selection procedure to the data of the bone marrow transplantation for leukemia, which is available in R package 1950050-14
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
BEL and variable selection for censored linear model
KMsurv (bmt data). The study was carried out from March 1, 1984 to June 30, 1989. The data set contains 137 patients admitted to four hospitals, including 99 patients with acute myeloctic leukemia and 38 patients with acute lymphoblastic leukemia. There were 81 patients who died and 56 patients alive in the study. Further information on the data may be found, for instance, in Sec. 1.3 in Klein and Moeschberger [11]. We consider the linear model for the logarithm to time to death (Y ) with 11 covariates. The variable names are list in Table 6. The penalized coefficient estimates from the BEL method and other proposed methods are summarized in Table 7. The trace and density of intercept of 12 covariates in Table 7 are given in Figs. 5–8. It give the sampling chains of β for the BEL, which show the stationarity of the sequences. The LASSO suggests that X5 , X10 and X11 have no effect on Y . The ALASSO and SCAD give similar results and suggest that X5 , X6 , X8 , X10 and X11 have no effect on Y . The BEL selects X8 and X9 for this model which is different with the other method.
Table 6.
The variable name of the bmt data.
Variables Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
The logarithm to time to death Patient age in years Donor age in years Patient sex: 1-Male, 0-Female Donor sex: 1-Male, 0-Female Patient CMV Status: 1-CMV Positive, 0-CMV Negative Donor CMV status: 1-CMV Positive, 0-CMV Negative Waiting time to transplant in days FAB: 1-FAB Grade 4 or 5 and AML, 0-Otherwise Hospital: 1-The Ohio State University, 2-Alferd, 3-St. Vincent, 4-Hahnemann MTX: 1-Yes, 0-No Disease group: 1-ALL, 2-AML Low Risk, 3-AML High Risk
Table 7.
The variable selection for LASSO, ALASSO, SCAD and BEL.
Variables (Intercept) X1 -Patient age X2 -Donor age X3 -Patient sex X4 -Donor sex X5 -Patient CMV Status X6 -Donor CMV status X7 -Waiting time X8 -FAB X9 -Hospital X10 -MTX X11 -Disease group
OLS
LASSO
ALASSO
SCAD
7.351 0.059 −0.068 −0.976 −1.227 −0.529 0.638 −0.001 1.135 −1.037 0.469 −0.234
5.214 0.065 −0.053 −0.534 −0.648 0.000 0.234 −0.001 0.617 −0.789 0.000 0.000
5.649 0.038 −0.038 −0.431 −0.572 0.000 0.000 −0.001 0.317 −0.764 0.000 0.000
7.456 0.039 −0.054 −1.113 −1.186 0.000 0.000 −0.001 0.000 −0.956 0.000 0.000
1950050-15
BEL 4.132 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.146 −0.647 0.000 0.000
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
August 2, 2019 14:15 WSPC S1793-5245 242-IJB
1950050-16
1950050
C.-J. Li, H.-M. Zhao & X.-G. Dong
Fig. 5. The sampling chain of β for the BEL.
Fig. 6.
The sampling chain of β for the BEL.
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
BEL and variable selection for censored linear model
Fig. 7.
The sampling chain of β for the BEL.
Fig. 8.
The sampling chain of β for the BEL. 1950050-17
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
C.-J. Li, H.-M. Zhao & X.-G. Dong
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
6. Conclusion In this paper, we consider the BEL-based estimator and variable selection for the censored linear models. We use the modified Newton–Raphson algorithm, the BEL method is superior to the empirical likelihood and the NA, particularly in high censoring proportions. Give two special priors, the provided BEL method is a high efficiency tool for these models. It accurately estimates the model parameters, and gives smaller credible regions than the NA methods and EL methods. BEL model selection higher accuracy and correct identification rate than the LASSO, ALASSO and SCAD. Our simulation results support the validity of our methods. Finally, two real data sets analysis show that censored linear model with BEL is appropriate for the acute myeloctic leukemia data set. The empirical analysis results are significant. The forecasting problem and detailed inference for the censored linear models could also be taken into consideration. We leave these issues as our future work to research. Acknowledgments This research was partially supported by National Science Foundation of China (NSFC) grants 11571050, 11571051 and 11671054 and The Education Department of Jilin Province,“13th Five-Year” project planning 2016316. References [1] R. Alhamzawi and K. Yu, Variable selection in quantile regression via Gibbs sampling, J. Appl. Stat. 39 (2012) 799–813. [2] S. P. Brooks and A. Gelman, General methods for monitoring convergence of iterative simulations, J. Comput. Graph. Statist. 7 (1998) 434–455. [3] J. Buckley and I. James, Linear regression with censored data, Biometrika 66 (1979) 429–436. [4] H. Chang and R. Mukerjee, Bayesian and frequentist confidence intervals arising from empirical-type likelihoods, Biometrika 95 (2008) 139–147. [5] J. Chen, R. R. Sitter and C. Wu, Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys, Biometrika 89 (2002) 230–237. [6] D. R. Cox, Regression models and life-tables, J. R. Stat. Soc. Ser. B 34 (1972) 187–220. [7] J. Q. Fan and R. Z. Li, Variable selection via nonconvave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc. 456 (2001) 1348–1360. [8] X. N. Feng, G. C. Wang, Y. F. Wang and X. Y. Song, Structure detection of semiparametric structural equation models with Bayesian adaptive group lasso, Statist. Med. 34 (2015) 1527–1547. [9] X. N. Feng, H. T. Wu and X. Y. Song, Bayesian adaptive lasso for ordinal regression with latent variables, Sociol. Methods Res. 46 (2015) 1–28. [10] C. Hans, Bayesian lasso regression, Biometrika 96 (2009) 835–845. [11] J. P. Klein and M. L. Moeschberger, Semiparametric Proportional Hazards Regression with Fixed Covariates, Survival Analysis. Vol. 8 (Springer, New York, 1997). [12] H. Koul, V. Susarla and J. Van Ryzin, Regression analysis with randomly rightcensored data, Ann. Statist. 9 (1981) 1276–1288. [13] N. A. Lazar, Bayesian empirical likelihood, Biometrika 90 (2003) 319–326. 1950050-18
August 2, 2019 14:15 WSPC
S1793-5245
242-IJB
1950050
Int. J. Biomath. 2019.12. Downloaded from www.worldscientific.com by UNIVERSITY OF NEW ENGLAND on 09/21/19. Re-use and distribution is strictly not permitted, except for Open Access articles.
BEL and variable selection for censored linear model
[14] C. L. Leng, M. N. Tran and D. Nott, Bayesian adaptive Lasso, Ann. Inst. Statist. Math. 66 (2014) 221–244. [15] S. Leurgans, Linear models, random censoring and synthetic data, Biometrika 74 (1987) 301–309. [16] G. Li and Q. H. Wang, Empirical likelihood regression analysis for right censored data, Statist. Sinica 13 (2003) 51–68. [17] A. Owen, Empirical likelihood for linear models, Ann. Statist. 19 (1991) 1725–1747. [18] T. Park and G. Casella, The Bayesian lasso, J. Amer. Statist. Assoc. 103 (2008) 681–686. [19] G. S. Qin and B. Y. Jing, Empirical likelihood for censored linear regression, Scand. J. Statist. 28 (2001) 661–673. [20] S. M. Schennach, Exponential specifications and measurement error, Econom. Lett. 85 (2004) 85–91. [21] N. S. Tang, D. W. Li and A. M. Tang, Semiparametric Bayesian inference on generalized linear measurement error models, Statist. Papers 58 (2017) 1–23. [22] A. M. Tang, X. Q. Zhao and N. S. Tang, Bayesian variable selection and estimation in semiparametric joint models of multivariate longitudinal and survival data, Biom. J. 59 (2017) 57–78. [23] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B 58 (1996) 267–288. [24] R. B. Xi, Y. X. Li and Y. M. Hu, Bayesian quantile regression based on the empirical likelihood with spike and slab priors, Bayesian Anal. 11 (2016) 821–855. [25] D. Z. Zheng, A method for determining the parameter stability regions of linear control systems, IEEE Trans. Automat. Contrl. 29 (1984) 183–186. [26] H. Zou, The adaptive lasso and its oracle properties, J. Amer. Statist. Assoc. 101 (2006) 1418–1429.
1950050-19