Cox regression model for dissecting genetic architecture of survival time

Genomics 104 (2014) 472–476 Contents lists available at ScienceDirect Genomics journal homepage: www.elsevier.com/locate/ygeno Methods Cox regress...

Download PDF

468KB Sizes 2 Downloads 55 Views

Report

PDF Reader
Full Text

Genomics 104 (2014) 472–476

Contents lists available at ScienceDirect

Genomics journal homepage: www.elsevier.com/locate/ygeno

Methods

Cox regression model for dissecting genetic architecture of survival time Dan Jiang a,1, Hongwei Wang b,1, Jiahan Li c, Yang Wu d, Ming Fang a, Runqing Yang e,⁎ a

Life Science College Heilongjiang Bayi Agricultural University, Daqing 163319, People's Republic of China Fishery Technical Extension Station, Beijing Daxing Animal Health Supervisory Commission, Beijing 102600, People's Republic of China c Applied and Computational Mathematics and Statistics, University of Notre Dame, IN 46637, USA d Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing 100193, People's Republic of China e Research Centre for Aquatic Biotechnology, Chinese Academy of Fishery Sciences, Beijing 100141, People's Republic of China b

a r t i c l e

i n f o

Article history: Received 13 April 2014 Accepted 3 October 2014 Available online 12 October 2014 Keywords: Survival time QTL LASSO Cox regression model Partial likelihood algorithm

a b s t r a c t Common quantitative trait locus (QTL) mapping methods fail to analyze survival traits of skewed normal distributions. As a result, some mapping methods for survival traits have been proposed based on survival analysis. Under a single QTL model, however, those methods perform poorly in detecting multiple QTLs and provide biased estimates of QTL parameters. For sparse oversaturated model used to map survival time loci, the least absolute shrinkage and selection operator (LASSO) for Cox regression model can be employed to efﬁciently shrink most of genetic effects to zero. Then, a few non-zero genetic effects are re-estimated and statistically tested using the standard maximum Cox partial likelihood method. Simulation shows that the proposed method has higher statistic power for QTL detection than that of the LASSO for logarithmic linear model or the interval mapping based on Cox model, although it somewhat underestimates QTL effects. Especially, computational speed of the method is very fast. An application of this method illustrates mapping main effect and interacting QTLs for heading time in the North American Barley Genome Mapping Project. © 2014 Elsevier Inc. All rights reserved.

1. Introduction Survival traits measured as time to event usually come with two features: skewed distribution with heavier right tail and censored observations [1]. By incorporating survival analysis theory into the traditional quantitative trait loci (QTL) mapping frameworks, the QTL mapping for survival traits has been introduced to efﬁciently locate survival time loci, allowing a better understanding of genetic architectures underlying survival traits. Broman [2] considered a cure-rate model that treated the mice alive at the end of the study as cured ones when mapping QTL of the time to death of bacterial infection. The survival time was modeled by log-normal distribution. A Cox proportional hazards (PH) model was proposed to characterize the effects of the QTL genotype on failure time [3], where model parameters and computed LOD scores were estimated by a variant of EM algorithm [4]. Diao et al. [5] also used a Cox PH model with a Weibull baseline hazard function to locate QTLs, and then developed efﬁcient likelihood-based inference procedures. These two Cox PH models belong to parametric models for mapping survival time loci, due to the component of estimating baseline hazard functions. Along this line, parametric Cox PH model for mapping QTLs of heading time in rice was optimized by Luo et al. [6], where the best baseline hazard distribution was selected from six commonly used ⁎ Corresponding author. Fax: +86 10 68670701. E-mail address: [email protected] (R. Yang). 1 These authors have contributed equally to this work.

http://dx.doi.org/10.1016/j.ygeno.2014.10.002 0888-7543/© 2014 Elsevier Inc. All rights reserved.

survival distributions. Other than parametric algorithms, Diao and Lin [6] developed semi-parametric statistical methods for Cox PH model to search for survival trait loci. Without need to estimate baseline hazard functions, Fang [7] proposed further a simple and efﬁcient nonparametric approach to estimate QTL parameters through partial likelihood function. Using simulated data with different structures, Moreno et al. [8] systematically compared the parametric model based on Weibull distribution, semi-parametric model, and classical interval mapping based on the normal distribution. Additionally, accelerated failure time model was also used to model the genetic effects of QTL on survival traits [9–11]. All these methods of mapping survival traits are developed for interval mapping, where the whole genome is scanned, but only one QTL is analyzed at each time. For survival traits controlled by multiple QTLs, this mapping strategy is suboptimal due to the existence of linked QTLs. Over the past decade, QTL mapping methods have been developed to simultaneously analyze multiple QTLs for normal traits. Mapping multiple QTLs by either non-Bayesian or Bayesian methods is in fact a model selection problem about selecting QTLs from a large number of genetic loci over the entire genome. Although Bayesian shrinkage analysis greatly facilitates modeling multiple QTLs and the shrinkage estimation, full Bayesian shrinkage mapping is practically infeasible due to high computational cost [12–17]. As an equivalent strategy to Bayesian shrinkage estimation with double exponential priors for regression coefﬁcients [18], the LASSO [19] is widely used to solve for the oversaturated linear model. Most recently, Liu etc. [20] employed

D. Jiang et al. / Genomics 104 (2014) 472–476

473

Table 1 Mean estimates and standard deviations (in parentheses) of QTL positions obtained with three mapping methods for the simulated datasets. Sample size 150

300

Method

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Q9

Q10

True position Cox-LASSO Gau-LASSO Cox-LS Cox-LASSO Gau-LASSO Cox-LS

23 23.9(2.3) 23.2(3.1) 28.5(5.5) 24.2(1.6) 23.9(2.7) 29.3(4.9)

56 56.6(1.8) 56.8(1.3) 74.0(5.3) 56.6(1.3) 56.6(1.6) 74.1(4.0)

49 50.0(2.5) 49.8(2.4) 54.3(8.2) 49.9(2.3) 49.6(2.5) 55.7(7.1)

94 94.9(2.6) 94.0(3.0) 90.2(7.2) 95.1(2.1) 94.9(2.8) 90.3(6.3)

69 70.3(3.0) 69.5(2.7) – 70.0(2.8) 70.5(2.9) –

35 34.5(2.3) 34.6(2.5) 32.17(6.7) 34.7(2.3) 34.8(2.5) 32.7(5.9)

93 94.1(2.0) 94.5(1.9) 98.5(3.5) 94.4(1.6) 94.0 (2.1) 95.7(2.8)

80 80.7(2.6) 81.0(1.3) 81.3(6.6) 80.8(1.7) 80.9(2.3) 80.9(5.0)

27 26.4(2.6) 26.2 (2.2) – 26.0(2.2) 26.5(2.3) 22.8(7.9)

79 77.9(2.3) 80.1(1.9) – 79.9(2.4) 79.7(2.6) –

a fast LASSO with coordinate descent algorithm [21] to successfully search for the QTLs of normal traits. To date, however, no multipleQTL mapping method is reported for survival traits. In this study, a little modiﬁed LASSO for Cox regression model [22], as a non-parametric approach, is used to efﬁciently analyze sparse oversaturated model for mapping survival loci. Then for the subset of selected loci, their genetic effects are unbiasedly estimated and statistical tests are carried out using the standard maximum Cox partial likelihood method [24]. Simulation is conducted to investigate statistical efﬁciency of the mapping method proposed. A dataset from the North American Barley Genome Mapping Project is analyzed to map the QTLs for heading times.

2.2. Shrinkage estimation of genetic effects Usually survival data are censored because of random loss to followup, failures from competing causes, or limited duration of the experiment. Besides, base hazard function in model (1) is unknown in general. To address these issues, Cox partial likelihood algorithm was developed to handle censoring problem without estimating baseline hazard function. Based on model (1), the Cox's partial likelihood [24] can be written as exp PL ¼ ∏ XlðiÞ n

2. Method

i¼1

2.1. Multiple-QTL proportional hazards model

X p

bx j¼1 j i j

b j xlðiÞ j X ; p exp b x j r j j¼1 j¼1

ð2Þ

where l(i) for l = 1, 2, …, t denotes the index at lth survival time in the increasing list of unique survival times for ith individuals. By deﬁning μ = ∂log L/∂η, A = − ∂2 log L/∂ηηT and z = η + A− 1μ with η = b′x, the logarithm of log-partial likelihood is approximated by a two term Taylor series, as follows

Assume that n individuals from a backcross (BC) population are observed for a survival trait, and are genotyped for m co-dominant markers with known genetic linkage map. To map the QTLs of the observed survival trait, entire genome is evenly divided by k loci that are 1 or 2 cM away from each other, and the candidates of QTLs include the genotyped markers as well as ungenotyped loci between markers. Our genetic design implies that there are two genotypes at each locus on chromosome, denoted by QQ and Qq. With the proportional hazard model, the effects of the QTL candidates can be formulated as λðt i Þ ¼ λ0 ðt i Þexp

r¼1

X p

T T T z−b x : log ðPLÞ ¼ z−b x

ð3Þ

where, b = [b1, b2, ⋯, bp] and x = [x1, x2, ⋯, xn]T with xi = [xi1, xi2, ⋯, xip] Decomposing matrix A into VTV by Cholesky decomposition, we transform objective function (3) into

ð1Þ

T 0 T T 0 y−b x : log ðPLÞ ¼ y−b x

where ti is survival time for the ith individual, λ(ti) is the hazard function evaluated at time ti, λ0(t) is a common baseline hazard function, p = m + k is the number of the QTL candidates, bj is additive effect of the jth QTL candidate, and xij is the indicator variable of the jth QTL candidate for individual i determined by QTL genotypes. If the QTL candidate is the genotyped marker, xij is deﬁned as + 1 for QQ and − 1 for Qq, respectively. Otherwise for any locus between two markers, the indicator variable can be estimated by its expectation conditional on ﬂanking markers, according to least square method by Haley and Knott [23]. Speciﬁcally, the expectation is calculated as

ð4Þ

where y = Vz and x′ = Vx. In QTL mapping with linkage analysis, the number of estimated parameters is far greater than the sample size. Moreover, the number of QTLs with non-zero genetic effects is very limited. The LASSO with coordinate descent algorithm [21,25] can be therefore employed to efﬁciently shrink most of genetic effects to be zero by minimizing T 0 T T 0 y−b x y−b x þ λjbj:

E xi j ¼ ðþ1ÞpðQQ Þ þ ð−1ÞpðQ qÞ ¼ pðQQ Þ−pðQ qÞ

ð5Þ

where λ is a tuning parameter, which will be chosen through cross validation. Since both y and x′ are the function of parameter b, iterations are required in solving the model parameters with the LASSO.

where p(QQ) and p(Qq) are probabilities for two QTL genotypes estimated by ﬂanking markers.

Table 2 Mean estimates and standard deviations (in parentheses) of QTL effects obtained with three mapping methods for the simulated datasets. Sample size 150

300

Method

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Q9

Q10

True effect Cox-LASSO Gau-LASSO Cox-LS Cox-LASSO Gau-LASSO Cox-LS

1.80 1.17(0.25) 0.55(0.21) 0.69(0.12) 1.14(0.18) 0.61(0.13) 0.68(0.08)

2.30 1.52(0.3) 0.75(0.15) 0.8(0.11) 1.42(0.2) 0.68(0.09) 0.79(0.08)

1.02 0.68(0.15) 0.41(0.21) 0.4(0.07) 0.62(0.12) 0.46(0.18) 0.37(0.06)

1.40 0.91(0.21) 0.47(0.13) 0.42(0.09) 0.86(0.14) 0.36(0.13) 0.41(0.06)

−0.52 −0.48(0.08) −0.19(0.19) – −0.36(0.07) −0.27(0.21) –

1.00 0.68(0.17) 0.39(0.11) 0.33(0.04) 0.61(0.11) 0.38(0.08) 0.22(0.03)

−0.95 −0.6(0.13) 0.17(0.09) 0.31(0.05) −0.6(0.11) −0.29(0.25) 0.22(0.02)

1.55 0.98(0.2) 0.22(0.07) 0.39(0.07) 0.97(0.14) 0.39(0.13) 0.35(0.06)

0.65 0.5(0.09) 0.23(0.1) – 0.37(0.08) 0.27(0.17) 0.2(0.01)

−1.10 1.17(0.25) 0.29(0.07) – −0.68(0.12) 0.37(0.14) –

474

D. Jiang et al. / Genomics 104 (2014) 472–476

Table 3 Statistical powers of QTL detection (%) and false positive rate (FPR, %) obtained with three mapping methods for the simulated datasets. Sample size

150

300

Method

Cox-LASSO Gau-LASSO Cox-LS Cox-LASSO Gau-LASSO Cox-LS

Statistical power

FPR

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Q10

Q9

81.8 46.2 82.6 89.1 52.8 88.2

94.5 52.4 93.8 96.6 58.8 97.4

72.2 24.6 71.2 84.2 45.8 80.4

85.8 49.8 86.0 96.3 56.6 95.4

15.2 2.0 0.0 43.0 13.6 0.0

59.6 22.4 10.4 80.2 44.2 28.2

55.8 15.8 0.4 78.8 37.8 2.8

75.0 27.2 78.6 90.6 50.8 99.2

17.0 13.2 0.0 55.8 33.6 1.4

68.0 23.4 0.0 86.6 45.2 0.0

7.6 9.5 9.7 4.3 5.8 5.8

2.2.1. Statistical inference for QTL Although LASSO could be used to identify a few QTLs responsible for the survival trait by setting most of the genetic effects to zero, the estimated nonzero genetic effects are biased due to the forcing penalties on all parameters. Moreover, LASSO can not provide signiﬁcance test for the estimated parameters. Therefore we propose to re-estimate model (1) with the selected QTLs by maximum partial likelihood algorithm [24]. From the log-likelihood (3), the least squares normal equations are constructed as

Genetic effects re-estimated by reweighted least squares may be biased upward due to the variable selection procedure in the ﬁrst step. Meanwhile, population structure and marker density also inﬂuence the distribution of the t test statistic. In mapping practice, therefore, permutation tests [26] should be introduced to determine the critical value of t test statistic. The markers and loci corresponding to signiﬁcant genetic effects are identiﬁed as the QTLs for the survival trait.

T −1 T T −1 x A x b ¼x A z:

To validate the statistical gain of the proposed method, simulation study in accordance with the genetic design of the real data example is carried out. In a backcross population with a sample size of 150 or 300, 6 chromosomes of 100 cM are simulated and 11 co-dominant markers are evenly placed on each chromosome. A total of 10 QTLs are inserted between those markers. There are 2 QTLs on chromosome 1, 2 and 4, respectively, while only one QTL on other chromosomes. Positions and genetic effects of the QTLs are listed in Tables 1 and 2. We generate survival time following a Weibull distribution

ð6Þ

Iteratively reweighted algorithm is required to estimate parameter b, since x, A and z are all functions of b. Once the algorithm converges, the least square estimates of genetic effects are obtained: ^ T ¼ xT A−1 x −1 xT A−1 z : b

ð7Þ

Then, the signiﬁcance test can be carried out by the t test statistic

2.3. Simulation study

by t ¼

ln ðU Þ T λ0 exp ðb xÞ

[27], where U is a uniformly distributed random

^ is the least square estimate for kth genetic effect and V is where b k k the diagonal element in matrix (xTA−1x)−1 corresponding to the kth genetic effect.

number, λ0 is a parameter in Weibull distribution and is set to 1.5 here, and bTx is the expected genetic effect from 10 simulated QTLs. Moreover, 15% of simulated survival time is censored randomly. The simulated dataset are separately analyzed by three methods outlined in the previous section. To facilitate comparison of the three analysis methods, test statistics of all methods are transformed to − log(p), where p is the p-value associated with the realized test statistic. For each scenario, the critical value of the test statistic is

Fig. 1. The proﬁle of −log(p) test statistics obtained with Cox-LASSO method for heading times in barely. The horizontal reference line points the genome-wide threshold value. Chromosomes are separated by the vertical dotted lines and marker positions are indicated by the ticks on the horizontal axis.

Fig. 2. The proﬁle of −log(p) test statistics obtained with Gau-LASSO method for heading times in barely. The genome-wide threshold value is pointed by a horizontal reference line. Chromosomes are separated by the vertical dotted lines and marker positions are indicated by the ticks on the horizontal axis.

^ bk t k ¼ pﬃﬃﬃﬃﬃﬃ Vk

ð8Þ

D. Jiang et al. / Genomics 104 (2014) 472–476

Fig. 3. The proﬁle of −log(p) test statistics obtained with Cox-LS method for heading times in barely. The genome-wide threshold value is pointed by a horizontal reference line. Chromosomes are separated by the vertical dotted lines and marker positions are indicated by the ticks on the horizontal axis.

determined by simulating 1000 samples under the null model with zero genetic effects. The simulations are replicated 500 times for estimating QTL parameters and assessing the statistical power of QTL detections. The statistical power is calculated for each locus as the percentage of the number of those simulations that statistic values exceed the critical value. Also, false positive rate is evaluated with 500 replicated simulations under the null model with zero genetic effects. Mapping results of three mapping methods at different sample sizes are listed in Table 1 for QTL position estimates, in Table 2 for genetic effect estimates and in Table 3 for the statistical power of QTL detection. Clearly, the Cox-LASSO method greatly outperforms its competitors in terms of statistical power and QTL parameter estimation. As expected, statistical power and estimation precision increase with heritability and sample size increased. It is worthy to note that the LASSO method slightly underestimates QTL genetic effects, which may be a result of Cox partial likelihood algorithm without estimating baseline hazard function [28,29]. All methods are timed for each dataset simulated (results not reported). The results showed that, on average, the LASSO-based approaches take about half of computing time of the Cox-LS method, suggesting the computational efﬁciency of the proposed method. 3. Real data analysis The dataset is collected from the North American Barley Genome Mapping Project, which can be downloaded from http://wheat.pw.

475

usda.gov/ggpages/SxM/. Doubled-haploid (DH) population was derived from Steptoe × Morex. 150 DH lines were observed for heading days under 16 different environments and genotyped for 223 markers. Linkage map covering a genome of ~1500 cM was constructed based on this DH population. The dataset is reanalyzed by using three mapping methods: (1) ordinary least square method (LS) that analyzes Cox model and estimates single genetic effect for each separately (Cox-LS method for short), (2) our proposed LASSO method that analyzes the oversaturated Cox model by simultaneously estimating all genetic effects (Cox-LASSO method for short), and (3) the LAASO method with coordinate descent algorithm that solves the oversaturated linear model for the logarithm of survival time (Gau-LASSO method for short). Prior to the mapping analysis, the genome is evenly partitioned by 2 cM, leading to a total of 968 loci including the genotyped markers. The phenotypic values of the trait are measured by using the averages of observations across multiple environments. The genotypic indicators of 5% missing markers and loci between markers are simply replaced with their expectations conditional on ﬂanking markers. To declare QTL signiﬁcance, genome-wide critical values at the signiﬁcance level 5% are obtained from 1000 permutation tests, which is 4.05 for Cox-LASSO method, 1.33 for Gau-LASSO method, and 3.48 for Cox-LS method. Fig. 1 displays the proﬁle of − log(p) obtained from Cox-LASSO method, where seven loci from chromosomes 1, 2, 3, 4 and 5 are statistically signiﬁcant. Among these detected QTLs, 4 are located on genetic markers. On the other hand, Gau-LASSO and Cox-LS methods only locate one QTL that is also implied by the Cox-LASSO method (see Figs. 2 and 3). The QTL implied by the Cox-LS method is located between marker ABC156A and MWG858 on the second chromosome, while the QTL implied by Gau-LASSO method is on marker ‘ABG397’ of chromosome 4. Table 4 presents parameter estimates of the detected QTLs from three mapping methods. It can be seen that although two QTLs implied by two alternative approaches (Gau-LASSO and Cox-LS) are less significant than the corresponding ones implied by Cox-LASSO model, the signs of genetic effects are consistent. Cox-LASSO model, however, is capable of discovering more QTLs explaining the heading time in Barley. Moreover, the largest genetic effect estimated by Cox-LASSO method is about three times greater than that of Cox-LS method. Generally speaking, most of QTLs detected by three mapping methods have relatively low genetic effects (less than 1.0), except two QTLs on the second chromosome. Following the procedure proposed in Yi and Banerjee [30], further, we identify interacting QTLs for heading days in Barley chromosome by chromosome. As displayed in Fig. 4, there are evident interacting signals on each chromosome almost, but the −log(p) at only 8 pairs of loci exceed genome-wide critical value (2.01) calculated using permutation tests. Those detected interacting QTLs involve interactions between both main-effect QTLs such as a pair of QTL1 and QTL7. Most of interactions include only one main-effect QTL, and interactions without main-effect QTLs as well. They regulate heading days in different direction and magnitude (results not shown).

Table 4 Estimated QTL parameters obtained with three mapping methods for heading times in Barley. QTL No. 1 2 3 4 5 6 7

Cox-LASSO

Gau-LASSO

Chr.-pos. (cM)

Marker interval

Effect

−log(p)

1-26.9 2-39 2-71 3-87.2 4-58.3 4-129.7 5-145.8

MWG089 ABC156A–MWG858 ABG316C–ABC167B ABG453 WG1026B ABG397 Aga7

0.80 5.47 −1.92 −0.40 −0.62 −0.54 −0.50

7.23 13.24 11.33 4.37 6.39 4.36 4.73

Effect

−0.92

Cox-LS −log(p)

1.75

Effect

−log(p)

1.47

10.60

476

D. Jiang et al. / Genomics 104 (2014) 472–476

challenging but could be addressed with proper variable selection and variable screening techniques. Acknowledgments The research is supported by the National Natural Science Foundation of China (31172190 and 3001001). There is no conﬂict of interest. References

Fig. 4. The two-dimensional proﬁle of −log(p) test statistics obtained with Cox-LASSO method for heading times in barely. The dotted lines represent genome-wide critical value (2.01) calculated with 1000 permutation tests.

4. Discussion Based on Cox PH model, multiple-QTL model is constructed for mapping survival time loci, wherein particular Cox's partial likelihood method is introduced to estimate parameters due to its advantages of handling censored observations without estimating baseline hazard function. Given the sparse oversaturated Cox PH model, the Lasso for Cox regression model [22] can efﬁciently solve Cox's partial likelihood. Finally, a standard maximum Cox partial method is applied to the reduced model for the unbiased genetic effect estimation and the associated signiﬁcant tests. Clearly, the mapping method proposed in this article can not only estimate QTL effects, but also statistically test their signiﬁcance, as what have been done in the conventional interval mapping. Furthermore, our proposed method can be easily extended to genome-wide association study with thousands of single nucleotide polymorphisms. Different from the Lasso for Cox regression model [22] that replaces matrix A by a diagonal matrix with the diagonal entries being from matrix A, our method takes Cholesky decomposition of matrix A to obtain the squared sum of the residual error of the estimated parameters as an approximation of partial likelihood. An extremely fast Lasso with coordinate descent algorithm [21,25] is employed to iteratively solve the approximation of partial likelihood. For convenience of the application, the program implementing our proposed method is written in R, which can be requested from authors. Although the proposed method has higher statistical power in detecting survival time loci than interval mapping based on Cox model, QTL effects are underestimated mainly due to the Cox's partial likelihood method. To further improve the estimation of QTL effects, parametric proportional hazard model for ﬁtting single QTL [5,31,32] is recommended to simultaneously analyze few QTLs. For more than ﬁve detected QTLs, however, the nonlinear mixture model is too complicated to be efﬁciently estimated. Our method is described in BC population, which is suitable for double haploid (DH) and recombinant inbreeding lines (RILs). But it can be easily extended to analyzing multiple survival time loci in other genetically designed population, such as intercross and outbred populations. Furthermore, the extension of main effect model to epistatic model is under development, where selecting important epistatic QTLs from huge combinations between all loci is

[1] J.D. Kalbﬂeisch, R.L. Prentice, Relative risk (Cox) regression models, The Statistical Analysis of Failure Time Data2nd ed., J Wiley, Hoboken, NJ, 2002. 95–147. [2] K.W. Broman, Mapping quantitative trait loci in the case of a spike in the phenotype distribution, Genetics 163 (2003) 1169–1175. [3] R.C. Symons, M.J. Daly, J. Fridlyand, T.P. Speed, W.D. Cook, S. Gerondakis, A.W. Harris, S.J. Foote, Multiple genetic loci modify susceptibility to plasmacytoma-related morbidity in E(mu)-v-abl transgenic mice, Proc. Natl. Acad. Sci. U. S. A. 99 (2002) 11299–11304. [4] S.R. Lipsitz, J.G. Ibrahim, Estimating equations with incomplete categorical covariates in the Cox model, Biometrics (1998) 1002–1013. [5] G. Diao, D. Lin, F. Zou, Mapping quantitative trait loci with censored observations, Genetics 168 (2004) 1689–1698. [6] G. Diao, D. Lin, Semiparametric methods for mapping quantitative trait loci with censored data, Biometrics 61 (2005) 789–798. [7] Y. Fang, A note on QTL detecting for censored traits, Genet. Sel. Evol. 38 (2006) 221. [8] C. Moreno, J. Elsen, P. Le Roy, V. Ducrocq, Interval mapping methods for detecting QTL affecting survival and time-to-event phenotypes, Genet. Res. 85 (2005) 139–150. [9] X. Zhou, L. Yan, D.R. Prows, R. Yang, Generalized F accelerated failure time model for mapping survival trait loci, Genomics 97 (2011) 379–385. [10] Z. Piao, X. Zhou, L. Yan, Y. Guo, R. Yang, Z. Luo, D.R. Prows, Statistical optimization of parametric accelerated failure time model for mapping survival trait loci, Theor. Appl. Genet. 122 (2011) 855–863. [11] J.-Y. Cheng, S.-J. Tzeng, Parametric and semiparametric methods for mapping quantitative trait loci, Comput. Stat. Data Anal. 53 (2009) 1843–1849. [12] H. Wang, Y.-M. Zhang, X. Li, G.L. Masinde, S. Mohan, D.J. Baylink, S. Xu, Bayesian shrinkage estimation of quantitative trait loci parameters, Genetics 170 (2005) 465–480. [13] S. Xu, Derivation of the shrinkage estimates of quantitative trait locus effects, Genetics 177 (2007) 1255–1258. [14] S. Xu, Z. Hu, Methods of plant breeding in the genome era, Genet. Res. 92 (2011) 423. [15] M. Fang, D. Jiang, D. Li, R. Yang, W. Fu, L. Pu, H. Gao, G. Wang, L. Yu, Improved LASSO priors for shrinkage quantitative trait loci mapping, Theor. Appl. Genet. (2012) 1–10. [16] R. Yang, S. Xu, Bayesian shrinkage analysis of quantitative trait loci for dynamic traits, Genetics 176 (2007) 1169–1185. [17] X. Cai, H. Anhui, X. Shizhong, Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping, BMC Bioinforma. 12 (2011). [18] M. Yuan, Y. Lin, Efﬁcient empirical Bayes variable selection and estimation in linear models, J. Am. Stat. Assoc. 100 (2005) 1215–1225. [19] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B Methodol. (1996) 267–288. [20] Y. Liu, T. Yang, H. Li, R. Yang, Iteratively reweighted LASSO for mapping multiple quantitative trait loci, Brief. Bioinform. 15 (2014) 20–29. [21] J. Friedman, T. Hastie, R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw. 33 (2010) 1. [22] N. Simon, J. Friedman, T. Hastie, R. Tibshirani, Regularization paths for Cox's proportional hazards model via coordinate descent, J. Stat. Softw. 39 (2011) 1–13. [23] C.S. Haley, S.A. Knott, A simple regression method for mapping quantitative trait loci in line crosses using ﬂanking markers, Heredity (Edinb) 69 (1992) 315–324. [24] D.R. Cox, Regression models and life-tables, J. R. Stat. Soc. Ser. B Methodol. (1972) 187–220. [25] M. Yuan, Y. Lin, Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B Stat. Methodol. 68 (2007) 49–67. [26] G.A. Churchill, R.W. Doerge, Empirical threshold values for quantitative trait mapping, Genetics 138 (1994) 963–971. [27] R. Bender, T. Augustin, M. Blettner, Generating survival times to simulate Cox proportional hazards models, Stat. Med. 24 (2005) 1713–1723. [28] D.D.R. Cox, D.O. Oakes, Analysis of Survival Data, Chapman & Hall, 1984. [29] D. Oakes, The asymptotic information in censored survival data, Biometrika 64 (1977) 441–448. [30] N. Yi, S. Banerjee, Hierarchical generalized linear models for multiple quantitative trait locus mapping, Genetics 181 (2009) 1101–1113. [31] H. Gao, Y. Liu, T. Zhang, R. Yang, D.R. Prows, Parametric proportional hazards model for mapping genomic imprinting of survival traits, J. Appl. Genet. 54 (2013) 79–88. [32] Z. Luo, Z. Piao, X. Zhou, T. Yang, R. Yang, An optimal parametric proportional hazards model for mapping heading time loci in rice, Euphytica (2012) 1–8.

Cox regression model for dissecting genetic architecture of survival time

Cox regression model for dissecting genetic architecture of survival time

Recommend Documents