ACTA AGRONOMICA SINICA Volume 35, Issue 9, September 2009 Online English edition of the Chinese language journal Cite this article as: Acta Agron Sin, 2009, 35(9): 1569–1575.
RESEARCH PAPER
Bayesian Statistics-Based Multiple Interval Mapping of QTL Controlling Endosperm Traits in Cereals
WANG Ya-Min1,2, KONG Wen-Qian1, TANG Zai-Xiang1, LU Xin1, and XU Chen-Wu1,*
1 Jiangsu Provincial Key Laboratory of Crop Genetics and Physiology/ Key Laboratory of Plant Functional Genomics of Ministry of Education, Yangzhou University, Yangzhou 225009, China
2 Basis Course of Lianyungang Technical College, Lianyungang 222006, China
Abstract: Bayesian methods have been widely used in many fields of modern science. On the basis of Bayesian statistics and the quantitative genetic model of triploid endosperm traits, a Bayesian multiple interval mapping method was proposed for mapping QTLs underlying endosperm traits. Endosperm quantitative trait loci (QTLs) were analyzed using the DNA molecular marker genotypes of each plant in an F2 segregating population together with single-endosperm observations of a few endosperms from each plant. After the multiple-QTL model was constructed, Bayesian estimates of the positions and effects of multiple QTLs were obtained by a Markov chain Monte Carlo (MCMC) algorithm implemented via Gibbs and Metropolis-Hastings sampling. The statistical procedure was validated through chromosome-level simulation. The results showed that the proposed Bayesian method estimates the positions and effects of multiple QTLs and distinguishes the 2 dominant effects of endosperm QTLs.
Keywords: Bayesian statistics, endosperm traits, Markov chain Monte Carlo, quantitative trait loci
Genetic improvement of quality traits in cereal crops has been one of the major breeding goals worldwide since the 1990s. Endosperm quality is a quantitative trait expressed in a triploid tissue, and its inheritance follows triploid genetic models; understanding these models is a precondition for quality improvement in crops. Mo [1, 2] proposed a quantitative genetic model for endosperm traits. Based on this model, Xu et al. [3] developed a quantitative trait locus (QTL) mapping method for endosperm traits. Thereafter, a series of mapping methods for endosperm QTLs derived from classical mathematical statistics have been proposed, for example, maximum likelihood estimation implemented via the expectation-maximization (EM) algorithm under one-stage and two-stage designs [4, 5] and the iteratively reweighted least squares method [6]. However, these methods were all developed for a single-QTL model, which may give biased QTL estimates when several QTLs lie in the same linkage group and interact with each other. Kao [7] therefore developed a multiple interval mapping method to solve this problem. Since crop seeds grow on the maternal plants, the genetic expression of endosperm traits is also influenced by the maternal parent. To analyze the maternal effect simultaneously with the effect of endosperm QTLs, Cui et al. [8] proposed a genetic model that assumes the maternal and endosperm genetic systems jointly control the expression of endosperm traits. Hu and Xu [9] argued that the maternal and endosperm QTLs should be attributed to the same genetic system and developed another method for endosperm QTL mapping. Notably, these methods cannot distinguish the 2 dominant effects of endosperm QTLs when plant means are used as the data. Wen and Wu [10] developed a QTL interval mapping method for endosperm traits under the random hybridization design. Wang et al. [11] recently used the North Carolina Design III (NCIII) and triple test cross (TTC) designs of classical quantitative genetics to estimate the 2 dominant effects precisely. In addition, to account for epistatic QTLs of endosperm traits, He and Zhang [12] proposed an epistatic QTL mapping method based on penalized likelihood and the random hybridization design. In recent years, with the rapid development of high
Received: 18 December 2008; Accepted: 25 April 2009. * Corresponding author. E-mail:
[email protected] Copyright © 2009, Crop Science Society of China and Institute of Crop Sciences, Chinese Academy of Agricultural Sciences. Published by Elsevier BV. All rights reserved. Chinese edition available online at http://www.chinacrops.org/zwxb/ DOI: 10.1016/S1875-2780(08)60100-5
WANG Ya-Min et al. / Acta Agronomica Sinica, 2009, 35(9): 1569–1575
performance computers and Markov chain Monte Carlo (MCMC) algorithms, Bayesian statistics, which differs from classical mathematical statistics, has been applied widely in many fields of scientific research. In statistical genomics, many Bayesian methods [13–16] have been developed to map QTLs controlling diploid traits. However, Bayesian methods have not been available for QTL mapping of endosperm traits. In this article, building on the Bayesian single-QTL model [17], a multiple-QTL model was developed for multiple interval mapping of endosperm traits.
1 Theory and method
1.1 Statistical genetic models of endosperm traits
Assume that $p$ QTLs located in $p$ consecutive marker intervals control 1 endosperm trait. Because each QTL in a segregating population derived from 2 pure lines has only 2 alleles, there are 4 possible genotypes at the $m$th endosperm QTL, namely $Q_mQ_mQ_m$, $Q_mQ_mq_m$, $Q_mq_mq_m$, and $q_mq_mq_m$ ($m = 1, 2, \ldots, p$). Their genotypic values are defined as $3a_m/2$, $a_m/2 + d_{1m}$, $-a_m/2 + d_{2m}$, and $-3a_m/2$ [2], where $a_m$ is the additive effect, and $d_{1m}$ and $d_{2m}$ are the first and the second dominant effects, respectively. Let $Y_{ij}$ be the phenotypic value of the $j$th ($j = 1, 2, \ldots, n_i$) endosperm on the $i$th plant ($i = 1, 2, \ldots, k$) in an F2 population, which can be described by the following statistical model:

$$Y_{ij} = \mu + \sum_{m=1}^{p} \left( a_m X_{1ijm} + d_{1m} X_{2ijm} + d_{2m} X_{3ijm} \right) + e_{ij} \qquad (1)$$

where $\mu$ is the overall mean of the population; the $X$s are indicator variables defined as $X_{1ijm} = 3/2$ and $X_{2ijm} = X_{3ijm} = 0$ for genotype $Q_mQ_mQ_m$; $X_{1ijm} = 1/2$, $X_{2ijm} = 1$, and $X_{3ijm} = 0$ for genotype $Q_mQ_mq_m$; $X_{1ijm} = -1/2$, $X_{2ijm} = 0$, and $X_{3ijm} = 1$ for genotype $Q_mq_mq_m$; and $X_{1ijm} = -3/2$ and $X_{2ijm} = X_{3ijm} = 0$ for genotype $q_mq_mq_m$; $e_{ij}$ is the residual error, distributed as $N(0, \sigma_0^2)$, where $\sigma_0^2$ is the residual variance. Denote

$$\mathbf{X} = \begin{bmatrix} \mathbf{X}_{11} & \mathbf{X}_{12} & \cdots & \mathbf{X}_{1p} \\ \mathbf{X}_{21} & \mathbf{X}_{22} & \cdots & \mathbf{X}_{2p} \\ \vdots & \vdots & & \vdots \\ \mathbf{X}_{k1} & \mathbf{X}_{k2} & \cdots & \mathbf{X}_{kp} \end{bmatrix}, \quad \text{where} \quad \mathbf{X}_{im} = \begin{bmatrix} X_{1i1m} & X_{2i1m} & X_{3i1m} \\ X_{1i2m} & X_{2i2m} & X_{3i2m} \\ \vdots & \vdots & \vdots \\ X_{1in_im} & X_{2in_im} & X_{3in_im} \end{bmatrix}_{n_i \times 3};$$

$$\mathbf{Y} = \left[ Y_{11}\ Y_{12}\ \cdots\ Y_{1n_1}\ \ Y_{21}\ Y_{22}\ \cdots\ Y_{2n_2}\ \cdots\ Y_{k1}\ Y_{k2}\ \cdots\ Y_{kn_k} \right]^{T}_{1 \times n};$$

$$\mathbf{b} = \left[ a_1\ d_{11}\ d_{21}\ \ a_2\ d_{12}\ d_{22}\ \cdots\ a_p\ d_{1p}\ d_{2p} \right]^{T}_{1 \times 3p};$$

$$\mathbf{e} = \left[ e_{11}\ e_{12}\ \cdots\ e_{1n_1}\ \ e_{21}\ e_{22}\ \cdots\ e_{2n_2}\ \cdots\ e_{k1}\ e_{k2}\ \cdots\ e_{kn_k} \right]^{T}_{1 \times n};$$

$T$ denotes the transpose of the matrix; $n = \sum_{i=1}^{k} n_i$. Then, model (1) can be expressed in matrix notation as

$$\mathbf{Y} = \mu \mathbf{I} + \mathbf{X}\mathbf{b} + \mathbf{e} \qquad (2)$$

where $\mathbf{I}$ is a column vector of $n$ rows in which all elements are 1.

1.2 Conditional posterior distribution of parameters
Because the number of parameters ($\mu$, $\mathbf{b}$, and $\sigma_0^2$) to be estimated in model (2) is relatively large, the Bayesian shrinkage estimation method is adopted. Following the prior distributions defined by Wang et al. [13], $p(\mu) \propto 1$, $p(\sigma_0^2) \propto 1/\sigma_0^2$, and $p(b_l) = N(0, \sigma_l^2)$, where $b_l$ denotes the $l$th element ($l = 1, 2, \ldots, 3p$) of vector $\mathbf{b}$. In Bayesian shrinkage estimation, $\sigma_l^2$ is further assigned the prior distribution $p(\sigma_l^2) \propto 1/\sigma_l^2$, and we denote $\mathbf{v} = (\sigma_0^2\ \sigma_1^2\ \sigma_2^2\ \cdots\ \sigma_{3p}^2)$. However, $\mathbf{X}$ in model (2) is unknown and must be inferred from the position vector of the QTLs, $\boldsymbol{\lambda} = [\lambda_1\ \lambda_2\ \cdots\ \lambda_p]^{T}_{1 \times p}$, and the genotypic matrix $\mathbf{M}$ of the molecular markers. Therefore, all the unknown variables are collected in $\boldsymbol{\theta} = \{\mu, \mathbf{b}, \mathbf{v}, \boldsymbol{\lambda}, \mathbf{X}\}$. According to Bayes' rule, given the known $\mathbf{Y}$ and $\mathbf{M}$, the posterior distribution of $\boldsymbol{\theta}$ can be described as

$$p(\boldsymbol{\theta} \mid \mathbf{Y}, \mathbf{M}) = \frac{p(\mathbf{Y}, \mathbf{M}, \boldsymbol{\theta})}{p(\mathbf{Y}, \mathbf{M})} \propto p(\mathbf{Y}, \mathbf{M}, \boldsymbol{\theta}) = p(\mathbf{Y}, \mathbf{M} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}) \qquad (3)$$

where $p(\boldsymbol{\theta})$ is the joint prior distribution of the unknown parameters and $p(\mathbf{Y}, \mathbf{M} \mid \boldsymbol{\theta})$ is the likelihood function of the observed vector. Formula (3) is the core of Bayesian statistics and can be used to derive the conditional posterior distribution of each unknown, as follows:

1) $\mu \mid \cdots \sim N(\bar{\mu}, s_0^2)$

where

$$\bar{\mu} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left[ Y_{ij} - \sum_{m=1}^{p} \left( a_m X_{1ijm} + d_{1m} X_{2ijm} + d_{2m} X_{3ijm} \right) \right], \quad s_0^2 = \sigma_0^2 / n.$$
2) $b_l \mid \cdots \sim N(\bar{b}_l, s_l^2)$

where

$$\bar{b}_l = \left( \mathbf{X}_l^T \mathbf{X}_l + \sigma_0^2/\sigma_l^2 \right)^{-1} \left[ \mathbf{X}_l^T \left( \mathbf{Y} - \mu \mathbf{I} - \mathbf{X}_{-l} \mathbf{b}_{-l} \right) \right], \quad s_l^2 = \left( \mathbf{X}_l^T \mathbf{X}_l + \sigma_0^2/\sigma_l^2 \right)^{-1} \sigma_0^2.$$

Here, $\mathbf{X}_l$ is the $l$th column of the matrix $\mathbf{X}$, $\mathbf{X}_{-l}$ is the matrix obtained by excluding the $l$th column from $\mathbf{X}$, and $\mathbf{b}_{-l}$ is the vector obtained by excluding the $l$th row from $\mathbf{b}$.

3) $\sigma_0^2 \mid \cdots \sim \text{Inv-}\chi^2 \left[ n, (\mathbf{Y} - \mu \mathbf{I} - \mathbf{X}\mathbf{b})^T (\mathbf{Y} - \mu \mathbf{I} - \mathbf{X}\mathbf{b}) \right]$

4) $\sigma_l^2 \mid \cdots \sim \text{Inv-}\chi^2 (1, b_l^2)$

The '…' in the above 4 distributions denotes all elements of $\boldsymbol{\theta}$ except the parameter being updated. Each parameter can be updated according to its conditional posterior distribution.

5) Updating matrix X
Updating the matrix $\mathbf{X}$ is more involved. Let $M_{im}^{l}$ and $M_{im}^{r}$ be the genotypes of the left and right markers flanking the $m$th QTL, located at position $\lambda_m$, on the $i$th plant. The recombination fractions between the QTL and the left marker and between the QTL and the right marker are calculated with Haldane's map function. The prior probabilities of the 3 possible QTL genotypes of the plant are then obtained:

$$p_{1im} = p(Q_mQ_m \mid \lambda_m, M_{im}^{l}, M_{im}^{r}), \quad p_{2im} = p(Q_mq_m \mid \lambda_m, M_{im}^{l}, M_{im}^{r}), \quad p_{3im} = p(q_mq_m \mid \lambda_m, M_{im}^{l}, M_{im}^{r}).$$

Notably, when the information on the left or right marker is missing or incomplete, the method described by Jiang and Zeng [18] can be adopted to calculate the prior probabilities of the QTL genotypes. From these prior probabilities of the plant QTL genotypes, the prior probabilities of the 4 endosperm QTL genotypes for the $j$th seed of the plant can be obtained: $p_{1ijm} = p_{1im} + p_{2im}/4$ for $Q_mQ_mQ_m$, $p_{2ijm} = p_{3ijm} = p_{2im}/4$ for both $Q_mQ_mq_m$ and $Q_mq_mq_m$, and $p_{4ijm} = p_{3im} + p_{2im}/4$ for $q_mq_mq_m$. From $Y_{ij}$ and these prior probabilities, the posterior probabilities of the endosperm QTL genotypes are calculated using the following formula:

$$p^{*}_{hijm} = \frac{p_{hijm}\, f_{hijm}(Y_{ij} \mid G_h)}{\sum_{h=1}^{4} p_{hijm}\, f_{hijm}(Y_{ij} \mid G_h)},$$

where $G_h$ ($h = 1, 2, 3, 4$) represents the 4 possible endosperm QTL genotypes and $f_{hijm}(Y_{ij} \mid G_h)$ is the conditional probability density of $Y_{ij}$. The endosperm QTL genotype of the seed can be randomly sampled according to $p^{*}_{hijm}$. The endosperm QTL genotype of each of the $n_i$ seeds on the plant is imputed in the same way, which yields the matrix $\mathbf{X}_{im}$ and hence $\mathbf{X}$.

6) Updating QTL position λ
Denote the positions of the markers flanking the $m$th QTL by $\zeta_l$ and $\zeta_r$, and assume that $\lambda_m$ has the prior distribution $U(\zeta_l, \zeta_r)$, i.e., $p(\lambda_m) = 1/(\zeta_r - \zeta_l)$. Because the conditional posterior distribution $p(\lambda_m \mid \cdots)$ has no explicit form, $\lambda_m$ cannot be drawn by Gibbs sampling like the other parameters; the Metropolis-Hastings algorithm is used instead. First, a proposal position $\lambda_m^*$ is sampled from the uniform distribution $U[\lambda_m - c, \lambda_m + c]$, where $c$ is a predefined constant with a value of 1 or 2 cM. The conditional probability density of $\lambda_m^*$ given $\lambda_m$ is

$$q(\lambda_m^* \mid \lambda_m) = 1/[(\lambda_m + c) - (\lambda_m - c)] = 1/2c.$$

The reverse move from $\lambda_m^*$ back to $\lambda_m$ is proposed in the same way. Therefore, we have

$$q(\lambda_m \mid \lambda_m^*) = 1/[(\lambda_m^* + c) - (\lambda_m^* - c)] = 1/2c.$$

Consequently, $q(\lambda_m^* \mid \lambda_m) = q(\lambda_m \mid \lambda_m^*)$. The Metropolis-Hastings algorithm accepts the new position $\lambda_m^*$ with probability $\min(1, \alpha)$, where

$$\alpha = \frac{\prod_{i=1}^{k} \prod_{j=1}^{n_i} f(Y_{ij} \mid \lambda_m^*)}{\prod_{i=1}^{k} \prod_{j=1}^{n_i} f(Y_{ij} \mid \lambda_m)} \cdot \frac{p(\lambda_m^*)\, q(\lambda_m \mid \lambda_m^*)}{p(\lambda_m)\, q(\lambda_m^* \mid \lambda_m)} \qquad (4)$$

is called the probability of acceptance. $f(Y_{ij} \mid \lambda_m^*)$ and $f(Y_{ij} \mid \lambda_m)$ are the probability density functions of $Y_{ij}$ given $\lambda_m^*$ and $\lambda_m$, respectively. Since there are 4 possible genotypes for an endosperm, $Y_{ij}$ follows a mixture of 4 Gaussian distributions with mixing proportions equal to the posterior genotype probabilities. The new position $\lambda_m^*$ is always accepted when $\alpha \ge 1$ and is accepted with probability $\alpha$ when $\alpha < 1$. Once the new position is accepted, $\lambda_m$ is replaced with $\lambda_m^*$; if it is rejected, the original $\lambda_m$ remains unchanged. However, $q(\lambda_m^* \mid \lambda_m) \ne q(\lambda_m \mid \lambda_m^*)$ when a QTL position is close to the boundary of the interval. If $\lambda_m$ is close to the left marker, i.e., $\lambda_m - \zeta_l = d_m < c$, the new position $\lambda_m^*$ must be sampled from $U[\lambda_m - d_m, \lambda_m + c]$ so that it stays within the interval. Similarly, if $\lambda_m$ is close to the right marker, i.e., $\zeta_r - \lambda_m = d_m < c$, the new position must be sampled from $U[\lambda_m - c, \lambda_m + d_m]$. In either case, the proposal densities $q(\lambda_m^* \mid \lambda_m)$ and $q(\lambda_m \mid \lambda_m^*)$ should be modified. The general formulae of the densities after incorporating the modification are as follows:
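$$q(\lambda_m^* \mid \lambda_m) = \begin{cases} 1/[c + (\lambda_m - \zeta_l)] & \text{if } \lambda_m - \zeta_l < c \\ 1/[c + (\zeta_r - \lambda_m)] & \text{if } \zeta_r - \lambda_m < c \\ 1/2c & \text{otherwise} \end{cases};$$

$$q(\lambda_m \mid \lambda_m^*) = \begin{cases} 1/[c + (\lambda_m^* - \zeta_l)] & \text{if } \lambda_m^* - \zeta_l < c \\ 1/[c + (\zeta_r - \lambda_m^*)] & \text{if } \zeta_r - \lambda_m^* < c \\ 1/2c & \text{otherwise} \end{cases}.$$

Let $\delta_m$ and $\delta_m^*$ be the distances of $\lambda_m$ and $\lambda_m^*$ from the nearest boundary of the interval, respectively. The general formulae above can be simplified into $q(\lambda_m^* \mid \lambda_m) = 1/[c + \min(c, \delta_m)]$ and $q(\lambda_m \mid \lambda_m^*) = 1/[c + \min(c, \delta_m^*)]$. Because $p(\lambda_m)$ and $p(\lambda_m^*)$ are both prior probability densities of the same uniform distribution over the interval, $p(\lambda_m^*)/p(\lambda_m) = 1$. The proposal densities enter formula (4) as the ratio $q(\lambda_m \mid \lambda_m^*)/q(\lambda_m^* \mid \lambda_m) = [c + \min(c, \delta_m)]/[c + \min(c, \delta_m^*)]$. Hence, formula (4) can be changed into

$$\alpha = \frac{\prod_{i=1}^{k} \prod_{j=1}^{n_i} f(Y_{ij} \mid \lambda_m^*)}{\prod_{i=1}^{k} \prod_{j=1}^{n_i} f(Y_{ij} \mid \lambda_m)} \cdot \frac{c + \min(c, \delta_m)}{c + \min(c, \delta_m^*)}$$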
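The position update of step 6) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program. The names `proposal_window`, `log_accept_ratio`, and `mh_update` are hypothetical, and `loglik` is a placeholder for the logarithm of the mixture likelihood in formula (4), which is not implemented here. The proposal ratio is taken directly from formula (4) as q(λ|λ*)/q(λ*|λ). Each density is one over the width of the proposal window clipped to the marker interval, which equals c + min(c, δ) whenever the interval is longer than 2c.

```python
import math
import random


def proposal_window(lam, left, right, c):
    """Part of [lam - c, lam + c] that lies inside the marker interval [left, right]."""
    return max(left, lam - c), min(right, lam + c)


def log_accept_ratio(loglik_new, loglik_old, lam_old, lam_new, left, right, c):
    """log(alpha) from formula (4), assuming a uniform prior on the interval."""
    lo_old, hi_old = proposal_window(lam_old, left, right, c)
    lo_new, hi_new = proposal_window(lam_new, left, right, c)
    log_q_forward = -math.log(hi_old - lo_old)  # log q(lam* | lam)
    log_q_reverse = -math.log(hi_new - lo_new)  # log q(lam | lam*)
    # The prior ratio p(lam*) / p(lam) equals 1, so it drops out.
    return (loglik_new - loglik_old) + (log_q_reverse - log_q_forward)


def mh_update(lam, loglik, left, right, c, rng):
    """One Metropolis-Hastings update of a QTL position; returns the new state."""
    lo, hi = proposal_window(lam, left, right, c)
    lam_new = rng.uniform(lo, hi)
    log_alpha = log_accept_ratio(loglik(lam_new), loglik(lam),
                                 lam, lam_new, left, right, c)
    # Accept with probability min(1, alpha); otherwise keep the old position.
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return lam_new
    return lam


if __name__ == "__main__":
    # Toy log-likelihood peaked at 15 cM (an assumption made for this demo only).
    def toy_loglik(lam):
        return -0.5 * ((lam - 15.0) / 0.5) ** 2

    rng = random.Random(0)
    lam, chain = 11.0, []
    for _ in range(5000):
        lam = mh_update(lam, toy_loglik, 10.0, 20.0, 2.0, rng)
        chain.append(lam)
    print(sum(chain[1000:]) / len(chain[1000:]))  # mean of the retained positions
```

Working on the log scale avoids underflow when the likelihood is a product over several thousand endosperm observations, and clipping the window guarantees that every proposal stays between the flanking markers.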
1.3 MCMC processing
Except for the QTL position $\boldsymbol{\lambda}$, which must be updated with the Metropolis-Hastings algorithm above, all the other parameters have conditional posterior distributions of known form and can be generated by Gibbs sampling. Therefore, the MCMC sampling procedure can be summarized as follows:
1) Give initial values $\mu^{(0)}$, $\mathbf{b}^{(0)}$, $\sigma_0^{2(0)}$, $\mathbf{v}^{(0)}$, $\mathbf{X}^{(0)}$, $\lambda_1^{(0)}, \ldots, \lambda_p^{(0)}$ to the unknowns;
2) Sample a random number from $N(\bar{\mu}, s_0^2)$ as $\mu^{(1)}$, and update $\mu^{(0)}$;
3) Sample a random number from $N(\bar{b}_l, s_l^2)$ as $b_l^{(1)}$, and update $b_l^{(0)}$;
4) Sample a random number from $\text{Inv-}\chi^2 [n, (\mathbf{Y} - \mu \mathbf{I} - \mathbf{X}\mathbf{b})^T (\mathbf{Y} - \mu \mathbf{I} - \mathbf{X}\mathbf{b})]$ as $\sigma_0^{2(1)}$, and update the residual variance $\sigma_0^{2(0)}$;
5) Sample a random number from $\text{Inv-}\chi^2 (1, b_l^2)$ as $\sigma_l^{2(1)}$, and update $\sigma_l^{2(0)}$;
6) Update the QTL position $\boldsymbol{\lambda}$;
7) Update the indicator variable $\mathbf{X}^{(0)}$ of QTL genotypes;
8) Repeat steps 2 to 7 $t$ times.
From the posterior samples collected over the $t$ sweeps, the samples drawn before convergence should be discarded. The remaining samples are saved at a fixed interval of sweeps to reduce the serial correlation. The mean or mode of the posterior sample is taken as the Bayesian estimate of the corresponding parameter, according to the distributional characteristics of the sample.
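The Gibbs part of this procedure (steps 2 to 5) and the genotype draw of step 7 can be sketched as follows. This is an illustrative sketch under several assumptions rather than the authors' implementation. Inv-χ²(ν, S) is read as S divided by a chi-square variate with ν degrees of freedom. The genotype matrix X is held fixed in `gibbs_sweep`, so steps 6 and 7 are not part of the sweep. All function names are hypothetical. The demo simulates a single QTL with known genotypes, using the true effects of qtl1 and an assumed residual standard deviation of 1.

```python
import math
import random


def chi2(df, rng):
    """Chi-square variate with df degrees of freedom, i.e. Gamma(df / 2, scale 2)."""
    return rng.gammavariate(df / 2.0, 2.0)


def gibbs_sweep(Y, X, mu, b, s0, sl, rng):
    """Steps 2-5 of the procedure for fixed X; b and sl are updated in place."""
    n, q = len(Y), len(b)
    fitted = [sum(X[i][l] * b[l] for l in range(q)) for i in range(n)]
    # Step 2: mu ~ N(mean of Y - Xb, s0 / n)
    mu = rng.gauss(sum(Y[i] - fitted[i] for i in range(n)) / n, math.sqrt(s0 / n))
    # Step 3: each b_l ~ N(b_bar_l, s_l^2), conditional on all other effects
    for l in range(q):
        xtx = sum(X[i][l] ** 2 for i in range(n))
        # X_l^T (Y - mu I - X_{-l} b_{-l}): add the l-th term back to the residual
        xtr = sum(X[i][l] * (Y[i] - mu - fitted[i] + X[i][l] * b[l]) for i in range(n))
        prec = xtx + s0 / sl[l]
        new_b = rng.gauss(xtr / prec, math.sqrt(s0 / prec))
        for i in range(n):
            fitted[i] += X[i][l] * (new_b - b[l])
        b[l] = new_b
    # Step 4: residual variance drawn as SS / chi2_n
    ss = sum((Y[i] - mu - fitted[i]) ** 2 for i in range(n))
    s0 = ss / chi2(n, rng)
    # Step 5: sigma_l^2 drawn as b_l^2 / chi2_1 (tiny floor added as a numerical guard)
    for l in range(q):
        sl[l] = max(b[l] ** 2 / chi2(1, rng), 1e-12)
    return mu, s0


def sample_genotype(y, priors, offset, a, d1, d2, s0, rng):
    """Step 7 for one seed: draw index h of (QQQ, QQq, Qqq, qqq) from p*_h."""
    values = [1.5 * a, 0.5 * a + d1, -0.5 * a + d2, -1.5 * a]
    logs = [-0.5 * (y - offset - g) ** 2 / s0 for g in values]
    top = max(v for p, v in zip(priors, logs) if p > 0)  # guards against underflow
    weights = [p * math.exp(v - top) for p, v in zip(priors, logs)]
    u, acc = rng.random() * sum(weights), 0.0
    for h, w in enumerate(weights):
        acc += w
        if u < acc:
            return h
    return len(weights) - 1


def run_demo(seed=1, n=400, sweeps=300, burn_in=100):
    """Simulate one QTL with known genotypes and average the retained draws."""
    rng = random.Random(seed)
    coding = [(1.5, 0, 0), (0.5, 1, 0), (-0.5, 0, 1), (-1.5, 0, 0)]  # model (1)
    # Expected F3 endosperm genotype frequencies on F2 plants are 3:1:1:3.
    X = [list(row) for row in rng.choices(coding, weights=[3, 1, 1, 3], k=n)]
    truth = [1.0, -1.0, -2.0]  # a, d1, d2
    Y = [20.0 + sum(x * t for x, t in zip(row, truth)) + rng.gauss(0, 1) for row in X]
    mu, b, s0, sl = 0.0, [0.0] * 3, 1.0, [1.0] * 3
    kept = []
    for t in range(sweeps):
        mu, s0 = gibbs_sweep(Y, X, mu, b, s0, sl, rng)
        if t >= burn_in:
            kept.append([mu] + b + [s0])
    return [sum(col) / len(kept) for col in zip(*kept)]  # mu, a, d1, d2, s0


if __name__ == "__main__":
    print(run_demo())
```

In a full sampler, `offset` in `sample_genotype` would carry the population mean plus the current effects of all other QTLs, and `priors` would be the 4 endosperm genotype probabilities derived from the flanking markers of the plant.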
2 Simulation
2.1 Simulation settings
The F2 population consisted of 200 plants, each with 30 endosperm observations. A single chromosome of length 100 cM with 11 evenly spaced markers was simulated. Three QTLs controlling an endosperm trait were placed at 15, 55, and 95 cM. The population mean was set at 20. The heritability and effects of each QTL are given in Table 1. The total genetic variance $\sigma_G^2$ is the sum of the genetic variances $\sigma_g^2$ of the individual QTLs, where

$$\sigma_g^2 = 5a^2/4 + (ad_1 - ad_2)/4 + (3d_1^2 + 3d_2^2 - 2d_1d_2)/16.$$

Given the heritability $H^2$ and $\sigma_G^2$, the residual variance can be calculated by

$$\sigma_E^2 = \sigma_G^2 (1 - H^2)/H^2.$$

The simulation was repeated 100 times. The statistical power for each QTL was calculated from the number of replicates in which the QTL was detected. The accuracy and precision of the estimates of QTL location and effects were measured by the mean and the standard deviation of the estimates, respectively. The positions and effects of the QTLs for each sample were estimated in two steps. The first step was to collect the posterior sample. The proposed MCMC sampler was run for 20,000 sweeps, and the first 2000 sweeps were discarded as the burn-in period (convergence was confirmed). The chain was thinned by saving 1 observation in every 20 sweeps to reduce serial correlation, so that 900 samples were collected for the post-MCMC analysis. The second step was to estimate the QTL positions and effects. A histogram of the sampled QTL positions was drawn first, and the mean QTL position $\hat{\lambda}$ was then calculated. The additive and dominant effects of the QTL were estimated by the means of the corresponding samples collected from the small interval containing $\hat{\lambda}$.
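The variance formulae and the thinning arithmetic of these settings can be checked with a short Python sketch. The function names are hypothetical, and the total heritability of 0.30 is an assumption inferred from the three per-QTL heritabilities in Table 1, which sum to 30%; the text does not state it directly.

```python
def genetic_variance(a, d1, d2):
    """Single-QTL genetic variance of an endosperm trait (formula for sigma_g^2)."""
    return 5 * a**2 / 4 + (a * d1 - a * d2) / 4 + (3 * d1**2 + 3 * d2**2 - 2 * d1 * d2) / 16


def residual_variance(total_genetic_variance, heritability):
    """sigma_E^2 = sigma_G^2 (1 - H^2) / H^2."""
    return total_genetic_variance * (1 - heritability) / heritability


def retained_samples(sweeps, burn_in, thin):
    """Number of draws kept after discarding the burn-in and thinning the chain."""
    return (sweeps - burn_in) // thin


if __name__ == "__main__":
    # True (a, d1, d2) of the three simulated QTLs, taken from Table 1.
    effects = [(1.0, -1.0, -2.0), (1.0, 1.0, 2.0), (0.8, 0.0, 0.0)]
    per_qtl = [genetic_variance(*e) for e in effects]
    sigma_G2 = sum(per_qtl)
    H2 = 0.30  # assumed total heritability (see the note above)
    sigma_E2 = residual_variance(sigma_G2, H2)
    for v in per_qtl:
        print(round(100 * v / (sigma_G2 + sigma_E2), 2))  # per-QTL heritability (%)
    print(retained_samples(20000, 2000, 20))  # 900
```

Under the assumed total heritability, the per-QTL heritabilities come out close to the values listed in Table 1, and the thinning arithmetic reproduces the 900 retained samples.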
2.2 Simulation result
The simulated chromosome contains 10 marker intervals, so there could be up to 10 QTLs, and 900 positions were collected for each putative QTL. The histogram of QTL location was plotted in 1 cM increments. The frequency distribution of the 10 possible QTL locations in a random replicate is shown in Fig. 1-A. There are 3 peaks in the putative QTL intervals, suggesting that all the QTLs were detected. Since every marker interval is 10 cM long and can be divided into 10 small intervals of 1 cM, the average frequency falling into each small interval is only 90. To locate the target QTLs on the chromosome more clearly, a weighted frequency distribution of QTL location was used, following the method described by Wang et al. [13] and Xu and Yi [19]. First, the means of the additive effect and the 2 dominant effects collected from each small interval were calculated. Second, $\sigma_g^2$ was computed from these means with the formula for the genetic variance of a single QTL. Finally, the weighted frequency distribution of QTL location was drawn with $\sigma_g^2$ as the weight. Fig. 1-B shows that the weighted frequency distribution clearly displays the existence of the QTLs and their locations.

Fig. 1 Posterior frequency distribution of QTL position

According to the analysis above, the means and standard deviations of the 3 QTL position estimates are 15.98±0.07, 5.01±0.09, and 1.00±0.09, respectively. The estimates of the corresponding additive effects are 1.00±0.03, 0.87±0.02, and 0.79±0.01, respectively. The estimates of the first dominant effect are −0.53±0.47, 0.48±0.45, and −0.01±0.08, and the estimates of the second dominant effect are −0.95±0.57, 2.25±0.43, and 0.02±0.09, respectively. The QTL position and effect estimates for the other samples were obtained in the same way (Table 1). The QTL locations and additive effects can thus be estimated accurately, whereas the estimates of the 2 dominant effects are slightly biased.
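The weighted frequency distribution used in this section can be sketched in Python as follows. The function `weighted_frequency` is hypothetical, and it assumes that the weight is applied by multiplying the count in each 1 cM bin by the $\sigma_g^2$ computed from the mean effects in that bin; the text states only that $\sigma_g^2$ serves as the weight.

```python
from collections import defaultdict


def genetic_variance(a, d1, d2):
    """Single-QTL genetic variance of an endosperm trait (formula for sigma_g^2)."""
    return 5 * a**2 / 4 + (a * d1 - a * d2) / 4 + (3 * d1**2 + 3 * d2**2 - 2 * d1 * d2) / 16


def weighted_frequency(samples, bin_width=1.0):
    """Map each bin's left edge (cM) to count * sigma_g^2 of the bin-mean effects.

    samples is an iterable of (position, a, d1, d2) posterior draws.
    """
    bins = defaultdict(list)
    for position, a, d1, d2 in samples:
        bins[int(position // bin_width)].append((a, d1, d2))
    profile = {}
    for index in sorted(bins):
        draws = bins[index]
        count = len(draws)
        # Mean additive and dominant effects of the draws falling into this bin.
        mean_a, mean_d1, mean_d2 = (sum(col) / count for col in zip(*draws))
        profile[index * bin_width] = count * genetic_variance(mean_a, mean_d1, mean_d2)
    return profile


if __name__ == "__main__":
    draws = [(14.2, 1.0, 0.0, 0.0), (14.7, 1.0, 0.0, 0.0), (33.5, 0.0, 0.0, 0.0)]
    print(weighted_frequency(draws))
```

Bins whose sampled effects average to zero receive zero weight however often they are visited, which is why the weighted profile separates intervals that carry a QTL from empty intervals more sharply than the raw histogram.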
3 Discussion
Genetic research on triploid endosperm traits has been an active topic in the field of crop quality over the past decade. Compared with conventional diploid traits, endosperm traits have a more complicated genetic composition. It is therefore inappropriate to dissect endosperm traits with statistical methods designed for diploid traits, such as interval mapping and composite interval mapping. Xu et al. [3] pioneered a regression method for mapping endosperm QTLs using the means of flanking marker genotypes. Afterwards, a series of models and methods were proposed for analyzing endosperm QTLs, including the maximum likelihood method [4, 20, 21] and the iteratively reweighted least squares method [6]. These methods were developed from classical mathematical statistics. In recent years, with the rapid development of modern statistics, Bayesian statistics has become one of the most important theories and methods in the QTL mapping community because of its distinctive characteristics and analytical approach. In many complicated situations, Bayesian methods can solve the problem more directly than traditional mathematical statistics and can incorporate prior information effectively. However, they could not be applied for a long time because of their intensive computation. With the development of high-performance computers, Bayesian methods have been applied widely in many fields. In quantitative genetics, we first proposed an interval mapping method for endosperm traits by integrating Bayesian statistics with the quantitative genetic model of triploid traits [17]. That method analyzed endosperm QTLs using the DNA molecular marker genotypes of each plant in a segregating population together with single-endosperm observations of a few endosperms from each plant. However, it was based on a single-QTL model and could not handle more than one QTL. In this paper, it was therefore extended to a multiple-QTL model of endosperm traits.

Table 1 Means and standard deviations of three simulated QTLs

QTL    Heritability  Statistical  QTL position (cM)   Additive effect    First dominant effect   Second dominant effect
       (%)           power (%)    True  Estimated     True  Estimated    True  Estimated         True  Estimated
qtl1   14.04         100          15    14.81±0.27    1.0   0.90±0.02    −1    −1.09±0.37        −2    −0.85±0.29
qtl2   10.82         100          55    54.91±0.32    1.0   0.85±0.03    1     0.48±0.08         2     1.39±0.33
qtl3   5.14          100          95    95.09±0.52    0.8   0.79±0.02    0     0.01±0.08         0     −0.06±0.08

Population mean: true 20, estimated 20.32±0.16
The simulation results showed that the Bayesian method was highly effective in detecting QTLs and estimating their parameters. For example, even when the QTL heritability was only about 5%, the statistical power reached 100% in the simulated experiment. However, the 2 dominant effects were estimated less accurately than the QTL positions and additive effects. The possible reasons were analyzed using the conditional posterior samples of the relevant parameters. From the MCMC chains of QTL position (Fig. 2), population mean (Fig. 3), and QTL effects (Fig. 4), all the posterior samples were clearly in a stationary distribution except for those of the 2 dominant effects, which deviated from the true values. This result agrees with those of Wu et al. [5], Kao [7], and Wen and Wu [22]. The phenomenon is probably associated with the population used in the simulation experiment. Because the simulation used the marker genotypes of F2 plants to infer the F3 endosperm QTL genotypes, the genotype information may be incomplete in some cases. In addition, even if the additive effect equals the dominant effects in size, the proportion of genetic variation in endosperm traits attributable to the dominant effects may be much lower than that attributable to the additive effect. Therefore, the dominant effects may be estimated with additional loss of accuracy and precision.

Fig. 2 Chain of posterior sample of three QTL positions
Fig. 3 Chain of posterior sample of population mean
Fig. 4 Chain of posterior sample of three QTL effects. Black, red, and blue lines denote a, d1, and d2, respectively.

To solve the problem, Wu et al. [5] and Kao [7] proposed the so-called two-stage design, in which marker information is obtained from both the maternal plants and the derived embryos. For inferring the conditional probabilities of endosperm QTL genotypes, the two-stage design is clearly more informative than the one-stage design, which uses only the maternal marker genotypes. This advantage comes at a higher cost because more molecular marker assays are needed. Wen and Wu [10] proposed using a random mating population to obtain unbiased estimates of the first and the second dominant effects of endosperm traits. Wang et al. [11] recently developed a QTL interval mapping method based on the NCIII and TTC designs, which are expected to be more powerful. In most experiments of Wang et al., even if the QTL heritability is as low as 5%, the probability of detecting the QTL may reach 100%. In addition, the NCIII and TTC designs can distinguish all the genetic effects of endosperm QTLs, in particular the 2 dominant effects. An option to “opt-out” of maternal effects has recently been suggested by Wen and Wu [23], which uses F2 or BC1 endosperm phenotypes and the molecular marker genotypes of each seed embryo. With these well-devised designs, the authors showed the possibility of estimating the direct effects of endosperm QTLs with minimal or no maternal influence.

In fact, the accurate estimation of the 2 dominant effects relies on whether the 2 heterozygous genotypes QQq and Qqq can be distinguished effectively. If the 2 possible heterozygous endosperm genotypes can be distinguished by some special design or by molecular assay, the dominant effects can be estimated accurately regardless of their size. This requires further study in the future.

In addition, the heavy computational load limits the application of the Bayesian methodology. A single run of the program took about 3.2 h on a DELL computer with a 3.0 GHz CPU and 1.0 GB of memory under the simulated experimental setting. Longer computation is expected to be needed in real data analysis. Therefore, optimizing the approach and improving the computing speed should also be addressed in the future.
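As a rough check on the variance argument made earlier in this section, take the single-QTL genetic variance formula from Section 2.1 and, purely for illustration, set $a = d_1 = d_2 = 1$ (assumed values, not ones used in the simulation):

$$\sigma_g^2 = \frac{5a^2}{4} + \frac{ad_1 - ad_2}{4} + \frac{3d_1^2 + 3d_2^2 - 2d_1d_2}{16} = \frac{5}{4} + 0 + \frac{4}{16} = 1.5$$

Under this assumed setting, the term in $a$ alone contributes 1.25, whereas the terms in $d_1$ and $d_2$ contribute 0.25, about one sixth of the total. This is consistent with the dominant effects being harder to estimate than an additive effect of the same size.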
4 Conclusions
Accurate mapping of endosperm QTLs is an essential foundation for the genetic improvement of crop quality traits. This paper introduced Bayesian statistics-based multiple interval mapping of QTLs underlying endosperm traits. The simulation study showed that, given an appropriate sample size, the proposed method can accurately estimate the QTL positions and distinguish the 2 dominant effects of endosperm QTLs.
Acknowledgments This study was financially supported by the National Key Basic Research Program of China (2006CB101700) and the Program for New Century Excellent Talents in University of Ministry of Education, China (NCET2005-05-0502).
References
[1] Mo H D. Genetic research of endosperm-quality traits in cereals. Sci Agric Sin, 1995, 28: 1–7 (in Chinese with English abstract)
[2] Mo H. Genetic expression for endosperm traits. In: Weir B S, Eisen E J, Goodman M M, Namkoong G, eds. Proceedings of the Second International Conference on Quantitative Genetics. Massachusetts: Sinauer Associates, Inc., 1988. pp 478–487
[3] Xu C W, He X H, Kuai J M, Gu S L. Mapping quantitative trait loci underlying endosperm traits in cereals. Sci Agric Sin, 2001, 34: 117–122 (in Chinese with English abstract)
[4] Wu R, Lou X Y, Ma C X, Wang X, Larkins B A, Casella G. An improved genetic model generates high-resolution mapping of QTL for protein quality in maize endosperm. Proc Natl Acad Sci USA, 2002, 99: 11281–11286
[5] Wu R, Ma C X, Gallo-Meagher M, Littell R C, Casella G. Statistical methods for dissecting triploid endosperm traits using molecular markers: An autogamous model. Genetics, 2002, 162: 875–892
[6] Xu C, He X, Xu S. Mapping quantitative trait loci underlying triploid endosperm traits. Heredity, 2003, 90: 228–235
[7] Kao C H. Multiple-interval mapping for quantitative trait loci controlling endosperm traits. Genetics, 2004, 167: 1987–2002
[8] Cui Y, Casella G, Wu R. Mapping quantitative trait loci interactions from the maternal and offspring genomes. Genetics, 2004, 167: 1017–1026
[9] Hu Z, Xu C. A new statistical method for mapping QTLs underlying endosperm traits. Chin Sci Bull, 2005, 50: 1470–1476
[10] Wen Y, Wu W. Methods for mapping QTLs underlying endosperm traits based on random hybridization design. Chin Sci Bull, 2006, 51: 1976–1981
[11] Wang X F, Song W, Yang Z F, Wang Y M, Tang Z X, Xu C W. Improved genetic mapping of endosperm traits using NCIII and TTC designs. J Hered, 2009, 100: 496–500
[12] He X H, Zhang Y M. Mapping epistatic quantitative trait loci underlying endosperm traits using all markers on the entire genome in a random hybridization design. Heredity, 2008, 101: 39–47
[13] Wang H, Zhang Y M, Li X, Masinde G L, Mohan S, Baylink D J, Xu S. Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics, 2005, 170: 465–480
[14] Yi N. A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics, 2004, 167: 967–975
[15] Xu S. Estimating polygenic effects using markers of the entire genome. Genetics, 2003, 163: 789–801
[16] Yi N, Yandell B S, Churchill G A, Allison D B, Eisen E J, Pomp D. Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics, 2005, 170: 1333–1344
[17] Wang Y M, Sun S S, Tang Z X, Hu Z Q, Xu C W. Bayesian method for mapping QTL controlling endosperm traits in cereals. J Yangzhou Univ (Agric Life Sci Edn), 2008, 29(3): 12–17 (in Chinese with English abstract)
[18] Jiang C, Zeng Z B. Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica, 1997, 101: 47–58
[19] Xu S, Yi N. Mixed model analysis of quantitative trait loci. Proc Natl Acad Sci USA, 2000, 97: 14542–14547
[20] Wang W, Hu Z Q, Sun C S, Xu C W. Single grain observation-based mapping of quantitative traits loci underlying endosperm traits. Acta Agron Sin, 2005, 31: 989–994 (in Chinese with English abstract)
[21] Xu C W, Wang W, Hu Z Q, Sun C S. Plant average-based maximum likelihood mapping of quantitative traits loci controlling endosperm traits. Acta Agron Sin, 2005, 31: 1271–1276 (in Chinese with English abstract)
[22] Wen Y, Wu W. Interval mapping of quantitative trait loci underlying triploid endosperm traits using F3 seeds. J Genet Genomics, 2007, 34: 429–436
[23] Wen Y, Wu W. Experimental designs and statistical methods for mapping quantitative trait loci underlying triploid endosperm traits without maternal genetic variation. J Hered, 2008, 99: 546–551