Journal of Theoretical Biology 432 (2017) 100–108
Contents lists available at ScienceDirect
Journal of Theoretical Biology journal homepage: www.elsevier.com/locate/jtbi
Joint parameter estimation in the QTL mapping of ordinal traits Xiaona Sheng a, Yihong Qiu b, Ying Zhou b,∗, Wensheng Zhu c,∗ a
School of Information Engineering, Harbin University, Harbin 150086, China School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China c School of Mathematics and Statistics,Northeast Normal University, Changchun 130024, China b
a r t i c l e
i n f o
Article history: Received 26 December 2016 Revised 9 July 2017 Accepted 5 August 2017 Available online 12 August 2017 Keywords: Cumulative logistic regression model EM algorithm Multiple-interval mapping Ordinal traits
a b s t r a c t With the rapid development of statistical genetics, the deep researches of ordinal traits have been gradually emphasized. The data of these traits bear relatively less information than those of continuous phenotypes, therefore it is more complex to map the quantitative trait loci (QTL) of ordinal traits. In this paper, the multiple-interval mapping method is considered in the genetic mapping of ordinal traits. By combining threshold model and statistical model, we build a cumulative logistic regression model to express the relationship between the ordinal data and the QTL genotypes. In order to make the interval mapping more straightforward, we treat the recombination rates as unknown parameters, and then simultaneously obtain the estimates of QTL positions, QTL effects and threshold parameters via the EM algorithm. We perform simulation experiments to investigate and compare the proposed method. We also present a real example to test the reasonableness of the considered model and estimate both model parameters and QTL parameters. Both results of simulations and example show that the method we proposed is reasonable and effective. © 2017 Elsevier Ltd. All rights reserved.
1. Introduction With the appearance of more and more research results in recent years, it was found that as the unit of genetic functions, genes are not completely distributed on chromosomes at random. Some genes that control a certain trait usually have similar structures and related functions, and they are distributed in a certain region of a chromosome in the form of gene cluster. Over the years, people have provided lots of statistical methods on QTL mapping analysis. Performing interval mapping of gene loci has become an important tool in the studies of statistical genetics (Haley and Knott, 1992; Jansen, 1993; Kao et al., 1999; Lander and Botstein, 1989; Zeng, 1994). However, most of these statistical methods are fit for continuous traits. In some studies of QTL mapping, there exists a special kind of traits which are ordinal. These traits include several ordinal classifications. For example, when studying psychological phenomenon, we usually measure attitude or preference according to several types (strongly disagree, disagree, neutral, support, and strongly support). Also, sometimes people are more willing to convert a continuous variable to an ordinal categorical variable, such as when analyzing drug use and health of cocaine users. Researchers
∗
Correspondence authors. E-mail addresses:
[email protected] (Y. Zhou),
[email protected] (W. Zhu).
http://dx.doi.org/10.1016/j.jtbi.2017.08.007 0022-5193/© 2017 Elsevier Ltd. All rights reserved.
can divide their mental depression into four categories (slight depression, low depression, moderate depression, and high depression) by psychology indicators. These response variables are usually coded by a sequence of integers, e.g., 1, 2, 3, 4, etc., and the categories have explicit sort from low to high, but the gap between neighboring categories is unknown. Thus, an ordinal trait is different from to a quantitative trait or a qualitative trait. It is usually divided into the classification of quantitative traits, but sometime it is also regarded as a qualitative trait to a certain extent because quantitative change can cause a qualitative one. On phenotypic scale, most ordinal traits do not follow normal distribution, and they usually present as binomial or multinomial states. So far, there have been some researches on QTL mapping of ordinal traits, because these researches usually have certain biological significance or economic value. In modern animal breeding, people pay more and more attention to ordinal traits. One reason is that many important economic traits belong to ordinal ones, moreover, many examples have shown that conducting researches on ordinal traits can meet many actual needs. Hackett and Weller (1995) proposed a method to perform genetic mapping of ordinal traits by generalized linear models. Rao and Xu (1998) applied generalized linear model to map ordinal traits in four-way cross populations. Within the framework of generalized linear model, a method based on threshold model of mapping QTL of resistance (ordinal) traits in livestock was simulated, and the efficiency of the method was compared with that of gen-
X. Sheng et al. / Journal of Theoretical Biology 432 (2017) 100–108
eral linear methods (Yin et al., 2005). Xu and Xu (2006) proposed a multivariate model to analyze ordinal traits with the EM algorithm to estimate parameters. Using a unified mixture generalized linear model, Chen and Liu (2009) proposed a multipleinterval mapping method in their hybridization experiments. Recently, Feng et al. (2013) applied an efficient hierarchical generalized linear mixed model to map QTL of ordinal traits in crop cultivars. Besides, Rao and Xia (20 0 0) used mixture threshold model in their mapping of ordinal traits. Spyrides-Cunha et al. (20 0 0) discussed the application of proportional odds model when dealing with ordinal data. Rao and Xia (2001) proposed a novel statistical model aiming at structural heterogeneity problem and promoted the research progress of ordinal traits. Based on a threshold model, Li et al. (2006) proposed a multiple-interval mapping method about ordinal trait. Yi et al. (2007) proposed an ordinal Probit model to map ordinal traits with epistasis. Liu et al. (2009) chose the maximum likelihood interval-mapping method to conduct linkage analysis for gene loci of ordinal traits. Based on the method of composite interval mapping, Jia and Liu (2011) identified the QTL for resistance to rice blast. Statistical methods about QTL detection and ordinal-trait mapping have been proposed and continuously improved, so that the efficiency and accuracy of QTL detection have been gradually improved. However, the detecting accuracy in mapping genetic loci of ordinal traits and the implementation of algorithm still need further research and exploration. In this study, we proposed a new strategy for mapping ordinal traits in the framework of multipleinterval mapping. Through simulation experiments and example analysis, we validate that the new method is qualified to map QTL of ordinal traits and has its advantages. 2. Theory and method In QTL mapping, a necessary step is to find an appropriate model to link trait values with QTL genotypes. For continuous traits, statistical models, such as regression model can be directly used in multiple-interval mapping (MIM). But for ordinal traits, these models are not suitable to use directly. Currently, threshold model has been discussed by more and more researchers. In this model, it is assumed that there is an underlying unobservable response variable for the observable ordinal trait, called liability, which may be continuous, and when it reaches a certain threshold value, the classification type can be determined. Thus, if we want to build a model to describe the relationship between ordinal data and QTL genotypes, firstly we should connect ordinal data with continuous liability via the threshold model, and then link the liability to QTL genotypes by another suitable statistical model.
We assume that the trait we discuss is determined by m0 diallelic QTL which are located in m marker intervals (m0 ≤ m), and each interval contains at most one QTL. If a backcross (BC) population is considered, the basic model of cumulative logistic regression model is
ξ =μ+
m
Assume that there are c observable values for an ordinal-scaled trait in a trial, coded as 1, 2,…, c. Let Y denote the trait value of an individual, which takes value from {1, 2, …, c}, and ξ denote the underlying liability of the individual. The two kinds of variables can be linked by the following threshold model:
αq−1 < ξ ≤ αq ⇔ Y = q; q = 1, 2, . . . , c; α0 = −∞; αc = ∞, (1) where αr (r = 0, 1, 2, . . . , c ) is a set of fixed (unknown) values in ascending order, called thresholds. 2.2. Statistical model In this paper, according to the particularity of ordinal trait, we choose a cumulative logistic regression model as statistical model.
β j GQj +
δrt GrQ GtQ + ε ,
(2)
r
j=1
where ξ is the liability, μ denotes the overall mean of genotypic values, β j represents the main effect of the QTL in the jth marker j
interval, and GQ is the corresponding QTL genotype in the jth inj
terval. We let GQ = 0, if there does not exist QTL within the jth j
interval of an individual; otherwise, GQ can be expressed as
GQj =
1, 2,
f or Q j Q j , f or Q j q j .
δrt (r, t = 1, 2, . . . , m ) denotes the epistatic effect of the QTL within the rth interval and the tth intervals, and GrQ GtQ stands for the interaction between the corresponding loci. ε is the error term of the model. Denote the cumulative probabilities of c ordinal classifications by γ 1 , γ 2 , …, γ c , where γc = 1. Assume ε follows the standard m logistic distribution, so ξ − (μ + β j GQj + δrt GrQ GtQ ) follows r
j=1
the standard logistic distribution. According to the threshold model (1), we have
γk = P (Y ≤ k|GQ ) = P (ξ ≤ αk |GQ ) m exp αk − μ + β j GQj + δrt GrQ GtQ r
(3)
r
j=1
Eq. (3) can be written as a generalized linear model
ln
m γk = αk − μ − β j GQj − δrt GrQ GtQ . 1 − γk r
(4)
For an F2 population, we need to modify the statistical model (2) as
ξ = μ+
m j=1
+
a jx j +
m j=1
(ad )rt xr ut +
r=t
2.1. Threshold model
101
d ju j +
(aa )rt xr xt
r
(dd )rt ur ut + ε ,
(5)
r
where ξ , μ and ε have same explanations with those in model (2); aj and dj are respectively the additive effect and the dominant effect of the QTL in the jth interval in the F2 design; (aa)rt , (ad)rt , (dd )rt (r, t = 1, 2, . . . , m ) are respectively additive × additive, additive × dominant, dominant × dominant epistatic effects between QTL in the rth interval and the tth interval; xj and uj are the corresponding genotype variables for the additive and dominant effects. Define x j = u j = 0, if there is no QTL in the jth interval for an individual; otherwise,
xj =
1, 0, −1,
f or Q j Q j , f or Q j q j , and u j = f or q j q j ,
−1/2, 1/2, −1/2,
f or Q j Q j , f or Q j q j , f or q j q j ,
are defined. Similarly, by combining the threshold model (1), model (5) can be rewritten as the following generalized linear model
102
X. Sheng et al. / Journal of Theoretical Biology 432 (2017) 100–108
E-Step : Given (s) and X, compute the Q-function in EM algorithm, i.e., the conditional expectation of l(|T).
(Li et al., 2006)
ln
m m γk = αk − μ − a jx j − d ju j − (aa )rt xr xt 1 − γk r
−
(ad )rt xr ut −
r=t
(dd )rt ur ut .
Q (|(s ) ) = E[l (|T )|X, (s ) ] =
(6)
n
P (GiQ |Xi , (s ) ) ln P (yi |GiQ )
i=1 GiQ
r
+
n m i=1 j=1
2.3. Joint estimation of parameters Once the threshold model and statistical model of mapping QTL are given, appropriate statistical methods should be chosen to obtain the estimates of model parameters and QTL parameters. Since QTL genotypes are unknown, we need to infer them by the corresponding marker genotypes. In this paper, the popular EM algorithm combined with the Newton-Raphson algorithm (NR-EM, Dempster et al., 1977; Lindstrom and Bates, 1988) is employed to calculate the maximum likelihood estimates (MLEs) of parameters in the framework of multiple-interval mapping. The set of the unknown parameters can be expressed as = (α , β , θ ) , where α = (α1 , . . . , αc−1 ) represents the threshold parameters, β = (μ, β1 , . . . , βm , δ12 , . . . , δm(m−1 ) ) are the effect parameters of QTL based on the underlying liability (including one overall mean, m main effects and p epistatic effects, where 2 theoretically), and θ = (θ , . . . , θ ) are the position pap = Cm m 1 rameters of QTL. The recombination rates can be converted into genetic distances by mapping functions, so in this paper we can treat the recombination rates as position parameters, where θ j , θ Mj and θ 2j respectively represent the recombination rate between the putative QTL and the first marker of the jth interval, the recombination rate between the two markers of the jth interval, and the recombination rate between the putative QTL and the second marker of the jth interval, j = 1, 2, . . . , m. Suppose n backcross individuals are sampled. For the ith individual (i = 1, 2, . . . , n), let yi , GiQ and GiM respectively denote the trait value, the QTL genotypic value, and the marker genotypic value. Thus the likelihood function can be written as
L() =
n
P (yi |GiM ) =
i=1
n
P (yi |GiQ )P (GiQ |GiM ),
(7)
i=1 GiQ
where
P (yi |GiQ ) = P (Y ≤ yi |GiQ ) − P (Y ≤ yi − 1|GiQ ) = F (yi |GiQ ) − F (yi − 1|GiQ ),
Q1 (|(s ) ) =
l (|T ) ∝ ln
n
=
i=1
ln P (
|
yi GiQ
)+
n m
ln P (
GiQj
|
P (GiQ |Xi , (s ) ) ln P (yi |GiQ ),
Q2 (|(s ) ) =
n m
P (GiQj |Xi , (s ) ) ln P (GiQj |GiMj ),
i=1 j=1 Gi j Q
then Eq. (8) becomes
Q (|(s ) ) = Q1 (|(s ) ) + Q2 (|(s ) ).
GiMj
(9)
( | ( s )
Note that Q1 in Eq. (9) depends on threshold parameters and effect parameters, and Q2 (|(s) ) only depends on position parameters. M-Step : Maximize the Q-function Q(|(s) ) to obtain (s+1 ) . Iterative formulae of threshold parameters and effect parameters The (s + 1 )th step estimates of model parameters can be obtained from Q1 (|(s) ). Let i(s ) = P (GiQ |Xi , (s ) ), by Bayesian formula,
i(s) =
P (GiQ , GiM , yi |(s ) )
=
P (GiM , yi |(s ) )
P (GiQ , yi |GiM , (s ) ) P (yi |GiM , (s ) )
P (yi |GiQ , (s ) )P (GiQ |GiM , (s ) ) = . P (yi |GiQ , (s ) )P (GiQ |GiM , (s ) ) GiQ
Further let η = (α1 , . . . , αc−1 , μ, β1 , . . . , βm , δ12 , . . . , δm−1,m ), −1 im zyi = (0, . . . , 1, . . . , 0,−1, −GiQ1 , . . . , −Gim ,−GiQ1 GiQ2 , . . . , −Gi,m GQ ) Q Q (notice that the yi th position is 1), where yi = 1, . . . , c − 1, then
∂ P (yi |GiQ ) = F (yi |GiQ )[1 − F (yi |GiQ )]zyi ∂η −F (yi − 1|GiQ )[1 − F (yi − 1|GiQ )]zyi −1 , where
F(
|
yi GiQ
)=
=
m j=1
1 + exp(αyi − μ − exp(zyi η )
1 + exp(zyi η )
β j GiQj −
m j=1
r
β j GiQj −
δrt GirQ GitQ )
r
δrt GirQ GitQ )
.
Denote the above first-order partial derivative as D iη , then after calculating we have
P (yi |GiQ )P (GiQ |GiM )
i=1 n
n i=1 GiQ
vector GiQ = (GiQ1 , . . . , Gim ), and GiM = (GiM1 , . . . , Gim ). Q M
Let T = {(yi , GiM , GiQ ), i = 1, 2, . . . , n} denote the complete data, then the complete log-likelihood function is as follows,
(8)
where (s) denotes current estimated parameter value, X = (X1 , X2 , . . . , Xn ) is the known data. For the ith individual, Xi includes the trait value yi and the marker genotypic value GiM . In Eq. (8), let
exp(αyi − μ −
and F (yi |GiQ ) is the cumulative distribution function of P (yi |GiQ );
P (GiQj |Xi , (s ) ) ln P (GiQj |GiMj ),
GiQj
).
i=1 j=1
The details of our new method for inferring the parameter vector in the multiple-interval mapping can be described as follows:
n D iη ∂ Q1 (|(s) ) =
i(s) , ∂η P (yi |GiQ ) i=1 i GQ
n Diη P (yi |GiQ ) − D iη D iη ∂ 2 Q1 (|(s) ) =
i(s) , ∂ η∂ η [P (yi |GiQ )]2 i=1 Gi Q
X. Sheng et al. / Journal of Theoretical Biology 432 (2017) 100–108
103
Table 1 The conditional probability of QTL genotype given the genotype of the flanking markers. Marker
QTL genotype (GiQj )
Case
No.
genotype (GiMj )
Qj Qj
Qj qj
I=1
1 2 3 4 1 2 3 4 1 2 3 4
M j Nj /M j Nj M j Nj /M j nj M j Nj /mj Nj M j Nj /mj nj M j Nj /M j Nj M j Nj /M j nj M j Nj /mj Nj M j Nj /mj nj M j Nj /M j Nj M j Nj /M j nj M j Nj /mj Nj M j Nj /mj nj
1
0
0 (1 − θ j )(1 − θ2 j )/(1 − θM j ) (1 − θ j )θ2 j /θM j θ j (1 − θ2 j )/θM j θ j θ2 j /(1 − θM j ) (2 − θ j − θ2 j − θM j )/2(1 − θM j ) (θ2 j − θ j + θM j )/2θM j (θ j − θ2 j + θM j )/2θM j (θ j + θ2 j − θM j )/2(1 − θM j )
θ j θ2 j /(1 − θM j ) θ j (1 − θ2 j )/θM j (1 − θ j )θ2 j /θM j (1 − θ j )(1 − θ2 j )/(1 − θM j ) (θ j + θ2 j − θM j )/2(1 − θM j ) (θ j − θ2 j + θM j )/2θM j (θ2 j − θ j + θM j )/2θM j (2 − θ j − θ2 j − θM j )/2(1 − θM j )
I=0
0
(θM j − θ j )/θM j θ j /θ Mj
where
θ j /θ Mj (θM j − θ j )/θM j
where I
(GiMj =t )
Diη =
yi GiQ
−1 ∂ 2 Q1 (|(s) )
∂ Q1 (|(s) )
−
(s ) . ∂ η∂ η (s) ∂η
η
=η
(s )
by solving the equation we obtain the iterative value
(10) When the Newton iteration converges, we can obtain the (s + 1 )th step estimates of η. Remark: The above estimating process of model parameters is same as that given in Li et al. (2006). However, parameters of QTL positions were not estimated simultaneously in their method. Next, we embed the estimating process of QTL positions into the whole algorithm and jointly estimate all parameters of interest. Iterative formula of position parameters ij ij Obviously, ln P (GQ |GM ) in Q2 (|(s) ) contains recombination rates of the jth interval. Let ω1|i jt = P (GQ = 1|GM = t ), ω2|i jt = ij
ij P ( GQ
=
ij 2|GM
ij
ij = t ), and k(s|i)j = P (GQ = k|Xi , (s ) ), then Q2 (|(s) )
can be written as
Q2 (|(s ) ) =
n m 2 i=1 j=1 k=1
=
n m i=1 j=1
ln ωk|i jt k(s|i)j
ln ω1|i jt 1(s|)i j + ln ω2|i jt 2(s|)i j .
Here we consider three cases of BC population. Case one: assume the double recombination phenomenon with one interval is very rare and can be ignored. In this case, the interference coij efficient I = 1, and θM j = θ j + θ2 j . The values of ωk|i jt = P (GQ = ij k|G M
= t ) (k = 1, 2; t = 1, 2, 3, 4) in this case are listed in the first panel of Table 1. Taking the derivatives of ln ω1|ijt and ln ω2|ijt with respect to parameter θ j , we have
∂ ln ω1|i jt ∂ ω1|i jt /∂ θ j 1 1 = = I ij + I ij , ∂θ j ω1|i jt θ j − θM j (GM =2) θ j (GM =3)
is an indicator function, taking value of 1 or 0. Let
n ∂ Q 2 | ( s ) 1 1 =
1(s|)ij I ij + 1(s|)ij I ij G =2 GM =3 ∂θ j θ − θ θ M j Mj i=1 j 1 (s ) 1 + = 0, +
I
(s) I θ j 2|ij GijM =2 θ j − θMj 2|ij GijM =3
∂ P( | ) = F (yi |GiQ )[1 − F (yi |GiQ )][1 − 2F (yi |GiQ )]zyi zyi ∂ η∂ η −F (yi −1|GiQ )[1−F (yi −1|GiQ )][1−2F (yi −1|GiQ )]zyi −1 zyi −1 . 2
Combining the Newton–Raphson iterative formula, we have (s+1 )
1
(s+1 )
θj
θM j =
n i=1
[I(Gi j =2) 2(s|)i j + I(Gi j =3) 1(s|)i j ] M
M
n 2 3 i=1 k=1 t=2
k(s|i)j I(Gi j =t )
,
θ2(sj+1) = θM j − θ j(s+1) .
M
Case two: assume double recombination can occur within an interval, and two intersecting intervals are independent when crossover occurs (i.e., I = 0). In this case, θ j and θ 2j can be related by a function θ j + θ2 j − 2θ j θ2 j = θM j , and the values of ωk|i jt =
P (GQ = k|GM = t ) (k = 1, 2; t = 1, 2, 3, 4) in this case are presented in the second panel of Table 1. Similarly, take the derivatives ln ω1|ijt and ln ω2|ijt of with respect to parameter θ j , and let ij
ij
I n ∂ Q2 (|(s) ) (GiMj =1 ) +I(GiMj =2 ) (s ) I(GiMj =3 ) +I(GiMj =4 ) (s ) =
1 | i j +
1 | i j ∂θj θj − 1 θj i=1 I(Gi j =1) +I(Gi j =2) I(Gi j =3) +I(Gi j =4) M M + M
2(s|)i j + M
2(s|)i j =0. θj θj − 1 After solving the equation we have n
θ j(s+1) =
i=1
I
ij GM =1
(s) + I 2|ij
ij GM =2
n 2 i=1
θ2(sj+1) =
θMj − θ j(s+1) 1 − 2θ j(
s+1 )
(s) + I
k=1
2|ij
4
ij GM =3
(s) + I 1|ij
ij GM =4
(s )
1|ij
(s ) t=1 k|ij I Gij =t
,
M
.
Case three: assume there exists crossover interference within the marker interval, and 0 < I < 1. In this case, θ j and θ 2j have funcg
j
tion relationship θ j + θ2 j − 2cθ j θ2 j = θM j , where c (= θ θ11 ) is the j 2j j
coincidence coefficient, and g11 represents the joint recombination
rate. At this time, we also calculate all the values of ωk|i jt = P (GQ = ij k|GM = t ) (k = 1, 2; t = 1, 2, 3, 4) as presented in the third panel of Table 1. ij
∂ ln ω2|i jt ∂ ω2|i jt /∂ θ j 1 1 = = I(Gi j =2) + I ij , ∂θ j ω2|i jt θj M θ j − θM j (GM =3)
104
X. Sheng et al. / Journal of Theoretical Biology 432 (2017) 100–108 Table 2 The conditional probability of QTL genotype given the genotype of the flanking markers for F2 population. QTL genotype (GiQj )
Marker Case
No.
genotype (GiMj )
Qj Qj
Qj qj
qj qj
I=1
1 2 3 4 5 6 7 8 9
M j Nj /M j Nj M j Nj /M j nj M j nj /M j nj M j Nj /mj Nj M j Nj /mj nj M j nj /mj nj mj Nj /mj Nj mj Nj /mj nj mj nj /mj nj
1 1− p ( 1 − p )2 p cp(1 − p) 0 p2 0 0
0 p 2 p( 1 − p ) 1− p 1 − 2cp(1 − p) 1− p 2 p( 1 − p ) p 0
0 0 p2 0 cp(1 − p) p ( 1 − p )2 1− p 1
I=0
1
M j Nj /M j Nj
( 1 −θ j ) 2 ( 1 −θ2 j ) 2 ( 1 −θM j ) 2 ( 1 −θ j ) 2 θ2 j ( 1 −θ2 j ) θM j ( 1 −θM j ) (1−θ j )2 θ22j θM2 j θ j (1−θ j )(1−θ2 j )2 θM j ( 1 −θM j ) 2θ j θ2 j (1−θ j )(1−θ2 j ) (1−2θM j +2θM2 j ) θ j (1−θ j )θ22j θM j ( 1 −θM j ) θ j2 (1−θ2 j )2 θM2 j θ j2 θ2 j (1−θ2 j ) θM j ( 1 −θM j ) θ j2 θ22j ( 1 −θM j ) 2
2θ j θ2 j (1−θ j )(1−θ2 j ) ( 1 −θM j ) 2 θ j (1−θ j )(1−2θ2 j +2θ22j )
θ j2 θ22j ( 1 −θM j ) 2 2 θ j θ2 j ( 1 −θ2 j ) θM j ( 1 −θM j ) θ j2 (1−θ2 j )2 θM2 j θ j (1−θ j )θ22j θM j ( 1 −θM j ) 2θ j θ2 j (1−θ j )(1−θ2 j ) (1−2θM j +2θM2 j ) θ j (1−θ j )(1−θ2 j )2 θM j ( 1 −θM j ) (1−θ j )2 θ22j θM2 j ( 1 −θ j ) 2 θ2 j ( 1 −θ2 j ) θM j ( 1 −θM j ) ( 1 −θ j ) 2 ( 1 −θ2 j ) 2 ( 1 −θM j ) 2
2
M j Nj /M j nj
3
M j nj /M j nj
4
M j Nj /mj Nj
5
M j Nj /mj nj
6
M j nj /mj nj
7
mj Nj /mj Nj
8
mj Nj /mj nj
9
mj nj /mj nj
θM j ( 1 −θM j ) 2θ j θ2 j (1−θ j )(1−θ2 j )
θM2 j (1−2θ j +2θ j2 )θ2 j (1−θ2 j ) θM j ( 1 −θM j ) (1−2θ j +2θ j2 )(1−2θ2 j +2θ22j ) (1−2θM j +2θM2 j ) (1−2θ j +2θ j2 )θ2 j (1−θ2 j ) θM j ( 1 −θM j ) 2θ j θ2 j (1−θ j )(1−θ2 j ) θM2 j θ j (1−θ j )(1−2θ2 j +2θ22j ) θM j ( 1 −θM j ) 2θ j θ2 j (1−θ j )(1−θ2 j ) ( 1 −θM j ) 2
2 2 Note: p = θ j /θM j , c = θM ). / ( 1 − 2 θ M j + 2 θM j j
Repeat the above process of taking derivatives and let
∂ Q2 =
(|(s) )
∂θj
n
I(Gi j =1) 1(s|)i j + I(Gi j =4) 2(s|)i j M
i=1
+
M
θ j + θ2 j + θM j − 2
I(Gi j =3) 1(s|)i j + I(Gi j =2) 2(s|)i j M
M
θ j − θ2 j + θM j
+
+
n
I(Gi j =1) 1(s|)i j + I(Gi j =4) 2(s|)i j M
i=1
+
M
θ j + θ2 j + θM j − 2
I(Gi j =3) 1(s|)i j + I(Gi j =2) 2(s|)i j M
M
θ2 j − θ j − θM j
+
M
M
θ j − θ2 j − θM j
I(Gi j =4) 1(s|)i j + I(Gi j =1) 2(s|)i j M
∂ Q2 (|(s) ) ∂ θ2 j =
I(Gi j =2) 1(s|)i j + I(Gi j =3) 2(s|)i j
+
M
θ j + θ2 j − θM j
=0,
I(Gi j =2) 1(s|)i j + I(Gi j =3) 2(s|)i j M
M
θ2 j − θ j + θM j
I(Gi j =4) 1(s|)i j + I(Gi j =1) 2(s|)i j M
M
θ j + θ2 j − θM j
=0.
By combining the above two equations together, we can obtain an equation group of θ j and θ 2j . Since the equation is not linear, it is difficult to obtain an explicit expression of the solution, and here we can employ the Newton-Raphson method to solve it and get the approximate solutions of the recombination rates. For F2 population, the conditional probabilities of QTL genotypes given the genotypes of the flanking markers are listed in Table 2. The specific process of parameter estimation for F2 population is similar to the case of backcross, and here we does not describe it in detail. 3. Simulation studies In this section, simulations are performed to evaluate the feasibility and effectiveness of the proposed method in the paper, and compare the performance of it with that of an existing method. Backcross population is firstly used in the simulations. We assume that an ordinal trait (taking values 1, 2 and 3) are contributed by two putative QTL on a single chromosome. We take
10 cM as the true values of the genetic distances of two marker intervals (i.e., I = 1). We assume the first QTL is located in the interval formed by marker M1 and N1 , and the distance between the QTL and M1 is 3 cM; the second QTL is in the interval formed by marker M2 and N2 , and the distance between the QTL and M2 is 6 cM. We can convert the map distance into the recombination rate by Morgan mapping function, thus the recombination rate θ M between the two flanking markers of each interval is 0.10, the recombination rate θ 1 between marker M1 and the QTL in the first interval is 0.03, and the recombination rate θ 2 between marker M2 and the QTL in the second interval is 0.06. The detailed process of generating data is as follows: Firstly, simulate the marker genotypes of the backcross offsprings given the parental genotypes; Secondly, generate the QTL genotypes within the two intervals of the offsprings according to the conditional probabilities of QTL genotype given the genotype of the flanking markers (see Table 1); Thirdly, simulate the ordinal trait values according to the threshold and statistical models with the assumed values of parameters (see Eqs. (1) and (2)); Finally, estimate the parameters by the proposed algorithm and the MIM method (Li et al., 2006) based on the data that we have generated in the first three steps. Different sample sizes (n=30 0, 50 0) and heritabilities (h2 = 0.1, 0.3) are considered in the simulations. The true value of QTL effects and thresholds can be chosen according to h2 . For each case, we repeated the experiment for 10 0 0 times to test the accuracy of the estimates obtained by the two methods, and we use the mean of estimates over 10 0 0 replications corresponding to√each parameter and the square root of the mean square error ( MSE ) of the estimates as the MLE of each parameter and criterion to measure the accuracy of parameter estimation, respectively.√ Tables 3–5 respectively contain the MLEs and MSE of all parameters obtained by the two methods under different sample sizes and heritabilities (see the last two columns). For convenience of observation, the parameters in the three tables are divided into three groups according to different parameter categories: the first group consists of position parameters (θ 1 , θ 21 , θ 2 , and θ 22 ); the second group consists of threshold parameters (α 1 and α 2 ); and
X. Sheng et al. / Journal of Theoretical Biology 432 (2017) 100–108
105
Table 3 Simulation results by the proposed method and the MIM for the BC samples of 300 individuals (h2 = 0.3).
Group
Parameter
True value
MLE
MIM √ MSE
Proposed method √ MLE MSE
QTL position
θ1 θ 21 θ2 θ 22 α1 α2 μ β1 β2 δ 12
0.030 0.070 0.060 0.040 2.70 0 0 4.8600 0 0.5846 0.8717 0.6381
0.0413 0.0587 0.0598 0.0402 – – – −0.4673 −0.7086 –
0.0367 0.0367 0.0206 0.0206 – – – 1.0809 1.5979 –
0.0345 0.0655 0.0603 0.0397 2.6597 4.8421 −0.1118 0.5874 0.9323 0.6300
Threshold QTL effect
0.0153 0.0153 0.0148 0.0148 0.6936 0.6769 1.3580 1.5297 1.5510 0.9916
Note: θ21 = θ − θ1 , θ22 = θ − θ2 , and θ = 0.1; ‘–’ denotes the corresponding estimates can not be obtained by the MIM method. Table 4 Simulation results by the proposed method and the MIM for the BC samples of 500 individuals (h2 = 0.3). MIM Group
Parameter
True value
MLE
QTL position
θ1 θ 21 θ2 θ 22 α1 α2 μ β1 β2 δ 12
0.030 0.070 0.060 0.040 2.70 0 0 4.8600 0 0.5846 0.8717 0.6381
0.0369 0.0631 0.0708 0.0292 – – – −0.6607 −0.5112 –
Threshold QTL effect
√
MSE
0.0234 0.0234 0.0244 0.0244 – – – 1.2632 1.4021 –
Proposed method √ MSE
MLE
0.0343 0.0657 0.0575 0.0425 2.6256 4.8175 −0.0531 0.5785 0.8678 0.6408
0.0090 0.0090 0.0106 0.0106 0.4447 0.4419 0.8590 0.9593 1.0663 0.6481
Note: θ21 = θ − θ1 , θ22 = θ − θ2 , and θ = 0.1.
the third group includes effect parameters (overall mean μ, main effects β 1 and β 2 , and epistatic effect δ 12 ). The true values of parameters are presented in the third column of the three tables. From Table 3, we can see that the proposed method in this paper can well provide the MLE of each parameter, and the corresponding estimating accuracy is satisfactory. The simulation results of the MIM method are implemented in Windows QTL Cartographer (http://statgen.ncsu.edu/qtlcart/). The estimating results of QTL positions by the MIM method are near to the corresponding true values, whereas compared with the true values, the two estimates of QTL effects all show opposite signs. By contrast, the proposed method has higher estimating accuracy, since almost all √ the MSE of the estimates by the proposed method are less than those by the MIM method, except the one of β 1 . The estimates of the thresholds, the overall mean, and the epistatic effect can not provided by the Windows QTL Cartographer, therefore no comparing results about these estimates are given. The main reason that the proposed method has better accuracy for estimating model parameters is that we treat the recombination rates between the putative QTL and the flanking marker within each interval as unknown parameters, and embed them into the whole algorithm to get the estimates simultaneously with the other model parameters. The results obtained by this method are more simple and direct, and therefore have a higher accuracy. However, in most other existing studies, the position of QTL is mostly determined by scanning a genome-wide region in a linkage group to establish a log-likelihood ratio (LR) graph to find the maximum of LR, which may make these methods suffer from some latent accuracy loss. As expected, when the heritability is fixed at 0.3 and the sample size increases from 300 to 500, the accuracy of all parameter
estimates improves significantly too (Tables 3 and 4), and the estimating accuracy of all parameters obtained by the new method are higher than those by the MIM method when the sample size n = 500. Therefore, the QTL parameters can be estimated more precisely with the increase of sample size, and the advantage of the new method becomes more apparent. The heritability measures the proportion of the variation explained by genetic factors. We also conducted additional simulation when heritability h2 = 0.1 and sample size n = 500 (See Table 5 for the results). Comparing with Table 4, it is not hard to find the fact that the greater the heritability is, the parameter estimates obtained by the proposed method are better, which is in accordance with the actual cases. At the same time, even for the small heritability, the proposed method can still provide more accurate estimates than the existing method. To better evaluate the powers of detecting QTL by the new method, we also computed the false positive rates (FPRs) of detecting QTL over the 10 0 0 simulation replicates. Here the FPR is defined as the proportion that the estimated positions are located outside the interval of the true QTL position ± 3SD. In the above three simulation cases, the new method’s the false positive rates of detecting QTL1 are 0.031, 0.020 and 0.045, respectively; and those of detecting QTL2 are only 0.024, 0.016 and 0.032, respectively, which is satisfactory in the practice of QTL mapping. However, the FPRs of the MIM method can hardly be controlled, which further shows the advantage of the new method. Besides, F2 population is also considered in our simulations in order to evaluate the proposed method. For this case, we consider three threshold values in the threshold model and five QTL effect parameters in statistical model. The true values of all parameters and the corresponding computing results are presented in Table 6.
106
X. Sheng et al. / Journal of Theoretical Biology 432 (2017) 100–108 Table 5 Simulation results by the proposed method and the MIM for the BC samples of 500 individuals (h2 = 0.1).
Group
Parameter
True value
MLE
MIM √ MSE
Proposed method √ MLE MSE
QTL position
θ1 θ 21 θ2 θ 22 α1 α2 μ β1 β2 δ 12
0.030 0.070 0.060 0.040 1.4100 2.9100 0 0.4235 0.5816 0.2605
0.0134 0.0866 0.0540 0.0460 – – – −0.3815 −0.5837 –
0.0291 0.0291 0.0245 0.0245 – – – 0.8074 1.1741 –
0.0335 0.0665 0.0543 0.0457 1.4257 3.0203 0.0839 0.3066 0.5869 0.3075
Threshold QTL effect
0.0072 0.0072 0.0122 0.0122 0.5388 0.5413 1.0726 1.2889 1.1520 0.7726
Note: θ21 = θ − θ1 , θ22 = θ − θ2 , and θ = 0.1. Table 6 Simulation results by the proposed method and the MIM for the F2 samples of 500 individuals (h2 = 0.3).
Group
Parameter
True value
MLE
MIM √ MSE
Proposed method √ MLE MSE
QTL position
θ1 θ 21 θ2 θ 22 α1 α2 α3 μ
0.035 0.065 0.045 0.055 −1.8500 0 1.4700 0 0.50 0 0 1.1500 0.30 0 0 1.10 0 0
0.0081 0.0919 0.0350 0.0650 – – – – 0.1692 0.6204 0.1295 0.5064
0.0340 0.0340 0.0157 0.0157 – – – – 0.3667 0.5514 0.2151 0.6138
0.0327 0.0673 0.0390 0.0610 −1.4096 −0.0094 1.4407 −0.0706 0.5944 1.3107 0.4798 1.1333
Threshold
QTL effect
a1 a2 d1 d2
0.0058 0.0058 0.0120 0.0120 0.5009 0.2104 0.1648 0.1845 0.1689 0.1878 0.2701 0.1132
Note: θ21 = θ − θ1 , θ22 = θ − θ2 , and θ = 0.1.
It is clear to find that the accuracy advantage of the proposed method still holds in the case of F2 population. Comparing with the BC’s case, the difference is that the signs of the estimated QTL effects coincide with the corresponding true values for the MIM method. All in all, from the estimating results of QTL mapping for ordinal trait, we conclude that the proposed method is feasible, meaningful, and owns its advantages in estimating accuracy.
4. Real example Here we further tested the feasibility of using the proposed method to estimate the main and epistatic effects of ordinal trait loci. The well-known data set of colitis severity in IL-10-deficient mice (Farmer et al., 2001) was used for demonstration. In the data set, F1 offspring of IL-10-deficient mice were intercrossed to generate 203 C3H.Il10−/− × B6.Il10−/− F2 mice (102 females and 101 males) and 207 B6.Il10−/− × C3H.Il10−/− F2 mice (103 females and 104 males). These 410 mice are the experimental population. Mice were maintained in a humidity, temperature, and light cycle controlled vivarium under specific pathogen-free conditions. At 6 weeks of age, mice were necropsied for tissue collection and analyzed for various phenotypes positively correlated with the progression of colitis, including cecum severity, cecum hyperplasia, cecum ulceration and the percentage of area involved. The first three phenotypes are all ordinal traits (0=normal, 1=mild, 2=moderate, and 3=severe), and here we mainly focus on the trait of cecum hyperplasia. We utilize seven marker loci (D3Mit203, D3Mit212, D3Mit348 and D3Mit257 on chromosome 3; D18Mit119, D18Mit124 and D18Mit7 on chromosome 18) to detect the nearby QTL that may affect cecum hyperplasia.
4.1. Model testing Firstly, we use software SAS to perform the cumulative logistic regression analysis, to test the applicability of the model established in this paper. If the model is suitable for the real data, we should analyze whether there are markers which have significant effects on the trait of cecum hyperplasia, and further determine the latent QTL by simultaneously estimating the recombination rates and the main/epistatic effects of QTL. In the SAS output results, the value of χ 2 is 34.7390, and the pvalue is 0.5285 (not significant), so we can not reject the assumption of proportionality, that is to say we can use the cumulative logistic regression model to analyze this group of data. Secondly, from the output results we find there are some marker (or QTL) loci that significantly affect the cecum hyperplasia of mice among the considered marker loci. At the significance level of 0.05, both the markers D3Mit348 (61.8 cM) and D3Mit257 (70.3 cM) have significant additive and dominant effects. Despite the SAS output results do not show significant main effects for loci on Chr. 18, significant epistatic effect between marker D18Mit7 on Chr. 18 and marker D3Mit348 on Chr. 3 is detected, which shows that the interaction between these two markers has significant impact on the trait of cecum hyperplasia. Thus we can infer that there exists colitis-induced QTL on Chr. 18, which is in accordance with the conclusion in the literature of Farmer et al. (2001).
4.2. Joint estimation of parameters Next, with the proposed algorithm in the paper we further infer that the QTL on Chr. 3 by constructing marker interval consisted of
X. Sheng et al. / Journal of Theoretical Biology 432 (2017) 100–108 Table 7 The estimated results of model parameters and QTL parameters. Category
Parameter
MLE
QTL Position
θ1 θ 21 θ2 θ 22 α1 α2 α3 μ
0.1616 0.1538 0.1910 0.0902 1.5118 4.5143 7.6148 14.9452 5.4928 5.4903 4.9259 5.2151 3.3277 3.3079 4.2342 3.4650
Threshold
QTL effect
a1 a2 d1 d2 (aa)12 (dd)12 (ad)12 (ad)21
D3Mit189 (49.7 cM) and D3Mit19 (87.6 cM), and the QTL on Chr. 18 by marker interval consisted of D18Mit119 (16 cM) and D18Mit7 (50 cM). Since the selected marker interval is not short, it can be considered that this example conforms with the situation I=0. Using the proposed method, we simultaneously estimate both model parameters and QTL parameters. The estimated results are shown in Table 7, where the recombination rate θ 1 between QTL and D3Mit189 (49.7 cM) on Chr. 3 is 0.1616, which can be converted into map distance by Haldane mapping function, that is to say the distance between QTL and D3Mit189 is 19.52 cM. So this QTL is located at 69.22 cM of the genetic map in Chr. 3, which is within the range 65 to 75 cM given in literature (Farmer et al., 2001), and the QTL position detected by the proposed method is a more detailed point estimate. Similarly, the recombination rate θ 2 between the other QTL and D18Mit119 (16 cM) on Chr. 18 is 0.1910, by Haldane mapping function we can conclude that the distance between the QTL and D18Mit119 is 24.06 cM, so this QTL is located at 40.06 cM of Chr. 18, while this QTL position is not reported in the literature of Farmer et al. (2001). 5. Conclusion and discussion In this paper, we mainly discussed problem of QTL mapping for ordinal traits in the framework of multiple-interval mapping. According to the characteristics of the ordinal traits, we selected the cumulative logistic regression model as the main model to establish the relationship between the ordinal trait and QTL genotypes. To make the multiple-interval mapping more simple and direct, instead of the genome-wide scanning method, we consider the recombination rates as unknown parameters and estimate them along with other model parameters by the NR-EM algorithm. Then the recombination rates can be converted into map distances by selecting suitable mapping functions according to the interference coefficient, so that we can realize the interval mapping of QTL. It is worth mentioning that different interference coefficients will affect the estimates of recombination rates, therefore, the interference coefficient is divided into three cases in this paper (I = 1, I = 0 and 0 < I < 1). In each case the recombination rates can be estimated by the proposed method, therefore the statistical inference of this paper is more general. From the simulation studies we carried out in the paper, the proposed method shows its superiority. Although only the case of I = 1 is mainly considered in the simulations, in fact, cases of I = 0 and 0 < I < 1 have similar simulation results and we does not describe in detail. Through comparing the MLEs and the mean square
107
errors in the simulation results, we conclude that the estimates obtained by the new method are satisfactory, and we verify the expected fact that the larger the sample size or the heritability is, the accuracy of estimating parameters is higher. In practice, we suggest that multiple short (i.e., ≤ 10 cM) marker intervals are utilized in the QTL mapping, since the relationship of the recombination rates within each marker interval is more simple at this time, and more accurate estimating results can be obtained. In addition, we also perform an example analysis. By the test of proportionality hypothesis, we conclude that the cumulative logistic regression model is reasonable to analyze the actual data. We adopt the proposed method to simultaneously estimate the QTL positions, QTL effects and other model parameters. The results show that, for the considered real set data, there exist some loci affecting the considered ordinal trait. What’s more, the estimated QTL positions are in accordance with the results in existing literature (Farmer et al., 2001), but more accurate, which fully demonstrate and validate the feasibility and effectiveness of the new method. In this paper, we mainly focus on the estimation problem of parameter vector . In fact, after obtaining the estimate of , we can further discuss whether there significantly exist QTL in the considered marker intervals. Let H0 : β1 = β2 · · · = βm = 0 v.s. H1 : at least one interval has QTL with main effect for BC population. ˆ L() ˜ ] can be used to The likelihood ratio statistic LRT = Ln[L() test the hypotheses. At this time, a threshold need to be chosen, which has significant influence on factual mapping. We suggest that permutation is used to choose the threshold (Churchill and Doerge, 1994), which can promise the accuracy of significant tests, although it is computation-intensive and time-consuming. Of course, the research on genetic mapping for ordinal trait loci in this paper needs to be further explored, and there still exist some complicated issues in the method, e.g., the selection of marker intervals and the number m. The number is restricted by some practical factors, such as the number c of ordinal trait values, sample size n, and so on. In the statistical model for BC population, there are totally c + m(m2+3 ) parameters. With the increase of m, the number of parameters increases rapidly, but it must be less than the sample size n in the framework of the EM algorithm. At the same time, the likelihood function is a finite logistic mixture and it will become increasingly difficult in maximization when the number m of QTL fitted into the model increases (Li et al., 2006). Here we recommend that Ma et al.’s idea (Ma et al., 2011) can be adopted to select the marker intervals which are to be used in the proposed method, i.e., first determine the candidate markers with information by advanced statistical methods (e.g., Lasso (Tibshirani, 1996); DS method (Candes and Tao, 2007)), thus a smaller number of makers can be taken (i.e., 4–10) if there is weak linkage disequilibrium between adjacent markers; and the second step is to construct intervals by the selected candidate markers. Besides, we did not consider the effect of population stratification and environmental factors on the mapping of ordinal trait. It is necessary for us to further consider these issues in the future.
Acknowledgments This research was supported by the Natural Science Foundation of Heilongjiang Province (A201207), the Science and Technology Innovation Team in Higher Education Institutions of Heilongjiang Province (no. 2014TD005).
108
X. Sheng et al. / Journal of Theoretical Biology 432 (2017) 100–108
References Candes, E., Tao, T., 2007. The dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35, 2313–2351. Chen, Z.H., Liu, J.B., 2009. Mixture generalized linear models for multiple interval mapping of quantitative trait loci in experimental crosses. Biometrics 65, 470–477. Churchill, G.A., Doerge, R.W., 1994. Empirical threshold values for quantitative trait mapping. Genetics 138, 963–971. Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1–38. Farmer, M.A., Sundberg, J.P., Bristol, I.J., Churchill, G.A., Li, R.H., Elson, C.O., Leiter, E.H., 2001. A major quantitative trait locus on chromosome 3 controls colitis severity in IL-10-deficient mice. In: Proceedings of the National Academy of Sciences of the United States of America, 98, pp. 13820–13825. Feng, J.Y., Zhang, J., Zhang, W.J., Wang, S.B., Han, S.F., Zhang, Y.M., 2013. An efficient hierarchical generalized linear mixed model for mapping QTL of ordinal traits in crop cultivars. PLoS ONE 8, e59541. Hackett, C.A., Weller, J.I., 1995. Genetic mapping of quantitative trait loci for traits with ordinal distributions. Biometrics 51, 1252–1263. Haley, C.S., Knott, S.A., 1992. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69, 315–324. Jansen, R.C., 1993. Interval mapping of multiple quantitative trait loci. Genetics 135, 205–211. Jia, Y., Liu, G., 2011. Mapping quantitative trait loci for resistance to rice blast. Phytopathology 101, 176–181. Kao, C.H., Zeng, Z.B., Teasdale, R.D., 1999. Multiple interval mapping for quantitative trait loci. Genetics 152, 1203–1216. Lander, E.S., Botstein, D., 1989. Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185–199.
Li, J., Wang, S.C., Zeng, Z.B., 2006. Multiple-interval mapping for ordinal traits. Genetics 173, 1649–1663. Lindstrom, M.J., Bates, D.M., 1988. Newton-raphson and EM algorithms for linear mixed-effects models for repeated-measures data. J. Am. Stat. Assoc. 83, 1014–1022. Liu, F.L., Li, R.J., Liu, N., Sun, J.H., 2009. Linkage analysis between marker and QTL for threshold traits. J. Qingdao Agric. Univ. (Nat. Sci.) 26, 113–118. Ma, W., Zhou, Y., Zhang, S., 2011. A two-step method for estimating QTL effects and positions in multi-marker analysis. Genet. Res. 93, 115–124. Rao, S., Xia, L., 20 0 0. Strategies for genetic mapping of categorical traits. Genetics 109, 183–197. Rao, S., Xia, L., 2001. Mapping quantitative trait loci (QTLs) of ordinal traits using a structural heteroskedastic threshold model: I theory. Mathematica Applicata 14, 46–51. Rao, S., Xu, S., 1998. Mapping quantitative trait loci for ordered categorical traits in four-way crosses. Heredity 81, 214–224. Spyrides-Cunha, M.H., Demetrio, C.G., Camargo, L.E.A., 20 0 0. Proportional odds model applied to mapping of disease resistance genes in plants. Genet. Mol. Biol. 23, 223–227. Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288. Xu, S., Xu, C., 2006. A multivariate model for ordinal trait analysis. Heredity 97, 409–417. Yi, N., Banerjee, S., Pomp, D., Yandel, S., 2007. Bayesian mapping of genome-wide interacting quantitative trait loci for ordinal traits. Genetics 176, 1855–1864. Yin, Z.J., Zhang, Q., Zhang, J.G., Ding, X.D., 2005. Mapping quantitative trait loci for ordinal traits of disease resistance using generalized linear method. Acta Veterinaria et Zootechnica Sinica 36, 1241–1246. Zeng, Z.B., 1994. Precision mapping of quantitative trait loci. Genetics 136, 1457–1468.