Mathl Comput. Modelling Vol. 15, No. 6, pp. 43-50, 1991
Printed in Great Britain. All rights reserved
0895-7177/91 $3.00 + 0.00
Copyright © 1991 Pergamon Press plc

THE IDENTIFICATION OF AN INHERITANCE MODEL AND ITS APPLICATION

CHEN JIAN, ZHENG WEIMIN AND WANG YONGXIAN
Systems Engineering Lab, School of Economics and Management
Tsinghua University, Beijing, 100084, P. R. China

(Received July 1990)
Abstract-The inheritance of a quantitative character is very complex, and a powerful method for dealing with it is still lacking. In this paper, an inheritance model of a quantitative character is built by means of effective factors, and a method of parameter identification is presented. The model is found to fit much better than Wright’s formula, and statistical analysis shows that it is reliable. It is then applied to rice breeding. The prediction accuracy and the main inheritance feature patterns obtained for each character are satisfactory.
1. INTRODUCTION

In studying quantitative characters, geneticists are usually confronted with the situation that data are available only for the parents and the F2 and/or F1 populations, which is the case at the early stage of an experiment. In addition, the statistical procedures for analyzing such data are not efficient. Today, crop breeding relies mainly on experienced breeders, so it takes a long time to obtain a new variety. To improve the efficiency of crop breeding, it is necessary to explore effective methods for analyzing quantitative inheritance.

In studies of quantitative inheritance, an effective factor [1,2] was employed, which represents the effect of mass genes on the same chromosome. Under the earlier definition, the effective factor can only represent the additive effect of one gene. But research on quantitative characters shows that besides an additive effect there are a dominance effect and an epistatic effect, which are the interactions of genes within a locus and among loci, respectively. The distribution forms of characters are related to the various effects among genes. To analyze a quantitative character in detail, the dominance and epistatic effects among genes should not be neglected.

In this paper, we extend the concept of the effective factor so that the three main effects of genes are considered. With the effective factor, an inheritance model that describes the distribution of the second filial generation (F2) of a quantitative character is established. The parameters of the model are estimated by curve fitting, and the estimation method is discussed. The model is then compared with Wright’s formula [1,2]. To examine the reliability of the model, a statistical analysis is given. Finally, the model is applied to rice breeding and seven characters are discussed.

2. DISTRIBUTION MODEL OF QUANTITATIVE INHERITANCE
For quantitative characters, two types of random variables are involved: one is discrete and is associated with the segregation of effective factors; the other is continuous and is associated with random disturbances or environmental effects, which are usually assumed to be normally distributed. In general, the phenotypic value $Y$ of a quantitative character can be expressed as [2]:

$$Y = G + E, \tag{1}$$

where $G$ is the genotypic value of a quantitative character, $E$ is the environmental deviation with a normal distribution $N(0, \sigma^2)$, and $G$ and $E$ act independently. According to the different action-types of genes, we can divide $G$ into three parts:

$$G = A + D + I, \tag{2}$$

where $A$ denotes the additive value, $D$ denotes the dominance deviation, and $I$ denotes the epistatic deviation. $A$, $D$ and $I$ are the macro expression of effective factors.

This work is supported by the National Natural Science Foundation of China, Grant No. 6864009.

We assume that there are $N$ loci segregating independently and that there are only two alleles at each locus, say $A_i$ and $a_i$ for the $i$th locus. Then, equation (1) can be rewritten as:
$$Y = x_1 + x_2 + \cdots + x_N + E, \tag{3}$$

where $x_i$ is the discrete random variable associated with the segregation of each effective factor, $i = 1, 2, \ldots, N$. Furthermore, we make two additional basic assumptions:
1°. The genotypic values of different loci are identical [2].
2°. The parents ($P_1$ and $P_2$) are pure lines, and the concentration of the genes among them is said to be complete if the genotypes of $P_1$ and $P_2$ are given, respectively, by $a_1a_1a_2a_2 \ldots a_Na_N$ and $A_1A_1A_2A_2 \ldots A_NA_N$ [2].

With these assumptions, we classify the $N$ effective factors into five classes that represent the different action-types of genes, as follows:

$$Y = K_1 x^{(1)} + K_2 x^{(2)} + K_3 x^{(3)} + K_4 x^{(4)} + K_5 x^{(5)} + E, \tag{4}$$
where $K_1$ is the number of effective factors having only an additive effect; $K_2$ is the number of effective factors having both additive and positive dominance effects; $K_3$ is the number of effective factors having both additive and negative dominance effects; $2K_4$ and $2K_5$ denote the numbers of effective factors having an additive effect and a dominant epistatic effect, and an additive effect and a recessive epistatic effect, respectively; and $x^{(i)}$ is the random variable associated with the segregation of an effective factor in class $K_i$, $i = 1, 2, 3, 4, 5$. Then, the genotypic value $G$ can be expressed as:

$$G = K_1 x^{(1)} + K_2 x^{(2)} + K_3 x^{(3)} + K_4 x^{(4)} + K_5 x^{(5)}. \tag{5}$$

Based on the different effect of each kind of effective factor, the probability function of $K_i x^{(i)}$ ($i = 1, 2, 3, 4, 5$) is as follows:

(1) The probability function of $K_1$ effective factors that have only additive effect. Letting $K_1 = 1$, the distribution table of F2 is:

genotype       aa     Aa     AA
state          x_0    x_1    x_2
state value    0      EFF    2EFF
probability    1/4    1/2    1/4

Written as a probability function:

$$P_{r1}(x_j) = \frac{2!}{j!\,(2-j)!}\left(\frac{1}{2}\right)^{2}, \tag{6}$$

where $j = 0, 1, 2$.
Extended to the general case:

$$P_{r1}(x_j) = \frac{(2K_1)!}{j!\,(2K_1-j)!}\left(\frac{1}{2}\right)^{2K_1}, \tag{7}$$
where $j = 0, 1, 2, \ldots, 2K_1$.

(2) The probability function of $K_2$ effective factors that have both additive and positive dominance effects (gene A is dominant). Letting $K_2 = 1$, the distribution table of F2 is:

genotype       aa     (Aa, AA)
state          x_0    x_1
state value    0      EFF
probability    1/4    3/4

The probability function can be written as:

$$P_{r2}(x_j) = \frac{1!}{j!\,(1-j)!}\left(\frac{1}{4}\right)^{1-j}\left(\frac{3}{4}\right)^{j}, \tag{8}$$

where $j = 0, 1$. Extended to the general case:

$$P_{r2}(x_j) = \frac{K_2!}{j!\,(K_2-j)!}\left(\frac{1}{4}\right)^{K_2-j}\left(\frac{3}{4}\right)^{j}, \tag{9}$$

where $j = 0, 1, 2, \ldots, K_2$.

(3) The probability function of $K_3$ effective factors that have both additive and negative dominance effects. Similar to (2), we have:

$$P_{r3}(x_i) = \frac{K_3!}{i!\,(K_3-i)!}\left(\frac{3}{4}\right)^{K_3-i}\left(\frac{1}{4}\right)^{i}, \tag{10}$$
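where $i = 0, 1, 2, \ldots, K_3$.

(4) The probability function of $2K_4$ effective factors that have an additive effect and the dominant epistatic effect (the dominant gene A will cover the effect of genes B and b). Letting $K_4 = 1$, the distribution table of F2 is:

genotype       aabb    aaB_    (A_B_, A_bb)
state          x_0     x_1     x_2
state value    0       EFF     2EFF
probability    1/16    3/16    3/4

The probability function is:

$$P_{r4}(x_i) = \sum_{j_1=0}^{1}\ \sum_{\substack{j_2=0 \\ 2j_1+j_2=i,\ j_1+j_2\le 1}}^{1} \frac{1!}{j_1!\,j_2!\,(1-j_1-j_2)!}\left(\frac{1}{16}\right)^{1-j_1-j_2}\left(\frac{3}{16}\right)^{j_2}\left(\frac{3}{4}\right)^{j_1}, \tag{11}$$

where $i = 0, 1, 2$. In the general case, it has:

$$P_{r4}(x_i) = \sum_{j_1=0}^{K_4}\ \sum_{\substack{j_2=0 \\ 2j_1+j_2=i,\ j_1+j_2\le K_4}}^{K_4} \frac{K_4!}{j_1!\,j_2!\,(K_4-j_1-j_2)!}\left(\frac{1}{16}\right)^{K_4-j_1-j_2}\left(\frac{3}{16}\right)^{j_2}\left(\frac{3}{4}\right)^{j_1}, \tag{12}$$

where $i = 0, 1, 2, \ldots, 2K_4$.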
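(5) The probability function of $2K_5$ effective factors that have an additive effect and the recessive epistatic effect. Similar to (4), we have:

$$P_{r5}(x_i) = \sum_{j_1=0}^{K_5}\ \sum_{\substack{j_2=0 \\ 2j_1+j_2=i,\ j_1+j_2\le K_5}}^{K_5} \frac{K_5!}{j_1!\,j_2!\,(K_5-j_1-j_2)!}\left(\frac{1}{4}\right)^{K_5-j_1-j_2}\left(\frac{3}{16}\right)^{j_2}\left(\frac{9}{16}\right)^{j_1}, \tag{13}$$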
where $i = 0, 1, 2, \ldots, 2K_5$.

With a discrete convolution, the probability function of genotypic value $G$ will be:

$$P_{rG}(\theta, x_i) = P_{r1}(x_{i_1}) \otimes P_{r2}(x_{i_2}) \otimes P_{r3}(x_{i_3}) \otimes P_{r4}(x_{i_4}) \otimes P_{r5}(x_{i_5}) = \sum_{i_1=0}^{2K_1}\sum_{i_2=0}^{K_2}\sum_{i_3=0}^{K_3}\sum_{i_4=0}^{2K_4}\ \sum_{\substack{i_5=0 \\ i_1+i_2+i_3+i_4+i_5=i}}^{2K_5} P_{r1}(x_{i_1})\,P_{r2}(x_{i_2})\,P_{r3}(x_{i_3})\,P_{r4}(x_{i_4})\,P_{r5}(x_{i_5}), \tag{14}$$

with $P_{r1}, \ldots, P_{r5}$ given by equations (7), (9), (10), (12) and (13), respectively,
where $i = 0, 1, 2, \ldots, M$; $\otimes$ is the symbol of discrete convolution; $M = 2K_1 + K_2 + K_3 + 2K_4 + 2K_5$; $x_i = \min(p_1, p_2) + |p_1 - p_2| \times i/M$; $p_1$, $p_2$ are the phenotype values of the two parents; and $\theta = [K_1, K_2, K_3, K_4, K_5]^T$ is a parameter vector. Furthermore, the probability density function of $G$ can be written as
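$$f_G(\theta, x) = \sum_{i=0}^{M} P_{rG}(\theta, x_i)\,\delta(x - x_i), \tag{15}$$

where $\delta(\cdot)$ is the Dirac $\delta$ function. Then, we can derive the probability density function of $Y$ (which describes the distribution of the F2 population) by means of the convolution,

$$f_{F_2}(\theta, x) = \int_{-\infty}^{\infty} f_G(\theta, y)\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-y)^2}{2\sigma^2}}\,dy. \tag{16}$$

With the properties of the $\delta$ function, we have:

$$f_{F_2}(\theta, x) = \sum_{i=0}^{M} P_{rG}(\theta, x_i)\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-x_i)^2}{2\sigma^2}}. \tag{17}$$

3. PARAMETER ESTIMATION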
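Evaluating model (17) for a given parameter vector is the basic operation behind the estimation in this section. The following is a minimal Python sketch of that evaluation, not the authors' implementation. It builds the probability function of $G$ by repeated discrete convolution of the one-unit F2 tables, which is equivalent to the closed-form sums for each class. The function names, the use of NumPy, and the numerical values in the usage lines are assumptions made for illustration; in particular, the class-5 table uses the classical 9:3:4 recessive-epistasis ratio, which is an assumption about the original formula.

```python
import numpy as np

# F2 probability tables for one unit of each class, ordered by state 0, 1, (2).
# The class-5 row (recessive epistasis, classical 9:3:4 ratio) is an assumption.
UNIT_TABLES = (
    (1 / 4, 1 / 2, 1 / 4),    # class 1: additive effect only
    (1 / 4, 3 / 4),           # class 2: additive + positive dominance
    (3 / 4, 1 / 4),           # class 3: additive + negative dominance
    (1 / 16, 3 / 16, 3 / 4),  # class 4: additive + dominant epistasis
    (1 / 4, 3 / 16, 9 / 16),  # class 5: additive + recessive epistasis (assumed)
)


def genotype_pmf(theta):
    """Probability function of G on states 0..M, built by repeated convolution."""
    pmf = np.array([1.0])
    for k, table in zip(theta, UNIT_TABLES):
        for _ in range(k):
            pmf = np.convolve(pmf, table)
    return pmf


def f2_density(x, theta, p1, p2, sigma):
    """Normal mixture of equation (17) evaluated at the points x."""
    pmf = genotype_pmf(theta)
    # States x_0..x_M are equally spaced between the two parental values.
    states = np.linspace(min(p1, p2), max(p1, p2), len(pmf))
    z = (np.atleast_1d(np.asarray(x, dtype=float))[:, None] - states) / sigma
    kernel = np.exp(-0.5 * z ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
    return kernel @ pmf


# Hypothetical usage: the parental values and sigma below are invented.
theta = (2, 0, 0, 1, 0)
xs = np.linspace(4.5, 6.1, 9)
print(f2_density(xs, theta, p1=4.9, p2=5.7, sigma=0.05))
```

Building each class by convolving its one-unit table keeps the code short and makes the length of the resulting vector equal to $M + 1$ automatically.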
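If all parameters $\{K_1, K_2, K_3, K_4, K_5, \sigma^2, p_1, p_2\}$ in the model (17) are known, it can be used to analyze the genetic process of crossbreeding. When the parameters are unknown, they can be estimated. Before identifying $\theta$, we first estimate the character values of the parents and the environmental variance based on the populations of $P_1$, $P_2$ and $F_1$:

$$\hat{p}_1 = \frac{1}{N_{P_1}} \sum_{i=1}^{N_{P_1}} X_{i,P_1}, \tag{18-a}$$

$$\hat{p}_2 = \frac{1}{N_{P_2}} \sum_{j=1}^{N_{P_2}} X_{j,P_2}, \tag{18-b}$$

$$\hat{\sigma}^2 = \frac{1}{3}\left[\frac{1}{N_{P_1}-1}\sum_{i=1}^{N_{P_1}}\left(X_{i,P_1}-\hat{p}_1\right)^2 + \frac{1}{N_{P_2}-1}\sum_{j=1}^{N_{P_2}}\left(X_{j,P_2}-\hat{p}_2\right)^2 + \frac{1}{N_{F_1}-1}\sum_{k=1}^{N_{F_1}}\left(X_{k,F_1}-\bar{X}_{F_1}\right)^2\right], \quad \bar{X}_{F_1} = \frac{1}{N_{F_1}}\sum_{k=1}^{N_{F_1}} X_{k,F_1}, \tag{18-c}$$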
where $N_{P_1}$, $N_{P_2}$, $N_{F_1}$ are the numbers of observations of the parents ($P_1$, $P_2$) and the F1 population, respectively; $X_{i,P_1}$, $X_{j,P_2}$ and $X_{k,F_1}$ are the $i$th, $j$th and $k$th observed values of the parents and the F1 population, respectively. To identify the parameters of the model, that is, to estimate $K_1$, $K_2$, $K_3$, $K_4$ and $K_5$ in equation (17), we compare the experimental distribution with model (17) and choose the set of parameters that makes the two distributions as close as possible. For curve fitting, it is necessary to define a loss function, $V(\theta, x)$, which measures the difference between the model and the observed population distributions, as follows:
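$$V(\theta, x) = \int_{-\infty}^{\infty}\left[f_{F_2}(\theta, x) - f^{*}_{F_2}(x)\right]^2 dx, \tag{19}$$

where $f_{F_2}(\cdot)$ and $f^{*}_{F_2}(\cdot)$ are the theoretical and experimental probability density functions, respectively. The parameter vector is determined by:

$$V(\hat{\theta}, x) = \min_{\theta} V(\theta, x). \tag{20}$$

To simplify the computation, equation (20) can be approximated as:

$$V(\hat{\theta}, x) \approx \min_{\theta} \sum_{j=1}^{N_T}\left[\frac{F_{F_2}(\theta, x_j) - F_{F_2}(\theta, x_{j-1})}{\Delta x_j} - \frac{F^{*}_{F_2}(x_j) - F^{*}_{F_2}(x_{j-1})}{\Delta x_j}\right]^2 \Delta x_j, \tag{21}$$

where

$$F_{F_2}(\theta, x) = \int_{-\infty}^{x} f_{F_2}(\theta, y)\,dy, \qquad F^{*}_{F_2}(x) = \int_{-\infty}^{x} f^{*}_{F_2}(y)\,dy,$$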
$N_T$ is the number of intervals, and $\Delta x_j = x_j - x_{j-1}$ is the length of the $j$-th interval. By probability theory, equation (21) will approach equation (20) as $N_T \to \infty$. We now state two statistical properties.

PROPERTY 1. $\hat{p}_1$, $\hat{p}_2$ and $\hat{\sigma}^2$ derived from equation (18) are unbiased and consistent estimators.

PROPERTY 2. The experimental distribution $F^{*}_{F_2}(x)$ will approach the true distribution as $N_{F_2} \to \infty$.

Based on these two properties, the following theorem holds.

THEOREM. Suppose that a quantitative character can be described by equation (4), assumptions 1°-2° are satisfied and the number of genes $N$ is finite. Then:
(1) its F2 distribution density function is determined theoretically by equation (17);
(2) there exists a set of parameters $\hat{\theta}$ which satisfies equation (20);
(3) the estimated values $\hat{\theta}$ will approach the real parameters as $N_{P_1} \to \infty$, $N_{P_2} \to \infty$, $N_{F_1} \to \infty$ and $N_{F_2} \to \infty$.

The conclusions of the theorem are clear, so the proof is omitted (the details can be found in [3]).
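The identification step can be sketched as follows. The paper does not say which search procedure minimizes the loss, so this sketch assumes an exhaustive search over a small grid of $\theta$ values and uses the interval form (21) of the loss. All function names, the grid bound `k_max`, the number of intervals, the class-5 probabilities (classical 9:3:4 ratio) and the synthetic data are assumptions made for illustration.

```python
import itertools
from math import erf, sqrt

import numpy as np

# One-unit F2 tables per class; the class-5 row (9:3:4 ratio) is an assumption.
UNIT_TABLES = (
    (1 / 4, 1 / 2, 1 / 4),
    (1 / 4, 3 / 4),
    (3 / 4, 1 / 4),
    (1 / 16, 3 / 16, 3 / 4),
    (1 / 4, 3 / 16, 9 / 16),
)


def genotype_pmf(theta):
    """Probability function of G on states 0..M via repeated convolution."""
    pmf = np.array([1.0])
    for k, table in zip(theta, UNIT_TABLES):
        for _ in range(k):
            pmf = np.convolve(pmf, table)
    return pmf


def model_cdf(x, theta, p1, p2, sigma):
    """Distribution function of the F2 model: a weighted sum of normal CDFs."""
    pmf = genotype_pmf(theta)
    states = np.linspace(min(p1, p2), max(p1, p2), len(pmf))
    z = (np.asarray(x, dtype=float)[:, None] - states) / (sigma * sqrt(2.0))
    return (0.5 * (1.0 + np.vectorize(erf)(z))) @ pmf


def make_bins(sample, n_bins):
    """Interval edges and the empirical distribution function at those edges."""
    edges = np.linspace(sample.min(), sample.max(), n_bins + 1)
    emp_cdf = np.searchsorted(np.sort(sample), edges, side="right") / sample.size
    return edges, emp_cdf


def loss(theta, edges, emp_cdf, p1, p2, sigma):
    """Interval approximation of the squared-difference loss, as in (21)."""
    dx = np.diff(edges)
    model = np.diff(model_cdf(edges, theta, p1, p2, sigma)) / dx
    data = np.diff(emp_cdf) / dx
    return float(np.sum((model - data) ** 2 * dx))


def identify(sample, p1, p2, sigma, k_max=2, n_bins=15):
    """Exhaustive search over theta in {0..k_max}^5 (an assumed search method)."""
    edges, emp_cdf = make_bins(sample, n_bins)
    grid = itertools.product(range(k_max + 1), repeat=5)
    return min(grid, key=lambda th: loss(th, edges, emp_cdf, p1, p2, sigma))


# Synthetic F2 sample drawn from the model itself (all values are invented).
rng = np.random.default_rng(0)
true_theta = (1, 1, 0, 0, 0)
true_pmf = genotype_pmf(true_theta)
true_states = np.linspace(10.0, 20.0, len(true_pmf))
sample = rng.choice(true_states, size=2000, p=true_pmf) + rng.normal(0.0, 0.8, 2000)
theta_hat = identify(sample, p1=10.0, p2=20.0, sigma=0.8)
print("estimated theta:", theta_hat)
```

A grid search is only practical for small `k_max`; different parameter vectors can give similar distributions, so the minimizer need not coincide with the vector used to generate the data.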
4. EXAMPLES AND STATISTICAL ANALYSIS
To compare with Wright’s formula, five examples are checked. From Table 1, it can be seen that the result by the model is better than that of Wright’s formula.

Table 1. Comparison of estimation based on formula (17) with Wright’s formula. Data in brackets are the results of Wright’s formula estimation.
[Table body not recoverable from the source layout. Legible entries: data sources Goodwin [5] (cross Md x Fla), Liu [6] (lycopene content) and Liu [7] (length of cornbar); parameter fragments K1 = 3; K1 = 6, K3 = 6; K1 = 1, K3 = 1; K1 = 1, K3 = 2; K4 = 2; loss values 0.00988 (0.02574), 0.00395, 0.00136, (0.02208), (0.01805), (0.01793).]

To examine the model quantitatively, Pearson’s Chi-square test is employed. The statistic is

$$\chi^2 = \sum_{j=1}^{L} \frac{\left(n f_j - n \hat{f}_j\right)^2}{n \hat{f}_j}, \tag{22}$$

where $f_j$ is the observed frequency, $\hat{f}_j$ is the predicted frequency calculated by equation (23), $L$ is the number of intervals, and $n$ is the number of observations of the F2 population.

$$\hat{f}_j = \int_{a_j}^{b_j} f_{F_2}(\hat{\theta}, x)\,dx, \tag{23}$$
where $a_j$, $b_j$ are the lower and upper bounds of the interval $(a_j, b_j)$. In Table 2, the results of statistical analysis are given. $\chi^2_f(\alpha)$ is the upper $100\alpha\%$ point of a Chi-square distribution with $f$ degrees of freedom.

Table 2. Results of statistical analysis.

       Model (17)                            Wright’s formula
No.    degrees of    χ²        χ²_f(0.05)    degrees of    χ²         χ²_f(0.05)
       freedom f                             freedom f
1      8             6.736     15.51         12            88.998     21.03
2      12            15.764    21.03         16            34.886     26.30
3      11            18.369    19.68         15            39.160     25.00
4      1             3.252     3.84          6             18.744     11.07
5      6             10.306    12.59         10            114.27     18.31
From Table 2, for Wright’s formula, the $\chi^2$ values are larger than the $\chi^2_f(0.05)$ values in all cases, which casts doubt on the validity of some of the assumptions in the formula. The $\chi^2$ values based on the new model are smaller than the $\chi^2_f(0.05)$ values in all cases, which indicates that the model is reasonable and reliable.
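The comparison drawn from Table 2 can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the authors' code: the helper `pearson_chi_square` assumes the standard Pearson form with relative frequencies scaled by the sample size $n$, the function names are hypothetical, and the frequencies in the usage line are invented. The statistics and critical values in `TABLE_2` are transcribed from Table 2.

```python
def pearson_chi_square(observed, predicted, n):
    """Pearson statistic from relative frequencies and sample size n (assumed form)."""
    return sum((n * f - n * fh) ** 2 / (n * fh) for f, fh in zip(observed, predicted))


# (example no., chi2 of model (17), its critical value,
#  chi2 of Wright's formula, its critical value), transcribed from Table 2.
TABLE_2 = [
    (1, 6.736, 15.51, 88.998, 21.03),
    (2, 15.764, 21.03, 34.886, 26.30),
    (3, 18.369, 19.68, 39.160, 25.00),
    (4, 3.252, 3.84, 18.744, 11.07),
    (5, 10.306, 12.59, 114.27, 18.31),
]


def verdicts(rows):
    """For each example, report whether each fit passes at the 5% level."""
    return [(no, m < m_crit, w < w_crit) for no, m, m_crit, w, w_crit in rows]


for no, model_ok, wright_ok in verdicts(TABLE_2):
    print(no,
          "model (17): not rejected" if model_ok else "model (17): rejected",
          "| Wright: not rejected" if wright_ok else "| Wright: rejected")

# Invented frequencies, only to show how the statistic is computed.
print(pearson_chi_square([0.2, 0.5, 0.3], [0.25, 0.5, 0.25], n=100))
```

A statistic below the critical value means the hypothesis that the data follow the fitted distribution is not rejected at the 5% level.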
5. APPLICATION

In this section the new model is applied to rice breeding. Seven characters are discussed. The data of each cross-combination consist of the $P_1$, $P_2$, $F_1$ and $F_2$ populations, which were provided by the Institute of Crop Breeding and Cultivation of the Chinese Academy of Agricultural Science in Beijing. Using the identification method described previously, we obtain the number of effective factors of each character listed in Table 3. As we cannot verify whether the parents are two extreme lines, the numbers of effective factors in Table 3 are minimum numbers.

Table 3. The number of effective factors of seven main characters of rice.
Character                  K1    K2    K3    K4    K5
panicle no.                0     2     3     0     0
panicle length             0     2     1     0     0
weight per 1000 grain      1     1     1     2     0
grain no. per panicle      0     1     2     1     0
plant height               1     1     2     3     0
grain length               2     0     0     1     0
grain width                1     2     1     2     1
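As a small worked example of reading Table 3, the sketch below computes, for each character, the total number of effective factors $K_1 + K_2 + K_3 + 2K_4 + 2K_5$ (classes 4 and 5 contain $2K_4$ and $2K_5$ factors) and the index $M = 2K_1 + K_2 + K_3 + 2K_4 + 2K_5$ of the highest genotypic state. The dictionary is transcribed from Table 3; the function names are hypothetical.

```python
# Parameter vectors (K1, K2, K3, K4, K5) transcribed from Table 3.
TABLE_3 = {
    "panicle no.": (0, 2, 3, 0, 0),
    "panicle length": (0, 2, 1, 0, 0),
    "weight per 1000 grain": (1, 1, 1, 2, 0),
    "grain no. per panicle": (0, 1, 2, 1, 0),
    "plant height": (1, 1, 2, 3, 0),
    "grain length": (2, 0, 0, 1, 0),
    "grain width": (1, 2, 1, 2, 1),
}


def factor_count(theta):
    """Total number of effective factors: K1 + K2 + K3 + 2*K4 + 2*K5."""
    k1, k2, k3, k4, k5 = theta
    return k1 + k2 + k3 + 2 * k4 + 2 * k5


def top_state_index(theta):
    """M = 2*K1 + K2 + K3 + 2*K4 + 2*K5, the index of the highest state."""
    k1, k2, k3, k4, k5 = theta
    return 2 * k1 + k2 + k3 + 2 * k4 + 2 * k5


for name, theta in TABLE_3.items():
    print(f"{name:24s} factors = {factor_count(theta):2d}   M = {top_state_index(theta):2d}")
```

The two counts differ only through $K_1$, because each purely additive factor contributes two steps to the state index.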
Figure 1. The predicted and observed frequencies. (a) Length of grain, mm, Tigin/Lijiangxintuanheigu. (b) Plant height, cm, Tigin/Lijiangxintuanheigu. Vertical axes: frequency. Predicted and observed values are plotted with separate markers (+ for observed values).
After the parameters of each character are estimated, we use equation (23) to predict the F2 distribution from the information on the parents. Four cross-combinations are checked. Substituting the parameters in Table 3 into formula (23), we obtain the predicted frequencies of each of the populations as given in Figure 1 (because of space limitations, only two cases are presented here). Figure 1 shows that the observed and predicted frequencies are very close. The fit in all cases is very good over the main parts of the distribution.

6. CONCLUSIONS

In this paper, an inheritance model of quantitative characters and its parameter identification method are presented. By comparing some examples, we find that the new model fits much better than Wright’s formula and that, by statistical analysis, it is reliable. It is concluded that the main feature patterns in the inheritance of a quantitative character can be described by the three effects of genes considered in this paper. This provides a theoretical basis for analyzing the inheritance of quantitative characters. Finally, the model is applied to rice breeding, and the results of predicting the F2 population of each character based on the distribution model are satisfactory.

REFERENCES

1. S. Wright, The results of crosses between inbred strains of guinea pigs, Genetics 19, 537-551 (1934).
2. K. Mather and J.L. Jinks, Biometrical Genetics, 3rd Ed., London, (1982).
3. Chen Jian, The system analysis and control of genetic process in crop breeding, Ph.D. Dissertation, Tsinghua University, Beijing, (1989).
4. K. Shkudo, Studies of the quantitative inheritance (5), Studies on Breeding 4, 13-32 (1950).
5. R.H. Goodwin, The inheritance of flowering time in short-day species Solidago sempervirens L., Genetics 29, 503-519 (1944).
6. Liu Jinsheng, Inheritance of lycopene content in fruit of tomato, Hereditas (Beijing) 8 (2), 9-12 (1986).
7. Liu Zhutong, Genetics, Chinese Higher Education Press, Beijing, (1979).