The identification of an inheritance model and its application

Mathl Comput. Modelling, Vol. 15, No. 6, pp. 43-50, 1991
Printed in Great Britain. All rights reserved
0895-7177/91 $3.00 + 0.00
Copyright © 1991 Pergamon Press plc

CHEN JIAN, ZHENG WEIMIN AND WANG YONGXIAN
Systems Engineering Lab, School of Economics and Management
Tsinghua University, Beijing, 100084, P. R. China

(Received July 1990)

Abstract. The inheritance of a quantitative character is very complex, and a powerful method for dealing with it is still lacking. In this paper, an inheritance model of a quantitative character is built by means of effective factors, and a method of parameter identification is presented. On the examples checked, the model fits much better than Wright's formula, and statistical analysis shows that the model is reliable. The model is then applied to rice breeding, where the prediction accuracy and the main inheritance patterns obtained for each character are satisfactory.

1. INTRODUCTION

To study quantitative characters, geneticists are usually confronted with the situation that data are available only for the parents and the F2 and/or F1 populations, which is the case at the early stage of an experiment. In addition, the statistical procedures for analyzing such data are not efficient. Today, crop breeding relies mainly on experienced breeders, so it takes a long time to obtain a new variety. To improve the efficiency of crop breeding, it is necessary to explore effective methods for analyzing quantitative inheritance.

In studies of quantitative inheritance, an effective factor [1,2] has been employed, which represents the effect of a mass of genes on the same chromosome. Under the earlier definition, the effective factor can only represent the additive effect of one gene. But research on quantitative characters shows that, besides an additive effect, there are a dominance effect and an epistatic effect, which are the interactions of genes within a locus and among loci, respectively. The distribution forms of characters are related to these various effects among genes. To analyze a quantitative character in detail, the dominance and epistatic effects among genes should not be neglected.

In this paper, we extend the concept of the effective factor so that the three main effects of genes are considered. With the effective factor, an inheritance model that describes the distribution of the second filial generation (F2) of a quantitative character is established. The parameters of the model are estimated by curve fitting, and the estimation method is discussed. The model is then compared with Wright's formula [1,2]. To examine the reliability of the model, a statistical analysis is given. Finally, the model is applied to rice breeding, and seven characters are discussed.

2. DISTRIBUTION MODEL OF QUANTITATIVE INHERITANCE

For quantitative characters, two types of random variables are concerned: one is of a discrete nature, associated with the segregation of effective factors; the other is of a continuous nature, associated with random disturbances or environmental effects, which are usually assumed to be normally distributed. In general, the phenotypic value Y of a quantitative character can be expressed as [2]:

$$Y = G + E, \tag{1}$$

where G is the genotypic value of a quantitative character, E is the environmental deviation with a normal distribution $N(0, \sigma^2)$, and G and E act independently.

(This work is supported by the National Natural Science Foundation of China, Grant No. 6864009.)

According to the different action-types of genes, we can divide G into three parts:

$$G = A + D + I, \tag{2}$$

where A denotes the additive value, D denotes the dominance deviation, and I denotes the epistatic deviation. A, D and I are the macro expression of effective factors. We assume that there are N loci segregating independently and that there are only two alleles at each locus, say $A_i$ and $a_i$ for the ith locus. Then, equation (1) can be rewritten as:

$$Y = x_1 + x_2 + \cdots + x_N + E, \tag{3}$$

where $x_i$ is the discrete random variable associated with the segregation of each effective factor, $i = 1, 2, \ldots, N$. Furthermore, we make two additional basic assumptions:

1°. The genotypic values of different loci are identical [2].

2°. The parents ($P_1$ and $P_2$) are pure lines, and the concentration of the genes among them is said to be complete if the genotypes of $P_1$ and $P_2$ are given, respectively, by $a_1a_1a_2a_2 \ldots a_Na_N$ and $A_1A_1A_2A_2 \ldots A_NA_N$ [2].

With these assumptions, we classify the N effective factors into five classes that represent the different action-types of genes, as follows:

$$Y = K_1 x^{(1)} + K_2 x^{(2)} + K_3 x^{(3)} + K_4 x^{(4)} + K_5 x^{(5)} + E, \tag{4}$$

where $K_1$ is the number of effective factors having only an additive effect; $K_2$ is the number of effective factors having both additive and positive dominance effects; $K_3$ is the number of effective factors having both additive and negative dominance effects; $2K_4$ and $2K_5$ denote the numbers of effective factors having an additive effect and a dominant epistatic effect, and an additive effect and a recessive epistatic effect, respectively; and $x^{(i)}$ is the random variable associated with the segregation of an effective factor in class $K_i$, $i = 1, 2, 3, 4, 5$. Then, the genotypic value G can be expressed as:

$$G = K_1 x^{(1)} + K_2 x^{(2)} + K_3 x^{(3)} + K_4 x^{(4)} + K_5 x^{(5)}. \tag{5}$$

Based on the different effect of each kind of effective factor, the probability function of $K_i x^{(i)}$ ($i = 1, 2, 3, 4, 5$) is as follows:

(1) The probability function of $K_1$ effective factors that have only an additive effect. Letting $K_1 = 1$, the distribution table of F2 is:

genotype       aa      Aa      AA
state          x_0     x_1     x_2
state value    0       EFF     2EFF
probability    1/4     1/2     1/4

Written as a probability function:

$$P_{r1}(x_j) = \frac{2!}{j!\,(2-j)!}\left(\frac{1}{2}\right)^{2}, \tag{6}$$

where $j = 0, 1, 2$. Extended to the general case:

$$P_{r1}(x_j) = \frac{(2K_1)!}{j!\,(2K_1-j)!}\left(\frac{1}{2}\right)^{2K_1}, \tag{7}$$

where $j = 0, 1, 2, \ldots, 2K_1$.

(2) The probability function of $K_2$ effective factors that have both additive and positive dominance effects (gene A is dominant). Letting $K_2 = 1$, the distribution table of F2 is:

genotype       aa      (Aa, AA)
state          x_0     x_1
state value    0       EFF
probability    1/4     3/4

The probability function can be written as:

$$P_{r2}(x_j) = \frac{1!}{j!\,(1-j)!}\left(\frac{3}{4}\right)^{j}\left(\frac{1}{4}\right)^{1-j}, \tag{8}$$

where $j = 0, 1$. Extended to the general case:

$$P_{r2}(x_j) = \frac{K_2!}{j!\,(K_2-j)!}\left(\frac{3}{4}\right)^{j}\left(\frac{1}{4}\right)^{K_2-j}, \tag{9}$$

where $j = 0, 1, 2, \ldots, K_2$.

(3) The probability function of $K_3$ effective factors that have both additive and negative dominance effects. Similar to (2), we have:

$$P_{r3}(x_i) = \frac{K_3!}{i!\,(K_3-i)!}\left(\frac{3}{4}\right)^{K_3-i}\left(\frac{1}{4}\right)^{i}, \tag{10}$$

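where $i = 0, 1, 2, \ldots, K_3$.

(4) The probability function of $2K_4$ effective factors that have an additive effect and the dominant epistatic effect (the dominant gene A will cover the effect of genes B and b). Letting $K_4 = 1$, the distribution table of F2 is:

genotype       aabb    aaB_    (A_B_, A_bb)
state          x_0     x_1     x_2
state value    0       EFF     2EFF
probability    1/16    3/16    3/4

The probability function is:

$$P_{r4}(x_i) = \sum_{j_1=0}^{1}\ \sum_{\substack{j_2=0 \\ 2j_1+j_2=i,\ j_1+j_2\le 1}}^{1} \frac{1!}{j_1!\,j_2!\,(1-j_1-j_2)!}\left(\frac{3}{4}\right)^{j_1}\left(\frac{3}{16}\right)^{j_2}\left(\frac{1}{16}\right)^{1-j_1-j_2}, \tag{11}$$

where $i = 0, 1, 2$. In the general case, it has:

$$P_{r4}(x_i) = \sum_{j_1=0}^{K_4}\ \sum_{\substack{j_2=0 \\ 2j_1+j_2=i,\ j_1+j_2\le K_4}}^{K_4} \frac{K_4!}{j_1!\,j_2!\,(K_4-j_1-j_2)!}\left(\frac{3}{4}\right)^{j_1}\left(\frac{3}{16}\right)^{j_2}\left(\frac{1}{16}\right)^{K_4-j_1-j_2}, \tag{12}$$

where $i = 0, 1, 2, \ldots, 2K_4$.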

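(5) The probability function of $2K_5$ effective factors that have an additive effect and the recessive epistatic effect. Similar to (4), we have:

$$P_{r5}(x_i) = \sum_{j_1=0}^{K_5}\ \sum_{\substack{j_2=0 \\ 2j_1+j_2=i,\ j_1+j_2\le K_5}}^{K_5} \frac{K_5!}{j_1!\,j_2!\,(K_5-j_1-j_2)!}\left(\frac{1}{4}\right)^{K_5-j_1-j_2}\left(\frac{3}{16}\right)^{j_2}\left(\frac{9}{16}\right)^{j_1}, \tag{13}$$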

where $i = 0, 1, 2, \ldots, 2K_5$.

With a discrete convolution, the probability function of the genotypic value G will be:

$$P_{rG}(\theta, X_i) = P_{r1}(x_{i_1}) \otimes P_{r2}(x_{i_2}) \otimes P_{r3}(x_{i_3}) \otimes P_{r4}(x_{i_4}) \otimes P_{r5}(x_{i_5}) = \sum_{i_1=0}^{2K_1}\sum_{i_2=0}^{K_2}\sum_{i_3=0}^{K_3}\sum_{i_4=0}^{2K_4}\ \sum_{\substack{i_5=0 \\ i_1+i_2+i_3+i_4+i_5=i}}^{2K_5} P_{r1}(x_{i_1})\,P_{r2}(x_{i_2})\,P_{r3}(x_{i_3})\,P_{r4}(x_{i_4})\,P_{r5}(x_{i_5}), \tag{14}$$

in which each factor is the general-case probability function of the corresponding class given above, and where $i = 0, 1, 2, \ldots, M$; $\otimes$ is the symbol of discrete convolution; $M = 2K_1 + K_2 + K_3 + 2K_4 + 2K_5$; $X_i = \min(p_1, p_2) + |p_1 - p_2| \times i/M$; $p_1$, $p_2$ are the phenotype values of the two parents; and $\theta = [K_1, K_2, K_3, K_4, K_5]^T$ is a parameter vector. Furthermore, the probability density function of G can be written as

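$$f_G(\theta, x) = \sum_{i=0}^{M} P_{rG}(\theta, X_i)\,\delta(x - X_i), \tag{15}$$

where $\delta(\cdot)$ is the Dirac delta function. Then, we can derive the probability density function of Y (which describes the distribution of the F2 population) by means of the convolution,

$$f_{F_2}(\theta, x) = \int_{-\infty}^{\infty} f_G(\theta, t)\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-t)^2/(2\sigma^2)}\,dt. \tag{16}$$

With the properties of the $\delta$ function, we have:

$$f_{F_2}(\theta, x) = \sum_{i=0}^{M} P_{rG}(\theta, X_i)\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-X_i)^2/(2\sigma^2)}. \tag{17}$$

3. PARAMETER ESTIMATION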
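Since the estimation in this section works with the density of model (17), it may help to see how that density could be evaluated numerically. The following Python sketch is illustrative only: all function names are hypothetical, and the state probabilities for the recessive-epistasis class (1/4, 3/16, 9/16) are an assumption based on the classical 9:3:4 ratio rather than values confirmed by the text.

```python
import math


def binomial_pmf(n, p):
    """pmf of Binomial(n, p), indexed by the number of unit steps."""
    return [math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]


def epistatic_pmf(k, p_two, p_one, p_zero):
    """pmf over states 0..2k for k locus pairs; each pair adds 2, 1 or 0 steps."""
    pmf = [0.0] * (2 * k + 1)
    for j1 in range(k + 1):
        for j2 in range(k - j1 + 1):
            coef = math.factorial(k) // (
                math.factorial(j1) * math.factorial(j2) * math.factorial(k - j1 - j2)
            )
            pmf[2 * j1 + j2] += coef * p_two**j1 * p_one**j2 * p_zero ** (k - j1 - j2)
    return pmf


def convolve(a, b):
    """Discrete convolution of two pmfs given as lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def genotype_pmf(theta):
    """Probability function of G over states 0..M for theta = (K1, ..., K5)."""
    k1, k2, k3, k4, k5 = theta
    parts = [
        binomial_pmf(2 * k1, 0.5),                # additive only
        binomial_pmf(k2, 0.75),                   # positive dominance
        binomial_pmf(k3, 0.25),                   # negative dominance
        epistatic_pmf(k4, 3 / 4, 3 / 16, 1 / 16),  # dominant epistasis
        epistatic_pmf(k5, 9 / 16, 3 / 16, 1 / 4),  # recessive epistasis (assumed 9:3:4)
    ]
    pmf = [1.0]
    for part in parts:
        pmf = convolve(pmf, part)
    return pmf


def f2_density(x, theta, p1, p2, sigma):
    """Mixture-of-normals density of the F2 population at x."""
    pmf = genotype_pmf(theta)
    m = len(pmf) - 1
    lo, span = min(p1, p2), abs(p1 - p2)
    total = 0.0
    for i, weight in enumerate(pmf):
        x_i = lo + span * i / m if m > 0 else lo
        total += weight * math.exp(-((x - x_i) ** 2) / (2 * sigma**2)) / (
            math.sqrt(2 * math.pi) * sigma
        )
    return total
```

Building the genotype distribution by repeated convolution keeps each class separate, so a different assumption about one class only changes one line of `genotype_pmf`.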

If all parameters $\{K_1, K_2, K_3, K_4, K_5, \sigma^2, p_1, p_2\}$ in the model (17) are known, it can be used to analyze the genetic process of crossbreeding. When the parameters are unknown, they can be estimated. Before identifying $\theta$, let us first estimate the character values of the parents and the environmental variance based on the populations of $P_1$, $P_2$ and $F_1$:

$$\hat{p}_1 = \frac{1}{N_{P_1}} \sum_{i=1}^{N_{P_1}} X_{i,P_1}, \tag{18-a}$$

$$\hat{p}_2 = \frac{1}{N_{P_2}} \sum_{j=1}^{N_{P_2}} X_{j,P_2}, \tag{18-b}$$

$$\hat{\sigma}^2 = \frac{1}{3}\left[\frac{1}{N_{P_1}-1}\sum_{i=1}^{N_{P_1}} \left(X_{i,P_1} - \hat{p}_1\right)^2 + \frac{1}{N_{P_2}-1}\sum_{j=1}^{N_{P_2}} \left(X_{j,P_2} - \hat{p}_2\right)^2 + \frac{1}{N_{F_1}-1}\sum_{k=1}^{N_{F_1}} \left(X_{k,F_1} - \bar{X}_{F_1}\right)^2\right], \tag{18-c}$$

where $N_{P_1}$, $N_{P_2}$, $N_{F_1}$ are the numbers of observations of the parents ($P_1$, $P_2$) and the $F_1$ population, respectively; $X_{i,P_1}$, $X_{j,P_2}$ and $X_{k,F_1}$ are the ith, jth and kth observed values of the parents and the $F_1$ population, respectively; and $\bar{X}_{F_1}$ is the sample mean of the $F_1$ population.

To identify the parameters of the model, that is, to estimate $K_1$, $K_2$, $K_3$, $K_4$ and $K_5$ in equation (17), we compare the experimental distribution with model (17) and choose the set of parameters that makes the two distributions as close as possible. For curve fitting, it is necessary to define a loss function, $V(\theta, x)$, which measures the difference between the model and the observed population distributions, as follows:

$$V(\theta, x) = \int_{-\infty}^{\infty} \left[f_{F_2}(\theta, x) - f^*_{F_2}(x)\right]^2 dx, \tag{19}$$

where $f_{F_2}(\cdot)$ and $f^*_{F_2}(\cdot)$ are the theoretical and experimental probability density functions, respectively. The parameter vector is determined by:

$$V(\hat{\theta}, x) = \min_{\theta} V(\theta, x). \tag{20}$$

To simplify the computation, equation (20) can be approximated as:

$$V(\theta, x) \approx \sum_{j=1}^{N_T} \left[\frac{F_{F_2}(\theta, x_j) - F_{F_2}(\theta, x_{j-1})}{\Delta x_j} - \frac{F^*_{F_2}(x_j) - F^*_{F_2}(x_{j-1})}{\Delta x_j}\right]^2 \Delta x_j, \tag{21}$$

where

$$F_{F_2}(\theta, x) = \int_{-\infty}^{x} f_{F_2}(\theta, y)\,dy, \qquad F^*_{F_2}(x) = \int_{-\infty}^{x} f^*_{F_2}(y)\,dy,$$

$N_T$ is the number of intervals, and $\Delta x_j = x_j - x_{j-1}$ is the length of the j-th interval. By probability theory, equation (21) will approach equation (20) as $N_T \to \infty$. Then, we give two statistical properties.

PROPERTY 1. $\hat{p}_1$, $\hat{p}_2$ and $\hat{\sigma}^2$ derived from equation (18) are unbiased and consistent estimates.

PROPERTY 2. The experimental distribution $F^*_{F_2}(x)$ will approach the true distribution as $N_{F_2} \to \infty$.

Based on these two properties, the following theorem is naturally true.

THEOREM. Suppose that a quantitative character can be described by equation (4), assumptions 1° and 2° are satisfied and the number of genes N is finite. Then:
(1) its F2 distribution density function is determined by equation (17) theoretically;
(2) there exists a set of parameters $\hat{\theta}$ which satisfy equation (20);
(3) the estimated values, $\hat{\theta}$, will approach the real parameters as $N_{P_1} \to \infty$, $N_{P_2} \to \infty$, $N_{F_1} \to \infty$ and $N_{F_2} \to \infty$.

The conclusions of the theorem are clear, so the proof is omitted (the details can be seen in [3]).
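The identification step can be illustrated with a small Python sketch that minimises a discretised loss in the spirit of equation (21). Several things here are assumptions rather than the authors' procedure: the exhaustive search over a small grid of $(K_1, \ldots, K_5)$ values, the bound `k_max`, all function names, and the recessive-epistasis state probabilities (1/4, 3/16, 9/16), which are taken from the classical 9:3:4 ratio.

```python
import math
from itertools import product

# State probabilities (0, 1 or 2 unit steps) contributed by one unit of each class.
UNIT_PMFS = [
    [1 / 4, 1 / 2, 1 / 4],     # class 1: additive only
    [1 / 4, 3 / 4],            # class 2: positive dominance
    [3 / 4, 1 / 4],            # class 3: negative dominance
    [1 / 16, 3 / 16, 3 / 4],   # class 4: dominant epistasis (one locus pair)
    [1 / 4, 3 / 16, 9 / 16],   # class 5: recessive epistasis (assumed 9:3:4)
]


def convolve(a, b):
    """Discrete convolution of two pmfs given as lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def genotype_pmf(theta):
    """Probability function of G, built by convolving K_i copies of each unit pmf."""
    pmf = [1.0]
    for k, unit in zip(theta, UNIT_PMFS):
        for _ in range(k):
            pmf = convolve(pmf, unit)
    return pmf


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_probs(theta, edges, p1, p2, sigma):
    """Model probability of each interval [edges[j], edges[j+1]]."""
    pmf = genotype_pmf(theta)
    m = len(pmf) - 1
    lo, span = min(p1, p2), abs(p1 - p2)
    centers = [lo + span * i / m if m else lo for i in range(m + 1)]
    cdf = [
        sum(w * normal_cdf((e - c) / sigma) for w, c in zip(pmf, centers))
        for e in edges
    ]
    return [cdf[j + 1] - cdf[j] for j in range(len(edges) - 1)]


def loss(theta, edges, observed, p1, p2, sigma):
    """Sum over intervals of the squared density difference times interval length."""
    model = interval_probs(theta, edges, p1, p2, sigma)
    total = 0.0
    for j, (f_model, f_obs) in enumerate(zip(model, observed)):
        dx = edges[j + 1] - edges[j]
        total += ((f_model - f_obs) / dx) ** 2 * dx
    return total


def identify(edges, observed, p1, p2, sigma, k_max=3):
    """Exhaustive search for the theta in {0..k_max}^5 with the smallest loss."""
    grid = product(range(k_max + 1), repeat=5)
    return min(grid, key=lambda th: loss(th, edges, observed, p1, p2, sigma))
```

Here `observed` holds the relative frequencies of the F2 sample in the intervals defined by `edges`, while `p1`, `p2` and `sigma` would come from the estimates of equation (18). Because the $K_i$ are small non-negative integers, a brute-force search is feasible for modest `k_max`; a larger search space would need a smarter strategy.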

4. EXAMPLES AND STATISTICAL ANALYSIS

To compare with Wright's formula, five examples are checked. From Table 1, it can be seen that the result by the model is better than that of Wright's formula.

Table 1. Comparison of estimation based on formula (17) with Wright's formula.

[Table 1 body not recoverable from the source. The surviving fragments cite Goodwin [5], Liu [6] (lycopene content) and Liu [7] (length of corn ear) among the examples, and pair each model result with the bracketed result of Wright's formula, for instance 0.00988 (0.02574), 0.00395 (0.01805) and 0.00136 (0.01793).]

* Data in brackets are the results of Wright's formula estimation.

To examine the model quantitatively, Pearson's chi-square test is employed. The statistic is

$$\chi^2 = n \sum_{j=1}^{L} \frac{\left(f_j - \hat{f}_j\right)^2}{\hat{f}_j}, \tag{22}$$

where $f_j$ is the observed frequency, $\hat{f}_j$ is the predicted frequency calculated by equation (23), L is the number of intervals, and n is the number of observations of the F2 population.

$$\hat{f}_j = \int_{a_j}^{b_j} f_{F_2}(\hat{\theta}, x)\,dx, \tag{23}$$

where $a_j$, $b_j$ are the lower and upper bounds of the interval $(a_j, b_j)$. In Table 2, the results of statistical analysis are given. $\chi^2_f(\alpha)$ is the upper $100\alpha\%$ point of the chi-square distribution with f degrees of freedom.

Table 2. Results of statistical analysis.

        Model (17)                                  Wright's formula
No.     degrees of    chi^2     chi^2_f(0.05)       degrees of    chi^2      chi^2_f(0.05)
        freedom f                                   freedom f
1       8             6.736     15.51               12            88.998     21.03
2       12            15.764    21.03               16            34.886     26.30
3       11            18.369    19.68               15            39.160     25.00
4       1             3.252     3.84                6             18.744     11.07
5       6             10.306    12.59               10            114.27     18.31

From Table 2, for Wright's formula, the $\chi^2$ values are larger than the $\chi^2_f(0.05)$ values in all cases, which casts doubt on the validity of some of the assumptions in the formula. The $\chi^2$ values based on the new model are smaller than the $\chi^2_f(0.05)$ values in all cases, which indicates that the model is reasonable and reliable.
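The comparison in Table 2 can be reproduced mechanically. The Python sketch below is a hedged illustration: the function names are hypothetical, the statistic assumes that observed and predicted frequencies are relative frequencies scaled by the sample size n, and the numbers are copied from Table 2 as printed.

```python
def pearson_chi_square(observed_freq, predicted_freq, n):
    """Pearson statistic for relative frequencies, scaled by the sample size n."""
    return n * sum(
        (obs - pred) ** 2 / pred for obs, pred in zip(observed_freq, predicted_freq)
    )


# Rows of Table 2: (No., f, chi2, critical) for the model, then for Wright's formula.
TABLE_2 = [
    (1, 8, 6.736, 15.51, 12, 88.998, 21.03),
    (2, 12, 15.764, 21.03, 16, 34.886, 26.30),
    (3, 11, 18.369, 19.68, 15, 39.160, 25.00),
    (4, 1, 3.252, 3.84, 6, 18.744, 11.07),
    (5, 6, 10.306, 12.59, 10, 114.27, 18.31),
]


def not_rejected(chi2, critical):
    """True when the statistic stays below the upper 5% point."""
    return chi2 < critical


model_ok = [not_rejected(row[2], row[3]) for row in TABLE_2]
wright_ok = [not_rejected(row[5], row[6]) for row in TABLE_2]
```

With these values, every entry of `model_ok` is true and every entry of `wright_ok` is false, which matches the reading of Table 2 given in the text.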

5. APPLICATION

In this section the new model is applied to rice breeding. Seven characters are discussed. The data of each cross-combination consist of the P1, P2, F1 and F2 populations, which were provided by the Institute of Crop Breeding and Cultivation of the Chinese Academy of Agricultural Science in Beijing. By the identification method described previously, we obtain the number of effective factors of each character, listed in Table 3. As we cannot verify whether the parents are two extreme lines, the numbers of effective factors in Table 3 are minimum numbers.

Table 3. The number of effective factors of seven main characters of rice.

Character                  K1    K2    K3    K4    K5
panicle no.                0     2     3     0     0
panicle length             0     2     1     0     0
weight per 1000 grain      1     1     1     2     0
grain no. per panicle      0     1     2     1     0
plant height               1     1     2     3     0
grain length               2     0     0     1     0
grain width                1     2     1     2     1

Figure 1. The predicted and observed frequencies. (a) Tigin/Lijiangxintuanheigu, length of grain (mm); (b) Tigin/Lijiangxintuanheigu, plant height (cm). Vertical axis: frequency; separate markers show predicted and observed values.

After the parameters of each character are estimated, we use equation (23) to predict the F2 distribution from the information on the parents. Four cross-combinations are checked. Substituting the parameters in Table 3 into formula (23), we obtain the predicted frequencies of each of the populations, as given in Figure 1 (because of space limitations, only two cases are presented here). From Figure 1, it can be seen that the observed and predicted frequencies are very close. The fit in all cases is very good over the main parts of the distribution.

6. CONCLUSIONS

In this paper, an inheritance model of quantitative characters and its parameter identification method are presented. By comparing some examples, we find that the new model is much better than Wright's formula, and statistical analysis shows that it is reliable. It is concluded that the main patterns in the inheritance of a quantitative character can be described by the three effects of genes considered in this paper. This provides a solid theoretical basis for analyzing the inheritance of quantitative characters. Finally, the model is applied to rice breeding, and the results of predicting the F2 population of each character based on the distribution model are satisfactory.

REFERENCES

1. S. Wright, The results of crosses between inbred strains of guinea pigs, Genetics 19, 537-551 (1934).
2. K. Mather and J.L. Jinks, Biometrical Genetics, 3rd Ed., London, (1982).
3. Chen Jian, The system analysis and control of genetic process in crop breeding, Ph.D. Dissertation, Tsinghua University, Beijing, (1989).
4. K. Shkudo, Studies of the quantitative inheritance (5), Studies on Breeding 4, 13-32 (1950).
5. R.H. Goodwin, The inheritance of flowering time in short-day species Solidago sempervirens L., Genetics 29, 503-519 (1944).
6. Liu Jinsheng, Inheritance of lycopene content in fruit of tomato, Hereditas (Beijing) 8 (2), 9-12 (1986).
7. Liu Zhutong, Genetics, Chinese Higher Education Press, Beijing, (1979).