THEORETICAL POPULATION BIOLOGY 3, 249-257 (1972)

Linearization of Crossing-Over and Mutation in a Finite Random-Mating Population

DANIEL SERANT AND MICHEL VILLARD

U.E.R. de Mathématiques, Université de Lyon I, Lyon, France

Received April 26, 1971
With a haploid model equivalent to the two-locus random-mating diploid stochastic model, the recurrence equations for joint moments of gametic frequencies are nonlinear owing to linkage; it is shown that, in some cases, these can be linearized. The method is then applied to the study of crossing-over and of crossing-over with mutation.
A. THE MODEL

We are concerned with a diploid population that we assume random mating, isolated, monoecious, of fixed size $N$ and with noninterbreeding generations. We propose to study the joint evolution of two loci with two alleles each, respectively, $A$ or $a$ and $B$ or $b$; the four gametic genotypes are numbered:

$$E_1 : (AB), \qquad E_2 : (Ab), \qquad E_3 : (ab), \qquad E_4 : (aB).$$
The zygotic genotypes being denoted $(E_i, E_j)$, it is convenient to distinguish conventionally $(E_i, E_j)$ and $(E_j, E_i)$ for $i \neq j$ and to consider the corresponding frequencies $q_{ij}$ and $q_{ji}$ as equal. We use a haploid model where the $(n+1)$-th gametic generation $G(n+1)$ is obtained from $G(n)$ in the following way:

(i) A finite diploid population $D(n)$ of $N$ zygotes is formed by random pairing of the $2N$ gametes in $G(n)$;

(ii) An infinite gametic urn $R(n)$ is issued from $D(n)$ taking account of deterministic evolutionary pressures (crossing-over, mutations, etc.);

(iii) Finally, $G(n+1)$ is obtained by random drawing of $2N$ gametes from $R(n)$.
Although haploid, this model allows the introduction of diploid factors in the zygotic phase $D(n)$ and is, in fact, equivalent to the random-mating diploid model (see [10, 12]).
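The three steps (i)-(iii) can be sketched as a small simulation. This is only an illustration for the case where crossing-over is the sole evolutionary pressure; the function name `next_generation`, the tuple encoding of gametes and the use of Python's `random` module are assumptions of the sketch, not part of the original model description.

```python
import random

def next_generation(gametes, r, rng):
    """One cycle G(n) -> D(n) -> R(n) -> G(n+1) with crossing-over only.

    gametes: list of 2N tuples (allele at locus A, allele at locus B).
    r: recombination fraction; rng: a random.Random instance.
    """
    pool = list(gametes)
    rng.shuffle(pool)                         # (i) random pairing into N zygotes
    zygotes = [(pool[k], pool[k + 1]) for k in range(0, len(pool), 2)]
    offspring = []
    for _ in range(len(gametes)):             # (iii) 2N independent draws from the urn
        g1, g2 = rng.choice(zygotes)          # each zygote feeds the urn equally
        if rng.random() < 0.5:                # which parental gamete supplies locus A
            g1, g2 = g2, g1
        if rng.random() < r:                  # (ii) crossing-over between the loci
            offspring.append((g1[0], g2[1]))
        else:
            offspring.append(g1)
    return offspring

rng = random.Random(1)
g0 = [("A", "B")] * 6 + [("a", "b")] * 6      # 2N = 12 gametes
g1 = next_generation(g0, 0.25, rng)
```

Drawing a zygote uniformly and then one of its gametes is used here as a stand-in for sampling from the infinite urn $R(n)$; with $r = 0$ only the parental gamete types can appear in the next generation.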
Copyright © 1972 by Academic Press, Inc. All rights of reproduction in any form reserved.
This model, or a similar one, has been widely used, in particular by Hill and Robertson, Karlin, Karlin and McGregor, Kimura, Ohta and Kimura, Watterson, etc.
B. LINEARIZATION OF CROSSING-OVER

Let $D(n)$ be known by the zygotic frequencies $q_{ij}(n)$; if we assume that crossing-over is the only diploid factor (which excludes in particular zygotic selection), the gametic output (considered of infinite size) of $D(n)$ immediately after gametogenesis is described by the gametic frequencies $p_i'(n)$ which are, by standard formulas,

$$p_i'(n) = q_i(n) + (-1)^i \, r \, d(n), \qquad i = 1, \ldots, 4, \tag{1}$$

where $r$ denotes the recombination fraction, $q_i = \sum_j q_{ij}$ are the gametic frequencies in $G(n)$ or $D(n)$, and $d = q_{13} - q_{24}$ is the difference of double heterozygote frequencies.
This gametic output is then modified by the other deterministic (haploid) factors, to form the infinite gametic urn $R(n)$; the gametic frequencies $p_i(n)$ in $R(n)$ are of the type

$$p_i(n) = f_i(\ldots, p_j'(n), \ldots), \qquad i = 1, \ldots, 4, \tag{2}$$
with $f_i$ given functions introduced by the haploid factors. Owing to linkage, moments of order $p$ of gametic frequencies in $G(n+1)$ are dependent on moments of higher order in $G(n)$. We wish to show that this can sometimes be overcome and that all joint moments of gametic frequencies can be calculated. This is a systematic extension of properties appearing in [3] for a slightly different model (see especially IV-5).

We first change variables; to describe $G(n)$, we use, instead of $q_i(n)$, $i = 1, \ldots, 4$, the variables

$$q_A(n) = q_1(n) + q_2(n), \qquad q_B(n) = q_1(n) + q_4(n),$$

and

$$l(n) = q_1(n)\, q_3(n) - q_2(n)\, q_4(n) = q_1(n) - q_A(n)\, q_B(n). \tag{3}$$
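As a quick numerical illustration of (1) and (3), the following sketch checks that the gametic output keeps the allele frequencies unchanged and shifts the linkage disequilibrium from $l$ to $l - rd$. The function names and the numerical values of $q$, $d$ and $r$ are hypothetical choices made for the example only.

```python
def gametic_output(q, d, r):
    """Equation (1): p'_i = q_i + (-1)^i r d, with q = (q1, q2, q3, q4)."""
    return tuple(qi + (-1) ** i * r * d for i, qi in enumerate(q, start=1))

def disequilibrium(q):
    """l = q1*q3 - q2*q4 for gametes numbered AB, Ab, ab, aB."""
    q1, q2, q3, q4 = q
    return q1 * q3 - q2 * q4

q = (0.4, 0.1, 0.3, 0.2)     # illustrative gametic frequencies
d, r = 0.05, 0.25            # illustrative value of q13 - q24 and recombination fraction
p = gametic_output(q, d, r)

q_A, q_B = q[0] + q[1], q[0] + q[3]
print(disequilibrium(q), q[0] - q_A * q_B)            # the two forms of l in (3)
print(disequilibrium(p), disequilibrium(q) - r * d)   # l after gametogenesis
```

Both printed pairs agree up to rounding, which is the algebraic fact used later when the composition of $R(n)$ is written in terms of $l - rd$.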
In the same way, we use $p_A$, $p_B$ and $e$, instead of $p_i$, to describe $R$.

We next replace the notion of "order" of a joint moment by that of "rank": we call rank of $E(q_A^\alpha q_B^\beta l^\gamma)$ or $E(q_A^\alpha q_B^\beta q_1^\gamma)$ or $E(p_A^\alpha p_B^\beta e^\gamma)$ or $E(p_A^\alpha p_B^\beta p_1^\gamma)$ the integer $\rho = \alpha + \beta + 2\gamma$. Similarly, we call monomial of rank $\rho$ in $q_A$, $q_B$ and $l$ (or $q_1$) or in $p_A$, $p_B$ and $e$ (or $p_1$) any expression $k q_A^\alpha q_B^\beta l^\gamma$ or $k q_A^\alpha q_B^\beta q_1^\gamma$ or $k p_A^\alpha p_B^\beta e^\gamma$ or $k p_A^\alpha p_B^\beta p_1^\gamma$ with $\alpha + \beta + 2\gamma = \rho$, and polynomial of rank $\rho$ any polynomial such that its highest rank monomial is of rank $\rho$.

Although $G(n)$ is completely known when $D(n)$ is given, the converse is not true; $D(n)$ is dependent not only on $G(n)$, but also on the outcome of the random pairing process. We thus use a two-step calculation of moments by repeated use of the formula for conditional expectations
$$E(X(n+1)) = E\big(E_{G(n)}(E_{D(n)}(X(n+1)))\big).$$

It is clearly equivalent to calculate joint moments of any order of the $q_i$ or joint moments of any rank of $q_A$, $q_B$ and $l$. The linearization of crossing-over is then the result of the two following lemmas (the proofs are detailed in [10]):

LEMMA 1. Joint moments of rank $\rho$, conditioned by $D(n)$, of $q_A(n+1)$, $q_B(n+1)$ and $l(n+1)$ are polynomials of rank $\rho$ in $p_A(n)$, $p_B(n)$ and $p_1(n)$ (or $e(n)$).

LEMMA 2. $E_{G(n)}(d(n)^p)$ is a polynomial of rank $2p$ in $q_A(n)$, $q_B(n)$ and $q_1(n)$ (or $l(n)$).

By (1) and (2) the composition of $R(n)$ is

$$p_A = f_A(q_A, q_B, l - rd), \qquad p_B = f_B(q_A, q_B, l - rd), \qquad e = f(q_A, q_B, l - rd). \tag{4}$$
Hypothesis H. We now assume that the gametic factors do not increase rank, i.e., that $f_A$, $f_B$ and $f$ are polynomials in $q_A$, $q_B$ and $l - rd$ of respective rank at most 1, 1 and 2.

This hypothesis is, in particular, satisfied in the cases of crossing-over alone and crossing-over with mutations which we study in the next section.

Since $E_{D(n)}(q_A^\alpha q_B^\beta l^\gamma (n+1))$ is (by Lemma 1) a polynomial of rank $\alpha + \beta + 2\gamma$ in $p_A(n)$, $p_B(n)$, $e(n)$, and, therefore (hypothesis H), a polynomial of rank $\alpha + \beta + 2\gamma$ in $q_A(n)$, $q_B(n)$, $l(n) - rd(n)$,

$$E_{G(n)}(q_A^\alpha q_B^\beta l^\gamma (n+1)) = E_{G(n)}\big(E_{D(n)}(q_A^\alpha q_B^\beta l^\gamma (n+1))\big)$$

is a polynomial in $q_A(n)$, $q_B(n)$ and $l(n)$ of rank $\alpha + \beta + 2\gamma$. As a result, $E(q_A^\alpha q_B^\beta l^\gamma (n+1))$ is a linear combination of joint moments of rank at most $\alpha + \beta + 2\gamma$ of $q_A(n)$, $q_B(n)$ and $l(n)$.

Therefore, subject to hypothesis H, it is possible to calculate, from rank to rank, by linear recurrence relations, all joint moments of the gametic frequencies $q_i(n)$.
C. APPLICATION TO CROSSING-OVER AND MUTATION
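We introduce recurrent mutation, acting independently on the two genes of a gamete, with rates

$$A \underset{a_2}{\overset{a_1}{\rightleftarrows}} a, \qquad B \underset{b_2}{\overset{b_1}{\rightleftarrows}} b,$$

and we set

$$a = a_1 + a_2, \qquad b = b_1 + b_2.$$

In the present case, (4) is

$$p_A = (1 - a)\, q_A + a_2, \qquad p_B = (1 - b)\, q_B + b_2, \qquad e = (1 - a)(1 - b)(l - rd).$$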
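The form of (4) under mutation can be checked numerically. The sketch below applies independent mutation at the two loci to a vector of four gametic frequencies and confirms that the frequency of $A$ becomes $(1-a)q_A + a_2$ while the linkage disequilibrium is multiplied by $(1-a)(1-b)$. The function name `mutate` and the numerical rates are assumptions made for illustration.

```python
def mutate(p, a1, a2, b1, b2):
    """Independent mutation A->a (a1), a->A (a2), B->b (b1), b->B (b2).

    p = frequencies of (AB, Ab, ab, aB) before mutation; returns them after.
    """
    stay_A, to_A = 1 - a1, a2                    # P(ends as A | starts as A), (| starts as a)
    stay_B, to_B = 1 - b1, b2
    alleles = [(1, 1), (1, 0), (0, 0), (0, 1)]   # (carries A, carries B) for E1..E4
    out = []
    for t_A, t_B in alleles:                     # target gamete type
        total = 0.0
        for (s_A, s_B), freq in zip(alleles, p): # source gamete type
            pr_A = stay_A if s_A else to_A       # P(source ends with allele A)
            pr_B = stay_B if s_B else to_B
            total += freq * (pr_A if t_A else 1 - pr_A) * (pr_B if t_B else 1 - pr_B)
        out.append(total)
    return tuple(out)

before = (0.3875, 0.1125, 0.2875, 0.2125)        # illustrative frequencies after gametogenesis
after = mutate(before, a1=0.01, a2=0.02, b1=0.03, b2=0.005)
```

Because each locus mutates independently of the other, the covariance between the two allelic indicators is simply rescaled, which is why the rank of $e$ is preserved.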
We observe that the rank is preserved, so that we can linearize crossing-over; furthermore, haploid factors act independently on the genes of a gamete, and the method developed in the preceding section gives, for marginal moments of $q_A$ and $q_B$, the standard equations and results of one-locus theory. We use the notations:
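$$\bar{X}(n) = E(X(n)), \qquad \bar{X} = \lim_{n \to \infty} \bar{X}(n),$$

$$Q_2 = \begin{bmatrix} q_A q_B \\ l \end{bmatrix}, \qquad Q_3 = \begin{bmatrix} q_A q_B (1 - q_A) \\ l\,(q_A - \tfrac{1}{2}) \end{bmatrix}, \qquad Q_4 = \begin{bmatrix} q_A (1 - q_A)\, q_B (1 - q_B) \\ l\,(q_A - \tfrac{1}{2})(q_B - \tfrac{1}{2}) \\ l^2 \end{bmatrix},$$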
and $P_2$, $P_3$, $P_4$ for the similar expressions with $p_A$, $p_B$, and $e$. These quantities are interesting by themselves since they can be interpreted in terms of various probabilities and rates of fixation (see [3-5, 11] for this interpretation and the conclusions that can be drawn); they also give neater formulas and their expectations are sufficient (with the results of one-locus theory) to determine means, variances and covariances of gametic frequencies $q_i$.

By taking expectations over the multinomial drawing, we get (with $\alpha = 1/2N$)

$$E_{D(n)} Q_2(n+1) = \begin{bmatrix} 1 & \alpha \\ 0 & 1 - \alpha \end{bmatrix} P_2(n),$$

$$E_{D(n)} Q_3(n+1) = (1 - \alpha) \begin{bmatrix} 1 & -2\alpha \\ 0 & 1 - 2\alpha \end{bmatrix} P_3(n),$$

$$E_{D(n)} Q_4(n+1) = (1 - \alpha)\, A \cdot P_4(n),$$

where

$$A = \begin{bmatrix} 1 - \alpha & 4\alpha(1 - \alpha) & 2\alpha^2 \\ 0 & (1 - 2\alpha)^2 & \alpha(1 - 2\alpha) \\ \alpha & 4\alpha(1 - \alpha) & 1 - 2\alpha + 2\alpha^2 \end{bmatrix}.$$
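The expectations over the multinomial drawing can be checked by exhaustive enumeration for a very small urn sample. The sketch below draws $M = 2N$ gametes from an urn with frequencies $p$ and compares exact expectations with three relations: $E(q_A q_B) = p_A p_B + \alpha e$, $E(l) = (1-\alpha)e$, and $E(l^2) = (1-\alpha)[\alpha X + 4\alpha(1-\alpha)Y + (1 - 2\alpha + 2\alpha^2)Z]$, where $X$, $Y$, $Z$ denote the three components of $P_4$. These relations, the function name and the test frequencies are stated here as assumptions of the sketch; it is a consistency check, not a derivation.

```python
from itertools import product

def sampling_moments(p, M):
    """Exact E[q_A q_B], E[l] and E[l^2] after drawing M gametes from urn p.

    Enumerates all 4**M ordered draws, so M must stay small.
    """
    e_qaqb = e_l = e_l2 = 0.0
    for draw in product(range(4), repeat=M):
        prob = 1.0
        counts = [0, 0, 0, 0]
        for g in draw:
            prob *= p[g]
            counts[g] += 1
        q1, q2, q3, q4 = (c / M for c in counts)
        l = q1 * q3 - q2 * q4
        e_qaqb += prob * (q1 + q2) * (q1 + q4)
        e_l += prob * l
        e_l2 += prob * l * l
    return e_qaqb, e_l, e_l2

p = (0.5, 0.2, 0.2, 0.1)            # illustrative urn frequencies (AB, Ab, ab, aB)
M = 4                               # 2N = 4 gametes
alpha = 1 / M
pA, pB = p[0] + p[1], p[0] + p[3]
e = p[0] * p[2] - p[1] * p[3]
X = pA * (1 - pA) * pB * (1 - pB)
Y = e * (pA - 0.5) * (pB - 0.5)
Z = e * e
exact = sampling_moments(p, M)
predicted = (pA * pB + alpha * e,
             (1 - alpha) * e,
             (1 - alpha) * (alpha * X + 4 * alpha * (1 - alpha) * Y
                            + (1 - 2 * alpha + 2 * alpha ** 2) * Z))
```

The enumeration cost grows as $4^M$, so this is only practical for a handful of gametes; it is meant as a sanity check on the coefficients rather than as a computational method.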
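Crossing-over alone

Without mutation, $p_A = q_A$, $p_B = q_B$, $e = l - rd$; therefore (with $\mu = r/(1 - \alpha)$)

$$E_{G(n)} P_i(n) = \begin{bmatrix} 1 & 0 \\ 0 & 1 - \mu \end{bmatrix} Q_i(n) \quad (i = 2, 3), \qquad E_{G(n)} P_4(n) = B \cdot Q_4(n),$$

where

$$B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 - \mu & 0 \\ \alpha\lambda(1 - 4\alpha) & -8\alpha^2\lambda & 1 - 2\mu + \lambda(1 - \alpha - 4\alpha^2) \end{bmatrix}$$

and, for a priori expectations,

$$\bar{Q}_4(n+1) = (1 - \alpha)\, A \cdot B \cdot \bar{Q}_4(n).$$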
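The effect of the random pairing step on $d = q_{13} - q_{24}$ can be checked exactly on a tiny population by averaging over every ordering of the gametes. The sketch compares the enumeration with $E_G(d) = l/(1-\alpha)$ and with a closed form for $E_G(d^2)$ that is a polynomial of rank 4, as Lemma 2 requires. The closed form for $E_G(d^2)$ was worked out for this illustration and is not quoted from the paper; it corresponds to taking $\lambda = r^2/((1-\alpha)(1-3\alpha))$ in the last row of $B$, which is an assumption.

```python
from itertools import permutations
from fractions import Fraction

def pairing_moments(gametes):
    """Exact E[d] and E[d^2] over random pairings of the gamete list.

    gametes: type indices 1..4 (AB, Ab, ab, aB), even length M = 2N.
    Averages over all M! orderings, pairing positions (0,1), (2,3), ...
    """
    M = len(gametes)
    total_d = total_d2 = Fraction(0)
    count = 0
    for order in permutations(gametes):
        n13 = n24 = 0
        for k in range(0, M, 2):
            pair = {order[k], order[k + 1]}
            n13 += pair == {1, 3}
            n24 += pair == {2, 4}
        d = Fraction(n13 - n24, M)           # q13 - q24 with q_ij = q_ji
        total_d += d
        total_d2 += d * d
        count += 1
    return total_d / count, total_d2 / count

def predicted_moments(gametes):
    """Assumed closed forms for E[d] and E[d^2] in terms of q_A, q_B and l."""
    M = len(gametes)
    q = [Fraction(gametes.count(t), M) for t in (1, 2, 3, 4)]
    qA, qB = q[0] + q[1], q[0] + q[3]
    l = q[0] * q[2] - q[1] * q[3]
    alpha = Fraction(1, M)
    X = qA * (1 - qA) * qB * (1 - qB)
    Y = l * (qA - Fraction(1, 2)) * (qB - Fraction(1, 2))
    Z = l * l
    mean_d = l / (1 - alpha)
    mean_d2 = (alpha * (1 - 4 * alpha) * X - 8 * alpha ** 2 * Y
               + (1 - alpha - 4 * alpha ** 2) * Z) / ((1 - alpha) * (1 - 3 * alpha))
    return mean_d, mean_d2

sample = [1, 1, 3, 3, 2, 4]                  # 2N = 6 gametes, illustrative composition
exact = pairing_moments(sample)
guess = predicted_moments(sample)
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point tolerance; the enumeration visits $M!$ orderings and is therefore limited to very small $M$.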
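For rank two moments, we have

$$\bar{l}(n) = l(0)\,\big((1 - \alpha)(1 - \mu)\big)^n;$$

thus,

$$\bar{q}_i(n) = q_i(0) + (-1)^i \mu\, l(0)\, \frac{1 - \big((1 - \alpha)(1 - \mu)\big)^n}{1 - (1 - \alpha)(1 - \mu)},$$

so that the means of gametic frequencies converge as $\big((1 - \alpha)(1 - \mu)\big)^n = (1 - r - 1/2N)^n$ to the absorption probabilities

$$\pi_i = \bar{q}_i = q_i(0) + (-1)^i \frac{\mu\, l(0)}{1 - (1 - \alpha)(1 - \mu)}.$$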
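The closed form for the means can be compared with a direct iteration of the two first-order recurrences, $\bar{q}_i(n+1) = \bar{q}_i(n) + (-1)^i \mu\, \bar{l}(n)$ and $\bar{l}(n+1) = (1-\alpha)(1-\mu)\,\bar{l}(n)$. The function names and the numerical values below are illustrative assumptions.

```python
def mean_frequency(q0, l0, r, N, n, i):
    """Iterate the mean recurrences n times for gamete type i (1..4)."""
    alpha = 1 / (2 * N)
    mu = r / (1 - alpha)
    q_bar, l_bar = q0, l0
    for _ in range(n):
        q_bar += (-1) ** i * mu * l_bar      # mean of q_i moves by (-1)^i mu E[l]
        l_bar *= (1 - alpha) * (1 - mu)      # mean disequilibrium decays geometrically
    return q_bar

def absorption_probability(q0, l0, r, N, i):
    """Limit of the mean of q_i, i.e. the fixation probability of gamete i."""
    alpha = 1 / (2 * N)
    mu = r / (1 - alpha)
    return q0 + (-1) ** i * mu * l0 / (1 - (1 - alpha) * (1 - mu))

# Illustrative values: N = 10, r = 0.1, q_1(0) = 0.4, l(0) = 0.1.
limit = absorption_probability(0.4, 0.1, 0.1, 10, 1)
after_many = mean_frequency(0.4, 0.1, 0.1, 10, 2000, 1)
```

With these values the decay factor is $1 - r - 1/2N = 0.85$, so the iterate is indistinguishable from the limit long before 2000 generations.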
These results are equivalent to those given in Watterson [11] provided we note that the two models differ not only by the variables ($q_i$ here and $p_i$ in [11]), but also by the initial population ($G(0)$ here and $D(0)$ in [11]); the quantities denoted $p_i$ and $D$ by Watterson stand for $q_i(0)$ and $l(0)$ here, and Watterson's $\Delta$ must be regarded as $\bar{d}(0) = l(0)/(1 - \alpha)$.
The qualitative conclusions are similar to those given in [5]; however, it is to be noted that, with the present model, in the case of loose linkage, the eigenvalue $(1 - \alpha)(1 - \mu)$ may be negative and $\bar{l}(n)$ (hence $\bar{d}(n) = \bar{l}(n)/(1 - \alpha)$) may have an oscillatory behavior.

The solution for $\bar{Q}_3(n)$ is just as immediate and it shows that $\overline{l(q_A - \tfrac{1}{2})}$ (and $\overline{l(q_B - \tfrac{1}{2})}$) converge to zero at the same asymptotic rate as $\bar{l}(n)$; on the contrary, $\overline{q_A^2 q_B}$ (and $\overline{q_A q_B^2}$) converge to $\bar{q}_1$ as $(1 - \alpha)^n$, therefore much more slowly and independently of $r$.

As we are concerned with a finite Markov chain with absorbing states, $\bar{Q}_4(n)$ converges to zero; in order to estimate the rate of convergence we must study the eigenvalues of matrix $A \cdot B$. They are the roots of the determinantal equation $|A \cdot B - sI| = 0$, which by row and column transformation is equivalent to

[3 × 3 determinant in $s$; its entries are not recoverable from the source]

where three auxiliary quantities have been set equal to $1 - \alpha$, $(1 - 2\alpha)^2 (1 - \mu)$ and $(1 - 2\alpha)(1 - 2\mu + \lambda(1 - \alpha - 4\alpha^2))$.

Matrix $A \cdot B$ always has one eigenvalue $s_1 \in [1 - \alpha, 1]$. Furthermore, if $r$ is $O(1)$ with regard to $\alpha = 1/2N$, matrix $A \cdot B$ has three real eigenvalues $0 < s_3 < s_2 < s_1 < 1$ with approximations

$$s_1 = 1 - \alpha + 2\alpha^3 (1 + \cdots) + O(\alpha^4),$$

$$s_2 = 1 - r - \alpha(4 - 3r) + O(\alpha^2),$$

$$s_3 = (1 - r)^2 - \alpha(2 - 2r + 3r^2) + O(\alpha^2).$$
We can conclude that $\bar{Q}_4(n)$ approaches zero as $(1 - \alpha)^n s_1^n$; thus, since $s_1$ is generally greater than $1 - \mu = 1 - r/(1 - \alpha)$, variances and covariances of the $q_i(n)$ and the variance of $l(n)$ converge much more slowly than the means (see [5] for the important implications of this result). It is interesting to note that $s_1$, and therefore the rate of convergence, is a decreasing function of $r$, and that, to $O(\alpha^2)$, it is independent of it. The present value of $s_1$ is in disagreement with the value given in [11]; this is due to an algebraic mistake in [11]. Littler and Watterson have since corrected this and indeed given a determination including the $O(\alpha^3)$ term; they have also studied the variation of $s_1$ throughout the range $[0, 1]$ (see the erratum which is to appear in Theor. Pop. Biol.).
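The location of the leading eigenvalue can be illustrated numerically with a plain power iteration on $A \cdot B$. In the sketch below the entries of $A$ and $B$ are written out explicitly, and the value $\lambda = r^2/((1-\alpha)(1-3\alpha))$ is an assumption of this sketch (it is what a direct calculation of $E_G(d^2)$ under random pairing suggests), as are the function names and the choice $N = 50$, $r = 0.5$.

```python
def matrices(N, r):
    """Matrices A (multinomial drawing) and B (random pairing, crossing-over)."""
    alpha = 1 / (2 * N)
    mu = r / (1 - alpha)
    lam = r * r / ((1 - alpha) * (1 - 3 * alpha))    # assumed definition of lambda
    A = [[1 - alpha, 4 * alpha * (1 - alpha), 2 * alpha ** 2],
         [0.0, (1 - 2 * alpha) ** 2, alpha * (1 - 2 * alpha)],
         [alpha, 4 * alpha * (1 - alpha), 1 - 2 * alpha + 2 * alpha ** 2]]
    B = [[1.0, 0.0, 0.0],
         [0.0, 1 - mu, 0.0],
         [alpha * lam * (1 - 4 * alpha), -8 * alpha ** 2 * lam,
          1 - 2 * mu + lam * (1 - alpha - 4 * alpha ** 2)]]
    return A, B

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def leading_eigenvalue(Mx, steps=500):
    """Power iteration; v is kept at max-norm 1 so the new norm estimates s_1."""
    v = [1.0, 1.0, 1.0]
    s = 0.0
    for _ in range(steps):
        w = [sum(Mx[i][j] * v[j] for j in range(3)) for i in range(3)]
        s = max(abs(x) for x in w)
        v = [x / s for x in w]
    return s

N, r = 50, 0.5
alpha = 1 / (2 * N)
A, B = matrices(N, r)
s1 = leading_eigenvalue(matmul(A, B))
```

Power iteration is adequate here because, for $r$ of order 1, the other two eigenvalues are well separated from $s_1$; for very tight linkage the gap shrinks and more iterations (or a direct solution of the cubic) would be needed.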
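Crossing-over and Mutation

The same method, applied to the case with mutation, gives very similar equations; we have now

$$\bar{P}_2(n) = (1 - a)(1 - b) \begin{bmatrix} 1 & 0 \\ 0 & 1 - \mu \end{bmatrix} \bar{Q}_2(n) + \bar{H}_2(n),$$

$$\bar{P}_3(n) = (1 - a)^2 (1 - b) \begin{bmatrix} 1 & 0 \\ 0 & 1 - \mu \end{bmatrix} \bar{Q}_3(n) + \bar{H}_3(n),$$

$$\bar{P}_4(n) = (1 - a)^2 (1 - b)^2\, B \cdot \bar{Q}_4(n) + \bar{H}_4(n),$$

where $H_i(n)$ is a column matrix only depending on moments of rank $< i$.

The eigenvalues involved here only differ by factors $0 < (1 - a)^i (1 - b)^j < 1$ from the corresponding eigenvalues in the case with no mutation; we can thus conclude that the $\bar{Q}_i(n)$, and therefore the corresponding moments, have limiting values. It can be shown that

$$\bar{H}_2 = \begin{bmatrix} h_2 \\ 0 \end{bmatrix} \qquad \text{and} \qquad \bar{H}_3 = \begin{bmatrix} h_3 \\ 0 \end{bmatrix}$$

with

$$h_2 = \bar{q}_A \bar{q}_B \big(1 - (1 - a)(1 - b)\big),$$

$$h_3 = \frac{(2 - a)\, a_1 a_2\, \bar{q}_B}{a \big(1 - (1 - \alpha)(1 - a)^2\big)} \big(1 - (1 - \alpha)(1 - a)^2 (1 - b)\big).$$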
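This implies the remarkable result that to obtain the limiting values of moments of rank $\leq 3$, $q_A$, $q_B$ and $l$ (resp. $p_A$, $p_B$ and $e$) may be regarded as asymptotically independent and the moments obtained as products of

$$\bar{l} = \bar{e} = 0; \qquad \overline{q_A(1 - q_A)} = (1 - \alpha)\, \overline{p_A(1 - p_A)} = (1 - \alpha)\, \frac{(2 - a)\, a_1 a_2}{a \big(1 - (1 - \alpha)(1 - a)^2\big)}$$

and similar expressions with $B$. We then find

$$\bar{H}_4 = \begin{bmatrix} h_4 \\ 0 \\ 0 \end{bmatrix}$$

with

$$h_4 = \overline{q_A(1 - q_A)} \cdot \overline{q_B(1 - q_B)} \cdot \left( \frac{1}{(1 - \alpha)^2} - (1 - a)^2 (1 - b)^2 \right),$$

and $\bar{Q}_4$ is the solution of

$$\bar{Q}_4 = (1 - \alpha)\, A \cdot \big( (1 - a)^2 (1 - b)^2\, B\, \bar{Q}_4 + \bar{H}_4 \big).$$

We obtain

$$\bar{Q}_4 = \overline{q_A(1 - q_A)} \cdot \overline{q_B(1 - q_B)} \begin{bmatrix} 1 + O(\alpha^2) \\ O(\alpha^2) \\ \alpha\, \dfrac{1 + (1 - a)^2 (1 - b)^2 r^2}{1 - (1 - a)^2 (1 - b)^2 (1 - r)^2} + O(\alpha^2) \end{bmatrix}.$$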
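The one-locus expression for $\overline{q_A(1 - q_A)}$ can be recovered as the fixed point of a one-line recurrence. The sketch assumes that the mean allele frequency has already reached its limit $a_2/a$, so that the lower-rank contribution to $\overline{p_A(1-p_A)}$ is the constant $(2-a)a_1a_2/a$; the function names and numerical rates are illustrative.

```python
def iterate_heterozygosity(a1, a2, N, steps=2000):
    """Iterate V <- (1 - alpha) * ((1 - a)^2 * V + (2 - a) a1 a2 / a).

    V stands for the mean of q_A (1 - q_A); the constant term assumes the
    mean of q_A already equals its limiting value a2 / a.
    """
    alpha = 1 / (2 * N)
    a = a1 + a2
    source = (2 - a) * a1 * a2 / a
    v = 0.0
    for _ in range(steps):
        v = (1 - alpha) * ((1 - a) ** 2 * v + source)
    return v

def closed_form_heterozygosity(a1, a2, N):
    """Limiting mean of q_A (1 - q_A) as given by the closed expression."""
    alpha = 1 / (2 * N)
    a = a1 + a2
    return (1 - alpha) * (2 - a) * a1 * a2 / (a * (1 - (1 - alpha) * (1 - a) ** 2))

v_iter = iterate_heterozygosity(0.01, 0.02, 20)
v_closed = closed_form_heterozygosity(0.01, 0.02, 20)
```

The recurrence contracts by the factor $(1-\alpha)(1-a)^2$ per generation, which for these values is about 0.92, so a few hundred iterations already reach the limit to machine precision.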
The apparent asymptotic independence breaks down at rank four; however, for $\overline{q_A(1 - q_A)\, q_B(1 - q_B)}$ and $\overline{l\,(q_A - \tfrac{1}{2})(q_B - \tfrac{1}{2})}$ the discrepancy is only $O(\alpha^2)$.

Mutation increases the speed of convergence for $\bar{l}$, $\overline{l\,(q_A - \tfrac{1}{2})}$ and $\overline{l\,(q_B - \tfrac{1}{2})}$ since the rate is now $(1 - a)(1 - b)(1 - \alpha)(1 - \mu)$ (instead of $(1 - \alpha)(1 - \mu)$) for the above three quantities. In contrast, for other moments convergence is controlled by the terms in $H_i(n)$, so that the rate is in fact $\max(1 - a, 1 - b)$; the convergence is thus very much slowed down by the presence of mutation.

The general behavior of the model is somewhat exaggerated by mutation; for the variance of linkage disequilibrium, not only is the convergence very slow in general, but the limit itself

$$\overline{l^2} = \alpha\, \overline{q_A(1 - q_A)} \cdot \overline{q_B(1 - q_B)}\, \frac{1 + (1 - a)^2 (1 - b)^2 r^2}{1 - (1 - a)^2 (1 - b)^2 (1 - r)^2} + O(\alpha^2)$$

is nonzero, so that linkage disequilibrium can be expected to play an important role, and especially so with tight linkage.
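The limiting value of the variance of linkage disequilibrium can be explored by iterating the fixed-point equation $\bar{Q}_4 = (1-\alpha)A\cdot(c\,B\,\bar{Q}_4 + \bar{H}_4)$ with $c = (1-a)^2(1-b)^2$. In this sketch everything is expressed in units of $\overline{q_A(1-q_A)}\cdot\overline{q_B(1-q_B)}$, so that $\bar{H}_4 = (1/(1-\alpha)^2 - c,\ 0,\ 0)$. The matrix entries, the value $\lambda = r^2/((1-\alpha)(1-3\alpha))$ and the leading-order expression $\alpha(1 + c r^2)/(1 - c(1-r)^2)$ are assumptions of the sketch, which only checks that they are mutually consistent to first order in $\alpha$.

```python
def limiting_q4(N, r, a, b, steps=5000):
    """Fixed point of Q = (1 - alpha) A (c B Q + H4), in units of V_A * V_B."""
    alpha = 1 / (2 * N)
    mu = r / (1 - alpha)
    lam = r * r / ((1 - alpha) * (1 - 3 * alpha))    # assumed definition of lambda
    c = (1 - a) ** 2 * (1 - b) ** 2
    A = [[1 - alpha, 4 * alpha * (1 - alpha), 2 * alpha ** 2],
         [0.0, (1 - 2 * alpha) ** 2, alpha * (1 - 2 * alpha)],
         [alpha, 4 * alpha * (1 - alpha), 1 - 2 * alpha + 2 * alpha ** 2]]
    B = [[1.0, 0.0, 0.0],
         [0.0, 1 - mu, 0.0],
         [alpha * lam * (1 - 4 * alpha), -8 * alpha ** 2 * lam,
          1 - 2 * mu + lam * (1 - alpha - 4 * alpha ** 2)]]
    h4 = 1 / (1 - alpha) ** 2 - c                    # only nonzero entry of H4
    Q = [0.0, 0.0, 0.0]
    for _ in range(steps):
        BQ = [sum(B[i][j] * Q[j] for j in range(3)) for i in range(3)]
        P = [c * BQ[0] + h4, c * BQ[1], c * BQ[2]]
        Q = [(1 - alpha) * sum(A[i][j] * P[j] for j in range(3)) for i in range(3)]
    return Q

def leading_order_l2(N, r, a, b):
    """First-order (in alpha) limit of the mean of l^2, in units of V_A * V_B."""
    alpha = 1 / (2 * N)
    c = (1 - a) ** 2 * (1 - b) ** 2
    return alpha * (1 + c * r * r) / (1 - c * (1 - r) ** 2)

Q = limiting_q4(N=100, r=0.2, a=0.01, b=0.01)
approx = leading_order_l2(N=100, r=0.2, a=0.01, b=0.01)
```

For these illustrative values the first component stays close to 1 and the second close to 0, while the third agrees with the leading-order expression to within a few percent, the difference being of the order of $\alpha^2$.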
REFERENCES

1. HILL, W. G. AND ROBERTSON, A. 1968. Linkage disequilibrium in finite populations, Theor. Appl. Genet. 38, 226-231.
2. KARLIN, S. 1966. "A First Course in Stochastic Processes," Academic Press, New York.
3. KARLIN, S. 1968. Equilibrium behaviour of population genetic models with nonrandom mating, J. Appl. Prob. 5.
4. KARLIN, S. AND MCGREGOR, J. 1965. "Direct Product Branching Processes and Related Induced Markoff Chains, I: Calculations of Rates of Approach to Homozygosity," Bernoulli, Bayes, Laplace Anniversary Volume, Springer-Verlag, New York, pp. 111-145.
5. KARLIN, S. AND MCGREGOR, J. 1968. Rates and probabilities of fixation for two-locus random mating finite populations without selection, Genetics 58, 141-159.
6. MORAN, P. A. P. 1962. "The Statistical Processes of Evolutionary Theory," Clarendon Press, Oxford.
7. OHTA, T. 1968. Effect of initial linkage disequilibrium and epistasis on fixation probability in a small population with two segregating loci, Theor. Appl. Genet. 38, 243-248.
8. OHTA, T. AND KIMURA, M. 1969a. Linkage disequilibrium due to random genetic drift, Genet. Res. Camb. 13, 47-55.
9. OHTA, T. AND KIMURA, M. 1969b. Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutation, Genetics 63, 229-238.
10. VILLARD, M. 1970. Incidence du crossing-over sur l'évolution d'une population panmictique d'effectif limité, Thèse de 3ème cycle, Lyon.
11. WATTERSON, G. A. 1970a. The effect of linkage in a finite random mating population, Theor. Pop. Biol. 1, 72-87.
12. WATTERSON, G. A. 1970b. On the equivalence of random mating and random union of gametes models in finite, monoecious populations, Theor. Pop. Biol. 1, 233-250.