THEORETICAL POPULATION BIOLOGY 1, 72-87 (1970)

The Effect of Linkage in a Finite Random-Mating Population

G. A. WATTERSON

Department of Mathematics, Monash University, Victoria, Australia

Received October 15, 1969
A model is proposed to study the effects of linkage of two loci, at each of which two alleles are possible. In a finite, random-mating monoecious population, it will usually be the case that one locus will fix first, followed by fixation of the other locus. The probabilities for which locus will fix first, which alleles will become fixed, and the rates at which these fixations will occur, are found in terms of the recombination fraction for the pair of loci in question, and the initial state of the population. Most of the results are at least qualitatively similar to those found by Karlin and McGregor (1968) (Genetics 58, 141-159) for a different model, but fixation of the first locus is found to be quickest with tight but not complete linkage, in contrast with their conclusion that looser linkage speeds first-fixation.
1. INTRODUCTION
In 1963, Kimura gave the probabilities of fixation, and a certain rate of expected change, for two linked loci carrying various alleles not subject to mutation or selection. Hill and Robertson (1966) were mostly concerned with selection, which they tackled by computer simulation, but they did quote a rate for expected change at two loci in the absence of selection which differed from Kimura's result. Karlin and McGregor (1968) pointed out that the divergence can be explained by noting that Kimura's model is for a "random mating of zygotes" population, whereas Hill and Robertson used a model which has been termed a "random union of gametes" model, and it is not equivalent to the former in this context. By formulating the random union of gametes model as a Markov chain, Karlin and McGregor (1968) were able to obtain many further results concerning probabilities and rates of fixation, by methods not hitherto used on a two-locus model. In this paper, we make the same study for the random mating of zygotes model. We find that some of the results are identical with those found for the other model, some are qualitatively similar although quantitatively different, and most surprisingly, some results are even qualitatively different. Why this should be so seems an interesting question requiring some explanation.
Specifically, we find that one of the two loci becomes fixed first at a rate depending on the recombination fraction. The rate is fastest when the recombination fraction is small but positive, that is, for tight but not complete linkage. This conclusion may be compared with that for the other model, in which first-fixation occurs more quickly the looser the linkage. Both models yield the same probabilities for which locus will fix first, but different probabilities as to which allele will fix first. However, it is roughly true of both models that tighter linkage favours the first-fixation of an allele which is (proportionately) associated more with the rarer allele at the other locus, initially. Eventually, after one locus has become fixed, the other will also. Both models have the same rate at which this will occur; namely, the classical rate of fixation of a single-locus model. The probabilities of fixation of the various gametes differ between the models but, qualitatively, they are roughly described by saying that tighter linkage favours the survival of the gametes which are initially more common. Both models possess the feature that the disequilibrium function approaches zero in mean and in mean square, but the rate of approach in mean square is the same as the rate of first-fixation. Thus, there is a considerable loss of generality if the disequilibrium function is assumed to be zero, initially. We do not here attempt the generalization of our work to include arbitrary family size distributions, as was done by Karlin and McGregor (1968).
2. THE MODEL
Consider a population of N monoecious diploid individuals, each consisting of a pair of gametes which can be one of the four types AB, Ab, aB, ab (and also called types 1, 2, 3, and 4, respectively). These gamete types indicate that at one locus (the "A" locus) on a chromosome, either of the alleles A or a may occur, and that at another locus (the "B" locus) on the same, or a different, chromosome either of the alleles B or b may occur. There are, in all, ten distinguishable genotypes AB/AB, AB/Ab, ..., ab/ab (we identify symmetrically, thus, AB/Ab = Ab/AB). Most genotypes pass on gametes to their offspring which are copies of one or other gamete possessed by the parent. However, if there is a recombination fraction, r with 0 < r < 1, such that a proportion r of gametes produced actually consist of copies of the A-locus on one of the parent's chromosomes and of the B-locus of the other chromosome, then the double heterozygotes AB/ab and Ab/aB may pass on gametes which they did not themselves inherit. Writing $\alpha_{ijl}$ for the probability that a gamete produced by a parent of genotype (i, j) will be of type l ($i \le j$; i, j, l = 1, 2, 3, 4), we have that these probabilities are as in Table I.

TABLE I
$\alpha_{ijl}$ for the Two-Locus Model

  i, j     l = 1      l = 2      l = 3      l = 4
  1, 1     1          0          0          0
  1, 2     1/2        1/2        0          0
  1, 3     1/2        0          1/2        0
  1, 4     (1-r)/2    r/2        r/2        (1-r)/2
  2, 2     0          1          0          0
  2, 3     r/2        (1-r)/2    (1-r)/2    r/2
  2, 4     0          1/2        0          1/2
  3, 3     0          0          1          0
  3, 4     0          0          1/2        1/2
  4, 4     0          0          0          1

Denote by $X_{ij}^{(t)}$ the number of (i, j) genotypes existing in generation t ($i \le j$; i, j = 1, 2, 3, 4; t = 0, 1, 2, ...), with the total population being of size $N = \sum_{i \le j} X_{ij}^{(t)}$. If the next generation is also restricted to be of size N individuals, we suppose that these N offspring have their genotypes determined according to N independent multinomial trials, in which the probability of an offspring being of genotype (i, j), $i \le j$, is

$$2p_i^{(t)}p_j^{(t)} \quad \text{for } i < j, \qquad [p_i^{(t)}]^2 \quad \text{for } i = j,$$

where $p_l^{(t)}$ is the probability that a parent, chosen at random from generation t, passes on a gamete of type l. In particular, given $X_{ij}^{(t)} = x_{ij}^{(t)}$, then

$$p_l^{(t)} = \sum_{i \le j} \frac{x_{ij}^{(t)}}{N}\,\alpha_{ijl}, \qquad l = 1, 2, 3, 4. \tag{2.1}$$
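This model may be called a random mating of zygotes model, for, when an offspring is to be formed, the mechanism may be interpreted as two parents being chosen at random (with replacement so that selfing is allowed) to provide the required gametes. The model is, thus, specified by the Markov chain transition probabilities

$$\Pr\left(X_{ij}^{(t+1)} = x_{ij}^{(t+1)},\ i \le j \mid X_{ij}^{(t)} = x_{ij}^{(t)},\ i \le j\right) = \frac{N!}{x_{11}^{(t+1)}!\,x_{12}^{(t+1)}! \cdots x_{44}^{(t+1)}!}\,\left[(p_1^{(t)})^2\right]^{x_{11}^{(t+1)}} \left(2p_1^{(t)}p_2^{(t)}\right)^{x_{12}^{(t+1)}} \cdots \left[(p_4^{(t)})^2\right]^{x_{44}^{(t+1)}}, \tag{2.2}$$

in which $\sum_{i \le j} x_{ij}^{(t+1)} = \sum_{i \le j} x_{ij}^{(t)} = N$, and (2.1) holds. Now, (2.2) immediately reduces to

$$\Pr\left(X_{ij}^{(t+1)} = x_{ij}^{(t+1)},\ i \le j \mid X_{ij}^{(t)} = x_{ij}^{(t)},\ i \le j\right) = \frac{N!}{\prod_{i \le j} x_{ij}^{(t+1)}!}\; 2^{\sum_{i<j} x_{ij}^{(t+1)}} \prod_{l=1}^{4} \left(p_l^{(t)}\right)^{y_l^{(t+1)}}, \tag{2.3}$$

provided we define

$$y_l^{(t+1)} = 2x_{ll}^{(t+1)} + \sum_{i<l} x_{il}^{(t+1)} + \sum_{j>l} x_{lj}^{(t+1)} = \sum_{i \le l} x_{il}^{(t+1)} + \sum_{j \ge l} x_{lj}^{(t+1)}, \qquad l = 1, 2, 3, 4. \tag{2.4}$$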
In any case, it is clear from the fact that the transition probabilities depend only on the previous state, $x_{ij}^{(t)}$, $i \le j$, through the functions $p_l^{(t)}$, l = 1, 2, 3, 4, that these latter form a Markov chain, with state-space described by the 4-vectors of the form $(p_1^{(t)}, p_2^{(t)}, p_3^{(t)}, p_4^{(t)})$ rather than the 10-vectors of genotype counts $(x_{11}^{(t)}, x_{12}^{(t)}, \ldots, x_{44}^{(t)})$.
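The construction of this section can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original analysis: the function names (`alpha`, `gametic_output`, `next_generation`), the 0-based gamete indices and the use of NumPy's random generator are all assumptions made here. It computes the gametic output probabilities of (2.1) from genotype counts using Table I, and then draws one new generation by N multinomial trials.

```python
import numpy as np

# Genotypes (i, j), i <= j, over gamete types 0=AB, 1=Ab, 2=aB, 3=ab
# (the paper's types 1..4 shifted to 0-based indices; an assumption of this sketch).
GENOTYPES = [(i, j) for i in range(4) for j in range(i, 4)]

def alpha(i, j, r):
    """Gametic output distribution of genotype (i, j), following Table I."""
    out = np.zeros(4)
    if (i, j) == (0, 3):          # AB/ab double heterozygote
        out[:] = [(1 - r) / 2, r / 2, r / 2, (1 - r) / 2]
    elif (i, j) == (1, 2):        # Ab/aB double heterozygote
        out[:] = [r / 2, (1 - r) / 2, (1 - r) / 2, r / 2]
    else:                         # no recombinant types possible
        out[i] += 0.5
        out[j] += 0.5
    return out

def gametic_output(x, r):
    """Eq. (2.1): p_l = sum over genotypes of (x_ij / N) * alpha_ijl."""
    n = sum(x.values())
    return sum(x.get(g, 0) / n * alpha(*g, r) for g in GENOTYPES)

def next_generation(x, r, rng):
    """One step of the chain: N multinomial trials with probabilities
    2 p_i p_j (i < j) or p_i^2 (i = j)."""
    n = sum(x.values())
    p = gametic_output(x, r)
    probs = np.array([p[i] ** 2 if i == j else 2 * p[i] * p[j]
                      for i, j in GENOTYPES])
    counts = rng.multinomial(n, probs / probs.sum())
    return {g: int(c) for g, c in zip(GENOTYPES, counts)}
```

For example, `next_generation({(0, 3): 6, (1, 1): 4}, 0.2, np.random.default_rng(0))` returns a dictionary of ten genotype counts summing to N = 10.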
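It is a worthwhile simplification to use the four-dimensional process. Its transition probabilities are best summarized by means of the moment generating function

$$E\left(e^{\theta_1 p_1^{(t+1)} + \theta_2 p_2^{(t+1)} + \theta_3 p_3^{(t+1)} + \theta_4 p_4^{(t+1)}} \mid p_1^{(t)}, p_2^{(t)}, p_3^{(t)}, p_4^{(t)}\right) = \left[\sum_{i=1}^{4}\sum_{j=1}^{4} p_i^{(t)} p_j^{(t)} \exp\left(\sum_{l=1}^{4} \theta_l \alpha_{ijl}/N\right)\right]^N, \tag{2.5}$$

using (2.1) and then the moment generating function for the multinomial distribution (2.2). In (2.5) we have also made the summations a bit neater by introducing terms with j < i; for this to be meaningful, we require the definition $\alpha_{ijl} = \alpha_{jil}$ for all i, j, l = 1, 2, 3, 4. By substituting from Table I for the $\alpha_{ijl}$, we have, eventually, the model expressed as

$$E\left(e^{\theta_1 p_1^{(t+1)} + \theta_2 p_2^{(t+1)} + \theta_3 p_3^{(t+1)} + \theta_4 p_4^{(t+1)}} \mid p_1^{(t)}, p_2^{(t)}, p_3^{(t)}, p_4^{(t)}\right) = \left[\left(\sum_{i=1}^{4} q_i^{(t)}\right)^2 + 2\left(q_1^{(t)}q_4^{(t)} + q_2^{(t)}q_3^{(t)}\right)\left[\cosh\left(\tfrac{1}{2}r(\theta_1 + \theta_4 - \theta_2 - \theta_3)/N\right) - 1\right] - 2\left(q_1^{(t)}q_4^{(t)} - q_2^{(t)}q_3^{(t)}\right)\sinh\left(\tfrac{1}{2}r(\theta_1 + \theta_4 - \theta_2 - \theta_3)/N\right)\right]^N, \tag{2.6}$$

where we have written $q_i^{(t)} = p_i^{(t)} e^{\theta_i/2N}$, i = 1, 2, 3, 4. The model has been written in this form to indicate, in part, the role that the quantities $D^{(t)} = p_1^{(t)}p_4^{(t)} - p_2^{(t)}p_3^{(t)}$ and $S^{(t)} = p_1^{(t)}p_4^{(t)} + p_2^{(t)}p_3^{(t)}$ will play when r is not zero. For r = 0, the model simply reduces to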
$$E\left(e^{\theta_1 p_1^{(t+1)} + \theta_2 p_2^{(t+1)} + \theta_3 p_3^{(t+1)} + \theta_4 p_4^{(t+1)}} \mid p_1^{(t)}, p_2^{(t)}, p_3^{(t)}, p_4^{(t)}\right) = \left(\sum_{i=1}^{4} q_i^{(t)}\right)^{2N},$$

where $q_i^{(t)} = p_i^{(t)} e^{\theta_i/2N}$ is as above, and this model (effectively for four alleles at a single locus) has been studied before, e.g., by Karlin and McGregor (1965). Also, when r = 0 (and only in this case) is the present model equivalent to the two-locus model of Karlin and McGregor (1968).

3. THE RESULTS

The major conclusions we wish to reach concern the probabilities of the various alleles A, a, B, or b fixing first, and of the various gametes AB, Ab, aB, or ab being ultimately fixed, and the rates at which these occur. The methods we employ are those used by Karlin and McGregor (1968), see, also, Karlin (1968), with some modifications.

3.1 Rate of First Fixation

We start by considering the rate at which the first locus to fix does so. The quantity

$$S^{(t)} = p_1^{(t)}p_4^{(t)} + p_2^{(t)}p_3^{(t)}$$

is positive if, and only if, the t-th generation possesses all four alleles. And as the expectation $E(S^{(t)})$ is bounded between

$$\Pr(S^{(t)} > 0) \cdot \min\{S^{(t)} : S^{(t)} > 0\} \quad \text{and} \quad \Pr(S^{(t)} > 0) \cdot \max\{S^{(t)} : S^{(t)} > 0\},$$

we find that $E(S^{(t)})$ and $\Pr(S^{(t)} > 0)$ approach zero at the same rate. Thus, the rate at which at least one locus becomes fixed is the rate of approach to zero of $E(S^{(t)})$ as $t \to \infty$. To establish this rate, a great deal of elementary algebra and calculus applied to (2.6) shows that

$$E(S^{(t+1)}) = \left(1 - \frac{1}{2N} - r + \frac{r^2}{N}\right) E(S^{(t)}) - 2r(1 - r)\left(1 - \frac{1}{N}\right) E((D^{(t)})^2) + 2r\left(1 - \frac{1}{N}\right) E(Z^{(t)}), \tag{3.1}$$
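in which

$$D^{(t)} = p_1^{(t)}p_4^{(t)} - p_2^{(t)}p_3^{(t)}$$

is the disequilibrium function, and

$$Z^{(t)} = (p_1^{(t)} + p_2^{(t)})(p_3^{(t)} + p_4^{(t)})(p_1^{(t)} + p_3^{(t)})(p_2^{(t)} + p_4^{(t)}).$$

Further analysis of (2.6) then shows that

$$E((D^{(t+1)})^2) = \frac{1}{2N}\left[\left(1 - \frac{1}{2N}\right)\left(1 - \frac{1}{2N} - 2r\right) + r^2\right] E(S^{(t)}) + \left(1 - \frac{1}{N}\right)\left[\left(1 - \frac{1}{N}\right)\left(1 - \frac{1}{2N} - 2r\right) + r^2\right] E((D^{(t)})^2) - \frac{1}{2N}\left(1 - \frac{1}{N}\right)\left(1 - \frac{1}{2N} - 2r\right) E(Z^{(t)}), \tag{3.2}$$

and that

$$E(Z^{(t+1)}) = \left(1 - \frac{1}{2N}\right)^2 \frac{1}{2N}\, E(S^{(t)}) - \frac{1}{N}\left(1 - \frac{1}{2N}\right)\left(1 - \frac{1}{N}\right) E((D^{(t)})^2) + \left(1 - \frac{1}{2N}\right)^2\left(1 - \frac{1}{N}\right) E(Z^{(t)}). \tag{3.3}$$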
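Recursions of this kind can be checked exactly for very small N by brute force. The sketch below is an illustration written for this purpose, not the author's derivation: all function names are assumptions, and the coefficients coded in `recursion` are those of the linear system (3.1)-(3.3) as read here, with $S = p_1p_4 + p_2p_3$, $D = p_1p_4 - p_2p_3$ and $Z = (p_1+p_2)(p_3+p_4)(p_1+p_3)(p_2+p_4)$. It enumerates every ordered sample of N zygotes (each an ordered pair of gametes), computes the exact conditional expectations of S, D² and Z one generation on, and compares them with the right-hand sides.

```python
from itertools import product

def alpha(g, h, r):
    """Gametic output (Table I) of a zygote formed from gametes g, h in {0,1,2,3}."""
    if {g, h} == {0, 3}:                      # AB/ab
        return [(1 - r) / 2, r / 2, r / 2, (1 - r) / 2]
    if {g, h} == {1, 2}:                      # Ab/aB
        return [r / 2, (1 - r) / 2, (1 - r) / 2, r / 2]
    out = [0.0] * 4
    out[g] += 0.5
    out[h] += 0.5
    return out

def stats(p):
    """Return (S, D^2, Z) for a gametic output vector p."""
    s = p[0] * p[3] + p[1] * p[2]
    d = p[0] * p[3] - p[1] * p[2]
    z = (p[0] + p[1]) * (p[2] + p[3]) * (p[0] + p[2]) * (p[1] + p[3])
    return s, d * d, z

def exact_expectation(p, r, n):
    """E[(S, D^2, Z) at t+1 | p at t], enumerating all 16**n ordered zygote samples."""
    pairs = list(product(range(4), repeat=2))
    total = [0.0, 0.0, 0.0]
    for zygotes in product(pairs, repeat=n):
        prob = 1.0
        new_p = [0.0] * 4
        for g, h in zygotes:
            prob *= p[g] * p[h]               # two independent gametes per zygote
            for l, a in enumerate(alpha(g, h, r)):
                new_p[l] += a / n
        for k, v in enumerate(stats(new_p)):
            total[k] += prob * v
    return total

def recursion(p, r, n):
    """Right-hand sides of (3.1)-(3.3), evaluated at a fixed state p."""
    s, d2, z = stats(p)
    a, b = 1 - 1 / (2 * n), 1 - 1 / n
    e_s = (a - r + r * r / n) * s - 2 * r * (1 - r) * b * d2 + 2 * r * b * z
    e_d2 = ((a * (a - 2 * r) + r * r) * s / (2 * n)
            + b * (b * (a - 2 * r) + r * r) * d2
            - b * (a - 2 * r) * z / (2 * n))
    e_z = a * a * s / (2 * n) - a * b * d2 / n + a * a * b * z
    return [e_s, e_d2, e_z]
```

The enumeration grows as 16^N, so it is only practical for N up to about 3 or 4; it is meant as a consistency check on the algebra rather than as a way of computing rates.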
The system of Eqs. 3.1-3.3 is linear, and the solution for $E(S^{(t)})$ as a function of time and initial values could be carried out explicitly, in principle. We content ourselves with the remark that $E(S^{(t)})$ approaches zero like $c \cdot \mu^t$ as $t \to \infty$, where $\mu$ is the largest (in modulus) eigenvalue of the 3 x 3 matrix of coefficients in (3.1)-(3.3). In fact, $\mu = 1 - 1/2N - \gamma/2N$, where $\gamma$ is the smallest real solution of the cubic equation

$$\gamma^3 + F\gamma^2 + G\gamma + H = 0$$

with

$$2N^2 F = 4N^3 r^2 - 4N(3N^2 - 4N + 2)r - (2N - 1)(7N - 3),$$

$$2N^3 G = -8N^2(N - 1)^2 r^3 + 2N(8N^4 - 34N^3 + 25N^2 - 2N - 1)r^2 + 2N(2N - 1)(13N^2 - 15N + 6)r + (2N - 1)^2(5N - 3),$$

$$N^3 H = 2(N + 1)N(N - 1)(N - 2)(2N - 1)r^3 + (2N - 1)(-4N^4 + 16N^3 - 6N^2 - 6N + 2)r^2 - N(2N - 1)^2(5N - 3)r.$$

The equation cannot be solved easily, except perhaps in a few special cases. In fact, if N = 1 we find $\mu = \tfrac{1}{2} - r(1 - r)$ (whereas Karlin and McGregor's result is $\mu = \tfrac{1}{2} - \tfrac{1}{2}r(1 - \tfrac{1}{2}r)$) for this selfing population. Notice that, unlike Karlin and McGregor's result, the present model has $\mu$ a minimum at r between 0 and 1 (for N = 1, at $r = \tfrac{1}{2}$), corresponding to fastest first-fixation at this interior value. More generally, in Table II we list some values of $\mu$ in cases when N is reasonably small, for contrast with a similar Table in Karlin and McGregor (1968).

[TABLES II and III: tabulated values of $\mu$; the entries are not recoverable from the source.]

Also, we can prove that as $N \to \infty$ but r is kept fixed and positive, $\gamma \to 1 - \tfrac{1}{2}r$, so that

$$\mu \approx 1 - \frac{1}{2N} - \frac{1 - \tfrac{1}{2}r}{2N}, \qquad \text{for } r \text{ bounded away from } 0.$$

However, the approximation breaks down for small values of r, and instead we find

$$\mu \approx 1 - 1/2N - 1/2N, \qquad \text{for } r = O(1/N^{\alpha}),\ 0 < \alpha < 1,$$

and

$$\mu \approx 1 - 1/2N - \gamma/2N, \qquad \text{for } r = \rho/N,\ \rho \text{ fixed as } N \to \infty,$$

where $\gamma$ is the smallest root (actually between 0 and 1) of

$$\gamma^3 - (7 + 6\rho)\gamma^2 + (10 + 26\rho + 8\rho^2)\gamma - 20\rho - 8\rho^2 = 0, \tag{3.4}$$

and

$$\mu \approx 1 - 1/2N - r, \qquad \text{for } r = O(1/N^{\alpha}),\ \alpha > 1.$$
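The first-fixation eigenvalue can be evaluated numerically from the cubic in $\gamma$. The following is a minimal sketch under stated assumptions: the function name `first_fixation_rate` is hypothetical, the cubic is taken in the monic form $\gamma^3 + F\gamma^2 + G\gamma + H = 0$ with F, G, H as given in the text, and the roots are found with `numpy.roots` rather than by any method of the original paper.

```python
import numpy as np

def first_fixation_rate(n, r):
    """mu = 1 - 1/(2N) - gamma/(2N), with gamma the smallest real root of
    gamma^3 + F gamma^2 + G gamma + H = 0 (coefficients as in the text)."""
    f = (4 * n**3 * r**2 - 4 * n * (3 * n**2 - 4 * n + 2) * r
         - (2 * n - 1) * (7 * n - 3)) / (2 * n**2)
    g = (-8 * n**2 * (n - 1)**2 * r**3
         + 2 * n * (8 * n**4 - 34 * n**3 + 25 * n**2 - 2 * n - 1) * r**2
         + 2 * n * (2 * n - 1) * (13 * n**2 - 15 * n + 6) * r
         + (2 * n - 1)**2 * (5 * n - 3)) / (2 * n**3)
    h = (2 * (n + 1) * n * (n - 1) * (n - 2) * (2 * n - 1) * r**3
         + (2 * n - 1) * (-4 * n**4 + 16 * n**3 - 6 * n**2 - 6 * n + 2) * r**2
         - n * (2 * n - 1)**2 * (5 * n - 3) * r) / n**3
    roots = np.roots([1.0, f, g, h])
    # Keep the (numerically) real roots and take the smallest one.
    real = roots[np.abs(roots.imag) < 1e-6].real
    return 1 - 1 / (2 * n) - real.min() / (2 * n)
```

For N = 1 the cubic factorizes as $(\gamma - 2r(1-r))(\gamma - 1)^2$, so the function reproduces $\mu = \tfrac{1}{2} - r(1-r)$; scanning r on a grid for a given N is one way to locate the interior minimum discussed in the text.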
We, thus, have the somewhat paradoxical situation that although in general first-fixation slows down as linkage decreases (i.e., r increases), the case with r = 0 must, of course, have the slowest rate of all (the model then being equivalent to a single locus, four allele situation), and so the fastest rate of first-fixation is actually achieved at an internal, but small, value for r: $r = O(1/N^{\alpha})$, $0 < \alpha < 1$, when the rate itself is $\mu \approx (1 - 1/2N)^2$. These assertions are checked numerically in Tables III and IV, and are quite different from the corresponding result for Karlin and McGregor's model, in which looser linkage (that is, increasing r) produced faster fixation throughout the range 0 < r < 1.

TABLE IV
Minimum Value of $\mu$ for Small Populations, With Corresponding Values of r and $(1 - 1/2N)^2$

   N      r       mu        (1 - 1/2N)^2
   1      0.50    0.2500    0.2500
   2      0.59    0.5778    0.5625
   3      0.42    0.7111    0.6944
   5      0.31    0.8205    0.8100
   8      0.24    0.8849    0.8789
  10      0.21    0.9070    0.9025
  15      0.16    0.9371    0.9345
  25      0.12    0.9617    0.9604
  50      0.08    0.9806    0.9801

3.2 Rate of Gamete Fixation

For the rate at which fixation at both loci occurs, we study the function

$$T^{(t)} = p_1^{(t)}p_2^{(t)} + p_1^{(t)}p_3^{(t)} + p_2^{(t)}p_4^{(t)} + p_3^{(t)}p_4^{(t)} + 2p_1^{(t)}p_4^{(t)} + 2p_2^{(t)}p_3^{(t)},$$

which is zero at the absorbing states (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1), and strictly positive elsewhere (i.e., if at least two different gametes are present). It can be shown from (2.6) that

$$E(T^{(t+1)}) = (1 - 1/2N)\,E(T^{(t)}),$$

so that $E(T^{(t)})$ and, hence, the probability that no gamete has been fixed, approaches zero at the geometric rate $\lambda^t$, where $\lambda = 1 - 1/2N$, as $t \to \infty$. This conclusion is the same as that in Karlin and McGregor's (1968) work; it shows that the classical rate of fixation for a single locus population applies here to the slower of the two loci to fix.

3.3 Probabilities of First-Fixation

It is possible that both loci might fix, for the first time, simultaneously. If this occurs, then it is not appropriate to talk about one allele fixing first. However, conditional on the two loci not fixing simultaneously, the general reasoning given in Karlin (1966), pp. 417-425, shows that the probability that the A gene is first lost (and consequently, that the a gene fixes first), is proportional to a quantity $L^{(0)}$, where for each t = 0, 1, 2, ..., $L^{(t)}$ is a random variable that is positive if the A allele is lost but a, B, and b alleles are all present, and is zero if the A allele is present but any of a, B, or b are lost. Also, $L^{(t)}$ must be an eigenvector associated with the eigenvalue $\lambda = 1 - 1/2N$ governing the rate of fixation at both loci, that is,

$$E(L^{(t+1)}) = (1 - 1/2N)\,E(L^{(t)}). \tag{3.5}$$
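While the quantity

$$\Lambda^{(t)} = (p_3^{(t)} + p_4^{(t)})(p_1^{(t)} + p_3^{(t)})(p_2^{(t)} + p_4^{(t)})$$

has the correct behaviour as regards indicating the loss of genes other than the A gene, it is not the complete answer for we can find from (2.6) that

$$E(\Lambda^{(t+1)}) = \left(1 - \frac{1}{2N}\right)\left(1 - \frac{1}{N}\right) E(\Lambda^{(t)}) + \left(1 - \frac{1}{2N}\right)\frac{1}{N}\, E(p_3^{(t)}p_4^{(t)}) + \left(1 - \frac{1}{2N}\right)\frac{1}{2N}\, E(S^{(t)}). \tag{3.6}$$

This necessitates the investigation also of the variable $p_3^{(t)}p_4^{(t)}$, and one can show that

$$E(p_3^{(t+1)}p_4^{(t+1)}) = r\left(1 - \frac{1}{N}\right) E(\Lambda^{(t)}) + r(1 - r)\frac{1}{2N}\, E(S^{(t)}) + \left(1 - \frac{1}{2N} - r\left(1 - \frac{1}{N}\right)\right) E(p_3^{(t)}p_4^{(t)}) + r(1 - r)\left(1 - \frac{1}{N}\right) E((D^{(t)})^2) - r\left(1 - \frac{1}{N}\right) E(Z^{(t)}). \tag{3.7}$$

Coupling this with Eqs. 3.1, 3.2, 3.3, and 3.6, we find that (3.5) holds provided we define (up to an arbitrary constant multiplier)

$$L^{(t)} = r(N - 1)\,\Lambda^{(t)} + \tfrac{1}{2}(1 - 1/2N)\left(2p_3^{(t)}p_4^{(t)} + S^{(t)}\right).$$

The sum of $L^{(0)}$ and the corresponding expressions for the first-fixation of the A, B, and b genes is $[r(N - 1) + 1 - 1/2N]\,I$, where

$$I = T^{(0)} = (p_1^{(0)} + p_2^{(0)})(p_3^{(0)} + p_4^{(0)}) + (p_1^{(0)} + p_3^{(0)})(p_2^{(0)} + p_4^{(0)}),$$

so that conditional on no simultaneous fixing of loci, the probability that the a gene is the first to fix is

$$Q_a = L^{(0)} \cdot I^{-1} / [r(N - 1) + 1 - 1/2N]. \tag{3.8}$$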
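The parameters $p_i^{(0)}$, $D^{(0)}$, etc., here refer to the whole gametic output of the initial generation of genotypes, and do not necessarily correspond to similar quantities for the gametes actually used in forming the first generation of offspring, or for that matter, forming the initial generation itself. The connection between these parameters and the initial genotype numbers is via (2.1) and Table I. We find that, if $x_{ij}$ denotes the initial number of (i, j) genotypes,

$$N p_1^{(0)} = x_{11} + \tfrac{1}{2}x_{12} + \tfrac{1}{2}x_{13} + \tfrac{1}{2}x_{14} - \tfrac{1}{2}r(x_{14} - x_{23}),$$
$$N p_2^{(0)} = x_{22} + \tfrac{1}{2}x_{12} + \tfrac{1}{2}x_{23} + \tfrac{1}{2}x_{24} + \tfrac{1}{2}r(x_{14} - x_{23}),$$
$$N p_3^{(0)} = x_{33} + \tfrac{1}{2}x_{13} + \tfrac{1}{2}x_{23} + \tfrac{1}{2}x_{34} + \tfrac{1}{2}r(x_{14} - x_{23}),$$
$$N p_4^{(0)} = x_{44} + \tfrac{1}{2}x_{14} + \tfrac{1}{2}x_{24} + \tfrac{1}{2}x_{34} - \tfrac{1}{2}r(x_{14} - x_{23}),$$

which we write as

$$p_i^{(0)} = p_i \mp r\Delta, \qquad \text{with "}-\text{" for } i = 1, 4 \text{ and "}+\text{" for } i = 2, 3, \tag{3.9}$$

where

$$\Delta = (x_{14} - x_{23})/2N$$

and where $p_i$ is the i-type chromosome proportion among the initial individuals. Further, if we write $D = p_1p_4 - p_2p_3$, then $D^{(0)} = D - r\Delta$, and (3.8) may be rewritten

$$Q_a = \left[c + \frac{d}{2N(N - 1)r + 2N - 1}\right] I^{-1}, \tag{3.10}$$

where

$$2N(N - 1)\,c = 2N(N - 1)(p_3 + p_4)(p_1 + p_3)(p_2 + p_4) + (2N - 1)(\tfrac{1}{2} - p_1 - p_3)\Delta,$$

$$[2N(N - 1)/(2N - 1)]\,d = -(\tfrac{1}{2} - p_1 - p_3)[2N(N - 1)D + (2N - 1)\Delta],$$

and

$$I = (p_1 + p_2)(p_3 + p_4) + (p_1 + p_3)(p_2 + p_4).$$

From the form of (3.10), and the sign of d, we can conclude that as r increases the probability of the a gene fixing first is

(i) increasing if $p_1 + p_3 < \tfrac{1}{2}$ and $2N(N - 1)D + (2N - 1)\Delta > 0$, or if $p_1 + p_3 > \tfrac{1}{2}$ and $2N(N - 1)D + (2N - 1)\Delta < 0$;
(ii) constant if $p_1 + p_3 = \tfrac{1}{2}$ or $2N(N - 1)D + (2N - 1)\Delta = 0$;
(iii) decreasing if $p_1 + p_3 > \tfrac{1}{2}$ and $2N(N - 1)D + (2N - 1)\Delta > 0$, or if $p_1 + p_3 < \tfrac{1}{2}$ and $2N(N - 1)D + (2N - 1)\Delta < 0$.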
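The first-fixation probabilities lend themselves to a quick numerical illustration. The sketch below is hypothetical code, not taken from the paper: the function names and the dictionary keys are assumptions, and the quantities for the A, B and b alleles are obtained from that for a by relabelling gametes. It evaluates $Q_a$ from the eigenvector form $L = r(N-1)\Lambda + \tfrac{1}{2}(1 - 1/2N)(2p_3p_4 + S)$ applied to the initial gametic output $p_i^{(0)} = p_i \mp r\Delta$, and separately from the closed form $[c + d/(2N(N-1)r + 2N - 1)]/I$, so that the two can be compared.

```python
def first_fixation_probs(p, delta, r, n):
    """Probabilities that allele a, A, B or b is the first to fix, conditional on
    the loci not fixing simultaneously. p = initial proportions of (AB, Ab, aB, ab)."""
    p1, p2, p3, p4 = p
    # Gametic output of the initial generation, p_i^(0) = p_i -/+ r * Delta.
    q1, q2, q3, q4 = p1 - r * delta, p2 + r * delta, p3 + r * delta, p4 - r * delta
    s0 = q1 * q4 + q2 * q3
    fa, fb = p1 + p2, p1 + p3                 # frequencies of alleles A and B
    big_i = fa * (1 - fa) + fb * (1 - fb)
    lam = {"a": (1 - fa) * fb * (1 - fb), "A": fa * fb * (1 - fb),
           "B": fb * fa * (1 - fa), "b": (1 - fb) * fa * (1 - fa)}
    pair = {"a": q3 * q4, "A": q1 * q2, "B": q1 * q3, "b": q2 * q4}
    norm = (r * (n - 1) + 1 - 1 / (2 * n)) * big_i
    return {k: (r * (n - 1) * lam[k]
                + 0.5 * (1 - 1 / (2 * n)) * (2 * pair[k] + s0)) / norm
            for k in lam}

def q_a_closed_form(p, delta, r, n):
    """Q_a in the form [c + d / (2N(N-1) r + 2N - 1)] / I; requires N > 1."""
    p1, p2, p3, p4 = p
    big_d = p1 * p4 - p2 * p3
    half = 0.5 - p1 - p3
    k = 2 * n * (n - 1)
    c = (p3 + p4) * (p1 + p3) * (p2 + p4) + (2 * n - 1) * half * delta / k
    d = -(2 * n - 1) * half * (k * big_d + (2 * n - 1) * delta) / k
    big_i = (p1 + p2) * (p3 + p4) + (p1 + p3) * (p2 + p4)
    return (c + d / (k * r + 2 * n - 1)) / big_i
```

With p = (0.4, 0.1, 0.2, 0.3), Delta = 0.05 and N = 10, the state has $p_1 + p_3 > \tfrac{1}{2}$ and $2N(N-1)D + (2N-1)\Delta > 0$, so $Q_a$ should fall as r grows, which is case (iii) of the list.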
For Karlin and McGregor's model, there is no completely analogous notation available to compare the results for $Q_a$, as their initial state is expressed solely in terms of actual gamete numbers (unlike our gametic output probabilities $p_i^{(0)}$, or our initial genotype numbers). However, in a notation not too different in interpretation from ours, their result for $Q_a$ can be written as in (3.10) but with

$$2(N - 1)\,c = 2(N - 1)(p_3 + p_4)(p_1 + p_3)(p_2 + p_4) + 2(\tfrac{1}{2} - p_1 - p_3)D,$$

$$[(N - 1)/2N]\,d = -(\tfrac{1}{2} - p_1 - p_3)D,$$

and the divisor of d in (3.10) being replaced by $2(N - 1)r + 2$. The dependence of $Q_a$ on r will be qualitatively the same as in our model if, in the latter, D and $\Delta$ are of the same sign, or of opposite sign but with

$$|D| > \frac{2N - 1}{2N(N - 1)}\,|\Delta|.$$
Roughly, the conclusion in each case is that the a allele's chances of first-fixation are enhanced by looser linkage if it is (proportionately) more associated with the more common allele at the B locus than with the other, and by tighter linkage if it is more associated with the rarer allele at the B locus than with the commoner one.

The probabilities of first-fixation of the other alleles may be written down by symmetry from (3.10). Thus, for the A allele, $Q_A$ is obtained from (3.10) by interchanging $p_1$ and $p_3$, $p_2$ with $p_4$, $\Delta$ with $-\Delta$, and D with $-D$. For $Q_b$, we simply interchange $p_2$ and $p_3$ in (3.10), while for $Q_B$ we need the replacements $p_1 \to p_2$, $p_2 \to p_4$, $p_3 \to p_1$, $p_4 \to p_3$, $\Delta \to -\Delta$, $D \to -D$. The probability that the A locus is the first to fix, given that the loci do not fix simultaneously, is

$$Q_A + Q_a = (p_1 + p_3)(p_2 + p_4)\,I^{-1} = \frac{(p_1 + p_3)(p_2 + p_4)}{(p_1 + p_3)(p_2 + p_4) + (p_1 + p_2)(p_3 + p_4)},$$

and that it is the B locus is $1 - Q_A - Q_a$. These results are the same as for Karlin and McGregor's model, and confirm the obvious conclusion that the A-locus is more likely to fix first if initially the B-locus is far from fixation (B and b alleles being near to equally represented, $(p_1 + p_3)(p_2 + p_4)$ being large) and the A-locus is close to fixation (either $(p_1 + p_2)$ or $(p_3 + p_4)$ close to zero).

3.4 Probabilities for Gamete Fixation

For the present model (not, however, expressed as the Markov chain of this paper) Kimura (1963) found the probabilities of ultimate fixation of gametes (and, hence, genotypes) determined by two loci and the rate at which certain limits are approached. We here derive the same results from our Markov chain model. Because it is known that any particular gamete type will eventually either die out or take over completely, we have that $p_i^{(t)}$ will either approach 0 or 1 as $t \to \infty$. However, from (2.6) we can easily show that

$$E(p_i^{(t+1)}) = E(p_i^{(t)}) \mp r\,E(D^{(t)})$$

and

$$E(D^{(t+1)}) = (1 - 1/2N - r)\,E(D^{(t)}).$$

The joint solution of these equations is

$$E(D^{(t)}) = (1 - 1/2N - r)^t D^{(0)} = \xi^t (D - r\Delta), \text{ say},$$

where $\xi = 1 - 1/2N - r$, while

$$E(p_i^{(t)}) = p_i^{(0)} \mp \frac{2Nr}{2Nr + 1}(1 - \xi^t)D^{(0)} = p_i \mp \left(r\Delta + \frac{2Nr}{2Nr + 1}(1 - \xi^t)(D - r\Delta)\right) \to p_i \mp \frac{2Nr(D + \Delta/2N)}{2Nr + 1}, \qquad \text{"}-\text{" for } i = 1, 4,\ \text{"}+\text{" for } i = 2, 3. \tag{3.11}$$

This limit is just the probability that $p_i^{(t)} \to 1$ as $t \to \infty$, that is, the fixation probability for gamete type i (and genotype (i, i)). The rate $\xi$ at which the convergence in (3.11) holds was found by Kimura, as were the limits themselves. For instance, for the AB gamete, the fixation probability is

$$p_1 - \frac{2Nr(D + \Delta/2N)}{2Nr + 1} = (p_1 + p_2)(p_1 + p_3) + \frac{D - r\Delta}{2Nr + 1},$$

which is, in Kimura's notation, denoted by C, and obtained from his (1963) Eq. (3.1.7) by the substitutions $C_1 = p_1^{(0)} = p_1 - r\Delta$ and $S_1 = (p_1^{(0)} + p_2^{(0)})(p_1^{(0)} + p_3^{(0)}) = (p_1 + p_2)(p_1 + p_3)$. Notice that even in the unlinked case $r = \tfrac{1}{2}$, the AB fixation probability is not the product of the fixation probabilities for A and B separately, namely, $p_1 + p_2$ and $p_1 + p_3$, respectively, unless $D - r\Delta = 0$ or we let $N \to \infty$.

On the other hand, Hill and Robertson (1966) and Karlin and McGregor (1968) find, for their model, that $E(D^{(t)})$ approaches zero at a rate $\xi^t$, where $\xi = (1 - 1/2N)(1 - r)$, somewhat slower than in our model; but in both cases, looser linkage means quicker convergence (at least if $r < 1 - 1/2N$ in our model). It must be remembered, however, that this is not the rate of fixation as such, which is the slower rate $\lambda = 1 - 1/2N$. Moreover, $\xi$ is deceptive when used as a measure of the rate of approach to zero of the disequilibrium function $D^{(t)}$. For, in connection with the Eqs. 3.1-3.3 we saw that $D^{(t)}$ approaches zero in mean square at the rate $\mu$ which is slower than $\xi$.

For the probabilities of fixation, one sees from (3.11) that tighter linkage (smaller r) favours the AB and ab gametes if $D + \Delta/2N$ is positive, and favours the Ab and aB gametes if $D + \Delta/2N$ is negative. Roughly, this may be interpreted by saying that the gametes most common initially will tend to be favoured by tight linkage. The same qualitative interpretation holds for Karlin and McGregor's analogous fixation probabilities

$$p_i \mp \frac{2Nr}{2Nr - r + 1}\,D, \qquad i = 1, 2, 3, 4.$$
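The passage to the limit in (3.11) can be illustrated by iterating the two expectation recurrences directly. This is a minimal sketch, assuming hypothetical function names and taking as input the initial chromosome proportions p = (p_1, ..., p_4) and $\Delta$; it is not code from the paper. It compares the iterated means with the closed-form limit $p_i \mp 2Nr(D + \Delta/2N)/(2Nr + 1)$.

```python
SIGNS = (-1, 1, 1, -1)   # "-" for gametes AB, ab (i = 1, 4); "+" for Ab, aB (i = 2, 3)

def mean_gamete_frequencies(p, delta, r, n, generations):
    """Iterate E(p_i^(t+1)) = E(p_i^(t)) -/+ r E(D^(t)) and
    E(D^(t+1)) = (1 - 1/2N - r) E(D^(t)), starting from the gametic output p^(0)."""
    d = p[0] * p[3] - p[1] * p[2] - r * delta            # D^(0) = D - r * Delta
    mean = [pi + s * r * delta for pi, s in zip(p, SIGNS)]   # p_i^(0)
    for _ in range(generations):
        mean = [m + s * r * d for m, s in zip(mean, SIGNS)]
        d *= 1 - 1 / (2 * n) - r
    return mean

def fixation_probabilities(p, delta, r, n):
    """Limit in (3.11): p_i -/+ 2Nr (D + Delta/2N) / (2Nr + 1)."""
    d = p[0] * p[3] - p[1] * p[2]
    shift = 2 * n * r * (d + delta / (2 * n)) / (2 * n * r + 1)
    return [pi + s * shift for pi, s in zip(p, SIGNS)]
```

Because the mean disequilibrium decays geometrically at rate $1 - 1/2N - r$, a few hundred iterations are ample for the two functions to agree to machine precision at moderate r.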
4. DISCUSSION
From the mathematical point of view, we have found the eigenvalues $\lambda$, $\mu$, $\xi$, and it is possible to show from the various recurrence relations that $(1 - 1/2N - r)(1 - 1/N)$ is also an eigenvalue of the transition matrix in the model. Eigenvectors which, in state $(p_1^{(0)}, p_2^{(0)}, p_3^{(0)}, p_4^{(0)})$, take the values $T^{(0)}$ and $L^{(0)}$, are both right-eigenvectors corresponding to $\lambda$, while $D^{(0)}$ yields a right-eigenvector for $\xi$. The complexity of the eigenvalue $\mu$, however, suggests that it would be very difficult to obtain a full spectral expansion, or even all the eigenvalues, for the model.

The model has been discussed using the state description based on gametic output probabilities. It is also possible to consider a Markov chain for the number of the various gamete types actually used in forming zygotes, that is, the quantities $(y_1^{(t+1)}, y_2^{(t+1)}, y_3^{(t+1)}, y_4^{(t+1)})$ of Eq. 2.4. We plan to expand on this matter in another paper, but the fact that the zygote model may be reconstructed from this gamete-count model by randomly pairing off (without replacement) gametes suggests that it would be appropriate to call the gamete-count model a random union of gametes model, which would then be equivalent to the random mating of zygotes model presented here. The model studied by Hill and Robertson (1966), and Karlin and McGregor (1968) is also called a random union of gametes model but, as it forms $4N^2$ zygotes from only 2N gametes, it has, to the author's mind, a drawback in not being readily interpretable biologically. Nevertheless, many of the conclusions are qualitatively the same, the outstanding exception being the rate of first-fixation.

Further work, both analytical and by simulation, has been done by Hill and Robertson (1968) on their model, including the case without selection. Particularly, they have studied the behavior of $E((D^{(t)})^2)$ and of $E((D^{(t)})^2/Z^{(t)})$ when $D^{(0)} = 0$. Also, Ohta (1968), and Ohta and Kimura (1969) have been concerned with the effects of selection on some of the quantities discussed in this paper, partly using diffusion theory and partly by simulation. In particular, the latter paper gives, in equation (12), the diffusion approximation to our first-fixation eigenvalue $\mu$; their equation agrees exactly with our (3.4) after the appropriate substitutions $R = \rho$ and $X = -\tfrac{1}{2}(1 + \gamma)$, where $\rho = Nr$ is the scaled recombination fraction and $\gamma$ is the root appearing in (3.4).
ACKNOWLEDGMENTS

I thank Mrs. H. Walker for computing Tables II, III, and IV and, also, Professor S. Karlin for many helpful comments on an earlier draft. The hospitality of the universities of Sheffield and Aarhus has been much appreciated during the stage of revising an earlier draft.
REFERENCES
HILL, W. G., AND ROBERTSON, A. 1966. The effect of linkage on limits to artificial selection, Genet. Res. 8, 269-294.
HILL, W. G., AND ROBERTSON, A. 1968. Linkage disequilibrium in finite populations, Theor. Appl. Genet. 38, 226-231.
KARLIN, S. 1966. "A First Course in Stochastic Processes," Academic Press, New York.
KARLIN, S. 1968. Equilibrium behavior of population genetic models with non-random mating, Part II, Pedigrees, homozygosity and stochastic models, J. Appl. Prob. 5, 487-566.
KARLIN, S., AND MCGREGOR, J. 1965. "Direct Product Branching Processes and Related Induced Markoff Chains. I." pp. 111-145, Bernoulli, Bayes, Laplace Anniversary Volume, Springer-Verlag, New York/Berlin.
KARLIN, S., AND MCGREGOR, J. 1968. Rates and probabilities of fixation for two-locus random mating finite populations without selection, Genetics 58, 141-159.
KIMURA, M. 1963. A probability method for treating inbreeding schemes especially with linked genes, Biometrics 19, 1-17.
OHTA, T. 1968. Effect of initial linkage disequilibrium and epistasis on fixation probability in a small population, with two segregating loci, Theor. Appl. Genet. 38, 243-248.
OHTA, T., AND KIMURA, M. 1969. Linkage disequilibrium due to random genetic drift, Genet. Res., Camb. 13, 47-55.