Linkage disequilibrium in two-locus, finite, random mating models without selection or mutation

THEORETICAL POPULATION BIOLOGY 4, 259-275 (1973)

R. A. LITTLER*

Monash University, Victoria, Australia

Received May 16, 1972

Linkage disequilibrium is discussed for three two-locus, finite, random mating models in genetics, two models being Markov chains and the third a diffusion process. The expected value of the square of the disequilibrium function at any time is computed for the Markov chains. There is discussion of the relationships between the models and of the influence of finite population size on correlation between loci. It is suggested that there may be danger in assuming too simple a relationship between population size and degree of disequilibrium.

* Present address: University of Waikato, Hamilton, New Zealand.

Copyright © 1973 by Academic Press, Inc. All rights of reproduction in any form reserved.

1. INTRODUCTION

Some interest has been shown recently in the degree of linkage disequilibrium between loci (or nucleotide sites), sometimes termed "correlation between loci" or "nonrandom association of genes," which might be found in an evolving population. Investigations have, therefore, been made of the properties of the common mathematical models in this regard. The knowledge thus gained may be put to two uses. One would be to help in interpreting disequilibrium which might be measured in natural populations, e.g. as evidence for epistasis or as a guide to measuring effective population size (as suggested by Sved (1971)). The other would be in considering the implications of considerable disequilibrium if the predictions of the models turned out to be accurate, e.g. associative overdominance (Ohta and Kimura, 1970), increase in the probability of simultaneous segregation (Hill, 1968) or in the effect of linkage modifier genes (Feldman, 1972).

A recent development has been the treatment of the problem in terms of the average disequilibrium between pairs of loci or sites over a whole chromosome (see e.g. Ohta and Kimura, 1971 and Kimura, 1971). Franklin and Lewontin (1970) assert, mainly on the basis of simulation studies, that two-locus theory seriously underestimates the pairwise disequilibrium which might be found in some multilocus models. However, a good deal of the work on the degree of disequilibrium has been based on two-locus models, both deterministic and stochastic.

The evidence from deterministic random mating models is fairly straightforward. Of general interest in these models are the position, stability, and domains of attraction of equilibria under various selection patterns. It has been shown that for certain selection coefficients and values of the recombination fraction, c, there exist stable equilibria at which the disequilibrium function, D, is nonzero (e.g. Lewontin and Kojima (1960), Bodmer and Felsenstein (1967), and Karlin and Feldman (1969)).

A number of papers have been mainly concerned with linkage disequilibrium in two-locus, finite, random-mating models. We may cite Hill and Robertson (1968), Sved (1968, 1971), and Ohta and Kimura (1969a,b, 1970). A general conclusion in these papers is that "linkage disequilibrium may arise from causes other than selection, in particular finite population size" (Franklin and Lewontin, 1970). We shall be concerned with two-locus, finite, random-mating models and hope to clarify this statement. We suggest that although correlation between genes is probably common, the relationship between this and finite population size is not entirely straightforward.

2. TWO-LOCUS RANDOM-MATING MODELS

We discuss a population of N monoecious diploid individuals, each characterized by its genotype with respect to two loci, with two alleles at each locus. Each individual is formed from a pair of gametes, each of which may be of type AB, Ab, aB, or ab (types 1 to 4), and there are, thus, ten distinguishable genotypes. We shall employ two commonly used Markov chain models for random mating and specify them in the terminology of Watterson (1970b).

Model 1. The random union of gametes "haploid" model with Poisson fertility distribution. The most complete treatment of this model is found in Karlin and McGregor (1968) and Karlin (1968). It was also the model studied by Hill and Robertson (1966, 1968).

Model 2. The "random mating by independent trials" model was the model of Kimura (1963) and Watterson (1970a). An equivalent model has been studied by Villard (1970) and Serant and Villard (1972).

Each of these Markov chains may be defined on the state-space

$$S_1 = \Big\{(p_1, p_2, p_3, p_4);\ p_k = i/2N,\ i \text{ a nonnegative integer},\ \sum_k p_k = 1\Big\}.$$


Here $p_k^{(t)}$ may be, for example, the proportion of gametes of type k forming generation t. It is often useful to work in terms of the alternative variables:

$$p^{(t)} = p_1^{(t)} + p_2^{(t)}, \quad \text{proportion of A alleles in generation } t;$$

$$q^{(t)} = p_1^{(t)} + p_3^{(t)}, \quad \text{proportion of B alleles in generation } t;$$

$$D^{(t)} = p_1^{(t)} - p^{(t)}q^{(t)} = p_1^{(t)}p_4^{(t)} - p_2^{(t)}p_3^{(t)}, \quad \text{linkage disequilibrium in generation } t.$$

Our alternative state-space is

$$S_2 = \{(p, q, D);\ p = p_1 + p_2,\ q = p_1 + p_3,\ D = p_1 - pq = p_1p_4 - p_2p_3\}.$$

Note. It follows from the restrictions on the $p_k$'s that $0 \le p, q \le 1$ and

$$-pq \le D \le p(1-q) \quad \text{if } p \le q,\ p \le 1-q,$$

$$-(1-p)(1-q) \le D \le p(1-q) \quad \text{if } p \le q,\ p \ge 1-q,$$

$$-(1-p)(1-q) \le D \le (1-p)q \quad \text{if } p \ge q,\ p \ge 1-q,$$

$$-pq \le D \le (1-p)q \quad \text{if } p \ge q,\ p \le 1-q.$$
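The change of variables and the bounds in the Note can be checked numerically. The following is a minimal sketch, not part of the original analysis; the function names are hypothetical, and the four cases of the Note are collapsed into a single pair of min expressions, which is an equivalent way of writing them.

```python
def allele_frequencies_and_d(p1, p2, p3, p4):
    """Map gamete frequencies (AB, Ab, aB, ab) to the variables (p, q, D)."""
    p = p1 + p2            # frequency of allele A
    q = p1 + p3            # frequency of allele B
    d = p1 * p4 - p2 * p3  # equals p1 - p*q when the four frequencies sum to 1
    return p, q, d


def d_range(p, q):
    """Smallest and largest D compatible with allele frequencies p and q.

    The four cases of the Note reduce to:
    max(-pq, -(1-p)(1-q)) <= D <= min(p(1-q), (1-p)q).
    """
    lower = -min(p * q, (1 - p) * (1 - q))
    upper = min(p * (1 - q), (1 - p) * q)
    return lower, upper


# Example: gamete frequencies 0.4, 0.1, 0.2, 0.3 for AB, Ab, aB, ab.
p, q, d = allele_frequencies_and_d(0.4, 0.1, 0.2, 0.3)
lower, upper = d_range(p, q)
print(p, q, d, lower, upper)
```

In this example p is at most q and at most 1 - q, so the first case of the Note applies.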

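Another model has been studied in connection with linkage disequilibrium between two loci.

Model 3. The diffusion process characterized by the backward equation

$$\frac{\partial u}{\partial \tau} = \frac{1}{2}\sum_{i=1}^{3}\sum_{j=1}^{3} p_i(\delta_{ij} - p_j)\frac{\partial^2 u}{\partial p_i\,\partial p_j} - 2\gamma D\left(\frac{\partial u}{\partial p_1} - \frac{\partial u}{\partial p_2} - \frac{\partial u}{\partial p_3}\right), \qquad \gamma = Nc,\ \tau = t/2N, \tag{2.1}$$

with state-space $S_3 = \{(p_1, p_2, p_3);\ p_k \ge 0,\ \sum_{k=1}^{3} p_k \le 1\}$. In the transformed variables p, q, D, (2.1) takes the form

$$\frac{\partial u}{\partial \tau} = \frac{1}{2}p(1-p)\frac{\partial^2 u}{\partial p^2} + \frac{1}{2}q(1-q)\frac{\partial^2 u}{\partial q^2} + \frac{1}{2}\{pq(1-p)(1-q) + D(1-2p)(1-2q) - D^2\}\frac{\partial^2 u}{\partial D^2} + D\frac{\partial^2 u}{\partial p\,\partial q} + D(1-2p)\frac{\partial^2 u}{\partial p\,\partial D} + D(1-2q)\frac{\partial^2 u}{\partial q\,\partial D} - (1+2\gamma)D\frac{\partial u}{\partial D}, \tag{2.2}$$

with state-space $S_4$, say. This diffusion has been studied by Hill and Robertson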


(1966), Ohta (1968), and Ohta and Kimura (1969a). It appears to have been considered as an approximation to the Markov chain of Model 1.

Before dealing specifically with linkage disequilibrium, it may be appropriate to comment briefly on the relationship between the three models. After certain slips have been corrected in the papers of Watterson (1970a) and Karlin and McGregor (1968) (see e.g. Watterson (1970a)), it appears that the qualitative conclusions of the papers are very similar. Thus, there is broad agreement in the rates and probabilities calculated for Models 1 and 2.

We can be more specific about how Model 3 approximates Model 1. We suppose that in such a Markov chain we have

$$E\big(p_i^{(t+1)} - p_i^{(t)} \mid p_1^{(t)}, p_2^{(t)}, p_3^{(t)}\big) = \frac{1}{2N}\,b_i\big(p_1^{(t)}, p_2^{(t)}, p_3^{(t)}\big) + o\Big(\frac{1}{2N}\Big), \qquad i = 1, 2, 3,$$

$$E\big((p_i^{(t+1)} - p_i^{(t)})(p_j^{(t+1)} - p_j^{(t)}) \mid p_1^{(t)}, p_2^{(t)}, p_3^{(t)}\big) = \frac{1}{2N}\,a_{ij}\big(p_1^{(t)}, p_2^{(t)}, p_3^{(t)}\big) + o\Big(\frac{1}{2N}\Big), \qquad i, j = 1, 2, 3, \tag{2.3}$$

and that higher order central moments are $O(1/4N^2)$. Then it is commonly assumed that the diffusion process, characterized by the backward equation

$$\frac{\partial u}{\partial \tau} = \frac{1}{2}\sum_{i=1}^{3}\sum_{j=1}^{3} a_{ij}\frac{\partial^2 u}{\partial p_i\,\partial p_j} + \sum_{i=1}^{3} b_i\frac{\partial u}{\partial p_i}, \qquad \tau = t/2N, \tag{2.4}$$

and appropriate boundary conditions, approximates the Markov chain in a useful fashion. This assumption may be partly justified by the fact that for the case of general one-locus models (even without the Markov property in the original process) Watterson (1962) proved that the procedure implies convergence of the relevant (one-dimensional) distribution functions. (Trotter (1958) had proved convergence for Markov chain models with mutation but no selection.) Although a number of more recent articles have treated the general problem of the convergence of Markov chains to diffusion (see e.g. Gikhman and Skorokhod (1969), Stroock and Varadhan (1971), and Borovkov (1970)), none seem to apply to our case. It seems plausible, however, that for our case simple convergence of the one-dimensional (in time) distribution functions may be provable by considering the moments of these distributions, and the author will pursue this matter elsewhere.

Hill and Robertson (1966) verified that if $c = \gamma/N$, conditions (2.3) are satisfied for Model 1 and that the appropriate diffusion is that characterized by (2.1). In this case $p_i^{(t)}$ is the proportion of gamete i in generation t. Computation of the relevant moments for Model 2, using the moment generating function (2.6) of Watterson (1970a), shows that, if $c = \gamma/N$, then again (2.3) are satisfied


for the same functions $a_{ij}$ and $b_i$. The variable $p_i^{(t)}$ is in this case not the proportion of gamete i but the probability that an individual, chosen at random from generation t, produces gamete i for the next generation. However, we can approximate both models by the same diffusion. (In fact it is possible to show that Eqs. (2.3) are also satisfied by the gamete proportions in Model 2.)

The identity of the approximating diffusions alone would suggest that for small c and large N the two Markov chain models should agree closely. In fact one may show that rates of first and final fixation and the probabilities for final fixation based on the diffusion process are the common limiting values of those found for Models 1 and 2.

3. EXPECTED DISEQUILIBRIUM

In their discussion of Model 1, Karlin and McGregor (1968) pointed out that although $E(D^{(t)})$ tends to zero at a fairly rapid rate, faster than first fixation, $E((D^{(t)})^2)$ tends to zero at the same rate as that of first fixation. The same is true for Models 2 and 3. This suggests that we must look at $E((D^{(t)})^2)$ for an indication of the amount of disequilibrium one expects to find at some stage in the evolution of a population. It also suggests something about "typical" sample paths of the process. If we consider the sample paths in state-space $S_2$, we might expect a typical sample path to achieve first entry to the D = 0 plane fairly rapidly and, thereafter, exhibit significant fluctuations about the plane until first fixation occurs. The fluctuations would diminish as N increases. In support of this general picture is the fact that in Model 1, one finds that the transitions of greatest probability tend to be about the direction perpendicular to the D = 0 plane in $S_2$.

Hill and Robertson (1968) derived an expression for $E((D^{(t)})^2)$ in Model 1 for large N and c = 0, while Ohta and Kimura (1969a) found $E((D^{(t)})^2)$ for the diffusion model. Villard (1970) stated the form of the eigenfunction expansion for $E((D^{(t)})^2)$ in Model 2 without deriving the eigenfunctions. The largest eigenvalue in the expansion, being the asymptotic rate of first fixation, has been fully discussed by Karlin and McGregor (1968), Karlin (1968), Watterson (1970a), and Serant and Villard (1972). Note, however, that the published tables in Karlin and McGregor (1968) and Watterson (1970a) contain errors (see e.g. Watterson, 1970a). In view of this we include in Table I the corrected values and compare them with corresponding diffusion eigenvalues.

The derivation of $E((D^{(t)})^2)$ in Models 1 and 2 is straightforward and we now carry this out. In justification for this computation, we mention that

(a) our answers are not much clumsier than the diffusion result,

(b) the diffusion result may in theory be a good approximation only for $c = \gamma/N$, N large, and

TABLE I

Rate of Loss of First Allele (First Fixation): $\mu_{11}$ (Model 1), $\mu_{21}$ (Model 2), $\mu_{31}$ (Model 3)

N  Model   c=0.00   0.01     0.05     0.10     0.20     0.30     0.40     0.45     0.49     0.50     0.70     0.90     1.00
1  (1)     0.500    0.495    0.476    0.453    0.410    0.373    0.340    0.326    0.315    0.313    0.273    0.253    0.250
1  (2)     0.500    0.490    0.453    0.410    0.340    0.290    0.260    0.253    0.250    0.250    0.290    0.410    0.500
1  (3)     0.607    0.600    0.578    0.553    0.513    0.483    0.460    0.452    0.444    0.443    0.421    0.407    0.402
2  (1)     0.750    0.743    0.715    0.685    0.639    0.609    0.590    0.583    0.579    0.578    0.567    0.563    0.563
2  (2)     0.750    0.740    0.705    0.669    0.621    0.596    0.584    0.581    0.579    0.579    0.579    0.588    0.596
2  (3)     0.779    0.771    0.744    0.713    0.678    0.656    0.642    0.638    0.635    0.634    0.624    0.619    0.617
3  (1)     0.833    0.825    0.796    0.767*   0.732*   0.714*   0.714*   0.702*   0.700*   0.700*   0.696*   0.695*   0.694*
3  (2)     0.833    0.824    0.790    0.759    0.725    0.711    0.705    0.703    0.702    0.702    0.701    0.702    0.703
3  (3)     0.846    0.839    0.810    0.780    0.755    0.741    0.733    0.731    0.729    0.729    0.726    0.722    0.721
5  (1)     0.900    0.891    0.863    0.842    0.823    0.816    0.813    0.812    0.811    0.811    0.810    0.810    0.810
5  (2)     0.900    0.890    0.860    0.839    0.821    0.816    0.813    0.813    0.812    0.812    0.812    0.812    0.812
5  (3)     0.905    0.897    0.870    0.850    0.834    0.827    0.824    0.824    0.823    0.823    0.821    0.820    0.820
8  (1)     0.938    0.929    0.905*   0.891    0.883*   0.881*   0.880*   0.879*   0.879*   0.879*   0.879*   0.879*   0.879*
8  (2)     0.938    0.928    0.908    0.890    0.883    0.881    0.880    0.880    0.880    0.880    0.879    0.879    0.879
8  (3)     0.939    0.931    0.908    0.895    0.888    0.885    0.884    0.884    0.883    0.883    0.883    0.882    0.882

* These values differ from earlier published values.

[The continuation of Table I for larger N is not legible in this copy.]

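(c) our formulae could be the basis for a numerical study of what sort of approximation diffusion gives when the conditions in (b) are violated (computations of the rates of first fixation suggest it may be good).

3.1. Model 1

Consider

$$\begin{bmatrix} H_t \\ I_t \\ J_t \end{bmatrix} = \begin{bmatrix} E\big(p^{(t)}q^{(t)}(1-p^{(t)})(1-q^{(t)})\big) \\ E\big(D^{(t)}(1-2p^{(t)})(1-2q^{(t)})\big) \\ E\big((D^{(t)})^2\big) \end{bmatrix}, \qquad \text{where} \qquad \begin{bmatrix} H_0 \\ I_0 \\ J_0 \end{bmatrix} = \begin{bmatrix} p^{(0)}q^{(0)}(1-p^{(0)})(1-q^{(0)}) \\ D^{(0)}(1-2p^{(0)})(1-2q^{(0)}) \\ (D^{(0)})^2 \end{bmatrix}.$$

Now

$$\begin{bmatrix} H_{t+1} \\ I_{t+1} \\ J_{t+1} \end{bmatrix} = A_1 \begin{bmatrix} H_t \\ I_t \\ J_t \end{bmatrix}, \tag{3.1.1}$$

where

$$A_1 = \Big(1 - \frac{1}{2N}\Big) \begin{bmatrix} 1 - \dfrac{1}{2N} & \dfrac{1}{2N}\Big(1 - \dfrac{1}{2N}\Big)(1-c) & \dfrac{1}{2N^2}(1-c)^2 \\ 0 & \Big(1 - \dfrac{1}{N}\Big)^2(1-c) & \dfrac{2}{N}\Big(1 - \dfrac{1}{N}\Big)(1-c)^2 \\ \dfrac{1}{2N} & \dfrac{1}{2N}\Big(1 - \dfrac{1}{2N}\Big)(1-c) & \Big(1 - \dfrac{1}{N} + \dfrac{1}{2N^2}\Big)(1-c)^2 \end{bmatrix},$$

as stated by Hill and Robertson (1968). ((5.7) of Karlin (1968) is the same equation if we change numbers to proportions and substitute the combinations $2H_t + I_t + 2J_t$ and $H_t + I_t + 3J_t$ for the corresponding expectations.)

Using the spectral representation of $A_1$, we may write

$$\begin{bmatrix} H_t \\ I_t \\ J_t \end{bmatrix} = A_1^t \begin{bmatrix} H_0 \\ I_0 \\ J_0 \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{3} (A_{Hi}H_0 + B_{Hi}I_0 + C_{Hi}J_0)\,\lambda_i^t \\ \sum_{i=1}^{3} (A_{Ii}H_0 + B_{Ii}I_0 + C_{Ii}J_0)\,\lambda_i^t \\ \sum_{i=1}^{3} (A_{Ji}H_0 + B_{Ji}I_0 + C_{Ji}J_0)\,\lambda_i^t \end{bmatrix}, \tag{3.1.2}$$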
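The recursion (3.1.1) is simple to iterate directly, which gives $E((D^{(t)})^2)$ without any eigenvector algebra. The sketch below is illustrative only: the function names are hypothetical, and the matrix entries are our reading of $A_1$, chosen to agree with multinomial sampling of 2N gametes after recombination; they should be checked against Hill and Robertson (1968) before being relied upon.

```python
def model1_matrix(N, c):
    """Transition matrix A_1 for (H_t, I_t, J_t) in Model 1 (assumed entries)."""
    a = 1.0 - 1.0 / (2 * N)
    r = 1.0 - c
    inner = [
        [a, a * r / (2 * N), r * r / (2 * N * N)],
        [0.0, (1 - 1.0 / N) ** 2 * r, (2.0 / N) * (1 - 1.0 / N) * r * r],
        [1.0 / (2 * N), a * r / (2 * N), (1 - 1.0 / N + 1.0 / (2 * N * N)) * r * r],
    ]
    # Every entry carries the common factor (1 - 1/2N).
    return [[a * x for x in row] for row in inner]


def expected_moments(N, c, p0, q0, d0, t):
    """Return [H_t, I_t, J_t] by applying A_1 to the initial vector t times."""
    v = [p0 * q0 * (1 - p0) * (1 - q0), d0 * (1 - 2 * p0) * (1 - 2 * q0), d0 * d0]
    A = model1_matrix(N, c)
    for _ in range(t):
        v = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
    return v


# Example: disequilibrium generated by drift alone, starting from D = 0.
H, I, J = expected_moments(N=50, c=0.005, p0=0.5, q0=0.5, d0=0.0, t=25)
print(J)
```

Two simple checks are available. For N = 1 and D = 0 initially, one generation gives J = 1/64 when p = q = 1/2, which can be confirmed by enumerating the two sampled gametes; and for very large N the third component approaches the deterministic value $(1-c)^{2t}(D^{(0)})^2$.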


where $\lambda_i$ are the eigenvalues of $A_1$ (see Karlin and McGregor (1968) and also Table I) and the $(A_{Hi}, B_{Hi}, C_{Hi})$, etc., are left eigenvectors of $A_1$ corresponding to $\lambda_i$. We may determine these eigenvectors in a standard fashion and use the first generation equation to completely specify (3.1.2). We find

$$E\big((D^{(t)})^2\big) = \sum_{i=1}^{3} C_i \left\{ \frac{\big(1 - \frac{1}{2N}\big)\,p^{(0)}q^{(0)}(1-p^{(0)})(1-q^{(0)})}{2N\big[\lambda_i - \big(1 - \frac{1}{2N}\big)^2\big]} + \frac{\big(1 - \frac{1}{2N}\big)^2(1-c)\big[\lambda_i - \big(1 - \frac{1}{N}\big)\big(1 - \frac{1}{2N}\big)\big]\,D^{(0)}(1-2p^{(0)})(1-2q^{(0)})}{2N\big[\lambda_i - \big(1 - \frac{1}{2N}\big)^2\big]\big[\lambda_i - \big(1 - \frac{1}{N}\big)^2\big(1 - \frac{1}{2N}\big)(1-c)\big]} + (D^{(0)})^2 \right\} \lambda_i^t, \tag{3.1.4}$$

where, for $\{i, j, k\} = \{1, 2, 3\}$,

$$C_i = \frac{\big[\lambda_i - \big(1 - \frac{1}{2N}\big)^2\big]\big[\lambda_i - \big(1 - \frac{1}{N}\big)^2\big(1 - \frac{1}{2N}\big)(1-c)\big]}{(\lambda_i - \lambda_j)(\lambda_i - \lambda_k)}. \tag{3.1.5}$$
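The largest eigenvalue of $A_1$, the asymptotic rate of first fixation listed for Model 1 in Table I, can be approximated by power iteration. This is a minimal sketch under the same assumption about the entries of $A_1$ as above; the helper names are hypothetical, and power iteration is our choice of method here, not the one used to produce the table. For N = 1 the matrix has the closed-form leading eigenvalue $\tfrac{1}{4}(1 + (1-c)^2)$, which serves as a check.

```python
def model1_matrix(N, c):
    """Transition matrix A_1 for (H_t, I_t, J_t) in Model 1 (assumed entries)."""
    a = 1.0 - 1.0 / (2 * N)
    r = 1.0 - c
    inner = [
        [a, a * r / (2 * N), r * r / (2 * N * N)],
        [0.0, (1 - 1.0 / N) ** 2 * r, (2.0 / N) * (1 - 1.0 / N) * r * r],
        [1.0 / (2 * N), a * r / (2 * N), (1 - 1.0 / N + 1.0 / (2 * N * N)) * r * r],
    ]
    return [[a * x for x in row] for row in inner]


def largest_eigenvalue(A, iterations=2000):
    """Power iteration for the dominant eigenvalue of a nonnegative 3x3 matrix."""
    v = [1.0, 1.0, 1.0]
    lam = 0.0
    for _ in range(iterations):
        w = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
        lam = max(abs(x) for x in w)   # v has max-norm 1, so this estimates the eigenvalue
        v = [x / lam for x in w]
    return lam


for N, c in [(1, 0.5), (2, 0.0), (3, 0.10)]:
    print(N, c, round(largest_eigenvalue(model1_matrix(N, c)), 3))
```

With c = 0 the iteration should return 1 - 1/2N, the one-locus rate, for any N.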

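3.2. Model 2

Here we have

$$\begin{bmatrix} H_{t+1} \\ I_{t+1} \\ J_{t+1} \end{bmatrix} = A_2 \begin{bmatrix} H_t \\ I_t \\ J_t \end{bmatrix}, \tag{3.2.1}$$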

where l/N (1 - (1/2N))” (1 - (l/m2 (1 - u/w - 4 - 2c(l - c)] 1/2N(l - (1/2fV) - c>s

(1 - ww)2 A, = 1/2N [l - (l/2$

l/m2 (1 - (l/W) (2/N)(l - wv(1 - (l/W - 4c(2 - (3/N) + (1/2N2)) c2 + (1 - (l/q - 2c)(l - (l/N) + (1/2NZ))

1

(3.2.1) may be derived from Eqs. (3.1)-(3.3) in Watterson (1970a) and was given for an equivalent model in Villard (1970) and Serant and Villard (1972).


These papers also discuss the eigenvalues of $A_2$ and Table I lists some values for the largest, $\mu_{21}$. Repeating the procedure of Section 3.1, we find

E((D(t))2) = ; Bi UPN [l - (J/W P2i -

[(l -

+

I

(1/2N))2

(1/2N))2 (1 + 2w

(1/2N)

4N2b2i

’ DQyl

-

(l

- 241 - 41 .&(I _ p)(l _ 4)

-

i=l

-

241 -

c)

- (VW - c)”(CL‘&- WW2(1- (1/2N)c)][pcLzd-

(1 - (1/2N))2)] (1 - (1/2N))a]

1

2p(*))(l - 2p’O’) (3.2.2)

+ (D(o))2l Pli 9 and &

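$$B_1 = -\frac{\big[\mu_{21} - \big(1 - \frac{1}{N}\big)^2\big(1 - \frac{1}{2N} - c\big)\big](\mu_{22} + \mu_{23} - M)\big[\mu_{21} - \big(1 - \frac{1}{2N}\big)^2\big]}{\mu_{21}(\mu_{21} - \mu_{22})(\mu_{21} - \mu_{23})},$$

$$B_2 = -\frac{\big[\mu_{22} - \big(1 - \frac{1}{N}\big)^2\big(1 - \frac{1}{2N} - c\big)\big](\mu_{21} + \mu_{23} - M)\big[\mu_{22} - \big(1 - \frac{1}{2N}\big)^2\big]}{\mu_{22}(\mu_{22} - \mu_{21})(\mu_{22} - \mu_{23})},$$

$$B_3 = -\frac{\big[\mu_{23} - \big(1 - \frac{1}{N}\big)^2\big(1 - \frac{1}{2N} - c\big)\big](\mu_{21} + \mu_{22} - M)\big[\mu_{23} - \big(1 - \frac{1}{2N}\big)^2\big]}{\mu_{23}(\mu_{23} - \mu_{21})(\mu_{23} - \mu_{22})},$$

where $M = \big(1 - \frac{1}{2N} - c\big)\big(2 - \frac{3}{N} + \frac{3}{2N^2}\big) + \big(1 - \frac{1}{2N}\big)^2 + c^2$ and the $\mu_{2i}$'s are the eigenvalues of $A_2$. In fact, (3.2.2) is the quotient of two symmetric polynomials in $\mu_{21}$, $\mu_{22}$, and $\mu_{23}$ so that it may, in theory, be written in a form dependent only on the coefficients of the characteristic equation of $A_2$ rather than its eigenvalues.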

3.3. Remarks

1. When we mentioned that $A_2$ was derived from the results of Watterson (1970a) or Villard (1970), we glossed over the fact that while these papers deal with the same model, the first is analysed in terms of gamete probabilities and the second in terms of gamete proportions. Thus, the expression (3.2.2) applies equally to $D^{(t)}$ as a function of gamete proportions forming generation t and to $D^{(t)}$ as a function of the probabilities for the gametes produced by a randomly chosen individual in generation t. Because of the first interpretation we can compare the results of Sections 3.1 and 3.2.

2. If we assume $c = \gamma/N$, expressions (3.1.5) and (3.2.2) each converge, as $N \to \infty$, to the expression obtained by Ohta and Kimura (1969a) for Model 3. Close agreement between the equations defining eigenvalues for Models 2 and 3 was commented upon by Watterson (1970a) and a similar comment applies to Model 1. Table I lists the largest eigenvalue for each of the three models (for diffusion we list $\mu_{31} = e^{-\lambda_1/N}$, where $\lambda_1$ is the eigenvalue evaluated by Ohta and Kimura (1969a)). The agreement is good even when N is 10 and c takes values greater than those, $O(1/N)$, for which we expect such agreement.

3. Similar calculations based on (3.1.1) and (3.2.1) yield, for each Markov chain model, expressions for $E\big(p^{(t)}q^{(t)}(1-p^{(t)})(1-q^{(t)})\big)$ and $E\big((D^{(t)})^2\big)/E\big(p^{(t)}q^{(t)}(1-p^{(t)})(1-q^{(t)})\big)$, the square root of which was termed standard linkage deviation by Ohta and Kimura (1969b).
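The diffusion entries $\mu_{31}$ can be reproduced approximately from the moment equations of Model 3. The sketch below rests on assumptions that should be stated plainly: applying the operator on the right of (2.2) to $pq(1-p)(1-q)$, $D(1-2p)(1-2q)$ and $D^2$ is taken to give a closed linear system with the coefficient matrix coded below (our own calculation, in time units of 2N generations), and $\mu_{31}$ is taken to be $e^{\kappa/2N}$ with $\kappa$ the largest eigenvalue of that matrix, so that $\lambda_1 = -\kappa/2$. The function names are hypothetical.

```python
import math


def moment_generator(gamma):
    """Coefficient matrix of d/dtau (H, I, J) for the diffusion (assumed form)."""
    return [
        [-2.0, 1.0, 0.0],
        [0.0, -(5.0 + 2.0 * gamma), 4.0],
        [1.0, 1.0, -(3.0 + 4.0 * gamma)],
    ]


def char_poly(G, k):
    """det(G - k I) for a 3x3 matrix G."""
    a = [[G[i][j] - (k if i == j else 0.0) for j in range(3)] for i in range(3)]
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def leading_rate(gamma):
    """Largest eigenvalue kappa, found by bisection on the interval (-2, 0)."""
    G = moment_generator(gamma)
    lo, hi = -2.0, 0.0          # det(G - kI) is 4 at k = -2 and negative at k = 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if char_poly(G, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mu31(N, c):
    """Per-generation decay factor exp(kappa / 2N) with gamma = N c."""
    return math.exp(leading_rate(N * c) / (2.0 * N))


print(round(mu31(1, 0.0), 3), round(mu31(1, 0.05), 3), round(mu31(5, 0.0), 3))
```

At $\gamma = 0$ the characteristic polynomial factors with roots -1, -3, -6, so the sketch returns $e^{-1/2N}$, the familiar one-locus diffusion rate.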

4. THE DEGREE OF DISEQUILIBRIUM IN TWO-LOCUS MODELS AND THE BEHAVIOUR OF SAMPLE PATHS

As suggested in the introduction, a number of papers have argued for a connection between finite population size and linkage disequilibrium as well as for a considerable degree of correlation between genes in general. The evidence in these papers is based on a considerable number of models and measures of disequilibrium of which we mention only some. As already stated there are deterministic models with selection patterns leading to equilibria with nonzero disequilibrium. Ohta and Kimura (1969b) (diffusion) and Villard (1970) (Model 2) have discussed two-locus stochastic models with mutation where $E(D^2)$ for the stationary distribution may be nonzero. Ohta and Kimura (1970) also analysed an infinite sites model with mutation to find average pairwise disequilibrium. We shall leave aside these models and others and concentrate on the contribution from models such as our Models 1 to 3: two-locus models without selection or mutation.

We shall examine first the proposition that finite population size causes disequilibrium. This is equivalently a statement about typical sample paths of the processes (e.g. in state-space $S_2$). Clearly if a process starts with D = 0, $pq(1-p)(1-q) \neq 0$, the corresponding deterministic process remains there, whereas most sample paths of the stochastic processes leave the D = 0 plane immediately. Thus, it is trivial that finite population size causes disequilibrium in this situation, the only question being the degree thus generated. Figure 1 gives some idea of the degree of disequilibrium generated (as measured by $E(D^2)$) in this situation, for a case of tight linkage. It is interesting to compare this figure with the other two, especially Figure 2 with t large. Another suggestion

FIG. 1. $E((D^{(t)})^2)$ plotted against t (0 to 225) with $p^{(0)} = q^{(0)} = 0.5$, $D^{(0)} = 0$, $c = 0.005$.

FIG. 2. $E((D^{(t)})^2)$ plotted against t (0 to 225) with $p^{(0)} = q^{(0)} = 0.5$, $D^{(0)} = 0.125$, $c = 0.005$.

which seems at least implicit in some works is that if a sample path of a stochastic process attains a position where $D \neq 0$, it will approach the D = 0 plane more slowly than the deterministic process starting at the same point, the effect being most marked when Nc is small. That no such general statement can be made is evident from an inspection of Figs. 2 and 3. For example, Fig. 3 suggests that when $D^{(0)} = 0.25$, $E((D^{(t)})^2)$ decreases as N decreases for all t in the case of tight linkage. Figure 2 ($D^{(0)} = 0.125$) demonstrates the complexity of the relationship between N and $E(D^2)$. In this context we are using $E(D^2)$ as an indicator of distance from D = 0. For ease of computation we have used the formula derived by Ohta and Kimura (1969a) to plot the values, i.e., the figures represent the diffusion approximation to either Markov chain model for any particular N value.

Another measure of disequilibrium or correlation between segregating sites was suggested by Hill and Robertson (1968).

FIG. 3. $E((D^{(t)})^2)$ with $p^{(0)} = q^{(0)} = 0.5$, $D^{(0)} = 0.25$, $c = 0.005$.

Define $r = D/(pq(1-p)(1-q))^{1/2}$ (if $pq(1-p)(1-q) > 0$). Then, Hill and Robertson discussed $E\big((r^{(t)})^2 \mid p^{(t)}q^{(t)}(1-p^{(t)})(1-q^{(t)}) \neq 0\big) = E'((r^{(t)})^2)$, say. As pointed out by Wright (e.g. p. 5, 1969) the function r may be interpreted as a measure of correlation between segregating loci and $r^2$ may, therefore, take values between 0 and 1. It seems difficult to find an exact formula for $E'((r^{(t)})^2)$ for our models. Sved (1971), dealing with Model 2, produces formula (4.1), by an argument this author finds hard to follow:

$$E'\big((r^{(t)})^2\big) = \Big[1 - \frac{1}{2N}\Big]^t (1-c)^{2t}\left[(r^{(0)})^2 - \frac{1}{1 + 4Nc[(1 - c/2)/(1-c)^2]}\right] + \frac{1}{1 + 4Nc[(1 - c/2)/(1-c)^2]}. \tag{4.1}$$
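A small enumeration makes the behaviour of $r^2$ for N = 1 concrete. With only two gametes in the population, both loci can be segregating only if the gametes differ at both loci, and then $r^2 = 1$; the limiting value in (4.1), by contrast, falls below 1 as soon as c is positive. The sketch below is illustrative: the function names are hypothetical and `sved_limit` encodes our reading of the constant term of (4.1).

```python
from itertools import product


def r_squared(gametes):
    """r^2 for a list of gametes given as (a, b) with a, b in {0, 1}.

    Returns None when either locus is fixed, where r is undefined.
    """
    n = len(gametes)
    p = sum(a for a, _ in gametes) / n
    q = sum(b for _, b in gametes) / n
    p1 = sum(1 for a, b in gametes if a == 1 and b == 1) / n
    d = p1 - p * q
    y = p * q * (1 - p) * (1 - q)
    return None if y == 0 else d * d / y


def sved_limit(N, c):
    """Constant term of (4.1) as read here: 1 / (1 + 4Nc(1 - c/2)/(1 - c)^2)."""
    return 1.0 / (1.0 + 4.0 * N * c * (1.0 - c / 2.0) / (1.0 - c) ** 2)


types = [(1, 1), (1, 0), (0, 1), (0, 0)]          # AB, Ab, aB, ab
values = [r_squared(list(pair)) for pair in product(types, repeat=2)]
segregating = [v for v in values if v is not None]
print(segregating, sved_limit(1, 0.1))
```

Every segregating two-gamete population in the enumeration has $r^2$ equal to 1, so any formula for the conditional expectation must return 1 when N = 1.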

For the case N = 1, (4.1) is clearly incorrect because here we should have $E'((r^{(t)})^2) = 1$ for all t, c.

It has been suggested (e.g. Ohta and Kimura, 1969a) that $E'(r)$ may be closely approximated by the standard linkage deviation. This might be heuristically justified by the following argument. Put $D^2 = X$, $pq(1-p)(1-q) = Y$, $E'(Y) = F$. Then

$$E'(r^2) = E'\Big(\frac{X}{Y}\Big) = E'\left(\frac{X}{F}\left[1 + \frac{F - Y}{F} + \Big(\frac{F - Y}{F}\Big)^2 + \cdots\right]\right) = \frac{E(X)}{E(Y)} + \frac{E'(X)E'(Y) - E'(XY)}{(E'(Y))^2} + \cdots.$$

Thus, we might expect

$$\frac{E(D^2)}{E(pq(1-p)(1-q))}$$

to be a good approximation to $E'(r^2)$ if

$$E(D^2)\,E\big(pq(1-p)(1-q) \mid pq(1-p)(1-q) \neq 0\big) - E\big(D^2 pq(1-p)(1-q)\big)$$

is small (which is, however, difficult to verify analytically). Expressions for standard linkage deviation for our models follow immediately of course from the results of Ohta and Kimura (1969a) and Section 3.

Most emphasis has been placed on $\lim_{t \to \infty} E'((r^{(t)})^2)$. Sved (1970), using (4.1), asserted that this limit was

$$\frac{1}{1 + 4Nc[(1 - c/2)/(1-c)^2]}$$


(which is, like (4.1), incorrect for N = 1), and Ohta and Kimura (1969a) noted that the limit of the standard linkage deviation for diffusion was approximately $1/4Nc$. My computations indicate that convergence to this limit is monotonic for any initial position. However, as Sved points out, the limit has relevance for only very few sample paths of the processes: those which take longest to first fixation.

It is interesting, however, to relate $E'(r^2)$ to the sample paths of the processes. The function $pq(1-p)(1-q)$ is a measure of the distance from the projection of the path on the D = 0 plane to the boundary of the square, $pq(1-p)(1-q) = 0$ (or "distance" from first fixation). In our models $E\big(p^{(t)}q^{(t)}(1-p^{(t)})(1-q^{(t)})\big)$ tends to zero monotonically. For a particular sample path $r^2$ is, thus, a measure of the relative distance from the D = 0 plane, given the restrictions placed on it by its value of $pq(1-p)(1-q)$. Note that (e.g. in Model 1) we have

$$E\big(D^{(t+1)} - D^{(t)} \mid p_1^{(t)}, p_2^{(t)}, p_3^{(t)}\big) = \Big[\Big(1 - \frac{1}{2N}\Big)(1-c) - 1\Big] D^{(t)} \quad \text{(Karlin and McGregor (1968), Eq. (6))}$$

$$= -\Big[\frac{1 + 2\gamma}{2N} - \frac{\gamma}{2N^2}\Big] D^{(t)}, \quad \text{if } c = \gamma/N. \tag{4.2}$$

Equation (4.2) indicates the strength of the attraction to the D = 0 plane. For reasonable N values, therefore, we would expect that as $\gamma = Nc$ decreases the "attraction" to the D = 0 plane weakens so that $E'(r^2)$ should become approximately proportional to $1/Nc$, the result usually suggested by simulation studies (Hill and Robertson (1968) and Sved (1970)). However, if we try to apply the same argument to $E(D^2)$ we find that the above tendency may be confounded by the fact that as N decreases, the chance of early first fixation increases (see Fig. 3).

Incidentally, study of $E(pq(1-p)(1-q))$ and $E(D^2)$ points up the danger of relying too much on the largest eigenvalue for comparisons. The spectral expansions for these two expectations have the same eigenvalues. However, by continuity with the deterministic case, we know that as $N \to \infty$, the expressions approach

$$E\big(p^{(t)}q^{(t)}(1-p^{(t)})(1-q^{(t)})\big) = p^{(0)}q^{(0)}(1-p^{(0)})(1-q^{(0)})$$

and

$$E\big((D^{(t)})^2\big) = (1-c)^{2t}(D^{(0)})^2,$$

respectively. Thus, one remains constant while the other converges quickly to zero.

Although most interest has been shown in correlation between still segregating loci, two-locus models also exhibit lack of independence in the alleles which


fix at the two loci. For example in Model 2 the results of Watterson (1970a) imply that

$$\Pr(\text{final fixation is in } B \mid \text{final fixation is in } A) = q^{(0)} + \frac{D^{(0)}}{(1 + 2Nc)\,p^{(0)}} = \Pr(\text{final fixation is in } B) + \frac{D^{(0)}}{(1 + 2Nc)\,p^{(0)}}.$$

Once again, for any particular initial position, the lack of independence intensifies as Nc decreases.
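The dependence of this conditional fixation probability on Nc is easy to tabulate. The following sketch assumes the displayed formula reads $q^{(0)} + D^{(0)}/((1 + 2Nc)\,p^{(0)})$; the function name is hypothetical.

```python
def prob_fix_b_given_a(p0, q0, d0, N, c):
    """Pr(final fixation in B | final fixation in A), assuming the form
    q0 + D0 / ((1 + 2Nc) p0) for Model 2."""
    return q0 + d0 / ((1.0 + 2.0 * N * c) * p0)


# Complete initial association (only AB and ab gametes), N = 10.
for c in (0.0, 0.005, 0.05, 0.5):
    print(c, prob_fix_b_given_a(0.5, 0.5, 0.25, 10, c))
```

With c = 0 and complete initial association the probability is 1, as it must be when A and B are inherited together; as Nc grows it falls back towards the unconditional value $q^{(0)}$.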

ACKNOWLEDGMENTS

I am grateful to Dr. G. A. Watterson for many helpful discussions and his critical reading of the manuscript. I am also indebted to Mrs. M. Driver who carried out the computations required for the figures.

REFERENCES

BODMER, W. F. AND FELSENSTEIN, J. 1967. Linkage and selection: theoretical analysis of the deterministic two locus random mating model, Genetics 57, 237-265.
BOROVKOV, A. A. 1970. Theorems on the convergence to Markov diffusion processes, Zeitschrift für Wahrscheinlichkeitstheorie 16, 47-76.
FELDMAN, M. W. 1972. Selection for linkage modification I: Random mating populations, Theor. Pop. Biol. 3, 324-346.
FRANKLIN, I. AND LEWONTIN, R. C. 1970. Is the gene the unit of selection? Genetics 65, 707-734.
GIKHMAN, I. I. AND SKOROKHOD, A. V. 1969. "Introduction to the Theory of Random Processes," Saunders, Philadelphia, PA.
HILL, W. G. 1968. Population dynamics of linked genes in finite populations, Proc. 12th Intern. Congr. Genet., 146-147.
HILL, W. G. AND ROBERTSON, A. 1966. The effect of linkage on limits to artificial selection, Genet. Res. Camb. 8, 269-294.
HILL, W. G. AND ROBERTSON, A. 1968. Linkage disequilibrium in finite populations, Theor. Appl. Genet. 38, 226-231.
KARLIN, S. 1968. Equilibrium behaviour of population genetic models with non-random mating. Part II: Pedigrees, homozygosity and stochastic models, J. Appl. Prob. 5, 487-566.
KARLIN, S. AND FELDMAN, M. W. 1970. Linkage and selection: Two locus symmetric viability model, Theor. Pop. Biol. 1, 39-71.
KARLIN, S. AND MCGREGOR, J. 1968. Rates and probabilities of fixation for two locus random mating finite populations without selection, Genetics 58, 141-159.
KIMURA, M. 1963. A probability method for treating inbreeding systems especially with linked genes, Biometrics 19, 1-17.
KIMURA, M. 1971. Theoretical foundation of population genetics at the molecular level, Theor. Pop. Biol. 2, 174-208.
LEWONTIN, R. C. AND KOJIMA, K. 1960. The evolutionary dynamics of complex polymorphisms, Evolution 14, 458-472.
OHTA, T. 1968. Effect of initial linkage disequilibrium and epistasis on fixation probability in a small population with two segregating loci, Theor. Appl. Genet. 38, 243-248.
OHTA, T. AND KIMURA, M. 1969a. Linkage disequilibrium due to random genetic drift, Genet. Res. 13, 41-55.
OHTA, T. AND KIMURA, M. 1969b. Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutation, Genetics 63, 229-238.
OHTA, T. AND KIMURA, M. 1970. Development of associative overdominance through linkage disequilibrium in finite populations, Genet. Res. Camb. 16, 165-177.
OHTA, T. AND KIMURA, M. 1971. Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population, Genetics 68, 571-580.
SERANT, D. AND VILLARD, M. 1972. Linearization of crossing over and mutation in a finite random-mating population, Theor. Pop. Biol. 3, 249-257.
STROOCK, D. W. AND VARADHAN, S. R. S. 1971. Diffusion processes with boundary conditions, Commun. Pure Appl. Math. 24, 147-225.
SVED, J. A. 1968. The stability of linked systems of loci with a small population size, Genetics 59, 543-563.
SVED, J. A. 1971. Linkage disequilibrium and homozygosity of chromosome segments in finite populations, Theor. Pop. Biol. 2, 125-141.
TROTTER, H. F. 1958. Approximation of semi-groups of operators, Pacific J. Math. 8, 887-919.
VILLARD, M. 1970. Incidence du crossing-over sur l'évolution d'une population panmictique d'effectif limité, unpublished thesis, University of Lyon.
WATTERSON, G. A. 1962. Some theoretical aspects of diffusion theory in population genetics, Ann. Math. Stat. 33, 939-957.
WATTERSON, G. A. 1970a. The effect of linkage in a finite random-mating population, Theor. Pop. Biol. 1, 72-97. See also Errata, Theor. Pop. Biol. 3, 117.
WATTERSON, G. A. 1970b. On the equivalence of random mating and random union of gametes models in finite, monoecious populations, Theor. Pop. Biol. 1, 233-250.
WRIGHT, S. 1969. "Evolution and the Genetics of Populations. The Theory of Gene Frequencies," Vol. 2, University of Chicago Press, Chicago, IL.