THEORETICAL POPULATION BIOLOGY 11, 141-160 (1977)

Is the Most Frequent Allele the Oldest?

G. A. WATTERSON*+ AND H. A. GUESS

National Institute of Environmental Health Sciences, National Institutes of Health,
Research Triangle Park, North Carolina 27709

Received December 19, 1975

Exact and approximate expressions are obtained for the probability that the most frequent allele is oldest, in neutral allele models in which all mutations produce new alleles. The higher the mutation rate, the less likely is it that the most frequent allele would be oldest. The results are in agreement with simulation studies by Ewens and Gillespie (1974) (Theor. Popul. Biol. 6, 35-57), and limit the range of validity of a suggestion made by Crow (1972) (J. Hered. 63, 306-316) with respect to the statistical testing of the neutral allele hypothesis.

1. INTRODUCTION

For the statistical analysis of data pertaining to the neutral hypothesis for multiple alleles, Crow (1972) suggested "omitting the most frequent allele at each locus. Of course, if we assume that the most common allele is the oldest we shall sometimes be wrong, but it may be that the assumption is correct so often that this is a useful device." To check on Crow's suggestion, Ewens and Gillespie (1974) simulated the behavior of multiple allelic populations subject to various degrees of population subdivision, and various mutation rates. Denoting the effective diploid population size by Ne, and the mutation rate per gene per generation by u, they found that the probability that the most frequent allele is oldest decreased from around 0.78 when θ = 4Ne·u = 0.4 down to about 0.49 when θ = 2.0, in the case when Ne = 250 and the population was not subdivided. Subdivision appeared to have little effect. The overall conclusion of our work is the same as expressed by Ewens and Gillespie (1974): "the assumption that the oldest allele is the most frequent cannot be made with any certainty." Moreover, the other alleles may very well not all have arisen by mutation from the most frequent. Our results are upper bounds on the probability that all other alleles arose by mutation from the most frequent.

* Partially supported by NIH Grant PHS R01 GM21135 at the University of Pennsylvania.
+ Permanent address: Monash University, Clayton, Victoria 3168, Australia.

Copyright © 1977 by Academic Press, Inc. All rights of reproduction in any form reserved. ISSN 0040-5809


In the present study, we obtain exact expressions, and useful approximations and bounds, for the required probabilities in nonsubdivided populations of neutral alleles. Denote (i) by X(1), the probability that the most frequent allele in a given population is oldest, (ii) by R1, the probability that the most frequent allele in a sample of size M genes is oldest in the population, and (iii) by Q1, the probability that the most frequent allele in a sample of size M is oldest in the sample. We find for these quantities the results:

X(1) = relative frequency of the allele most frequent in the population (Theorem 1);

R1 = n1/(M + θ), where n1 = number of genes in the sample of the allelic type most frequent in the sample (Theorem 2);

Q1 = n1/M = relative sample frequency of the allele most frequent in the sample (Theorem 3).

The theorems referred to above also give similar expressions for the probabilities that a less frequent allele be oldest. A corollary of Theorem 2 asserts that the probability that the oldest population allele will not appear in a sample of size M is θ/(M + θ). The quantities X(1), R1, and Q1 are obtained assuming that the respective relative frequencies are known. Their long-term behaviors may be obtained by averaging over possible population configurations, in the case of X(1), and over the possible sample configurations in the cases of R1 and Q1. These long-term averages we here denote by E(X(1)), E(R1), and E(Q1). In Theorem 4, we find E(X(1)) in the case of a diffusion model having possibly infinitely many neutral alleles. The results would also be good approximations to large Wright-Fisher models, Moran models, and others to be referenced below. Numerical values of E(X(1)) are given in Table III, together with certain bounds and approximations. The asymptotic behavior is

E(X(1)) ~ 1 - θ(1 - θ) ln 2,   as θ → 0,
        ~ (ln θ)/θ,   as θ → ∞.

The former is a strict upper bound for all θ, with relative error less than 5% for θ ≤ 0.4. The convergence in the latter is very slow, so that E(X(1)) lies between 0.8 ln θ/θ and 0.9 ln θ/θ for all θ between 6 and 10^6. In Theorems 5 and 6, we find E(X(1)) for a finite, Moran model population; in Theorem 5 the result is conditional on a fixed number of alleles being present in the population, and in Theorem 6 without that conditioning. Numerical values are given in Tables II and I, respectively. It is seen that the diffusion results described in the previous paragraph are very good approximations (and lower bounds) for E(X(1)) in Moran's model. They are also in good agreement with the simulations of the Wright-Fisher model done by Ewens and Gillespie (1974); see Table III.

The work of Ewens (1972), Karlin and McGregor (1972), Watterson (1974), and Kirby and Ewens (1977) has shown that samples of size M from various larger, neutral allelic populations behave approximately as Moran model populations of size M genes. Thus the long-term sample quantities E(R1) and E(Q1) may be obtained from the Moran model population results, and hence Tables I and II serve that dual purpose. We do no more than mention, here, the possibility of applying the theory of this paper to other contexts. For instance, in so much as the population genetics model for neutral alleles has application in the survival of family surnames, see Yasuda et al. (1974), our results may be taken over to answer the question: What is the probability that the most frequent surname is the oldest? Similarly, the construction of phylogenetic trees from species data, and linguistics, may be other areas of application.

2. REVERSIBLE POPULATIONS WITH KNOWN ALLELE FREQUENCIES

For Moran's model, and other similarly reversible, neutral models, the probability distribution of the ages of the alleles present at a particular time is the same as the probability distribution of their extinction times (Watterson, 1976). In particular, the probability that an allele is oldest is equal to the probability that it will be the last allele (of those present) to become extinct, due to mutation and/or random drift. But with all alleles being selectively neutral, each gene has an equal chance of being that gene whose descendants survive longest in nonmutant form. Thus, the probability that a particular allele will last longest is that allele's relative frequency. We have thus proved:

THEOREM 1. In a neutral allele model which is time reversible, the probability that an allele is the oldest equals the probability that it will survive longest, namely, its present relative frequency in the population. In particular, the probability that the most frequent allele in the population is the oldest is its relative frequency.

The above theorem applies to Moran's model at stationarity, and, at least approximately, to other discrete models having a reversible diffusion approximation. The Wright-Fisher model for finite populations is not exactly reversible, but for large populations, the diffusion approximation applies. Theorem 1 has been obtained independently by F. P. Kelly (personal communication).
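Theorem 1 invites a direct numerical check (our illustration, not part of the paper): simulate Moran's model forward in time, recording at widely spaced epochs both the relative frequency of the most frequent allele and whether that allele is in fact the oldest one present; by the theorem, the two long-run averages should agree. A minimal sketch in Python, with arbitrary parameter choices:

```python
import random

random.seed(1)
M, u = 20, 0.05            # genes and mutation rate; theta = M*u/(1 - u), about 1.05
genes = [0] * M            # allele label carried by each gene
birth = {0: 0}             # allele label -> time of first appearance
next_label, t = 1, 0

hits, xsum, nsamp = 0, 0.0, 0
for step in range(600_000):
    t += 1
    parent = random.randrange(M)   # gene chosen to reproduce
    victim = random.randrange(M)   # gene chosen to die
    if random.random() < u:        # offspring is a brand-new allelic type
        genes[victim] = next_label
        birth[next_label] = t
        next_label += 1
    else:
        genes[victim] = genes[parent]
    if step > 50_000 and step % 40 == 0:
        counts = {}
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
        top = max(counts, key=counts.get)              # most frequent allele
        oldest = min(counts, key=lambda a: birth[a])   # oldest allele present
        hits += (top == oldest)
        xsum += counts[top] / M
        nsamp += 1

# Theorem 1: P(most frequent allele is oldest) = E(X_(1)); the two
# Monte Carlo averages below should agree up to sampling error.
print(hits / nsamp, xsum / nsamp)
```

With these parameters both averages come out near the Table I entry for M = 20, θ ≈ 1.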

3. POPULATIONS WITH UNKNOWN ALLELE FREQUENCIES: DIFFUSION METHODS

Consider a population with relative allele frequencies X1, X2,..., XK for the K alleles present. The probability that the allele of frequency Xj is oldest is Xj, under the conditions of Theorem 1. However, if the allele frequencies in the population are unknown, one may proceed in either of two ways. A small sample might be chosen from the population, yielding some observed allele frequencies, and the question asked as to the probability that the allele most frequent in the sample is oldest in the population. Alternatively, the population frequencies, being unknown, might be replaced by their expectations, so that, for instance, the expected value of the highest allele frequency would also be the long-term probability that the most frequent allele in the population would be the oldest. We take up both of the above aspects in the remainder of this section.

3.1. Small Sample Results from a Diffusion Population

Consider a small sample, of M genes, drawn at random from a large population of effective diploid size Ne. It is known that in certain neutral-allelic cases, the probability that the sample will contain n1 genes of one allelic type, n2 of a second type,..., and nk genes of a kth type, is approximately

Pr(n) = P(θ; n1, n2,..., nk) = [M!/(n1 n2 ⋯ nk α1! α2! ⋯ αM!)] θ^k Γ(θ)/Γ(θ + M),   (3.1.1)

where the allelic numbers are assumed arranged in nonincreasing order, where αj is the number of the n's equal to j, and where θ = 4Ne·u, in which u is the mutation rate per gene per generation. All mutants are assumed to be of new allelic type. That (3.1.1) holds as an approximation when M ≪ 2Ne, for sampling from a Wright-Fisher population, was proved by Karlin and McGregor (1972) following its motivation by diffusion results in the work of Ewens (1972). The same sampling distribution (3.1.1) was also shown to hold approximately for sampling from a Moran model population, and from a third model of Karlin and McGregor, by Watterson (1974). Kirby and Ewens (1977) have extended the range of population models for which (3.1.1) is known to apply still further. We shall assume in what follows that (3.1.1) is true for our sample. This is consistent with the diffusion model for the population, which we introduce shortly. We seek, for instance, the probability that the allele of frequency n1/M in the sample is the oldest in the population. Our discussion requires only trivial modifications to apply to alleles of lesser sample frequencies. As well as (3.1.1), we shall need the joint distribution of the sample and population compositions, and various probabilities conditional on a given population composition.
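The sampling distribution (3.1.1) is easy to evaluate directly. A small Python helper (ours; the function name and interface are of course not from the paper) computing Pr(n) for an unordered configuration of allele counts:

```python
from math import factorial
from collections import Counter

def esf_prob(counts, theta):
    """Pr(n) of (3.1.1) for allele counts n1 >= n2 >= ... >= nk summing to M."""
    M, k = sum(counts), len(counts)
    denom = 1.0
    for n in counts:
        denom *= n                      # the product n1 n2 ... nk
    for a in Counter(counts).values():
        denom *= factorial(a)           # the alpha_j! factors
    rising = 1.0                        # theta (theta+1) ... (theta+M-1) = Gamma(theta+M)/Gamma(theta)
    for i in range(M):
        rising *= theta + i
    return factorial(M) * theta**k / (denom * rising)
```

Summed over all partitions of M, the probabilities add to one; for M = 4, for instance, the five partitions [4], [3,1], [2,2], [2,1,1], [1,1,1,1] exhaust the sample space.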

Let X = (X1, X2,..., XK) denote the vector of relative frequencies in the population, and let n = (n1, n2,..., nk) denote the vector of allele numbers in the sample. First observe that the sample consists of n allelic representatives, in any nonincreasing order of magnitude in the sample without regard to order in the whole population, with probability

Pr(n | X) = [M!/(n1! n2! ⋯ nk!)] Σ_i X_{i1}^{n1} X_{i2}^{n2} ⋯ X_{ik}^{nk}.   (3.1.2)

The vector i = (i1, i2,..., ik) is a k-tuple of distinct integers chosen from the population allelic indices (1, 2,..., K). The summation is taken over all of the K!/[(K - k)! α1! ⋯ αM!] distinguishable k-tuples, where the αj's were defined in conjunction with Eq. (3.1.1). Next, note that the probability that the sample values are n, and that the most frequent sample allele (n1 in gene number) is the lth population allele is, say,

Pr(n, "1" | X) = [M!/(n1! n2! ⋯ nk!)] X_l^{n1} Σ_{i'} ∏_{j=2}^{k} X_{ij}^{nj},

where the summation is now over the possible choices of k - 1 alleles from the population to produce the sample values n2, n3,..., nk. Thus i' is of the form (i2, i3,..., ik), with none of the i's equaling l. The probability that the most frequent allele in the sample is the lth in the population, given both the sample numbers n and the population frequencies X, is, say,

Pr("1" | n, X) = Pr(n, "1" | X)/Pr(n | X).

But, according to Theorem 1, the probability that the lth population allele is oldest in the population is X_l. Hence the probability that the allele of relative frequency n1/M in the sample is the oldest in the population may be expressed as

R1(n, X) = Σ_l X_l Pr("1" | n, X).   (3.1.3)

The probability R1(n, X) is conditional on both the sample allelic numbers, n, and the population relative frequencies, X. Of course, the reason for sampling was that the X's were not known. The probability, R1(n) say, that the most frequent sample allele is oldest in the population, given only the sample values n, is

R1(n) = Σ_X R1(n, X) Pr(X | n) = Σ_X R1(n, X) Pr(n | X) Pr(X)/Pr(n)   (using Bayes' rule),

by (3.1.2), (3.1.3). The expectation is over the population distribution Pr(X), say, treating the nj's as fixed integers. In the neutral allele case there is symmetry in the population allele frequency distribution, so that all expectations in the above sum are equal. Using also (3.1.1), we find

R1(n) = [Γ(θ + M)/(θ^k Γ(θ))] [K!/((K - k)! (n1 - 1)! (n2 - 1)! ⋯ (nk - 1)!)] E(X1 ∏_{j=1}^{k} Xj^{nj}).   (3.1.4)

It remains to calculate the expectation on the right. We do so using a limiting argument, first considering a population having only a fixed, finite number K of alleles at all generations. Let the mutation rate between any two different alleles be u/(K - 1) (so that the mutation rate away from any allele to all others is u). Notice that, as K → ∞, the mutation rate towards any of the K alleles approaches zero; in effect all mutants are new in the limit. According to Wright (1969, (14.7)), the stationary population density, for such a symmetric mutation population in the diffusion case, is

φ(x1, x2,..., xK) = [Γ(Kε)/[Γ(ε)]^K] x1^{ε-1} x2^{ε-1} ⋯ xK^{ε-1},   (3.1.5)

where ε = θ/(K - 1) and where x1 + x2 + ⋯ + xK = 1, with 0 ≤ xj ≤ 1 for all j. Hence

E(X1 ∏_{j=1}^{k} Xj^{nj}) = [Γ(Kε)/[Γ(ε)]^K] ∫ ⋯ ∫ x1^{n1+ε} x2^{n2+ε-1} ⋯ xk^{nk+ε-1} x_{k+1}^{ε-1} ⋯ xK^{ε-1} dx1 ⋯ dx_{K-1},

integrating over the space {0 ≤ xj ≤ 1; j = 1,..., K - 1, and 1 - x1 - ⋯ - x_{K-1} ≥ 0}. Thus

E(X1 ∏_{j=1}^{k} Xj^{nj}) = [Γ(Kε)/Γ(Kε + M + 1)] [Γ(n1 + 1 + ε)/Γ(ε)] ∏_{j=2}^{k} [Γ(nj + ε)/Γ(ε)].

Inserting this expression into (3.1.4) and taking the limit as K → ∞, so that ε → 0, Kε → θ, and [Γ(ε)]^k = [Γ(1 + ε)/ε]^k ~ [(K - 1)/θ]^k, we find the beautifully simple result

R1(n) = n1/(M + θ).   (3.1.6)

In general we have the following theorem.

THEOREM 2. The probability that an allele of relative frequency nj/M in the sample is the oldest in the population is

Rj(n) = nj/(M + θ),   j = 1, 2,..., k.   (3.1.7)

As a consequence of Theorem 2, the probability that the oldest allele is included in the sample is (n1 + n2 + ⋯ + nk)/(M + θ) = M/(M + θ). This may also be expressed in the complementary way:

COROLLARY. The probability that the oldest allele in the population does not appear in a sample of size M is θ/(M + θ).

The corollary may be proved more directly by taking the expectation of Σ_{j=1}^{K} Xj(1 - Xj)^M using (3.1.5). Although we have assumed random sampling with replacement, the probability that the oldest allele is not in a sample of size M is not the Mth power of the probability that the oldest allele is not in a sample of size 1. A good intuitive explanation for why these quantities should be expected to be different is given in Feller (1968, pp. 122-124). It is interesting to note that if one gene is chosen from the population, the probability that it is of the oldest allelic type is the same as the expected population homozygosity, namely E(Σ Xj²) = (1 + θ)^{-1}.
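The corollary can also be checked by simulation. In this model the frequency of the oldest allele has density θ(1 - x)^{θ-1} on (0, 1), a Beta(1, θ) law; this size-biased representation is standard in the later literature on the model and is an assumption here, not a result proved in the paper. Under that density, E[(1 - X)^M] is exactly θ/(M + θ). A quick Monte Carlo confirmation:

```python
import random

random.seed(7)
theta, M, N = 1.5, 10, 200_000

# X ~ Beta(1, theta): assumed law of the oldest allele's relative frequency
miss = sum((1.0 - random.betavariate(1.0, theta)) ** M for _ in range(N)) / N

# (1 - X)^M averaged over X estimates P(oldest allele absent from a sample of M)
print(miss, theta / (M + theta))   # the two values should agree closely
```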

In Section 4 we discuss, inter alia, the long-term probability that the most frequent allele in a sample is the oldest in the population. This is the expectation of R1(n) over the sample distribution (3.1.1). In Table I, we tabulate values of

E(R1(n)) = [M/(M + θ)] E(n1/M)   (3.1.8)

for various combinations of θ and M. The values E(n1/M) are also tabulated in Table I, for a different purpose.

TABLE I

                                            θ
   M            0     0.2    0.4    0.6    0.8    1.0    2.0   10.0     ∞
   5    (i)ᵃ    1   0.866  0.765  0.687  0.623  0.571  0.403  0.114    0
        (ii)ᵇ   1   0.901  0.826  0.769  0.723  0.685  0.564  0.341  0.200
  10    (i)     1   0.873  0.778  0.704  0.645  0.595  0.434  0.139    0
        (ii)    1   0.891  0.810  0.747  0.696  0.655  0.521  0.279  0.100
  20    (i)     1   0.878  0.786  0.715  0.657  0.609  0.454  0.160    0
        (ii)    1   0.886  0.802  0.736  0.683  0.640  0.499  0.239  0.050
  40    (i)     1   0.880  0.790  0.720  0.664  0.617  0.464  0.174    0
        (ii)    1   0.884  0.798  0.731  0.677  0.632  0.487  0.218  0.025
  50    (i)     1   0.880  0.791  0.721  0.665  0.618  0.466  0.178    0
        (ii)    1   0.884  0.797  0.730  0.675  0.631  0.485  0.214  0.020
 100    (i)     1   0.881  0.793  0.723  0.667  0.621  0.471  0.186    0
        (ii)    1   0.883  0.796  0.728  0.673  0.627  0.480  0.204  0.010
 500    (i)     1   0.882  0.794  0.725  0.670  0.624  0.475  0.193    0
        (ii)    1   0.882  0.794  0.726  0.671  0.625  0.477  0.197  0.002
   ∞    (i)ᶜ    1   0.882  0.794  0.726  0.670  0.624  0.476  0.195    0
        (ii)    1   0.882  0.794  0.726  0.670  0.624  0.476  0.195    0

ᵃ (i) Probability that the most frequent allele in a sample of size M is oldest in the population, (3.1.8).
ᵇ (ii) Probability that the most frequent allele in a sample (or Moran model population) of size M is oldest in the sample (or Moran model population), (3.1.12) and (4.11).
ᶜ M = ∞ is the diffusion limit (3.2.8).
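The (ii) entries of Table I can be re-estimated by simulating the sampling distribution (3.1.1) sequentially (the Hoppe urn construction, a standard device not spelled out in the paper): when i genes have already been drawn, the next gene is a new allelic type with probability θ/(θ + i), and otherwise copies a uniformly chosen earlier gene. A sketch of ours for M = 10, θ = 1:

```python
import random

random.seed(3)

def sample_counts(M, theta):
    """One draw of allele counts from (3.1.1), built up gene by gene."""
    counts = []
    for i in range(M):
        if random.random() < theta / (theta + i):
            counts.append(1)                 # mutation: a new allelic type
        else:
            r = random.randrange(i)          # copy a uniformly chosen earlier gene
            acc = 0
            for j, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[j] += 1
                    break
    return counts

M, theta, N = 10, 1.0, 100_000
m = sum(max(sample_counts(M, theta)) for _ in range(N)) / (N * M)
print(m, M / (M + theta) * m)   # estimates of entries (ii) and (i) of Table I
```

The two printed values should be close to 0.655 and 0.595, the tabulated entries for M = 10, θ = 1.0.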

It is clear that, for fixed θ, E(R1(n)) approaches a limit, in a monotonically increasing fashion, as M → ∞. We derive that limit in (3.2.8), below. For fixed M, it is obvious that E(R1(n)) approaches 0 monotonically as θ → ∞. In addition to (3.1.8), it may be of greater practical interest to calculate the conditional probability that the most frequent allele in a sample, which contains k different alleles out of M genes, is the oldest allele in the population. This probability we denote by

R1(n | k) = [M/(M + θ)] E[(n1/M) | k].   (3.1.9)

The motivation for such a calculation is that the expression (3.1.8) is, in fact, a complicated function of θ, which in any case would not be known exactly in practice. For fairly large M, the factor M/(M + θ) is approximately 1 (assuming θ is not fairly large), and the factor E[(n1/M) | k] is not a function of θ at all. We have tabulated the factor E[(n1/M) | k], but without the coefficient M/(M + θ), in Table II. The supporting theory is given in Section 4, Theorem 5. Of course, for a particular sample with M, k, and n1 all known, but θ unknown, we could estimate R1(n) by n1/(M + θ̂), where θ̂ is, say, the maximum likelihood estimate of θ as a function of k and M. Ewens (1972) showed that θ̂ is the solution of

k = Σ_{j=0}^{M-1} θ̂/(θ̂ + j).
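The defining equation for θ̂ is monotone increasing in θ̂, so it solves readily by bisection; a sketch (ours, not from the paper):

```python
def theta_mle(k, M):
    """Solve sum_{j=0}^{M-1} theta/(theta + j) = k for theta by bisection."""
    def expected_k(t):
        return sum(t / (t + j) for j in range(M))
    lo, hi = 1e-9, 1e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_k(mid) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with k = 3 distinct alleles among M = 10 genes this gives θ̂ ≈ 1.07, and the estimate of R1(n) would then be n1/(M + θ̂).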

‘I’.-\BLE Prohabilit~-

II

that the Most Frequent Allele in a Sample (or Moran of Size .\I Is the Oldest in the Sample (or Moran Alodel Given there are k Distinct Alleles, k < .If

XIodel Population) Population),

5 I 0.720

0.514

0.400

0.200

-

-

-

-

-

-

IO I 0.772

0.617

0.500

0.412

0.342

0.280

0.228

0.200

0.100

-

0.100

20 I 0.81 I

0.677

0.576

0.498

0.434

0.382

40 I 0.840

0.722

0.631

0.559

0.500

0.451

0.339 0.410

0.295 0.374

0.269 0.344

0.050 0.170

0.050 0.025

50 I 0.847

0.734

0.646

0.576

0.518

0.685

0.619

0.565

0.470 0.519

0.429 0.480

0.394 0.446

0.363 0.416

0.190

0.765

0.020 0.010

100 71

I 0.867 I

I

I

I

I

I

I

I

I

0.200

1

Ewens' table of θ̂, and our Tables I and II, suggest that E(n1/M | k) is close to E(n1/M) when θ̂ is used in place of θ in the latter. Possibly E(n1/M | k) is overestimated by that procedure, however. We do not know whether 1/(M + θ) is under- or overestimated by 1/(M + θ̂); there is no unbiased estimator of 1/(M + θ) (see Ewens, 1972).

We now change the question of interest slightly, by asking: What is the probability that an allele of relative frequency nj/M in the sample is the oldest amongst those alleles in the sample? If the k alleles in the sample have relative frequencies X1, X2,..., Xk in the population, then the probability that the allele of frequency Xj is the oldest amongst these alleles is Xj/(X1 + X2 + ⋯ + Xk) rather than just Xj. The analysis proceeds with slightly greater difficulty than before, and yields the following result.


THEOREM 3. The probability that an allele of relative frequency nj/M in the sample is oldest in the sample is

Qj(n) = nj/M,   j = 1, 2,..., k.   (3.1.11)

Observe that now Σ_{j=1}^{k} Qj(n) = 1, indicating the trivial, but nonetheless reassuring, fact that one of the alleles in the sample must be the oldest in the sample. We give, in Table I, the long-term average probability

E(Q1(n)) = E(n1/M)   (3.1.12)

that the most frequent allele in the sample is also the oldest in the sample. The supporting theory is in Section 4. Note that Table II also may be used in this context, for the long-term probability conditional on samples having k distinct alleles.

3.2. Average Population Results by Diffusion

We now turn to finding the expected value of the relative allele frequency in the population, for the allele of highest frequency in the population. This is also the probability that the most frequent allele in the population is also the oldest allele, and we shall denote it by E(X(1)), using the convention that X(1) denotes the largest of the population relative frequencies X1, X2, .... By symmetry in the neutral allelic case, if we have a diffusion population whose stationary probability density is (3.1.5) for its K alleles' frequencies, then X(1) is equally likely to equal any of X1, X2,..., XK. For definiteness, we write the integral for E(X(1)) as if it were the variate XK which was the largest. Thus, for fixed K,

E(X(1)) = K ∫ ⋯ ∫ xK φ(x1, x2,..., xK) dx1 ⋯ dx_{K-1},

where the integral is over the region

0 ≤ xj ≤ xK = 1 - x1 - x2 - ⋯ - x_{K-1} ≤ 1,   for j = 1, 2,..., K - 1.

The density φ is given in (3.1.5), with ε = θ/(K - 1) as before, before we go to the limit as K → ∞. Hence

E(X(1)) = [KΓ(Kε)/[Γ(ε)]^K] ∫ ⋯ ∫ x1^{ε-1} ⋯ x_{K-1}^{ε-1} xK^{ε} dx1 ⋯ dx_{K-1}.

The substitutions

yj = xj/(1 - x1 - x2 - ⋯ - x_{K-1}),   j = 1, 2,..., K - 1,

are such that 0 ≤ yj ≤ 1, and the Jacobian may be proved to be

∂(x1,..., x_{K-1})/∂(y1,..., y_{K-1}) = (1 + y1 + ⋯ + y_{K-1})^{-K}.

The integral may then be written

E(X(1)) = [KΓ(Kε)/[Γ(ε)]^K] ∫ ⋯ ∫ y1^{ε-1} ⋯ y_{K-1}^{ε-1} (1 + y1 + ⋯ + y_{K-1})^{-(Kε+1)} dy1 ⋯ dy_{K-1},

where now the integration is over the much simpler region

0 ≤ yj ≤ 1,   j = 1, 2,..., K - 1.

To continue with the integral's evaluation, we introduce a normalizing factor ε^{K-1}; the function

∏_{j=1}^{K-1} ε yj^{ε-1},   0 < yj < 1,   (3.2.1)

is the joint density of random variables Y1, Y2,..., Y_{K-1}, say, which are independent and identically distributed, each having density ε y^{ε-1} on (0, 1). Thus

E(X(1)) = {KΓ(Kε)/([Γ(ε)]^K ε^{K-1})} E[1/(Z_{K-1} + 1)^{Kε+1}],   (3.2.2)

where Z_{K-1} = Y1 + Y2 + ⋯ + Y_{K-1}, and the expectation on the right is to be evaluated using (3.2.1) as the appropriate density. Equivalently, we could complete the integration by finding the density of Z_{K-1}. It turns out that the Laplace transform of that density is more tractable, especially in the limit as K → ∞. By the independence of the Y's,

E(e^{-tZ_{K-1}}) = [E(e^{-tY})]^{K-1} = [1 + ε ∫₀¹ (e^{-ty} - 1) y^{ε-1} dy]^{K-1} → exp[θ ∫₀¹ (e^{-ty} - 1) y^{-1} dy],

the last expression arising after recalling that ε = θ/(K - 1), and that (1 + a/(K - 1))^{K-1} → exp a as K → ∞. Letting K → ∞ on the right of (3.2.2), and continuing to denote the limiting value by E(X(1)), we find

E(X(1)) = e^{γθ} Γ(θ + 1) E[1/(Z + 1)^{θ+1}].   (3.2.3)

The significance of the various terms in (3.2.3) is explained below. In (3.2.3), the variable Z has a distribution equal to the limiting distribution of Z_{K-1} as K → ∞, that is, a distribution with Laplace transform

E(e^{-tZ}) = exp[θ ∫₀¹ (e^{-ty} - 1) y^{-1} dy].   (3.2.4)

The other aspects in the passage from (3.2.2) to (3.2.3) depend on the limits, as K → ∞,

Γ(Kε) = Γ(Kθ/(K - 1)) → Γ(θ) = Γ(θ + 1)/θ

and

[Γ(ε)]^K ε^{K-1} = [Γ(1 + ε)]^K/ε ~ [1 - γε]^K/ε → e^{-θγ}/ε,

where -Γ'(1) = γ = 0.57721... is Euler's constant and ε = θ/(K - 1). For subsequent discussion, it is valuable to introduce the exponential integral function, defined by

E1(t) = ∫_t^∞ (e^{-x}/x) dx = -γ - ln t - Σ_{n=1}^{∞} (-t)^n/(n·n!) = -γ - ln t - ∫₀¹ (e^{-ty} - 1) y^{-1} dy;   (3.2.5)

see Abramowitz and Stegun (1968, (5.1.1) and (5.1.11)). Then, our Laplace transform (3.2.4) may be written

E(e^{-tZ}) = exp[-θγ - θ ln t - θE1(t)].   (3.2.6)

It remains to evaluate the expectation in (3.2.3). From the definition of the gamma function, the substitution x = (1 + z)t produces the formula

Γ(θ + 1)/(1 + z)^{θ+1} = ∫₀^∞ t^θ e^{-t(1+z)} dt.

Substituting the above expression into (3.2.3) and using (3.2.6) gives us

E(X(1)) = e^{γθ} ∫₀^∞ t^θ e^{-t} E(e^{-tZ}) dt = ∫₀^∞ e^{-t} e^{-θE1(t)} dt.   (3.2.7)

A change of variable, v = E1(t), so that t = E1^{-1}(v) and dv = -(e^{-t}/t) dt, yields an alternative form to (3.2.7), which we quote in the form of a theorem.

THEOREM 4. For a diffusion model having infinitely many neutral alleles, the probability that the most frequent allele is oldest equals the allele's expected relative frequency, which is

E(X(1)) = ∫₀^∞ E1^{-1}(v) e^{-θv} dv.   (3.2.8)
In Table III, we list values of E(X (i) ) computed numerically from (3.2.7) using, in effect, the series (3.2.5) as part of the computation. We have made some remarks on the numerical values in Section 1. T-ABLE Probability

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 2.0 5.0 10.0 20.0 50.0 100.0 1000.0

that

I 0.933 0.871 0.812 0.758 0.707 0.660 0.616 0.574 0.536 0.500 0.250 0.031 0.001 0.000 0.000 0.000 0.000

the &Iost

Frequent

1 0.908 0.828 0.759 0.698 0.644 0.596 0.552 0.513 0.478 0.445 0.235 0.046 0.004 0.000 0.000 0.000 0.000

a E(Xt,,) as in (3.2.7), together C: Evens and Gillespie simulation E: asymptotic value In 0:B for 1 for 0 ... I.

Xllele

0.771 0.754 0.687 0.687 0.494

III in a Diffusion

1 0.936 0.882 0.835 0.794 0.758 0.726 0.697 0.670 0.646 0.624 0.416 0.297 0.195 0.122 0.063 0.037 0.006

Population

is Oldest”

1 0.938 0.889 0.854 0.834 0.827

1 0.999 0.992 0.977 0.953 0.917

E 0.347 0.322 0.230 0.150 0.078 0.046 0.007

1.373 0.924 0.846 0.817 0.806 0.803 0.813

with A: lower bound (&)e; B: lower bound (3.2.11); mean; D: upper bound (3.2.12), for 0 Q 19 < 0.5; 6; F: ratios E(X(,))/D for 0 Q 8 < 0.5, E(Xt,j)/[ln 0/e]

154

WATTERSON

.AND

GUESS

It may be verified that the inverse function E;‘(o) is a probability density function on (0, a) so that our result (3.2.8) is the Lailace transform of a probability density function. An asymptotic approximation to E;‘(r) for small c is E$(a) - --In @ - In( --In z) as n - 0, so that, because of an Abelian theorem connecting values of a density near 0 with values of its Laplace transform near co (see Feller, 1966, pp. 421-423) E(X(,j)

-

In tJj0

as

0-

cc.

(3.2.9)

In Table III, we show values of In B/8, in column E, for comparison with E(X(,,). For 0 above about 3.7, the true value E(Xt,,) is somewhat below In e/e. 1Ve also show column F, the ratio of the two expressions E(Xc,,),/[ln e/e], which converges to 1 as 04 x. However, the convergence is very slow; in fact from more extensive calculations we can say that for values of P between 6 and 10s, E(Xc,,) lies between 0.8 In e/0 and 0.9 In e/0. The ratio (F) has a minimum for 0 close to 100. The expectation E(Xt,,) can be evaluated somewhat differently. It has been pointed out to us by TV. J. Ewens that the density of X(i) has a very simple form in the range (4, 1). By carrying through the above procedure but modified to find the marginal density, f(r) say, of X(i) rather than its mean, we find f(x)

=

@+(l

-

i
+-,

(l-(8 + 1) e%Ye-2g(x-l

-

l),

(3.2.10)

whereg(z) is the density of Z, whose Laplace transform is (3.2.4). The expectation (3.2.3) may thus be written as either

E(X(,,) = esyT(O- 1) I= g(z)[l 0

(z f

dz,

l)]“”

or as

= (4)” + eevr(e L 1) j, We see that there W. J. Ewens)

is therefore

g(z)[ 1.i(~ : a simple

E(&,)

z = x-1 -

l)]@-1 &,

lower

bound

(as pointed

1. out by

2 (t)“.

This bound is given in Table III, column A. Another lower bound, which is slightly better than (3)” for large 0, is obtained by applying Jensen’s inequality to (3.2.3), noting that E(Z) = 0 from (3.2.6). We find E(LY(,,) > eevqe + I)[1 (0 + 1)“1’],

- (2~)~ 2 e-WQ-?( The bound

is given in column

1 T ,9)l ‘f,

B of table III.

all

0 > 0 as

e --f

x.

(3.2.11)

THE

OLDEST

155

ALLELE

An upper bound for E(Xa,), and an asymptotic approximation as 6’ --f 0, may be obtained by taking its quadratic Taylor series expansion, using (3.2.7). We obtain all B > 0, (3.2.12) E(X(,j) < 1 - 0(1 - 0) In 2, which is most useful for 0 < ~9 -< $ when it is monotonic decreasing. &Xc,)) is always monotonic decreasing. The bound, for 0 < 0 < +, is column D of Table III, and the ratio of the two sides of (3.2.12) is in column F. \-ery good bounds on E(Xo,) are given by (3.1.8) and (3.1.12); see Table I. But these are harder to compute than E(Xo,) itself. We do not have an explicit expression for g, the density corresponding to (3.2.4) nor therefore forfover its lolver half-domain in (3.2.10). It can be shown from (3.2.4) that Z is asymptotically normal, mean 8 and variance e/2, as 0 -j x, however, and that g is infinitely divisible for all 0.

4.

POPULATIONS

WITH

UNKXOWN

ALLELE

FREQUENCIES:

~'IORM'S

MODEL

The following theory applies both to Moran’s population model, and for small samples chosen from Moran’s and other population models. We shall first present the theory in the context of AIoran’s population model. Consider a population of ;M(= 2N) genes. In Moran’s model, at each unit of time, one gene is chosen at random to produce an offspring gene, and one gene is chosen at random to die. Assume a mutation rate u, for the probability that the offspring is not of its parent gene’s type, but rather, is of a completely new allelic type. The population may be described either by the number of genes belonging to the various allelic types present at the time (similar to Section 3.1) or, as we shall prefer here, by indicating the number of alleles having various gene frequencies. We denote by pj the number of alleles having i representative genes in the population. Then the total number of genes is, of course,

(4-l) and the number

of distinct

alleles present is the random

quantity

(4.2) say. It was conjectured by Patterson (1974) and proved that for Moran’s model at stationarity, the probability

(,B,, A ,...,Ad is

by Trajstman distribution

(1974) of 13 =

156

WATTERSON

AND

GUESS

and of K is Pr(K

= k) = Ok 1 SE

l;(e),

,

k = 1, 2 ,...) M,

(4.4)

where (e),

= qe + 1) *.. (0 + M -

1) = (@ L ;t -

1) M!,

6, = MU/(1

-

U), (4.5)

and where Sg’ is a Stirling number of the first kind. The conditional distribution corresponding to (4.3), but with Pr(P 1 K = k) = M!/[I

K fixed, is

SE’ 1 fJ (~BJP~!)],

(4.6)

subject to (4.1) and (4.2). It is our present task to mimic Theorem 4, and find the probability that the most frequent allele in Moran’s model is oldest. Of course, Theorem 1 applies directly. But to average over all possible configurations in Moran’s model, we proceed to calculate E(X(,)), where X(r) is the relative frequency of the most frequent allele: Xcl) = max(j;‘M: Now the distribution Pr(X(,,


function

flj > O}.

of X(r) , conditional

( K = k) = Pr(pjel

q

on K = k, is

= pjep z= -0. = /3M = 0 j K == k)

where the summation is over all p such that (4.1) and (4.2) are true (with K -= k) and also p+r = pjWe = ..’ = p, = 0. We can evaluate this sum by using a generating function formula from Abramowitz and Stegun (1968, p. 823); in our notation it is

Choosing Pr(X(,,

,x’, = 1 for n = 1, 2 ,..., j and xfl = 0 for n = j + I,..., M, we find &j/M

1K = k) = [,lf!J

To continue the evaluation for j, k = 1, 2 ,..., M

S$’ 11 coefficient

of E(Xu,),

of t” in (g,

we introduce

bj.k.M = coefficient of tM in (ir

t” n)l/k!.

the simplifying

t”/n)L/k!

(4.7)

notation

(4.8)

THE

OLDEST

157

ALLELE

and h,P,.U = 0. From (4.7) we can now write the distribution Pr(S(,)

=jjN

1 K = k) = [-III!‘1 S$

of X(i)

]](6j,t,,W -

, given K = 6, as

6j-1,k,AW),

j, k = 1, 2 ,..., Al. (4.9)

This leads to the following theorem.

THEOREM 5. For an infinitely many neutral alleles Moran model, the long-term probability that the most frequent allele in a population is oldest, given that the population contains $K = k$ alleles, is

$$E(X_{(1)} \mid K = k) = \left[(M-1)! \big/ \left|S_M^{(k)}\right|\right] \sum_{j=1}^{M} j\,(b_{j,k,M} - b_{j-1,k,M}), \qquad k = 1, 2, \ldots, M. \tag{4.10}$$

The expectation and probability (4.10) is tabulated in Table II. The computation of (4.10) may be achieved using the standard recurrence relation for the Stirling numbers (in absolute value),

$$\left|S_{M+1}^{(k)}\right| = \left|S_M^{(k-1)}\right| + M\left|S_M^{(k)}\right|,$$

and, for the $b_{j,k,M}$, we have from (4.8) the recurrence

$$b_{j,k,M} = \frac{1}{M} \sum_{n=1}^{j} b_{j,k-1,M-n}, \qquad k = 2, 3, \ldots,$$

subject to the initial values

$$b_{j,1,M} = 0 \quad \text{for } j < M, \qquad b_{j,1,M} = 1/M \quad \text{for } j \ge M > 0,$$

with the convention $b_{j,k,M'} = 0$ for $M' \le 0$ and $k \ge 1$.

We see from Table II that the probability increases for increasing $M$, fixed $k$. This corresponds to a population with a decreasing mutation parameter $\theta$, since $E(K) \approx 1 + \theta \ln M$ as $M \to \infty$ (see Ewens, 1972; Watterson, 1974). An explicit formula for (4.10), and its limit, is possible when $k$ is small. Thus

$$E(X_{(1)} \mid K = 1) = 1,$$

and, for $M$ even,

$$E(X_{(1)} \mid K = 2) = \left[\sum_{j=1}^{M/2-1} j^{-1} + M^{-1}\right] \Big/ \left[\sum_{j=1}^{M-1} j^{-1}\right] \to 1 - \frac{\ln 2}{\ln M + \gamma} \qquad \text{as } M \to \infty,$$

where $\gamma$ is Euler's constant.


In the other direction, putting $K = M$ in (4.10) and Table II means that all the population genes are of different allelic types, each being of relative frequency $1/M$. Hence $E(X_{(1)} \mid K = M) = 1/M$. Theorem 5 is appropriate to use when the number of alleles is known but not their frequencies. The fact that (4.10) is independent of $\theta$ is a consequence of the fact that $K$ is a sufficient statistic for estimating $\theta$ (see Ewens, 1972). When the reverse is the case, so that $\theta$ is known but $K$ is not, we should average the result (4.10) with respect to the distribution (4.4). We obtain the result quoted in Theorem 6.

THEOREM 6. For an infinitely many neutral alleles Moran model, the long-term probability that the most frequent allele in the population is oldest is

$$E(X_{(1)}) = \left[(M-1)! \big/ (\theta)_M\right] \sum_{j=1}^{M} j\,(b_{j,M} - b_{j-1,M}), \tag{4.11}$$

where

$$b_{j,M} = \text{coefficient of } t^M \text{ in } \exp\left(\theta \sum_{n=1}^{j} \frac{t^n}{n}\right), \qquad j = 1, 2, \ldots,$$

with $b_{0,M} = 0$ and $b_{j,M} = \binom{\theta+M-1}{M}$ for $j \ge M$.

Notice that now the coefficients $b_{j,M}$ are functions of $\theta$. They may be calculated recursively by

$$b_{j,M} = \sum_{l=0}^{\lfloor M/j \rfloor} (\theta/j)^l\, b_{j-1,M-jl} \big/ l!, \qquad j = 2, 3, \ldots,$$

subject to the initial values $b_{1,M} = \theta^M/M!$.
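The recurrence is easily checked numerically. The sketch below (our code; the paper reports only tabulated values) evaluates (4.11) by iterating the recurrence with exact rational arithmetic, and confirms both $b_{M,M} = \binom{\theta+M-1}{M} = (\theta)_M/M!$ and, for $M = 2$, the closed form $E(X_{(1)}) = 1 - \theta/(2(\theta+1))$:

```python
from fractions import Fraction
from math import factorial

def b_row(M, theta):
    """b_{j,M} for j = 0..M, via the recurrence
    b_{j,M} = sum_l (theta/j)^l b_{j-1,M-jl} / l!, with b_{1,m} = theta^m/m!."""
    theta = Fraction(theta)
    prev = [theta ** m / factorial(m) for m in range(M + 1)]   # row j = 1
    out = [Fraction(0), prev[M]]                               # b_{0,M} = 0
    for j in range(2, M + 1):
        cur = [sum((theta / j) ** l * prev[m - j * l] / factorial(l)
                   for l in range(m // j + 1)) for m in range(M + 1)]
        out.append(cur[M])
        prev = cur
    return out

def expectation(M, theta):
    """Theorem 6, Eq. (4.11): long-term probability that the most frequent
    allele in the Moran population is oldest, for mutation parameter theta."""
    theta = Fraction(theta)
    bs = b_row(M, theta)
    rising = Fraction(1)
    for i in range(M):                                         # (theta)_M
        rising *= theta + i
    total = sum(j * (bs[j] - bs[j - 1]) for j in range(1, M + 1))
    return factorial(M - 1) * total / rising

theta, M = Fraction(3, 2), 5
rising = Fraction(1)
for i in range(M):
    rising *= theta + i
assert b_row(M, theta)[M] == rising / factorial(M)             # b_{M,M} = (theta)_M / M!
assert expectation(2, theta) == 1 - theta / (2 * (theta + 1))  # M = 2 closed form
```

Fractional $\theta$ values work because every operation stays inside `Fraction` arithmetic.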

Numerical values of (4.11) are given as results (ii) in Table I. As $M$ increases for fixed $\theta$, the probability decreases monotonically to the diffusion limit (3.2.8). The latter may be seen to be a very good approximation even for quite low values of $M$, at least for $\theta$ between 0 and 10. Keeping $M$ fixed but letting $\theta \to \infty$ has the effect of making all genes new mutants, so that all genes would be different alleles and $X_{(1)} = 1/M$. So $E(X_{(1)}) \to 1/M$ as $\theta \to \infty$. With $M$ fixed and $\theta \to 0$, we have $E(X_{(1)}) \to 1$.


Some special cases of (4.11), which may be of use where a small sample of size $M$ is taken from a large population (see next paragraph), are

$$E(X_{(1)}) = 1 \qquad \text{for } M = 1,$$

$$= 1 - \frac{\theta}{2(\theta+1)} \qquad \text{for } M = 2,$$

$$= 1 - \frac{\theta(3 + 2\theta)}{3(\theta+1)(\theta+2)} \qquad \text{for } M = 3,$$

$$= 1 - \frac{\theta(14 + 12\theta + 3\theta^2)}{4(\theta+1)(\theta+2)(\theta+3)} \qquad \text{for } M = 4,$$

$$= 1 - \frac{\theta(70 + 85\theta + 30\theta^2 + 4\theta^3)}{5(\theta+1)(\theta+2)(\theta+3)(\theta+4)} \qquad \text{for } M = 5.$$
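These rational functions can be checked mechanically against (4.11). The sketch below (our code, not the paper's) extracts each $b_{j,M}$ by multiplying the truncated series factors $\exp(\theta t^n/n)$ for $n = 1, \ldots, j$, and reproduces every displayed formula exactly at an arbitrary rational $\theta$:

```python
from fractions import Fraction
from math import factorial

def expectation(M, theta):
    """E(X_(1)) per (4.11), with b_{j,M} = [t^M] exp(theta sum_{n<=j} t^n/n)
    obtained by multiplying truncated series factors exp(theta t^n / n)."""
    theta = Fraction(theta)
    b = [Fraction(0)] * (M + 1)                   # b[j] = b_{j,M}; b_{0,M} = 0
    poly = [Fraction(1)] + [Fraction(0)] * M      # running product, initially exp(0) = 1
    for n in range(1, M + 1):
        fac = [(theta / n) ** l / factorial(l) for l in range(M // n + 1)]
        poly = [sum(fac[l] * poly[m - n * l] for l in range(m // n + 1))
                for m in range(M + 1)]
        b[n] = poly[M]
    rising = Fraction(1)
    for i in range(M):                            # (theta)_M
        rising *= theta + i
    return factorial(M - 1) * sum(j * (b[j] - b[j - 1]) for j in range(1, M + 1)) / rising

t = Fraction(7, 3)
assert expectation(1, t) == 1
assert expectation(2, t) == 1 - t / (2 * (t + 1))
assert expectation(3, t) == 1 - t * (3 + 2 * t) / (3 * (t + 1) * (t + 2))
assert expectation(4, t) == 1 - t * (14 + 12 * t + 3 * t**2) / (4 * (t + 1) * (t + 2) * (t + 3))
assert expectation(5, t) == 1 - t * (70 + 85 * t + 30 * t**2 + 4 * t**3) / (5 * (t + 1) * (t + 2) * (t + 3) * (t + 4))
```

Since two rational functions of degree at most 5 that agree at enough rational points agree identically, a handful of such spot checks effectively verifies the polynomial coefficients.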

It remains to discuss the connection between the above results for Moran's population and the sampling theory results of Section 3.1. The distribution (4.3) is equivalent to the sampling distribution (3.1.1), provided we take $M$ as the sample size, identify $\alpha_j$ with $\beta_j$, and treat $\theta$ as the population parameter $4N_e u$ (or its limit as $N \to \infty$) rather than as in (4.5). Likewise, we may identify $n_{(1)}/M$ with the quantity $X_{(1)}$ of this section. Hence we may interpret the results of Theorems 5 and 6, and Tables I and II, both as sampling results and as Moran model population results. This duality has already been exploited with reference to (3.1.8), (3.1.9), and (3.1.12).

ACKNOWLEDGMENTS

We thank J. Gillespie, C. Langley, and W. Ewens for helpful suggestions in the preparation of this paper.

REFERENCES

Abramowitz, M., and Stegun, I. A. 1968. "Handbook of Mathematical Functions," National Bureau of Standards, Washington.

Crow, J. F. 1972. The dilemma of nearly neutral mutations: how important are they for evolution and human welfare? J. Hered. 63, 306-316.

Ewens, W. J. 1972. The sampling theory of selectively neutral alleles, Theor. Popul. Biol. 3, 87-112.

Ewens, W. J., and Gillespie, J. H. 1974. Some simulation results for the neutral allele model, with interpretations, Theor. Popul. Biol. 6, 35-57.

Feller, W. 1966. "An Introduction to Probability Theory and Its Applications," Vol. II, Wiley, New York.

Feller, W. 1968. "An Introduction to Probability Theory and Its Applications," Vol. I, 3rd ed., Wiley, New York.

Kirby, K., and Ewens, W. J. 1977. To appear.

Trajstman, A. C. 1974. On a conjecture of G. A. Watterson, Adv. Appl. Probability 6, 489-493.

Watterson, G. A. 1974. The sampling theory of selectively neutral alleles, Adv. Appl. Probability 6, 463-488.

Watterson, G. A. 1976. Reversibility and the age of an allele. I. Moran's infinitely many neutral alleles model, Theor. Popul. Biol. 10, 239-253.

Wright, S. 1969. "Evolution and the Genetics of Populations, Vol. 2, The Theory of Gene Frequencies," Univ. of Chicago Press, Chicago.

Yasuda, N., Cavalli-Sforza, L. L., Skolnick, M., and Moroni, A. 1974. The evolution of surnames: an analysis of their distribution and extinction, Theor. Popul. Biol. 5, 123-142.