J. theor. Biol. (1987) 128, 247-252
On the Equality of Origin and Fixation Times in Genetics
J. MAYNARD SMITH
School of Biological Sciences, University of Sussex, Falmer, Brighton, U.K.
(Received 26 February 1987, and in revised form 27 April 1987)

For any asexually reproducing entity (e.g. asexual organism, mitochondrion or gene), the fixation time is the number of generations in the future before all entities are descended from a single one in the present. The origin time is the number of generations in the past when a single entity was ancestral to all entities in the present. It is proved that, over a sufficiently long time, the means and distributions of the fixation and origin times are identical. The proof holds even if population size varies, selection is acting, or conditions change with time.
1. Introduction

Consider a population of asexually reproducing entities, in which each individual has a single parent, but may have any number of offspring. These may be asexual organisms, or mitochondria, or, if we ignore intragenic recombination, genes. For simplicity, assume that generations are separate; that is, all entities reproduce at the same time, and then die. Generations can then be numbered ..., G_0, G_1, G_2, ..., G_i, ....

The fixation time is the minimum number of generations, n, in the future before all entities in G_n are descended from a single entity in G_0. The origin time is the minimum number of generations, m, in the past before all entities in G_0 were descended from a single entity in G_{-m}. This paper proves that, over a sufficiently long time, these two times have identical means and distributions.

This statement, although true, is not obviously true. Thus imagine (Fig. 1) a single gene in G_0 that is destined to be the ancestor of all genes in G_n. Suppose that in generation G_k, where 0 < k < n, there is only one descendant of this gene. Then the fixation time in G_0 is n generations, but the origin time in G_n is n - k generations.

The origin and fixation times are known to be the same for stochastic reproduction in a population of constant size (Ewens, 1979). The proof given here makes no
FIG. 1. An entity that is present in one copy in generations 0 and k, and is fixed in generation n. (Axes: copy number against generation.)
0022-5193/87/180247 + 06 $03.00/0
© 1987 Academic Press Ltd
FIG. 2. Notation.
assumption about stationarity, reversibility, absence of selection or constant population size. It does assume separate generations, uniparental reproduction, and averaging over a long period.

2. Notation
Figure 2 illustrates the notation to be used. A full line, F_i, connects generation f_i to a later generation f_j, which is the earliest generation in which one entity in f_i is ancestral to all entities in f_j. F_i is a "forward line", or F line. There is exactly one F line from each generation f_i. A broken line, B_j, connects generation b_j to an earlier generation b_i, which is the latest generation in which all the entities in b_j are descended from a single entity in b_i. B_j is a "backward line", or B line. There is exactly one B line from each generation b_j. The length L_i of a forward line is f_j - f_i, and the length L_j of a backward line is b_j - b_i.

Consider a period bounded by a forward line F_1 and a backward line B_n, each of length k. There are n - k forward lines, F_1 to F_{n-k}, and n - k backward lines, B_{k+1} to B_n. It is required to prove that the distributions of L_i and L_j are identical.

3. The Proof
Figure 3 shows a typical representation, for n = 10 and k = 3. Note the convention that, if a B and F line connect the same two points, the B line is shown to the
FIG. 3. A typical representation, for n = 10 and k = 3.
right. By counting from left to right, we can arrange the lines in a unique order, as follows:

Line    Length
F_1     3
F_2     2
F_3     1
B_4     1
B_5     2
B_6     3
F_4     3
B_7     3
B_8     4
F_5     4
F_6     3
F_7     2
B_9     2
B_10    3
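The ordering in the table can be checked mechanically. The following is a minimal sketch in Python, not part of the original argument: the names SEQUENCE, steps and length_tally are hypothetical, and the data are simply the rows of the table. It records, for each pair of consecutive lines, the kinds of the two lines and the change in length, and it tallies the lengths of the F and B lines separately.

```python
from collections import Counter

# Rows of the table, in order: (kind of line, generation index, length).
SEQUENCE = [
    ("F", 1, 3), ("F", 2, 2), ("F", 3, 1),
    ("B", 4, 1), ("B", 5, 2), ("B", 6, 3),
    ("F", 4, 3), ("B", 7, 3), ("B", 8, 4),
    ("F", 5, 4), ("F", 6, 3), ("F", 7, 2),
    ("B", 9, 2), ("B", 10, 3),
]


def steps(sequence):
    """Set of (kind, next kind, change in length) over consecutive lines."""
    return {(a[0], b[0], b[2] - a[2]) for a, b in zip(sequence, sequence[1:])}


def length_tally(sequence, kind):
    """Count how many lines of the given kind have each length."""
    return Counter(length for k, _, length in sequence if k == kind)


print(sorted(steps(SEQUENCE)))
print(length_tally(SEQUENCE, "F"))
print(length_tally(SEQUENCE, "B"))
```

Only four distinct steps occur in this example, and the two tallies coincide; the text that follows shows why this must be so in general.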
In passing along this sequence, there are only four types of transition:
(a) F line → F line, 1 generation shorter,
(b) B line → B line, 1 generation longer,
(c) F line → B line, same length,
(d) B line → F line, same length.

To justify this, it is sufficient to show that the following transitions are impossible:
(i) F line → F line, same length or longer,
(ii) F line → F line, more than 1 generation shorter,
(iii) B line → B line, same length or shorter,
(iv) B line → B line, more than 1 generation longer,
(v) B line → F line, either shorter or longer,
(vi) F line → B line, either shorter or longer.

As a first step in showing that transitions (i) to (vi) are impossible, I first show that the transitions in Fig. 4, (a)-(d), are impossible.
FIG. 4. Four forbidden transitions.
4(a) is forbidden because, if all entities in j are descended from a single entity in i + 1, the B_j line should connect to i + 1, and not to i as shown. 4(b) is forbidden because, if all entities in j are descended from a single entity in i, the F_i line should connect to j, and not to j + 1. By similar reasoning, 4(c) is forbidden because the F_i line should connect to j, and 4(d) because the B_{j+1} line should connect to i + 1.

Now consider transitions (i) to (vi) in turn: the arguments are illustrated in Fig. 5.
(i) This transition is shown in 5(a). It is forbidden because 4(a) forbids 5(b), and therefore the next line after F_i must be B_j, and not F_{i+1} as shown.
(ii) This transition is equivalent to 4(c), which is forbidden.
(iii) This transition is shown in 5(c). It is forbidden because 4(b) forbids 5(d), and therefore the next line after B_j is F_{i+1}, and not B_{j+1} as shown.
FIG. 5. Further forbidden transitions.

(iv) This transition is equivalent to 4(d), which is forbidden.
(v) These two transitions are equivalent to 4(a) and 4(b), which are forbidden.
(vi) The transition of an F line to a longer B line is shown in 5(e). It is forbidden because 4(a) forbids 5(f), and therefore the next line after F_i is B_j, and not B_{j+1} as shown. The transition of an F line to a shorter B line is shown in 5(g). It is forbidden because 4(b) forbids 5(h), and therefore the next line after F_i is F_{i+1}, and not B_j.

Since transitions (i) to (vi) are forbidden, only transitions (a) to (d) are allowed, as shown in Fig. 3. We can therefore represent the sequence of lines as a track, as illustrated in Fig. 6. There is a series of verticals labelled ..., k - 2, k - 1, k, k + 1, k + 2, ..., corresponding to the lengths of the lines. An F line of length k is represented as a crossing of the kth vertical from right to left, and a B line of length k as a crossing of the kth vertical from left to right. After a crossing, the track can either make a further crossing in the same direction (transitions (a) or (b)), or can re-cross the same vertical in the reverse direction (transitions (c) or (d)). Since the first line is an F line of length k, and the last line is a B line of length k, the track starts and ends in the interval between the kth and the (k + 1)th vertical. Hence the track must cross every vertical an equal number of times to the right and to the left. In other words, for each length k, there is an equal number of F and B lines. This proves that the fixation and origin times have the same mean, and identical distributions.

It was assumed that the first and last lines, F_1 and B_n, were of the same length, k. In a sufficiently long series of generations, it will usually be possible to choose lines of which this is true. Suppose, however, that this is not so. Let the lengths of
FIG. 6. Track representing Fig. 3.
the first and last lines be k and l, where l > k. There will be a total of n - l F lines, and n - k B lines. Of these B lines, n - l can be paired off with F lines of identical lengths, and l - k cannot: if k > l, there are k - l unpaired F lines. Hence the theorem is approximately true if n >> |k - l|, and becomes increasingly exact as n → ∞.
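The result can also be checked numerically. The sketch below is an illustration under stated assumptions, not the author's method: all function names are hypothetical, generations are numbered from 0 rather than 1, population size is redrawn at random each generation, and selection is imitated by giving each potential parent a fresh random weight. For every generation it finds the end of the backward line by tracing ancestors, derives the forward lines from these, and compares the two sets of lengths. Following the track argument, when the first F line has length k and the last B line has length l, the unpaired lines should be exactly one B line of each length from k + 1 to l (or one F line of each length from l + 1 to k).

```python
import random
from collections import Counter


def simulate_parents(sizes, rng):
    """parents[g][c] = index, in generation g-1, of the parent of entity c in generation g."""
    parents = [None]  # generation 0 has no recorded parents
    for g in range(1, len(sizes)):
        # Assumption: fresh random weights each generation stand in for changing selection.
        weights = [rng.random() + 0.01 for _ in range(sizes[g - 1])]
        parents.append(rng.choices(range(sizes[g - 1]), weights=weights, k=sizes[g]))
    return parents


def backward_ends(parents):
    """b[j] = latest generation before j holding one ancestor of all of generation j (None if none is reached)."""
    b = [None] * len(parents)
    for j in range(1, len(parents)):
        ancestors, g = set(parents[j]), j - 1  # ancestors of generation j, located in generation g
        while len(ancestors) > 1 and g > 0:
            ancestors = {parents[g][a] for a in ancestors}
            g -= 1
        if len(ancestors) == 1:
            b[j] = g
    return b


def forward_ends(b):
    """f[i] = earliest generation whose entities all descend from one entity in generation i."""
    f = [None] * len(b)
    for i in range(len(b)):
        for j in range(i + 1, len(b)):
            if b[j] is not None and b[j] >= i:
                f[i] = j
                break
    return f


def length_counts(b, f):
    """Lengths of all F and B lines lying inside the window from generation 0 to the last one."""
    last = len(b) - 1
    if f[0] is None or b[last] is None:
        raise ValueError("window too short: no fixation or origin inside it")
    k = f[0]            # length of the first forward line
    l = last - b[last]  # length of the last backward line
    forward = Counter(f[i] - i for i in range(b[last] + 1))
    backward = Counter(j - b[j] for j in range(f[0], last + 1))
    return k, l, forward, backward


def surplus(forward, backward):
    """Map each length to (number of B lines - number of F lines), omitting zeros."""
    lengths = set(forward) | set(backward)
    return {v: backward[v] - forward[v] for v in lengths if backward[v] != forward[v]}


def expected_surplus(k, l):
    """Unpaired lines implied by the track argument."""
    out = {v: 1 for v in range(k + 1, l + 1)}
    out.update({v: -1 for v in range(l + 1, k + 1)})
    return out


rng = random.Random(1987)
sizes = [rng.randint(5, 15) for _ in range(301)]  # population size varies from generation to generation
b = backward_ends(simulate_parents(sizes, rng))
k, l, forward, backward = length_counts(b, forward_ends(b))
print(k, l, surplus(forward, backward) == expected_surplus(k, l))
```

The forward lines are obtained from the backward ones because all entities in generation j descend from one entity in generation i exactly when the backward line from j ends at or after i. A longer run makes the handful of unpaired lines negligible relative to the total, which is the sense in which the theorem becomes exact as n grows.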
4. Discussion

Estimates of fixation and origin times are relevant mainly to studies of molecular evolution. For example, measurements of existing variation in a nuclear gene, or in mitochondria, may enable us to estimate the origin time of the entity in question, and that in turn may provide an estimate of the effective population size in the past. For any given set of assumptions about population size, family size distribution, and selection, it is easier to calculate the expected value of the fixation time than of the origin time. The expected origin time, therefore, has usually been obtained by assuming that it is equal to the fixation time. This is known to be correct for genetic drift in a population of constant size. The point of the present paper is to show that the equality of fixation and origin times holds if there is selection, or if population size varies. Essentially, the only assumptions made are:
(i) The entity has uniparental inheritance: note that, in the absence of intragenic recombination, a nuclear gene has a single "parent".
(ii) The period over which the expectation is calculated is bounded by an initial fixation time, and a final origin time, of the same length. If this is not so, the theorem
is still approximately true provided that the period is long compared to the difference between the initial fixation time and the final origin time.

I have discussed this problem with S. Nee, who has helped to make the proof a little clearer than it might otherwise be.

REFERENCE

EWENS, W. J. (1979). Mathematical Population Genetics. Berlin: Springer-Verlag.