ARTICLE IN PRESS
Theoretical Population Biology 66 (2004) 317–321
http://www.elsevier.com/locate/ytpbi
On the meaning of non-epistatic selection

Amit Puniyani^a, Uri Liberman^b, and Marcus W. Feldman^{a,*}

^a Department of Biological Sciences, Stanford University, Stanford, CA 94305-5020, USA
^b School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel

Received 15 October 2003
Available online 7 October 2004
Abstract

In population genetics, the additive and multiplicative viability models are often used for the quantitative description of models in which the genetic contributions of several different loci are independent; that is, there is no epistasis. Non-epistasis may also be quantitatively defined in terms of measures of interaction used widely in statistics. Setting these measures of epistasis to zero yields alternative definitions of non-epistasis. We show here that these two definitions of non-epistasis are equivalent; that is, in the most general case of a multilocus, multiallele system, the additive and multiplicative viability models are unique solutions of the additive and multiplicative conditions, respectively, for non-epistasis.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Additive viability model; Multiplicative viability model; Non-epistasis
*Corresponding author. E-mail addresses: [email protected] (A. Puniyani), liberman@post.tau.ac.il (U. Liberman), [email protected] (M.W. Feldman).
doi:10.1016/j.tpb.2004.05.001

0. Introduction

Two different mathematical models have been used to represent independent contributions of separate genes to total genotypic fitness. These are called the additive and multiplicative viability models, after Moran (1968), who regarded the former as "less natural but also interesting". One reason that these models are important is that they constitute a baseline by which to calibrate models in which selection regimes affecting single genes are not independent. Another is that both schemes of independence permit a population genetic frequency equilibrium at which there is no statistical association between the loci (Moran, 1968; Bodmer and Felsenstein, 1967) when there is heterozygote advantage at each locus. For two loci, in the multiplicative case this point is stable for sufficiently loose linkage between the loci (Bodmer and Felsenstein, 1967; Moran, 1968; Karlin and Liberman, 1979a–c). In the additive case, as long as there is any recombination, the point is globally stable (Ewens, 1969; Karlin and Feldman, 1970). Ewens (1969) has also shown that in the additive case, the mean fitness increases over time. This is not necessarily true for the multiplicative case (Moran, 1964).

Measures of departure from independence of the genetic effects of many loci are most frequently framed in terms of contributions to a quantitative character, and calculation of these epistatic effects usually involves assumptions about Hardy–Weinberg equilibrium and linkage equilibrium (e.g., Lynch and Walsh, 1998, pp. 85–87, 212–215). In terms of fitness parameters, however, and specifically for viability, epistasis is calculated as a function of the fitnesses. For example, Bodmer and Felsenstein (1967) take the following two-locus two-allele viability matrix $W = [w_{ij}]_{i,j=1}^{4}$:

\[
W =
\begin{array}{c|cccc}
 & A_1B_1 & A_1B_2 & A_2B_1 & A_2B_2 \\ \hline
A_1B_1 & w_{11} & w_{12} & w_{13} & w_{14} \\
A_1B_2 & w_{21} & w_{22} & w_{23} & w_{24} \\
A_2B_1 & w_{31} & w_{32} & w_{33} & w_{34} \\
A_2B_2 & w_{41} & w_{42} & w_{43} & w_{44}
\end{array}
\tag{1}
\]

and write

\[
e_i^{A} = w_{i1} - w_{i2} - w_{i3} + w_{i4}, \qquad i = 1, 2, 3, 4
\tag{2}
\]

as four epistatic parameters. These $e_i^{A}$ are measures of additive epistasis in fitness. A corresponding set of multiplicative epistasis measures can be produced
(see, e.g., Felsenstein, 1965):

\[
e_i^{M} = \frac{w_{i1}\, w_{i4}}{w_{i2}\, w_{i3}} - 1, \qquad i = 1, 2, 3, 4.
\tag{3}
\]

Independence, as defined by no epistasis, then assumes $e_i^{A} = 0$ or $e_i^{M} = 0$ in the additive and multiplicative cases, respectively.

A second definition of non-epistasis is perhaps more commonly used and starts from the parameters that represent the contributions from each locus to the viability of the complete multilocus genotype. If the latter is formed by summing the former, the fitnesses might be said to exhibit no additive epistasis, and if the combination is multiplicative, there is no multiplicative epistasis (e.g., Nagylaki, 1992, pp. 187–188).

In the multilocus–multiallele case, analogs to Eq. (2) in the additive case or (3) in the multiplicative case are less obvious than in the two-locus case. In light of the importance of epistasis for many problems in evolutionary genetics, it is important to determine whether these multilocus–multiallele extensions of (2) and (3) are equivalent to corresponding formulations that directly assemble fitness contributions from each locus. This note shows that the two formulations are equivalent.

1. Two loci with two alleles

Suppose we have a population where the character under consideration is determined at two loci with two possible alleles at each locus. Let $A_1, A_2$ be the possible alleles at the first locus and $B_1, B_2$ at the second locus. We assume that there is viability selection operating on the population. As there are four different two-locus gametes, $A_1B_1$, $A_1B_2$, $A_2B_1$, $A_2B_2$, there are 16 genotypes $A_iB_j/A_kB_\ell$. Therefore the fitness selection can be represented by the $4 \times 4$ viability matrix $W = [w_{ij}]_{i,j=1}^{4}$ given in (1). Hence $w_{11}$ is the viability of the genotype $A_1B_1/A_1B_1$, $w_{23}$ of $A_1B_2/A_2B_1$, etc. Here we assume that $W$ is a symmetric matrix, namely $w_{ij} = w_{ji}$ for all $i, j = 1, 2, 3, 4$. In some situations, for example with genomic imprinting (e.g., Spencer et al., 1998), the fitnesses may not be symmetric. In addition, it is customary to assume that the two double heterozygotes $A_1B_1/A_2B_2$ and $A_1B_2/A_2B_1$ have the same viability, $w_{14} = w_{23}$. That is, there is no position effect in the selection. Two-locus schemes with position effects can be extremely complicated (Nordborg et al., 1995). Here we do not allow this complication.

For some evolutionary analyses it is important to know when the two-locus selection, represented by the fitness matrix $W$, shows non-epistasis. There are two approaches in the literature for defining non-epistasis. The first approach implies linear constraints on the entries of the fitness matrix $W$. More specifically, $W$ is said to show non-epistasis if and only if the rows of $W$ satisfy the following equations:

\[
w_{i1} - w_{i2} - w_{i3} + w_{i4} = 0, \qquad i = 1, 2, 3, 4.
\tag{4}
\]
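As a numerical illustration of the measures in Eqs. (2) and (3) and of condition (4), the following is a minimal Python sketch. The function names and the matrix `W_example` are hypothetical choices made for this illustration only; they do not come from the literature cited here. Exact rational arithmetic is used so that "equal to zero" can be tested without rounding error.

```python
from fractions import Fraction


def additive_epistasis(W):
    """Return [e_i^A] with e_i^A = w_i1 - w_i2 - w_i3 + w_i4 for each row of a 4x4 W."""
    return [row[0] - row[1] - row[2] + row[3] for row in W]


def multiplicative_epistasis(W):
    """Return [e_i^M] with e_i^M = (w_i1 * w_i4) / (w_i2 * w_i3) - 1 for each row."""
    return [Fraction(row[0] * row[3], row[1] * row[2]) - 1 for row in W]


# Hypothetical symmetric viability matrix (rows/columns: A1B1, A1B2, A2B1, A2B2).
W_example = [
    [3, 4, 4, 5],
    [4, 3, 5, 4],
    [4, 5, 3, 4],
    [5, 4, 4, 3],
]

e_add = additive_epistasis(W_example)        # every entry is 0: condition (4) holds
e_mul = multiplicative_epistasis(W_example)  # nonzero: not multiplicatively non-epistatic
```

For this particular matrix the additive measures all vanish while the multiplicative ones do not, which shows that the two notions of non-epistasis are distinct conditions on $W$.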
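The second approach defines non-epistasis as additive selection. Let the fitness matrices associated with the two loci, taken separately, be $W_1$ and $W_2$, such that

\[
W_1 =
\begin{array}{c|cc}
 & A_1 & A_2 \\ \hline
A_1 & a_1 & a_2 \\
A_2 & a_2 & a_3
\end{array},
\qquad
W_2 =
\begin{array}{c|cc}
 & B_1 & B_2 \\ \hline
B_1 & b_1 & b_2 \\
B_2 & b_2 & b_3
\end{array}.
\]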
Then the two-locus fitness matrix $W$ is said to be of the non-epistatic type if its entries are additive in the corresponding entries of $W_1$ and $W_2$. Thus $w_{11}$, the fitness of the genotype $A_1B_1/A_1B_1$, is $a_1 + b_1$, the sum of the fitnesses of the two corresponding one-locus genotypes $A_1A_1$ and $B_1B_1$. In general, $W$ is non-epistatic if the matrix can be represented as

\[
W =
\begin{array}{c|cccc}
 & A_1B_1 & A_1B_2 & A_2B_1 & A_2B_2 \\ \hline
A_1B_1 & a_1 + b_1 & a_1 + b_2 & a_2 + b_1 & a_2 + b_2 \\
A_1B_2 & a_1 + b_2 & a_1 + b_3 & a_2 + b_2 & a_2 + b_3 \\
A_2B_1 & a_2 + b_1 & a_2 + b_2 & a_3 + b_1 & a_3 + b_2 \\
A_2B_2 & a_2 + b_2 & a_2 + b_3 & a_3 + b_2 & a_3 + b_3
\end{array}
\tag{5}
\]
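The additive construction in (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `additive_two_locus` and the numerical values chosen for `W1` and `W2` are hypothetical, and alleles are indexed from 0 rather than 1.

```python
# Gametes as (A allele index, B allele index): A1B1, A1B2, A2B1, A2B2.
GAMETES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def additive_two_locus(W1, W2):
    """Entry for gametes A_i B_j and A_k B_l is W1[i][k] + W2[j][l], as in (5)."""
    return [[W1[i][k] + W2[j][l] for (k, l) in GAMETES] for (i, j) in GAMETES]


# Hypothetical one-locus viabilities with heterozygote advantage at each locus.
W1 = [[1, 2], [2, 1]]   # a1 = 1, a2 = 2, a3 = 1
W2 = [[2, 3], [3, 2]]   # b1 = 2, b2 = 3, b3 = 2

W = additive_two_locus(W1, W2)

# Row-wise check of condition (4) and of the no-position-effect equality w14 = w23.
rows_satisfy_4 = all(r[0] - r[1] - r[2] + r[3] == 0 for r in W)
no_position_effect = W[0][3] == W[1][2]
```

With these inputs both checks succeed, consistent with the claim that a matrix of the form (5) satisfies the linear constraints (4).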
More generally, let $A_1, A_2, \ldots, A_m$ be the $m$ possible alleles at the first locus and $B_1, B_2, \ldots, B_n$ the $n$ possible alleles at the second locus. As in the two-allele case, assume that the viability selection shows no position effects. This means that the viabilities associated with the two genotypes $A_iB_j/A_kB_\ell$ and $A_iB_\ell/A_kB_j$ are the same for any $i, j, k, \ell$. Equivalently, the no-position-effect assumption entails that the viability of a two-locus genotype $A_iB_j/A_kB_\ell$ depends only on its associated one-locus genotypes $A_iA_k$ and $B_jB_\ell$. Therefore, let $X_1, X_2, \ldots, X_M$ be the possible genotypes at the first locus ($M = m(m+1)/2$) and let $Y_1, Y_2, \ldots, Y_N$ be those at the second locus ($N = n(n+1)/2$). Then the $mn \times mn$ viability matrix $W = [w_{ij}]$, which determines the viability parameters of all $mn \times mn$ genotypes $A_iB_j/A_kB_\ell$ for $i, k = 1, 2, \ldots, m$ and $j, \ell = 1, 2, \ldots, n$, is completely characterized by the $M \times N$ matrix $F = [f_{ij}]$. Here

\[
f_{ij} = f(X_i, Y_j)
\tag{6}
\]

is the viability associated with any two-locus genotype whose one-locus components are $X_i$ and $Y_j$. Thus, e.g., with two alleles at each locus, $A_1, A_2$ at the first and $B_1, B_2$ at the second, we have $F = [f_{ij}]$ as a $3 \times 3$ matrix

\[
F =
\begin{array}{c|ccc}
 & B_1B_1 & B_1B_2 & B_2B_2 \\ \hline
A_1A_1 & f_{11} & f_{12} & f_{13} \\
A_1A_2 & f_{21} & f_{22} & f_{23} \\
A_2A_2 & f_{31} & f_{32} & f_{33}
\end{array}
\tag{7}
\]
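and the $4 \times 4$ fitness matrix $W = [w_{ij}]$ is given by the $F$ matrix as

\[
W =
\begin{array}{c|cccc}
 & A_1B_1 & A_1B_2 & A_2B_1 & A_2B_2 \\ \hline
A_1B_1 & f_{11} & f_{12} & f_{21} & f_{22} \\
A_1B_2 & f_{12} & f_{13} & f_{22} & f_{23} \\
A_2B_1 & f_{21} & f_{22} & f_{31} & f_{32} \\
A_2B_2 & f_{22} & f_{23} & f_{32} & f_{33}
\end{array}
\tag{8}
\]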
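The passage from the genotype scheme $F$ to the gamete-by-gamete matrix $W$ can be made concrete with a short Python sketch. It assumes symmetry and no position effects, as in the text; the function names `genotype_index` and `expand_to_gamete_matrix` are hypothetical, alleles are indexed from 0, and the one-locus genotypes are ordered as in (7), that is, unordered allele pairs in lexicographic order.

```python
from itertools import combinations_with_replacement


def genotype_index(num_alleles):
    """Map each unordered allele pair (p, q), p <= q, to its row/column position in F."""
    pairs = combinations_with_replacement(range(num_alleles), 2)
    return {pair: pos for pos, pair in enumerate(pairs)}


def expand_to_gamete_matrix(F, m, n):
    """Build the mn x mn matrix W from the M x N scheme F (symmetry, no position effects)."""
    x_index = genotype_index(m)   # genotypes at the first locus
    y_index = genotype_index(n)   # genotypes at the second locus
    gametes = [(i, j) for i in range(m) for j in range(n)]  # A_i B_j in the order of the text
    W = []
    for (i, j) in gametes:
        row = []
        for (k, l) in gametes:
            x = x_index[tuple(sorted((i, k)))]   # one-locus genotype A_i A_k
            y = y_index[tuple(sorted((j, l)))]   # one-locus genotype B_j B_l
            row.append(F[x][y])
        W.append(row)
    return W


# Symbolic 3 x 3 scheme, with labels standing in for the entries of (7).
F_labels = [["f11", "f12", "f13"],
            ["f21", "f22", "f23"],
            ["f31", "f32", "f33"]]
W_labels = expand_to_gamete_matrix(F_labels, 2, 2)
```

Applied to the labelled $3 \times 3$ scheme, the sketch reproduces the arrangement of entries displayed in (8).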
Observe that the four linear equations of (4),

\[
w_{i1} - w_{i2} - w_{i3} + w_{i4} = 0, \qquad i = 1, 2, 3, 4,
\]

that characterize non-epistasis in this case can be written in terms of the $f_{ij}$'s as the following four equations:

\[
\begin{aligned}
f_{11} - f_{12} - f_{21} + f_{22} &= 0, \\
f_{12} - f_{13} - f_{22} + f_{23} &= 0, \\
f_{21} - f_{22} - f_{31} + f_{32} &= 0, \\
f_{22} - f_{23} - f_{32} + f_{33} &= 0.
\end{aligned}
\tag{9}
\]

These last equations are of the general form

\[
f_{ij} - f_{i\ell} - f_{kj} + f_{k\ell} = 0
\tag{10}
\]

for any choice of $i, j, k, \ell = 1, 2, 3$. On the other hand, the fitness matrix is of the additive non-epistatic form if the parameters $f_{ij}$ can be represented as a linear combination of positive parameters

\[
f_{ij} = a_i + b_j, \qquad i, j = 1, 2, 3.
\tag{11}
\]
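The relation between the identities (10) and the additive form (11) can be explored numerically. The sketch below is one possible construction, not necessarily the only one: it tests (10) over all pairs of rows and pairs of columns, and, when the identities hold for a positive scheme, recovers positive parameters by anchoring on the smallest entry of the first row. The names `satisfies_identities`, `decompose` and `F_example` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def satisfies_identities(F):
    """True if f_ij - f_il - f_kj + f_kl = 0 for every pair of rows and pair of columns."""
    rows, cols = range(len(F)), range(len(F[0]))
    return all(
        F[i][j] - F[i][l] - F[k][j] + F[k][l] == 0
        for i, k in combinations(rows, 2)
        for j, l in combinations(cols, 2)
    )


def decompose(F):
    """Return positive (a, b) with F[i][j] == a[i] + b[j]; F must be positive and satisfy (10)."""
    if not satisfies_identities(F):
        raise ValueError("F does not satisfy the additive identities")
    j_min = min(range(len(F[0])), key=lambda j: F[0][j])  # column minimizing the first row
    alpha = min(row[j_min] for row in F)
    beta = Fraction(alpha) / 2                             # any 0 < beta < alpha would do
    a = [row[j_min] - beta for row in F]
    b = [F[0][j] - F[0][j_min] + beta for j in range(len(F[0]))]
    return a, b


# Hypothetical scheme generated from a = (1, 2, 1) and b = (2, 3, 2).
F_example = [[3, 4, 3], [4, 5, 4], [3, 4, 3]]
a, b = decompose(F_example)
```

For this example the recovered parameters are $a = (3/2, 5/2, 3/2)$ and $b = (3/2, 5/2, 3/2)$, which differ from the generating values by the constant $1/2$ added to each $a_i$ and subtracted from each $b_j$; the decomposition is therefore determined only up to such a shift.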
The equivalence of (10) and (11) is stated as:

Theorem 1. Let $F = [f_{ij}]$ be any $M \times N$ two-locus fitness scheme. Then, there exist $M$ positive parameters $a_1, a_2, \ldots, a_M$ and $N$ positive parameters $b_1, b_2, \ldots, b_N$ such that

\[
f_{ij} = a_i + b_j, \qquad i = 1, 2, \ldots, M, \quad j = 1, 2, \ldots, N
\tag{12}
\]

if and only if

\[
f_{ij} - f_{i\ell} - f_{kj} + f_{k\ell} = 0
\tag{13}
\]

for all possible choices of $i, k = 1, 2, \ldots, M$ and $j, \ell = 1, 2, \ldots, N$. Moreover, representation (12) is unique up to an additive constant. Namely, if in addition to (12),

\[
f_{ij} = a_i^{*} + b_j^{*}
\tag{14}
\]

for all $i, j$, then there exists a constant $C$ such that

\[
a_i^{*} = a_i + C, \qquad b_j^{*} = b_j - C
\]

for all $i$ and $j$.

We now state and prove a general multilocus multiallele version of Theorem 1.

2. The multilocus–multiallele case

Suppose we have $n$ loci and let $A_1^{(k)}, \ldots, A_{m_k}^{(k)}$ be the $m_k$ possible alleles at locus $k$, for $k = 1, 2, \ldots, n$. Thus there are $M = \prod_{k=1}^{n} m_k$ $n$-locus gametes $A_{i_1}^{(1)} A_{i_2}^{(2)} \cdots A_{i_n}^{(n)}$, where for each $k = 1, 2, \ldots, n$, $i_k = 1, 2, \ldots, m_k$. Therefore for each pair $(\mathbf{i}, \mathbf{j})$, where $\mathbf{i} = (i_1, \ldots, i_n)$ and $\mathbf{j} = (j_1, \ldots, j_n)$, we have an $n$-locus genotype $A_{i_1}^{(1)} \cdots A_{i_n}^{(n)} / A_{j_1}^{(1)} \cdots A_{j_n}^{(n)}$. Thus the $M \times M$ fitness matrix $W = [w(\mathbf{i}, \mathbf{j})]$ associates with each pair $(\mathbf{i}, \mathbf{j})$ the fitness coefficient $w(\mathbf{i}, \mathbf{j})$.

For each locus $k$ of the $n$ loci, let $X_1^{(k)}, \ldots, X_{M_k}^{(k)}$ be the $M_k$ possible genotypes at locus $k$, where $M_k = m_k(m_k + 1)/2$. Hence, given $\boldsymbol{\ell} = (\ell_1, \ell_2, \ldots, \ell_n)$ where $\ell_k = 1, 2, \ldots, M_k$, we have an $n$-locus genotype $X_{\ell_1}^{(1)}, X_{\ell_2}^{(2)}, \ldots, X_{\ell_n}^{(n)}$. Let $F = [f(\boldsymbol{\ell})]$ be the $n$-dimensional fitness scheme such that $f(\boldsymbol{\ell}) = f(\ell_1, \ell_2, \ldots, \ell_n)$ is the fitness associated with the $n$-locus genotype determined by $\boldsymbol{\ell}$. The difference between $W$ and $F$ is that $F$ is based on "no position effects" and symmetry, whereas these two properties should be imposed on $W$. Of course if $W$ is symmetric and shows "no position effects" then $F$ determines $W$.

Now the $n$-locus fitness matrix is additive non-epistatic if the fitness parameters $f(\boldsymbol{\ell}) = f(\ell_1, \ldots, \ell_n)$ can be represented as

\[
f(\ell_1, \ldots, \ell_n) = a^{(1)}(\ell_1) + a^{(2)}(\ell_2) + \cdots + a^{(n)}(\ell_n),
\tag{15}
\]

where all the parameters $a^{(k)}(\ell_k)$ are positive for any $\ell_k$ and any $k = 1, 2, \ldots, n$. We would like to characterize the $n$-locus generalization of Eqs. (10) for the fitness parameters of $F$. The easiest way of doing this is by induction. In fact, suppose we have a "new" $(n+1)$th locus with $M_{n+1}$ genotypes $X_1^{(n+1)}, \ldots, X_{M_{n+1}}^{(n+1)}$ and let

\[
f(\boldsymbol{\ell}, \ell_{n+1}) = f(\ell_1, \ldots, \ell_n, \ell_{n+1})
\tag{16}
\]

be the parameters of the $n + 1$ loci fitness scheme $F$. Then we have the following theorem.

Theorem 2. The fitness scheme $f(\boldsymbol{\ell}, \ell_{n+1})$ is additive, namely there exist positive parameters $f(\boldsymbol{\ell})$ and $a^{(n+1)}(\ell_{n+1})$ for all possible $\boldsymbol{\ell} = (\ell_1, \ldots, \ell_n)$ and $\ell_{n+1}$ such that

\[
f(\boldsymbol{\ell}, \ell_{n+1}) = f(\boldsymbol{\ell}) + a^{(n+1)}(\ell_{n+1})
\tag{17}
\]

if and only if the equations

\[
f(\boldsymbol{\ell}, \ell_{n+1}) - f(\boldsymbol{\ell}, \ell_{n+1}') - f(\boldsymbol{\ell}', \ell_{n+1}) + f(\boldsymbol{\ell}', \ell_{n+1}') = 0
\tag{18}
\]

are satisfied for all $\boldsymbol{\ell} = (\ell_1, \ldots, \ell_n)$, $\boldsymbol{\ell}' = (\ell_1', \ldots, \ell_n')$, $\ell_{n+1}$, and $\ell_{n+1}'$. Representation (17) is unique up to an additive constant.

Proof. Of course if the additive representation (17) holds, then clearly all Eqs. (18) are satisfied. On the other hand, if Eqs. (18) are satisfied, then for given $\ell_{n+1}$ and $\ell_{n+1}'$ the difference

\[
f(\boldsymbol{\ell}, \ell_{n+1}) - f(\boldsymbol{\ell}, \ell_{n+1}') = b(\ell_{n+1}, \ell_{n+1}')
\tag{19}
\]
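is independent of $\boldsymbol{\ell}$. Then choose an $\boldsymbol{\ell}'$ and write

\[
b(\ell_{n+1}, \ell_{n+1}') = f(\boldsymbol{\ell}', \ell_{n+1}) - f(\boldsymbol{\ell}', \ell_{n+1}').
\tag{20}
\]

Let $\bar{\ell}$ be a value of $\ell_{n+1}$ at which $f(\boldsymbol{\ell}', \cdot)$ is smallest, that is, $f(\boldsymbol{\ell}', \bar{\ell}) = \min_{\ell_{n+1}} f(\boldsymbol{\ell}', \ell_{n+1})$. Then clearly $b(\ell_{n+1}, \bar{\ell}) \ge 0$ for all $\ell_{n+1}$. Now based on (19) we have the representation

\[
f(\boldsymbol{\ell}, \ell_{n+1}) = f(\boldsymbol{\ell}, \bar{\ell}) + b(\ell_{n+1}, \bar{\ell}).
\tag{21}
\]

Let $\alpha = \min_{\boldsymbol{\ell}} f(\boldsymbol{\ell}, \bar{\ell})$ and choose any $0 < \beta < \alpha$. Define $f(\boldsymbol{\ell})$ and $a^{(n+1)}(\ell_{n+1})$ as

\[
f(\boldsymbol{\ell}) = f(\boldsymbol{\ell}, \bar{\ell}) - \beta, \qquad
a^{(n+1)}(\ell_{n+1}) = b(\ell_{n+1}, \bar{\ell}) + \beta.
\tag{22}
\]

Then for all $\boldsymbol{\ell}$ and $\ell_{n+1}$

\[
f(\boldsymbol{\ell}, \ell_{n+1}) = f(\boldsymbol{\ell}) + a^{(n+1)}(\ell_{n+1}),
\]

where $f(\boldsymbol{\ell}) > 0$ and $a^{(n+1)}(\ell_{n+1}) > 0$. We have thus secured the additive representation (17).

To check the uniqueness of the additive representation (17), suppose

\[
f(\boldsymbol{\ell}) + a^{(n+1)}(\ell_{n+1}) = \tilde{f}(\boldsymbol{\ell}) + a^{*(n+1)}(\ell_{n+1})
\tag{23}
\]

for all $\boldsymbol{\ell}$ and $\ell_{n+1}$. Then for all $\boldsymbol{\ell}$ and $\ell_{n+1}$

\[
\tilde{f}(\boldsymbol{\ell}) - f(\boldsymbol{\ell}) = a^{(n+1)}(\ell_{n+1}) - a^{*(n+1)}(\ell_{n+1}).
\tag{24}
\]

Now the left-hand side of (24) depends only on $\boldsymbol{\ell}$ while the right-hand side depends only on $\ell_{n+1}$. Hence both sides of (24) must be a constant $C$. Therefore for all $\boldsymbol{\ell}$ and $\ell_{n+1}$

\[
\tilde{f}(\boldsymbol{\ell}) = f(\boldsymbol{\ell}) + C, \qquad
a^{*(n+1)}(\ell_{n+1}) = a^{(n+1)}(\ell_{n+1}) - C
\]

and the additive representation (17) is unique up to an additive constant $C$.

Using Theorem 2 we actually have the complete characterization of additive non-epistatic selection in the multilocus case. The characterization is obtained by Eqs. (18) where the "new" locus is any locus and the "old" loci are the remaining loci. Thus the set of linear homogeneous equations (18) gives the desired characterization.

As an example, consider the case of two loci with two alleles $A_1, A_2$ at the first and three alleles $B_1, B_2, B_3$ at the second. We have three genotypes $A_1A_1, A_1A_2, A_2A_2$ at the first locus and six genotypes $B_1B_1, B_1B_2, B_1B_3, B_2B_2, B_2B_3, B_3B_3$ at the second. Therefore the $3 \times 6$ matrix $F = [f_{ij}]$ is

\[
F =
\begin{array}{c|cccccc}
 & B_1B_1 & B_1B_2 & B_1B_3 & B_2B_2 & B_2B_3 & B_3B_3 \\ \hline
A_1A_1 & f_{11} & f_{12} & f_{13} & f_{14} & f_{15} & f_{16} \\
A_1A_2 & f_{21} & f_{22} & f_{23} & f_{24} & f_{25} & f_{26} \\
A_2A_2 & f_{31} & f_{32} & f_{33} & f_{34} & f_{35} & f_{36}
\end{array}
\tag{25}
\]

and the $6 \times 6$ fitness matrix $W = [w_{ij}]$ is given as:

\[
W =
\begin{array}{c|cccccc}
 & A_1B_1 & A_1B_2 & A_1B_3 & A_2B_1 & A_2B_2 & A_2B_3 \\ \hline
A_1B_1 & f_{11} & f_{12} & f_{13} & f_{21} & f_{22} & f_{23} \\
A_1B_2 & f_{12} & f_{14} & f_{15} & f_{22} & f_{24} & f_{25} \\
A_1B_3 & f_{13} & f_{15} & f_{16} & f_{23} & f_{25} & f_{26} \\
A_2B_1 & f_{21} & f_{22} & f_{23} & f_{31} & f_{32} & f_{33} \\
A_2B_2 & f_{22} & f_{24} & f_{25} & f_{32} & f_{34} & f_{35} \\
A_2B_3 & f_{23} & f_{25} & f_{26} & f_{33} & f_{35} & f_{36}
\end{array}
\tag{26}
\]

For $W$ in (26), the identities corresponding to (10) that characterize non-epistasis can now be written. Corresponding to the first row of (26), there are three identities

\[
\begin{aligned}
\text{(a)}\quad & f_{11} - f_{12} - f_{21} + f_{22} = 0, \\
\text{(b)}\quad & f_{11} - f_{13} - f_{21} + f_{23} = 0, \\
\text{(c)}\quad & f_{12} - f_{13} - f_{22} + f_{23} = 0.
\end{aligned}
\tag{27}
\]

Observe that any two of the three identities in (27) imply the third one. Hence, the first row of $W$ produces two independent identities. This is in fact the case with all the other rows, so the six rows of $W$ yield 12 identities in all, two independent ones per row; and although there are 45 identities of the form

\[
f_{ij} - f_{i\ell} - f_{kj} + f_{k\ell} = 0,
\]

it is easily seen that they are implied by these 12 identities.

Representing the symmetric viability matrix $W$ in terms of the matrix $F$, we can write the conditions on $W$ that it is non-epistatic:

(i) $W$ shows "no position effects" such that double heterozygotes have the same fitness;
(ii) Identities (10) are satisfied.

In the two-locus, two-allele case where $W = [w_{ij}]_{i,j=1}^{4}$, the conditions are (following (4))

\[
\begin{aligned}
& w_{14} = w_{23}, \quad \text{"no position effect"}; \\
& w_{i1} - w_{i2} - w_{i3} + w_{i4} = 0, \qquad i = 1, 2, 3, 4.
\end{aligned}
\tag{28}
\]

In the case of two loci with two alleles at the first and three at the second, the conditions are (following (27))

\[
\begin{aligned}
& w_{15} = w_{24}, \quad w_{16} = w_{34}, \quad \text{"no position effects"}; \\
& w_{i1} - w_{i2} - w_{i4} + w_{i5} = 0, \\
& w_{i1} - w_{i3} - w_{i4} + w_{i6} = 0, \qquad i = 1, 2, \ldots, 6.
\end{aligned}
\tag{29}
\]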
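The two-allele, three-allele example can be checked numerically with a short Python sketch. The values in `a` and `b` are hypothetical genotype contributions chosen only for illustration; indices are 0-based, so column 5 of the text is index 4 in the code, and so on. The sketch builds an additive $3 \times 6$ scheme, lays it out as the $6 \times 6$ matrix of (26), enumerates the identities of the form (10), and tests conditions (29).

```python
from itertools import combinations, combinations_with_replacement

# Hypothetical contributions: 3 genotypes at the first locus, 6 at the second.
a = [1, 2, 1]
b = [2, 3, 4, 2, 5, 3]
F = [[ai + bj for bj in b] for ai in a]   # additive 3 x 6 scheme

# Positions of unordered allele pairs, in the genotype order used in (25).
a_pos = {p: r for r, p in enumerate(combinations_with_replacement(range(2), 2))}
b_pos = {p: c for c, p in enumerate(combinations_with_replacement(range(3), 2))}
gametes = [(i, j) for i in range(2) for j in range(3)]   # A1B1, A1B2, A1B3, A2B1, A2B2, A2B3

# Gamete-by-gamete matrix laid out as in (26).
W = [[F[a_pos[tuple(sorted((i, k)))]][b_pos[tuple(sorted((j, l)))]]
      for (k, l) in gametes] for (i, j) in gametes]

# Every identity f_ij - f_il - f_kj + f_kl = 0 over row pairs and column pairs of F.
identities = [(i, k, j, l)
              for i, k in combinations(range(3), 2)
              for j, l in combinations(range(6), 2)]
all_hold = all(F[i][j] - F[i][l] - F[k][j] + F[k][l] == 0 for i, k, j, l in identities)

# Conditions (29) on W.
no_position_effects = W[0][4] == W[1][3] and W[0][5] == W[2][3]
row_conditions = all(r[0] - r[1] - r[3] + r[4] == 0 and r[0] - r[2] - r[3] + r[5] == 0
                     for r in W)
```

The enumeration yields 3 choose 2 times 6 choose 2, that is 45, identities, in agreement with the count quoted above, and for an additive scheme all of them hold together with conditions (29).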
3. The multiplicative non-epistatic case

Among the class of generalized non-epistatic selection schemes, we have in addition to additive selection the case of multiplicative selection. For two loci with two alleles at each locus, the corresponding fitness matrix $W = [w_{ij}]_{i,j=1}^{4}$ of (1) is said to be multiplicative if it can be represented as

\[
W =
\begin{bmatrix}
a_1 b_1 & a_1 b_2 & a_2 b_1 & a_2 b_2 \\
a_1 b_2 & a_1 b_3 & a_2 b_2 & a_2 b_3 \\
a_2 b_1 & a_2 b_2 & a_3 b_1 & a_3 b_2 \\
a_2 b_2 & a_2 b_3 & a_3 b_2 & a_3 b_3
\end{bmatrix},
\tag{30}
\]

where

\[
W_1 =
\begin{bmatrix}
a_1 & a_2 \\
a_2 & a_3
\end{bmatrix},
\qquad
W_2 =
\begin{bmatrix}
b_1 & b_2 \\
b_2 & b_3
\end{bmatrix}
\tag{31}
\]

are the fitness matrices associated with the two loci, taken separately. In fact, we see that the fitness matrix $W$ can be represented as

\[
W = W_1 \otimes W_2,
\tag{32}
\]

where $\otimes$ is the Kronecker product of matrices. Thus (32) is a characterization of multiplicative selection, and it can be easily generalized to the multilocus multiallele case. Specifically, if we use the notation of Section 2, where we have $n$ loci with $m_k$ possible alleles at locus $k$, for $k = 1, 2, \ldots, n$, then the corresponding $M \times M$ fitness matrix $W$ (where $M = \prod_{k=1}^{n} m_k$) is multiplicative if and only if there exist $n$ fitness matrices $W_1, W_2, \ldots, W_n$ (where $W_k$ is of order $m_k \times m_k$) such that $W$ is the Kronecker product of $W_1, \ldots, W_n$, namely

\[
W = W_1 \otimes W_2 \otimes \cdots \otimes W_n.
\tag{33}
\]

Using the representation of the additive case, we have another representation of $W$ which is equivalent to (33). Thus, with multiplicative selection, the symmetric fitness matrix $W$ in (33) shows no position effects, so we can use the $F$ scheme to characterize $W$. Therefore, the multilocus matrix $W$ shows multiplicative selection if and only if its associated $F$ scheme is such that for all $\boldsymbol{\ell} = (\ell_1, \ell_2, \ldots, \ell_n)$ we have the representation

\[
f(\boldsymbol{\ell}) = f(\ell_1, \ldots, \ell_n) = \prod_{k=1}^{n} a^{(k)}(\ell_k),
\tag{34}
\]

where the $a^{(k)}(\ell_k)$ are positive for any $\ell_k$ and any $k = 1, 2, \ldots, n$. If the multiplicative representation (34) holds, then clearly

\[
\log f(\boldsymbol{\ell}) = \log f(\ell_1, \ldots, \ell_n) = \sum_{k=1}^{n} \log a^{(k)}(\ell_k)
\tag{35}
\]

and the scheme $\log F = [\log f(\boldsymbol{\ell})]$ is in fact additive. Thus, the scheme $F$ is multiplicative if and only if $\log F$ is additive, and therefore we have an equivalent characterization of multiplicative selection. Thus, in the general multilocus multiallele case, the conditions for multiplicativity are no position effect and the existence of the linear relations (18) applied to $\log f$, which can be written in terms of the entries of $\log W$.

4. Conclusion

The two classical baseline models of viability independence between loci have been shown to reduce, respectively, to additive and multiplicative viabilities. For multiple alleles and/or multiple loci, the general parametric definition of epistasis in terms of combinations of viabilities remains an interesting question.
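To illustrate the multiplicative case of Section 3, the following Python sketch forms the Kronecker product (32) of two one-locus matrices and checks that the logarithms of the resulting entries satisfy the additive row condition (4), as the log-additivity argument around (35) suggests. The helper `kronecker` and the numerical matrices are hypothetical choices for this illustration; floating-point logarithms are compared with a small tolerance.

```python
import math


def kronecker(A, B):
    """Kronecker product of two matrices given as lists of lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in A for row_b in B]


# Hypothetical one-locus viability matrices.
W1 = [[1, 2], [2, 1]]
W2 = [[2, 3], [3, 2]]
W = kronecker(W1, W2)   # rows/columns ordered A1B1, A1B2, A2B1, A2B2

# Multiplicative non-epistasis: w_i1 * w_i4 == w_i2 * w_i3 in every row.
multiplicative_ok = all(r[0] * r[3] == r[1] * r[2] for r in W)

# Equivalent statement on the log scale: the additive condition holds for log W.
log_additive_ok = all(
    math.isclose(math.log(r[0]) - math.log(r[1]) - math.log(r[2]) + math.log(r[3]),
                 0.0, abs_tol=1e-12)
    for r in W
)

# On the original scale the additive measure is nonzero for this W.
additive_epistasis = [r[0] - r[1] - r[2] + r[3] for r in W]
```

In this example the multiplicative condition holds exactly while the additive measure is nonzero in every row, so a multiplicative scheme is in general additive only after the logarithmic transformation.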
Acknowledgments

The authors thank Prof. Warren Ewens for his careful reading of an earlier draft. This research was supported in part by NIH Grant GM-28016.
References

Bodmer, W.F., Felsenstein, J., 1967. Linkage and selection: theoretical analysis of the deterministic two locus random mating model. Genetics 57, 237–265.
Ewens, W.J., 1969. Mean fitness increases when fitnesses are additive. Nature 221, 1076.
Felsenstein, J., 1965. The effect of linkage on directional selection. Genetics 52, 349–363.
Karlin, S., Feldman, M.W., 1970. Convergence to equilibrium of the two locus additive viability model. J. Appl. Prob. 7, 262–271.
Karlin, S., Liberman, U., 1979a. Central equilibria in multilocus systems. I. Generalized nonepistatic selection regimes. Genetics 91, 777–798.
Karlin, S., Liberman, U., 1979b. Central equilibria in multilocus systems. II. Bisexual generalized nonepistatic selection models. Genetics 91, 799–816.
Karlin, S., Liberman, U., 1979c. Representation of nonepistatic selection models and analysis of multilocus Hardy–Weinberg equilibrium configurations. J. Math. Biol. 7, 353–374.
Lynch, M., Walsh, B., 1998. Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
Moran, P.A.P., 1964. On the nonexistence of adaptive topographies. Ann. Hum. Genet. 27, 383–393.
Moran, P.A.P., 1968. On the theory of selection dependent on two loci. Ann. Hum. Genet. 32, 183–190.
Nagylaki, T., 1992. Introduction to Theoretical Population Genetics. Springer Verlag, Berlin.
Nordborg, M., Franklin, I.R., Feldman, M.W., 1995. The effects of cis-trans selection on two-locus viability models. Theor. Popul. Biol. 47, 365–392.
Spencer, H.G., Feldman, M.W., Clark, A.G., 1998. Genetic conflicts, multiple paternity, and the evolution of genomic imprinting. Genetics 148, 893–904.