Gene 291 (2000) 115–125
www.elsevier.com/locate/gene
Neutrality tests of conservative-radical amino acid changes in nuclear- and mitochondrially-encoded proteins
David M. Rand*, Daniel M. Weinreich, Brent O. Cezairliyan
Department of Ecology and Evolutionary Biology, Box G-W, 69 Brown Street, Brown University, Providence, RI 02912, USA
Received 8 June 2000; received in revised form 26 September 2000; accepted 5 October 2000
Received by G. Bernardi
Abstract
The neutralist-selectionist debate should not be viewed as a dichotomy but as a continuum. While the strictly neutral model suggests a neutralist-selectionist dichotomy, the nearly neutral model is a continuous model spanning strict neutrality through weak selection (Ns < 1) to deterministic selection (Ns > 3). We illustrate these points with polymorphism and divergence data from a sample of 73 genes (31 mitochondrial, 36 nuclear genes from Drosophila, and six Arabidopsis data sets). In an earlier study we used the McDonald–Kreitman (MK) test to show that amino acid replacement polymorphism in animal mitochondrial genes and Arabidopsis genes shows a consistent trend toward negative selection, whereas nuclear genes from Drosophila span a range from negative selection, through neutrality, to positive selection. Here we analyze a subset of these genes (13 Drosophila nuclear, ten mitochondrial, and six Arabidopsis nuclear) for polymorphism and divergence of conservative and radical amino acid replacements (a protein-based conservative-radical MK, or pMK, test). The distinct patterns of selection among the different genomes are not apparent with the pMK test. Different definitions of conservative and radical (based on amino acid polarity, volume or charge) give inconsistent results across genes. We suggest that segregating fitness differences between silent and replacement mutations are more visible to selection than are segregating fitness differences between conservative and radical amino acid mutations. New data on the variation among genes with different opportunities for positive and negative selection are as important to the continuum view of the neutralist-selectionist debate as is the distribution of selection coefficients within individual genes. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: Neutrality test; Mildly deleterious single nucleotide polymorphism; Neutral theory; Natural selection; Molecular evolution; Drosophila; mtDNA; McDonald–Kreitman test
Abbreviations: N.I., Neutrality Index; pN.I., protein Neutrality Index; cN.I., codon Neutrality Index; MK test, McDonald–Kreitman test; pMK test, protein McDonald–Kreitman test; cMK test, codon McDonald–Kreitman test; mtDNA, mitochondrial DNA
* Corresponding author. Tel.: +1-401-863-2890; fax: +1-401-863-2166.
E-mail addresses: [email protected] (D.M. Rand); [email protected] (D.M. Weinreich).
0378-1119/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0378-1119(00)00483-2

1. Introduction
The controversial statement that "...the great majority of evolutionary changes at the molecular level...are caused not by Darwinian selection but by random drift of selectively neutral or nearly neutral mutants" (Kimura, 1983, pg. xi) has been the focal point of the long-running neutralist-selectionist debate. While some evolutionists have taken this view as a threat to the foundation of the Modern Synthesis, Kimura qualifies his statement in the next sentence by clarifying that he does not deny the role of natural selection in adaptive evolution. The issue of whether mutations are neutral OR selected seems to have greatly overshadowed two crucial words in Kimura's statement: 'great majority'. Just how much is a 'great majority'? In an election, 75% of the votes would be considered a landslide, but Kimura would probably not agree that as many as 25% of substitutions are not neutral. Indeed, he goes on to say that "...only a minute fraction of changes at the DNA level are adaptive in nature..." (Kimura, 1983, pg. xi). Here we argue that the neutralist-selectionist debate is over because it is not a qualitative, dichotomous problem. Rather, if the debate is to continue it should focus on quantifying the relative sizes of the 'great majority' of neutral substitutions and the 'minute fraction' of adaptive DNA changes. In recent years the neutral theory has been subjected to a variety of tests using the growing database of DNA sequences that have become available (Dispersion index, HKA test, Tajima and Fu and Li tests, McDonald–Kreitman tests; see Kreitman and Akashi, 1995). For example, the dispersion index, or the ratio of the variance to the mean
number of substitutions between species is generally greater than the neutral expectation of 1.0 (Gillespie, 1989; 1995; Ohta, 1996). Among a sample of nuclear genes in Drosophila, about half showed departures from strict neutral models (Moriyama and Powell, 1996). For mitochondrial genes, about half of published data sets also depart from neutral expectations, generally in the direction of negative selection (Nachman, 1998; Rand and Kann, 1998). Even silent sites have been shown to deviate from neutral evolution (Akashi, 1995; 1996). On balance then, a 'sizeable proportion' of the tests from recent data actually reject strict neutral assumptions. Does this mean that the neutral theory is wrong, or that we just need more insightful tests of departures from strict neutrality (e.g. Kreitman, 1996; Ohta, 1996)? Tests that focus on polymorphic sequences sampled from natural populations (Tajima, 1989; Fu and Li, 1993) can 'reject' neutrality if the sample is, in fact, not a truly random one. But as argued above, the issue of neutrality vs. non-neutrality will become secondary to studies that allow one to characterize the distribution of selection coefficients by placing an individual data set somewhere on the continuum from strong purifying selection through neutrality to strong positive selection. Studies of polymorphism and divergence in DNA sequences allow one to translate empirical data into selection coefficients. Following the predictions of Kimura (1983, pp. 44–45) and Sawyer and Hartl (1992), Akashi (1995) pointed out that the ratio of polymorphism to divergence (rpd) scales monotonically with the effective selection coefficient, Ns. High values of rpd are indicative of negative selection and low values indicate positive selection. One can extend this to the McDonald–Kreitman (MK) test and express this 2 × 2 table as a ratio of ratios: rpd-replacement/rpd-silent (referred to as the Neutrality Index; Rand and Kann, 1996).
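As a worked illustration of the ratio of ratios just described, the following Python sketch computes a neutrality index from the four cells of an MK table. The function name and the counts in the usage line are hypothetical, chosen only so the arithmetic is easy to follow; they are not data from this study.

```python
def neutrality_index(poly_repl, fixed_repl, poly_silent, fixed_silent):
    """Ratio of ratios from a 2 x 2 MK table.

    N.I. = (polymorphic replacement / fixed replacement)
           / (polymorphic silent / fixed silent)
    """
    if fixed_repl == 0 or poly_silent == 0 or fixed_silent == 0:
        # The ratio is undefined here; how to handle zero cells is a
        # separate convention and is deliberately not guessed at.
        raise ValueError("N.I. is undefined when a denominator cell is zero")
    rpd_replacement = poly_repl / fixed_repl
    rpd_silent = poly_silent / fixed_silent
    return rpd_replacement / rpd_silent


# Hypothetical counts: replacement sites show twice the polymorphism-to-
# divergence ratio of silent sites, so N.I. is above 1.
print(neutrality_index(poly_repl=10, fixed_repl=5,
                       poly_silent=20, fixed_silent=20))
```

Under the reading given in the text, a value above 1 corresponds to an excess of replacement polymorphism and a value below 1 to an excess of replacement fixations.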
The intention of a neutrality index is to provide a measure of the direction and magnitude of a gene's departure from neutral expectation (Rand and Kann, 1996). N.I. scales monotonically with selection: N.I. < 1 indicates an excess of amino acid fixations, or positive selection; N.I. > 1 indicates an excess of amino acid polymorphisms, or negative selection (Rand and Kann, 1996; Nachman, 1998; Weinreich and Rand, 2000). One assumption that follows from Kimura's (1983) analyses is that one class of nucleotide sites is strictly neutral (e.g. silent sites; but see Akashi, 1995; 1996). Empirically, it becomes relatively straightforward to use MK tests and N.I. values to place genes on the spectrum from negative to positive selection. Because the MK test focuses on the ratios of counts of segregating and fixed sites, it should be less sensitive to non-equilibrium conditions than other neutrality tests such as the Tajima test or the HKA test (cf. McDonald and Kreitman, 1991; Akashi, 1999). In an earlier paper we showed that patterns of molecular evolution were significantly different for genes encoded in nuclear vs. mitochondrial genomes (Weinreich and Rand, 2000). Specifically, mitochondrial genes show a clear trend
toward excess amino acid polymorphism (N.I. > 1 for 25/31 data sets) while nuclear genes show a roughly normal distribution centered around neutrality (mean N.I. = 1.2; 15/36 data sets with N.I. > 1). We suggested that the low recombination environment of mtDNA hinders the fixation of advantageous mutations as they arise on haplotypes carrying accumulated deleterious mutations. In support of this argument is the observation that five out of six genes from Arabidopsis thaliana also show N.I. > 1; this plant is known to be highly selfing, resulting in low effective recombination (but see Kuittinen and Aguadé (2000)). In principle, any two or more functionally distinct classes of DNA changes can be subjected to the McDonald–Kreitman test format. These lead logically to a variety of possible ratios of ratios, or neutrality index (N.I.) values, that are effective for measuring selection (Sawyer and Hartl, 1992; Akashi, 1995; Nachman, 1998; Weinreich and Rand, 2000). Here we extend these studies by performing conservative-radical McDonald–Kreitman tests (CRMK tests) on a subset of the genes analyzed in Weinreich and Rand (2000). Only amino acid sequence data are considered, and amino acid changes are classified as conservative or radical depending on charge, volume or polarity (Zhang, 2000). Our intention is to examine the relationship between the distribution of selection coefficients within versus between individual genes. By contrasting the distributions of neutrality index values based on protein sequences (pN.I. values) with neutrality index values based on silent and replacement changes in codons (cN.I. values), we are asking the question: Do polymorphism and divergence in protein sequences reveal the same patterns of genome-specific non-neutral evolution as seen for gene sequences at the DNA level? From these analyses we seek to assess the relative strengths and directions of selection acting on nucleotide changes with varying levels of functional constraint.
A similar approach has proven effective in analyses of specific genes. In a study of MHC variation, Hughes et al. (1990) showed that non-synonymous changes exceeded synonymous changes in the binding cleft of the molecule, suggesting overdominant selection. They went on to show that amino acid changes altering side-chain charge occurred more frequently than expected by chance alone, further suggesting that selection was acting to promote a diversity of charge profiles among alleles at MHC (Hughes et al., 1990). These types of comparisons seek to distinguish the phenotypic effects of mutations that alter codons in a messenger RNA from those of mutations that alter amino acids in a protein.
2. Materials and methods
2.1. Data sets
The genes selected for this study are a subset of those analyzed in Weinreich and Rand (2000), which consisted of
Drosophila nuclear genes, animal mitochondrial genes, and Arabidopsis thaliana nuclear genes. We restricted our analyses to only those data sets from Weinreich and Rand (2000) for which there were an appreciable number (>12) of amino acid replacement differences (fixed and/or polymorphic) so that the 2 × 2 tests would have reasonable power. Thirteen Drosophila nuclear genes were studied including four carboxylesterases (Est-5A, Est-5B, Est-5C, Est-6), two accessory gland proteins (Acp26Aa, Acp29AB), an acid phosphatase (Acph-1), the period gene (per) involved in circadian rhythms, a gene involved in conferring viral resistance (ref(2)p), three genes of unknown function (Anon1A3, Anon1E9, Anon1G5), and relish, an NF-κB/IκB protein. Ten animal mitochondrial data sets were studied, consisting primarily of the cytochrome b gene (cyt b) for mammals (Microtus, Ursus, Isothrix, Sciurus), birds (Grus, Melospiza/Passerella), reptiles (Emoia), and amphibians (Ambystoma), as well as the ND5 locus in Drosophila. Six Arabidopsis thaliana genes were studied including alcohol dehydrogenase (Adh), the homeotic genes APETALA (AP3) and PISTILLATA (PI), acidic endochitinase (ChiA), and basic endochitinase (ChiB), plus Chalcone Isomerase (Kuittinen and Aguadé, 2000, and references therein).
2.2. Conservative-radical McDonald–Kreitman test
Three modifications of the McDonald–Kreitman (MK) test were performed for each data set according to groups of amino acids distinguished by polarity, both polarity and volume, and charge (Zhang, 2000). We refer to these as protein-based conservative-radical MK (CRMK) tests, or more generally as pMK tests. Amino acid changes involving amino acids within a category (see below) are defined as conservative, while changes involving amino acids from different categories are defined as radical. Classification by polarity was according to the following categories: Polar (R, N, D, C, Q, E, G, H, K, S, T, Y); non-polar (A, I, L, M, F, P, W, V).
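The conservative/radical rule can be sketched in a few lines of Python using the polarity categories listed above. This is a minimal illustration, not the authors' program; the names `POLARITY_GROUP` and `classify_change` are hypothetical, and any other grouping could be supplied as a different dictionary.

```python
# Polarity categories as listed in the text (Zhang, 2000).
POLAR = "RNDCQEGHKSTY"
NON_POLAR = "AILMFPWV"

# Map each one-letter amino acid code to its category.
POLARITY_GROUP = {aa: "polar" for aa in POLAR}
POLARITY_GROUP.update({aa: "non-polar" for aa in NON_POLAR})


def classify_change(aa1, aa2, group=POLARITY_GROUP):
    """Return 'conservative' if both residues fall in the same category,
    otherwise 'radical'."""
    if aa1 == aa2:
        raise ValueError("identical residues are not an amino acid change")
    return "conservative" if group[aa1] == group[aa2] else "radical"


print(classify_change("D", "E"))  # both polar
print(classify_change("D", "A"))  # polar vs. non-polar
```

Keeping the grouping as a plain lookup table makes it straightforward to rerun the same classification under a different definition of 'conservative'.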
Classification by polarity and volume was according to the following categories: Special (C); Neutral and small (A, G, P, S, T); Polar and relatively small (N, D, Q, E); Polar and relatively large (R, H, K); Non-polar and relatively small (I, L, M, V); Non-polar and relatively large (F, W, Y). Classification by charge was according to the following categories: Positive (R, H, K); Negative (D, E); Neutral (A, N, C, Q, G, I, L, M, F, P, S, T, W, Y, V) (Zhang, 2000). Note that the three different ways of defining 'conservative' and 'radical' differ in the number of opportunities for variability within vs. between groups. Protein sequences taken from GenBank were aligned using a web interface to ClustalW at www.ibc.wustl.edu/clustal.html. Alignment files were converted manually into MEGA (Kumar et al., 1993) format, and MEGA was used to identify and export data for variable amino acid sites. These data were read by a program written in QuickBASIC 4.5 by
B.O.C. that performed the conservative/radical McDonald–Kreitman tests (pMK tests). Among the aligned sequences of alleles from two species, the program finds amino acid positions that have exactly two different amino acids, and these sites are classified as conservative or radical according to the categories defined above. If there is no variation among the sequences within either of the two species, the site is labeled 'fixed'; otherwise it is labeled 'polymorphic'. If a site has more than two different amino acids, the program queries the user for input. These sites were classified systematically by hand. In the presence of multiple amino acids at a site within any of the species, each unique amino acid was labeled as a polymorphism and either conservative or radical depending on its comparison to the most common amino acid at that site. The detection of sites with multiple polymorphisms is desirable to account for multiple mutations within codons. Because the species compared were not greatly diverged, these cases represented only a small fraction of the data, so no correction for multiple substitutions was attempted (cf. Maynard Smith, 1994). After the test has been performed on all sites, the program totals the number of fixed radical (FR), fixed conservative (FC), polymorphic radical (PR), and polymorphic conservative (PC) differences. Contingency tests for the pMK tests and other 2 × 2 tests were done using G-tests. Some variation in results was obtained when different outgroups were used in the pMK tests. This was most notable in the Arabidopsis data, so an effort was made to use data sets employing Arabis lyrata as the outgroup.
2.3. Neutrality indexes
To describe the magnitude and direction of departures from neutrality, a conservative-radical neutrality index, or a 'protein' N.I., was determined for each gene, defined as pN.I. = (PR/FR)/(PC/FC). Here, PR = polymorphic radical, FR = fixed radical, PC = polymorphic conservative, and FC = fixed conservative.
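The site-labeling procedure and the pN.I. definition can be sketched as follows. This is a hedged Python illustration, not a port of the QuickBASIC program: the function names and toy sequences are hypothetical, alignments are assumed to be ungapped and of equal length, sites with more than two residues are simply skipped (the text says these were classified by hand), and a zero FR or PC is replaced with 1, following the convention stated in this section.

```python
# Charge categories as listed in Section 2.2 (Zhang, 2000).
CHARGE_GROUP = {}
for residues, label in (("RHK", "positive"), ("DE", "negative"),
                        ("ANCQGILMFPSTWYV", "neutral")):
    for aa in residues:
        CHARGE_GROUP[aa] = label


def tally_sites(species_a, species_b, group=CHARGE_GROUP):
    """Count FR, FC, PR, PC over aligned protein sequences of two species."""
    counts = {"FR": 0, "FC": 0, "PR": 0, "PC": 0}
    # zip(*seqs) walks the alignment column by column.
    for col_a, col_b in zip(zip(*species_a), zip(*species_b)):
        residues = set(col_a) | set(col_b)
        if len(residues) != 2:
            continue  # invariant site, or >2 residues (handled by hand)
        first, second = residues
        change = "C" if group[first] == group[second] else "R"
        # Fixed: no variation within either species.
        fixed = len(set(col_a)) == 1 and len(set(col_b)) == 1
        counts[("F" if fixed else "P") + change] += 1
    return counts


def protein_neutrality_index(counts):
    """pN.I. = (PR/FR)/(PC/FC), written as PR*FC/(FR*PC)."""
    fr = counts["FR"] or 1  # zero replaced with 1, as stated in the text
    pc = counts["PC"] or 1
    return (counts["PR"] * counts["FC"]) / (fr * pc)


# Toy alignment (hypothetical): three alleles of species A, two of species B.
species_a = ["KDLAK", "KDLAK", "RDLEK"]
species_b = ["KEVAE", "KEVAE"]
counts = tally_sites(species_a, species_b)
print(counts, protein_neutrality_index(counts))
```

Passing a different `group` dictionary switches the test between the polarity, polarity and volume, and charge definitions without changing the counting logic.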
In the case that either FR or PC equals zero, they were replaced with 1 so that N.I. was not undefined (Rand and Kann, 1996; Weinreich and Rand, 2000). For ease of nomenclature, we will refer to the N.I. value from a traditional silent-replacement McDonald–Kreitman test as a 'codon' N.I., or cN.I., and contrast this with the protein N.I.'s (pN.I.'s) based on the conservative-radical comparison defined above. These N.I. terms are distinguished from their respective McDonald–Kreitman tests (cN.I. and cMK test; pN.I. and pMK test) on the grounds that N.I. values seek to pin a number on the mode of selection whereas MK tests provide a statistical statement of the departure from neutral expectations. Hence, a cN.I. may indicate positive or negative selection, and the respective cMK test can determine if the departure is significant.
2.4. Amino acid composition
The amino acid compositions of the proteins under study were calculated for each gene studied using MEGA. A
random allele was chosen from each gene, and mean values for all genes in each genome in the data sets used (Drosophila nuclear, mitochondrial, and Arabidopsis nuclear) were tabulated. For presentation these mean values for each amino acid were pooled into a non-polar group (A, F, I, L, M, P, V, W), a polar group (C, G, N, Q, S, T, Y), a negatively charged group (D, E) and a positively charged group (H, K, R).
3. Results
For initial comparison, the results from our earlier study (Weinreich and Rand, 2000) are summarized in Fig. 1. Fig. 1A shows the distribution of N.I. values for 36 Drosophila nuclear data sets and 31 animal mtDNA data sets. The difference between these distributions is significant (G = 18.56, d.f. = 6, P = 0.005). In Fig. 1B we show the distribution of N.I. values for MK tests that significantly reject the null hypothesis. Splitting the tests into those with N.I. values less than, and greater than, 1.0, the data sets from the two genomes are also significantly different (G = 12.37, d.f. = 1, P = 0.0004; Weinreich and Rand, 2000). Drosophila nuclear genes tend to depart from neutrality in the direction of excess amino acid fixations, while mitochondrial genes tend to depart from neutrality in the direction of excess amino acid polymorphism.
3.1. Drosophila nuclear genes
The protein-based conservative/radical McDonald–Kreitman (or pMK) test results for Drosophila nuclear genes are presented in Table 1. In Fig. 2A the numbers of genes with neutrality index less than one and greater than one are tabulated for each type of conservative/radical test, as well as for the traditional codon-based silent/replacement McDonald–Kreitman (cMK) test from Weinreich and Rand (2000). Two of the cMK tests were significant below the 0.01 level (per and relish) and Est-5B was marginally significant (P = 0.053). All three of these cMK tests depart from neutrality in the direction of adaptive fixation of amino acid changes (data from Weinreich and Rand, 2000).
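The significance levels quoted for individual MK tables rest on G-tests of 2 × 2 contingency tables. A minimal Python sketch of such a test is given below, assuming no continuity or Williams correction (the text does not say whether one was applied); the function name and the example counts are hypothetical. For one degree of freedom the chi-square tail probability equals erfc(sqrt(G/2)), so only the standard library is needed.

```python
import math


def g_test_2x2(a, b, c, d):
    """G statistic and P-value (1 d.f.) for the table [[a, b], [c, d]].

    G = 2 * sum(O * ln(O / E)), with E the expected count under
    independence of rows and columns. Every row and column total is
    assumed to be non-zero.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    cells = ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1))
    g = 0.0
    for observed, i, j in cells:
        if observed > 0:  # a zero cell contributes nothing to the sum
            expected = row_totals[i] * col_totals[j] / n
            g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error
    # Chi-square survival function for 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Hypothetical MK-style table: rows = fixed, polymorphic;
# columns = replacement, silent.
g, p = g_test_2x2(20, 10, 5, 25)
print(round(g, 2), round(p, 4))
```

A table whose rows are exactly proportional gives G = 0 and P = 1, which is a convenient sanity check before applying the function to real counts.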
For the pMK tests in Table 1, three were significant at the 5% level by G-tests in the direction of excess radical fixations (pN.I. < 1) and two were significant in the direction of an excess of radical polymorphism (pN.I. > 1). The 13 tests show a slight bias towards pN.I. < 1, with the polarity tests and the polarity + volume test giving reciprocal results. But no partitioning of the data by pN.I. < 1 vs. pN.I. > 1 is significant when comparing the three different versions of pN.I. (polarity, polarity + volume, or charge), or when comparing pN.I. values with cN.I. values.
3.2. Animal mitochondrial genes
Table 2 and Fig. 2B present the results for animal mitochondrial DNA. Compared to the cMK tests which show a
Fig. 1. Distribution of neutrality index values for Drosophila nuclear, animal mitochondrial, and Arabidopsis nuclear genes. (A) Distribution of all genes examined. The animal mtDNA and Arabidopsis distributions are not significantly different, but the Drosophila nuclear distribution is significantly different from both the animal mtDNA and Arabidopsis nuclear gene distributions (P < 0.005). (B) Distribution of neutrality index values for those genes showing a significant departure from neutral expectations by the McDonald–Kreitman test. The same patterns of significant differences between genomes observed in (A) hold for this restricted set of non-neutral genes. Data from Weinreich and Rand (2000).
skew towards cN.I. values > 1, the pMK tests show a more even split between pN.I. values greater than one and less than one. The charge-based pN.I. values showed seven pN.I.'s < 1 and three pN.I.'s > 1. This is significantly
Table 1
Locus, number of codons, species and sample size, mutation class counts, pN.I. and cN.I. values for nuclear-encoded DNA sequence surveys used in this study. Citations as in Weinreich and Rand (2000)

Locus              Codons  Species a               AA grouping b   FR c   FC c   PR c   PC c   pN.I. d,e   cN.I. d,e
Mst26Aa (Acp26Aa)  267     10 mel 1 sim            pol             24     42     3      6      0.88        0.41
                                                   pol&vol         42     24     4      5      0.46
                                                   charge          22     44     3      6      1.00
Acp29AB            220     39 mel 1 sim            pol             8      20     3      3      2.50        0.37
                                                   pol&vol         19     9      3      3      0.47
                                                   charge          13     15     3      3      1.15
Acph-1             447     53 sub 1 mad            pol             0      4      10     20     2.00        0.68
                                                   pol&vol         0      4      15     15     4.00
                                                   charge          0      4      8      22     1.45
Anon1A3            310     26 mel 12 sim (1 yak)   pol             12     14     8      13     0.72        1.61
                                                   pol&vol         16     10     14     7      1.25
                                                   charge          7      19     8      13     1.67
Anon1E9            595     15 mel 8 sim (1 yak)    pol             11     28     12     24     1.27        0.51
                                                   pol&vol         23     16     19     17     0.78
                                                   charge          21     18     12     24     0.43
Anon1G5            261     3 mel 10 sim (1 yak)    pol             9      14     7      9      1.21        0.48
                                                   pol&vol         12     11     8      8      0.92
                                                   charge          6      17     6      10     1.70
Est-5A             548     8 pse 1 per (1 mir)     pol             1      2      5      14     0.71        0.53
                                                   pol&vol         2      1      10     9      0.56
                                                   charge          2      1      4      15     0.13
Est-5B             545     16 pse 1 per (1 mir)    pol             23     13     9      23     0.22        0.48
                                                   pol&vol         29     7      16     16     0.24
                                                   charge          19     17     10     22     0.41
Est-5C             545     8 pse 1 per (1 mir)     pol             1      0      7      5      1.40        2.36
                                                   pol&vol         0      1      8      4      2.00
                                                   charge          0      1      4      8      0.50
Est-6              544     30 mel 3 sim            pol             2      17     8      12     5.67        0.47
                                                   pol&vol         6      13     6      14     0.93
                                                   charge          5      14     4      16     0.70
Per                402     18 wil 7 equ (1 yak)    pol             4      12     9      5      5.40        0.28
                                                   pol&vol         7      9      7      7      1.29
                                                   charge          1      15     1      13     1.15
Ref(2)p            599     10 mel 1 sim            pol             14     20     3      7      0.61        1.15
                                                   pol&vol         16     18     4      6      0.75
                                                   charge          11     23     3      7      0.90
Relish             803     6 mel 7 sim (1 yak)     pol             21     60     3      8      1.07        0.12
                                                   pol&vol         50     31     9      2      2.79
                                                   charge          36     45     2      9      0.28

a All genes from Drosophila spp.: equ = D. equinoxialis, mad = D. madeirensis, mel = D. melanogaster, per = D. persimilis, pse = D. pseudoobscura, sim = D. simulans, sub = D. subobscura, wil = D. willistoni.
b Amino acid grouping from Zhang (2000): pol = polarity, pol&vol = polarity and volume.
c Mutation classes: FR = fixed radical, FC = fixed conservative, PR = polymorphic radical, PC = polymorphic conservative.
d pN.I. = Protein neutrality index. See Section 2. cN.I. = Codon neutrality index. Data from Weinreich and Rand (2000). Some minor differences in cN.I. values between Weinreich and Rand (2000) and those reported here stem from a reanalysis of the DNA sequences deposited in GenBank for the gene in question.
e Zero replaced with 1 for the purposes of calculating pN.I. N.I. values appearing in bold face correspond to MK tests that are significant at or below the 0.05 level. See Section 2. The Relish data are from Begun and Whitley (2000).
different from the skewed pattern of nine cN.I.'s > 1 and one cN.I. < 1 (P < 0.05). Considering tests that are significant at the 5% level, four of the ten cMK tests are significant (one with cN.I. < 1, three with cN.I. > 1), while only three of the 30 pMK tests are significant (all with pN.I. > 1). Thus, although there appears to be excess 'general' amino acid polymorphism relative to divergence for mtDNA (cN.I.'s > 1), this excess is less pronounced when one
considers radical amino acid polymorphism (smaller proportion of pN.I.'s greater than 1).
3.3. Arabidopsis nuclear genes
Table 3 and Fig. 2C present the results for Arabidopsis nuclear genes. None of the individual pMK tests were significant at the 5% level. Similar to the results for animal
mtDNA, five out of six cN.I.'s are greater than one (and three of these five are significant at the 5% level; Weinreich and Rand, 2000), but the pN.I.'s for polarity or polarity + volume are, if anything, slightly skewed towards tests with pN.I. < 1. The polarity + volume-based pN.I. values in Arabidopsis stand out as skewed to values less than 1 (five out of six), which differs significantly from the pattern for cN.I. values (five out of six > 1; G = 5.2; P < 0.05). To summarize the results presented in Tables 1–3 and Fig. 2, there is no significant heterogeneity among Drosophila, mitochondrial, or Arabidopsis data sets for the conservative-radical neutrality index (pN.I.) values based on polarity or polarity + volume when one considers pN.I. values greater than vs. less than 1. This is in contrast to our earlier study which showed that cN.I. values for Drosophila nuclear genes are significantly different from cN.I.'s from animal mtDNA and Arabidopsis genes (Weinreich and Rand, 2000). However, there is significant heterogeneity between cN.I. and charge-based pN.I. values for the mtDNA genes, and between the cN.I. and polarity + volume based pN.I.'s for Arabidopsis genes. In both of these cases, the cN.I. values tend to be greater than 1.0 more often than the pN.I. values. There were no significant correlations between the three different pN.I. values (polarity, polarity + volume and charge) within either the Drosophila, mtDNA or Arabidopsis data sets alone, or across all genes studied in the three data sets. There were also no significant correlations between cN.I. and pN.I. values across the entire data set.
3.4. Amino acid composition
Fig. 2. Distribution of Conservative-Radical and Silent-Replacement neutrality index scores in Drosophila nuclear, animal mitochondrial and Arabidopsis nuclear genes. Three different Conservative-Radical McDonald–Kreitman tests were done for each gene, based on polarity, polarity and volume, and charge properties of individual amino acids (Zhang, 2000). Among the 87 possible protein-based McDonald–Kreitman (pMK) tests, only eight were significant, which is not significantly different from chance after correcting for multiple tests. Mitochondrial genes show a significant difference between their conservative-radical neutrality index scores (protein N.I., or pN.I.'s) and their codon-based N.I. scores (cN.I.'s). There is no significant difference between the pN.I. and cN.I. distributions for Drosophila. For animal mtDNA, pN.I. based on charge and cN.I. distributions are significantly different (P < 0.05). For Arabidopsis genes, the pN.I. from polarity + volume is significantly different from the cN.I. distribution (P < 0.05). None of the pN.I. distributions are significantly heterogeneous across genomes, but the cN.I. distributions for mtDNA and for Arabidopsis are significantly different from the Drosophila nuclear distribution (P < 0.05).
Amino acid frequencies in the three categories of genes studied are presented in Fig. 3. Proteins coded for by the mitochondrial DNA have a greater proportion of the non-polar residues F, I, L, M in comparison to nuclear-coded proteins (P < 0.001 by Chi-square tests; see also Naylor et al., 1995). This is due most likely to the presence of multiple hydrophobic transmembrane regions in these proteins that serve to anchor them in the mitochondrial membrane. There is also a significant deficiency of charged residues (D, E, K, R) in the mitochondrial genes, relative to the Drosophila and Arabidopsis nuclear genes (P < 0.01). The Drosophila and Arabidopsis nuclear proteins studied have very similar proportions of the different groups of amino acids (Fig. 3).
4. Discussion
Our original goal was to perform a different test of the observation that animal mitochondrial genes tend to deviate from neutrality in the direction of negative selection, while nuclear genes do not show this trend. Our aim was to use a protein-based conservative-radical McDonald–Kreitman (pMK) test to ask if the functional distinction between conservative and radical amino acid mutations within proteins would be 'perceived' by evolutionary forces in
Table 2
Locus, number of codons, species and sample size, mutation class counts, pN.I. and cN.I. values for mtDNA-encoded DNA sequence surveys used in this study. Citations as in Weinreich and Rand (2000)

Locus   Codons  Species a                                         AA grouping b   FR c   FC c   PR c   PC c   pN.I. d,e   cN.I. d,e
ND5     505     25 mel 20 sim (1 yak)                             pol             6      9      4      6      1.00        2.11
                                                                  pol&vol         8      7      5      5      0.88
                                                                  charge          2      13     0      10     0.00
Cytb    227     23 Ensatina eschscholtzii 1 Plethodon elongatus   pol             10     7      14     31     0.32        0.28
                                                                  pol&vol         13     5      20     24     0.32
                                                                  charge          2      15     5      39     0.96
cyt b   380     9 Grus antigone 2 Grus rubicunda                  pol             1      0      2      5      0.40        16.20
                                                                  pol&vol         1      0      4      3      1.33
                                                                  charge          0      1      2      5      0.40
cyt b   380     15 Microtus arvalis 9 M. rossiaemeridionalis      pol             3      5      6      12     0.83        5.07
                                                                  pol&vol         4      3      13     5      1.95
                                                                  charge          0      7      5      13     2.69
cyt b   266     10 Isothrix bistriata 1 Isothrix pagurus          pol             3      3      6      10     0.60        1.20
                                                                  pol&vol         4      2      4      12     0.17
                                                                  charge          1      5      1      15     0.33
cyt b   266     29 Mesomys hispidus 2 Mesomys stimulax            pol             0      0      15     18     0.83        4.71
                                                                  pol&vol         0      0      20     13     1.54
                                                                  charge          0      0      3      30     0.10
cyt b   143     11 Passerella iliaca 6 Melospiza melodia          pol             0      10     2      7      2.86        1.90
                                                                  pol&vol         3      7      8      1      18.67
                                                                  charge          0      10     6      3      20.00
cyt b   97      8 Phyllobates lugubris 1 Phyllobates vittatus     pol             0      0      1      10     0.10        3.54
                                                                  pol&vol         0      0      6      5      1.20
                                                                  charge          0      0      2      9      0.22
cyt b   379     20 Sciurus aberti 1 Sciurus niger                 pol             5      17     6      3      6.80        1.67
                                                                  pol&vol         12     10     5      4      1.04
                                                                  charge          2      20     1      8      1.25
cyt b   379     28 Ursus arctos 1 Ursus americanus                pol             6      11     6      5      2.20        1.10
                                                                  pol&vol         7      10     3      8      0.54
                                                                  charge          1      16     0      11     0.00

a Drosophila species names as in Table 1.
b Amino acid grouping from Zhang (2000): pol = polarity, pol&vol = polarity and volume.
c Mutation classes: FR = fixed radical, FC = fixed conservative, PR = polymorphic radical, PC = polymorphic conservative.
d pN.I. = Protein neutrality index. See Section 2. cN.I. = Codon neutrality index. See Weinreich and Rand (2000).
e Zero replaced with 1 for the purposes of calculating pN.I. N.I. values appearing in bold face correspond to MK tests that are significant at or below the 0.05 level. See Section 2.
the same manner that silent-replacement mutations within codons are perceived. In particular we predicted that the distribution of neutrality index values for proteins (pN.I.'s) would show a similar pattern to the distributions of neutrality index values for codons (cN.I.'s), namely that pN.I. values for mitochondrial genes would tend to be greater than those for Drosophila nuclear genes, as seen for cN.I. values (Weinreich and Rand, 2000). We would further predict that if adaptive evolution has been inferred using the standard McDonald–Kreitman test (low cN.I.), the pN.I. values for these genes should also be low. Our results show that the partitioning of protein sequence data according to conservative and radical amino acid changes gives a very different picture of evolution than that evident in comparisons of silent and replacement sites in DNA. Mitochondrial and Arabidopsis genes do not show a trend towards high pN.I. values, and those Drosophila nuclear genes with significant adaptive evolution at the codon level do not tend to show low pN.I. values. Indeed
there is no significant correlation between cN.I. values and any of the three pN.I. values in any genome (data not shown).
4.1. Power of the tests
This different signal for protein-based neutrality tests as compared to codon-based neutrality tests could be due to the lower power of the data sets when restricted to amino acid changes. It is interesting to note that seven of the 87 pMK tests are significant at the 5% level (four among 3 × 13 = 39 Drosophila tests, three among the 3 × 10 = 30 mitochondrial tests, and none among the 3 × 6 = 18 Arabidopsis tests). While the different tests for each gene may not be independent, the pN.I. values are not correlated (see Section 3.3). Considering the 29 traditional MK tests (cMK tests) among these same data sets, two, four, and three tests, respectively, are significant at the 5% level. In total, 31.0% of the cMK tests are significant, while only 8.0% of the pMK tests
Table 3
Locus, number of codons, species and sample size, mutation class counts, pN.I. and cN.I. values for nuclear-encoded DNA sequence surveys in Arabidopsis thaliana used in this study

Locus  Codons  Species                                   AA grouping a   FR b   FC b   PR b   PC b   pN.I. c   cN.I. c
Adh    361     15 A. thaliana 1 Arabis lyrata            pol             5      10     3      4      1.50      1.29
                                                         pol&vol         5      10     2      5      0.80
                                                         charge          4      11     1      6      0.46
CHI    246     23 A. thaliana 1 Arabis lyrata            pol             8      16     0      2      0.00      2.10
                                                         pol&vol         11     13     0      2      0.00
                                                         charge          6      18     0      2      0.00
AP3    231     17 A. thaliana 1 Arabis lyrata            pol             2      4      5      11     0.91      4.00
                                                         pol&vol         4      2      10     6      0.83
                                                         charge          1      5      8      8      5.00
ChiA   302     14 A. thaliana 1 Arabis lyrata (no Ci-0)  pol             5      12     6      6      2.40      3.07
                                                         pol&vol         9      8      6      6      0.89
                                                         charge          3      14     1      11     0.42
ChiB   335     16 A. thaliana 1 Arabis gemmifera         pol             6      11     1      3      0.61      0.38
                                                         pol&vol         9      8      1      3      0.30
                                                         charge          5      12     2      2      2.40
PI     209     16 A. thaliana 1 Arabis lyrata            pol             1      7      3      9      2.33      6.00
                                                         pol&vol         3      5      6      6      1.67
                                                         charge          1      7      5      7      5.00

a Amino acid grouping from Zhang (2000): pol = polarity, pol&vol = polarity and volume.
b Mutation classes: FR = fixed radical, FC = fixed conservative, PR = polymorphic radical, PC = polymorphic conservative.
c pN.I. = Protein neutrality index. See Section 2. cN.I. = Codon neutrality index. Data from Weinreich and Rand (2000). CHI data from Kuittinen and Aguadé (2000). N.I. values appearing in bold face correspond to MK tests that are significant at or below the 0.05 level.
are significant. These two proportions are significantly different (G = 5.0, P < 0.05). A pMK test partitions only the amino acid 'row' of the traditional cMK test into the four cells of the comparable 2 × 2 test. While this will reduce counts, and hence power (e.g. Akashi, 1999), some of the genes in each of the three genomes have a reasonable number of counts for each cell in the pMK test (see Tables 1–3). While we acknowledge that our pMK tests will have lower power, there appear to be enough data to detect a positive
Fig. 3. Protein composition of genes studied. Individual amino acids were grouped as non-polar, polar, positively charged, and negatively charged as described in Section 2. Mitochondrial genes show a significantly greater percent composition of non-polar amino acids, and a significantly smaller proportion of charged amino acids (both P < 0.001).
correlation between pN.I. and cN.I. values if a strong one existed.

4.2. Protein and nucleotide composition

An alternative hypothesis is that the contrasting patterns of cN.I. and pN.I. values in the three genomes are due to differences in protein composition inherent in our limited sample of genes. Fig. 3 shows that the percent compositions of non-polar amino acids and of charged amino acids are significantly different among the three genomes. However, it is the mitochondrial proteins that differ most from the other two genomes with respect to non-polar composition, yet there is no heterogeneity among genomes for pN.I. values based on polarity (Fig. 2). Since the mitochondrial genes show approximately a 50% deficiency in charged residues relative to the two other data sets (Fig. 3), this may account for the significant difference between the charge-based pN.I. distribution and the cN.I. distribution for mitochondrial genes. It is not entirely clear how these amino acid composition differences might account for the patterns we see among the pN.I. values from the different genomes, or with respect to the distribution of their respective cN.I. values. A study contrasting the amount of radical and conservative polymorphism in transmembrane and extramembrane regions will assist in differentiating these two possibilities. An obvious question posed by our data is whether the differences between pN.I.'s and cN.I.'s scored within and between genomes are due mostly to differences in silent site evolution. This seems plausible given recent evidence for
non-neutral patterns of evolution at silent sites in both nuclear (Akashi, 1995, 1996; Akashi and Schaeffer, 1997) and mitochondrial genes (Ballard and Kreitman, 1995; Rand and Kann, 1998; Ballard, 2000a,b). While the data are limited, the silent site N.I. values for mitochondrial genes ('sN.I.'s', where preferred and unpreferred synonymous codons are scored as fixed or polymorphic) also tend to be greater than 1.0, consistent with the pattern for traditional cN.I. values (Rand and Kann, 1998). We are currently examining the data sets in Weinreich and Rand (2000) with various silent site tests with the aim of comparing pN.I., cN.I. and sN.I. values across genomes. It is clear that differences in base composition, mutational properties and constraints of amino acid composition could interact to affect pN.I. and cN.I. values. This may be especially true across different genomes with different mutation rates (Li, 1997), or for genes on leading vs. lagging strands of replication that could experience different mutation rates (Rand and Kann, 1998; but see Francino et al., 1996). Since local rates of recombination can also alter the efficacy of selection on silent or amino acid sites (Kliman and Hey, 1993; Moran, 1996), it will be important to disentangle nucleotide and protein composition issues from other confounding factors that might alter the strength of selection.

4.3. Different selection differentials

A third possible explanation for the lower levels of among-genome heterogeneity for pN.I. values, in contrast to the significant among-genome heterogeneity for the cN.I. values, is that the phenotypic differences between radical and conservative polymorphisms may be smaller than the differences between replacement and silent polymorphisms. By this we do not mean that there is less selective difference between an average conservative and radical mutation than between a silent and replacement mutation.
Many amino acid mutations may have sufficiently strong phenotypic effects that they are eliminated quickly and not observed as polymorphisms. The residual amino acid variants that do reach observable polymorphic frequencies or eventual fixation, however, may have only subtle phenotypic consequences for protein function. Alternatively, these patterns may suggest that factors affecting the expression of proteins (e.g. preferred vs. unpreferred codons within mRNAs; cf. Akashi, 1995, 1996) are a more important currency in the economy of molecular evolution than are the alternative phenotypic states of individual proteins. If the selection differentials between conservative and radical substitutions are indeed smaller than those for silent and replacement changes, pN.I. values should be less 'extreme' than cN.I. values, and fewer pMK tests than cMK tests should be significant. To address the issue of the 'extreme-ness' of pN.I. vs. cN.I. values we defined less extreme as being closer to the strictly neutral expectation
of pN.I. or cN.I. = 1.0. Considering the 29 genes studied, each with three pN.I. values for the different amino acid classifications (polarity, polarity and volume, and charge), there are 87 pN.I. values. For the 13 Drosophila genes, 23/39 pN.I. values are less extreme than the comparable cN.I. values. For the ten animal mtDNA genes, 17/30 pN.I. values are less extreme than their respective cN.I. values. And for the six Arabidopsis genes, 10/18 pN.I. values are less extreme than the cN.I. values. Combining the data, 50/87 pN.I. values are less extreme than their respective cN.I. values. While the data do indicate that less extreme pN.I. values are more common than more extreme pN.I. values, these differences are not significant (P < 0.2). These simple contingency tests would be flawed if there were some non-independence among the three pN.I. values for the different amino acid classifications within a given gene (and cN.I.s). Since none of the three pN.I. values are significantly correlated with each other, we have treated them as statistically independent. As for the number of significant tests, the data do reveal that cMK tests reject neutrality a greater proportion of the time than do pMK tests (31% vs. 8%; see Section 4.1). As discussed above, it is not clear whether this is a result of limited power, or a pattern more consistent with neutrality. While the tests of these two predictions appear weak individually, both suggest the same trend, consistent with the notion that conservative-radical amino acid differences have smaller selection differentials than silent-replacement differences. There simply may not be enough data at present to distinguish between these two alternatives. Clearly these two different classes of mutations have different mutational properties which need to be examined.
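The tally above can be checked with a short calculation. The following is a minimal sketch rather than the procedure actually used: it assumes the neutrality index is the odds ratio (PR/PC)/(FR/FC), which reproduces the pN.I. entries of Table 3; it measures 'less extreme' as the absolute distance from 1.0 (the text does not state whether a raw or a logarithmic scale was used); and it replaces the contingency test with an exact two-sided sign test against a 50:50 expectation. All function names are illustrative.

```python
from math import comb


def neutrality_index(fixed_radical, fixed_conservative, poly_radical, poly_conservative):
    """Odds ratio (PR/PC) / (FR/FC); assumed form, consistent with Table 3."""
    return (poly_radical / poly_conservative) / (fixed_radical / fixed_conservative)


def is_less_extreme(pni, cni):
    """Assumption: 'less extreme' means closer to 1.0 on the raw (not log) scale."""
    return abs(pni - 1.0) < abs(cni - 1.0)


def sign_test_two_sided(successes, trials):
    """Exact two-sided binomial test against p = 0.5 (doubles the larger tail)."""
    k = max(successes, trials - successes)
    upper_tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2.0 * upper_tail)


# Numbers of pN.I. values less extreme than the matching cN.I., as reported in the text.
less_extreme = {"Drosophila": (23, 39), "mtDNA": (17, 30), "Arabidopsis": (10, 18)}
pooled_k = sum(k for k, _ in less_extreme.values())
pooled_n = sum(n for _, n in less_extreme.values())
p_value = sign_test_two_sided(pooled_k, pooled_n)

# Adh, polarity grouping, from Table 3: FR = 5, FC = 10, PR = 3, PC = 4.
adh_pol = neutrality_index(5, 10, 3, 4)

print(pooled_k, pooled_n, round(p_value, 3), round(adh_pol, 2))
```

With the pooled counts the sign test gives a two-sided P of roughly 0.2, in line with the non-significant result reported above. A logarithmic distance, |ln N.I.|, would be a reasonable alternative to the raw distance, but it is undefined for the pN.I. values of zero.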
Furthermore, it will be important to examine the site frequencies of segregating amino acid polymorphisms to get a sense of the relative ages of distinct amino acid mutations (Nielsen and Weinreich, 1999). If purifying selection is acting disproportionately on radical amino acid replacement mutations, we predict that the mean age of such polymorphisms will be less than the mean age of conservative polymorphisms (Nielsen and Weinreich, 1999). Since there appears to be some filtering of general replacement polymorphisms that are admitted into populations, this could provide a test of whether the observed amino acid variants are closer to effectively neutral variants than are silent and replacement changes.

4.4. Neutrality and selection as a continuum

An important distinction between the strictly neutral and nearly neutral models is that the former places the neutralist-selectionist debate in a dichotomous context, while the latter clearly allows for a continuous view (cf. Ohta, 1992, 1995). The distribution of selection coefficients across the genome thus becomes a problem in the units of selection. The important question is to distinguish between the distribution of biological functions performed by individual
genes, and the distribution of phenotypically distinct mutations presented to those genes. Different genes with very different roles in the biology of the cell, or the organism, or the population of individuals in the reproductive and physical environment will capitalize on different subsets of mutations that are provided by raw DNA change. The behavior of the various neutrality index values for genes subject to strong purifying selection (e.g. histones), weak or relaxed selection (e.g. fibrinopeptides, Dickerson, 1971; ANON loci, Schmid et al., 1999), or strong adaptive evolution such as reproductive proteins (accessory gland proteins: Tsaur et al., 1998; Aguadé, 1999) will be heavily influenced by the variety of biological functions across genes that comprise one's sample of the genome. Moreover, the history of adaptive vs. purifying episodes of evolution may change across loci (e.g. lysozyme; Messier and Stewart, 1997). If we believe that the neutralist-selectionist debate should be viewed across the continuum from strong negative selection, through neutrality, to strong positive selection, the different views of this spectrum provided by pN.I. and cN.I. values can help tabulate 'votes' in this debate. At the level of silent and replacement sites in Drosophila nuclear genes, about 17% (6/36) of genes depart from neutral expectations, and 5/6 of these depart in the direction of adaptive evolution (Fig. 1; Weinreich and Rand, 2000). While this may represent the 'great majority' of neutral cases (83%), it certainly is not consistent with a 'minute fraction' of changes being adaptive. For animal mitochondrial genes, almost half (15/31, 48%) of the cases depart from neutrality, and of those that do, 14/15 or 93% depart in the direction of negative selection. This is certainly not a 'great majority' of neutral cases (52%), but is not far from the notion that a 'minute fraction' are adaptive.
The Arabidopsis story is similar to that for animal mtDNA (Weinreich and Rand, 2000). But when we consider conservative and radical amino acid changes, our data are in closer agreement with the proportions stated by Kimura (only 7/87, or 8% pMK tests were significant, and 5/8 were in the direction of negative selection). When one corrects for multiple tests, for amino acid changes the predictions of Kimura (1983, p. xi) seem to hold (non-neutral results emerge very slightly more frequently than they should by chance). But for DNA changes in codons, there seems to be an excess of non-neutrality. We feel that contrasts among different functional classes of all kinds of molecular changes will help elucidate the distributions of selection coefficients within and between genes (e.g. Ballard, 2000a,b). This will help in providing upper and lower bounds for the strengths of selection coefficients working on these different kinds of mutations. When we have a clear understanding of the nature of the functional differences between molecular polymorphisms we will be more able to appreciate the significance of these different patterns of polymorphism and divergence.
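The comparison with chance can be made explicit. As a rough sketch, assume that every MK test is an independent trial with a 5% false-positive rate (an assumption only, since the three pMK tests for a gene share the same data). The expected number of significant tests is then 0.05 × n, and the binomial upper tail gives the probability of seeing at least the observed number by chance alone. The counts are those given in Section 4.1; the function name is illustrative.

```python
from math import comb

ALPHA = 0.05  # nominal significance level of each MK test


def upper_tail(observed, trials, p=ALPHA):
    """P(X >= observed) for X ~ Binomial(trials, p), assuming independent tests."""
    return sum(
        comb(trials, i) * p ** i * (1 - p) ** (trials - i)
        for i in range(observed, trials + 1)
    )


# (significant tests, total tests) as reported in Section 4.1
tests = {"pMK": (7, 87), "cMK": (9, 29)}

summary = {}
for name, (significant, total) in tests.items():
    summary[name] = {
        "fraction": significant / total,
        "expected_by_chance": ALPHA * total,
        "p_at_least_observed": upper_tail(significant, total),
    }
    print(name, summary[name])
```

Under these assumptions about 4.4 of the 87 pMK tests would be significant by chance, so seven is only a modest excess, whereas nine of 29 cMK tests is far more than the roughly 1.5 expected.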
4. Note added in proof

The Microtus mtDNA data have been retracted (Baker et al., 1997. Nature 390, 100). Excluding these data alters our counts of pN.I. and cN.I. values, but does not affect our conclusions.

Acknowledgements

Supported by NSF grants DEB 9707676 to DMR and 9981497 to DMR and DMW.

References

Aguadé, M., 1999. Positive selection drives evolution of the Acp29AB accessory gland protein locus in Drosophila. Genetics 152, 543–551.
Akashi, H., 1995. Inferring weak selection from patterns of polymorphism and divergence at 'silent' sites in Drosophila DNA. Genetics 139, 1067–1076.
Akashi, H., 1996. Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics 144, 1297–1307.
Akashi, H., 1999. Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151, 221–238.
Akashi, H., Schaeffer, S.W., 1997. Natural selection and the frequency distributions of 'silent' DNA polymorphism in Drosophila. Genetics 146, 295–307.
Ballard, J.W.O., Kreitman, M., 1995. Is mitochondrial DNA a strictly neutral marker? Trends Ecol. Evol. 10, 485–488.
Ballard, J.W.O., 2000a. Comparative genomics of mitochondrial DNA in members of the Drosophila melanogaster subgroup. J. Mol. Evol. 51, 48–63.
Ballard, J.W.O., 2000b. Comparative genomics of mitochondrial DNA in Drosophila simulans. J. Mol. Evol. 51, 64–75.
Begun, D., Whitley, P., 2000. Adaptive evolution of relish, a Drosophila NF-κB/IκB protein. Genetics 154, 1231–1238.
Dickerson, R.E., 1971. The structure of cytochrome c and the rates of molecular evolution. J. Mol. Evol. 1, 26–45.
Francino, M.P., Chao, L., Riley, M.A., Ochman, H., 1996. Asymmetries generated by transcription-coupled repair in enterobacterial genes. Science 272, 107–109.
Fu, Y.-X., Li, W.-H., 1993. Statistical tests of neutrality of mutations. Genetics 133, 693–709.
Gillespie, J.H., 1989. Lineage effects and the index of dispersion of molecular evolution. Mol. Biol. Evol. 6, 636–647.
Gillespie, J.H., 1995. On Ohta's hypothesis: most amino acid substitutions are deleterious. J. Mol. Evol. 40, 64–69.
Hughes, A.L., Ohta, T., Nei, M., 1990. Positive Darwinian selection promotes charge profile diversity in the antigen-binding cleft of class I major-histocompatibility-complex molecules. Mol. Biol. Evol. 7 (6), 515–524.
Kimura, M., 1983. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.
Kliman, R.M., Hey, J., 1993. Reduced natural selection associated with low recombination in Drosophila melanogaster. Mol. Biol. Evol. 10, 1239–1258.
Kreitman, M., 1996. The neutral theory is dead. Long live the neutral theory. Bioessays 18, 678–683.
Kreitman, M., Akashi, H., 1995. Molecular evidence for natural selection. Ann. Rev. Ecol. Systemat. 26, 403–422.
Kuittinen, H., Aguadé, M., 2000. Nucleotide variation at the CHALCONE ISOMERASE locus in Arabidopsis thaliana. Genetics 155, 863–872.
Kumar, S., Tamura, K., Nei, M., 1993. MEGA: Molecular evolutionary genetics analysis. University of Pennsylvania, University Park, PA, p. 16802.
Li, W.-H., 1997. Molecular Evolution. Sinauer, Sunderland, MA.
Maynard Smith, J., 1994. Estimating selection by comparing synonymous and substitutional changes. J. Mol. Evol. 39, 123–128.
McDonald, J.H., Kreitman, M., 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654.
Messier, W., Stewart, C.-B., 1997. Episodic adaptive evolution of primate lysozymes. Nature 385, 151–154.
Moran, N.A., 1996. Accelerated evolution and Muller's ratchet in endosymbiotic bacteria. Proc. Natl Acad. Sci. USA 93, 2873–2878.
Moriyama, E.N., Powell, J.R., 1996. Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13, 261–277.
Nachman, M.W., 1998. Deleterious mutations in animal mitochondrial DNA. Genetica 102/103, 61–69.
Naylor, G.J., Collins, T.M., Brown, W.M., 1995. Hydrophobicity and phylogeny. Nature 373, 565–566.
Nielsen, R., Weinreich, D., 1999. The age of nonsynonymous and synonymous mutations in animal mtDNA and implications for the mildly deleterious theory. Genetics 153, 497–506.
Ohta, T., 1992. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23, 263–286.
Ohta, T., 1995. Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40, 56–63.
Ohta, T., 1996. The current significance and standing of neutral and nearly neutral theories. Bioessays 18, 673–677.
Rand, D.M., Kann, L.M., 1996. Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes from Drosophila, mice and humans. Mol. Biol. Evol. 13, 735–748.
Rand, D.M., Kann, L.M., 1998. Mutation and selection at silent and replacement sites in the evolution of animal mitochondrial DNA. Genetica 102/103, 393–407.
Sawyer, S.A., Hartl, D.L., 1992. Population genetics of polymorphism and divergence. Genetics 132, 1161–1176.
Schmid, K.J., Nigro, L., Aquadro, C.F., Tautz, D., 1999. Large number of replacement polymorphisms in rapidly evolving genes of Drosophila: implications for genome wide surveys. Genetics 153, 1717–1729.
Tajima, F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595.
Tsaur, S.-C., Ting, C.-T., Wu, C.-I., 1998. Positive selection driving the evolution of a gene of male reproduction, Acp26Aa, of Drosophila: II. Divergence versus polymorphism. Mol. Biol. Evol. 15, 1040–1046.
Weinreich, D.M., Rand, D.M., 2000. Contrasting patterns of non-neutral evolution in proteins encoded in nuclear and mitochondrial genomes. Genetics 156, 385–399.
Zhang, J., 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J. Mol. Evol. 50, 56–68.