Comparative genetic mutation frequencies based on amino acid composition differences

Mutation Research 600 (2006) 89–92

Amandio Vieira ∗
Endocrine & Metabolic Research Laboratory, K9600, Faculty of Applied Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada

Received 26 July 2005; received in revised form 30 November 2005; accepted 7 March 2006
Available online 30 June 2006

Abstract

Genetic variation inferred from large-scale amino acid composition comparisons among the genomes and chromosomes of four species (Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans and Homo sapiens) is shown to be correlated (highest r² = 0.9855, p < 0.01) with reported mutation rates for various genes in these species. This study, based largely on pseudogene data, helps to establish reference mutation frequencies that are likely to be representative of overall genome mutation rates in each of the species examined, and provides further insight into the heterogeneity of mutation rates among genomes.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Comparative genomics; Mutation frequencies; Pseudogenes

∗ Tel.: +1 604 291 4251; fax: +1 604 291 3040. E-mail address: [email protected].

0027-5107/$ – see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.mrfmmm.2006.03.009

Genetic mutation events are a critical part of the evolutionary process. Such events include single-nucleotide substitutions as well as insertions and deletions (indels). There is evidence that the frequency of mutation events differs within a given genome and among different genomes [1,2]. Mutation rates have been calculated for various prokaryotic and eukaryotic organisms [3]; they indicate, for example, a higher rate in Drosophila melanogaster than in humans [3–5] when the number of mutations per base pair per genome replication is considered. Knowledge of the mechanisms and patterns of inter- and intra-genome heterogeneity is important for a better understanding of evolutionary processes. In this study, we have used both published data [6] and a gene–pseudogene database and analysis tool (http://bioinfo.mbb.yale.edu/genome/pseudogene/composition/) developed by Gerstein and coworkers [6,7] to obtain differences in amino acid frequencies for large-scale (approximately 34,000–8,000,000 residues [6], equivalent to 100 kb to 25 Mb) gene–pseudogene comparisons and pseudogene–intergenic DNA comparisons. In the case of pseudogene and intergenic DNA, the amino acids represent residues deduced from the DNA sequence [6]. Human pseudogene–intergenic DNA comparison data were available only for chromosomes 21 and 22 [6]. We used the data for these two chromosomes as representative of the human genome; a minor correction was applied because of the possible mutational hypervariability of chromosome 21 relative to other human autosomes [1,8] (see also Table 1; we also observed a similar hypervariability for chromosome 21 relative to chromosome 22 in terms of amino acid composition differences, data not shown). Table 1 summarizes the basic data for the genomes of the four species examined in this study: human (chromosomes 21 and 22 only), Saccharomyces cerevisiae,

Table 1
Data of amino acid differences and selected mutation rates used in this study

Organism         | Va    | 1/Vb  | Va/EI | (1/Vb)/IG | Mutations (bp⁻¹ rep⁻¹) [Ref.]
S. cerevisiae    | 2.86  | 0.20  | 3.74  | 0.87      | 1.7 × 10⁻¹⁰ [9]
C. elegans       | 3.95  | 0.30  | 6.13  | 0.74      | 1.5 × 10⁻¹⁰ [4]ᵃ
D. melanogaster  | 6.29  | 1.04  | 13.21 | 1.99      | 3.4 × 10⁻¹⁰ [4]
H. sapiens       | 2.23ᵇ | 0.16ᵇ | 3.22  | 0.50      | 0.8 × 10⁻¹⁰ ᶜ

Va and Vb: variation (i.e., differences) between the amino acid compositions of genes and pseudogenes (Va) or of pseudogenes and intergenic (translated) DNA (Vb). As noted in the text, Vb and Va represent opposite mutational directions: for Va, the greater the mutational changes that pseudogenes undergo, the greater the expected differences from the genes; for Vb, the greater the mutational changes that pseudogenes undergo, the smaller the expected differences from the intergenic DNA. These mutation events occur in the DNA, but, as shown in this report, they are also manifested at the level of amino acid composition when large-scale comparisons are performed. EI and IG: fraction of the genome represented by exons and introns (EI) or by intergenic DNA (IG), as reported for the database [6,7].
ᵃ unc gene data.
ᵇ Includes a slight correction (uncorrected values are 2.86 (Va) and 0.20 (1/Vb)) for high variation in chromosome 21 relative to other human autosomes (see also text). The correction consisted of dividing by a factor of 1.3, obtained from a comparison of chromosomal human–chimpanzee sequence similarities [8]: (average DNA differences for chromosomes 21 and 22)/(average DNA differences over all chromosomes).
ᶜ Average of data reported for both mutations/generation [4,10,11] and replications/generation [4,12,13]: 2.1 × 10⁻⁸ mut bp⁻¹ gen⁻¹ / 253 rep gen⁻¹.

Drosophila melanogaster, and Caenorhabditis elegans. As noted in the table, to compare gene–pseudogene differences with pseudogene–intergenic (translated) DNA differences, we used the inverse of the distances reported [6] for the latter. This was done because the mutation frequency of pseudogenes is inversely related to the differences between pseudogenes and intergenic DNA; i.e., the higher the value of 1/Vb in Table 1, the higher the mutation or decay [6,7] rate of the pseudogene. The terms Va and Vb represent the differences in composition, over all amino acids, between pseudogenes and genes (Va) or between pseudogenes and translated intergenic DNA (Vb). The following are some specific examples of the calculations involved in determining Va and Vb. In C. elegans, for example, the database contained 215,995 pseudogene amino acid residues and 8,140,673 gene amino acid residues. The frequency of each amino acid in the pool of 215,995 pseudogene residues can be compared with the frequency of the same amino acid in the pool of 8,140,673 gene residues (hence the large scale of the comparisons). For example, phenylalanine accounted for 7.5% of the 215,995 pseudogene residues and 5% of the 8,140,673 gene residues (numbers were obtained from the database plots), a difference of 2.5 percentage points. Similarly, glutamic acid accounted for 4.5% of the pseudogene residues and 6.5% of the gene residues, a difference of 2 percentage points. The absolute values of such differences are summed (e.g., 2.5 + 2.0 + ···) over all residues (including stop codons), and the resulting average is the Va for C. elegans in Table 1; the Va values for the other three organisms are obtained in the same way. Calculation of Vb involves a similar process, but in this case pseudogene residue pools are compared with (translated) intergenic residue pools. A correlation analysis of the Table 1 data, amino acid differences versus spontaneous mutation rates reported in the literature, is shown in Fig. 1. All four difference values (Va, 1/Vb, Va/EI, (1/Vb)/IG) for each species exhibited a statistically significant correlation

Fig. 1. Correlation of amino acid differences with reported mutation rates (per base pair per replication). Values were taken from Table 1. Differences for gene–pseudogene (broken lines; squares) and pseudogene–intergenic (translated) DNA (solid lines; circles) are shown using uncorrected values and values corrected for the genomic fractions of intergenic (IG term in Table 1) or exon–intron (EI term in Table 1) DNA (● denotes corrected pseudogene–intergenic values; the other plot symbols were lost in extraction). Statistical values: ●, r² = 0.9855, p < 0.01; remaining series (symbols lost in extraction), r² = 0.9167, p < 0.05; r² = 0.8937, p ≈ 0.05; r² = 0.8963, p ≈ 0.05. The scale for the gene–pseudogene differences (squares) has been reduced 10-fold to allow comparison on a similar scale with the other data.

with reported mutation rates. The correlation with the highest statistical significance (p < 0.01) was that for (1/Vb)/IG, i.e., the mutation of pseudogenes that leads to their eventual decay into the intergenic sequence patterns that surround them. The corrected value obtained by dividing 1/Vb by the fraction of the analysed genome that corresponds to IG [7] (i.e., the (1/Vb)/IG column in Table 1) is likely more representative of the mutational heterogeneity of the human genome; this may be why the uncorrected value (the 1/Vb column in Table 1) is less representative of the overall mutation rates reported in the literature. Strong evolutionary selection pressure is expected to act on coding sequences, so the number of mutations accumulating between genes and pseudogenes would be expected to be smaller than the number accumulating between pseudogenes and intergenic DNA. Consistent with this, Fig. 1 shows better correlations with reported spontaneous mutation rates for Vb (solid lines) than for Va (broken lines) across the four organisms. As a further test of the correlations in Fig. 1, ratios of amino acid differences for the four species were compared with equivalent ratios from the reported spontaneous mutation rates. As shown in Fig. 2b, a correlation of high statistical significance (p < 0.005) was obtained when (1/Vb)/IG values for each species were used to form the ratios. Correlations with the ratios of reported mutation rates are stronger for ratios formed using Vb than for ratios formed using Va (Fig. 2a shows that the correlations using Va were of borderline statistical significance); the reason is likely the same as that given above for Fig. 1, i.e., differences in selection pressure between coding and non-coding sequences. Calculations of spontaneous mutation rates can include assimilation of highly variable rates from different genes and different parts of the genome.
Our results, through comparative amino acid composition analyses, provide support for reported rates that include such assimilation of varied mutational frequencies (e.g., [4]); this is important because such rates are widely used in genetic studies (cf. review [3]). Hence, our results help establish reference mutation frequencies that are likely to be representative of overall genome mutation rates in each of the species examined; such a reference may be useful, for example, in judging whether the substitution rate in a particular gene (or chromosomal region) is abnormally high or low relative to overall genomic spontaneous mutation rates in that species. The present study also provides support for the use of large-scale amino acid composition analyses in such an evolutionary context.
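The composition comparison used to obtain Va and Vb can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code: the helper names are hypothetical, each composition is assumed to be a mapping from residue type (stop codon included) to its percentage of all residues, and the averaging step (sum of absolute differences divided by the number of residue types) is one reading of the description above, so the sketch is not expected to reproduce the Table 1 values exactly. The example uses the two C. elegans frequencies quoted in the text; the Vb and IG numbers in the second part are placeholders.

```python
from collections.abc import Mapping


def composition_difference(freq_a: Mapping[str, float],
                           freq_b: Mapping[str, float]) -> float:
    """Average absolute difference, in percentage points, between two compositions.

    Hypothetical helper: |freq_a[r] - freq_b[r]| is summed over every residue
    type present in either profile (a stop codon can be included, e.g. under
    the key '*') and divided by the number of residue types. A residue missing
    from one profile is treated as 0%.
    """
    residues = set(freq_a) | set(freq_b)
    if not residues:
        raise ValueError("composition profiles are empty")
    total = sum(abs(freq_a.get(r, 0.0) - freq_b.get(r, 0.0)) for r in residues)
    return total / len(residues)


def corrected_for_fraction(value: float, genome_fraction: float) -> float:
    """Divide a difference value by the genomic fraction (EI or IG) it refers to."""
    if not 0.0 < genome_fraction <= 1.0:
        raise ValueError("genome_fraction must lie in (0, 1]")
    return value / genome_fraction


if __name__ == "__main__":
    # C. elegans frequencies quoted in the text (percent of all residues);
    # only two residue types are included, so this is not the full Va.
    pseudogene = {"F": 7.5, "E": 4.5}
    gene = {"F": 5.0, "E": 6.5}
    print(composition_difference(pseudogene, gene))  # (2.5 + 2.0) / 2 = 2.25

    # Vb uses the same comparison (pseudogene vs translated intergenic pools);
    # Table 1 lists 1/Vb, which increases as pseudogenes decay. The Vb and IG
    # values below are hypothetical placeholders.
    hypothetical_vb = 4.0
    hypothetical_ig = 0.5
    print(corrected_for_fraction(1.0 / hypothetical_vb, hypothetical_ig))  # 0.5
```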

Fig. 2. Correlation of the ratios of amino acid differences in four eukaryotic species with equivalent ratios obtained using reported mutation rates for various genes. In both panels, the six ratios (Sce/Hsa, Cel/Hsa, Dme/Hsa, Dme/Cel, Dme/Sce, Cel/Sce; Sce, S. cerevisiae; Cel, C. elegans; Dme, D. melanogaster; Hsa, H. sapiens) for the four organisms were calculated from the data in Table 1. (a) Ratios based on gene–pseudogene differences, shown using uncorrected values (symbol lost in extraction) and values corrected for the genomic fraction of exon–intron DNA (●; EI term in Table 1). Statistical values: ●, r² = 0.5572, p > 0.05; uncorrected, r² = 0.6645, p < 0.05. (b) Ratios based on pseudogene–intergenic (translated) DNA differences, shown using uncorrected values (symbol lost in extraction) and values corrected for the genomic fraction of intergenic DNA (●; IG term in Table 1). Statistical values: ●, r² = 0.9258, p < 0.005; uncorrected, r² = 0.6475, p ≈ 0.05.
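The correlations in Figs. 1 and 2 can be approximated from the rounded Table 1 values with the Python sketch below. It assumes an ordinary (Pearson) correlation on untransformed values, which the paper does not state explicitly; the species codes follow the Fig. 2 caption and the function names are hypothetical. With these inputs, the (1/Vb)/IG correlation comes out at about 0.983, close to the reported r² = 0.9855; the small gap may reflect rounding in the table. The ratio-based value is computed the same way but need not match the r² reported for Fig. 2b, since the exact ratio procedure is not described.

```python
# Table 1 values: (1/Vb)/IG and mutation rates in units of 1e-10 per bp per replication.
INV_VB_IG = {"Sce": 0.87, "Cel": 0.74, "Dme": 1.99, "Hsa": 0.50}
MUT_RATE = {"Sce": 1.7, "Cel": 1.5, "Dme": 3.4, "Hsa": 0.8}

# The six species pairs listed in the Fig. 2 caption (numerator, denominator).
PAIRS = [("Sce", "Hsa"), ("Cel", "Hsa"), ("Dme", "Hsa"),
         ("Dme", "Cel"), ("Dme", "Sce"), ("Cel", "Sce")]


def pearson_r2(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def pairwise_ratios(values: dict[str, float]) -> list[float]:
    """Ratio values[a] / values[b] for each (a, b) in PAIRS."""
    return [values[a] / values[b] for a, b in PAIRS]


if __name__ == "__main__":
    species = list(INV_VB_IG)
    xs = [INV_VB_IG[s] for s in species]
    ys = [MUT_RATE[s] for s in species]
    print(f"Fig. 1-style r^2: {pearson_r2(xs, ys):.3f}")  # 0.983

    ratio_r2 = pearson_r2(pairwise_ratios(INV_VB_IG), pairwise_ratios(MUT_RATE))
    print(f"Fig. 2-style r^2: {ratio_r2:.3f}")
```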

Acknowledgements

I would like to thank P.M. Vieira for help with analyses and preparation of the manuscript. Work in the author's laboratory is supported by a grant from the Natural Sciences and Engineering Research Council of Canada.

References

[1] H. Ellegren, N.G.C. Smith, M.T. Webster, Mutation rate variation in the mammalian genome, Curr. Opin. Genet. Dev. 13 (2003) 562–568.
[2] K.H. Wolfe, P.M. Sharp, W.H. Li, Mutation rates differ among regions of the mammalian genome, Nature 337 (1989) 283–285.

[3] P.D. Sniegowski, P.J. Gerrish, T. Johnson, A. Shaver, The evolution of mutation rates: separating causes from consequences, Bioessays 22 (2000) 1057–1066.
[4] J.W. Drake, B. Charlesworth, D. Charlesworth, J.F. Crow, Rates of spontaneous mutation, Genetics 148 (1998) 1667–1686.
[5] P.D. Keightley, A. Eyre-Walker, Deleterious mutations and the evolution of sex, Science 290 (2000) 331–333.
[6] N. Echols, P. Harrison, S. Balasubramanian, N.M. Luscombe, P. Bertone, Z. Zhang, M. Gerstein, Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes, Nucleic Acids Res. 30 (2002) 2515–2523.
[7] Z.L. Zhang, P. Harrison, M. Gerstein, Digging deep for ancient relics: a survey of protein motifs in the intergenic sequences of four eukaryotic genomes, J. Mol. Biol. 323 (2002) 811–822.
[8] I. Ebersberger, D. Metzler, C. Schwarz, S. Paabo, Genomewide comparison of DNA sequences between humans and chimpanzees, Am. J. Hum. Genet. 70 (2002) 1490–1497.
[9] J.W. Drake, A constant rate of spontaneous mutation in DNA-based microbes, Proc. Natl. Acad. Sci. U.S.A. 88 (1991) 7160–7164.
[10] A.S. Kondrashov, Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases, Hum. Mutat. 21 (2003) 12–27.
[11] M.W. Nachman, S.L. Crowell, Estimate of the mutation rate per nucleotide in humans, Genetics 156 (2000) 297–304.
[12] D.M. Wloch, K. Szafraniec, R.H. Borts, R. Korona, Direct estimate of the mutation rate and the distribution of fitness effects in the yeast Saccharomyces cerevisiae, Genetics 159 (2001) 441–452.
[13] J.B. Drost, W.R. Lee, Biological basis of germline mutation: comparisons of spontaneous germline mutation rates among Drosophila, mouse, and human, Environ. Mol. Mutagen. 25 (1995) 48–64.