Update
Trends in Genetics Vol.25 No.5
Genome Analysis
Potential of fish opsin gene duplications to evolve new adaptive functions
Jun Gojobori and Hideki Innan
Graduate University for Advanced Studies, Hayama, Kanagawa 240-0193, Japan
Corresponding author: Innan, H.
Duplications in four cone-opsin gene families are heavily involved in visual adaptation in bony fish. We found that the two gene families covering the middle-wave range of the visual spectrum have, on average, older duplications followed by accelerated amino acid substitution, compared with the other two families, which define the boundaries of the spectrum. This could reflect a difference in the potential to evolve new functions: middle-wave genes are likely to contribute more to adaptive evolution of vision through gene duplication.
Fish cone-opsin genes as a model of adaptive evolution by gene duplication
Gene duplication has been regarded as an important mechanism in adaptive genome evolution [1,2]. Although most duplicate genes are destined to become pseudogenes owing to the relaxation of purifying selection (pseudogenization), some can acquire a novel function and be preserved for a long time (neofunctionalization) [3]. Thus, the long-term preservation of a duplicated gene largely depends on whether adaptive selection favors a new functional variant of the gene. In other words, an important factor in determining the fate of a duplicated gene is its potential to evolve new functions. It has been suggested that, on a genomic scale, genes with great potential are likely to increase in copy number and form large families with substantial functional variation, which contributes to adaptive genome evolution [4]. However, it remains unclear which kinds of genes have great potential to evolve novel functions through duplication. To assess this potential, we focus here on the evolution of opsin gene families.
The products of opsin genes have a major role in color vision. Vertebrate genomes usually carry multiple copies (normally two to four) of opsin genes, each with a characteristic absorption spectrum (usually summarized by λmax, the wavelength of peak absorption), and their joint effects determine the capacity of the individual to discriminate colors [5,6]. It is well established empirically that an amino acid change in an opsin gene can shift λmax. This direct link between genotype (amino acid sequence) and phenotype (λmax) is an advantage of using opsins as a model of molecular evolution [5]. We can therefore predict that a duplicate carrying amino acid changes will immediately provide an opportunity to improve the visual system of the host. Consequently, duplication of opsin genes has had a key role in the adaptive evolution of the color vision system. Note that a neutral process (subfunctionalization) can also preserve a duplicated gene [7], but this might not be relevant here because the evolution of opsin genes is primarily driven by functional changes caused by amino acid substitutions [5,6].
To explore the contribution of gene duplication to adaptive phenotypic evolution, we have analysed the pattern of gene duplication in cone-opsin gene families in bony fish. Vision is an important sensory system through which fish detect external information, and it is strongly associated with adaptation to the diverse light environments of their habitats [8,9]. Vertebrate cone-opsin genes have been classified into four gene families: short wave-sensitive 1, ultraviolet-blue cone opsin (SWS1); short wave-sensitive 2, blue cone opsin (SWS2); rhodopsin-like, green cone opsin (RH2); and long wave-sensitive, red-green cone opsin (LWS) [5,10]. The origin of the four gene families is thought to predate the common ancestor of vertebrates [10,11]. Several further duplications occurred during the evolution of bony fish, making them rich in opsin genes. Interestingly, extensive duplications are also involved in the genes of pigment synthesis in bony fish [12], which might be associated with the evolution of opsin genes through duplication.
Copy number and amino acid diversity in the four cone-opsin gene families
We first examined five bony fish species for which the λmax values of all four cone-opsin gene families (SWS1, SWS2, RH2 and LWS) have been systematically investigated: zebrafish (Danio rerio), medaka (Oryzias latipes), tilapia (Oreochromis niloticus), yellow-tailed acei (Pseudotropheus acei) and bluefin killifish (Lucania goodei). We also included lamprey (Geotria australis), which should approximately represent the ancestral state of cone-opsin genes, as illustrated in Figure 1a [11,13].
Figure 1. Summary of the distribution of λmax, average gene copy numbers and average p-distances of the four cone-opsin gene families. (a) Distributions of λmax of cone-opsin genes (SWS1, SWS2, RH2 and LWS) for lamprey, bluefin killifish, yellow-tailed acei, tilapia, medaka and zebrafish. Data are from Refs [13,19–23]. Note that medaka and bluefin killifish have two copies of the LWS gene, although they are difficult to distinguish visually in the figure. (b) Average copy number of paralogs of the four cone-opsin gene families. (c) Average p-distance among paralogs of the four cone-opsin gene families. Values obtained when the SWS1 duplication (duplication #11 in Figure 2a) is excluded are shown in parentheses and marked with an asterisk (*). These results show that the middle-wave genes (RH2 and SWS2) have a greater number of gene duplications and greater divergence among paralogs than the boundary-wave genes (LWS and SWS1).
Lamprey has only one copy in each of the four gene families, and the λmax values of the four genes are approximately evenly spaced across the spectrum. This could be an evolutionarily stable (and presumably optimal) pattern, because similar distributions are reported in many species with a stable number of opsin genes [14]. By contrast, the five bony fish show various patterns with more than four opsin genes (Figure 1a). Two trends are observable in all five bony fish species: (i) the minimum λmax values (350–370 nm), encoded by the SWS1 gene, and the maximum λmax values (510–540 nm), encoded by the LWS gene, are approximately identical across the five species and very similar to those of lamprey; and (ii) by contrast, the λmax values of the opsins encoded by RH2 and SWS2 are more variable, and these genes have, on average, more copies than the other two. The first observation makes biological sense because the range of the visual spectrum between ultraviolet (SWS1) and red (LWS) should be adequate for most fish species. The λmax values of RH2 and SWS2, by contrast, have changed during the evolution of the color vision system, accompanied by extensive gene duplication. If the optimal visual system requires many opsins whose λmax values are evenly spaced across the spectrum, as mentioned earlier, we predict that the four gene families should differ in their relative contribution to adaptation of the visual system.
Here, we emphasize the differences between the two gene families responsible for the middle of the spectrum (RH2 and SWS2, which we refer to as the middle-wave genes) and the other two (LWS and SWS1, referred to as the boundary-wave genes), which define the boundaries of the spectrum. Because the boundary-wave genes should have a conservative role in maintaining the boundaries, there might be no advantage for a duplicate to shift λmax to shorter wavelengths for SWS1 or to longer wavelengths for LWS (although a shift towards the middle could be beneficial). By contrast, any shift in the λmax of a duplicate of a middle-wave gene can be beneficial. Thus, the boundary- and middle-wave genes could differ substantially in their potential to evolve new functions through gene duplication.
To test this idea, we investigated the cone-opsin genes of a large number of bony fish species using public databases (191 genes in total from 62 species; see supplementary material online for methods and the list of genes used in this study). If our hypothesis is correct, the middle-wave genes (RH2 and SWS2) should have more copies than the boundary-wave genes (LWS and SWS1). Although the databases probably miss some unidentified genes, we expect the effect on our analysis to be small, because missing data should affect the four gene families in a similar manner. To minimize potential statistical bias from multiple gene entries that share the same duplication event (Figure 2a,b), we used one randomly chosen species from each genus. We also ignored multiple entries with identical nucleotide sequences, because these might result from repeated database submissions. As expected, the numbers of gene entries per species for RH2 and SWS2 (1.54 and 1.55, respectively) are greater than those for SWS1 and LWS (1.03 and 1.12, respectively). A permutation test showed that the difference between the two classes is highly significant (p < 0.001).
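The copy-number comparison above can be illustrated with a minimal one-sided permutation test. This is a sketch under stated assumptions, not the authors' code. The article does not give its test statistic or its number of permutations, so several choices here are assumptions: the difference in mean copy number per species, 10,000 label shuffles, and the genus taken as the first word of the binomial. The helper names `one_species_per_genus` and `permutation_test` are hypothetical, and the copy numbers in the usage lines are toy values, not the study's data.

```python
import random


def one_species_per_genus(species, rng):
    """Keep one randomly chosen species per genus.

    Assumption: the genus is the first word of the binomial name.
    """
    by_genus = {}
    for name in species:
        by_genus.setdefault(name.split()[0], []).append(name)
    # Sort before drawing so a seeded generator gives reproducible picks.
    return [rng.choice(sorted(members)) for _, members in sorted(by_genus.items())]


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """One-sided test that mean(group_a) exceeds mean(group_b).

    Class labels are shuffled over the pooled values. The p-value uses the
    (extreme + 1) / (n_perm + 1) correction, so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Toy per-species copy numbers (hypothetical, for illustration only).
middle_wave = [2, 1, 2, 1, 3, 1, 2, 1, 2, 1]    # RH2 and SWS2 entries pooled
boundary_wave = [1, 1, 1, 2, 1, 1, 1, 1, 1, 1]  # SWS1 and LWS entries pooled
diff, p = permutation_test(middle_wave, boundary_wave)
```

Shuffling labels over the pooled values tests the null hypothesis that copy number does not depend on the class of gene family. A one-sided alternative is used because the hypothesis predicts the direction of the difference.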
Figure 2. Evolutionary analyses of duplicated genes in the four opsin gene families. (a) Inferred duplication events mapped on a species tree. The species tree was constructed mainly from the NCBI taxonomy database [24]. Branch lengths are adjusted to be approximately consistent with those of the gene trees of the four cone-opsin genes. (b) p-distances of amino acid sequences between paralogs for the 13 duplications. Duplication numbers correspond to those in (a). When a duplication is shared by more than one species, the p-distance was computed for one randomly chosen species; the choice of species does not affect the result. Duplication #11 (*) might not function in the retina (see main text for details). (c) Summary of the dN/dS ratio (ω) analysis. Evolutionary rates were estimated by the Nei–Gojobori method [25]. Genes with dS > 1.0 were excluded from the analysis. Numbers in parentheses marked with asterisks (*) are the dN/dS ratios obtained without duplication #11. These results show that the duplications of middle-wave genes are, on average, old and that the sequence divergence of middle-wave genes was accelerated after duplication.
A second prediction of our hypothesis is that the amino acid divergence between paralogous genes should be high for the middle-wave genes. For the 62 species we investigated, we calculated the p-distance between the amino acid sequences of every pair of genes within each gene family whenever multiple entries were available. We found that the p-distances for the middle-wave genes (the two gene families pooled) are, on average, larger than those of the boundary-wave genes (0.0383 versus 0.199, p < 0.001, permutation test; Figure 1c), although SWS1 has a greater p-distance than RH2 (Figure 2a,b). Thus, this ad hoc analysis supported our hypothesis: the middle-wave genes have more copies and greater functional diversity than the boundary-wave genes. The analysis does, however, have some statistical problems. For example, our sample is obviously biased toward cichlids, although similar results were obtained when cichlids were excluded (Figure S1 in the supplementary material online).
To exclude such bias, and to further elucidate the evolutionary mechanism behind the observed differences between the boundary- and middle-wave genes, we mapped gene duplication events on a species tree (Figure 2a). The duplication events were inferred from phylogenetic trees of the opsin proteins. In total, 13 duplication events are needed to explain the observations parsimoniously. The numbers of duplication events in the boundary- and middle-wave genes are not very different (five and eight, respectively), but we found a significant difference in amino acid divergence after duplication. The duplicates of the middle-wave genes have, on average, high amino acid divergence (0.101), whereas those of the boundary-wave genes have very low divergence (0.013). The exception is the duplication of SWS1, represented by the purple triangle (duplication #11) in Figure 2a; including it raises the boundary-wave average to 0.044. The difference between the two categories is significant (p < 0.05, non-parametric Mann–Whitney test; Figure 2b). Our phylogenetic analysis (Figure 2a) indicates that the duplication of SWS1 (duplication #11) is specific to ayu (Plecoglossus altivelis). According to the literature, one copy of SWS1 in ayu (SWS1–2) is expressed in the retina, whereas expression of the other (SWS1–1) could not be detected [15]. This suggests either that SWS1–1 has acquired a new function needed in tissues other than the retina or that it has been pseudogenized. Taking this possibility into account, we repeated the analysis excluding duplication #11, which gave stronger support for our hypothesis (p < 0.01, Mann–Whitney test; see also Figure 1). The amino acid divergence reflects both the age of a duplication event and the rate of amino acid change, which can be enhanced by positive selection. The picture emerging from Figure 2a,b is that duplications of the middle-wave genes are, on average, old and that their amino acid evolution has been accelerated by positive selection.
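Figure 2b rests on two quantities: the amino acid p-distance between paralogs, and a Mann–Whitney comparison of middle-wave versus boundary-wave duplications. Both can be sketched as follows. The article does not say how alignment gaps were handled or whether its test was one- or two-sided. This sketch therefore assumes pairwise deletion of gapped sites and a one-sided exact permutation p-value. The function names and the per-duplication values in the usage lines are hypothetical, not the study's data.

```python
from itertools import combinations


def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing residues between two aligned protein sequences.

    Assumption: sites with a gap in either sequence are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def mann_whitney_u(x, y):
    """U statistic for x: pairs (x_i, y_j) with x_i > y_j, ties counted as 1/2."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def exact_one_sided_mw(x, y):
    """Return (U, P(U >= observed U)) over every relabelling of the pooled values."""
    observed = mann_whitney_u(x, y)
    pooled = list(x) + list(y)
    hits = total = 0
    for chosen in combinations(range(len(pooled)), len(x)):
        chosen_set = set(chosen)
        xs = [pooled[i] for i in chosen]
        ys = [v for i, v in enumerate(pooled) if i not in chosen_set]
        total += 1
        hits += mann_whitney_u(xs, ys) >= observed
    return observed, hits / total


# Hypothetical per-duplication p-distances (8 middle-wave, 5 boundary-wave).
middle = [0.21, 0.15, 0.02, 0.09, 0.12, 0.05, 0.11, 0.06]
boundary = [0.01, 0.02, 0.00, 0.03, 0.17]
u, p = exact_one_sided_mw(middle, boundary)
```

With only 13 duplications, all C(13, 8) = 1,287 relabellings can be enumerated, so the p-value is exact rather than a normal approximation.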
The inference of accelerated, positively selected amino acid evolution after duplication is also supported by the dN/dS analysis summarized in Figure 2c. For each of the two categories of gene families, we estimated the average dN/dS ratio (ω) between paralogs of duplicated genes (ωp) and the average dN/dS ratio between orthologs of single-copy genes (ωs). (The results of the dN/dS analysis for each of the four cone-opsin gene families are shown in Table S2.) Overall, ωp is greater than ωs, indicating accelerated evolution after gene duplication. The ratio ωp/ωs was particularly high for the middle-wave genes (Figure 2c), underscoring the importance of functional divergence of middle-wave genes after gene duplication.
Concluding remarks
We have shown that the middle-wave genes in bony fish have more paralogs, with greater amino acid differences, than the boundary-wave genes. These differences arose because the duplication events of the middle-wave genes are, on average, older and the subsequent amino acid divergence is more enhanced. It might therefore be concluded that duplications of the middle-wave genes, followed by amino acid divergence, have greater potential to contribute to visual adaptation than those of the boundary-wave genes. Our conclusion is based on limited data, and more data (e.g. genome sequences of many fish species) will help to confirm it.
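The ω values in Figure 2c were obtained with the Nei–Gojobori method [25]. The sketch below shows that counting approach for a pair of aligned, in-frame, upper-case coding sequences without gaps or stop codons. It then averages ω, applying the dS > 1.0 cut-off stated in the Figure 2 legend. Several details are assumptions of this sketch, not descriptions of the authors' pipeline:

- mutations to stop codons are ignored when counting sites;
- pathways through stop codons are discarded;
- the Jukes–Cantor correction is applied to both pN and pS;
- pairs with dS = 0 are skipped;
- ω is averaged as a mean of per-pair ratios.

All function names are hypothetical.

```python
import math
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; "*" marks stop codons.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Expected number of synonymous sites in a sense codon (0 to 3)."""
    s = 0.0
    for pos in range(3):
        syn = valid = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == "*":
                continue  # assumption: changes to stop codons are not counted
            valid += 1
            syn += CODE[mutant] == CODE[codon]
        if valid:
            s += syn / valid
    return s


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    sd = nd = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, s, n = c1, 0, 0
        for i in order:
            nxt = current[:i] + c2[i] + current[i + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        else:  # runs only if the pathway completed without a stop codon
            n_paths += 1
            sd += s
            nd += n
    if n_paths == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return sd / n_paths, nd / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differences."""
    if p >= 0.75:
        raise ValueError("p too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (dN, dS) for two aligned, gap-free coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = Sd = Nd = 0.0
    for k in range(0, len(seq1), 3):
        c1, c2 = seq1[k:k + 3], seq2[k:k + 3]
        S += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s, n = codon_differences(c1, c2)
        Sd += s
        Nd += n
    N = len(seq1) - S  # every codon contributes three sites in total
    if S == 0 or N == 0:
        raise ValueError("no synonymous or no nonsynonymous sites")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


def mean_omega(pairs, max_ds=1.0):
    """Mean dN/dS over (dN, dS) pairs, skipping dS == 0 and saturated dS > max_ds."""
    ratios = [dn / ds for dn, ds in pairs if 0 < ds <= max_ds]
    return sum(ratios) / len(ratios)
```

Under these assumptions, the ratio in Figure 2c would be `mean_omega(paralog_pairs) / mean_omega(ortholog_pairs)`. Here each argument is a list of `(dN, dS)` tuples returned by `nei_gojobori` for paralog pairs and for single-copy ortholog pairs, respectively.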
Another implication could be that the contribution of a gene to adaptation through duplication depends on its potential to improve the system. There can therefore be substantial variation even among genes in the same functional category. Whereas previous studies have suggested that the extent of gene duplication is determined by functional category, haplosufficiency or the level of protein complexity [16–18], our results point to a more complex picture of adaptive evolution through gene duplication. Interestingly, we further found that the contrast between boundary- and middle-wave genes in copy number and amino acid divergence is larger in species with bright and complex body coloration (see supplementary material online). This could be explained if visual adaptation is of special importance in those species (perhaps owing to sexual selection and camouflage). The substantial contribution of gene duplication to adaptive phenotypic evolution shown here emphasizes the role of gene duplication in various important evolutionary events, such as speciation.
Acknowledgements
We thank M. Kinoshita for extensive discussion in the early stages of this work. We also thank F. Kondrashov, S. Yokoyama, K. Wolfe and three anonymous reviewers for valuable comments on an earlier version of the manuscript. This work is supported in part by grants from the Japan Society for the Promotion of Science (JSPS), NSF (USA), and the Graduate University for Advanced Studies to H.I. J.G. is also supported by a start-up grant from JSPS.
Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.tibs.2009.01.009.
References
1 Ohno, S. (1970) Evolution by Gene Duplication. Springer-Verlag
2 Li, W.H. (1997) Molecular Evolution. Sinauer
3 Walsh, B. (2003) Population-genetic models of the fates of duplicate genes. Genetica 118, 279–294
4 Shiu, S.H. et al. (2006) Role of positive selection in the retention of duplicate genes in mammalian genomes. Proc. Natl. Acad. Sci. U. S. A. 103, 2232–2236
5 Yokoyama, S. (2000) Molecular evolution of vertebrate visual pigments. Prog. Retin. Eye Res. 19, 385–419
6 Bowmaker, J.K. (2008) Evolution of vertebrate visual pigments. Vision Res. 48, 2022–2041
7 Force, A. et al. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151, 1531–1545
8 Levine, J.S. and MacNichol, E.F., Jr (1982) Color vision in fishes. Sci. Am. 246, 140–149
9 Seehausen, O. et al. (2008) Speciation through sensory drive in cichlid fish. Nature 455, 620–626
10 Lamb, T.D. et al. (2007) Evolution of the vertebrate eye: opsins, photoreceptors, retina and eye cup. Nat. Rev. Neurosci. 8, 960–976
11 Collin, S.P. et al. (2003) Ancient colour vision: multiple opsin genes in the ancestral vertebrates. Curr. Biol. 13, R864–R865
12 Braasch, I. et al. (2007) Evolution of pigment synthesis pathways by gene and genome duplication in fish. BMC Evol. Biol. 7, 74–92
13 Davies, W.L. et al. (2007) Functional characterization, tuning, and regulation of visual pigment gene expression in an anadromous lamprey. FASEB J. 21, 2713–2724
14 Arikawa, K. (1999) Color vision. In Atlas of Arthropod Sensory Receptors (Eguchi, E. and Tominaga, Y., eds), pp. 23–32, Springer-Verlag
15 Minamoto, T. and Shimizu, I. (2005) Molecular cloning of cone opsin genes and their expression in the retina of a smelt, Ayu (Plecoglossus altivelis, Teleostei). Comp. Biochem. Physiol. B Biochem. Mol. Biol. 140, 197–205
16 Yang, J. et al. (2003) Organismal complexity, protein complexity, and gene duplicability. Proc. Natl. Acad. Sci. U. S. A. 100, 15661–15665
17 Kondrashov, F.A. and Koonin, E.V. (2004) A common framework for understanding the origin of genetic dominance and evolutionary fates of gene duplications. Trends Genet. 20, 287–290
18 Qian, W. and Zhang, J. (2008) Gene dosage and gene duplicability. Genetics 179, 2319–2324
19 Chinen, A. et al. (2003) Gene duplication and spectral diversification of cone visual pigments of zebrafish. Genetics 163, 663–675
20 Fuller, R.C. et al. (2005) Genetic and environmental variation in the visual properties of bluefin killifish, Lucania goodei. J. Evol. Biol. 18, 516–523
21 Parry, J.W. et al. (2005) Mix and match color vision: tuning spectral sensitivity by differential opsin gene expression in Lake Malawi cichlids. Curr. Biol. 15, 1734–1739
22 Matsumoto, Y. et al. (2006) Functional characterization of visual opsin repertoire in Medaka (Oryzias latipes). Gene 371, 268–278
23 Spady, T.C. et al. (2006) Evolution of the cichlid visual palette through ontogenetic subfunctionalization of the opsin gene arrays. Mol. Biol. Evol. 23, 1538–1547
24 Wheeler, D.L. et al. (2006) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 34, D173–D180
25 Nei, M. and Gojobori, T. (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426
0168-9525/$ – see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2009.03.008 Available online 11 April 2009