Extensive duplications of phototransduction genes in early vertebrate evolution correlate with block (chromosome) duplications

Genomics 83 (2004) 852 – 872 www.elsevier.com/locate/ygeno

Karin Nordström,1 Tomas A. Larsson, and Dan Larhammar*

Department of Neuroscience, Unit of Pharmacology, Uppsala University, Box 593, SE-751 24 Uppsala, Sweden

Received 8 May 2003; accepted 7 November 2003

Abstract

Many gene families in mammals have members that are expressed more or less uniquely in the retina or differentially in specific retinal cell types. We describe here analyses of nine such gene families with regard to phylogenetic relationships and chromosomal location. The families are opsins, G proteins (α, β, and γ subunits), phosphodiesterases type 6, cyclic nucleotide-gated channels, G-protein-coupled receptor kinases, arrestins, and recoverins. The results suggest that multiple new gene copies arose in all of these families very early in vertebrate evolution during a period with extensive gene duplications. Many of the new genes arose through duplications of large chromosome regions (blocks of genes) or even entire chromosomes, as shown by linkage with other gene families. Some of the phototransduction families belong to the same duplicated regions and were thus duplicated simultaneously. We conclude that gene duplications in early vertebrate evolution probably helped facilitate the specialization of the retina and the subspecialization of different retinal cell types. © 2004 Elsevier Inc. All rights reserved.

Keywords: Gene duplication; Paralogon; Tetraploidization; Gene phylogeny; Vertebrate; Eye; Retina; Phototransduction; Opsin

Eye evolution has challenged evolutionary biologists since the time of Darwin. Progress in molecular biology over the past several years has shown that morphologically distinct eyes share expression of related genes such as the eye specification factors sine oculis, dachshund, eyes absent, Pax2, and Pax6 as well as the photoreceptive opsins [43,45]. This can be interpreted as eyes being derived from a single ancestral photoreceptive structure with a basic machinery of components. On the other hand, the transcription factors that trigger eye development have broad tissue distribution patterns and multiple functions, indicating that they may have been recruited on separate occasions to initiate the formation of eyes. Their original function was most likely in anterior patterning of the nervous system [27]. Multiple independent origins for eyes is supported not

* Corresponding author. Fax: +46-18-511540. E-mail address: [email protected] (D. Larhammar).
1 Present address: Department of Cell and Organism Biology, Unit of Zoology, Lund University, Helgonavägen 3, SE-223 62 Lund, Sweden.
0888-7543/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.ygeno.2003.11.008

only by morphological data, but also by theoretical calculations, which show that camera-type eyes may evolve as quickly as in a few hundred thousand generations [31]. Whichever scenario turns out to be correct, it is likely that gene duplication was an important mechanism to provide the raw material for the evolution of eye structures, as seems to be the case for the evolution of many other anatomical structures, including the vertebrate nervous system [28], jaws [11], and fins/limbs [1,41]. Numerous genes expressed in the vertebrate retina are duplicates of genes having similar functions in other tissues, and some duplicates are restricted to distinct retinal cell types. These include genes encoding proteins involved in phototransduction, such as opsins [4], transducin a subunits [12], and cyclic nucleotide-gated ion channels [20]. Therefore, we have investigated when these and other more or less specialized vertebrate phototransduction genes arose. The number of genes in the vertebrate genome expanded drastically in the early stages of vertebrate evolution [29,39], before the origin of jawed vertebrates (gnathostomes). Evidence has accumulated over several

K. Nordström et al. / Genomics 83 (2004) 852–872

Fig. 1. Rhodopsin evolution. (a) Phylogenetic tree calculated with the maximum likelihood method in Tree-Puzzle 5.0 [40], based on a ClustalW protein alignment. The numbers at the nodes show reliability values, i.e., the proportion of times the group is reconstructed during the puzzling steps [40]. The opsin genes found in the human genome are indicated to the right of the corresponding branch in the tree, with the abbreviation and chromosome location used by the Santa Cruz database. The scale bar shows number of substitutions per site. In (b) the chromosome map for the visual opsins is depicted. All genes are indicated as in the Santa Cruz database. Above each of the selected genes, the chromosome band is indicated. The genes are not necessarily depicted in the same order as they appear on the chromosome arms. The genes CLCN6 on 1p36.22 and CLCN5 on Xp11.22 have not been included as these probably arose long before the block duplications. In (c) the chromosomal locations of nonvisual opsin genes and a selection of flanking genes are depicted. In (b and c) the opsins are shaded.

years suggesting that a considerable fraction of these genes arose by duplications of blocks of genes or complete chromosomes or, in fact, the entire genome [19,38]. This conclusion is based on the findings of numerous similar-looking chromosome regions within individual vertebrate genomes, initially identified in the human genome [24]. Often four copies with extensive similarity in gene content can be identified [24], and such a quartet of related chromosome regions is called a paralogon [7]. To identify paralogons it is essential to analyze large chromosome regions, usually entire chromosome arms, due to the high frequency of intrachromosomal rearrangements that distort the exact gene order [2,16,33,36,37]. We describe here analyses of phylogenetic relationships and chromosomal locations for several gene families with one or more members expressed in the retina and conclude that many of these arose through block, chromosome, or genome duplications during early vertebrate evolution.
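The idea of a paralogon can be illustrated with a small tally: chromosome arms that share members of several gene families are candidates for belonging to the same set of duplicated regions. The following minimal Python sketch is illustrative only and is not the procedure used in this study. The function name `shared_families` is hypothetical, and the input is restricted to a few family locations stated in the Results (visual opsins on 3q, 7q, and Xq; the GNAT and GNAI pairs on 1p, 3p, and 7q; GNB1–4 on 1p, 3q, 7q, and 12p).

```python
from itertools import combinations

# Chromosome arms carrying members of each family, as stated in the Results.
FAMILY_ARMS = {
    "visual opsins": {"3q", "7q", "Xq"},
    "GNAT": {"1p", "3p", "7q"},
    "GNAI": {"1p", "3p", "7q"},
    "GNB1-4": {"1p", "3q", "7q", "12p"},
}


def shared_families(family_arms):
    """Count, for every pair of arms, the families with members on both arms."""
    counts = {}
    for arms in family_arms.values():
        for pair in combinations(sorted(arms), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


pair_counts = shared_families(FAMILY_ARMS)
# The arm pair supported by the largest number of shared families.
best_pair = max(pair_counts, key=pair_counts.get)
```

A real analysis would also have to allow for intrachromosomal rearrangements, translocations, and gene losses, which is why whole chromosome arms rather than exact gene order are compared.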

Results

The gene families are described in the order they appear in the phototransduction pathway. Only a few of the adjacent gene families are shown for identification of a paralogon. For further information, several extensive listings of paralogons have been published [7,23,25,34,35,48]. In most cases, the displayed gene families consist of a small number of members that seem to have diverged during early vertebrate evolution.

Opsins

Opsins are the key photoreceptive components. Phylogenetic studies have shown that all eukaryotic opsins are more similar among themselves than to other G-protein-coupled receptors [3], suggesting a single origin for opsins (Fig. 1a). No fewer than eight opsins have been identified in the human genome. One of these is of recent primate origin (and allowed evolution of distinct red and green opsins), but data from other vertebrate species suggest that the ancestral gnathostome probably possessed a similar number of opsins. Even the cephalochordate amphioxus has as many as six opsins, four of which may reflect the repertoire of the ancestor of cephalochordates and craniates [21], an estimate supported by the finding of three opsins in the Ciona intestinalis genome [10]. The mammalian opsins are almost exclusively expressed in distinct retinal cell types.

The human opsins RHO, OPN1SW (blue), OPN1LW (red), and OPN1MW (green) are more closely related to each other than to the remaining opsins (Fig. 1a) and share exon–intron organization, with four introns in the coding region [15] (red and green have one additional intron). Four additional vertebrate opsin genes share three of the four introns, namely encephalopsin (OPN3), pinopsin, parapinopsin, and vertebrate ancient opsin (VA), but only the first of these has so far been found in mammals. The remaining three human opsins, peropsin (RRH), melanopsin (OPN4), and RGR-opsin, have an exon–intron organization different from those mentioned, suggesting early divergence [4,15]. RRH and RGR-opsin share two intron positions with each other, indicating a closer relationship between them. This is supported by the phylogenetic analysis (Fig. 1a) as well as their function as retinal isomerases [6,21]. It should be noted that the human nonvisual opsins seem to have a higher evolutionary rate than the visual opsins (Fig. 1a), which may complicate tree calculation.

RHO and OPN1SW are located on human chromosome (Hsa) 3q and 7q, respectively, together with members of gene families such as CLCN, PLOD, and STAG, suggesting that they belong to a block of genes that was duplicated as a unit (Fig. 1b). This paralogon also involves a region on Xq with the OPN1LW–OPN1MW genes. A fourth similar chromosome region is present on 1p, but no opsin gene has yet been found on this chromosome arm. The opsin genes RRH and RGR on 4q and 10q, respectively, seem to belong to an extensive paralogon (Fig. 1c) involving a very large number of gene families [7,48]. This paralogon also includes regions on 5q and 8p [44], where the latter is complemented by a cluster on chromosome 2p (see Fig. 7c below). OPN4 on 10q has a distinct exon–intron organization, suggesting that the gene may have arisen before the block duplications, in which case its assumed duplicate on 4q has been lost.

The remaining human opsin gene, OPN3, is located on 1q43 and is not obviously part of any known paralogon. It differs extensively in primary sequence and exon–intron organization from the other opsin genes, suggesting early divergence [15].


Transducin α subunits

Transducins are photoreceptor cell G proteins. Each G protein consists of three unrelated subunits named α, β, and γ. In mammals 16 α subunits have been identified, which can be classified into subfamilies based on sequence similarity (Fig. 2a) and functional coupling [12]. The transducin α subfamily consists of two subtypes expressed in the retina where they stimulate the activity of phosphodiesterase, with GNAT1 being expressed in rods and GNAT2 in cones [12]. The third member, GNAT3 (gustducin), is expressed in taste cells. Gustducin and the two transducins are almost equally closely related to each other (Fig. 2a). Each of these three genes is located close to a Giα subunit gene (GNAI), the products of which couple by inhibiting adenylyl cyclase. These genes are also equally distantly related to each other (Fig. 2a). Thus, it seems likely that an ancestral pair of one GNAT gene and one GNAI gene was duplicated twice to give the three pairs (Fig. 2b) on Hsa1p, 3p, and 7q. These chromosome arms, together with Hsa12p (not shown), form a paralogon with numerous gene families, mentioned above for opsins [35] and below for GNB1–4 (Fig. 3b). The GNAT and GNAI genes have the same exon–intron organization [12].

Additional Gα genes seem to have been duplicated as parts of larger chromosome regions. The genes GNA11 and GNA15 are located close to each other on 19p and their most closely related genes GNAQ and GNA14 form a pair on 9q (Fig. 2c). The pair probably arose by a local duplication as all four genes have the same exon–intron organization with one intron fewer than GNAT and GNAI (not shown). The local duplication was presumably followed by duplication of a large chromosome region as many additional gene families are represented on these two chromosomes (Fig. 2c). These chromosome regions belong to an extensive paralogon with gene family members also on 1q and 15q (supplemented by Hsa6, probably due to breakage and translocation) [35]. No Gα gene has been found on the latter two chromosome arms. The phylogenetic analysis in Fig. 2a suggests serial duplications of the four genes, with GNA15 evolving faster than the others, explaining why its exact branch point is uncertain.

The two closely related genes GNAS and GNAL (olfactory) are located on 20q and 18p, respectively, and share neighbors from a number of gene families (Fig. 2d), forming a paralogon with 8q and 16q. Likewise, the close relatives GNA12 and GNA13 on 7p and 17q, respectively, have many neighboring gene families in common in the extended Hox cluster, a paralogon that also includes Hsa2q and 12q (Fig. 2e).

The remaining two Gα genes, finally, GNAO on 16q and GNAZ on 22q, lack obvious similarity in their flanking chromosome regions and actually seem to have arisen prior to the protostome–deuterostome split (Fig. 2a).

Transducin β subunits

The G-protein β-subunit gene family consists of seven human members [12]. Members GNB1–4 are approximately equally closely related to each other (Fig. 3a), while the other three are considerably more distant (not shown) and will not be discussed further. GNB1–4 show broad tissue distribution with differential expression in retinal cell types. Rods and amacrine cells express GNB1, while cones and bipolar cells express GNB3 [32]. GNB1–4 are located on chromosomes 1p, 3q, 7q, and 12p and are flanked by many other gene families in the paralogon described above for opsins (Fig. 1b) and Gα (Fig. 2b). The phylogenetic tree (Fig. 3a) suggests that GNB3 may have diverged from the ancestor of the remaining three genes prior to the


Fig. 2. G-protein α-subunit evolution. (a) Phylogenetic tree with the genes found in the human genome indicated to the right of the corresponding branch in the tree. GNAT1 is expressed in rods and GNAT2 in cones. GNAT3 from the Celera database is identical to GDCA, gustducin, in the OMIM database. In (b–e) the chromosome maps for the G-protein α subunits are depicted as explained in the text.

protostome–deuterostome split. However, GNB3 has a higher evolutionary rate than the other three proteins (as shown by branch lengths); thus the tree may not be a true reflection of the duplication events. The phylogenies of the other gene families on Hsa12p support block duplication in early vertebrate evolution.


Fig. 3. G-protein β-subunit evolution. (a) Phylogenetic tree with the genes found in the human genome indicated to the right of the corresponding branch in the tree. The human genes GNB1L and GNB2L1 were used as outgroup to root the tree. GNB1 is expressed in rods and GNB3 in cones. In (b) a chromosome map for G-protein β subunits is depicted as explained in the text.

Transducin γ subunits

The G-protein γ-subunit family contains 12 mammalian members (Fig. 4a) [12]. A few are ubiquitously expressed and many are present in the brain. Rods and cones express distinct γ subunits: GNGT1 and GNGT2, respectively. However, GNGT1 is more closely related to GNG11 than to GNGT2. GNGT1 and GNG11 are adjacently located on Hsa7q, suggesting a recent local duplication. As neither has been identified outside mammals, it is too early to say when this duplication took place. GNGT2 on Hsa17q is located in a region paralogous to 7q/7p (Fig. 4b). These chromosome regions, together with the γ-gene-lacking Hsa2q and 12q, belong to the well-characterized Hox-cluster paralogon.

The G-protein γ family contains additional subfamilies located in paralogons. GNG2, 3, 4, and 8 show phylogenetic relationships consistent with origin in early vertebrate evolution (Fig. 4a) and are located on Hsa1q, 19q, 14q, and 11q, in a paralogon shown in Fig. 4c. GNG7 and GNG12 are most closely related to each other and may also have an early vertebrate origin (Fig. 4a), which is supported by their locations on Hsa19p and 1p in a paralogon that also includes 6p and chromosome 9 (Fig. 4d). GNG5 and GNG10 are also most closely related to each other and are located on 1p and 9q, the same paralogon as GNG7 and GNG12 (Fig. 4d). It is possible that an ancestral GNG pair was duplicated, followed by the loss of one gene from Hsa9 and one from Hsa19p. GNG13, finally, is located on 16p, which bears no obvious similarity to any other chromosome region.
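The bookkeeping behind "duplicated, followed by loss" can be made explicit with a small sketch. It assumes, purely for illustration, that every region of the paralogon named above (1p, 19p, 6p, and chromosome 9) once carried a member of both γ-subunit lineages; under that assumption, a region without a known member is a candidate gene loss. The function name `missing_copies` is hypothetical, and a missing entry may equally reflect a gene that has not yet been found.

```python
# Regions of the paralogon and known gene locations, as stated in the text.
REGIONS = ["1p", "19p", "6p", "9"]
OBSERVED = {
    "GNG7/GNG12 lineage": {"19p": "GNG7", "1p": "GNG12"},
    "GNG5/GNG10 lineage": {"1p": "GNG5", "9": "GNG10"},
}


def missing_copies(observed, regions):
    """For each lineage, list the regions with no known member (candidate losses)."""
    return {
        lineage: [region for region in regions if region not in members]
        for lineage, members in observed.items()
    }


losses = missing_copies(OBSERVED, REGIONS)
```

In this tally Hsa19p lacks a GNG5/GNG10-type gene and Hsa9 lacks a GNG7/GNG12-type gene, matching the two losses proposed in the text, while 6p carries neither lineage.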

Fig. 4. G-protein γ-subunit evolution. (a) Phylogenetic tree with the genes found in the human genome indicated to the right of the corresponding branch in the tree. GNGT1 is expressed in rods and GNGT2 in cones. In (b–d) the chromosome maps for the G-protein γ subunits are depicted as explained in the text.


Phosphodiesterases

Phosphodiesterases (PDE) hydrolyze cGMP and cAMP to GMP and AMP, respectively. They form a large gene family with over 20 human members, three of which are expressed in the retina: PDE6A and PDE6B are expressed in rods, while PDE6C is expressed in cones. These three genes are more closely related to each other than to the remaining phosphodiesterases, and available frog and fish sequences support early vertebrate origin (Fig. 5a). PDE6A, B, and C are located on 4p, 5q, and 10q (Fig. 5b), the same paralogon as described above for opsins (Fig. 1c), strongly suggesting that they arose by block duplications [7,48].

Cyclic nucleotide-gated channels (CNG)

The cyclic nucleotide-gated channel (CNG) gene family has six human members, classified as four α subunits and

Fig. 5. Phosphodiesterase evolution. (a) Phylogenetic tree with the human PDE6 genes indicated to the right of the corresponding branch in the tree. The human PDE5A gene was used as outgroup to root the tree. PDE6A and PDE6B are expressed in rods and PDE6C in cones. In (b) a chromosome map for PDE6 is depicted as explained in the text.


two β subunits [20]. Rods express A1 and B1, while cones express A3 and B3. CNGA2 and A4 are expressed in olfactory sensory neurons and in vomeronasal (A4) and taste receptor cells. Other cell types also express CNG subunits. The phylogenetic tree suggests that the duplication giving rise to α and β subunits took place before the protostome–deuterostome split, as both are also found in Drosophila melanogaster and C. intestinalis (Fig. 6a). The duplications leading to the four CNGA genes probably took place in early vertebrate evolution as all four are found in fish and mammals, with no invertebrate sequences interspersed (Fig. 6a). This is supported by the exon–intron organization [20]. The two CNGB genes are also present in both fish and mammals and have a similar exon–intron structure (distinct from CNGA) and thus were probably also duplicated at the same time.

The four CNGA genes are located in chromosomal contexts consistent with block duplications (Fig. 6b). However, this paralogon is not very extensive and it is unclear how large it might be. The CNGB genes on 8q and 16q, on the other hand, are clearly part of a duplicated block (Fig. 6c), with some genes represented on 18p and 20q [35].

Rhodopsin kinases

The human genome contains seven known G-protein-coupled receptor kinases (GPRK). The phylogenetic analysis in Fig. 7a identifies two major clades, one consisting of ADRBK1 and ADRBK2 (β-adrenergic receptor kinases) along with invertebrate GPRKs, while the remaining five GPRKs cluster with a different subset of invertebrate

Fig. 6. Cyclic nucleotide-gated channel evolution. (a) Phylogenetic tree with the genes found in the human genome indicated to the right of the corresponding branch in the tree. CNGA1 and CNGB1 are expressed in rods and CNGA3 and CNGB3 in cones. HCN1 does not have a name in the Santa Cruz database. In (b) a chromosome map for the CNGA subunit is depicted as explained in the text, and in (c) a chromosome map for the CNGB subunit is depicted.


kinases. Thus, two genes probably existed before the protostome–deuterostome split. RHOK and GPRK7, both of which are expressed in the retina, are located on chromosomes 13q and 3q, respectively, and are flanked by members of several other gene families, some of which also have copies on Xq, suggesting that a block of genes was duplicated (Fig. 7b). Likewise, the genes GPRK2L, GPRK5, and GPRK6 on chromosomes 4p, 5q, and 10q are flanked by several gene families (Fig. 7c), suggesting block duplication also involving opsins (Fig. 1c), phosphodiesterases (Fig. 5b), and recoverin family members (Fig. 9c, below). The phylogenetic tree suggests that the block harboring the RHOK–GPRK7 ancestor was duplicated earlier than the block with the GPRK2L–GPRK5–GPRK6 ancestor, as RHOK and GPRK7 diverge from each other before the D. melanogaster kinase branches off from the GPRK2L–GPRK5–GPRK6 clade. However, the rapid evolutionary rates of RHOK and GPRK7, as well as GPRK2L, in comparison with GPRK5 and GPRK6, complicate time point estimates based on sequence divergence.

Arrestin

The two arrestin genes SAG (S antigen) and ARR3 are expressed in rods and cones, respectively. ARRB1 and ARRB2 are expressed in nonvisual cell types [13]. The phylogenetic tree (Fig. 8a) reveals quite uneven evolutionary rates, with the two retina arrestins evolving most rapidly, and is not inconsistent with quadruplication early in vertebrate evolution [8]. The chromosomal organization suggests that a block of gene families including arrestin was duplicated (Fig. 8b), forming a paralogon consisting of 11q, 17p, and Xq [35]. The region on 2q may be a translocation from the fourth member of a presumed quartet (3q) or an

Fig. 7. G-protein-coupled receptor kinase evolution. (a) Phylogenetic tree with the genes found in the human genome indicated to the right of the corresponding branch in the tree, with GPRK7 and RHOK being involved in phototransduction. In (b) the chromosome map for the visual GPRKs is depicted. RHOK is not mapped in the Santa Cruz database—the chromosomal location was retrieved from Celera. RASA3 is named GAP1IP4BP in the Santa Cruz database. In (c) the chromosomal locations of nonvisual GPRKs and a selection of flanking genes are depicted.


Fig. 8. Arrestin evolution. (a) Phylogenetic tree with the genes found in the human genome indicated to the right of the corresponding branch in the tree. SAG is expressed in rods and ARR3 in cones. ARRB1 and ARRB2 are used for nonvisual cell regulation. In (b) a chromosome map for the arrestin genes is depicted as explained in the text.

independent duplication of a small block from one of the other three chromosomes.

Recoverin

The recoverin family has at least 11 human members (Fig. 9a). This family is distantly related to other Ca2+-binding proteins such as calmodulin and GUCA1 (guanylyl cyclase-activating protein). The recoverin gene RCV1 is expressed exclusively in the retina [9], like GUCA1. Calmodulin is expressed in the retina as well as elsewhere. RCV1 is located on Hsa17p in a chromosomal region (17p13.1) that may belong to a paralogon consisting of 1p,


Fig. 9. Neuronal calcium sensor family (recoverin) evolution. (a) Phylogenetic tree with the genes found in the human genome indicated to the right of the corresponding branch in the tree. RCV1 is expressed in rods and cones. In (b and c) chromosome maps are depicted as explained in the text.


3q, 12p, and 17p/7q (Hallböök, Lundin, and Larhammar, unpublished), but no duplicates of RCV1 are known in these regions. HPCA on 1p is most closely related to HPCAL1 on 2p. HPCAL4 on 1p likewise is most closely related to VSNL1 on 2p, suggesting that an ancient gene pair was duplicated along with several other genes (Fig. 9b). Hsa8q may also be involved as it harbors NCALD and shares several other gene families with Hsa1p and 2p. The fourth member of this quartet appears to be 20q, although no recoverin/HPCA relative has been found on this chromosome. The taxonomic distribution and phylogenetic distances between the genes on the three chromosomes suggest that these block duplications took place early in vertebrate evolution.

Likewise, the four recoverin family members KCNIP1, KCNIP2, KCNIP4, and CSEN are more closely related to each other than to the other members of the family (Fig. 9a) and are located in the previously described paralogon involving 2q, 4p–4q, 5q, and 10q [35,48]. Fig. 9c shows some of the many gene families with members in these four chromosome regions. Again, the phylogenetic trees suggest that these block duplications took place early in vertebrate evolution. The three human calmodulin genes also seem to have arisen by duplication as part of a chromosome block (not shown). Thus, recoverin itself seems to be one of the few genes in its family that was not involved in early vertebrate block duplications.

Discussion

Proteins that belong to multimembered gene families carry out the phototransduction cascade in the mammalian retina. Therefore, it is of interest to see how and when these gene families expanded and whether the paralogs may shed light on how cell-specific gene expression has arisen. We specifically wished to investigate whether new genes arose in the block or chromosome duplications proposed to have taken place in early vertebrate evolution. Thus, we calculated phylogenetic trees for nine gene families to identify gene duplication events that seemed to coincide with the origin of vertebrates and retrieved information on human chromosomal locations for these genes. We found that all nine gene families indeed seem to have expanded during early vertebrate evolution. Furthermore, all nine families were found to have members located in gene clusters that appear to have duplicated as blocks, resulting in three or four similar-looking clusters, so-called paralogons. Moreover, some of the families even consist of subfamilies of genes that seem to have expanded as parts of separate paralogons. For instance, G-protein α subunits form four subfamilies, each of which belongs to a separate paralogon (Figs. 2b–2e), and the G-protein γ subunits consist of three subfamilies belonging to distinct paralogons (Figs. 4b–4d). Some of the gene families or subfamilies were found to belong to the same paralogons, as shown in Figs. 10a–10e. The most extensive of these is the paralogon encompassing

2q, 4p/q, 5q, and 10q (Fig. 10a), which includes the opsin isomerases RRH and RGR, the three PDE6 genes, three receptor kinases (GPRK), and four recoverin relatives. The 1p, 3p/q, 7q, and Xq (12p) paralogon (Fig. 10b) likewise includes four phototransduction gene families, namely visual opsins, transducin α subunits and Giα subunits, and G-protein β subunits. Further, two of the G-protein α-subunit families are colocated with γ subunits (Figs. 10c and 10d), suggesting that there once was an ancestral α–γ gene pair that was duplicated before each underwent block duplications in the vertebrate lineage. One of the ancestral GNA genes, finally, seems to have been located in the same chromosome segment as a CNGB gene (Fig. 10e), although none of the four present chromosomes in this paralogon harbors both.

Some chromosome blocks seem to have been rearranged so that genes have drifted apart along the chromosome. For instance, the opsin gene RHO on 3q22.1 is far away from FLNB and PLXNB1 on 3p, while their paralogs are close together in Xq28 and 7q32.1 (Fig. 1b). In other cases, translocations that break apart a block have taken place, clearly illustrated by the GPRK-containing paralogon in which one of the quartet members is split between 2p and 8p (Fig. 7c). An additional complicating factor is that many paralogs have been lost, and gene loss is known to be extensive after tetraploidizations [17,47]. The high frequency of intrachromosomal rearrangements together with translocations and gene losses often makes tracing of ancestral chromosome configurations difficult [23,47] and indicates that additional and more extensive paralogons may be identified as more information on gene locations is obtained from other vertebrates.

Given the extent of the block duplications, and their probable occurrence during a limited period of time in early vertebrate evolution, it is most parsimonious to assume that they took place synchronously as genome-wide tetraploidizations. However, it is impossible at this stage to rule out an extensive wave of independent block duplications rather than complete genome doublings, although independent duplications would be more likely to cause imbalance in biochemical pathways.

Gene duplications appear to have contributed to retina-specific expression of genes and to cell-type-specific expression within the retina. As described under Results, several paralogs appear to have more or less retina-specific expression. Some paralogs are exclusively expressed in individual retinal cell types, the prime examples being rhodopsin in rods, blue, red, and green opsins in cones, and melanopsin in a subset of ganglion cells. Other cases are the PDE6A and PDE6B genes in rods and the PDE6C gene in cones, GNAT1–GNB1–GNGT1 in rods and GNAT2–GNB3–GNGT2 in cones, and arrestins SAG and ARR3 in rods and cones, respectively. In addition, GNB1 is expressed in amacrine cells and GNB3 in bipolar cells [32]. Restricted expression within the retina does not rule out expression outside the retina; for instance, the CNG subunits are known to be expressed in many cell types in the brain [20].

Interestingly, the single known arrestin in the chordate C. intestinalis is expressed exclusively in the eye of the larva [30]. This suggests that the eye is the primordial site of expression for the ancestral chordate arrestin gene and that SAG and ARR3 became specialized in rods and cones and probably contributed to their diversification, while ARRB1 and ARRB2 may have acquired more widespread distribution independently in the vertebrate lineage. Invertebrates also use arrestins inside (Drosophila Phosrestin I and II) and outside (Kurtz) the visual system. Information from additional species is required before the original arrestin function can be deduced.

Fig. 10. Display of phototransduction gene families that seem to belong to the same paralogon (all other gene families excluded). Sometimes the genes are located far apart on the same chromosome, for instance RRH on 4q and PDE6B on 4p, but synteny is supported by close proximity of their relatives on 5q and by numerous other adjacent genes on chromosomes 4p/q and 5q. In (a) the most extensive paralogon is depicted. It contains nonvisual opsins on 4q and 10q; PDEs on 4p, 5q, and 10q; nonvisual GPRKs; and genes from the recoverin family (although not involved in visual transduction). The paralogon in (b) contains visual opsins, transducin α subunits involved in photoreception, inhibitory α subunits ubiquitously expressed, and transducin β subunits with GNB1 and 3 used for visual transduction. The paralogons in (c and d) contain G-protein α and γ subunits, of which GNGT1 and T2 are used in vision. Finally, the paralogon in (e) shows the last two G-protein α subunits and the CNGβ subunits used in rods and cones.
Interesting species differences have recently been reported for mammalian GPRK: both RHOK and GPRK7 are expressed in cones in human and rhesus monkey, whereas pig and dog cones express only GPRK7, and mouse and rat cones express only RHOK (called GRK1 in the cited reference) [46]. In the teleost fish medaka, cones are known to express GRK7, with no information yet on RHOK [18]. Other species differences include the expression of short-wavelength (blue) opsin and middle-wavelength (green) opsin in mammals: humans have distinct cone populations for the two opsins, whereas mice can coexpress the two in the same cells [26]. Thus, after duplication the daughter genes may evolve differently in different lineages.

Recoverin, with its retina-specific expression, seems to be one of the few genes in its family that did not arise or duplicate by chromosome doubling, unless the visinin gene is a paralog (Fig. 9a), but visinin has so far not been found in mammals, only in chicken and pufferfish. As both calmodulin and GUCA1 are expressed in retina cells, and GUCA1 exclusively so [5], it is likely that the ancestral member of the recoverin branch was retina-specific, too. The other two branches of the recoverin tree both seem to have been part of block duplications (Figs. 9b and 9c). However, there is still insufficient information regarding their distribution in the retina to speculate whether gene duplications have led to functional diversification. It is known that NCALD is expressed in the retina [22], whereas the members of the quartet KCNIP1, KCNIP2, KCNIP4, and CSEN are not.

Thus, a number of examples exist for functional diversification, or subfunctionalization, after gene duplication. This means that a mother gene had a broader functional repertoire and/or anatomical distribution than its daughters, which have subdivided the functions between them [14]. In the eye, it is clear that gene duplicates may evolve distinct regulatory mechanisms that can provide functional advantages for different aspects of vision. The evidence presented here strongly suggests that block, chromosome, or even genome duplications have contributed to the functional specialization of the vertebrate phototransduction pathways.

Methods

Phylogenetic analyses

Protein sequences were retrieved from GenBank (http://www.ncbi.nlm.nih.gov/Entrez/), JGI (http://genome.jgi-psf.org/ciona4/), and Ensembl (http://scrappy.fugu-sg.org/Fugu_rubripes/) and aligned with ClustalW 1.8 [42]. BLAST searches were performed to identify additional gene family members. Phylogenetic trees were calculated with the quartet maximum-likelihood method in Tree-Puzzle 5.0 [40]. The statistical support for the nodes is given by reliability values, i.e., the proportion of times the group is reconstructed during the puzzling steps [40]. Accession numbers and alignments are available from the authors upon request.

Chromosome locations

Genes were localized in the human genome using the University of California, Santa Cruz Web site (http://genome.ucsc.edu/), November 2002 assembly. For consistency, all genes are named according to the Santa Cruz database, except where indicated. The Celera database (http://www.celeradiscoverysystem.com) was used to confirm chromosome locations. Note that the genes shown in the figures are not necessarily depicted in the same order as they appear on the chromosome arms.
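To illustrate the reliability values described under Phylogenetic analyses, the following minimal Python sketch computes, for each group of taxa, the percentage of intermediate trees in which that group is reconstructed. It is not the Tree-Puzzle implementation. The function name reliability_values, the representation of each intermediate tree as a set of taxon groups, and the toy taxa A to E are assumptions introduced here for illustration only.

```python
from collections import Counter


def reliability_values(intermediate_trees):
    """Return {group: percentage of intermediate trees containing the group}.

    Each tree is assumed (hypothetically) to be given as an iterable of
    frozensets, one frozenset of taxon names per internal group.
    """
    if not intermediate_trees:
        raise ValueError("need at least one intermediate tree")
    counts = Counter()
    for tree in intermediate_trees:
        # Convert to a set so a group is counted at most once per tree.
        counts.update(set(tree))
    total = len(intermediate_trees)
    return {group: 100.0 * n / total for group, n in counts.items()}


# Toy input: four hypothetical puzzling steps over taxa A-E (not real data).
trees = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "C"}), frozenset({"A", "B", "C"})},
]
support = reliability_values(trees)
for group, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(group), f"{value:.0f}%")
```

In this toy example the group A+B+C is recovered in every step, whereas A+B is recovered in three of the four steps, so its reliability value is lower. Real analyses use many more puzzling steps than the four shown here.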

Acknowledgments

We thank Lars-Gustav Lundin for valuable discussions. K.N. was supported by Grant 243/2000 from the Swedish Research Council (VR) to Dan Nilsson and D.L., and D.L. received support from Grant B5107-20005950 from the Swedish Research Council (VR).


References

[1] D.G. Ahn, M.J. Kourakis, L.A. Rohde, L.M. Silver, R.K. Ho, T-box gene tbx5 is essential for formation of the pectoral limb bud, Nature 417 (2002) 754–758.
[2] S. Aparicio, Vertebrate evolution: recent perspectives from fish, Trends Genet. 16 (2000) 54–56.
[3] D. Arendt, J. Wittbrodt, Reconstructing the eyes of Urbilateria, Philos. Trans. R. Soc. London B Biol. Sci. 356 (2001) 1545–1563.
[4] J. Bellingham, R.G. Foster, Opsins and mammalian photoentrainment, Cell Tissue Res. 309 (2002) 57–71.
[5] R.D. Burgoyne, J.L. Weiss, The neuronal calcium sensor family of Ca2+-binding proteins, Biochem. J. 353 (2001) 1–12.
[6] C.K. Chen, K. Zhang, J. Church-Kopish, W. Huang, H. Zhang, Y.J. Chen, J.M. Frederick, W. Baehr, Characterization of human GRK7 as a potential cone opsin kinase, Mol. Vision 7 (2001) 305–313.
[7] F. Coulier, C. Popovici, R. Villet, D. Birnbaum, MetaHox gene clusters, J. Exp. Zool. 288 (2000) 345–351.
[8] C.M. Craft, D.H. Whitmore, The arrestin superfamily: cone arrestins are a fourth family, FEBS Lett. 362 (1995) 247–255.
[9] S. De Raad, M. Comte, P. Nef, S.E. Lenz, E.D. Gundelfinger, J.A. Cox, Distribution pattern of three neural calcium-binding proteins (NCS-1, VILIP and recoverin) in chicken, bovine and rat retina, Histochem. J. 27 (1995) 524–535.
[10] P. Dehal, Y. Satou, R.K. Campbell, J. Chapman, B. Degnan, A. De Tomaso, B. Davidson, A. Di Gregorio, M. Gelpke, D.M. Goodstein, N. Harafuji, K.E. Hastings, I. Ho, K. Hotta, W. Huang, T. Kawashima, P. Lemaire, D. Martinez, I.A. Meinertzhagen, S. Necula, M. Nonaka, N. Putnam, S. Rash, H. Saiga, M. Satake, A. Terry, L. Yamada, H.G. Wang, S. Awazu, K. Azumi, J. Boore, M. Branno, S. Chin-Bow, R. DeSantis, S. Doyle, P. Francino, D.N. Keys, S. Haga, H. Hayashi, K. Hino, K.S. Imai, K. Inaba, S. Kano, K. Kobayashi, M. Kobayashi, B.I. Lee, K.W. Makabe, C. Manohar, G. Matassi, M. Medina, Y. Mochizuki, S. Mount, T. Morishita, S. Miura, A. Nakayama, S. Nishizaka, H. Nomoto, F. Ohta, K. Oishi, I. Rigoutsos, M. Sano, A. Sasaki, Y. Sasakura, E. Shoguchi, T. Shin-i, A. Spagnuolo, D. Stainier, M.M. Suzuki, O. Tassy, N. Takatori, M. Tokuoka, K. Yagi, F. Yoshizaki, S. Wada, C. Zhang, P.D. Hyatt, F. Larimer, C. Detter, N. Doggett, T. Glavina, T. Hawkins, P. Richardson, S. Lucas, Y. Kohara, M. Levine, N. Satoh, D.S. Rokhsar, The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins, Science 298 (2002) 2157–2167.
[11] M.J. Depew, T. Lufkin, J.L. Rubenstein, Specification of jaw subdivisions by Dlx genes, Science 298 (2002) 381–385.
[12] G.B. Downes, N. Gautam, The G protein subunit gene families, Genomics 62 (1999) 544–552.
[13] S.S. Ferguson, J. Zhang, L.S. Barak, M.G. Caron, Molecular mechanisms of G protein-coupled receptor desensitization and resensitization, Life Sci. 62 (1998) 1561–1565.
[14] A. Force, M. Lynch, F.B. Pickett, A. Amores, Y.-l. Yan, J. Postlethwait, Preservation of duplicate genes by complementary, degenerative mutations, Genetics 151 (1998) 1531–1545.
[15] R.G. Foster, M.W. Hankins, Non-rod, non-cone photoreception in the vertebrates, Prog. Retin. Eye Res. 21 (2002) 507–527.
[16] M.A. Groenen, H.H. Cheng, N. Bumstead, B.F. Benkel, W.E. Briles, T. Burke, D.W. Burt, L.B. Crittenden, J. Dodgson, J. Hillel, S. Lamont, A.P. de Leon, M. Soller, H. Takahashi, A. Vignal, A consensus linkage map of the chicken genome, Genome Res. 10 (2000) 137–147.
[17] X. Gu, W. Huang, Testing the parsimony test of genome duplications: a counterexample, Genome Res. 12 (2002) 1–2.
[18] O. Hisatomi, S. Matsuda, T. Satoh, S. Kotaka, Y. Imanishi, F. Tokunaga, A novel subtype of G-protein-coupled receptor kinase, GRK7, in teleost cone photoreceptors, FEBS Lett. 424 (1998) 159–164.
[19] P.W. Holland, J. Garcia-Fernandez, N.A. Williams, A. Sidow, Gene duplications and the origins of vertebrate development, Development Suppl. (1994) 125–133.
[20] U.B. Kaupp, R. Seifert, Cyclic nucleotide-gated ion channels, Physiol. Rev. 82 (2002) 769–824.
[21] M. Koyanagi, A. Terakita, K. Kubokawa, Y. Shichida, Amphioxus homologs of Go-coupled rhodopsin and peropsin having 11-cis- and all-trans-retinals as their chromophores, FEBS Lett. 531 (2002) 525–528.
[22] V.D. Kumar, S. Vijay-Kumar, A. Krishnan, T. Duda, R.K. Sharma, A second calcium regulator of rod outer segment membrane guanylate cyclase, ROS-GC1: neurocalcin, Biochemistry 38 (1999) 12614–12620.
[23] D. Larhammar, L.G. Lundin, F. Hallbook, The human Hox-bearing chromosome regions did arise by block or chromosome (or even genome) duplications, Genome Res. 12 (2002) 1910–1920.
[24] L.G. Lundin, Evolution of the vertebrate genome as reflected in paralogous chromosomal regions in man and the house mouse, Genomics 16 (1993) 1–19.
[25] L.G. Lundin, D. Larhammar, F. Hallböök, Numerous groups of chromosomal regional paralogies strongly indicate two genome doublings at the root of the vertebrates, J. Struct. Funct. Genom. 3 (2003) 53–63.
[26] A.L. Lyubarsky, B. Falsini, M.E. Pennesi, P. Valentini, E.N. Pugh Jr., UV- and midwave-sensitive cone-driven retinal responses of the mouse: a possible phenotype for coexpression of cone photopigments, J. Neurosci. 19 (1999) 442–455.
[27] A. Mansouri, M. Hallonet, P. Gruss, Pax genes and their roles in cell differentiation and development, Curr. Opin. Cell Biol. 8 (1996) 851–857.
[28] M. Manzanares, H. Wada, N. Itasaki, P.A. Trainor, R. Krumlauf, P.W.H. Holland, Conservation and elaboration of Hox gene regulation during evolution of the vertebrate head, Nature 408 (2000) 854–857.
[29] T. Miyata, H. Suga, Divergence pattern of animal gene families and relationship with the Cambrian explosion, BioEssays 23 (2001) 1018–1027.
[30] M. Nakagawa, H. Orii, N. Yoshida, E. Jojima, T. Horie, R. Yoshida, T. Haga, M. Tsuda, Ascidian arrestin (Ci-arr), the origin of the visual and nonvisual arrestins of vertebrate, Eur. J. Biochem. 269 (2002) 5112–5118.
[31] D.-E. Nilsson, S. Pelger, A pessimistic estimate of the time required for an eye to evolve, Proc. R. Soc. London B 256 (1994) 53–58.
[32] Y.W. Peng, J.D. Robishaw, M.A. Levine, K.W. Yau, Retinal rods and cones have distinct G protein beta and gamma subunits, Proc. Natl. Acad. Sci. USA 89 (1992) 10882–10886.
[33] P. Pevzner, G. Tesler, Genome rearrangements in mammalian evolution: lessons from human and mouse genomes, Genome Res. 13 (2003) 37–45.
[34] C. Popovici, M. Leveugle, D. Birnbaum, F. Coulier, Coparalogy: physical and functional clusterings in the human genome, Biochem. Biophys. Res. Commun. 288 (2001) 362–370.
[35] C. Popovici, M. Leveugle, D. Birnbaum, F. Coulier, Homeobox gene clusters and the human paralogy map, FEBS Lett. 491 (2001) 237–242.
[36] J.H. Postlethwait, I.G. Woods, P. Ngo-Hazelett, Y.L. Yan, P.D. Kelly, F. Chu, H. Huang, A. Hill-Force, W.S. Talbot, Zebrafish comparative genomics and the origins of vertebrate chromosomes, Genome Res. 10 (2000) 1890–1902.
[37] S.W. Scherer, J. Cheung, J.R. MacDonald, L.R. Osborne, K. Nakabayashi, J.A. Herbrick, A.R. Carson, L. Parker-Katiraee, J. Skaug, R. Khaja, J. Zhang, A.K. Hudek, M. Li, M. Haddad, G.E. Duggan, B.A. Fernandez, E. Kanematsu, S. Gentles, C.C. Christopoulos, S. Choufani, D. Kwasnicka, X.H. Zheng, D. Nusskern, Q. Zhang, Z. Gu, F. Lu, S. Zeesman, I. Teshima, D. Chitayat, C. Shuman, R. Weksberg, E.H. Zackai, T.A. Grebe, S.R. Cox, S.J. Kirkpatrick, N. Rahman, J.M. Friedman, H.H. Heng, P.G. Pelicci, F. Lo-Coco, E. Belloni, L.G. Shaffer, B. Pober, C.C. Morton, J.F. Gusella, G.A. Bruns, B.R. Korf, B.J. Quade, A.H. Ligon, H. Ferguson, A.W. Higgins, N.T. Leach, S.R. Herrick, E. Lemyre, C.G. Farra, H.G. Kim, A.M. Summers, K.W. Gripp, W. Roberts, P. Szatmari, E.J. Winsor, K.H. Grzeschik, A. Teebi, B.A. Minassian, J. Kere, L. Armengol, M.A. Pujana, X. Estivill, M.D. Wilson, B.F. Koop, S. Tosi, G.E. Moore, A.P. Boright, E. Zlotorynski, B. Kerem, P.M. Kroisel, E. Petek, D.G. Oscier, S.J. Mould, H. Dohner, K. Dohner, J.M. Rommens, J.B. Vincent, J.C. Venter, P.W. Li, R.J. Mural, M.D. Adams, L.C. Tsui, Human chromosome 7: DNA sequence and biology, Science 10 (2003) 10.
[38] A. Sidow, Gen(om)e duplications in the evolution of early vertebrates, Curr. Opin. Genet. Dev. 6 (1996) 715–722.
[39] J. Spring, Major transitions in evolution by genome fusions: from prokaryotes to eukaryotes, metazoans, bilaterians and vertebrates, J. Struct. Funct. Genom. 3 (2003) 19–25.
[40] K. Strimmer, A. von Haeseler, Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies, Mol. Biol. Evol. 13 (1995) 964–969.
[41] M. Tanaka, A. Munsterberg, W.G. Anderson, A.R. Prescott, N. Hazon, C. Tickle, Fin development in a cartilaginous fish and the origin of vertebrate limbs, Nature 416 (2002) 527–531.
[42] J.D. Thompson, D.G. Higgins, T.J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignments through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res. 22 (1994) 4673–4680.
[43] J.E. Treisman, A conserved blueprint for the eye? BioEssays 21 (1999) 843–850.
[44] A. Vienne, J. Rasmussen, L. Abi-Rached, P. Pontarotti, A. Gilles, Systematic phylogenomic evidence of en bloc duplication of the ancestral 8p11.21–8p21.3-like region, Mol. Biol. Evol. 20 (2003) 1290–1298.
[45] S. Wawersik, R.L. Maas, Vertebrate eye development as modeled in Drosophila, Hum. Mol. Genet. 9 (2000) 917–925.
[46] E.R. Weiss, M.H. Ducceschi, T.J. Horner, A. Li, C.M. Craft, S. Osawa, Species-specific differences in expression of G-protein-coupled receptor kinase (GRK) 7 and GRK1 in mammalian cone photoreceptor cells: implications for cone cell phototransduction, J. Neurosci. 21 (2001) 9175–9184.
[47] K.H. Wolfe, Yesterday’s polyploids and the mystery of diploidization, Nat. Rev. Genet. 2 (2001) 333–341.
[48] A. Wraith, A. Tornsten, P. Chardon, I. Harbitz, B.P. Chowdhary, L. Andersson, L.G. Lundin, D. Larhammar, Evolution of the neuropeptide Y receptor family: gene and chromosome duplications deduced from the cloning and mapping of the five receptor subtype genes in pig, Genome Res. 10 (2000) 302–310.