Retroposons of salmonoid fishes (Actinopterygii: Salmonoidei) and their evolution



Gene 434 (2009) 16-28

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Vitaliy Matveev a,b, Norihiro Okada a,⁎

a Faculty of Bioscience and Biotechnology, Tokyo Institute of Technology, Yokohama, Japan
b Centre for Molecular Diagnostics, Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow, Russia

ARTICLE INFO

Article history: Received 30 March 2008; received in revised form 28 April 2008; accepted 29 April 2008; available online 22 May 2008. Received by J. Jurka.

Keywords: Salmon; SINE; LINE; Transposon; Lateral transfer; Schistosoma japonicum; RSg-1; SalL2; AvaIII; SmaI; FokI; HpaI; MT; MS; MP; SlmI and SlmII

ABSTRACT

Short and long retroposons, or non-LTR retrotransposons (SINEs and LINEs, respectively), are two groups of interspersed repetitive elements that amplify in the genome via RNA and cDNA intermediates by reverse transcription. In this process, SINEs depend entirely on the enzymatic machinery of autonomous LINEs. The impact of retroposons on the host genome is hard to overstate: their sequences account for a significant portion of the eukaryotic genome, while the propagation of their active copies gradually reshapes it. Retropositional activity is thus an important evolutionary factor. More than 100 LINE and nearly 100 SINE families have been described to date from the genomes of various eukaryotes, and salmonoid fishes (Actinopterygii: Salmonoidei) are particularly notable for the diversity of transposons they host in their genomes, including two LINE and seven SINE families. Moreover, this group of ray-finned fish offers an excellent opportunity to study lateral gene transfer, a rare evolutionary phenomenon, because of the great variety of transposons and other sequences that salmons share with a blood fluke, Schistosoma japonicum (Trematoda: Strigeiformes), a parasitic helminth infecting various vertebrates. The aim of the present review is to organize the knowledge accumulated so far about salmonoid retroposons and to complement it with new data on the distribution of some SINE families. © 2008 Published by Elsevier B.V.

Abbreviations: SINE, short interspersed element; LINE, long interspersed element; LTR, long terminal repeat; UTR, untranslated region; ORF, open reading frame; DeuSINEs, Deuterostomia SINEs; OS-SINE1, Oncorhynchus and Salmo SINE1; PCR, polymerase chain reaction.
⁎ Corresponding author. Laboratory of Molecular Evolution, Faculty of Bioscience and Biotechnology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Kanagawa, Japan. Tel.: +81 45 924 5742; fax: +81 45 924 5835. E-mail address: [email protected] (N. Okada).
0378-1119/$ – see front matter © 2008 Published by Elsevier B.V. doi:10.1016/j.gene.2008.04.022

1. Introduction

1.1. Structure of LINEs and SINEs

A significant portion of eukaryotic genomes – up to 45% in human and mouse (Lander et al., 2001; Waterston et al., 2002) – is occupied by various repetitive transposable elements, which are able to multiply and insert their own copies into new loci. The most abundant of them – long (LINEs) and short (SINEs) interspersed repeats, or non-LTR retrotransposons – belong to class I transposable elements (retroelements, or retroposons), and in human genomic DNA account for 20 and 13% of all sequences, respectively (Lander et al., 2001), being repeated hundreds to millions of times. As the two most important groups of retroelements, SINEs and LINEs are sometimes referred to simply as short and long retroposons, respectively, the two other classes being LTR retrotransposons and processed pseudogenes. The common feature of retroelements is their propagation in the genome via RNA and cDNA intermediates by means of reverse transcription (Singer, 1982; Rogers, 1985; Weiner et al., 1986; Brosius, 1991; Kramerov and Vassetzky, 2005; Ohshima and Okada, 2005).

Long interspersed elements contain 5'- and 3'-untranslated regions (UTRs) and one or, typically, two open reading frames (ORFs) preceded by a promoter for RNA polymerase II (Pol II), and usually end with a poly(A) tail. All LINEs encode a reverse transcriptase, or RTase (Xiong and Eickbush, 1990), which carries out their reverse transcription. Typically, their ORFs also encode an apurinic/apyrimidinic endonuclease, a ribonuclease H, and/or putative nucleic-acid-binding motifs (Feng et al., 1996; Malik et al., 1999), all involved in retrotransposition. Depending on their structure, LINEs vary greatly in length, between ca. 4 and 7 kb. However, because reverse transcription frequently terminates before reaching the 5'-end of the template sequence, most copies in the genome are truncated at their 5'-terminus and may be as short as 100 bp (Hutchison et al., 1989; Burch et al., 1993; Eickbush, 1994; Kajikawa et al., 1997).

Short retroposons belonging to different families may also vary in size (between ca. 70 and 500 bp) and structure, yet share important common features. Unlike LINEs, which are transcribed by Pol II, all SINEs possess an internal promoter for Pol III in their 5'-part, or head (Okada, 1991a,b; Okada and Ohshima, 1995), in most cases derived from transfer RNA genes (Sakamoto and Okada, 1985; for reviews see Kramerov and Vassetzky, 2005; Ohshima and Okada, 2005). A few exceptions include the rodent B1 family (Kramerov et al., 1979; Jelinek et al., 1980; Krayev et al., 1980, 1982; Haynes et al., 1981; Haynes and Jelinek, 1981) and primate Alu (Houck et al., 1979; Rubin et al., 1980; Deininger et al., 1981; Ullu and Tschudi, 1984), which contain a 7SL RNA-related promoter region. Apart from rodents and primates, the only other group of organisms shown to have a 7SL RNA-related promoter in its SINEs is tree shrews (order Scandentia), with their dimeric Tu type I and trimeric Tu type II families, the result of fusions between tRNA- and 7SL RNA-related subunits (Nishihara et al., 2002). All other examples of non-tRNA-derived short retroposons belong to the superfamily of DeuSINEs hosted by a broad range of organisms. Several of their families are characterized by the presence of fused 5S rRNA and partial tRNA-related regions (Nishihara et al., 2006; also see Kapitonov and Jurka, 2003). Finally, a few other examples, such as SINE200 from mosquito (Anopheles gambiae), show little or no resemblance to the tRNA structure in their promoter region, though they still contain the A- and B-boxes of the type 2 Pol III promoter (Dr. N. Vassetzky, personal communication).
The 5'-terminus, or head, is followed in most SINEs by the body, which typically comprises the LINE-related segment, or ‘LINE-related tail’, at its 3'-end. Finally, the whole sequence ends with a variable tail. However, this pattern is reduced to the head immediately followed by the poly(A) tail in a group of so-called simple SINEs (Borodulina and Kramerov, 2005; also see Piskurek et al., 2003 and Piskurek and Okada, 2006 for the designation of t-SINEs). The variable 3'-tail may contain poly(T) runs, which act as a termination signal for RNA Pol III (Haynes and Jelinek, 1981; Borodulina and Kramerov, 2001), or tandem repeats. The length and/or structure of this part of a SINE can also be important for successful delivery of its RNA to the LINE reverse transcriptase (RTase) complex (Kajikawa and Okada, 2002; Dewannieux et al., 2003). Mutational analyses of some LINEs of the L2 clade have revealed that more than one copy of the tandem repeat is necessary for successful retrotransposition. Kajikawa and Okada (2002) suggest that a slippage reaction occurs during the initiation of reverse transcription of UnaL2 LINE RNA and that its tandem repeats are required for this reaction.

1.2. Interactions between SINEs and LINEs

Because SINEs do not encode any proteins themselves, they depend completely on the enzymatic machinery of autonomous LINEs for retrotransposition. Recognition of SINE transcripts by LINE proteins is necessary for their reverse transcription and reintegration into new genomic loci, and is ensured either by a common 3'-tail shared by the SINE and its partner LINE of the stringent type, or by the presence of the poly(A) tail. LINEs in the latter case belong to the relaxed type of long interspersed elements (Okada et al., 1997; Jurka, 1997; Kajikawa and Okada, 2002; Dewannieux et al., 2003). Until recently it had been thought that this common conserved 3'-domain is acquired by SINEs directly from long retroposons. However, we have revealed the possibility of its SINE-to-SINE transmission, which apparently means that different retroposons are able to exchange parts by means of some general genomic mechanism, such as gene conversion and/or recombination (Matveev et al., 2007). This conclusion is further supported by multiple examples of different SINE families sharing one or two domains.

1.3. Diversity of LINEs and SINEs

To date, nearly 100 SINE and more than 100 LINE families have been described from the genomes of various eukaryotes (Kramerov and Vassetzky, 2005; Ohshima and Okada, 2005; Piskurek et al., 2006; Nishihara et al., 2006; Matveev et al., 2007). Many of them are further subdivided into subfamilies, based on variation of their sequences among different lineages, the result of the amplification pattern explained in the multiple source gene model (Shedlock and Okada, 2000). Interestingly, different subfamilies often demonstrate different relative frequencies of retropositional activity.

The latest phylogenetic analysis of virtually all LINE families known to date, based on the comparison of the amino acid sequences of their ORF2 proteins (Ohshima and Okada, 2005), has revealed the presence of 14 major superfamilial clades (also see Malik et al., 1999; Permanyer et al., 2000). Assuming vertical transfer of LINEs in eukaryotic genomes, with only Bov-B and RSg-1 LINEs providing a few exceptions (Kordiš and Gubenšek, 1997, 1998; Melamed et al., 2004; Matveev et al., 2007; Gogolevsky et al., 2008), these phylogenies unveil the ancient origin of these retroposons, which appear to be as old as the eukaryotes themselves (Malik et al., 1999).

As for SINEs, most of them are restricted in their distribution to more or less uniform groups of organisms not higher in rank than order. At the same time, three superfamilies of ancient SINEs are present in the genomes of a wide range of animals. One of them, CORE-SINEs, is known in chordates and in molluscs of the class Cephalopoda. Members of this superfamily share a conserved core-domain (up to ca. 65 bp long) following the tRNA-related region (Gilbert and Labuda, 1999, 2000; also see preceding publications by Jurka et al., 1995; Smit and Riggs, 1995). The second superfamily is widespread among several classes of fish (Cephalaspidomorphi, Chondrichthyes, Actinopterygii and Sarcopterygii) and amphibians, and is referred to as V-SINEs. Like the previous superfamily, these SINEs contain a conserved central domain typical of all families in this group (Ogiwara et al., 2002; also see preceding papers by Izsvák et al., 1996; Ogiwara et al., 1999). Finally, all members of the third superfamily, the recently described DeuSINEs (for Deuterostomia SINEs), are characterized by the presence of the Deu-domain within their body, following the tRNA- or chimeric 5S rRNA/tRNA-related region. So far, nine families of this superfamily have been found in the genomes of mammals, birds, several classes of fish, amphioxus and sea urchin (Nishihara et al., 2006).

1.4. LINEs and SINEs in the genomes of salmonoid fishes

Salmonoidei is a suborder in the order Salmoniformes (class Actinopterygii) which unites salmons and their relatives. It is subdivided into three families: Salmonidae (salmons), Thymallidae (graylings) and Coregonidae (whitefish and ciscoes), though sometimes all three are regarded as subfamilies within the single family Salmonidae. Of all the fish taxa, salmons are no doubt among the best studied, mostly because of their importance to humans and the significant role they play in various ecosystems. Extensive research performed on their genomes over the past two decades has yielded a significant part of our general knowledge of retroposons. It therefore comes as no surprise that 7 out of the 14 known families of ray-finned fish SINEs were described from salmons. The aim of this short review is to organize the knowledge accumulated so far about salmonoid retroposons and to complement it with new data on the distribution of some SINE families. Table 1 and Fig. 1 summarise the basic information characterizing all known salmonoid retroposons.

Table 1. Distribution of salmonoid retroposons⁎ (supplement to Fig. 4)

Name | Distribution | References
RSg-1 (LINE) | Proven: Salmonoidei (Actinopterygii: Salmoniformes); Schistosoma japonicum, S. mansoni (Trematoda: Strigeiformes). Expected: Salmonoidei, Esocoidei, Osmeroidei; Schistosoma japonicum, S. mansoni | Winkfein et al. (1989), Okada et al. (1997), Melamed et al. (2004), Ohshima and Okada (2005)
SalL2 (LINE) | Proven: Oncorhynchus, Salvelinus, Salmo (Salmonoidei: Salmonidae). Expected: Salmonoidei | Ohshima and Okada (2005)
AvaIII (SINE) | Proven: Salmonoidei. Expected: Salmonoidei, Esocoidei, Osmeroidei | Kido et al. (1994, 1995)
HpaI (SINE) | Proven: Salmonoidei; Schistosoma japonicum, S. mansoni. Expected: Salmonoidei, Esocoidei, Osmeroidei; Schistosoma japonicum, S. mansoni | Kido et al. (1991, 1994, 1995), Melamed et al. (2004), Matveev et al. (2007)
OS-SINE1 (SINE) | Proven: Salmonoidei; S. japonicum. Expected: Salmonoidei, Esocoidei; S. japonicum | Nishihara et al. (2006)
SlmI (SINE) | Salmonoidei; Schistosoma japonicum | Matveev et al. (2007)
SmaI (SINE) | Salmonoidei; Schistosoma japonicum, S. mansoni | Matsumoto et al. (1986), Kido et al. (1991), Hamada et al. (1997), Takasaki et al. (1997), Melamed et al. (2004), Ohshima and Okada (2005)
SlmII (SINE) | Parahucho, Salmo, Oncorhynchus, Salvelinus (Salmonoidei: Salmonidae) | Matveev et al. (2007)
FokI (SINE) | Salvelinus; Schistosoma japonicum | Kido et al. (1991), Hamada et al. (1998), Melamed et al. (2004)

⁎ Based on the cited publications, existing and our own new entries to GenBank.

2. Salmonoid LINEs

2.1. RSg-1

Although these LINEs of the L2 clade were among the first characterized retroposons (Winkfein et al., 1988), their sequence and detailed structure have never been published, apart from the 3'-tail and a short fragment immediately preceding it. Here we fill this gap and present a consensus sequence of the last 3640 bp of the RSg-1 family (Fig. 2). Apart from its well-known 3'-terminus, which is also found in RSg-1 LINE-dependent SINEs (see below), the consensus presented here also includes putative ORF2 proteins with easily recognizable endonuclease and RTase domains, consisting of 387 and 268 amino acid residues, respectively. The distribution boundaries of this LINE family are not completely known, but it is certainly present in the genomes of all salmonoid

fishes (Table 1) and at least two species of blood flukes (Melamed et al., 2004), as known both from direct findings and from the presence of active SINE families with an RSg-1-related 3'-tail. In addition, our recent findings (see below) provide indirect evidence for its possible occurrence in other suborders of the order Salmoniformes: it may also exist in the genomes of Esocoidei and Osmeroidei (Table 1).

2.2. SalL2

This recently characterised family of long retroposons also belongs to the L2 clade and encodes a typical LINE RTase as well as an endonuclease (Ohshima and Okada, 2005). Among other LINEs of the same clade, the SalL2 family is most closely related to the zebrafish CR1-2DR (ZfL2) and particularly the eel UnaL2 LINEs, based on the similarity of their ORF2 proteins. These members of the L2 clade, like their dependent SINEs, are also characterised by highly similar 3'-tails and apparently represent LINE homologues from various distant fish lineages. Such a high degree of similarity may serve as evidence for a functional constraint in this group of long retroposons, in which strict identity is required for the recognition of LINE (SINE) transcripts by the RTase.

The family of SalL2 LINEs was characterised from the genome of chum salmon, Oncorhynchus keta, and their sequences can also be found among GenBank entries for Salmo and Salvelinus species. However, its precise distribution is yet to be verified. At the same time, based on the occurrence of active SalL2-dependent SmaI SINEs in such distant lineages of salmonoid fishes as salmon and whitefish, we speculate that it might be present in the genomes of all members of the suborder (Table 1).

Fig. 1. Structure and distribution of salmonoid SINEs. For references see Table 1. Boxed letters A, B and C denote promoter boxes. Intermediate element of OS-SINE1 split promoter is shown as a vertical yellow rectangle between boxes A and C. Homologous domains are coded with the same colour; 5'-leaders and 3'-tails of unidentified origin are left without fill.

3. Salmonoid SINEs

3.1. AvaIII (CORE-SINE)

These short retroposons belong to the superfamily of CORE-SINEs and are widespread among all salmonoid fishes, which suggests their ancient origin. Despite their wide distribution, the numbers of AvaIII SINEs in the host genome are as low as ca. 1 × 10^2 copies (Kido et al., 1994). Two subfamilies are currently recognized: Ava type I and Ava type II (Kido et al., 1995), with most of the diagnostic nucleotides located at the very 5'-part of their sequences (Fig. 3).

Based on all previous findings, AvaIII SINEs were thought to be confined to the taxa of the suborder Salmonoidei. However, our recent PCR analysis with AvaIII-specific primers (Table 2) gave a positive result for DNA from the red-finned pickerel (Esox americanus), a member of another salmoniform suborder, Esocoidei (pikes), sometimes also regarded as a distinct order, Esociformes. The PCR fragments had the expected length, but we failed to determine their sequence (data not shown), and therefore the presence of AvaIII in pikes requires further investigation. Members of the Ava type II subfamily are found in all salmonoid fishes. Ava type I SINEs have hitherto been reported from two genera, Oncorhynchus and Parahucho (Kido et al., 1995), and we also found a copy in the GenBank entry for Brachymystax (AF005918). Therefore, Ava type I is likely to exist in all members of the family Salmonidae (Fig. 4).

Fig. 2. RSg-1 LINE consensus sequence and its putative translation products. The consensus is a result of compilation based on five individual LINE copies from the following GenBank entries: Salmo salar (EF467295, EU221180), Oncorhynchus mykiss (AY872257, DQ246664) and O. tshawytscha (AY493564). Deduced endonuclease domains are underlined and those of RTase are shaded; the conserved 3'-tail is boxed. Amino acid residues highly conserved among various LINE families (Malik et al., 1999) are shown in bold italics. The variable A/T-rich tail is not included.
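The caption above says the consensus was compiled from five individual copies but does not describe the procedure. As an illustration only, the following is a minimal sketch of one common approach, a column-wise majority rule over pre-aligned sequences. It is an assumption, not necessarily the method used for Fig. 2; the function name `majority_consensus` and the toy sequences are hypothetical and are not taken from the GenBank entries.

```python
from collections import Counter


def majority_consensus(aligned, gap="-", ambiguous="N"):
    """Column-wise majority-rule consensus of pre-aligned, equal-length sequences.

    Assumptions of this sketch: a tie for the most frequent symbol is
    reported as `ambiguous`, and columns where the gap symbol is the
    clear majority are dropped from the consensus.
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(symbol.upper() for symbol in column)
        (top, top_n), *rest = counts.most_common()
        if rest and rest[0][1] == top_n:
            consensus.append(ambiguous)  # tie: do not pick arbitrarily
        elif top != gap:
            consensus.append(top)        # clear majority base
        # a clear majority of gaps contributes nothing to the consensus
    return "".join(consensus)


# Hypothetical toy alignment of five copies (not real RSg-1 data).
toy_copies = ["ACGTTA-C", "ACGATA-C", "ACGTTAGC", "TCGTTA-C", "ACGTCA-C"]
print(majority_consensus(toy_copies))
```

With real data the copies would first have to be aligned by a dedicated alignment tool; the sketch only covers the final voting step.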


The full-length AvaIII sequence is ca. 260 bp long and begins with a short 5'-leader, followed by an approximately 85 bp-long tRNA-related promoter region (Fig. 1). The latter is often partially degraded, particularly at its 5'-part, and altered by mutations. As a consequence, in many AvaIII copies the promoter box A is virtually unrecognizable or absent, which indicates that these copies are transcriptionally inactive. The precise relationship of this region to a particular tRNA gene is not quite clear: it shows a similar extent of homology with the genes of tRNA-Ile, tRNA-Met and tRNA-Thr (around 57–63%, data not shown), closely followed by a few others. It is likely that the promoter region of these SINEs originated from one of them but that later, because of the low retropositional activity of their partner LINE (which is evident from their low copy numbers), the entire family became inactive and accumulated mutations in functionally important regions. This makes it difficult to speculate about the exact predecessor of the AvaIII promoter region.

AvaIII SINEs belong to the superfamily of CORE-SINEs: their central segment is represented by the ancient core-domain (Gilbert and Labuda, 1999). However, unlike in the related HpaI family (see below), in AvaIII it is notably shorter, being truncated at the 3'-terminus and stretching for approximately 45–50 bp (vs. the original state of ca. 65 bp).

Most of the AvaIII sequence – approximately the last 130 bp at its 3'-end – is occupied by a family-specific region of unknown origin, which may belong to a yet unknown (and most probably inactive) LINE family. Another characteristic feature of AvaIII elements is the absence of the variable 3'-tail, poly(A) or A/T-rich, typical of other short retroposons and their partner LINEs. This is also surprising because direct repeats flanking the SINE sequence can still be easily found in most cases, indicating a relatively young evolutionary age of this family (see below).

Fig. 3. Alignment of subfamily consensus sequences from Kido et al. (1995) with changes. General consensus of HpaI family is shown above the alignment. Consensus sequences of Ava type I and Ava type II subfamilies of AvaIII family are shown at the bottom. The former is a compilation of Ava(OG)-414 and Ava(HP)-1103 sequences from Kido et al. (1995) and AvaIII sequence from the GenBank entry AF005918 for Brachymystax lenok. HpaI MS type V subfamily consensus from Kido et al. (1995) has been revised since we isolated new copies from B. lenok, Hucho taimen and Thymallus arcticus: EU293159, EU293158, EU293161, EU293162 (MS type V). RSg-1 LINE-related 3'-tail (complete in HpaI and its remnants in AvaIII) is shaded.

3.2. HpaI (CORE-SINE)

HpaI elements are among the best studied SINE families described to date. These abundant retroposons are found in all genera of the suborder Salmonoidei, though they had been postulated to be absent from other fish lineages. We performed a PCR analysis with HpaI-MP-specific primers (Table 2), which resulted in amplification of DNA fragments of the expected length for three non-salmonoid fish genera: Esox (Esocoidei), Plecoglossus and Hypomesus (Osmeroidei). It is worth mentioning, though, that the visualized signals for the latter two were rather faint, while that in the Esox sample was as strong as in the case of salmons (data not shown). However, we only managed to confirm these results by sequencing for ayu fish (Plecoglossus altivelis) but not for Esox or Hypomesus, and hence further analysis will be needed. HpaI SINEs found in ayu had extremely high and equal resemblance to MS types IV and VI (almost 98% in the recovered part of the sequence). Based on these findings, we propose that the HpaI family appeared in the common ancestor of three salmoniform suborders (or earlier): Salmonoidei, Esocoidei and Osmeroidei (Fig. 4).

The numbers of HpaI copies in the genome may vary depending on the subfamily and the fish lineage (Kido et al., 1991, 1994). Twelve HpaI subfamilies (Fig. 3) are currently recognized (Kido et al., 1995): MP (ca. 5 × 10^2 copies per genome), MT types I–IV (ca. 1 × 10^3) and MS types I, IIa, IIb, III, IV, V and VI (ca. 1 × 10^4). The first is confined to a single genus, Prosopium. The other 11 are co-distributed in the families Thymallidae and Salmonidae, though the precise pattern is still unclear and requires further investigation. Until now, none of the MS subfamilies had been reported from graylings, but our recent findings confirm that the MS type V subfamily is present in their genome, too, suggesting its earlier origin (Fig. 4). Approximately 1 × 10^2 copies of HpaI SINEs are also found in Coregonus. This is noticeably fewer than the number of MP copies in another coregonid genus, Prosopium, suggesting two independent amplification peaks of HpaI SINEs in the genomes of salmonoid fishes (see below).

Unlike in the related AvaIII family, where the tRNA-derived region shows a more or less equal level of homology with several tRNA genes (tRNA-Ile, tRNA-Met and tRNA-Thr), in HpaI SINEs the same domain is notably more similar to the tRNA-Ile gene (ca. 69%), while the next closest candidate, tRNA-Thr, is only 61% identical (data not shown).
Our preceding analyses (Kido et al., 1991, 1994, 1995) revealed that the 5'-parts of HpaI and AvaIII SINEs, including the leader sequence, the tRNA-related region and the core, have the same origin, and consequently both families should have the same tRNA gene as the progenitor of their promoter regions. Because the number of AvaIII copies in the genome is very low and this family became inactive a long time ago, while its individual copies are generally more degraded by mutations, it is not suitable for determining the origin of the AvaIII/HpaI common promoter region. By contrast, the abundant and on the whole better preserved HpaI sequences quite clearly support the tRNA-Ile gene as the best candidate.

The full-length core-domain is well preserved and spans approximately 65 bp. As mentioned above, the same region is much shorter in AvaIII SINEs, being truncated at its 3'-end. However, the remaining sequence is highly similar to that of HpaI, being almost 94% identical (Gilbert and Labuda, 1999). In addition, both sequences contain a characteristic insertion close to the 5'-end of their cores (CGTTCT), which is not shared by any other SINE family and is further evidence of their common origin. At the same time, the presence of the full core in HpaI SINEs, as in the more ancient MIR elements, suggests a scenario in which a truncation event took place in the AvaIII family. Most likely, it was the same event as the one that replaced its entire 3'-terminus, including the LINE-related domain.

The core-domain in HpaI SINEs is followed by a ca. 65 bp-long conserved tail derived from the partner RSg-1 LINE (Fig. 5).
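Several arguments in this section rest on percent-identity values between aligned sequences (for example, ca. 69% vs. 61% for candidate tRNA genes, or almost 94% between the two cores). The text does not state how these values were computed, so the following is only a minimal sketch under one common convention, assumed here: identity is the share of matching positions among alignment columns in which neither sequence has a gap. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption of this sketch: columns containing a gap in either
    sequence are excluded from both the numerator and the denominator.
    Other conventions (e.g. counting gaps as mismatches) give other values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (not real SINE or tRNA data): 6 of 8 positions match.
print(percent_identity("ACGTACGT", "ACGTTCGA"))
```

Because the result depends on how gaps are treated and on the alignment itself, identity values from different sources are comparable only when the same convention is used.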

Table 2. Primers used in the present study

Primer | Primer sequence | Annealing t, °C
MP-F1 (direct) | 5'-GGTAGCCTAGTGGTTAGAGCGTTG-3' | 55
MP-R1 (reverse) | 5'-AATTCTTATTTTCAATGACGGCCTA-3' | 55
SlmI-F1 (direct) | 5'-GTGCTAGAGGCGTCACTACAGW-3' | 51
SlmI-R1 (reverse) | 5'-ACTAGGCAAGTCAGTTAAGAACAAAT-3' | 51
OS-SINE1-F (direct) | 5'-AGGGCAGTGATTGGGGACATTG-3' | 66
OS-SINE1-R (reverse) | 5'-TGGGGAATAGTTACAGGGGAGAGGA-3' | 66
SmaI-F (direct) | 5'-CGCTTGTAACGCCAGGGTAGTGG-3' | 67
SmaI-R (reverse) | 5'-GCCATTTAGCAGAYGCTTTTATCCA-3' | 67
AvaIII-F (direct) | 5'-CCTGGCGGGTAGGAGCGTTG-3' | 66
AvaIII-R (reverse) | 5'-CTGAATCAGAGAGGTGCGGGGG-3' | 66
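Two of the primers in Table 2 contain IUPAC degenerate symbols (W in SlmI-F1, Y in SmaI-R), meaning each is a mixture of concrete oligonucleotides. As a hedged illustration, the sketch below expands a degenerate primer into the sequences it stands for and computes a reverse complement. The IUPAC code table is standard; the function names `expand_primer` and `reverse_complement` are hypothetical helpers introduced here, not part of the original study.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC symbol (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def expand_primer(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = [IUPAC[symbol] for symbol in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def reverse_complement(seq):
    """Reverse complement of a DNA sequence, IUPAC symbols included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# SlmI-F1 from Table 2: the terminal W stands for A or T.
for variant in expand_primer("GTGCTAGAGGCGTCACTACAGW"):
    print(variant)
```

The reverse complement is useful for locating the binding site of a reverse primer on the strand in which a consensus sequence is written.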


Fig. 4. Timing of appearance of salmonoid SINE families and known distribution of their subfamilies (above and below the branches, respectively) in fish genomes, denoted with arrowheads. In addition to well-substantiated timing for most SINEs, shown in black, possible earlier appearance is hypothesized for AvaIII family (shown in light-gray). Conclusions on distribution of SINE subfamilies among taxa are based on the published data, existing and our own new entries to GenBank, including the following: Brachymystax lenok — HpaI MS type V (EU293158, EU293159), MS type VI (EU293157); Hucho taimen — HpaI MS type V (EU293161), MS type VI (EU293160); Thymallus arcticus — HpaI MS type V (EU293162); Esox americanus — OS-SINE1e (EU302495). Phylogeny within Salmonidae is mainly based on the reports by Oakley and Phillips (1999), Crespi and Fulton (2004) and Matveev et al. (2007).

The SINE sequence ends with a variable A/T-rich 3'-tail of irregular structure. It hardly contains any tandem repeats, which are typical of SINEs mobilized by the other L2 clade LINEs. Therefore, the RSg-1 LINE might represent an exception among the stringent-type LINEs of the L2 clade in the way its reverse transcription is initiated. Alternatively, this may serve as an argument for its independence from the L2 clade.

Fig. 5. A: Schematic representation of three SINE families (SlmI, HpaI and OS-SINE1) and RSg-1 LINE sharing highly similar 3'-tail regions (from Matveev et al., 2007). B: Alignment of conserved 3'-sequences of RSg-1 LINE and its related SINEs. Nucleotides are numbered from 5'-terminus in SINEs and from 3'- in RSg-1. C: Putative secondary structures of LINE and SINE RNAs in their 3'-regions. Standard hydrogen bonds between paired residues and G–U wobble base pairs are shown as lines and dots, respectively.

3.3. OS-SINE1 (DeuSINE)

This family belongs to the superfamily of recently described DeuSINEs, characterised by the presence of a common Deu-domain in their body and found in a variety of taxa, from sea urchin to mammals (Nishihara et al., 2006). Originally, OS-SINE1 was described from several salmon species of the genera Oncorhynchus and Salmo (Salmonidae), but we also found it in the genomes of fishes from all three salmonoid families (see GenBank EU302496 for the partial sequence from Prosopium). We also found these SINEs in the DNA of E. americanus, which suggests that the family appeared in the common ancestor of pikes and salmons (Fig. 4). Interestingly, the visualized PCR products from this species formed a very compact band with a higher electrophoretic mobility than those obtained from the salmonoid samples (data not shown), suggesting that they are some 20–30 bp shorter. A partial sequence from the single recovered clone (GenBank accession EU302495), containing most of the Deu-domain and a part of the LINE-related tail, had a 16 bp-long deletion in the former and a few shorter gaps. These differences seem to be characteristic of all OS-SINE1 elements from the Esox genome, and for this reason we consider it possible to classify them as a distinct subfamily, OS-SINE1e, where ‘e’ stands for ‘Esox’. In this case, the original SINE described from Salmo and Oncorhynchus merits the same status, and we propose to designate it as OS-SINE1s, where ‘s’ stands for ‘salmon’ (Fig. 4). The allocation of copies from other salmonoid taxa, as well as the exact structure of their 5'-regions, requires additional investigation.

These SINEs are notable for their long sequences, normally exceeding 470–480 bp in length (464 bp in the consensus sequence, excluding the variable A/T-rich tail). The number of full OS-SINE1 copies in the genome has not been assessed experimentally. However, it seems to be relatively low: a rather long GenBank entry for Oncorhynchus mykiss (DQ156151, 96,442 bp) contains only one full copy of this SINE, while two other long fragments of salmon DNA (exceeding 200 and 300 kb) appear to lack any. Interestingly, the number of full-length copies of another DeuSINE family, AmnSINE1, was shown to be similarly low in both the chicken and human genomes: ca. 1 × 10^3 (Nishihara et al., 2006).

As in two other families of DeuSINEs, namely the zebrafish and catfish SINE3 and AmnSINE1 of amniotes, the promoter region of OS-SINE1 is a chimeric structure of fused 5S rRNA- and partial tRNA-related segments. The rest of their sequences show a significant level of homology, too (except for the 3'-tail of OS-SINE1), and taking into account the wide range of hosts of these SINEs, it would be logical to suggest that their progenitor dates back to the common ancestor of vertebrates (Nishihara et al., 2006).
That is, these SINEs may be older than 450 Myr (Kumar and Hedges, 1998). The 5S rRNA-related region of OS-SINE1 contains well conserved A and C boxes and the intermediate element of the split promoter (Fig. 1), which is a necessary condition for the successful transcription of the SINE sequence by Pol III. As for the adjacent tRNA-related segment, it appears to be truncated from its 3'-end and only has the A box left. Because of this, it is unlikely to be able to fulfil the promoter function on its own. At the same time, this part of the sequence remains conserved enough among all of the abovementioned SINEfamilies, including OS-SINE1. Apparently, this means that it still bears certain function: for instance, it could facilitate the recognition of the SINE sequence by Pol III. In most DeuSINEs of Gnathostomata, including SINE3, LmeSINE1 and AmnSINE1, as well as SacSINE1 from the genome of catfish shark, the 3'-tail is around 77% homologous to that of zebrafish CR1–4_DR LINE. This value is high enough to state that this (in zebrafish) and other, yet not described, LINE families of the L2 clade serve as partner LINEs for the above short retroposons (for more detailed explanation see Nishihara et al., 2006; Matveev et al., 2007). In their turn, OSSINE1 elements represent an exception: their 3'-tail is derived from RSg-1 LINE (75–77% homology, Fig. 5) which, however, still belongs to the L2 clade (Ohshima and Okada, 2005). The same as in HpaI family, the variable 3'-tail is A/T-rich and is usually characterised by irregular structure. 3.4. SlmI Members of this family are widespread among all salmonoid fishes and apparently started propagating in the genome of their common ancestor (Matveev et al., 2007). Earlier we estimated the number of their full-length copies at somewhat 3 × 104 per genome. 
Based on the differences in their sequence structure among various lineages of Salmonoidei, three SlmI subfamilies are currently recognized: PR (first found in the genus Prosopium), NP (non-Prosopium) and SI (Salmonoidei).


The 3–4 bp-long 5'-leader sequence in SlmI is followed by the conserved promoter region of nearly 90 bp. Unlike in other salmonoid SINEs, with their tRNA^Ile-related promoter domain (with the only exception of OS-SINE1), in the SlmI family it is derived from the tRNA^Leu gene (Matveev et al., 2007). The family-specific central domain is 57, 63 or 68 bp long (in PR, NP and SI subfamilies, respectively). It is as conserved as the head and the LINE-related 3'-tail, which suggests its functional importance. This domain ends with a short (0–11 bp) but well pronounced subfamily-specific segment.

The approximately 70 bp-long conserved 3'-tail of SlmI SINEs is not unique to these retroposons, as they share it with two other families, HpaI and OS-SINE1. Therefore, all these SINEs have the same partner LINE, RSg-1, and use its enzymatic machinery for their own retrotransposition. As we demonstrated previously (Matveev et al., 2007), the above three families of short retroposons show almost identical values of sequence homology of their conserved 3'-tails with that of RSg-1, around 75%, which is close to the corresponding value in the well-studied UnaSINE1/UnaL2 pair (with their shorter termini), approximately 78%. We speculated that a higher level of divergence from the original (LINE) state affects the ability of SINE RNA transcripts to bind the RTase, and hence eliminates such copies from further propagation in the genome, while recognition ability could probably be blocked even by a single mutation of one of the highly conserved residues within the loop of the tail secondary structure, as shown in the experiments with UnaL2 (Baba et al., 2004; Kajikawa and Okada, unpublished results). As a consequence, in all three families this value is ‘maintained’ above the same ‘critical’ level of 74–75%. This is distinctly higher than the sequence identity between SlmI and HpaI on the one hand and OS-SINE1 on the other (around 66% in both cases), while the 3'-termini of HpaI and SlmI match each other with 90% identity. These and other observations on the sequence homologies enabled us to conclude earlier (Matveev et al., 2007) that the RSg-1-related 3'-terminus was at a certain point acquired by the SlmI progenitor from a member of the HpaI family, and this is the only case of SINE-to-SINE transmission of the LINE-related tail known to date. The possibility of such an indirect way of tail transfer (i.e. between SINEs) suggests that general genomic mechanisms such as gene conversion or recombination are likely to be involved in the exchange of parts between retroposons. With all this, SINEs demonstrate great evolutionary flexibility: the ability to incorporate 3'-tails of other SINEs, which are present in genomes in much higher copy numbers than long retroposons, increases their chances of survival if their partner LINE loses its activity. This also makes it more probable for individual copies inactivated by mutations to restore their activity.

The LINE-related tail is followed in SlmI SINEs by the A/T-rich terminus of variable length and mostly irregular structure. In some copies, it can be partially composed of poly(A) runs or, rarely, of short tandem repeats, such as (AAATAC)2, (TAAA)2 or (TAA)5.

3.5. SlmII

These SINEs are distributed among the following genera of the family Salmonidae (Fig. 4): Oncorhynchus (Pacific salmon), Salvelinus (char), Salmo (Atlantic salmon) and Parahucho (with its only species, P. perryi, the Japanese huchen). Their absence from the taimen genome (genus Hucho) confirmed the view of the Japanese huchen as belonging to a distinct monotypic genus, Parahucho.
In accord with their evolutionary age, which is younger than that of SlmI, the total number of full-length copies of SlmII SINEs in salmonid genomes is about half as large, and was estimated at approximately 1.3 × 10^4 (Matveev et al., 2007). SlmII SINEs are notable for their sequence length, identical to that of OS-SINE1: 464 bp excluding the A/T-rich tail. This family is obviously


related to the previous one, SlmI: a significant portion of its head and the 5'-part of the body matches those of the SlmI-SI consensus with an 84% identity, including the 5'-leader, the tRNA^Leu-related promoter segment and the entire central domain. In addition, SlmII SINEs contain 5'-remnants of the RSg-1 LINE-derived segment followed by ca. 290 bp of the conserved 3'-domain, preceding the variable tail (Matveev et al., 2007). Because short homology with RSg-1 is not sufficient to guarantee a successful amplification of SlmII SINEs, this family should have its own, yet unknown partner LINE in the genome. The variable 3'-tail of SlmII SINEs is A/T-rich and has a general structure similar to that of the SlmI family. It may contain poly(A)/(T) runs and/or occasional tandem repeats, such as (AT)n.

3.6. SmaI

Designated first as Pol III/SINE (Matsumoto et al., 1986), SmaI is the first family of short retroposons characterised from the salmon genome. Initially, these SINEs were thought to be confined to two species of Pacific salmon, namely chum, Oncorhynchus keta, and pink salmon, O. gorbuscha (Kido et al., 1991), with approximately 2.6 × 10^4 copies per genome (Takasaki et al., 1997). However, later investigations showed the presence of almost identical sequences in coregonid fishes, which were named SmaI-cor and designated as a distinct family, while more diverged and less numerous (ca. 1 × 10^2 copies per genome) SmaI-like sequences were found in the rest of the salmonoid taxa and were tentatively designated as the SmaI-div family (Hamada et al., 1997). In addition, the former was subdivided into two subfamilies: SmaI-cor type I (found in all coregonid fishes) and SmaI-cor type II (absent in whitefishes from the genus Prosopium). However, all these SINEs are, in fact, not diverged enough to merit the status of distinct families: their general structure is identical and hence they should be regarded as subfamilies within the SmaI family. For convenience, we also propose to give a distinct name, SmaI-onc (for Oncorhynchus), to the original subfamily from chum and pink salmon. The interrelationships of the SmaI-div, SmaI-cor type I, SmaI-cor type II and SmaI-onc subfamilies are discussed below. It is worth mentioning, though, that SmaI-div may include more than one subfamily, but this issue requires additional investigation.

In view of these small rearrangements, we have revised the family consensus for SmaI SINEs, to make it more precise (Fig. 6). In the original description of this family, Kido et al. (1991) mention that its promoter region shows equal similarity to the tRNA^Lys and tRNA^Ile genes (around 74%). However, with more data having been assembled by now, we can conclude that this region demonstrates much higher identity with the latter, around 81%, while its homology with the tRNA^Lys gene is even lower than the initially reported value (ca. 68%). The tRNA^Ile-related region is followed in SmaI SINEs by a relatively short, 25 bp-long central conserved domain. The conserved, 41 bp-long 3'-tail of SmaI SINEs exhibits a high homology with that of SalL2 LINE (Fig. 6), being ca. 88% identical. Such significant similarity confirms their ‘partnership’ (Ohshima and Okada, 2005).
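The identity percentages quoted in this and the preceding sections (for example, ca. 88% between the SmaI and SalL2 3'-tails) are pairwise values taken over aligned positions. A minimal sketch of such a calculation is given below. The function name `percent_identity`, the decision to skip columns in which either sequence carries a gap, and the two toy sequences are assumptions made purely for illustration; they are neither the actual consensus sequences nor necessarily the procedure used to obtain the values reported here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the gap-free columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (not real SINE or LINE sequence data).
sine_tail = "TCTTGGCCA-TGA"
line_tail = "TCTTGCCCAGTGA"
identity = percent_identity(sine_tail, line_tail)
print(f"{identity:.1f}% identity over gap-free columns")
```

Whether gapped columns are counted as mismatches or excluded changes the resulting value, so the convention should always be stated together with the percentage.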

The variable 3'-tail is A/T-rich. It is usually composed of tandem repeats, such as (ATT)n or (AT)n, and occasionally may contain short poly(A) and/or poly(T) runs. Therefore, it has a structure typical of other SINEs dependent on the L2 clade LINEs.

3.7. FokI

The young family of FokI SINEs is restricted in its distribution to char species, fishes of the genus Salvelinus (Kido et al., 1991; Hamada et al., 1998). The number of its copies in the genome has not been assessed but is likely to be relatively low (Kido et al., 1991). The structure of the FokI sequence is essentially similar to that of SmaI SINEs, proving their common origin. It has the same promoter region, which is approximately 78% identical to the tRNA^Ile gene, and ca. 72% similar to that of tRNA^Lys. Therefore, with more data available at present, we can conclude that the tRNA^Ile (but not tRNA^Lys) gene is the best candidate for the role of the progenitor for the promoter region of most tRNA-related salmonoid SINEs: AvaIII, HpaI, SmaI and FokI, while only two related families, SlmI and SlmII, seem to have acquired it from tRNA^Leu. The short central conserved domain of the FokI family is 27 bp long and is partially homologous to that of SmaI SINEs. In fact, the whole family of FokI SINEs could be characterized as SmaI with its central domain being greatly diverged from the original SmaI-like state (Fig. 6). As in SmaI, the conserved 3'-tail of FokI SINEs is derived from the SalL2 LINE family, and their last 46 conserved nucleotides appear to be identical to each other, at the same time being slightly different from the SmaI tail (Fig. 6; Ohshima and Okada, 2005). The LINE-related tail is followed by the variable 3'-terminus, which is predominantly composed of tandem repeats, such as (TGTAAA)n, a feature typical of the L2 clade LINEs and their dependent SINEs. In some copies, the variable tail may contain relatively short poly(A) runs.

4. Evolution of salmonoid retroposons

4.1. AvaIII, HpaI, SlmI and SlmII

Fig. 7 shows a possible pathway of evolutionary rearrangements in the structure and interrelationships of several related SINE families and their subfamilies. The chain of changes within the HpaI family, i.e. from MT type III through MS type I, was adopted with small amendments from Kido et al. (1995). Therefore, here we will only concentrate on other parts of this scheme. Out of the four SINE families appearing in Fig. 7, two, namely HpaI and SlmI, have RSg-1 LINE-derived 3'-tails. The other two, SlmII and AvaIII, contain long 3'-domains of unidentified origin, which apparently result from replacement of the original, RSg-1-related tail, because these SINEs still contain its ‘remains’ in the part of their sequence preceding the family-specific 3'-domain. The tail of the LINE itself begins with a TCTTGGCC motif, which is also present in all of the abovementioned four families (Fig. 7). However, none of them

Fig. 6. SmaI SINE family consensus aligned with SmaI-div, SmaI-onc, SmaI-cor type I and type II subfamily consensus sequences, FokI family consensus and SalL2 LINE 3'-tail. tRNA-related regions of the family consensus sequences are underlined.


Fig. 7. Possible pathway for the evolutionary changes in HpaI, AvaIII, SlmI and SlmII SINE sequences. Directions of changes are indicated with arrows. Changed nucleotides are accompanied by the number pertaining to their position in the appropriate family consensus. (+) and (−) indicate insertion and deletion, respectively. Homologous domains are colour-coded.

contain the first C residue: it is either replaced with G (in HpaI and AvaIII) or there is a deletion in this position (SlmI and SlmII). Also, GG is absent in HpaI and AvaIII SINEs, while SlmI and SlmII preserve the original, RSg-1-like state. In our previous discussions (Matveev et al., 2007), we concluded that SlmI SINEs (either PR or SI subfamily) originated through acquisition of the RSg-1 LINE-related 3'-tail from the HpaI family. Continuing this line, it is most logical to suggest that HpaI SINE gave rise to the AvaIII family, too. In this case, the primordial condition for the ancestral form of HpaI SINE itself would have to combine both of the abovementioned states, i.e. TGTTGGCC, which can be derived from the original RSg-1-like state by replacement of the first C with G. We have designated this hypothetical subfamily of the HpaI family as ‘HpaI-M’. The transfer of the RSg-1-related tail from HpaI-M SINE to SlmI (PR or SI) was accompanied by the loss of one G residue, while the appearance of the supposedly most ancient HpaI subfamily, MT type IV, resulted in the loss of the GG residues in the same motif. Further mutations in this subfamily gave rise to MT type III SINEs, while the replacement of most of its 3'-tail could create one of the two subfamilies of AvaIII SINEs (Fig. 7), possibly in the genome of Salmonoidei and Esocoidei (Fig. 4).
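The changes to the first motif of the RSg-1-derived tail described above can be traced as a series of simple string edits. The sketch below is an illustration only: the helper names are hypothetical, and the resulting strings for HpaI-M, SlmI and HpaI MT type IV are inferred from the verbal description in the text (replacement of the first C with G, loss of the residue in that position, loss of GG) rather than copied from Fig. 7.

```python
RSG1_MOTIF = "TCTTGGCC"  # start of the RSg-1 LINE tail, as given in the text


def substitute(seq: str, index: int, base: str) -> str:
    """Return seq with the residue at a 0-based index replaced by base."""
    return seq[:index] + base + seq[index + 1:]


def delete(seq: str, start: int, length: int = 1) -> str:
    """Return seq with `length` residues removed from a 0-based start."""
    return seq[:start] + seq[start + length:]


first_c = RSG1_MOTIF.index("C")  # position of the first C residue

# Hypothetical HpaI-M state: first C replaced with G, GG still present.
hpa1_m = substitute(RSG1_MOTIF, first_c, "G")

# SlmI (PR or SI): the residue in the former C position is lost, GG retained.
slm1 = delete(hpa1_m, first_c)

# HpaI MT type IV: GG is lost, the C-to-G replacement is retained.
mt_type_iv = delete(hpa1_m, hpa1_m.index("GG"), 2)

for name, motif in [("RSg-1", RSG1_MOTIF), ("HpaI-M", hpa1_m),
                    ("SlmI", slm1), ("HpaI MT type IV", mt_type_iv)]:
    print(f"{name:16s}{motif}")
```

Each derived state differs from HpaI-M by a single event, which is why an intermediate combining both features is the most economical starting point under this reading of the text.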

As we mentioned above, three subfamilies of SlmI SINEs are currently recognized: PR, SI and NP. The first two proved to be distributed among all salmonoid lineages, while the NP subfamily was only found in Salmonidae, being absent in Coregonidae and Thymallidae (Matveev et al., 2007). Such a distribution apparently means that the split between SI and PR SINEs happened shortly after the appearance of the SlmI family in the genome of the salmonoid ancestor, either through insertion of a TTAGGGGAGGG motif between the central and RSg-1-derived domains in a PR-like source gene, or through its deletion if the SI-like state was primordial. A deletion of GAGGG in the same region of one of the SI copies resulted in the appearance of the NP subfamily and its subsequent propagation in the genome of an ancestor of Salmonidae, after its split from graylings (Fig. 4). The same subfamily, SlmI-SI, gave rise to the SlmII family in the lineage (Parahucho, Salmo, Salvelinus, Oncorhynchus), apparently through replacement of most of the RSg-1 LINE-related 3'-tail with a new one, supposedly belonging to a yet unknown LINE. In the alternative scenario, one of the SI copies could first have lost most of its RSg-1-related terminus at any evolutionary stage when SlmI-SI SINEs already existed, with subsequent incorporation of the 3'-tail of a


different LINE at the time following the split of the above lineage with the ancestor of Hucho and Brachymystax.
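The insertion and deletion events proposed above for the SlmI subfamilies can be checked with a few lines of string arithmetic. The sketch below assumes, for illustration, that the subfamily-specific segment is empty in PR, is the full TTAGGGGAGGG motif in SI, and is that motif minus GAGGG in NP; the variable names are hypothetical. Under these assumptions the segment lengths (0, 6 and 11 bp) match the differences between the central-domain lengths reported in Section 3.4 (57, 63 and 68 bp).

```python
SI_SEGMENT = "TTAGGGGAGGG"  # motif separating SI from PR, per the text
NP_DELETION = "GAGGG"       # residues lost when NP arose from an SI copy

# Assumption: PR carries no subfamily-specific segment at this position.
pr_segment = ""

# Remove the GAGGG occurrence from the SI segment to model the NP state.
cut = SI_SEGMENT.rfind(NP_DELETION)
np_segment = SI_SEGMENT[:cut] + SI_SEGMENT[cut + len(NP_DELETION):]

segments = {"PR": pr_segment, "NP": np_segment, "SI": SI_SEGMENT}

# Central-domain lengths reported for the three subfamilies (bp).
central_domain_bp = {"PR": 57, "NP": 63, "SI": 68}

for name, segment in segments.items():
    excess = central_domain_bp[name] - central_domain_bp["PR"]
    print(f"{name}: segment={segment or '(none)'}, "
          f"length={len(segment)} bp, domain excess over PR={excess} bp")
```

The agreement between segment lengths and domain lengths is only a consistency check on the numbers given in the text; it does not by itself indicate whether the PR-like or the SI-like state was primordial.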

4.2. OS-SINE1

The tRNA-related segment has the same truncation pattern in all DeuSINE families with a 5S rRNA-derived promoter region: OS-SINE1 (Salmonoidei), SINE3 (catfish and zebrafish) and AmnSINE1 (chicken, mammals). Moreover, its remnants are essentially similar to the 5'-part of the complete tRNA-derived domain of another DeuSINE family from coelacanth (Latimeria menadoensis), LmeSINE1 (Nishihara et al., 2006), which lacks any trace of the 5S rRNA gene in its promoter region (Fig. 8). Taking the position of coelacanth on the vertebrate phylogenetic tree into account, the above structural differences of the promoter region between OS-SINE1, SINE3 and AmnSINE1 on the one hand and LmeSINE1 on the other could be explained by postulating that the latter condition is a restored primordial state. If so, initially the preceding form of SINE with a tRNA-related promoter region, such as the shark SacSINE1 (Nishihara et al., 2006), could have lost the 3'-part of this region containing box B and become inactive. Later, in the lineage of bony fishes (Osteichthyes), before the split between ray-finned and lobe-finned fishes (Actinopterygii and Sarcopterygii, respectively), an inactive copy recombined with a 5S rRNA gene and regained its activity, giving rise to a novel SINE family. This state is now observed in OS-SINE1, SINE3 and AmnSINE1. However, at a certain point in the evolution of lobe-finned fishes, after their split from tetrapods (Tetrapoda), the tRNA-related promoter might have been restored through recombination with a tRNA gene or a different SINE, while the 5S rRNA-derived region was lost, the condition observed in the coelacanth LmeSINE1. The loss of the 5S rRNA-related part could equally well have happened before or after the restoration of the full-size tRNA-derived region. The probability of either of the two scenarios would depend on several factors, such as the number of initially active SINE copies in the genome (i.e. one, several or many) and possible competition between the two promoter regions and its outcome (i.e. which one would be more successful, or important, for binding Pol III).

Fig. 8. Structure and distribution of DeuSINE families and known phylogenetic relationships among their host species. Boxed letters A, B and C denote promoter boxes. The intermediate element of the 5S rRNA-related promoter region is shown as a vertical yellow rectangle between boxes A and C. The 3'-tails of AmnSINE1, LmeSINE1a, LmeSINE1b and SINE3 are similar to that of zebrafish CR1-4_DR LINE and are coded in light green; the OS-SINE1 tail originates from RSg-1 LINE and is shown in dark green.

4.3. SmaI and FokI

Contrary to an earlier statement (Kido et al., 1991), the promoter region of SmaI SINEs has a higher sequence homology with the ancestral tRNA gene than that of the FokI family (81% vs. 78%). At the same time, in FokI the SalL2-related 3'-tail is notably more similar to the LINE sequence than in SmaI (Fig. 6). Hence, taking the distribution of both SINE families into consideration (Fig. 4), with FokI confined to the genus Salvelinus and SmaI found in low copy numbers in the majority of salmonoid taxa (being rather numerous in Coregonidae and two Oncorhynchus species), we tend to conclude that the FokI family could have originated in the char lineage from one of the inactive SmaI copies, which regained its activity through a de novo recombination with SalL2 LINE and functional restoration of the 3'-tail.

5. Lateral transfer of retroposons

Horizontal, or lateral, transfer is a rare evolutionary event, and retrotransposable elements provide a unique insight into this interesting phenomenon. One of very few known cases pertains to the salmonoid SmaI family. It is characterized by a mosaic distribution and simultaneously occurs in relatively high copy numbers in two distant lineages: in all whitefishes and ciscoes on the one hand (SmaI-cor types I and II), and in two species of Pacific salmon, O. keta and O. gorbuscha (SmaI-onc subfamily), on the other (Hamada et al., 1997; this paper). As for the rest of the taxa, their genomes host only low numbers of highly diverged SmaI-div SINEs. It would be natural to assume that SmaI-cor and SmaI-onc arose independently from those few SmaI copies (collectively called the SmaI-div subfamily) which are present in the genomes of all salmonoid fishes. However, the significant evolutionary distance separating Pacific salmon and coregonid fishes would inevitably result in a corresponding level of divergence between SmaI-onc and SmaI-cor. Nevertheless, in practice their consensus sequences are 98.6% similar. This prompted Hamada et al. (1997) to conclude that SmaI-cor SINEs were transferred horizontally from some coregonid species either to the common ancestor of chum and pink salmon or to these two species independently. However, at the same time, Takasaki et al. (1997) proposed several alternative models based on the analysis of inter- and intraspecific variation of SmaI SINEs, arguing that SmaI-onc (as we call them here) could not have amplified in a common ancestor of these two species. Instead, introgression or horizontal transfer of SINEs from pink to chum salmon during evolution seems more likely.

Arguably, the most intriguing case pertains to a group of organisms known to host many salmonoid retroposons and other sequences in their genomes: blood flukes of the genus Schistosoma (Trematoda: Strigeiformes), particularly S. japonicum and, to a lesser extent, S. mansoni (He et al., 2001; Melamed et al., 2004; Matveev et al., 2007). Among other elements, they host HpaI, OS-SINE1, SmaI, FokI, SlmI SINEs and even RSg-1 LINE in their DNA (Table 1). It is worth noting that the only other LINE case reported to date is that of BovB LINE (Kordiš and Gubenšek, 1997, 1998; also see Gogolevsky et al., 2008), which exists simultaneously in Squamata (snakes and two lizard infraorders) and some mammalian genomes (ruminants and marsupials). At present, we can only speculate about the potential processes that might have underlain such a remarkable exchange. However that may be, the occurrence of such a wide range of salmonid retroposons in the genome of this parasitic worm suggests a relatively recent and obviously repeated transfer, and therefore cannot be exclusively explained by possible parasite/host relationships that could exist


between blood flukes and salmons in the past, as proposed by Melamed et al. (2004). Moreover, occasional schistosomal infection of salmons (which is not impossible, recalling the ecological versatility of S. japonicum; see Snyder and Loker, 2000; He et al., 2001) would not explain the fact of horizontal transfer on its own, as it only provides a basis but not the machinery. Recently, our group described a case providing potential grounds for viral mediation of the lateral transfer of retrotransposable elements (Piskurek and Okada, 2007), when a reptilian Sauria SINE was found in the taterapox virus, known to infect some rodents (Kemp's gerbil, Tatera kempi). We speculate that the same scenario could have taken place in the case of salmons and blood flukes. This would explain the existence of a wide range of salmonid retroposons, as well as IGF-1, LHβ and PRL genes, in the schistosomal genomic DNA. Moreover, should this be the case, the repeated transfer of short (SINEs) and long (LINEs, genes) sequences from one organism to another could be easily explained, too. Since a part of the schistosomal life cycle occurs in fresh water anyway, and even involves other aquatic organisms, such as snails, the viral scenario would not require immediate helminth infection of salmons themselves (though it does not exclude it either), as direct viral infection through water or the food chain is not difficult to imagine. Alternatively, a vector other than a virus could potentially be involved, too. Previously, our laboratory reported the case of HpaI SINE insertion into a Tc-1-like transposon in Atlantic salmon (Takasaki et al., 1996), and SmaI SINE insertions into RSg-1 LINE sequences (Hamada et al., 1997).

Acknowledgements

This work was supported by the Japan Society for the Promotion of Science (grant reference number P05182). The authors are extremely grateful to Dr. S.
Alekseev (Kol'tsov Institute of Developmental Biology, Russian Academy of Sciences, Moscow, Russia) for providing tissue samples of taimen, lenok and grayling, to Dr. D. Politov (Vavilov Institute of General Genetics RAS, Moscow) for inconnu samples, and to Prof. Mutsumi Nishida (Ocean Research Institute, The University of Tokyo, Japan) for providing tissues of red-finned pickerel.

References

Baba, S., Kajikawa, M., Okada, N., Kawai, G., 2004. Solution structure of an RNA stem-loop derived from the 3' conserved region of eel LINE UnaL2. RNA 10, 1380–1387.
Borodulina, O.R., Kramerov, D.A., 2001. Short interspersed elements (SINEs) from insectivores: two classes of mammalian SINEs distinguished by A-rich tail structure. Mamm. Genome 12, 779–786.
Borodulina, O.R., Kramerov, D.A., 2005. PCR-based approach to SINE isolation: Simple and complex SINEs. Gene 349, 197–205.
Brosius, J., 1991. Retroposons – seeds of evolution. Science 251, 753.
Burch, J.B.E., Davis, D.L., Haas, N.B., 1993. Chicken repeat 1 elements contain a pol-like open reading frame and belong to the non-long terminal repeat class of retrotransposons. Proc. Natl. Acad. Sci. U. S. A. 90, 8199–8203.
Crespi, B.J., Fulton, M.J., 2004. Molecular systematics of Salmonidae: combined nuclear data yields a robust phylogeny. Mol. Phylogenet. Evol. 31, 658–679.
Deininger, P.L., Jolly, D.J., Rubin, C.M., Friedmann, T., Schmid, C.W., 1981. Base sequence studies of 300 nucleotide renatured repeated human DNA clones. J. Mol. Biol. 151, 17–33.
Dewannieux, M., Esnault, C., Heidmann, T., 2003. LINE-mediated retrotransposition of marked Alu sequences. Nat. Genet. 35, 41–48.
Eickbush, T.H., 1994. Origin and evolutionary relationships of retroelements. In: Morse, S.S. (Ed.), The evolutionary biology of viruses. Raven Press, New York, pp. 121–157.
Feng, Q., Moran, J.V., Kazazian Jr., H.H., Boeke, J.D., 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905–916.
Gilbert, N., Labuda, D., 1999. CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs. Proc. Natl. Acad. Sci. U. S. A. 96, 2869–2874.
Gilbert, N., Labuda, D., 2000. Evolutionary inventions and continuity of CORE-SINEs in mammals. J. Mol. Biol. 298, 365–377.
Gogolevsky, K.P., Vassetzky, N.S., Kramerov, D.A., 2008. Bov-B-mobilized SINEs in vertebrate genomes. Gene 407, 75–85.


Hamada, M., et al., 1997. A newly isolated family of short interspersed repetitive elements (SINEs) in coregonid fishes (whitefish) with sequences that are almost identical to those of the SmaI family of repeats: possible evidence for the horizontal transfer of SINEs. Genetics 146, 355–367.
Hamada, M., Takasaki, N., Reist, J.D., De Cicco, A.L., Goto, A., Okada, N., 1998. Detection of the ongoing sorting of ancestrally polymorphic SINEs toward fixation of loss in populations of two species of charr during speciation. Genetics 150, 301–311.
Haynes, S.R., Jelinek, W.R., 1981. Low molecular weight RNAs transcribed in vitro by RNA polymerase III from Alu-type dispersed repeats in Chinese hamster DNA are also found in vivo. Proc. Natl. Acad. Sci. U. S. A. 78, 6130–6134.
Haynes, S.R., Toomey, T.P., Leinwand, L., Jelinek, W.R., 1981. The Chinese hamster Alu-equivalent sequence: A conserved highly repetitious, interspersed deoxyribonucleic acid sequence in mammals has a structure suggestive of a transposable element. Mol. Cell Biol. 1, 573–583.
He, Y.X., Salafsky, B., Ramaswamy, K., 2001. Host-parasite relationships of Schistosoma japonicum in mammalian hosts. Trends Parasitol. 17, 320–324.
Houck, C.M., Reinhart, F.P., Schmid, C.W., 1979. A ubiquitous family of repeated DNA sequences in the human genome. J. Mol. Biol. 132, 289–306.
Hutchison, C.A., Hardies, S.C., Loeb, D.D., Shehee, W.R., Edgell, M.H., 1989. LINEs and related retroposons: long interspersed repeated sequences in the eukaryotic genome. In: Berg, D.H., Howe, M.M. (Eds.), Mobile DNA. American Society of Microbiology, Washington, D.C.
Izsvàk, Z., Ivics, Z., Garcia-Estefania, D., Fahrenkrug, S.C., Hackett, P.B., 1996. DANA elements: a family of composite, tRNA-derived short interspersed DNA elements associated with mutational activities in zebrafish (Danio rerio). Proc. Natl. Acad. Sci. U. S. A. 93, 1077–1081.
Jelinek, W.R., et al., 1980. Ubiquitous, interspersed repeated sequences in mammalian genomes. Proc. Natl. Acad. Sci. U. S. A. 77, 1398–1402.
Jurka, J., 1997. Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. Proc. Natl. Acad. Sci. U. S. A. 94, 1872–1877.
Jurka, J., Zietkiewicz, E., Labuda, D., 1995. Ubiquitous mammalian-wide interspersed repeats (MIRs) are molecular fossils from the Mesozoic era. Nucleic Acids Res. 23, 170–175.
Kajikawa, M., Okada, N., 2002. LINEs mobilize SINEs in the eel through a shared 3' sequence. Cell 111, 433–444.
Kajikawa, M., Ohshima, K., Okada, N., 1997. Determination of the entire sequence of turtle CR1: The first open reading frame of the turtle CR1 element encodes a protein with a novel zinc finger motif. Mol. Biol. Evol. 14, 1206–1217.
Kapitonov, V.V., Jurka, J., 2003. A novel class of SINE elements derived from 5S rRNA. Mol. Biol. Evol. 20, 694–702.
Kido, Y., et al., 1991. Shaping and reshaping of salmonid genomes by amplification of tRNA-derived retroposons during evolution. Proc. Natl. Acad. Sci. U. S. A. 88, 2326–2330.
Kido, Y., Himberg, M., Takasaki, N., Okada, N., 1994. Amplification of distinct subfamilies of short interspersed elements during evolution of the Salmonidae. J. Mol. Biol. 241, 633–644.
Kido, Y., Saitoh, M., Murata, S., Okada, N., 1995. Evolution of the active sequences of the HpaI short interspersed elements. J. Mol. Evol. 41, 986–995.
Kordiš, D., Gubenšek, F., 1997. Bov-B long interspersed repeated DNA (LINE) sequences are present in Vipera ammodytes phospholipase A2 genes and in genomes of Viperidae snakes. Eur. J. Biochem. 246, 772–779.
Kordiš, D., Gubenšek, F., 1998. Unusual horizontal transfer of a long interspersed nuclear element between distant vertebrate classes. Proc. Natl. Acad. Sci. U. S. A. 95, 10704–10709.
Kramerov, D.A., Vassetzky, N.S., 2005. Short retroposons in eukaryotic genomes. Int. Rev. Cytol. 247, 65–221.
Kramerov, D.A., Grigoryan, A.A., Ryskov, A.P., Georgiev, G.P., 1979. Long double-stranded sequences (dsRNA-B) of nuclear pre-mRNA consist of a few highly abundant classes of sequences: Evidence from DNA cloning experiments. Nucleic Acids Res. 6, 697–713.
Krayev, A.S., Kramerov, D.A., Skryabin, K.G., Ryskov, A.P., Bayev, A.A., Georgiev, G.P., 1980. The nucleotide sequence of the ubiquitous repetitive DNA sequence B1 complementary to the most abundant class of mouse fold-back RNA. Nucleic Acids Res. 8, 1201–1215.
Krayev, A.S., et al., 1982. Ubiquitous transposon-like repeats B1 and B2 of the mouse genome: B2 sequencing. Nucleic Acids Res. 10, 7461–7475.
Kumar, S., Hedges, S.B., 1998. A molecular timescale for vertebrate evolution. Nature 392, 917–920.
Lander, E.S., et al., 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Malik, H.S., Burke, W.D., Eickbush, T.H., 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16, 793–805.
Matsumoto, K., Murakami, K., Okada, N., 1986. Gene for lysine tRNA1 may be a progenitor of the highly repetitive and transcribable sequences present in the salmon genome. Proc. Natl. Acad. Sci. U. S. A. 83, 3156–3160.
Matveev, V., Nishihara, H., Okada, N., 2007. Novel SINE families from salmons validate Parahucho (Salmonidae) as a distinct genus and give evidence that SINEs can incorporate LINE-related 3'-tails of other SINEs. Mol. Biol. Evol. 24, 1656–1666.
Melamed, P., Chong, K.L., Johansen, M.V., 2004. Evidence for lateral gene transfer from salmonids to two schistosome species. Nat. Genet. 36, 786–787.
Nishihara, H., Terai, Y., Okada, N., 2002. Characterization of novel Alu- and tRNA-related SINEs from the tree shrew and evolutionary implications of their origins. Mol. Biol. Evol. 19, 1964–1972.
Nishihara, H., Smit, A.F.A., Okada, N., 2006. Functional noncoding sequences derived from SINEs in the mammalian genome. Genome Res. 16, 864–874.


Oakley, T.H., Phillips, R.B., 1999. Phylogeny of Salmonine fishes based on growth hormone introns: Atlantic (Salmo) and Pacific (Oncorhynchus) salmon are not sister taxa. Mol. Phylogenet. Evol. 11, 381–393.
Ogiwara, I., Miya, M., Ohshima, K., Okada, N., 1999. Retropositional parasitism of SINEs on LINEs: identification of SINEs and LINEs in elasmobranches. Mol. Biol. Evol. 16, 1238–1250.
Ogiwara, I., Miya, M., Ohshima, K., Okada, N., 2002. V-SINEs: a new superfamily of vertebrate SINEs that are widespread in vertebrate genomes and retain a strongly conserved segment within each repetitive unit. Genome Res. 12, 316–324.
Ohshima, K., Okada, N., 2005. SINEs and LINEs: symbionts of eukaryotic genomes with a common tail. Cytogenet. Genome Res. 110, 475–490.
Okada, N., 1991a. SINEs. Curr. Opin. Genet. Dev. 1, 498–504.
Okada, N., 1991b. SINEs: short interspersed repeated elements of the eukaryotic genome. Trends Ecol. Evol. 6, 358–361.
Okada, N., Ohshima, K., 1995. Evolution of tRNA-derived SINEs. In: Maraia, R.J. (Ed.), The impact of short interspersed elements (SINEs) on the host genome. RG Landes Company, Austin, TX, pp. 61–79.
Okada, N., Hamada, M., Ogiwara, I., Ohshima, K., 1997. SINEs and LINEs share common 3' sequence: a review. Gene 205, 229–243.
Permanyer, J., Gonzàlez-Duarte, R., Albalat, R., 2000. The non-LTR retrotransposons in Ciona intestinalis: new insights into the evolution of chordate genomes. Genome Biol. 4, R73.
Piskurek, O., Okada, N., 2006. Simple and complex SINEs: a brief critical comment. Gene 375, 110.
Piskurek, O., Okada, N., 2007. Poxviruses as possible vectors for horizontal transfer of retroposons from reptiles to mammals. Proc. Natl. Acad. Sci. U. S. A. 104, 12046–12051.
Piskurek, O., Nikaido, M., Boeadi, Baba, M., Okada, N., 2003. Unique mammalian tRNA-derived repetitive elements in dermopterans: the t-SINE family and its retrotransposition through multiple sources. Mol. Biol. Evol. 20, 1659–1668.
Piskurek, O., Austin, C.C., Okada, N., 2006. Sauria SINEs: novel short interspersed retroposable elements that are widespread in reptile genomes. J. Mol. Evol. 62, 630–644.
Rogers, J., 1985. Origins of repeated DNA. Nature 317, 765–766.
Rubin, C.M., Houck, C.M., Deininger, P.L., Friedmann, T., Schmid, C.W., 1980. Partial nucleotide sequence of the 300-nucleotide interspersed repeated human DNA sequences. Nature 284, 372–374.
Sakamoto, K., Okada, N., 1985. Rodent type 2 Alu family, rat identifier sequence, rabbit C family, and bovine or goat 73-bp repeat may have evolved from tRNA genes. J. Mol. Evol. 22, 134–140.
Shedlock, A.M., Okada, N., 2000. SINE insertions: powerful tools for molecular systematics. Bioessays 22, 148–160.
Singer, M.F., 1982. SINEs and LINEs: highly repeated short and long interspersed sequences in mammalian genomes. Cell 28, 433–434.
Smit, A., Riggs, A., 1995. MIRs are classic tRNA-derived SINEs that amplified before the mammalian radiation. Nucleic Acids Res. 23, 98–102.
Snyder, S.D., Loker, E.S., 2000. Evolutionary relationships among the Schistosomatidae (Platyhelminthes: Digenea) and an Asian origin for Schistosoma. J. Parasitol. 86, 283–288.
Takasaki, N., Park, L., Kaeriyama, M., Gharrett, A.J., Okada, N., 1996. Characterization of species-specifically amplified SINEs in three salmonid species — chum salmon, pink salmon, and kokanee: the local environment of the genome may be important for the generation of a dominant source gene at a newly retroposed locus. J. Mol. Evol. 42, 103–116.
Takasaki, N., Yamaki, T., Hamada, M., Park, L., Okada, N., 1997. The salmon SmaI family of short interspersed repetitive elements (SINEs): interspecific and intraspecific polymorphism of the insertion of SINEs in the genomes of chum and pink salmon. Genetics 146, 369–380.
Ullu, E., Tschudi, C., 1984. Alu sequences are processed 7SL RNA genes. Nature 312, 171–172.
Waterston, R.H., et al., 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562.
Weiner, A.M., Deininger, P.L., Efstratiadis, A., 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55, 631–661.
Winkfein, R.J., Moir, R.D., Krawetz, S.A., Blanco, J., States, J.C., Dixon, G.H., 1988. A new family of repetitive, retroposon-like sequences in the genome of the rainbow trout. Eur. J. Biochem. 176, 255–264.
Xiong, Y., Eickbush, T.H., 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9, 3353–3362.