A non-LTR retroelement extinction in Spermophilus tridecemlineatus




Gene 500 (2012) 47–53

Contents lists available at SciVerse ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Roy N. Platt II, David A. Ray ⁎

Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS 39762, USA
Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA

Article info

Article history: Accepted 9 March 2012; Available online 21 March 2012

Keywords: Transposable element; Retrotransposon; LINE; Ground squirrel

Abstract

The typical mammalian genome is dominated by two types of transposable elements (TEs), the autonomous and non-autonomous non-LTR retrotransposons, i.e. LINEs and SINEs, and with few exceptions there is a sole active LINE family (L1). During an ongoing investigation of TEs in rodents we determined that overall transposon activity has been steadily declining in Spermophilus tridecemlineatus. More specifically, the typically ubiquitous L1 activity of mammals has decreased drastically within the last 26 MY. Indeed, only three L1 insertions with intact ORF1 sequences were readily identifiable and no intact ORF2 sequences were identified. The last L1 and SINE insertions date to ~5.3 MYA and 4 MYA, respectively. Based on our inability to computationally identify recently inserted L1 elements we suggest that S. tridecemlineatus is experiencing a quiescence or extinction of non-LTR retrotransposon activity. Such a finding represents only the fourth instance of a loss of non-LTR retrotransposon activity identified in mammals and, as such, represents an important additional data point to guide our understanding of LINE dynamics in eutherians.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Transposable elements (TEs) are repetitive DNA sequences that accumulate in genomes via multiple mechanisms and are particularly powerful mutagens. For example, in the course of their mobilization, they may influence gene expression via the introduction/disruption of regulatory elements, exons, and splice junctions (Babushok et al., 2007; Cordaux et al., 2006; Hasler et al., 2007; Jurka, 1995; Kazazian, 2004; Matlik et al., 2006; Nigumann et al., 2002; Peaston et al., 2004; Speek, 2001). However, TEs need not be actively mobilizing to have an effect on genome structure. TE-mediated genome rearrangements through non-homologous recombination and chromosomal rearrangements are well-documented (Batzer and Deininger, 2002; Eichler and Sankoff, 2003; Gray, 2000; Lim and Simmons, 1994; Lonnig and Saedler, 2002). Deletions, duplications, inversions, translocations and chromosome breaks have all been linked to the presence of TEs in a variety of genomes (Caceres et al., 1999; Gray, 2000; Lim and Simmons, 1994; Mathiopoulos et al., 1998; Weil and Wessler, 1993; Zhang and Peterson, 2004).

TEs are classified into two major classes. DNA transposons, or Class II elements, mobilize via a DNA intermediate and are often described as using a “cut and paste” mechanism in which they excise and relocate themselves within the genome. Conversely, Class I elements, the

Abbreviations: TE, transposable element; LINE, long interspersed element; SINE, short interspersed element; MYA, million years ago; MY, million years; LTR, long terminal repeat; UTR, untranslated region; K2P, Kimura 2-parameter; Mb, megabase. ⁎ Corresponding author. Tel.: + 1 662 325 7740; fax: + 1 662 325 8664. E-mail address: [email protected] (D.A. Ray). 0378-1119/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2012.03.051

retrotransposons, utilize a “copy and paste” mechanism known as retrotransposition. Retrotransposition involves transcription of a retrotransposon by an RNA polymerase and reintegration of the element into a novel genomic location via reverse transcriptase (Kajikawa and Okada, 2002). Two types of retrotransposons are common: Long Terminal Repeat (LTR) elements, such as endogenous retroviruses, and non-LTR elements. The non-LTR elements are subdivided into two groups, the Long INterspersed Elements (LINEs) and Short INterspersed Elements (SINEs). LINEs are considered autonomous elements, in that they can mobilize themselves using self-encoded enzymatic machinery, while SINEs are non-autonomous elements that require the protein machinery from a LINE counterpart for their own mobilization (Kajikawa and Okada, 2002).

With few exceptions (Pagan et al., 2010; Pritham and Feschotte, 2007; Ray et al., 2008; Thomas et al., 2011) non-LTR retrotransposons (LINEs and SINEs) are the predominant TEs in mammalian genomes. In nearly all mammals examined, a single lineage of LINE, the LINE-1 (L1) superfamily, and/or a SINE counterpart(s) dominates the active retrotransposon repertoire. As a result, the L1 superfamily has played a major role in shaping the mammalian genome, including but not limited to X-chromosome inactivation (Bailey et al., 2000; Lyon, 1998), double stranded DNA break repair (Morrish et al., 2002), and coding exaptation (reviewed in Burns and Boeke, 2008).

Structurally, LINEs contain a 5′ untranslated region (UTR), two open reading frames (ORF1 and ORF2), a 3′ UTR, and a poly(A) tail (Malik et al., 1999). Several of these components are required for successful mobilization of any active LINE. For example, an internal promoter in the 5′ UTR recruits RNA Pol II (Swergold, 1990) while ORF1



and ORF2 encode enzymes (RNA chaperones and endonuclease/reverse transcriptase, respectively) that are required for nuclear import and reincorporation into the genome (Feng et al., 1996; Martin and Bushman, 2001; Mathias et al., 1991).

Given the potential impact of TEs in general and LINE-1 elements in particular on mammalian genomes, as well as the increasing availability of genomic sequence data and the development of powerful computational resources, we began a survey of available mammalian genomes for unique L1 activity. As part of our investigation into TEs in rodents we detected that L1 activity has been substantially reduced and possibly eliminated in the 13-lined ground squirrel, Spermophilus tridecemlineatus.

At least three other independent examples of reduced or eliminated L1 activity have been identified previously. Convincing evidence exists for an L1 extinction event dating back 22 MY in pteropodid bats (Cantrell et al., 2008). In sigmodontine rodents, the L1 lineage is believed to have gone extinct within the past 9 MY (Casavant et al., 2000; Grahn et al., 2005; Rinehart et al., 2005). Finally, the spider monkey (Ateles paniscus) genome exhibits such an extreme reduction of L1 activity that an extinction event beginning 25 MYA is likely, though it could not be confidently stated (Boissinot et al., 2004). As a result, an extinction/quiescence event in S. tridecemlineatus represents only the second independent L1 loss in rodents and only the fourth example in mammals as a whole. The S. tridecemlineatus genome therefore represents an important addition to our understanding of mammalian retrotransposon dynamics. Herein, we present evidence for this decrease in L1 activity within the context of other well-studied mammals: the model rodents Mus musculus (mouse) and Rattus norvegicus (rat), and Homo sapiens (human).

We note that Helgen et al. (2009) recently renamed Spermophilus to Ictidomys.
However, for ease of comparison with other work recently published on the genome, in particular the analyses of Lindblad-Toh et al. (2011), we have chosen to retain the name Spermophilus.

2. Materials and methods

2.1. Transposable element identification and quantification

Many previously identified repetitive elements for S. tridecemlineatus are cataloged in the common transposable element repository, Repbase (Jurka et al., 2005). However, variations in program usage, annotation, and data sources presented the possibility that the TE landscape of S. tridecemlineatus might be incompletely represented. With this in mind, we conducted a de novo TE identification to complement the available Repbase library. Repetitive sequences in one quarter (~450 Mb) of the S. tridecemlineatus early stage (2×) whole genome shotgun (WGS) assembly (GenBank accession number: AAQQ01000000) were identified de novo using RepeatScout (Price et al., 2005). Because of the relatively small N50 size (2.75 kb) of the 2× S. tridecemlineatus draft, we sampled over 160,000 randomly distributed contigs for our analysis. RepeatMasker (Smit et al., 2004) searches were used to quantify copy number for each repeat in the entire genome, and those present in fewer than ten copies were removed from further analysis. Repeats consisting of low sequence complexity (satellite sequences) were also removed.

Retrotransposons mobilize under a master gene model, in which only a few elements are capable of mobilizing at any one time (Deininger et al., 1992). The presumed master gene sequence can be inferred by comparing multiple progeny and creating a consensus sequence which ignores mutations that occurred in the progeny sequences after insertion. To infer the master gene (consensus) sequences for each repeat, the filtered RepeatScout (Price et al., 2005) output was used to query the entire S. tridecemlineatus WGS using BLAST v2.2.23 (Altschul et al., 1997).
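The idea of recovering a master (consensus) sequence from aligned progeny copies can be illustrated with a minimal sketch. This is not the authors' pipeline (which relied on BLAST, custom PERL scripts and MUSCLE); the function name `majority_consensus`, the strict "more than 50%" tie handling, the dropping of gap-majority columns, and the use of `N` for unresolved columns are all assumptions made for illustration.

```python
from collections import Counter


def majority_consensus(aligned, threshold=0.5, gap="-", ambiguous="N"):
    """Call a consensus from equal-length, pre-aligned sequences.

    Assumptions (ours, not the paper's): a column is resolved only when its
    most common character exceeds `threshold` of the column; columns whose
    majority is a gap are dropped; unresolved columns are written as `N`.
    """
    if not aligned:
        raise ValueError("no sequences supplied")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned):
        # Most frequent character in this alignment column and its count.
        symbol, count = Counter(c.upper() for c in column).most_common(1)[0]
        if count / len(column) > threshold:
            if symbol != gap:  # a gap majority means the column is skipped
                consensus.append(symbol)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Hypothetical toy alignment: private mutations in single copies are ignored.
copies = ["ACGT-A", "ACGTTA", "ACCT-A", "TCGT-A"]
print(majority_consensus(copies))
```

Mutations private to a single copy (the `C` in the third sequence, the `T` in the fourth) are outvoted, which is the behaviour the master gene model relies on.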
Hits of at least 75 base pairs (bp) were extracted along with a minimum of 500 bp of flanking sequence using custom PERL scripts. Extracted sequences were aligned using MUSCLE v3.8.31 (Edgar, 2004) and from these

alignments consensus sequences were reconstructed using a 50% majority rule. Full length elements were presumed only when single copy DNA was identifiable on the 5′ and 3′ ends of the alignments. In cases where the full length of the consensus sequence had not been captured the process was repeated until single copy DNA sequence was identifiable at both ends. The resulting library of elements was then submitted to CENSOR (Kohany et al., 2006) to ascertain their identity with regard to previously classified elements in Repbase (Jurka et al., 2005). Each element identified in our analysis fell within the classification parameters of Wicker et al. (2007); therefore, subsequent analyses used the existing Repbase classification and naming system. Recently, a higher coverage assembly of the genome was released (AGTP00000000) and all subsequent analyses utilized this version.

The TE complements of rat and mouse have been studied extensively (Cabot et al., 1997; Hardies et al., 2000; Rebuzzini et al., 2009; Saxton and Martin, 1998) and multiple TEs have been characterized for each. RepeatMasker tables were accessed via the UCSC genome browser (www.genome.ucsc.edu) for each species and used to compare their TE landscapes to that of S. tridecemlineatus. In addition, BLAST searches were used to identify and extract elements used in the distance based analyses below.

2.2. Neutral mutation rate and TE age estimation

The presence of multiple TE insertions with little nucleotide divergence from each other and from the master (consensus) element is considered evidence of recent activity (Ivics et al., 1997). To estimate the active periods for each TE family, however, a neutral mutation rate is needed. Many studies have examined phylogenetic relationships within Sciuridae (Herron et al., 2004; Steppan et al., 1999, 2004), but have not produced a robust neutral mutation rate specific to the Spermophilus lineage.
To resolve this problem, we extracted and concatenated nine exons (9,069 bp; Supplemental Table 1) from the squirrel, rat, and mouse genomes using the ENSEMBL:55 database (Birney et al., 2004). Kimura 2-parameter (K2P; Kimura, 1980) values were calculated at third base synonymous sites using MEGA5 (Tamura et al., 2011). Fossil calibration dates of 16 and 75 MYA were used to date the Mus–Rattus (Horner et al., 2007; Huchon et al., 2007; Murphy et al., 2007) and sciurid–murid (Bininda-Emonds et al., 2007; Huchon et al., 2007; Murphy et al., 2007) divergences, respectively. K2P divergence values between each TE insertion and its consensus were calculated based on sequence alignments generated using RepeatMasker (Pagan et al., 2010; Smit et al., 2004). By applying the neutral mutation rate to each K2P distance value we were able to estimate activity periods for selected families.

In mammals, increased methylation of cytosine at CpG sites has been demonstrated to act as a regulatory mechanism to suppress gene expression and TE activity (Xing et al., 2004; Yoder et al., 1997). To determine the rate of CpG mutations in the S. tridecemlineatus genome, the rate of cytosine to thymine conversion at CpG sites was calculated relative to the rate of all non-CpG mutations. Five hundred random insertions from each squirrel-specific SINE and LINE subfamily were queried using PERL scripts developed by Xing et al. (2004).

2.3. Identifying recently active non-LTR retrotransposons

L1 transposition relies on transcription of intact ORF1 and ORF2 regions. To determine the number of potentially active L1 elements in the genome, ORFs 1 and 2 of the S. tridecemlineatus L1 subfamilies, L1-1_Str and L1-2_Str, were used as BLAST queries against the genome. Each hit was extracted and translated. Potentially functional sequences were defined a priori as those harboring appropriate start codons and containing a single stop codon within a ±10% window of the expected range.
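The a priori criteria for a potentially functional ORF can be expressed as a short check. The sketch below is one reading of those criteria, not the authors' script: the function name `is_potentially_intact`, the requirement of an `ATG` start, the in-frame scan from the first base, and the interpretation of the ±10% window as applying to the ORF length in nucleotides are all assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_potentially_intact(seq, expected_len, tolerance=0.10):
    """Return True if `seq` looks like an intact ORF under assumed criteria.

    Assumed criteria: the sequence starts with ATG, contains exactly one
    in-frame stop codon, and that stop closes an ORF whose length (in
    nucleotides, stop included) is within `tolerance` of `expected_len`.
    """
    seq = seq.upper()
    if not seq.startswith("ATG"):
        return False

    # Split into complete in-frame codons, ignoring any trailing 1-2 bases.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    stop_positions = [i * 3 for i, codon in enumerate(codons) if codon in STOP_CODONS]
    if len(stop_positions) != 1:
        return False  # no stop at all, or at least one premature stop

    orf_len = stop_positions[0] + 3
    return abs(orf_len - expected_len) <= tolerance * expected_len


# Hypothetical toy sequences (real L1 ORFs are far longer).
intact = "ATG" + "GCT" * 8 + "TAA"
premature = "ATG" + "GCT" * 2 + "TAG" + "GCT" * 5 + "TAA"
print(is_potentially_intact(intact, expected_len=30))
print(is_potentially_intact(premature, expected_len=30))
```

A hit with any additional in-frame stop is rejected outright, which mirrors the "single stop codon" requirement; a real analysis would also need to handle reverse-strand hits and frame selection.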


Most L1 copies incompletely insert resulting in 5′ truncations that are non-functional at the moment of insertion. We could therefore infer the history of L1 activity by assaying the more common truncated L1 insertions. We estimated the ages of truncated L1 elements by querying the S. tridecemlineatus WGS with 500 bp from the 3′ end of L1-1_Str and L1-2_Str (excluding the poly-A tail). Potential dates for the most recent L1 transposition in S. tridecemlineatus were determined based on the average pairwise K2P distance values among the twenty-five most similar truncated elements after having removed hypermutable CpG dinucleotides as per Pagan et al. (2010). Mobilization of SINEs relies on functional L1 elements (Dewannieux and Heidmann, 2005; Kajikawa and Okada, 2002). Thus, SINEs can serve as indirect markers for L1 activity because identification of recently active SINE families would refute the hypothesis that L1 expression has been subjected to extinction or reduction in activity and suggest other possibilities. For example, lowered L1 transposition rates might be due to increased competition for the ORF protein machinery from SINEs. To test this hypothesis, recent SINE activity was also dated as described above. Finally, TE dynamics can be diagnosed using phylogenetic tools. As insertions age and accumulate mutations, branch lengths on a phenogram of representative insertions will increase. Younger elements, on the other hand, are expected to form polytomies with very short branch lengths (Cantrell et al., 2008; Grahn et al., 2005; Scott et al., 2006). We therefore compared L1 phenograms from taxa with confirmed recent activity (human, mouse, and rat (Cabot et al., 1997; Hardies et al., 2000; Rebuzzini et al., 2009)) to S. tridecemlineatus. Using BLAST, the 250 insertions most similar to the ORF2 consensus sequence were identified and extracted from the S. tridecemlineatus WGS. 
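The dating step described above (average pairwise K2P distance among the most similar insertions, converted to time with a neutral rate) can be sketched as follows. The K2P formula is the standard one from Kimura (1980); everything else is an assumption for illustration: the helper names, the default rate of 2.64 × 10⁻⁹ per site per year (the value reported in the Results), and the halving of the pairwise distance, which is our reading because it reproduces the reported ~5.3 MY and ~4.0 MY ages. CpG removal is not implemented here.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura 2-parameter distance between two aligned sequences.

    Columns containing anything other than A, C, G or T are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def mean_pairwise_k2p(sequences):
    """Average K2P distance over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    return sum(k2p_distance(a, b) for a, b in pairs) / len(pairs)


def age_in_years(pairwise_distance, rate_per_site_per_year=2.64e-9):
    """Assumed conversion: two copies diverge independently, so halve the distance."""
    return pairwise_distance / (2 * rate_per_site_per_year)


# Reported mean divergences among the 25 most similar insertions.
print(round(age_in_years(0.028) / 1e6, 1))  # L1-1_Str, in MY
print(round(age_in_years(0.021) / 1e6, 1))  # SINE STRID3, in MY
```

In practice `mean_pairwise_k2p` would be applied to the aligned 3′ ends of the twenty-five best BLAST hits after masking CpG dinucleotides.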
Two hundred and fifty insertions with the highest similarity to the respective ORF2 consensus sequences from human, rat and mouse were also identified. Finally, a single neighbor joining tree encompassing each of the four data sets was inferred with MEGA5 (Tamura et al., 2011) based on K2P distance values.

3. Results

3.1. Repeat identification and the TE landscape

Our de novo analysis recovered 98 of the 104 S. tridecemlineatus TEs present in Repbase. Each element was subsumed within subfamilies listed in Repbase based on the parameters proposed by Wicker et al. (2007) and we have adopted the names recognized by Repbase throughout the rest of this work. The genome of S. tridecemlineatus is dominated by L1 LINEs, proto-B1 and ID SINEs as well as spuma-like (ERV3) retroviruses (Table 1). This pattern is similar to Mus and Rattus with a few general exceptions. First, TEs are much less abundant in the squirrel genome than in the murid rodents. In S. tridecemlineatus TEs comprise ~26.3% (608.9 Mb) of the genome compared to ~39.2% (1,072 Mb) and ~41.5% (1,030.5 Mb) in Mus and Rattus, respectively


per our RepeatMasker runs and repeat tables from the genome browser. The calculations for Rattus and Mus are comparable to those presented previously (Gibbs et al., 2004). Next, beta-like retroviruses (ERV2) present in large numbers in the murid genomes are much less abundant in the squirrel genome. Finally, our data confirm the post-divergence expansion of proto-B1 SINEs in S. tridecemlineatus compared to the B1 SINE dominance in the murid genomes (Churakov et al., 2010; Veniaminova et al., 2007a, 2007b).

3.2. Mutation rate and age estimation

Mutation rate analysis revealed an average of 39.6% divergence at third base synonymous sites between murids and S. tridecemlineatus. This yielded a neutral mutation rate of ~2.64 × 10⁻⁹ substitutions per site per year (0.264% per MY), only slightly higher than the average mammalian mutation rate of 2.22 × 10⁻⁹ (0.222% per MY) (Kumar and Subramanian, 2002). Using this rate, we dated peak activity periods for twelve S. tridecemlineatus specific LINE and SINE families. CpG mutation rates were slightly elevated within SINE and LINE insertions, at 1.15–3.80× the non-CpG rate (mean = 2.1×).

TE expansion profiles (Fig. 1) suggest an overall period of declining L1 activity in S. tridecemlineatus based on insertion divergence from the consensus element. Combined with the neutral mutation rate calculated above, these profiles indicate that the decline began around 26 MYA and has continued to the present. This decline appears to have affected all TE classes.

3.3. Are there active L1s in the squirrel genome?

We estimated the number of potentially active L1 elements by identifying intact ORFs. Using tBLASTn searches, over 400,000 hits to ORF1 and ORF2 were identified and extracted. Of these, only three insertions contained ORF sequences that exhibited a methionine start codon and whose stop codon fell within a 10% window of the expected position. Each of these insertions matched L1-1 ORF1, not ORF2.
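The neutral rate reported in Section 3.2 can be reproduced with a short calculation. This is a sketch under an assumption: the text does not state the formula, so the rate is taken here to be the sciurid–murid divergence divided by twice the calibration age (both lineages accumulate substitutions after the split), which recovers the published figure. The symbols K (divergence at third base synonymous sites), T (calibration age) and r (rate) are introduced only for this illustration.

```latex
\[
r \;=\; \frac{K}{2T}
  \;=\; \frac{0.396}{2 \times 75\ \mathrm{MY}}
  \;=\; 0.00264\ \text{per site per MY}
  \;=\; 0.264\%\ \text{per MY}
  \;=\; 2.64 \times 10^{-9}\ \text{per site per year}
\]
```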
To confirm that intact ORF1 sequences were indeed contained within unviable elements, we examined the ORF2 portion of the corresponding L1 insertions and identified multiple premature stop codons.

To estimate the likely most recent mobilization periods for L1, five hundred bp from the 3′ region of L1, excluding the poly-A tail, were used as queries for BLAST searches. The twenty-five best hits for L1-1 and L1-2 exhibited 2.8 ± 0.3% and 2.7 ± 0.3% sequence divergence at their 3′ ends, corresponding to ~5.3 ± 1.13 MY since their insertion. Compared to murid rodents, this suggests minimal activity at best. Among the twenty-five most recent insertions in the murid genomes, drastically lower average divergences within the 3′ portion of L1 are observed (K2P: Rattus 0.3 ± 0.1%; Mus 0%), indicating recent mobilization in those genomes. Recent activity in Mus is also supported by the presence of polymorphic L1 insertions in laboratory strains (Akagi et al., 2008).

To secondarily query L1 activity we searched for evidence of recently active SINEs. The squirrel SINE STRID3 is the most homogeneous family and therefore likely to be the

Table 1
Genome proportion and content for the major transposable element classes and LINE subclasses identified in the WGS of Spermophilus tridecemlineatus with comparisons to Rattus and Mus RepeatMasker tracks from the UCSC genome browser.

TE classification    S. tridecemlineatus         M. musculus                 R. norvegicus
                     % genome   Coverage (Mb)    % genome   Coverage (Mb)    % genome   Coverage (Mb)
DNA transposons      1.2%       28.2             1.1%       28.4             1%         24.9
LTR                  6.4%       147.1            10.7%      289.9            9.5%       236.2
LINE                 10.7%      248.1            19.8%      538.3            23.5%      583.8
  CR1                0.4%       9.2              0.1%       1.5              0.1%       1.8
  L1                 10.3%      238.6            19.4%      525.8            23.1%      573.1
  RTE                <0.1%      0.3              <0.1%      0.5              <0.1%      0.5
  L2                 0%         0                0.4%       10.5             0.4%       8.6
SINE                 8%         185.5            7.6%       206.4            7.5%       185.6


Fig. 1. Transposable element insertions were identified in the S. tridecemlineatus genome using RepeatMasker. The proportion of queried sequence was quantified based on the percentage of nucleotides derived from repeats versus the total number of nucleotides in the genome. Elements were binned based on genetic distance (Kimura 2-parameter) from their consensus. Elements in each bin are further subdivided into classes (SINE, LINE, LTR, and DNA transposons). For M. musculus and R. norvegicus, UCSC RepeatMasker tracks were used to quantify, bin, and classify repeats.

most recently active SINE in the genome. Analysis suggested an average genetic distance of 2.1% when comparing the twenty-five most similar insertions. This suggests that the most recent SINE activity was centered ~4.0 ± 0.8 MYA, corresponding well with the period of most recent activity seen for L1-1_Str and L1-2_Str.

Phenograms of the 250 most similar L1 ORF2 sequences from S. tridecemlineatus, Rattus, Mus and Homo, of which the latter three are known to harbor active L1s, provide striking graphical support for our contention that L1 activity in ground squirrels has been dramatically curtailed (Fig. 2; Cabot et al., 1997; Hardies et al., 2000; Rebuzzini et al., 2009). The L1 ORF2 sequences for Mus and Rattus form one large polytomy for each species with very short terminal branches, indicating a large number of recent insertions from a single L1 subfamily. Human ORF2 sequences exhibit slightly more variation (K2P 1.7%) than Rattus and Mus but exhibit a similar pattern overall. ORF2 insertions in S. tridecemlineatus recover long terminal branches (14.9% divergence within S. tridecemlineatus ORF2) and multiple clades, indicating either a very high mutation rate or long periods of

L1 quiescence. Given our estimate of the mutation rate calculated above, the former option is unlikely.

4. Discussion

4.1. An independent reduction in L1 activity in S. tridecemlineatus

The TE landscape of the S. tridecemlineatus genome is distinct from Mus and Rattus (Fig. 1 and Table 1). The oldest elements in all three taxa exhibit similar profiles, as would be expected given their common ancestry. However, after the split with the murid lineage, S. tridecemlineatus exhibits a decline in activity for all TE classes. Our analyses suggest that there is very little, if any, TE activity in the squirrel genome.

A lack of DNA transposon activity is not unusual in mammals. With a few notable exceptions, DNA transposon activity in mammals ceased 40–50 MYA (Giordano et al., 2007; Lander et al., 2001; Mikkelsen et al., 2007; Pace et al., 2008; Ray et al., 2008; Thomas et al., 2011; Waterston et al., 2002; Zhao et al., 2009). However, a lack of recent non-LTR retrotransposon activity is a rare observation. Two L1 extinction events had been identified previously in mammals (Fig. 3), the pteropodid extinction 22 MYA (Cantrell et al., 2008) and the sigmodontine extinction 9 MYA (Grahn et al., 2005). A reduction of L1 activity was identified in the spider monkey beginning within the last 25 MY (Boissinot et al., 2004). Our data suggest that between 19 and 26 MYA, L1 activity began declining in the proto-Spermophilus genome, a decline that has continued in S. tridecemlineatus and likely ceased altogether ~4–5 MYA.

4.2. Potential impacts of L1 loss in S. tridecemlineatus

Fig. 2. A neighbor-joining tree based on the Kimura 2-parameter distances of ORF2 sequences from the 250 most similar (and therefore most likely to be recent) L1 insertions in M. musculus, R. norvegicus, H. sapiens, and S. tridecemlineatus demonstrating the large inter-element divergences for even the most similar L1 insertions in S. tridecemlineatus. The average genetic distance within each group is also indicated.

The addition of S. tridecemlineatus to the list of mammalian genomes with little or no L1 activity represents an important fourth instance in furthering our understanding of the functional impact of these events. Three potential examples of the significance of this observation spring to mind. First, it is well known that LINEs and SINEs are substrates for recombination (Deininger et al., 2003; Kazazian, 2004). As time passes after the cessation of activity, individual insertions diverge, thereby reducing the risk of ectopic recombination. Thus, we might now ask whether rates of LINE- and SINE-mediated recombination events are correspondingly lowered in the affected taxa.

Second, transposable elements play a substantial role in the introduction and spread of regulatory sequences (reviewed in Feschotte, 2008). In mammals, close to 7 Mb of regulatory sequence is derived from TEs (Lindblad-Toh et al., 2011). This raises the question: since S. tridecemlineatus has lost LINE activity, are new regulatory elements being gained at a slower rate when compared to other rodents with an active retrotransposon landscape? Unfortunately, the non-model status of S. tridecemlineatus suggests a long road to a basic understanding of its regulatory pathways before the many questions can be addressed.

Finally, the reduction and/or extinction of non-LTR activity raises interesting questions regarding the evolution of genome size in these taxa. For example, without the constant introduction of new sequences from the mobilization of LINEs and SINEs, has there been a corresponding decrease in genome size when compared with other mammals? While this is an interesting question, it is also a complex one. For example, as noted by Petrov (2001), changes in rates of transposition may not correspond directly with changes in copy number. We present these data with the hope that they will provide a valuable point of reference for future studies.

Taxonomically, the scope of the L1 extinctions described herein contradicts the assumption of ubiquitous non-LTR retrotransposon activity in mammalian genomes. While it is clear that most mammals examined to date exhibit L1 activity, it may be that L1 extinction/quiescence is more widespread than previously thought. Assuming the L1 reduction observed in S. tridecemlineatus is shared by other taxa in the lineage (Sciuridae: Xerini), these four independent extinction/quiescence events potentially affect up to 675 species (Fig. 3), or around 12% of mammals (Wilson and Reeder, 2005).
With increasing sequencing capacity the taxonomic sampling of mammalian genomes will improve and allow us to examine the non-LTR landscape of additional non-model taxa.

4.3. Possible factors for L1 loss in S. tridecemlineatus

An important question to ask is what drives the loss of retrotransposon activity in certain mammalian genomes? Our total knowledge


of therian L1 extinction or quiescence events totals an N of four (Fig. 3). Therefore, an attempt to definitively answer this question based on current data would be premature. With such a small dataset, the evolutionary history of each species (or group) will be of utmost importance. However, certain scenarios can be considered. Several studies (Charlesworth et al., 1994; Sánchez-Gracia et al., 2005) support an equilibrium model of transposon accumulation versus removal via selection, although this scenario has also been criticized (Le Rouzic et al., 2007). Many factors alter the equilibrium between TE accumulation and removal. These include but are not limited to: 1) genetic drift; 2) competition between TEs; and 3) evolution of host defense mechanisms.

Genetic drift does not appear to be playing a role in the ground squirrel. For example, one could hypothesize an ancestral population of proto-squirrels that encountered a bottleneck, allowing genetic drift to drastically impact the number of active L1 elements in the genome. However, our observation that TE activity in the squirrel experienced a slow reduction over a prolonged period of time contradicts this scenario.

Much like their counterparts in macroscopic ecosystems, TE families compete for limited genomic resources, and this could lead to more efficient elements outcompeting their less efficient rivals (Brookfield, 2005; Casavant et al., 1998; Dewannieux and Heidmann, 2005; Veniaminova et al., 2007b; Venner et al., 2009). For example, in the rice rat, Oryzomys palustris, loss of L1 activity is correlated with increased activity of the ERV MysTR (Cantrell et al., 2005), a pattern that is congruent in all the sigmodontines (Erickson et al., 2011). Such competition, however, fails to explain the recent suppression of L1 activity in S. tridecemlineatus.
Instead, our data suggest that there has been a reduction in TE activity in general and we detect no evidence of increased activity from a competing family of elements.

Finally, host defense mechanisms are powerful factors in limiting or decreasing TE activity in the genome. These mechanisms include increased CpG methylation (Xing et al., 2004; Yoder et al., 1997), epigenetic suppression (Slotkin and Martienssen, 2007), and small RNA silencing (Aravin et al., 2007). It has been suggested that increased CpG methylation evolved as a defense against genomic

Fig. 3. A general phylogeny of mammals compiled from several sources (McCormack et al., in press; Perelman et al., 2011; Steppan et al., 2004; Teeling et al., 2005). Currently, two L1 extinction (marked with an “X”) and quiescence (marked with a “/”) events have been convincingly demonstrated in mammals. Our analysis indicates that a similar event (“!”) is ongoing in the S. tridecemlineatus genome.


invaders (Xing et al., 2004; Yoder et al., 1997). Indeed, some taxa with high levels of TE activity exhibit high CpG mutation rates (~6× in humans, Xing et al., 2004; ~8× in bats, Ray et al., 2008). CpG mutations were only slightly elevated in S. tridecemlineatus TEs (2.1×), and this could be interpreted in two ways. First, CpG mutation rates may have been much higher in the past, when TEs were more active. Under this scenario, CpG mutations reduced TE activity at that time, and the mutation rate is now returning to a lower equilibrium level. Alternatively, it is possible that CpG mutation rates never deviated greatly from their current levels. If increased CpG mutation rates played a role in the reduction of TE activity in S. tridecemlineatus, then it follows that the CpG mutation rate in the past must have been much higher than current levels, a hypothesis that remains to be tested.

Another interesting hypothesis involves the presence of Piwi RNA processing genes (PIWIL1, PIWIL2, PIWIL3, PIWIL4) and may be worthy of further study. piRNAs, the small RNA partners of Piwi homologs, are known to influence TE activity in mammals (Aravin and Bourc'his, 2008; O'Donnell and Boeke, 2007; Seto et al., 2007). Pteropodid bats and S. tridecemlineatus both contain four Piwi genes and have experienced drastic reductions in TE activity. By contrast, Rattus, Mus, and several other mammals lack PIWIL3 while harboring active retrotransposons (F. Hoffmann, unpublished data). Even more striking, Myotis lucifugus lacks two Piwi homologs, PIWIL1 and PIWIL3 (F. Hoffmann, pers. comm.), and exhibits DNA transposon activity unprecedented in any mammalian genome investigated to date (Pritham and Feschotte, 2007; Ray et al., 2008). Could a more efficient piRNA system have driven the reduction of TE activity in S. tridecemlineatus?

In conclusion, the reduction/extinction of L1 activity in S. tridecemlineatus cannot be confidently attributed to any single mechanism described above. Indeed, it would be overly simplistic to attribute this phenomenon to a single factor. The complex interactions between TEs and host genomes have undoubtedly led to the diverse array of TE activity and distributions observed. TE diversity and quantity vary not only among species (Rebuzzini et al., 2009; Volff et al., 2000) but also among individuals (Witherspoon et al., 2010) and over time (Filatov et al., 2008; Khan et al., 2006), yet few mammals have experienced as drastic a reduction in L1 activity as S. tridecemlineatus. Our analysis revealed only three L1 insertions with an intact ORF1. No intact ORF2 sequences were recovered. By comparison, mouse has confirmed L1 activity and is estimated to contain 2,400 (Zemojtel et al., 2007) to 3,000 (DeBerardinis et al., 1998) active L1 copies. Regardless of the assumptions made, it is apparent that the S. tridecemlineatus genome contains at most a fraction of the L1 activity found in other rodents. With this addition to the list of mammals having experienced such a significant reduction in L1 activity, we are now better equipped to identify factors contributing to TE suppression in mammals.

Supplementary materials related to this article can be found online at doi:10.1016/j.gene.2012.03.051.
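The CpG comparison in the Discussion (a 2.1× elevation in S. tridecemlineatus TEs versus ~6× in humans) rests on contrasting substitution rates at CpG and non-CpG sites. The Python sketch below illustrates one way such a ratio could be computed; it is not the procedure used in this study. It assumes that each TE copy has already been aligned, column for column, to an ungapped family consensus. The function name `cpg_rate_ratio` and the toy sequences are hypothetical.

```python
def cpg_rate_ratio(consensus, copies):
    """Return (cpg_rate, non_cpg_rate, ratio) of per-site substitution rates.

    Assumption: every copy has the same length as the consensus and is
    aligned to it position by position ('-' marks a gap in a copy).
    """
    consensus = consensus.upper()

    # Flag every consensus position that belongs to a CG dinucleotide.
    in_cpg = [False] * len(consensus)
    for i in range(len(consensus) - 1):
        if consensus[i:i + 2] == "CG":
            in_cpg[i] = in_cpg[i + 1] = True

    cpg_sites = cpg_subs = other_sites = other_subs = 0
    for copy in copies:
        copy = copy.upper()
        if len(copy) != len(consensus):
            raise ValueError("each copy must be aligned to the consensus")
        for pos, (ref, obs) in enumerate(zip(consensus, copy)):
            if ref not in "ACGT" or obs not in "ACGT":
                continue  # skip gaps and ambiguous bases
            if in_cpg[pos]:
                cpg_sites += 1
                cpg_subs += ref != obs
            else:
                other_sites += 1
                other_subs += ref != obs

    if cpg_sites == 0 or other_sites == 0 or other_subs == 0:
        raise ValueError("not enough data to form a rate ratio")

    cpg_rate = cpg_subs / cpg_sites
    other_rate = other_subs / other_sites
    return cpg_rate, other_rate, cpg_rate / other_rate


# Toy data (hypothetical): two CpG-site changes and one non-CpG change.
toy_consensus = "ACGTACGTAA"
toy_copies = ["ATGTACGTAA", "ACGTACATAA", "ACGTACGTAT"]
cpg, other, ratio = cpg_rate_ratio(toy_consensus, toy_copies)
print(f"CpG rate {cpg:.3f}, non-CpG rate {other:.3f}, ratio {ratio:.1f}x")
```

A ratio near 1 would indicate no excess of CpG decay, whereas larger values correspond to the elevated CpG mutation rates reported for taxa with high TE activity. A real analysis would also need to account for multiple hits and for the strand and context of each substitution, which this sketch ignores.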

Acknowledgments

We thank the Broad Institute Genome Sequencing Platform and Genome Sequencing and Analysis Program, F. Di Palma and Kerstin Lindblad-Toh for making the early assembly of S. tridecemlineatus available. This work was supported by the National Science Foundation [MCB-0841821 and DEB-1020865]. Additional support was provided by the Institute for Genomics, Biocomputing and Biotechnology, the College of Agriculture and Life Sciences (both at Mississippi State University) and the Eberly College of Arts and Sciences at West Virginia University. Constructive criticism of the manuscript was provided by F.G. Hoffmann, A. Chong, M.P. Ramakodi, M.W. Vandewege, J.D. Smith, A. Sharma, C. Thompson, C. Lavoie, and anonymous reviewers. S. DiFazio provided computational support.

References

Akagi, K., Li, J., Stephens, R.M., Volfovsky, N., Symer, D.E., 2008. Extensive variation between inbred mouse strains due to endogenous L1 retrotransposition. Genome Res. 18, 869–880.
Altschul, S., et al., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Aravin, A.A., Bourc'his, D., 2008. Small RNA guides for de novo DNA methylation in mammalian germ cells. Genes Dev. 22, 970–975.
Aravin, A.A., Hannon, G.J., Brennecke, J., 2007. The Piwi-piRNA pathway provides an adaptive defense in the transposon arms race. Science 318, 761–764.
Babushok, D.V., Ostertag, E.M., Kazazian Jr., H.H., 2007. Current topics in genome evolution: molecular mechanisms of new gene formation. Cell. Mol. Life Sci. 64, 542–554.
Bailey, J.A., Carrel, L., Chakravarti, A., Eichler, E.E., 2000. Molecular evidence for a relationship between LINE-1 elements and X chromosome inactivation: the Lyon repeat hypothesis. Proc. Natl. Acad. Sci. U. S. A. 97, 6634–6639.
Batzer, M.A., Deininger, P.L., 2002. Alu repeats and human genomic diversity. Nat. Rev. Genet. 3, 370–379.
Bininda-Emonds, O.R.P., et al., 2007. The delayed rise of present-day mammals. Nature 446, 507–512.
Birney, E., et al., 2004. An overview of Ensembl. Genome Res. 14, 925–928.
Boissinot, S., Roos, C., Furano, A.V., 2004. Different rates of LINE-1 (L1) retrotransposon amplification and evolution in New World monkeys. J. Mol. Evol. 58, 122–130.
Brookfield, J.F.Y., 2005. The ecology of the genome — mobile DNA elements and their hosts. Nat. Rev. Genet. 6, 128–136.
Burns, K., Boeke, J., 2008. Great exaptations. J. Biol. 7, 5.
Cabot, E., Angeletti, B., Usdin, K., Furano, A., 1997. Rapid evolution of a young L1 (LINE-1) clade in recently speciated Rattus taxa. J. Mol. Evol. 45, 412–423.
Caceres, M., Ranz, J.M., Barbadilla, A., Long, M., Ruiz, A., 1999. Generation of a widespread Drosophila inversion by a transposable element. Science 285, 415–418.
Cantrell, M.A., Ederer, M.M., Erickson, I.K., Swier, V.J., Baker, R.J., Wichman, H.A., 2005. MysTR: an endogenous retrovirus family in mammals that is undergoing recent amplifications to unprecedented copy numbers. J. Virol. 79, 14698–14707.
Cantrell, M.A., Scott, L., Brown, C.J., Martinez, A.R., Wichman, H.A., 2008. Loss of LINE-1 activity in the megabats. Genetics 178, 393–404.
Casavant, N.C., Lee, R.N., Sherman, A.N., Wichman, H.A., 1998. Molecular evolution of two lineages of L1 (LINE-1) retrotransposons in the California mouse, Peromyscus californicus. Genetics 150, 345–357.
Casavant, N.C., Scott, L., Cantrell, M.A., Wiggins, L.E., Baker, R.J., Wichman, H.A., 2000. The end of the LINE?: lack of recent L1 activity in a group of South American rodents. Genetics 154, 1809–1817.
Charlesworth, B., Sniegowski, P., Stephan, W., 1994. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371, 215–220.
Churakov, G., Sadasivuni, M.K., Rosenbloom, K.R., Huchon, D., Brosius, J., Schmitz, J., 2010. Rodent evolution: back to the root. Mol. Biol. Evol. 27, 1315–1326.
Cordaux, R., Udit, S., Batzer, M.A., Feschotte, C., 2006. Birth of a chimeric primate gene by capture of the transposase gene from a mobile element. Proc. Natl. Acad. Sci. U. S. A. 103, 8101–8106.
DeBerardinis, R.J., Goodier, J.L., Ostertag, E.M., Kazazian, H.H., 1998. Rapid amplification of a retrotransposon subfamily is evolving the mouse genome. Nat. Genet. 20, 288–290.
Deininger, P.L., Batzer, M.A., Hutchison III, C.A., Edgell, M.H., 1992. Master genes in mammalian repetitive DNA amplification. Trends Genet. 8, 307–311.
Deininger, P.L., Moran, J.V., Batzer, M.A., Kazazian Jr., H.H., 2003. Mobile elements and mammalian genome evolution. Curr. Opin. Genet. Dev. 13, 651–658.
Dewannieux, M., Heidmann, T., 2005. L1-mediated retrotransposition of murine B1 and B2 SINEs recapitulated in cultured cells. J. Mol. Biol. 349, 241–247.
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797.
Eichler, E.E., Sankoff, D., 2003. Structural dynamics of eukaryotic chromosome evolution. Science 301, 793–797.
Erickson, I.K., Cantrell, M.A., Scott, L., Wichman, H.A., 2011. Retrofitting the genome: L1 extinction follows endogenous retroviral expansion in a group of muroid rodents. J. Virol. 85, 12315–12323.
Feng, Q., Moran, J.V., Kazazian Jr., H.H., Boeke, J.D., 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905–916.
Feschotte, C., 2008. Transposable elements and the evolution of regulatory networks. Nat. Rev. Genet. 9, 397–405.
Filatov, D.A., Howell, E.C., Groutides, C., Armstrong, S.J., 2008. Recent spread of a retrotransposon in the Silene latifolia genome, apart from the Y chromosome. Genetics 181, 811–817.
Gibbs, R.A., et al., 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428, 493–521.
Giordano, J., Ge, Y., Gelfand, Y., Abrusán, G., Benson, G., Warburton, P.E., 2007. Evolutionary history of mammalian transposons determined by genome-wide defragmentation. PLoS Comput. Biol. 3, e137.
Grahn, R.A., Rinehart, T.A., Cantrell, M.A., Wichman, H.A., 2005. Extinction of LINE-1 activity coincident with a major mammalian radiation in rodents. Cytogenet. Genome Res. 110, 407–415.
Gray, Y.H., 2000. It takes two transposons to tango: transposable-element-mediated chromosomal rearrangements. Trends Genet. 16, 461–468.
Hardies, S.C., Wang, L., Zhou, L., Zhao, Y., Casavant, N.C., Huang, S., 2000. LINE-1 (L1) lineages in the mouse. Mol. Biol. Evol. 17, 616–628.
Hasler, J., Samuelsson, T., Strub, K., 2007. Useful ‘junk’: Alu RNAs in the human transcriptome. Cell. Mol. Life Sci. 64, 1793–1800.
Helgen, K.M., Cole, F.R., Helgen, L.E., Wilson, D.E., 2009. Generic revision in the Holarctic ground squirrel genus Spermophilus. J. Mammal. 90, 270–305.
Herron, M.D., Castoe, T.A., Parkinson, C.L., 2004. Sciurid phylogeny and the paraphyly of Holarctic ground squirrels (Spermophilus). Mol. Phylogenet. Evol. 31, 1015–1030.
Horner, D., Lefkimmiatis, K., Reyes, A., Gissi, C., Saccone, C., Pesole, G., 2007. Phylogenetic analyses of complete mitochondrial genome sequences suggest a basal divergence of the enigmatic rodent Anomalurus. BMC Evol. Biol. 7, 16.
Huchon, D., et al., 2007. Multiple molecular evidences for a living mammalian fossil. Proc. Natl. Acad. Sci. U. S. A. 104, 7495–7499.
Ivics, Z., Hackett, P.B., Plasterk, R.H., Izsvák, Z., 1997. Molecular reconstruction of Sleeping Beauty, a Tc1-like transposon from fish, and its transposition in human cells. Cell 91, 501–510.
Jurka, J., 1995. Origin and evolution of Alu repetitive elements. In: Maraia, R.J. (Ed.), Impact of Short Interspersed Elements (SINEs) on the Host Genome. Landes Company, Austin, TX, pp. 25–41.
Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., Walichiewicz, J., 2005. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467.
Kajikawa, M., Okada, N., 2002. LINEs mobilize SINEs in the eel through a shared 3′ sequence. Cell 111, 433–444.
Kazazian Jr., H.H., 2004. Mobile elements: drivers of genome evolution. Science 303, 1626–1632.
Khan, H., Smit, A., Boissinot, S., 2006. Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates. Genome Res. 16, 78–87.
Kimura, M., 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16, 111–120.
Kohany, O., Gentles, A., Hankus, L., Jurka, J., 2006. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics 25.
Kumar, S., Subramanian, S., 2002. Mutation rates in mammalian genomes. Proc. Natl. Acad. Sci. U. S. A. 99, 803–808.
Lander, E.S., et al., 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Le Rouzic, A., Boutin, T.S., Capy, P., 2007. Long-term evolution of transposable elements. Proc. Natl. Acad. Sci. U. S. A. 104, 19375–19380.
Lim, J.K., Simmons, M.J., 1994. Gross chromosome rearrangements mediated by transposable elements in Drosophila melanogaster. Bioessays 16, 269–275.
Lindblad-Toh, K., et al., 2011. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482.
Lonnig, W.E., Saedler, H., 2002. Chromosome rearrangements and transposable elements. Annu. Rev. Genet. 36, 389–410.
Lyon, M.F., 1998. X-chromosome inactivation: a repeat hypothesis. Cytogenet. Genome Res. 80, 133–137.
Malik, H.S., Burke, W.D., Eickbush, T.H., 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16, 793–805.
Martin, S.L., Bushman, F.D., 2001. Nucleic acid chaperone activity of the ORF1 protein from the mouse LINE-1 retrotransposon. Mol. Cell. Biol. 21, 467–475.
Mathias, S., Scott, A., Kazazian, H., Boeke, J., Gabriel, A., 1991. Reverse transcriptase encoded by a human transposable element. Science 254, 1808–1810.
Mathiopoulos, K.D., della Torre, A., Predazzi, V., Petrarca, V., Coluzzi, M., 1998. Cloning of inversion breakpoints in the Anopheles gambiae complex traces a transposable element at the inversion junction. Proc. Natl. Acad. Sci. U. S. A. 95, 12444–12449.
Matlik, K., Redik, K., Speek, M., 2006. L1 antisense promoter drives tissue-specific transcription of human genes. J. Biomed. Biotechnol. 2006, 71753.
McCormack, J.E., Faircloth, B.C., Crawford, N.G., Gowaty, P.A., Brumfield, R.T., Glenn, T.C., in press. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species tree analysis. Genome Res. doi:10.1101/gr.125864.111
Mikkelsen, T.S., et al., 2007. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447, 167–177.
Morrish, T.A., et al., 2002. DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat. Genet. 31, 159–165.
Murphy, W.J., Pringle, T.H., Crider, T.A., Springer, M.S., Miller, W., 2007. Using genomic data to unravel the root of the placental mammal phylogeny. Genome Res. 17, 413–421.
Nigumann, P., Redik, K., Matlik, K., Speek, M., 2002. Many human genes are transcribed from the antisense promoter of L1 retrotransposon. Genomics 79, 628–634.
O'Donnell, K.A., Boeke, J.D., 2007. Mighty Piwis defend the germline against genome intruders. Cell 129, 37–44.
Pace, J.K., Gilbert, C., Clark, M.S., Feschotte, C., 2008. Repeated horizontal transfer of a DNA transposon in mammals and other tetrapods. Proc. Natl. Acad. Sci. U. S. A. 105, 17023–17028.
Pagan, H.J., Smith, J.D., Hubley, R.M., Ray, D.A., 2010. PiggyBac-ing on a primate genome: novel elements, recent activity and horizontal transfer. Genome Biol. Evol. 2, 293–303.
Peaston, A.E., et al., 2004. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev. Cell 7, 597–606.
Perelman, P., et al., 2011. A molecular phylogeny of living primates. PLoS Genet. 7, e1001342.
Petrov, D.A., 2001. Evolution of genome size: new approaches to an old problem. Trends Genet. 17, 23–28.
Price, A.L., Jones, N.C., Pevzner, P.A., 2005. De novo identification of repeat families in large genomes. Bioinformatics 21, i351–i358.
Pritham, E.J., Feschotte, C., 2007. Massive amplification of rolling-circle transposons in the lineage of the bat Myotis lucifugus. Proc. Natl. Acad. Sci. U. S. A. 104, 1895–1900.
Ray, D.A., et al., 2008. Multiple waves of recent DNA transposon activity in the bat, Myotis lucifugus. Genome Res. 18, 717–728.
Rebuzzini, P., et al., 2009. Quantitative variation of LINE-1 sequences in five species and three subspecies of the subgenus Mus; and in five Robertsonian races of Mus musculus domesticus. Chromosome Res. 17, 65–76.
Rinehart, T.A., Grahn, R.A., Wichman, H.A., 2005. SINE extinction preceded LINE extinction in sigmodontine rodents: implications for retrotranspositional dynamics and mechanisms. Cytogenet. Genome Res. 110, 416–425.
Sánchez-Gracia, A., Maside, X., Charlesworth, B., 2005. High rate of horizontal transfer of transposable elements in Drosophila. Trends Genet. 21, 200–203.
Saxton, J.A., Martin, S.L., 1998. Recombination between subtypes creates a mosaic lineage of LINE-1 that is expressed and actively retrotransposing in the mouse genome. J. Mol. Biol. 280, 611–622.
Scott, L.A., Kuroiwa, A., Matsuda, Y., Wichman, H.A., 2006. X accumulation of LINE-1 retrotransposons in Tokudaia osimensis, a spiny rat with the karyotype XO. Cytogenet. Genome Res. 112, 261–269.
Seto, A.G., Kingston, R.E., Lau, N.C., 2007. The coming of age for Piwi proteins. Mol. Cell 26, 603–609.
Slotkin, R.K., Martienssen, R., 2007. Transposable elements and the epigenetic regulation of the genome. Nat. Rev. Genet. 8, 272–285.
Smit, A.F.A., Hubley, R., Green, P., 2004. RepeatMasker Open-3.0.
Speek, M., 2001. Antisense promoter of human L1 retrotransposon drives transcription of adjacent cellular genes. Mol. Cell. Biol. 21, 1973–1985.
Steppan, S.J., et al., 1999. Molecular phylogeny of the marmots (Rodentia: Sciuridae): tests of evolutionary and biogeographic hypotheses. Syst. Biol. 48, 715–734.
Steppan, S.J., Storz, B.L., Hoffmann, R.S., 2004. Nuclear DNA phylogeny of the squirrels (Mammalia: Rodentia) and the evolution of arboreality from c-myc and RAG1. Mol. Phylogenet. Evol. 30, 703–719.
Swergold, G.D., 1990. Identification, characterization, and cell specificity of a human LINE-1 promoter. Mol. Cell. Biol. 10, 6718–6729.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S., 2011. MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739.
Teeling, E.C., Springer, M.S., Madsen, O., Bates, P., O'Brien, S.J., Murphy, W.J., 2005. A molecular phylogeny for bats illuminates biogeography and the fossil record. Science 307, 580–584.
Thomas, J., Sorourian, M., Ray, D., Baker, R.J., Pritham, E.J., 2011. The limited distribution of Helitrons to vesper bats supports horizontal transfer. Gene 474, 52–58.
Veniaminova, N., Gogolevsky, K., Vassetzky, N., Kramerov, D., 2007a. Comparative analysis of the copy number of ID and B1 short retroposons in rodent genomes. Mol. Biol. 41, 986–989.
Veniaminova, N.A., Vassetzky, N.S., Kramerov, D.A., 2007b. B1 SINEs in different rodent families. Genomics 89, 678–686.
Venner, S., Feschotte, C., Biémont, C., 2009. Dynamics of transposable elements: towards a community ecology of the genome. Trends Genet. 25, 317–323.
Volff, J.-N., Korting, C., Schartl, M., 2000. Multiple lineages of the non-LTR retrotransposon Rex1 with varying success in invading fish genomes. Mol. Biol. Evol. 17, 1673–1684.
Waterston, R.H., et al., 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562.
Weil, C.F., Wessler, S.R., 1993. Molecular evidence that chromosome breakage by Ds elements is caused by aberrant transposition. Plant Cell 5, 515–522.
Wicker, T., et al., 2007. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982.
Wilson, D.E., Reeder, D.M., 2005. Mammal species of the world. A Taxonomic and Geographic Reference. Johns Hopkins University Press. 3.
Witherspoon, D., Xing, J., Zhang, Y., Watkins, W.S., Batzer, M., Jorde, L., 2010. Mobile element scanning (ME-Scan) by targeted high-throughput sequencing. BMC Genomics 11, 410.
Xing, J., Hedges, D.J., Han, K., Wang, H., Cordaux, R., Batzer, M.A., 2004. Alu element mutation spectra: molecular clocks and the effect of DNA methylation. J. Mol. Biol. 344, 675–682.
Yoder, J.A., Walsh, C.P., Bestor, T.H., 1997. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 13, 335–340.
Zemojtel, T., Penzkofer, T., Schultz, J., Dandekar, T., Badge, R., Vingron, M., 2007. Exonization of active mouse L1s: a driver of transcriptome evolution? BMC Genomics 8, 392.
Zhang, J., Peterson, T., 2004. Transposition of reversed Ac element ends generates chromosome rearrangements in maize. Genetics 167, 1929–1937.
Zhao, F., Qi, J., Schuster, S.C., 2009. Tracking the past: interspersed repeats in an extinct Afrotherian mammal, Mammuthus primigenius. Genome Res. 19, 1384–1392.