Transcriptional regulation and transpositional selection of active SINE sequences
Carl Schmid and Richard Maraia
University of California, Davis, California and National Institute of Child Health and Human Development, NIH, Bethesda, Maryland, USA
Alu repeats are short interspersed elements whose transposition has led to genetic variability and heritable disorders in humans. Only a select subset of the nearly one million Alu sequences in human DNA actually produces new transpositions. The evolution of newly inserted Alu repeats is currently a key subject for study. Mechanisms of RNA polymerase III activity and the sequence environment into which an Alu inserts might select for transcriptional and posttranscriptional determinants of Alu transposition.
Current Opinion in Genetics and Development 1992, 2:874-882
Introduction
Alu repeats are small transposed elements endogenous to the human genome. Nearly 10^6 dispersed copies constitute about 5% of the genetic material. Although their function is unknown, their propensity for mobility has contributed to genetic variability and caused heritable disorders in humans. Human Alu and mouse B1 are homologous short interspersed elements (SINEs) that have each been classified into subfamilies of different evolutionary age. Old Alus are fixed in the genome and have accumulated substantial sequence diversity, while young Alus are currently mobile and show little sequence divergence. Subfamilies result from distinct source genes that appear to be encoded by a limited number of dispersed family members. Here, we first review structural features of SINEs and the hypothetical model of their retrotransposition, which involves transcription by RNA polymerase III (pol III). Next we focus on the evolution of Alu and B1 elements, which conform to a succession of new subfamilies as opposed to a continual expansion by old sequences. We also examine properties of the young subfamilies that contain transpositionally active members, and address two alternative hypotheses regarding the possible number and types of active genes that encode new transpositions. We then discuss pol III-mediated transcription of Alu and B1. Finally, we speculate on how certain new SINEs might insert into regions of the genome that could influence their ability to produce subsequent retrotransposons. Specifically, we discuss mechanisms that enhance transcription through local chromatin structure, hypomethylation, and the acquisition of 5' promoters and 3' terminators that may also affect RNA folding, self-priming, posttranscriptional processing and poly(A) metabolism.
Retrotranspositional potential of RNA pol III directed SINE transcripts
The nearly one million interspersed Alu repeats share a consensus structure that strongly suggests they retrotransposed through an RNA intermediary [1,2]. They contain 3' A-rich tails and are flanked by direct repeats. Also, their precisely defined 5' end corresponds to the start site for RNA pol III. Thus, it appears reasonably certain that Alu repeats are retrotransposed pol III transcripts, and here we consider only that possibility. Pol III transcribes Alu repeats in vitro by virtue of their internal promoters, which upon retro-insertion might generate additional progeny, leading to exponential population growth [3,4]. (Fortunately, as discussed below, this potential is not fully realized.) Pol III terminates transcription in a run of Ts, resulting in a transcript that ends with oligo(U) [3,4]. Conceivably, the terminal oligo(U) tract primes reverse transcriptase by base pairing with the A-rich tail. These sequence features prompted an ingenious model to account for the abundance of SINEs [3,4] (Fig. 1).
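The termination and self-priming features described above can be illustrated with a minimal Python sketch. It assumes, as a simplification, that pol III reads through the whole first run of four or more T residues in the 3' flank, and it treats the presence of a terminal U run plus an internal A run as the only precondition for self-priming. The function names and the toy sequences are hypothetical and are not taken from any real Alu locus.

```python
import re

def find_pol3_terminator(downstream_dna, min_t=4):
    """Return the index just past the first run of >= min_t T residues, or None."""
    match = re.search("T{%d,}" % min_t, downstream_dna.upper())
    return match.end() if match else None

def primary_transcript(sine_dna, downstream_dna, min_t=4):
    """Toy primary transcript: SINE body plus 3' flank up to the end of the T run."""
    end = find_pol3_terminator(downstream_dna, min_t)
    if end is None:
        return None  # no terminator found in the supplied flank
    dna = (sine_dna + downstream_dna[:end]).upper()
    return dna.replace("T", "U")

def terminal_u_run(rna):
    """Length of the uninterrupted U run at the 3' end of the transcript."""
    return len(rna) - len(rna.rstrip("U"))

def longest_a_run(rna):
    """Length of the longest uninterrupted A run anywhere in the transcript."""
    runs = re.findall("A+", rna)
    return max((len(r) for r in runs), default=0)

# Hypothetical element: a short core followed by a pure A tail, then a toy flank.
sine = "GGCCGGGCGCGGTGGCTCACGCC" + "A" * 12
flank = "GCTAGCATTTTTGGC"
rna = primary_transcript(sine, flank)
print(rna, terminal_u_run(rna), longest_a_run(rna))
```

In this sketch the flank, not the element itself, supplies the terminator, so the same element placed in a different flank would yield a transcript with a different 3' end.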
Successive evolution of Alu subfamilies
While this review emphasizes the transcription and transposition of Alu and B1 repeats, they are not especially active by either criterion. Most Alu repeats in primates were fixed in the ancestral genome and have increased in number by perhaps 0.2% since the divergence of ape and human lineages [5,6••]. Also, stable Alu and B1
Abbreviations IAP--intracisternal A-particle; LINE--long interspersed element; mc--5-methyl cytosine; pol III--polymerase III; RT--reverse transcriptase; SINE--short interspersed element.
© Current Biology Ltd ISSN 0959-437X
SINE sequences Schmid, Maraia
[Figure 1: schematic diagram; the drawing did not survive extraction. Recoverable labels: DNA with A and B boxes, Poly(A) and flanking direct repeats (DR); transcription by RNA pol III; RNA with 5' end, A-rich tract and 3' oligo(U) end; differential folding; differential 3' processing; poly(A) addition?; RT priming?; fully processed cytoplasmic RNA; differential reverse transcription; new insertion; site duplication; subsequent rounds of retrotransposition.]
Fig. 1. Hypothetical retrotransposition cycle of a consensus SINE. SINEs are flanked by short direct repeats (DR), which vary in base sequence among individual SINEs and represent a duplication of the SINE insertion site (reviewed in [1,2]). An A-rich tract, Poly(A), which is pure A in young SINEs (see text), precedes the 3' DR. The A and B boxes (diagonal hatching) of the internal promoter direct RNA pol III to initiate transcription at the first base of the SINE (bent arrow) and proceed through the 3' DR until it terminates at a stretch of four or more dT residues. This produces a primary transcript containing an oligo(U) terminus that may base pair with the A-rich tract to form an intramolecular primer for reverse transcriptase [3,4]. Adjacent G-C base pairs, which would be contributed by the variable insertion sites of different SINEs, could differentially stabilize the A-U self primer. Phylogenetic sequence/structure data indicate that only those RNAs that can fold into a conserved tRNA-like cruciform structure are successfully retrotransposed [54,56••]. Alternatively, the primary transcript may undergo 3' processing that removes the A-rich and oligo(U) tracts from the SINE core sequence [51]. According to the above model, such processed transcripts would no longer 'self-prime' reverse transcriptase (RT) and would enter the retrotransposition pathway (if at all) by less direct mechanisms. Once reverse copied, the SINE DNA may insert into an 'active' genomic locus and might be capable of generating subsequent pol III mediated retrotranspositions. Alu and B1 repeats are specific examples of mammalian SINEs; in other species, SINEs include non-homologous sequences that nonetheless share the consensus SINE retrotransposon structure.
transcripts are present at low copy number in tissue-culture cells, indicating that the majority are transcriptionally inactive [7,8,9••,10]. These properties contrast with those of young B1 and Alu subfamilies.
Young and younger subfamilies
Five laboratories classified Alu repeats into distinct subfamilies, and identified a putatively young subfamily (called Precise here), which comprises approximately 10% of Alu repeats in human DNA (Fig. 2) [11-15]. B1s have also been classified into subfamilies of different evolutionary age [16].
These advances led multiple research groups to the conclusion that Alu and B1 evolution occurred through a succession of subfamilies rather than by continual exponential expansion. Alu classification was further extended by realizing that two Alus that are dimorphic (present at a particular locus in some individuals but absent at the same locus in others) in the human population share five mutations relative to the Precise consensus (Fig. 2). This younger subfamily is called PV here and HS in the Deininger laboratory publications [8,17-19]. Oligonucleotide hybridization experiments targeting two of the five 'diagnostic' PV bases demonstrated first, that the five diagnostic mutations were tightly linked in individual PV Alus and second, that they have expanded to about one thousand members during the radiation of the human population following the divergence of the gorilla and chimpanzee lineages (Figs. 2 and 3) [8,17,19,20••]. Unlike most Alus, a substantial fraction (~10%) of individual PV Alus are dimorphic at a particular locus (Fig. 3), and all demonstrate very low sequence divergence from their consensus (Fig. 2). These characteristics further indicate their youth. Recently, a germline Alu insertion into an individual's neurofibromatosis gene, NF1, demonstrated that some Alu repeats are presently mobile [21•]. The sequence
Genomes and evolution
[Figure 2: sequence alignment over positions 1 to 280; the aligned rows did not survive extraction. Row labels: PRECISE Cons, Mlvi, PV 92, APO, TPA, PV 71 (truncated), PV 83, NF1, PV Cons, C1 INH, CHE, AFP.]
Fig. 2. The human-specific Precise and PV subfamily Alus. Mlvi, PV 92, APO, TPA, NF1, C1 INH and CHE Alus are young by the criteria that they are not fixed in the human population and are absent from the ape genome [21•,23] (reviewed in [17]). Of the remaining three, which are fixed in the human population, PV 83 and AFP are each absent from the ape genome [17,61], while the phylogeny of PV 71 is unresolved [17]. The human-specific Alus belong to either of two groups, the PV or Precise subfamilies, as distinguished by five tightly linked mutations. Three mutations shared by the TPA and PV 71 Alus identify a minor variant of the PV subfamily. Dots represent a base identical to a consensus, a letter indicates a different base, brackets indicate a length mutation in an A run, and the number 9 indicates an insertion of nine nucleotides. To enhance contrast, the seven PV Alus are compared to the Precise subfamily consensus, whereas the three Precise Alus are compared to the PV consensus.
of this youngest Alu differs from the PV consensus by only two nucleotides (Fig. 2). This remarkable sequence
agreement confirms that the PV subfamily was correctly identified.
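Comparing an aligned Alu against subfamily consensus sequences, as in Fig. 2, amounts to listing the positions at which it differs from each consensus. The following Python sketch shows that comparison under loud assumptions: the sequences are already aligned to equal length, the two 'consensus' strings are invented ten-base toys rather than the real Precise or PV consensus, and assignment by fewest mismatches is only one simple way such a classification could be made.

```python
def differences(seq, consensus):
    """List (1-based position, consensus base, sequence base) for each mismatch."""
    if len(seq) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, c, s)
            for i, (s, c) in enumerate(zip(seq, consensus))
            if s != c]

def assign_subfamily(seq, consensi):
    """Return the name of the consensus with the fewest mismatches to seq."""
    return min(consensi, key=lambda name: len(differences(seq, consensi[name])))

# Hypothetical toy consensus sequences differing at two 'diagnostic' positions.
toy_consensi = {"toy_A": "GGCCGGGCGC", "toy_B": "GGCTGGACGC"}
query = "GGCTGGACGT"  # carries both toy_B diagnostic bases plus one private change

print(assign_subfamily(query, toy_consensi))
print(differences(query, toy_consensi["toy_B"]))
```

A sequence that carries all diagnostic bases of one consensus and only a few private changes, like the toy query here, is the pattern expected of a recent insertion.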
[Figure 3: phylogenetic tree with gorilla, chimpanzee and three human branches; the drawing did not survive extraction. Recoverable labels: PV and Precise founders; AFP (Precise); Beta Hb (Precise); Chimp PV (PV maj); Fixed; PV maj and min; PV 83; CHE (Precise); C1 INH (Precise); Unfixed (dimorphic); PV 92 (PV maj); an illegible entry marked (PV min); NF1 (PV, label truncated).]
Fig. 3. Phylogenetic tree depicting the mobilization of young Alu repeats. The appearance of a sequence before a node indicates its inheritance in all ensuing descendants. The three branches for human schematically depict the three possibilities for any Alu repeat: fixed by virtue of preceding the population's radiation (e.g. AFP), dimorphic by virtue of insertion during the population's radiation (e.g. CHE), and loci that are fixed for the absence of an Alu (e.g. NF1) that is present in a single or few individuals. The Precise and PV subfamily Alu sequences are differentiated (also see Fig. 2 and Leeflang et al. [6••]). The majority of PV Alus are fixed in the human population, as are the majority of the older Precise subfamily members. However, the contemporary transpositional activity of the Precise subfamily approximates that of the PV subfamily; dimorphisms in the Precise subfamily are more difficult to detect against the background of many fixed members in a far larger subfamily. Similarly, the transpositional activity of the far larger and far older Major subfamily (not depicted here) might be considered an open question. Abbreviations include: PV, predicted variant; maj, major; min, minor.
Single or multiple founder genes?
Although older Alus differ by mutations that mostly accumulated after retroinsertion, the following suggests that the sequence diversity of young Alus in part reflects a diversity of source genes. A minority of PV Alus share three linked mutations forming a sub-subfamily (Fig. 2) [17,19]. This PV-minor subfamily presumably results from a source sequence(s) distinct from the source of major PVs. Additionally, two Precise subfamily members are dimorphic in the genome [6••,22,23], arguing that they also resulted from a contemporaneously active source sequence (Fig. 3). Analysis of other Alu subfamilies also indicates the existence of a multiplicity of founder sequences [24]. Thus, at least three distinct sequences served as sources during human evolution. As discussed below, these different source sequences could represent either multiple alleles of a single locus or different loci. New SINEs might be encoded by a single 'master gene' locus, that is, a conventional gene(s) [20••]. Alternatively, multiple dispersed SINEs might serve as founders subsequent to their insertion, as discussed below and in Fig. 1.
The example of human 7SL RNA genes and transposed pseudogenes provides strong precedent for the single locus model. Human 7SL pseudogenes outnumber 7SL genes by 100 to 1 [25]. Some of these adopted the consensus retrotransposon structure, as evidenced by their 3' A-rich tails, flanking direct repeats, internal pol III promoters, and 5' ends corresponding to pol III initiation. Apparently, like most Alus, these are transcriptionally quiescent in vivo. Unlike the 7SL pseudogenes, however, the authentic 7SL RNA gene is efficiently transcribed in vivo from an essential 5' promoter and contains a pol III terminator at its 3' end but not the A-rich tail and flanking direct repeats associated with dispersed 7SL pseudogenes [25,26]. The evolution of primate Alu subfamilies is easily reconciled with successive mutations of a single designated locus [20••]. Shen et al. attribute the evident diversity of source sequences to multiple alleles of a single 'master gene' locus. If these alleles themselves evolve over time, each could theoretically be the source of a succession of distinctive transpositions [27]. However, this interpretation requires an unprecedented level of allelic variation [6••]. Leeflang et al. observe a chimpanzee lineage-specific PV Alu in addition to a gorilla lineage-specific PV Alu (Fig. 3). This demonstrates that PV and Precise source sequences both predate the divergence of gorilla, chimpanzee and human lineages and were separately active in each divergent lineage. Because it is unlikely that multiple allelic variants of a single locus would survive the speciation bottleneck in three separate lineages [6••], we conclude that there are multiple source genes.
A transposed L1 member has been traced back to its apparent source, which is an interspersed element having an A-rich tail and flanking short direct repeats [28••]. Despite significant differences between Alu and L1 repeats, this example demonstrates the possibility of dispersed retrotransposons serving as founder genes and implies a multiplicity of possible sources (Fig. 1).
The diversity of pol III initiated B1 transcripts provides direct evidence for multiple transcriptional loci [9••]. The complexity of these transcripts represents a fraction of the entire family's complexity, as young subfamily transcripts are over-represented relative to other subfamilies [9••]. Similarly, pol III initiated cellular Alu transcripts have been mapped to multiple dispersed loci (R Maraia and G Darlington, unpublished data). Thus, regulatory mechanisms appear to preferentially select certain dispersed elements for transcriptional activity. Additional mechanisms may further select for the transpositionally active loci that found distinct subfamilies.
Regulated transpositional expression
Regulation in biological systems is usually accomplished at several levels, each of which affects the final result. The multistep retrotranspositional pathway provides numerous mechanisms to regulate the activity of potential source genes. We speculate here on steps that might select successful source genes from the background of the entire family.
Chromatin context
Because retrotransposition must occur in the germ-cell lineage to be inherited, cell type specific SINE transcription could limit source gene selection. It is now accepted that transcription is associated with active chromatin. Slagel and Deininger [29] showed that transcription of a transfected SINE is co-induced with a cis-linked reporter gene, supporting the notion that the transcriptional context of neighboring genes may also determine the activity of a SINE. Thus, one criterion for source gene selection might be residence near transcription units expressed in the germ-cell lineage.
Methylation
CpG dinucleotides are often methylated at the 5 position of cytosine. Such methylation is known to repress transcription from pol III dependent tRNA and VA1 RNA genes in vitro [30,31]. Human PV Alus are very rich in CpG dinucleotides, as are young B1s [13,15,16,32••]. Alu repeats in human spleen are extensively methylated [32••], while Alus in human sperm are highly undermethylated and their methylation is developmentally regulated (C Schmid, unpublished data). Also, small B1 RNA levels are high in testes [9••]. Thus, it is plausible that the methylation of SINEs provides a direct mechanism to differentially regulate in vivo transcription. This mechanism also appears to be used by long interspersed elements (LINEs). Activity of intracisternal A-particle (IAP) repeats is sensitive to methylation, as are certain L1 promoters [33,34]. A secondary consequence of 5-methyl cytosine (mC) modification is the rapid rate of mutation of CpG dinucleotides in Alu and B1 to TpG [34] by mC-mediated C to T transitions [13,15,16]. CpG dinucleotides reside within the internal promoters of B1 and Alu repeats [9••,35]. Conceivably, transcription becomes depressed as aging Alus lose these important cis-acting sequences to mutation [32••]. In agreement with this notion, Daniels and Deininger [36] have shown that of the three different SINE families present in the primate galago, the one with a CpG in its B box promoter (type II family) is the most (transpositionally) proliferative and the most transcriptionally active in vitro, as compared with the other two families (monomer and type I) that contain CpA and TpG, respectively, in this position. Furthermore, two old Alus fail to support in vitro transcription by pol III (C Schmid, unpublished data), as does a substitution involving the CpG within the B box promoter of an otherwise active B1 [37••]. In summary, CpG methylation may limit transcriptional and therefore transpositional competence and ultimately lead to the permanent silencing of old SINEs.
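The CpG richness of young elements and its erosion by C to T transitions can be quantified with the standard observed/expected CpG ratio, (number of CpG x sequence length) / (number of C x number of G). The Python sketch below computes that ratio and applies a deliberately crude decay step in which every CpG on the given strand becomes TpG; real decay is gradual, and deamination on the opposite strand would yield CpA instead. The function names and the eight-base test sequence are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # ratio undefined without both bases; report 0.0 in this sketch
    return seq.count("CG") * len(seq) / (c_count * g_count)

def deaminate_cpgs(seq):
    """Crude end point of decay: every CpG on this strand converted to TpG."""
    return seq.upper().replace("CG", "TG")

young = "ACGTCGGC"             # hypothetical CpG-rich toy sequence
old = deaminate_cpgs(young)    # the same sequence after complete CpG loss
print(cpg_obs_exp(young), old, cpg_obs_exp(old))
```

Applied to aligned subfamily members, a falling ratio of this kind would be one simple indicator of an element's age since insertion.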
Cis-acting promoter elements and trans-acting factors
SV40 transformation specifically increases pol III transcription of endogenous B1 and B2 sequences above basal levels ([10,38,39] and references cited therein). This suggests that trans-acting factors act through cis-acting signals that either lie within or flank these SINEs. In the former case, all accessible retrotransposed SINEs would be activated, whereas in the latter case, only a minority of elements that reside within range of a flanking promoter would be expressed. While the identity of the putative flanking elements necessary for in vivo expression is unknown, we can entertain the possibility of an Alu inserting near such an element. The promoters of the U6, 7SK and selenocysteyl-tRNA(Ser)Sec genes all include a simple TATA element, located approximately 25 bp upstream from transcription initiation, which is both necessary for pol III transcription and enhanced by other cis-acting elements [40-44]. In the case of the selenocysteyl-tRNA(Ser)Sec gene, the upstream promoters control transcription, while the internal A and B boxes are apparently idle [44]. Also, pol III mediated transcription of BmX, a silkworm SINE, is strongly dependent on flanking TATA box containing sequences and to a lesser extent the internal B box promoter [45]. This provides direct evidence that certain SINEs utilize flanking transcriptional control elements. Interestingly, a TATA element is positioned approximately 25 bp upstream from a young B1 that is transcribed in transfected cells [46]. Also, a chimpanzee PV Alu is preceded by a TATA-like element [6••]. Thus, by inserting near a short sequence motif that resembles a known promoter, certain new SINEs might acquire a transcriptional advantage over others that contain only a relatively inactive internal pol III promoter.
RNA processing and Poly(A) metabolism
Just as the posttranscriptional fate of an mRNA determines the level of gene expression, similar events could modulate the transpositional activity of SINEs. Individual SINEs terminate in a dA-rich sequence. Three considerations suggest that poly(A) tails are added to new transposons posttranscriptionally. First, the dA-rich tails of young B1 and Alus consist of pure stretches of about 40 dA residues, whereas older elements contain other bases [17,19,47-50]. A simple explanation of this observation is that the 3' end is formed by de novo polyadenylation prior to retroposition, and then diverges in situ into a dA-rich structure in older Alus. Second, the poly(dA) tail of a young L1 retrotransposon is longer than that of its corresponding L1 source gene [28••]. Although L1 repeats retrotranspose through pol II transcripts, this is nonetheless circumstantial evidence for poly(A) addition to an RNA intermediary in the retrotransposition pathway. Third, based on the preceding discussion concerning 7SL RNA pseudogenes, we conclude that their dA-rich tails initially resulted from posttranscriptional polyadenylation. Like 7SL, neither Alu nor B1 consensus sequences contain recognizable polyadenylation signals, so their polyadenylation must involve novel biochemistry. Alternatively, the poly(dA) tail might be encoded by the source gene. However, this would appear to require selection of source genes that contain only pure dA tails (i.e. young SINEs) and/or a correctional mechanism to maintain pure dA tails in source genes but not typical Alus. Pol III transcription of young Alu SINEs produces a primary transcript that contains a poly(A) tail (R Maraia and K Hsu, unpublished data). Until a bona fide source gene is identified and its transposition examined, poly(A) metabolism remains to be determined.
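The contrast between the pure dA tails of young elements and the interrupted dA-rich tails of older ones can be expressed as three simple numbers per tail: its length, the fraction of A residues, and the longest uninterrupted A run. The Python sketch below assumes the tail has already been delimited (the region between the SINE core and the 3' direct repeat); the function name and both example tails are hypothetical.

```python
import re

def tail_stats(tail):
    """Return (length, fraction of A residues, longest pure A run) for a tail."""
    tail = tail.upper()
    if not tail:
        return (0, 0.0, 0)
    runs = re.findall("A+", tail)
    longest = max((len(r) for r in runs), default=0)
    return (len(tail), tail.count("A") / len(tail), longest)

young_tail = "A" * 40                  # hypothetical pure tail of a young element
old_tail = "AAAAATAAAAGAAAACAAAA"      # hypothetical diverged, A-rich tail

print(tail_stats(young_tail))
print(tail_stats(old_tail))
```

Under the polyadenylation hypothesis discussed above, a tail would start out like the first example and drift toward the second as substitutions accumulate in situ.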
Certain young B1 transcripts generated in microinjected Xenopus nuclei and in mammalian nuclear extracts are precisely processed by an activity that removes their poly(A) and 3' oligo(U) tracts [51,52]. Most B1 primary transcripts are not processed, however, suggesting that sequence differences may radically alter processing [9••]. In fact, subtle base changes around the transcriptional terminator dramatically affect posttranscriptional processing [37••] and perhaps transcript half-life and nucleocytoplasmic transport as well. As the pol III termination signal is not contained within B1 and Alu elements (Fig. 1), the 3'-sequence environment into which they insert could determine the efficiency of processing. Presumably, a subset of Alu transcripts undergo processing similar to B1. Although some young Alu transcripts undergo 3' processing to generate a dimeric RNA that lacks poly(A) (R Maraia and K Hsu, unpublished data), the major pol III initiated cellular Alu RNA corresponds to 3'-processed transcripts of left Alu monomers [8,53••]. However, these 'fully processed' monomeric Alu transcripts appear not to enter the human retrotransposition pathway, which is apparently dominated by dimeric Alu transcripts. Recalling the early model for Alu retrotransposition (Fig. 1), 3' processing might drastically reduce the potential for retrotransposition [37••]. Alternatively, as discussed above, posttranscriptional polyadenylation might be a required step in retrotransposition, favoring the most readily processed transcripts. Both models are consistent with the observation that small cytoplasmic B1 RNAs correspond to processed transcripts of exclusively young sequences and are abundant in testes [9••]. Regardless of which alternative is correct, these findings illustrate that retrotranspositional potential might be determined by the base composition near the transcriptional terminator.
The influence of 3' flanking sequence on self-priming
Likewise, the 3'-sequence composition around the direct repeat and transcriptional terminator could influence the potential of any given SINE primary transcript for reverse transcription in the absence of processing. Sequences that could form G-C base pairs adjacent to the oligo(U)-poly(A) base pairs could significantly stabilize the latter, thereby increasing their effectiveness as a self-primer for reverse transcriptase (see Fig. 1). In summary, the insertion site for a SINE could determine the efficiencies of both transcriptional initiation and termination as well as self-priming, 3' processing, and poly(A) metabolism, and might impose advantages or limitations on its subsequent retrotranspositional potential.
Structure recognition and other posttranscriptional determinants of retrotranspositional potential
Specific structural motifs present in Alu and B1 RNA have been conserved throughout their evolution [9••,54,55••,56••]. The Alu RNA domain first identified in 7SL RNA is a complex secondary structure that was conserved in B1 and Alu during significant divergence in primary sequence [9••,54,55••,56••]. This argues that this specific structural motif in the RNAs facilitates retroposition [24,54,55••,56••]. Although it is too early to know if the transposition machinery interacts with this motif, its importance nonetheless imposes another level of source gene selection. A novel polypeptide was recently identified that binds specifically to small cytoplasmic B1 and Alu transcripts in vitro [53••]. Although enzymes such as DNA nucleases, polymerases, topoisomerases and ligases are all likely participants in retrotransposition, these are unlikely to be selective determinants of retrotransposition. In contrast, reverse transcriptase (RT) is a potentially limiting factor; presumably only those Alus that are expressed coordinately with RT could retrotranspose. RT might be supplied endogenously as a product of L1 or IAP repeats [33,57•], which appear to be the most likely mediators of SINE retrotransposition, or exogenously by retroviral infection. However, the recent identification of RT activity in the enzyme that synthesizes telomere repeats demonstrates that 'non-conventional' RTs with inherent template capacity also exist in cells [58]. Thus, it is a remote possibility that SINEs utilize such a non-conventional RT that could perhaps template-extend and/or edit their poly(A) tail prior to retroinsertion.
Conclusion
Currently, no experimental system reconstructs Alu or B1 retrotransposition. However, several steps identified here can be tested for their proposed regulatory effect on retrotransposition. Furthermore, the identification of source genes is becoming increasingly probable. According to the model entertained here, source genes may already have been isolated but not recognized as any more than dispersed family members. While significant progress in our understanding of Alu and B1 retrotransposition has been achieved, our appreciation of the complexity of this process is also increasing. Our discussion here has excluded topics such as the significance of mammalian SINE retrotransposition and their possible function. Only recently have polymorphic Alus been detected among the nearly one million fixed members, and yet there are already several examples of the genetic consequences associated with their retrotransposition [21•,22,23]. Similarly, the potential function(s) of Alu and B1 repeats has recently been reviewed [24,59,60], yet a clear resolution remains elusive. Our newfound ability to detect discrete cellular transcripts and their binding proteins, and to discriminate between new and old family members, will hopefully enable us to associate functions with these elements. The previous discussions concerning the differential methylation state and expression of new Alus illustrate such possibilities; others will soon emerge.
Acknowledgements Research on Alu repeats is supported by USPHS GM 21346 and the Department of Genetics, Agricultural Experiment Station, University of California, Davis. Our models and discussion were also motivated by the unpublished observations of FF Hintz, U Hellman-Blumberg, EP Leeflang, W-M Liu, K Hsu, and DY Chang and we thank them for their important contributions toward this study. We also thank G Humphrey, E Englander and B Howard for helpful discussions.
References and recommended reading Papers of particular interest, published within the annual period of review, have been highlighted as: • of special interest •• of outstanding interest
1.
SCHMID CW, SHEN CKJ: The Evolution of Interspersed Repetitive DNA Sequences in Mammals and Other Vertebrates. In Molecular Evolutionary Genetics. Edited by MacIntyre RJ. New York: Plenum Press; 1985:323-358.
2.
WEINER AM, DEININGER PL, EFSTRATIADIS A: Nonviral Retroposons: Genes, Pseudogenes, and Transposable Elements Generated by the Reverse Flow of Genetic Information. Annu Rev Biochem 1986, 55:631-662.
3.
JAGADEESWARAN P, FORGET BG, WEISSMAN SM: Short Interspersed Repetitive DNA Elements in Eucaryotes: Transposable DNA Elements Generated by Reverse Transcription of RNA Pol III Transcripts? Cell 1981, 26:141-142.
4.
VAN ARSDELL SW, DENISON RA, BERNSTEIN LB, WEINER AM: Direct Repeats Flank Three Small Nuclear RNA Pseudogenes in the Human Genome. Cell 1981, 26:11-17.
5.
SCHMID CW, DEKA N, MATERA AG: Repetitive Human DNA: the Shape of Things to Come. In Chromosomes: Eukaryotic, Prokaryotic and Viral. Edited by Adolph KW. Boca Raton: CRC Press; 1990:3-29.
6. ••
LEEFLANG EP, LIU W-M, HASHIMOTO C, CHOUDARY PV, SCHMID CW: Phylogenetic Evidence for Multiple Alu Source Genes. J Mol Evol 1992, 35:7-16. The PV Alu subfamily is shown to be present in great apes and is not human specific. The finding that distinct Alu sources predate human-ape divergence argues that new Alus must be encoded by several loci rather than alleles of a single locus.
7.
PAULSON KE, SCHMID CW: Transcriptional Inactivity of Alu Repeats in HeLa Cells. Nucleic Acids Res 1986, 14:6145-6158.
8.
MATERA AG, HELLMANN U, SCHMID CW: A Transpositionally and Transcriptionally Competent Alu Subfamily. Mol Cell Biol 1990, 10:5424-5432.
9. ••
MARAIA R: The Subset of Mouse B1 (Alu-Equivalent) Sequences Expressed as Small Processed Cytoplasmic Transcripts. Nucleic Acids Res 1991, 19:5695-5702. Small B1 sequences are expressed in mouse testes and cell lines. These are characterized and found to be limited to a young B1 subfamily. It is demonstrated that a select subset of B1s undergo synthesis by pol III, posttranscriptional 3' RNA processing, and cytoplasmic accumulation. The secondary structure of this RNA was determined and found to be highly conserved. This specific structure was previously identified in 7SL signal recognition particle RNA.
10.
CAREY MF, SINGH K, BOTCHAN M, COZZARELLI NR: Induction of Specific Transcription by RNA Polymerase III in Transformed Cells. Mol Cell Biol 1986, 6:3068-3076.
11.
WILLARD C, NGUYEN HT, SCHMID CW: Existence of at Least Three Distinct Alu Subfamilies. J Mol Evol 1987, 26:180-186.
12.
SLAGEL V, FLEMINGTON E, TRAINA-DORGE V, BRADSHAW JR H, DEININGER PL: Clustering and Subfamily Relationships of the Alu Family in the Human Genome. Mol Biol Evol 1987, 4:19-29.
13.
JURKA J, SMITH T: A Fundamental Division in the Alu Family of Repeated Sequences. Proc Natl Acad Sci USA 1988, 85:4775-4778.
14.
BRITTEN RJ, BARON WF, STOUT DB, DAVIDSON EH: Sources and Evolution of Human Alu Repeated Sequences. Proc Natl Acad Sci USA 1988, 85:4770-4774.
15.
QUENTIN Y: The Alu Family Developed Through Successive Waves of Fixation Closely Connected with Primate Lineage History. J Mol Evol 1988, 27:194-202.
16.
QUENTIN Y: Successive Waves of Fixation of B1 Variants in Rodent Lineage History. J Mol Evol 1989, 28:299-305.
17.
MATERA AG, HELLMANN U, HINTZ MF, SCHMID CW: Recently Transposed Alu Repeats Result from Multiple Source Genes. Nucleic Acids Res 1990, 18:6019-6023.
18.
BATZER MA, GUDI VA, MENA JC, FOLTZ DW, HERRERA RJ, DEININGER PL: Amplification Dynamics of Human-specific (HS) Alu Family Members. Nucleic Acids Res 1991, 19:3619-3623.
19.
BATZER MA, KILROY GE, RICHARD PE, SHAIKH TH, DESSELLE TD, HOPPENS CL, DEININGER PL: Structure and Variability of Recently Inserted Alu Family Members. Nucleic Acids Res 1990, 18:6793-6798.
SHEN MR, BATZER MA, DEININGER PL: Evolution of the Master Alu Gene(s). J Mol Evol 1991, 33:311-320. Alu subfamilies appear at different evolutionary times by successive accumulation of sequence changes on existing subfamilies. Diagnostic substitutions to identify distinct subfamilies are shown.
Alus are extremely methylated in somatic tissues. Consequently, Alus as floating CpG islands have a profound impact on the extent of methylation and the position of these methyls in human DNA. These findings suggest function for Alus at the DNA level.
33.
KUFF EL, LUEDERS KK: The Intracisternal A-particle Gene Family: Structure and Functional Aspects. Adv Cancer Res 1988, 51:183-276.
34.
BIRD AP: DNA Methylation and the Frequency of CpG in Animal DNA. Nucleic Acids Res 1980, 8:1499-1504.
35.
PEREZ-STABLE C, SHEN C-KJ: Competitive and Cooperative Functioning of the Anterior and Posterior Promoter Elements of an Alu Family Repeat. Mol Cell Biol 1986, 6:2041-2052.
36.
DANIELS GR, DEININGER PL: Characterization of a Third Major SINE Family of Repetitive Sequences in the Galago Genome. Nucleic Acids Res 1991, 19:1649-1656.
20.
21. •
WALLACE MR, ANDERSEN LB, SAULINO AM, GREGORY PE, GLOVER TW, COLLINS FS: A de novo Alu Insertion Results in Neurofibromatosis Type 1. Nature 1991, 353:864-866. A de novo Alu transposition landed in a patient's neurofibromatosis 1 gene causing that dominant disorder. The sequence of this Alu identified it as PV.
22.
STOPPA-LYONNET D, CARTER PE, MEO T, TOSI M: Clusters of Intragenic Alu Repeats Predispose the Human C1 Inhibitor Locus to Deleterious Rearrangements. Proc Natl Acad Sci USA 1990, 87:1551-1555.
23.
MURATANI K, HADA T, YAMAMOTO Y, KANEKO T, SHIGETO Y, OHUE T, FURUYAMA J, HIGASHINO K: Inactivation of the Cholinesterase Gene by Alu Insertion: Possible Mechanism for Human Gene Transposition. Proc Natl Acad Sci USA 1991, 88:11315-11319.
24.
MARAIA RJ, CHANG D-Y, WOLFFE AP, VORCE RL, HSU K: The RNA Polymerase III Terminator Used by a B1-Alu Element can Modulate 3' Processing of the Intermediate RNA Product. Mol Cell Biol 1992, 12:1500-1506. Regulation of 3' RNA processing of a B1 primary transcript is demonstrated to be coupled to transcription termination by pol III. Subtle differences surrounding the termination signal are unique to individual SINEs and were found to profoundly affect 3' processing. Certain sequences stabilize the oligo(U)-terminated primary transcript, while others promote removal of the poly(A) and oligo(U) tracts.
37. ••
38.
WHITE RJ, STOTT D, RIGBY PWJ: Regulation of RNA Polymerase III Transcription in Response to Simian Virus 40 Transformation. EMBO J 1990, 9:3713-3721.
39.
CAREY MF, SINGH K: Enhanced B2 Transcription in Simian Virus 40-Transformed Cells is Mediated through the Formation of RNA Polymerase III Transcription Complexes on Previously Inactive Genes. Proc Natl Acad Sci USA 1988, 85:7059-7063.
40.
DAS G, HENNING D, WRIGHT D, REDDY R: Upstream Regulatory Elements are Necessary and Sufficient for Transcription of a U6 RNA Gene by RNA Polymerase III. EMBO J 1988, 7:503-512.
JURKA J, MILOSAVLJEVIC A: Reconstruction and Analysis of Human Alu Genes. J Mol Evol 1991, 32:105-121.
25.
ULLU E, WEINER AM: Human Genes and Pseudogenes for the 7SL RNA Component of Signal Recognition Particle. EMBO J 1984, 3:3303-3310.
26.
ULLU E, WEINER AM: Upstream Sequences Modulate the Internal Promoter of the Human 7SL RNA Gene. Nature 1985, 318:371-374.
41.
DEININGER PL, BATZER MA, HUTCHISON III CA, EDGELL MH: Master Genes in Mammalian Repetitive DNA Amplification. Trends Genet 1992, 8:307-311.
LOBO SM, HERNANDEZ N: A 76 bp Mutation Converts a Human RNA Polymerase II snRNA Promoter into an RNA Polymerase III Promoter. Cell 1989, 58:55-57.
42.
PARRY HD, MATTAJ IW: Positive and Negative Functional Interactions Between Promoter Elements from Different Classes of RNA Polymerase III-transcribed Genes. EMBO J 1990, 9:1097-1104.
43.
MURPHY S, DI LIEGRO C, MELLI M: The in Vitro Transcription of the 7SL RNA Gene by RNA Polymerase III is Dependent only on the Presence of an Upstream Promoter. Cell 1987, 51:81-87.
44.
LEE BJ, KANG SG, HATFIELD D: Transcription of Xenopus Selenocysteine tRNASer (Formerly Designated Opal Suppressor Phosphoserine tRNA) Gene is Directed by Multiple 5'-Extragenic Regulatory Elements. J Biol Chem 1988, 264:9696-9702.
45.
WILSON ET, CONDLIFFE DP, SPRAGUE KU: Transcriptional Properties of BmX, a Moderately Repetitive Silkworm Gene that is an RNA Polymerase III Template. Mol Cell Biol 1988, 8:624-631.
46.
SCOTT RW, TILGHMAN SM: Transient Expression of a Mouse α-Fetoprotein Minigene: Deletion Analyses of Promoter Function. Mol Cell Biol 1983, 3:1295-1309.
47.
ECONOMOU EP, BERGEN AW, WARREN AC, ANTONARAKIS SE: The Polydeoxyadenylate Tract of Alu Repetitive Elements is Polymorphic in the Human Genome. Proc Natl Acad Sci USA 1990, 87:2951-2954.
48.
YOUNG PR, SCOTT RW, HAMER DH, TILGHMAN SM: Construction and Expression in Vivo of an Internally Deleted Mouse
27.
28. ••
DOMBROSKI BA, MATHIAS SL, NANTHAKUMAR E, SCOTT AF, KAZAZIAN JR HH: Isolation of an Active Human Transposable Element. Science 1991, 254:1805-1807. By using a highly specific probe derived from the sequence of a LINE-1 element that had recently transposed into a patient's factor VIII gene, the authors identified a full-length L1 on chromosome 22 that appears to be the progenitor of the recent transposition. An allele isolated from this locus (chromosome 22) in the patient's maternal DNA was identical in sequence to the factor VIII transposition, contained a poly(A) tail of 54 residues, and was flanked by direct repeats indicating that it was itself the product of a transposition event.
29.
30.
31.
32. ••
SLAGEL VK, DEININGER PL: In Vivo Transcription of a Cloned Prosimian Primate SINE Sequence. Nucleic Acids Res 1989, 17:8669-8682.
BESSER D, GOTZ F, SCHULZE-FORSTER K, WAGNER H, KROGER H, SIMON D: DNA Methylation Inhibits Transcription by RNA Polymerase III of a tRNA Gene, but not of a 5S rRNA Gene. FEBS Lett 1990, 269:358-362.
JUTTERMANN R, HOSOKAWA K, KOCHANEK S, DOERFLER W: Adenovirus Type-2 VAI RNA Transcription by Polymerase-III is Blocked by Sequence-specific Methylation. J Virol 1991, 65:1735-1742.
SCHMID CW: Human Alu Subfamilies and their Methylation Revealed by Blot Hybridization. Nucleic Acids Res 1991, 19:5613-5617.
α-Fetoprotein Gene: Presence of a Transcribed Alu-like Repeat Within the First Intervening Sequence. Nucleic Acids Res 1982, 10:3099-3116.
structure. Enzymatic structure probing of a young B1 (mAFP) confirms the predicted structure. This relates evolution of B1 elements to selection at the RNA level.
49.
KRAYEV AS, KRAMEROV DA, SKRYABIN KG, RYSKOV AP, BAYEV AA, GEORGIEV GP: The Nucleotide Sequence of the Ubiquitous Repetitive DNA Sequence B1 Complementary to the Most Abundant Class of Mouse Fold-back RNA. Nucleic Acids Res 1980, 8:1201-1215.
56. ••
50.
JELINEK WR, SCHMID CW: Repetitive Sequences in Eukaryotic DNA and their Expression. In Annu Rev Biochem. Edited by Snell EE, Boyer PD, Meister A, Richardson CC. Palo Alto: Annual Reviews; 1982:813-844.
51.
ADENIYI-JONES S, ZASLOFF M: Transcription, Processing and Nuclear Transport of a B1 Alu RNA Species Complementary to an Intron of the Murine α-Fetoprotein Gene. Nature 1985, 317:81-84.
52.
MARAIA R, ZASLOFF M, PLOTZ P, ADENIYI-JONES S: Pathway of B1-Alu Expression in Microinjected Oocytes: Xenopus laevis Proteins Associated with Nuclear Precursor and Processed Cytoplasmic RNAs. Mol Cell Biol 1988, 8:4433-4440.
CHANG D-Y, MARAIA RJ: A Novel Protein Binds the Highly Specific B1 and Alu Small Cytoplasmic RNAs in Vitro. J Biol Chem 1992, in press. The existence of a protein that binds small cytoplasmic B1 and Alu RNAs was demonstrated by mobility shift and UV-crosslinking. Examination of various deleted and substituted B1 RNAs indicates that the interaction requires a highly conserved stem loop domain found in 7SL, B1 and Alu transcripts. Data suggest that this is a novel protein.
SINNETT D, RICHER C, DERAGON J-M, LABUDA D: Alu RNA Secondary Structure Consists of Two Independent 7SL RNA-like Folding Units. J Biol Chem 1991, 266:8675-8678. The structure of Alu RNA transcribed in vitro from cloned SINEs was enzymatically probed. Each monomer of the dimeric Alu transcript was found to form a similar structure, previously identified in the 7SL RNA component of signal recognition particle, the presumed progenitor of the Alu family of SINEs. This RNA structure may facilitate retrotransposition of either 7SL pseudogenes or Alus.
57. •
MATHIAS SL, SCOTT AF, KAZAZIAN JR HH, BOEKE JD, GABRIEL A: Reverse Transcriptase Encoded by a Human Transposable Element. Science 1991, 254:1808-1810. An allele of a chromosome 22 encoded LINE-1 (see [28••]) was tested for its ability to produce RT activity when cloned in frame with a yeast Ty retrotransposable element. The Ty-L1 chimeric element was found in virus-like particles and produced RT activity as demonstrated by its ability to polymerize dNTP.
58.
MORIN GB: The Human Telomere Terminal Transferase Enzyme is a Ribonucleoprotein that Synthesizes TTAGGG Repeats. Cell 1989, 59:521-529.
59.
ZUCKERKANDL E, LATTER G, JURKA J: Maintenance of Function without Selection: Alu Sequences as 'Cheap Genes'. J Mol Evol 1989, 29:504-512.
60.
HOWARD BH, SAKAMOTO K: Alu Interspersed Repeats: Selfish DNA or a Functional Gene Family. N Biol 1990, 2:759-770.
61.
RYAN SC, DUGAICZYK A: Newly Arisen DNA Repeats in Primate Phylogeny. Proc Natl Acad Sci USA 1989, 86:9360-9364.
53. •
54. 55. ••
LABUDA D, STRIKER G: Sequence Conservation in Alu Evolution. Nucleic Acids Res 1989, 17:2477-2491.
LABUDA D, SINNETT D, RICHER C, DERAGON J-M, STRIKER G: Evolution of Mouse B1 Repeats: 7SL RNA Folding Pattern Conserved. J Mol Evol 1991, 32:405-414. Reconstructed consensus sequences that represent the various B1 subfamilies were examined at the RNA structure level. The subfamily consensuses conserve the potential to fold into a specific RNA secondary
C Schmid, Departments of Genetics and Chemistry, University of California, Davis, California 95616, USA. R Maraia, Laboratory of Molecular Growth Regulation, National Institute of Child Health and Human Development, NIH, Bethesda, Maryland 20892, USA.