Bovine Alu-like sequences mediate transposition of a new site-specific retroelement

Gene 152 (1995) 261-264 © 1995 Elsevier Science B.V. All rights reserved. 0378-1119/95/$9.50


GENE 08498

Bovine Alu-like sequences mediate transposition of a new site-specific retroelement (Repetitive sequences; mobile retroposon; reverse transcriptase)

Janusz Szemraj a, Grażyna Płucienniczak b, Józef Jaworski a and Andrzej Płucienniczak b,c

a Department of Biochemistry, Institute of Physiology and Biochemistry, Medical School, Lindleya 6, 90-131 Łódź, Poland. Tel. (48-42) 782-465; b Laboratory of Genetic Engineering, PP TERPOL, Sieradz, Poland; and c Microbiology and Virology Center, Polish Academy of Sciences, Dąbrowskiego 251, 92-321 Łódź, Poland. Tel. (48-42) 431-533

Received by A.J. Podhajska: 18 March 1994; Revised/Accepted: 28 July/29 July 1994; Received at publishers: 10 October 1994

SUMMARY

We describe a new family of 3.1-kb repetitive sequences which is present in the bovine genome. The 5' and 3' ends of the unit are flanked with sequences homologous to the 5' and 3' halves of the bovine Alu-like monomer (BM), respectively. Distribution of the 5' ends of the family members in the genome is not random. They are close to the truncated bovine Alu-like dimer (BD) which, in some cases, is followed by 40-bp repeated sequences containing block A of the RNA polymerase III promoter. The ORFs found within the unit code for peptides homologous to amino-acid sequences characteristic for reverse transcriptases (RT). The family members may be considered as mutant mobile elements whose propagation in the genomes was accomplished by means of a process including site-specific recognition with BD. Because of this, we call this family the bovine dimer-driven family (BDDF).

INTRODUCTION

Bovine Alu-like sequences are widely spread in the genomes of Ruminantia (Watanabe et al., 1982). The BD members consist of two homologous monomers, of 116 and 117 bp, separated by 27-bp palindromic, purine + pyrimidine-rich sequences. The right BM is contiguous to (CAG)n tails, which are recombination hot spots in the bovine genome (Skowroński et al., 1984). Sequences homologous to the 3' half of the BM flank one end of the art2 or Pst interspersed repetitive sequences (Duncan, 1987; Majewska et al., 1988). The structure of these repetitive elements was a hint for us to look for DNA fragments in the bovine genome which are bordered with sequences homologous to both the 5' and 3' halves of the BM. The analysis of the sequence data libraries shows that members of the art2 or Pst families of sequences are rather randomly distributed within the genomes of Ruminantia. One exception to this rule comes from the data of Majewska et al. (1988), who have shown that PstI elements are parts of longer repetitive sequences. We decided to determine the features of a new interspersed repetitive sequence in the hope that its structural organization would supply new data about the function of the bovine Alu-like elements.

Correspondence to: Dr. A. Płucienniczak, Laboratory of Genetic Engineering, PP TERPOL, POW 57, 98-200 Sieradz, Poland. Tel. (48-43) 4548; Fax (48-43) 71200.

Abbreviations: aa, amino acid(s); BD, bovine Alu-like dimer(s) (family); BDDF, bovine dimer-driven family; BM, bovine Alu-like monomer; bp, base pair(s); kb, kilobase(s) or 1000 bp; nt, nucleotide(s); ORF, open reading frame; PCR, polymerase chain reaction; RT, reverse transcriptase(s); ss, single strand(ed).

SSDI 0378-1119(94)00709-8

EXPERIMENTAL AND DISCUSSION

(a) Sequence data analysis

(1) Structure of the BDDF unit

The structure of the BDDF members that emerges from our data is shown in Fig. 1. The repetitive unit is flanked

[Fig. 1: schematic of the BDDF unit (about 3.0 kb) with the labels tm, m, (CAG)n, n×40, E20, 1/2 ml, rt, (AACTG)n and 1/2 mr, the direction of transcription, and the insert lines a-g.]

Fig. 1. Structure of the BDDF unit. The vertical arrows show the borders of the unit. tm, m, 1/2 ml and 1/2 mr are sequences homologous to the BM: truncated monomer, monomer, and the sequences homologous to the left part and to the right part of the monomer, respectively. (CAG)n and (AACTG)n show the sites where the sequences in the brackets occur in the structure. n×40 (and vertical lines) show the location of the 40-bp repeats preceding the left flank of the unit. The direction of possible transcription of the BDDF unit is shown by the long horizontal arrow. E20 is the 20-bp sequence preceding the 5' half of the BM (Fig. 2). The overlined 3' part of the unit is equivalent to the bovine art2 or Pst families (Duncan, 1987; Majewska et al., 1988); the underlined internal part of the unit, designated rt, shows the ORF coding for RT-like peptides. The structure shown is based on the nt sequences of the inserts found in the bovine genomic library prepared in λ L41.1 (Sambrook et al., 1989). The accession Nos. of the sequences in the EMBL data library are Z25525, Z25527 and Z25530. The bovine genomic library was screened with [α-32P]dCTP-labeled plasmids carrying inserts belonging to the bovine PstI family (Majewska et al., 1988). The structure of the left flank was additionally confirmed by sequencing two inserts placed in the pBS(+) plasmid (Stratagene, La Jolla, CA, USA), obtained after cloning of the bovine DNA amplification products (EMBL accession Nos. Z25528 and Z25529). The PCR variant of Copley et al. (1991) was used. The internal primer, shown with a thin horizontal arrow, was 5'-GAACTCCCCAGGTCGATAGC. The two external, partially complementary primers which were ligated to the bovine DNA digested with BamHI, and the external primer used for amplification, were 5'-GCCGGATCCGACTCAACTGACTAGGATGGA, 5'-GATCTCCATCCTAACCGATCTAACCGGAT and 5'-GCCGGATCCGACTCAACTGGAC, respectively. The lines a-f under the BDDF model structure represent the inserts cloned and sequenced in this work: a, b, c, inserts of recombinant λ phages (EMBL data library accession Nos. Z25525, Z25527 and Z25530, respectively) and d, e, f, plasmid inserts (EMBL data library accession Nos. Z25526, Z25528 and Z25529, respectively). The bold line g represents the longest DNA sequence showing homology to the BDDF unit found in the EMBL data library, Release 33, December 1992, accession No. M94327 (Groenen et al., 1993); the sequence is a part of the bovine αs2-casein-encoding gene, localized in the complementary strand (nt in the coding strand: 17286-19062). Standard molecular biology techniques were used (Sambrook et al., 1989).

by sequences homologous to the 5' and 3' halves of the BM. Together, the flanking sequences make up an almost complete BM (indicated by 1/2 ml and 1/2 mr in Fig. 1). The sequences forming the 3' flank of the unit occur very frequently (Duncan, 1987; Majewska et al., 1988; and EMBL data library). The 3' end of the flank consists of (AACTG)n or similar repeats. In the consensus sequence there are two of them (Fig. 2A). The 5' flank, homologous to the 5' half of the BM (Fig. 2A), was discovered initially within two inserts of recombinant λ phages found in the bovine DNA library. To confirm these data we decided to check whether sequences from the internal part of the repetitive unit are connected with DNA fragments homologous to the 5' half of the BM. It seems to us that the best method for this is the PCR amplification variant in which only one primer is homologous to the internal part of the repeated unit (Copley et al., 1991). The nt sequences of two DNA fragments obtained in this way confirm that the 5' end of the unit consists of DNA fragments homologous to the 5' end of the BM. In addition to the elements homologous to the 5' halves of the BMs, the upstream parts of the 5' flanks contain a 20-bp sequence with small 4-bp inverted repeats at the junction with the Alu-like part (Fig. 2A).

The internal parts of the characterized BDDF members show about 80% sequence similarity (three sequences were compared, EMBL accession Nos. Z25525, Z25527 and Z25530; Fig. 1, lines a, b and c). The examination of peptides coded by ORFs within the repeated unit reveals homology to RT. The homology is observed for clusters of characteristic aa sequences and for the distances between them (Michel and Lang, 1985; Hattori et al., 1986; Fig. 3). The longest BDDF-like element found in the EMBL sequence data library is 1777 bp long and consists of the 3' flank and the contiguous DNA fragment containing the RT-encoding ORF (Groenen et al., 1993).

In the 5' neighbourhood of the BDDF the truncated BDF elements are present (Fig. 1). The nt sequences localized between the (CAG)n tails of the truncated BD and the 5' BDDF flanks are of variable length. For the two BDDF units sequenced, these spacers are 152 and 313 bp long and consist of 40-bp repeats and their truncated fragments (Fig. 2C). The repeats contain block A of the RNA polymerase III promoter (Geiduschek and Tocchini-Valentini, 1988). The polarity of the accompanying BDF elements is the same as that of the Alu-like flanks at the borders of the BDDF units. For the two sequenced BDDF elements, the left monomers of the accompanying BD dimers, distal to the BDDF elements, are damaged. In this way, the 5' ends of the BDDF units are close to the intact, right BMs which are equipped with (CAG)n tails.
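The legend to Fig. 1 describes the two oligonucleotides ligated to the BamHI-digested DNA as partially complementary. The following is a minimal Python sketch of how that statement could be checked from the printed sequences; it is not part of the authors' procedure. The names LINKER_LONG and LINKER_SHORT and both helper functions are hypothetical labels introduced here, and the sequences are assumed to be exactly those printed in the legend, with the stray space in the second oligonucleotide treated as a typesetting artifact.

```python
# Illustrative check only; names and helpers are assumptions, not from the paper.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous run shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Oligonucleotides as printed in the legend to Fig. 1 (5'->3').
LINKER_LONG = "GCCGGATCCGACTCAACTGACTAGGATGGA"
LINKER_SHORT = "GATCTCCATCCTAACCGATCTAACCGGAT"

# Any stretch of LINKER_SHORT that also occurs in the reverse complement of
# LINKER_LONG can base-pair with LINKER_LONG.
paired = longest_common_substring(reverse_complement(LINKER_LONG), LINKER_SHORT)
print(paired, len(paired))  # prints "TCCATCCTA 9"
```

With the sequences as printed, the paired stretch covers the last 9 nt of the longer oligonucleotide and lies directly after the leading GATC of the shorter one, which matches the 5' overhang left by BamHI. If the printed sequences contain transcription errors, the result will differ.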

(2) Possible mechanism of transposition of the BDDF members

Such a structure of the 5' flanking sequences, together with the homology of the flanks of the BDDF members to the 5' and 3' halves of the BM, suggests that the BDF elements are the sites which may determine the transposition of the BDDF

[Fig. 2: (A) alignment of the BDDF flanking sequences with the BM consensus sequences, lines a-j; (B) predicted secondary structure of the left BM ssDNA; (C) alignment of the 40-bp repeats, lines a-c.]

Fig. 2. The BDDF flanking sequences. (A) Comparison of the BDDF flanking sequences with the BMs. a, b, c and h, the consensus sequences (shown in italic) of the right BD monomer, the left BD monomer (Skowroński et al., 1984), the left BDDF flank and the right BDDF flank, respectively. d-g, the left flanks found in this work (EMBL accession Nos. Z25525, Z25527, Z25528 and Z25529, respectively); the sequences were used to construct the consensus sequence of the left BDDF flank (c). The consensus sequence of the right flank of the BDDF unit (h) was obtained from 18 sequences homologous to the Pst family (Majewska et al., 1988) taken from the EMBL databank. i and j, the BDDF right flanks found in this work (EMBL accession Nos. Z25525 and Z25527, respectively). The sequences which flank the BM parts of the BDDF unit are shown in lower-case letters. The inverted repeats in the left BDDF flank are underlined and their centers of symmetry are marked (c). The direct repeats at the end of the right BDDF flanks are double-underlined (h-j). The most variable region in the right BDDF flank is shown in subscript (h). In the region of possible homology, the presented sequences are compared with the BMs. Matches are shown in bold. (B) Secondary structure of the left BM ssDNA (see line a in panel A). The nt 54 and 58, where the left BDDF flanks end and the right flanks begin, respectively, are enclosed in squares. The nt of the most variable region of the right BDDF flank are circled (nt 74-79). The PC/GENE program RNAFOLD (IntelliGenetics, Mountain View, CA, USA) was used for the structure prediction. (C) The nt sequence alignment of the 40-bp repeats lying between the truncated BD and the BDDF units. a and b, fragments of the sequences with EMBL accession Nos. Z25525 and Z25527, respectively. c, the consensus sequence. Matches with the consensus sequence are shown in bold. Numbers on the left margin correspond to the coordinates of the repeats shown. Block A of the RNA polymerase III promoter is underlined (Geiduschek and Tocchini-Valentini, 1988).

members in the bovine genome. The mutual localization of the truncated BDF elements and the bovine Alu-like flanks of the BDDF units excludes a simple site-specific recombination as the process leading to the targeting of new BDDF elements in the genome. We suppose that sequence-specific recognition may be involved in the first step of transposition, followed by several events (Fig. 4) which lead to the localization of the mobile element at the 3' site of the (CAG)n mutagenic tail. We believe that the targeting process may be influenced by: (i) the secondary structure of the ssDNA of BD monomers or their transcripts, which may form cruciform structures (Fig. 2B), and (ii) the (CAG)n tails, which may provide sites where strand breaks occur very frequently (Skowroński et al., 1984). Especially attractive is the assumption that ss secondary structure is an important factor in the targeting and final resolution of the mobile element-genomic DNA interaction. The structure (Fig. 2B) is similar to that described for the yeast mitochondrial group-II introns aI2 and E3 at their 3' ends (Kennell et al., 1993). It may confer a ribozyme endonuclease activity able to cleave DNA specifically at the target site and to ligate a DNA substrate to an RNA (Mörl et al., 1992) or, at least, provide a specific site for cleavage of the target DNA.

[Fig. 3: aa sequence alignment of peptides a, b and c.]

Fig. 3. Homology of the peptide coded by the ORF found within the BDDF unit (nt 3250-3690, accession No. Z25525) with aa sequences of RT. The aa common for RT (Michel and Lang, 1985; Hattori et al., 1986) are double-underlined; asterisks and crosses correspond to sites of complete and partial sequence conservation, respectively. a, peptide encoded within the BDDF unit. b and c, RT-like domains encoded in the simian L1 family sequences (Hattori et al., 1986) and in the Scenedesmus obliquus petD intron (Kück, 1989). Numbers on the left margin correspond to the first aa shown in each line.

[Fig. 4: schematic of sequence recognition followed by insertion of a BDDF unit next to the (CAG)n tail of a BD element.]

Fig. 4. Sequence of possible events which may lead to the insertion of a BDDF unit into a new target site in the bovine genome (for symbols, see legend to Fig. 1).

(b) Conclusions

(1) Our hypothesis is that the BDDF members are mutated or truncated mobile elements encoding their own RT. Their spreading in the bovine genome was mediated by reverse transcription, possibly of RNA polymerase III transcripts, and by site-specific recognition of the abundant BD sequences.

(2) The concept of group-II introns as site-specific retroelements (Belfort, 1993), reinforced by the existence of the BDDF, offers a new insight into the evolution of the genome.

(3) The copy number of Pst family members, which should be considered as truncated BDDF-like elements, is estimated to be 5 × 10^4 (Majewska et al., 1988). This figure reflects the importance of the transposition process described in this work and its influence on the evolution of the genomes of Ruminantia.

REFERENCES

Belfort, M.: An expanding universe of introns. Science 262 (1993) 1009-1010.

Copley, C.G., Boot, C., Bundell, K. and McPheat, W.L.: Unknown sequence amplification: application to in vitro walking in Chlamydia trachomatis L2. Bio/Technology 9 (1991) 74-79.

Duncan, C.H.: Novel Alu-type repeat in artiodactyls. Nucleic Acids Res. 15 (1987) 1340.

Geiduschek, E.P. and Tocchini-Valentini, G.P.: Transcription by RNA polymerase III. Annu. Rev. Biochem. 57 (1988) 873-914.

Groenen, M.A.M., Dijkhof, R.J.M., Verstege, A.J.M. and Van der Poel, J.J.: The complete sequence of the gene encoding bovine αs2-casein. Gene 123 (1993) 187-193.

Hattori, M., Kuhara, S., Takenaka, O. and Sakaki, Y.: L1 family of repetitive DNA sequences in primates may be derived from a sequence encoding a reverse transcriptase-related protein. Nature 321 (1986) 625-628.

Kennell, J.C., Moran, J.V., Perlman, P.S., Butow, R.A. and Lambowitz, A.M.: Reverse transcriptase activity associated with maturase-encoding group II introns in yeast mitochondria. Cell 73 (1993) 133-146.

Kück, U.: The intron of a plastid gene from green alga contains an open reading frame for a reverse transcriptase-like enzyme. Mol. Gen. Genet. 218 (1989) 257-266.

Majewska, K., Szemraj, J., Płucienniczak, G., Jaworski, J. and Płucienniczak, A.: A new family of dispersed highly repetitive sequences in bovine genome. Biochim. Biophys. Acta 949 (1988) 119-124.

Michel, F. and Lang, B.F.: Mitochondrial class II introns encode proteins related to the reverse transcriptases of retroviruses. Nature 316 (1985) 641-643.

Mörl, M., Niemer, I. and Schmelzer, C.: New reactions catalyzed by a group II intron ribozyme with RNA and DNA substrates. Cell 70 (1992) 803-810.

Sambrook, J., Fritsch, E.F. and Maniatis, T.: Molecular Cloning. A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989.

Skowroński, J., Płucienniczak, A., Bednarek, A. and Jaworski, J.: Bovine 1.709 satellite. Recombination hot spots and dispersed repeated sequences. J. Mol. Biol. 77 (1984) 399-416.

Watanabe, Y.T., Tsukada, T., Notake, M., Nakanishi, S. and Numa, S.: Structural analysis of the repetitive DNA sequences in the bovine corticotropin-β-lipotropin precursor gene. Nucleic Acids Res. 10 (1982) 1459-1469.