Short communications
BIOCHIMIE, 1985, 67, 1053-1057
Sequence analysis of the glyW region in Escherichia coli. Stanley D. TUCKER and Emanuel J. MURGOLA.
Section of Molecular Genetics, Department of Genetics, The University of Texas M.D. Anderson Hospital and Tumor Institute, Houston, TX 77030 (USA). (Received 2-7-1985, accepted 4-7-1985).
Résumé -- We have determined the DNA sequence of a one-thousand-base segment of the Escherichia coli chromosome. This segment comprises glyW, a gene present in two copies that codes for tRNA3Gly, and its adjacent regions. An insertion sequence, whose spontaneous origin was already known, is identified as IS1. Possible sites for the initiation and termination of transcription are identified by comparison with the sequences of model promoter and terminator regions. The results suggest that the expression of glyW depends on the expression of the preceding gene, pgsA, either through transcriptional or translational overlap, through cotranscription, or through a combination of these two mechanisms. Key words: tRNA3Gly / duplicate tRNA genes / pgsA / IS1.
Summary -- We have determined the DNA sequence of a 1-kilobase segment of the Escherichia coli chromosome. The segment contains glyW, a duplicate gene for tRNA3Gly, and its flanking regions. An insertion sequence, previously known to have occurred spontaneously within the sequenced fragment, was identified as IS1. Possible sites for initiation and termination of transcription were identified by comparing them with the sequences of model promoter regions and termination structures. The results suggest that the expression of glyW may depend upon the expression of the preceding gene, pgsA, by transcriptional or translational overlap, by cotranscription of these two genes, or both. Key-words: tRNA3Gly / duplicate tRNA genes / pgsA / IS1.
Introduction
Escherichia coli tRNA genes have been mapped at various locations on the genome. These genes often occur in complex arrangements associated with other tRNA genes, with rRNA operons, or with protein genes (for a current review, see [1]). Several tRNAs of E. coli, particularly the more abundant tRNA species, are encoded by multiple gene copies. Duplicate tRNA genes are often closely linked, either as stable tandem repeats or within the same operon, but in some instances they are widely separated on the chromosome [2]. Multiple tRNA genes, in specific chromosomal arrangements, may be important for regulating the level of specific tRNA species under different physiological conditions. The chromosomal organization of the glycine tRNA genes represents an interesting example of gene duplication. Three species of glycine tRNA, designated 1, 2 and 3, have been characterized and their genes have been mapped at four widely separated loci [2, 3, 4]. The major species, tRNA3Gly, is encoded at two loci, glyV and glyW. Three or four copies of glyV are present in tandem at about 95 min and a single copy of glyW occurs at 42 min [5, 6, 7]. Previously, we mapped in detail the position of glyW on the recombinant plasmids pPG1 and pPGL2019 [7]. A cluster of restriction sites predicted from the tRNA3Gly sequence indicated that the glyW gene was located within a few hundred base pairs of the pgsA gene, which codes for phosphatidylglycerophosphate synthase. The orientation of these sites suggested that the glyW gene was transcribed in the same direction as pgsA. In this paper, we report the nucleotide sequence of glyW and the adjacent chromosomal regions contained in pPGL2019.
Abbreviations: bp: base pairs; kb: kilobases; min: minutes.
Materials and methods
Bacterial strains, plasmids, and growth conditions
E. coli strain JA200 was previously described [8]. pPGL2019 contains a region of E. coli DNA previously shown to carry pgsA and glyW [7]. This region also contains an approximately 0.5-kilobase (kb) insert of DNA that spontaneously appeared in the original parent plasmid, pPG1 [7]. Bacteria were routinely grown in Luria broth supplemented with glucose and tetracycline (20 µg/ml).
DNA manipulations
Plasmid DNA was prepared essentially as described by Maniatis et al. [9]. DNA restriction fragments were recovered from agarose gels either by electroelution into hydroxyapatite as described by Davis et al. [10], or by electroelution onto DEAE membrane as described in Schleicher and Schuell publication no. 364 (provided with the membrane). DNA fragments were labeled at their 5' ends with [γ-32P]ATP and T4 polynucleotide kinase. The DNA sequence was analyzed by the method of Maxam and Gilbert [11].
FIG. 1. -- The plasmid pPGL2019 contains the EcoRI-BglII fragment of pPG1-L subcloned in the vector pKC7 [7, 8]. pPG1-L contains a DNA insert of approximately 0.5 kb (shaded region), which spontaneously appeared in an isolate of a pPG1-containing strain [8]. glyW was previously mapped between the EcoRI site and the PstI site that lies within the insert. The orientation of the plasmid in this figure is the reverse of that shown by Tucker et al. [7]. The segments of DNA originating from pKC7, the E. coli chromosome, and the insertion element are indicated.
Results and discussion
Mapping of the glyW gene on the plasmid pPGL2019 was previously described [7]. In the present study, the 2.2-kb PstI restriction fragment of pPGL2019 extending clockwise from about 6.0 kb to 0.7 kb on the map of pPGL2019 (Fig. 1) was isolated and its nucleotide sequence determined, according to the strategy shown in Figure 2. The 450-base pair (bp) sequence at the PstI end of the sequenced region corresponded to 60 % of the insertion element IS1 as reported by Ohtsubo and Ohtsubo [12] and by Johnsrud
FIG. 2. -- The DNA segment shown extends from the PstI site of the insert in pPGL2019 to the EcoRI site. Restriction sites are indicated on the lines below the DNA fragment. The 5' ends of the restriction fragments sequenced are marked by a separate symbol for each site (DdeI, HinfI, TaqI and MspI). The extent to which each sequence was determined is designated by a solid arrow. The dashed arrow indicates a portion of a fragment that was not sequenced.
[13]. The insertion element that spontaneously appeared between pgsA and glyW in pPG1 (see legend to Fig. 1) can be identified as IS1; this identification has also been verified by Gopalakrishnan et al. [14]. The IS1 sequence determined by Johnsrud differed from the sequence reported by Ohtsubo and Ohtsubo at seven nucleotides. Three of these nucleotide differences were within the IS1 region we sequenced. The sequence reported by Ohtsubo and Ohtsubo contained A at position 393, C at position 396, and G at position 444, whereas the Johnsrud sequence and ours contained C, T and T at these respective positions in IS1. As both strands have been sequenced, we believe our assignments to be correct. Minor variations have been noted before in E. coli sequences from various sources [15, Tucker and Murgola, unpublished data].
The 5' end of the sequence corresponding to the mature tRNA3Gly sequence [16] is located at 110 bp as designated in Figure 3. The DNA sequence is consistent with the previously determined sequence of the tRNA molecule. Analysis of the region upstream from the mature tRNA3Gly sequence for possible sites of transcription initiation indicated that the sequence from nucleotide 76 to 82 (Fig. 3) is similar to the -10 region of model promoter sequences [17, 18, 19]. A sequence similar to the -35 consensus sequence appears at nucleotides 62 to 67, placing these two regions only 8 nucleotides apart. However, analyses of the promoters for β-galactosidase [20], β-lactamase [21], and tyrosine tRNA [22], as well as of hybrid promoters [23], indicate that it is important to have a spacing of 16-19 nucleotides between these two consensus sequences. Consequently, we consider it unlikely that RNA polymerase can form an initiation complex with this region of DNA. A heptamer corresponding to the -10 consensus sequence except at the weakly conserved third position was located further upstream from the mature tRNA3Gly (nucleotides 31 to 37 in Fig. 3). Upstream from this -10 sequence and separated from it by 16 bp, there is a sequence similar to the -35 consensus sequence (nucleotides 9 to 14 in Fig. 3). The first two nucleotides of this sequence,
[Figure 3: nucleotide sequence of the glyW region, numbered from nucleotide 1 and ending at the EcoRI site, with the deduced amino acid sequence of the open reading frame printed beneath its codons:]
Met Lys Thr Gly Pro Leu Asn Glu Ser Glu Leu Glu Trp Leu Asp Asp Ile Leu Thr Lys Tyr Asn Thr Asp His Ala Ile Leu Asp Val Ala Glu Leu Asp Gly Leu Leu Thr Ala Val Leu Ser Ser Pro Gln Glu Ile Glu Pro Glu Gln Trp Leu Val Ala Val Trp Gly Gly Ala Asp Tyr Val Pro Arg Trp Ala Ser Glu Lys Glu Met Thr Arg Ala Leu
FIG. 3. -- DNA sequence for the antisense strand of the glyW gene and the flanking regions isolated on pPGL2019. The right end of the IS1 insertion sequence is located at nucleotide 10 as indicated by the symbol V. The nucleotides sequenced to the left of this region correspond to IS1 and are not included. The mature tRNA3Gly sequence is underlined. The -10 consensus sequences are shown in boxes. The -35 consensus sequences are underlined with dashes. Regions of dyad symmetry are shown with horizontal arrows.
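The spacing argument applied to the two candidate promoters in Figure 3 can be checked with a few lines of arithmetic. The sketch below uses only the nucleotide coordinates and the 16-19 nucleotide window quoted in the text; the function names are hypothetical and chosen for illustration.

```python
from typing import Tuple

# Inclusive (start, end) nucleotide coordinates, numbered as in Fig. 3.
Region = Tuple[int, int]


def spacer_length(minus35: Region, minus10: Region) -> int:
    """Count the nucleotides lying strictly between the -35 and -10 elements."""
    return minus10[0] - minus35[1] - 1


def spacing_ok(minus35: Region, minus10: Region, lo: int = 16, hi: int = 19) -> bool:
    """True when the spacer falls inside the 16-19 nucleotide window cited in the text."""
    return lo <= spacer_length(minus35, minus10) <= hi


# Coordinates quoted in the text for the two candidate promoters.
candidates = {
    "proximal (62-67 / 76-82)": ((62, 67), (76, 82)),
    "distal (9-14 / 31-37)": ((9, 14), (31, 37)),
}

for name, (m35, m10) in candidates.items():
    print(name, spacer_length(m35, m10), spacing_ok(m35, m10))
```

Under these coordinates the proximal pair has an 8-nucleotide spacer and fails the window, while the distal pair has a 16-nucleotide spacer and passes, which is the basis for preferring the upstream candidate.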
               -35                    -10
(a) AAAaraattGCTTGACAtttttt-tactattGTggTArAATgc---cCATcaatagat
(b) AAAAATATCGTTGACTCATCGC-GCCAGGTAAGTAGAATGCAACGCATCGAACGGC
FIG. 4. -- The nucleotide sequences of the potential glyW promoter and of a model promoter are shown for comparison. (a) The model promoter sequence of Siebenlist et al. [19] is shown with the upper case nucleotides conserved at a frequency of ~ 46 % and the lower case nucleotides corresponding to the most frequently occurring nucleotide at their respective positions in a statistical analysis of prokaryotic promoters. (b) The potential glyW promoter includes the 12 nucleotides prior to the IS1 insertion point sequenced on pPG1 as noted in the text and extends to nucleotide 53 as designated in Figure 3. Homologies with the highly conserved nucleotides of other promoters are extensive.
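The comparison in Figure 4 can be made concrete by locating the two elements in sequence (b) and listing where they depart from the consensus. This is a minimal sketch: the consensus hexamers TTGACA and TATAAT are the standard E. coli -35 and -10 sequences, the variable and function names are hypothetical, and the search strings are the elements named in the text.

```python
# Standard E. coli consensus hexamers (assumed here as the reference).
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"

# Sequence (b) of Fig. 4, with the alignment gap removed.
GLYW_PROMOTER = "AAAAATATCGTTGACTCATCGC-GCCAGGTAAGTAGAATGCAACGCATCGAACGGC".replace("-", "")


def mismatches(candidate: str, consensus: str) -> list:
    """Return the 1-based positions at which candidate differs from consensus."""
    return [i + 1 for i, (a, b) in enumerate(zip(candidate, consensus)) if a != b]


# Locate the -35 element, then the first six bases of the -10 heptamer downstream of it.
start35 = GLYW_PROMOTER.find("TTGACT")
start10 = GLYW_PROMOTER.find("TAGAAT", start35 + 6)

minus35 = GLYW_PROMOTER[start35:start35 + 6]
minus10 = GLYW_PROMOTER[start10:start10 + 6]
spacer = start10 - (start35 + 6)

print("-35:", minus35, "mismatch positions", mismatches(minus35, CONSENSUS_35))
print("-10:", minus10, "mismatch positions", mismatches(minus10, CONSENSUS_10))
print("spacer length:", spacer)
```

With these inputs the -35 element differs from the consensus only at its sixth position and the -10 element only at its third, with 16 nucleotides between them, in agreement with the description in the text.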
CCGACT, belong to the right end of IS1 rather than to the E. coli chromosomal DNA at this location. DNA sequencing studies undertaken with pPG1, which does not contain the IS1 sequence, indicated that the chromosomal sequence for the 12 nucleotides preceding the insertion point of IS1 was AAAAATATCGTT [14]. Consequently, the chromosomal sequence (TTGACT) is homologous with the -35 consensus sequence except at the weakly conserved sixth position. Homologies of this potential promoter region with the highly conserved nucleotides of other E. coli promoters are extensive, as noted in Figure 4.
A conserved G/C-rich region between the -10 consensus sequence and the proposed initiation site of stringent response promoters was proposed to be a discriminator region for stringent control [24, 25, 26]. The poor correlation of this discriminator region with our proposed glyW promoter may be related to the association of glyW expression with that of pgsA as suggested below.
A potential rho-independent transcription termination loop structure with a G/C-rich region of dyad symmetry followed by a T-rich region was found at nucleotides 215-240 (Fig. 3).
Another area of possible significance in this segment of DNA is a short, open reading frame that begins at nucleotide 608 and extends 228 nucleotides to a TGA termination codon at nucleotides 836 to 838. The initiator region is preceded by a possible ribosome binding site [27]. A potential promoter is located further upstream with the -35 and -10 consensus sequences located at nucleotides 491 to 496 and 514 to 520, respectively (Fig. 3). Two of the genes that have been mapped downstream from glyW are flbB and flaI [28]. The products of both genes are believed to be positive regulators of the other flagellar genes [29]. However, the flbB and flaI genes have recently been cloned and sequenced and the data indicate that flbB codes for a 13-kilodalton polypeptide and is located at least 1.5 kb beyond the EcoRI terminus of the fragment cloned in pPGL2019 (P. Matsumura and D. Bartlett, personal communication). The function of the 10-kilodalton polypeptide coded by the open reading frame downstream from glyW remains unknown.
The proximity of glyW to the pgsA gene and the common direction of their transcription suggest the possibility that the two genes are cotranscribed on the same precursor RNA [7, 14]. Dual-function transcripts linking tRNA genes with protein genes have been found previously [1]. glyW represents the first case in which the tRNA gene is preceded by the protein gene, implying that the tRNA may be under the control of the protein gene promoter rather than vice versa. On the other hand, sequence analysis of the pgsA gene indicated that the TGA codon at positions 59-61 (Fig. 3) is the carboxy-terminus of this gene [14]. Consequently, if glyW transcription in vivo is initiated at the potential promoter described above and if pgsA expression terminates in the region between these two genes, then the regulation of these genes is overlapping. Various types of overlapping genes were reviewed recently [30]. By either mechanism, cotranscription or transcriptional (or translational) overlap, the expression of glyW seems to be coupled to the expression of pgsA. Finally, both mechanisms may be operative under different physiological conditions, relating perhaps to the reported involvement of tRNA3Gly in the transfer of glycine to lipopolysaccharide [31].
Transcriptional analysis of the precursor RNA for the tRNA3Gly from this segment of the genome will be necessary to verify the DNA regulatory signals for glyW expression.
In a previous report [7], we suggested that overproduction of tRNA3Gly may be involved in some way in preventing the survival of cells carrying multiple copies of the pgsA-glyW region of the genome. It was observed that the problem
could be overcome either by deletion of glyW or by insertion of an extraneous DNA segment into the region. We suggested that the insertion element prevented expression of glyW. The DNA sequence determined in the present study supports that suggestion since the IS1 sequence lies within a potential promoter sequence for glyW.
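The effect of the IS1 insertion on the potential -35 element can be illustrated by rebuilding the hexamer on either side of the insertion point. The sketch below assumes the standard -35 consensus TTGACA and uses only sequences quoted in the text: the 12 chromosomal nucleotides preceding the insertion point, the two IS1-derived nucleotides (CC), and the four chromosomal nucleotides (GACT) that follow. All names are hypothetical.

```python
# Standard E. coli -35 consensus hexamer (assumed here as the reference).
CONSENSUS_35 = "TTGACA"

# Sequences quoted in the text.
CHROMOSOMAL_UPSTREAM = "AAAAATATCGTT"  # 12 nucleotides preceding the IS1 insertion point (pPG1)
IS1_RIGHT_END = "CC"                   # last two nucleotides contributed by IS1 (pPGL2019)
DOWNSTREAM = "GACT"                    # chromosomal nucleotides completing the hexamer


def matches(hexamer: str, consensus: str = CONSENSUS_35) -> int:
    """Count the positions at which hexamer agrees with the consensus."""
    return sum(a == b for a, b in zip(hexamer, consensus))


# The hexamer is the last two nucleotides of the left flank plus the four downstream ones.
without_is1 = CHROMOSOMAL_UPSTREAM[-2:] + DOWNSTREAM
with_is1 = IS1_RIGHT_END + DOWNSTREAM

print(without_is1, matches(without_is1), "of 6 positions match")
print(with_is1, matches(with_is1), "of 6 positions match")
```

On these inputs the uninterrupted chromosomal hexamer matches the consensus at five of six positions, whereas the hexamer formed at the IS1 junction matches at only three, which is consistent with the suggestion that the insertion weakens the potential glyW promoter.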
Acknowledgments
We are grateful to William Dowhan for plasmid strains, helpful discussions, and communication of results prior to publication, to Philip Matsumura and Douglas Bartlett for communication of results prior to publication, to Maurille J. Fournier for providing his manuscript prior to publication, to Walter J. Pagel for editorial consultation, and to Janie Finch and Janet Naquin for assistance in preparing the manuscript. During part of this study, S.D. Tucker was a postdoctoral trainee of the U.S. Public Health Service (grant CA-09299 from the National Cancer Institute). Portions of this work were supported by grants to E.J. Murgola from the American Cancer Society (NP-167), the National Institute of General Medical Sciences (GM21499), and the Robert A. Welch Foundation (G-966).
REFERENCES
1. Fournier, M.J. & Ozeki, H. (1985) Microbiol. Rev., (in press).
2. Bachmann, B.J. (1983) Microbiol. Rev., 47, 180-230.
3. Hill, C.W., Squires, C. & Carbon, J. (1970) J. Mol. Biol., 52, 557-569.
4. Murgola, E.J., Prather, N.E. & Hadley, K.H. (1978) J. Bacteriol., 134, 801-807.
5. Fleck, E.W. & Carbon, J. (1975) J. Bacteriol., 122, 492-501.
6. Ilgen, C., Kirk, L.L. & Carbon, J. (1976) J. Biol. Chem., 254, 922-929.
7. Tucker, S.D., Gopalakrishnan, A.S., Bollinger, R., Dowhan, W. & Murgola, E.J. (1982) J. Bacteriol., 152, 773-779.
8. Ohta, A., Waggoner, K., Radominska-Pyrek, A. & Dowhan, W. (1981) J. Bacteriol., 147, 499-562.
9. Maniatis, T., Fritsch, E.F. & Sambrook, J. (1982) in: Molecular cloning (a laboratory manual), p. 90-94, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
10. Davis, R.W., Botstein, D. & Roth, J.R. (1980) in: Advanced bacterial genetics, p. 182-183, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
11. Maxam, A.M. & Gilbert, W. (1980) Methods in Enzymol., 65, 499-560.
12. Ohtsubo, H. & Ohtsubo, E. (1978) Proc. Natl. Acad. Sci. USA, 75, 615-619.
13. Johnsrud, L. (1979) Molec. Gen. Genet., 169, 213-218.
14. Gopalakrishnan, A.S., Chen, Y.C., Temkin, M. & Dowhan, W. (1985) J. Biol. Chem., (in press).
15. Milkman, R. & Crawford, I.P. (1983) Science, 221, 378-380.
16. Squires, C. & Carbon, J. (1971) Nature New Biol., 233, 274-277.
17. Hawley, D.K. & McClure, W.R. (1983) Nucleic Acids Res., 11, 2237-2255.
18. Rosenberg, M. & Court, D. (1979) Ann. Rev. Genet., 13, 319-353.
19. Siebenlist, U., Simpson, R. & Gilbert, W. (1980) Cell, 20, 269-281.
20. Mandecki, W. & Reznikoff, W.S. (1982) Nucleic Acids Res., 10, 903-912.
21. Jaurin, B., Grundström, T., Edlund, T. & Normark, S. (1981) Nature (London), 290, 221-225.
22. Berman, M.L. & Landy, A. (1979) Proc. Natl. Acad. Sci. USA, 76, 4303-4307.
23. Russell, D.R. & Bennett, G.N. (1982) Gene, 20, 231-243.
24. Travers, A.A. (1984) Nucleic Acids Res., 12, 2605-2618.
25. Lamond, A.I. & Travers, A.A. (1985) Cell, 40, 319-326.
26. Mizushima-Sugano, J. & Kaziro, Y. (1985) EMBO J., 4, 1053-1058.
27. Stormo, G.D., Schneider, T.D. & Gold, L.M. (1982) Nucleic Acids Res., 10, 2971-2996.
28. Komeda, Y., Kazuhiro, K. & Iino, T. (1980) Genetics, 94, 277-290.
29. Komeda, Y. (1982) J. Bacteriol., 150, 16-26.
30. Normark, S., Bergström, S., Edlund, T., Grundström, T., Jaurin, B., Lindberg, F.P. & Olsson, O. (1983) Ann. Rev. Genet., 17, 499-525.
31. Gentner, N. & Berg, P. (1971) Fed. Proc., 30, 1218.