Gene, 171 (1996) 131-132 © 1996 Elsevier Science B.V. All rights reserved. 0378-1119/96/$15.00
GENE 09670
Sequences of the Salmonella typhimurium mglA and mglC genes *
(Galactose transport; permease; mgl operon; homologies)
Lola V. Stamm, Natalie R. Young and Jonathan G. Frye**
Program in Infectious Diseases, Department of Epidemiology, School of Public Health, University of North Carolina, Chapel Hill, NC 27599, USA
Received by J. Wild: 22 November 1995; Accepted: 29 December 1995; Received at publishers: 26 January 1996
SUMMARY
The nucleotide sequences of the mglA and mglC genes of Salmonella typhimurium (St) LT2 have been determined. The deduced amino acid (aa) sequences of MglA and MglC are 506 and 302 aa long, with predicted molecular masses of 56 484 and 31 551 Da, respectively. The aa sequences of St MglA and MglC are homologous to the corresponding Mgl proteins of Escherichia coli, Haemophilus influenzae, Treponema pallidum and Mycoplasma genitalium. The order of the St mgl operon is mglBAC.
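As a rough plausibility check on the predicted molecular masses, the sketch below sums standard average residue masses and adds one water molecule for the free termini. The function names `average_mass` and `mean_residue_mass` and the toy peptide are illustrative assumptions, not taken from this note, and the method the authors used to calculate the masses is not stated.

```python
# Standard average residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one H2O per chain (N-terminal H plus C-terminal OH)


def average_mass(seq):
    """Average molecular mass (Da) of an unmodified peptide.

    Raises KeyError for any character that is not a standard one-letter code.
    """
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def mean_residue_mass(total_mass, length):
    """Mean mass per residue (Da), ignoring the terminal water."""
    return total_mass / length


if __name__ == "__main__":
    # Hypothetical toy peptide, used only to exercise the function.
    print(f"toy peptide: {average_mass('MKTAYIAK'):.1f} Da")
    # Lengths and masses as reported in the Summary.
    for name, (length, mass) in {"MglA": (506, 56484), "MglC": (302, 31551)}.items():
        print(f"{name}: {mean_residue_mass(mass, length):.1f} Da per residue")
```

Dividing the reported masses by the sequence lengths gives roughly 111.6 and 104.5 Da per residue, close to the commonly quoted average of about 110 Da per residue.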
Correspondence to: Dr. L.V. Stamm, Program in Infectious Diseases, Department of Epidemiology, School of Public Health, University of North Carolina, 242 Rosenau Hall, CB#7400, Chapel Hill, NC 27599-7400, USA. Tel. (1-919) 966-3882; Fax (1-919) 966-2089; e-mail: [email protected]
* On request, the authors will supply detailed experimental evidence for the conclusions reached in this Brief Note.
** Present address: Department of Microbiology, University of Georgia, Athens, GA 30605, USA. Tel. (1-706) 542-4112.
Abbreviations: aa, amino acid(s); bp, base pair(s); Ec, Escherichia coli; Hi, Haemophilus influenzae; kb, kilobase(s) or 1000 bp; Mg, Mycoplasma genitalium; nt, nucleotide(s); ORF, open reading frame; RBS, ribosome-binding site(s); St, Salmonella typhimurium; Tp, Treponema pallidum.
PII S0378-1119(96)00072-8

In Escherichia coli (Ec), high-affinity galactose transport is accomplished by the interaction of three proteins: MglB (galactose-binding protein), MglA (ATP-binding protein) and MglC (transmembrane pore) (Hogg et al., 1991). The genes encoding these proteins comprise the Ec mgl operon, whose order of transcription is mglBAC. Müller et al. (1985) proposed that the Salmonella typhimurium (St) mgl operon contains a fourth gene (mglE) located between mglA and mglC. Benner-Luger and Boos (1988) determined the nt sequence of St LT2 mglB and the first 132 nt of mglA. Here we report the complete nt sequences of St mglA and mglC. These genes comprise the remainder of the St mgl operon.

Plasmid pHG30 contains a 6.3-kb EcoRI St genomic DNA fragment that encodes the entire mgl operon (Müller et al., 1985). Analysis of the nt sequence reveals ORF1, which is located 137 nt after the mglB stop codon (TAA) (Fig. 1). ORF1 encodes a 506-aa protein with a calculated Mr of 56 484 Da. A putative RBS (GGT) was identified 6 nt upstream from the ATG start codon. The aa sequence of ORF1 contains two ATP-binding motifs (aa 46-53, Walker A, and aa [illegible]-133, Walker B) (Walker et al., 1982; Hyde et al., 1990) and a peptide linker region (aa [illegible], ATP-binding protein family signature) (Ames, 1992). The hydrophobicity profile of the protein has no distinct stretches of highly hydrophobic aa, suggesting that it is not an integral membrane protein. The deduced aa sequence of ORF1 is 95, 73.9, 51.4, and 33.3% identical to the MglA proteins of Ec (P23199), Hi (U32764), Tp (L.V.S., unpublished) and Mg (U39691), respectively. Additionally, St MglA has 41 and 40% identity with Ec RbsA (P04983) and AraG (P08531), respectively, the equivalent proteins of the ribose and arabinose transport systems.

A second ORF (ORF2) was identified beginning 14 nt after the TAA stop codon of mglA (Fig. 1). ORF2 encodes a 302-aa protein (31 551 Da). A putative RBS (AGGGGCT) was identified 4 nt from the ATG start codon. A stem-loop structure typical of Rho-independent terminators (Brendel and Trifonov, 1984) is located
downstream from ORF2. A Kyte-Doolittle hydropathy plot of the protein encoded by ORF2 indicated that it is highly hydrophobic. ALOM analysis predicted the presence of at least seven transmembrane domains. The C terminus of the protein contains a hydrophilic segment (aa 218-237, EAA motif) that is conserved in hydrophobic membrane proteins of bacterial binding-protein-dependent transport systems (Saurin et al., 1994). The deduced aa sequence of ORF2 is 93.7, 74.5, and 37.9% identical to the MglC proteins of Ec (P23200), Hi (U32764), and Tp (L.V.S., unpublished), respectively. St MglC has 36.5 and 34.5% aa identity with Ec RbsC (P04984) and AraH (P08532), respectively.

We did not find a fourth gene (mglE) between St mglA and mglC. The nt sequence downstream from St mglC contains an ORF that encodes a homolog of dihydropyrimidine dehydrogenase. This ORF is transcribed in the opposite direction from that of the St mgl operon. Our nt sequence data indicate that the St mgl operon consists of three genes in the order mglBAC.

[Fig. 1: nt sequence (nt 1-2700) with the deduced aa sequences of ORF1 (mglA) and ORF2 (mglC); sequence listing not reproduced.]
Fig. 1. The nt sequences of the St mglA and mglC genes and the deduced aa sequences. The nt sequence of the pHG30 DNA insert was determined using the Taq DyeDeoxy Terminator cycle sequencing technique. The putative start and stop codons are in bold, the RBS are underlined, and the downstream hairpin loop (Rho-independent terminator) is indicated by facing arrows. The aa sequences corresponding to the ATP-binding sites (aa 46, Walker A, and aa [illegible], Walker B) and the peptide linker (aa [illegible]) of MglA, and the EAA motif (aa [illegible]) of MglC are italicized and underlined. GenBank accession No. U40492.
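The Kyte-Doolittle hydropathy analysis referred to in the text for the ORF2 product can be sketched as a sliding-window average over the published hydropathy index. This is a minimal illustration, not the authors' procedure: the 19-residue window and the 1.6 cutoff are the rule of thumb commonly attributed to Kyte and Doolittle for membrane-spanning segments, the function names are hypothetical, and the test sequences are artificial rather than MglA or MglC.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full-length window along seq.

    Element i of the result covers residues i .. i + window - 1 (0-based).
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]  # KeyError on non-standard residues
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window one residue to the right.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_membrane_windows(profile, threshold=1.6):
    """Start indices of windows whose mean hydropathy reaches the threshold."""
    return [i for i, value in enumerate(profile) if value >= threshold]


if __name__ == "__main__":
    # Artificial sequence: a hydrophobic stretch followed by a charged stretch.
    toy = "L" * 19 + "D" * 19
    profile = hydropathy_profile(toy)
    print(candidate_membrane_windows(profile))
```

A protein such as the ORF1 product, described above as lacking distinct hydrophobic stretches, would be expected to yield few or no windows above the cutoff, whereas a polytopic membrane protein would yield several separated clusters.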
ACKNOWLEDGEMENTS
We thank W. Boos for pHG30. This research was supported by National Institutes of Health Grants 5 U01 AI31496 and 2 U01 AI31496.
REFERENCES
Ames, G.F.-L.: Bacterial periplasmic permeases as model systems for the superfamily of traffic ATPases, including the multidrug resistance protein and the cystic fibrosis transmembrane conductance regulator. Int. Rev. Cytol. 137A (1992) 1-35.
Benner-Luger, D. and Boos, W.: The mglB sequence of Salmonella typhimurium LT2; promoter analysis by gene fusions and evidence for a divergently oriented gene coding for the mgl repressor. Mol. Gen. Genet. 214 (1988) 579-587.
Brendel, V. and Trifonov, E.N.: A computer algorithm for testing potential prokaryotic terminators. Nucleic Acids Res. 12 (1984) 4411-4427.
Hogg, R.W., Voelker, C. and von Carlowitz, I.: Nucleotide sequence and analysis of the mgl operon of Escherichia coli K12. Mol. Gen. Genet. 229 (1991) 453-459.
Hyde, S.C., Emsley, P., Hartshorn, M.J., Mimmack, M.M., Gileadi, U., Pearce, S.R., Gallagher, M.P., Gill, D.R., Hubbard, R.E. and Higgins, C.F.: Structural model of ATP-binding proteins associated with cystic fibrosis, multidrug resistance and bacterial transport. Nature (London) 346 (1990) 362-365.
Müller, N., Heine, H.-G. and Boos, W.: Characterization of the Salmonella typhimurium mgl operon and its gene products. J. Bacteriol. 163 (1985) 37-45.
Saurin, W., Köster, W. and Dassa, E.: Bacterial binding protein-dependent permeases: characterization of distinctive signatures for functionally related integral cytoplasmic membrane proteins. Mol. Microbiol. 12 (1994) 993-1004.
Walker, J.E., Saraste, M., Runswick, M.J. and Gay, N.J.: Distantly related sequences in the α- and β-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold. EMBO J. 1 (1982) 945-951.