Biochimica et Biophysica Acta, 950 (1988) 303-312 Elsevier
BBA 91835
Molecular cloning and nucleotide sequence of Thermus thermophilus HB8 trpE and trpG
Showbu Sato a, Yukari Nakada a, Shigenori Kanaya b and Teruo Tanaka a
a Mitsubishi-Kasei Institute of Life Sciences, Tokyo, and b Bioscience Laboratory, Research Center, Mitsubishi Chemical Industries, Yokohama (Japan)
(Received 5 March 1988)
Key words: trpE; trpG; Nucleotide sequence; Anthranilate synthase; Molecular cloning; (T. thermophilus HB8)
The trpE gene of Thermus thermophilus HB8 was cloned by complementation of an Escherichia coli tryptophan auxotroph. E. coli harboring the cloned gene produced anthranilate synthase I, which was heat-stable and enzymatically active at high temperature. The nucleotide sequence of the trpE gene and its flanking regions was determined. The trpE gene was preceded by an attenuator-like structure and followed by the trpG gene, with a short gap between them. No other gene essential for tryptophan biosynthesis was observed after the trpG gene. The amino-acid sequences of the T. thermophilus anthranilate synthase I and II deduced from the nucleotide sequence were compared with those of other organisms.
Introduction
Bacteria belonging to the genus Thermus proliferate optimally above 70°C [1]. Some of their genes, for instance the leuB gene [2-4] and tuf gene [5] of T. thermophilus and the genes encoding malate dehydrogenase of Thermus flavus [6,7] and lactate dehydrogenase of Thermus caldolyticus [8], have been cloned in E. coli and their DNA sequences have been elucidated. However, there are few studies on the regulatory regions of Thermus genes except those for the leucine genes [9]. It would be of interest to study a regulatory gene of Thermus whose DNA is characteristic of high
The sequence data in this paper have been submitted to the EMBL/Genbank Data Libraries under accession number X07744. Abbreviations: AS, anthranilate synthase (EC 4.1.3.27); ORF, open reading frame; RBS, ribosome-binding sequence. Correspondence: S. Sato, Mitsubishi-Kasei Institute of Life Sciences, 11 Minamiooya, Machida-shi, Tokyo, 194, Japan.
guanine plus cytosine content, because regulatory sequences such as a promoter and a transcription terminator are comparatively rich in adenine and thymine in E. coli. The tryptophan operon of E. coli has become one of the best examples of gene structure and function operation [10,11]. Since the regulatory regions of the tryptophan operon precede trpE gene in most bacteria, we have tried to clone the trpE gene of Thermus. In this paper, we present the cloning of the trpE gene of T. thermophilus HB8 and its nucleotide sequence. We also describe the nucleotide sequence of the trpG gene following the trpE gene.
Materials and Methods
Strains, plasmids and media Thermus thermophilus HB8 (ATCC27634) was kindly given by Dr. T. Oshima. E. coli JA221 (hsdR, trpE5, leuB6, lacY, recA1) was kindly provided by Dr. K. Nakamura. E. coli 3110 (trpA9952), E. coli 3110 (trpB9759), E. coli 3110 (trpC9941) and E. coli 3110 (trpD9778) were described by Dr. F. Imamoto. E. coli W3110 (thy)
0167-4781/88/$03.50 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)
was used as a control strain of a tryptophan prototroph. pBR322 and pUC13 were used as cloning vectors. LB was used as rich medium, and M9CA was used when an E. coli tryptophan auxotroph harboring the cloned gene was grown. Bacto-tryptone, yeast extract and casamino acids (vitamin-free) were products of Difco. Drug concentrations were 37.5 μg/ml for ampicillin and 12.5 μg/ml for tetracycline. Rich growth medium and synthetic medium for T. thermophilus HB8 were used as described by Oshima et al. [1] and Tanaka et al. [3], respectively.
DNA isolation and transformation Chromosomal DNA of T. thermophilus HB8 was prepared by the method of Saito and Miura [12] from cells grown in rich medium at 75°C. Plasmid DNAs were isolated by the alkaline lysis method for mini-preparation, and by the SDS/lysozyme method followed by centrifugation in an ethidium bromide-CsCl gradient for large-scale preparation. E. coli was transformed by the CaCl2 method [13].
DNA manipulations All restriction enzymes were obtained from Toyobo, Tokyo. Phage T4 DNA ligase, calf intestine phosphatase, phage T4 polynucleotide kinase and E. coli DNA polymerase were obtained from Boehringer-Mannheim Biochemicals. E. coli DNA polymerase large fragment (the Klenow enzyme) was a product of Takara Shuzo, Kyoto. Enzyme reactions and other manipulations were carried out as described by Maniatis et al. [14].
Cloning of the T. thermophilus HB8 trpE gene T. thermophilus HB8 chromosomal DNA was partially digested with Sau3AI and fractionated on a 10-40% sucrose density gradient. The DNA fragments of 5-10 kb (1 μg) were ligated to BamHI-digested and dephosphorylated pBR322 (1 μg) in 100 μl ligase buffer with T4 DNA ligase (10 units) at 16°C overnight, and aliquots were directly used to transform E. coli JA221. Trp+ transformants were selected on an agar plate of M9CA supplemented with ampicillin.
Enzyme assays Bacteria grown to stationary phase were harvested, washed with 50 mM Tris-HCl (pH 8.0), suspended in 1/50 vol. of 20 mM Tris-HCl (pH 8.0), 10 mM 2-mercaptoethanol, 5 mM EDTA and lysed with egg-white lysozyme (2 mg/ml). After a brief sonication to reduce the viscosity, the cell debris was removed by centrifugation. The enzymatic activity of anthranilate synthase I and II (ASI and ASII) was assayed according to a slight modification of the procedure described by Egan and Gibson [15]. The ASI activity was measured using chorismate and NH4Cl as substrates, and the ASII activity was determined as the ability to form anthranilate in the presence of glutamine and the ASI fraction. The initial rate of anthranilate formation was measured using a Shimadzu fluorospectrophotometer RF502. The excitation and emission wavelengths used were 310 nm and 390 nm, respectively. The enzymatic activity capable of forming 1 nmol per min was defined as 1 unit. Anthranilate phosphoribosyltransferase was also assayed according to the method of Egan and Gibson [15]. The amount of protein was determined with the Bio-Rad protein assay reagent with bovine globulin as standard.
DNA sequencing The plasmid pBW1 was digested with PvuII and self-ligated to construct plasmid pBW11 (see Fig. 1). pBW11 was cut with PvuII, ligated with phosphorylated EcoRI linker (Pharmacia) and digested with EcoRI. The fragment containing the T. thermophilus chromosomal DNA fragment of about 3.7 kb was ligated to EcoRI-digested pUC13 to make pUW11. Twenty-three deletion plasmids of increasing length were prepared from pUW11 by the method described by Frischauf et al. [16], using the XbaI site. All the deletion plasmids were digested with XbaI and labeled at the 5' end by using [γ-32P]ATP (5000 Ci/mmol, Amersham) and T4 polynucleotide kinase, or at the 3' end by using [α-32P]dCTP (3000 Ci/mmol, Amersham) and the Klenow enzyme.
Each labeled DNA was digested with PstI, gel-filtered on a small column of Sephadex G-50 to remove the small labeled fragment and subjected to the method of Maxam and Gilbert [17]. The electrophoresis was carried out on 8% acrylamide in 8.3 M urea.
Results and Discussion
Isolation of T. thermophilus HB8 trpE gene Two Trp+ Amp^r transformants were isolated out of about 10^4 Amp^r Tet^s transformants. Plasmid DNA from the two transformants gave identical banding patterns on agarose electrophoresis after digestion with several restriction enzymes. The plasmid, which had an insert of about 5.2 kb, was designated pBW1. When pBW1 was used to re-transform E. coli JA221, all the Amp^r transformants could grow on an M9CA/Amp plate. The restriction map of pBW1 is shown in Fig. 1. A smaller plasmid (pBW11) was constructed by self-ligation of the large fragment of PvuII-digested pBW1. It was capable of complementing E. coli JA221 in M9CA. pBW1 was unable to complement the other tryptophan auxotrophs, E. coli 3110 (trpA9952), 3110 (trpB9759), 3110 (trpC9941) and 3110 (trpD9778).
Characterization of the gene product Anthranilate synthase I (ASI) activity was assayed in the crude extract of cells of E. coli JA221 harboring pBW1 (E. coli JA221 [pBW1]) and compared with that of E. coli W3110 (thy) and T. thermophilus HB8. The ASI activity of T. thermophilus HB8 and E. coli JA221 [pBW1] showed similar temperature dependency, with a maximum around 60°C and still high activity around 80°C. On the other hand, E. coli W3110 (thy) had little activity above 60°C (Fig. 2). The crude extract of the cells was heated at 70°C for 10 min and the supernatant was assayed for ASI activity after centrifugation. The ASI activity of E. coli JA221 [pBW1] remained at 90% of its original level after the heat treatment, and the specific activity increased 15-fold due to the precipitation of most of the E. coli cell proteins. On the other hand, the enzyme activity of the crude extract of E. coli W3110 (thy) was completely lost by the heat treatment. The above results confirm that the trpE gene of T. thermophilus was cloned in pBW1 and that its gene product was expressed in E. coli. There was no activity of anthranilate phosphoribosyltransferase in the heat-treated extract of E. coli JA221 [pBW1]. The thermostable ASI was purified to homogeneity from cells of E. coli JA221 [pBW1] by a procedure including heat treatment, ammonium sulfate fractionation, DEAE-cellulose, phenyl-Sepharose and gel filtration on Sephacryl S-300 (the detailed purification procedure will be published elsewhere). The molecular mass of the ASI was
Fig. 1. Restriction maps of pBW1 and pBW11. pBW1 has about 5.2 kb of inserted DNA at the BamHI site of pBR322. One of the BamHI sites between pBR322 and the insert was abolished. pBW11 was constructed by self-ligation of the PvuII large fragment of pBW1. Thin and bold lines indicate DNA from pBR322 and the chromosomal DNA fragment of T. thermophilus HB8, respectively.
The nucleotide sequence of the T. thermophilus HB8 DNA cloned in pBW11
Fig. 2. Temperature dependency of the enzymatic activity of anthranilate synthase I. The cells of T. thermophilus HB8 were grown in the synthetic medium devoid of tryptophan and those of E. coli JA221 [pBW1] in M9CA medium supplemented with ampicillin. E. coli W3110 (thy) grown in M9CA supplemented with thymidine was used as a source of the ASI of E. coli. The enzymatic activity of the crude extract was measured at different temperatures: (○), E. coli JA221 [pBW1]; (△), T. thermophilus HB8; (□), E. coli W3110 (thy). The enzymatic activity of the crude extract of E. coli JA221 [pBW1], T. thermophilus HB8 and E. coli W3110 (thy) was 0.13, 0.05 and 0.64 unit/mg protein, respectively, measured at 37°C.
about 110 kDa as determined by gel filtration on Sephacryl S-300 and around 50 kDa as determined by SDS-polyacrylamide gel electrophoresis (SDS-PAGE). The heat-stable anthranilate synthase II activity accompanying ASI was separated from ASI at the phenyl-Sepharose step. It had a molecular mass of about 20 kDa on SDS-PAGE. The partial amino-acid sequence of the amino-terminal region of the purified ASI was determined to be Met-X-X-Ile-Arg-Pro-Tyr-Arg-Lys by Edman degradation on a gas-phase automated sequencer. The carboxy-terminal amino acid was leucine, as determined by amino-acid analysis of carboxypeptidase A digests of the intact protein.
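The heat-treatment figures reported in this section (90% of the ASI activity retained, a 15-fold rise in specific activity) imply how much protein the step removed. The following is a minimal Python sketch of that arithmetic, not part of the original methods; the function names are illustrative, and the 0.13 unit/mg starting value is the 37°C figure from the legend of Fig. 2, used here only as an assumed starting point.

```python
def specific_activity(units: float, protein_mg: float) -> float:
    """Specific activity in unit/mg (1 unit = 1 nmol anthranilate formed per min)."""
    return units / protein_mg


def protein_fraction_remaining(activity_yield: float, fold_purification: float) -> float:
    """Fraction of the starting protein left after one purification step.

    fold = (yield * U0 / (f * P0)) / (U0 / P0) = yield / f, hence f = yield / fold.
    """
    return activity_yield / fold_purification


# Values reported in the text for the 70 degC, 10 min heat step.
remaining = protein_fraction_remaining(activity_yield=0.90, fold_purification=15.0)
print(f"protein remaining after heat step: {remaining:.1%}")

# Assumption: crude-extract value of 0.13 unit/mg (Fig. 2 legend, measured at 37 degC).
after_heat = 0.13 * 15.0
print(f"specific activity after heat step: {after_heat:.2f} unit/mg")
```

Under these assumptions only about 6% of the soluble E. coli protein survives the heat step, which is consistent with the statement that most host proteins precipitate.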
From the analysis of both strands of the 23 deletion plasmids of pUW11, the nucleotide sequence was determined over the 3708 bp of T. thermophilus HB8 DNA cloned on pUW11. The distribution of initiation and termination codons for both strands is shown in Fig. 3. The longest open reading frame (ORF) is 1386 bp in frame 1, which extends from 169 to 1554. This ORF has the potential to encode a protein of 462 amino-acid residues, which corresponds to the molecular mass of the thermostable ASI (50 kDa). With a gap of 45 bp, this ORF (trpE gene) is followed by a 612 bp ORF, which was concluded to be trpG, since its translated amino-acid sequence had homology with the known amino-acid sequence of the trpG product of Serratia marcescens (see below) and since ASII partially purified from the lysate of E. coli JA221 [pBW1] had a molecular mass of about 20 kDa. There was no ORF longer than 300 nucleotides in the region downstream of trpG. No other tryptophan gene seems to be linked to trpE and trpG. The organization of the tryptophan genes of Thermus is different from that of the enteric bacteria and seems to be of the Group I Pseudomonad or Rhizobium type [18]. The nucleotide sequence of the trpE and trpG genes and their flanking regions and the deduced amino-acid sequences of ASI and ASII are shown in Fig. 4. The partial amino-acid sequence of the N-terminal region and the C-terminal amino acid of the ASI were as expected from the nucleotide sequence. The sequence GGGGG located 5 bases upstream from the initiation codon is probably a ribosome-binding sequence (RBS) for trpE. The sequence encoding a putative leader peptide was found between 97 and 65 bases upstream from the initiation codon of trpE. It consists of 11 amino acids, including two tandem tryptophan residues.
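The coordinates quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that the quoted ORF coordinates exclude the stop codon (an assumption that makes the 45-bp gap end exactly before the first trpG ATG at 1603, quoted later in the text), and it uses a rule-of-thumb mean residue mass of 110 Da, which is not a value from the paper.

```python
def span_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive coordinate range."""
    return end - start + 1


AVG_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb mean mass per residue

# trpE ORF as given in the text (positions 169 to 1554).
trpE_bp = span_length(169, 1554)
trpE_codons = trpE_bp // 3
trpE_mass_kda = trpE_codons * AVG_RESIDUE_MASS_DA / 1000

# Assumption: the stop codon occupies the 3 bases after the ORF and the
# 45-bp gap follows it, so trpG begins on the next base.
trpG_start = 1554 + 3 + 45 + 1
trpG_bp = span_length(trpG_start, 2214)

print(trpE_bp, trpE_codons, round(trpE_mass_kda, 1))
print(trpG_start, trpG_bp, trpG_bp // 3)
```

The estimated mass of roughly 51 kDa for 462 residues agrees with the value of around 50 kDa observed on SDS-PAGE.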
On a survey of the sequence upstream of the putative leader peptide, the sequences 35 and 10 bases upstream of its initiation codon seem to be most similar to the canonical E. coli promoter [19]. The -35 and -10 sequences of this putative promoter are TGGACA and TATCCT, respectively. They could presumably function as a promoter for E. coli RNA polymerase. However, if they function as a promoter, the transcript is
Fig. 3. Location of initiation and termination codons in the T. thermophilus HB8 DNA fragment (3708 bp) on pBW11. The vertical lines above and under the bold lines indicate initiation codons (ATG or GTG) and termination codons (TAA, TAG or TGA), respectively. (A) The strand in the same direction as in Fig. 1; (B) its complementary strand.
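A map of the kind shown in Fig. 3 can be produced by scanning each of the three reading frames on both strands for initiation and termination codons. The following Python sketch shows one way to do this; it is not the authors' procedure, the function names are hypothetical, and the 19-nt test sequence is invented for illustration.

```python
START_CODONS = {"ATG", "GTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def codon_positions(seq: str, frame: int):
    """Return (starts, stops): 1-based positions of start/stop codons in one frame (0-2)."""
    starts, stops = [], []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in START_CODONS:
            starts.append(i + 1)
        elif codon in STOP_CODONS:
            stops.append(i + 1)
    return starts, stops


def scan_both_strands(seq: str):
    """Map (strand label, frame number 1-3) to (starts, stops), as in panels A and B."""
    seq = seq.upper()
    result = {}
    for label, strand in (("A", seq), ("B", reverse_complement(seq))):
        for frame in range(3):
            result[(label, frame + 1)] = codon_positions(strand, frame)
    return result


toy = "CCATGGAGCGGATCTGACC"  # hypothetical sequence, not from the paper
for key, (starts, stops) in scan_both_strands(toy).items():
    print(key, "starts:", starts, "stops:", stops)
```

With this numbering a codon beginning at position 169 falls in frame 1, matching the frame assignment given in the text for the trpE ORF. Positions on strand B are counted from the 5' end of the complementary strand.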
expected to start at the A of the initiation codon of the putative leader peptide and, if it is translated, translation must start without an RBS. It is of interest to determine the starting point of transcription and to obtain evidence for translation of the leader peptide. Recently, Croft et al. [9] proposed the nucleotide sequence of the putative control region for the leuB, leuC and leuD genes of the same organism. The transcript for the leucine genes is expected to start at the A of the initiation codon of the putative leader peptide (15 amino acids), similarly to the 5'-flanking region of the trpE gene. Between the sequence encoding the putative leader peptide and the initiation codon of the trpE gene, there are three repeats of the sequence CCCCGGGG, and the last CCCCGGGG is followed by a T-rich region, TTTGTTTTT. These structures may allow the transcript leading trpE to form alternative hairpin structures and to function as an attenuator, as already demonstrated for the leader transcript of the tryptophan operon of E. coli. The trpE is followed by trpG, which starts with ATG at either 1603 or 1636 and
extends to 2214. It is uncertain at present whether trpG starts from the first or the second ATG, since protein sequence data for trpG are not available. A typical E. coli RBS precedes the first ATG with a spacing of 10 nucleotides, while a purine cluster of 14 nucleotides exists two bases upstream from the second ATG. In any case, there is a gap between the trpE and trpG genes, which is in contrast with the juncture between trpE and the following gene observed in other bacteria. We tentatively assumed that the trpG gene starts at the first ATG.
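Two sequence features discussed above lend themselves to a quick check: the similarity of the putative -35 and -10 hexamers to the E. coli consensus, and the self-complementarity of the CCCCGGGG repeat that would permit alternative hairpins. The Python sketch below is illustrative; the consensus hexamers TTGACA and TATAAT are the standard E. coli values, and the helper names are hypothetical.

```python
CONSENSUS_35 = "TTGACA"  # canonical E. coli -35 hexamer
CONSENSUS_10 = "TATAAT"  # canonical E. coli -10 hexamer


def matches(a: str, b: str) -> int:
    """Number of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_self_complementary(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


print(matches("TGGACA", CONSENSUS_35), "of 6 match at -35")
print(matches("TATCCT", CONSENSUS_10), "of 6 match at -10")
print(is_self_complementary("CCCCGGGG"))
```

Because CCCCGGGG is its own reverse complement, any copy of the repeat in the transcript can in principle pair with any other copy, which is the property that makes mutually exclusive stem structures possible.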
Codon usage of the trpE and trpG The guanine plus cytosine (G + C) contents of the coding regions of trpE and trpG are 69.1% and 68.0%, respectively, which are much the same as that of the T. thermophilus HB8 whole genome (69%). The codon frequencies of the trpE and trpG of T. thermophilus HB8 are shown in Table I. The codon usage is strongly biased. Out of 61 sense codons, 16 and 18 codons are not used in the trpE and trpG, respectively. The G + C contents of the
GGATCCGGGCCCT~G~GGG~GGCCCC~TAGCC~CTGGACA~G~C~GTGTCCCG~A~CC~GAGGCCATG~CCC~TCCC~CC~C~ -35 -10 M A L P S A L ~rpL 180
TCTGGTGGCCCGGCTAGGCCCCG$GGCGGGAGGCCT~TCCCC~GGCACACCCCGGGGCTTTGTTTTTGGGGGACGGCATGGAGCGGATC WWPG-
~
~
MERI
t ~rpE270 ~GA~TTA~CAAAA~CTT~T~G~GGA~TGGAGACC~CGGTGAC~G~TACCTGAAGCTTGC~GAGAAGGCTCCGGTGAGCTTCCTT R P Y R K T F L A O L E T P V T A Y L K L A E K A p V S F L 360 TTGGAGTcGGTGGAG•GGGGGCGc•AAAG••G•TT•T••AT•GT•G•GGTGGGGG•G•GG•G•A•cTT••G•cTGAAGGACGGGGT•TT• L E S V E R G R 0 S R F S I V G V G A R R T F R L K D G V F 450 ACGGTGAACGGGGAGCGGGTGGAAACCCGTGATCCCTTGCGCGCCCTCTACGAGAGGGTCTACGCCCCCTTGGAGCGCCACCCCGACCTC T V N G E R V E T R D P L R A L Y E R V Y A P L E R H P D L 540 cCCCCCTTCTTCGGCGGGGTGGTGGGCTACGCCGCCTACGACCTCGTCCGCTACTACGAAAGGCTTCCGAGCCTCAAGCCCGACGACCTC P P F F G G V V G Y A A Y D L V R Y Y E R L P S L K P D D L 630 GGCCTCCCCGACCTCCTCTTCGTGGAGCCCGAGGTGGTGGCCGTCTTTGACCACCTGAAGAAC•TCCTCCACCTCGTGGCCCCAGGGAGG G L P D L L F V E P E V V A V F D H L K N L L H L V A P G R 720 GACCCCGAGGAGGCGGAGGCCCGcCTCTTTTGGGcGGAGAGGCGGCTCAAGGGCCcCTTGCCCGGGGTGCCGGGGGAGAGGGCGGGGGGG D P E E A E A R L F W A E R R L K G P L P G V P G E R A G G 810 AGGGCCCGCTTCCAGGCGGA•TTTTC•CGGGAGGCCTACCTGGAGGCGGTGAGGAGGGCCCTGGA•TACATCCGGGCGGGGGACATCTTC R A R F 0 A D F S R E A Y L E A V R R A L D y [ R A G O [ F 900 CAGGTGGTCCTCT~CTTGAGGCTCTCCTCCCCCCTCA~CGTCCACCCCTTCGCCCTCTACCGG~GCTGAGGAGCGTGAACCCGAGCCCC 0 V V L S L R L S S P L T V H P F A L Y R A L R S V N P S P 990 TACATGGGCTACCTGGACCTGGGGGAGGTGGTCTTGGTCTCGGCGAGCCcGGAAAGCCTCCTCCGCTCGGACGGCCGAAGGGTGGTCAC• Y M G Y L D L G E V V L V S A S P E S L L R S D G R R V V T 1080
CGG~CCAT~G~GGGCA~GAGG~CGAGGG~GAAG~A~GAG~AGGAGGACAAAAGG~TTG~GAGGAGCT~CTTAGGGACGAGAAGGAGGT~ R P
I
A G T R P R G K D E E E D K R L
A E E L L
R D E K E V 1170 GCGGAGCACGTGATGCTTCTGGACCTCTCCCGCAACGACATCGGCCGGGTCGcCGCCTTCGGCACGGTGCGGGTCCTCGAGCCCCTCCAC A E H V M L L D L S R N D [ G R V A A F G T V R V L E P L H 1260 •TG•AGCACTACTC•CA•GTGATGCACCTG•TC•••ACGG•G•AGGG•AT•T•GG••GAGGGGAAGACCCCC•TGGA•G•••TGG••AG• V E H Y S H V M H L V S T V E G I L A E G K T P L D A L A S 1350 GTGCTGC~CATGGGGACGGT~T~GGGGCCC~G~GATCC~;~CAT~GAGA~ATT~AAGAACTGG~GCCCCACCGCCG~G~GCCCTAC V L P M G T V S G A P K I R A M E l I E E L E P H R R G P Y 1440
~GG~G~G~TT~GGCTACCTCGCCTA~GA~SG~GCCATGGACATGGCC~TcAC~T~CGCACCT~CGT~GTGGCGAAGGG~TG3ATGCAC G G S F G Y L A Y D G A M D M A L T L R T F V V A K G W M H 1530
~TCCA~C~GCG~AT~GTG~G~T~GGTGC~G=~"~TACGAGGA~TGCTG~AACAA~GCGCG~`~CTCC~CAAGGCG V 0 A G A G I V A D S V P E R E Y E E C W N K A R A L L K A 1620
~T6GAgATG~GGAGG~GGCTGTGATCCCACC~ATGCCGG~AGg~GC~CGGTAAGGAGG~CTGGTAGGCATGGCTGCTAACGGAGCG V E M A E A G L M A A N G A ~:rpG 1710 ~G~G~A~AGGTTATGAGGGTCTT~GTGGTGGAC~AC~A~A~CTTCAC~TAC~AC~TGGTGCAGTA~CTGG~GAGCT~gSG~G K G R K V M R V L V V D N Y D S F T Y N L V 0 Y L G E L G A 1800
GAGCCCAT•G•GTGGCGGAA•GAC•GCTTCC•GCTGGAGGAGGTGGAG•CCCTGGACCCGGACCGGATCCTCATCAGCCCGGGGCC•TGC E P [ V W R N D R F R L E E V E A L D P D R ] L I S P G P C 1890
A•CC••TTTGAGGCGGGGCTTTCCGTCCCCTTGGTCCAGCGCTACGCCC•CCGCTACCC•ATCCTGGGGGTCTGC•TCGGACACCAGGCC T P F E A G L S V P L V 0 R Y A P R Y P i L G V C L G H 0 A Fig. 4. See facing page.
1980
ATCGGGG~CTTC~GGGG~GGTGGT~CCG~CC~TCCTCATGC~GGC~GGTGA~CCCATCCACCACGAC~CACCGGGGTC I
G A A F G G K V V P A P V L M H G K V S P
I
H H D G T G V 2070
TTCCGGGG~T~ATA~CCCTTCCCCGC~CCCG~TACcACTC~TGGCGGTGGTGGAGGTGCCGGAGG~TCGTGGTGAACGCCTGG F R G L D S P F P A T R Y H S L A V V E V P E A L V V N A W 216o
~GGAGGAG~GGGGGGGCGGACGGTGATGGG~TTCCGCCACC~GACT~CCCACCCACGGGGTGCAGTT~A~CCGGAAAGCTACCTT A E E A G G R T V M G F R H R O Y P T H G V 0 F H P E S Y L 2250
ACGGAGGCGGGT~ACTCATCCTCAAGAACTTCCTGGAGGA~CCATGGACGCGGTG~GAAGGCCATTCTGGGCGAGGTTTTGGAGGAAG T E A G K L I
L K N F L E D P W T R 2340
AGGAGGCCTACGAGGTCATGCGGGCCCTGATGGCGGGGGAGGTcT•CCCGGTGCGGGCGGCGGGGCTT•TGGTGGCCT•GAGCCTGAGGG
Fig. 4. Nucleotide sequence of the trpE and trpG genes and their deduced amino-acid sequences. Numbering of the nucleotides starts from the BamHI site at the junction between pBR322 and T. thermophilus DNA. The lines under the nucleotide sequence indicate putative -35 and -10 regions of the promoter. Inverted repeats in the region upstream from trpE are indicated by converged arrows,
first, the second and the third letter in the trpE gene are 68.9%, 45.4% and 93.0%, respectively, and those in the trpG are 68.5%, 43.6% and 94.3%, respectively. Kagawa et al. [4] indicated high G + C content in the third letter of codons in the leuB gene of the same organism. The G + C content of the third letter in the trpE and trpG genes is higher by 4-5% than that in the leuB gene, while that of the second letter is lower by 4-5% than that in the leuB gene. A difference in codon usage was also found between trpE and trpG. AGG (Arg) and GGC (Gly) occupy 35% and 30% of synonymous codons, respectively, in the trpE, but 7% and 10%, respectively, in the trpG gene.
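The G + C content at each codon position, as discussed above, is obtained by splitting a coding sequence into its first, second and third letters. The sketch below shows the calculation in Python; the function name is hypothetical, and the 12-nt input is only the first four codons of trpE (Met-Glu-Arg-Ile) as they appear in Fig. 4, so the percentages it prints are illustrative and are not the whole-gene values.

```python
def gc_by_codon_position(cds: str):
    """Return the G + C percentage at codon positions 1, 2 and 3 of a coding sequence."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence must be a non-empty multiple of 3 nt")
    n_codons = len(cds) // 3
    percentages = []
    for pos in range(3):
        letters = cds[pos::3]  # every third base, starting at this codon position
        gc = sum(base in "GC" for base in letters)
        percentages.append(100.0 * gc / n_codons)
    return tuple(percentages)


first_codons = "ATGGAGCGGATC"  # ATG GAG CGG ATC, the start of trpE in Fig. 4
print(gc_by_codon_position(first_codons))
```

Even in this short window the third position is entirely G or C while the second is the lowest, the same ordering as the whole-gene figures reported for trpE and trpG.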
The amino-acid sequence of the anthranilate synthase I and II (ASI and ASII) of T. thermophilus HB8 The amino-acid sequence of ASI of T. thermophilus deduced from the nucleotide sequence was aligned together with that of E. coli [20] and B. subtilis [21] for maximal homology (Fig. 5A). The
TABLE I
CODON USAGE OF ANTHRANILATE SYNTHASES I AND II (ASI AND ASII) OF THERMUS THERMOPHILUS HB8
t, termination codons.

Codon aa   ASI ASII   Codon aa   ASI ASII   Codon aa   ASI ASII   Codon aa   ASI ASII
UUU   Phe    3    1   UCU   Ser    0    0   UAU   Tyr    0    0   UGU   Cys    0    0
UUC   Phe   14    8   UCC   Ser    9    2   UAC   Tyr   18    8   UGC   Cys    1    2
UUA   Leu    0    0   UCA   Ser    0    0   UAA   t      0    0   UGA   t      1    1
UUG   Leu    7    2   UCG   Ser    4    0   UAG   t      0    0   UGG   Trp    3    3
CUU   Leu    6    2   CCU   Pro    1    1   CAU   His    0    0   CGU   Arg    1    0
CUC   Leu   28    7   CCC   Pro   20   11   CAC   His   11    8   CGC   Arg   14    5
CUA   Leu    0    1   CCA   Pro    1    1   CAA   Gln    1    0   CGA   Arg    2    0
CUG   Leu   17    7   CCG   Pro    9    4   CAG   Gln    3    4   CGG   Arg   14    7
AUU   Ile    1    0   ACU   Thr    0    0   AAU   Asn    0    0   AGU   Ser    0    0
AUC   Ile   10    7   ACC   Thr   10    5   AAC   Asn    5    6   AGC   Ser    9    5
AUA   Ile    0    0   ACA   Thr    0    0   AAA   Lys    2    1   AGA   Arg    0    1
AUG   Met   10    4   ACG   Thr    5    3   AAG   Lys   13    5   AGG   Arg   16    1
GUU   Val    0    1   GCU   Ala    1    2   GAU   Asp    1    1   GGU   Gly    0    1
GUC   Val   15    7   GCC   Ala   24    8   GAC   Asp   23    8   GGC   Gly   11    3
GUA   Val    0    0   GCA   Ala    0    0   GAA   Glu    5    1   GGA   Gly    1    2
GUG   Val   30   14   GCG   Ala   21    8   GAG   Glu   38   12   GGG   Gly   24   14
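A codon-usage table of this kind is a tally of the triplets in a coding sequence, from which the number of unused sense codons and the share of each codon within its synonymous family follow directly. The Python sketch below illustrates the bookkeeping under stated assumptions: the helper names are hypothetical, DNA letters (T rather than U) are used, and the 18-nt input is an invented example, not a sequence from the paper.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
ALL_CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
SENSE_CODONS = [c for c in ALL_CODONS if c not in STOP_CODONS]
ARG_CODONS = ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]


def codon_usage(cds: str) -> Counter:
    """Count each codon in a coding sequence read in frame from its first base."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def unused_sense_codons(usage: Counter):
    """Sense codons that never occur in the tallied sequence."""
    return [c for c in SENSE_CODONS if usage[c] == 0]


def synonymous_share(usage: Counter, codon: str, family) -> float:
    """Fraction of a synonymous family's occurrences contributed by one codon."""
    total = sum(usage[c] for c in family)
    return usage[codon] / total if total else 0.0


toy_cds = "ATGGAGCGGATCCGGAGG"  # hypothetical: ATG GAG CGG ATC CGG AGG
usage = codon_usage(toy_cds)
print(len(unused_sense_codons(usage)), "sense codons unused")
print(round(synonymous_share(usage, "AGG", ARG_CODONS), 3), "share of AGG among Arg codons")
```

Applied to the full trpE and trpG coding sequences, the same tally would give the per-codon counts, the unused-codon totals and the synonymous shares discussed in the text.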
A, 20
E.coli , B.subtilis
40
MERIRPYRKTFLADLETPVTAYLKLAEKAPVSFLLESVE .......
T.thermophilus
MQTQKPTLELLTCEGAYRDNPTALFHQLCGDRPATLLLESADIDSKDDL MNFQSNISAFLEDSLSHHTIPIVETFTVDTLTPIQMIEKLDR 60
80
100
"RGRQSRFSIVGVGARR-TFRLKDGVFTVNGE--RVETR-DPL-RALYERVYAPL KSLLLVDS--ALRITA-LGDTVTIQALSGNGEALLALLDNALPAGVESEQSPNCRVLRFPPVSPL EITYLLESKDDTSTWSRYSFIGLNPFL-TIKEEQGRFSAADQDSKSLYTGNELKEVLNWMNTTYK 120
140
160
ERHPDLP-PFFGGVVGYAA
YDLVRYYERLPSLKPDDLGLPDLLF
LDEDARLCSLSVFDAFRLLQNLLNVPKEEREAMFFSGLFSYDLVAGFEDLPQLSAENNC-PDFCF IKTPELGIPFVGGAVGYLSYDMIPLIEPSVPSHTKETDMEKCMLFVCRTLIAYDHETKN-VHFIQ •1 8 0
200
220
240
VEPEVVAVFDHLKNLLHLVAPGRDPEEAEA-RLFWAERRLK
-GPT,PGVPGERAGGRARF
YLAETLMVIDHQKKSTRIQASLFAPNEEEKQRLTARLNELRQQLTEAAPPLPVV YARLTGEETKNEKMDVFHQNHL ...... ELQNLIEKMMDQKNIKELFLSADSYK 260
QADFSREAYLEAVRRALDY
280
IRAGDIFQVVLSLRLS
.... S V P H M R C .... T P S F E T V 300
S PLTVHPFALYRALRSVNPSPYMGYLDLGE
ECNQSDEEFGGVVRLLQKAIRAGEI FQVVPSRRFSLPCPSPLAA- YYVLKKSNPSPYMFFMQDND SSNYEKSAFMADVEKI KSY IKAGDI FQGVLSQKFEVP IKADAFELYRVLR IVNPSPYMYYMKLLD 320
340
¢VLVSASPEST, LRSDGRRVVT--RP FTLFGASPESSLKYDATSRQIE REIVGSSPERLIHVQDGHLE-380
IAGTRPRGK
360
DEEEDKRLAE~-LLRDEKEVA~.HVML
I YP I A G T R P R G R R A D G S L D R D L D S R I E L E M R T D H K E L S E H L M L IHPIAGTRKRGA ...... DKAEDERLKVELMKDEKEKAEHYML 400
LDLSRNDIGRVAAFGTVRVLEPLHVW.HYSHVMHLVSTVEG
420
ILAEGKTPLDAL- - -ASVLPMGTVS
VDLARNDLAR ICTPGSRYVADLTKVDRYSYVMHLVSRVVGELRHD- -LDALHAYRACMNMGTLS VDLARNDIGRVAEYGSVSVPEFTKIVSFSHVMHI ISVVTGRLKKG- - -VHPVDALMSAFPAGTLT 440
460
480
GAPKIRAME I IEELEPHRRGPYGGSFGYLAYDGAMDMALTLRTFVVAKGWMHVQAGAG
500
IVADSVP
GAPKVRAMQLIAEAEGRRRGSYGGAVGYFTAHGDLDTCIVIRSALVENGIATVQAGAGVVLDSVP GAPKIRAMQLLQELEPTPRETYGGCIAYIGFDGNIDSCITIRTMSVKNGVASIQAGAGIVADSVP Fig. 5A. See facing page.
B
20
T.thermophilus
MADILLLDNVDSFTYNLVDQLRASGHQVVIYRNQIGA MADILLLDNIDSFTYNLADQLRSNGHNVVIYRNDIPA MADILLLDNIDSFTWNLADQLRSNGHNVVIYRNDIPA
S.marcescens
E.coli S.typhimurium S.cerevisiae
40
MAANGAKGRKVMRVLVVDNCDSFTYNLVQYLGELG~PI_VWRNDRFR
MSVHAATNPINKHWLIDNYDSFTWNVYEYLCQEGAKVSVYRNDAIT
60 80 , 100 L EEV E AL .... D P DR I L I S PG PCTP F EAG LSVP L VQR Y A- P R YP I LGVCLGHQA IGAAFGGKVVP
EVIIERLQHMEQPV-LMLSPGPGTPSEAGCMPELLQRLR-GQLPIIGICLGHQAIVEAYGGQVGQ QTLIERLATMSNPV-LMLSPGPGVPSEAGCMPELLTRLT-GKLPIIGICLGAQAIVEAYGGYVGQ QTLIDRLATMKNPV-LMLSPGPGVPSEAGCMPELLTRLR-GKLPIIGICLGHQAIVEAYGGYVGQ VPEIAAL .... NPDTLLISPGPGHPKTDSGISRDCIRYFTGKIPVFGICMGQQCMFDVFGGEVAY 120 140 160 APVLMHGKVSPIHHDGTGVFRGLDSPFPATRYHSLAVl-VEVPEALVVNAWAEAAGGRTVMGFRH AGEILHGKASAIAHDGEGMFAGMANPLPVARYHSLVG--SNIPADLTVNAR-FGEM---VMAVRD AGEILHGKASSIEHDGOAMFAGLTNPLPVARYHSLVG--SNIPAGLTINAH-FNGM---VMAVRH AGEILHGKASSIEHDGOAMFAGLANPLPVARYHSLVG--SNVPAGLTINAH-FNGM---VMAVRH AGEIVHGKTSPISHDNCGIFRNVPOGIAVTRYHSLAYTESSLPSCLKVTASTENGI---IMGVRH .180
,
200
RDYPTHgVQFHPESYLTEAGKLiLKNFLEDPW~'R DRRRVCGFQFHPESILTTHGARLLEQTLAWALAK DADRVCGFQFHPESILTTHQARLLEQTLAWALQH/ DADRVCGFQFHPESILTTHQARLLEQTLAWALAK/ KKYTVEGVQFHPESILTEEGHLMIRNIPNVSGGT/
Fig. 5. Comparison of the amino-acid sequences of anthranilate synthase I and II (ASI and ASII) of T. thermophilus with those of other organisms. (A) Comparison of T. thermophilus ASI with the E. coli [20] and B. subtilis [21] enzymes. Amino-acid residues matching with both and with either of them are doubly and singly underlined, respectively. (B) Comparison of the trpG product of T. thermophilus with that of S. marcescens [23], the N-terminal one-third of the trpD product of E. coli [23] and S. typhimurium [23], and that of the trp3 product of S. cerevisiae [24]. Amino-acid residues matching with all four and with any of them are doubly and singly underlined, respectively. The asterisks indicate the residues essential for the enzymatic activity [25,26].
ASI of T. thermophilus is 50-60 residues shorter than that of E. coli and B. subtilis. The deletion mainly occurs in the first half of the sequence. Out of the 462 residues of the T. thermophilus ASI, about 160 are identical with the E. coli enzyme or the B. subtilis enzyme. The amino-acid residues conserved over the three enzymes amount to 103, and most of them are distributed in the second half of the molecules. Tso and Zalkin [22] suggested from chemical modification studies on the S. marcescens enzyme that histidine and arginine residues were involved in its enzymatic activity. Two histidine and nine arginine residues are found commonly in the three enzymes (His-381 and -417; Arg-349, -351, -366, -389, -395, -465, -468 and -491). They also suggested that one of the cysteine residues was necessary for the enzymatic activity. However, there is no cysteine residue conserved over the three sequences. The T. thermophilus ASI has 1 cysteine, 4 glutamines and 5 asparagines, while the E. coli enzyme has 12 cysteines, 21 glutamines and 17 asparagines and the B. subtilis enzyme has 5 cysteines, 15 glutamines and 17 asparagines. The low content of such residues is one of the features of enzymes from extremely thermophilic bacteria. The amino-acid sequence of ASII of T. thermophilus is also compared with ASII of S. marcescens [23], the N-terminal one-third of the trpD product of E. coli [20] and S. typhimurium [23], and that of the trp3 gene product of S. cerevisiae [24]. As shown in Fig. 5B, the sequence of T. thermophilus ASII has some homology to that of the other enzymes. The homologous residues are distributed over the whole molecule. Cys-84 and His-170, which were demonstrated by site-directed mutagenesis to be essential for the enzymatic activity of the S. marcescens enzyme [25,26], also exist in the T. thermophilus enzyme.
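The identity counts quoted above come from comparing aligned sequences column by column. The Python sketch below illustrates that comparison on a 10-residue window beginning at the GAPK motif, as read from the alignment in Fig. 5A; the extracted figure text may be imperfect, so the window should be treated as an illustrative input, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def conserved_columns(*rows: str) -> int:
    """Number of alignment columns where every sequence has the same residue (no gaps)."""
    return sum(len(set(col)) == 1 and "-" not in col for col in zip(*rows))


# Assumed 10-residue window from Fig. 5A (illustrative only).
tth = "GAPKIRAMEI"  # T. thermophilus
eco = "GAPKVRAMQL"  # E. coli
bsu = "GAPKIRAMQL"  # B. subtilis

print(percent_identity(tth, eco))
print(percent_identity(tth, bsu))
print(conserved_columns(tth, eco, bsu))
```

Run over the full alignment rather than this short window, the same two functions would yield the pairwise identities and the count of residues conserved in all three enzymes.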
Acknowledgements
The authors wish to express their thanks to Dr. T. Uchida for helpful discussion during this work. The work was supported by a special fund of the Science and Technology Agency, Japan.
References
1 Oshima, T. and Imahori, K. (1974) Int. J. Syst. Bacteriol. 24, 102-112.
2 Nagahari, K., Koshikawa, T. and Sakaguchi, K. (1980) Gene 10, 137-145.
3 Tanaka, T., Kawano, K. and Oshima, T. (1981) J. Biochem. (Tokyo) 89, 677-682.
4 Kagawa, Y., Nojima, H., Nukikawa, N., Ishizuka, M., Nakajima, T., Yasuhara, T., Tanaka, T. and Oshima, T. (1984) J. Biol. Chem. 259, 2956-2960.
5 Kushiro, A., Shimizu, M. and Tomita, K. (1987) Eur. J. Biochem. 170, 93-98.
6 Iijima, S., Uozumi, T. and Beppu, T. (1986) Agric. Biol. Chem. 50, 589-592.
7 Nishiyama, M., Matsubara, N., Yamamoto, K., Iijima, S., Uozumi, T. and Beppu, T. (1986) J. Biol. Chem. 261, 14178-14183.
8 Kunai, K., Machida, M., Matsuzawa, H. and Ohta, T. (1986) Eur. J. Biochem. 160, 433-440.
9 Croft, J.E., Love, D.R. and Bergquist, P.L. (1987) Mol. Gen. Genet. 210, 490-497.
10 Kolter, R. and Yanofsky, C. (1982) Annu. Rev. Genet. 16, 113-134.
11 Platt, T. (1983) Cell 24, 10-23.
12 Saito, H. and Miura, K. (1963) Biochim. Biophys. Acta 72, 619-629.
13 Davis, L.G., Dibner, M.D. and Battey, J.F. (1986) Methods in Molecular Biology, Elsevier, Amsterdam.
14 Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, NY.
15 Egan, A.F. and Gibson, F. (1970) Methods Enzymol. 17A, 380-386.
16 Frischauf, A.M., Garoff, H. and Lehrach, H. (1980) Nucleic Acids Res. 8, 5541-5549.
17 Maxam, A.M. and Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
18 Crawford, I.P. (1987) Methods Enzymol. 142, 293-306.
19 Hawley, D.K. and McClure, W.R. (1983) Nucleic Acids Res. 11, 2237-2255.
20 Nichols, B.P., Van Cleemput, M. and Yanofsky, C. (1981) J. Mol. Biol. 146, 46-54.
21 Henner, D.J., Band, L. and Shimotsu, H. (1984) Gene 34, 169-177.
22 Tso, J.K. and Zalkin, H. (1981) J. Biol. Chem. 256, 9901-9908.
23 Nichols, B.P., Miozzari, G.F., Van Cleemput, M., Bennett, G.N. and Yanofsky, C. (1980) J. Mol. Biol. 142, 503-517.
24 Zalkin, H., Paluh, J.L., Van Cleemput, M., Moye, W.S. and Yanofsky, C. (1984) J. Biol. Chem. 259, 3985-3992.
25 Paluh, J.L., Zalkin, H., Betsch, D. and Weith, H.L. (1985) J. Biol. Chem. 260, 1889-1894.
26 Amuro, N., Paluh, J.L. and Zalkin, H. (1985) J. Biol. Chem. 260, 14844-14849.