J. Mol. Biol. (1982) 156, 245-256

Nucleotide Sequence of the trpD Gene, encoding Anthranilate Synthetase Component II of Escherichia coli

HEIDI HOROWITZ, GAIL E. CHRISTIE† AND TERRY PLATT

Department of Molecular Biophysics and Biochemistry, Yale University, 233 Cedar Street, New Haven, Conn. 06510, U.S.A.

(Received 24 August 1981)
We have completed the nucleotide sequence determination of trpD, the second structural gene of the tryptophan (trp) operon in Escherichia coli, which encodes anthranilate synthetase component II. This bifunctional protein carries the glutamine amidotransferase and phosphoribosyl-anthranilate transferase activities within the proximal one-third and the distal two-thirds of the polypeptide, respectively. The entire trpD gene consists of 1593 nucleotides encoding 531 amino acids; the sole tryptophan residue is located at position 189 of the amino acid sequence. Several sites predicted on the basis of genetic evidence have been localized within the sequence. These include a chi site for recombination at base-pairs 654 to 661 and the trp-p2 promoter, located within a region from about 1400 to 1440 of the nucleotide sequence. The trpD sequence we present here completes the sequence determination of the entire trp operon of E. coli.
† Present address: Department of Molecular Biology, University of California, Berkeley, Calif. 94720, U.S.A.
0022-2836/82/100245-12 $03.00/0 © 1982 Academic Press Inc. (London) Ltd.

1. Introduction

The tryptophan (trp) operon of Escherichia coli consists of five contiguous genes preceded by a region of control elements (Fig. 1). As a well-characterized genetic and biochemical system, this operon has already provided a great deal of information about regulatory phenomena in prokaryotes, including the interactions of the promoter-operator region with RNA polymerase and trp repressor, the mechanisms of attenuation and termination, and the role of physiological effectors such as tryptophan (see Platt, 1978; Crawford & Stauffer, 1980). More detailed studies at the molecular level depend, in part, on a knowledge of the primary structure of the genes and proteins; the information we present here represents the completion of the nucleotide sequence determination for the entire trp operon of E. coli. This will facilitate the study of this operon as a single entity, and will permit more detailed studies that enhance our general understanding of operon structure and function.

FIG. 1. The E. coli trp operon. The 5 structural genes (E through A) are preceded by a transcribed leader region (L) and followed by a region specifying termination of transcription (t and t'). The regulatory elements are the promoter (p), operator (o), attenuator (a), p2 promoter (p2) and two terminators (t and t'). The size in nucleotides is shown for trpL (Bertrand et al., 1974), trpE (Nichols et al., 1981), trpD (Nichols et al., 1980; this paper), trpC (Christie & Platt, 1980), trpB (Crawford et al., 1980), trpA (Nichols & Yanofsky, 1979) and the untranslated end of the transcript (Wu & Platt, 1978). The lengths of intercistronic regions are shown above the line: trpED (Nichols et al., 1980), trpDC and trpCB (Christie & Platt, 1980), and trpBA (Platt & Yanofsky, 1975).

The second gene of the operon, trpD, codes for a bifunctional protein that is part of the first enzyme complex unique to tryptophan biosynthesis. This anthranilate synthetase complex is a tetramer composed of two subunits each of component I and component II, the trpE and trpD gene products, respectively (Ito & Yanofsky, 1966). The complex catalyzes two successive reactions.

(1) The anthranilate synthetase reaction. In the presence of glutamine, chorismate is converted to anthranilate. The first third of the trpD protein contributes the glutamine amidotransferase activity to the trpE-trpD complex (Jackson & Yanofsky, 1974). Alternatively, the reaction can occur with ammonia as the amino donor, in which case only the trpE polypeptide is required.

(2) The phosphoribosyl anthranilate transferase function. The enzyme complex transfers 5-phosphoribosyl-1-pyrophosphate to the anthranilate molecule, converting it to N-5'-phosphoribosyl-anthranilate. The distal two-thirds of the trpD protein catalyzes this reaction, whether it is complexed with the trpE polypeptide or not (Miozzari & Yanofsky, 1979).

Subsequent reactions, catalyzed by the trpC polypeptide and a trpB-trpA enzyme complex, ultimately convert the phosphoribosyl-anthranilate to L-tryptophan.

In contrast to the bifunctional trpD polypeptide found in E. coli (and similar proteins in Shigella dysenteriae and Salmonella typhimurium), Serratia marcescens possesses two separate polypeptides to carry out the reactions described above (Robb et al., 1971; Zalkin & Hwang, 1971; Hutchinson & Belser, 1969). Amino acid sequence homology between the amino termini of the E. coli trpD and the S. marcescens trpG proteins (Li et al., 1974) supports the hypothesis that the bifunctional trpD polypeptide of E. coli arose by fusion of the two separate genes (Miozzari & Yanofsky, 1979). A comparison of the nucleotide sequence of the trpG-trpD punctuation region in S. marcescens with the analogous region in E. coli suggests a sequence of base changes and deletions that could have generated the fusion (Miozzari & Yanofsky, 1979).
A fusion event would explain why the two functions of the trpD protein reside in separate domains of the protein. There is evidence that these two domains are actually physically separated in the three-dimensional conformation of the protein. Mild proteolytic digestion of the anthranilate synthetase complex produces an ASaseII‡ fragment that retains the GATase, but not the PRTase activity (Li et al., 1974). Deletion mutants that lack the first third of the trpD gene retain the PRTase activity (Jackson & Yanofsky, 1974).

‡ Abbreviations used: ASase, anthranilate synthetase; GATase, glutamine amidotransferase; PRTase, phosphoribosyl anthranilate transferase; PR-anthranilate, phosphoribosyl-anthranilate.
The existence of a secondary promoter within the tryptophan operon of S. typhimurium was first postulated by Bauerle & Margolin (1967), and was later observed in E. coli and other enteric bacteria (Morse & Yanofsky, 1968; Largen & Belser, 1973). Genetic data indicated that the p2 promoter of E. coli resides within the distal portion of the trpD gene (Morse & Yanofsky, 1968; Jackson & Yanofsky, 1972). The presence of this internal promoter in the operon may provide an additional level of control through which the regulation of tryptophan biosynthesis can be achieved. We hoped that the sequence determination of the gene might facilitate the specific localization of p2, thereby enabling further studies of its function.
2. Materials and Methods

(a) Plasmids and enzymes

The plasmid pGC21 (Christie & Platt, 1980) contains a 2.4 × 10³ base-pair HindIII fragment carrying the distal 4/5 of the trpD gene and all of trpC cloned into the single HindIII site of pACYC177 (Chang & Cohen, 1978). Restriction endonucleases were purchased from commercial suppliers (Bethesda Research Laboratories, New England Biolabs).

(b) Restriction mapping

The PvuII-BglII fragment contained within the trpD sequence was end-labeled at both 5' termini with ³²P using [γ-³²P]ATP (synthesized by the method of Glynn & Chappell, 1964) and phage T4 polynucleotide kinase (Boehringer-Mannheim). This doubly labeled fragment was cut in two at the unique HinfI site, generating products of 406 and 438 base-pairs. Both halves were isolated from a polyacrylamide gel after electrophoresis, and restriction mapping was carried out by partial endonuclease digestion of each end-labeled fragment (Smith & Birnstiel, 1976).

(c) DNA sequence analysis

The DNA sequence was determined by the method of Maxam & Gilbert (1980). Computer analysis of the DNA sequence was carried out using the program of Queen & Korn (1980).
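The mapping steps above amount to locating recognition sites in a sequence and computing the lengths of the resulting fragments. The following is a minimal sketch of that bookkeeping, not the program of Queen & Korn (1980). The recognition sequences are the standard specificities of the enzymes named in this paper and are not stated in the text; the helper names `find_sites` and `fragment_lengths` and the toy sequence are hypothetical.

```python
import re

# IUPAC codes needed for the recognition sequences below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "W": "[AT]"}

# Standard recognition sequences (assumed; not given in the paper itself).
SITES = {
    "HindIII": "AAGCTT",
    "PvuII": "CAGCTG",
    "BglII": "AGATCT",
    "HinfI": "GANTC",
    "HpaII": "CCGG",
    "RsaI": "GTAC",
    "BstNI": "CCWGG",
}

def find_sites(seq, enzyme):
    """Return the 1-based start of every recognition site for the enzyme."""
    pattern = "".join(IUPAC[base] for base in SITES[enzyme])
    # A lookahead reports overlapping occurrences as well.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq.upper())]

def fragment_lengths(total_length, cut_positions):
    """Lengths of the linear pieces left after cutting at the given offsets."""
    bounds = [0] + sorted(cut_positions) + [total_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Toy sequence (invented) carrying one PvuII, one HinfI and one BglII site.
toy = "GGCAGCTGTTGAATCAAGATCTCC"
print(find_sites(toy, "PvuII"))   # [3]
print(find_sites(toy, "HinfI"))   # [11]
print(find_sites(toy, "BglII"))   # [17]

# A single cut in an 844 base-pair piece reproduces the two products
# quoted in the text (406 and 438 base-pairs).
print(fragment_lengths(844, [406]))  # [406, 438]
```

Positions here are recognition-site starts rather than exact cleavage points, which is sufficient for a map at the resolution of Figure 2.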
3. Results

(a) Nucleotide sequence

A detailed restriction map of the trpD gene is shown in Figure 2. The sequence of the 5' end of the trpD gene, up to base-pair 663, was determined previously (Nichols et al., 1980; Miozzari & Yanofsky, 1979). The PvuII and BglII sites end-labeled in the restriction mapping are located at base-pairs 661 and 1510, respectively. Earlier observations located the RsaI and HinfI sites distal to the BglII site end-labeled in the restriction mapping. The sequencing strategy and extent of sequence determination from each 5'-labeled site is shown in Figure 3. Sequence determination was done on both strands in over 70% of the region extending from the PvuII site to the end of the gene. In those cases where sequence determination involved only one strand, different fragments and/or multiple runs of sequencing with the same fragment were performed. The sequences spanning all restriction sites used for sequencing were determined from overlapping fragments so that no information was lost as a result of a small restriction fragment being unnoticed. The complete nucleotide sequence of trpD is shown in Figure 4. The gene contains 1596 nucleotides, including the termination codon.

FIG. 2. Restriction endonuclease cleavage map of E. coli trpD. Nucleotides are numbered from the first nucleotide of the trpD initiation codon.

FIG. 3. Sequencing strategy for the PRTase region. Arrows indicate the direction and extent of sequence obtained from the labeled end of each fragment (indicated by the base of the arrow). Broken lines indicate regions of an end-labeled fragment from which sequence was not obtained.
TABLE 1
Codon frequency in E. coli trpD

Phe TTT  8    Ser TCT  4    Tyr TAT  8    Cys TGT  4
Phe TTC  7    Ser TCC  5    Tyr TAC  5    Cys TGC  1
Leu TTA  9    Ser TCA  2    End TAA  1    End TGA  0
Leu TTG  6    Ser TCG  3    End TAG  0    Trp TGG  1

Leu CTT  4    Pro CCT  1    His CAT 12    Arg CGT  7
Leu CTC  8    Pro CCC  3    His CAC 11    Arg CGC 16
Leu CTA  2    Pro CCA  2    Gln CAA 15    Arg CGA  1
Leu CTG 37    Pro CCG 23    Gln CAG 11    Arg CGG  0

Ile ATT 20    Thr ACT  1    Asn AAT  8    Ser AGT  7
Ile ATC  6    Thr ACC 15    Asn AAC 15    Ser AGC 10
Ile ATA  1    Thr ACA  5    Lys AAA 11    Arg AGA  1
Met ATG 13    Thr ACG  4    Lys AAG  3    Arg AGG  0

Val GTT  7    Ala GCT  6    Asp GAT 12    Gly GGT 15
Val GTC  8    Ala GCC 19    Asp GAC  9    Gly GGC 18
Val GTA  3    Ala GCA 10    Glu GAA 22    Gly GGA  5
Val GTG 15    Ala GCG 31    Glu GAG  7    Gly GGG  8
FIG. 4. Nucleotide sequence and predicted amino acid sequence of E. coli trpD. Nucleotides are numbered from the beginning of the trpD initiation codon. The sequence of the first third of the gene up to base-pair 663 was determined previously (Nichols et al., 1980; Miozzari & Yanofsky, 1979).
(b) Codon usage

The frequency of codon usage in trpD (Table 1) displays a number of biases prevalent among E. coli proteins. For example, in those cases where an amino acid is encoded by a set of triplets degenerate in the last base, there is a significant absence of adenine bases in this position (Berger, 1978; Elton et al., 1976). In trpD, of the 51 leucine residues coded for by CTX, in only two cases is the third base adenine. Other general trends that have been noted before (Elton et al., 1976; Berger, 1978) are also apparent in trpD:

(1) The predominance of the AAA codon for lysine residues: 11 out of 14 of the trpD lysine residues are coded for by AAA.
(2) The predominance of the ACC codon for threonine: 15 out of 25 of the threonine residues in trpD are coded for by ACC.
(3) The predominance of the CUG codon for leucine: 37 out of the 66 leucine residues are coded for by CUG, though there are six possible codons for this residue.
(4) The rarity of the AGA and AGG codons for arginine: out of 25 arginine residues, one is AGA, none is AGG, and the rest are CGX (with 16 being CGC).
(5) The rarity of the AUA codon for isoleucine: out of 27 isoleucine residues in trpD, only one is AUA, while 20 are coded for by AUU.
(6) The extensive use of GGPy for glycine: out of 46 glycine residues, 33 are coded for by GGPy.
(7) The preferred use of GAA for glutamic acid: 22 out of the 29 glutamic acid residues in trpD are coded for by GAA.

Some biases in trpD differ from the general situation:

(1) Of the four possible proline codons, 23 out of the 29 proline residues in trpD are coded for by CCG.
(2) There are four possible valine codons, but 15 of the 33 valine residues in trpD are coded for by GUG. This is contrary to the trend noted by Elton et al. (1976), who found that valine was coded for predominantly by GUU. The predominance of the GUG codon for valine is also found in the trpE and trpC genes (Nichols et al., 1980; Christie & Platt, 1980).
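A tally of this kind can be reproduced from any coding sequence by counting in-frame triplets and then comparing codons within a family. The sketch below is illustrative only; the function names `codon_counts` and `third_base_share` and the short toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter

def codon_counts(coding_seq):
    """Count in-frame triplets of a coding sequence read from its first base."""
    seq = coding_seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))

def third_base_share(counts, prefix):
    """Fraction of each third base among codons sharing a 2-base prefix."""
    family = {codon: n for codon, n in counts.items() if codon.startswith(prefix)}
    total = sum(family.values())
    return {codon[2]: n / total for codon, n in family.items()} if total else {}

# Toy reading frame (invented): ATG CTG CTG CTA AAA TAA
counts = codon_counts("ATGCTGCTGCTAAAATAA")
print(counts["CTG"], counts["CTA"])   # 2 1
print(third_base_share(counts, "CT"))
```

Applied to the CTX leucine family of trpD, the same calculation gives the adenine share quoted in the text, 2 out of 51.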
The correct translational phase for the protein has already been established (Nichols et al., 1980) and is consistent with the 61 residues of the N terminus of the trpD protein determined by amino acid analysis (Li et al., 1974). The complete deduced amino acid sequence of trpD is shown in Figure 4. The nucleotide sequence contains one continuous reading frame for 1593 base-pairs. The amino acid composition as predicted from the DNA sequence is shown in Table 2 and agrees well with experimentally determined amino acid frequencies derived from averaged (or extrapolated) values from three separate acid hydrolysates of the trpD protein (T. Platt, unpublished results). There is only one tryptophan codon in the entire gene, and it is located at position 189 in the amino acid sequence.

(d) Sequence phenomena

Genetic studies have revealed that there is a recombinational hotspot within the trpD gene (Triman et al., 1982). A computer search for a chi site, an octameric sequence known to be associated with regions of DNA that have increased recombinational activity (Sprague et al., 1978), located one such region at positions 654 to 661; the sequence in the coding strand is C-C-A-C-C-A-G-C.
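A search of this kind can be sketched as a scan of both strands for the chi octamer. The octamer G-C-T-G-G-T-G-G is the one quoted in the Discussion (Smith et al., 1981); the function names and the toy sequence are hypothetical, and this is not the program actually used by the authors.

```python
CHI = "GCTGGTGG"  # chi octamer as quoted in the Discussion

def reverse_complement(seq):
    """Reverse complement of a DNA string made of A, C, G and T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_chi(seq):
    """Return (start, end, orientation) for each chi octamer, 1-based inclusive.

    "+" means the octamer itself is on the given strand; "-" means its
    complement is, so the octamer lies on the opposite strand.
    """
    seq = seq.upper()
    hits = []
    for motif, orientation in ((CHI, "+"), (reverse_complement(CHI), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start + 1, start + len(motif), orientation))
            start = seq.find(motif, start + 1)
    return sorted(hits)

print(reverse_complement(CHI))  # CCACCAGC

# Toy sequence (invented) with one site in each orientation.
print(find_chi("AAACCACCAGCTTGCTGGTGG"))  # [(4, 11, '-'), (14, 21, '+')]
```

Reporting the orientation matters here because, as discussed later, chi stimulates recombination preferentially on one side of the octamer.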
TABLE 2
Amino acid composition of trpD protein

                  No. residues
Amino acid    Observed    Predicted
Ala           65.?        66
Arg           25.7        25
Asx           45.2        44
Cys           -            5
Glx           58.8        55
Gly           45.5        46
His           21.5        23
Ile           27.0        27
Leu           68.1        66
Lys           15.8        14
Met           12.7        13
Phe           15.7        15
Pro           30.3        29
Ser           29.7        31
Thr           25.8        25
Trp           -            1
Tyr           13.5        13
Val           33.3        33

Observed values are compared with those predicted by the nucleotide sequence, and are averages obtained after acid hydrolysis for 24, 48 and 72 h of 2 separate samples of protein. Assuming 531 amino acid residues, Mr 56,886. Values for threonine and serine were extrapolated to zero time; valine, isoleucine and leucine values were derived from the 72 h hydrolysis sample. The trpD protein had been purified to homogeneity (as judged by electrophoresis on polyacrylamide gels) from a strain lacking the similarly sized trpE protein (T. Platt, unpublished results).
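The predicted column can be checked against the gene length quoted in the text with a few lines of arithmetic. The sketch below transcribes the predicted residue counts of Table 2 (reading the Glx entry as 55 and the Thr entry as 25, which is an assumption about two poorly legible values) and confirms that they sum to 531 residues, corresponding to 1593 coding nucleotides, or 1596 with the termination codon. The helper name `coding_length` is hypothetical.

```python
# Predicted residue counts transcribed from Table 2
# (Asx = Asp + Asn, Glx = Glu + Gln; Glx = 55 and Thr = 25 are assumed readings).
PREDICTED = {
    "Ala": 66, "Arg": 25, "Asx": 44, "Cys": 5, "Glx": 55, "Gly": 46,
    "His": 23, "Ile": 27, "Leu": 66, "Lys": 14, "Met": 13, "Phe": 15,
    "Pro": 29, "Ser": 31, "Thr": 25, "Trp": 1, "Tyr": 13, "Val": 33,
}

def coding_length(n_residues, include_stop=False):
    """Nucleotides needed to encode n_residues, optionally plus one stop codon."""
    return 3 * (n_residues + (1 if include_stop else 0))

total = sum(PREDICTED.values())
print(total)                                    # 531
print(coding_length(total))                     # 1593
print(coding_length(total, include_stop=True))  # 1596
```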
A search of the nucleotide sequence for a segment of DNA containing homology to the -35 and Pribnow box regions found in the E. coli consensus promoter sequence (Rosenberg & Court, 1979) was carried out in hopes of locating the p2 promoter. A region containing extraordinary homology to the consensus sequence was found in the DNA sequence spanning base-pairs 1400 to 1440 (see Fig. 4 and the accompanying paper). The location is consistent with previous genetic evidence and is supported directly by functional studies in vivo and in vitro (Horowitz & Platt, 1982, accompanying paper).
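One simple way to carry out such a search is to slide a pair of hexamer windows along the sequence and count matches to the two consensus elements. The sketch below makes several assumptions that are not in the paper: it uses the commonly cited hexamers TTGACA (-35) and TATAAT (Pribnow box), allows 16 to 18 base-pairs between them, and scores by simple identity. It is not the procedure used by the authors, and the names `scan_promoters` and `matches` are hypothetical.

```python
MINUS_35 = "TTGACA"  # commonly cited -35 consensus hexamer (assumed here)
MINUS_10 = "TATAAT"  # commonly cited Pribnow box hexamer (assumed here)

def matches(window, consensus):
    """Number of positions at which window agrees with consensus."""
    return sum(a == b for a, b in zip(window, consensus))

def scan_promoters(seq, spacers=(16, 17, 18), min_score=9):
    """Return (score, -35 start, -10 start, spacer), best first, 1-based starts."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        score_35 = matches(seq[i:i + 6], MINUS_35)
        for gap in spacers:
            j = i + 6 + gap
            if j + 6 > len(seq):
                continue
            score = score_35 + matches(seq[j:j + 6], MINUS_10)
            if score >= min_score:
                hits.append((score, i + 1, j + 1, gap))
    return sorted(hits, reverse=True)

# Toy sequence (invented): perfect hexamers separated by 17 base-pairs.
toy = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "GG"
print(scan_promoters(toy)[0])  # (12, 3, 26, 17)
```

A match count treats every position as equally important; a weighted scheme would be closer to how promoter strength is usually assessed, but the unweighted version is enough to rank candidate regions.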
4. Discussion

(a) Sequence analysis and codon usage

The nucleotide sequence of the trpD gene of E. coli presented here establishes the amino acid sequence of the protein encoded by the gene. The only significant technical difficulty encountered in the sequence determination was in the region extending from base-pair 1213 to 1243. The “stacking” of DNA fragments that was observed on the sequencing gels is most probably due to the extensive dyad symmetry in the region: T.T.G-(r-G.C-G-T-G-C-T-G.C:-(:-(~
.T-A-T-C-A-A-(I-G-(‘-G-C-G-G-(‘-G.G.
Both strands of DNA were sequenced in this region and different fragments were sequenced for one of the strands. In so doing, we were able to read through the region.

Because of potential difficulties of this type, it is important to employ several criteria to judge the validity of such a lengthy nucleotide sequence. First, our sequence yields an uninterrupted reading frame from the initiator methionine (Nichols et al., 1980) all the way to the previously identified stop codon for the trpD protein (Christie & Platt, 1980). Second, if portions of a DNA sequence are translated in an incorrect frame, one often finds regions with many uncommon amino acids, even if a nonsense codon is not encountered. Our translated sequence lacks any detectable regions of this sort. Finally, comparison of the predicted amino acid composition to what is observed experimentally (Table 2) demonstrates little discrepancy between the observed and predicted amino acid frequencies.

The protein is unusual in having only one tryptophan residue. This should be advantageous, since translation of message coding for an enzyme required for tryptophan biosynthesis would be impeded in cases of severe tryptophan starvation if it contained a significant number of tryptophan codons. It is interesting to note that the location of this tryptophan codon corresponds to the only tryptophan residue in the trpG gene of S. marcescens, only five amino acids from the C terminus of the trpG protein (Miozzari & Yanofsky, 1979). It is probable that the GATase domain of the E. coli trpD protein is functional even if it lacks these last five C-terminal residues. Thus, an abortive translational product of the trpD gene of E. coli would be able to complex with the ASase component I (trpE gene product) to form the enzyme complex responsible for carrying out the committed step leading to the synthesis of tryptophan, the conversion of chorismate to anthranilate. In agreement with this hypothesis is the finding that the trpE gene is devoid of any trp codon (Nichols et al., 1981).

In the case of severe tryptophan starvation, it might be expected that the single trp codon located at amino acid residue 189 would impede the translation of the distal two-thirds of the gene coding for the PRTase activity. However, the possibility of a translational restart within the gene has been proposed by Jackson & Yanofsky (1974), based on the observation that some polar mutants having chain termination sites early in trpD are leaky, and have low but measurable levels of PRTase activity. Since translation of the PRTase portion of the gene requires no tryptophan, a sufficient level of enzyme may be synthesized to get the cell through the severe starvation period. Consistent with the notion that it is advantageous to have few tryptophan residues in the enzymes involved in the biosynthesis of this amino acid are the findings that there are only three tryptophan codons in trpC (Christie & Platt, 1980), one in trpB (Crawford et al., 1980), and none in trpA (Nichols & Yanofsky, 1979). The possible involvement of the trp-p2 promoter in this aspect of regulation is discussed elsewhere (Horowitz & Platt, accompanying paper).

We have found that codon usage in trpD is, for the most part, similar to that generally found in E. coli genes (Berger, 1978; Elton et al., 1976). Compared to the usage found in the other genes of the trp operon, it is most similar to that found in trpE (Nichols et al., 1981), which reflects the fact that the proteins encoded by the two genes function together in a complex. Bias in codon usage may also reflect a translational advantage, with certain codons selected for because their respective transfer RNAs are more abundant or base-pair more strongly with the messenger RNA (Berger, 1978; Elton et al., 1976). For example, the GGPy binding activity of E. coli tRNAs is significantly greater than GGPu binding, and the predominance of GGPy over GGPu usage has been noted above. Other explanations invoke tRNA availability. The tRNAs recognizing AGA and AGG codons for arginine and the AUA codon for isoleucine are present in small quantities (Scherberg & Weiss, 1972). Accordingly, in trpD, only one arginine residue out of 25 is coded for by AGA and none is coded for by AGG. The isoleucine codon AUA is used only once in the entire gene. Thus, codons recognized by rare tRNAs may be selected against. The reverse may also be true. For example, the major leucine tRNA species in E. coli (50 to 80% of the total) recognizes CUG exclusively (Blank & Soll, 1971). The marked predominance of this codon for leucine in the trpD gene is noted above. In spite of some of these correlations, it must be noted that there remains no conclusive evidence for the modulation of translation by codon usage.
(b) The p2 promoter

Perhaps the most revealing information derived from sequence analysis is the localization of the p2 promoter. The existence of a second promoter within the trpD gene of E. coli was predicted (Morse & Yanofsky, 1968; Jackson & Yanofsky, 1972), and a computer search for sequences similar to the E. coli consensus promoter sequence (Rosenberg & Court, 1979) located a region with considerable homology, extending from positions 1400 to 1440 in the nucleotide sequence (Fig. 4, and see also accompanying paper by Horowitz & Platt). Transcription experiments in vitro have confirmed that this sequence is indeed the p2 promoter, and cloning experiments have shown that it functions in vivo as well (Horowitz & Platt, accompanying paper).

The p2 promoter represents an unusual phenomenon in E. coli, as a regulatory sequence that resides entirely within the translated region of a gene. Consequently, one might expect to discern interactive constraints placed on the sequence in question as a result of the dual functions. The nucleotide sequence and the unusual codon usage in this region may reflect these limitations. In p2, our “best” sequence for the Pribnow box is T-A-C-A-A-G-G, which differs at only two bases from the consensus sequence T-A-T-A-A-T-G (Rosenberg & Court, 1979). This probably represents the closest resemblance possible, because a T (rather than a C) at the third position would introduce a nonsense codon into the gene. Similarly, if the sixth base were a T (rather than a G), the glycine residue at that position would be replaced by a cysteine residue. This amino acid is rare but crucial in E. coli proteins, and a misplaced sulfhydryl might easily interfere with protein function. However, the degeneracy of the genetic code provides for structural flexibility even within the amino acid constraints, and it seems likely that the infrequently used codons employed in the p2 region reflect a cellular attempt to maintain a reasonable promoter activity. For example, 16 out of 25 arginine residues in trpD are coded for by CGC. Only seven out of the 25 are coded for by CGT, and two of these are within the p2 promoter region. Of 66 leucine residues, 37 are coded for by CTG, and only nine residues by TTA, yet this codon also occurs twice in the p2 promoter region (see Fig. 4 and also accompanying paper). These arguments, though not formal proof, are consistent with overall evidence that the p2 promoter serves some important physiological function (see the accompanying paper).
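The argument about the Pribnow box can be followed by translating the box in frame before and after each substitution. The sketch below uses the standard genetic code. Two things are assumptions rather than statements from the paper: the reading frame (codons beginning at box positions 3 and 6, which is the only frame in which the two substitutions give the stop codon and the cysteine described in the text) and the base following the box, which is not given in the text and is set to a hypothetical T. The helper names are also hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PRIBNOW_P2 = "TACAAGG"  # "best" p2 Pribnow box quoted in the text
NEXT_BASE = "T"         # hypothetical: the base after the box is not given

def substitute(seq, position, base):
    """Return seq with the base at the 1-based position replaced."""
    return seq[:position - 1] + base + seq[position:]

def in_frame(box):
    """Two codons overlapping the box, assuming codons start at positions 3 and 6."""
    return box[2:] + NEXT_BASE

def translate(seq):
    """Translate complete codons of seq with the standard code."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

print(translate(in_frame(PRIBNOW_P2)))                      # QG
print(translate(in_frame(substitute(PRIBNOW_P2, 3, "T"))))  # *G  (CAA -> TAA, stop)
print(translate(in_frame(substitute(PRIBNOW_P2, 6, "T"))))  # QC  (GGT -> TGT, Cys)
```

With a C in place of the hypothetical T the sixth-base substitution still gives cysteine (TGC); a purine at that position would not, so the text implies that the base after the box is a pyrimidine.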
(c) Chi sites

Sprague et al. (1978) first studied a class of mutations that stimulated recombination in lambda phage. Each mutation was found to stimulate recombination within about 10⁴ base-pairs of its locus, with the maximum (about tenfold) occurring very near the locus and diminishing with distance. Sequence analysis of six such chi sites created by mutation in lambda and pBR322 showed that the octamer G-C-T-G-G-T-G-G (or its complement) is necessary for the Chi+ phenotype (Smith et al., 1981). It was suggested that E. coli chi elements may have structures similar or identical to those studied in lambda (Smith et al., 1981). Triman et al. (1982) studied an E. coli chi element located in the lacZ gene and found the predicted octamer sequence near the region of increased recombinational activity. Malone et al. (1978) reported a chi site in trpD, which we have located at base-pairs 654 to 661, close to the fusion point of the ASaseII and PRTase domains of the trpD protein (within the PRTase domain). Chi sites have also been detected genetically in other genes of the E. coli trp operon (Malone et al., 1978; Faulds et al., 1979). Besides the site in trpD, there are two chi sites in trpE and a single site in trpC. These four sites are arranged so as to have alternating orientations. Since recombination occurs to a greater extent on one side of the chi octamer than the other (Faulds et al., 1979; Stahl & Stahl, 1975), the existence of chi sites in trp with alternating orientations may effectively increase the potential for recombination in one or more specific regions of the operator-proximal portion of the operon.
(d) Operon structure

The complete primary structure of the trp operon will provide a basic framework from which to extend our understanding of structure and function at the molecular level. For example, some insights have already been gained as a result of sequence information at each intergenic region. It has been noted that both the E-D and B-A junctions contain overlapping stop and start codons (Nichols et al., 1980; Platt & Yanofsky, 1975), whereas the trpD-trpC and trpC-trpB junctions do not (Christie & Platt, 1980). This would be advantageous, since both the E-D and B-A protein pairs function as complexes. In the course of translating the polycistronic message, ribosomes would not have an opportunity to fall off before completing translation of the second protein in either complex, thus assuring equimolar amounts of each within the cell. Indeed, Oppenheim & Yanofsky (1980) have shown that successful translation of trpE is required for optimal trpD levels.

The entire tryptophan operon is about 6500 nucleotides long, from the beginning of trpE to the stop codon following trpA. Thus, it is significant that the first tryptophan codon in the structural genes occurs in trpD, approximately one-third of the way into the operon, in a position that would not impede the formation of a functional ASase complex under conditions of severe tryptophan starvation. This complex could then channel chorismate into the pathway committed to tryptophan biosynthesis. The translational restart within trpD could bypass the first trp codon of the operon. However, under conditions of severe tryptophan starvation, the presence of tryptophan codons in trpC and trpB might act to stall translating ribosomes to such an extent that they would eventually fall off the message, creating abortive polypeptides. This would tend to create a polarity effect, resulting in decreased translation of the distal genes. Thus, it is interesting to note that the genes distal to trpD have decreasing numbers of tryptophan codons, possibly to compensate for the smaller number of ribosomes reaching that far into the message: trpC has three (Christie & Platt, 1980), trpB has one (Crawford et al., 1980), and trpA has none (Nichols & Yanofsky, 1979). In addition, the existence of the p2 promoter may be justified by the need to maintain a basal level of the trpC, trpB and trpA gene products under conditions of repression. If the bacterium were suddenly shifted to a state of severe tryptophan starvation, there would be a reserve of these enzymes that might be sufficient to sustain the cell through the starvation period, and allow it to recover (Nichols et al., 1981). Hopefully, with the entire trp operon sequence available for study, additional insights will be gained into the mechanisms by which E. coli coordinates both the transcription and translation of trp messages under different conditions.
We thank our colleagues for advice and criticism, Trudy Michaud for technical assistance, and Pat Pouncey for expert typing. This research was supported by United States Public Health Service grant GM-22830 (to T.P.).
REFERENCES

Bauerle, R. H. & Margolin, P. (1967). J. Mol. Biol. 26, 423-436.
Berger, B. M. (1978). J. Mol. Evol. 10, 319-323.
Bertrand, K., Korn, L., Lee, F., Platt, T., Squires, C. L., Squires, C. & Yanofsky, C. (1974). Science, 189, 22-26.
Blank, H. U. & Soll, D. (1971). J. Biol. Chem. 246, 4947-4950.
Chang, A. C. Y. & Cohen, S. N. (1978). J. Bacteriol. 134, 1141-1166.
Christie, G. E. & Platt, T. (1980). J. Mol. Biol. 142, 519-530.
Crawford, I. P. & Stauffer, G. V. (1980). Annu. Rev. Biochem. 49, 163-195.
Crawford, I. P., Nichols, B. P. & Yanofsky, C. (1980). J. Mol. Biol. 142, 489-502.
Elton, R. A., Russell, G. J. & Subak-Sharpe, J. H. (1976). J. Mol. Evol. 8, 117-135.
Faulds, D., Dower, N., Stahl, M. & Stahl, F. (1979). J. Mol. Biol. 131, 681-695.
Glynn, I. M. & Chappell, J. B. (1964). Biochem. J. 90, 147-149.
Horowitz, H. & Platt, T. (1982). J. Mol. Biol. 156, 257-267.
Hutchinson, M. A. & Belser, W. L. (1969). J. Bacteriol. 98, 109-115.
Ito, J. & Yanofsky, C. (1966). J. Biol. Chem. 241, 4112-4114.
Jackson, E. N. & Yanofsky, C. (1972). J. Mol. Biol. 69, 307-313.
Jackson, E. N. & Yanofsky, C. (1974). J. Bacteriol. 117, 502-508.
Largen, M. & Belser, W. (1973). Genetics, 75, 19-22.
Li, S. L., Hanlon, J. & Yanofsky, C. (1974). Nature (London), 248, 48-60.
Malone, R. E., Chattoraj, D. K., Faulds, D. H., Stahl, M. M. & Stahl, F. W. (1978). J. Mol. Biol. 121, 473-491.
Maxam, A. & Gilbert, W. (1980). Methods Enzymol. 65, 499-560.
Miozzari, G. F. & Yanofsky, C. (1979). Nature (London), 277, 486-489.
Morse, D. E. & Yanofsky, C. (1968). J. Mol. Biol. 38, 447-451.
Nichols, B. P. & Yanofsky, C. (1979). Proc. Nat. Acad. Sci., U.S.A. 76, 5244-5248.
Nichols, B. P., Miozzari, G. F., VanCleemput, M., Bennett, G. N. & Yanofsky, C. (1980). J. Mol. Biol. 142, 503-518.
Nichols, B. P., VanCleemput, M. & Yanofsky, C. (1981). J. Mol. Biol. 146, 45-54.
Oppenheim, D. S. & Yanofsky, C. (1980). Genetics, 95, 785-795.
Platt, T. (1978). In The Operon (Miller, J. A. & Reznikoff, W. S., eds), pp. 263-302, Cold Spring Harbor Laboratory Press, Cold Spring Harbor.
Platt, T. & Yanofsky, C. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 2399-2403.
Queen, C. L. & Korn, L. J. (1980). Methods Enzymol. 65, 595-609.
Robb, F., Hutchinson, M. A. & Belser, W. L. (1971). J. Biol. Chem. 246, 6908-6912.
Rosenberg, M. & Court, D. (1979). Annu. Rev. Genet. 13, 319-353.
Scherberg, N. H. & Weiss, S. B. (1972). Proc. Nat. Acad. Sci., U.S.A. 69, 1114-1118.
Smith, G. R., Kunes, S. M., Schultz, D. W., Taylor, A. & Triman, K. L. (1981). Cell, 24, 429-436.
Smith, H. O. & Birnstiel, M. L. (1976). Nucl. Acids Res. 3, 2387-2398.
Sprague, K. U., Faulds, D. H. & Smith, G. R. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 6182-6186.
Stahl, F. W. & Stahl, M. M. (1975). Mol. Gen. Genet. 149, 29-37.
Triman, K. L., Chattoraj, D. K. & Smith, G. R. (1982). J. Mol. Biol. 154, 393-398.
Wu, A. M. & Platt, T. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 5442-5446.
Zalkin, H. & Hwang, L. H. (1971). J. Biol. Chem. 246, 6899-6907.
Edited by S. Brenner