Cell, Vol. 66, 415-416, August 9, 1991, Copyright © 1991 by Cell Press
Minireview

Transcriptional Control: Lessons from an E. coli Promoter Data Base

Jay D. Gralla
Department of Chemistry and Biochemistry and Molecular Biology Institute, University of California, Los Angeles, Los Angeles, California 90024-1569

What determines how regulatory elements are arranged in the genome? Few questions are more fundamental to molecular biology, yet little is known. With this in mind, Collado-Vides et al. (1991) analyzed the largest available homogeneous data base: the more than 100 E. coli promoter sequences whose regulatory sites have been studied experimentally. The general principles that emerge, summarized below, include both the expected and the unexpected. Their implications extend well beyond the world of those who study prokaryotic biology.

Two Families of Promoters, One Like Eukaryotes

The promoters fall relatively neatly into two categories according to the type of σ factor required for transcription (see the scale drawing in the figure, adapted from Collado-Vides et al., 1991). The majority use the σ70 protein (or a close homolog). Their structure is built on the well-known -10 and -35 basal elements, typically illustrated in textbook descriptions of bacterial transcription. A minority use the factor σ54. These have different basal elements, located at -12 and -24, and a different arrangement of regulatory elements. The available data suggest that these promoters are regulated solely by activation rather than by repression. The σ54 promoters are also regulated by enhancer-like elements located at some distance from the polymerase binding site.

The figure shows the large difference between the two types of promoters in the distribution of the activator sites located closest to the basal elements. RNA polymerase recognizes different basal elements in the two promoter types, yet it ends up covering the same base-pair coordinates, from about -50 to +20. Virtually all σ70 promoters subject to activation contain activator sites located where they can communicate directly with the polymerase; nearly two-thirds touch position -40. For σ54 promoters the activators are centered near position -110 and cannot touch the polymerase without looping out the intervening DNA. For three different activators, the σ54 activator elements can be moved kilobases away in cis and still retain residual function (Reitzer and Magasanik, 1986; Buck et al., 1986; Birkmann and Böck, 1989). This is one of several properties that make these regulatory systems reminiscent of eukaryotic polymerase II promoters.

The mechanism by which σ54 promoters are regulated from distant activator sites is known in some detail. The key appears to be the ability of polymerase to bind the basal elements in the absence of activator but not to melt the DNA (Sasse-Dwight and Gralla, 1988; Popham et al., 1989). Thus, prior to activation, a stable proximal complex containing core polymerase plus σ54 exists bound over the basal elements. This poised polymerase cannot transcribe, because the transcription start site has not yet been opened. When the upstream activator is triggered to bind in an active form, it touches this basal element complex by looping out the intervening DNA, as shown by electron microscopy (Su et al., 1990). The energy of ATP hydrolysis (Popham et al., 1989) is then used, in conjunction with a helicase activity, to expose the base-pairing determinants on the strand that must be copied, and thus to allow transcription to begin.

This process resembles polymerase II initiation more strongly than σ70 initiation (see table). The σ54 system involves looping from distant sites, requires acidic domains (Sasse-Dwight and Gralla, 1990), uses the energy of ATP hydrolysis for initiation, and uses a phosphorylated enhancer protein (Weiss and Magasanik, 1988). σ54 promoters also frequently use integration host factor (IHF) as a coactivator (see figure). IHF has no intrinsic activation activity. It can, however, stabilize formation of a looped complex containing σ54, core polymerase, and the enhancer-binding protein, and can assist in activation (Hoover et al., 1990).

By contrast, σ70 promoters require that an activator site be able to communicate directly with a polymerase bound over the basal elements. Displacing a σ70 activator site to a distant location causes a loss of activation. One key to this loss is the absence of a poised but inactive polymerase. Therefore, if a hypothetical distant activator were to loop to the basal σ70 elements, there would be no target protein complex to touch. An interaction could occur only in the rare circumstance that looping happened at the same instant as a transient association of the σ70 polymerase with the basal elements. More generally, this rationalization predicts that looping mechanisms may be particularly effective when the basal elements are prebound by at least one factor whose conformation can be changed easily.
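The coincidence argument above can be written as a toy calculation. This is a minimal sketch and not a model from the cited work. Only the polymerase footprint (about -50 to +20) comes from the text. The ~20-bp site width, the loop probability, and the occupancy values are hypothetical, and so are the function names. The sketch also assumes that loop formation is independent of polymerase occupancy, so a looped contact is productive with probability equal to the product of the two.

```python
# Toy model of the looping argument. Apart from the footprint coordinates
# quoted in the text, every number and name here is a hypothetical illustration.
FOOTPRINT = (-50, 20)   # polymerase footprint, about -50 to +20
SITE_HALF_WIDTH = 10    # assumption: ~20-bp activator site centred on `center`


def contacts_directly(center: int) -> bool:
    """True if a site centred at `center` overlaps or abuts the polymerase footprint."""
    left, right = center - SITE_HALF_WIDTH, center + SITE_HALF_WIDTH
    return right >= FOOTPRINT[0] and left <= FOOTPRINT[1]


def looped_contact_probability(loop_prob: float, basal_occupancy: float) -> float:
    """Chance that a looping event finds polymerase already on the basal elements.

    Assumes loop formation and polymerase occupancy are independent events,
    so the probability of a productive contact is their product.
    """
    if not (0.0 <= loop_prob <= 1.0 and 0.0 <= basal_occupancy <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return loop_prob * basal_occupancy


if __name__ == "__main__":
    print("site at -40 contacts polymerase directly:", contacts_directly(-40))
    print("site at -110 contacts polymerase directly:", contacts_directly(-110))
    loop = 0.1  # hypothetical chance that a loop forms in a given interval
    poised = looped_contact_probability(loop, 0.95)     # sigma54-like prebound complex
    transient = looped_contact_probability(loop, 0.01)  # sigma70-like transient binding
    print(f"poised: {poised:.3f}, transient: {transient:.4f}, "
          f"ratio: {poised / transient:.0f}x")
```

With a poised, prebound complex (occupancy near 1, as for σ54), a looped contact succeeds almost every time a loop forms. With only transient occupancy (as for σ70), the same loop almost always finds nothing to touch. This is the sense in which looping favors prebound basal elements.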
It is interesting to note in this regard that the polymerase II TATA box can be bound by the TATA-binding protein of TFIID in the absence of activators.

In looping mechanisms, genes must be separated by long distances. Otherwise an enhancer associated with one gene could inappropriately activate a different nearby gene. This may explain why σ54 promoters are very rare in the relatively compact E. coli genome (Magasanik, 1989). However, such mechanisms provide greater flexibility in evolutionary terms, since the regulatory site locations can be shuffled without loss of function. Such shuffling is extremely difficult for σ70 genes, which require an adjacent and precisely positioned activator. Thus, the σ54 mechanism is better suited to the large genomes of eukaryotes, where similar mechanisms do indeed seem to dominate the regulatory apparatus.

Table. Properties of Pol II and σ54 Pol, Not Shared by σ70 Pol
- Activated from a distance
- Requires ATP hydrolysis for initiation
- Has acidic, Gln-rich, and leucine zipper domains in activated transcription complexes
- Low basal level of expression; regulated predominantly by activation rather than repression
- Recognition of basal elements commonly occurs prior to activator intervention
- Coactivators are used commonly
- Multiple activator binding sites are generally used

Site Function Changes with Position and Context

The location and context of factor binding sites were found to be critical determinants of σ70 promoter function (Collado-Vides et al., 1991). Perhaps the most striking examples were those in which a change of location converted the same protein between an activator and a repressor. This typically occurred when a binding site for an activator protein was found downstream of the highly restricted zone of activation shown in the figure. No change in the protein structure appears to be required for this interconversion. For example, the common regulators CRP and FNR activate from a normal position near -40 but act as repressors when their sites are downstream of -20. The figure shows that the zone of repression basically coincides with the polymerase binding site, implying that repression mostly involves direct interference with the polymerase.

Even when the position of a regulatory site is fixed, its ability to function can vary according to what sequences are nearby. CRP and many other σ70 activators are required for stable promoter recognition by polymerase when the basal elements are weak.
If the basal elements are strengthened, activation lessens. Tsung et al. (1990) examined a series of test promoters activated by OmpR. When transcription directed by the basal elements reached an exceptionally high level, OmpR actually repressed rather than activated. Apparently the same OmpR-polymerase interaction that helps stabilize binding to weak basal elements becomes inhibitory once strong basal elements have already allowed stable polymerase binding.

Even within the zone of common activation, the precise positioning of the σ70 activators determines how well they function. For example, CRP activation sites are mostly located near -40, -60, and -70. These positions lie roughly on the same face of the DNA helix, suggesting that stereospecific contacts with polymerase must be preserved. When a CRP site is artificially moved to different locations within a single promoter, it is these same three positions that support optimal activation (Gaston et al., 1990). When activator sites are moved upstream of this zone, they no longer function. This is probably because the σ70 polymerase machinery is not designed to accommodate looping mechanisms easily, as discussed above. Activators do not appear downstream of approximately position -20 (except when adapted to act as repressors). A possible reason is that such sites would overlap the nontranscribed DNA region that polymerase melts after activation. Activators bound there as double-stranded DNA-binding proteins would need to be displaced from the DNA as the strands were separated prior to transcription initiation. Thus the mechanism of initiation by σ70 polymerase places very severe restrictions on where activator binding sites may be positioned.

For repression, one site must always be near polymerase (Collado-Vides et al., 1991), but its effectiveness depends on precisely where it is (Lanzer and Bujard, 1988). A location between the -10 and -35 basal elements is the most effective, because there the repressor may be able to block recognition of both. However, precise positioning is not nearly as critical as it is for activation, implying that repression does not require a stereospecific contact with the polymerase. The quantitative dependence of repression on position may have implications for the operon and regulon organization of bacterial genes.
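The helical-phasing argument for CRP sites near -40, -60, and -70 can be checked with simple arithmetic. This is an illustrative sketch only, not an analysis from Collado-Vides et al. (1991) or Gaston et al. (1990). It assumes B-form DNA with about 10.5 bp per helical turn and measures how far each position is rotated around the helix relative to -40. As an arbitrary, hypothetical cutoff, it calls a position "same face" when that rotation is within 90 degrees. The function names are hypothetical.

```python
BP_PER_TURN = 10.5  # assumption: ~10.5 bp per helical turn for B-form DNA


def helical_offset(pos: float, ref: float = -40.0,
                   bp_per_turn: float = BP_PER_TURN) -> float:
    """Rotation (degrees, wrapped into [-180, 180)) of the helix at `pos`
    relative to the helix at `ref`, using promoter coordinates."""
    angle = (pos - ref) * 360.0 / bp_per_turn
    return (angle + 180.0) % 360.0 - 180.0


def same_face(pos: float, ref: float = -40.0, tolerance: float = 90.0) -> bool:
    """True if `pos` lies within `tolerance` degrees of the helix face at `ref`.

    The 90-degree default is a hypothetical cutoff chosen for illustration.
    """
    return abs(helical_offset(pos, ref)) <= tolerance


if __name__ == "__main__":
    for pos in (-40, -45, -55, -60, -70):
        print(f"{pos:>4}: offset {helical_offset(pos):7.1f} deg, "
              f"same face as -40: {same_face(pos)}")
```

Under these assumptions, -60 and -70 come out roughly 34 and 51 degrees away from the face at -40. By contrast, -45 and -55, displaced by about half a turn, point to the opposite side of the helix. Real contacts also depend on site geometry and DNA bending, which this arithmetic ignores.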
Genes in regulons have wholly separate regulatory regions. This is so even though they are controlled by the same regulator, and despite the predominance of operon organization, in which multiple genes are controlled by a common region. Genes within regulons typically show exceptionally wide variation in operator position (Collado-Vides et al., 1991). An extreme example is control by the TyrR repressor: seven unlinked operons are controlled from seven different proximal operator locations. In general, since the strength of repression varies with position, this type of arrangement allows the same repressor to control physically unlinked genes differentially. These considerations imply that biology has taken advantage of how a regulator's ability to function depends on site location as well as on physical and biological context. The mere appearance of a site of interesting sequence does not necessarily imply that it is important, a lesson also becoming apparent in studies of eukaryotic regulatory regions. In the bacterial cases, the mechanism and relevance of these variations are now being revealed.

Multiple and Overlapping Regulatory Sites

In many cases, regulatory regions contain overlapping or multiple regulatory elements (Collado-Vides et al., 1991), as is more common in eukaryotes, and these use interesting mechanisms. For example, in maltose regulation (Raibaud et al., 1989) interspersed binding sites for the MAL
and CRP regulators are phased upstream of the basal elements. When both activators are present, a large nucleoprotein complex forms in which the proteins act as a scaffold for wrapping the regulatory DNA. This stable complex approaches the basal elements and activates the σ70 polymerase that must recognize these elements. In this case, σ70 polymerase does use distant activator sites. However, those sites are brought near the polymerase binding site by a complex of intervening bound proteins, not by the looping-out mechanism used to deliver activator to σ54 polymerase. Indeed, the theme of DNA wrapping may be much more general. For example, the strong bending from a single proximal CRP site may promote DNA wrapping around a proximal polymerase-CRP complex (Zinkel and Crothers, 1991). Eukaryotic polymerase II promoters often contain both multiple proximal elements and more distant enhancer sites, which also often bind multiple factors. In those cases, information could be delivered to the basal elements by combining aspects of both bacterial mechanisms: distantly bound protein arrays could be brought nearby by looping to large, wrapped proximal arrays.

The data base analysis showed many examples where regulatory sites overlap, mostly involving a positive and a negative regulator of σ70 transcription. In these cases, the repressor usually specifically antagonizes the activator. The interplay between the CRP activator and the CytR repressor is particularly revealing with regard to differential control of the cAMP response pathway. The control regions of the several operons within the CytR regulon contain tandem binding sites for CRP protein, which activates transcription in proportion to cAMP levels. Each also contains elements directing repression by CytR protein in response to the availability of certain nucleotides. The CytR repression is dominant, and its elements overlap the tandem CRP sites. However, CytR does not actually recognize and bind its operator elements.
Instead it binds and bridges the tandemly bound CRP proteins, inactivating CRP and causing repression (Søgaard-Andersen et al., 1991). The repressor is thus designed to specifically antagonize cAMP-dependent activation, but only when precisely positioned tandem CRP sites are present. The example shows that an appropriately positioned duplication of a cAMP response element can create regulatory overlap with a new inhibitory pathway.

There are many examples in which a single regulatory site is duplicated. Duplication of activator sites is much more frequent in the σ54 promoters. This may be because they use a looping mechanism, in which multiple activator sites increase the probability of touching the target polymerase properly. Polymerase II also appears generally to use multiple sites in enhancer regions. σ70 promoters rely on nearby, specific placement of activators, so the need for duplication is not obvious. Duplicated repressor sites occur in a significant minority of σ70 promoters. They appear mostly in regions near the basal elements and less often in remote positions. The proximal duplications are associated mostly with particular proteins, as if their structure requires duplicated sites for optimal repression. The remote duplications occur nearly
exclusively when the proximal region is very crowded with activator and repressor elements, leaving little room to build in a duplication. The remote duplicated operators are known to bind repressor multimers that interact with proximal and remote sites simultaneously, looping out the intervening DNA. This provides quantitative assistance in repression, especially when severe repression is required. Thus, in general, duplicated sites appear to serve one of two functions. Some allow interaction with regulators that function poorly from a single site. Others assist regulators by allowing cooperative interactions between proteins bound to separate sites.

Have the Distinct Bacterial Mechanisms Been Adapted by Eukaryotes?

Collado-Vides et al. (1991) suggest that there are two distinct types of E. coli promoter control systems and that certain general principles guide each. For σ70 promoters, the rule is direct communication with the polymerase from at least one nearby regulatory element. The precise location of the site is very important for activation and somewhat important for repression. The organization of regulatory elements in the genome appears to exploit this dependence of function on location to allow differential regulation of genes. Regulatory elements are duplicated or used in combination to provide additional flexibility and quantitative assistance in regulation. By contrast, the σ54 promoters are rare and predominantly regulated by activation alone. This activation occurs from remote sites whose precise locations are less critical.

The table lists several properties shared by the σ54 and RNA polymerase II transcription machinery. In terms of mechanism, a key property of the σ54 system is that polymerase recognizes the basal DNA elements, but melting of the DNA start site is prevented until a looped activator intervenes. This property is apparently shared by the other known bacterial enhancer, the phage T4 replication enhancer (Herendeen et al., 1989).
Both the T4 and σ54 systems also require ATP hydrolysis to open the transcription start site, independent of any requirement for protein phosphorylation. By contrast, the σ70 machinery, which is not activated by enhancers, does not share these properties. Eukaryotic RNA polymerase III does not share them either. In this regard, polymerase III resembles the bacterial σ70 polymerase more closely than it resembles RNA polymerase II. The σ54 and polymerase II mechanisms may be as closely linked as this comparison implies. If so, polymerase II activation may also occur by activators looping to partial, inactive basal assemblies and triggering changes that lead to start-site melting and transcription initiation.
References

Birkmann, A., and Böck, A. (1989). Mol. Microbiol. 3, 187-195.
Buck, M., Miller, S., Drummond, M., and Dixon, R. (1986). Nature 320, 374-378.
Collado-Vides, J., Magasanik, B., and Gralla, J. D. (1991). Microbiol. Rev. 55, in press.
Gaston, K., Bell, A., Kolb, A., Buc, H., and Busby, S. (1990). Cell 62, 733-743.
Herendeen, D. R., Kassavetis, G. A., Barry, J., Alberts, B. M., and Geiduschek, E. P. (1989). Science 245, 952-958.
Hoover, T. R., Santero, E., Porter, S., and Kustu, S. (1990). Cell 63, 11-22.
Lanzer, M., and Bujard, H. (1988). Proc. Natl. Acad. Sci. USA 85, 8973-8977.
Magasanik, B. (1989). New Biol. 1, 247-251.
Popham, D. L., Szeto, D., Keener, J., and Kustu, S. (1989). Science 243, 629-635.
Raibaud, O., Vidal-Ingigliardi, D., and Richet, E. (1989). J. Mol. Biol. 205, 471-485.
Reitzer, L. J., and Magasanik, B. (1986). Cell 45, 785-792.
Sasse-Dwight, S., and Gralla, J. D. (1988). Proc. Natl. Acad. Sci. USA 85, 8934-8938.
Sasse-Dwight, S., and Gralla, J. D. (1990). Cell 62, 945-954.
Søgaard-Andersen, L., Petersen, H., Holst, B., and Valentin-Hansen, P. (1991). Mol. Microbiol. 5, 969-976.
Su, W., Porter, S., Kustu, S., and Echols, H. (1990). Proc. Natl. Acad. Sci. USA 87, 5504-5508.
Tsung, K., Brissette, R. E., and Inouye, M. (1990). Proc. Natl. Acad. Sci. USA 87, 5940-5944.
Weiss, V., and Magasanik, B. (1988). Proc. Natl. Acad. Sci. USA 85, 8919-8923.
Zinkel, S. S., and Crothers, D. M. (1991). J. Mol. Biol. 219, 201-215.