New Concepts in Protein-DNA Recognition: Sequence-directed DNA Bending and Flexibility¹

RODNEY E. HARRINGTON* AND ILGA WINICOV*†
*Departments of Biochemistry and †Microbiology, University of Nevada Reno, Reno, Nevada 89557

I. DNA Sequence Dependence in Protein-Nucleic Acid Binding Specificity
   A. Sequence-dependent Bending in DNA
   B. Sequence-dependent Flexibility or Kinking in DNA
   C. Evidence for Sequence-dependent Flexibility in DNA
II. A Short Taxonomy of DNA-bending Proteins and Their Recognition Sequences
   A. Prokaryotic Helix-Turn-Helix Proteins
   B. Eukaryotic Helix-Turn-Helix
   C. Zinc-finger Proteins: The “C2H2” Classes
   D. The “Cx” Class of Zinc-binding Proteins
   E. Leucine-zipper Proteins
   F. Minor-groove-binding Proteins: The TFIID Transcription Factor Complex
   G. The NF-κB Protein and Its Binding to DNA
   H. Other DNA-binding Proteins with Putative Flexibility Sites in Their Recognition Sequences
III. Models of Sequence-directed Structure-Function Relationships in Selected Regulatory Systems
References

¹ A glossary of abbreviations and polynucleotide notation appears on p. 261.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 41
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.

One of the most interesting and fruitful recent developments in molecular biology is the continuing coalescence of genetic control with the older field of DNA molecular biophysics and structure. It has been known for some time that many biological processes at the molecular level, including the regulation of genes, occur concurrently with the binding of regulatory proteins to specific sites on the genomic DNA. These regulatory nucleoprotein complexes are usually characterized by unusual binding affinity as well as site specificity. Many cases are known in which the same protein can regulate different trans-actional events by binding to different DNA recognition sites, or in which differential regulation can occur from the competitive binding of two or more proteins to the same site. Recent crystallographic, spectroscopic, and biochemical studies provide some structural rationale for the extraordinary binding specificity of many of these regulatory complexes, and a number of consensus structural “motifs” for the DNA-binding domains of proteins have been described (reviewed in 1-3). However, how the DNA contributes to the binding specificity is not so apparent. For some time, it has been recognized that the well-known DNA structural “families” may exhibit some sequence dependence, but no instances of specific DNA families binding with high specificity to regulatory proteins have so far been demonstrated. Rather, the DNA in regulatory nucleoprotein complexes seems invariably to be in the B-form, the softest and most labile of the known structural families. Thus, sequence-regulated, highly localized DNA structures must be implicated in protein-DNA recognition. Such sequence dependence in DNA structure has been recognized for some time and has been related in certain ways to protein binding (reviewed in 4-6). In this article, we attempt to knit together available structural information on the DNA-binding domains of a representative set of regulatory proteins with what is presently known about sequence-directed DNA structures.
Because the discovery and identification of new transcription and regulatory factors are currently proceeding at such an explosive pace, we make no pretense that this discussion can be a comprehensive and current review of these systems. Rather, we focus on unusual DNA structures with known sequence dependencies, such as bends, and on the relatively new concept of sequence-directed structural softness or flexibility, and we correlate these with protein structural motifs wherever possible. We show that analyses of consensus binding sequences in DNA can provide important clues both for identifying possible roles of localized DNA structures (or microstructures) in protein-DNA interactions and for interpreting these roles in structure-function terms.

I. DNA Sequence Dependence in Protein-Nucleic Acid Binding Specificity

The perception of DNA structure-function relationships by molecular biologists has undergone considerable modification in recent years. The earlier depiction of DNA as a homogeneous, stiff, nearly rodlike macromolecule has had to give way to the notion that many of its important biological functions are based not on its overall global conformation but rather on highly localized structural features directed by relatively short sequences of bases. A number of factors have played roles in this remarkable shift of paradigm, but two of the most important have been the discovery of axial bending in DNA and the observation that regulatory proteins generally bind to specific sequences of bases with extraordinary affinity. An important consequence is that most biological processes are modulated at the molecular level by the interactions of regulatory proteins with themselves or other proteins and with their characteristic operator DNA. A corollary is that the trajectory of the DNA is precisely defined, particularly in large, multisubunit nucleoprotein complexes. Because both of these factors are highly site specific, they can confer a corresponding level of site specificity on the processes they control, which include transcriptional regulation, the action of hormone receptors, and certain types of site-specific recombination, including the precise insertion of viral DNA into host genomes. Thus, our advancing knowledge of biological control processes at the molecular level has mandated an equivalent understanding of localized, sequence-directed DNA structures. The mechanisms by which certain proteins recognize specific regions of DNA are generally complex and not fully understood. Nevertheless, the broad classification of the energetic factors underlying both binding specificity and affinity into direct and indirect readout components, as originally suggested by Drew and Travers (7), provides a useful basis for conceptualizing the role of localized, sequence-directed DNA structures in specific protein-DNA binding.
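A toy additive model can make the direct/indirect readout classification concrete. In the sketch below, the binding free energy is written as a favorable contact term (standing in for direct readout) plus deformation costs for the DNA and the protein (standing in for indirect readout). Relative affinities of two sites then follow from the Boltzmann factor of their free-energy difference. The strict additivity, the function names, and all numerical values are illustrative assumptions, not quantities from the studies cited here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(contact_kcal, dna_deform_kcal, protein_deform_kcal=0.0):
    """Toy additive model (an assumption): contacts plus deformation costs.

    contact_kcal (negative) stands in for direct readout; the deformation
    terms (positive) stand in for indirect readout.
    """
    return contact_kcal + dna_deform_kcal + protein_deform_kcal


def affinity_ratio(dg_site_a, dg_site_b, temp_k=298.15):
    """Ratio of association constants Ka(A)/Ka(B) from two binding free energies."""
    return math.exp((dg_site_b - dg_site_a) / (R_KCAL * temp_k))


# Hypothetical example: identical contacts, different DNA deformation costs.
easily_bent = binding_free_energy(-14.0, 2.0)   # -12.0 kcal/mol
stiff = binding_free_energy(-14.0, 4.0)         # -10.0 kcal/mol
print(f"{affinity_ratio(easily_bent, stiff):.1f}-fold")  # about 29-fold
```

With identical contacts, a site whose DNA deformation costs 2 kcal/mol less binds roughly 29-fold more tightly at 25°C in this model. This shows how sequence-dependent deformability alone could contribute to specificity.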
Direct readout was first proposed in the early 1980s (8, 9), based on modeling studies of the Cro protein crystal structure fitted to uniform, Watson-Crick DNA. A small number of amino-acid residues, usually about three, are aligned along one side of a “recognition” α-helix [or occasionally a β-ribbon (10)] in the DNA-binding domain of the protein. These residues form specific hydrogen bonds with a “recognition matrix” of complementary nitrogen or phosphate sites in the DNA. In most cases, access is through the major groove because in B-family DNA the major groove is wide enough to accommodate the protein-recognition element, and the potential hydrogen-bonding sites are relatively exposed. However, new classes of minor-groove-binding proteins have recently been characterized. Additional contacts of the protein-recognition element with the DNA backbone, usually with phosphates, ensure correct placement of the element within the DNA-recognition matrix. Although direct readout provides a mechanism for high binding specificity, this specificity can be further amplified by cooperative protein-protein interactions and by involving additional protein-recognition elements in the binding process. A special case of the latter is the interaction of dimeric proteins with palindromic DNA-recognition matrices. The structural fit between the protein and DNA binding partners that facilitates these highly specific interactions is termed indirect readout. Until fairly recently, it was thought that this could be understood as a complementary assembly of localized static protein and nucleic acid structures, with the “goodness of fit” between the two determined by the specific structures and hence by their sequences. It has been known for over a decade that B-form DNA exhibits sequence-dependent structural variability (11, 12) that also leads to corresponding variability in conformation or molecular shape (13, 14). When spaced in a sequence at helical periodicity, localized structural dislocations can lead to longer range structural features such as fixed bends. It is likely that sequence-dependent DNA structure, including such coherent additivity effects leading to planar curvature, is an important ingredient of indirect readout. In addition to enhancing protein-DNA interactions by improving the geometrical fit, DNA bending appears to facilitate or modulate looping between regulatory elements acting in cis at a distance (15) and to participate in the architecture of multisubunit regulatory complexes (16). The first of these may include the interactions of enhancers with promoter regions, a number of protein-mediated intrapromoter associations, and effects of chromatin structure on transcriptional regulation (17). It is becoming increasingly clear, however, that a picture of indirect readout based only on static structures is a serious oversimplification. Evidence is accumulating that there is also a dynamic aspect in which protein-DNA binding is affected by conformational deformations of both the protein and the DNA.
These result from structural adjustments involving localized changes in helical twist angles and in the direction of the helical axis at the dinucleotide level. The structural accommodations between a specific binding protein and its DNA-binding site are determined by a complicated interplay of intermolecular and intramolecular electrostatic, hydrophobic, and van der Waals forces. These interactions are nonspecific in themselves, but they are strongly dependent on the conformational features of both the DNA and the correctly folded DNA-binding domain of the protein. Each partner seeks regions of the other that maximize these interactions. The nucleoprotein complex will therefore reflect not only the greatest possible accommodation between the static structures of the binding partners but also any structural distortion of one or both partners that its formation entails. Virtually all specific nucleoprotein complexes must utilize conformational lability in both the protein and DNA components to improve direct recognition (18). The structural and conformational factors involved in indirect readout are highly sequence dependent because maximum accommodation between the binding partners will occur only at a juxtaposition of critical sequences in both partners. However, there may also be some synergism between direct and indirect readout effects: the nucleoprotein conformation that corresponds to maximum structural accommodation by the binding partners will usually also be the conformation that optimizes the formation of specific hydrogen-bonding contacts. It is in the context of all these effects that the extraordinary sequence specificity of interactions between binding proteins and their relatively short cognate DNA-recognition regions can be understood. It now seems likely that each specific nucleoprotein complex utilizes a characteristic and possibly unique combination of direct and indirect readout mechanisms. Furthermore, an extraordinary level of complexity in nucleoprotein binding, as well as a complex interplay of multiple chemical and physical mechanisms, is probably necessary to achieve the required level of site specificity and to reduce errors to a genetically acceptable level in trans-actional events having high site-specificity requirements.

A. Sequence-dependent Bending in DNA

Curvature in B-form DNA regions has been studied for almost a decade and is now believed to be an intrinsic property of certain DNA sequences (reviewed in 16, 19-23). The earliest curvature elements to be identified were the phased Aₙ tracts found in DNA fragments obtained from kinetoplast minicircles of the African trypanosome. Their anomalously slow migration in polyacrylamide gels was assumed to derive from planar, or nearly planar, axial curvature (22). Subsequent studies on these systems used a variety of experimental methods, including gel mobility retardation (23) and measurement of cyclization probabilities (24-27), which are variants of methods pioneered by Shore and colleagues (28, 29). These studies clearly established that DNA with helically phased regions containing Aₙ tracts with n ≥ 5 is indeed curved. Two models have been proposed to account for this curvature. In the wedge model, each Aₙ tract is curved because the axial deflections of successive A-A dinucleotides combine coherently to produce a planar curve (30-33). In the junction model, axial deflections arise from the structural discontinuities at the junctions between the Aₙ tracts, presumed to be in a modified B-form structure (34), and adjacent B-form DNA (35). In both models, the Aₙ tracts are in nearly perfect helical register, which ensures that the curvatures of individual bending elements add coherently to produce a larger overall bend. Experiments to date do not confirm one model over the other, and it now seems possible that both are simply variants of the same model (21). Regions of putative Aₙ-curved DNA in important biological systems have recently been identified (reviewed in 5, 6, 36). More recently, evidence has appeared suggesting that sequence-directed curving in DNA may be a more general phenomenon, involving a number of sequence elements in addition to Aₙ regions (13, 14). A set of first-order predictive rules that provide a semiquantitative description of general sequence-directed fixed bending has recently been proposed (13). Although a number of moderately curved sequences that contain no Aₙ tracts have been identified (14), curvature from phased Aₙ elements is still much larger than that from other reported curvature motifs (13, 14). It now seems probable that sequence-dependent DNA structures, including coherent effects leading to curvature, serve to orient or steer the DNA in large, multisubunit nucleoprotein complexes (16). Fixed DNA bending may also be important in looping between regulatory elements acting in cis at a distance (15).
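The requirement that bending elements be in helical register can be illustrated with a first-order sketch. Each local bend is treated as a small two-dimensional vector perpendicular to the helix axis. Its direction rotates by 360° × (spacing/helical repeat) from one element to the next, and the overall bend is the magnitude of the vector sum. The 10.5-bp helical repeat, the 18° per-element bend, and the function name are assumptions chosen for illustration. The small-angle treatment ignores effects that become important for large individual bends.

```python
import cmath
import math


def net_bend_deg(bend_per_element_deg, n_elements, spacing_bp,
                 helical_repeat_bp=10.5):
    """First-order overall bend of n equal, evenly spaced bending elements.

    Each local bend is a 2D vector perpendicular to the helix axis; its
    azimuth advances by 360 * spacing / helical_repeat degrees from one
    element to the next.  Valid only for small individual bends.
    """
    phase_step = 2.0 * math.pi * spacing_bp / helical_repeat_bp
    resultant = sum(cmath.exp(1j * k * phase_step) for k in range(n_elements))
    return bend_per_element_deg * abs(resultant)


# Hypothetical 18-degree bending elements at three spacings:
for spacing in (10.5, 15.75, 21.0):
    print(spacing, round(net_bend_deg(18.0, 5, spacing), 1))
```

Five such elements spaced 10.5 bp apart add to 90° in this model. Spaced 15.75 bp apart (one and a half turns), they nearly cancel, leaving 18°. A 21-bp spacing, two helical repeats at the assumed 10.5 bp per turn, is again fully in phase.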

B. Sequence-dependent Flexibility or Kinking in DNA

Although much more difficult to demonstrate experimentally, sequence-directed DNA flexibility very likely plays a functional role similar to that of static axial bending. The distinction between flexible and fixed or static bending in DNA is based on the relative deformability of the helical axis trajectory in a direction perpendicular to the axis. All DNA is flexible to some extent, as manifested by its finite persistence-length in solution (37, 38; reviewed in 36). Just as the helical axis changes direction in certain sequence elements (15), the local bending or torsional modulus may also vary significantly among sequence motifs. Steric considerations indicate that flexibility in the double helix will generally occur preferentially by roll toward either the major or the minor groove (39), because roll involves less configurational readjustment of the backbone than a flexure due entirely to tilt. Such anisotropic flexibility, if it occurred in a completely directional fashion (as might be the case in the interaction of a DNA sequence element with a protein), would have configurational consequences similar to those of fixed or static bending. Thus, larger flexibility effects might derive from the coherent contributions of multiple flexibility elements located in a helically phased array, just as with fixed bending motifs. Both static bending and flexibility at specific sequence elements can promote the DNA trajectories needed for a tight fit between DNA and proteins in nucleoprotein complexes. Conformational lability from flexible sequence elements may fine-tune this process and may, in addition, offer structural explanations for instances in which more than one protein can bind specifically to the same DNA sequence. From information presently available, localized DNA flexibility may be a more ubiquitous feature of DNA-protein interactions than previously believed (40). The first explicit distinction between flexibility and fixed bending was made in studies of the electrooptical properties of DNA containing putative static bending elements (41, 42). Bending was characterized in terms of the decay of electrical birefringence, a quantity highly sensitive to the effective length or end-to-end distance of a rodlike macromolecule. Bending in a DNA fragment, whether static or due to anisotropic flexibility, results in an effective shortening of the molecule. The distinction between static bending and flexibility was based on the measured persistence-length, and particularly on its electrostatic component as deduced from its experimental dependence on ionic strength. Although persistence-length is a measure of chain stiffness, it is a statistical quantity averaged over the entire fragment, so these experiments could not distinguish isotropic from anisotropic chain flexibility effects. In general, measurements of properties that are averaged over the entire chain are not very useful for this type of distinction; they usually cannot distinguish unambiguously between anisotropic flexibility and static bending. This limitation applies to most traditional polymer characterization methods and to most enzymatic and biochemical methods as well. Instead, methods are required that can extend the dynamical lifetime of a bent, flexible chain or sequence element. Of these, binding the sequence to a DNA-bending protein is perhaps the least ambiguous, and studies on the free sequence provide suitable controls to differentiate static from flexible bending effects. At present, the most plausible microstructure associated with increased DNA flexibility is the stereochemical kink.
This concept was first proposed (43) in an attempt to explain the energetics of DNA wrapping on the nucleosome, and it was amplified somewhat in later studies (44). When a structural dislocation leading to an abrupt and discontinuous change in direction of the helical axis is confined to a single dinucleotide, or to a small set of contiguous nucleotides, the DNA is usually said to kink. Kinks have been proposed to arise from relatively massive structural alterations in the DNA due to pyrimidine dimers or psoralen crosslinks (45), from drug-binding interactions (46), from single-strand bubbles (47), and from protein binding to DNA (3, 31, 40, 48-51; reviewed in 3). Simple stereochemical kinks are evidently possible that primarily involve roll into the major or minor groove (43, 44), avoiding a change in local tilt (31, 40, 50, 51). Such kinks represent a dislocation in the helical axis where the stacking interaction between two neighboring base pairs is essentially lost (50, 51). It has been calculated that a roll of 15° to 20° between the base planes corresponds roughly to a loss of 50% of the stacking energy (31, 50); this value therefore sets a lower limit to the kink angle. At larger angles, the kink is expected to behave essentially as a free hinge (43, 44). At ordinary temperatures, an appreciable steady-state concentration of kinkable sequence elements may exist in a kinked state due to thermal fluctuations (“DNA breathing”). Schellman (39) has analyzed the effect of such kinking on DNA chain flexibility and has suggested that as many as 2% of all bases could exist in a kinked conformation (44) at any given time in order to account for observed persistence-length values. Using a different analysis, Manning (52) has estimated 1%. Both these values are somewhat less than the upper limit of 5% deduced from proton-exchange rates (53-55). Although the latter may be up to 107 too large (56, 57), even the most conservative estimates of DNA breathing rates (reviewed in 57, 58) suggest that, at ordinary temperatures, an appreciable fraction of DNA bases may be energetically in a kinkable state. Because the CA and TA dinucleotide elements have the lowest stacking energies among dinucleotides (59, 60), the equilibrium concentration of these elements that are energetically in kinkable states at ordinary temperatures may be greater than the above estimates suggest. Indirect evidence in support of this view is available from NMR investigations (61), from studies of nucleosome positioning (62), and from an analysis of the sequence and molecular-size dependencies of gel mobility retardation effects (63). Direct evidence for kinking at (CA)·(TG) dinucleotide elements has been provided by a recent high-resolution cocrystal structure of the complex between the CAP protein of the Escherichia coli lac operon and its operator DNA (3, 48). In this work, two sharp kinks of about 40° each are observed at the two (CA)·(TG) elements symmetrically arranged about the pseudodyad of the specific binding consensus sequence.
Helical phasing is such that these kinks lead to a somewhat out-of-plane overall bend of about 90°. This is in essential agreement with a bending angle of about 100° deduced from studies using gel electrophoresis methods (64, 65). Unlike the gel studies, however, the cocrystal structure demonstrates clearly that the locus of bending is focused primarily in (CA)·(TG) kink sites. Lower resolution crystallographic studies on the complex of the λ phage Cro protein with the OR3 operator site also show pronounced DNA bending in the complex (66).
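The dependence of the overall bend on the helical phasing of two discrete kinks can be sketched by composing one rotation matrix per base-pair step. In this simplified model, each step is half a twist about the local helix axis, then a roll about the base-pair long axis, then the other half twist. A uniform twist of 36° per step (10 bp per turn) is assumed so that phasing is exact. The kink positions are hypothetical and are not those of the CAP site. Composing rotations, rather than adding small-angle vectors, keeps the treatment valid for kinks as large as 40°.

```python
import math


def rot_z(deg):
    """Rotation about the local helix axis (twist)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    """Rotation about the base-pair long axis (roll)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def overall_bend_deg(rolls_deg, twist_deg=36.0):
    """Angle between the first and last helix-axis directions.

    Simplified step model (an assumption): half twist, roll, half twist,
    composed in the moving frame.
    """
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    half = rot_z(twist_deg / 2.0)
    for roll in rolls_deg:
        frame = matmul(frame, matmul(matmul(half, rot_y(roll)), half))
    # frame[2][2] is the z-component of the final axis in the starting frame.
    return math.degrees(math.acos(max(-1.0, min(1.0, frame[2][2]))))


def two_kinks(kink_deg, steps_apart, n_steps=30, first=5):
    """Hypothetical roll profile: two kinks on an otherwise straight helix."""
    if first + steps_apart >= n_steps:
        raise ValueError("kinks must lie within the modelled segment")
    rolls = [0.0] * n_steps
    rolls[first] = kink_deg
    rolls[first + steps_apart] = kink_deg
    return rolls


for apart in (10, 7, 5):  # one turn, 0.7 turn, half a turn at 36 deg/step
    print(apart, round(overall_bend_deg(two_kinks(40.0, apart)), 1))
```

Two 40° kinks one full turn apart combine to 80°, whereas kinks half a turn apart cancel. Intermediate spacings give intermediate, non-planar results. In this model two 40° kinks alone cannot exceed 80°, so any further bending would have to come from smaller distortions elsewhere in the site.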

C. Evidence for Sequence-dependent Flexibility in DNA

1. INDIRECT EVIDENCE FROM ANOMALOUS GEL MIGRATION

The first experimental evidence for sequence-dependent flexibility in free DNA was based on the unusual electrophoretic behavior of certain DNA fragments having fixed axial curvature but no Aₙ tracts (40). Each of these fragments was constructed by end-to-end ligation of a curved 21-bp “precursor” sequence. Because each precursor spanned almost exactly two helical repeats, the curvature in each was propagated coherently into planar bending oligomeric fragments. However, the retardation in mobility through the gel (characterized as RL, the ratio of apparent to true fragment length) was qualitatively different for different precursor sequences. In most cases, the retardation increased monotonically with oligomer size. This would be the expected behavior if the retardation is associated with axial curvature, at least up to oligomer sizes at which the fragment lengths and overall curvatures become very large. In a few cases, however, RL as a function of fragment size goes through a maximum at relatively low oligomeric size (about 100 bp). This “second retardation anomaly” does not appear to be correlated in any obvious manner with the precursor sequence. However, use of a sophisticated plotting algorithm (67) revealed a pattern, at least among the precursor sequences investigated in this work. The second retardation anomaly can be observed only with those precursors whose oligomers exhibit essentially perfect planar curvature and, in addition, contain one or more (CA)·(TG) or (TA)·(TA) dinucleotide elements located precisely in the plane of curvature. It was therefore proposed that these elements are sites of unusual anisotropic flexibility, characterized by unusually small energy barriers to kinking. At ordinary temperatures, thermal breathing in the DNA would ensure that a significant subpopulation of molecules would be energetically in a kinked or, at least, in a “kinkable” state.
If these elements are located in the plane of static curvature, the tensile force of the electric field in a gel electrophoresis experiment could facilitate their kinking in such a direction as to partially straighten out the static curvature in the fragment. According to current gel electrophoretic theory (reviewed in 68, 69), therefore, these fragments should experience less “friction” in reptating through the gel, and the magnitude of RL would be correspondingly reduced. Viewed in another way, the gel could be thought to “entrap” transiently kinked species by allowing them to pass into pores too small to accommodate the normally curved fragments, thereby extending their effective lifetimes. This concept of entrapment is fully analogous to that envisioned when a specific binding protein alters the structure of its DNA recognition sequence at particular flexibility loci in order to ensure improved structural accommodation in binding. Although indirect, the anisotropic flexibility hypothesis is a plausible and attractive explanation for the second gel anomaly effects observed by McNamara et al. (40). Other explanations have been advanced that can account for nonmonotonic behavior of RL with fragment size (70). However, these cannot account for the extraordinary consistency in the location of putative flexibility elements with respect to the plane of static curvature in the fragments studied (40). Combined with additional recent evidence, discussed in Section I,C,S,b, a strong circumstantial case can now be made for (CA)·(TG) dinucleotides as sequence-directed flexibility elements of importance in protein-DNA interactions. In this connection, however, it is important to remember that McNamara et al. also identified (TA)·(TA) dinucleotides as putative kink sites in their sequences.
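The distinction between normal and anomalous retardation can be expressed as a simple test on the RL series. The sketch below computes RL (apparent length divided by true length) for successive oligomers of a 21-bp precursor. It then checks whether the series peaks at an intermediate size rather than rising monotonically. The apparent lengths and function names are invented for illustration and are not data from reference 40. A real analysis would derive apparent lengths from mobility relative to a size ladder and would need to tolerate experimental noise, which this crude peak test does not.

```python
def rl_series(precursor_bp, apparent_bp):
    """RL = apparent / true length for the 1-mer, 2-mer, ... of a precursor."""
    return [app / (precursor_bp * (k + 1)) for k, app in enumerate(apparent_bp)]


def peaks_at_intermediate_size(values):
    """True if the largest RL occurs strictly inside the series (non-monotonic)."""
    peak = max(range(len(values)), key=lambda i: values[i])
    return 0 < peak < len(values) - 1


# Invented apparent lengths (bp) for the 21- to 147-bp oligomers.
normal = [22, 46, 73, 104, 138, 176, 218]
anomalous = [22, 46, 72, 98, 118, 136, 154]

for label, apparent in (("normal", normal), ("anomalous", anomalous)):
    rl = rl_series(21, apparent)
    print(label, [round(x, 2) for x in rl], peaks_at_intermediate_size(rl))
```

In the invented anomalous series, RL peaks at the 84-bp tetramer and then declines. This mimics qualitatively the maximum near 100 bp described in the text.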

2. CRYSTALLOGRAPHIC EVIDENCE

a. Evidence for Static Kinks in DNA. The earliest direct evidence for stereochemical kinking in DNA was observed in the high-resolution single-cocrystal structure of the EcoRI endonuclease with its specific binding site GAATTC (49). Crystals were grown in the absence of Mg²⁺ to preclude cleavage of the DNA. In this structure, the DNA conformation was distorted by the bound protein in two distinctive kinds of kink: a torsional dislocation in the center of the binding region, called a type-I neokink, and two largely axial bending kinks, called type-II neokinks, at the edges of the central binding region. The type-I neokink effectively unwinds the DNA by approximately 25° and widens both grooves, particularly the major groove, thus enhancing the accessibility of the bases in this region. It occurs at the central (AT)·(AT) base-paired dinucleotides and leads to a relatively small bending dislocation of about 12°. The type-II neokink, on the other hand, is primarily a bending dislocation of about 20° to 40° with a much smaller torsional component. It occurs mainly by roll toward the minor groove at flanking (CG)·(CG) base-paired dinucleotides. These neokinks are similar in concept to the stereochemical kinks described earlier (43, 44, 50), but differ significantly in structural details. A particularly interesting feature of the type-I neokink is that its formation creates an alignment of hydrogen-bonding sites on bases that appears necessary for direct readout in the EcoRI nucleoprotein complex, although such an alignment does not exist in the uncomplexed recognition DNA.
This is a point of great importance in understanding the extreme subtlety of sequence-directed flexibility effects in protein-DNA recognition. Many critical structural features, such as torsional or axial kinking, may be virtual features that exist only as transient aspects of a more complicated overall molecular dynamics in the separate binding partners. They become real only when the binding partners unite in a stable nucleoprotein complex. Stereochemical kinks have also been observed in the cocrystal structure of the CAP protein-operator DNA complex (48). In this system, kinking of about 40° occurs at each of two (CA)·(TG) dinucleotide elements located symmetrically about the dyad in the palindromic recognition sequence, so that the overall DNA bend through the complex is about 90°. The cocrystal structure clearly shows that these kinks are essential for both direct and indirect readout in the CAP complex. The alignment of the recognition helix in the helix-turn-helix motif with the complementary bases in the major groove evidently cannot occur in the absence of a structural dislocation of this magnitude. In addition, the bent DNA conformations at the (CA)·(TG) sites allow more distal regions of the DNA to interact with the protein. It is suggested that sequence-directed effects similar to those proposed to explain the slight but significant sequence dependence of nucleosome placement on DNA (71) can resolve the remaining curvature observed in the CAP complex (48). However, the large DNA bend through this complex has an additional functional role: that of DNA “steering” in the transcriptional complex. Some evidence exists that CAP may activate transcription by facilitating interactions between polymerase and upstream DNA, presumably through a looping mechanism. (This question is reviewed in 3, 72.) The above cocrystal structures provide clear evidence for kinked DNA in the EcoRI and CAP nucleoprotein complexes. From this, it can be inferred that the kinks occur at “weak points” in the DNA, i.e., at sequence elements having a relatively low energy barrier to kinking. It can also be inferred that the protein-DNA interaction free energy is large enough to overcome the conformational free energy costs in both the protein and DNA moieties required to form the optimum nucleoprotein complex. Nevertheless, these cocrystal structures are snapshots of dynamic systems stabilized by crystal packing forces in a particular conformation. The evidence they provide that the observed structural dislocations represent sites of dynamic flexibility is therefore necessarily indirect.
This points up the need for continuing diversity in experimental approaches to study these fundamental and important structure-function relationships.
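The notion of a “weak point” that kinks easily can be quantified with an assumed two-state equilibrium between unkinked and kinked conformations. In such a model the population is set by the equilibrium free-energy difference between the two states; a barrier height would govern only the rate of kinking. The sketch below converts a free-energy cost into the equilibrium kinked fraction and back. Under this two-state assumption, Schellman's estimate of 2% kinked bases corresponds to a cost of about 2.3 kcal/mol at 25°C. Treating every site as an independent two-state system with a single free-energy difference is a simplification, not the analysis used in the cited studies, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kinked_fraction(delta_g_kcal, temp_k=298.15):
    """Equilibrium fraction of an assumed two-state site in the kinked state.

    delta_g_kcal is the free-energy difference G(kinked) - G(unkinked).
    """
    return 1.0 / (1.0 + math.exp(delta_g_kcal / (R_KCAL * temp_k)))


def kink_free_energy(fraction, temp_k=298.15):
    """Inverse of kinked_fraction: free-energy cost implied by a population."""
    return R_KCAL * temp_k * math.log((1.0 - fraction) / fraction)


for f in (0.01, 0.02, 0.05):  # Manning, Schellman, proton-exchange upper limit
    print(f, round(kink_free_energy(f), 2), "kcal/mol")
```

In this model, a hypothetical 0.5 kcal/mol reduction in the cost, such as might be imagined for a step with weaker stacking, roughly doubles the kinked fraction.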

b. Evidence for True Sequence-directed Flexibility. A more direct depiction of site-directed flexibility in DNA has appeared in the work of Lipanov et al. (73), who crystallized the B-DNA decamer CCAACITTGG in both monoclinic and trigonal space groups. This work provides a dramatic demonstration of the importance and extraordinary specificity of crystal packing forces, because the structures observed in the two space groups differ in several critical respects. Although differences in twist, roll, helical rise, slide, and propeller twist were observed at all dinucleotide steps, large differences occurred only at the pyrimidine-purine (CA)·(TG) elements. These differences were primarily in roll, twist, and slide, suggesting that in this sequence, the (CA)·(TG) elements are sites of unusual flexibility. Although the association of these differences with deformability is indirect, this work must nevertheless be viewed as substantive evidence for sequence-directed dynamic flexibility in DNA. It provides additional confirmation that (CA)·(TG) is a consequential locus of such flexibility in a sequence.
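Locating candidate flexibility steps in a recognition sequence is a simple string scan. The sketch below lists the CA, TG, and TA steps on one strand. Because CA on one strand pairs with TG on the other, each (CA)·(TG) step is reported exactly once, and (TA)·(TA) is self-complementary. The function name is hypothetical. Inosine (I) is simply treated as one more letter that never forms a flagged step.

```python
# Dinucleotide steps singled out in the text as putative flexibility sites.
FLEXIBLE_STEPS = {"CA", "TG", "TA"}


def flexible_steps(strand):
    """Return (1-based position of first base, step) for CA, TG and TA steps.

    Only one strand is scanned: a CA step on this strand is a TG step on the
    complementary strand, so every (CA).(TG) step is listed once, and
    (TA).(TA) is self-complementary.
    """
    s = strand.upper()
    return [(i + 1, s[i:i + 2]) for i in range(len(s) - 1)
            if s[i:i + 2] in FLEXIBLE_STEPS]


print(flexible_steps("CCAACITTGG"))  # [(2, 'CA'), (8, 'TG')]
```

Applied to the decamer of Lipanov et al., the scan reports the CA step at positions 2-3 and the TG step at positions 8-9, the two (CA)·(TG) steps of the duplex.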

3. FLEXIBILITY IN PROTEIN-BOUND DNA: THE COMPLEX OF CRO PROTEIN WITH THE OR3 BINDING SITE

a. Evidence for DNA Bending in the Cro-OR3 Nucleoprotein Complex. Perhaps the most direct demonstration of (CA)·(TG) as a flexibility site in DNA is provided by gel electrophoresis studies of the cyclization properties of DNA oligonucleotides complexed with the Cro protein of λ phage at one of its several specific recognition sites. Lyubchenko et al. (74) described a gel electrophoresis technique that allows a direct determination of the bending angle in certain nucleoprotein complexes. The technique is based on the mixed-ligation cyclization method of Ulanovsky et al. (75) and has been applied to the Cro-OR3 nucleoprotein complex. Although Cro is a relatively small regulatory protein, the method seems likely to be applicable also to larger nucleoprotein systems. The DNA bending angle in the Cro-OR3 complex was about 45°, in excellent agreement with the value obtained from a recent low-resolution X-ray cocrystal structure of this complex (66).

The method used (74) was relatively straightforward and direct. Complementary single strands 21 nt long and containing the 17-bp OR3 recognition sequence were synthesized. These were designed so that subsequent hybridization produced double-stranded 21-bp precursors with 4-nt single-strand overhangs, which were labeled at a single end using T4 polynucleotide kinase and [γ-32P]ATP. With this sequence protocol, OR3 sites were spaced by exactly two helical turns in the higher ligation products. After hybridization, the 21-bp oligomers were reacted with Cro protein in the ligation buffer and then ligated slowly at 0°C for about 12 hours. Following ligation, the protein was removed and the DNA was analyzed by autoradiography in a two-dimensional gel electrophoresis system similar to that described by Ulanovsky et al. (75). Control determinations on the DNA in the absence of Cro protein were also performed; essentially no circles smaller than 300 bp were observed, and the gel mobilities were normal. Thus, the OR3 sequence in the absence of bound protein exhibits no unusual curvature. The two-dimensional gel is shown in Fig. 1, and a scan of the spot distribution corresponding to circles is shown in Fig. 2. The distribution of clearly resolved circle sizes ranges from 147 to 273 bp, with a fairly sharp maximum at 168 bp, or 8 × 21-bp precursor elements. Because the circles are topologically relaxed (75), the bending angle per 21-mer (or per bound OR3 sequence) is immediately obtained as ≈360°/8 = 45° (Fig. 3). Potential problems in the interpretation of the mixed-ligation results are discussed in 69. It is possible that the distribution of circle sizes could be distorted by poor end alignment, which might make the cyclization rate dependent on oligonucleotide length, or by the effects of intermediates in the complex cyclization/ligation process. This appears unlikely for the following reasons. Additional mixed-ligation experiments at higher temperatures, at increased T4 ligase concentrations, in the presence of polyethylene glycol (PEG), and at 21-bp precursor concentrations varied over a small range about the published values showed no effect, within experimental error, on the distribution of circle sizes formed (74; also L. S. Shlyakhtenko, unpublished). This indicates that ligation conditions were very nearly optimum under the experimental conditions described (74) and that linear precursors for cyclization into the various sized circles observed were essentially at steady-state concentrations. Under these conditions, the distribution of circle sizes should reflect the distribution in true cyclization efficiencies for the various sizes of species observed. In addition, the agreement between the gel analysis and the cocrystal structure (66) is striking, lending additional credence to the results obtained in this work.

FIG. 1. Analysis of ligation products of the 21-bp precursor element containing the 17-bp Cro recognition region (underlined), TATCACCGCAAGGGATAAATA (with complementary strand 3'-TGGCGTTCCCTATTTATATAG-5' to produce four-base unpaired ends), using two-dimensional gel electrophoresis. The doublets in each circle group correspond to nicked or open (upper) and covalently closed (lower) circles. From Lyubchenko et al. (74).

FIG. 2. Computerized densitometric scans of closed circles from Fig. 1 (vertical axis, AU; horizontal axis, mobility in the first dimension). From Lyubchenko et al. (74).

FIG. 3. Schematic illustration of Cro protein-induced bending of the OR3 operator site of λ phage and the circularization of ligated Cro-bound 21-bp oligomers (see legend to Fig. 1). An octameric circle is shown that corresponds to the most probable circle size obtained experimentally (Figs. 1 and 2). The dark rectangles represent bound Cro monomers. From Lyubchenko et al. (74).
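The arithmetic behind the bending-angle estimate can be sketched as follows. This is a minimal illustration, not the analysis procedure of (74). It assumes, as the text does, that the modal circle is planar and topologically relaxed, so that the bends at the bound sites sum to one full turn (360°). It also uses an assumed B-DNA helical repeat of 10.5 bp/turn (a value not given in the text) to check that 21-bp precursors place successive OR3 sites about two helical turns apart. The function names are illustrative only.

```python
HELICAL_REPEAT_BP = 10.5  # assumed B-DNA helical repeat (bp per turn); not from the source


def bend_per_site(circle_bp: int, precursor_bp: int = 21) -> float:
    """Bend angle (degrees) per bound site that a planar, relaxed circle of
    `circle_bp` would require if every precursor contributes an equal bend."""
    if circle_bp % precursor_bp != 0:
        raise ValueError("circle size must be a whole number of precursors")
    n_sites = circle_bp // precursor_bp
    return 360.0 / n_sites


def turns_between_sites(precursor_bp: int = 21,
                        repeat_bp: float = HELICAL_REPEAT_BP) -> float:
    """Helical turns separating equivalent sites in adjacent precursors."""
    return precursor_bp / repeat_bp


if __name__ == "__main__":
    # Most probable circle in Fig. 2: 168 bp = 8 x 21-bp precursors.
    print(f"{168 // 21} sites, {bend_per_site(168):.1f} deg per bound OR3")  # 8 sites, 45.0 deg per bound OR3
    # Ends of the resolved range (147 and 273 bp, i.e. 7 and 13 precursors):
    # the per-site bend a planar circle of that size would need.
    for size in (147, 273):
        print(size, size // 21, round(bend_per_site(size), 1))
    print(f"{turns_between_sites():.1f} helical turns between sites")  # 2.0 helical turns between sites
```

The same relation shows why a spread of circle sizes is informative: circles smaller or larger than the modal 168 bp correspond to per-site bends above or below 45°, so the breadth of the distribution reflects how much the bent site can flex.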


It is possible that the 4-nt “sticky ends” on the precursor oligonucleotides used in 74 are an important factor in maintaining steady-state kinetic conditions in the present case. Considerable experimental evidence exists for relatively clean ligation of oligonucleotides having overhangs of 4 nt or more (reviewed in 69). This suggests that minor alignment problems in writhe or twist do not seriously perturb the cyclization reaction, and hence should have only a minor effect on cyclization as a function of fragment size. The fact that other studies using mixed-ligation cyclization are in reasonable agreement with independent experimental determinations (75, 76) also suggests that complex kinetics do not seriously limit the results of these determinations under the experimental conditions employed.

b. Evidence for Site-specific Flexibility in the Cro-OR3 Nucleoprotein Complex. Although the operator DNA is clearly bent in the Cro-OR3 nucleoprotein complex, the studies just described could only verify the bending and estimate its magnitude (66, 74). Neither the low-resolution X-ray structure of the nucleoprotein cocrystal (66) nor the cyclization study (74) could localize specific bending loci within the 17-bp recognition region or associate the bending with a particular sequence motif. Candidates for bending loci in the OR3 sequence include the alternating pyrimidine-purine dinucleotide (CA)·(TG), because kinking at this sequence element has been observed by crystallography in the CAP-operator complex (48) and alternating [(CA)·(TG)]n runs are highly overrepresented in CAP binding sequences (77). Furthermore, this element has been identified as a possible kinking locus or site of unusual flexibility in DNA (40). Additional information on this question has been provided by more recent experiments (78). In an extension of the work described above, the cyclization properties of a set of single-site mutant and mismatched sequences derived from the 17-bp OR3 wild-type recognition sequence were studied. Mutations to the OR3 sequence were chosen in accordance with thermodynamic binding-constant criteria (79) to ensure that tight binding conditions were maintained in all cases. The set includes mutations to both the upstream and downstream specific binding regions as well as to the central, nonbinding region of OR3. The specific sequences, along with estimates of helical twist (80), are given in Fig. 4. From the thermodynamic data (79), the binding free energies relative to wild-type OR3 for the mutant sequences shown are about -0.5 kcal/mol for M3 and less than ±0.1 kcal/mol for M2 and M1.
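The Δ column of Fig. 4 appears to express each 21-bp precursor's estimated total twist relative to exactly two helical turns (720°), the value that would bring the ends of successive ligated units into perfect torsional register. A minimal sketch of that bookkeeping follows, using the twists listed in Fig. 4 for the wild type and the point mutants M1–M3 discussed in the text. Reading Δ as "total twist minus 720°" is an inference from the tabulated numbers, and the names below are illustrative.

```python
# Total helical twist (degrees) of each 21-bp precursor, as listed in Fig. 4.
TWIST_DEG = {"wild type": 716.6, "M1": 715.7, "M2": 712.6, "M3": 718.1}


def twist_deviation(total_twist_deg: float, turns: int = 2) -> float:
    """Deviation (degrees) of a precursor's total twist from an integral
    number of helical turns; negative values mean the unit is underwound."""
    return round(total_twist_deg - 360.0 * turns, 1)


def mean_step_twist(total_twist_deg: float, n_steps: int = 21) -> float:
    """Average twist per base-pair step. In a ligated multimer each 21-bp
    unit contributes 21 steps, counting the junction step once per unit."""
    return total_twist_deg / n_steps


if __name__ == "__main__":
    for name, tw in TWIST_DEG.items():
        print(f"{name:9s} delta = {twist_deviation(tw):+.1f} deg, "
              f"mean step twist = {mean_step_twist(tw):.2f} deg")
```

All four deviations are a few degrees per 21-bp unit, which is the sense in which the sequences are "very similar in helical twist".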
Standard gel-mobility retardation assays on the free DNA for heptamers (147 bp) and decamers (210 bp) showed identical retardation within experimental error for both fragment sizes of all species (RL = 1.2 ± 0.1). This indicates that all the free DNA sequences are very similar to one another in static bending properties. They are also very similar in helical twist (80), and the estimated differences are all much smaller than the expected thermal fluctuation in this quantity (27, 33). Static bending properties of the wild-type and point-mutated sequences predicted by computer modeling based on the wedge model (13) show no significant out-of-plane bending effects and are fully consistent in magnitude with the experimental gel-mobility results. Finally, the gel-mobility retardation data showed no temperature dependence; theoretical considerations predict that this can be true only for oligomers with negligible out-of-plane bending (81). These considerations are all consistent and suggest that the single-base mutations in Fig. 4 do not significantly change the torsional matching of ends.

Wild type
5'-TATCACCGCAAGGGATAAATA-3'   Twist 716.6°   Δ -3.4°
3'-TGGCGTTCCCTATTTATATAG-5'

Mutations
5'-TATCACCaCAAGGGATAAATA-3'   M1   Twist 715.7°   Δ -4.3°
5'-TATCACCGgAAGGGATAAATA-3'   M2   Twist 712.6°   Δ -7.4°
5'-TATCACCGCAAGtGATAAATA-3'   M3   Twist 718.1°   Δ -1.9°
5'-TATCtCCGgAAGGGATAAATA-3'   M4   Twist 708.3°   Δ -11.4°

Mismatches
5'-TGTCACCACAAGGGATAAATA-3'   COM1
5'-TATCACCGGAA[remainder illegible]-3'   COM2
5'-TATCKCGCAAGTGATAAATA-3'   COM3

FIG. 4. The complete set of oligonucleotide sequences used in the OR3 mutation studies of Lyubchenko et al. (78). The specific binding regions (66, 79) are indicated by bold/underlined typeface. Mutations are shown in lowercase type. Helical twists are from Kabsch et al. (80). All complementary strands are designed to allow 4-base single-strand overhanging ends; the complementary strand is shown for the wild-type sequence only.

However, there are significant differences in cyclization properties among the sequences in Fig. 4. The mismatched sequences COM1–COM3, included as positive controls, all show substantial cyclization at much smaller fragment lengths than the OR3 wild-type sequence, both in the presence and in the absence of bound Cro. Thermodynamic (82), spectroscopic (83), and electron-microscope (84) studies, as well as gel-retardation (84, 85) and enzymatic (85) investigations, consistently demonstrate that such mismatches destabilize the double helix and may propagate structural disturbances a distance of several base pairs from the mismatch site (83), leading to increases in both bending and torsional flexibility. A typical gel for a mutant sequence (M1 of Fig. 4) containing bound Cro protein is shown in Fig. 5a and can be compared with the same sequence in the absence of Cro in Fig. 5b. The efficiency of ring formation clearly goes through a maximum at about 180 to 200 bp in the presence of Cro, whereas only small amounts of larger circles are formed in its absence. Furthermore, the circle distribution curves for the wild-type OR3 (Fig. 2) and the M1 mutant (Fig. 5b) are noticeably dissimilar. Differences of this type are also observed for the other mutants in Fig. 4. The only fully self-consistent explanation for these cyclization differences is to assume that they result from variations in anisotropic flexibility among these sequences. Furthermore, these variations must be due to the presence or absence of [(CA)·(TG)]n sequence elements, because these are the only sequence features that differ among the OR3 wild type and the point mutations M1–M3 shown in Fig. 4. The cyclization results indicate that putative flexibility increases in the series (CA)·(TG) < (CAC)·(GTG) < (CACA)·(TGTG). The longest element studied, (CACA)·(TGTG), appears to be significantly more flexible than the shorter elements, suggesting that it might adopt an alternative structure.
A recent crystallographic study of alternating [(CA)·(TG)]n tracts has shown that these elements can adopt an unusual structure at low temperatures in the crystal, which includes intramolecular A·G and T·C hydrogen bonds (86). If the (CACA)·(TGTG) tract in the central, nonbinding region of mutant M1 can assume this or a similar structure, it would explain the unusual propensity of this fragment to cyclize. This suggests that an alternative structure in the (CACA)·(TGTG) sequence element is associated with increased flexibility, which, in turn, might facilitate DNA bending by the Cro protein. Although the cyclization data cannot indicate a directional preference for flexible elements, calculations (33, 50) show a preference for bending into the major groove. These results suggest that the alternating pyrimidine-purine runs [(CA)·(TG)]n that appear in the Cro binding sites are loci of unusual anisotropic flexibility. These sequence elements may play a role in indirect readout, i.e., facilitate sequence-specific binding between the DNA-binding domain of Cro and the DNA-recognition region, by inducing bends in the latter that are strategically positioned to provide an improved fit between the binding partners. This is consistent with the observation that two (CAC)·(GTG) elements at the OR1 and OR2 sites improve interaction with both Cro and cI repressors (87). In support of these ideas, it should be noted that the triplet (CAC)·(GTG) appears with exceptional frequency in regulatory-protein binding sites and has been proposed to be a potential site of alternative DNA structure (61, 88). It appears in a variety of regulatory sequences, as is discussed in more detail in Section II, and regions of helically phased (CAC)·(GTG) elements can weakly position nucleosomes (62). Cyclization studies may now offer a reasonable physical explanation for these various observations.

FIG. 5. Representative two-dimensional polyacrylamide gel cyclization assays (74) (shown for mutant M1, Fig. 4) in the OR3 mutation studies of Lyubchenko et al. (78). Circle formation in (a) the presence and (b) the absence of bound Cro protein. Reprinted with permission from (78). Copyright 1993 American Chemical Society.

II. A Short Taxonomy of DNA-bending Proteins and Their Recognition Sequences

As we have noted, the primary motif in specific protein-DNA interactions is the binding of an α-helical or β-ribbon region of the protein to a special sequence-dependent DNA structure in which critical hydrogen-bonding and other attractive interactions can occur. Conformational changes in both protein and DNA on complexation occur primarily to maximize these highly specific interactions, to enable additional sources of nonspecific binding energy, such as electrostatic and hydrophobic free energies, and to improve intermolecular interactions among proteins. Conformational changes in the protein and sequence-dependent bending in the DNA, either static or induced by the protein, probably allow correct positioning of the binding partners so that the binding affinity is either maximized or controlled within specific limits required by the biological process under regulation. These may involve adjustments in major- or minor-groove width to accommodate the recognition element of the protein. They may be sequence-directed, as, for example, in the tendency of poly(A) runs to have narrower minor grooves (34, 89, 90) and of (G + C)-rich regions to have compressed major grooves (91, 92; reviewed in 5, 6, 36). Another role for DNA bending may be to allow DNA to achieve an optimum pathway in multiprotein complexes. Indeed, the only function of some specific binding proteins in oligomeric complexes may be to bend DNA (93). Finally, DNA bending may affect the kinetics of protein binding, i.e., the on- and off-rates, and in this way influence the competition of a protein for multiple binding sites. We are just beginning to witness the wide diversity of nucleoprotein complexes that exist in nature, although to date we have been extremely limited in our ability to characterize them.


A. Prokaryotic Helix-Turn-Helix Proteins

The helix-turn-helix DNA-binding domain motif was first discovered in the Cro protein of λ phage (94) and has subsequently been identified in a variety of other prokaryotic regulatory proteins (reviewed in 95), including the Cro (96) and repressor (97) proteins from phage 434, the Cro and repressor proteins from phage λ, and the CAP protein and trp repressor (98) of E. coli. Most of these have been cocrystallized with DNA fragments containing specific recognition sequences, and relatively detailed structural information from X-ray crystallography is now available for these complexes. These include the Cro (66) and cI repressor (99) from λ phage, the Cro (100) and 434 repressor (also denoted R1-69) (101) from phage 434, and the CAP protein (48) and trp repressor (102) of E. coli. The crystallographic results show a general pattern of protein-induced bending in the operator DNA, and although many intriguing findings have been reported, it is not possible at present to demonstrate a consistent set of patterns in structure-function relationships.

1. THE REPRESSOR AND CRO PROTEINS FROM PHAGE 434

The 434 repressor protein has been cocrystallized with a 20-bp fragment containing the full OR1 site (TATACAAGAAAGTTTGTACT). The 434 Cro protein has been cocrystallized with two different DNA fragments: a 14-bp oligomer with consensus homology to the OLn and ORn (n = 1, 2, 3) binding sites (ACAATATATATTGT) (103), and the same 20-mer shown above and used for the repressor (100). In the Cro system, the conformations of the DNA in the two complexes are significantly different. In the complex with the 14-mer, the DNA is straight, uniformly overwound, and in the canonical B-form. No unusual variations in the width of either the major or minor groove are evident. In the 20-mer, the central 14 bp are structurally similar to the DNA in the smaller complex, but the ends of the DNA are sharply bent in a fashion also observed in the 434 repressor complex with this same sequence (101). A close comparison of the DNA conformations obtained for both Cro and repressor shows that the principal locus of bending occurs at the symmetrically located CA and TG elements, which are separated by 12 bp, a little over a single helical repeat. The conformations of the DNA in the two systems are very similar and are roughly in the shape of a laterally elongated, shallow U with somewhat nonparallel arms (100). It is possible, therefore, that these base stacks play an indirect readout role in both the 434 Cro and repressor systems. It should be noted that the helical twists calculated for these elements from the crystallographic data do not show the anomalously large values reported for a number of other helix-turn-helix nucleoprotein complexes (see also the discussion of eukaryotic helix-turn-helix proteins in Section II,B). However, the investigators note that the helical rise and twist parameters are highly sensitive to local variations in the structure; hence the values reported are approximate.

2. THE REPRESSOR AND CRO PROTEINS FROM PHAGE λ

The repressor protein from phage λ has been cocrystallized with a 20-bp fragment containing the 17-bp OL1 recognition site, TATATCACCGCCAGTGGTAT. The structure has been determined at two levels of resolution, 2.5 Å in an early study (104) and 1.8 Å in a more recent determination (105). This recognition sequence has almost twofold symmetry about the central G. The left half is called the “consensus” half, because it matches the consensus sequence determined for the 12 operator half-sites (106); the right half, the “nonconsensus” half, differs at positions 13 and 17. Binding to the λ repressor protein is also quite different between the two halves (105). The entire recognition sequence also differs in several important respects from the sequence used in the 434 repressor study. In the λ repressor, the central region of the OL1 site is (G + C)-rich, whereas it is (A + T)-rich in the operator sites of the 434 repressor. This suggests possible microstructural differences in this region between the two sites. We have noted, for example, that statistical studies on DNA conformation in nucleosomes show that (A + T)-rich sequences tend to occur in positions where the minor groove is compressed, whereas (G + C)-rich sequences are more likely to be characterized by compression of the major groove. This may be indicative of lability in major-groove width, because the major groove in the λ repressor complex is slightly opened near the center, where it is contacted by the N-terminal arm of the protein (105). The 434 repressor site also contains (CA)·(TG) and (TG)·(CA) elements separated by 11 bp, and the λ repressor site contains (CA)·(TG) and (TG)·(CA) elements separated by 7 and 10 bp.
The first and last of these appear as the triplets (CAC)·(GTG) and (GTG)·(CAC). As noted in Section I,C,3,b, recent studies of DNA cyclization properties indicate that flexibility increases in the series (CA)·(TG) < (CAC)·(GTG) < (CACA)·(TGTG) (78). The DNA conformation determined in the high-resolution structure of the λ repressor complex (105) is similar at the ends of the sequence to that observed in the 434 repressor and includes bends at the terminal (CAC)·(GTG) and (GTG)·(CAC) elements. As in the 434 repressor, specific contacts between the protein and DNA occur in these elements, and they exhibit no unusual helical parameters. Specific contacts are shown in Figs. 6a and 6b. On the other hand, the helical twist in the central (CA)·(TG) stack at position 12 is anomalously large: 49.2° compared with an average of 34.1° over the full sequence. There are no specific contacts made with the protein at this site (Fig. 6b), although contacts are made with adjacent base pairs. A significant structural abnormality is indicated in the DNA at this position, because the N-7 of the G at position 10 in the complementary strand is hyperreactive to chemical methylation on repressor binding (105). A larger than average twist (40.9°) is also observed at the GC sequence at position 10. The (G + C) base stack at position 10 is also the site of specific interactions with the N-terminal arm of the protein. The five interactions between Lys-3 and Lys-4 and various bases in this region are shown in Fig. 6a. The N-terminal arm is an important feature both in direct readout and in binding affinity, because mutations in the first six N-terminal residues reduce the repressor functionality and specific binding constant (107). However, it is defined structurally only in the repressor bound to the consensus (5') half of the recognition site, even at -15°C, where thermal motions are considerably reduced (105); the arm appears amorphous and structureless in the protein bound to the nonconsensus half (105, 107) and makes no contacts with bases in this region (Fig. 6b). This underscores the remarkable subtlety that governs the interrelationships between direct and indirect readout. The N-terminal arm is a structural feature of the λ repressor that evidently develops only as the protein and recognition site come together to form a unified whole, and the sequence-directed extended structure of the six N-terminal residues in the protein is an important component of direct and indirect readout. Furthermore, the sequence-dependent structures of both the N-terminal region of the protein and the DNA-recognition site lead to differential binding in the two halves of the dimeric λ repressor complex that may have important functional consequences in the transcriptional biology of λ phage (105, 106).

FIG. 6. Summary of the differential contacts between the λ repressor and (a) the consensus half and (b) the nonconsensus half of the OL1 binding site. The DNA is shown in a cylindrical projection representation. The backbone phosphate groups are shown as circles, with filled circles indicating those contacted by the protein. From Beamer and Pabo (99).
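Whether two bendable elements deflect the helix in the same direction depends on their helical phasing. As a rough sketch, assume a uniform twist equal to the 34.1° average quoted for the λ repressor complex. This is a simplification, since the text shows that local twists such as 49.2° and 40.9° depart substantially from the mean. Under that assumption, the azimuthal angle between two elements is their separation in base-pair steps times the twist, reduced modulo 360°. Treating the quoted 7- and 10-bp separations as step counts is a further assumption. The helper names are hypothetical, and this is not a calculation performed in the cited work.

```python
MEAN_TWIST_DEG = 34.1  # average twist over the lambda OL1 complex, as quoted in the text


def phase_offset(spacing_steps: int, twist_deg: float = MEAN_TWIST_DEG) -> float:
    """Azimuthal angle (0-180 degrees) between two elements `spacing_steps`
    base-pair steps apart, assuming uniform twist.
    0 means the same face of the helix; 180 means opposite faces."""
    angle = (spacing_steps * twist_deg) % 360.0
    return min(angle, 360.0 - angle)


def excess_twist(step_twist_deg: float, mean_deg: float = MEAN_TWIST_DEG) -> float:
    """Over- (+) or under- (-) winding of a single step relative to the mean."""
    return step_twist_deg - mean_deg


if __name__ == "__main__":
    for spacing in (7, 10):
        print(f"{spacing} steps apart -> {phase_offset(spacing):.1f} deg out of phase")
    print(f"step 12 excess twist: {excess_twist(49.2):+.1f} deg")
    print(f"step 10 excess twist: {excess_twist(40.9):+.1f} deg")
```

In this crude model, elements 10 steps apart lie within about 20° of the same helical face, whereas elements 7 steps apart are more than 120° apart. The single overwound step at position 12 (about +15°) would by itself shift the phasing of everything downstream, which is one reason local twist anomalies matter for indirect readout.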
It appears from the above observations that indirect readout plays a significant role in the binding specificities of both the 434 and λ repressors, although subtle differences exist between the two complexes. There can be little doubt that sequence-distinguished sites of anisotropic flexibility, located at critical positions in the DNA-recognition sequence, provide a basis for enhanced binding specificity through indirect readout mechanisms. These probably also allow local structural accommodations to occur between the protein and DNA binding partners that facilitate direct readout in these systems. The differential binding between the two half-sites in the DNA-recognition region may also fine-tune repressor binding with respect to Cro in the lytic-lysogenic switches of the respective viruses. The structure of the Cro protein from λ phage bound to the 17-bp OR3 site, TATCACCGCGGGTGATA, has been determined in a cocrystal structure at 3.9 Å resolution (66). The DNA in these crystals does not stack end-to-end as in most other nucleoprotein crystals, and to date it has been difficult to improve this resolution significantly. At present, these observations are inadequate for a direct structural comparison with the λ repressor; the locus of bending cannot be determined precisely from this study, although the bending angle was estimated as about 44°. As noted previously, the bending angle has also been determined as about 40° to 45° using a two-dimensional polyacrylamide-gel assay for ring closure or cyclization (74). This is in general agreement with the crystallographic observation. The additional cyclization studies in which Cro was complexed with several mutations to the OR3 binding site also suggest that the single (CA)·(TG) element in this site may be primarily responsible for the observed bending (78). The Cro protein of λ phage binds as a 14.7-kDa dimer specifically and noncooperatively to several binding sites in the λ phage genome. It competes with the λ repressor for the 17-bp OR1, OR2, and OR3 sites to effect the switch between the lytic and lysogenic modes in this virus. The differences in binding between Cro and λ repressor are not well understood from a structural standpoint. The λ repressor contains distinct DNA-binding (N-terminal) and protein-binding (C-terminal) domains and binds cooperatively to these same recognition sites and to three additional OLn sites. Its binding affinity to operator sites is somewhat greater than that of Cro (106). (The molecular genetic mechanisms are reviewed in 108–110.) The reasons for the apparent differences in binding affinity and geometry between Cro and λ repressor are not clear at the present time. What is clear is that they represent an extraordinarily finely tuned competitive binding system, and in view of the many similarities in the binding domains of these two proteins, the differences must be due to subtle effects such as microstructural relationships between the proteins and the DNA-recognition sites, and to sequence-dependent anisotropic flexibility in the DNA.
It is also possible that kinetic differences in binding due to DNA bending in Cro and cooperative interactions of C-terminal domains in the λ repressor may play a role in the binding competition of the lytic-lysogenic switch mechanism.

3. THE CATABOLITE GENE ACTIVATOR PROTEIN

The catabolite gene activator protein (CAP; also called the cyclic AMP receptor protein, CRP) from E. coli functions primarily as an activator of transcription, although it can also act as a repressor (reviewed in 72, 111). If carbon sources are restricted, several operons, including lac and gal, are induced so that alternative sugars can be catabolized using the encoded enzymes. When CAP binds its allosteric effector, cAMP, it activates transcription at about 20 promoters in E. coli (111), binding at sites located from 41 to 103 bp upstream of the start site. The complex binds as a dimer to the 16-bp consensus sequence TGTGANNNNNNTCACA. The strongest binding is to the lacP1 promoter, which has the sequence TGTGAGTTAGCTCACT, but the characteristic alternating purine-pyrimidine motif (CAC)·(GTG) appears to be a common feature of most CAP binding sites (77). Strong binding interactions occur over 28 to 30 bp, a region almost twice the size of the consensus sequence (112). This appears to require substantial bending of the DNA in order to maintain contact between the DNA and protein. Although earlier studies implicated sequences outside the consensus region in the bending (92), the recent crystal structure of CAP complexed to a 30-bp sequence that includes the consensus sequence demonstrates clearly that the bending occurs almost entirely in two kinks at the symmetrically located (CA)·(TG) dinucleotide elements (3, 48). The bending is highly localized at (CA)·(TG) dinucleotide elements spaced at a helical repeat distance in the operator site of CAP (48); the (CA)·(TG) elements kink through about 45°, with the result that the helical trajectory of the operator site bends by about 90°. Similar bending angles for CAP have been determined in solution using cyclic permutation gel-mobility retardation studies (64, 65). Residual bending seems to be associated with the phased alternation of (A + T)- and (G + C)-rich regions, which have been identified in nucleosomes (71). In addition to the helix-turn-helix contacts, recognition is modulated by 13 additional amino-acid side chains interacting with 11 phosphates that span a region 28 bp in size. Thus, the binding interaction has a large nonspecific component, but the specific binding part appears to be due largely to the unusual DNA conformation. The bending also seems to have functional significance, because sequences with fixed bends can activate the promoter both in vivo (113) and in vitro (114). It is possible that the functional role of CAP in promoter activation, and possibly its only role, is to bend the promoter DNA to allow formation of the transcription complex (93), in which case it may be only one of a much larger class of DNA-steering proteins.
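How two kinks of about 45° spaced one helical repeat apart combine into an overall bend of about 90° can be illustrated with a simple vector model. In this sketch, each kink is treated as a two-dimensional vector whose length is its bend angle and whose direction is set by its azimuthal position on the helix, and the overall bend is the magnitude of the vector sum. This is an assumed small-angle approximation, not the analysis used in (48), and it becomes only approximate for kinks as large as 45°. The 10.5-bp helical repeat and the function name are likewise assumptions.

```python
import math

HELICAL_REPEAT_BP = 10.5  # assumed B-DNA helical repeat (bp per turn)


def net_bend(kinks: list[tuple[float, float]],
             repeat_bp: float = HELICAL_REPEAT_BP) -> float:
    """Approximate overall bend (degrees) from (position_bp, angle_deg) kinks.

    Each kink contributes a 2-D vector of length angle_deg pointing along its
    helical azimuth, 2*pi*position/repeat. Kinks in phase add; kinks half a
    helical turn apart cancel.
    """
    x = sum(a * math.cos(2 * math.pi * p / repeat_bp) for p, a in kinks)
    y = sum(a * math.sin(2 * math.pi * p / repeat_bp) for p, a in kinks)
    return math.hypot(x, y)


if __name__ == "__main__":
    # CAP-like case: two ~45 deg kinks one helical repeat apart.
    print(round(net_bend([(0.0, 45.0), (10.5, 45.0)]), 1))  # 90.0
    # The same kinks half a repeat apart would cancel in this model.
    print(round(net_bend([(0.0, 45.0), (5.25, 45.0)]), 1))  # 0.0
```

The contrast between the two calls is the point of the model: the magnitude of a protein-induced bend depends not only on the deformability of the kinked elements but also on their helical phasing within the binding site.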

4. THE trp REPRESSOR PROTEIN: RECOGNITION EXCLUSIVELY BY INDIRECT READOUT

The tryptophan operon (trp) is a contiguous string of five genes in E. coli that code for several biosynthetic enzymes; the operon also contains its own control elements. It is regulated by the trp repressor, a tetramer of about 12,500-Da subunits, which requires L-tryptophan as a corepressor to provide an autogenous control system highly sensitive to local Trp concentration. The cocrystal structure of the trp repressor complexed with a 19-bp oligonucleotide of sequence TGTACTAGTTAACTAGTAC, which simulates the actual operator sequence, has been determined to 2.4 Å resolution (102). This structure discloses a number of interesting differences between the trp repressor protein and other prokaryotic helix-turn-helix regulatory proteins, including the complete absence of direct readout mediated by specific contacts in its binding to its DNA-recognition site. Instead of direct contacts between amino-acid residues and bases in the recognition site, a number of hydrogen-bonded interactions are observed between amino-acid residues and phosphate groups in the phosphodiester backbone of the DNA. Thus, sequence specificity is evidently determined entirely by structural accommodations between the protein-binding domain and the DNA operator region. Other studies on the protein suggest that the recognition helix in the helix-turn-helix binding domain is unusually flexible, in that binding of L-tryptophan to the aporepressor does not lead to a unique binding-domain structure (115). However, it does cause a reorientation of the helix-turn-helix domain that facilitates interaction of the recognition helix with the major groove in the recognition site (116). Similarly, no unique structure was found for the N-terminal residues, although it was recognized that this might be either an artifact of the crystal environment (102) or a possible consequence of N-terminal arm involvement in the protein-protein interactions of dimer formation (117). The recognition-site DNA in the complex showed two shallow bends at (TA)·(TA) base stacks 6 and 14. These bends occur in different planes and are separated by only nine bases, appreciably less than a full helical repeat; this suggests that the (TA)·(TA) elements are undergoing anisotropic flexure. These sequence elements are centered in the five-base tracts (ACTAG)·(CTAGT) at positions 4 and 12. These tracts exhibit the largest deviations in slide and roll angles from average B-DNA values, although their helical twist values are not extraordinary. The (ACTAG)·(CTAGT) tracts are also the region most sensitive to mutations in this operator site (118). The (TA)·(TA) elements at positions 3 and 17 have high helical twist values of 42.8°, but are well within the canonical B-form DNA range in other helical parameters.
The central (TA)·(TA) element at position 10 shows a slightly abnormal roll angle of 8.8°. The only direct contacts between the trp repressor and its DNA-recognition site are three water-mediated interactions between specific residues and the A, G, and T at positions 15, 16, and 17, respectively. Although such water-mediated interactions might contribute a small component of direct readout to the binding specificity, it is doubtful that they could account for more than a small part of either the observed affinity or the binding specificity of the trp repressor for its operator. These are evidently determined exclusively, or almost exclusively, by indirect readout mechanisms. It is therefore of interest to examine more fully those sequence elements in the DNA that deviate most significantly from canonical B-form DNA. Gel electrophoresis studies suggest that (CA)·(TG) and (TA)·(TA) dinucleotide elements are unusually susceptible to kinking by rolling into the major or minor groove, i.e., presumably they can deform at lower energies than can other dinucleotide stacks (40). The evidence above suggests that these elements may be the principal components of the large and nearly exclusive indirect readout mechanism that determines the binding specificity of the trp repressor system. By allowing the DNA to assume a very precisely defined microstructure, these (and possibly other) sequence elements may allow the DNA topography to conform very closely to that of the protein with bound corepressor at a relatively low free-energy cost. The free-energy difference between specific and nonspecific binding for the trp repressor has been estimated as ≈6 kcal/mol (119). The formation of 24 hydrogen bonds between the protein and the DNA phosphates (102), improved electrostatic interactions, and entropic contributions from enlarging the water-excluded contact surface between the two binding partners can all contribute toward meeting this cost. However, it is clear that the requirement for highly sequence-specific binding also requires that the DNA be deformable in a very precise way at a relatively low energy cost. This could certainly be achieved in a sequence that contains flexible elements at just the correct positions.
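Two of the quantitative statements above can be checked directly: the positions of the (TA)·(TA) steps and (ACTAG) tracts in the 19-bp operator, and what a specific/nonspecific free-energy difference of ≈6 kcal/mol implies for the ratio of binding constants (K_spec/K_nonspec = exp(ΔΔG/RT)). The sketch below is illustrative only; the helper names are hypothetical, and a temperature of 25 °C is an assumption. Because TA is self-complementary, scanning one strand finds the step on both strands.

```python
import math

# Gas constant in kcal/(mol*K); 298.15 K (25 degrees C) is an assumed temperature.
R_KCAL = 1.987204e-3

def motif_positions(seq, motif):
    """Return 1-based start positions of motif in seq (read 5'->3'), overlaps allowed."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return [i + 1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]

def specificity_ratio(ddg_kcal, temp_k=298.15):
    """K_specific / K_nonspecific implied by a free-energy difference in kcal/mol."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))

TRP_OPERATOR = "TGTACTAGTTAACTAGTAC"  # 19-bp cocrystal oligonucleotide (102)

print(motif_positions(TRP_OPERATOR, "TA"))     # [3, 6, 10, 14, 17]
print(motif_positions(TRP_OPERATOR, "ACTAG"))  # [4, 12]
print(f"{specificity_ratio(6.0):.1e}")         # 2.5e+04
```

The scan reproduces the step numbering used in the text, and a 6 kcal/mol preference corresponds to roughly a 10⁴-fold discrimination between the operator and nonspecific DNA under the stated assumption.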

B. Eukaryotic Helix-Turn-Helix Proteins: The Homeodomain

The homeodomain is a 60-amino-acid region that comprises the DNA-binding domain of a large group of eukaryotic transcription factors (120, 121). It was originally identified in a group of Drosophila regulatory proteins involved in homeosis, a transformation in which the development of one part of a system mimics that of another, but evidence is now available that the homeodomain occurs in a broad family of eukaryotic regulatory proteins that can be grouped into a number of subfamilies (reviewed in 120). The amino-acid sequences are highly conserved and the modes of DNA binding are evidently comparable, with many similarities to the basic prokaryotic helix-turn-helix motif (reviewed in 122). Current evidence suggests that the level of conservation is significantly higher among protein homeodomains than among the corresponding DNA homeobox sequences (which typically are found in the coding regions of the genes). In addition, there is usually little amino-acid sequence similarity outside the homeodomain regions, although there are a number of important exceptions. Thus, evolutionary pressures are generally presumed to be more important at the protein level in these systems (120). At present, relatively high-resolution structural data are available for three different homeodomain peptides complexed to DNA-recognition sequences. The structure of a mutant Antennapedia (Antp) homeodomain from Drosophila bound to a 14-bp oligomer has been studied by NMR (123). Cocrystal structures have been completed on the engrailed protein from Drosophila (124) and on the α2 repressor protein from yeast (122), both complexed to DNA oligomers with consensus recognition sequences that bind the proteins tightly. These proteins are genetically distinct, but are quite similar to each other in the gross structure of their binding regions, which all fall into a broadly defined helix-turn-helix motif, although the sequence similarity among these homeodomains is <30%. The recognition DNA is exclusively B-form in all these systems, and there is no evidence for bending or for significant sequence-directed structural dislocations in the specific systems investigated. Binding specificity derives in all cases from a very limited number of specific contacts between recognition regions of the protein and specific bases in the major or minor grooves of the DNA. However, in all these structural studies, both the protein and recognition DNA moieties are severely truncated, and it is possible that complexes between the entire proteins and longer regions of DNA may use flexibility in both the proteins and the DNA to increase binding specificity through indirect readout.

1. NMR STRUCTURE OF AN Antp HOMEODOMAIN PEPTIDE

An Antp homeodomain peptide of 68 amino-acid residues, consisting of residues 297 through 363 of the Antp protein plus an N-terminal methionine introduced by the preparative method, was investigated by NMR (125). It also contained a C-to-S mutation at residue 39 that prevents dimerization. This peptide was bound to a 14-bp DNA fragment corresponding to the BS2 binding site, identified as a specific recognition site, CTCTAATGGCTTTC (complementary strand 5'-GAAAGCCATTAGAG-3') (125). The mutant and wild-type proteins bind to this DNA sequence with similar dissociation constants of approximately 10⁻⁹ M (126). An essentially complete set of sequence-specific resonance assignments was obtained; comparison of these with independent assignments on the peptide and DNA moieties alone suggested that the conformational changes in both binding partners associated with complex formation were relatively small. The structure was determined from distance constraints, guided by molecular modeling of the free protein associating with the free DNA. Nine intermolecular NOEs were obtained, indicating specific contacts between the peptide and the DNA; the most important of these are shown in Fig. 7. These contacts were all in the major groove, with the single exception of R5 to G12 in the complementary strand, which was a minor-groove contact, and the majority of those from helix I or the loop involve sugar protons of the DNA. Fig. 7 shows that four are contacts between the putative recognition helix III and the AA element in TAAT, and between helices III and IV and the C of the (CA)·(TG) element in the complementary strand. In the free Antp homeodomain, helix IV is an extension of helix III and the two are linked by a kink of about 30° (127); this extended helical structure evidently persists in the nucleoprotein complex. It is possible that flexibility both in the DNA at the (CA)·(TG) element and in the protein at the helix III-IV junction may permit additional contacts to occur in the complex. Overall resolution is insufficient to address this possibility directly, but it is supported by the presence of chemical-shift differences for residues between positions 55 and 60 in helix IV. On the other hand, amide proton exchange rates suggest that helix III is stabilized in the complex but that helix IV is similar in the complex and in the free peptide. Whether this latter observation is an artifact of the relatively short DNA sequence used in this work is not known. However, the observed high specificity of binding would be favored by additional specific contacts as well as by DNA and protein structural changes that strengthen indirect readout.

FIG. 7. Summary of available data on interactions in the complex between the Antp homeodomain and the 14-bp binding fragment used in the NMR structural studies of Otting et al. (123). The sequence of the fragment is given in the center; α and β refer to the two complementary strands. The amino-acid sequence of the protein is arranged in a clockwise fashion around the DNA and is numbered from the C-terminal end; residues at either terminus are omitted for simplicity. The secondary structure of the protein in free solution and in the complex is noted beside the protein sequence. Bold letters indicate those amino-acid residues for which large chemical-shift changes were noted on complexation. Squares indicate those residues with slow amide proton exchange in the complex, and a question mark indicates those for which no measurement of exchange was possible. Arrows show the specific contacts as evidenced by NOEs between the DNA and the protein. Reproduced from Otting et al. (123) by permission of Oxford University Press.
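A dissociation constant of about 10⁻⁹ M can be expressed as a standard binding free energy through ΔG° = RT ln(Kd / 1 M), and a fold difference in Kd between two sites converts to ΔΔG = RT ln(fold). The following minimal sketch assumes 25 °C and a 1 M standard state; the function names are hypothetical and the 100-fold value is only an illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy dG = RT ln(Kd / 1 M), in kcal/mol (assumes 25 C)."""
    return R_KCAL * temp_k * math.log(kd_molar)

def ddg_from_fold(fold, temp_k=298.15):
    """Free-energy difference (kcal/mol) equivalent to a fold change in Kd."""
    return R_KCAL * temp_k * math.log(fold)

print(round(binding_free_energy(1e-9), 1))  # -12.3
print(round(ddg_from_fold(100), 1))         # 2.7
```

On this scale, each factor of 10 in affinity is worth only about 1.4 kcal/mol, which is why a few additional contacts or a modest, low-cost DNA deformation can change specificity appreciably.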


2. COCRYSTAL STRUCTURES OF THE ENGRAILED HOMEODOMAIN OF Drosophila AND THE MATα2 HOMEODOMAIN OF YEAST BOUND TO RECOGNITION DNA SEQUENCES

A peptide of 61 amino-acid residues, containing the 60-residue homeodomain from the engrailed protein of Drosophila plus an added N-terminal methionine, was cocrystallized with a 21-bp fragment of sequence TTTTGCCATGTAATTACCTAA, and the crystal structure was determined to 2.8 Å resolution (124). This fragment binds the peptide tightly, with a dissociation constant of about 10⁻⁹ M. Three principal helical regions having a motif similar to that of prokaryotic helix-turn-helix proteins were found: helices I and II are oriented approximately antiparallel and joined by a loop, and helix III, the putative recognition helix, is oriented almost perpendicularly to these and is connected to helix II by a turn. In the nucleoprotein complex, the hydrophilic face of helix III lies in the major groove of the DNA so that several amino-acid side-chains can make specific contacts, whereas helices I and II span the major groove at almost right angles to it. No evidence for the kink in helix III observed in the Antp homeodomain could be seen at the resolution reported, either in the complex or in the free peptide. The DNA fragments are aligned end-to-end in the crystal to form a quasi-continuous helix, and peptides are bound both at the junctions and over the TAAT sequence element (bases 11-14). Binding to the TAAT element is tighter than at the ends by roughly two orders of magnitude, and in the complex with TAAT, specific contacts are made by three residues on helix III to this element in the major groove and by two residues located on an extended N-terminal arm in the minor groove. Additional contacts occur with the DNA backbone. Neither the peptide nor the DNA conformation is seriously distorted in the complex as compared to the free state. From this, it would appear that binding specificity in this complex arises almost exclusively from direct readout, with little if any indirect readout contribution.
However, both the peptide and the DNA-recognition fragment are truncated in the complex studied, and it cannot be determined from these results whether additional contacts and alternative structural features might augment the role of indirect readout in the full nucleoprotein complex. Certain structural features, and a comparison of the engrailed and MATα2 complexes, suggest this as a possibility.

The homeodomain of the α2 repressor of yeast has also been cocrystallized with a DNA fragment containing a recognition sequence, and the structure determined to 2.7 Å resolution (122). In spite of the sequence divergence, the MATα2 homeodomain is structurally similar to the engrailed homeodomain, and DNA binding uses a comparable extended helix-turn-helix motif. The DNA-recognition site investigated is a 21-bp sequence of imperfect dyad symmetry, truncated from the STE6 gene operator (128), that consists of a conserved 13-bp region between two 9-bp α2 recognition elements. In the native α2 complex, a dimer of MCM1 proteins binds to the central region, and the α2 dimer binds to the 9-bp regions through the homeodomains located in the C-terminal region of the protein. The N-terminal domains interact cooperatively in a manner analogous to the λ cI repressor. The truncated recognition site used for the cocrystal structure has lost the central 12 bp of the MCM1 site and consists only of the α2 sites and a single overhanging base at each 5' end. Its sequence is shown in Fig. 8 along with alignments of the sites from four additional operators at which the α2-MCM1 complex functions genetically as a repressor. The CATGTAAT consensus sequence element is also present in the engrailed recognition site. As in the case of the engrailed complex, the α2 complex appears to form with only minimal structural distortion of the free peptide and DNA binding partners. In particular, no bending of the DNA is evident.

(B)
STE6       CATGTAATT  CGTGTAAAT
BAR1       CGTGTAATT  CATGTAATT
STE2       CATGTACTT  CATGTAAAT
MFA1       TGTGTAATT  CATGTAAAT
MFA2       CATGTATTT  CATGTAAAT
consensus  CATGTAATT

FIG. 8. A comparison of the oligomer sequence used for the cocrystal structure study of the MATα2 homeodomain-operator complex by Wolberger et al. (122) with the five known operator sites. (A) Sequence of the 21-bp oligomer used for cocrystallization. The consensus side is to the left (bases 1-9) and the nonconsensus side is to the right (bases 9'-1'). The axis of pseudodyad symmetry is the C·G (base 10). (B) Alignment of the ten known α2 binding sites as obtained from the five known α2-MCM1 operators. From Wolberger et al. (122) with permission of Cell Press © 1991.

The pattern of specific contacts is also similar, but involves different amino-acid residues in the two complexes. In the α2 complex, three specific contacts occur in the major groove between helix III and the TGT element in the DNA, and an N-terminal arm of the homeodomain allows at least one specific minor-groove contact. Additional contacts with the backbone occur at the 5' end of the consensus sequence with the (CA)·(TG) and the T of the (TG)·(CA) elements. The specific contacts in the α2 and engrailed complexes are summarized and compared in Fig. 9 (122). The most important specific contacts of helix III with the DNA occur two to three bases further 5' in the α2 peptide than in the engrailed peptide, but about the same number of contacts are involved in the two complexes, and the factors governing direct readout in the two systems must therefore be similar. Although the contacts between the α2 and engrailed homeodomains and the 5' sequence elements (CA)·(TG) and (TG)·(CA) are to the sugar-phosphate backbone, the degree of sequence conservation at these sites is remarkable. Furthermore, the helical twist values at these sites are extraordinarily large in both cases. In the engrailed complex, the twist values for the (CA)·(TG) and (TG)·(CA) elements just upstream from the (TAAT)·(ATTA) core are 44.7° and 40.6°, respectively, compared to a mean of 34.2° over the entire recognition sequence. In the α2 complex, the same elements twist by 37.8° and 38.7°, respectively, with a mean of 34.5°. These values are especially interesting because Lipanov et al. (73) observed twist angles of about 50° and 36° at the same (CA)·(TG) element in a DNA decamer crystallized in two different space groups, and attributed this to unusual flexibility. As noted in Section II,A,3, (CA)·(TG) is kinked in the cocrystal structure of the CAP-DNA complex (48), and it appears to be a locus of unusual flexibility in the complex of the Cro protein of λ phage with its OR3 recognition site. It is therefore a reasonable conjecture that, in the native α2 and engrailed homeodomain nucleoproteins, these sequence elements may permit enough axial or torsional flexure for additional specific contacts to be made, and so amplify binding specificity through indirect readout mechanisms. Such a conjecture would also rationalize the high level of conservation in these sequences.

FIG. 9. A comparison of the engrailed and α2 homeodomain complexes with their recognition sequences. The DNA is shown in a cylindrical projection. Backbone phosphates are shown as circles, with hatching indicating those phosphates that make specific contact with the homeodomain. Identical residues in the two proteins are enclosed in solid lines, and nonidentical residues in dashed lines. The sequence of the α2 site is the consensus half. From Wolberger et al. (122) with permission of Cell Press © 1991.
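The consensus CATGTAAT(T) quoted for the α2 sites can be recovered from the ten 9-bp sites of the Fig. 8B alignment by a simple majority rule at each position. The sketch below assumes that each printed 18-letter operator row consists of two 9-bp half-sites written in the same orientation; that split, and the function name, are assumptions made for illustration rather than part of the original analysis.

```python
from collections import Counter

# Ten 9-bp alpha2 sites from Fig. 8B, assuming each 18-letter operator row
# is two half-sites written in the same orientation.
SITES = [
    "CATGTAATT", "CGTGTAAAT",  # STE6
    "CGTGTAATT", "CATGTAATT",  # BAR1
    "CATGTACTT", "CATGTAAAT",  # STE2
    "TGTGTAATT", "CATGTAAAT",  # MFA1
    "CATGTATTT", "CATGTAAAT",  # MFA2
]

def consensus(sites):
    """Majority-rule consensus and the number of sites supporting each column."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    bases, support = [], []
    for column in zip(*sites):
        base, count = Counter(column).most_common(1)[0]
        bases.append(base)
        support.append(count)
    return "".join(bases), support

seq, support = consensus(SITES)
print(seq)      # CATGTAATT
print(support)  # [9, 7, 10, 10, 10, 10, 8, 6, 10]
```

The support counts show that the TGTA core (positions 3-6) is invariant across all ten sites, while position 8 is the least conserved (T in six sites, A in four).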

C. Zinc-finger Proteins: The "C2H2" Classes

1. THE ARCHETYPAL ZINC-FINGER TRANSCRIPTION FACTOR, TFIIIA

The Cys2His2 (C2H2) class of zinc-finger proteins, typified by transcription factor TFIIIA from Xenopus laevis, has as a common basic structural unit two cysteines and two histidines coordinating a single zinc atom and separated by a loop of 12 or 13 amino acids. TFIIIA is required for transcription of the 5S ribosomal RNA genes by RNA polymerase III (129). It is a canonical zinc-finger protein of 39 kDa, and it binds in a highly sequence-specific fashion to an approximately 50-bp internal control region (TCAGAAGCCAAGCAGGGTCGGGCCTGGTTAGTACTTGGATGGGAGACCGCCTG), spanning bases 45 through 97 in a 122-bp highly conserved region of the 5S RNA gene, using a 30-kDa region of its N-terminal domain (130, 131) that contains nine well-defined zinc-fingers (132). On average, therefore, each finger domain can interact with up to about 5 bp. It can also bind to the 5S gene product to form a 7S nucleoprotein particle that, in oocytes, evidently functions to store RNA for later use (133). Although the characterization of TFIIIA initiated the zinc-finger concept (132), many other sequence-specific regulatory proteins are now known to use this general binding motif (reviewed in 133, 134). Two principal models have been proposed to describe the interaction of TFIIIA with its recognition site. These models can probably be generalized to other zinc-finger proteins, at least to those of the same C2H2 class. Both models require that the overall alignment of the protein be roughly parallel to the DNA. In the model developed by Klug and co-workers (135, 136; reviewed in 137), the protein lies along a single face of the DNA and successive zinc-fingers interact in alternate orientation with the major groove, such that structurally equivalent ~5-bp contacts occur every other finger and are spaced ~10 bp apart. In an alternative model proposed by Berg (134, 138), the fingers wrap around the major groove, making ~3-bp contacts. To fit the hydroxyl radical footprint of the complex (139), fingers 1, 5, 7, and 9 lie in the major groove while finger 6 is constrained by the short linkers to lie across the DNA, resulting in a sharp kink of ~60° or greater in the DNA at a point about one-third of the way from the end of the internal control region. The structure of the Zif268 protein is generally consistent with this latter model (140). Both models can account for DNA bending in the association complex, but neither requires it in general except in the case of TFIIIA. It is not clear at present whether DNA bending by zinc-finger proteins is likely to be a general phenomenon (134). Studies on TFIIIA mutants (141) and on cloned peptides containing different zinc-fingers (142-144) demonstrate a division of labor among the nine zinc-fingers for DNA- and RNA-binding activity. Hydroxyl-radical cleavage patterns (141), and footprinting, methylation interference, and differential binding studies on precisely defined fragments of the DNA-binding domain (143, 144), show that fingers 1, 2, and 3 bind with high specificity and affinity to the 3' C-block promoter element (the last 18 bp in the internal control sequence given above) and provide about 95% of the binding energy of the full protein to the full internal control region site.
Base-specific and phosphate contacts in the major groove provide a direct readout mechanism for this binding. Zinc-fingers 4, 5, 6, and 7 represent the minimal domain required for specific, high-affinity binding to RNA (144); fingers 1, 2, and 3 are not required to account for most of the RNA-binding energy. Fingers 8 and 9 contribute little to either DNA- or RNA-binding specificity or affinity, and evidently function only to locate the protein on the internal control sequence by binding fingers 7 through 9 weakly to the 5' adenine block promoter element (base-pairs 6 through 18 in the above sequence) (141-143). Such correct positioning of the protein on the target DNA is probably required to provide essential contacts for other components of the transcription complex (145). Hydroxyl radical studies (141) on TFIIIA bound to the internal control sequence have suggested that the DNA-binding fingers 1, 2, 3, 7, and 8 are in a compacted conformation similar to that observed in the Zif268 protein (see below), whereas the RNA-binding fingers 3, 5, 6, and 7 are extended and lie roughly parallel to the helical axis of the DNA.

There is presently some controversy as to whether the DNA bend in the TFIIIA recognition-site complex predicted by the Berg model (134, 138) is observed experimentally. Fig. 10 shows a helical trajectory plot of the internal control region to which TFIIIA binds. This plot is based on wedge angles given by Bolshoy et al. (13) and twist angles from Kabsch et al. (80); the predicted average helical repeat for this region is 10.64 bp. Clearly, there is no evidence of significant fixed bending in the absence of bound protein. However, there is some experimental evidence that the DNA in the TFIIIA-DNA complex may be curved. Phosphorus-imaging electron microscopy of TFIIIA bound to the internal control region suggests that the DNA may bend (146). Experiments based on circularization and cyclic permutation gel-mobility shift assays appear to demonstrate DNA bending of about 60 to 65° in the internal control region (147). This conclusion has been challenged by Zwieb and Brown (148), who also used the cyclic permutation method and report that the DNA in the TFIIIA complex is bent by no more than about 30°. It has been suggested that the discrepancy between these two studies may arise from differences in the ionic strengths employed, because at high and low ionic strengths TFIIIA may exist in different conformational states that bend the DNA recognition region differently on complexation (149). This is an important question, because studies to date cannot determine whether the intact TFIIIA protein undergoes conformational changes on binding to different substrates.

FIG. 10. Computer-developed DNA trajectory projection for the internal control region of transcription factor IIIA (TFIIIA) from Xenopus laevis (86, 87). Sequence: 5'-TCGGAAGCCAAGCAGGGTCGGGCCTGGTTAGTACTTGGATGGGAGACCGCCTG-3'. The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. Two orientations (0° and 60°) of the strand about an axis perpendicular to the 5' base-pair, differing by 60° in an anticlockwise sense, are shown.

The 50-bp recognition sequence of TFIIIA given above contains the putative flexibility elements (CA)·(TG) and (TA)·(TA) in near-helical or half-helical periodicity. It is therefore of some interest to examine the effects these elements may have on DNA conformation. Fig. 11 shows axial trajectory plots of the full internal control region using the same helical parameters as in Fig. 10, except that (CA)·(TG) and (TA)·(TA) are allowed to kink through an angle of 45° into the major groove. The first of these assumptions is justified by structural studies on the (CA)·(TG) element in CAP (48) and Cro (66, 74, 78) recognition sequences, which show that it kinks by about this amount in the respective nucleoprotein complexes. The second assumption, that kinking occurs into the major groove, is based on theoretical considerations (150). Fig. 11a shows a trajectory plot for the internal control region assuming kinking at both the (CA)·(TG) and (TA)·(TA) elements. Two views are shown: in the first, the DNA lies primarily in the plane of principal bending; in the second, this plane is rotated 60° about an axis normal to the first (5') base-pair. Fig. 11b shows the same views, but excludes kinking at the (TA)·(TA) elements. The jog in the axial trajectory nearest the 5' end is caused by two (CA)·(TG) elements spaced at half-helical periodicity; subsequent elements then bend the DNA almost coherently in the plane until the final (3') ~10 bp, which are seen to curve away from the principal bending plane. The main effect of including the (TA)·(TA) elements is to increase the bending planarity over the 3' half of the sequence, which includes the C-block promoter element. Although the bending assumptions used in this illustration cannot be rigorously justified, Fig. 11 nevertheless points to a possible, and indeed a likely, role for sequence-directed flexibility in the TFIIIA internal control region that is consistent with known protein-DNA contacts (141-145). Limited axial deformation at (CA)·(TG) and (TA)·(TA) elements can easily produce the degree of bending suggested by independent experiments (146, 147), and can furthermore lead to a DNA conformation in the internal control region that allows the DNA to wrap around the protein, thereby providing an efficient structure for multiple zinc-finger binding at either end (144, 145).
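Why kinks spaced at half-helical periodicity produce a "jog" while kinks spaced a full helical turn apart add coherently can be seen with a very small rigid-body model of the helix axis. The sketch below is not the Bolshoy et al. wedge / Kabsch et al. twist model used for Figs. 10 and 11: it assumes an idealized uniform twist of 36° (10 bp per turn), a 3.4 Å rise, roll-only kinks of 45° at hypothetical step indices, and no tilt, and it does not map the sign of roll onto the major or minor groove. All function names are illustrative.

```python
import math

def rot_z(deg):
    """Rotation about the local helix (z) axis: the twist part of a step."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(deg):
    """Rotation about the local long (y) axis of a step: a roll kink."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def trajectory(n_steps, twist, kinks, rise=3.4):
    """Axis points (angstroms) and final frame of an idealized duplex.

    kinks maps a 0-based step index to a roll angle in degrees; other steps
    are straight. Each step applies half the twist, the roll, then the other
    half of the twist; the axis then advances by `rise` along the new local z.
    """
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    point = [0.0, 0.0, 0.0]
    points = [tuple(point)]
    for i in range(n_steps):
        step = matmul(matmul(rot_z(twist / 2), rot_y(kinks.get(i, 0.0))),
                      rot_z(twist / 2))
        frame = matmul(frame, step)
        point = [point[r] + rise * frame[r][2] for r in range(3)]
        points.append(tuple(point))
    return points, frame

def overall_bend(frame):
    """Angle in degrees between the starting and final helix axes."""
    return math.degrees(math.acos(max(-1.0, min(1.0, frame[2][2]))))

TWIST = 36.0  # idealized 10 bp per turn (assumption)
_, in_phase = trajectory(30, TWIST, {10: 45.0, 20: 45.0})   # one turn apart
pts, half_turn = trajectory(30, TWIST, {10: 45.0, 15: 45.0})  # half a turn apart
print(round(overall_bend(in_phase), 1))               # 90.0
print(round(overall_bend(half_turn), 1))              # 0.0
print(round(math.hypot(pts[-1][0], pts[-1][1]), 1))   # 12.0 (lateral jog, angstroms)
```

Two 45° kinks one helical turn apart sum to a 90° bend, whereas the same kinks half a turn apart cancel in direction and leave only a lateral offset of the axis of about 12 Å, which is the kind of jog described for the 5' end of the internal control region.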

FIG. 11. Computer-developed DNA trajectory projection for the internal control region of transcription factor IIIA (TFIIIA) from Xenopus laevis (86, 87). The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. (a) All (CA)·(TG) and (TA)·(TA) dinucleotide elements are kinked through 45° by roll angle only into the major groove, as suggested by the available experimental evidence (3, 40, 48, 74, 78). Views shown are rotated 60° with respect to each other about an axis normal to the 5' base-pair. (b) (CA)·(TG) dinucleotide elements are kinked as in (a) (3, 48, 74, 78) and (TA)·(TA) dinucleotides assume their normal wedge angle values (13). Views shown are also rotated 60° with respect to each other about an axis normal to the 5' base-pair.

The TFIIIA nucleoprotein complex has also been studied using a number of biochemical and biophysical techniques, including nuclease digestion (151, 152), methylation protection (135), hydroxyl radical footprinting (136, 139) including the missing nucleoside method (143), phosphorus-imaging electron microscopy (146), circular dichroism spectropolarimetry (153), and electric dichroism (154). None of these methods has yet provided an unequivocal characterization of its conformation or conformational lability. Nevertheless, from what is currently known about this system, it seems likely that TFIIIA uses sequence-directed flexibility and microstructure in its recognition sequence, so that indirect readout is an important component of its binding specificity. However, evidence is not yet adequate to determine whether this notion can be extended to other zinc-finger proteins.

2. THE Zif268 PROTEIN OF THE MOUSE

A peptide of the Zif268 immediate-early protein from the mouse containing only the DNA-binding domain has been cocrystallized with a DNA-recognition site, and the structure has been determined to 2.1 Å resolution (140). The Zif268 protein uses a three-zinc-finger DNA-binding motif, and the peptide used for the structural study contained all three fingers. The DNA fragment used for cocrystallization was an 11-mer consensus (155) binding sequence, AGCGTGGGCGT. The three zinc-fingers are in a tandem array, and the α-helices of each finger contact adjacent base triplets in the major groove with a similar orientation: finger 1 contacts the 3' (GCG)·(CGC) triplet, finger 2 contacts the central (TGG)·(CCA) triplet, and finger 3 contacts the 5' (GCG)·(CGC) triplet. The entire protein therefore assumes a slightly skewed C-shaped conformation that lies along the major groove. Although a (TG)·(CA) step lies in the central binding triplet, the structure showed little evidence for DNA bending at this site. Helical parameters for the entire recognition sequence were well within the normal range for B-form DNA, although considerable variability was observed in helical twists, particularly in the binding triplets for fingers 1 and 3. Binding specificity seems to arise mainly by direct readout from 11 specific base contacts made by the peptide, together with a number of interactions with the DNA backbone, mainly with the guanine-rich noncoding strand. There is little or no evidence for indirect readout based on unusual DNA microstructures, although the central (TGG)·(CCA) triplet is highly conserved. It cannot be determined from this study whether deformation of the DNA-binding site occurs in the complex between the full protein and a longer DNA sequence. As noted above, such an effect has been found in cocrystal studies of the λ Cro protein complex to the 0,1 operator site. In addition, crystal packing forces may influence complex conformation in these relatively "soft" systems. In the Zif268 cocrystal, the 11-bp duplexes stack end-to-end, forming a quasi-continuous helix. This forces the 11-bp helical repeat observed in the operator DNA and could impose additional constraints against DNA bending.
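The finger-to-triplet register described above follows from antiparallel binding: the N-terminal finger 1 reads the 3'-most triplet of the guanine-rich strand, and successive fingers move toward its 5' end. The sketch below makes that mapping explicit for AGCGTGGGCGT. The helper name is hypothetical, and the default `offset=1` assumes one flanking base at each end of the 11-mer (1 + 9 + 1), which is an assumption rather than a statement from the structural study.

```python
def finger_triplets(site, n_fingers=3, offset=1):
    """Map each finger to (1-based start, triplet) on the G-rich strand.

    Assumes antiparallel binding: finger 1 (N-terminal) reads the 3'-most
    triplet, so triplets are assigned to fingers from the 3' end inward.
    offset is the number of 5' flanking bases before the first triplet.
    """
    core = site[offset:offset + 3 * n_fingers]
    triplets = [(offset + i + 1, core[i:i + 3]) for i in range(0, len(core), 3)]
    return {f"finger {k + 1}": t for k, t in enumerate(reversed(triplets))}

for finger, (start, triplet) in finger_triplets("AGCGTGGGCGT").items():
    print(finger, start, triplet)
# finger 1 8 GCG
# finger 2 5 TGG
# finger 3 2 GCG
```

The central TGG triplet (positions 5-7) contains the (TG)·(CA) step mentioned in the text, at positions 5-6.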

D. The “C,” Class of Zinc-binding Proteins

A more diverse class of zinc-binding proteins, the (Cys), class, contains a variable number of cysteines. For example, certain yeast transcription factors, typified by GAL4, contain six invariant cysteines (C6). The steroid receptors that have been characterized are mostly of the C, or C, class. The C, proteins are predicted to form a 13-amino-acid loop similar to that of the C2H2 group. Important members of this class include the large superfamily of hormone receptor proteins, including the steroid hormone, thyroid hormone, retinoic acid, and vitamin D3 receptors (reviewed in 156, 157).

1. THE GAL4 TRANSCRIPTIONAL FACTOR

The GAL4 transcriptional factor from yeast regulates a number of genes responsible for galactose catabolism (reviewed in 158). Transcriptional activation by GAL4 is controlled by the regulatory factor GAL80 in inverse relation to local galactose availability, and a GAL3 gene product also acts as a GAL80 regulator (159). A cocrystal structure at 2.7 Å resolution has been reported for an N-terminal 65-amino-acid fragment of GAL4 bound to a consensus DNA operator sequence, and it provides many details of the protein-DNA binding in this complex (160). In the cocrystal, the protein was bound to a 19-bp fragment, CCGGAGGACAGTCCTCCGG, containing a 17-bp palindromic recognition sequence. This sequence differs from the consensus of the 11 known binding sequences (161) only in substituting an A for a T at the dyad, position 10. Although GAL4 binds as a dimer to its palindromic recognition sequence, it does not dimerize in the absence of its operator DNA (162). A crystal structure (162) and an NMR study (163) of the protein show that the DNA-binding domain contains six cysteine residues coordinating two Zn²⁺ ions in a "binuclear cluster" (163). These represent a class of zinc-binding proteins distinct from the zinc-fingers of TFIIIA and the zinc-binding domains of the steroid receptor proteins. In GAL4, the zinc-binding domains interact in the major groove with the (CCG)·(CGG) trinucleotide elements at the ends of the recognition sequence. These domains are linked to C-terminal coiled-coil domains, which lie roughly over the dyad in the symmetric complex, by an extended segment that is unstructured in the absence of bound DNA but follows along the minor groove in the complex. The overall complex is tripartite, and the carboxy-terminal regions function as weak dimerization elements.
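The statement that the 19-bp GAL4 fragment is palindromic except at the dyad can be checked by comparing the sequence with its own reverse complement. In the minimal sketch below (hypothetical helper names), a perfectly dyad-symmetric site would give no mismatches; for an odd-length site the central base can never match its own complement, so a single mismatch at position 10 is the most symmetric result possible.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string (case-insensitive input)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def dyad_mismatches(seq):
    """1-based positions where seq differs from its own reverse complement.

    A perfectly dyad-symmetric site returns []. For an odd-length site the
    central base can never equal its own complement, so it always appears.
    """
    rc = reverse_complement(seq)
    return [i + 1 for i, (a, b) in enumerate(zip(seq.upper(), rc)) if a != b]

GAL4_SITE = "CCGGAGGACAGTCCTCCGG"  # 19-bp cocrystal fragment (160)
print(dyad_mismatches(GAL4_SITE))  # [10]
print(GAL4_SITE[8:10])             # CA, the step at the dyad in the cocrystal DNA
```

Replacing the central A with T (the consensus base) also gives [10], and turns the dyad step from (CA)·(TG) into the (TG)·(CA) noted below for the consensus operator.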

The DNA is weakly bent toward the protein at the dyad, which is a (CA)·(TG) step in the cocrystal and a (TG)·(CA) step in the consensus operator sequence. The operator DNA is canonical B-form in its helical parameters, but the minor groove, which is unusually wide over most of the 17-bp recognition sequence, is conspicuously constricted just at the dyad. This is evidently due to interaction of side-chains in the N-terminal portion of the coiled-coil domain with phosphates in the minor groove 2 and 3 bp away from the dyad, but it may also facilitate the small degree of bending observed in the operator DNA at or near the dyad. The major groove in the central region of the DNA-recognition site is relatively free of protein in the cocrystal structure studied and may therefore be a binding site for an additional protein in vivo. This seems likely because binding specificity from the zinc-binding domains does not appear to be greatly extended by indirect readout, except possibly through flexure at the putative kink site at the dyad. Furthermore, the central sequence is relatively conserved, suggesting that it may also be involved in additional protein-DNA contacts.

2. THE NUCLEAR RECEPTOR PROTEINS

The nuclear receptor proteins constitute a broad family of regulatory proteins that includes receptors for steroid hormones, thyroid hormones, vitamin D, retinoids, and a number of other regulatory proteins, some of which are of unknown function (164). They are activated by ligand binding and have many functional features in common, such as distinct domains for DNA binding, ligand binding, and regulation of transcription. The DNA-binding domains contain eight cysteines coordinated in various ways to zinc. The best studied from a structural standpoint are the steroid receptors (reviewed in 165), as typified by the glucocorticoid receptor protein. In the DNA-binding domains, the eight cysteines coordinate two zinc ions to form two tetrahedrally coordinated zinc-binding regions containing looped regions of ~9 to ~13 residues. A highly amphipathic α-helix of ~11 to ~13 residues begins at the residue following the third zinc-binding cysteine. NMR results indicate that the zinc-binding motifs in the glucocorticoid receptor differ significantly from those in TFIIIA. In the hormone receptors, the zinc-binding motifs tend to fold together as part of a larger single domain (165-167), in contrast to TFIIIA, in which they exist as independent zinc-fingers. In both cases, direct readout of the DNA sequence occurs through the zinc-stabilized recognition α-helices, but in the glucocorticoid receptor the two helices lie almost at right angles in the unified domain, and the first helix functions as the recognition helix by insertion into the major groove of the recognition-site DNA.

Such a unified structure has been verified in a recent crystallographic study, at 2.9 Å resolution, of the DNA-binding domain of the glucocorticoid receptor protein cocrystallized with a specifically bound DNA fragment (167). This suggests that the glucocorticoid receptor, and probably other steroid receptors as well, constitute a clearly delineated third class of zinc-binding regulatory systems. In the cocrystal (167), the glucocorticoid receptor was bound to a symmetrized operator, CCAGAACATCGATGTTCTG, instead of to the consensus (168) sequence TCAGAACATCATGTTCTGA. In these sequences, the half-sites are the hexamers AGAACA and TGTTCT, so that the spacers between them are 4 and 3 bp, respectively. The DNA-binding domains dimerize when bound to the symmetrical sequence with a spacing between them corresponding to that of the consensus sequence. Only one of the domains interacted fully with its half-site, whereas the other was displaced 1 bp toward the first and interacted nonspecifically with a nonbinding hexamer. A subsequent study at 4 Å resolution used cocrystals of the DNA-binding domains with the consensus sequence and revealed that in this case the domains associated correctly with the two hexameric half-recognition sites. The pattern observed in the consensus sequence, a (CA)·(TG) step (position 7) separated from a (TG)·(CA) step (position 12) by a 3-bp spacer, i.e., an exact half-helical turn, is also preserved in the estrogen response element (168); in the thyroid response element, the 3-bp spacer is lost (169). The flanking (CA)·(TG) and (TG)·(CA) elements at positions 2, 10, and 17 in the glucocorticoid receptor consensus sequence are not highly conserved. Binding specificity in each half-site is provided by three specific base contacts and seven backbone contacts in the major groove between the DNA and the zinc-binding module (recognition helix) of the peptide.
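
The half-site spacings and the phasing of the two central steps can be recomputed directly from the quoted sequences. In the Python sketch below, the helper names are hypothetical and the helical repeat is an assumption: with 10.5 bp per turn, steps five positions apart lie about 171° apart around the helix, and with 10 bp per turn exactly 180°, which is the sense in which the spacing corresponds to a half-helical turn.

```python
# Assumed B-DNA helical repeat (bp per turn); 10 bp per turn makes a
# 5-step separation exactly half a turn.
HELICAL_REPEAT_BP = 10.5


def half_site_spacer(site: str, left: str = "AGAACA", right: str = "TGTTCT") -> int:
    """Number of base pairs between the two hexameric half-sites."""
    i = site.find(left)
    if i < 0:
        raise ValueError("left half-site not found")
    j = site.find(right, i + len(left))
    if j < 0:
        raise ValueError("right half-site not found")
    return j - (i + len(left))


def phase_angle(pos_a: int, pos_b: int, repeat: float = HELICAL_REPEAT_BP) -> float:
    """Angle (degrees) between two base-pair steps around the helix axis."""
    return 360.0 * abs(pos_b - pos_a) / repeat


symmetrized = "CCAGAACATCGATGTTCTG"  # operator used in the cocrystal
consensus = "TCAGAACATCATGTTCTGA"    # consensus response element
print(half_site_spacer(symmetrized), half_site_spacer(consensus))  # 4 3
# (CA)·(TG) step at position 7 and (TG)·(CA) step at position 12 of the consensus
print(round(phase_angle(7, 12), 1), phase_angle(7, 12, repeat=10.0))  # 171.4 180.0
```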
In the glucocorticoid and estrogen receptor proteins, spacing and orientation requirements for the half-recognition sites are determined by a region of about four residues in a loop on the surface of the protein near the recognition helix that can form protein-protein contacts with the adjacent bound protein. This leads to the correct spacing and orientation of the recognition helices in the bound protein dimer. In the thyroid receptor, these residues do not support such interactions and allow dimeric protein binding to two adjacent half-sites. In all cases, however, binding of two proteins to the full recognition site occurs independently and cooperatively, and under normal conditions the proteins do not dimerize in free solution. This level of direct readout is not extraordinary, even including the contributions of protein-protein interactions, and the nuclear receptor proteins as a group are characterized by relatively weak affinity for their target sequences. Although conserved putative flexibility elements are present in the symmetrical sequence used in the cocrystal structure, the DNA is not
bent in the half-recognition sites, and its helical parameters are within the B-form range. However, considerable variance in these parameters occurs across the half-recognition sites, and the major groove in the specifically bound site is widened by about 2 Å compared to the nonspecifically bound site, possibly as a consequence of the inserted recognition helix. Whether this apparent absence or near absence of indirect readout is an artifact stemming from the short DNA sequence (19 bp) and truncated protein cannot be determined at present. There is a growing body of evidence that the functions of many hormone response elements may be modulated not only by the binding of appropriate receptor proteins but also by various nonreceptor accessory proteins. In addition, many response elements may in fact be composite and functionally modulated by more than one receptor, as well as by nonreceptor proteins. Recent examples include composite response elements for the mineralocorticoid and glucocorticoid receptors (170; reviewed in 171) and for the vitamin D receptor and retinoid X receptor-α (172; reviewed in 173). These highly complex differential interactions provide a richly varied menu of possibilities for ligand-mediated transcriptional control. They would also seem virtually to mandate flexibility in the response element and possibly in the receptor proteins as well. Hence, the absence of DNA bending observed in the cocrystal structure of the glucocorticoid DNA-binding domain complexed with the response element (167) may not be a general characteristic, because this structure may represent only one of a large number of functional structures, and the putative flexibility elements in the DNA-recognition sites may conceivably play important functional roles in this regulatory system.

E. Leucine-zipper Proteins

1. FOS AND JUN

Fos and Jun are transcription factors of the bZIP family that interact either as homodimers or heterodimers with a DNA-binding region that is conserved from yeast to humans (reviewed in 174). By forming heterodimers with a large number of other family members, these proteins can create a very large number of complexes that can compete for the AP-1 and CRE recognition sites (175). Because these sites are often components of complex regulatory elements, the associated proteins may interact further with other transcription factor families. Thus, these recognition regions are subject to a set of rather special requirements, including a high level of versatility. Dimerization of the proteins is mediated by a leucine zipper in the DNA-binding domain of both proteins (176). The consensus sequence for binding is TGACTCA, which is also bound by the human transcription factor AP-1 and the
yeast factor GCN4 (174; see Section II,E,3). This sequence has dyad symmetry about the central (CG)·(GC) base-pair and is also unusual in that it contains (CA)·(TG) putative kinking elements (48, 78) phased at exactly half-helical periodicity. Cellular Jun from both chicken and humans also binds to a TGACACA site, located between the CAAT box and a TATA-like sequence element; binding to this site positively autoregulates expression of the gene (177). This sequence is asymmetric, and if kinking occurs at (CA)·(TG), the change from a (CT)·(AG) to a (CA)·(TG) element may substantially alter the trajectory of the DNA in the binding site. Figure 12 shows computer-generated helical-axis trajectories for the two AP-1 recognition sites discussed above. The same wedge and helical twist angle data used in Figs. 10 and 11 are used here, and a similar comparison is made between free DNA (left-hand side) and the assumption of kinking through 45°, by roll angle only, into the major groove at each (CA)·(TG) element (right-hand side). This comparison suggests a possible structural rationale for the binding specificity differences between these sites, because the trajectories of the DNA, owing to the different spatial relationships between putative induced bending sites, are quite different in the two sequences. Recent cyclic-permutation gel-mobility-shift studies employing phased bending analyses indicate that the association of Fos and Jun with the AP-1 consensus site results in significant conformational changes both in the proteins and in the DNA (176, 178). These studies further suggest that binding of Fos-Jun heterodimers and Jun-Jun homodimers directs bending in opposite directions: Fos-Jun binding bends the DNA into the major groove, whereas Jun-Jun binding bends it toward the minor groove.
Other Fos-Jun combinations appear to induce negligible DNA bending, and it has therefore been postulated that DNA flexibility in the AP-1 site may be affected by the binding of these proteins (176). Cyclic-permutation studies such as these are often difficult to interpret because anomalous polyacrylamide gel retardation due to DNA bending is highly nonlinear with respect to bending planarity, i.e., to the phases of multiple bending loci (32). Nevertheless, these studies suggest a “flexible hinge” model for these systems (176), a concept fully consistent with putative flexibility in the (CA)·(TG) elements in the recognition sequences.
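
The trajectory projections used here rest on a wedge-type model in which each dinucleotide step contributes a twist and a roll (and tilt) to the helix axis. The Python sketch below illustrates only that principle and is not the program used for the figures: the Bolshoy et al. wedge angles and Kabsch et al. twist values are not reproduced, tilt is ignored, a uniform twist (34.3°, about 10.5 bp per turn) and zero roll at non-kinked steps are placeholder assumptions, and all function names are hypothetical. Only net bend magnitudes are compared, so the sign convention that decides which groove a positive roll closes does not affect the result.

```python
import math

RISE = 3.4  # Å per base-pair step; canonical B-DNA value (assumed)


def _matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def _rot_z(deg):
    """Rotation about the local helix (z) axis: used for twist."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def _rot_y(deg):
    """Rotation about the base-pair long (y) axis: used for roll."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def trajectory(seq, twist=34.3, kink_roll=45.0, kink_steps=("CA", "TG"), wedge_roll=None):
    """Helix-axis points and final local frame for a simple wedge-type model.

    Each step applies half the twist, the roll, then the other half of the twist.
    Steps listed in kink_steps get kink_roll; other steps get their roll from
    wedge_roll (a dict; empty by default, i.e. straight B-DNA).
    """
    wedge_roll = wedge_roll or {}
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    point = [0.0, 0.0, 0.0]
    path = [point]
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        roll = kink_roll if step in kink_steps else wedge_roll.get(step, 0.0)
        step_rot = _matmul(_matmul(_rot_z(twist / 2), _rot_y(roll)), _rot_z(twist / 2))
        frame = _matmul(frame, step_rot)
        # advance one rise along the new local z axis (third column of frame)
        point = [point[r] + RISE * frame[r][2] for r in range(3)]
        path.append(point)
    return path, frame


def net_bend(frame):
    """Angle (degrees) between the first and the last local helix axes."""
    return math.degrees(math.acos(max(-1.0, min(1.0, frame[2][2]))))


# With 10 bp per turn, two kinks one full turn apart add; half a turn apart they cancel.
_, in_phase = trajectory("C" + "A" * 9 + "CA", twist=36.0)
_, anti_phase = trajectory("CAAAACA", twist=36.0)
print(round(net_bend(in_phase), 1), round(net_bend(anti_phase), 1))  # 90.0 0.0

for name, site in [("AP-1 consensus", "TGACTCA"), ("c-Jun site", "TGACACA")]:
    _, f = trajectory(site)
    print(name, round(net_bend(f), 1))
```

In this toy model, kinks one helical turn apart reinforce each other, while kinks half a turn apart produce a zigzag with no net bend; this phasing dependence is why the relative positions of the (CA)·(TG) elements control the modeled trajectories.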

2. THE MYC PROTEINS

Another group of oncogene proteins that bind to palindromic recognition sequences containing approximately phased (CA)·(TG) dinucleotide elements are the Myc proteins. Although little is known of their function, they are thought to have a role in cell differentiation and proliferation (reviewed in 179). They are thought to bind using a basic helix-loop-helix motif and have characteristic leucine-zipper regions that, combined with their palindromic recognition sites, suggest that they bind as dimers. The c-Myc recognition sites contain the consensus sequence CANNTG (180). The consensus binding site for v-Myc contains the CACGTG motif (181), which is also recognized by a heterodimeric complex of c-Myc with Max, a basic helix-loop-helix factor (182, 183). c-Myc also trans-activates the p53 tumor suppressor gene promoter using an essential downstream CACGTG motif (184). In addition, this same sequence is recognized by a variety of regulatory proteins, including the USF (185) and TFE3 (186) transcription factors of the adenovirus major late promoter, proteins binding the μE3 site in the immunoglobulin heavy-chain enhancer (181), and the G-box proteins of plants (187; see the discussion on plant proteins below). In both motifs, the spacing between the (CA)·(TG) and (TG)·(CA) elements is nearly but not quite a half-helical turn, so that if anisotropic flexure occurs at these elements, the resulting DNA trajectory will be in the shape of a bent U. However, the degree of nonplanarity will differ depending on the central bases in the c-Myc site. Furthermore, the CACGTG sequence contains the (CG)·(CG) element, which is subject to methylation, at its central dyad. It is known that the c-Myc, c-Myb, and v-Myb oncogene proteins bind to recognition sequences in a methylation-dependent fashion.

FIG. 12. Computer-developed DNA trajectory projection for (a) the AP-1 consensus site and (b) a binding-site for cellular Jun protein. The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. In each panel, the left-hand strand was modeled using only the wedge and twist angles above. In the right-hand strand, (CA)·(TG) dinucleotide elements are kinked through 45° by roll angle only into the major groove, as suggested by available experimental evidence (48, 74, 78). Views shown are in the same orientation about an axis normal to the 5' base-pair. Additional flanking guanines (lower-case type) were added to each binding-site to facilitate the graphical representation.

3. THE GCN4 TRANSCRIPTIONAL FACTOR FROM YEAST

Recent structural studies on the GCN4 transcriptional activator from yeast (188) show that important differences exist among members of the basic-region leucine-zipper (bZIP) protein class in their requirements for conformational changes in the recognition DNA. The GCN4 protein is similar to other bZIP proteins in that its DNA-binding domain is not in a suitable conformation for binding until it is stabilized by association with DNA, and it binds as a dimer to the same AP-1 site as the Fos and Jun oncogene proteins. However, in a number of important respects, the DNA-stabilized binding domain differs from its counterparts in the oncogene systems. The GCN4 protein from yeast is one of a large family of eukaryotic bZIP regulators that have high sequence homology and bind to similar recognition sequences (189). This group includes a number of the plant proteins noted below (Section II,I). A cocrystal structure of GCN4 with a 20-bp fragment containing the AP-1 sequence has been reported at 2.9 Å resolution (188). The AP-1 binding sequence, TTCCTATGACTCATCCAGTT, is identical to that bound by the Fos and Jun proteins. This structure clearly shows that the basic DNA-binding domain, as stabilized by the interaction with the DNA, is a long, continuous α-helix; in the bound dimer, the two domains form a paired coiled-coil that diverges toward the N-terminal end to grasp the DNA
through the major groove like a long pair of tongs. The sole function of the leucine zipper is to orient and stabilize the dimeric protein complex. Most significantly, the DNA-recognition region is straight and in the canonical B-form, with no unusual microstructure or helical parameters. It is not clear whether DNA bending might occur in a complex with longer DNA, but the structure provides no clue as to how such bending might occur if it did. Thus, the binding specificity in this system seems to derive exclusively from direct readout of specific hydrogen-bond and phosphate-backbone interactions with the basic α-helix in the two asymmetric half-sites. Comparing the GCN4 nucleoprotein complex to the Fos/Jun systems raises interesting questions about the structural similarities and differences that allow proteins binding to the same DNA-recognition site to place such different requirements on the DNA conformation. The basic regions of both proteins are highly homologous, and hybrid proteins constructed by interchanging basic or leucine-zipper regions are found to bind more or less interchangeably (190, 191). Furthermore, GCN4 also binds with reduced affinity to the symmetrized CRE site, in which an additional symmetrizing base-pair is added at the center of AP-1, rotating the two half-recognition sites about 36° with respect to each other. All of this suggests that the basic α-helical DNA-binding domain must be flexible to some degree. It is clear that the interplay of direct and indirect readout is exceedingly complex in the bZIP protein systems.
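
The figure of about 36° follows from simple helical geometry: inserting one base pair rotates one half-site relative to the other by 360°/h, where h is the helical repeat. The Python sketch below assumes a uniform repeat (10 bp per turn gives 36°, 10.5 bp per turn about 34°) and uses the standard CRE core TGACGTCA, which is not quoted in the text; the helper names are illustrative only.

```python
# Watson-Crick complement (A/C/G/T only)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_perfect_palindrome(seq: str) -> bool:
    """True if seq reads the same on both strands (possible only for even length)."""
    return seq == seq.translate(COMPLEMENT)[::-1]


def half_site_rotation(inserted_bp: int, bp_per_turn: float = 10.5) -> float:
    """Rotation (degrees) of one half-site relative to the other after
    inserting base pairs between them, assuming a uniform helical repeat."""
    return 360.0 * inserted_bp / bp_per_turn


ap1 = "TGACTCA"   # AP-1 core, 7 bp: odd length, so the central pair cannot be symmetric
cre = "TGACGTCA"  # CRE core, 8 bp: one extra central base pair
print(is_perfect_palindrome(ap1), is_perfect_palindrome(cre))         # False True
print(half_site_rotation(1, 10.0), round(half_site_rotation(1), 1))  # 36.0 34.3
```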

F. Minor-groove-binding Proteins: The TFIID Transcription Factor Complex

TFIID is a transcription factor required for efficient pol II activity in many, and perhaps all, protein-encoding genes in eukaryotic cells. It binds to the TATA box, a conserved sequence located ~30 bp upstream from most pol II start sites, and induces the formation of a multiprotein preinitiation complex that appears to be stable through several rounds of transcription (192). For the initiation of pol II genes, TFIID is a multisubunit association complex containing the TATA-binding protein (TBP) and a number of TBP-associated factors (TAFs) (193, 194; reviewed in 195). It has elements of structural similarity with the integration host factor (IHF) of E. coli and several other regulatory proteins that may also be reflected in the binding properties (reviewed in 196). TFIID is not required for transcription by pol I and pol III, although TBP is evidently involved (reviewed in 197). The roles of TBP in transcription by polymerases other than pol II are incompletely understood, although a TBP-TAF complex seems to be involved in pol I transcription, and the associated TAFs are thought to be different from those in TFIID (198). A
crystal structure for TBP from the plant Arabidopsis thaliana has been determined (199). This study shows that the protein has a cross-sectional shape rather like a quarter-moon, with a small lower concave surface that straddles the DNA and an upper convex surface that accommodates the binding of ancillary proteins, the TBP-associated factors. The DNA-binding element, which contacts the minor groove of the recognition DNA, is a curved, antiparallel β-sheet. No cocrystal structure had been reported for TBP at the time of this writing. Two recent studies indicate that, unlike most regulatory proteins that form specific contacts in the major groove of their recognition sites, TFIID evidently binds in the minor groove of the TATA element (200, 201). Cyclic-permutation gel-retardation studies show that TBP, the TATA-binding subunit of TFIID, bends the DNA of its recognition region (201). Furthermore, the kinetics of TFIID binding are slow and require thermal energy (204), suggesting that the protein may also undergo a significant conformational change on binding. Because these proteins evidently nucleate and become elements of larger multiprotein complexes, their conformations and the trajectory of the associated DNA must be subject to severe constraints. This suggests that binding kinetics, as influenced by conformational changes in both the DNA and proteins and by sequential binding of multisubunit components, may also be a factor in conferring binding specificity in this large multisubunit system. A downstream initiation element is required for efficient TFIID binding in the human gfa gene promoter (202). The region involved runs from the TATA box at about -25 to the downstream element between about +10 and about +50 bp. This suggests that DNA bending or flexibility may be involved in the association of a rather long region of DNA with a transcription complex.
The wild-type sequence for this region is TTCATAAAGCCCCTCGCATCCCAGGAGCGAGCAGAGCCAGAGCAGGATGGAGAGGAGACGCATCACCTCCGCTGCTCGCCG, where the TATA box is identified as the CATAAAG element in the 5' region and the downstream element is the final 30 bp. This sequence is (G + C)-rich, especially in the downstream region, and it contains a number of approximately helically phased (CA)·(TG) putative kink elements. Figure 13 shows two computer-developed projections of this sequence into two dimensions using the estimated wedge (13) and helical twist (80) angles as employed in Figs. 10-12 and assuming that (CA)·(TG) elements can kink by roll through 45° into the major groove. It is clear that this sequence region can form a nearly planar bend of almost 180° that can bring the TATA region into proximity with the downstream initiation element. The two views in Fig. 13 are rotated by 40° about the 5' base with respect to each other to illustrate the putative planarity of the curvature in the DNA. As in

[Figure 13: two trajectory views, rotated 40° relative to each other, of the TFIID binding-site sequence 5'-TTCATAAAGCCCCTCGCATCCCAGGAGCGAGCAGAGCCAGAGCAGGATGGAGAGGAGACGCATCACCTCCGCTGCTCGCCG-3']

FIG. 13. Computer-developed DNA trajectory projection for the transcription factor IID (TFIID) binding-site in the human gfa gene (123). The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. (CA)·(TG) dinucleotide elements are kinked through 45° by roll angle only into the major groove, as suggested by available experimental evidence (48, 74, 78). Views shown are rotated 40° with respect to each other about an axis normal to the 5' base-pair.

Figs. 10-12, the parameters and assumptions used in obtaining Fig. 13 are indirect. Nevertheless, all but the helical twist values have some experimental basis, and Fig. 13 therefore provides a possible rationalization for the role of the downstream site in the TFIID initiation complex and suggests a way in which multiple DNA-binding factors might be part of the same large, multisubunit transcription complex proposed previously (202).

POSSIBLE FLEXIBILITY IN THE TATA PROMOTER ELEMENT

The TATA promoter element (203) has a consensus sequence TATAAA(A); this sequence is quite highly conserved in a large number of eukaryotic
promoters. However, occasional variations occur in which (CA)·(TG) elements are substituted for (TA)·(TA). An example is the human gfa gene, in which the first (TA)·(TA) is replaced by a (CA)·(TG), but a TFIID complex that regulates the initiation of transcription in this system is nevertheless formed. This suggests that anisotropic flexibility might be an important feature of the TATA region, because there is evidence that both (TA)·(TA) and (CA)·(TG) base-stacks may kink when subjected to relatively minimal bending stress (40). A more dramatic example of transpositions of (TA)·(TA) to (CA)·(TG) has been reported in the fission yeast Schizosaccharomyces pombe (204). The sequence CAGTCACA is in the normal TATA location in this organism. Deletions or mutations in this sequence resulted in an almost complete loss of transcription initiation, and other evidence suggests that its function is identical to that of the TATA sequence, although the protein that binds to it appears to be different from TBP. Thus, although different organisms may use different sequence motifs in their transcriptional initiation sites, these motifs appear to be built from sequence elements that are functionally related through a common tendency toward anisotropic flexibility.
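
The substitutions can be compared position by position by listing the putative flexibility elements in each TATA-region variant. The Python sketch below takes the set of flexible steps, (TA)·(TA) and (CA)·(TG), from the text; the helper name is hypothetical. Because a top-strand TG step is the same base-pair step as CA read on the other strand, both spellings are mapped to (CA)·(TG).

```python
# Putative anisotropically flexible steps, read 5'->3' on the top strand.
# A top-strand TG step is the same (CA)·(TG) base-pair step read from the other strand.
FLEXIBLE_STEPS = {"TA": "(TA)·(TA)", "CA": "(CA)·(TG)", "TG": "(CA)·(TG)"}


def flexible_steps(seq: str) -> list:
    """(1-based position, element) for each putative flexible dinucleotide step."""
    return [
        (i + 1, FLEXIBLE_STEPS[seq[i:i + 2]])
        for i in range(len(seq) - 1)
        if seq[i:i + 2] in FLEXIBLE_STEPS
    ]


for label, site in [
    ("TATA consensus", "TATAAA"),
    ("human gfa", "CATAAAG"),
    ("S. pombe", "CAGTCACA"),
]:
    print(label, flexible_steps(site))
```

In this listing, the gfa variant keeps one (TA)·(TA) step and gains a (CA)·(TG) step at position 1, whereas the S. pombe element contains only (CA)·(TG) steps.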

G. The NF-κB Protein and Its Binding to DNA

NF-κB is a pleiotropic transcription factor that can effect gene control in a highly tissue-specific fashion (reviewed in 205, 206). Its gene has recently been cloned, and analysis of the nucleotide sequence suggests that no zinc-finger domains are present (207, 208). NF-κB binds specifically to a number of DNA sequences about 10 bp long, of consensus GGGR(A/T)TYYCC, depending on cell type, in various promoter and enhancer regions (reviewed in 205). It was originally identified in κB sites of the κ light-chain enhancer of B cells and was thought to be functional only in this system (209), but later studies show that it plays many roles in many cellular systems, including T-cell activation, cytokine regulation, and the control of a number of viral systems. Cytomegalovirus (CMV) and SV-40 have NF-κB binding sites in their enhancers (205, 210). The HIV-1 enhancer has two NF-κB binding sites, one of which regulates transcriptional inducibility of the 5'-LTR in activated T cells (211). There is biochemical evidence suggesting that NF-κB is a homogeneous system. For example, binding specificity to a particular target and the pattern of base contacts in complexes with DNA seem to be independent of protein source (206). Extensive purification of NF-κB from human sources yields a 50-kDa polypeptide (p50), the DNA-binding form of which is a dimer. The principal DNA-binding form of NF-κB is a heterotetramer that includes, in addition to p50, two nonbinding 65-kDa subunits (p65). The binding of the heterotetramer is influenced by the presence of zinc, although neither of the subunits exhibits a zinc-finger structural motif. An inhibitory
subunit, IκB, binds one each of the 50-kDa and 65-kDa subunits to produce a heterotrimer complex in unstimulated cells (212). The role of IκB evidently is to block the assembly of the heterotetramer. The 65-kDa subunit serves as a receptor for IκB (213) and therefore seems to function in NF-κB inactivation, which can occur even when the latter is tightly bound to DNA. The 65-kDa subunit also seems to modulate dramatically the binding of the p50 subunit to DNA (213). The p50 subunit alone binds with high affinity to palindromic sequences constructed from κB motifs (GGGACGTCCC and GGAAATTTCC, obtained from five-base half-sites of the κB motif GGGACTTTCC), but the (50-kDa + 65-kDa) heterodimer binds 10- to 20-fold less strongly (213). The addition of the p65 subunit also relaxes the requirement of p50 for highly symmetric or 11-bp binding motifs, and hence seems to broaden the range of binding motifs accessible to NF-κB (214). It is also interesting that homodimers of p50, and heterodimers of the p50 and p65 subunits, seem to bind these sequence motifs with about the same affinity (215). The synergistic action of p65 in complex with p50 evidently contributes to the ability of NF-κB to bind specifically to, and discriminate among, a multiplicity of target sites, thus enabling it to control a variety of genes under many physical and biological conditions. However, the structural and biophysical mechanism(s) it uses to discriminate among its many potential binding loci are not well understood. Some evidence exists that this discriminatory ability is based on DNA bending (216). Results obtained using the cyclic-permutation gel-mobility-shift method imply that binding of NF-κB to DNA induces bending within the decameric κB binding domain (GGGACTTTCC), with the locus of bending near or at the 3' end. There appear to be no minor-groove contacts between the protein and the κB site. The bending angle induced by NF-κB has been estimated as about 110°. In contrast,
the bending angle due to the p50 dimer alone has been estimated as about 75°, with the bending locus symmetrically located near the center of the κB site; this suggests that the p65 subunit increases the magnitude of DNA bending and alters its locus and symmetry within the κB binding motif. DNase I cleavage patterns of this domain suggest no unusual structure in the absence of protein binding. In addition, computer modeling of the decameric κB binding domain (GGGACTTTCC), as discussed above, reveals little fixed curvature (13). Thus, the observed bending is evidently induced as a conformational change on binding of the protein. Comparative binding studies of the p50 homodimer and a heterodimer of 50- and 65-kDa subunits suggest a sequence of events in which the most conserved half-site is initially contacted by a p50 subunit, followed by DNA bending and the binding of a second p50 or p65 subunit (214). This suggests
that a kinetic mechanism may also play a role in allowing NF-κB to discriminate among its possible binding sites. This notion is consistent with the very low concentrations of NF-κB typically found in cells (217). It has been proposed that such a mechanism might be more important than the thermodynamic stability of the complexes, i.e., the binding constants (214). Clearly, the binding of NF-κB to the various cognate DNA-recognition sites and its ability to discriminate among them is a subtle phenomenon in terms of the structural and conformational factors involved in both the protein and DNA partners. It will be important to determine the precise conformations of the recognition-site complexes with the subunits, as well as with NF-κB, in order to understand how this promiscuously binding transcription factor can function so selectively and yet so ubiquitously.
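
The two symmetric sites used with p50 can be rationalized as each five-base half of the κB motif joined to its own reverse complement. The Python sketch below shows that construction; it is one reading of "obtained from five-base half-sites" that reproduces both sequences quoted in the text, not necessarily the procedure used in the original study, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_from_half_site(half: str) -> str:
    """Join a half-site to its own reverse complement (a dyad-symmetric site)."""
    return half + revcomp(half)


kb_motif = "GGGACTTTCC"             # decameric κB motif quoted in the text
left_half = kb_motif[:5]            # GGGAC
right_half = revcomp(kb_motif[5:])  # TTTCC read on the other strand: GGAAA
print(palindrome_from_half_site(left_half), palindrome_from_half_site(right_half))
# GGGACGTCCC GGAAATTTCC
```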

H. Other DNA-binding Proteins with Putative Flexibility Elements in Their Recognition Sites

1. THE OccR TRANSCRIPTIONAL ACTIVATOR FROM Agrobacterium tumefaciens

Agrobacterium tumefaciens is widely used to transport and integrate foreign DNA into certain plants to produce transgenics. As part of this unusual pathogenesis, octopine released from crown-gall tumors serves as a nutrient and possibly as a signal source for the invading bacteria. Catabolism requires the products of the OccQ operon, which is transcriptionally induced by octopine, and this induction is mediated by the OccR transcriptional activator. OccR binds with high affinity to a single promoter site between the OccR and OccQ genes, which are divergently transcribed. Octopine binds to a single site in OccR. Octopine binding to OccR shortens the DNase I footprint of the protein on its recognition site (218). This shortening appears to result from the relaxation of a bend in the DNA; octopine binding evidently does not alter either the binding affinity or the sequence specificity of the protein. The OccR footprint maps to a 56-bp sequence, TGCATTCGGTCAAATTCATAATGACCGGGCAAGAATAAGCAGATGTTATGGTGCGT. The locus of bending was found by cyclic-permutation gel-retardation to be about 26 bp from the 5' end, or a little to the right of center in this sequence. With bound octopine, the magnitude of the bend was estimated as ~62°; in its absence, the bend relaxed to ~46°. Figure 14a shows an axial projection of the DNA obtained in a fashion similar to those in Figs. 10-13. A small bend in the DNA is predicted near the center of this sequence, due largely to the presence of AA and GA base-stacks in correct array. Figure 14b shows the axial trajectory if both (CA)·(TG) and (TA)·(TA) elements are allowed to flex through 45°, and Fig.

[Figure 14: trajectory projections, panels (a) to (c), of the OccR binding-site sequence 5'-TGCATTCGGTCAAATTCATAATGACCGGGCAAGAATAAGCAGATGTTATGGTGCGT-3']

FIG. 14. Computer-developed DNA trajectory projection for the binding-site for the OccR transcriptional activator from Agrobacterium tumefaciens (138). The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. (a) The sequence was modeled using only the wedge and twist angles. Views shown are rotated 90° with respect to each other in a clockwise sense about an axis normal to the 5' base-pair. (b) All (CA)·(TG) and (TA)·(TA) dinucleotide elements are kinked through 45° by roll angle only into the major groove, as suggested by available experimental evidence (3, 40, 48, 74, 78). Views shown are rotated 90° with respect to each other in a clockwise sense about an axis normal to the 5' base-pair. (c) (CA)·(TG) dinucleotide elements are kinked through 45° by roll angle only into the major groove, as suggested by available experimental evidence (48, 74, 78). Views shown are rotated 60° with respect to each other in a clockwise sense about an axis normal to the 5' base-pair.

STRUCTURAL FLEXIBILITY I N

DNA-PROTEININTERACTIONS

247

14c shows the trajectory if such flexure is restricted to (CA)*(TG)elements. As in the case of the TFIIIA internal control region, discussed in Section II,C, 1, the inclusion of (TA)*(TA)flexing elements improves the planarity of the putative bend, which is broadly distributed about the center of the sequence. Considerable out-of-plane bending is predicted in Fig. 14c throughout the 3' half of the sequence, but this is substantially reduced in Fig. 14b. The OccR complex is unusual in that the modulation of transcriptional regulation by a bound mediator appears to occur by a DNA bending mechanism rather than by changes in binding affinity or operator occupancy. In this case, indirect readout may direct binding specificity and may carry an additional functional component as well. 2. INITIATION COMPLEXFOR PHAGE$29 DNA REPLICATION Initiation of replication of the linear double-stranded DNA in phage $29 from Bacillus subtilis is activated by a viral protein, p6, which forms a complex with double-stranded DNA at the replication origins. The p6 protein binds as a dimer every 24 bp, causing the DNA to form a right-handed superhelix around the multimeric protein core. The DNA bending properties of this complex have been investigated using an oligomer consisting of direct repeats of a 24-bp target sequence for p6 binding at the replication origin (219). From the linking number changes due to p6 binding to oligomers containing different numbers of the 24-bp precursor sequence, along with DNA compaction and the known helical repeat in the complex, the superhelical trajectory of the DNA in the complex could be determined. The observed trajectory requires that the DNA curve through a very large angle, about 66" for every 12 bp. This degree of curvature is considerably larger than any experimental observation of fixed bending in DNA (27,36). Hence, it seems likely that it can only be achieved by kinking the DNA at least once per helical turn. 
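The curvature figures quoted above can be put on a per-base-pair footing with a few lines of arithmetic. The sketch below uses only the numbers given in the text (about 66° per 12 bp, one p6 dimer per 24 bp); the 10.5-bp helical repeat used for the per-turn figure is an assumed canonical B-DNA value, not the repeat measured in the complex.

```python
# Back-of-envelope curvature arithmetic for the phi29 p6-DNA complex.
# Values from the text: ~66 degrees of curvature per 12 bp, one p6 dimer per 24 bp.
CURVATURE_DEG = 66.0
SEGMENT_BP = 12
P6_SPACING_BP = 24


def curvature_summary(helical_repeat_bp: float = 10.5) -> dict:
    """Scale the reported curvature to other length units.

    helical_repeat_bp is an assumption (canonical B-DNA); the repeat
    actually present in the complex may differ.
    """
    deg_per_bp = CURVATURE_DEG / SEGMENT_BP
    return {
        "deg_per_bp": deg_per_bp,
        "deg_per_p6_dimer": deg_per_bp * P6_SPACING_BP,
        "deg_per_helical_turn": deg_per_bp * helical_repeat_bp,
        # length of DNA needed to accumulate a full 360-degree change of direction
        "bp_per_360_deg": 360.0 / deg_per_bp,
    }


if __name__ == "__main__":
    for key, value in curvature_summary().items():
        print(f"{key}: {value:.2f}")
```

With these inputs the average is 5.5° per base pair, 132° per p6 dimer, and roughly 58° per assumed helical turn, so a full 360° change of direction takes only about 65 bp; this is the scale of deflection that the text argues is too large for fixed bending alone.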
The precursor sequence used in the above experiments was CTTAATATCGACATAATCGTCGA. This sequence has a number of putative flexibility sites. Figure 15 shows an axial trajectory projection as in earlier illustrations, allowing both (CA)·(TG) and (TA)·(TA) elements to kink through 45° into the major groove. The resulting curvature is almost planar and amounts to about 90° for this set of assumptions. As in other examples, allowing kinking only at (CA)·(TG) elements yields a less planar bend. It therefore seems likely that the observed DNA curvature in the p6 complex can be understood in terms of DNA flexibility elements, and that other, more draconian structural dislocations need not be invoked to explain the large curvature observed in this system.
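The "putative flexibility sites" in the p6 sequence can be listed mechanically. The sketch below scans a sequence for (CA)·(TG) steps (read on the top strand as CA or TG) and for (TA)·(TA) steps, and reports how far apart two sites sit around the helix. The uniform 10.5-bp helical repeat is an assumption, and the sequence is the one shown in Fig. 15.

```python
# Locate putative flexibility elements in a DNA sequence and their helical phase.
# CA·TG appears on the top strand as "CA" or "TG"; TA·TA is self-complementary.
CA_TG = {"CA", "TG"}
CA_TG_AND_TA = CA_TG | {"TA"}


def flexible_steps(seq: str, steps=CA_TG_AND_TA) -> list[tuple[int, str]]:
    """Return (position, step) pairs; position is the 1-based index of the
    first base of the dinucleotide step."""
    seq = seq.upper()
    return [(i + 1, seq[i:i + 2]) for i in range(len(seq) - 1)
            if seq[i:i + 2] in steps]


def phase_difference(pos_a: int, pos_b: int, repeat_bp: float = 10.5) -> float:
    """Angular separation (degrees, 0-360) of two steps around the helix,
    assuming a uniform helical repeat."""
    return ((pos_b - pos_a) * 360.0 / repeat_bp) % 360.0


P6_SITE = "CTTAATATCGACATAATCGTCGA"  # sequence shown in Fig. 15

if __name__ == "__main__":
    both = flexible_steps(P6_SITE)
    print("CA·TG + TA·TA:", both)
    print("CA·TG only:   ", flexible_steps(P6_SITE, CA_TG))
    first = both[0][0]
    for pos, step in both:
        print(pos, step, round(phase_difference(first, pos), 1))
```

Under these assumptions the combined set gives four candidate kink sites (TA at steps 3, 6, and 14 and CA at step 12), while restricting kinking to (CA)·(TG) leaves a single site. The phase column shows where each site lies around the helix relative to the first one, which is what governs whether their deflections add in a common plane.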

[Fig. 15 panels: three trajectory views of 5'-CTTAATATCGACATAATCGTCGA-3', bases numbered 1-23.]

FIG. 15. Computer-developed DNA trajectory projection for the binding site for the p6 replication activator protein of the Bacillus subtilis phage φ29 (139). The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. Rotation about an axis normal to the 5' base-pair is 0°, 90°, and 60°, respectively, for the panels from left to right.
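The trajectory projections in Figs. 14 and 15 come from chaining per-step rotations along the sequence. A minimal sketch of that idea, assuming numpy, is below. It is deliberately simplified and is not the program used to draw the figures: every step gets the same twist (34.3°, an assumed average), intrinsic wedge angles are set to zero rather than taken from the Bolshoy et al. compilation, and putative flexibility elements receive a pure 45° roll, as in the figure captions. The 3.4-Å rise, the half-twist/roll/half-twist composition, and the axis conventions are all assumptions of this sketch.

```python
import numpy as np

RISE_A = 3.4                      # assumed rise per base-pair step (angstroms)
KINK_STEPS = {"CA", "TG", "TA"}   # CA·TG on either strand, plus TA·TA


def rot(axis: str, deg: float) -> np.ndarray:
    """Right-handed rotation matrix about the local x, y or z axis."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    if axis == "x":
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    if axis == "y":
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def trajectory(seq: str, twist: float = 34.3, kink_roll: float = 45.0,
               kink_steps=KINK_STEPS):
    """Return (axis points, overall bend in degrees) for a kink-only model.

    Each step applies half the twist, a roll about the local y axis
    (kink_roll at flexible steps, 0 elsewhere), then the other half twist.
    """
    seq = seq.upper()
    frame = np.eye(3)          # columns are the local x, y, z axes
    pos = np.zeros(3)
    path = [pos.copy()]
    for i in range(len(seq) - 1):
        roll = kink_roll if seq[i:i + 2] in kink_steps else 0.0
        frame = frame @ rot("z", twist / 2) @ rot("y", roll) @ rot("z", twist / 2)
        pos = pos + RISE_A * frame[:, 2]   # advance along the new local z
        path.append(pos.copy())
    # angle between the first base-pair normal (global z) and the last one
    cos_bend = np.clip(frame[2, 2], -1.0, 1.0)
    return np.array(path), float(np.degrees(np.arccos(cos_bend)))


if __name__ == "__main__":
    p6 = "CTTAATATCGACATAATCGTCGA"
    _, bend_all = trajectory(p6)
    _, bend_ca = trajectory(p6, kink_steps={"CA", "TG"})
    print(f"kinks at CA·TG + TA·TA: {bend_all:.1f} deg")
    print(f"kinks at CA·TG only:    {bend_ca:.1f} deg")
```

Two kinks add to 90° only when they are separated by a whole number of helical turns; otherwise their deflections partly cancel or leave the plane. Because the overall bend here is measured between the end normals, it ignores out-of-plane character; comparing with the "planar versus nonplanar" judgments made from the projections would need an additional measure, for example a plane fitted to the returned path.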

I. Plant Regulatory Proteins with Provocative DNA Recognition Sequences

Plant DNA-binding proteins that regulate plant gene expression are being identified with increasing frequency using either transgenic plants or transient expression assays (reviewed in 220-222). Structural information on their interactions with DNA is presently limited to the interaction of TBP from A. thaliana with the minor groove of DNA (122). However, many of the other DNA recognition sequences for plant proteins contain provocative elements, which may imply anisotropic flexibility at (CA)·(TG) dinucleotides in individual protein-binding sites, or more fixed bending through phased (CA)·(TG)-containing regulatory motifs and (A + T)-rich sequences. It is likely that the mechanisms of DNA-protein interaction in plant gene regulation are conserved relative to other biological systems. This was first demonstrated by the use of yeast GAL4 derivatives to
stimulate gene expression in tobacco leaf protoplasts (223). The protein structures of many of the regulatory factors first described in prokaryotes and mammalian systems appear to function in plants as well. In addition, plants are providing examples of new variants in both the promoter organization of regulatory motifs and the possible structures of proteins that interact with these elements. The following discussion therefore focuses on plant proteins and DNA-recognition sequences that may portend a role for indirect readout in their regulatory interactions.

1. THE G-BOX MOTIF AND BASIC LEUCINE-ZIPPER PROTEINS

The palindromic G-box motif (CCACGTGG) appears in many plant gene promoters that are photoregulated, as well as in those that respond to plant hormones and other stimuli. Nuclear proteins interact with this sequence specifically (224-226). Several members of this GBF family of nuclear DNA-binding proteins have been cloned (227-231), and all belong to the class of basic leucine-zipper (bZIP) proteins (see Section II,E). The DNA-binding domain in these bZIP proteins consists of a region enriched in basic amino acids. The adjacent leucine zipper (232, 233), which forms an amphipathic α-helix, permits dimerization (234, 235) of bZIP proteins and thus can potentially provide additional diversity in regulation through heterodimerization (227). Binding of bZIP proteins to their recognition sites is contingent on dimerization of the proteins (236), and the basic DNA-binding element assumes an α-helical conformation that is stabilized by DNA interactions (237). These requirements suggest that a number of kinetic steps must occur before a functional complex can interact with the major groove of DNA, as in the "forceps model" discussed in Section II,E,3 (188). These kinetic steps are likely to be influenced by both the protein and the DNA structures. The center of the plant G-box motif contains the "core" CACGTG sequence, which has the same putative flexibility elements as the prokaryotic promoters discussed earlier. The binding of the bZIP proteins to this sequence, however, appears to be regulated by additional sequence elements flanking the core sequence.
These flanking elements affect binding affinities for a variety of binding factors to individual DNA sequences in vitro (238-241) and confer different expression patterns in transgenic plants in vivo (242). In the forceps model (188), contacts of the bZIP basic region occur at the two halves of the AP-1 (ATGACTCAT) pseudodyad, with the protein oriented about the central C·G base-pair of the binding site. The inverted symmetry of the G-box core motif does not permit completely symmetric interactions of the two bZIP α-helices, which are separated by a half-helical turn across the major groove. These interactions may therefore be sensitive to small alterations in local DNA conformation in the core CACGTG sequence or to the contributions of flanking sequences. This may lead to binding affinity
differences resulting from induced conformational changes in the proteins or in the complex with DNA. Thus, indirect readout may play a significant role in bZIP protein interactions with the G-box-like elements in plant promoters, especially in sequences with the potential for multiple indirect readout signals. Further changes in the core sequence would be expected to affect the binding properties significantly. An example can be found in the Opaque-2 bZIP transcriptional activator, which recognizes a G-box-like element with overall pseudodyad symmetry that mimics one half of the canonical G-box binding site but differs in the other half: CCACGTAG. Opaque-2 can interact with this sequence as a homodimer (243) or as a heterodimer (244), suggesting structural adaptability in this site. The role of individual bases in binding affinity has also been shown by competition studies for protein binding to the Arabidopsis G-box-like element TGACGTGG (239), which contains two (TG)·(CA) putative flexibility elements separated by a half-helical turn that may contribute to differential protein interactions with this site. The apparent restrictions dictated by the pseudodyad symmetry of the forceps model of bZIP protein-DNA interactions (188) suggest a similar mode of interaction between the OCSBF bZIP proteins from maize and their recognition sequence. The OCSBF-1-encoded protein recognizes a 20-bp DNA sequence with dyad symmetry (TGACGTAAGCGCTTACGTCA) as well as the animal AP-1 (TGACTCA) and CREB (TGACGTCA) recognition sites (245). Other bZIP proteins, encoded by TGA1a and TGA1b, specifically bind to an even shorter DNA sequence, TGACG (246). However, it is not clear whether the concatemers used in individual experiments (246) contributed structurally to the recognition sequences in these investigations. Additional crystallographic studies on different bZIP protein-DNA complexes will be required to resolve these questions.
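The symmetry arguments above can be checked mechanically. The sketch below reports whether each site discussed in this subsection is an exact dyad (identical to its own reverse complement) and, for even-length sites, how many positions break the symmetry between the two half-sites. AP-1 (TGACTCA) is odd-length, with a central base pair, so it only fits the exact-dyad test. The helper names are illustrative, not taken from any cited work.

```python
# Reverse-complement checks for the G-box family of bZIP sites.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (returned in uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the site reads the same on both strands (exact dyad)."""
    return seq.upper() == revcomp(seq)


def half_site_mismatches(seq: str) -> int:
    """For an even-length site, count positions at which the left half
    differs from the reverse complement of the right half."""
    seq = seq.upper()
    if len(seq) % 2:
        raise ValueError("expected an even-length site")
    mid = len(seq) // 2
    left, right = seq[:mid], revcomp(seq[mid:])
    return sum(a != b for a, b in zip(left, right))


SITES = {
    "G-box": "CCACGTGG",
    "Opaque-2 site": "CCACGTAG",
    "Arabidopsis G-box-like": "TGACGTGG",
    "OCSBF-1 dyad": "TGACGTAAGCGCTTACGTCA",
    "CREB": "TGACGTCA",
}

if __name__ == "__main__":
    for name, site in SITES.items():
        print(f"{name:24s} palindromic={is_palindromic(site)} "
              f"half-site mismatches={half_site_mismatches(site)}")
```

The G-box, the OCSBF-1 site, and the CREB site come out as exact dyads, the Opaque-2 site differs from the G-box at one position in one half-site, and the Arabidopsis TGACGTGG element at two.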

2. PUTATIVE ZINC-BINDING DOMAIN PROTEINS IN PLANTS

A very limited number of potential zinc-finger regulatory proteins in plants have been described. The gene encoding a metal-dependent DNA-binding protein (3AF1), cloned from tobacco, recognized a very adenine-rich motif (AAATAGATAAATAAAAACAT) in the pea rbcS-3A promoter (247). This interaction was abolished by mutating the T in the GATA sequence and two of the central AA residues, but detailed information on the potential role of DNA bending in 3AF1 binding is still lacking. The predicted amino-acid sequence from the 3AF1 clone contains two repeated cysteine- and histidine-rich segments, but does not conform to any currently known zinc-finger structure. The binding site recognized by 3AF1 also contains the GATA motif. The family of trans-acting factors that recognize GATA in mammalian cells share a highly conserved Cys4 zinc-finger DNA-binding domain (248, 249). In plants, the GATA motif is found in many promoters and is recognized by the ASF-2 (250) and GA-1 (226, 251) factors, but their relationship to the 3AF1 protein is presently not understood. Another putative zinc-finger protein sequence, encoded in a cDNA (Alfin-1), has been isolated by differential screening from salt-tolerant alfalfa cells (252). The predicted amino-acid sequence of this molecule contains a proline-rich region and an acidic region as well as one putative Cys-type and one putative (His,Cys)-type zinc-binding structure, which resemble the novel zinc-finger family of proteins described by Freemont et al. (253) and by Haupt et al. (254). However, the DNA-recognition sequence for this protein is currently unknown. The Petunia EPF1 factor (255) represents the first (Cys2His2)-type zinc-finger protein identified in plants. The protein binds to at least four sites in the Petunia 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) gene, which is regulated in a developmental- and tissue-specific manner; three of these sites show sequence similarity (TTACTNNNAT), and the fourth is a palindromic sequence (TGACAGTGTCA) in which the two (CA)·(TG) sequence elements are almost a half-helical turn apart and nearly on opposite sides of the DNA. A very unusual cloned Arabidopsis regulatory gene encodes a protein, COP1, with both zinc-binding motifs and a G-protein-related domain (256). This protein appears to function as a repressor of photomorphogenesis in darkness and may also function in transcriptional regulation as well as interact with the G-protein signaling pathway. The putative zinc-finger domain encodes one (Cys,His)-type and one Cys-type zinc-binding motif, but its binding sequence is currently not known. It is interesting that, so far, three of the cloned plant-protein sequences exhibit unusual putative zinc-binding motifs.
The distinctive folds of these zinc-binding modules and their interactions with DNA remain to be determined.

3. AT-RICH BINDING SITES IN PLANT PROMOTERS

DNA bending at A_n tracts, where n is greater than about 3 to 5, has been well documented (22-26). Thus, the presence of such tracts in plant promoter regions that are binding sites for nuclear proteins suggests that sequence-dependent bending could play a role in DNA-protein recognition in these regions. A nuclear protein, AT-1, binds specifically to (A + T)-rich elements in the pea rbcS gene (257). Similar elements are also found in photoregulated genes from other species. The complete binding domain for AT-1 in this gene consists of two overlapping AT-rich elements with the sequence CITATATATITITAA3TA"lTATTCTCTTAA, which extends from -566 to -533 in the upstream promoter region. In this sequence, the (A·T)n tracts
are spaced exactly at integral helical periodicity and thus could bend the DNA recognition site in a coherent manner. A (C + A)-rich element in the Arabidopsis cab-140 gene promoter is the binding site for the phosphoprotein CA-1 (258) and also contains an adjacent stretch of As. A constitutively active rice actin promoter shows a particularly interesting region, in which 33 of 37 bases are A, with G residues interspersed (259). The same promoter contains seven repeats of CCCAA, which effectively spaces six (CA)·(TG) dinucleotides a half-helical turn apart. Protein binding to (A + T)-rich sites has also been observed in promoters responsive to a variety of stimuli (260-264) and may involve A-tracts or phased repeats of A·T sequences. However, understanding the specific mechanism of these interactions will have to await cloning of the genes for the specific binding proteins and detailed studies of their structures.
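Whether (CA)·(TG) steps or A-tracts are "phased" comes down to their spacing relative to the helical repeat. The helpers below make that explicit, assuming a uniform 10.5-bp repeat and treating only homopolymeric A or T runs as tracts. The example sequences are hypothetical (three contiguous CCCAA copies and an invented array of A6 tracts), not the promoter sequences discussed above.

```python
import re

HELICAL_REPEAT_BP = 10.5   # assumed canonical B-DNA repeat


def step_positions(seq: str, step: str) -> list[int]:
    """1-based positions of every occurrence of a dinucleotide step."""
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == step]


def tract_centers(seq: str, min_len: int = 4) -> list[float]:
    """1-based centers of homopolymeric A or T runs of at least min_len."""
    pattern = rf"A{{{min_len},}}|T{{{min_len},}}"
    return [m.start() + 1 + (len(m.group()) - 1) / 2
            for m in re.finditer(pattern, seq.upper())]


def phases(positions, repeat_bp: float = HELICAL_REPEAT_BP) -> list[float]:
    """Angular position (degrees) of each element relative to the first."""
    p0 = positions[0]
    return [((p - p0) * 360.0 / repeat_bp) % 360.0 for p in positions]


if __name__ == "__main__":
    repeats = "CCCAA" * 3                      # hypothetical: three contiguous copies
    ca = step_positions(repeats, "CA")
    print("CA steps:", ca, [round(a, 1) for a in phases(ca)])

    tracts = ("GC" + "A" * 6 + "CG") * 3       # hypothetical A6 array, 10-bp spacing
    centers = tract_centers(tracts)
    print("A-tract centers:", centers, [round(a, 1) for a in phases(centers)])
```

CA steps five base pairs apart come out about 171° apart, on nearly opposite faces of the helix, which is the half-helical-turn spacing described for the CCCAA repeats. Tracts placed every 10 bp drift by about 17° per copy against the assumed 10.5-bp repeat, so a long array bends coherently only if its spacing stays close to the true repeat.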

4. HELIX-TURN-HELIX-TURN-HELIX REGULATORY PROTEINS IN PLANTS

The search for elements in plant gene promoters that are responsive to light has focused largely on the signal transduction system that originates with the active form of the photoreceptor phytochrome (Pfr). This has led to the identification of two GT-rich elements associated with the response to light. The box-II element, TCTGTGGTTAATATG, which in tandem copies conferred light-responsive expression of a reporter gene in transgenic tobacco (265), contains multiple (CA)·(TG) and (TA)·(TA) elements and might be expected to confer significant localized flexibility on the promoter element, as noted above. Two genes, B2F and GT-1a, that encode DNA-binding proteins recognizing this sequence have been identified (266, 267). Both proteins appear to have three α-helical regions that may interact with DNA, but their individual specificities are not known. A third protein, from rice (GT-2), recognizes a GT-rich motif in the phyA promoter and has a similar triple helix-turn-helix structure (268, 269). The unusual GT-2 factor contains two autonomous DNA-binding domains, each with a triple helix-turn-helix structure, that discriminate among three GT-rich motifs in the phyA promoter. Thus the triple helix-turn-helix motif may represent a new class of DNA-binding proteins. As work in this area progresses, it will be interesting to see what similarities or differences in DNA binding exist among these plant proteins and the prokaryotic and homeodomain helix-turn-helix regulatory proteins.

5. SPATIAL ARRANGEMENT OF DNA-BINDING MOTIFS

One of the remarkable characteristics of most plant gene promoters is the array of control elements that are repeated, often with variation, in different parts of a single promoter, as well as in different order among individual
members of a gene family. The different organization of DNA control elements in the rbcS (270, 271) and cab (272) gene families demonstrates the extent of such variation within each gene family. It is likely that individual base variations within a motif, as well as the spatial arrangement of the different motifs, contribute to the indirect readout that determines the affinity with which functional DNA-protein interactions are established. As demonstrated by DNase footprint analysis of organ- and development-specific DNA-protein interactions in tomato rbcS genes, individual DNA-protein interactions as measured in vitro are necessary but not sufficient for transcriptional activity (271). It is not yet clear whether protein modifications or interactions with additional proteins contribute in subtle ways to the structure of the functional DNA-protein complex or act through kinetic pathways. However, both the specific nature and the spacing of flexibility elements in DNA-binding sites are likely to contribute to some of the discriminatory and kinetic aspects of DNA site interactions with regulatory proteins.

III. Models of Sequence-directed Structure-Function Relationships in Selected Regulatory Systems

The preceding sections detail the precise fitting together of DNA binding-sites and their cognate sequence-specific binding proteins. The possible effects of such structure-directed fitting on DNA-protein interactions within these sites were also discussed, along with the putative roles of both direct and indirect readout of binding information. In a number of systems, the contacts between individual bases in specific DNA binding-sites (recognition matrices) and appropriately positioned amino-acid side-chains in the protein-recognition element (direct readout) have been probed with a large array of mutants to try to understand their roles in regulatory DNA-protein interactions. Accordingly, we have taken published data from two promoters with extensive collections of mutants and have analyzed their putative sequence-directed DNA structures to assess possible contributions of indirect readout to the results. Both the mouse βmaj-globin promoter (273-275) and the oat phyA3 promoter (276) contain several regulatory sequences. These sequences have been mutated either by linker-scan mutagenesis or by point mutations without affecting their spacing relative to the TATA box or the +1 transcription start site, and each mutation has been tested for function by gene expression in vivo. The results offer some interesting insights and raise questions about the complicated mechanics of regulatory protein-DNA interactions.

A. The Proximal βmaj-globin Promoter in Mouse

The expression of globin genes during erythroid differentiation is controlled by cis-acting DNA elements and trans-acting regulatory factors at both the promoter and locus-control regions (reviewed in 277, 278). Experiments with deletion mutants of cloned hybrid genes transferred into murine erythroleukemia (MEL) cells showed that a minimal promoter of 106 bp of β-globin 5'-flanking sequences directed correctly initiated transcripts, and that these transcripts were induced by MEL cell differentiation (274). Several important regulatory elements have been identified within the 106-bp region: the TATA box (from -30 to -26), a CCAAT box (from -79 to -72), a CACCC element (from -95 to -87), the imperfect repeat (PDRE) element AGGGCAGGAGCCAGGGCAGAGC (from -53 to -32), and a GATA-like element on both strands. Binding factors for these sites have been identified in MEL cells (279, 284). The region is not only crowded with potential binding factors; a combination of several factors is also required to mediate transcription from the minimal promoter (275, 279). The matter is further complicated by the recent finding that the erythroid cell-specific factor that binds to the CACCC element, which is related to the Krüppel family of proteins, also binds the sequences (CCA)·(TGG), (CAC)·(GTG), and (CCT)·(AGG) (284). Point mutations in most of the 106 bases of the mouse βmaj proximal promoter have been tested in both long-term and transient assays in HeLa and MEL cells to identify sequences involved in transcription and the induction response (273-275). Figure 16 shows computer-developed DNA trajectory projections of the promoter region from base-pairs -97 to -26, in an effort to identify regions with potential indirect readout information and to correlate these with the known functional regions. Using the same helical parameters in the modeling as discussed earlier (Fig.
11), it is clear that the DNA structure for the mouse βmaj-globin promoter shows a remarkable pattern of flexible sites arranged in a planar manner, except for the TATA site. Each of the known protein-binding sites lies in a region of putative flexibility. In addition, the relatively planar overall conformation suggests the possibility of interactions between proteins that bind at sites distant from one another in this sequence. Interactions of proteins at nonadjacent sites have been suggested as the mechanism for negative regulation of the βmaj-globin promoter (285) and may well play a quantitative role in transcriptional activation from the proximal promoter. This possibility adds another layer of complexity to the comparison and interpretation of nuclear-protein binding studies that use oligonucleotides specific for a single protein versus DNA fragments with a number of binding sites. Our model is supported by recent findings that the protein PDRf, which binds the PDRE site in the βmaj-globin promoter, bends the DNA an estimated 90° at this site (285a). In addition, the binding affinity of PDRf was substantially higher for the intact β-globin promoter than for the isolated PDRE site (285a).

We used the same modeling methods to examine the effect of various point mutations in the CACCC element, the CCAAT box, and the PDRE element on the DNA conformation depicted in Fig. 16. These same mutations affect transcription in MEL cells (274). A summary of results is given in Table I, where (+) denotes a mutant that changes the wild-type binding-site conformation and (-) denotes no significant change. These results demonstrate a remarkable correlation of conformational changes in DNA with changes in transcriptional activity of individual mutants. Individual mutants listed in Table I caused changes in the DNA trajectory ranging from alterations in the microarchitecture of DNA bending/flexing regions to variations in the planar structure of the entire proximal promoter region. Such changes in DNA planarity might well affect interactions between binding proteins brought together by DNA looping. These correlations therefore suggest that indirect readout very likely plays a role in the functional assembly of the mouse βmaj-globin promoter complex with its binding proteins.

[Fig. 16: two trajectory views (0° and 90°) of the promoter sequence from -97 to -26, beginning 5'-gGAGCCACACCCTGGTAAGGGCCAATCTGCTCACAC..., with positions marked every 10 bp from -90 to -30.]

FIG. 16. Computer-developed DNA trajectory projection for the mouse βmaj-globin proximal promoter sequence. Underlined regions represent known protein-binding sites. The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand and is identified. Views shown are rotated 90° with respect to each other about an axis normal to the 5' base-pair. The extra guanine at the 5' end (lower-case typeface) is added to clarify the computer-modeling presentation.
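One reason single substitutions can reshape a predicted trajectory is that each interior point mutation changes exactly two overlapping dinucleotide steps, and either of them can create or destroy a putative flexibility element. The sketch below makes that bookkeeping explicit. The fragment is the 5'-terminal stretch (-97 to -84) of the modeled βmaj-globin sequence, which contains the CACCC element (GCCACACCC); the C-to-A substitution at fragment position 7 is hypothetical and is not one of the mutants in Table I.

```python
# Which dinucleotide steps does a point mutation alter?
FLEXIBLE = {"CA", "TG", "TA"}   # putative flexibility elements (CA·TG, TA·TA)


def mutate(seq: str, pos: int, base: str) -> str:
    """Return seq with the base at 1-based position pos replaced."""
    if not 1 <= pos <= len(seq):
        raise ValueError("position out of range")
    return seq[:pos - 1] + base + seq[pos:]


def altered_steps(wt: str, mut: str) -> list[tuple[int, str, str]]:
    """(step position, wild-type step, mutant step) for every changed step."""
    return [(i + 1, wt[i:i + 2], mut[i:i + 2])
            for i in range(len(wt) - 1) if wt[i:i + 2] != mut[i:i + 2]]


def flexibility_change(wt: str, mut: str) -> dict[str, list[int]]:
    """Step positions where a putative flexibility element is lost or gained."""
    changes = altered_steps(wt, mut)
    return {
        "lost": [p for p, a, b in changes if a in FLEXIBLE and b not in FLEXIBLE],
        "gained": [p for p, a, b in changes if a not in FLEXIBLE and b in FLEXIBLE],
    }


# 5'-terminal stretch (-97 to -84) of the modeled promoter; the substitution
# used below is hypothetical, chosen only to illustrate the bookkeeping.
WT = "GAGCCACACCCTGG"

if __name__ == "__main__":
    mut = mutate(WT, 7, "A")
    print(altered_steps(WT, mut))
    print(flexibility_change(WT, mut))
```

Here the substitution alters the AC and CA steps on either side of the mutated base and removes one (CA)·(TG) element, whereas a substitution that only converts, say, CC into CT leaves the set of flexibility elements unchanged. A full assessment, as in Table I, would then feed the mutant sequence back into the trajectory model.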

TABLE I
MOUSE βmaj-GLOBIN PROMOTER: MUTANT TRANSCRIPTIONAL ACTIVITY AND STRUCTURAL CHANGES
[Columns: Element; DNA position; Mutation; Transcription (%); Structural change. Rows: wild type (100%, no change) and point mutations in the PDRE, CCAAT, and CACCC (GCCACACCC) elements; the individual row entries are not recoverable from the scanned table.]

B. The Phytochrome (phyA3) Promoter in Oats

The plant photoreceptor phytochrome represses transcription of its own phyA gene via a signal pathway after conversion to the active Pfr form in red light (reviewed in 286, 287). Deletions and substitutions identified several phyA promoter elements involved in transcriptional activation of these genes (268, 288) and indicated that the minimal promoter (-400) provides sufficient information for high-level expression from the phytochrome promoter in the dark, as well as for phytochrome regulation by red light. Transient expression of linker-scan mutants of the PE1 and PE3-RE1 elements of the phyA3 promoter (276) tested the function of individual segments of these elements in both transcription and light regulation. Figure 17 shows the results of computer modeling of the DNA trajectory, similar to that described above for the β-globin system, for wild-type PE1 and PE3-RE1 from the oat phyA3 promoter (276). As we did for β-globin, we tested the effect of each of the linker-scan mutations on the predicted DNA conformation. These data are summarized in Table II, where (+) indicates a mutant that leads to an apparent conformational change and (-) indicates no apparent change. The results are quite provocative. In Fig. 17a, the PE1 element of the phyA3 promoter is modeled from position -372 to -338; this positive regulatory region shows a planar loop conformation with putative flexible sequence elements spanning the linker-scan mutant sites. Linker-scan mutants 614, 615, and 616 resulted in decreased transcription in the dark, whereas 617 showed normal levels of transcriptional activity (276) (Table II). All four mutant sequences carry substitutions of six or seven bases, but only mutants 614, 615, and 616 show significant changes in transcriptional activity, and these three also show significant conformational changes in the planarity of the looped conformation or in the directionality of bend sites. Mutant 617, on the other hand, retains the wild-type level of transcriptional activity as well as the wild-type conformation. Similar results are obtained with the linker-scan mutations that define the positive PE3 element and the negative RE1 element of the oat phyA3 promoter (276). The wild-type DNA trajectory projection of these adjacent sequence elements is shown from -115 to -65 in Fig. 17b, and the conformational modeling results are also summarized in Table II.
The boundaries of PE3 are marked by putative flexibility elements, of which the most pronounced lies at the apparent boundary between PE3 and RE1. The sequence trajectory for mutant 641, which showed wild-type activity in the dark and red-light repression (276), was essentially the same as that of the wild type (Table II). On the other hand, mutants 639, 638, and 637 had more highly bent conformations, and mutant 636 a less bent conformation, than the wild type, and all four mutants showed loss of transcriptional activity in the dark (276). The conformation of mutant 635 DNA appeared very similar to that of the wild type, which correlates well with a substantial level of transcriptional activity in the dark (276). The two mutants in the RE1 element have opposite effects on potential bending. Mutant 634 abolishes the putative flexibility site at the PE3-RE1 boundary, whereas mutant 640 introduces a new flexibility site in the RE1 element. Both mutants have lost transcriptional repression in red light, indicating that both positive and negative regulatory interactions in the phyA3 promoter may utilize DNA conformational information to achieve the optimal DNA-protein fits required for regulatory functions.

[Fig. 17: (a) two views of the PE1 sequence 5'-GGCTGGAAATAGCAAATGTTAAAAATAAAGGTGA-3', positions marked from -370 to -340; (b) two views of the PE3-RE1 sequence, positions marked from -110 to -70.]

FIG. 17. Computer-developed DNA trajectory projection for the oat phyA3 regulatory elements. The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. (a) The PE1 element in two views rotated 30° with respect to each other about an axis normal to the 5' base-pair. (b) The PE3-RE1 element in two views rotated 90° with respect to each other about an axis normal to the 5' base-pair.

TABLE II
THE phyA3 PROMOTER: LINKER-SCAN MUTANT TRANSCRIPTIONAL ACTIVITY AND POTENTIAL STRUCTURAL CHANGES

Element | Mutant clone^a | Activity (dark)^a | Activity (red light)^a | Structural change
PE1 | 449 (wild type) | 100 | 20 | Wild type
PE1 | 614 | 20 | 35 | +
PE1 | 615 | 35 | 20 | +
PE1 | 616 | 35 | 30 | +
PE1 | 617 | 95 | 25 | -
PE3 | 449 (wild type) | 100 | 25 | Wild type
PE3 | 641 | 100 | 25 | -
PE3 | 639 | 45 | 30 | +
PE3 | 638 | 30 | 30 | +
PE3 | 637 | 30 | 35 | +
PE3 | 636 | 45 | 30 | +
PE3 | 635 | 70 | 45 | -
RE1 | 634 | 95 | 95 | +
RE1 | 640 | 80 | 85 | +

^a Data from Bruce et al. (276).
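The qualitative correlation claimed for the phyA3 linker-scan mutants can be tallied directly from the values in Table II. The sketch below does so under explicitly assumed phenotype cutoffs: below 50% of wild-type dark activity counts as defective for the positive elements PE1 and PE3, and red-light activity of 50% or more counts as loss of repression for the negative element RE1. These thresholds are illustrative choices, not criteria used by the authors or by Bruce et al.; the structural-change calls are the (+)/(-) entries of the table.

```python
# Concordance between predicted structural change and phenotype, using the
# linker-scan values of Table II (data of Bruce et al., ref. 276).
# Each row: (clone, element, activity in dark, activity in red light, change?)
TABLE_II = [
    ("614", "PE1", 20, 35, True),
    ("615", "PE1", 35, 20, True),
    ("616", "PE1", 35, 30, True),
    ("617", "PE1", 95, 25, False),
    ("641", "PE3", 100, 25, False),
    ("639", "PE3", 45, 30, True),
    ("638", "PE3", 30, 30, True),
    ("637", "PE3", 30, 35, True),
    ("636", "PE3", 45, 30, True),
    ("635", "PE3", 70, 45, False),
    ("634", "RE1", 95, 95, True),
    ("640", "RE1", 80, 85, True),
]


def defective(element: str, dark: float, red: float,
              dark_cutoff: float = 50.0, red_cutoff: float = 50.0) -> bool:
    """Assumed phenotype rule (cutoffs are illustrative choices):
    positive elements (PE1, PE3) are defective if dark activity falls below
    dark_cutoff; the negative element (RE1) is defective if red-light
    activity stays at or above red_cutoff, i.e. repression is lost."""
    if element == "RE1":
        return red >= red_cutoff
    return dark < dark_cutoff


def concordance(rows=TABLE_II, **cutoffs) -> tuple[int, int]:
    """(rows where phenotype and predicted structural change agree, total)."""
    agree = sum(defective(el, dark, red, **cutoffs) == changed
                for _, el, dark, red, changed in rows)
    return agree, len(rows)


if __name__ == "__main__":
    print(concordance())                  # default 50% cutoffs
    print(concordance(dark_cutoff=75.0))  # stricter dark-activity cutoff
```

With the default cutoffs all twelve mutants agree. Raising the dark-activity cutoff to 75% turns mutant 635 (70% activity, no predicted change) into a disagreement, so the tally depends on where "substantial" activity is drawn.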

IV. Past Challenges and Future Prospects

Our understanding of DNA conformation and structure has undergone a slow but continuous redirection in the 40 or so years since the double helix was first proposed. The first chapters of the DNA saga stressed its polymeric character, as epitomized in the perfectly uniform double helix of Watson and Crick, in which each base pair, although chemically different from the remaining three, was nevertheless thought to be structurally equivalent to its neighbors. This simplicity was aesthetically appealing; furthermore, it had real value in the initial concept of the basic structure of DNA. Watson and Crick were perhaps fortunate to have developed their structure by scrutinizing relatively imprecise macroscopic mechanical models. It was recognized rather early that axially directed sources of stabilization free energy, or base stacking, contributed significantly (along with laterally directed hydrogen-bond energy) to the overall stability of DNA. In addition, the polyelectrolyte nature of DNA led to some restrictions on its polymeric
character. Under most solution conditions, the highly charged phosphates in the sugar-phosphate backbone led to electrostatic stresses with strong components directed along the helical axis. These short-range axial interactions produced a high level of "chain stiffness" in DNA, which led to its characterization as a rigid rodlike macromolecule or, in later, more sophisticated thinking, as a "weakly bending" rod or a "wormlike" chain with persistence. The charged phosphates also resulted in large intrachain interactions among regions widely separated in contour distance along the DNA chain, which had a substantial excluded-volume effect. Thus, because of its intrinsic complexity, DNA became a marvelous exercise bar for generations of polymer biophysicists, and enormous ingenuity was manifested in describing the hydrodynamic, thermodynamic, and conformational-dynamical properties of this remarkable molecule. All this was accomplished within the paradigms of high-polymer physics, however, and the relation of most of these properties to biological function remained obscure. From the beginning, the basic processes of biology such as replication, transcription, and recombination could be broadly rationalized at the molecular level through the Watson-Crick helical structure of DNA. However, only within the last 15 years or so have the more subtle details of these processes become understandable in terms of DNA conformation and structure. This progress has been expedited by the striking advances in the molecular biology of transcription, replication, and recombination that have occurred during the same period, and the enormous synergism that has developed between molecular biology and structural biophysics has been highly beneficial to both areas.
Two structurally related concepts seem to have been particularly critical in this development: the demonstration of fixed or static bending in DNA and the gradual acceptance of site-directed flexibility through the mechanism of the stereochemical kink. The molecular biology of gene regulation, including the explosive flood of new information on regulatory nucleoprotein complexes, has converged with these ideas to support sequence dependence in DNA structure. Over the past decade, the wealth of new data from structural biology, including results from crystallographic, spectroscopic, microscopic, and other physical methods, has left no doubt that the older models of DNA as a weakly bending rod are inadequate in the extreme. The “new DNA” is clearly far more complicated than even its complex and difficult polymeric predecessor. The new structural paradigms for DNA lead to an essentially surrealistic picture of structure-function relationships. Every dinucleotide base-stack seems to possess components of both fixed bending and axial flexibility. These are almost certainly interrelated, although we do not yet fully understand the physical basis for this interrelatedness. We know even less at present about site-localized helical twisting and torsional flexibility, although, like axial bending, this property has been characterized in terms of averages over a random base sequence. We know less still about cooperativity effects with neighboring sites, i.e., effects of sequence context on site-localized properties. It is clear, therefore, that we can no longer ignore the many new layers of complexity that sequence-directed structure superimposes upon the older polymeric concept of DNA. The DNA molecule is beginning to look less like a highway over land than a route over an ocean, the straight path broken by ripples, each of which has an origin, a history, and a fundamental meaning. To extend the surrealism further, we have seen in this review examples of structural relationships in regulatory nucleoprotein complexes that do not even exist in the absence of either binding partner. The last few years have indeed witnessed a renaissance in the study of biological regulation at the molecular level, with both DNA and protein as active partners in the structural engagement we call gene regulation. However, to understand and digest this new information and its implications, we must continue to reassess our paradigms concerning the DNA molecule and redirect our thinking about this new Middle Earth of DNA structure-function, in which much occurs that we cannot always see. We will also have to be at least as resourceful as past generations of structural biophysicists and devise new experiments that can provide information on localized structural effects within the context of supramolecular systems. In other words, we must devise new methods to measure and interpret the ripples and waves on the surface of the genetic ocean.

See addendum on p. 399.

V. Glossary of Abbreviations and Polynucleotide Notation

Abbreviations

AP-1: animal protein with DNA-binding specificity for ATGACTCAT
bZIP: basic leucine-zipper binding motif in a protein
cab-140: gene encoding the light-harvesting chlorophyll a/b proteins
CREB: animal protein with DNA-binding specificity for TGACGTCA
cI repressor: the λ repressor protein
EPF1: putative zinc-finger DNA-binding protein from Petunia
EPSPS: gene encoding the 5-enolpyruvylshikimate-3-phosphate synthase in Petunia
GAL4: a transcriptional activator protein in yeast required for galactose and melibiose catabolism
GBF proteins: plant nuclear proteins that bind to the G-box
G-box: CCACGTGG sequence element in plant promoters
Kruppel protein: zinc-finger protein encoded by the Drosophila segmentation gene Kruppel
MCM1: accessory proteins in the complex of the MATα2 repressor proteins with the STE6 gene operator in the yeast Saccharomyces cerevisiae (122, 128)
NF-κB: a pleiotropic transcription factor that can effect gene control in a highly tissue-specific manner
NOE: nuclear Overhauser enhancement (or effect), an enhancement of a nuclear magnetic resonance signal due to the proximity of two nonbonded nuclei, which can be useful in spectral assignments and in determining interatomic distances
OccR, OccQ: genes in Agrobacterium tumefaciens that regulate certain catabolic processes
OccR: transcription factor for the OccR gene
OCSBF-1: gene encoding a bZIP regulatory protein in maize with DNA-binding specificity for the ocs site: ACGTAAGCGCTTACGT
Opaque2: gene encoding a bZIP regulatory protein in maize with DNA-binding specificity for TCCACGTAGA
…: one of several specific recognition sites on the λ phage genome for λ Cro and λ repressor proteins
PE1 element: positive regulatory element of the phyA promoter in rice: positions -372 to -338 relative to the +1 transcription start site
PE3-RE1 elements: adjacent positive and negative regulatory elements of the phyA promoter in rice. PE3, positive element: positions -11 to -91. RE1, negative element: positions -80 to -70. All positions relative to the +1 transcription start site
phyA: gene encoding the phytochrome apoprotein
rbcS: gene encoding the small subunit of ribulose-bisphosphate carboxylase
RL: the ratio of apparent to true fragment length as determined by electrophoretic mobility of curved DNA fragments in polyacrylamide gels
TBP: the TATA-binding protein necessary for transcription of most eukaryotic genes
TFIIIA: a zinc-finger regulatory protein from Xenopus laevis that binds specifically to the internal promoter of 5S RNA genes and also to 5S ribosomal RNA
TGA1a: gene encoding a bZIP protein in tobacco with DNA-binding specificity for TGACGTCA
TGA1b: gene encoding a bZIP protein in tobacco with DNA-binding specificity for TGAGGT

Polynucleotide Notation

dinucleotide (phosphodiester bonded): A-T or AT
base pair (H-bonded): A·T
(A-plus-T)-rich region (same strand): A+T
dinucleotides (on opposing strands, H-bonded): (A-T)·(A-T)
base triplet: CAC
base triplets (H-bonded): (CAC)·(GTG)
nucleoside: N
base-paired sequences (opposing): (NNNN...)·(NNNN...)

REFERENCES

1. C. O. Pabo and R. T. Sauer, ARB 61, 1053 (1992).
2. S. C. Harrison, Nature 353, 715 (1991).
3. T. A. Steitz, Q. Rev. Biophys. 23, 205 (1990).
4. R. E. Harrington, Mol. Microbiol. 6, 2549 (1992).
5. A. A. Travers, ARB 58, 427 (1989).
6. A. A. Travers, Cell 60, 177 (1990).
7. H. R. Drew and A. A. Travers, NARes 13, 4445 (1985).
8. W. F. Anderson, D. H. Ohlendorf, Y. Takeda and B. W. Matthews, Nature 290, 754 (1981).
9. W. F. Anderson, Y. Takeda, D. H. Ohlendorf and B. W. Matthews, JMB 159, 745 (1982).
10. S.-H. Kim, Science 255, 1217 (1992).
11. R. E. Dickerson and H. R. Drew, JMB 149, 761 (1981).
12. R. E. Dickerson, in “Proceedings of the Sixth Conversation in Biomolecular Stereodynamics” (R. H. Sarma, ed.), p. 72. Adenine, Schenectady, New York, 1989.
13. A. Bolshoy, P. T. McNamara, R. E. Harrington and E. N. Trifonov, PNAS 88, 2312 (1991).
14. P. T. McNamara and R. E. Harrington, JBC 266, 12548 (1991).
15. M. Ptashne, Nature 323, 697 (1986).
16. H. Echols, Science 233, 1050 (1986).
17. G. Felsenfeld, Nature 355, 219 (1992).
18. R. Nussinov, Crit. Rev. Biochem. Mol. Biol. 25, 1885 (1990).
19. E. N. Trifonov, CRC Crit. Rev. Biochem. 19, 89 (1985).
20. S. Diekmann, in “Nucleic Acids and Molecular Biology” (F. Eckstein and D. M. J. Lilley, eds.), p. 138. Springer-Verlag, Berlin, 1987.
21. M. Sundaralingam and Y. C. Sekharudu, in “Structure and Expression” (W. K. Olson, M. H. Sarma, R. H. Sarma and M. Sundaralingam, eds.), Vol. 3, p. 9. Adenine, Schenectady, New York, 1988.
22. J. C. Marini, S. D. Levene, D. M. Crothers and P. T. Englund, PNAS 79, 7664 (1982); op. cit. 80, 7678 (correction).
23. H.-M. Wu and D. M. Crothers, Nature 308, 509 (1984).
24. S. D. Levene and D. M. Crothers, JMB 189, 61 (1986).
25. S. D. Levene and D. M. Crothers, JMB 189, 73 (1986).
26. H.-S. Koo, J. Drak, J. A. Rice and D. M. Crothers, Bchem 29, 4227 (1990).
27. D. M. Crothers, J. Drak, J. D. Kahn and S. D. Levene, in “Methods in Enzymology” (D. M. J. Lilley and J. E. Dahlberg, eds.), Vol. 212, p. 3. Academic Press, San Diego, 1992.
28. D. Shore and R. L. Baldwin, JMB 170, 957 (1983).
29. D. Shore, J. Langowski and R. L. Baldwin, PNAS 78, 4833 (1981).
30. E. N. Trifonov and J. L. Sussman, PNAS 77, 3816 (1980).
31. V. B. Zhurkin, J. Biomol. Struct. Dyn. 2, 785 (1985).
32. E. N. Trifonov and L. E. Ulanovsky, in “Unusual DNA Structures” (R. D. Wells and S. C. Harvey, eds.), p. 173. Springer-Verlag, New York, 1988.
33. V. B. Zhurkin, N. B. Ulyanov, A. A. Gorin and R. L. Jernigan, PNAS 88, 7046 (1991).
34. H. C. M. Nelson, J. T. Finch, B. F. Luisi and A. Klug, Nature 330, 221 (1987).
35. H.-S. Koo, H.-M. Wu and D. M. Crothers, Nature 320, 501 (1986).
36. P. J. Hagerman, ARB 59, 755 (1990).
37. P. J. Hagerman, Biopolymers 20, 1503 (1981).
38. K. L. Cairney and R. E. Harrington, Biopolymers 21, 923 (1982).
39. J. A. Schellman, Biopolymers 13, 216 (1974).
40. P. T. McNamara, A. Bolshoy, E. N. Trifonov and R. E. Harrington, J. Biomol. Struct. Dyn. 8, 529 (1990).
41. P. J. Hagerman, PNAS 81, 4632 (1984).
42. S. D. Levene, H.-M. Wu and D. M. Crothers, Bchem 25, 3988 (1986).
43. F. H. C. Crick and A. Klug, Nature 255, 530 (1975).
44. H. M. Sobell, C. Tsai, S. G. Gilbert, S. C. Jain and T. D. Sakore, PNAS 73, 3068 (1976).
45. D. A. Pearlman, S. R. Holbrook, D. H. Pirkle and S.-H. Kim, Science 227, 1304 (1985).
46. M.-J. Pillaire, G. Villani, J.-S. Hoffmann, A.-M. Mazard and M. Defais, NARes 20, 6473 (1992).
47. A. Bhattacharyya, A. I. H. Murchie and D. M. J. Lilley, Nature 343, 484 (1990).
48. S. Schultz and T. A. Steitz, Science 253, 1001 (1991).
49. J. A. McClarin, C. A. Frederick, B.-C. Wang, P. Greene, H. W. Boyer, J. Grable and J. M. Rosenberg, Science 234, 1526 (1986).
50. V. B. Zhurkin, Y. P. Lysov and V. I. Ivanov, NARes 6, 1081 (1979).
51. J. L. Sussman and A. Klug, PNAS 75, 103 (1978).
52. G. S. Manning, Biopolymers 22, 689 (1983).
53. H. Teitelbaum and S. W. Englander, JMB 92, 55; ibid. 92, 79 (1975).
54. C. Mandal, N. R. Kallenbach and S. W. Englander, JMB 135, 391 (1979).
55. N. R. Kallenbach, C. Mandal and S. W. Englander, in “Nucleic Acid Geometry and Dynamics” (R. H. Sarma, ed.). Pergamon, New York, 1980.
56. M. Gueron, M. Kochoyan and J.-L. Leroy, Nature 328, 89 (1987).
57. J. Wilcoxon and J. M. Schurr, Biopolymers 22, 2273 (1983).
58. M. Frank-Kamenetskii, Nature 328, 17 (1987).
59. O. Gotoh and T. Tagashira, Biopolymers 20, 1033 (1981).
60. K. J. Breslauer, R. Frank, H. Blocker and L. A. Marky, PNAS 83, 3746 (1986).
61. S. Cheung, K. Arndt and P. Lu, PNAS 81, 3665 (1984).
62. P. T. McNamara, I. Winicov and R. E. Harrington, BBRC 138, 110 (1986).
63. N. C. Stellwagen, in “Structure and Expression” (W. K. Olson, M. H. Sarma, R. H. Sarma and M. Sundaralingam, eds.), Vol. 3, p. 69. Adenine, Schenectady, New York, 1988.
64. S. Zinkel and D. M. Crothers, Nature 328, 178 (1987).
65. S. Zinkel and D. M. Crothers, Biopolymers 29, 29 (1990).
66. R. G. Brennan, S. L. Roderick, Y. Takeda and B. W. Matthews, PNAS 87, 8165 (1990).
67. A. Bolshoy and E. N. Trifonov, “Abstract 7 and Report to the EMBO Workshop on Auxiliary Binding Proteins in Prokaryotes and Eukaryotes,” Ein Gedi, Israel, May 21-25, 1990.
68. B. H. Zimm and S. D. Levene, Q. Rev. Biophys. 25, 171 (1992).
69. R. E. Harrington, Electrophoresis 14, 732 (1993).
70. R. K.-Z. Tan and S. C. Harvey, J. Biomol. Struct. Dyn. 5, 497 (1987).
71. H. R. Drew and A. A. Travers, JMB 186, 733 (1985).
72. D. M. Crothers and T. A. Steitz, in “Transcriptional Regulation,” p. 501. CSHLab, Cold Spring Harbor, New York, 1992.
73. A. Lipanov, M. L. Kopka, M. Kaczor-Grzeskowiak, J. Quintana and R. E. Dickerson, Bchem 32, 1373 (1993).
74. Y. L. Lyubchenko, L. S. Shlyakhtenko, B. Chernov and R. E. Harrington, PNAS 88, 5331 (1991).
75. L. Ulanovsky, M. Bodner, E. N. Trifonov and M. Choder, PNAS 83, 862 (1986).
76. K. Zahn and F. R. Blattner, Science 236, 416 (1987).
77. A. M. Barber and V. B. Zhurkin, J. Biomol. Struct. Dyn. 8, 213 (1990).
78. Y. L. Lyubchenko, L. S. Shlyakhtenko, E. Appella and R. E. Harrington, Bchem 32, 4121 (1993).
79. Y. Takeda, A. Sarai and V. M. Rivera, PNAS 86, 439 (1989).
80. W. Kabsch, C. Sander and E. N. Trifonov, NARes 10, 1097 (1982).
81. L. S. Shlyakhtenko, Y. L. Lyubchenko, B. K. Chernov and V. B. Zhurkin, Mol. Biol. 24, 79 (1990).
82. F. Aboul-ela, D. Koh, I. Tinoco and F. Martin, NARes 13, 4811 (1985).
83. D. J. Patel, S. A. Kozlowski, S. Ikuta and K. Itakura, Fed. Proc. 43, 2663 (1984).
84. C.-H. Hsieh and J. D. Griffith, PNAS 86, 4833 (1989).
85. A. Bhattacharyya and D. M. J. Lilley, NARes 17, 6821 (1989).
86. Y. Timsit, E. Vilbois and D. Moras, Nature 354, 167 (1992).
87. A. Sarai and Y. Takeda, PNAS 86, 6513 (1989).
88. M. E. Donlan and P. Lu, NARes 20, 525 (1992).
89. V. P. Chuprina, NARes 15, 293 (1987).
90. E. von Kitzing and S. Diekmann, Eur. Biophys. J. 15, 13 (1987).
91. A. V. Fratini, M. L. Kopka, H. R. Drew and R. E. Dickerson, JBC 257, 14686 (1982).
92. M. R. Gartenberg and D. M. Crothers, Nature 333, 824 (1988).
93. D. M. J. Lilley, Nature 354, 356 (1991).
94. W. F. Anderson, D. H. Ohlendorf, Y. Takeda and B. W. Matthews, Nature 290, 754 (1981).
95. S. C. Harrison and A. K. Aggarwal, ARB 59, 933 (1990).
96. A. Mondragon, C. Wolberger and S. C. Harrison, JMB 205, 179 (1989).
97. A. Mondragon, S. Subbiah, S. C. Almo, M. Drottar and S. C. Harrison, JMB 205, 189 (1989).
98. R. W. Schevitz, Z. Otwinowski, A. Joachimiak, C. L. Lawson and P. B. Sigler, Nature 327, 782 (1985).
99. L. J. Beamer and C. O. Pabo, JMB 227, 177 (1992).
100. A. Mondragon and S. C. Harrison, JMB 219, 321 (1991).
101. A. K. Aggarwal, D. W. Rodgers, M. Drottar, M. Ptashne and S. C. Harrison, Science 242, 899 (1988).
102. Z. Otwinowski, R. W. Schevitz, R.-G. Zhang, C. L. Lawson, A. Joachimiak, R. Q. Marmorstein, B. F. Luisi and P. B. Sigler, Nature 335, 321 (1988).
103. C. Wolberger, Y. Dong, M. Ptashne and S. C. Harrison, Nature 335, 789 (1988).
104. S. R. Jordan and C. O. Pabo, Science 242, 893 (1988).
105. L. J. Beamer and C. O. Pabo, JMB 227, 177 (1992).
106. M. Ptashne, “A Genetic Switch.” Cell Press, Cambridge, Massachusetts, 1986.
107. N. D. Clarke, L. J. Beamer, H. R. Goldberg, C. Berkower and C. O. Pabo, Science 254, 267 (1991).
108. A. D. Johnson, A. R. Poteete, G. Lauer, R. T. Sauer, G. K. Ackers and M. Ptashne, Nature 294, 217 (1981).
109. P. Ptashne, Cancer 67, 2422 (1991).
110. M. Ptashne, A. Jeffrey, A. D. Johnson, R. Maurer, B. J. Meyer, C. O. Pabo, T. M. Roberts and R. T. Sauer, Cell 19, (1980).
111. B. de Crombrugghe, S. Busby and H. Buc, in “Biological Regulation and Development” (R. F. Goldberger and K. R. Yamamoto, eds.), p. 129. Plenum, New York, 1984.
112. H.-N. Liu-Johnson, M. R. Gartenberg and D. M. Crothers, Cell 47, 995 (1986).
113. L. Bracco, D. Kotlarz, A. Kolb, S. Diekmann and H. Buc, EMBO J. 8, 4289 (1989).
114. M. R. Gartenberg and D. M. Crothers, JMB 219, 217 (1991).
115. C. L. Lawson and P. B. Sigler, Nature 233, 869 (1988).
116. R.-G. Zhang, A. Joachimiak, C. L. Lawson, R. W. Schevitz, Z. Otwinowski and P. B. Sigler, Nature 327, 591 (1987).
117. A. A. Kuramoto, W. G. Miller and R. P. Gunsalus, Genes Dev. 1, 556 (1987).
118. S. Bass, P. Sugiono, D. N. Arvidson, R. P. Gunsalus and P. Youderian, Genes Dev. 1, 565 (1987).
119. J. Carey, PNAS 85, 975 (1988).
120. M. P. Scott, J. W. Tamkun and G. W. Hartzell, BBA 989, 25 (1989).
121. M. Affolter, A. Schier and W. J. Gehring, Curr. Opin. Cell Biol. 2, 485 (1990).
122. C. Wolberger, A. K. Vershon, B. Liu, A. D. Johnson and C. O. Pabo, Cell 67, 517 (1991).
123. G. Otting, Y. Q. Qian, M. Billeter, M. Muller, M. Affolter, W. J. Gehring and K. Wuthrich, EMBO J. 9, 3085 (1990).
124. C. R. Kissinger, B. Liu, E. Martin-Blanco, T. B. Kornberg and C. O. Pabo, Cell 63, 579 (1990).
125. M. Muller, M. Affolter, W. Leupin, G. Otting, K. Wuthrich and W. J. Gehring, EMBO J. 7, 4299 (1988).
126. M. Affolter, A. Percival-Smith, M. Muller, W. Leupin and W. J. Gehring, PNAS 87, 4093 (1990).

STRUCTURAL FLEXIBILITY IN

DNA-PROTEIN INTERACTIONS

267

127. Y. Q. Qian, M. Billeter, G. Otting, M. Muller, W. J. Gehring and K. Wuthrich, Cell 59, 573 (1989). 128. C. A. Keleher, C. Goutte and A._DLJohnson, Cell 53, 927 (1988). 129. D. R. Engelke, S. Y. Ng, B. S., Shastry and R. G. Roeder, Cell 19, 717 (1980). 130. S. Sakonju, D. F. Bogenhagen and D. D. Brown, Cell 19, 13 (1980). 131. D. F. Bogenhagen, S. Sakonju and D. D. Brown, Cell 19, 27 (1980). 132. J. Miller, A. D. McLachlan and A. Klug, EMBO J. 4 , 1609 (1985). 133. K. Struhl, TZBS 14, 136 (1989). 134. J. M. Berg, Annu. Reu. Biophys. Biophys. Chem. 19, 405 (1990). 135. L. Fairall, D. Rhodes and A. Klug, ] M B 192, 577 (1986). 136. M. E. Churchill, T. D. Tullius and A. Klug, PNAS 87, 5528 (1990). 137. A. Klug and D. Rhodes, TZBS 12, 464 (1987). 138. J. Berg, PNAS 85, 99 (1988). 139. K. E. Vrana, M. I. A. Churchill, ' I D. Tullius and D. D. Brown, MCBiol8, 1684 (1988). 140. N. P. Pavletich and C. 0. Pabo, Science 252, 809 (1991). 141. J. J. Hayes and K. R. Clernens, Bchem 31, 11600 (1992). 142. X. Liao, K. R. Clemens, L. Tennant, P. E. Wright and J. M. Gottesfeld, JMB 223, 857 (1992). 143. K. R. Clemens, X. Liao, V. Wolf, P. E. Wright and J. M. Gottesfeld, PNAS 89, 10822 (1992). 144. K. R. Clemens, V. Wolf, S. J. McBryant, P. Zhang, X. Liao, P. E. Wright and J. M. Gottesfeld, Science 260, 530 (1993). 145. J. J. Hayes and T. D. Tullius, J M B 227, 407 (1992). 146. D. P. Bazett-Jones and M. L. Brown, MCBioZ 9, 336 (1989). 147. G. P. Schroth, G. R. Cook, E. M. Bradbury and J. M. Gottesfeld, Nature 340,487 (1989). 148. C. Zweib and R. S. Brown, NARes 18, 583 (1990). 149. G. P. Schroth, J. M. Gottesfeld and E. M. Bradbury, NARes 19, 511 (1991). 150. N. B. Ulyanov and V. B. Zhurkin, J. Biomol. Struct. Dyn. 2, 361 (1984). 151. D. Rhodes, E M B O J . 4, 3473 (1985). 152. Y. Y. Xing and A. Worcel, MCBiol9, 499 (1989). 153. J. M. Gottesfeld, J. Blanco and L. L. Tenant, Nature 329, 460 (1987). 154. D. Rau, personal communication (1993). 155. B. Christy and D. 
Nathans, PNAS 86, 8737 (1989). 156. M. Beato, Cell 56, 335 (1989). 157. R. E. Klevit, J. R. Heriott, and S. J. Horvath, Proteins: Struct., Funct. Genet. 7, 215 (1990). 158. M. Johnson, Microbiol. Aeu. 51, 458 (1987). 159. J. S. Flick and M. Johnson, MCBiol 10, 4757 (1990). 160. R. Marmorstein, M. Carey, M. Ptashne and S. C. Harrison, Nature 356, 408 (1992). 161. M. J. Fedor, N. F. Lue and R. D. Kornberg, J M B 204, 109 (1988). 162. J. D. Baleja, R. Marmorstein, S. C. Harrison and G. Wagner, Nature 356, 450 (1992). 163. P. J. Kraulis, A. R. C. Raine, P. L. Gadhavi and E. D. Laue, Nature 356, 448 (1992). 164. R. M. Evans, Science 242, 889 (1988). 165. J. W. R. Schwabe and D. Rhodes, TlBS 16, 291 (1991). 166. T. Hard, E. Kellenbach, R. Boelens, B. A. Maler, K. Dahlman, L. P. Freedman, J. Carlstedt-Duke, K. R. Yamamoto, J. A. Gustafsson and R. Kaptein, Science 249, 157 (1990). 167. B. F. Luisi, W. X. Xu, Z. Otwinowski, L. P. Freedman, K. R. Yamamoto and P. B. Sigler, Nature 352, 497 (1991).
168. G. Klock, U. Strahle and G. Schutz, Nature 329, 734 (1987).
169. B. M. Forman and H. H. Samuels, Mol. Endocrinol. 4, 1293 (1990).
170. D. Pearce and K. R. Yamamoto, Science 259, 1161 (1993).
171. J. W. Funder, Science 259, 1132 (1993).
172. C. Carlberg, I. Bendik, A. Wyss, E. Meier, L. J. Sturzenbecker, J. F. Grippo and W. Hunziker, Nature 361, 657 (1993).
173. S. Green, Nature 361, 590 (1993).
174. P. K. Vogt and T. J. Bos, TIBS 14, 172 (1989).
175. T. Hai and T. Curran, PNAS 88, 3720 (1991).
176. T. K. Kerppola and T. Curran, Science 254, 1210 (1991).
177. P. Angel, E. Hattori, T. Smeal and M. Karin, Cell 55, 875 (1988).
178. T. K. Kerppola and T. Curran, Cell 66, 317 (1991).
179. E. M. Blackwood, L. Kretzner, T. K. Blackwell, H. Weintraub and R. N. Eisenman, “Origins of Human Cancer: A Comprehensive Review,” p. 3665. CSHLab, Cold Spring Harbor, New York, 1991.
180. T. K. Blackwell, L. Kretzner, E. M. Blackwood, R. N. Eisenman and H. Weintraub, Science 250, 1149 (1990).
181. E. Kerkhoff, K. Bister and K.-H. Klempnauer, PNAS 88, 4323 (1991).
182. E. M. Blackwood and R. N. Eisenman, Science 251, 1211 (1991).
183. E. M. Blackwood, B. Luscher and R. N. Eisenman, Genes Dev. 6, 71 (1992).
184. D. Reisman, N. B. Elkind, B. Roy, J. Beamon and V. Rotter, Cell Growth Differ. 4, 57 (1993).
185. M. Sawadogo, M. W. Van Dyke, P. D. Gregor and R. G. Roeder, JBC 263, 11985 (1988).
186. H. Beckmann, L.-K. Su and T. Kadesch, Genes Dev. 4, 167 (1990).
187. P. M. Gilmartin, L. Sarokin, J. Memelink and N.-H. Chua, Plant Cell 2, 369 (1990).
188. T. E. Ellenberger, C. J. Brandl, K. Struhl and S. C. Harrison, Cell 71, 1223 (1992).
189. A. R. Oliphant, C. J. Brandl and K. Struhl, MCBiol 9, 2944 (1989).
190. J. W. Sellers and K. Struhl, Nature 341, 74 (1989).
191. G. Risse, K. Jooss, M. Neuberg, H. J. Bruller and R. Muller, EMBO J. 8, 3825 (1989).
192. S. Buratowski, S. Hahn, L. Guarente and P. A. Sharp, Cell 56, 549 (1989).
193. B. D. Dynlacht, T. Hoey and R. Tjian, Cell 66, 563 (1991).
194. B. F. Pugh and R. Tjian, Genes Dev. 5, 1935 (1991).
195. J. Ham, G. Steger and M. Yaniv, FEBS Lett. 307, 81 (1992).
196. H. A. Nash and A. E. Granston, Cell 67, 1037 (1991).
197. P. W. J. Rigby, Cell 72, (1993).
198. L. Comai, N. Tanese and R. Tjian, Cell 68, 965 (1992).
199. D. B. Nikolov, S.-H. Hu, J. Lin, A. Gasch, A. Hoffmann, M. Horikoshi, N.-H. Chua, R. G. Roeder and S. K. Burley, Nature 360, 40 (1992).
200. D. B. Starr and D. K. Hawley, Cell 67, 1231 (1991).
201. D. K. Lee, M. Horikoshi and R. G. Roeder, Cell 67, 1241 (1991).
202. Y. Nakatani, M. Horikoshi, M. Brenner, T. Yamamoto, F. Besnard, R. G. Roeder and E. Freese, Nature 348, 86 (1990).
203. R. Breathnach and P. Chambon, ARB 50, 349 (1981).
204. I. Witt, X. Straub, N. F. Kaufer and T. Gross, EMBO J. 12, 1201 (1993).
205. J. J. Lenardo and D. Baltimore, Cell 58, 227 (1989).
206. R. Sen and D. Baltimore, Cell 46, 705 (1986).
207. M. Kieran, V. Blank, F. Logeat, J. Vandekerckhove, F. Lottspeich, O. Le Bail, M. B. Urban, P. Kourilsky, P. A. Baeuerle and A. Israel, Cell 62, 1007 (1990).
208. S. Ghosh, A. M. Gifford, L. R. Riviere, P. Tempst, G. P. Nolan and D. Baltimore, Cell 62, 1019 (1990).
209. M. Atchison and R. P. Perry, Cell 48, 121 (1987).
210. F. Siddiqui, R. Gaynore, A. Srinivasan, J. Mapoles and R. W. Farr, Virology 169, 479 (1989).
211. G. Nabel and D. Baltimore, Nature 326, 711 (1987).
212. U. Zabel and P. A. Baeuerle, Cell 61, 255 (1990).
213. M. B. Urban and P. A. Baeuerle, Genes Dev. 4, 1975 (1990).
214. M. B. Urban and P. A. Baeuerle, New Biologist 3, 279 (1991).
215. E. Appella, personal communication (1993).
216. R. Schreck, H. Zorbas, E.-L. Winnacker and P. A. Baeuerle, NARes 18, 6498 (1990).
217. P. A. Baeuerle and D. Baltimore, Genes Dev. 3, 1689 (1989).
218. L. Wang, J. D. Helmann and S. C. Winans, Cell 69, 659 (1992).
219. M. Serrano, C. Gutierrez, M. Salas and J. M. Hermoso, JMB 230, 248 (1993).
220. P. M. Gilmartin, L. Sarokin, J. Memelink and N.-H. Chua, Plant Cell 2, 369 (1990).
221. C. Kuhlemeier, Plant Mol. Biol. 19, 1 (1992).
222. F. Katagiri and N.-H. Chua, Trends Genet. 8, 22 (1992).
223. J. Ma, E. Przibilla, J. Hu, L. Bogorad and M. Ptashne, Nature 334, 631 (1988).
224. G. Giuliano, E. Pichersky, V. S. Malik, M. P. Timko, P. A. Scolnik and A. R. Cashmore, PNAS 85, 7089 (1988).
225. A. J. DeLisle and R. J. Ferl, Plant Cell 2, 547 (1990).
226. U. Schindler and A. R. Cashmore, EMBO J. 9, 3415 (1990).
227. U. Schindler, A. E. Menkens, H. Beckmann, J. R. Ecker and A. R. Cashmore, EMBO J. 11, 1261 (1992).
228. T. Tabata, H. Takase, S. Takayama, K. Mikami, A. Nakatsuka, T. Kawata, T. Nakayama and M. Iwabuchi, Science 245, 965 (1989).
229. K. Oeda, J. Salinas and N.-H. Chua, EMBO J. 10, 1793 (1991).
230. B. Weisshaar, G. A. Armstrong, A. Block, O. Costa e Silva and K. Hahlbrock, EMBO J. 10, 1777 (1991).
231. M. J. Guiltinan, W. R. Marcotte and R. S. Quatrano, Science 250, 267 (1990).
232. W. H. Landschulz, P. F. Johnson and S. L. McKnight, Science 240, 1759 (1988).
233. C. R. Vinson, P. B. Sigler and S. L. McKnight, Science 246, 911 (1989).
234. J. C. Hu, E. K. O’Shea, P. S. Kim and R. T. Sauer, Science 250, 1400 (1990).
235. R. Rasmussen, D. Benvegnu, E. O’Shea, P. S. Kim and T. Alber, PNAS 88, 561 (1991).
236. R. V. Talanian, C. J. McKnight and P. S. Kim, Science 249, 769 (1990).
237. K. T. O’Neil, J. D. Shuman, C. Ampe and W. F. DeGrado, Bchem 30, 9030 (1991).
238. U. Schindler, W. Terzaghi, H. Beckmann, T. Kadesch and A. R. Cashmore, EMBO J. 11, 1275 (1992).
239. U. Schindler, H. Beckmann and A. R. Cashmore, Plant Cell 4, 1309 (1992).
240. M. E. Williams, R. Foster and N.-H. Chua, Plant Cell 4, 485 (1992).
241. G. A. Armstrong, B. Weisshaar and K. Hahlbrock, Plant Cell 4, 525 (1992).
242. J. Salinas, K. Oeda and N.-H. Chua, Plant Cell 4, 1485 (1992).
243. R. J. Schmidt, M. Ketudat, M. J. Aukerman and G. Hoschek, Plant Cell 4, 689 (1992).
244. L. D. Pysh, M. J. Aukerman and R. J. Schmidt, Plant Cell 5, 227 (1993).
245. K. Singh, E. S. Dennis, J. G. Ellis, D. J. Llewellyn, J. G. Tokuhisa, J. A. Wahleithner and W. J. Peacock, Plant Cell 2, 891 (1990).
246. F. Katagiri, E. Lam and N.-H. Chua, Nature 340, 727 (1989).
247. E. Lam, Y. Kano-Murakami, P. Gilmartin, B. Niner and N.-H. Chua, Plant Cell 2, 857 (1990).
248. T. Evans and G. Felsenfeld, Cell 5, 877 (1989).
249. M. Yamamoto, L. J. Ko, M. W. Leonard, H. Beug, S. H. Orkin and J. D. Engel, Genes Dev. 4, 1650 (1990).
250. E. Lam and N.-H. Chua, Plant Cell 1, 1147 (1989).
251. R. G. K. Donald and A. R. Cashmore, EMBO J. 9, 1717 (1990).
252. I. Winicov, Plant Physiol. 102, 681 (1993).
253. P. S. Freemont, I. M. Hanson and J. Trowsdale, Cell 64, 483 (1991).
254. Y. Haupt, W. S. Alexander, G. Barri, S. P. Klinken and J. M. Adams, Cell 65, 753 (1991).
255. H. Takatsuji, M. Mori, P. N. Benfey, L. Ren and N.-H. Chua, EMBO J. 11, 241 (1992).
256. X.-W. Deng, M. Matsui, N. Wei, D. Wagner, A. M. Chu, K. A. Feldmann and P. H. Quail, Cell 71, 791 (1992).
257. N. Datta and A. R. Cashmore, Plant Cell 1, 1069 (1989).
258. L. Sun, H. A. Doxsee, E. Harel and E. M. Tobin, Plant Cell 5, 109 (1993).
259. Y. Wang, W. Zhang, J. Cao, D. McElroy and R. Wu, MCBiol 12, 3399 (1992).
260. B. G. Forde, J. Freeman, J. E. Oliver and M. Pineda, Plant Cell 2, 925 (1990).
261. B. A. Metz, P. Welters, H. J. Hoffmann, E. O. Jensen, J. Schell and F. J. de Bruijn, MGG 214, 181 (1988).
262. K. D. Jofuku, J. K. Okamuro and R. B. Goldberg, Nature 238, 734 (1987).
263. K. Jacobsen, N. B. Laursen, E. O. Jensen, A. Marcker, C. Poulsen and K. A. Marcker, Plant Cell 2, 85 (1990).
264. J. C. Cushman and H. J. Bohnert, Plant Mol. Biol. 20, 411 (1992).
265. E. Lam and N.-H. Chua, Science 248, 471 (1990).
266. O. Perisic and E. Lam, Plant Cell 4, 831 (1992).
267. P. M. Gilmartin, J. Memelink, K. Hiratsuka, S. A. Kay and N.-H. Chua, Plant Cell 4, 839 (1992).
268. K. Dehesh, W. B. Bruce and P. H. Quail, Science 250, 1397 (1990).
269. K. Dehesh, H. lung, J. M. Tepperman and P. H. Quail, EMBO J. 11, 4131 (1992).
270. L. A. Wanner and W. Gruissem, Plant Cell 3, 1289 (1991).
271. T. Manzara, P. Carrasco and W. Gruissem, Plant Cell 3, 1305 (1991).
272. B. Piechulla, J.-W. Kellman, E. Pichersky, E. Schwartz and H.-H. Forster, MGG 230, 413 (1991).
273. R. M. Myers, K. Tilly and T. Maniatis, Science 232, 613 (1986).
274. A. Cowie and R. M. Myers, MCBiol 8, 3122 (1988).
275. L. Stuve and R. M. Myers, MCBiol 10, 972 (1990).
276. W. B. Bruce, X.-W. Deng and P. H. Quail, EMBO J. 10, 3015 (1991).
277. T. Evans, G. Felsenfeld and M. Reitman, Annu. Rev. Cell Biol. 6, 95 (1990).
278. S. Orkin, Cell 63, 665 (1990).
279. E. deBoer, M. Antoniou, V. Mignotte, L. Wall and F. Grosveld, EMBO J. 7, 4203 (1988).
280. K. M. Barnhart, C. G. Kim and M. Sheffery, MCBiol 9, 2606 (1989).
281. R. M. Myers, A. Cowie, L. Stuve, G. Hartzog and K. Gaensler, in “Hemoglobin Switching, Part A: Transcriptional Regulation” (G. Stamatoyannopoulos and A. Nienhuis, eds.), p. 117. Liss, New York, 1989.
282. M. Plumb, J. Frampton, H. Wainwright, M. Walker, K. MacLeod, G. Goodwin and P. Harrison, NARes 17, 73 (1989).
283. G. A. Hartzog and R. M. Myers, MCBiol 13, 44 (1993).
284. I. J. Miller and J. J. Bieker, MCBiol 13, 2776 (1993).
285. K. McLeod and M. Plumb, MCBiol 11, 4324 (1991).
285a. L. L. Stuve and R. M. Myers, MCBiol 13, 4311 (1993).
286. E. Tobin and J. Silverthorne, Annu. Rev. Plant Physiol. 36, 569 (1985).
287. P. H. Quail, ARGen 25, 389 (1991).
288. W. B. Bruce and P. H. Quail, Plant Cell 2, 1081 (1990).