Transcription factor structure and DNA binding
Cynthia Wolberger
The Johns Hopkins University School of Medicine, Baltimore, USA

Within the past year, structures have been determined for five transcription factors, each of which is a member of a different structural class. Papilloma virus E2 and the TFIID TATA-binding protein represent novel and unanticipated DNA-binding motifs. The GAL4 protein contains a new kind of metal-binding domain, GCN4 is a basic-region leucine zipper protein, and MATα2 is a member of the homeodomain family. With the exception of TFIID, the structures of the proteins bound to DNA have been determined, revealing how each of the folds is used to recognize specific DNA sequences.

Current Opinion in Structural Biology 1993, 3:3-10
Introduction
Cellular phenomena such as differentiation, response to extracellular signals, and cell growth are controlled at the most fundamental level by transcription factors: proteins that bind to DNA and regulate mRNA transcription and, hence, gene expression. The great majority of transcription factors identified so far are sequence-specific DNA-binding proteins. Highly conserved DNA-binding and dimerization domains have been identified on the basis of structural studies and amino acid sequence comparisons. Each structural motif represents, in part, the evolution of a different solution to the problem of how to design a protein whose surface is structurally and chemically complementary to its target DNA site. Two recent reviews have focussed on the different structural families of transcription factors and on advances in understanding their interaction with DNA [1•,2•]. Work published within the past year has significantly expanded the repertoire of known transcription factor structural motifs and our understanding of how they bind to DNA. This review covers recent structural studies of five eukaryotic transcription factors: the yeast proteins GAL4 [3••], MATα2 [4•] and GCN4 [5••], papilloma virus E2 [6••], and the TFIID TATA-binding protein from Arabidopsis thaliana [7••]. Structures of DNA-bound domains have been determined for all but the last of these. Each protein is a member of a different structural family and hence differs in its mode of DNA binding.
GAL4: a new zinc-binding domain
The DNA-binding domain of the yeast transcriptional activator GAL4 comprises its 65 amino-terminal residues and retains the binding specificity and the ability to dimerize on DNA of the intact 881-amino-acid protein. The DNA-binding region of GAL4 had previously been shown to comprise a metal-binding domain containing two Zn2+ ions coordinated by six cysteine residues, which are conserved in eight other related fungal proteins [8,9]. On the basis of these studies and of sequence data, it was clear that GAL4 was a metal-coordinating transcription factor distinct from the TFIIIA-like zinc fingers, the steroid receptors, and the retroviral gag proteins (for a review, see [10]). The three-dimensional structure of this DNA-binding domain has now been revealed by the X-ray crystal structure of a GAL4-DNA complex [3••] and by two solution NMR studies of the uncomplexed DNA-binding domain [11•,12•], one of which used Cd-substituted protein [12•]. All three studies show that residues 8-40 of GAL4, the core of the DNA-binding domain, form a compact globular domain folded around two metal ions. The Zn ions are tetrahedrally coordinated by six cysteine residues, forming a 'binuclear cluster' reminiscent of the metal coordination observed in metallothionein [13,14]. The polypeptide chain contains two α-helices, each followed by an extended strand. These two helix-strand motifs are structurally similar and are related by a dyad that bisects the core of the domain and also relates the two metal ions. (A further discussion of the GAL4 metal-binding domain, and its relationship to the other classes of metal-binding domains, is to be found in a review by Berg (pp 11-16) in this issue.) In the 2.7 Å crystal structure of a GAL4(1-65)-DNA complex, the metal-binding domain mediates sequence-specific contacts with the bases, whereas carboxy-terminal residues form a dimerization element. The GAL4 dimer binds to the 19 bp oligonucleotide in the crystals by inserting its two metal-binding domains into the major groove on opposite faces of the DNA, approximately 1½ turns of the DNA helix apart (Fig. 1a).
Abbreviations
bZIP, basic-region leucine zipper; TBP, TATA-box-binding protein.
© Current Biology Ltd ISSN 0959-440X
These two domains are joined by a two-stranded coiled coil (residues 50-64), formed by the association of one amphipathic α-helix from each monomer. Each helix is connected to a metal-binding domain by an extended linker (residues 41-49). The two helices of the dimerization element are held together by hydrophobic side chains at the interface and by two inter-helix salt bridges. The axis of the coiled coil coincides with the dyad of the DNA sequence and is oriented such that the positive ends of the helix dipoles point towards the negatively charged DNA backbone. The dimerization element apparently does not form in the absence of DNA: an NMR study of the same protein fragment, GAL4(1-65), in the absence of DNA shows a high degree of conformational flexibility in residues outside the metal-binding domain and no dimerization [12•].
The oligonucleotide in the crystal contains a 17 bp palindromic consensus binding site. The metal-binding domains are centered over conserved CCG triplets (base pairs 6-8, with base pair 0 defined as the central base pair) in the major grooves. Hydrogen-bond contacts with these bases are formed by two residues, Lys17 and Lys18 (Fig. 1b). Most strikingly, two of these three base pairs are contacted by main-chain carbonyl groups: the backbone carbonyls of Lys17 and Lys18 each accept a hydrogen bond from the N4 amino group of a cytosine. This implies that the binding preference at these operator positions cannot be changed by simple side-chain substitution. Although a contact between a peptide carbonyl and a cytosine had previously been observed in λ repressor [15], this is the first known instance in which two-thirds of the contacted base pairs are specified by main-chain atoms.
Additional binding energy, and possibly some sequence specificity, is provided by five residues from each monomer, which form hydrogen bonds through their side chains and peptide NH groups with five adjacent phosphate groups. These residues (Fig. 1c) are located in the metal-binding domain, the linker and the dimerization element, thereby anchoring the entire GAL4 protein on its binding site. The ability of GAL4 to recognize its binding site derives from the peptide contacts to the dyad-related CCG triplets and from the precise requirement for an 11 bp spacer between the two triplets. The spacing is dictated by the length of the linker region and the conformation of the dimerization element. The extended conformation of the protein dimer and the lack of major contacts outside the metal-binding domains leave almost a full turn of DNA with its major groove exposed at the center of the operator. Although base substitutions in this region have little effect on GAL4 binding in vitro, these bases are conserved in all known in vivo GAL4 binding sites, raising the possibility that the central portion of the operator serves as a binding site for a second protein. Cooperative interactions with such an accessory protein could contribute significantly to the overall specificity of binding through protein-protein interactions and additional DNA contacts.
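The recognition rule described above (two dyad-related CCG triplets separated by an 11 bp spacer, 17 bp in all) can be expressed as a simple sequence filter. The sketch below is only an illustration, not a model of binding energetics. It assumes the consensus is written CGG-N11-CCG on one strand (CGG being the left-hand CCG triplet read on the top strand), ignores the spacer bases entirely, and uses hypothetical helper names (find_gal4_like_sites, reverse_complement).

```python
"""Minimal scan for GAL4-like sites: CGG, an 11 bp spacer, CCG (17 bp total).

Assumption: only the two triplets are scored; spacer bases are ignored,
in keeping with the weak in vitro effect of spacer substitutions.
"""

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
SPACER_BP = 11
SITE_BP = 3 + SPACER_BP + 3  # 17 bp


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_gal4_like_sites(seq: str) -> list[int]:
    """Return 0-based start positions of CGG-N11-CCG matches in seq."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - SITE_BP + 1):
        left = seq[start:start + 3]
        right = seq[start + 3 + SPACER_BP:start + SITE_BP]
        if left == "CGG" and right == "CCG":
            hits.append(start)
    return hits


# Hypothetical probe: two flanking bases, a 17 bp site, two flanking bases.
probe = "AA" + "CGG" + "A" * SPACER_BP + "CCG" + "TT"
print(find_gal4_like_sites(probe))                      # [2]
# The pattern is dyad-symmetric, so the opposite strand matches at the same offset.
print(find_gal4_like_sites(reverse_complement(probe)))  # [2]
# A 10 bp spacer is rejected, reflecting the strict spacing requirement.
print(find_gal4_like_sites("CGG" + "A" * 10 + "CCG"))   # []
```

Because the pattern is its own reverse complement, scanning one strand is enough; a real search would also need to weigh the flanking and spacer preferences that this sketch deliberately leaves out.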
GCN4: a bZIP transcription factor
The yeast transcriptional activator GCN4 contains a basic-region leucine zipper (bZIP), a conserved DNA-binding and dimerization element that has been found in a number of eukaryotic transcription factors (for a review, see [16]). The leucine zipper consists of interacting α-helices that contain a heptad repeat of leucine residues [17]. That these helices, one from each monomer, associate to form a parallel coiled coil was confirmed by the structure determination of the 33-residue GCN4 leucine zipper [18], which revealed the details of the interactions that stabilize the coiled coil. In addition to the leucine zipper, bZIP proteins contain residues amino-terminal to the leucine-zipper helices that are important for DNA binding. Spectroscopic studies have shown that this basic region is disordered in the absence of DNA and adopts an α-helical conformation only upon interaction with a DNA binding site [19,20]. The recent 2.9 Å crystal structure of the entire bZIP element bound to DNA [5••] shows a dimer of the 56-amino-acid GCN4 bZIP element (residues 226-281 of the intact protein) bound to a 20 bp synthetic oligonucleotide containing the AP-1 binding site. This protein fragment contains an additional 24 residues amino-terminal to the coiled-coil dimerization region, whose structure was previously determined ([18], reviewed in [16]). Each GCN4 monomer consists of a single continuous α-helix of 52 residues. The helices in the protein dimer pack closely at their carboxy-terminal ends, interacting via extensive hydrophobic contacts and forming a coiled coil essentially identical to that observed in the structure of the 33-residue leucine-zipper peptide [18].
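A heptad repeat means that leucines recur every seven residues, at the position conventionally labeled d in a heptad abcdefg. The sketch below finds the register that places the most leucines at every seventh residue and labels each residue with its heptad position. It assumes the sequence of the 33-residue GCN4 leucine-zipper peptide as it is commonly given (GCN4-p1, numbered from 1 rather than by position in the intact protein); treat that sequence as illustrative and check it against [18]. The function names are hypothetical.

```python
"""Locate a leucine heptad repeat (leucines at heptad position d)."""

HEPTAD = "abcdefg"

# Assumed sequence of the 33-residue GCN4 leucine-zipper peptide (GCN4-p1),
# numbered 1-33 here; verify against the original structure paper.
GCN4_P1 = "RMKQLEDKVEELLSKNYHLENEVARLKKLVGER"


def best_leucine_register(seq: str) -> tuple[int, list[int]]:
    """Return (offset, positions) for the register with most Leu every 7th residue.

    offset is the 0-based index of the first 'd' position; positions are 1-based.
    """
    best_offset, best_hits = 0, []
    for offset in range(7):
        hits = [i + 1 for i in range(offset, len(seq), 7) if seq[i] == "L"]
        if len(hits) > len(best_hits):
            best_offset, best_hits = offset, hits
    return best_offset, best_hits


def heptad_labels(seq: str, d_offset: int) -> str:
    """Label every residue with its heptad position, given where 'd' falls."""
    # 'd' is index 3 in "abcdefg", so position 'a' lies three residues earlier.
    return "".join(HEPTAD[(i - d_offset + 3) % 7] for i in range(len(seq)))


offset, leucines = best_leucine_register(GCN4_P1)
print(leucines)  # [5, 12, 19, 26]
print(GCN4_P1)
print(heptad_labels(GCN4_P1, offset))
```

Printing the sequence above its labels makes the repeat visible: four leucines fall at d, while the residues at a (the other interface position of the coiled coil) are mostly hydrophobic as well.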
The dimerization region is oriented nearly perpendicular to the DNA, whereas the portion of each helix amino-terminal to the dimerization region follows a gentle, smooth curve that splays the two helices apart, permitting the dimer to grasp the DNA like a pair of chopsticks (Fig. 2a). The GCN4 helices lie nearly parallel to the major groove, one on each side of the DNA helix, and extend beyond the point of contact with the DNA. This mode of binding confirms the essential features of the 'induced helical fork' model proposed by O'Neil et al. [21], and not the alternative 'scissors grip' model, which predicted kinked α-helices and more pronounced helical bending [22]. Contacts with the DNA are mediated by residues in the basic region of each α-helix, which lies in the major groove. The large set of protein-DNA contacts (Fig. 2b), involving side chains in three turns of the α-helix, must serve to stabilize helix formation upon DNA binding. The nine base pairs contacted consist of two ATGA half-sites related by a dyad that passes through an intervening G·C base pair. The asymmetry of this central base pair apparently gives rise to a slight asymmetry in the protein-DNA contacts and, hence, a slight displacement of the protein dimer to one side of the pseudo-dyad. Symmetric base contacts are made to the four bases in each half-site by four side chains: Asn235, Ala238, Ala239 and Ser242. In addition, Lys231 forms a water-mediated hydrogen bond to A4 in only one half-site. Other asymmetric contacts are formed by Arg243 of each monomer: one hydrogen bonds to the cytosine and the other to the guanine of the intervening base pair at the operator dyad. GCN4 forms additional contacts with phosphates in the DNA backbone using eight side chains in each monomer. Again, although most of these contacts are symmetric, one side chain in each monomer forms unique interactions in each half-site.
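The description of the GCN4 site as two ATGA half-sites related by a dyad through a central G·C base pair can be checked mechanically. In the sketch below, the 9 bp sequence ATGACTCAT is an assumption: it is one strand consistent with that description, not necessarily the exact sequence of the crystallized oligonucleotide. The function names are hypothetical.

```python
"""Check that a 9 bp site is a pseudo-palindrome: half-site, central bp, half-site."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_pseudo_dyad(site: str) -> tuple[str, str, str]:
    """Split an odd-length site into (left half-site, central base, right half-site)."""
    if len(site) % 2 != 1:
        raise ValueError("expected an odd-length site with a central base pair")
    mid = len(site) // 2
    return site[:mid], site[mid], site[mid + 1:]


def half_sites_are_dyad_related(site: str) -> bool:
    """True if the right half-site, read on the other strand, equals the left one."""
    left, _, right = split_pseudo_dyad(site)
    return reverse_complement(right) == left


site = "ATGACTCAT"  # assumed example: ATGA + C + TCAT
print(split_pseudo_dyad(site))            # ('ATGA', 'C', 'TCAT')
print(half_sites_are_dyad_related(site))  # True
# The central C.G pair is not self-complementary, so the whole site is only
# pseudo-symmetric: it differs from its own reverse complement.
print(site == reverse_complement(site))   # False
print(reverse_complement(site))           # ATGAGTCAT
```

Both strands pass the half-site test, yet the two strands differ at the center; that single asymmetric base pair is what the structure shows being read differently by the two Arg243 side chains.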
Fig. 1. The GAL4(1-65)-DNA complex. (a) Structure of the GAL4-DNA complex. The view is approximately along the twofold axis of the complex. Amino acid numbers indicate the borders of the three subdomains in one monomer: the DNA recognition module (8-40), linker region (41-49) and dimerization element (50-64). (b) A view perpendicular to the twofold axis, showing the dimerization element and one complete protein monomer in relation to the DNA. (c) Schematic diagram showing all phosphate and base contacts with the protein on one side of the twofold-related complex. Parts (b) and (c) reprinted with permission [3••].

Fig. 2. The GCN4-DNA complex. (a) Structure of the GCN4-DNA complex, showing a smoothed Cα backbone only for the protein. The view is perpendicular to the complex dyad. (b) Diagram of contacts formed in each half-site between residues in each GCN4 monomer and the DNA. Dashed lines indicate hydrogen bonds with bases; dotted lines show hydrogen bonds with phosphate groups. Lines of open circles indicate van der Waals interactions. Part (b) reprinted with permission [5••].

Papilloma virus E2: a novel β-barrel DNA-binding motif
The transcriptional activator E2 from bovine papilloma virus represents an entirely new fold for DNA-binding proteins, one that had been neither predicted nor previously observed in any other class of proteins. The X-ray structure of the dimeric DNA-binding domain of E2 (residues 326-410) bound to DNA has been determined at a resolution of 1.7 Å [6••], revealing the conformation of this domain, which is conserved among the papillomaviruses, and its unique mode of sequence-specific DNA binding. Like the great majority of sequence-specific DNA-binding proteins that have been structurally characterized, E2 uses residues in an α-helix to contact DNA bases in the major groove, but it presents those helices to the DNA using a novel protein scaffold.

The E2 DNA-binding domain forms an eight-stranded antiparallel β-barrel composed of two identical 'half-barrels' related by dyad symmetry (Fig. 3a). In addition to the four β-strands per half-barrel, a long α-helix connects strands 1 and 2 on the outside of the barrel, and a short α-helix followed by a β-strand forms a crossover connection between β-strands 3 and 5. Intersubunit β-sheet interactions seal the edges of the barrel, and hydrogen bonds from side chains in strands 2 and 5 of the respective subunits, together with hydrophobic interactions inside the dimer interface, stabilize the association between monomers. These interactions across the large, solvent-excluded dimer interface give the dimer great stability; it can be disrupted only under denaturing conditions. The longest α-helix in each monomer serves as the recognition helix, forming symmetric contacts in each DNA half-site. These contacts are achieved by the DNA adopting a smooth bend with a radius of curvature of 45 Å, thereby wrapping around the β-barrel and allowing the helices to penetrate the major grooves (Fig. 3a). This contrasts with the DNA bending seen in complexes with the catabolite activator protein [23] and Trp repressor [24], where localized bends in the DNA occur in the region where α-helices penetrate the major groove. The relatively even distribution of bending along the DNA in the E2 complex compresses the major and minor grooves on the concave side of the DNA, facing the protein.

As in other protein-DNA complexes, the interaction with the DNA is a combination of contacts with the DNA bases and with the sugar-phosphate backbone (Fig. 3b). Specific base contacts are formed with five base pairs in each half-site of the palindromic sequence. Four side chains mediate direct contacts to base pairs 4-6, and bases at positions 3 and 7 of the E2 binding site are contacted through water-mediated hydrogen bonds with side chains. One of the base contacts involves Cys340, which hydrogen bonds with the O6 of G5; this is the first observed instance of cysteine participating in specific hydrogen-bond interactions with DNA. In addition to base contacts, there are extensive interactions with the DNA backbone, both flanking the major groove in which the recognition helices lie and along the center of the operator, where there are no contacts with the DNA bases. β-Strand 2 of each subunit runs antiparallel to the sugar-phosphate backbone in the center of the operator, forming six protein-phosphate hydrogen bonds per half-complex. Additional phosphate contacts flanking the base contacts are formed largely by side chains within the recognition helix. A total of 10 side chains participate in contacts with phosphates in the DNA backbone, forming 10 direct and 14 water-mediated hydrogen bonds.
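The smooth bend of the E2 site can be related to a bend angle with elementary geometry: for DNA of uniform curvature, the angle swept by the helix axis is the arc length divided by the radius of curvature, θ = L/R. The sketch below assumes a B-DNA rise of about 3.4 Å per base pair along the axis and perfectly uniform curvature at the 45 Å radius given above; the lengths passed to it are hypothetical and are not the length of the crystallized oligonucleotide.

```python
"""Bend angle of DNA with uniform curvature: theta = arc length / radius."""

import math

RISE_PER_BP = 3.4  # angstroms along the helix axis; assumed B-DNA value


def bend_angle_degrees(n_bp: int, radius: float = 45.0) -> float:
    """Angle (degrees) swept by n_bp of DNA bent with the given radius (angstroms)."""
    arc_length = n_bp * RISE_PER_BP
    return math.degrees(arc_length / radius)


for n in (10, 20):
    print(n, round(bend_angle_degrees(n), 1))
# 10 43.3
# 20 86.6
```

Because θ grows linearly with length at a fixed radius, this model describes an evenly distributed bend, which is the distinction the text draws between E2 and the localized bends of the catabolite activator protein and Trp repressor complexes.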
Fig. 3. Papilloma virus E2 bound to DNA. (a) Structure of the E2-DNA complex. The view is down the axis of the protein β-barrel, perpendicular to the complex dyad. The protein is depicted as a ribbon drawing. (b) Schematic diagram of contacted base pairs and the side chains that interact with them. Water molecules are depicted as droplets labeled 'W'. Part (b) reprinted with permission [6••].

The TFIID TATA-box-binding protein
Initiation of mRNA transcription in eukaryotes is a complex process that requires the concerted action of a number of auxiliary proteins in addition to RNA polymerase. One factor required for initiation of transcription by all three classes of polymerase (pol I, pol II and pol III) is the TFIID TATA-box-binding protein (TBP). TBP has been the subject of intensive study in recent years and has been shown to be a site-specific DNA-binding protein that, in the case of pol II transcription, is required for assembly of the pre-initiation complex and interacts directly with a series of proteins called TATA-associated factors, as well as with certain transcriptional coactivators (reviewed in [25,26]). Sequences of TBP from 10 organisms are now known, and all contain a highly conserved 180-amino-acid carboxy-terminal region. The first atomic structure of a TBP, that of the 200-amino-acid TBP-2 from A. thaliana, has now been determined from crystals diffracting to 2.6 Å resolution [7••]. TBP is a saddle-shaped molecule consisting of two 88-residue structural domains related by an intramolecular pseudo-dyad (Fig. 4). Each domain is composed of a five-stranded antiparallel β-sheet and two α-helices, arranged in the order S1-H1-S2-S3-S4-S5-H2; the domains are joined by a seven-residue linker connecting the carboxyl terminus of H2 with the amino terminus of S1' in the second domain. The two domains are held together by β-sheet interactions between strand 1 (S1) of each domain, giving rise to a 10-stranded β-sheet. This sheet curves to form a concave surface, whereas the long helices H2 and H2' lie on the opposite face of the molecule, parallel to its long axis, and helices H1 and H1' lie perpendicular to H2 at either end.

The structure of TBP was determined in the absence of its DNA-binding site. Yet information from mutagenesis studies aimed at identifying residues involved in DNA binding and protein-protein interactions permits the identification of the different interaction surfaces. Amino acid substitutions affecting overall DNA-binding affinity are distributed across the concave side of the saddle, whereas substitutions affecting DNA-binding specificity are located on the same surface but are localized in one of the domains. Modeling studies have shown that the concave surface of TBP can easily accommodate B-DNA traversing at right angles to the protein's long axis without requiring a conformational change in either molecule. As TFIID binds in the minor groove of DNA [27•,28•], the residues shown to be important for specificity must have access to the TATA sequence in the minor groove. Residues important for TBP interactions with other transcription factors are located on the opposite face of the molecule and at either end; in both cases, they are not on the concave, putative DNA-binding surface.

Hopefully a structure of a TBP-DNA complex will soon be determined, revealing precisely how this protein binds DNA and which portions of the molecule remain accessible to the TATA-associated factors in the pre-initiation complex. In this regard, it is interesting to note that, although the TBP fold has intramolecular twofold symmetry, its exposed side chains do not. This would permit TBP, unlike DNA-binding proteins that are true symmetric dimers, to bind DNA with a defined directionality and to nucleate the assembly of an asymmetric pre-initiation complex, eventually permitting mRNA transcription to be initiated in the correct direction.

Fig. 4. Ribbon diagram of the structure of the TFIID TBP. (a) View perpendicular to the intramolecular dyad. Helices and β-strands are labeled according to their numbering S1-H1-S2-S3-S4-S5-H2 in one domain and S1'-H1'-S2'-S3'-S4'-S5'-H2' in the second domain, which is related to the first by a pseudodyad. (b) View looking down the intramolecular dyad. Reprinted with permission [7••].
The yeast MATα2 homeodomain
Homeodomains are 60-amino-acid DNA-binding domains, consisting of three α-helices and an amino-terminal arm, that are structurally related to the bacterial helix-turn-helix proteins (for reviews, see [29-31,32•]). Previous studies of DNA bound to two homeodomains, from the Drosophila engrailed [33] and Antennapedia [34] proteins, showed that the homeodomain binds DNA as a monomer by inserting the long third α-helix into the major groove of the DNA and the amino-terminal arm into the adjacent minor groove. The 2.7 Å crystal structure of the yeast MATα2 homeodomain bound to DNA [4•] provides insights into how members of this class recognize specific DNA sequences. The contacts formed by the MATα2 homeodomain with its binding site are shown schematically in Fig. 5. A total of four side chains contact the bases: three from helix 3 contact bases in the major groove, and one from the amino-terminal arm contacts two bases in the minor groove. Eight additional side chains contact the DNA backbone. A comparison of this complex with the engrailed homeodomain-DNA complex [33] revealed that the two homeodomains (which are 27% identical in amino acid sequence) are highly similar in structure and present their recognition helices (helix 3) to their binding sites in a nearly identical manner [4•].
The conserved orientation of the recognition helix on the DNA site is evidently maintained by a series of DNA contacts formed by residues that are identical in the two proteins. Indeed, seven of the eight side chains contacting the DNA backbone in the MATα2 complex are identical in engrailed and form the same contacts. In addition, one of the base-contacting residues, Asn51, is conserved in both proteins. With the exception of Asn51, all other base-contacting residues differ between the two complexes. As the two homeodomains recognize different binding sites, the sequence specificity of binding must derive almost entirely from these differing major- and minor-groove contacts. Other homeodomains will presumably be positioned similarly on their sites, as nearly all of the contacting residues that are identical in MATα2 and engrailed are also conserved in other homeodomains. The base-contacting Asn51 and the phosphate-contacting Trp48 and Arg53 are invariant in all known homeodomains, and five other backbone-contacting residues are highly conserved as well. These contacts must serve to anchor the homeodomain on the DNA in a position that allows helix 3 side chains 47, 50, 51 and 54 to project into the major groove and form base contacts, along with additional minor-groove contacts provided by residues in the amino-terminal arm. It is important to note that all of these potential contacting residues must be considered potentially important in determining the specificity of homeodomain-DNA interactions. A series of solution binding studies of mutant homeodomain proteins (reviewed in [31]) focused a great deal of attention on the role of residue 50 and did not explore contributions from residue 54. These studies were carried out before the structure of the homeodomain was known and relied on assumptions, not all of them correct, about the similarity between homeodomains and bacterial helix-turn-helix proteins.
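Why helix 3 residues 47, 50, 51 and 54 can all project into the major groove follows from α-helix geometry: with about 3.6 residues per turn, each residue is rotated roughly 100° about the helix axis relative to the previous one, so residues spaced i, i+3, i+4 and i+7 cluster on one face. The helical-wheel sketch below uses that idealized 100° value (an assumption; real helices deviate slightly), and wheel_angles is a hypothetical name.

```python
"""Idealized helical-wheel angles for residues on one alpha-helix."""

DEG_PER_RESIDUE = 100  # idealized alpha-helix: 360 degrees / 3.6 residues


def wheel_angles(residues: list[int], reference: int) -> dict[int, int]:
    """Angle of each residue about the helix axis relative to `reference`,
    wrapped into the range (-180, 180] degrees."""
    angles = {}
    for r in residues:
        angle = ((r - reference) * DEG_PER_RESIDUE) % 360
        if angle > 180:
            angle -= 360
        angles[r] = angle
    return angles


helix3_contacts = [47, 50, 51, 54]
angles = wheel_angles(helix3_contacts, reference=47)
print(angles)  # {47: 0, 50: -60, 51: 40, 54: -20}
spread = max(angles.values()) - min(angles.values())
print(spread)  # 100: all four side chains lie within a ~100 degree sector
```

A residue two positions away, such as 49, lands at -160°, on the far side of the helix; this is why the i, i+3, i+4, i+7 spacing, rather than consecutive residues, lines the face that enters the groove.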
Fig. 5. Schematic diagram of contacts formed by the MATα2 homeodomain with its operator site. The DNA is represented as a cylindrical projection, and phosphates are represented as circles. Contacting residues that are identical in both α2 and engrailed are encircled in solid lines; these residues are highly conserved in other homeodomains as well. The three residues that are invariant among all homeodomains are indicated by shaded boxes. Non-conserved residues are encircled in dashed lines. Dashed arrows indicate contacts between residues and the DNA bases and backbone.

Conclusions
Each of the five transcription factors described herein uses an entirely different structural motif to present a set of side chains for interactions with the DNA bases and sugar-phosphate backbone. Both GCN4 and E2 use α-helices that track the major groove in a manner seen in the homeodomains, but they present those helices using very different protein scaffolds. GAL4 uses functional groups at the end of an α-helix to contact bases, in a manner reminiscent of the zinc finger protein Zif268 [35], whereas it appears likely that TBP will be shown to use β-sheet residues to contact the DNA. Future structural studies of other classes of transcription factors will no doubt continue to reveal yet other protein folds that are used to contact DNA. Perhaps as notable as the structural diversity among these proteins are the differences in conformational flexibility. The GCN4 DNA-binding helix becomes ordered only upon DNA binding, whereas GAL4 contacts the DNA with a stable, globular domain, yet formation of the coiled-coil dimerization region in this particular protein fragment is triggered by DNA binding. E2 is an unusually stable protein dimer that is unlikely to undergo any conformational change, yet it induces a pronounced bend in its binding site. It is possible that the conformational and structural diversity of transcription factors reflects not only different ways of recognizing a DNA sequence but also other functional and regulatory aspects of these proteins. Although physical studies of transcription factors have, of necessity, focused largely on DNA binding, locating a target site in the genome is just the first step in activating or repressing mRNA transcription.
Acknowledgements
I thank S Burley, S Harrison, P Sigler and their colleagues for sending manuscripts in advance of publication, and T Ellenberger, R Hegde and R Marmorstein for providing figures for this review.
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. • HARRISON SC: A Structural Taxonomy of DNA-binding Domains. Nature 1991, 353:715-719. A review comparing the different structural classes of DNA-binding domains.
2. • PABO CO, SAUER RT: Transcription Factors: Structural Families and Principles of DNA Recognition. Annu Rev Biochem 1992, 61:1053-1095. The most recent comprehensive review covering transcription factor structure and details of how the different motifs recognize different binding sites.
3. •• MARMORSTEIN R, CAREY M, PTASHNE M, HARRISON SC: DNA Recognition by GAL4: Structure of a Protein-DNA Complex. Nature 1992, 356:408-414. The authors report the 2.7 Å structure of the GAL4 DNA-binding domain (residues 1-65) bound to DNA. The protein in the structure is a dimer, each monomer consisting of a metal-binding domain joined by a linker to an α-helix, which serves as a dimerization element by forming a coiled coil with the other monomer. The metal-binding domains insert into the major grooves on opposite faces of the DNA, and additional contacts with the backbone are mediated by residues in the linker and in the coiled-coil dimerization element.
4. • WOLBERGER C, VERSHON AK, LIU B, JOHNSON AD, PABO CO: Crystal Structure of a MATα2 Homeodomain-Operator Complex Suggests a General Model for Homeodomain-DNA Interactions. Cell 1991, 67:517-528. The crystal structure of the yeast MATα2 homeodomain bound to DNA, determined at 2.8 Å resolution, shows the contacts formed with the DNA. Comparison of this with other homeodomains reveals that a large set of conserved phosphate contacts and one base contact anchor the homeodomain on its site, whereas sequence specificity arises from differences in contacts with the bases.
5. •• ELLENBERGER TE, BRANDL CJ, STRUHL K, HARRISON SC: The GCN4 Basic-region Leucine Zipper Binds DNA as a Dimer of Uninterrupted α-Helices: Crystal Structure of the Protein-DNA Complex. Cell 1992, 71:1223-1237. This study reveals that the bZIP of GCN4 forms a rather remarkable dimer of two 52-residue α-helices that grasp the DNA like a pair of chopsticks, contacting bases in the major groove. The two helices associate by forming a coiled coil essentially identical to that observed in previous structural work on the isolated leucine zipper.
6. •• HEGDE RS, GROSSMAN SR, LAIMINS LA, SIGLER PB: The 1.7 Å Structure of the Bovine Papillomavirus-1 E2 DNA-binding Domain Bound to its DNA Target. Nature 1992, 359:505-512. A high-resolution crystal structure determination reveals that papillomavirus E2 forms a novel eight-stranded antiparallel β-barrel, formed by the association of two monomers, each of which contributes four strands. The protein in the crystal is bound to DNA with a smooth and quite pronounced bend that allows the α-helix on the outside of each monomer to penetrate the major groove.
7. •• NIKOLOV DB, HU S-H, LIN J, GASCH A, HOFFMANN A, HORIKOSHI M, CHUA NH, ROEDER RG, BURLEY SK: Crystal Structure of TFIID TATA-box Binding Protein. Nature 1992, 360:40-46. The TFIID TBP is shown to be a monomer that contains two very similar domains related by an intramolecular dyad. The protein is saddle-shaped, with a 10-stranded β-sheet lining the concave surface and two pairs of α-helices located at each end, on roughly the opposite face.
8. PAN T, COLEMAN JE: GAL4 Transcription Factor Is Not a 'Zinc Finger' but Forms a Zn(II)2Cys6 Binuclear Cluster. Proc Natl Acad Sci USA 1990, 87:2077-2081.
9. POVEY JF, DIAKUN GP, GARNER CD, WILSON SP, LAUE ED: Metal Ion Coordination in the DNA Binding Domain of the Yeast Transcriptional Activator GAL4. FEBS Lett 1990, 266:142-146.
10. KAPTEIN R: Zinc-finger Structures. Curr Opin Struct Biol 1992, 2:109-115.
11. • KRAULIS PJ, RAINE ARC, GADHAVI PL, LAUE ED: Structure of the DNA-binding Domain of Zinc GAL4. Nature 1992, 356:448-450. An NMR structure determination of the GAL4 metal-binding domain (residues 7-49) shows that it contains a Zn2Cys6 cluster at the core and two helix-turn-strand motifs that are related by an intramolecular pseudodyad.
12. • BALEJA JD, MARMORSTEIN R, HARRISON SC, WAGNER G: Solution Structure of the DNA-binding Domain of Cd2-GAL4 from S. cerevisiae. Nature 1992, 356:450-453. The fragment of GAL4 used in this NMR study is the same GAL4(1-65) domain used in the crystallographic study listed above [3••]. The authors arrive at the structure of the metal-binding domain and show that the coiled-coil dimerization domain seen in the crystal structure is disordered in the absence of DNA binding.
13. VAŠÁK M, WÖRGÖTTER E, WAGNER G, KÄGI JH, WÜTHRICH K: Metal Co-ordination in Rat Liver Metallothionein-2 Prepared with or without Reconstitution of the Metal Cluster, and Comparison with Rabbit Liver Metallothionein-2. J Mol Biol 1987, 196:711-719.
14. ROBBINS AH, MCREE DE, WILLIAMSON M, COLLETT SA, XUONG NH, FUREY WF, WANG BC, STOUT CD: Refined Crystal Structure of Cd, Zn Metallothionein at 2.0 Å Resolution. J Mol Biol 1991, 221:1269-1293.
15. CLARKE ND, BEAMER LJ, GOLDBERG HR, BERKOWER C, PABO CO: The DNA Binding Arm of λ Repressor: Critical Contacts from a Flexible Region. Science 1991, 254:267-270.
16. PATHAK D, SIGLER PB: Updating Structure-Function Relationships in the bZIP Family of Transcription Factors. Curr Opin Struct Biol 1992, 2:116-123.
17. LANDSCHULZ WH, JOHNSON PF, MCKNIGHT SL: The Leucine Zipper: a Hypothetical Structure Common to a New Class of DNA-binding Proteins. Science 1988, 240:1759-1764.
18. O'SHEA EK, KLEMM JD, KIM PS, ALBER T: X-ray Structure of the GCN4 Leucine Zipper, a Two-stranded, Parallel Coiled Coil. Science 1991, 254:539-544.
19. WEISS MA, ELLENBERGER TE, WOBBE CR, LEE JP, HARRISON SC, STRUHL K: Folding Transitions in the DNA-binding Domain of GCN4 on Specific Binding to DNA. Nature 1990, 347:575-578.
20. O'NEIL KT, SHUMAN JD, AMPE C, DEGRADO WF: DNA-induced Increase in the α-Helical Content of C/EBP and GCN4. Biochemistry 1991, 30:9030-9034.
21. O'NEIL KT, HOESS RH, DEGRADO WF: Design of DNA Binding Peptides Based on the Leucine Zipper Motif. Science 1990, 249:774-778.
22. VINSON CR, SIGLER PB, MCKNIGHT SL: Scissors-grip Model for DNA Recognition by a Family of Leucine Zipper Proteins. Science 1989, 246:911-916.
23. SCHULTZ SC, SHIELDS GC, STEITZ TA: Crystal Structure of a CAP-DNA Complex: the DNA Is Bent by 90°. Science 1991, 253:1001-1007.
24. OTWINOWSKI Z, SCHEVITZ RW, ZHANG R-G, LAWSON CL, JOACHIMIAK A, MARMORSTEIN RQ, LUISI BF, SIGLER PB: Crystal Structure of trp Repressor/Operator Complex at Atomic Resolution. Nature 1988, 335:321-329.
25. ROEDER RG: The Complexities of Eukaryotic Transcription Initiation: Regulation of Preinitiation Complex Assembly. Trends Biochem Sci 1991, 16:402-408.
26. GREENBLATT J: Roles of TFIID in Transcriptional Initiation by RNA Polymerase II. Cell 1991, 66:1067-1070.
27. • LEE DK, HORIKOSHI M, ROEDER RG: Interaction of TFIID in the Minor Groove of the TATA Element. Cell 1991, 67:1242-1250. Chemical modification of DNA is used in binding studies to show that TFIID contacts the DNA in the minor groove.
28. • STARR DB, HAWLEY DK: TFIID Binds in the Minor Groove of the TATA Box. Cell 1991, 67:1231-1240. A novel approach featuring the replacement of thymines and adenines in the TATA box with cytosines and inosines is used to demonstrate that TFIID binds in the minor groove of DNA.
29. HARRISON SC, AGGARWAL AK: DNA Recognition by Proteins with the Helix-turn-helix Motif. Annu Rev Biochem 1990, 59:933-969.
30. BRENNAN RG: Interactions of the Helix-turn-helix Binding Domain. Curr Opin Struct Biol 1991, 1:80-88.
31. BRENNAN RG: DNA Recognition by the Helix-turn-helix Motif. Curr Opin Struct Biol 1992, 2:100-108.
32. • LAUGHON A: DNA Binding Specificity of Homeodomains. Biochemistry 1991, 30:11357-11367. The most recent comprehensive review of structural, biochemical and genetic data on homeodomain-DNA interactions.
33. KISSINGER CR, LIU B, MARTIN-BLANCO E, KORNBERG TB, PABO CO: Crystal Structure of an Engrailed Homeodomain-DNA Complex at 2.8 Å Resolution: a Framework for Understanding Homeodomain-DNA Interactions. Cell 1990, 63:579-590.
34. OTTING G, QIAN YQ, BILLETER M, MUELLER M, AFFOLTER M, GEHRING WJ, WÜTHRICH K: Protein-DNA Contacts in the Structure of a Homeodomain-DNA Complex Determined by Nuclear Magnetic Resonance Spectroscopy in Solution. EMBO J 1990, 9:3085-3092.
35. PAVLETICH NP, PABO CO: Zinc Finger-DNA Recognition: Crystal Structure of a Zif268-DNA Complex at 2.1 Å. Science 1991, 252:809-817.
C Wolberger, Department of Biophysics and Biophysical Chemistry, Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, Maryland 21205, USA.