Engrailed homeodomain-DNA complex at 2.2 Å resolution: a detailed view of the interface and comparison with other engrailed structures




Article No. mb982147

J. Mol. Biol. (1998) 284, 351-361

Ernest Fraenkel1, Mark A. Rould1,2, Kristen A. Chambers1,2 and Carl O. Pabo1,2*

1 Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

2 Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

We report the 2.2 Å resolution structure of the Drosophila engrailed homeodomain bound to its optimal DNA site. The original 2.8 Å resolution structure of this complex provided the first detailed three-dimensional view of how homeodomains recognize DNA, and has served as the basis for biochemical studies, structural studies and molecular modeling. Our refined structure confirms the principal conclusions of the original structure, but provides important new details about the recognition interface. Biochemical and NMR studies of other homeodomains had led to the notion that Gln50 was an especially important determinant of specificity. However, our refined structure shows that this side-chain makes no direct hydrogen bonds to the DNA. The structure does reveal an extensive network of ordered water molecules which mediate contacts to several bases and phosphates (including contacts from Gln50), and our model provides a basis for detailed comparison with the structure of an engrailed Q50K altered-specificity variant. Comparing our structure with the crystal structure of the free protein confirms that the N and C termini of the homeodomain become ordered upon DNA-binding. However, we also find that several key DNA contact residues in the recognition helix have the same conformation in the free and bound protein, and that several water molecules also are "preorganized" to contact the DNA. Our structure helps provide a more complete basis for the detailed analysis of homeodomain-DNA interactions. © 1998 Academic Press

*Corresponding author

Keywords: X-ray crystal structure; engrailed homeodomain; DNA-binding; specificity; folding

Present addresses: E. Fraenkel, Department of Molecular and Cellular Biology, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138, USA; K. A. Chambers, Norris Cotton Cancer Center, Biostatistics & Epidemiology Research Computing, Dartmouth Medical School, Lebanon, NH 03766, USA; M. A. Rould, Department of Molecular Physiology and Biophysics, University of Vermont College of Medicine, Burlington, VT 05405, USA.

Abbreviations used: rms, root mean square; MIR, multiple isomorphous replacement; CD, circular dichroism.

E-mail address of the corresponding author: [email protected]

0022-2836/98/470351-11 $30.00/0

Introduction

The homeodomain is one of the most important DNA-binding motifs in eukaryotes, and has provided a model system for studying protein-DNA interactions. The 2.8 Å resolution structure of the engrailed homeodomain-DNA complex (Kissinger et al., 1990) was the first crystal structure to reveal how this motif recognizes DNA. Since then, structures of a number of homeodomain-DNA complexes have been determined by crystallography and NMR (reviewed by Gehring et al., 1994; Kornberg, 1993; Laughon, 1991; Wolberger, 1996). These studies have demonstrated that the overall fold and DNA-docking arrangement of the homeodomain are well conserved. Most homeodomains bind to very similar DNA sites that contain a TAAT sequence, but typically have differential DNA-binding specificity for the two base-pairs following the conserved core binding site (TAATNN).

Experiments with bicoid, engrailed, fushi-tarazu and paired have shown that residue 50 of the homeodomain can play a key role in this differential specificity. Many of the side-chains at the protein-DNA interface are well conserved within the homeodomain family, but position 50 is more variable, and mutations at this position can dramatically alter the preference for sequences that lie 3′ to the TAAT core. In fact, experiments with the bicoid and fushi-tarazu homeodomains demonstrated that exchanging glutamine and lysine at position 50 is sufficient to alter promoter specificity in Drosophila embryos (Hanes et al., 1994; Schier & Gehring, 1992). The role of lysine at position 50 seems quite clear, since the recent structure of an engrailed Q50K variant bound to DNA shows that Lys50 hydrogen bonds to a pair of guanine residues at the 3′ side of the TAAT core binding site (Tucker-Kellogg et al., 1997). This result clearly explains the preference of Q50K engrailed for TAATCC sequences, and provides a model for how other homeodomains with lysine at position 50 recognize DNA. However, the role of glutamine at position 50 was less clear, since the 2.8 Å resolution structure of wild-type engrailed did not reveal any especially favorable contacts between Gln50 and the DNA.

In order to understand the basis for the specificity of wild-type engrailed, we have collected higher resolution X-ray data, and refined the structure to 2.2 Å resolution. At this resolution we can describe the protein-DNA interactions in much greater detail. Our high resolution structure allows unambiguous assignment of the role of Gln50, allows a check on the contacts previously proposed for Asn51, and reveals a number of water molecules at the protein-DNA interface.

The refined structure of the wild-type engrailed complex also provides an opportunity to compare the high-resolution structure of the protein bound to DNA with that of the free protein. Several studies have suggested that homeodomains undergo significant changes in secondary structure when bound to DNA. NMR studies of the Antennapedia homeodomain have shown that the N-terminal arm of the homeodomain and the C-terminal end of the recognition helix become better ordered in the presence of DNA (Qian et al., 1993a), and NMR analysis of the NK-2 homeodomain indicated that the C terminus of the recognition helix extends by eight amino acids when bound to its site (Tsao et al., 1994). Clarke et al. (1994) also noted changes at the N and C termini of the homeodomain when comparing their 2.1 Å resolution structure of free engrailed with the 2.8 Å resolution structure of the engrailed-DNA complex. Our current high-resolution structure of the complex allows us to build on these observations and proceed with a more detailed analysis of the changes that occur on DNA-binding.

Results

We grew crystals of wild-type engrailed bound to the same 20 base-pair oligonucleotide used in the original studies of Kissinger et al. (1990) but under slightly different crystallization conditions (see Materials and Methods). These crystals diffracted to 2.2 Å resolution, and are nearly isomorphous to those used in the 2.8 Å resolution structure determination. Our data were collected at room temperature to avoid any artifacts in the hydration shell that might be associated with flash cooling. Beginning with the model of Kissinger et al. (1990) we refined the structure against the new data using (2|Fo| − |Fc|) αcalc maps. The root mean square (rms) difference between the two models is 0.48 Å for the α carbon atoms, and in a separate alignment, is 0.34 Å for the C1′ atoms. As before, the crystals contain two proteins bound to a 20 base-pair DNA duplex with single base overhangs at each end. One homeodomain binds to an optimal TAATTA site in the duplex, while the other binds to a suboptimal AAATTA sequence which is created by end-to-end stacking of two DNA duplexes within the crystal. Since binding to the suboptimal sequence (even when presented in the center of a single DNA duplex in binding studies) is two orders of magnitude weaker than binding to the optimal site (Kissinger et al., 1990), and since two phosphate groups are missing in the site created by the DNA stacking, we restrict our comments to the complex with the optimal TAATTA site.

Figure 1. Overview of the engrailed homeodomain-DNA complex showing the DNA in blue and the protein backbone in red with α helices represented by cylinders. Residue 5 is the first amino acid we could reliably model; other numbers indicate helix termini.

The engrailed homeodomain folds into a globular structure (Figure 1) consisting of an extended N-terminal arm and three α helices. The N-terminal arm and the third α helix make base contacts; almost all parts of the molecule (except helix 1)


participate in contacts with the sugar-phosphate backbone. Several ordered water molecules also are present at the protein-DNA interface and help mediate side-chain/base and side-chain/phosphate interactions.

DNA base contacts

Helix 3, also known as the recognition helix, sits in the major groove, and base contacts are made by Ile47, Gln50 and Asn51 of this helix (Figure 2). Although these residues were excluded from our starting model to avoid bias during the refinement, conformations in the final refined model are very similar to those in the original structure. Asn51, which is conserved in almost all homeodomains, forms direct hydrogen bonds to the bases in the major groove. It accepts a hydrogen bond from the N6 position of A3 (TAATTA) and donates a hydrogen bond to the N7 of this adenine. Gln50 and Ile47 contribute van der Waals contacts to the bases: Ile47 contacts A3 (TAATTA; 3.6 Å from Cγ2 to C8) and also contacts T4 (TAATTA; 4.0 Å from Cγ2 to C5 methyl); Gln50 contacts T6 (TAATTA; 3.6 Å from Cδ to C5 methyl). The 2.8 Å resolution structure had suggested a contact for Arg3, but the current electron density maps do not allow us to see the protein backbone preceding residue five. In our model, Arg5 contacts the O2 of T1, but there appears to be motion in this region, and we note that the density for the Arg5 side-chain is stronger for the guanidinium group than for the aliphatic carbons.

Figure 2. A, Diagram showing major groove contacts made by wild-type and Q50K engrailed (Tucker-Kellogg et al., 1997). The DNA is represented as a cylindrical projection with phosphates shown as circles; phosphates contacted by the protein are shaded. Contacts from the protein backbone to the DNA are indicated by an oval around the name of the residue. Water molecules in the structure of wild-type engrailed that were also observed in the structure of the free protein are enclosed in boxes. Superimposing the free and bound proteins gives an rms distance of 0.55 Å for these six water molecules. Those water molecules which surround Ala54 are shaded gray. B, Stereo view of the protein-DNA interface in the wild-type engrailed-DNA complex. DNA is shown in blue with the protein in red. Water molecules are indicated by light blue spheres and hydrogen bonds by broken lines.

In addition to direct contacts to the bases, there are water-mediated contacts which may contribute to specificity. Although we do not know their energetic significance, we are very confident of the water positions, since density for these water molecules is present in simulated annealing omit maps. Gln50 forms water-mediated contacts with T4, T5 and G7 (TAATTAC). The water-mediated contact to G7 is shared with Lys46, and the contact to T4 is shared with Asn51. Asn51 also contributes a hydrogen bond to a water which contacts A4. The latter water molecule is part of a network of seven water molecules that form an extended network with T3, A4, A5, phosphate groups 6 and 7, as well as with the backbone of Gln50. The Cβ of Ala54 seems to play a key role in organizing this network. Five of these water molecules form a hydrogen-bonded ring around Ala54 with a mean distance of 3.9 Å between the waters and the Cβ (Figure 3). Thus, Ala54 may contribute to recognition via its effect on water structure, even though the methyl group is aliphatic and the side-chain is too short to directly contact the DNA from this position.

Figure 3. Stereo view showing interactions of the recognition helix in the major groove of engrailed. The backbone of residues 47 to 54 from the recognition helix is shown in red, and base-pairs 3 to 7 are shown in blue. Side-chains of Ile47, Gln50, Asn51 and Ala54 are yellow, with water molecules shown in light blue. Hydrogen bonds are indicated by golden spheres.

DNA backbone contacts

Engrailed also makes extensive contacts to the DNA backbone, with numerous contacts along each side of the major groove, and these interactions must contribute significantly to recognition. Thr6, Tyr25, Arg31 and Arg53 contact the phosphate groups directly, while the backbone of residues 44, 46, 48 and 50, as well as the side-chains of Lys46, Trp48 and Arg53, form water-mediated phosphate contacts. In addition, three lysine residues at the end of the recognition helix (positions 55, 57 and 58) may contact the DNA backbone, but the density is too weak to assign these contacts with confidence. Further studies will be needed to determine precisely how the phosphate contacts affect binding, but it seems almost certain that they will increase the overall binding constant and help orient the homeodomain. It is also conceivable that they have some role in indirect readout of the sequence. A comparative study of these contacts should also be helpful in understanding the subtle variations in docking arrangements of different homeodomain-DNA complexes.

The role of Gln50

Although several studies of other homeodomains had suggested that position 50 plays a crucial role in determining the DNA-binding specificity of the homeodomains, the 2.8 Å resolution structure of the engrailed complex revealed only one van der Waals contact, and no direct hydrogen bonds, between Gln50 and the DNA. Our refined structure confirms this observation: Gln50 makes a van der Waals contact to T6 and water-mediated contacts to base-pairs 4, 5 and 7. It is important to note that the biochemical data of Ades & Sauer (1994) show that mutation of Gln50 to Ala only reduces the DNA affinity by approximately twofold. In this regard, there are some striking contrasts between wild-type engrailed and a previously studied variant that has lysine at position 50: the structure of the engrailed Q50K variant revealed direct hydrogen bonds to base-pairs 4, 5 and 6 (TAATCC), and these contribute significantly to both affinity and specificity. Q50K engrailed shows an order of magnitude higher affinity for its optimal site (TAATCC) than wild-type engrailed shows for its best site (TAATTA), and Q50K binds the TAATCC site almost 400-fold better than Q50A. The limited role of Gln50 in homeodomain specificity is also supported by experiments with bicoid in vivo: both Gln50 and Ala50 bicoid variants bind specifically to TAATTA, while wild-type bicoid with lysine at position 50 is specific for TAATCC (Hanes et al., 1994).

The ability, in some cases, to switch homeodomain specificity with only a single amino acid mutation raises interesting questions about how such mutations are accommodated in the complex. The structures of wild-type engrailed bound to its high affinity TAATTA site and of the Q50K variant bound to its optimal TAATCC site are extremely similar. Superimposing the Cα atoms of residues 5 to 59 and the C1′ atoms of an eight base-pair segment containing the TAAT binding site gives a rms distance of 0.36 Å between corresponding atoms. We also have used difference Fourier maps to directly compare these structures: examination of a difference Fourier map using observed wild-type and Q50K diffraction amplitudes with experimentally determined solvent-flattened MIR phases shows that almost all the most significant differences are directly at the sites where side-chains or bases were changed (Figure 4). Detailed comparisons of the refined structures show minor differences in the flexible N-terminal arm, the end of the recognition helix, the loop between helices 2 and 3, and several phosphates around the mutated base-pairs (Figure 5).

Figure 4. Difference Fourier map between the wild-type and Q50K complex crystals, using phases free of any model bias. The coefficients for this 2.2 Å resolution map are differences in observed amplitudes (Fobs,Q50K − Fobs,wild-type), with solvent-flattened MIR phases from the Q50K-DNA complex (Tucker-Kellogg et al., 1997). The wild-type model is in yellow, the mutant model in blue. The map is contoured at 3.3 sigma (green) and −3.3 sigma (red); i.e. green contour cages indicate electron density present in the mutant complex that is absent in the wild-type, and red indicates density present in the wild-type complex that is absent in the mutant. This direct, purely experimental map clearly reveals the differences between the wild-type and mutant crystals in identities for base-pairs 5 and 6, in side-chain identity and location at amino acid 50, and in the interfacial water network. There are only very minor changes apparent in other regions of the map.

Overall, it is striking to see that the docking of these two homeodomains to the DNA is essentially unchanged, and that neighboring conserved side-chains, such as Ile47 and Asn51, adopt the same conformations in both structures and make essentially identical contacts (Figure 2). We also note that almost all of the direct and water-mediated contacts to the phosphate backbone are conserved. The only differences in phosphate contacts involve the water-mediated contacts to the phosphate of G7 and the contact from Gln44 to the phosphate of A3 (the latter is only seen in the Q50K complex). Finally, we note that much of the solvent structure at the protein-DNA interface is conserved. The ring of water molecules around Ala54 is visible in both structures, making conserved contacts to the N6 and N7 of A4 and the N7 of the purine in base-pair 5 (adenine in the wild-type structure and guanine in the Q50K structure). Related water structures have been observed in other homeodomain-DNA complexes (see Discussion).
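As a rough guide to the size of the affinity effects quoted for position 50 (approximately twofold for Q50A, an order of magnitude, almost 400-fold for Q50K relative to Q50A), a ratio of binding constants can be converted to a free-energy difference using ΔΔG = RT ln(ratio). The sketch below is illustrative only: the function name and the temperature of 298.15 K are assumptions, not values taken from the binding studies cited.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold_change, temperature_k=298.15):
    """Free-energy difference (kcal/mol) for a given ratio of binding constants.

    The default temperature is an assumption made for illustration.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return R_KCAL * temperature_k * math.log(fold_change)


# Ratios mentioned in the text (labels are descriptive only).
examples = [
    ("Q50A vs wild-type, about 2-fold", 2.0),
    ("one order of magnitude", 10.0),
    ("two orders of magnitude", 100.0),
    ("Q50K vs Q50A on TAATCC, about 400-fold", 400.0),
]

for label, fold in examples:
    print(f"{label}: {ddg_from_fold(fold):.2f} kcal/mol")
```

At the assumed temperature, a twofold change corresponds to roughly 0.4 kcal/mol and a 400-fold change to roughly 3.5 kcal/mol, which puts the contrast between the Gln50 and Lys50 contacts on a common scale.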

Figure 5. Comparison of the protein conformation in our structure with that of Q50K engrailed (continuous line) or the free protein (broken line). The plot shows the distance between corresponding Cα atoms after alignment of the proteins. The protein secondary structure is indicated at the bottom of the Figure.
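The per-residue distances plotted in Figure 5, and the rms distances quoted in the text, are simple functions of paired coordinates once the two models share a common frame. A minimal sketch follows, assuming the structures have already been superimposed (the least-squares fit itself is not shown) and using made-up coordinates rather than values from the deposited models; the function names are hypothetical.

```python
import math


def pairwise_distances(coords_a, coords_b):
    """Distance (in the units of the input) between corresponding atoms."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both models must contain the same number of atoms")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def rms_distance(distances):
    """Root mean square of a list of distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical, already-superimposed C-alpha positions (in angstroms).
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.3), (3.8, 0.4, 0.0)]

per_residue = pairwise_distances(model_1, model_2)
print(per_residue)
print(rms_distance(per_residue))
```

Plotting `per_residue` against residue number gives a profile of the kind shown in Figure 5, while `rms_distance` collapses it to a single number.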


Figure 6. A, Stereo view of the recognition helix of engrailed in the presence of DNA (blue) and absence of DNA (orange) with the side-chains of Ile47, Gln50 and Asn51 shown. A water molecule is visible, hydrogen-bonded to both Gln50 and Asn51. B, Stereo view of the recognition helix of engrailed in the presence of DNA showing the van der Waals surfaces for Ile47, Gln50, Asn51 and a water molecule. The view is the same as in A.
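Side-chain conformations such as those compared in Figure 6 are commonly summarized as torsion angles, for example χ1 of isoleucine (defined by N, Cα, Cβ and Cγ1) or χ3 of glutamine (defined by Cβ, Cγ, Cδ and Oε1). The following sketch computes a signed torsion angle from four atomic positions. The coordinates are hypothetical and serve only to exercise the function; they are not taken from either engrailed structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical positions: three fixed atoms and two placements of the fourth.
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
cis_atom = (1.0, 0.0, 1.0)     # same side as the first atom
trans_atom = (-1.0, 0.0, 1.0)  # opposite side

print(torsion_angle(a, b, c, cis_atom))
print(torsion_angle(a, b, c, trans_atom))
```

The two placements of the fourth atom differ by 180° in torsion angle, which is the kind of ambiguity that arises when the terminal oxygen and nitrogen atoms of a glutamine side-chain are exchanged.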

Comparison with the free protein

Comparison of the 2.1 Å resolution crystal structure of engrailed in the absence of DNA (Clarke et al., 1994) with the original 2.8 Å resolution structure of the complex showed that the overall protein structures are quite similar. However, Clarke et al. (1994) noted that the N-terminal arm adopts a different conformation in these two structures. They also noted that the C-terminal end of the recognition helix is poorly ordered in the crystal structure of the free protein, which has weak density for residues 53 to 56 and no density for the last four amino acid residues. The current 2.2 Å resolution structure of the engrailed homeodomain-DNA complex provides us with an opportunity to re-examine, at higher resolution, the changes which accompany DNA binding. Overall, the structures of the free and bound protein are very similar, and superimposing the Cα atoms of residues 8 to 56 gives a rms distance of 0.35 Å. Residues 53 to 56, which are poorly ordered in the free protein, are clearly visible in the structure of the complex, and we see density for residues 57 to 59.

Despite the fact that many solvent-accessible side-chains adopt different conformations in the two structures and that the C-terminal end of the recognition helix does not complete folding until it binds the DNA, we find that the conformations of several key side-chains in the recognition helix (Ile47, Gln50 and Asn51) are well conserved in both structures (Figure 6). (The only ambiguity involves a possible rotation of the terminal side-chain atoms of Gln50 by 180° about χ3. In the bound protein a hydrogen bond from Lys46 to the Oε of Gln50 fixes the conformation of this residue; this interaction does not occur in the structure of free engrailed, and these rotamers cannot be distinguished crystallographically.) In addition to the conserved conformations of these side-chains, water molecules occupy several of the same positions in the presence and absence of DNA. In one of these positions, a conserved water is hydrogen bonded to Gln50 and Asn51, and this water contacts T4 in the complex (Figure 5). We note that a water is also observed in this position in the crystal structures of Antennapedia (Fraenkel & Pabo, 1998), and the paired homeodomain (Wilson et al., 1995). There are also five positions where water molecules seen in the structure of free engrailed have conserved locations in the complex and bridge phosphate contacts (Figure 2).

Experimental evidence for coupled folding and binding

To obtain independent evidence for the conformational changes which accompany DNA binding, we measured the CD spectra of engrailed in the presence and absence of DNA (Figure 7). An oligonucleotide with a wild-type engrailed site causes an increase in the depth of the CD minima at 205 and 222 nm, consistent with extension of helix 3, while a non-specific oligonucleotide had no significant effect on the spectra (Figure 7B). Very similar results were obtained in our CD studies with the Antennapedia homeodomain (Figure 7C and D). NMR experiments with Antennapedia have demonstrated that the exchange rates for the backbone amide protons of residues 53 to 55 slow significantly upon DNA binding and that the δ methylene protons of Lys57, which are degenerate in the free protein, are resolved in the protein-DNA complex (Qian et al., 1993a). These NMR results indicate that residues 53 to 57 become more ordered in the presence of DNA. The similar behavior of engrailed and Antennapedia in our CD experiments supports our interpretation that the observed CD changes in engrailed are due to stabilization of the C terminus of the recognition helix and that differences seen in the crystal structures of free and bound engrailed accurately reflect changes that occur in solution.

Figure 7. Circular dichroism spectra of engrailed (A and B) and Antennapedia (C and D), showing the spectra of the free protein (*), the free DNA (}), and the complex (&). Difference spectra are calculated by subtracting the spectrum of the free DNA from that of the complex. A, spectra of engrailed and a specific oligonucleotide; B, spectra of engrailed and a non-specific oligonucleotide; C, spectra of Antennapedia and a specific oligonucleotide; and D, spectra of Antennapedia and a non-specific oligonucleotide. For both engrailed and Antennapedia, addition of specific DNA (A and C) causes the difference spectra to diverge from the protein spectra (*), consistent with a change in helix content. In the presence of non-specific DNA (B and D) the difference spectra show negligible changes from the protein spectra (*).
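The difference spectra in Figure 7 are obtained by subtracting the free-DNA spectrum from that of the complex, and the result is then compared with the spectrum of the free protein. A minimal sketch of that bookkeeping is given below, with invented signal values (arbitrary units) at the two minima discussed in the text. None of the numbers are measurements from this work, and the function name is hypothetical.

```python
def difference_spectrum(complex_signal, dna_signal):
    """Subtract the free-DNA spectrum from the complex spectrum, point by point."""
    if len(complex_signal) != len(dna_signal):
        raise ValueError("spectra must be sampled at the same wavelengths")
    return [c - d for c, d in zip(complex_signal, dna_signal)]


# Invented values for illustration only (arbitrary units).
wavelengths_nm = [205, 222]
protein = [-10.0, -8.0]           # free protein
dna = [2.0, 1.0]                  # free DNA
complex_with_dna = [-10.5, -9.5]  # protein-DNA complex

diff = difference_spectrum(complex_with_dna, dna)

# A negative change means the minimum is deeper than in the free protein.
change = [d - p for d, p in zip(diff, protein)]
for wavelength, delta in zip(wavelengths_nm, change):
    print(f"{wavelength} nm: difference spectrum minus protein spectrum = {delta:+.1f}")
```

In this toy case the difference spectrum is more negative than the protein spectrum at both wavelengths, the pattern the text associates with specific DNA; values near zero would correspond to the non-specific case.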

Discussion

The 2.2 Å resolution crystal structure of engrailed bound to DNA confirms the main observations made in the 2.8 Å resolution structure. Key base-specific contacts are provided by Ile47, Gln50 and Asn51 in the major groove, and by Arg5 in the minor groove. As in all other homeodomain crystal structures, the highly conserved Asn51 makes a pair of hydrogen bonds to A3 (TAAT; Wolberger, 1996). In addition, Ile47 and Gln50 make van der Waals contacts with the bases. Numerous phosphate contacts help stabilize the complex and help define the characteristic docking arrangement of the homeodomain.

The role of water molecules

In this high resolution structure, we can see an extensive network of water molecules at the protein-DNA interface. Water molecules mediate contacts from the protein to the DNA bases and phosphates. Five of the water molecules at the interface form a clathrate around the Cβ of Ala54, and this set of waters is further stabilized by hydrogen bonds to the protein and the DNA. At this stage, it is not known how these water-mediated contacts contribute to specificity and affinity of the homeodomain. Elegant studies of the trp-repressor have provided evidence that water-mediated contacts can be highly specific (Joachimiak et al., 1994). However, for engrailed, the evidence suggests that at least two of the water-mediated contacts, involving base-pairs 5 and 7, are not significant determinants of specificity. The water-mediated contact from Gln50 to T5 seems to contribute little to specificity as both wild-type and Q50A engrailed prefer thymine at this position. Selection experiments with engrailed also demonstrate very little specificity at position 7 (Ades & Sauer, 1994).

Although the functional significance of the water structure is not yet clear, it is interesting to compare the arrangement of waters seen near base-pair 4 in various homeodomain-DNA complexes. Wilson et al. (1995) noted that the water-mediated contact to T4 observed in the paired homeodomain may augment the specificity provided by the van der Waals interaction of Ile47. NMR studies of Antennapedia also had noted a slowly exchanging water in the vicinity of Ile47 (Qian et al., 1993b). In the paired homeodomain, as in our refined engrailed structure, all three polar atoms in the major groove at base-pair 4 hydrogen bond to water. In both structures, the water which contacts the O4 of T4 is stabilized by hydrogen bonds to Gln50 and Asn51, while the pair of water molecules contacting the N6 and N7 positions of A4 is part of the clathrate around the Cβ of Ala54. In the engrailed Q50K variant, the water-mediated contact to the O4 of thymine is lost, but the pair of water molecules contacting A4 still is present. The Antennapedia (Fraenkel & Pabo, 1998) and even-skipped (Hirsch & Aggarwal, 1995) homeodomains, which have Met at position 54, do not have waters contacting A4, but still have the water-mediated contact to thymine.

The role of variable positions of the recognition helix

Despite the significant change in specificity that results from a Gln50 to Lys mutation, comparing these structures shows that there are relatively few changes at the protein-DNA interface. Direct examination of electron density difference maps as well as comparison of the refined structures for wild-type and Q50K engrailed lead to the same conclusion: the complexes are virtually identical except for the immediate sites of mutations in the protein and DNA (one side-chain and two base-pairs are changed). The structural conservation of other regions includes not only key amino acid residues which contact the DNA, but even conserved water molecules at the protein-DNA interface. At this stage, we do not know whether other mutations could be accommodated as easily as Q50K or whether this variant binds so well precisely because it requires few other changes in the protein-DNA contacts.

In thinking about recognition, it is interesting to compare the range of sequence variation seen at position 50 of the homeodomain with the degree of sequence variation at other key positions in the recognition helix. It appears that the structural framework provided by the homeodomain allows for a limited repertoire of mutations that can alter specificity for flanking DNA without adversely affecting interactions with the core of the binding site.
Mutations of Asn51 are highly disruptive, and no other side-chain at this position can provide the same degree of specificity and affinity as asparagine (Pomerantz & Sharp, 1994). Position 50, in contrast, can tolerate a number of side-chains. Some amino acid residues, such as Lys, form stable highly specific interactions with the DNA bases, while others such as Gln do not. (The Q50K variant of engrailed still is the only structure that provides clear evidence for stable hydrogen bonds between the side-chain of residue 50 and the bases.) Position 54 is also on the face of the recognition helix which points toward the DNA. Engrailed has Ala at position 54, and we find several water molecules that form a clathrate around the methyl group and interact with polar atoms on the protein and DNA. Other amino acid residues at this position have more direct contacts to the DNA. Arg54 in α2 (Wolberger et al., 1991) and Gln54 in Pit-1 (Jacobson et al., 1997) form hydrogen bonds to base-pair 4; Met54 makes van der Waals contacts to the bases in A1 (Li et al., 1995) and even-skipped (Hirsch & Aggarwal, 1995), and contacts the sugar-phosphate backbone in a1 and Antennapedia (Fraenkel & Pabo, 1998). Tyr54 makes van der Waals contacts to base-pair 4 in the NMR structure of the vnd/NK-2 homeodomain (Gruschus et al., 1997). In the NK class of homeodomains, Tyr54 is a primary determinant of the preference for guanine at position 4 (Gruschus et al., 1997), and mutating position 54 of engrailed or Antennapedia to tyrosine alters the specificity of these homeodomains at base-pair 4 (Damante et al., 1996).

Key residues in the recognition helix are preorganized for binding

Comparing the crystal structures of the free and bound engrailed homeodomain not only shows changes at the N and C termini (discussed below), but also shows that key side-chains of the recognition helix (Ile47, Gln50 and Asn51) are preorganized for binding: they have virtually the same conformation in the free and bound structures. Even the water molecule which hydrogen bonds to Gln50, Asn51 and T4 in the complex is visible in the absence of DNA. Figure 6A compares these regions of the two structures and shows the conserved waters, while Figure 6B suggests how these conformations are stabilized: the three side-chains are clearly close together and have some van der Waals contact. In addition, the helical conformation of the backbone puts severe constraints on Ile47. Both Cγ1 and Cδ1 make van der Waals contact with the carbonyl oxygen of residue 43 (3.3 Å and 3.5 Å, respectively). In fact, the χ1 rotamer of Ile47 found in the free protein is the same as that of 90% of all isoleucine residues with similar backbone conformation in a database of high resolution structures (Dunbrack & Cohen, 1997).
The combination of mutual packing constraints, preferred rotamer conformations for isoleucine in an α helix, and the formation of hydrogen bonds from both Gln50 and Asn51 to a water molecule preconfigures these residues for interaction with DNA. Comparisons with other crystal structures show that closely related side-chain conformations occur in the Antennapedia-DNA and paired-DNA complexes.

Flexibility in the homeodomain

Comparing the crystal structures of the free and bound engrailed homeodomain shows that the C-terminal end of the recognition helix becomes better ordered on binding DNA (Clarke et al., 1994), and CD experiments provide further evidence for a DNA-induced structural change. Examination of the crystal structure readily explains this effect, revealing a set of DNA contacts that stabilize the C-terminal end of the recognition helix, thus allowing the helix to extend as the protein binds DNA. Specifically, lysine residues at positions 55, 57 and 58 form hydrogen bonds and/or electrostatic interactions with the DNA phosphate groups. Although the density is poor for these side-chains, in our model Lys55 hydrogen bonds to the phosphate of T1, Lys57 hydrogen bonds to the phosphate of A5, and Lys58 hydrogen bonds to a network of ordered water molecules at the protein-DNA interface. Electrostatic interactions clearly favor extension of the helix, since folding brings all three lysine residues near the phosphate backbone. The role of these residues is also supported by sequence comparisons which show that these positions are strongly conserved as lysine or arginine. In a comparison of 346 homeodomains (Bürglin, 1994) position 55 is 93% lysine and 7% arginine, position 57 is 84% lysine and 15% arginine, and position 58 is 62% lysine and 33% arginine.

We note that the flexibility of the homeodomain termini may play an important role in facilitating protein-protein contacts that contribute to specificity, and crystal structures of higher-order complexes have revealed how these regions can form protein-protein interfaces. Thus, the paired homeodomain uses its N-terminal arm to form a homodimeric interface; in the structure of a Pit-1 POU domain dimer (Jacobson et al., 1997), the C-terminal end of the POU-homeodomain recognition helix is one turn shorter than in other homeodomains and the non-helical residues make extensive contacts with the POU-specific domain of the dimeric partner. Flexible regions just outside of the canonical 60 amino acid homeodomain also play an important role in protein-protein interactions. A flanking C-terminal region of the α2 homeodomain interacts with an a1 homeodomain bound to a neighboring site (Li et al., 1995), and sequences immediately prior to the N-terminal arm of α2 fold into a β sheet when forming a higher-order complex with MCM1 (Tan & Richmond, 1998).
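The conservation figures quoted above for positions 55, 57 and 58 can be tallied to show how often each position carries a basic residue (lysine or arginine). The short Python sketch below only re-adds the percentages given in the text; the dictionary and variable names are illustrative and are not taken from the original analysis.

```python
# Percentages quoted in the text for 346 homeodomains (Bürglin, 1994).
# The data structure and names are illustrative assumptions.
CONSERVATION = {
    55: {"Lys": 93, "Arg": 7},
    57: {"Lys": 84, "Arg": 15},
    58: {"Lys": 62, "Arg": 33},
}

# Fraction of sequences with a positively charged side-chain at each position.
basic_percent = {pos: sum(counts.values()) for pos, counts in CONSERVATION.items()}

for pos in sorted(basic_percent):
    print(f"position {pos}: {basic_percent[pos]}% Lys or Arg")
```

On these numbers, all three positions carry a basic residue in at least 95% of the sequences compared, which is consistent with a conserved electrostatic role at the phosphate backbone.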

Conclusions

The refined structure of the wild-type engrailed homeodomain bound to DNA provides important new information about the homeodomain-DNA interface, and allows a detailed comparison with the structures of the free protein and the Q50K variant. Comparing the structure of the protein in the presence and absence of DNA (Clarke et al., 1994) shows that the amino and carboxy termini of the homeodomain rearrange on binding. However, our high-resolution comparison also shows that key side-chains at the interface (Ile47, Gln50 and Asn51) are pre-oriented for recognition, consistent with the conserved conformation of these residues in the structures of engrailed, Antennapedia and paired. Comparison of wild-type and Q50K engrailed protein-DNA complexes shows that the new residue at position 50 and the new base-pairs at positions 5 and 6 are accommodated with only very minor changes at other regions in the structure. Lys50 plays a critical role in recognition, but our structure confirms that the contacts of engrailed Gln50 are much more modest. These contacts seem much less favorable than the Asn51-adenine contact, and mutagenesis confirms that Gln50 is less important for recognition (Ades & Sauer, 1994). Our high-resolution structure of the engrailed-DNA complex should provide a foundation for computational efforts to understand the energetics of homeodomain-DNA interactions and should prove useful for systematic comparisons with other homeodomain-DNA complexes intended to extract the common structural principles that characterize this family of proteins.

Materials and Methods

Crystallization

Protein and DNA were purified essentially as described (Ades & Sauer, 1994; Klemm et al., 1994). Crystals were obtained by the hanging drop vapor diffusion method at room temperature and grew in space group C2 with unit cell dimensions a = 130.6 Å, b = 45.5 Å, c = 72.9 Å and β = 118.3°. To prepare crystals, engrailed (17.4 mg/ml in 30 mM bis-Tris-propane (pH 7.0)) was mixed with a 1.2-fold molar excess of DNA in ammonium acetate for a final concentration of 10 mM protein and 1 M ammonium acetate. Two microliters of complex were suspended above a well containing 1 ml of 1% (v/v) PEG400 and 0.25 M ammonium acetate. After the crystal had grown to full size (approximately 0.3 mm × 0.3 mm × 0.45 mm) the coverslip was removed and suspended over 1 ml of 0.05 M ammonium acetate and 1% PEG400 to allow most of the remaining ammonium acetate to diffuse out of the crystal.

Data collection and refinement

All data were collected at room temperature using an R-AXIS IIC imaging plate system. Image plate data were reduced with Denzo and Scalepack (Otwinowski & Minor, 1996) with a step in between to pre-merge the partials (M. Rould). The starting model for refinement was the 2.8 Å resolution structure of engrailed (PDB accession code 1HDD; Kissinger et al., 1990) with positions 3, 5, 47, 50 and 51 initially modeled as alanine to reduce model bias at these key residues. The model was improved with multiple rounds of rebuilding into 2|Fo| − |Fc| maps using the program O (Jones et al., 1991), followed by either simulated annealing or energy minimization in XPLOR (Brünger, 1992a). The free R factor was monitored throughout the refinement to avoid overfitting the data (Brünger, 1992b). The model was checked by using simulated annealing omit maps (starting at 2500 K) and PROCHECK (Laskowski et al., 1993).
The final model includes residues 5 to 59 of monomer 1 (which binds to the optimal site), residues 3 to 58 of monomer 2 (which binds to the suboptimal site), and all of the DNA. Refinement used a bulk solvent correction; overall anisotropic B (tensor elements B11 = 2.440, B22 = 17.934, B33 = 1.983, B13 = −5.300 in Å²); local scaling of observed and calculated structure factors to correct for absorption effects (Rould, 1997); and restrained individual B-factors. Using all the data from 20.0 to 2.2 Å, the final free R is 23.6% and the crystallographic R is 21.2% (Table 1). Coordinates have been deposited with the Brookhaven Data Bank (accession code 3HDD).
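For readers less familiar with the two statistics quoted here, the following Python sketch evaluates the conventional R factor, R = Σ||Fo| − |Fc|| / Σ|Fo|, once over a working set and once over a held-out test set. It is only an illustration of the standard definition: the amplitudes are invented toy values, the function name `r_factor` is hypothetical, and no scale factor, bulk solvent or anisotropic correction is applied, so it does not reproduce the XPLOR refinement described above.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor: sum(||Fo| - |Fc||) / sum(|Fo|).

    Assumes f_calc is already on the same scale as f_obs.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Toy amplitudes (assumed values, not data from this study).
work_obs, work_calc = [100.0, 50.0, 80.0, 20.0], [90.0, 55.0, 70.0, 25.0]
test_obs, test_calc = [60.0, 40.0], [45.0, 50.0]

# The "working" reflections are used in refinement; the "test" reflections
# are set aside beforehand and used only for cross-validation.
r_cryst = r_factor(work_obs, work_calc)
r_free = r_factor(test_obs, test_calc)
print(f"R = {r_cryst:.3f}, free R = {r_free:.3f}")
```

Because the test reflections never enter the refinement target, a free R that stays close to the crystallographic R (here 23.6% against 21.2%) is the usual indication that the model has not been overfitted.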

Table 1. Data collection and refinement statistics

A. Data collection
  Resolution (Å)                                              20–2.2
  Measured reflections                                        47,760
  Unique reflections                                          18,176
  Completeness to 2.2 Å (%)                                   94
  Completeness in 2.30–2.20 Å shell (%)                       70
  Rmerge^a (%)                                                4.2

B. Refinement
  Rcryst^b (%)                                                21.2
  Rfree^c (%)                                                 23.6
  rms deviation of bond lengths (Å)                           0.008
  rms deviation of bond angles (°)                            1.210
  Number of water molecules                                   53
  Number of non-hydrogen atoms (including water molecules)    1864
  rms B values^d (Å²)                                         3.5

a $R_{\mathrm{merge}} = \sum |I - \langle I \rangle| / \sum I$, where $I$ = observed intensity and $\langle I \rangle$ = average intensity of multiple observations of symmetry-related reflections.
b The crystallographic R factor uses all data from 20.0–2.2 Å.
c Rfree is calculated using 1802 reflections which were set aside for cross-validation prior to refinement.
d rms B is the root mean square difference between temperature factors of covalently bonded atoms.
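The definition of Rmerge in footnote a can be written out as a short Python sketch. The intensities below are invented toy values, the function name `r_merge` is hypothetical, and the treatment of singly measured reflections (skipped here) is an assumption, since conventions differ between programs; this is not the Scalepack implementation. The last two lines simply derive the average multiplicity and the test-set fraction from the reflection counts reported in Table 1 and footnote c.

```python
from collections import defaultdict


def r_merge(observations):
    """Rmerge = sum|I - <I>| / sum(I) over repeated measurements.

    `observations` is an iterable of (unique_hkl, intensity) pairs in which
    symmetry-related reflections already share the same index.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue  # assumption: reflections measured once are skipped
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Toy data (assumed values): two unique reflections, each measured repeatedly.
toy = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
       ((0, 1, 0), 50.0), ((0, 1, 0), 48.0), ((0, 1, 0), 52.0)]
toy_r_merge = r_merge(toy)

# Derived from the counts in Table 1 and footnote c.
multiplicity = 47760 / 18176      # measured / unique reflections
test_fraction = 1802 / 18176      # reflections reserved for Rfree
print(f"toy Rmerge = {toy_r_merge:.4f}")
print(f"multiplicity = {multiplicity:.2f}, test fraction = {test_fraction:.3f}")
```

From the reported counts, each unique reflection was measured about 2.6 times on average, and roughly 10% of the unique reflections were reserved for cross-validation.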

CD experiments

Antennapedia protein was prepared as described (Fraenkel & Pabo, 1998). Oligonucleotides for three 14 base-pair blunt-ended duplexes were synthesized and gel purified on Tris-borate-EDTA containing gels with urea. The sequences of the top strand of these duplexes are shown below: Antennapedia oligonucleotide, 5′-CTCTAATGGCTTTC-3′; engrailed oligonucleotide, 5′-CTCTAATTACTTTC-3′; non-specific oligonucleotide, 5′-CTCTGCTAGCTGTC-3′. Circular dichroism spectra were measured on an AVIV 60DS spectropolarimeter at 37 °C in 1 nm steps, for 15 seconds per step. All spectra were measured twice; the results were averaged and corrected for baseline noise. Protein, DNA and complex samples were prepared at 25 mM in 50 mM potassium phosphate (pH 7.0), 100 mM potassium chloride.
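To make the relationship between the three duplexes easier to see, the sketch below lists the 1-based positions at which the top strands differ and derives the complementary strand of a blunt-ended duplex. The sequences are copied from the text; the helper names `differing_positions` and `reverse_complement` are illustrative and not part of the original work.

```python
# Top-strand sequences (5' to 3') as given in the text.
OLIGOS = {
    "Antennapedia": "CTCTAATGGCTTTC",
    "engrailed": "CTCTAATTACTTTC",
    "non-specific": "CTCTGCTAGCTGTC",
}


def differing_positions(seq_a, seq_b):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def reverse_complement(seq):
    """Bottom strand of a blunt-ended duplex, written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


antp_vs_en = differing_positions(OLIGOS["Antennapedia"], OLIGOS["engrailed"])
en_vs_ns = differing_positions(OLIGOS["engrailed"], OLIGOS["non-specific"])
en_bottom = reverse_complement(OLIGOS["engrailed"])

print("Antennapedia vs engrailed:", antp_vs_en)
print("engrailed vs non-specific:", en_vs_ns)
print("engrailed bottom strand:  ", en_bottom)
```

The Antennapedia and engrailed duplexes differ at only two adjacent positions, immediately 3′ of the shared TAAT core, while the non-specific duplex disrupts the core itself.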

Acknowledgements

This work was supported by National Institutes of Health grant GM-31471 (C.O.P.) and by the Howard Hughes Medical Institute. Crystallographic data were collected with equipment purchased with support from the PEW Charitable Trusts. We thank Timothy Benson and Joel Pomerantz for helpful discussions, and Ehud Gazit for assistance in CD experiments.

References

Ades, S. E. & Sauer, R. T. (1994). Differential DNA-binding specificity of the engrailed homeodomain: the role of residue 50. Biochemistry, 33, 9187–9194.
Brünger, A. T. (1992a). XPLOR Manual Version 3.1, Yale University Press, New Haven, CT.
Brünger, A. T. (1992b). The free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature, 355, 472–474.
Bürglin, T. R. (1994). A comprehensive classification of homeobox genes. In Guidebook to the Homeobox Genes (Duboule, D., ed.), Oxford University Press, Oxford, England.
Clarke, N. D., Kissinger, C. R., Desjarlais, J., Gilliland, G. L. & Pabo, C. O. (1994). Structural studies of the engrailed homeodomain. Protein Sci. 3, 1779–1787.
Damante, G., Pellizzari, L., Esposito, G., Fogolari, F., Viglino, P., Fabbro, D., Tell, G., Formisano, S. & Lauro, R. D. (1996). A molecular code dictates sequence-specific DNA recognition by homeodomains. EMBO J. 15, 4992–5000.
Dunbrack, R. L. & Cohen, F. E. (1997). Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Sci. 6, 1661–1681.
Fraenkel, E. & Pabo, C. O. (1998). Comparison of the X-ray and NMR structures for the Antennapedia homeodomain-DNA complex. Nature Struct. Biol. 5, 692–697.
Gehring, W. J., Affolter, M. & Bürglin, T. (1994). Homeodomain proteins. Annu. Rev. Biochem. 63, 487–526.
Gruschus, J. M., Tsao, D. H. H., Wang, L.-H., Nirenberg, M. & Ferretti, J. A. (1997). Interactions of the vnd/NK-2 homeodomain with DNA by nuclear magnetic resonance spectroscopy: basis of binding specificity. Biochemistry, 36, 5372–5380.
Hanes, S. D., Riddihough, G., Ish-Horowicz, D. & Brent, R. (1994). Specific DNA recognition and intersite spacing are critical for action of the bicoid morphogen. Mol. Cell. Biol. 14, 3364–3375.
Hirsch, J. A. & Aggarwal, A. K. (1995). Structure of the even-skipped homeodomain complexed to AT-rich DNA: new perspectives on homeodomain specificity. EMBO J. 14, 6280–6291.
Jacobson, E. M., Li, P., Leon-del-Rio, A., Rosenfeld, M. G. & Aggarwal, A. K. (1997). Structure of Pit-1 POU domain bound to DNA as a dimer: unexpected arrangement and flexibility. Genes Dev. 11, 198–212.
Joachimiak, A., Haran, T. E. & Sigler, P. B. (1994). Mutagenesis supports water mediated recognition in the trp repressor-operator system. EMBO J. 13, 367–372.
Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods of building protein models in electron density maps and the location of errors in these models. Acta Crystallog. sect. A, 47, 110–119.
Kissinger, C. R., Liu, B., Martin-Bianco, E., Kornberg, T. B. & Pabo, C. O. (1990). Crystal structure of an engrailed homeodomain-DNA complex at 2.8 Å resolution: a framework for understanding homeodomain-DNA interactions. Cell, 63, 579–590.
Klemm, J. D., Rould, M. A., Aurora, R., Herr, W. & Pabo, C. O. (1994). Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell, 77, 21–32.
Kornberg, T. B. (1993). Understanding the homeodomain. J. Biol. Chem. 268, 26813–26816.
Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallog. 26, 283–291.
Laughon, A. (1991). DNA binding specificity of homeodomains. Biochemistry, 30, 11357–11367.
Li, T., Stark, M. R., Johnson, A. D. & Wolberger, C. (1995). Crystal structure of the MATa1/MAT alpha2 homeodomain heterodimer bound to DNA. Science, 270, 262–269.
Otwinowski, Z. & Minor, W. (1996). Processing of X-ray diffraction data collected in oscillation mode. In Methods in Enzymology (Carter, C. W. & Sweet, R. M., eds), vol. 276, pp. 307–326, Academic Press, New York, NY.
Pomerantz, J. L. & Sharp, P. A. (1994). Homeodomain determinants of major groove recognition. Biochemistry, 33, 10851–10858.
Qian, Y. Q., Otting, G., Billeter, M., Müller, M., Gehring, W. & Wüthrich, K. (1993a). Nuclear magnetic resonance spectroscopy of a DNA complex with the uniformly 13C-labeled Antennapedia homeodomain and structure determination of the DNA-bound homeodomain. J. Mol. Biol. 234, 1070–1083.
Qian, Y. Q., Otting, G. & Wüthrich, K. (1993b). NMR detection of hydration water in the intermolecular interface of a protein-DNA complex. J. Am. Chem. Soc. 115, 1189–1190.
Rould, M. A. (1997). Screening for heavy-atom derivatives and obtaining accurate isomorphous differences. Methods Enzymol. 276, 461–472.
Schier, A. F. & Gehring, W. J. (1992). Direct homeodomain-DNA interaction in the autoregulation of the fushi tarazu gene. Nature, 356, 804–807.
Tan, S. & Richmond, T. J. (1998). Crystal structure of the yeast MATα2/MCM1/DNA ternary complex. Nature, 391, 660–666.
Tsao, D. H., Gruschus, J. M., Wang, L. H., Nirenberg, M. & Ferretti, J. A. (1994). Elongation of helix III of the NK-2 homeodomain upon binding to DNA: a secondary structure study by NMR. Biochemistry, 33, 15053–15060.
Tucker-Kellogg, L., Rould, M. A., Chambers, K. A., Ades, S. E., Sauer, R. T. & Pabo, C. O. (1997). Engrailed (Gln50 → Lys) homeodomain-DNA complex at 1.9 Å resolution: structural basis for enhanced affinity and altered specificity. Structure, 5, 1047–1054.
Wilson, D. S., Guenther, B., Desplan, C. & Kuriyan, J. (1995). High resolution crystal structure of a paired (Pax) class cooperative homeodomain dimer on DNA. Cell, 82, 709–719.
Wolberger, C. (1996). Homeodomain interactions. Curr. Opin. Struct. Biol. 6, 62–68.
Wolberger, C., Vershon, A. K., Liu, B., Johnson, A. D. & Pabo, C. O. (1991). Crystal structure of a MAT alpha2 homeodomain-operator complex suggests a general model for homeodomain-DNA interactions. Cell, 67, 517–528.

Edited by T. Richmond (Received 30 April 1998; received in revised form 21 July 1998; accepted 29 July 1998)