doi:10.1016/j.jmb.2005.02.065
J. Mol. Biol. (2005) 348, 253–264
Solution Structure of the Major DNA-binding Domain of Arabidopsis thaliana Ethylene-insensitive3-like3

Kazuhiko Yamasaki1,2*, Takanori Kigawa2, Makoto Inoue2, Tomoko Yamasaki1, Takashi Yabuki2, Masaaki Aoki2, Eiko Seki2, Takayoshi Matsuda2, Yasuko Tomo2, Takaho Terada2,3, Mikako Shirouzu2,3, Akiko Tanaka2, Motoaki Seki4,5, Kazuo Shinozaki4,5,6 and Shigeyuki Yokoyama2,3,7

1 Age Dimension Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Higashi, Tsukuba 305-8566, Japan
2 Protein Research Group, RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Yokohama 230-0045, Japan
3 RIKEN Harima Institute at SPring-8, 1-1-1 Kouto, Mikazuki-cho, Sayo, Hyogo 679-5148, Japan
Ethylene-insensitive3 (EIN3) and EIN3-like (EIL) proteins are essential transcription factors in the ethylene signaling of higher plants. The EIN3/EIL proteins bind to the promoter regions of the downstream genes and regulate their expression. The location of the DNA-binding domain (DBD) in the primary structure was unclear, since the proteins show no sequence similarity to other known DBDs. Here, we identify the major DBD of an EIN3/EIL protein, Arabidopsis thaliana EIL3, containing a key mutational site for DNA binding and signaling (ein3-3 site), and determine its solution structure by NMR spectroscopy. The structure consists of five α-helices, possessing a novel fold dissimilar to known DBD structures. By a chemical-shift perturbation analysis, a region including the ein3-3 site is suggested to be involved in DNA binding. © 2005 Elsevier Ltd. All rights reserved.
4 Laboratory of Plant Molecular Biology, RIKEN Tsukuba Institute, 3-1-1 Koyadai, Tsukuba 305-0074, Japan
5 Plant Functional Genomics Research Group, RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Yokohama 230-0045, Japan
6 Institute of Biological Sciences, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba 305-8572, Japan
7 Department of Biophysics and Biochemistry, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
*Corresponding author
Keywords: Arabidopsis thaliana; ethylene signaling; transcription factor; DNA-binding domain; NMR; solution structure
Abbreviations used: AtEIL, Arabidopsis thaliana EIL; AtEIN3, Arabidopsis thaliana EIN3; DBD, DNA-binding domain; DSS, sodium 2,2-dimethyl-2-silapentane-5-sulfonate; EIL, EIN3-like; EIN3, Ethylene-insensitive 3; EMSA, electrophoretic mobility-shift assay; SPR, surface plasmon resonance; TEIL, tobacco EIN3-like. E-mail address of the corresponding author:
[email protected]
0022-2836/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
Introduction

The gaseous phytohormone ethylene regulates various processes related to growth, development, and stress response of higher plants.1 A largely linear pathway of ethylene signaling was revealed by a number of molecular genetic studies.2 In the pathway, Ethylene-insensitive3 (EIN3) and EIN3-like (EIL) proteins are the key transcription factors.3 The EIN3/EIL proteins were shown to regulate expression of the GCC-box-binding transcription factors, ethylene-response factors.4 It is known that the phenotype of the Arabidopsis thaliana EIN3 (AtEIN3)-deficient mutants, e.g. long seedlings in the presence of ethylene or its precursor, is suppressed by overexpression of Arabidopsis EIL1 (AtEIL1) or EIL2 (AtEIL2), indicating the functional equivalence of these proteins.3

The EIN3/EIL proteins, currently identified in many other plant species, are highly homologous to one another in their primary sequences of ~600 amino acid residues, especially in their N-terminal half. Sequence-specific DNA-binding activities of AtEIN3, AtEIL1, AtEIL2, and tobacco EIN3-like (TEIL) proteins have been demonstrated.4,5 A series of deletion mutants were tested for DNA binding, and the N-terminal half, except for the first ~80 residues, was found to be indispensable for the activity.4,5 A single mutation at Lys245 of AtEIN3 (ein3-3 mutational site) located in this region impairs the DNA-binding activity and, thereby, interrupts the ethylene-signaling pathway.3,4 Thus, the region around this site is likely to act as the major DNA-binding domain (DBD), although the boundary of the domain was unclear because the region shows no sequence similarity to other known DBDs.3

In the present study, a region of A. thaliana EIL3 (AtEIL3) of ~130 amino acid residues containing the ein3-3 site was isolated as a soluble domain, and its three-dimensional structure was determined by NMR spectroscopy. The structure consists of five α-helices packing together into a globular shape as a whole, revealing a novel fold.
Results and Discussion

Major DNA-binding domain

Most DBDs are basic and bind to acidic DNAs. In the primary sequences of the EIN3/EIL proteins, five basic clusters, each containing from five to eight Arg or Lys residues, were identified.3 The positions in the AtEIL3 sequence are 42–55 (basic cluster I, BC I), 79–85 (BC II), 225–235 (BC III), 252–261 (BC IV), and 365–371 (BC V). Among these, BC I and BC V are not required for the DNA-binding activity, as shown by experiments with deletion mutants.4,5 In contrast, BC III includes the ein3-3 site, Lys232 (Lys245 of AtEIN3), where replacement by Asn impairs the DNA-binding and signal transduction activities.3 Also, deletion of BC IV abolished the
DNA-binding Domain of Arabidopsis EIN3-like 3
DNA-binding activity of AtEIN3.4 A sequence alignment of the EIN3/EIL proteins (Figure 1(a)) revealed a continuous stretch of conserved residues spanning ~130 residues in length, which contains BC III and BC IV, as well as the proline-rich region.3 This continuous region possesses an amino acid identity of 50.4% in the sequence alignment shown in Figure 1(a), and is highly basic (isoelectric point of 9.9), possessing 16 conserved basic residues including ten in BC III and BC IV. Therefore, we consider this region as the candidate for the major DBD. In addition to this region, however, a sequence alignment detected another highly basic (isoelectric point of 10.0), but much shorter, stretch of conserved residues including BC II, that is, Asp76–Glu149 of AtEIL3 (data not shown). The possibility that this region contributes to DNA binding cannot be excluded. The major DBD region was then tested for expression and solubility as an isolated domain, by a high-throughput cell-free system (Figure 1(b)).6–8 Considering ambiguity in the boundary of the domain, 12 protein fragments with different lengths were tested. Although all the fragments were expressed, the expression levels were significantly different. In particular, when the N terminus starts at Ala172 or at a more C-terminal residue, the expression level is reduced and the expressed proteins are rather insoluble. Among the fragments that were well expressed in a soluble form, fragment Ser162–Gln288 is the shortest. Therefore, in the present study, we used this fragment as the major DBD of AtEIL3 (EIL3-DBDM). This EIL3-DBDM protein, produced in a large-scale cell-free expression system, exists in a monomeric form, as shown by a gel-filtration experiment (data not shown). The DNA-binding activity of EIL3-DBDM was demonstrated by surface plasmon resonance (SPR) (Figure 2).
It was shown that EIL3-DBDM binds to a double-stranded DNA containing the recognition consensus sequence for TEIL, AYGWAYCT, where Y stands for C or T and W stands for A or T,5 with an estimated equilibrium binding constant of 1.7×10⁷ M⁻¹. In contrast, EIL3-DBDM binds much more weakly to a DNA containing a mutated consensus sequence, ATTTATCT, or does not bind significantly to those of totally unrelated sequences (Figure 2). This indicates that EIL3-DBDM retains sequence-specific binding ability. It was reported that AtEIL3 did not show significant binding to a DNA containing the EIN3-binding site in an electrophoretic mobility-shift assay (EMSA), although the AtEIN3, AtEIL1, and AtEIL2 proteins showed stable binding,4 which appears to be contradictory to the present observations. It should be noted, however, that the EMSA experiment requires formation of a complex that is stable during the period of the electrophoresis, whereas the SPR experiment detected less stable binding, including that with a dissociation rate constant of ~0.04 s⁻¹ in the present case. Therefore, the results from the previous EMSA experiment4 and the
Figure 1. (a) Sequence alignment of the DNA-binding domains of the EIN3/EIL proteins produced by the Clustal X program.33 Sequences of the EIN3/EIL proteins from Arabidopsis thaliana (AtEIN3, AtEIL1, AtEIL2, and AtEIL3), Nicotiana tabacum (TEIL), Zea mays (ZmEIN3), and Phalaenopsis equestris (PeEIN3) were obtained from the NCBI database (http://www.ncbi.nlm.nih.gov). Entry codes are AAC49749 for AtEIN3, BAA74714 for TEIL, AAC49746 for AtEIL1, CAC32395 for ZmEIN3, CAC87091 for PeEIN3, AAC49748 for AtEIL3, and AAC49747 for AtEIL2. Numbers above the sequences are for AtEIL3, while those of the first residues of the aligned sequences of the individual proteins are indicated beside the sequences. Basic (Arg and Lys) and acidic (Asp and Glu) residues conserved in six or more of the seven proteins presented here are colored cyan and magenta, respectively, while aliphatic (Ile, Leu, Met, and Val) and aromatic (His, Phe, Trp, and Tyr) residues conserved in six or more are colored green. Colored boxes above the sequence alignment indicate regions of the helices of EIL3-DBDM. Below the sequence, identical and similar residues are marked as produced by the CLUSTAL X program.33 Characteristic regions defined by Chao et al. are indicated by horizontal bars.3 The arrow indicates the ein3-3 mutational site. (b) Expression and solubility of the different AtEIL3 fragments, as shown in an SDS/polyacrylamide gel. Terminal residues of the fragments are shown above the lanes. Total proteins after the cell-free expression were loaded in the lanes labeled T, while supernatants after the centrifugation were loaded in the lanes labeled S. When the bands in the T and S lanes have similar intensities (e.g. fragment 152–288), the protein fragment is mostly soluble. In contrast, when the band in the S lane is less intense than that in the T lane (e.g. fragment 172–288), the protein is rather insoluble.
present SPR experiment on the AtEIL3 protein are not inconsistent. The DNA sequence of the EIN3-binding site used in the EMSA experiment contains the sequence GGATTCAAGGGGCATGTATCTTGAATCC (a pair of palindromic sequences are underlined). A series of base substitution experiments revealed that the central region excluding much of the palindromic sequences is important in the binding of AtEIN3.4 Supposing that the recognition
Figure 2. DNA binding of EIL3-DBDM observed by surface plasmon resonance (SPR). Left: SPR difference sensorgrams (the responses in the control flow-cell were subtracted) for binding of EIL3-DBDM to an 18mer double-stranded DNA, 5′-GGGCATGTATCTTGAATC-3′/5′-GATTCAAGATACATGCCC-3′ (the consensus recognition sequence5 is underlined), where the protein concentrations are 5 nM, 10 nM, 20 nM, 50 nM, 100 nM, 200 nM, 500 nM, and 1 μM (indicated beside the sensorgram lines). The protein solutions were injected during a period of 0–300 seconds. Right: Maximum response values as functions of the protein concentration. The binding profiles of EIL3-DBDM to the double-stranded 18mer DNA (as above) (filled circles), another 18mer DNA containing a mutated consensus recognition sequence, 5′-GGGCATTTATCTTGAATC-3′/5′-GATTCAAGATAAATGCCC-3′ (the TEIL recognition sequence except for the mutated base is underlined) (filled squares), or to 16mer and 25mer DNAs with unrelated sequence, 5′-CGATCACCTGAGGCTG-3′/5′-CAGCCTCAGGTGATCG-3′ (open circles) and 5′-TCTTTAATTTCTAATATATTTAGAA-3′/5′-TTCTAAATATATTAGAAATTAAAGA-3′ (open squares) are presented. Curves fit to the simple 1 : 1 binding model are shown.
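The simple 1 : 1 binding model named in the caption can be illustrated numerically. The Python sketch below is not the fitting procedure used for Figure 2; it only assumes the two values reported in the text (an equilibrium binding constant of 1.7×10⁷ M⁻¹ and a dissociation rate constant of about 0.04 s⁻¹) and derives the quantities they imply for an ideal 1 : 1 interaction. All function and variable names are hypothetical.

```python
import math

# Values quoted in the text (the k_off value is approximate).
K_A = 1.7e7    # equilibrium (association) binding constant, M^-1
K_OFF = 0.04   # dissociation rate constant, s^-1


def dissociation_constant(k_a: float) -> float:
    """K_D = 1 / K_A for an ideal 1:1 interaction (in M)."""
    return 1.0 / k_a


def association_rate(k_a: float, k_off: float) -> float:
    """k_on = K_A * k_off for an ideal 1:1 interaction (in M^-1 s^-1)."""
    return k_a * k_off


def complex_half_life(k_off: float) -> float:
    """Half-life of the complex for first-order dissociation (in s)."""
    return math.log(2.0) / k_off


def fractional_saturation(conc_molar: float, k_a: float) -> float:
    """Langmuir isotherm: fraction of DNA sites occupied at a given
    free protein concentration."""
    return k_a * conc_molar / (1.0 + k_a * conc_molar)


if __name__ == "__main__":
    print(f"K_D   = {dissociation_constant(K_A) * 1e9:.1f} nM")
    print(f"k_on  = {association_rate(K_A, K_OFF):.2e} M^-1 s^-1")
    print(f"t_1/2 = {complex_half_life(K_OFF):.1f} s")
    # Protein concentrations injected in Figure 2 (nM).
    for conc_nm in (5, 10, 20, 50, 100, 200, 500, 1000):
        theta = fractional_saturation(conc_nm * 1e-9, K_A)
        print(f"{conc_nm:>5} nM -> saturation {theta:.2f}")
```

Under these assumptions the complex has a half-life of well under a minute, which agrees with the argument in the text that such a complex can be detected by SPR but may not survive the duration of an EMSA run.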
sequence is shared by AtEIN3 and TEIL at least partly, the sequence ATGTATCT in the middle of the above sequence satisfies the TEIL recognition consensus of AYGWAYCT.5 It is reasonable that base substitutions in this region significantly reduce the DNA-binding ability except for that of the fourth base in this sequence, i.e. T-base, to A-base,4 which is neutral in the TEIL consensus. The other strand of the EIN3-binding site contains a sequence ATGCCCCT (AGGGGCAT in the above strand), which overlaps the ATGTATCT sequence partially and is similar to the TEIL consensus sequence. Since this region is important in the AtEIN3 binding, the combined AGGGGCATGTATCT pseudopalindromic sequence is likely to be recognized by the proteins. Consistently, the AtEIN3 protein was shown to form a dimer, even in the absence of DNA.4 In contrast, the isolated EIL3-DBDM protein used in the present study exists in a monomeric form. On the basis of these observations, we suggest a possibility that the dimerization is important in the stable binding to a pseudo-palindromic sequence, although it is still possible for a monomeric domain to bind to a single TEIL recognition sequence. To test this hypothesis, however, further analyses of sequence specificities of the EIN3/EIL proteins in the isolated domains and full-length forms are necessary. Although we have tried to analyze the structure of the equivalent domain of the AtEIN3 protein, for which relatively detailed biological information is available as described above, the NMR spectral quality of the isolated domain was much poorer than that of AtEIL3 (data not shown). This is
presumably because the AtEIN3 protein forms a homodimer using this region at least partly.4 Considering the high level of amino acid identity between these proteins (76.4% in residues 162–288 of AtEIL3), it is likely that their structures in this region are very similar to each other.

Helical structure

The experimental constraints and stereochemical properties of the NMR solution structure of EIL3-DBDM are shown in Table 1. The secondary structure elements are five α-helices (α1 Asp171–Leu181, α2 Glu209–Leu214, α3 Lys232–His245, α4 Ile250–Arg258, and α5 Glu271–Ile286; in addition, short helical turns were identified for Pro199–Trp202 and Pro227–Asp229), as identified by the program PROCHECK-NMR (Figures 1 and 3).9 These helices pack against one another in a parallel or an antiparallel manner, forming a globular shape as a whole (Figure 3(b)–(d)). The packing of the helices is achieved by interactions between hydrophobic side-chains, mostly of aliphatic or aromatic residues, i.e. Thr173, Leu174, Leu177, Leu181, Trp210, Trp211, Ile239, Thr240, Ala241, Val242, Ile243, Ile253, Val257, Glu271, Trp275, and Leu279 (Figure 3(d)), as defined by average C–C distances of less than 4.5 Å in the structural ensemble. In addition, an electrostatic interaction between the Asp171 and Lys235 side-chains, with an average O⁻ to H⁺ distance of 4.6 Å, is also likely to contribute to the packing of α-helices 1 and 3. All of these residues are highly conserved among the EIN3/EIL proteins (Figure 1(a)), indicating that the
Table 1. Structural statistics

A. Structural constraintsᵃ
  Sequential NOEsᵇ                        492
  Medium-range NOEs (2 ≤ |i−j| ≤ 4)ᵇ      343
  Long-range NOEs (|i−j| > 4)ᵇ            355
  Hydrogen bondsᶜ                          80
  Torsion (φ) anglesᵈ                      90
  Torsion (χ1) anglesᵉ                     35
  Total                                  1395

B. Characteristics
                                                    Ensemble of 20 structuresᶠ   Minimized mean structure
  r.m.s.d. from constraints
    NOEs and hydrogen bonds (Å)                     0.0036 ± 0.0004              0.003
    Torsion angles (deg.)                           0.082 ± 0.019                0.126
  van der Waals energyᵍ (kcal/mol)                  16.1 ± 1.3                   12.1
  R.m.s. deviation from the ideal geometry
    Bond lengths (Å)                                0.0011 ± 0.0001              0.0009
    Bond angles (deg.)                              0.312 ± 0.004                0.302
    Improper angles (deg.)                          0.118 ± 0.004                0.110
  Average r.m.s.d. from the non-minimized mean structureʰ (Å)
    N, Cα, C                                        0.68 ± 0.15
    All heavy-atoms                                 1.10 ± 0.13
  Ramachandran plot
    Most-favored region (%)                         80.4                         79.6
    Additionally allowed region (%)                 17.4                         18.5
    Generously allowed region (%)                   1.8                          1.9
    Disallowed region (%)                           0.5                          0.0

ᵃ Constraints correspond to residues Ser162–Gln288, except for Pro200 and Lys261, for which no proton resonance was assigned.
ᵇ The upper limits of the NOE distance constraints were 2.8 Å, 3.5 Å, 4.5 Å, and 6.0 Å, classified according to the NOE intensities.
ᶜ Hydrogen bond distance constraints are 1.5–2.5 Å between the hydrogen and acceptor oxygen atom, and 2.5–3.5 Å between the donor nitrogen and acceptor oxygen atom.
ᵈ Three categories, −120(±50)°, −60(±30)°, and −100(±70)°, corresponding to the ³JαN coupling values of >7.0 Hz, <6.0 Hz, and 6.0–7.0 Hz, respectively.
ᵉ Three categories, 60(±40)°, 180(±40)°, and −60(±40)° for stereospecifically assigned residues.
ᶠ The 20 structures with no distance violation larger than 0.3 Å, no torsion angle violation larger than 3°, and the lowest total energies were selected from 50 initial structures.
ᵍ Values calculated with the repulsive non-bonded energy function in the CNS program.
ʰ Values calculated with Val165–Val257 and Lys270–Gln281. The eliminated regions tend to show relatively low 1H-15N heteronuclear NOE values, indicating relatively great flexibility of the residues (see Supplementary Data).
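The footnote on the φ torsion angle constraints describes a simple threshold rule, which can be sketched as follows. This Python fragment is only an illustration of the three categories stated in the footnote; it is not the software used for the structure calculation, and the type and function names are hypothetical.

```python
from typing import NamedTuple, Tuple


class PhiRestraint(NamedTuple):
    """A phi torsion-angle restraint: center and tolerance, in degrees."""
    center_deg: float
    tolerance_deg: float


def phi_restraint_from_coupling(j_hz: float) -> PhiRestraint:
    """Map a 3J(alpha,N) coupling constant (Hz) to one of the three
    phi restraint categories listed in the Table 1 footnote."""
    if j_hz > 7.0:
        return PhiRestraint(-120.0, 50.0)
    if j_hz < 6.0:
        return PhiRestraint(-60.0, 30.0)
    # Couplings of 6.0-7.0 Hz get the widest, least specific category.
    return PhiRestraint(-100.0, 70.0)


def allowed_range(restraint: PhiRestraint) -> Tuple[float, float]:
    """Lower and upper bounds of the restraint, in degrees."""
    return (restraint.center_deg - restraint.tolerance_deg,
            restraint.center_deg + restraint.tolerance_deg)


if __name__ == "__main__":
    # Hypothetical coupling values, for illustration only.
    for j in (4.5, 6.5, 8.5):
        r = phi_restraint_from_coupling(j)
        print(f"3J = {j:.1f} Hz -> phi in {allowed_range(r)} deg")
```

Small couplings select the narrow helical range around −60°, large couplings select an extended range around −120°, and intermediate values are left loosely restrained.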
domains of these proteins share a common structural architecture.

Proline-rich region

The proline-rich region3 contains α-helix 2 and most parts of the adjacent loops, i.e. that between α-helices 1 and 2 (loop 12) and that between α-helices 2 and 3 (loop 23) (Figure 1(a)), and is positioned so that it surrounds α-helix 3 (Figure 4). The region is also rich in Trp residues (Trp202, Trp203, Trp210, and Trp211). All of them are involved in hydrophobic interactions with residues in α-helix 3, which includes Trp234, forming a hydrophobic interaction network of Trp202–Trp203–Trp234–Trp211–Trp210. The side-chains of three of the Trp residues form hydrogen bonds to Pro backbones (Figure 4), which protected the imino protons against the hydrogen-deuterium exchange (data not shown). In addition, the side-chain of Tyr224 forms a hydrogen bond to a Pro backbone (Figure 4), so that the hydroxyl proton was clearly observed in the NMR spectra (data not shown). The above hydrophobic and hydrogen-bonding interactions are likely to stabilize the conformation of the proline-rich region. It is also characteristic that Gly237 in the middle of α-helix 3 allows close access of the bulky side-chain of Trp211 (and Leu214) of α-helix 2 (Figure 4), which makes main-chain/side-chain hydrophobic interactions and contributes significantly to the packing of α-helices 2 and 3. The hydrophobic interactions between loop 12 and loop 23, and between loop 12 and α-helix 3 also stabilize the conformation. The residues relevant to the interactions are Leu194 and Pro204 in loop 12, Pro222, Pro223, and Tyr224 in loop 23, and Val238 and His245 in α-helix 3, other than the Trp residues listed above. Furthermore, the intrinsically restricted φ-angles of Pro residues are likely to be important in entropically stabilizing the conformation. Consequently, the conformation in the proline-rich region is highly restricted, which is illustrated by comparison of Figure 3(a) and (b). Also, on the basis of the 1H-15N heteronuclear NOE data, flexibility in this region is comparable to that in most of the other parts of the domain (see Supplementary Data). All the interactions described above, together with the side-chain/side-chain interactions between α-helices 2 and 3 (Figure 3(d)), contribute to the packing of the proline-rich region to the main helical body, so as to form the globular structure as a whole. Proline-rich regions appearing in functional proteins are involved mostly in the protein–protein interactions, among which the best characterized
Figure 3. Solution structure of EIL3-DBDM. The ensemble of the selected structures (a) in stereo view, (b) ribbon diagram of the minimized mean structure, (c) the contact surface with the presentation of the electrostatic polarization (blue, positive; red, negative), and (d) summary of packing of α-helices are shown, where the orientation in (a) is the same as in the left panels of (b) and (c). The secondary structure units are indicated in (b), while the conserved basic residues colored cyan in Figure 1(a) are indicated in (c). Green circles in (c) indicate two basic patches, corresponding to the BC III and BC IV regions. The figures in (a)–(c) were produced by MOLSCRIPT34 and MOLMOL.35 In (d), large numbers in circles represent α-helices, while small numbers represent residues involved in hydrophobic interactions (black) or a salt-bridge (red) between the helices. Black and red lines indicate parallel and antiparallel packing, respectively.
Figure 4. Structure of the proline-rich region in a stereo view.3 Backbone trace of a region including the proline-rich region (Pro188–Pro227; yellow except for α-helix 2 (Glu209–Leu214; red)) and the following α-helix 3 (Lys232–His245; blue) is shown in ribbon representation. Pro residues in this region are shown in black lines, while Trp and Tyr residues are shown as magenta and green sticks, respectively. In addition, Gly237 in α-helix 3 is shown as orange sticks. Lines in cyan indicate hydrogen bonds between Trp or Tyr side-chains and Pro backbones.
are those interacting with the SH3 and WW domains.10 It is important to note that both of these domains contain conserved Trp and Tyr residues in the binding sites for the proline-rich peptides. In addition to hydrophobic interactions, intermolecular hydrogen bonds between the side-chains of these aromatic residues and the backbones of the proline-rich peptides were detected in several high-resolution structures of complexes.11–13 Since these features are similar to the intramolecular interactions in the EIL3-DBDM structure described above, it is likely that a combination of hydrogen bonding and hydrophobic interactions involving aromatic side-chains is a common mechanism of contacting proline-rich regions. The function of the proline-rich region of EIL3-DBDM other than structural maintenance is unknown, since, to our knowledge, no other example of a DBD containing a proline-rich region has been reported. Considering that the homodimerization of AtEIN3 requires a region including the proline-rich region,4 we suggest that this region might act as the dimerization interface, where some structural rearrangements, such as swapping of the intramolecular contacts with intermolecular ones, may occur.

Basic clusters

The conserved basic residues in BC III, i.e. Arg225, Lys226, Lys231, Lys232, and Lys235, are located on loop 23 or α-helix 3, while those in BC IV, i.e. Lys252, Lys254, Arg255, Arg258, and Lys261, are located on α-helix 4 or its C-terminal loop (loop 45). As well as these residues, Arg190, Lys191, and Lys196 located in loop 12, and Lys266 and Lys270 in loop 45 are conserved. On the surface of the EIL3-DBDM structure, two positively charged patches, corresponding to BC III and BC IV, respectively, are observed (Figure 3(c), left panel). Since some
residues in these patches are located nearly on the opposite sides of the molecule (e.g. Lys231 and Arg255), it is unlikely that both of these areas contact DNA directly, unless large distortion or bending of the DNA occurs. An NMR titration experiment was carried out, in order to identify the protein–DNA interface (Figure 5(a)). By adding increasing amounts of the DNA, chemical-shift perturbations were observed in heteronuclear single quantum coherence (HSQC) spectra, as follows. The positions of some cross-peaks did not change at all, or shifted only slightly, so that the chemical-shift changes were easy to follow (e.g. Trp203, Trp211, Lys213, Leu216, Lys252, Ala273, and Leu285 in Figure 5(a)). For others, however, the differences were so large that the changes could not be followed easily; for these, the distance to the nearest unassigned cross-peak of the complex was provisionally taken as the chemical-shift difference (e.g. Arg225, Leu230, and Ala241 in Figure 5(a)). Note that this treatment may underestimate the difference, but cannot overestimate it. These chemical-shift changes were completed when the concentration ratio of DNA to protein reached ~1.0, which suggested a 1 : 1 binding stoichiometry of the protein–DNA complex. After classifying the residues according to their chemical-shift differences, it became apparent that a relatively limited region of the structure is affected largely by the binding of DNA (Figure 5(b)). This region consists of α-helices 1 and 3, and, in part, loops 12 and 23. It is important to note that this region includes the ein3-3 site and covers the BC III region. In contrast, most of the residues in α-helices 2, 4, and 5, including the BC IV residues, are not much affected. Therefore, it was suggested that the BC III region is more likely to be involved in DNA binding than the BC IV region. The result is consistent with the fact that three of the five basic
Figure 5. (a) An overlay of a selected region of 1H-15N HSQC spectra of EIL3-DBDM in the absence (black) or the presence of 0.25 (blue), 0.5 (cyan), 0.75 (green), 1.0 (magenta), 1.25 (brown), and 1.5 (red) times molar concentrations of the 18mer DNA, 5′-GGGCATGTATCTTGAATC-3′/5′-GATTCAAGATACATGCCC-3′ (the consensus recognition sequence).5 The final spectrum shown in red was recorded with ~3 times as many scans as the others. Spectra were recorded at the proton frequency of 750 MHz. (b) The profile of chemical-shift perturbations indicated on the EIL3-DBDM structure. Amino acid residues with backbone chemical shifts affected by DNA binding are colored red ($(\Delta\delta_{\mathrm{H}}^{2} + \Delta\delta_{\mathrm{N}}^{2})^{1/2} \geq 100$ Hz at the proton frequency of 750 MHz: residues 170, 172, 178–180, 183, 194, 196, 225, 228, 233, 235, 237, 239, 243, 259, 268) or yellow ($(\Delta\delta_{\mathrm{H}}^{2} + \Delta\delta_{\mathrm{N}}^{2})^{1/2} \geq 50$ Hz: residues 174–177, 181, 182, 189–192, 197, 202, 205, 212, 224, 226, 229, 230, 232, 236, 238, 241, 246, 247, 249, 260, 264, 267, 278), while those affected only slightly ($(\Delta\delta_{\mathrm{H}}^{2} + \Delta\delta_{\mathrm{N}}^{2})^{1/2} < 50$ Hz: residues 162–169, 173, 184, 185, 198, 203, 206, 207, 210, 211, 213–216, 218–221, 231, 234, 240, 242, 244, 245, 250–258, 265, 266, 270, 272–277, 279–281, 284–288) are shown in blue. Residues for which no meaningful information is available, i.e. those with unassigned or unobserved backbone resonances in the free protein, or proline residues (residues 171, 186–188, 193, 195, 199–201, 204, 208, 209, 217, 222, 223, 227, 248, 261–263, 269, 271, 282, 283) are shown in white. The yellow arrows indicate the ein3-3 site, Lys232. The Figure was produced by the Insight II molecular display program (Accelrys).
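The color classes in panel (b) follow from a threshold on the combined 1H and 15N shift change. The Python sketch below illustrates that classification under the assumption that both shift differences are supplied in Hz, as the thresholds in the caption are; the function names and the example values are hypothetical and are not data from the study.

```python
import math
from typing import Optional


def combined_shift_hz(delta_h_hz: float, delta_n_hz: float) -> float:
    """Combined perturbation, (dH^2 + dN^2)^(1/2), in Hz."""
    return math.hypot(delta_h_hz, delta_n_hz)


def perturbation_class(delta_h_hz: Optional[float],
                       delta_n_hz: Optional[float]) -> str:
    """Assign the color class used in Figure 5(b).

    None for either value stands for a residue without usable data
    (unassigned or unobserved resonance, or proline).
    """
    if delta_h_hz is None or delta_n_hz is None:
        return "white"
    combined = combined_shift_hz(delta_h_hz, delta_n_hz)
    if combined >= 100.0:
        return "red"
    if combined >= 50.0:
        return "yellow"
    return "blue"


if __name__ == "__main__":
    # Hypothetical shift differences (Hz), for illustration only.
    examples = {
        "residue A": (90.0, 90.0),
        "residue B": (30.0, 50.0),
        "residue C": (10.0, 20.0),
        "residue D": (None, None),
    }
    for name, (d_h, d_n) in examples.items():
        print(name, perturbation_class(d_h, d_n))
```

Because cross-peaks that could not be followed were assigned the distance to the nearest unassigned peak of the complex, values entered for such residues are lower bounds, so a residue can only be placed in too low a class, never too high.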
residues in the BC IV region are not conserved in AtEIL2 (Figure 1(a)), although the protein still possesses the ability to bind DNA.4 To specify the DNA recognition interface of EIL3-DBDM, and to reveal the DNA-recognition mechanism, however, future experiments designed to determine the structure of the protein–DNA complex will be necessary.

Similarity to other structures

Searching the Protein Data Bank by the DALI program14 identified several structures partially similar to EIL3-DBDM. Among those possessing Z-scores of 3.0 or more, structures of DNA-binding
proteins are shown in Figure 6. Although all these structures possess regions aligned to α-helices 1, 4, and 5 of EIL3-DBDM, none of them has all of the regions equivalent to the five helices of EIL3-DBDM. This is true also for the other partially similar structures possessing Z-scores of 3.0 or more, which indicates that the EIL3-DBDM structure possesses a novel fold. The Cre recombinase N-terminal domain is known to bind tightly to DNA when it is activated for cleavage.15 The DNA-binding residues are located on two helices (indicated by arrows in Figure 6(d)) and on the loop between them. Although one of the two helices was aligned to α-helix 5 of EIL3-DBDM, the latter was suggested
Figure 6. Structural comparison of (a) EIL3-DBDM and domains of DNA-binding proteins identified to be partially similar to EIL3-DBDM by the DALI program:14 (b) domain 2B of Bacillus stearothermophilus DExx box DNA helicase (1pjr; Z-score 3.3), (c) Vaccinia virus topoisomerase IB catalytic domain (1a41; Z-score 3.2), and (d) N-terminal domain of bacteriophage P1 Cre recombinase (4crx; Z-score 3.0). For Cre recombinase in (d), DNA strands are shown in red loops. The structures are viewed from the same orientation as the EIL3-DBDM structure in (a), as aligned by the DALI program. The aligned residues are shown in the same colors used for the corresponding residues of EIL3-DBDM. The Figure was produced by MOLSCRIPT.34
not to be involved in DNA binding, in the present study. In addition, although the DExx box DNA helicase binds to DNA, domain 2B shown in Figure 6(b) is not involved in DNA binding.16 Also, Vaccinia virus topoisomerase IB possesses a DBD separate from the catalytic domain shown in Figure 6(c).17 Therefore, it is unlikely that EIL3-DBDM has a functional relationship with these domains of the DNA-binding proteins. These observations are consistent with the fact that the EIN3/EIL proteins possess no sequence similarity to other known DNA-binding proteins.

Plant-specific transcription factors

The plant-specific transcription factors are believed to be involved in plant-specific reactions and developments, and were classified into distinct families according to homology in their DBDs, for which no sequence similarities were identified for DBDs of other kingdoms of life.18 Although these DBDs were thus believed to be plant-specific, recent
reports indicate that several DBDs that had been believed to be plant-specific are not necessarily plant-specific, as follows. The DBDs of AtERF1 of the plant-specific AP2/ERF transcription factor family and Tn916 integrase, a bacterial endonuclease, share a similar structural fold, consisting of a three-stranded antiparallel β-sheet and an α-helix, in which the β-sheet contacts DNA from the major groove side.19,20 Although no apparent sequence similarity was observed between these domains, some residues involved in the DNA base recognition are identical or similar, showing the conserved mechanism of DNA recognition.21 Very recently, putative HNH-type endonucleases from bacteria, phage, and protista that possess DBDs homologous to the plant AP2/ERF proteins were identified,22 suggesting an evolutionary relationship, but not a structural convergence, of the AP2/ERF family and bacterial endonucleases, including the Tn916 integrase. Also, it was shown that the B3 DBD of a plant-specific transcription factor RAV1 is similar to a DBD of a bacterial restriction enzyme
EcoRII, in the three-dimensional structure, DNA-binding mechanism, and primary sequence.23 In addition, homologues of the “plant-specific” WRKY DNA-binding proteins were identified in primitive eukaryotes, such as slime mold and protista.24 In contrast to the above, the structure of EIL3-DBDM possessing a novel fold was determined in the present study, being unrelated to DBDs of other kingdoms of life. Also, we have recently determined the structures of DBDs of the plant-specific transcription factors SQUAMOSA promoter-binding proteins, which also revealed a novel fold and a novel zinc-binding motif(s).25 In addition, the DBD of a plant-specific NAC transcription factor possesses a structure of a novel fold.26 For these DBDs, no homologue originating from other kingdoms has been identified. Therefore, it is expected that at least some DBDs of the plant-specific transcription factors are really plant-specific, which gained their functions along with the evolution of plant-specific reactions and developments occurring after the plant kingdom was isolated from others.
Conclusion
In the present study, a region corresponding to the major DBD of the EIN3/EIL transcription factor family, which is the key factor in the ethylene signaling, was identified. The isolated domain exists in a monomeric form and retains the sequence specificity for DNA binding. The structure determined by NMR spectroscopy consists of five α-helices, packing into a novel fold dissimilar to known DBDs. An NMR titration analysis suggested that a region including the key mutation site, the ein3-3 site, is involved in DNA binding. The EIL3-DBDM structure is also novel in that it possesses a proline-rich region, where intramolecular contacts have features similar to those of the intermolecular interactions between the well-known SH3 or WW domains and proline-rich peptides. Therefore, the present results expand our knowledge of DBD structures considerably. It should be emphasized that the location of the major DBD in the primary sequence and its three-dimensional structure may provide clues for designing mutational experiments, by which research on the ethylene signaling will be facilitated.
Materials and Methods
Sample preparation
The DNAs that code for 12 different EIL3 fragments including the ein3-3 site (see Figure 1(b)) were subcloned into pCR2.1 vector (Invitrogen) by PCR from an Arabidopsis full-length cDNA clone27 with the ID code RAFL09-36-H03 (MIPS code: At1g73730). The PCR primers used were designed so that the phage T7 promoter sequence, ribosome-binding site, and oligohistidine tag, as well as the cleavage site for tobacco etch virus (TEV) protease are attached at the 5′ end, and that the T7 terminator sequence is attached at the 3′ end (T. Yab, Yoko Motoda, Miyuki Saito, Natsuko Matsuda, T.K. & S.Y., unpublished results). Consequently, additional amino acids derived from the expression vectors were attached to the protein fragments, i.e. GlySerSerGlySerSerGly to the N terminus and SerGlyProSerSerGly to the C terminus. Fragments were subjected to a high-throughput cell-free system developed at RIKEN for the test of expression and solubility.6–8 For the shortest soluble fragment, Ser162–Gln288 (see Results and Discussion), the 13C,15N-labeled and unlabeled proteins were expressed by the cell-free system on a large scale. The protein was purified by HiTrap™ chelating (Amersham) and HiTrap™ SP (Amersham) column chromatography. The buffers used were 50 mM sodium phosphate (pH 8.0), 500 mM NaCl, 20–500 mM imidazole for HiTrap™ chelating chromatography, and 20 mM sodium phosphate (pH 6.0), 50 mM–1 M NaCl for HiTrap™ SP chromatography. Protein concentration was determined from A280 values using molar absorption coefficients calculated from the amino acid sequences. For NMR measurements, approximately 1.0 mM proteins were dissolved in 20 mM potassium phosphate buffer (pH 6.0) containing 200 mM KCl, 0.5 mM sodium 2,2-dimethyl-2-silapentane-5-sulfonate (DSS), and 5% (v/v) 2H2O, unless stated otherwise.
NMR measurements and resonance assignments
Typical homonuclear and heteronuclear NMR spectra28,29 were recorded on Bruker DMX-750 (750.13 MHz for 1H and 76.02 MHz for 15N) and DMX-500 (500.13 MHz for 1H, 125.76 MHz for 13C, and 50.68 MHz for 15N) spectrometers at 298 K, essentially as described.25 The recorded spectra were analyzed for the backbone and side-chain resonance assignments as described.25 HSQC spectra of samples containing 100% 2H2O were recorded at 298 K, and 40 hydrogen bond donors were identified. By using the HMQC-J experiment, 90 3JαN coupling values were obtained.30 By analyzing the nuclear Overhauser enhancement spectroscopy (NOESY), total correlated spectroscopy (TOCSY), and double quantum filtered correlation spectroscopy (DQF-COSY) spectra, 29 pairs of Hβ resonances and six pairs of valine Hγ resonances were assigned stereospecifically.
Determination of the 3D structure
The distance constraints derived from the NOESY spectra and those maintaining hydrogen bonds were imposed as described.23,25 The φ angle constraints were classified into three categories, −120(±50)°, −60(±30)°, and −100(±70)°, corresponding to the 3JαN coupling values of >7.0 Hz, <6.0 Hz, and 6.0–7.0 Hz, respectively. The last category corresponds to a broad angle region containing those for the former two, which allows a conformational disorder within the negative φ angle region. For stereospecifically assigned residues, χ1 torsion angle constraints, classified into three categories, 60(±40)°, 180(±40)°, and −60(±40)°, were imposed. The cis conformation was assumed for the Asp186–Pro187 peptide bond, for which a large sequential Hα-Hα NOE was observed and the alternative assumption of the trans conformation caused significant violations. Random simulated annealing31 was carried out by using the program CNS,32 essentially as described.23
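The three-way classification of φ angle constraints by 3JαN coupling value can be sketched as a small lookup. This is only an illustration of the rule stated in the text, not the input format of CNS or any script used in the study; the names `TorsionRestraint`, `phi_restraint`, and `encloses` are hypothetical, and assigning couplings of exactly 6.0 Hz or 7.0 Hz to the broad category is an assumption.

```python
from typing import NamedTuple, Tuple


class TorsionRestraint(NamedTuple):
    """A torsion-angle restraint written as center(±tolerance), in degrees."""
    center_deg: float
    tolerance_deg: float

    @property
    def bounds(self) -> Tuple[float, float]:
        # Lower and upper limits of the allowed angle range.
        return (self.center_deg - self.tolerance_deg,
                self.center_deg + self.tolerance_deg)


def phi_restraint(j_alpha_n_hz: float) -> TorsionRestraint:
    """Map a 3J(alpha-N) coupling (Hz) to one of the three phi categories.

    Assumption: values of exactly 6.0 or 7.0 Hz fall in the broad category.
    """
    if j_alpha_n_hz > 7.0:
        return TorsionRestraint(-120.0, 50.0)
    if j_alpha_n_hz < 6.0:
        return TorsionRestraint(-60.0, 30.0)
    return TorsionRestraint(-100.0, 70.0)


def encloses(outer: TorsionRestraint, inner: TorsionRestraint) -> bool:
    """True if the allowed range of `inner` lies within that of `outer`."""
    outer_lo, outer_hi = outer.bounds
    inner_lo, inner_hi = inner.bounds
    return outer_lo <= inner_lo and inner_hi <= outer_hi


if __name__ == "__main__":
    for coupling in (8.5, 6.5, 4.2):  # hypothetical coupling values in Hz
        print(coupling, phi_restraint(coupling).bounds)
```

The `encloses` helper makes the statement about the broad category checkable: the range of −100(±70)° spans −170° to −30°, which covers both −120(±50)° and −60(±30)°.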
Supplementary Data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jmb.2005.02.065.
Surface plasmon resonance
Experiments were carried out at 298 K using a Biacore X apparatus (BIACORE). The running buffer was 20 mM potassium phosphate (pH 6.0) containing 100 mM KCl and 0.005% (v/v) Tween20. A total of 522, 580, 551, and 539 resonance units of four double-stranded DNAs (5′-bio-GGGCATGTATCTTGAATC-3′/5′-GATTCAAGATACATGCCC-3′, the TEIL recognition sequence5 is underlined; 5′-bio-GGGCATTTATCTTGAATC-3′/5′-GATTCAAGATAAATGCCC-3′, the TEIL recognition sequence except for the mutated base is underlined; 5′-bio-CGATCACCTGAGGCTG-3′/5′-CAGCCTCAGGTGATCG-3′ and 5′-bio-TCTTTAATTTCTAATATATTTAGAA-3′/5′-TTCTAAATATATTAGAAATTAAAGA-3′, respectively; bio indicates biotinylation at the 5′ end) were immobilized on the surfaces of Sensor Chip SAs (BIACORE) in one (flow-cell 2) of the two flow-cells, so that the other (flow-cell 1) is treated as the control. Solutions containing the protein at concentrations of 1 nM–1 mM were injected into the flow-cells at 20 μl min−1 for five minutes. The equilibrium binding constant was estimated by fitting the maximal response values at different protein concentrations to the simple 1:1 binding model using the BIAevaluation 3.0 software (BIACORE).
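As an illustration of the surface plasmon resonance analysis, the simple 1:1 binding model relates the equilibrium response R to the protein concentration C as R = Rmax·C/(KD + C), where KD is the equilibrium dissociation constant (the association constant is its reciprocal). The following is a minimal sketch of such a fit, not the algorithm implemented in BIAevaluation, which the text does not describe. The function names, the search range for KD, and the data in the example are all hypothetical. For each trial KD the best Rmax follows in closed form from linear least squares, and KD is located by a coarse grid scan over log10(KD) refined with a golden-section search.

```python
import math


def one_to_one_response(conc_m, r_max, k_d_m):
    """Equilibrium response (RU) of the 1:1 model at concentration conc_m (M)."""
    return r_max * conc_m / (k_d_m + conc_m)


def _best_r_max(concs, responses, k_d_m):
    # For a fixed K_D the model is linear in R_max, so least squares is closed form.
    occupancy = [c / (k_d_m + c) for c in concs]
    numerator = sum(f * r for f, r in zip(occupancy, responses))
    denominator = sum(f * f for f in occupancy)
    return numerator / denominator


def _sse(concs, responses, k_d_m):
    # Sum of squared residuals at the optimal R_max for this K_D.
    r_max = _best_r_max(concs, responses, k_d_m)
    return sum((r - one_to_one_response(c, r_max, k_d_m)) ** 2
               for c, r in zip(concs, responses))


def fit_one_to_one(concs, responses,
                   log10_kd_min=-12.0, log10_kd_max=-2.0, step=0.1):
    """Return (r_max, k_d_m) minimizing the squared error of the 1:1 model."""
    # Coarse scan over log10(K_D); the range is an assumption of this sketch.
    n_points = int(round((log10_kd_max - log10_kd_min) / step)) + 1
    grid = [log10_kd_min + i * step for i in range(n_points)]
    best = min(grid, key=lambda x: _sse(concs, responses, 10.0 ** x))

    # Golden-section refinement in the bracket around the best grid point.
    lo, hi = best - step, best + step
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if _sse(concs, responses, 10.0 ** x1) < _sse(concs, responses, 10.0 ** x2):
            hi = x2
        else:
            lo = x1
    k_d_m = 10.0 ** ((lo + hi) / 2.0)
    return _best_r_max(concs, responses, k_d_m), k_d_m


if __name__ == "__main__":
    # Hypothetical, noise-free data generated from the model itself.
    concs = [1e-9, 3e-9, 1e-8, 3e-8, 1e-7, 3e-7, 1e-6]
    responses = [one_to_one_response(c, 120.0, 5e-8) for c in concs]
    print(fit_one_to_one(concs, responses))
```

Searching on a logarithmic scale suits a concentration series spanning several orders of magnitude, and eliminating Rmax analytically leaves a one-dimensional problem that needs no external fitting library.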
NMR titration analysis
The protein with uniform 15N-labelling (except for Glu residues) was prepared by the cell-free system essentially as described above. HSQC spectra of the protein at the initial concentration of 0.14 mM, dissolved in 20 mM potassium phosphate buffer (pH 6.0) containing 100 mM KCl, 0.5 mM DSS, and 5% 2H2O, were recorded at 298 K by adding increasing amounts of 2.8 mM double-stranded 18mer DNA (5′-bio-GGGCATGTATCTTGAATC-3′/5′-GATTCAAGATACATGCCC-3′, the consensus recognition sequence5 is underlined), dissolved in the same buffer. The concentration of the double-stranded DNA was determined by using an extinction coefficient calculated after digestion of the strands with phosphodiesterase I (Worthington).
Protein Data Bank accession code
The co-ordinates of the determined structure have been deposited with the Protein Data Bank under accession code 1WIJ.
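For the NMR titration analysis described above, the DNA stock (2.8 mM) is 20 times more concentrated than the initial protein solution (0.14 mM), so each addition dilutes the protein only slightly. The arithmetic can be sketched as follows. The function names are hypothetical, and the 300 μl sample volume in the example is an assumption, since the text does not state the volume used.

```python
def dna_volume_for_ratio(protein_volume_ul, protein_conc_mm,
                         dna_stock_conc_mm, molar_ratio):
    """Cumulative DNA stock volume (ul) giving the requested DNA:protein molar ratio."""
    # moles of DNA = ratio x moles of protein; volume = moles / stock concentration
    return molar_ratio * protein_conc_mm * protein_volume_ul / dna_stock_conc_mm


def concentrations_after_addition(protein_volume_ul, protein_conc_mm,
                                  dna_stock_conc_mm, dna_volume_ul):
    """Protein and DNA concentrations (mM) after mixing, accounting for dilution."""
    total_volume_ul = protein_volume_ul + dna_volume_ul
    protein_mm = protein_conc_mm * protein_volume_ul / total_volume_ul
    dna_mm = dna_stock_conc_mm * dna_volume_ul / total_volume_ul
    return protein_mm, dna_mm


if __name__ == "__main__":
    sample_volume_ul = 300.0  # assumed sample volume, not given in the text
    for ratio in (0.25, 0.5, 1.0, 2.0):  # hypothetical titration points
        volume = dna_volume_for_ratio(sample_volume_ul, 0.14, 2.8, ratio)
        print(ratio, volume,
              concentrations_after_addition(sample_volume_ul, 0.14, 2.8, volume))
```

With these concentrations, an equimolar point requires a DNA volume equal to 5% of the starting sample volume, whatever that volume is.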
Acknowledgements
The authors thank N. Matsuda, Y. Motoda, Y. Fujikura, M. Saito, Y. Miyata, K. Hanada, A. Kobayashi, N. Sakagami, M. Ikari, F. Hiroyasu, Y. Nishimura, M. Watanabe, M. Sato, M. Hirato at RIKEN, and H. Yamaguchi at AIST for technical assistance. This work was supported, in part, by the RIKEN Structural Genomics/Proteomics Initiative (RSGI) and the National Project on Protein Structural and Functional Analyses, Ministry of Education, Culture, Sports, Science and Technology, Japan.
References
1. Abeles, F., Morgan, P. & Saltveit, M. (1992). Ethylene Plant Biology, Academic Press, San Diego, CA.
2. Guo, H. & Ecker, J. R. (2004). The ethylene signaling pathway: new insights. Curr. Opin. Plant Biol. 7, 40–49.
3. Chao, Q., Rothenberg, M., Solano, R., Roman, G., Terzaghi, W. & Ecker, J. R. (1997). Activation of the ethylene gas response pathway in Arabidopsis by the nuclear protein ETHYLENE-INSENSITIVE3 and related proteins. Cell, 89, 1133–1144.
4. Solano, R., Stepanova, A., Chao, Q. & Ecker, J. R. (1998). Nuclear events in ethylene signaling: a transcriptional cascade mediated by ETHYLENE-INSENSITIVE3 and ETHYLENE-RESPONSE-FACTOR1. Genes Dev. 12, 3703–3714.
5. Kosugi, S. & Ohashi, Y. (2000). Cloning and DNA-binding properties of a tobacco Ethylene-Insensitive3 (EIN3) homolog. Nucl. Acids Res. 28, 960–967.
6. Kigawa, T., Yabuki, T., Yoshida, Y., Tsutsui, M., Ito, Y., Shibata, T. & Yokoyama, S. (1999). Cell-free production and stable-isotope labelling of milligram quantities of proteins. FEBS Letters, 442, 15–19.
7. Kigawa, T., Yabuki, T., Matsuda, N., Matsuda, T., Nakajima, R., Tanaka, A. & Yokoyama, S. (2004). Preparation of Escherichia coli cell extract for highly productive cell-free protein expression. J. Struct. Funct. Genom. 5, 63–68.
8. Yokoyama, S., Hirota, H., Kigawa, T., Yabuki, T., Shirouzu, M., Terada, T. et al. (2000). Structural genomics projects in Japan. Nature Struct. Biol. 7, 943–945.
9. Laskowski, R. A., Rullman, J. A. C., MacArthur, W. M., Kaptein, R. & Thornton, J. M. (1996). AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR, 8, 477–486.
10. Sudol, M. (1996). The WW module competes with the SH3 domain? Trends Biochem. Sci. 21, 161–163.
11. Musacchio, A., Saraste, M. & Wilmanns, M. (1994). High-resolution crystal structures of tyrosine kinase SH3 domains complexed with proline-rich peptides. Nature Struct. Biol. 1, 546–551.
12. Huang, X., Poy, F., Zhang, R., Joachimiak, A., Sudol, M. & Eck, M. J. (2000). Structure of a WW domain containing fragment of dystrophin in complex with β-dystroglycan. Nature Struct. Biol. 7, 634–638.
13. Pires, J. R., Taha-Nejad, F., Toepert, F., Ast, T., Hoffmüller, U., Schneider-Mergener, J. et al. (2001). Solution structures of the YAP65 WW domain and the variant L30K in complex with the peptides GTPPPPYTVG, N-(n-octyl)-GPPPY and PLPPY and the application of peptide libraries reveal a minimal binding epitope. J. Mol. Biol. 314, 1147–1156.
14. Holm, L. & Sander, C. (1993). Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233, 123–138.
15. Guo, F., Gopaul, D. N. & Van Duyne, G. D. (1999). Asymmetric DNA bending in the Cre-loxP site-specific recombination synapse. Proc. Natl Acad. Sci. USA, 96, 7143–7148.
16. Velankar, S. S., Soultanas, P., Dillingham, M. S., Subramanya, H. S. & Wigley, D. B. (1999). Crystal structures of complexes of PcrA DNA helicase with a DNA substrate indicate an inchworm mechanism. Cell, 97, 75–84.
17. Cheng, C., Kussie, P., Pavletich, N. & Shuman, S. (1998). Conservation of structure and mechanism between eukaryotic topoisomerase I and site-specific recombinase. Cell, 92, 841–850.
18. Riechmann, J. L., Heard, J., Martin, G., Reuber, L., Jiang, C.-Z., Keddie, J. et al. (2000). Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 290, 2105–2110.
19. Allen, M. D., Yamasaki, K., Ohme-Takagi, M., Tateno, M. & Suzuki, M. (1998). A novel mode of DNA recognition by a β-sheet revealed by the solution structure of the GCC-box binding domain in complex with DNA. EMBO J. 17, 5484–5496.
20. Wojciak, J. C., Connolly, K. M. & Clubb, R. T. (1999). NMR structure of the Tn916 integrase–DNA complex. Nature Struct. Biol. 6, 366–377.
21. Connolly, K. M., Ilagovan, U., Wojciak, J. M., Iwahara, M. & Clubb, R. T. (2000). Major groove recognition by three-stranded β-sheets: affinity determinants and conserved structural features. J. Mol. Biol. 300, 841–856.
22. Magnani, E., Sjölander, K. & Hake, S. (2004). From endonuclease to transcription factors: evolution of the AP2 DNA binding domain in plants. Plant Cell, 16, 2265–2277.
23. Yamasaki, K., Kigawa, T., Inoue, M., Tateno, M., Yamasaki, T., Yabuki, T. et al. (2004). Solution structure of the B3 DNA-binding domain of the Arabidopsis cold-responsive transcription factor RAV1. Plant Cell, 16, 3448–3459.
24. Ülker, B. & Somssich, I. E. (2004). WRKY transcription factors: from DNA binding towards biological function. Curr. Opin. Plant Biol. 7, 491–498.
25. Yamasaki, K., Kigawa, T., Inoue, M., Tateno, M., Yamasaki, T., Yabuki, T. et al. (2004). A novel zinc-binding motif revealed by solution structures of DNA-binding domains of Arabidopsis SBP-family transcription factors. J. Mol. Biol. 337, 49–63.
26. Ernst, H. A., Olsen, A. N., Skriver, K., Larsen, S. & Lo Leggio, L. (2004). Structure of the conserved domain of ANAC, a member of the NAC family of transcription factors. EMBO Rep. 5, 1–7.
27. Seki, M., Narusaka, M., Kamiya, A., Ishida, J., Satou, M., Sakurai, T. et al. (2002). Functional annotation of a full-length Arabidopsis cDNA collection. Science, 296, 141–145.
28. Wüthrich, K. (1986). NMR of Proteins and Nucleic Acids, John Wiley, New York.
29. Bax, A. (1994). Multidimensional nuclear magnetic resonance methods for protein studies. Curr. Opin. Struct. Biol. 4, 738–744.
30. Kay, L. E. & Bax, A. (1989). New methods for measurement of NH-CαH coupling constants in 15N-labelled proteins. J. Magn. Reson. 86, 110–126.
31. Nilges, M., Clore, G. M. & Gronenborn, A. M. (1988). Determination of the three-dimensional structures of proteins from inter-proton distance data by dynamic simulated annealing from a random array of atoms. FEBS Letters, 239, 129–136.
32. Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W. et al. (1998). Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallog. sect. D, 54, 905–921.
33. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. (1997). The CLUSTAL-X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucl. Acids Res. 25, 4876–4882.
34. Kraulis, P. J. (1991). MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallog. 24, 946–950.
35. Koradi, R., Billeter, M. & Wüthrich, K. (1996). MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph. 14, 51–55.
Edited by P. Wright (Received 28 December 2004; received in revised form 24 February 2005; accepted 25 February 2005)