Base Coupling in Sequence-specific Site Recognition by the ETS Domain of Murine PU.1


doi:10.1016/S0022-2836(03)00362-0

J. Mol. Biol. (2003) 328, 805–819

Base Coupling in Sequence-specific Site Recognition by the ETS Domain of Murine PU.1 Gregory M.K. Poon and Robert B. Macgregor Jr* Department of Pharmaceutical Sciences, Leslie Dan Faculty of Pharmacy, University of Toronto, 19, Russell Street Toronto, Ont., Canada M5S 2S2

The ETS domain of murine PU.1 tolerates a large number of DNA cognates bearing a central consensus 5′-GGAA-3′ that is flanked by a diverse combination of bases on both sides. Previous attempts to define the sequence selectivity of this DNA binding domain by combinatorial methods have not successfully predicted observed patterns among in vivo promoter sequences in the genome, and have led to the hypothesis that energetic coupling occurs among the bases in the flanking sequences. To test this hypothesis, we determined, using thermodynamic cycles, the complex stabilities and base coupling energies of the PU.1 ETS domain for a set of 26 cognate variants (based on the λB site of the Igλ2-4 enhancer, 5′-AATAAAAGGAAGTGAAACCAA-3′) in which flanking sequences up to three bases upstream and/or two bases downstream of the core consensus are substituted. We observed that both cooperative and anticooperative coupling occur commonly among the flanking sequences at all the positions investigated. This phenomenon extends at least three bases on the 5′ side and is, at least in our experimental data, due exclusively to pairwise interactions between the flanking bases, and not to changes in the local environment of the DNA groove floor. Energetic coupling also occurs between the flanking sides across the core consensus, suggesting long-range conformational effects along the DNA target and/or in the protein. Our data provide an energetic explanation for the pattern of flanking bases observed among in vivo promoter sequences and reconcile the apparent discrepancies raised by the combinatorial experiments. We also discuss the significance of base coupling in light of an indirect readout mechanism in ETS/DNA site recognition. © 2003 Elsevier Science Ltd. All rights reserved

*Corresponding author

Keywords: PU.1; ETS domain; thermodynamic cycles; protein/DNA interactions; indirect readout

E-mail address of the corresponding author: [email protected]

Introduction

ETS proteins are characterized by a conserved 85-residue domain that serves as the DNA-binding module for a functionally diverse group of transcription factors. This domain recognizes purine-rich sequences containing a 5′-(A/T)GGA(A/T)-3′ consensus. Beyond this core sequence requirement, ETS proteins tolerate remarkable variation among the flanking bases, although a compilation of in vivo ETS binding sites demonstrates that their constitutions are not randomly distributed.1 In addition, other interactions distal to the ETS/DNA interface modulate the stability of the complex. For example, ETS members such as Ets-1 and GABPα contain inhibitory modules that interfere with DNA binding (autoinhibition), resulting in a reduction in the observed stability of the ETS/DNA complex.2,3 Protein–protein interactions with accessory binding partners also affect DNA-binding affinity,4,5 in some cases by reorienting the inhibitory module away from the ETS domain.3

The ETS protein PU.1 forms sequence-specific complexes with no fewer than 65 different sequences with a dispersion of affinities.1,5 These sequences differ almost exclusively in the bases flanking a 5′-GGAA-3′ consensus.1 Sequence selectivity in PU.1 ETS site recognition is of particular interest because the stability of the ETS/DNA complex is an important determinant of the transcriptional efficacy of the regulated genes in vivo. In luciferase reporter constructs carrying the p47phox promoter, for example, binding of PU.1 to a cis element is strongly correlated with promoter activity.6 To

0022-2836/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved


Base Coupling in PU.1 ETS Site Selection

Figure 1. DNA probe used in this study and sequence mutation. The cognate sequence (with the 5′-GGAA-3′ consensus shaded in gray) is flanked by a trinucleotide G·C track; the 5′ overhang facilitates end-labelling by T4 PNK. Positions 5′ or upstream from the consensus are designated (−) positions and proceed away from the consensus in the 5′ direction; conversely, (+) positions are 3′ or downstream away from the consensus. In this study, positions (−1) to (−3) and/or (+1)/(+2) are substituted to yield various cognate variants.

date, PU.1 has been implicated, in conjunction with other transcription factors (including other ETS proteins), in the regulation of no fewer than 24 genes whose products are essential in a long and still growing list of developmental and immune processes (for a review see Lloberas et al.7). A recent report on the promoter of a related gene, p40phox, has identified three PU.1 binding sites,8 and promoter activity mediated by each site is correlated with PU.1 binding affinity. Moreover, the transcriptional deficits inflicted by mutation of these sites run parallel with the relative stabilities of PU.1/DNA complexes.8 An understanding of the structural and energetic basis of the base selectivity in the flanking sequences by ETS proteins is therefore crucial to the accurate prediction of high-affinity binding sites in the genome and elucidation of the role of ETS proteins in transcriptional regulation.

Previous efforts to define the base selectivity in the flanking sequences by ETS proteins have used combinatorial site selection methods such as SELEX (systematic evolution of ligands by exponential enrichment)9–11 or length-encoded multiplex.1 These experimental techniques allow rapid screening for high-affinity cognates from a randomized pool. A further advantage of the multiplex method is its ability to determine the contribution of each position to protein/DNA complex stability on the assumption that these positions contribute independently.
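The independence assumption mentioned above can be made concrete with a small sketch. It sums single-base ΔΔG° values to predict the ΔΔG° of a multiply substituted variant and prints the prediction next to the measured value. The numbers are copied from Table 1 of this paper; the dictionary layout and the function name `additive_prediction` are illustrative assumptions, not part of the original analysis. A label such as [5']AGC denotes the three bases immediately upstream of the GGAA core.

```python
# ΔΔG° values (kJ/mol) relative to the reference site, copied from Table 1.
SINGLE_DDG = {
    "[5']TAA": 0.37,
    "[5']ACA": -1.11,
    "[5']AGA": -3.57,
    "[5']AAC": 3.55,
    "[5']AAG": 5.30,
}

# Measured ΔΔG° values (kJ/mol) for multiply substituted variants, also from Table 1.
OBSERVED_DDG = {
    "[5']AGC": -4.03,
    "[5']ACG": 6.66,
    "[5']TGC": 0.58,
}

# Single-base variants that make up each multiply substituted variant.
COMPONENTS = {
    "[5']AGC": ("[5']AGA", "[5']AAC"),
    "[5']ACG": ("[5']ACA", "[5']AAG"),
    "[5']TGC": ("[5']TAA", "[5']AGA", "[5']AAC"),
}


def additive_prediction(variant):
    """Predict ΔΔG° assuming every position contributes independently."""
    return sum(SINGLE_DDG[name] for name in COMPONENTS[variant])


for variant, observed in OBSERVED_DDG.items():
    predicted = additive_prediction(variant)
    print(f"{variant}: additive {predicted:+.2f}, observed {observed:+.2f} kJ/mol")
```

For [5']AGC the additive estimate is about −0.02 kJ/mol against a measured −4.03 kJ/mol, which is the kind of discrepancy that an independent-position model cannot capture.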
Although the results of these experiments are qualitatively consistent with one another, in many cases they do not correspond to directly-determined stabilities and compiled in vivo promoter sequences reported in the literature.1 This discordance has led to the hypothesis that base neighbors in the flanking sequences interact to influence the overall ETS/DNA complex stability, and has received support from statistical analyses of available in vivo promoter sequences indicating that the presence of certain bases at specified flanking positions significantly biases the identity of the neighboring bases.1 Here, we tested the base coupling hypothesis in PU.1 ETS/DNA interactions by directly demonstrating and quantifying the energetics of the

coupling interaction between bases flanking the 5′-GGAA-3′ consensus. We used thermodynamic cycles to determine coupling energies for a number of double- and triple-base substitutions in the flanking sequences. Thermodynamic cycles are widely employed experimental constructs in protein stability studies to reveal cooperative interactions between (usually spatially proximal) residues by site-directed mutagenesis. Horovitz & Fersht12,13 extended this method to analyze cooperative effects of multiple simultaneous residue changes in protein engineering experiments, and Chen & Stites14,15 applied it to probe the effect of multiple mutations of core residues in staphylococcal nuclease. Although this method of detecting the effect of interactions between discrete substitutions on thermodynamic stability, and its analytical formulation, is entirely general, it is rarely applied to other macromolecular systems. To the best of our knowledge, this approach has not been utilized for the analysis of interactions involving nucleic acids.

Our results indicate that energetic coupling among bases in the flanking sequences is a common and significant determinant in PU.1 ETS/DNA complex stabilization. The sign and magnitude of the specific base coupling energies provide a thermodynamic explanation for base preference in the flanking sequences among in vivo promoter sequences and the discrepant results from combinatorial site selection experiments.
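For orientation, a double-substitution cycle can be written in the standard double-mutant-cycle form. This is a sketch rather than the paper's own numbered equation: the subscripts 0 (reference), A and B (single substitutions) and AB (double substitution) are labels assumed here, and ΔΔG° denotes the difference from the reference.

```latex
\begin{aligned}
\Delta^{2}G^{\circ}_{\mathrm{int}}
  &= \Delta G^{\circ}_{AB} - \Delta G^{\circ}_{A} - \Delta G^{\circ}_{B} + \Delta G^{\circ}_{0} \\
  &= \Delta\Delta G^{\circ}_{AB} - \left(\Delta\Delta G^{\circ}_{A} + \Delta\Delta G^{\circ}_{B}\right),
  \qquad \Delta\Delta G^{\circ}_{X} = \Delta G^{\circ}_{X} - \Delta G^{\circ}_{0}
\end{aligned}
```

If the two substitutions act independently, the double-variant ΔΔG° equals the sum of the single-variant values and the coupling energy is zero. In the sign convention used in this paper, a negative coupling energy is cooperative and a positive one is anticooperative.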

Results

Sequence notation and selection

We have performed equilibrium titration experiments for the binding of the PU.1 ETS domain to a series of cognate variants of the λB site of the Igλ2-4 enhancer (Figure 1). The choice of this “standard” reference sequence is based on our previous investigation on the thermodynamics of PU.1 ETS association with this sequence.16 We made substitutions in the flanking bases up to three bases upstream in the 5′ flanking sequence and two bases downstream on the 3′ side, as these are

the flanking bases with which the PU.1 ETS domain makes contact in the sequence-specific co-crystal structure.17 For ease of reference, we adopt the following notation for denoting the variants: [P′]NNN, where P′ = 5′ or 3′ refers to the side on which the trinucleotide NNN flanks the 5′-GGAA-3′ consensus, the remainder of the sequence being unchanged from the reference. Hence, [5′]TGC denotes the triple-base variant,

Figure 2. Sequence-specific site recognition by the PU.1 ETS domain. (A) Non-denaturing electrophoretic mobility shift of 0.15 mM of a non-specific sequence (5′-AATAAAAGAGAGTG-3′) and the λB site of the Igλ2-4 enhancer in the absence (lanes 1 and 3, respectively) or presence of 0.2 mM PU.1 ETS domain (lanes 2 and 4, respectively). In lane 5 the same concentrations of λB site and PU.1 ETS domain were co-incubated with 0.7 mM of the non-specific sequence. The Gaussian-fitted radiometric bound-to-unbound peak ratios for lanes 4 and 5 are 1.6 (±0.3) and 1.4 (±0.3), respectively. (B) Representative filter binding isotherms for selected cognate variants. Lines are fits of equation (1) to the data, with the ordinate normalized to accommodate the different concentrations of PU.1 ETS domain (0.4–30 nM) used to obtain optimal signal-to-noise ratios. Generally, higher protein concentrations are required for weaker variants due to linearly increasing background DNA binding to filters, and lower concentrations are needed for stronger variants to minimize titrant depletion.

Table 1. Energetics of ETS-DNA interactions in 150 mM Na+ at 25 °C, pH 7.4

Cognate variant (a)        KD (nM) (b)      ΔG° (kJ/mol) (c)    ΔΔG° (kJ/mol) (d)

Reference (e)
[5′]AAA/[3′]GTG (7)        2.68 ± 0.41      −48.90 ± 0.38       –

Single-base variants
[5′]CAA (2)                16.3 ± 1.9       −44.42 ± 0.29       4.48 ± 0.47
[5′]GAA (2)                23.6 ± 10.4      −43.51 ± 1.09       5.39 ± 1.15
[5′]TAA (5)                3.11 ± 0.70      −48.53 ± 0.55       0.37 ± 0.67
[5′]ACA (3)                1.71 ± 0.61      −50.01 ± 0.88       −1.11 ± 0.96
[5′]AGA (5)                0.64 ± 0.11      −52.47 ± 0.44       −3.57 ± 0.58
[5′]AAC (3)                11.2 ± 3.4       −45.35 ± 0.76       3.55 ± 0.84
[5′]AAG (2)                22.8 ± 4.6       −43.60 ± 0.50       5.30 ± 0.63
[5′]AAT (2)                137 ± 39         −39.15 ± 0.70       9.75 ± 0.80
[3′]ATG (3)                75.5 ± 12.4      −40.63 ± 0.41       8.27 ± 0.55
[3′]GAG (3)                30.5 ± 9.7       −42.88 ± 0.79       6.02 ± 0.87
[3′]TTG (2)                142 ± 27         −39.06 ± 0.46       9.84 ± 0.59
[3′]GGG (2)                88.6 ± 11.2      −40.23 ± 0.31       8.67 ± 0.49

Double-base variants
[5′]ACC (2)                0.99 ± 0.29      −51.36 ± 0.73       −2.46 ± 0.82
[5′]ACG (2)                39.5 ± 13.1      −42.24 ± 0.82       6.66 ± 0.90
[5′]ACT (2)                45.2 ± 8.5       −41.90 ± 0.47       7.00 ± 0.60
[5′]AGC (3)                0.53 ± 0.13      −52.93 ± 0.61       −4.03 ± 0.72
[5′]AGG (3)                2.02 ± 0.45      −49.60 ± 0.55       −0.70 ± 0.67
[5′]AGT (3)                17.5 ± 5.4       −44.26 ± 0.77       4.64 ± 0.85
[5′]TGA (3)                3.17 ± 1.12      −48.49 ± 0.88       0.41 ± 0.95
[5′]TAC (1)                5.49 ± 1.79      −47.12 ± 0.81       1.78 ± 0.89
[3′]AAG (2)                78.0 ± 20.8      −40.55 ± 0.66       8.35 ± 0.76
[3′]TGG (2)                240 ± 70         −37.76 ± 0.72       11.14 ± 0.81

Triple-base variant
[5′]TGC (4)                3.38 ± 0.67      −48.32 ± 0.49       0.58 ± 0.62

[5′]/[3′] variants
[5′]AGA/[3′]GG (3)         54.1 ± 10.3      −41.46 ± 0.47       7.44 ± 0.60
[5′]AAC/[3′]AT (2)         87.0 ± 11.6      −40.28 ± 0.33       8.62 ± 0.50

(a) Parentheses denote number of replicate experiments; substitutions are highlighted by bold type. See the text for details of sequence notation.
(b) Error is the mean fitting error or standard deviation of the average KD, whichever is larger.
(c) ΔG° = RT ln KD (T = 298 K).
(d) ΔΔG° = ΔG°(variant) − ΔG°(reference).
(e) λB site of Igλ2-4 enhancer.
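The conversions in footnotes c and d of Table 1 can be checked with a short sketch. The function names are hypothetical, KD is taken relative to a 1 M standard state, and the first-order error formula is an assumption for illustration: the paper reports the larger of the fitting error and the standard deviation, so only some tabulated errors will be reproduced exactly.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)
T = 298.0           # temperature in K, as stated in footnote c


def delta_g(kd_nm):
    """Standard free energy (kJ/mol) from a dissociation constant given in nM."""
    kd_molar = kd_nm * 1e-9  # nM -> M (1 M standard state assumed)
    return R * T * math.log(kd_molar)


def delta_g_error(kd_nm, kd_err_nm):
    """First-order propagated error in ΔG°: RT * δKD / KD."""
    return R * T * kd_err_nm / kd_nm


def kd_ratio(ddg):
    """Fold change in KD that corresponds to a ΔΔG° given in kJ/mol."""
    return math.exp(ddg / (R * T))


# Reference site: KD = 2.68 ± 0.41 nM (Table 1).
print(f"{delta_g(2.68):.2f} ± {delta_g_error(2.68, 0.41):.2f} kJ/mol")
```

Applied to the reference site this gives about −48.90 ± 0.38 kJ/mol, in agreement with the first row of Table 1. `kd_ratio` converts a free energy difference back into a fold change in KD, so a ΔΔG° near +9 kJ/mol corresponds to a KD ratio of roughly 36.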

namely 5′-TGCGGAAGTG-3′ (substituted bases are in bold). Positions 5′ to the consensus are referred to as (−) positions in the 3′-to-5′ direction, and (+) positions are those 3′ to the consensus in the 5′-to-3′ direction (Figure 1). Specific base changes from N to N′ at position (±X) are denoted by N(±X)N′, in analogy to protein amino acid mutations.

The PU.1 ETS domain exhibits strong sequence-sensitive selectivity among cognate variants

To ensure that our data represent sequence-specific interactions by the PU.1 ETS domain, we investigated the potential contribution of non-specific DNA binding. Resolution of a mixture of 0.2 mM of the PU.1 ETS domain and 0.15 mM of a


Figure 3. Comparative stabilities of PU.1 ETS/DNA complexes. ΔΔG°s are relative to the reference λB site, 5′-AATAAAGGAAGTGAAA-3′, and are arranged in order of decreasing stability. See Table 1 for numerical values.

non-specific sequence (identical to the λB site except that the core consensus is changed to 5′-GAGA-3′) by electrophoretic mobility shift indicated no detectable binding, in contrast to that exhibited by the λB site (Figure 2(A)). In addition, the bound-to-unbound ratio, consistent with a 1:1 complex, for the λB site was not detectably reduced by 0.7 mM of unlabelled non-specific sequence. These results verified that our titration results reflect sequence-specific selectivity by the PU.1 ETS domain (Figure 2(B)).

From the aggregate set of titration results, KD spans a range greater than 400-fold, or in excess of 15 kJ/mol in free energy (Table 1, Figure 3). Bearing in mind differences in solution conditions and titration technique, this spread is nonetheless considerably larger than that observed in previous reports in which sequence-specific quantitative PU.1 ETS/DNA stabilities have been directly determined.1,5 From these results it is clear that even a single substitution anywhere from (−3) to (+2) in the flanking sequence can lead to a significant change (usually a loss compared to the reference sequence) in PU.1 ETS/DNA stability. Of particular interest is the highly punitive substitution A(−1)T (ΔΔG° = +10 kJ/mol): a T at that position in [5′]AAT or indeed any sequence context has never been encountered in a known PU.1 ETS binding site. It is also evident that the effect of the substitution is highly position-dependent. For example, [5′]AAG exhibits a 36-fold higher KD (or +9 kJ/mol in free energy) than [5′]AGA for the PU.1 ETS domain. In addition, a second substitution generally results in another significant change in complex stability, regardless of the identity of the substituting base.

A broader analysis of the results indicates that sequence sensitivity of PU.1 ETS/DNA complex

stability extends beyond pairwise interactions. The single-base variant [5′]TAA, the double-base variants [5′]TAC and [5′]TGA, as well as the triple-base variant [5′]TGC all share highly similar ΔΔG°s (< 0.64 kJ/mol) relative to the reference [5′]AAA, but this is clearly not the case when A is present at the (−3) position (Figure 4). Thus, the interplay between the identity/position of substitution and ETS/DNA complex stability is sequence-dependent from (−1) to at least (−3) in the [5′] flanking sequence.

Bases flanking the 5′-GGAA-3′ consensus cooperate in many cases in determining sequence selectivity by the PU.1 ETS domain

In all, we resolved 13 double-substitution thermodynamic cycles for the (−2) and (−1) positions, 11 of which are [5′] cycles and the other two [3′] cycles. For these results, using the propagated errors in the reported ΔG° (Table 1), the average error in Δ²G°int (calculated using equation (5)) is 1.22 (±0.14) kJ/mol (± standard deviation). This value establishes the lower limit below which base coupling cannot be detected above experimental noise. Of the 13 double-substitution cycles tested, 11 reveal base coupling, and of these, eight are cooperative (i.e. Δ²G°int < 0). In general, the values of Δ²G°int exhibit a significant spread and are comparable to ΔΔG° in magnitude (Table 1), constituting up to 15% of the observed ΔG° of the double-base variant in question. Of the 11 [5′] cycles, six constitute an interconnected web encompassing the set of double-base variants [5′]A(G/C)N (Figure 5). The [5′]A(G/C)C and [5′]AGG variants exhibit strongly cooperative coupling, whereas the magnitudes of Δ²G°int for


Figure 4. Effect of base at the (−3) position on base coupling at positions (−1) and (−2). (A) Thermodynamic cycle used to examine this effect. Filled arrows represent values of ΔΔG°s determined experimentally to resolve the cycle. See the text and Figure 1 for sequence notation. (B) ΔΔG° for the three variants of each series, the λB sequence [5′]AAA being the reference, i.e. ΔΔG° = ΔG°(variant) − ΔG°([5′]AAA).

the two [5′]A(G/C)T variants are just slightly above noise. In contrast, coupling is anticooperative for the [5′]ACG variant; we note that this sequence is not found in reported PU.1 cognates despite a number of examples containing C or G at the (−2) or (−1) position, respectively.1,8 Another interesting feature is that the coupling energies for [5′]AGC and [5′]ACC are the same within error (average Δ²G°int = −4.46 kJ/mol), as are those for [5′]AGT and [5′]ACT (average Δ²G°int = −1.59 kJ/mol). This coincidence is more striking given that the individual ΔG° values that comprise these component subcycles are all significantly different (Table 1), and raises the possibility that C and T at the (−1) position have a dominant effect on base coupling. This trend, however, is not apparently followed by the [5′]A(C/G)G cognates.

We have also resolved a triple-substitution [5′] cycle for the variant [5′]TGC (Figure 6). For such a construct, it is possible to dissect the contributions made by the component single and double-base

substitutions, as given by Δ³G°int-1 (equation (6)) and Δ³G°int (equation (9)), respectively. As applied to the [5′]TGC triple-base variant, we obtained Δ³G°int-1 = 0.22 (±1.37) kJ/mol, indicating that the three component single-base substitutions have only an additive effect on the observed PU.1 ETS/DNA stability for the triple-base variant. This is a significant result, and indicates that local changes effected by single-base substitutions (such as steric constraints, dipole distribution and hydrophobicity), as embodied by Δ³G°int-1, appear to play no role in determining their overall base-neighbor interactivity (Δ³G°int = 2.77 (±1.80) kJ/mol), at least for this triple-base substitution. The anticooperative interactions observed in the energetics of the triple mutant [5′]TGC are due exclusively to base coupling among the three component double-base variants. We point out that a significant Δ³G°int is an alternative expression of the three-base interaction illustrated in Figure 3. The [5′]TAA/TGA/TAC/TGC subcycle (Figure 4) exhibits no significant


Figure 5. Thermodynamic cycles for the set of variants [5′]A(G/C)N. Singly and doubly-substituted variants are highlighted in gray and black boxes, respectively; the reference sequence is left unboxed. Some arrows are shaded to aid visualization and the (−3) base is omitted for simplicity. Individual ΔΔG°s for the single-base substitutions measured experimentally are given beside the specified arrow. The coupling energies (Δ²G°int) are given at the center of the cycles. All values are in kJ/mol. See the text and Figure 1 for sequence notation.

coupling for the two-base A(−2)G/A(−1)C substitutions (Δ²G°int = −1.24 (±1.40) kJ/mol). (The fact that the four component ΔG°s are similar is only coincidental, since equation (4) requires only that the sum of the single-base variant ΔΔG°s be algebraically balanced by the double-base variant ΔΔG°.) The case of the same double-base substitution cycle but with A at the (−3) position is exactly the opposite face of the [5′]TAA/TGA/TAC/TGC subcycle, and its overall coupling (which is strongly cooperative at Δ²G°int = −4.02 (±1.13) kJ/mol) is related by a difference equal to Δ³G°int. Hence a significant Δ³G°int is evidence of base coupling across the [5′] positions (i.e. (−1) to (−3)) investigated. All of the six [3′] double-base cognate variants

[Figure 6 graphic: thermodynamic box with vertices AAA, AAC, AGA, AGC, TAA, TAC, TGA and TGC. Face coupling energies: −2.15 ± 1.29, −4.02 ± 1.13, 6.39 ± 1.36, 3.61 ± 1.18, −1.24 ± 1.40 and 0.63 ± 1.25; Δ³G°int = 2.77 ± 1.80; Δ³G°int-1 = 0.22 ± 1.37.]

Figure 6. Thermodynamic box for the triple-base variant [5′]TGC. Singly, doubly, and triply substituted variants are highlighted in gray, black, and transparent boxes, respectively; the reference sequence is left unboxed. The coupling energies Δ³G°int and Δ³G°int-1 are calculated with equations (9) and (6), respectively. Δ²G°int for the constituent two-base subcycles are given at the center of the cycles. All values are in kJ/mol. See the text and Figure 1 for sequence notation.

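[Figure 7 graphic: cycle vertices labelled by the (+1)/(+2) dinucleotide: GT (reference), TT, GG, TG, AT, GA and AA. Coupling energies: −7.37 ± 0.98 for [3′]TGG and −5.94 ± 1.17 for [3′]AAG (kJ/mol).]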

tested exhibited substantially lower affinity for the PU.1 ETS domain, and are overall the poorest cognates encountered in our experiments (Figure 3). This is somewhat different from the results obtained for the murine Ets-1/DNA complex, for which purine or pyrimidine transitions (from [3′]GTG to [3′]ATG or [3′]GCG) exhibited no change in complex stability,2 and may reflect differences between the two ETS proteins. It is interesting that significant cooperative coupling is observed in the thermodynamic cycle analysis (Figure 7), such that the double-base variants are not as poor cognates as would be predicted from analysis of the single-base variants. This is clearly insufficient, however, to render the double-base variants much more attractive targets: [3′]TGG is,


Figure 7. Thermodynamic cycles for the [3′] double-base variants [3′]TGG and [3′]AAG. Singly and doubly-substituted variants are highlighted in gray and black boxes, respectively; the reference sequence is left unboxed. Individual ΔΔG°s for the single-base substitutions measured experimentally are given beside the specified arrow. The coupling energies (Δ²G°int) are given at the center of the cycles. All values are in kJ/mol. See the text and Figure 1 for sequence notation.

for example, the worst cognate variant we observed (ΔΔG° = 11.14 (±0.81) kJ/mol).

The upstream and downstream flanking sequences are energetically coupled across the 5′-GGAA-3′ consensus in PU.1 ETS/DNA association

While we observe that, individually as two noncontiguous blocs, the [5′] and [3′] flanking sequences exhibit base coupling in binding the PU.1 ETS domain, it is not known whether these blocs may energetically interact in modifying the overall stability of the ETS/DNA complex. To answer this question, we again made use of thermodynamic cycles, except that in this case each

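[Figure 8 graphic: two thermodynamic cycles sharing the reference [5′]AAA/[3′]GTG. Cycle 1: [5′]AGA/[3′]GTG, [5′]AAA/[3′]GGG and [5′]AGA/[3′]GGG, with Δ²G°int = 2.35 ± 0.81. Cycle 2: [5′]AAC/[3′]GTG, [5′]AAA/[3′]ATG and [5′]AAC/[3′]ATG, with Δ²G°int = −3.20 ± 0.99. Arrow labels recovered: 8.67 ± 0.49 and 8.27 ± 0.55. All values in kJ/mol.]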

Figure 8. Thermodynamic cycles for the bloc variants [5′]AGA/[3′]GGG and [5′]AAC/[3′]ATG. Variants in which only the [5′] or [3′] sequences, and both, are substituted are highlighted in gray and black boxes, respectively; the reference sequence is left unboxed. Individual ΔΔG°s for the single-base substitutions measured experimentally are given beside the specified arrow. The coupling energies (Δ²G°int) are given at the center of the cycles. All values are in kJ/mol. See the text and Figure 1 for sequence notation.


individual substitution is a “bloc-wide” modification (within which bases may be singly or multiply substituted). We chose to examine two such possibilities, [5′]AGA/[3′]GGG and [5′]AAC/[3′]ATG (Figure 8), based on the relatively wide spread of the ΔG°s of the individual [5′] and [3′] cognate variants that comprise the cycles (Table 1, Figure 3). The [5′]AGA/[3′]GGG cycle is anticooperative (Δ²G°int = 2.35 (±0.81) kJ/mol), while the [5′]AAC/[3′]ATG cycle is cooperative (Δ²G°int = −3.20 (±0.99) kJ/mol). These results indicate that the [5′] and [3′] flanking blocs are indeed energetically coupled, and the nature of this coupling is sensitive to the actual flanking bases themselves. Thus, the interaction of the PU.1 ETS domain with the bases adjacent to one side of the core consensus appears to be transmitted to the other side (along the DNA sequence, or via the PU.1 ETS domain, or both), and this coupling may be either cooperative or anticooperative.
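The double- and triple-substitution coupling energies reported above can be reproduced with a short sketch. It assumes that a coupling energy of order n is the alternating-sign sum of ΔG° over all 2ⁿ combinations of the substituted positions, with the errors added in quadrature; this form is inferred from the reported values rather than copied from the paper's equations, which are not reproduced in this section. The dictionary layout and the function name `coupling` are illustrative.

```python
import math
from itertools import combinations

# ΔG° ± error (kJ/mol) from Table 1, keyed by the set of substituted
# [5'] positions; the empty set is the [5']AAA reference.
TGC_BOX = {
    frozenset(): (-48.90, 0.38),              # [5']AAA
    frozenset({-3}): (-48.53, 0.55),          # [5']TAA
    frozenset({-2}): (-52.47, 0.44),          # [5']AGA
    frozenset({-1}): (-45.35, 0.76),          # [5']AAC
    frozenset({-3, -2}): (-48.49, 0.88),      # [5']TGA
    frozenset({-3, -1}): (-47.12, 0.81),      # [5']TAC
    frozenset({-2, -1}): (-52.93, 0.61),      # [5']AGC
    frozenset({-3, -2, -1}): (-48.32, 0.49),  # [5']TGC
}


def coupling(data, positions, background=()):
    """Coupling energy and propagated error for the given positions.

    `background` lists positions that stay substituted in every term,
    which selects a different face of the thermodynamic box.
    """
    positions = tuple(positions)
    fixed = frozenset(background)
    n = len(positions)
    total, variance = 0.0, 0.0
    for k in range(n + 1):
        sign = (-1) ** (n - k)  # + for the fully substituted variant
        for subset in combinations(positions, k):
            dg, err = data[frozenset(subset) | fixed]
            total += sign * dg
            variance += err ** 2
    return total, math.sqrt(variance)


face_a = coupling(TGC_BOX, (-2, -1))                    # A at (-3)
face_t = coupling(TGC_BOX, (-2, -1), background=(-3,))  # T at (-3)
box = coupling(TGC_BOX, (-3, -2, -1))                   # full box
for label, (value, err) in (("A face", face_a), ("T face", face_t), ("box", box)):
    print(f"{label}: {value:+.2f} ± {err:.2f} kJ/mol")
```

With the Table 1 values this gives about −4.01 ± 1.13 and −1.24 ± 1.40 kJ/mol for the two faces and 2.77 ± 1.80 kJ/mol for the full box, matching the reported figures to within rounding. The triple coupling equals the difference between the two faces, as stated in the text.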

Discussion

Assessment of the experimental data

We have investigated base coupling effects among the bases flanking the critical 5′-GGAA-3′ consensus18 for PU.1 ETS/DNA association by determining ETS/DNA stability by equilibrium filter binding titrations under standard conditions, using the λB site of the Igλ2-4 enhancer as a reference sequence. Previously, we have reported a lack of salt-induced effect on complex stability when titrations were analyzed by quantitative electrophoretic mobility shift under similar solution conditions.16 When we subsequently performed titrations by filter binding, a moderate salt dependence was observed: in 150 mM Na+, KD is approximately one order of magnitude lower than in 250 mM Na+. We address this and other energetic issues of sequence selectivity by the PU.1 ETS domain in another report, and here it suffices to point out that the salt dependence issue does not affect the present discussion, since all equilibrium experiments in this study were carried out under identical conditions (in phosphate buffer containing 150 mM Na+, pH 7.4 at 25 °C).

For our set of filter binding data (72 experiments), the average error in ΔG° is 0.61 (±0.21) kJ/mol (± standard deviation). This is comparable to those reported for other equilibrium titration techniques for this protein–DNA system such as electrophoretic mobility shift,1,16 surface plasmon resonance,18 quantitative hydroxyl radical footprinting,5 and fluorescence anisotropy.5,19 Based on this average error, we expect an average propagated error in Δ²G°int and Δ³G°int of 1.2 and 1.5 kJ/mol, respectively. All of the actual errors in the thermodynamic cycles (each arising from a subset of the data pool) fall near these values (from 0.94 kJ/mol to 1.42 kJ/mol for δΔ²G°int and


1.80 kJ/mol for δΔ³G°int), indicating that the data are stable and robust across the entire pool. Base coupling in a quadruple-base (or higher-order) variant can be evaluated by appropriate expansion of equation (9).12 The utility of this practice is likely limited by the inherent error propagation. Based on our same average error in ΔG°, the average propagated error δΔ⁴G°int = 2.4 kJ/mol (for a quadruple-base variant, the expansion contains 16 independent terms). At this level of uncertainty, many otherwise real coupling energies would be drowned in the experimental noise.

Quantitative base coupling energies explain many statistical patterns in in vivo promoters and apparently contradictory predictions made by combinatorial methods

Compiled from 68 in vivo promoters available in the literature for PU.1,1,8 a “consensus sequence” can be derived by picking the most commonly occurring base at each surveyed position. When screened by SELEX or length-encoded multiplex, however, half or fewer of the flanking bases were correctly predicted (small letters denote a common but less frequently occurring base).1

Position          (−3)   (−2)   (−1)   core   (+1)   (+2)
Consensus    5′-  At     G      A      GGAA   Gc     T     -3′
SELEX        5′-  AT     Gc     Cga    GGAA   Gc     T     -3′
Multiplex    5′-  AG     AG     Cag    GGAA   AGc    Ct    -3′
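The way such a consensus line is read can be illustrated with a minimal sketch that picks the most common base at each aligned position and appends frequent minor bases in lower case. The five input sites are hypothetical toy sequences invented for this example, not the 68 promoters compiled in the literature, and the 25% cutoff for reporting a minor base is likewise an assumption.

```python
from collections import Counter

# Hypothetical aligned sites (toy data for illustration only).
TOY_SITES = [
    "AGAGGAAGT",
    "TGAGGAACT",
    "AGCGGAACT",
    "AAAGGAAGT",
    "TGAGGAAGT",
]


def consensus(sites, minor_threshold=0.25):
    """Most common base per column, with frequent minor bases in lower case."""
    if len({len(site) for site in sites}) != 1:
        raise ValueError("sites must be non-empty and aligned to one length")
    n = len(sites)
    labels = []
    for column in zip(*sites):
        # Rank by descending count; break ties alphabetically for determinism.
        ranked = sorted(Counter(column).items(), key=lambda item: (-item[1], item[0]))
        label = ranked[0][0].upper()
        label += "".join(
            base.lower() for base, count in ranked[1:] if count / n >= minor_threshold
        )
        labels.append(label)
    return labels


print(" ".join(consensus(TOY_SITES)))
```

A per-position tally of this kind treats every column independently, which is the same limitation that the coupling energies in this study are meant to address.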

In particular, individual base contributions to the observed PU.1 ETS/DNA complex stability obtained in the multiplex experiments (assuming independent contribution at each position) failed to predict the directly measured stability in many cases.1 Base coupling was therefore inferred from these inconsistent results. We now demonstrate how our results can account for the consensus sequence and resolve the combinatorial discrepancies.

For the [5′] flanking sequence, the most favored combination among in vivo promoters, [5′]AGA, has the second highest affinity for the PU.1 ETS domain in our experiments (Table 1). The alternative favorite, [5′]TGA, also forms a highly stable complex with the ETS domain. Indeed, an inspection of Table 1 shows immediately that A and T are, respectively, the most and second most (by only a small margin) favored bases at the (−3) position, exactly as seen in the consensus sequence. Virtually all permutations formed by the base candidates put forward by SELEX or length-encoded multiplex experiments are among the ten most avid PU.1 ETS targets in our experiments. We can also understand now why the two combinatorial methods have yielded degenerate results at these (−) positions: the energetic differences among these sequences are quite small. In particular, both methods predict C as the major base at the (−1) position; in our experiments [5′]AGC is a more avid target for PU.1 ETS than either [5′]AGA or [5′]AGG. More remarkably, none of the 68 examples of in


vivo promoter sequences carries a (−1)T. This is corroborated by both SELEX and multiplex, which predict every base except T at the (−1) position. Again, our results provide an energetic basis for these observations. Every sequence in our experiments which contains a (−1)T ([5′]AGT, [5′]ACT, and [5′]AAT) is a significantly poorer cognate for the PU.1 ETS domain than the reference [5′]AAA (ΔΔG° = 4.64 (±0.85), 7.00 (±0.60), and 9.75 (±0.80) kJ/mol, respectively; Table 1). An inspection of the coupling energies reveals the influence of base coupling on complex stability (Figure 5): Δ²G°int is more negative for both the double-base substitutions [5′]AGC and [5′]AGG than for [5′]AGT (which is just beyond experimental noise). Similarly, Δ²G°int is significantly less negative for [5′]ACT than for [5′]ACC. Thus the presence of a (−1)T practically abolishes the cooperative stabilization afforded by the otherwise highly favorable G at the (−2) position.

For the [3′] flanking sequence, any substitution attempted at the (+1) and (+2) positions from G and T, respectively, resulted in substantially poorer cognates for the PU.1 ETS domain (Figure 3). Not surprisingly, G and T are the most common bases at the (+1) and (+2) positions, respectively, among in vivo promoters.
In addition, the set of [5′]GTN variants comprises the top four PU.1 cognates in semi-quantitative electrophoretic mobility shift assays of nuclear extracts of THP-1 cells, and also the most efficacious (by a wide margin but highly similar within the set) promoters in luciferase reporter constructs bearing the p47phox promoter (5′-GAAGGAAGTG-3′).6 Our results are not able to directly explain the minor preference for (+1)C or (+2)T in the consensus sequence or (+1)A and (+2)C in the multiplex experiments, although comparison with the luciferase reporter experiments again suggests that any permutation of these bases at their respective positions should be a poorer cognate than the reference [3′]GTG.6 The two thermodynamic cycles constructed from the [3′] cognate variants both exhibited significant cooperative coupling between the two base neighbors substituted (Figure 7), although overall these double-base variants are still highly unfavored cognates for the PU.1 ETS domain. Thus, the ETS/DNA complex is able to compensate energetically for the double substitutions more effectively than any of the single-base variants, but there are currently no suitable comparative thermodynamic or structural data to explain the apparent cooperative stabilization conferred by [3′]AAG and [3′]TGG. We note, however, that the coupling energies in these two cases, while relatively large, are “dead bounces” since in terms of ΔG° the double and component single-base variants are comparably dismal targets for the PU.1 ETS domain.

Structural implication of base coupling in the flanking sequences of PU.1 ETS cognates

In the sequence-specific PU.1 ETS domain-DNA crystal structure,17 the protein contacts the DNA

813

over a 10-bp area, inserting helix a3 into the major groove where conserved basic side-chains make base-specific contacts in the major groove of at the consensus sequence. The DNA is additionally contacted at the adjacent minor grooves a half-helical turn away from the consensus. These minor groove contacts, unlike those with the major groove, occur exclusively with backbone phosphate or deoxyribose atoms of the DNA. This segmentation of protein –DNA contacts is consistently reproduced in the solved structures for other ETS/DNA complexes, namely human Ets-1,20 murine Ets-1,21 fli1,22 GABPa/b,3 SAP-1,23 and Elk-1,24 and suggests a basis for the promiscuity of site recognition by this motif. Despite the apparent lack of basespecific minor groove contacts, a comparison of known cognate sequences and biochemical characterizations have demonstrated definite base preferences in the flanking sequences1,8 in support of an indirect readout mechanism of site recognition. In this paradigm, base coupling energies are interpreted as a measure of the excess contribution by multiple base-pairs, in tandem, to meet the structural requirements for site recognition by the protein. Our experimental results are compatible with this interpretation. Avid cognates such as [50 ]AGC and [50 ]AGG, for example, owe their high affinity (relative to the lB site, [50 ]AAA) to cooperative coupling at the (-2) and (-1) positions. When such favorable coupling cannot be achieved, as in the case of [50 ]AGT, a substantially poorer cognate results. Sequence sensitivity of complex stability across the (2 3) to (2 1) positions (Figure 4) and coupling of the [50 ] and [30 ] flanking blocs (Figure 8) suggest readout of a gross structural unit that may span the entire length of the binding site. 
In support of this proposal, as shown by the thermodynamic box for the triple-base variant [5′]TGC (Figure 6), cooperative base coupling appears to stem not from simple effects of the individual substitutions (Δ³G°int-1 ≈ 0), such as changes in local hydrophobicity or dipole distribution in the groove floor, but exclusively from pairwise interactions of the component bases. Although we now have a direct demonstration of base coupling consistent with indirect readout in PU.1 ETS/DNA interactions, structural details are lacking. In the sequence-specific PU.1 ETS/DNA co-crystal structure,19 the bound DNA undergoes a three-dimensional writhe across the contact interface (Figure 9(A)).25 As the recognition helix anchors the major groove of the DNA at the 5′-GGAA-3′ consensus, the flanking sequences curve towards the bound protein, leading to an 8° overall curvature. As in other known examples of DNA flexure in protein/DNA complexes, this deformation is largely the result of a series of positive roll angles, namely from the (−3)/(−2) base step to the A–A in the core consensus and more distally beginning at the (+2)/(+3) step (Figure 9(B)); tilt-wedge, an energetically costly deformation, is a negligible contributor.25,26 Overall, the helix is moderately untwisted across the contact


Base Coupling in PU.1 ETS Site Selection

Figure 9. Positive roll deformation drives three-dimensional writhe of DNA in the PU.1 ETS/DNA co-crystal structure. (A) Normal vector plots (values generated by FREEHELIX)25 for the DNA sequence in the PU.1 ETS/DNA co-crystal structure (5′-AAAAGGGGAAGTGGG-3′; PDB ID: 1PUE).19 The normal vectors are obtained by first establishing a viewing axis along the helix, fitting each base-pair to a best least squares plane, and defining a unit vector perpendicular to this plane. The plot is a projection of the normal vectors onto the XY plane (orthogonal to the viewing axis Z) by plotting the vector cosines for Y against X. Three-dimensional writhe is characterized by a counterclockwise circle in the normal vector plot.25,26 Blue squares denote the 5′-GGAA-3′ consensus; flanking bases are in red. (B) Roll (VROL) and tilt (VTIL) contribution to the total angle between the normal vectors (VALL) of successive base-pairs, as calculated by FREEHELIX. Minor groove width is the average inter-phosphate distance (measured by xrasmol) as defined by El Hassan & Calladine;41 for comparison, the average minor groove width from 76 naked DNA crystal structures (10.7 Å) is included (dotted line).41 The 5′-GGAA-3′ consensus is underlined.

interface,27 and the minor groove is substantially expanded along the entire sequence, in step with changes in roll angles (Figure 9(B)). Roll-wedge flexure appears to be conserved in other solved ETS/DNA structures: in the SAP-1/DNA complex, for example, the flanking sequence has been described as “A-DNA-like”.23 DNA bending is also evident in DNase I footprints of various sequence-specific ETS/DNA complexes of PU.1 and other ETS proteins,28–31 in which a hypersensitive site 3′ to the (−1) base is observed on the complementary “3′-CCTT-5′” strand. As DNase I


detects minor groove roll-wedge bendability,32 this phenomenon is ascribed to an ETS-induced expansion (or bend) of the minor groove.27,28 It is likely, therefore, that flexibility at that base step, and possibly others (although they cannot be observed in the footprints since they are masked by the bound protein), will influence the affinity of the cognate variants for the PU.1 ETS domain. The role of the protein in inducing the observed writhe in the DNA cognate has been linked to neutralization of phosphate contacts in the flanking sequences. All but one of these phosphate contacts lie on one face of the helix,27 such that direct and water-mediated neutralization by ETS side-chains19 causes the helix to collapse towards the protein (the partially neutralized surface). Methylphosphonate substitution experiments at these contact positions have demonstrated that this process is energetically more than sufficient to drive the observed DNA flexure in the PU.1 ETS/DNA co-crystal.33 One component of the proposed indirect readout mechanism may therefore be the alignment of the appropriate backbone phosphates in the flanking sequences to make these neutralizing contacts with the ETS domain.2 Thus cognates whose flanking sequences are “in alignment” or are sufficiently flexible to assume the required alignment should carry a lower energetic cost of association and, therefore, greater affinity for the PU.1 ETS domain. Although a 6° intrinsic curvature towards the major groove has been noted in the co-crystal sequence (5′-AAAAGGGGAAGTGGG-3′, which is very similar to the [5′]AGG variant, a strong PU.1 ETS target in our experiment), sequence-directed flexibility, rather than intrinsic flexure, is likely the more important criterion.
In DNase I footprints, including those involving the crystal sequence, the hypersensitive site is observed only when the ETS domain is also present; it is not observed in the unbound DNA.28–31 Given these observations it is clear that deformation of the DNA cognate to the appropriate conformation for binding is an important event in PU.1 ETS/DNA association. Much less certain, however, is the extent to which ETS/DNA contacts along the flanking sequences are local interactions or triggers for more global conformational changes that affect the overall stability of the complex. Current structural and biochemical evidence is not able to unambiguously answer this question. DNase I and hydroxyl radical footprints of various PU.1 ETS/DNA complexes, for example, are qualitatively identical in their cleavage patterns and intensities,28–31 even though DNase I digests of unbound DNA exhibit quite different patterns at the substituted positions (our unpublished results). In addition, comparison of co-crystal structures of SAP-1 bound to two different cognates (the E74 and c-fos promoters, the former being more avid) reveals no significant differences in the overall helical parameters of the bound sequences.23 These observations, however, cannot differentiate whether the observed ΔΔG° values and coupling energies among cognate variants reflect simply the costs of deforming the local flanking DNA segments of these variants to the required conformation (as a function of sequence-dependent roll-wedge flexibility, or some other anisotropic roll-wedge-related property), or whether distal processes absorb (with varying ease) the local differences throughout the extended length of the DNA cognate.

Adding to this uncertainty is the mutual influence between the PU.1 ETS domain and the DNA cognate. In the PU.1 ETS/DNA co-crystal structure17 backbone contacts along the flanking sequences are frequently made by multiple protein atoms; to what extent do flanking base substitutions alter these local couplings, alter distal coupled contacts, and/or possibly, tune the tertiary structure within the protein? Comparison of SAP-1 and Elk-1, two ETS members with sequence identity in the DNA-recognition domain, bound to the same E74 promoter has revealed the role of distal, non-conserved residues in differentially orienting DNA-recognition secondary elements and resulting in significantly different ETS/DNA contacts.23,24 Of course, none of the above possibilities are mutually exclusive and a given effect may be apparent only when its contribution is severe, e.g. (A)[5′]AAT, an example of the 5′-AnT-3′ motif (n ≥ 2) which is commonly observed to bend towards the minor groove and compress it with negative roll.34 Finally, given the large range of ΔΔG° values exhibited by some cognates, it remains possible that differences may exist in the base-specific readout of different cognates by the ETS domain. Thus, flexibility at a particular flanking base step may have a dominant effect on observed ETS/DNA complex stability. Ultimately, elucidation of these possibilities will depend on investigating and comparing additional structural examples of the same ETS domain bound in energetically disparate sequence-specific complexes.

Conclusion

We have shown that bases flanking the core 5′-GGAA-3′ consensus in PU.1 ETS cognates interact energetically to influence the stability of the ETS-DNA complex, and the nature of base coupling can be cooperative or anticooperative. Thus, the energetic consequence of a substitution in a flanking sequence on ETS affinity cannot be treated only as a local effect, but must also be viewed as a long-range phenomenon that can propagate along the entire cognate sequence. This finding implies that the indirect readout mechanism of the PU.1 ETS domain involves recognition of a gross structural unit formed from interdependent interactions among the component bases. More generally, to the extent that indirect readout can be considered as a structure-sensing mechanism of DNA recognition, it is reasonable to hypothesize that for other protein/DNA systems that employ indirect readout, such as CAP,35,36 trpR,37 EcoRV,38 and MetJ,39 base coupling may also be operative in determining the affinity of the DNA site for its target protein.

Materials and Methods

Materials

[γ-32P]ATP (~6000 Ci/mol) and poly(dA-dT)·poly(dA-dT) were purchased from Amersham Biosciences. T4 polynucleotide kinase came from New England Biolabs, thrombin from Boehringer Mannheim, and bovine serum albumin (BSA) from Calbiochem. Oligonucleotides were synthesized and purified by either denaturing polyacrylamide gel electrophoresis or a reverse-phase cartridge by Sigma Genosys (Mississauga, Ontario, Canada) or Cortec Laboratories (Kingston, Ontario, Canada). Sources of other chemicals and materials are as indicated below. All reagents are of analytical grade or better.

DNA preparation

The central 23-bp region of the λB site of the murine Igλ2-4 enhancer40 or variant (as described in Results) was embedded in a 67-nt hairpin containing a 6-nt 5′ overhang (Figure 1). It was radiolabelled with [γ-32P]ATP and T4 PNK at the 5′ terminus. Labelled DNA was purified and transferred into binding buffer through a polyacrylamide (Bio-Spin 6, Bio-Rad) or Sephadex G-25 (MicroSpin G-25, Amersham Biosciences) column. Unlabelled probe was scanned and quantified at 25 °C on an Aviv 14DS spectrophotometer (Lakewood, New Jersey), assuming an A of 1 = 50 µg ml⁻¹ at 260 nm.

Protein expression and purification

The ETS domain of murine PU.1 (amino acid residues 154–266, equivalent to the human protein numbering 152–264) was expressed and purified as described.16 Briefly, N-terminally (His)6-tagged PU.1 ETS domain was overexpressed in Escherichia coli (BL21DE3-pLysS), and purified by immobilized Co²⁺ affinity chromatography. The (His)6 tag was cleaved with thrombin and removed by passing through the Co²⁺ resin. SDS-PAGE of eluates sampled at various stages demonstrated single bands (by Coomassie blue staining) after affinity chromatography and thrombin proteolysis at the expected mobilities. After purification, EDTA and PMSF were added to 1 mM each and the protein was stored in small aliquots at −78 °C. BSA and glycerol were added to aliquots in active use (to 0.2 mg/ml and 50% (v/v), respectively) and kept at ≤ −20 °C.

Filter binding

PU.1 ETS domain (~10 nM) was mixed with graded concentrations of 5′-radiolabelled DNA probe (diluted to a known concentration with unlabelled probe), 0.25 µg BSA and 6 µg poly(dA-dT)·poly(dA-dT) in 50 µl of binding buffer consisting of 25 mM Na2HPO4 (pH 7.4), 1 mM H4EGTA, 0.1 mM Na2EDTA, 100 mM NaCl, 10% glycerol. Samples were incubated at 25 °C for at least one hour. Vacuum filtration through mixed cellulose nitrate-acetate filters (Millipore type HA, 0.45 µm pore size) and scintillation counting were conducted as

described.16 Background binding was measured in the absence of the PU.1 ETS domain across the range of probe concentrations used. Scintillation counts were converted to DNA concentrations by counting 3 µl of each remaining sample and 3 µl of the radiolabelled DNA stock. The total DNA ([D]_t) and ETS-bound ([DP]_app) concentrations were fitted to equation (1) to obtain the equilibrium dissociation constant (K_D) using Origin (version 6.0, Microcal, Northampton, MA):

$$[D]_t[P]_t - \left([D]_t + [P]_t + K_D\right)[DP] + [DP]^2 = 0 \qquad (1)$$

where [P]_t is the total PU.1 ETS concentration, and:

$$[DP] = \frac{[DP]_{\mathrm{app}}}{\varepsilon} \qquad (2)$$

In equation (2), ε is the filter retention efficiency. Background DNA binding was subtracted by simultaneously fitting the filter-retained DNA concentration in the absence and presence of protein.

Electrophoretic mobility shift

DNA and PU.1 ETS domain were co-incubated to equilibrium as described for filter binding experiments. Samples were run on an 8%, 19:1 polyacrylamide gel containing 89 mM Tris, 89 mM boric acid, 2.5 mM Na2EDTA at 25 V/cm for 20 minutes at room temperature. After electrophoresis, the gel was transferred, vacuum-dried onto blotting paper, and scanned radiometrically by an Ambis 4000A detector (Scanalytics, Billerica, MA). Band peaks were fitted to Gaussian distributions and integrated to obtain their intensities using One-Dscan (version 1.33, Scanalytics).

Data analysis

Fitted K_D values are converted to free energies of association: ΔG° = RT ln K_D (R = 8.314 J/(mol K), T = 298 K). In our experiments, the λB site of the murine Igλ2-4 enhancer was chosen as the reference sequence. The change in stability of a cognate variant relative to the reference cognate is given by:

$$\Delta\Delta G^\circ = \Delta G^\circ_{\mathrm{variant}} - \Delta G^\circ_{\mathrm{ref}} \qquad (3)$$
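Equations (1) to (3) can be illustrated with a short numerical sketch. The Python below is not the Origin fitting procedure used here; it only evaluates the physically meaningful root of equation (1) for given concentrations, applies the retention correction of equation (2), and converts K_D values to ΔG° and ΔΔG°. The function names and the example concentrations are hypothetical, and reporting energies in kJ/mol is an assumed convention.

```python
import math

R = 8.314   # gas constant, J/(mol K), as given in the text
T = 298.0   # temperature, K, as given in the text


def bound_complex(d_total, p_total, k_d):
    """Root of equation (1) for [DP]; all inputs share one concentration unit.

    The smaller root of the quadratic is taken because [DP] cannot
    exceed either [D]t or [P]t.
    """
    b = d_total + p_total + k_d
    return (b - math.sqrt(b * b - 4.0 * d_total * p_total)) / 2.0


def apparent_bound(d_total, p_total, k_d, retention):
    """Filter-retained complex from equation (2): [DP]app = retention * [DP]."""
    return retention * bound_complex(d_total, p_total, k_d)


def delta_g(k_d_molar):
    """Free energy of association, dG = RT ln K_D, in kJ/mol (assumed unit)."""
    return R * T * math.log(k_d_molar) / 1000.0


def delta_delta_g(k_d_variant, k_d_ref):
    """Equation (3): ddG = dG(variant) - dG(reference), in kJ/mol."""
    return delta_g(k_d_variant) - delta_g(k_d_ref)


if __name__ == "__main__":
    # Hypothetical values (nM and M), chosen only to exercise the functions.
    print(bound_complex(d_total=20.0, p_total=10.0, k_d=5.0))
    print(delta_delta_g(k_d_variant=1e-8, k_d_ref=1e-9))
```

In a real analysis K_D and the retention efficiency would be obtained by nonlinear least-squares fitting of the titration data to these expressions rather than supplied by hand.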

For a double-base variant, the observed stability change may be expressed as a sum of the changes of the constituent single-base variants and a provisional coupling term, Δ²G°int:

$$\Delta\Delta G^\circ_{\mathrm{double}} = \Delta\Delta G^\circ_{\mathrm{single},1} + \Delta\Delta G^\circ_{\mathrm{single},2} + \Delta^2 G^\circ_{\mathrm{int}} \qquad (4)$$

where the superscript 2 indicates the order of the variant (following the notation of Horovitz & Fersht).12 If Δ²G°int = 0 within experimental uncertainty, the stability effects of the double-base substitution in question are entirely accounted for by the additive effects of the constituent single-base substitutions (i.e. no energetic base coupling). If Δ²G°int < 0, the dual substitutions synergize to enhance complex stability (i.e. cooperative coupling between the two substitutions); if Δ²G°int > 0, they antagonize to yield a poorer cognate (i.e. anticooperative base coupling). We can alternatively express Δ²G°int in terms of the variants’ measured free energies (a form directly amenable to the experimental data) by substituting in equation (3):

$$\Delta^2 G^\circ_{\mathrm{int}} = \Delta G^\circ_{\mathrm{double}} - \left(\Delta G^\circ_{\mathrm{single},1} + \Delta G^\circ_{\mathrm{single},2}\right) + \Delta G^\circ_{\mathrm{ref}} \qquad (5)$$
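As an illustration of equation (5) and the sign convention just described, the following Python sketch computes a pairwise coupling energy from the four corners of a thermodynamic cycle and labels it. The function names and the ΔG° values are hypothetical and are not data from this study; treating |Δ²G°int| no larger than its uncertainty as "additive" is a simplifying assumption.

```python
def coupling_energy(dg_ref, dg_single_1, dg_single_2, dg_double):
    """Equation (5): pairwise coupling energy from four measured dG values."""
    return dg_double - (dg_single_1 + dg_single_2) + dg_ref


def classify_coupling(d2g_int, uncertainty):
    """Label the coupling using the sign convention of the text.

    Values within +/- uncertainty of zero are treated as additive
    (an assumed, simplified criterion).
    """
    if abs(d2g_int) <= uncertainty:
        return "additive"
    return "cooperative" if d2g_int < 0 else "anticooperative"


if __name__ == "__main__":
    # Hypothetical dG values (same energy unit throughout), not measured data.
    d2g = coupling_energy(dg_ref=-45.0, dg_single_1=-43.0,
                          dg_single_2=-41.0, dg_double=-44.0)
    print(d2g, classify_coupling(d2g, uncertainty=1.0))
```

Here the two single substitutions each weaken binding, yet the double variant is weakened less than their sum would predict, so the coupling term is negative and the pair is labelled cooperative.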


Figure 10. Graphical representation of thermodynamic cycles for a double-base and triple-base cognate variant. Arrows refer to the ΔΔG° for the mutation in the direction indicated. Filled arrows are those ΔΔG° that are actually determined by experiment, and dotted arrows correspond to those ΔΔG° that can be subsequently derived. Singly, doubly, and triply-substituted variants are highlighted in gray, black, and transparent boxes, respectively; the original sequence is left unboxed. (A) The pair PQ is mutated singly and doubly to yield double-base variant XY. (B) A triple mutant box for the stepwise mutation of the triplet PQR to the triple-base variant XYZ. Some arrows are shaded only to aid visualization. The total coupling energy Δ³G°int corresponds to the difference in coupling energy (Δ²G°int) across opposite faces of the box. Thus, the six values of Δ²G°int indicated on the box are not independent.12,13

For higher-order cognate variants, it is possible to additionally examine inter-residue interactions at the level of pairs, triplets, quartets, and so on. For a triple-base variant, two such interaction energies can be determined. If only the effects due to single-base substitutions are considered, this constituent coupling energy is:

$$\Delta^3 G^\circ_{\mathrm{int}\text{-}1} = \Delta\Delta G^\circ_{\mathrm{triple}} - \sum_{i=1}^{3} \Delta\Delta G^\circ_{\mathrm{single},i} = \Delta G^\circ_{\mathrm{triple}} - \sum_{i=1}^{3} \Delta G^\circ_{\mathrm{single},i} + 2\,\Delta G^\circ_{\mathrm{ref}} \qquad (6)$$

where the subscript int-1 indicates that only single-base substitutive effects (such as changes in the local environment of the DNA groove floor) are included.15 Thus equation (6) neglects any possible contributions to the observed ΔΔG°triple made by the three double-base substitutions. When both types of interactions are considered, the total coupling energy, Δ³G°int, is given by:

$$\Delta^3 G^\circ_{\mathrm{int}} = \Delta^3 G^\circ_{\mathrm{int}\text{-}1} - \sum_{i=1}^{3} \Delta^2 G^\circ_{\mathrm{int},i} \qquad (7)$$

We can expand equation (7) by substituting in equations (4) and (6) to give:

$$\Delta\Delta G^\circ_{\mathrm{triple}} = \sum_{i=1}^{3} \Delta\Delta G^\circ_{\mathrm{double},i} - \sum_{i=1}^{3} \Delta\Delta G^\circ_{\mathrm{single},i} + \Delta^3 G^\circ_{\mathrm{int}} \qquad (8)$$

or again, expressed in terms of the constituent variants’ measured free energies:

$$\Delta^3 G^\circ_{\mathrm{int}} = \Delta G^\circ_{\mathrm{triple}} - \sum_{i=1}^{3} \Delta G^\circ_{\mathrm{double},i} + \sum_{i=1}^{3} \Delta G^\circ_{\mathrm{single},i} - \Delta G^\circ_{\mathrm{ref}} \qquad (9)$$

Graphically, double-base substitutions can be represented by the familiar two-dimensional thermodynamic cycle (Figure 10(A)), while a triple-base substitution gives rise to a “triple mutant box” whose faces each correspond to a component double-base cycle (Figure 10(B)).12,13 In this construction, Δ³G°int is the difference in coupling energies (i.e. Δ²G°int) of opposite faces of the box, and can be intuitively interpreted as the effect of a third substitution on the interaction between two other substitutions. Hence, although resolution of the triple-base box requires separate determination of ΔG° for the reference cognate, the triple-base variant, and the six intermediate single and double-base variants, the double-base cycles that constitute the triple-base box are not independent.
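The bookkeeping of equations (6), (7) and (9) can be checked numerically. The sketch below is a minimal Python illustration, assuming the eight ΔG° values of a triple mutant box are stored in a dictionary keyed by the set of substituted positions; the data layout, the function name and the numbers are all hypothetical and are not measurements from this work.

```python
from itertools import combinations

# Hypothetical dG values for the eight corners of a triple mutant box.
# Keys are the substituted positions; frozenset() is the reference cognate.
HYPOTHETICAL_DG = {
    frozenset(): -45.0,
    frozenset({1}): -43.0, frozenset({2}): -42.0, frozenset({3}): -44.0,
    frozenset({1, 2}): -46.0, frozenset({1, 3}): -42.0,
    frozenset({2, 3}): -41.0,
    frozenset({1, 2, 3}): -47.0,
}


def triple_box(dg):
    """Return (d3_int_1, d2_int per pair, d3_int, d3_int from equation (9))."""
    ref = dg[frozenset()]
    positions = (1, 2, 3)
    singles = [frozenset({p}) for p in positions]
    doubles = [frozenset(pair) for pair in combinations(positions, 2)]
    triple = frozenset(positions)

    def ddg(key):
        # Equation (3): stability change relative to the reference.
        return dg[key] - ref

    # Equation (6): deviation of the triple variant from single-base additivity.
    d3_int_1 = ddg(triple) - sum(ddg(s) for s in singles)
    # Equation (4), rearranged: pairwise coupling on each double-base cycle.
    d2_int = {pair: ddg(pair) - sum(ddg(frozenset({p})) for p in pair)
              for pair in doubles}
    # Equation (7): total coupling energy.
    d3_int = d3_int_1 - sum(d2_int.values())
    # Equation (9): the same quantity directly from the measured dG values.
    d3_int_direct = (dg[triple] - sum(dg[d] for d in doubles)
                     + sum(dg[s] for s in singles) - ref)
    return d3_int_1, d2_int, d3_int, d3_int_direct


if __name__ == "__main__":
    print(triple_box(HYPOTHETICAL_DG))
```

Computing Δ³G°int by both routes is a convenient consistency check on the reconstruction of equations (7) and (9): the two values must agree for any set of eight ΔG° values.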

Error propagation

The uncertainty reported in the ΔG° terms is derived from the error in the fitted K_D of the titration curves:

$$\delta\Delta G^\circ = RT\,\frac{\delta K_D}{K_D} \qquad (10)$$

Analytically, ΔΔG° and Δ^mG°int-n (m ≥ 2, n ≥ 0) are series of ΔG° terms (i.e. equations (5), (6) and (9)). Thus, experimental errors are propagated into them as follows:

$$\delta\Delta\Delta G^\circ,\ \delta\Delta^m G^\circ_{\mathrm{int}\text{-}n} = \sqrt{\sum_{j=1}^{k} \left(\delta\Delta G^\circ_j\right)^2} \qquad (11)$$
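Equations (10) and (11) amount to first-order propagation of the fitting error followed by addition in quadrature. A minimal Python sketch, assuming uncorrelated errors and using hypothetical function names and input values:

```python
import math

R = 8.314   # gas constant, J/(mol K), as given in the text
T = 298.0   # temperature, K, as given in the text


def dg_error(k_d, k_d_error):
    """Equation (10): uncertainty in dG from the fitted K_D, in kJ/mol.

    k_d and k_d_error must share the same concentration unit.
    """
    return R * T * (k_d_error / k_d) / 1000.0


def propagated_error(dg_errors):
    """Equation (11): quadrature sum over the k contributing dG errors."""
    return math.sqrt(sum(e * e for e in dg_errors))


if __name__ == "__main__":
    # Hypothetical case: a coupling energy built from four dG values
    # (equation (5)), each from a K_D fitted with 10% relative error.
    errors = [dg_error(1.0, 0.1)] * 4
    print(propagated_error(errors))
```

Because every ΔG° term enters the sums with a coefficient of magnitude one, the propagated uncertainty grows with the number of terms, so higher-order coupling energies are inherently less precise than ΔΔG° values.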

Acknowledgements

We are grateful to Dr Cheryl H. Arrowsmith of the Department of Medical Biophysics, University of Toronto, for providing the plasmid for the PU.1 ETS domain and access to the facilities for the expression and purification of the protein. Mr Blair R. Szymczyna’s assistance with the expression and purification procedure is greatly appreciated. Ms Shaheen Gillani is acknowledged for her experimental efforts as a Burroughs-Wellcome summer student. We also thank the referees for their insightful comments during manuscript revision. This investigation was supported by a Natural Sciences and Engineering Research Council (NSERC) postgraduate scholarship B to G.M.K.P. and a grant from the Canadian Institutes of Health Research.

References

1. Szymczyna, B. R. & Arrowsmith, C. H. (2000). DNA binding specificity studies of four ETS proteins support an indirect read-out mechanism of protein–DNA recognition. J. Biol. Chem. 275, 28363–28370.
2. Wang, H., McIntosh, L. P. & Graves, B. J. (2002). Inhibitory module of Ets-1 allosterically regulates DNA binding through a dipole-facilitated phosphate contact. J. Biol. Chem. 277, 2225–2233.
3. Batchelor, A. H., Piper, D. E., de la Brousse, F. C., McKnight, S. L. & Wolberger, C. (1998). The structure of GABPα/ankyrin repeat heterodimer bound to DNA. Science, 279, 1037–1041.
4. Mo, Y., Ho, W., Johnston, K. & Marmorstein, R. (2001). Crystal structure of a ternary SAP-1/SRF/c-fos SRE DNA complex. J. Mol. Biol. 314, 495–506.
5. Gross, P., Yee, A. A., Arrowsmith, A. H. & Macgregor, R. B., Jr (1998). Quantitative hydroxyl radical footprinting reveals cooperative interactions between DNA-binding subdomains of PU.1 and IRF4. Biochemistry, 38, 9802–9811.
6. Li, S. L., Schlegel, W., Valente, A. J. & Clark, R. A. (1999). Critical flanking sequences of PU.1 binding sites in myeloid-specific promoters. J. Biol. Chem. 274, 32453–32460.
7. Lloberas, J., Soler, C. & Celada, A. (1999). The key role of PU.1/SPI-1 in B cells, myeloid cells and macrophages. Immunol. Today, 20, 184–189.
8. Li, S. L., Valente, A. J., Qiang, M., Schlegel, W., Gamez, M. & Clark, R. A. (2002). Multiple PU.1 sites cooperate in the regulation of p40(phox) transcription during granulocytic differentiation of myeloid cells. Blood, 99, 4578–4587.
9. Mao, X., Miesfeldt, S., Yang, H., Leiden, J. M. & Thompson, C. B. (1994). The FLI-1 and chimeric EWS-FLI-1 oncoproteins display similar DNA binding specificities. J. Biol. Chem. 269, 18216–18222.
10. Ray-Gallet, D., Mao, C., Tavitian, A. & Moreau-Gachelin, F. (1995). DNA binding specificities of Spi-1/PU.1 and Spi-B transcription factors and identification of a Spi-1/Spi-B binding site in the c-fes/c-fps promoter. Oncogene, 11, 303–313.
11. Shore, P. & Sharrocks, A. D. (1995). The ETS-domain transcription factors Elk-1 and SAP-1 exhibit differential DNA binding specificities. Nucl. Acids Res. 23, 4698–4706.
12. Horovitz, A. & Fersht, A. R. (1992). Co-operative interactions during protein folding. J. Mol. Biol. 224, 733–740.
13. Horovitz, A. & Fersht, A. R. (1990). Strategy for analysing the co-operativity of intramolecular interactions in peptides and proteins. J. Mol. Biol. 214, 613–617.
14. Chen, J. & Stites, W. E. (2001). Energetics of side chain packing in staphylococcal nuclease assessed by systematic double mutant cycles. Biochemistry, 40, 14004–14011.
15. Chen, J. & Stites, W. E. (2001). Higher-order packing interactions in triple and quadruple mutants of staphylococcal nuclease. Biochemistry, 40, 14012–14019.
16. Poon, G. M. K., Gross, P. & Macgregor, R. B., Jr (2002). The sequence-specific association of the ETS domain of murine PU.1 with DNA exhibits unusual energetics. Biochemistry, 41, 2361–2371.
17. Kodandapani, R., Pio, F., Ni, C.-Z., Piccialli, G., Klemsz, M., Mckercher, S. et al. (1996). A new pattern for helix-turn-helix recognition revealed by the PU.1 ETS-domain–DNA complex. Nature, 380, 456–460.
18. Pió, F., Assa-Munt, N., Yguerabide, J. & Maki, R. A. (1999). Mutants of ETS domain PU.1 and GGAA/T recognition: free energies and kinetics. Protein Sci. 8, 2098–2109.
19. Yee, A. A., Yin, P., Siderovski, D. P., Mak, T. W., Litchfield, D. W. & Arrowsmith, C. H. (1998). Cooperative interaction between the DNA-binding domains of PU.1 and IRF4. J. Mol. Biol. 279, 1075–1083.
20. Werner, M. H., Clore, G. M., Fisher, C. L., Fisher, R. J., Trinh, L., Shiloach, J. & Gronenborn, A. M. (1995). The solution structure of the human ETS1–DNA complex reveals a novel mode of binding and true side chain intercalation. Cell, 83, 761–771. Erratum (1996) Cell 87, 2.
21. Donaldson, L. W., Petersen, J. M., Graves, B. J. & McIntosh, L. P. (1996). Solution structure of the ETS domain from murine Ets-1: a winged helix-turn-helix DNA binding motif. EMBO J. 15, 125–134.
22. Liang, H., Mao, X., Olejniczak, E. T., Nettesheim, D. G., Yu, L., Meadows, R. P. et al. (1994). Solution structure of the ets domain of Fli-1 when bound to DNA. Nature Struct. Biol. 1, 871–876.
23. Mo, Y., Vaessen, B., Johnston, K. & Marmorstein, R. (1998). Structures of SAP-1 bound to DNA targets from the E74 and c-fos promoters: insights into DNA sequence discrimination by Ets proteins. Mol. Cell, 2, 201–212.
24. Mo, Y., Vaessen, B., Johnston, K. & Marmorstein, R. (2000). Structure of the elk-1–DNA complex reveals how DNA-distal residues affect ETS domain recognition of DNA. Nature Struct. Biol. 7, 292–297.
25. Dickerson, R. E. & Chiu, T. K. (1997). Helix bending as a factor in protein/DNA recognition. Biopolymers, 44, 361–403.
26. Dickerson, R. E. (1998). DNA bending: the prevalence of kinkiness and the virtues of normality. Nucl. Acids Res. 26, 1906–1926.
27. Pió, F., Kodandapani, R., Ni, C-Z., Shepard, W., Klemsz, M., McKercher, S. R. et al. (1996). New insights on DNA recognition by ets proteins from the crystal structure of the PU.1 ETS domain–DNA complex. J. Biol. Chem. 271, 23329–23337.
28. Gross, P., Arrowsmith, A. H. & Macgregor, R. B., Jr (1998). Hydroxyl radical footprinting of DNA complexes of the ets domain of PU.1 and its comparison to the crystal structure. Biochemistry, 37, 5129–5135.
29. Li, S. L., Valente, A. J., Zhao, S. J. & Clark, R. A. (1997). PU.1 is essential for p47(phox) promoter activity in myeloid cells. J. Biol. Chem. 272, 17802–17809.
30. Graves, B. J., Gillespie, M. E. & McIntosh, L. P. (1996). DNA binding by the ETS domain. Nature, 384, 322.
31. Nye, J. A., Petersen, J. M., Gunther, C. V., Jonsen, M. D. & Graves, B. J. (1992). Interaction of murine ets-1 with GGA-binding sites establishes the ETS domain as a new DNA-binding motif. Genes Dev. 6, 975–990.
32. Brukner, I., Sánchez, R., Suck, D. & Pongor, S. (1995). Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides. EMBO J. 14, 1812–1818.
33. Strauss-Soukup, J. K. & Maher, L. J., III (1997). Role of asymmetric phosphate neutralization in DNA bending by PU.1. J. Biol. Chem. 272, 31570–31575.
34. Mack, D. R., Chiu, T. K. & Dickerson, R. E. (2001). Intrinsic bending and deformability at the T-A step of CCTTTAAAGG: a comparative analysis of T-A and A-T steps within A-tracts. J. Mol. Biol. 312, 1037–1049.
35. Chen, S., Gunasekera, A., Zhang, X., Kunkel, T. A., Ebright, R. H. & Berman, H. M. (2001). Indirect readout of DNA sequence at the primary-kink site in the CAP-DNA complex: alteration of DNA binding specificity through alteration of DNA kinking. J. Mol. Biol. 314, 75–82.
36. Chen, S., Vojtechovsky, J., Parkinson, G. N., Ebright, R. H. & Berman, H. M. (2001). Indirect readout of DNA sequence at the primary-kink site in the CAP-DNA complex: DNA binding specificity based on energetics of DNA kinking. J. Mol. Biol. 314, 63–74.
37. Bareket-Samish, A., Cohen, I. & Haran, T. E. (1998). Direct versus indirect readout in the interaction of the trp repressor with non-canonical binding sites. J. Mol. Biol. 277, 1071–1080.
38. Wenz, C., Jeltsch, A. & Pingoud, A. (1996). Probing the indirect readout of the restriction enzyme EcoRV. Mutational analysis of contacts to the DNA backbone. J. Biol. Chem. 271, 5565–5573.
39. Garvie, C. W. & Phillips, S. E. (2000). Direct and indirect readout in mutant Met repressor-operator complexes. Struct. Fold. Des. 8, 905–914.
40. Eisenbeis, C. F., Singh, H. & Storb, U. (1995). PU.1 is a component of a multiprotein complex which binds an essential site in the murine Igλ2-4 enhancer. Mol. Cell. Biol. 13, 6452–6461.
41. El Hassan, M. A. & Calladine, C. R. (1998). Two distinct modes of protein-induced bending in DNA. J. Mol. Biol. 282, 331–343.

Edited by J. O. Thomas (Received 2 December 2002; received in revised form 4 March 2003; accepted 8 March 2003)