Quadruple intercalated G-6 stack: a possible motif in the fold-back structure of the Drosophila centromeric dodeca-satellite?


doi:10.1006/jmbi.2001.5131 available online at http://www.idealibrary.com on

J. Mol. Biol. (2001) 314, 139-152

Quadruple Intercalated G-6 Stack: A Possible Motif in the Fold-back Structure of the Drosophila Centromeric Dodeca-Satellite?

Shan-Ho Chou* and Ko-Hsin Chin

Institute of Biochemistry, National Chung-Hsing University, Taichung 40227, Taiwan

The purine-rich strand d(GTACGGGACCGA)n of the Drosophila centromeric dodeca-satellite sequence is highly conserved and was found to form stable fold-back structures in which the homopurine 5′-GGGA-3′ sequence was determined to play a crucial role. Here, we report the stable formation of the d(GGGA)2 motif in the stem of a DNA hairpin closed by a single-residue d(ACC) loop. Similar to the zipper-like d(GGA)2 motif observed in the human centromeric (TGGAA)n sequence, the central four guanosine bases in the d(GGGA)2 motif do not pair, but interdigitate to form an elongated zipper-like quadruple-intercalated G-6 stack bracketed by sheared G·A base-pairs. Comparison between the current d(GGGA)2 structure and the published crystal d(GAAA)2 structure implies that the alignment of the unpaired purine bases plays an important role in determining the minor groove width of the purine-rich d(GPuPuA)2 motif. Similarity between the zipper-like motifs possibly present in the Drosophila centromeric dodeca-satellite sequence and in the human centromeric (TGGAA)n sequence led us to propose that these special zipper-like motifs may constitute common cores in organizing eukaryotic centromeres.

© 2001 Academic Press

*Corresponding author

Keywords: Drosophila centromere dodeca-satellite; unusual DNA structure; sheared GA pairing; quadruple intercalation; NMR

Introduction

The centromere is a specialized region in the chromosome that is essential for the accurate segregation of chromosomes during mitosis and meiosis.1 Except for the budding yeast Saccharomyces cerevisiae, most centromeric DNAs contain highly repetitive satellite DNA sequences that are essential for their proper functioning. Intriguingly, most centromeric satellite sequences are asymmetric in the distribution of purine content, which usually results in one strand being purine-rich vis-à-vis the other strand. Examination of this extraordinary purine distribution has led to the finding that centromeric purine-rich sequences are capable of adopting stable fold-back structures.2-7 In this respect, it is important to note that a single-stranded DNA-binding protein from Drosophila nuclear extracts that preferentially binds to the individual dodeca-satellite pyrimidine-strand has been discovered, implying that its complementary purine-strand may be free to form fold-back structures in vivo to organize the centromere structure.8

Although centromeric satellite DNA sequences are poorly conserved for most species, it is expected that their structures may be conserved to some extent through evolution to serve their specialized function.9 In the search for possible common structural elements in centromeres, it is notable that two DNA satellite sequences, the human 5 bp satellite 5′-(TGGAA)n-3′ 2,10,11 and the Drosophila dodeca-satellite 5′-(GTACGGGACCGA)n-3′ 12,13 are abundantly present and highly conserved in the centromere. These may serve as model sequences for the structural characterization of the eukaryotic centromere. The fold-back structure of the human centromeric (TGGAA)n repeat2,3 was indeed found to adopt a stable zipper-like (GGA)2 motif in our earlier studies;4 the central guanosine bases are not paired, but interdigitated to form a double-intercalated G-4 stack with the bracketing G·A base-pairs. As a result, the unpaired guanine bases and the bracketing sheared G·A pairs are able to provide many hydrogen-bonding donors and acceptors in the major groove for possible interaction with other ligands.

On the other hand, the structure of the G-rich repeats (GTACGGGACCGA)n of the Drosophila centromeric dodeca-satellite sequence is less well identified,12,13 although it was also demonstrated to form stable fold-back structures, with the central 5′-GGGA-3′ tetranucleotide found to play a determinant role in stabilizing them.5 We were thus interested in studying this fold-back structure in order to compare it with the (GGA)2 motif in the human centromeric (TGGAA)n repeats and to search for possible common structural motifs in eukaryotic centromeres. Since homopurine DNA sequences have a tendency to be structurally polymorphic,14 we embedded the (GPuPuA)2 tracts (Pu = purine) in the stem of a DNA hairpin closed by a stable single-residue d(ACC) loop to facilitate their study by NMR spectroscopy. The excellent stability of the resulting structures indicates that the (GGGA)2 motif may play an important role in organizing the highly conserved Drosophila centromeric dodeca-satellite (GTACGGGACCGA)n repeats. The similar zipper-like motifs adopted by the (GPuA)2 tract and the (GPuPuA)2 tract, and their prevalence in the eukaryotic centromere, imply that the zipper-like interdigitated motifs may serve as common cores in organizing the eukaryotic centromere structure.

Abbreviations used: DQ-COSY, double quantum correlated spectroscopy; HSQC, heteronuclear single quantum coherence; NOE, nuclear Overhauser enhancement; NOESY, NOE spectroscopy.

E-mail address of the corresponding author: [email protected]

Centromeric Interdigitated G-6 Stack Structure

Results

Thermodynamic studies

The various homopurine (GPuPuA)2 motifs were embedded in the stem of a 19-mer hairpin, 5′-GCGPuPuAACACCGTGPuPuAGC-3′, for both UV-melting and NMR studies. The UV-melting curves for several such motifs are shown in Figure 1, with their thermodynamic parameters listed in Table 1. All hairpins containing the homopurine (GPuPuA)2 motifs exhibit well-behaved transition curves that indicate good structural formation for these sequences. Several points are noteworthy from the UV-melting studies. (1) The number of G-2NH2 groups in the (GPuPuA)2 motifs seems to affect the melting temperatures of the corresponding hairpins in a consistent manner, i.e. the presence of each G-2NH2 group accounts for an increase of approximately 3 to 4 °C in the melting temperature for this series of hairpin structures. Thus, when the 5′-GGGA/AGGG-5′ motif (row 2) is replaced with the 5′-GAGA/AGAG-5′ (row 6) or 5′-GAAA/AAAG-5′ motif (row 8), the melting temperatures of the corresponding hairpins are decreased by approximately 6 °C and 13 °C, respectively. Similarly, the replacement of the 5′-GGGA/AGGG-5′ motif with the 5′-GGIA/AGGG-5′ (row 3) or 5′-GIGA/AGIG-5′ motif (row 5) leads to a decrease of 4 °C and 7 °C, respectively, in the melting temperature of the corresponding hairpins. It is thus reasonable to assume that each G-2NH2 in the central four guanosine residues is involved in specific hydrogen bonding, and the absence of one such group would lead to the loss of one hydrogen bond, which would in turn reduce the melting

Figure 1. The UV-melting transition curves for the 5′-GCGPuPuAACACCGTGPuPuAGC-3′ nonadecamers. The tm values were determined from the maximum of the first differential curves.
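The caption states that tm was read from the maximum of the first-derivative melting curve. A minimal sketch of that procedure is given below. The melting curve is synthetic: it is generated from an assumed two-state model with hypothetical baseline absorbances, not from the measured data, and the function names are illustrative only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def two_state_absorbance(temp_c, tm_c, dh_unfold_kcal, a_folded=0.80, a_unfolded=1.00):
    """Synthetic absorbance for a two-state hairpin-to-coil transition.

    The baselines a_folded and a_unfolded are hypothetical values.
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    # van't Hoff equilibrium constant for unfolding; K = 1 at the midpoint tm
    k = math.exp(-(dh_unfold_kcal / R_KCAL) * (1.0 / t - 1.0 / tm))
    fraction_unfolded = k / (1.0 + k)
    return a_folded + (a_unfolded - a_folded) * fraction_unfolded


def tm_from_first_derivative(temps, absorbances):
    """Return the temperature at which the central-difference dA/dT is largest."""
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (absorbances[i + 1] - absorbances[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Synthetic curve sampled every 0.1 deg C between 20 and 80 deg C,
# using the tm and enthalpy magnitude reported for the GGGA hairpin as inputs.
temps = [20.0 + 0.1 * i for i in range(601)]
curve = [two_state_absorbance(t, 52.7, 57.0) for t in temps]
tm_estimate = tm_from_first_derivative(temps, curve)
print(f"estimated tm = {tm_estimate:.1f} deg C")
```

For a van't Hoff transition the derivative maximum lies slightly below the true midpoint, so the estimate is expected to agree with the input value only to within a few tenths of a degree. Real curves would also need smoothing before differentiation.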


Table 1. Thermodynamic parameters for the formation of DNA hairpins 5′-GCGPuPuAACACCGTGPuPuAGC-3′ containing various zipper-like quadruple-intercalation motifs in the stem region

Row   Motif (5′-3′/3′-5′)   Tm (°C)   ΔG37 (kcal/mol)   −ΔH (kcal/mol)   −ΔS (cal/mol K)
1     TTAA/AATT             52.5      −1.96             37.0             113.0
2     GGGA/AGGG             52.7      −2.75             57.0             174.8
3     GGIA/AGGG             48.6      −1.94             54.3             168.8
4     GIGA/AGGG             47.2      −1.58             49.0             153.0
5     GIGA/AGIG             45.5      −1.20             50.0             157.5
6     GAGA/AGAG             46.5      −1.46             50.0             156.6
7     GGAA/AAGG             43.3      −0.96             48.0             151.7
8     GAAA/AAAG             40.0      −0.29             40.0             128.0

The data in the first row are from the control sequence containing all canonical A·T base-pairs in the corresponding region. The UV-melting experiments were performed with a Cary 100 spectrophotometer equipped with a temperature controller under a low-salt, neutral buffer condition (20 mM NaCl, 3 mM sodium phosphate (pH 6.8)), the same as that used in the NMR studies. A temperature probe was used to monitor the actual temperature inside the cell. Thermodynamic data were calculated from the van't Hoff plots obtained with the Thermal application software supplied by the vendor. The rows are arranged in order of decreasing number of potential hydrogen bonds.
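The columns of Table 1 are linked by the standard relation ΔG = ΔH − TΔS evaluated at 37 °C (310.15 K). The sketch below recomputes ΔG37 from the tabulated ΔH and ΔS as a consistency check. It is an illustration, not the vendor's van't Hoff software. Row 7 is entered with a negative ΔG37: this is an assumption, on the grounds that a hairpin melting above 37 °C must have ΔG37 < 0.

```python
T37_K = 310.15  # 37 deg C in kelvin

# (row, motif, dG37 in kcal/mol, -dH in kcal/mol, -dS in cal/(mol K))
TABLE_1 = [
    (1, "TTAA/AATT", -1.96, 37.0, 113.0),
    (2, "GGGA/AGGG", -2.75, 57.0, 174.8),
    (3, "GGIA/AGGG", -1.94, 54.3, 168.8),
    (4, "GIGA/AGGG", -1.58, 49.0, 153.0),
    (5, "GIGA/AGIG", -1.20, 50.0, 157.5),
    (6, "GAGA/AGAG", -1.46, 50.0, 156.6),
    (7, "GGAA/AAGG", -0.96, 48.0, 151.7),  # sign assumed negative (see lead-in)
    (8, "GAAA/AAAG", -0.29, 40.0, 128.0),
]


def delta_g(minus_dh_kcal, minus_ds_cal, temp_k=T37_K):
    """Gibbs free energy (kcal/mol) from -dH (kcal/mol) and -dS (cal/(mol K))."""
    dh = -minus_dh_kcal
    ds = -minus_ds_cal / 1000.0  # convert cal to kcal
    return dh - temp_k * ds


for row, motif, dg_tab, minus_dh, minus_ds in TABLE_1:
    dg_calc = delta_g(minus_dh, minus_ds)
    print(f"row {row} {motif}: tabulated {dg_tab:+.2f}, recomputed {dg_calc:+.2f} kcal/mol")
```

The recomputed values should agree with the tabulated ones to within rounding of the published ΔH and ΔS.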

temperature by about 3 to 4 °C. (2) Although the hairpin containing the 5′-GGGA/AGGG-5′ motif (row 2) exhibits approximately the same UV-melting temperature as the hairpin containing the canonically paired 5′-TTAA/AATT-5′ motif (row 1), the two differ considerably in their thermodynamic ΔH and ΔS values. This result indicates that they adopt different structures under identical conditions. Since approximately equal numbers of hydrogen bonds are present in these two motifs (assuming that each central guanosine residue in the (GGGA)2 motif contributes one hydrogen bond), the large difference in the ΔH values must originate from different base stacking. The NMR structural studies described below confirm that there is excellent cross-strand G/G/G/G stacking in the 5′-GGGA/AGGG-5′ motif, in contrast to the much weaker partial intra-strand stacking that usually takes place in a canonically paired B-DNA duplex. This can explain why the 5′-GGGA/AGGG-5′ motif has a larger ΔH value (−57 kcal/mol; 1 cal = 4.184 J) than the 5′-TTAA/AATT-5′ motif (−37 kcal/mol). However, this unusual interdigitated motif contains four unpaired guanine bases (see Figure 5) that expose a large number of functional groups for interacting with the surrounding water molecules. A larger entropy loss would be expected for this motif, due to the ordering of water molecules surrounding the interdigitated region.

(3) The position of the guanosine residues in the interdigitated motif also markedly affects the UV-melting temperatures of the corresponding hairpins. Comparing row 6 with row 7 in Table 1, one can see that when the 5′-GAGA/AGAG-5′ motif is replaced with the 5′-GGAA/AAGG-5′ motif, the UV-melting temperature drops by about 3 °C. This can be explained by the special stacking present in the (GPuPuA)2 motif. As shown in Figure 4, two interdigitated modes are possible for the (GGGA)2 motif: either type I, with residue G4 stacking upon residue G3 and residue G15 upon residue G14, or type II, with residue G5 stacking upon residue A6 and residue G16 upon residue A17. Experimental data described below clearly indicate that the (GGGA)2 motif adopts a type I interdigitated mode, i.e. the unpaired residue G4 stacks upon the sheared G3·A17 pair and the unpaired residue G15 stacks upon the sheared G14·A6 pair. This special intercalation mode can be used to rationalize why the hairpin containing the 5′-GAGA/AGAG-5′ motif is more stable than the hairpin containing the 5′-GGAA/AAGG-5′ motif, owing to the distinctive G/G stacking. Theoretical calculation has indicated that the guanine base is much more polar than the adenine base.15,16 The stability of G/G stacking would therefore depend considerably upon whether the bases are engaged in anti-parallel or parallel stacking, with anti-parallel stacking stabilizing the G/G stack and parallel stacking destabilizing it. In contrast, base stacking with no polar base, like A/A, or with only one polar base, like G/A, would experience no such large orientation effect. This reasoning has in fact been used to account for the sequence-dependent stability of different DNA sequences.15,16 In the current 5′-GGAA/AAGG-5′ motif, the first unpaired guanine base stacks in parallel with the 5′-end guanine base (see Figure 4), which would cause destabilization. In the 5′-GAGA/AGAG-5′ motif, by contrast, it is the adenine that stacks in parallel with the 5′-end guanine; no destabilization would thus be expected. Furthermore, the two inner zipper guanine bases in the 5′-GAGA/AGAG-5′ motif are stacked in an anti-parallel way (see Figure 6), further stabilizing it relative to the 5′-GGAA/AAGG-5′ motif.

(4) The 5′-GAAA/AAAG-5′ motif (row 8) is the least stable in this series of zipper-like motifs, possibly due to the lack of hydrogen bonding contribution from the inner zipper adenine bases. The broader linewidths (Figure 2) and weaker NOE cross-peaks for this motif degrade the NMR spectra, which are not of sufficient quality for structural studies. However, it is interesting to note that the crystal structure of a duplex containing the zipper-like 5′-GAAA/AAAG-5′ motif has been solved successfully in the presence of metal hexamine salts.17
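Point (1) above attributes roughly 3 to 4 °C of melting temperature to each G-2NH2 group. As an illustrative check, the sketch below counts the guanines among the four central positions of each motif in rows 2 to 8 of Table 1 and fits a least-squares line of Tm against that count. Inosine and adenine both lack the 2-amino group. The fit is not part of the original analysis, and the helper names are hypothetical.

```python
# (motif written 5'-3' / 3'-5', Tm in deg C), rows 2 to 8 of Table 1
MOTIF_TM = [
    ("GGGA/AGGG", 52.7),
    ("GGIA/AGGG", 48.6),
    ("GIGA/AGGG", 47.2),
    ("GIGA/AGIG", 45.5),
    ("GAGA/AGAG", 46.5),
    ("GGAA/AAGG", 43.3),
    ("GAAA/AAAG", 40.0),
]


def count_central_amino_groups(motif):
    """Count guanines (each carrying one 2-NH2) in the four central positions."""
    top, bottom = motif.split("/")
    central = top[1:3] + bottom[1:3]  # the two unpaired residues on each strand
    return central.count("G")


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


counts = [count_central_amino_groups(motif) for motif, _ in MOTIF_TM]
tms = [tm for _, tm in MOTIF_TM]
slope = least_squares_slope(counts, tms)
print(f"Tm change per central G-2NH2 group: {slope:.1f} deg C")
```

The scatter among the three motifs with two central guanines reflects the positional effect discussed in point (3), which a simple count cannot capture.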

NMR studies

The one-dimensional imino and aromatic proton spectra, recorded in a neutral (pH 6.8) low-salt buffer at 0 °C, for the hairpins containing the 5′-GGGA/AGGG-5′ motif (designated as GGGA), the 5′-GIGA/AGGG-5′ motif (GIGA), the 5′-GAGA/AGAG-5′ motif (GAGA), and the 5′-GAAA/AAAG-5′ motif (GAAA) are shown in Figure 2. The imino proton signals were assigned by 2D-NOESY in 90 % H2O/10 % 2H2O as previously described.18 All four oligomers reveal the characteristic imino proton signals expected for the four canonical G·C and A·T pairs in the 12.5-13.2 ppm region. However, extra signals from the (GPuPuA)2 motifs were clearly detected. In the GGGA spectrum, two sharp imino proton signals at approximately 10.2 ppm, as well as two sharp signals accounting for four imino protons at 9.6 ppm, were observed. The signals at 9.6 ppm were further separated into four peaks when one of the central guanosine residues was changed to inosine, as shown in the GIGA spectrum. The two imino proton signals at 10.2 ppm are characteristic of the unpaired G-imino proton in a sheared G·A base-pair, indicating the formation of bracketing sheared G3·A17 and G14·A6 base-pairs.19,20 The four imino protons at 9.6 ppm, on the other hand, imply that the central two G-G "steps" of the G4, G5, G15, and G16 residues are not involved in hydrogen bonding, yet are well protected from solvent exchange, as judged by their narrow linewidths. The chemical shifts of these G-imino protons are similar to those of the unpaired G-imino protons in the zipper-like (GGA)2 motif,4 further implying that the central two G-G steps adopt a similar interdigitated motif. In the GAGA spectrum, only two such unpaired G-imino protons (shifted to 9.9 ppm) were observed, while in the GAAA spectrum no such proton was observed at all, although the two G-imino protons belonging to the bracketing sheared G·A base-pairs were observable in all cases. The imino proton spectra of these four oligomers are thus consistent with a picture in which the central four purine residues in the (GPuPuA)2 motif are not paired but interdigitated and bracketed by a pair of sheared G·A base-pairs. However, some minor forms of unknown nature are present in the GAGA and GAAA spectra, as revealed by the presence of minor peaks or the broader linewidth of several peaks. In fact, the two unpaired G-imino protons of the bracketing sheared G·A base-pairs in the GAAA spectrum have become broadened to such an extent that they collapse into one broad peak. This phenomenon indicates that the amino protons in the unpaired guanosine bases are indispensable for stabilizing the interdigitated motif. They are possibly involved in hydrogen bonding that stabilizes the nearby imino protons and prevents them from exchanging with solvent.

Unusual cross-strand NOEs between the interdigitated guanosine residues

Due to the unusual structure formed by this quadruple intercalation motif, many extraordinary NOEs were observed, some of which are shown in the NOESY spectrum in Figure 3. Figure 3(a) illustrates the base-H1′ connectivity (in blue) and the base-H3′ connectivity (in gray), which could be followed successfully through the previously described sequential assignment procedure.21 The H1′ chemical shifts of residues G3/G4 and G14/G15 overlap but could be resolved by replacing either residue G4 or G5 with inosine (data not shown). Besides the regular NOEs, many informative unusual NOEs were observed in Figure 3(a), and these are marked with lower-case letters. We observed systematic cross-strand NOEs between the G4H8-G16H1′ protons (cross-peak f, see also Figure 5(a)), the G16H8-G5H1′ protons (g), and the G5H8-G15H1′ protons (h), and their reciprocal NOEs between the G15H8-G5H1′ protons (i), the G5H8-G16H1′ protons (j), and the G16H8-G4H1′ protons (k), which are extremely useful in establishing the unusual quadruple-intercalation feature of the central two G-G steps. In other words, the observation of the G16H8-G4H1′/G5H1′ and G5H8-G16H1′/G15H1′ NOEs indicates that residue G16 is intercalated between residues G4 and G5, while residue G5 is intercalated between residues G16 and G15. Such non-sequential, systematic NOEs are rarely detected in any regular B or A-form double helix, except in the i-motif, which consists of two C·C+ paired duplexes interdigitated with each other to form a tetrameric structure.22


Figure 2. The one-dimensional 600 MHz imino, amino and aromatic proton NMR spectra of the GGGA, GIGA, GAGA, and GAAA hairpins. The imino protons were assigned from the NOESY experiments in 10 % 2H2O/90 % H2O solution as described.18

Figure 3(b) further shows the inter-residue H1′-H1′ NOEs of the four unpaired guanosine residues. The G4H1′-G16H1′-G5H1′-G15H1′ NOEs again demonstrate that these four unpaired guanosine residues are interdigitated in the G4/G16/G5/G15 order, and not in the alternative G16/G4/G15/G5 order (see Figure 4). Although these cross-peaks are shown in the spectrum recorded at 600 ms mixing time, they were clearly detectable in the spectrum recorded at 100 ms mixing time (data not shown). Again, such NOEs are hardly ever detected in a regular B or A-form duplex, except in the i-motif.22
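The argument above treats each cross-strand NOE as evidence that two guanosine residues are adjacent in the stack. A minimal sketch of that reasoning follows: the six H8-H1′ contacts labeled f to k are treated as edges of a graph, and the unique linear path through them is read off as the stacking order. The function name is illustrative, and the approach assumes every observed NOE reflects direct stacking contact.

```python
# Cross-strand H8-H1' NOEs f to k, written as (H8 residue, H1' residue)
H8_H1P_NOES = [
    ("G4", "G16"),   # f
    ("G16", "G5"),   # g
    ("G5", "G15"),   # h
    ("G15", "G5"),   # i
    ("G5", "G16"),   # j
    ("G16", "G4"),   # k
]


def stacking_order(noe_pairs):
    """Return residues ordered along the linear stack implied by pairwise contacts."""
    neighbours = {}
    for a, b in noe_pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # The two ends of a linear stack each have exactly one neighbour.
    ends = sorted(
        (r for r, n in neighbours.items() if len(n) == 1),
        key=lambda r: int(r[1:]),
    )
    if len(ends) != 2:
        raise ValueError("contacts do not describe a single linear stack")
    order = [ends[0]]
    previous = None
    while True:
        candidates = neighbours[order[-1]] - {previous}
        if not candidates:
            break
        if len(candidates) > 1:
            raise ValueError("contacts do not describe a single linear stack")
        previous = order[-1]
        order.append(candidates.pop())
    return order


print("/".join(stacking_order(H8_H1P_NOES)))
```

Starting from the lower-numbered end residue, the path recovered from these contacts matches the G4/G16/G5/G15 order stated in the text.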


Figure 3. (a) The base-H1′/H5/H3′ fingerprint region of the NOESY spectrum of the GGGA nonadecamer. The H8/H6-H1′ connectivity is traced by blue dotted lines in the lower part of the Figure, and that of the H8/H6-H3′ by gray dotted lines in the upper part of the Figure. The C(n)H5-C(n)H6 cross-peaks are connected to the respective C(n)H5-Pu(n-1)H8 cross-peaks by red horizontal lines. Some crucial NOEs for this unusual motif are indicated by lower-case letters, namely: a, A9H2-C11H5; b, A9H2-C10H1′; c, A17H2-G4H1′; d, A17H8-G4H1′; e, A6H8-G15H1′; f, G4H8-G16H1′; g, G16H8-G5H1′; h, G5H8-G15H1′; i, G15H8-G5H1′; j, G5H8-G16H1′; k, G16H8-G4H1′. (b) The cross-strand G4H1′-G16H1′-G5H1′-G15H1′ NOE connectivity. The cross-peak indicated by the arrow is the C19H5-G18H1′ NOE that has a distance of approximately 4.5 Å. (c) The stacked plot of the boxed region in (a). The four CH5-CH6 cross-peaks are labeled by their residue numbers. The very strong intra-residue H8-H3′ NOE cross-peaks of the G4 and G15 residues (corresponding to a distance of approximately 2.3 Å) and the medium-strength H8-H3′ NOE cross-peaks of the G5 and G16 residues are labeled. Cross-peak a is the inter-residue G15H8-G14H1′ NOE and b is the inter-residue G4H8-G3H1′ NOE. These strong NOE cross-peaks at the junction between the first unpaired residues and the sheared G·A pair were observed also in the previously studied (GGA)2 motif.4 The weak G5H8-H1′ and G16H1′ NOE cross-peaks in the lower part of the Figure are labeled, and are suggestive of anti-glycosidic angles of the central guanine residues in the (GGGA)2 motif.
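Distance estimates such as the 2.3 Å and 4.5 Å values quoted in the caption are commonly obtained by scaling cross-peak intensities against a fixed reference distance, using the r⁻⁶ dependence of the NOE. The sketch below illustrates that isolated spin-pair approximation. The cytosine H5-H6 reference distance of about 2.45 Å is an assumed, commonly used calibration value, and the intensities are hypothetical. The source does not state how its distance bounds were calibrated.

```python
CYT_H5_H6_DISTANCE = 2.45  # angstrom; assumed reference distance for calibration


def noe_distance(intensity, reference_intensity, reference_distance=CYT_H5_H6_DISTANCE):
    """Estimate an inter-proton distance from NOE intensities.

    Uses the isolated spin-pair approximation, in which the cross-peak
    intensity is proportional to r**-6 at short mixing times.
    """
    if intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return reference_distance * (reference_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical intensities relative to a cytosine H5-H6 cross-peak set to 1.0
strong_peak = noe_distance(1.5, 1.0)    # a peak stronger than the reference
weak_peak = noe_distance(0.026, 1.0)    # a peak much weaker than the reference
print(f"strong peak: {strong_peak:.2f} A, weak peak: {weak_peak:.2f} A")
```

Because of the sixth-root dependence, a cross-peak only 50 % stronger than the reference already corresponds to a distance near 2.3 Å. The approximation breaks down at long mixing times, where spin diffusion contributes.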

Figure 3(c) shows the expanded stacked plot of the boxed region shown in Figure 3(a) at a mixing time of 100 ms. The intra-residue G15H8-G15H3′ and G4H8-G4H3′ cross-peaks (marked by larger red capital letters G4 and G15) are very strong (even stronger than the CH5-CH6 cross-peaks of the C8, C2, C10, and C19 residues), indicating that the outer zipper guanosine residues (G4 and G15) in the interdigitated motif are located in the unusual C3′-endo domain, with short intra-residue GH8-GH3′ distances (approximately 2.3 Å). On the other hand, the corresponding cross-peaks for the inner zipper guanosine residues (G5 and G16) have weaker intensity but are still stronger than those of other residues in this oligomer, indicating that residues G5 and G16 adopt a sugar conformation that is intermediate between the C2′-endo and C3′-endo domains. The unusual sugar puckers for the outer zipper G4 and G15 residues and the inner zipper G5 and G16 residues are also confirmed by the DQ-COSY23,24 and 1H-13C HSQC experiments.25 The C3′-endo sugar conformation of the G4 and G15 residues is clearly demonstrated by the observation of strong H3′-H4′ cross-peaks and equally strong H3′-H2′ and H3′-H2″ cross-peaks in the DQ-COSY spectrum (data not shown). This is not the case for the G5 and G16 residues, which instead have equally strong H1′-H2′ and H1′-H2″ cross-peaks, strong H3′-H2″ cross-peaks but very weak H3′-H2′ cross-peaks, more indicative of a sugar pucker between the C2′-endo and C3′-endo conformations. The C3′-endo sugar conformation of residues G4 and G15 is further confirmed by a 1H-13C HSQC experiment; the chemical shifts of G4C3′, G15C3′ and G4C4′, G15C4′ are found shifted upfield by approximately 5 and 3 ppm, respectively, a feature that is characteristic of the switching of sugar pucker from the C2′-endo to the C3′-endo conformation.25

Figure 4. The 31P-1H heteronuclear correlation spectrum of the DNA hairpin containing the purine-rich d(GGGA)2 motif. The two intercalation modes of the central unpaired guanine residues could be distinguished from this spectrum as described in the text. The cross-peaks that are too weak or undetectable are marked by x.

The 31P-1H heteronuclear correlation spectrum of the GGGA sequence is shown in Figure 4, and is also used to distinguish between the two possible interdigitated modes (type I or II) for the unpaired guanine residues in the (GGGA)2 motif. From the Figure, it is clear that four phosphorus atoms (P5, P6, P16, and P17) out of the six phosphorus atoms in the backbone of the (GGGA)2 motif exhibit resonance at a higher field position of −2.60 to −3.0 ppm, shifted approximately 1.7 ppm from the cluster resonance at around −4.5 ppm. On the other hand, the signals of the two remaining phosphorus atoms P4 and P15 are located in the regular region at approximately −4.0 ppm. These results are more compatible with interdigitated mode I than with mode II. The 31P-1H heteronuclear correlation spectrum further reveals two H4′ protons (G4H4′ and G15H4′) that resonate at the very high field positions of 1.65 and 1.9 ppm, respectively. These large upfield shifts (approximately 2 ppm) from the regular H4′ chemical shifts are due to the stacking of the deoxyribose of the unpaired G15 and G4 upon the A6 and A17 bases of the flanking sheared A6·G14 and A17·G3 pairs, respectively (see Figure 6). Such sugar-base stacking, with the sugar O4′ and H4′ pointing directly toward the center of the flanking A6 and A17 bases, can help stabilize this unusual interdigitated motif, and has been observed several times in other cases.26-32 Theoretical calculation also shows that the interaction energy of sugar-base contacts can add up to 4 kcal/mol, comparable with that afforded by normal base-base interaction.15 Therefore, all the NMR data, whether from the cross-strand G4H1′-G16H1′-G5H1′-G15H1′ NOEs (Figure 3), the upfield-shifted signals of the G4-G5, G5-A6, G15-G16, and G16-A17 phosphodiesters (Figure 4), or the upfield-shifted signals of the G4H4′ and G15H4′ protons (Figure 4), indicate that the unpaired guanine residues in the (GGGA)2 motif adopt the type I interdigitated mode. In contrast, the deoxyriboses of the central unpaired residues G5 and G16 experience no stacking at all, even though their bases experience excellent stacking (Figure 5(a)). This accounts for the fact that the chemical shifts of their sugar protons resume normal values. Since only weak or undetectable (n)H4′-(n)P cross-peaks were observed for all the


Figure 5. (a) Some idiosyncratic NOEs of the zipper-like quadruple-intercalation (GGGA)2 motif. The protons that exhibit inter-strand H1′-H1′ NOEs in the G-rich zipper region are connected by red dotted arrows, while those exhibiting mutual H8-H1′ NOEs are connected by blue dotted arrows. These NOEs are hardly ever observed in any DNA structure. (b) Stereo view of the superposition of the 15 final structures produced by embedding from and refining against the distance bounds. (c) The view perpendicular to the helical axis and (d) the view into the minor groove of the DNA oligomer containing the zipper-like quadruple intercalation (GGGA)2 motif (the top ACC loop region is excluded for comparison). Guanine residues are shown in red, adenine in blue, cytosine in green, and thymine in yellow. The consecutive G/G/G/G/G/G stacking and the characteristic X-shaped, zigzag phosphodiester backbone are obvious from this Figure. (e) Stereo view in space-filling mode into the major groove of the quadruple intercalation (GGGA)2 motif. The oxygen atoms are colored in red, nitrogen atoms in blue, carbon atoms in green, phosphorus atoms in orange, and hydrogen atoms in white. The abundant hydrogen-bonding donors and acceptors are obvious in the major groove and may play important roles in interacting with other unidentified proteins for organizing the Drosophila centromere.

phosphodiesters in the (GGGA)2 motif (G4, G5, A6, G15, G16, and A17), their ζ and α torsional angles were all left unconstrained during the structural calculations. Using the information obtained from through-space NOE connectivity, through-bond J-coupling connectivity, and 31P-1H correlations, all exchangeable protons, non-exchangeable protons (except for some H5′/H5″ protons), and phosphorus atoms of the 5′-d(GCGGGAACACCGTGGGAGC)-3′ hairpin were assigned unambiguously and are listed in the Supplementary Material (Table S1). A representation of the abundant inter-strand NOEs for this quadruple-intercalation G-6 motif is shown in Figure 5(a), with the constraint statistics used to determine its solution structure listed in Table 2.

Table 2. Structural statistics for the 5′-d(GCGGGAACACCGTGGGAGC)-3′ hairpin

Restraints                                              Numbers
Exchangeable NOEs
  H-bonds (1.8 Å - 2.1 Å)                               20
  2.0 Å - 5.0 Å                                         4
  3.0 Å - 6.0 Å                                         2
Non-exchangeable NOEs
  2.0 Å - 4.0 Å                                         57
  3.0 Å - 5.0 Å                                         95
  4.0 Å - 6.0 Å                                         55
  > 5 Å                                                 39
Total NOEs                                              272
Torsional angles                                        104
  Backbone (β, γ, ε)                                    85
  Glycosidic                                            19
NOEs per residue                                        14.3
NOEs and torsion angles per residue                     19.7
Violations of experimental restraints
  Distance restraints (> 0.15 Å)                        0
  Torsional angle restraints (> 3°)                     0
r.m.s.d. (Å)                                            0.98 ± 0.25

Structural feature

Due to the abundant experimental distance and torsional angle constraints, the current unusual d(GGGA)2 motif was well determined, as judged from the superposition of the 15 final structures viewed perpendicular to the helical axis in Figure 5(b). Well-converged structures with r.m.s.d. values of 0.98 (±0.25) Å (the top ACC loop was excluded for comparison) were obtained after distance geometry/molecular dynamics calculation. The unusual X-shaped backbone due to the four interdigitated unpaired guanine bases is clearly revealed by the tracing ribbon shown in Figure 5(c). The four unpaired guanine bases are bracketed by two highly buckled sheared G·A pairs, with the G-6 stack clearly revealed on the right side (guanine residues are colored in red and adenine residues in blue). Figure 5(d) shows the view into the minor groove to reveal the zigzag backbone of the zipper-like quadruple intercalation motif. This Figure also reveals the rather narrow minor groove (the shortest inter-strand P-P distance is only 8.8 Å) and the elongated nature of this duplex resulting from the interdigitation of the central guanosine residues. Figure 5(e) further shows the wide-eye stereo picture of the quadruple-intercalation motif in space-filling mode. The abundant hydrogen bonding donors and acceptors of this unusual motif are obvious from this view into the major groove; along with other unidentified proteins, they may work together to organize eukaryotic centromeres.

Figure 6 shows the parallel stacking between the first unpaired guanine base (G15 in blue) and the bracketing guanine base (G14 in green) of the sheared G·A pair (top panel), and a typical anti-parallel stacking between the two unpaired guanine bases in the zipper (G5 and G15 in the middle panel) of the (GGGA)2 motif. As described above, the guanine base is significantly more polar than the adenine base, and thus the stability of guanine stacks would depend very much upon the guanine base alignment,15 with the anti-parallel alignment stabilizing and the parallel alignment destabilizing the guanine stacks. This idea is indeed found to be decisive in determining the conformation of the G-6 stack in this unusual zipper motif. Thus, the very strong inter-strand interaction between the inner zipper G5 and G15 bases (middle panel) is demonstrated by the excellent G5/G15 stacking (the G5 base stacks almost entirely upon the G16 base) and the nearly anti-parallel alignment (approximately 150°) between the G5/G15 bases (polarity is shown by orange arrows), while a twist of approximately 60° is employed to prevent the unfavorable parallel intra-strand G14/G15 stacking (top panel). A similar situation occurs in the symmetrical half between the G4/G16 and G3/G4 stacks (not shown). This Figure also accounts for the almost 2 ppm difference in the H4′ proton chemical shifts between the outer zipper G15 and the inner zipper G5 residues (Table S1); the H4′ proton of the outer zipper G15 residue (pink dots in the top panel of Figure 6) is situated right below the center of the A6 six-membered ring of the neighboring sheared G·A pair and so exhibits a dramatic upfield chemical shift of 1.63 ppm, while that of the inner zipper G5 residue has no neighboring purine base to shift its resonance, and hence exhibits only a slightly upfield value of 3.82 ppm compared with other H4′ signals (middle panel of Figure 6).

Another point worth mentioning about the G-6 stack is the dramatically different twist angles employed to accommodate such a special multiple purine stacking arrangement. While a twist of more than 60° is employed to prevent the parallel alignment of the first unpaired G15 base with the G14 base of the bracketing G14·A6 pair (indicated by purple arrows in the top panel of Figure 6), a twist of less than 10° is used between the unpaired guanine bases to maintain the excellent intra- and inter-strand stacking, as clearly revealed in the middle and bottom panels of Figure 6, in which the inter-strand G5/G15 stacking and the intra-strand G15/G16 overlapping are obvious, while both the inter-strand and intra-strand twist angles are close to zero (indicated by

purple arrows). It is important to note also that the intra-strand G15/G16 bases are not neighboring bases, but are intercalated by a cross-strand G5 base (Figure 5(a)). Such a particular arrangement of twist angles has therefore resulted in excellent overall stacking between the G14/G15/G5/G16/G4/G3 residues in the d(GGGA)2 motif (Figure 5(a)), which, along with the H-bonding between the amino protons of these unpaired guanine bases and the cross-strand phosphodiesters, can account for the comparable stability of the (GGGA)2 motif with the canonically paired 5′-(TTAA)/(AATT)-5′ segment (Table 1).

Figure 6. Several characteristic base stackings in the d(GGGA)2 motif. Top: the intra-strand stacking at the junction between the first unpaired guanine base and the paired guanine base in the bracketing sheared G·A pair. Middle: the inter-strand stacking. Bottom: the intra-strand stacking between the outer zipper and inner zipper guanine bases. The polarity of the guanine bases is indicated by orange arrows (top and middle), while the glycosidic bonds are marked by purple arrows (top, middle, and bottom). Significantly different twist angles were adopted to accommodate this special G-6 stack; while a twist angle greater than 60° was employed at the junction between the outer zipper G15 residue and the G14·A6 pair, twist angles of less than 10° were adopted for both the inter-strand (middle) and the intra-strand G/G (bottom) stacking. Such an uneven distribution of twist angles results in an overall excellent G-6 stack. The outer zipper G15H4′ proton (shown as a pink dot in the top panel) is situated directly under the A6 base participating in the sheared pairing with G14 and experiences a large ring-current shielding effect, exhibiting a chemical shift of 1.63 ppm, while no such effect was observed for the inner zipper G5H4′, which has no neighboring purine base and therefore exhibits a chemical shift at the somewhat regular value of 3.82 ppm (Table S1).

Discussion

The dodeca-satellite sequence of the Drosophila centromere is highly conserved and has been detected in widely different species separated for more than 60 million years, such as plant, fly, and human.12 Previous studies by Azorin's group have shown that the purine-rich dodeca-satellite strand alone can form stable intramolecular fold-back structures in a B-DNA environment (as judged by electron microscopy).5 No such unusual structure was detected for either the pyrimidine-rich strand alone or the double-stranded dodeca-satellite DNA under similar conditions. From chemical mapping studies, it was suggested that the central guanine residues in the GGGA-tract adopt special stacking interactions with the adjacent GA mismatches and contribute significantly to the stability of the fold-back structures.5 These data are consistent with the novel quadruple-intercalated G-6 stack structure in the (GGGA)2 motif presented here, in which the central guanine residues are unpaired, but interdigitated to exhibit excellent cross-strand stacking with each other and form cross-strand H-bonds with the opposite strand backbone phosphodiesters. However, it is still unclear which register of the dodeca-satellite is responsible for the extraordinary stability of the fold-back structures, as three different registers were proposed for the GGGA-tract (hairpins I, II, and III 5) from the chemical mapping studies, yielding three different purine-rich motifs of the (GA)2, (GGA)2, and (GGGA)2 sequences, respectively.
Although the (GA)2 motif containing tandem sheared G·A pairs was proposed to be the major cause for the high stability of the fold-back structures,7 it is located in an unfavorable 5′-G-(G-A)-C-3′ context that is not consistent with our previous NMR studies.20,33 Our previous studies indicated that only when situated in either a 5′-Py(GA)Pu/Pu(AG)Py-5′ 20 or a 5′-Py(GA)Py/Pu(AG)Pu-5′ context 33 will the (GA)2 motif be stable enough to compare with the canonically paired duplex motif. Even when inserted into a longer stem sequence, the 5′-(GGAC)2-3′ motif still does not adopt the sheared GA pairing configuration, possibly to prevent the unfavorable parallel G/G stacking in this context. There is thus some discrepancy between the chemical mapping and the NMR data. However, as described by these authors, it is difficult to use either diethylpyrocarbonate (DEPC) or dimethyl sulphate (DMS) to determine the base-pair configuration unambiguously in the homopurine (GGA)n and (GGGA)n sequences.7,14 For example, even though the N-7 atoms of guanine bases in the tandem sheared GA base-pairs are not involved in hydrogen bonding, they are nevertheless unreactive toward such chemicals, possibly due to the excellent cross-strand purine-purine stacking, as suggested by the authors. More studies are therefore necessary to clarify this situation. Sequence-dependent studies of the (GGA)2 and (GGGA)2 motifs are in progress in our laboratory (unpublished results) to determine which register is more responsible for the high stability of the fold-back structure. Judged by the great stability of these three distinct motifs, it is possible that no single register dominates but that all three registers are populated uniformly. In either case, the abundant hydrogen-bonding donors and acceptors from the interdigitated guanine bases and the bracketed sheared G·A pairs would very likely be involved in interacting with other proteins to organize the eukaryotic kinetochore and centromere structure.

Recently, the crystal structure of a nonamer containing a zipper-like d(GAAA)2 motif has been solved at 2.1 Å 17 with the help of cobalt hexammine cation. The cobalt ion basically serves to form strong H-bonds with N-7 and O-6 atoms of the G residue in the bracketed sheared G·A pair to bring together adjacent duplexes, which does not disturb the zipper-like structure. However, the d(GAAA)2 motif is the least stable among the d(GPuPuA)2 motifs in solution as studied here, possibly due to the lack of cross-strand hydrogen-bonding of the zipper adenines (Table 1). Addition of cobalt ion does not improve the spectral quality to any extent (S.-H. C. et al., unpublished results). Its detailed three-dimensional structure could not, therefore, be addressed by NMR methods due to its dynamic features. Instead, we have studied the d(GGGA)2 motif, which is the most stable among the d(GPuPuA)2 motifs in solution and exhibits rather high quality NMR spectra (Figures 2 and 3) that are suitable for structural determination.
However, it is still worthwhile to compare the solid-state d(GAAA)2 structure with the solution-state d(GGGA)2 structure, with the overlapping between these two structures shown in Figure 7. From the Figure, it is clear that several features of these two structures are similar; the bracketed sheared G·A pairs are highly buckled, and the central four purine bases are unpaired and interdigitated with each other to form an elongated backbone with a characteristic X shape. However, two major differences exist between these two structures; (1) the solid-state d(GAAA)2 structure (in red) has a narrower minor groove than that of the solution-state d(GGGA)2 structure (in blue). The shortest interstrand P-P distance for the d(GAAA)2 structure is only 6.6 Å, while that of the d(GGGA)2 structure is 8.5 Å; (2) the phosphodiesters and the adenine residues of the zipper residues in the solid-state d(GAAA)2 structure are not in a position to form cross-strand hydrogen-bonding even when an amino group is attached to the adenine C-2 position. This can be seen in the top of Figure 7, in which a typical cross-strand hydrogen-bond between the unpaired G-NH2 and the opposite strand phosphodiester in the d(GGGA)2 structure is marked by a blue arrow. It is clear from this


Figure 7. Overlapping stereo picture between the crystal structure containing the d(GAAA)2 motif (in red) and the NMR structure containing the d(GGGA)2 motif (in blue). Different base orientation and groove width were observed between these two structures, possibly due to the different base alignment in the zipper region. One typical base-base stacking between the outer and the inner zipper adenine residues in the crystal d(GAAA)2 structure is marked by two red arrows and is expanded in the bottom panel. It is clear that the adenine base alignment in this Figure is different from the guanine base alignment in the middle panel of Figure 6.

view that a similar hydrogen-bonding in the crystal d(GAAA)2 structure would be unlikely to happen (the corresponding distances are larger than 3.5 Å), due to the different stacking pattern of the zipper adenine residues. Closer examination of the overlapping structures indicates that the intercalated adenine residue stacking in the crystal d(GAAA)2 structure is considerably different from the intercalated guanine base stacking in the solution d(GGGA)2 structure, due to the different polar natures of the guanine and adenine bases. The stacking between the inner zipper adenine bases (marked by two red arrows in the top panel of Figure 7 and expanded in the bottom panel of Figure 7) in the crystal d(GAAA)2 structure is around 110°, while that of the inner zipper guanine bases in the solution d(GGGA)2 structure is closer to 180° (middle panel of Figure 6) to prevent unfavorable repulsion. The resulting smaller twisting angle in the d(GGGA)2 motif thus draws nearer the two strands and decreases the minor groove

width, while the larger twisting angle in the d(GAAA)2 motif pushes away the two strands and increases the minor groove width. The anti-parallel stacking nature among the four unpaired guanine bases in the d(GGGA)2 motif (top panel of Figure 7) thus considerably increases its minor groove width. However, it is not clear why the adenine zipper does not adopt a stacking geometry similar to that of the guanine zipper. Unconstrained nanosecond molecular dynamics studies of the d(GAAA)2 and d(GGGA)2 motifs starting from the d(GAAA)2 crystal coordinates have been performed.16 Both zipper motifs were found to be internally stable with no major conformational change along the trajectory. However, their theoretical calculation indicates that the intrinsic base-base stacking energy difference in vacuo between the (GGGA)2 and (GAAA)2 motifs is only about 1 kcal/mol.16 This is significantly different from our experimental data in the buffered aqueous solution, in which a large enthalpy difference of approximately 17 kcal/mol or a UV-melting temperature difference of about 13 deg. C was detected. It is likely that the different hydration energies between the two zippers could affect the zipper stability significantly, as the intrinsic base-base stacking does not differ too much (personal communication with Dr Sponer).
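The enthalpy and melting-temperature differences quoted above come from UV-melting curves analysed by the van't Hoff method with a vendor-supplied program whose internals are not described in this paper. As a rough illustration only, the Python sketch below fits ln K against 1/T for a two-state, unimolecular (hairpin) transition. The two-state model, the 0.15-0.85 fitting window, the function names, and the synthetic ΔH and Tm values are all assumptions made for this example and are not taken from the data reported here.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def fraction_folded(temp_k, dh, ds):
    """Folded fraction of a two-state unimolecular transition.

    dh (kcal/mol) and ds (kcal/(mol*K)) describe folding, so both are negative.
    """
    k_fold = math.exp(-(dh - temp_k * ds) / (R_KCAL * temp_k))
    return k_fold / (1.0 + k_fold)


def vant_hoff_fit(temps_k, fractions, window=(0.15, 0.85)):
    """Fit ln K = -dH/(R*T) + dS/R by least squares; return (dH, dS, Tm)."""
    xs, ys = [], []
    for temp_k, frac in zip(temps_k, fractions):
        if window[0] <= frac <= window[1]:  # use the transition region only
            xs.append(1.0 / temp_k)
            ys.append(math.log(frac / (1.0 - frac)))
    if len(xs) < 2:
        raise ValueError("need at least two points inside the fitting window")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    dh = -slope * R_KCAL
    ds = intercept * R_KCAL
    return dh, ds, dh / ds  # Tm (K) is where dG = 0 for a unimolecular transition


# Synthetic, noise-free melting curve (hypothetical values, not from the paper).
TEMPS_K = [293.15 + 0.5 * i for i in range(141)]  # 20 to 90 degrees C
CURVE = [fraction_folded(t, -40.0, -40.0 / 330.0) for t in TEMPS_K]
DH_FIT, DS_FIT, TM_FIT = vant_hoff_fit(TEMPS_K, CURVE)
print(f"dH = {DH_FIT:.1f} kcal/mol, Tm = {TM_FIT:.1f} K")
```

With real absorbance data, the folded fraction would first have to be derived from the melting profile using fitted upper and lower baselines, a step this sketch leaves out.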

Materials and Methods

Sample preparation

All DNA samples were synthesized at the 3 μmol scale on an Applied Biosystems 380B DNA synthesizer with the final 5′-DMT groups attached. The samples were purified and prepared for NMR studies as described.34

UV-melting studies

The absorbance versus temperature profile was obtained at 260 nm with a Cary 100 spectrophotometer equipped with a temperature controller. A temperature probe was placed inside the UV chamber to monitor the cell temperature. The temperature in each run was increased from 20 °C to 90 °C at a rate of 0.5 deg. C/minute. All thermodynamic parameters were calculated by the van't Hoff method 35 using the program supplied by the vendor.

NMR experiments

All NMR experiments were performed on a Varian Unity Inova 600 MHz spectrometer. One-dimensional imino proton spectra at 0 °C were acquired using a jump-return pulse sequence.36 The spectral width was 12,000 Hz with the carrier frequency set at the resonance of water. The maximum excitation was set at 12.5 ppm. For each experiment, 4 K complex points were collected and 112 scans were averaged with a two-second relaxation delay. The 2D NOESY in 90 % H2O/10 % 2H2O was performed at 0 °C in a pH 6.8 low-salt (20 mM NaCl, 3 mM sodium phosphate) buffer with the following parameters: delay time one second, mixing time 0.12 second, spectral width 12,136 Hz, complex points 2048, number of transients 96, and number of increments 500. The 2D NOESY experiments in 2H2O were carried out at 20 °C in the hypercomplex mode with a spectral width of 4705 Hz. Spectra were collected using three mixing times of 100, 300, and 600 ms with a relaxation delay of one second between each transient and with 2048 complex points in the t2 and 200 complex points in the t1 dimension. For each t1 increment, 64 scans were averaged. The 2D 1H-13C HSQC experiments in 2H2O were carried out at 20 °C in the hypercomplex mode with a 1H spectral width of 4705 Hz and a 13C spectral width of 25,649 Hz. Spectra were collected with a relaxation delay of one second between each transient and with 2048 complex points in the t2 and 100 complex points in the t1 dimension. For each t1 increment, 96 scans were averaged. A DQF-COSY spectrum was collected in the TPPI mode with a spectral width of 4705 Hz in both dimensions; 2048 complex points in the t2 dimension and 320 (real) points in the t1 dimension were collected with a relaxation delay of one second, and 40 scans were averaged for each t1 increment. A proton-detected 31P-1H heteronuclear correlation spectrum 37 was collected in the TPPI mode with a spectral width of 4705 Hz in the 1H dimension and a spectral width of 1000 Hz in the 31P dimension; 1024 complex points in the t2 (1H) dimension and 128 complex points in the t1 (31P) dimension were collected. Protons were presaturated for 1.0 second and 128 scans were accumulated for each t1 increment. The acquired data were transferred to an IRIS 4D workstation and processed by the software FELIX (MSI Inc.) as described.38

Structure determination

The 3D structures of the 5′-GAAGC-TCC-GCTTC-3′ oligomer were generated by distance geometry and molecular dynamics calculations using distance and torsional angle constraints derived from NMR experiments. Most distance constraints from NOESY spectra in 2H2O were classified as strong, medium, or weak based on their relative intensities at 100 ms and 300 ms mixing time and were given generous distance bounds of 2.0-4.0 Å, 3.0-5.0 Å, or 4.0-6.0 Å, respectively. Canonical hydrogen-bond distances with bounds of 1.8-2.1 Å were assigned to Watson-Crick base-pairs. A large number of distance constraints involving exchangeable protons were also derived from H2O/NOESY spectra and were given only two wide distance bounds of either 2.0-5.0 Å or 3.0-6.0 Å, due to the exchange phenomena. The β and γ torsional angle constraints were determined primarily semi-quantitatively from the 31P-1H heteronuclear correlation data 30 using the in-plane "W" rule.39 Based on the absence of long-range 4JH2′-P coupling, all ε torsion angles were constrained to the trans domain (180° ± 30°).40 The ζ and α dihedral angles were all constrained in the non-trans domain, since no backbone phosphorus signal of extraordinary shifting was observed.41 The χ dihedral angles were constrained to -100° (ideal B-DNA values) ± 30° when no aromatic-anomeric cross-peak of comparable intensity to the CH5/CH6 cross-peaks was detected. These NOE distance (272 in total) and torsional angle (104 in total) constraints were used to generate initial structures using the DGII program (MSI, Inc.). The initial structures were further refined by restrained molecular dynamics using the program DISCOVER (MSI, Inc.). A 2 ps dynamics run was carried out at 300 K with a step size of 1.0 fs, followed by a conjugate gradient minimization of 200 iterations, and this cycle was looped ten times. Well-converged final structures with pair-wise r.m.s.d. values of approximately 0.98 Å were obtained after molecular dynamics calculations.
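The restraint classes described under Structure determination amount to a small lookup table of lower and upper distance bounds (2.0-4.0 Å, 3.0-5.0 Å, and 4.0-6.0 Å for strong, medium, and weak NOEs; 1.8-2.1 Å for Watson-Crick hydrogen bonds; 2.0-5.0 Å or 3.0-6.0 Å for exchangeable protons). The Python sketch below only illustrates that bookkeeping and how a model distance can be checked against a bound. It is not the DGII or DISCOVER input format, and the class labels, atom names, and `Restraint` type are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

# Lower/upper distance bounds in angstroms, as quoted in the text.
# The dictionary keys are hypothetical labels chosen for this example.
BOUNDS = {
    "strong": (2.0, 4.0),
    "medium": (3.0, 5.0),
    "weak": (4.0, 6.0),
    "hbond": (1.8, 2.1),             # canonical Watson-Crick hydrogen bonds
    "exchangeable_near": (2.0, 5.0),  # wide bounds for exchangeable protons
    "exchangeable_far": (3.0, 6.0),
}


@dataclass(frozen=True)
class Restraint:
    """A distance restraint between two atoms, tagged with a bound class."""

    atom_a: str
    atom_b: str
    category: str

    def violation(self, distance):
        """Return how far (in angstroms) a distance lies outside its bounds."""
        lower, upper = BOUNDS[self.category]
        if distance < lower:
            return lower - distance
        if distance > upper:
            return distance - upper
        return 0.0


# Hypothetical restraint; the atom labels are illustrative only.
example = Restraint("G5:H8", "G15:H1'", "medium")
print(example.violation(3.5))  # inside the 3.0-5.0 range, so no violation
```

Keeping the bounds in one table makes it straightforward to tally violations over a set of restraints for each candidate structure.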

Acknowledgments

We thank the National Science Council and the Chung-Zhen Agricultural Foundation Society of Taiwan, ROC for the instrumentation grants and Dr Lavery for the kind gift of the CURVES program. Personal communication with Dr Sponer is highly appreciated. S.-H. C. is a recipient of the Outstanding Research Award from the National Science Council, Taiwan. This work was supported by the NSC grant 89-2113-M-005-034 to S.-H. C.

References

1. Choo, K. H. (1997). The Centromere, Oxford University Press, Oxford, UK.
2. Grady, D. L., Ratliff, R. L., Robinson, D. L., McCanlies, E. C., Meyne, J. & Moyzis, R. K. (1992). Highly conserved repetitive DNA sequences are present at human centromeres. Proc. Natl Acad. Sci. USA, 89, 1695-1699.
3. Catasti, P., Gupta, G., Garcia, A. E., Ratliff, R., Hong, L., Yau, P. et al. (1994). Unusual structures of the tandem repetitive DNA sequences located at human centromeres. Biochemistry, 33, 3819-3830.
4. Chou, S.-H., Zhu, L. & Reid, B. R. (1994). The unusual structure of the human centromere (GGA)2 motif: unpaired guanosines stacked between sheared GA pairs. J. Mol. Biol. 244, 259-268.
5. Ferrer, N., Azorin, F., Villasante, A., Gutierrez, C. & Abad, J. P. (1995). Centromeric dodeca-satellite DNA sequences form fold-back structures. J. Mol. Biol. 245, 8-21.
6. Zhu, L., Chou, S.-H. & Reid, B. R. (1996). A single G-to-C change causes human centromere TGGAA repeats to fold back into hairpins. Proc. Natl Acad. Sci. USA, 93, 12159-12164.
7. Ortiz-Lombardia, M., Cortes, A., Huertas, D., Eritja, R. & Azorin, F. (1998). Tandem 5′-GA:GA-3′ mismatches account for the high stability of the fold-back structures formed by the centromeric Drosophila dodeca-satellite. J. Mol. Biol. 277, 757-762.
8. Cortés, A., Huertas, D., Fanti, L., Pimpinelli, S., Marsellach, F. X., Piña, B. & Azorín, F. (1999). DDP1, a single-stranded nucleic acid-binding protein of Drosophila, associates with pericentric heterochromatin and is functionally homologous to the yeast Scp160p, which is involved in the control of cell ploidy. EMBO J. 18, 3820-3833.
9. Sunkel, C. E. & Coelho, P. A. (1995). The elusive centromere: sequence divergence and functional conservation. Curr. Opin. Genet. Dev. 5, 756-767.
10. Prosser, J., Frommer, M., Paul, C. & Vincent, P. C. (1986). Sequence relationships of three human satellite DNAs. J. Mol. Biol. 187, 145-155.
11. Pluta, A. F., Cooke, C. A. & Earnshaw, W. C. (1990). Structure of the human centromere at metaphase. Trends Biochem. Sci. 15, 181-185.
12. Abad, J. P., Carmena, M., Baars, S., Saunders, R. D. C., Glover, D. M., Ludena, P. et al. (1992). Dodeca satellite: a conserved G + C-rich satellite from the centromeric heterochromatin of Drosophila melanogaster. Proc. Natl Acad. Sci. USA, 89, 4663-4667.
13. Carmena, M., Abad, J. P., Villasante, A. & Gonzalez, C. (1993). The Drosophila melanogaster dodeca-satellite sequence is closely linked to the centromere and can form connections between sister chromatids during mitosis. J. Cell Sci. 105, 41-50.
14. Huertas, D. & Azorin, F. (1996). Structural polymorphism of homopurine DNA sequences. d(GGA)n and d(GGGA)n repeats form intramolecular hairpins stabilized by different base-pairing interactions. Biochemistry, 35, 13125-13135.
15. Sponer, J., Gabb, H. A., Leszczynski, J. & Hobza, P. (1997). Base-base and deoxyribose-base stacking interactions in B-DNA and Z-DNA: a quantum-chemical study. Biophys. J. 73, 76-87.
16. Spackova, N., Berger, I. & Sponer, J. (2000). Nanosecond molecular dynamics of zipper-like DNA duplex structures containing sheared G·A mismatch pairs. J. Am. Chem. Soc. 122, 7564-7572.
17. Shepard, W., Cruse, W. B. T., Fourme, R., Fortelle, E. & Prange, T. (1998). A zipper-like duplex in DNA: the crystal structure of d(GCGAAAGCT) at 2.1 Å resolution. Structure, 6, 849-861.
18. Tseng, Y.-Y. & Chou, S.-H. (1999). Systematic NMR assignment pathways for DNA exchangeable protons. J. Chin. Chem. Soc. 46, 699-706.
19. Chou, S.-H., Cheng, J.-W. & Reid, B. (1992). Solution structure of [d(ATGAGCGAATA)]2: adjacent GA mismatches stabilized by cross-strand base-stacking and BII phosphate groups. J. Mol. Biol. 228, 138-155.
20. Cheng, J.-W., Chou, S.-H. & Reid, B. R. (1992). Base-pairing geometry in G·A mismatches depends entirely on the neighboring sequence. J. Mol. Biol. 228, 1037-1041.
21. Hare, D. R., Wemmer, D. E., Chou, S.-H., Drobny, G. & Reid, B. R. (1983). Assignment of the non-exchangeable proton resonances of d(CGCGAATTCGCG) using two-dimensional nuclear magnetic resonance methods. J. Mol. Biol. 171, 319-336.
22. Gehring, K., Leroy, J.-L. & Gueron, M. (1993). A tetrameric DNA structure with protonated cytosine:cytosine base pairs. Nature, 363, 561-565.
23. Varani, G. & Tinoco, I. J. (1991). RNA structure and NMR spectroscopy. Quart. Rev. Biophys. 24, 479-532.
24. Marino, J. P., Schwalbe, H. & Griesinger, C. (1999). J-coupling restraints in RNA structure determination. Accts Chem. Res. 32, 614-623.
25. Varani, G. & Tinoco, I. J. (1991). Carbon assignments and heteronuclear coupling constants for an RNA oligonucleotide from natural abundance 13C-1H correlated experiments. J. Am. Chem. Soc. 113, 9349-9354.
26. Wang, A. H.-J., Quigley, G. J., Kolpak, F. J., van der Marel, G., van Boom, J. H. & Rich, A. (1981). Left-handed double helical DNA: variations in the backbone conformation. Science, 211, 171-176.
27. Frederick, C. A., Coll, M., van der Marel, G. A., van Boom, J. H. & Wang, A. H.-J. (1988). Molecular structure of cyclic deoxydiadenylic acid at atomic resolution. Biochemistry, 27, 8350-8361.
28. Guan, Y., Gao, Y.-G., Liaw, Y.-C., Robinson, H. & Wang, A. H.-J. (1993). Molecular structure of cyclic diguanylic acid at 1 Å resolution of two crystal forms: self-association, interactions with metal ion/planar dyes and modeling studies. J. Biomol. Struct. Dynam. 11, 253-276.
29. Chou, S.-H., Zhu, L. & Reid, B. R. (1996). On the relative ability of centromeric GNA triplets to form hairpins versus self-paired duplexes. J. Mol. Biol. 259, 445-457.
30. Chou, S.-H., Zhu, L., Gao, Z., Cheng, J.-W. & Reid, B. R. (1996). Hairpin loops consisting of single adenine residues closed by sheared A:A and G:G pairs formed by the DNA triplets AAA and GAG: Solution structure of the d(GTACAAAGTAC) hairpin. J. Mol. Biol. 264, 981-1001.
31. Chou, S.-H., Tseng, Y.-Y. & Wang, S.-W. (1999). Stable sheared A:C pair in DNA hairpins. J. Mol. Biol. 287, 301-313.
32. Umezawa, Y. & Nishio, M. (2000). CH/π interaction in the crystal structure of TATA-box binding protein/DNA complexes. Bioorg. Med. Chem. 8, 2643-2650.
33. Chou, S.-H., Tseng, Y.-Y., Chen, Y.-R. & Cheng, J.-W. (1999). Structural studies of symmetric DNA undecamers containing non-symmetrical sheared (PuGAPu):(PyGAPy) motifs. J. Biomol. NMR, 14, 157-167.
34. Chou, S.-H. & Tseng, Y.-Y. (1999). Cross-strand purine-pyrimidine stack and sheared purine:pyrimidine pairing in the human HIV-1 reverse transcriptase inhibitors. J. Mol. Biol. 285, 41-48.
35. Marky, L. A. & Breslauer, K. J. (1987). Calculating thermodynamic data for transitions of any molecularity from equilibrium melting curves. Biopolymers, 26, 1601-1620.
36. Plateau, P. & Gueron, M. (1982). Exchangeable proton NMR without base-line distortion, using new strong-pulse sequences. J. Am. Chem. Soc. 104, 7310-7311.
37. Sklenar, V., Miyashiro, H., Zon, G., Miles, H. T. & Bax, A. (1986). Assignment of the 31P and 1H resonances in oligonucleotides by two-dimensional NMR spectroscopy. FEBS Letters, 208, 94-98.
38. Chou, S.-H., Cheng, J.-W., Fedoroff, O. & Reid, B. R. (1994). DNA sequence GCGAATGAGC containing the human centromere core sequence GAAT forms a self-complementary duplex with sheared G:A pairs in solution. J. Mol. Biol. 241, 467-479.
39. Sarma, R. H., Mynott, R. J., Wood, D. J. & Hruska, F. E. (1973). Determination of the preferred conformations constrained along the C4′-C5′ and C5′-O5′ bonds of β-5′-nucleotide in solution. Four-bond 31P-1H coupling. J. Am. Chem. Soc. 95, 6457-6459.
40. Altona, C. (1982). Conformational analysis of nucleic acids. Determination of backbone geometry of single-helical RNA and DNA in aqueous solution. Recl. Trav. Chim. Pays-Bas. 101, 413-433.
41. Gorenstein, D. G., Schroeder, S. A., Fu, J. M., Metz, J. T., Roongta, V. & Jones, C. R. (1988). Assignments of 31P NMR resonances in oligodeoxyribonucleotides: origin of sequence-specific variations in the deoxyribose phosphate backbone conformation and the 31P chemical shifts of double-helical nucleic acids. Biochemistry, 27, 7223-7237.

Edited by I. Tinoco (Received 18 July 2001; received in revised form 27 September 2001; accepted 27 September 2001)

http://www.academicpress.com/jmb Supplementary Material comprising one Table is available on IDEAL