A zipper-like duplex in DNA: the crystal structure of d(GCGAAAGCT) at 2.1 Å resolution


Research Article


William Shepard1*, William BT Cruse2, Roger Fourme1, Eric de la Fortelle3 and Thierry Prangé1,2

Background: The replication origin of the single-stranded (ss)DNA bacteriophage G4 has been proposed to fold into a hairpin loop containing the sequence GCGAAAGC. This sequence comprises a purine-rich motif (GAAA), which also occurs in conserved repetitive sequences of centromeric DNA. ssDNA analogues of these sequences often show exceptional stability which is associated with hairpin loops or unusual duplexes, and may be important in DNA replication and centromere function. Nuclear magnetic resonance (NMR) studies indicate that the GCGAAAGC sequence forms a hairpin loop in solution, while centromere-like repeats dimerise into unusual duplexes. The factors stabilising these unusual secondary structure elements in ssDNA, however, are poorly understood.

Results: The nonamer d(GCGAAAGCT) was crystallised as a bromocytosine derivative in the presence of cobalt hexammine. The crystal structure, solved by the multiple wavelength anomalous dispersion (MAD) method at the bromine K-edge, reveals an unexpected zipper-like motif in the middle of a standard B-DNA duplex. Four central adenines, flanked by two sheared G·A mismatches, are intercalated and stacked on top of each other without any interstrand Watson–Crick base pairing. The cobalt hexammine cation appears to participate only in crystal cohesion.

Addresses: 1LURE, Bâtiment 209d, Université Paris-Sud, 91405-Orsay Cedex, France, 2Chimie Structurale Biomoléculaire (URA 1430 CNRS), 93017-Bobigny Cedex, France and 3Laboratory of Molecular Biology, MRC, Hills Road, Cambridge CB2 2QH, UK. *Corresponding author. E-mail: [email protected] Key words: adenine tract, G·A mismatch, hairpin, MAD, single-stranded DNA, zipper-like DNA Received: 6 April 1998 Revisions requested: 6 May 1998 Revisions received: 21 May 1998 Accepted: 21 May 1998 Structure 15 July 1998, 6:849–861 http://biomednet.com/elecref/0969212600600849 © Current Biology Ltd ISSN 0969-2126

Conclusions: The GAAA consensus sequence can dimerise into a stable zipper-like duplex as well as forming a hairpin loop. The arrangement closes the minor groove and exposes the intercalated, unpaired, adenines to the solvent and DNA-binding proteins. Such a motif, which can transform into a hairpin, should be considered as a structural option in modelling DNA and as a potential binding site, where it could have a role in DNA replication, nuclease resistance, ssDNA genome packaging and centromere function.

Structure 1998, Vol 6 No 7

Introduction

On the basis of their conserved nucleotide sequences, single-stranded (ss)DNA regions at the replication origins of bacteriophages and parvoviruses have been proposed to fold into hairpin structures [1–4]. In bacteriophage G4, the starting point of negative-strand synthesis, initiated by the Escherichia coli dnaG (primase) protein, lies just before one of three regions proposed to form hairpin loops [3]. The proposed hairpin at this site comprises an 8 base-pair (bp) stem and the loop sequence GAAAGCC. Studies on synthetic analogues of this hairpin revealed that fragments containing a GAAA consensus are just as stable as the original proposed hairpin [5]. These fragments exhibit very high melting temperatures (Tm > 75°C) [5], resistance to nucleases [6] and fast gel mobilities [7], even with only two Watson–Crick base pairs in the stem. On the basis of preliminary NMR studies and molecular dynamics calculations, a hairpin structure with a minimal sequence of GCGAAAGC was proposed to account for these properties [8–10]. Interestingly, the d(GCGAAAGC) fragment is much more stable than its RNA analogue [9], which is a member of the r(GNRA) family of RNA tetraloop sequences (where N = A, C, G or U and R = A or G), a family that is itself well documented for its stability [11–14].

It is interesting to note that the loop sequence d(GAAA) is similar to the highly conserved repetitive DNA sequence (GGAAT)n found in human centromeres [15], and occurs with moderate frequency in fission yeast tandem repeats dg and dh [16,17]. The sequence is also found at the core of the 17 bp repeat consensus that associates with the centromere-binding protein known as CENP-B [18–20]. In its single-stranded form the (GGAAT)n repeat sequence, as well as its (GNAAT)n variant, shows thermal stabilities comparable to Watson–Crick duplexes formed between complementary strands [15]. These stable single-stranded tandem array structures were originally proposed to form a duplex involving mismatched G·A base pairing [15]. Later, however, NMR studies showed that analogues of this centromere-like repeat form a variety of stem-loop and duplex structures [21–24].

Here we present the crystal structure of the d(GCGAAAGCT) sequence at 2.1 Å resolution, which has been solved by the multiple wavelength anomalous dispersion (MAD) method after incorporation of a bromine onto the second cytosine. The sequence folds into a self-forming duplex structure containing a novel adenine-rich zipper-like motif embedded in standard B DNA. None of the hairpin structures proposed from NMR studies [8–10] was observed. This novel structure demonstrates how DNA can overcome a noncomplementary sequence to fold into a duplex, and raises questions about secondary structure in ssDNA systems.

Results

Description of the structure

The structure of d(GCGAAAGCT) is an elongated and stretched duplex with a four-step zipper-like motif at the adenine core. The asymmetric unit contains only one DNA strand, but self-association about a twofold crystallographic axis generates a duplex structure (Figures 1 and 2). Although the duplex at both ends folds into standard B DNA with the formation of two classic Watson–Crick base pairs G(1)–C(8*) and C(2)–G(7*) (where the asterisk denotes a base from the crystallographically related molecule), the rest of the organisation is completely different. Two uncommon, buckled G·A mismatches occur at steps 3 and 6, and the core adenines are unpaired and intercalated into a zipper-like arrangement. This motif has the effect of unwinding the helix, leading to the elongation of the duplex. As the base rise still remains at normal B DNA stacking distances (an average value of 3.3 Å), an elongation of the phosphodiester backbone accommodates the deformation induced by the two consecutive intercalations. This type of deformation of the phosphodiester backbone is similar to those observed in DNA–drug complexes [25,26].

The intercalated adenines are bracketed on either side by two G·A mismatches. As illustrated in Figure 3, the central adenines are aligned in such a fashion that they stack between the guanines of the G·A mismatches, and thus form a well aligned polypurine stack of six bases. Table 1 highlights the dramatic conformational differences in the phosphate backbone compared to the more standard structures of DNA.

The third and sixth steps of the duplex are G(anti)·A(anti) mismatches which are characterised by the formation of two hydrogen bonds at G(N3)–A(N6) and G(N2)–A(N7). This type of G·A mismatch, referred to as a sheared base pair, has been observed in DNA oligomers (for a review see [27]), as well as at a divalent ion binding site in domain II of the hammerhead ribozyme [28] and in unusual RNA oligomers [29]. In the present deoxy structure, the G·A mismatches occur after only two Watson–Crick base pairs, suggesting that this type of mismatch is a rather stable configuration. The substantially buckled nature of

Figure 1

(b) Base spacings: schematic of the stacked bases of both strands (T9* and G1 to T9; G1* to C8*), with regions labelled Watson–Crick pairs, G·A mismatch, adenine zipper, G·A mismatch and Watson–Crick pairs, and spacings marked n and 2n.

The zipper-like motif in DNA. (a) The initial experimental electron-density map at 2.35 Å resolution with the final model superimposed. Contouring is at 2σ above the mean density level. The region shown is the central part of the intercalating adenines. The symmetry-related molecule (in red) is shown to illustrate how the zipper-like dimer is built. (b) Schematic diagram of the base-stacking arrangement in the extended/stretched duplex (the helix is represented unwound). The two ends comprise normal Watson–Crick pairings but the middle of the helix shows the dissymmetric extension and alternate intercalations of unpaired adenines with an interspacing parameter n = 3.3 Å. Adenines are shown in blue, guanines in green, cytosines in red and thymines in magenta; the bromine atoms are shown as yellow spheres.

Research Article Structure of an adenine zipper in DNA Shepard et al.


Figure 2 Comparison of the zipper-like motif to standard B DNA. (a) The zipper-like motif alongside a segment of (b) B DNA. Both structures correspond to an 8 bp stretch (the ninth thymine residue in the zipper-like motif is flipped out of the helix). The zipper-like motif displays the characteristic shape of an X with an elongated phosphodiester backbone. All the adenine bases point in the same direction as the minor groove collapses in the central region. The standard B-DNA structure is more regular and compact.

the G·A mismatches brings the phosphodiester chains together, merging the opposing DNA strands into the zipper-like motif. A severe torsion (kink) is introduced in the helical structure at the G·A mismatches (visible in Figure 3), but this has no effect on the adenine tract itself, which shows a rather straight alignment in the core region (Figures 2 and 3). Other variations of deoxy G·A mismatches are known from X-ray crystal structures and NMR

studies [27,30], but in general they have a destabilising effect compared with Watson–Crick base pairs [31]. Crystals of the zipper-like duplex form only in the presence of metal hexammine salts (cobalt, rhodium or iridium), and a cobalt hexammine cation is clearly apparent in the crystal structure. A dramatic increase was found both in the resolution of the diffraction pattern

Figure 3 Stereo view of the zipper-like motif. A closeup view of the central adenine tract cut out from the duplex. The figure illustrates the substantial buckling of the sheared G·A mismatch located at the bottom of the figure. Atoms are shown in standard colours.


Table 1. Geometry parameters in the monomer and in the duplex.

Base inclination (intrastrand†) and sugar puckerings§

Step #   Residue   η (°)   P (°)   τm (°)
1        G1         9.4    165     33
2        C2         9.8    120     44
3        G3        17.4    160     43
4        A4        12.6     35     35
5        A5        –9.6    220     37
6        A6        27.8    205     37
7        G7        13.8    100     39
8        C8         9.5    125     28

Intrastrand† Pn–Pn+1 distances

Pair     Distance (Å)
P2–P3    6.60
P3–P4    7.11
P4–P5    6.14
P5–P6    6.29
P6–P7    6.40
P7–P8    6.78
P8–P9    6.27

Interstrand‡ P–P* distances¶

Pair      Distance (Å)
P2–P8*    18.43
P3–P7*    16.29
P4–P6*    10.25
P5–P5*     9.10
P6–P4*    10.25
P7–P3*    16.29
P8–P2*    18.43

Interstrand‡ C1′–C1′ distances¶

Pair      Distance (Å)   φ (°)
G1–C8*    10.59           11.5
C2–G7*    10.48            6.5
G3–A6*     8.0            11.7
A4–A5*     4.5           –86
A5–A4*     4.5            94
A6–G3*     8.01          –11.7
G7–C2*    10.48           –6.5
C8–G1*    10.59          –11.5

†For comparison, the average values for phosphate–phosphate distances (Pn–Pn+1) within a strand are 5.9, 7.0 and 6.5 Å for A DNA, B DNA and tetraplexes, respectively. Except for the abrupt change at the G·A mismatches, these values are similar to the tetraplex geometry. The inclination angle, η, represents the angle between the mean plane of the individual base and the perpendicular to the helix axis (positive values denote downward inclinations).

‡Interstrand P distances represent the variable diameter of the duplex. Typical values are remarkably constant in A and B DNA (17.55 and 17.75 Å). The adenine zipper squeezes the strands together, which dramatically reduces the interstrand P distances in the middle. The line from P5 to P5* is not perpendicular to the helix axis and does not represent the shortest interstrand distance, which is P6–P5* (6.36 Å). The C1′–C1′ vectors are usually perpendicular to the helix axis in ‘standard’ B DNA (φ close to 0°). This is true for the three first base pairs, where distances correspond to the canonical B DNA, but not for the others. The inner vectors A4–A5* (and its symmetric A5–A4*) are now nearly parallel to the helix axis, as these adenines are not paired but only stacked. For comparison, the average C1′–C1′ distances are 10.5 Å in both A and B forms of DNA and 9.4 Å in the tetraplex structures.

§P and τm are the pseudo-rotation parameters of the sugar puckerings. These parameters were not constrained during refinement of the structure. Standard deviations are in the range of 5°. The sugars all adopt the ‘southern’ conformation, with the exception of sugar 4 which adopts the ‘northern’ conformation.

¶The asterisk denotes a symmetry-related atom (second strand of the duplex).
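The two qualitative statements in the Table 1 footnotes (only sugar 4 is ‘northern’; the intrastrand P–P distances resemble the tetraplex geometry) can be checked directly from the tabulated numbers. The sketch below is illustrative only: the variable and function names are hypothetical, and the north/south boundary at P = 90° and 270° is the conventional split of the pseudorotation wheel, an assumption that is not stated in the article.

```python
# Values transcribed from Table 1 (pseudorotation phase P in degrees,
# intrastrand Pn-Pn+1 distances in angstroms).
P_PHASE = {"G1": 165, "C2": 120, "G3": 160, "A4": 35,
           "A5": 220, "A6": 205, "G7": 100, "C8": 125}
INTRASTRAND_PP = [6.60, 7.11, 6.14, 6.29, 6.40, 6.78, 6.27]

# Reference averages quoted in the Table 1 footnote.
REFERENCE_PP = {"A DNA": 5.9, "B DNA": 7.0, "tetraplex": 6.5}


def pucker_hemisphere(p_deg: float) -> str:
    """Assign a sugar to the north or south half of the pseudorotation wheel.

    Assumption: north covers 270 <= P < 360 and 0 <= P < 90; south covers the rest.
    """
    p = p_deg % 360.0
    return "north" if (p < 90.0 or p >= 270.0) else "south"


# Residues whose sugar falls in the northern hemisphere.
north_residues = [res for res, p in P_PHASE.items()
                  if pucker_hemisphere(p) == "north"]

# Mean intrastrand P-P distance and the nearest reference geometry.
mean_pp = sum(INTRASTRAND_PP) / len(INTRASTRAND_PP)
closest_reference = min(REFERENCE_PP,
                        key=lambda name: abs(REFERENCE_PP[name] - mean_pp))

print(north_residues, round(mean_pp, 2), closest_reference)
```

With the quoted standard deviation of about 5° on P, none of the sugars lies close enough to the assumed boundaries for the assignment to change.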

and the mechanical resistance of the crystals, when changing the complexed metal hexammine from cobalt to either rhodium or iridium hexammine — a feature already observed in RNA oligomers [32]. The cobalt hexammine forms strong and specific hydrogen bonds at the G(3) base level. The side-by-side arrangement of the bases of the sheared G·A mismatch turns the N7 and O6 atoms of G(3) outwards from the major groove and allows for formation of hydrogen-bonding interactions with the cobalt hexammine cation. This cation sits on a threefold crystallographic axis and bridges three duplexes at the sheared G·A mismatch (Figure 4). It also occupies a position in one of the solvent channels and makes no interactions to any phosphate oxygens (Figures 4a and 5a). As such, the cobalt hexammine is considered to participate largely in crystal cohesion rather than in duplex stability. The central unpaired adenines are also located about the threefold axis and are retained in a weak triplex association by hydrogen bonds, mediated by a water molecule which is also visible in the initial experimental MAD electron-density map.

As the double helix runs parallel to the z axis of the crystal lattice, the packing of symmetry-related duplexes on top of each other leads to the formation of continuous double-helical stacks in the crystal (Figure 5). This style of packing is rather well documented in A and B nucleic acid structures, in both the free and intercalated states [26,32], and is considered an important stabilising effect in the crystal formation between two aligned symmetry-related helices. The G–C pairs at both ends are parallel and oriented so that the guanine G(1) is stacked over the symmetry-related cytosine C(8*) and vice versa.

The last thymine residue at position 9 was initially introduced in the synthesis in order to promote an intermolecular A–T pair formation as a means of crystal stabilisation. This residue projects out of the helix, however, into a cavity delimited by the packing. The residue is strongly disordered, unpaired and autonomous of the stacking, showing up only as a faint signature in the electron density. In the packing structure, the helices are arranged in such a fashion that a large solvent channel, about 18 Å in diameter, is built about the sixfold axes and oriented along the z axis (Figure 5).

Discussion

Comparison with [GGA]2 motifs and other intercalated structures

The zipper-like motif has certain similarities with three centromere-like DNA oligonucleotides, which form self-associating duplexes or hairpin structures, as determined from high resolution NMR studies [22,23,33]. In all cases, the oligomers form B DNA family duplexes with two intercalated and unpaired guanine residues bracketed by sheared G·A mismatches, and dubbed the [GGA]2 motif (unpaired and intercalated bases are highlighted in bold). This motif is of particular interest because human centromeric DNA was discovered to contain d(GGAAT)n tandem repeats, which are just as thermally stable alone as with their complementary strand in Watson–Crick duplexes [15]. Both [d(TGGAATGGAA)]2 [22] and [d(GTGGAATGGAAC)]2 [23] oligonucleotides are noncomplementary sequences that form duplexes with two [GGA]2 motifs. The sequence d(GTGGAATGCAATGGAAC) [33] forms a ‘fold-back’ structure with a GCA hairpin loop at the central cytosine residue (underlined) and the remaining nucleotides folding into a duplex with two [GGA]2 motifs. All of the duplex sugar residues are in the C2′-endo conformation except for the unpaired guanosines, which are in the unusual C3′-endo conformation. The remarkable stability of these duplexes, despite having as few as two Watson–Crick base pairs out of ten, has been attributed to three factors: the interstrand stacking of the unpaired guanines with the guanines in the adjacent G·A mismatch; packing of the sugar residue of the unpaired guanine in between two of the bases of the G·A mismatch; and a hydrogen bond between the unpaired guanine NH2 and a phosphate oxygen from the opposite strand.

The zipper-like motif in the [d(GCGAAAGCT)]2 duplex shows base stacking more extended than the [GGA]2 motif. The central adenines stack in between the guanine bases of the two sheared G·A pairs and beyond to extend throughout the whole duplex. Figure 3 illustrates this and how the stacking crosses over to the opposite strand (i.e. the stacking follows the sequence GCGA*AA*AG*C*G*, where bases of one of the strands are denoted with asterisks). As in the [GGA]2 motif, the sugar residue of the first unpaired base (A4 in the present structure) following the sheared G·A mismatch adopts the C3′-endo (north) conformation and packs against the adenine of the G·A mismatch.
The adenine A4 does not, however, form a hydrogen bond to the phosphate oxygen from the opposite strand as seen for the [GGA]2 motif, because adenine lacks the amino group (NH2) on the C2 carbon atom, which is present for guanine. Interestingly, both the phosphate and adenine in question are in suitable positions to form such a hydrogen bond if the amino group were added. This implies that this hydrogen bond in the [GGA]2 motif is not essential to the formation of the duplex, but does contribute to its overall stability (see below). The sugar residue of the second unpaired and intercalated adenine (A5) reverts back to the C2′-endo (south) conformation. This adenine also does not form any other hydrogen bonds to phosphate groups or bases. A consequence of the zipper-like motif is a subsequent pinching together of the phosphodiester chains to bring the interstrand P–P distances between P5 and P6 to within 6.6 Å. This is more marked than in the [GGA]2 motif, where the shortest interstrand P–P distance is 8.5 Å [23]. The negatively charged phosphate groups compensate for this by pointing their oxygens in opposite directions and away from each other (see Figure 3). In fact, the

Figure 4

The cobalt hexammine site. Two views of the cobalt hexammine interactions at the G3 base level, which illustrate the special role of this residue. (a) The 2Fo–Fc map at 2.1 Å resolution of the [Co(NH3)6]3+ and the surrounding guanine residues (contoured at 2σ above the mean density with phases from the refined model); labelled in the panel are G3, A6* and the waters W12 and W13. (b) Diagram of the extensive hydrogen-bond network linking the [Co(NH3)6]3+ cation around the threefold crystallographic axis to the N7 and O6 atoms of G3.

conformation of the phosphodiester backbone in the zipper-like motif is analogous to those found in the intercalated ‘i-motifs’ of cytosine-rich DNA oligonucleotides [34–45]. The phosphodiester chains in i-motifs are packed side-by-side in an antiparallel fashion with the sugar residues in C3′-endo or C4′-exo conformations. With the minor groove effectively closed, the bases are turned out


Figure 5

Crystal packing. (a) Packing of the structure viewed along the z axis showing the large solvent channel located on the sixfold axes (diameter approximately 18 Å). The cobalt hexammine cations lie in narrower channels which also run parallel to the z axis and are located about the threefold axes. (b) Two perpendicular views of the end-to-end packing of duplexes in the crystal structure that generates a continuous helix along the z axis. The asymmetric unit is shown in yellow, its ‘complementary’ strand in red and the stacked symmetry-related helices in green. This duplex is flattened into a ribbon-like structure compared with the standard B DNA double helix.

to present their Watson–Crick faces along the opposite side of the duplex with respect to the phosphodiester backbone, as is seen for the adenines in the zipper-like motif. However, the Watson–Crick face of the i-motif duplex self-associates into a tetraplex via the formation of protonated C·C+ base pairs, which requires acidic conditions. In the zipper-like motif, the central adenines do not form any hydrogen bonds with any other symmetry-related bases or phosphate groups, but rather make contacts mediated through a water molecule as mentioned above.

The stacking of the adenine bases in the zipper-like motif differs somewhat from that seen in i-motifs (see Figure 3). In i-motifs, the hemiprotonated C·C+ base pairs cross-stack in such a way that the intercalated base pairs are rotated almost 90° from each other, which aligns only every other stack of pyrimidine rings. In the zipper-like motif, the stacking of intercalated purine rings and the bases of the flanking GCG sequences exhibits a small displacement of the base at each step, creating a sheared stack that winds its way around the duplex.

A zipper-like motif has recently been discovered in an RNA structure that specifically binds theophylline [46] and was named the ‘base zipper’. There are, however, several striking differences between this RNA motif and the DNA motif reported here. As the base-zipper motif forms the core of the theophylline-binding pocket, there is an extensive network of hydrogen-bonding contacts to this motif; in particular, one base of the motif makes three hydrogen bonds to the theophylline and this residue is responsible for discriminating theophylline from caffeine. The phosphodiester chains of the base-zipper motif are disposed differently as well, with the two chains separated as far apart as possible. This disposition of the phosphodiester chains is more reminiscent of the intercalated packing mode in cyclic ribo-diguanylic and cyclic deoxyribodiadenylic acids [47,48].

Duplex or hairpin?

The X-ray crystallographic evidence presented here for a duplex structure of the oligonucleotide d(GCGAAAGCT) is not necessarily contradictory to the hairpin model proposed by Hirao et al. [10]. Rather, the crystal structure is likely to represent the duplex form in a dynamic equilibrium between hairpin and duplex conformations. Hairpin↔duplex transitions have been studied and shown to exist for a number of deoxyribose oligonucleotides, especially double-stranded DNA sequences containing inverted repeats or palindromes [49–54]. Non-self-complementary sequences are also known to undergo hairpin↔duplex transitions [21,24,55–57], and this includes the formation of pH-dependent i-motifs [39,58–60].

Several parameters contribute to whether a given oligonucleotide folds into a hairpin structure or self-associates into a duplex. Most notably, these parameters include the nucleotide sequence, its length and its concentration (e.g. [24,57]), as well as salt concentration [55,56] and pH [39,44,58,59]. A detailed NMR study on variants of the d(NAATGNAATG) sequence (where N is either A, C, G or T) revealed that the oligonucleotides exist as differing equilibrium mixtures of single-residue hairpins and mismatched duplexes containing sheared G·A pairs and unpaired, intercalated bases [24]. The outcome depends upon the central GNA sequence: for N = C, the strand folds exclusively into a hairpin; for N = A or T, the mixture contains approximately 20% duplex; and for N = G the mixture is predominantly duplex. The preference of the N = G sequence to form duplexes, relative to the N = A sequence, which favours the hairpin form, is most probably a consequence of an additional guanine–phosphate hydrogen bond which contributes to the stability of the duplex form. Because adenine cannot form this particular hydrogen bond, it was reasoned that its duplex form is less stable. In the zipper-like motif, however, the corresponding adenine is in the same position as the guanine residue that interacts with the phosphate, and this suggests that the zipper-like motif is representative of the minor duplex form of d(NAATGNAATG) where N = A. In the NMR studies on the d(GCGAAGC) sequence [60] and its d(GCGNAGC) variants [61], which lack a central adenine compared with the sequence reported here, the NMR spectra did not demonstrate clear-cut duplex formation under experimental conditions.
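The dependence of the hairpin–duplex balance on oligonucleotide concentration follows from simple mass action, because a hairpin is unimolecular whereas a duplex requires two strands. The sketch below assumes an idealised two-state model, 2 H ⇌ D with K = [D]/[H]², and ignores salt, pH and any other species; the function name and the value of K are hypothetical and are not taken from the article or from the cited NMR studies.

```python
import math


def duplex_fraction(total_strand_conc: float, k_dimer: float) -> float:
    """Fraction of strands found in the duplex for the model 2 H <=> D.

    total_strand_conc: total strand concentration c = [H] + 2[D] (mol/L)
    k_dimer: assumed association constant K = [D] / [H]^2 (L/mol)
    """
    # Mass balance: c = h + 2*K*h^2, i.e. 2K*h^2 + h - c = 0.
    # Take the positive root of the quadratic for the free hairpin h.
    h = (-1.0 + math.sqrt(1.0 + 8.0 * k_dimer * total_strand_conc)) / (4.0 * k_dimer)
    return 1.0 - h / total_strand_conc


if __name__ == "__main__":
    K = 1.0e3  # hypothetical association constant, for illustration only
    for conc in (1e-5, 1e-4, 1e-3, 1e-2):
        print(f"{conc:.0e} M strands -> {duplex_fraction(conc, K):.2f} duplex")
```

Under this model the duplex fraction rises monotonically with strand concentration for any fixed K, which is the qualitative behaviour expected when a concentrated solution is compared with a dilute one.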
Reid and coworkers [23,24] suggested that the localised broadening in the spectra of the H1′ protons of the GAA loop of the d(GCGAAGC) sequence could just as well be due to a hairpin–duplex equilibrium as to the original interpretation of a small structural fluctuation caused by the wobbling of the single adenine loop residue. Although no duplex form was presented in the NMR studies on d(GCGAAAGC) [10], a hairpin–duplex equilibrium could very well exist for the d(GCGAAAGCT) sequence. As mentioned above, oligonucleotide and salt concentrations, along with the nucleotide sequence, are known to affect the relative hairpin–duplex populations of DNA oligomers. In general, the hairpin form is favoured at low oligomer and salt concentrations while the duplex form is favoured at high oligomer and salt concentrations (e.g. [55,56,62]). With this in mind, the appearance of hairpin or duplex structures of the d(GCGAAAGCT) fragment becomes clearer. The relatively high oligomer (and possibly salt) concentration in the crystallisation solution probably favours the duplex form, and crystallisation of the duplex form displaces the equilibrium as the duplex is depleted from the mother liquor. In principle, crystallisation of the hairpin form is possible, but it requires conditions in which the hairpin is favoured over the duplex and the duplex remains soluble during crystallisation.

Role of the cobalt hexammine cation

As stated earlier, the cobalt hexammine cation contributes more to crystal cohesion, as a result of bridging interactions between guanines of adjacent duplexes, than directly to duplex stability. The crystal stabilization effect of metal hexammine cations has been recognized in crystallizations of nucleic acid oligomers for some time [32,63–68], even though the cation itself may not be observed in the crystal structure [66]. In DNA, the cobalt hexammine cation has a well defined binding mode at the O6 or N7 sites on the major groove side of a guanine, often with an additional interaction to a phosphate oxygen [66,67,69,70]. In most cases, the cobalt hexammine sites cross-link between symmetry-related strands of the phosphate oxygens or guanine bases. Until recently [66], this cation has only been directly observed in association with DNA in its Z form, and in this particular case, the oligomer folds into an A DNA duplex. The only cobalt hexammine cation that was found not to bind to a guanine in a DNA crystal structure, instead binds to phosphate oxygens from adjacent duplexes [65]. In RNA structures, the cobalt hexammine is usually located in the major groove of the duplex, within a binding pocket built by tandem G·U base pairs [64,71], or linked in a similar way to a guanine step as observed in tRNA [72]. Similar binding modes are observed for other metal hexammines, such as osmium hexammine in RNA [64] or ruthenium hexammine in DNA [73]. The interest in cobalt hexammine-binding properties arises from the fact that it is known to promote the transition of either B to A or B to Z DNA conformations depending on the DNA sequence [63,67,68,74–77]. This cation is also effective in promoting the formation of the four-way (Holliday) junction [78]. In RNAs, the cobalt hexammine cation is known to mimic and substitute for hexahydrated magnesium cations, which are considered to be an important structural component in RNA molecules [64,71,72,79–81]. 
The use of these metal hexammine derivatives is thus a way of studying ribozyme catalytic behaviour [64]. The binding mode of cobalt hexammine in the structure presented here is similar to the most frequently observed binding mode where hydrogen-bonding interactions crosslink the O6 and N7 of a guanine step, except that three symmetry-related guanine bases are involved (Figure 4). Surprisingly, interactions with the surrounding phosphate groups are absent. The fact that the cobalt hexammine cation resides in a solvent channel and makes interduplex rather than intraduplex hydrogen bonds suggests that it does not affect the conformation of the zipper-like motif. Other structural studies on oligonucleotides also provide evidence for insignificant structural changes upon cobalt hexammine binding [66,71].

In light of the different structures analogous to the d(GCGAAAGCT) sequence, it would be interesting to find out if the zipper-like motif could extend beyond four intercalated and unpaired adenines, what factors would stabilise it over the hairpin form, and whether the motif is dependent upon the nucleotide sequence. At present, we are continuing research along these lines and, in particular, we have recently collected X-ray data to 1.7 Å resolution on isomorphous crystals containing iridium hexammine, which will provide a more precise and detailed structure to be presented elsewhere.

DNA secondary structure elements in replication origins and viruses

In the ssDNA bacteriophages α3, φK, G4 and St-1, the replication origins contain regions of significant nucleotide conservation that have been proposed, on the basis of sequence analysis, to fold into successive hairpin loops. These models are based primarily upon the formation of Watson–Crick base pairs, without considering other possible forms of DNA secondary structural elements. It is worth noting that in the vicinity of the replication origins of these bacteriophages, there exist several GA and GAA stretches which could potentially fold back on to each other and form intercalated or zipper-like motifs [3]. In particular, the replication origins of α3, φK and St-1 contain a tandem repeat with the GGAA consensus separated by only three residues. Tandem repeats of GRnA runs (where R is usually G or A) are also prevalent in the proposed replication origins of phages λ and φ80 [82,83], and of E. coli [84,85]. In phage λ, three GRnA tandem repeats exist ([GAAAA]2, [GAGGGA]2 and [GGGGGA]2) [82]; in phage φ80, there are two GRnA tandem repeats ([GAAA]2 and [GAACA]2); and E. coli has only one GRnA tandem repeat ([GAATGA]2). Curiously, a GRnA tandem repeat occurs in or near the bubble in the stem of a proposed hairpin structure for these three cases [3]. Although ssDNA or dsDNA replication origins have the potential to fold into hairpin or cruciform structures, whether they do so in vivo still remains to be determined.

The 3′ terminus of parvoviruses is a highly conserved sequence of 115 or 116 nucleotides which has been proposed to fold into a Y-shaped hairpin structure and which is also an origin for DNA replication [4,86]. There exists in the main stem of the Y-shaped hairpin a GAA:GA ‘bubble’ which is close to the initiation site of DNA replication and is resistant to mung bean endonuclease [4]. An unpaired adenine base could be stacked between the guanine bases of two bracketing G·A mismatches, as in the zipper-like motif, and this could explain the endonuclease resistance of the GAA:GA bubble. Indeed, such a motif has been proposed on the basis of unpublished NMR studies by Chou et al. [27].

The zipper-like motifs and their transitions into hairpin structures might have an important role in viral replication. Recently, hairpin↔duplex transitions have been implicated in the initiation of DNA replication at palindromic telomeres of the minute virus of mice (MVM), a parvovirus [87]. Amplification of the replication form intermediate is initiated by the folding back of palindromic sequences which act as primers for strand-displacement synthesis. In fact, hairpin↔duplex transitions would aid parvovirus DNA replication where the folding and unfolding of palindromic sequences is necessary in the rolling hairpin model [86,88]. Furthermore, mismatched nucleotides of a bubble in the 5′-terminal hairpin of MVM were found to be critical for growth of the virus, but the nucleotide sequence was not critical [89]. The results suggest that the replacement of the bubble with full base pairs impairs the conversion of the hairpin 5′ termini to the extended form. This hints at the possibility of an unusual DNA secondary structure having a role in viral DNA replication.

The zipper-like motif also provides a model for how filamentous inoviruses, which contain circular ssDNA, can package their genome. In the case of the inovirus Pf1, the interior cross-section diameter of the filamentous protein shell is too narrow to package standard B-DNA [90–92]. To account for the genome packaging into the confined space of the interior of the Pf1 inovirus, models of a duplex with intercalated and unpaired bases were put forward on the basis of fibre diffraction studies, energy minimisation calculations and small-molecule crystallography [90,92,93].
In these models, however, it was reasoned that the negative charge on the phosphate groups would induce the two phosphodiester chains to separate as far apart as possible. The zipper-like motif shows that this restriction is not necessary, and that an extended zipper-like DNA conformation could act as a suitable model for the packaging of the genome of inoviruses.

GAAA tandem repeats in centromeric DNA

Centromeric DNA of higher eukaryotes is characterised by large arrays of relatively short tandem repeats of untranscribed DNA which is associated with the kinetochore and is functionally important during mitosis and meiosis (for a review see [94]). Although the precise function of the tandem repeats in centromeres is obscure, it is thought that they are involved in the formation of a higher-order structural complex at the kinetochore [15,20,95]. A variety of centromeric DNA contains GAAA stretches or related sequences. (GGAAT)n repeats have been found to be highly conserved in the centromeres of humans and other higher eukaryotes [15]. The GAAA stretch is also found in the 17 bp CENP-B box which binds the CENP-B protein located in the centromere region beneath the kinetochore [18,96–99]. In the centromeres of fission yeast, which have been extensively studied as a model for higher eukaryote centromeres (for a review see [99]), the dg and dh repeat sequences and the central sequence are moderately rich in GAAA stretches [16,17,95].

Studies on centromere-like oligonucleotides have revealed that many of these sequences exhibit high thermal stability when separate from their complementary strands [15,21,100]. Unusual secondary structures, such as mismatched duplexes and fold-back structures containing G·A pairs and unpaired bases, have been implicated as being responsible for the exceptional stability of these ssDNA analogues in vitro [21–23,33]. In view of the nucleotide sequence similarities, the zipper-like motif presented here is probably one member of a family of unusual secondary structures in centromere-like DNA. Nevertheless, the existence of these unusual secondary structures in vivo, and whether they have a role in centromere function, depends upon the separation of the two complementary strands in dsDNA into ssDNA, which has yet to be determined. It is interesting to note, however, that the chromatin structure of the central core of centromeres from fission yeast has been reported to be unusual, with no evidence of regular nucleosomal packaging [94,95]. Furthermore, when the central core is spliced into a non-functional environment, the sequences are packaged normally into nucleosomes [94]. This has led to the suggestion that the unusual organisation of the core region is central to a higher-order structural complex that distinguishes the centromere from the chromatin arms.
The basis for this might lie in unusual secondary structures of ssDNA which have been separated from their complementary strand by preferential binding with specialized nuclear proteins — a phenomenon observed with the centromeric dodeca-satellite DNA sequences from Drosophila [100]. Such unusual ssDNA secondary structures, controlled via the binding of the complementary strand, would open up the possibility of hairpins and mismatched duplexes containing zipper-like and [GGA]2 motifs with stacks of unpaired bases, and these structures would constitute a new family of DNA-binding sites available for interaction with proteins, such as those that make up the kinetochore complex.
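The tandem repeats discussed above, such as [GAAA]2 or (GGAAT)n, can be located in a sequence with a simple pattern search. The following is a minimal sketch only; the function name `find_tandem_repeats` and the example sequence are hypothetical and are not taken from any genome or from the original analysis.

```python
import re

def find_tandem_repeats(sequence, unit, min_copies=2):
    """Return (start_index, copy_count) for each run of at least
    min_copies consecutive copies of `unit` in `sequence`."""
    # Non-capturing group repeated min_copies or more times.
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), copies))
    return hits

# Hypothetical test sequence containing [GAAA]2 and (GGAAT)3.
example = "TTGAAAGAAACCGGAATGGAATGGAATAC"
gaaa_hits = find_tandem_repeats(example, "GAAA")
ggaat_hits = find_tandem_repeats(example, "GGAAT")
```

Such a search finds only exact, uninterrupted copies; repeats separated by a few residues, like the GGAA consensus mentioned for α3, φK and St-1, would need a pattern that allows a spacer.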

Biological implications

The fragment d(GCGAAAGCT) of the replication origin of bacteriophage G4 was previously proposed to form a hairpin [10]. In this paper, the fragment is shown instead to self-associate into a duplex containing a zipper-like motif, a novel secondary structure element in DNA. The intercalation and stacking of the central, unpaired adenines between two sheared G·A base pairs is a stable conformation that provides a way for noncomplementary strands of DNA to form a duplex. As such, it should be considered as a structural option when constructing models of the single-stranded (ss)DNA secondary structures of biological systems that contain tandem repeats of GAAA stretches or purine tracts. In particular, the replication origins of many bacteriophages contain tandem repeats of GRnA tracts often located in the stems of proposed hairpins [3]. Such repeats could provide an alternative to Watson–Crick fold-back structures in the form of a zipper-like duplex containing an array of stacked, unpaired purines. Furthermore, the exposed nature of the adenines in the zipper-like motif could allow proteins to bind to their free Watson–Crick face, and would constitute a new family of DNA-binding sites. The intercalated and stacked nature of the zipper-like motif also provides a model for the packaging of the circular ssDNA genome into the restricted space of the interior of filamentous inoviruses [90–93]. The polymorphism exhibited by many DNA sequences, such as the one studied here, is likely to be an important factor in the initiation of DNA replication in parvoviruses [86–88], where hairpin↔duplex transitions at the telomeres have been implicated in the rolling hairpin model. Indeed, Chou et al. [27] have proposed a zipper-like motif at the 3′ terminus of the parvovirus genome on the basis of unpublished NMR studies of the GAA:GA bubble, which is resistant to mung bean endonuclease. Similarly, the endonuclease resistance shown by other oligonucleotides could be explained by the formation of zipper-like motifs in mismatched duplexes. The GAAA consensus and similar analogues occur in a number of repetitive sequences of centromeric DNA [15–18,94–99]. Although the function of these tandem repeats in mitosis and meiosis is obscure, there has been some speculation, on the basis of their exceptional thermal stability when separated from the complementary strand, that they have a structural role in the formation of unusual secondary structure. The existence of unusual secondary structure, such as hairpins or zipper-like duplexes, in vivo has yet to be determined, however.

Materials and methods

Synthesis of the nonamer

The nonanucleotide was synthesised by automated methods with an Applied Biosystems 391 synthesiser and phosphoramidite monomers obtained from Millipore (a 5-bromocytosine was introduced at position 2). The product was hydrolysed with 3 M aqueous ammonia, washed with ether and precipitated with ethanol. Purification was by reverse-phase HPLC.

Crystallisation

The nonamer d(GCGAAAGCT) (1 mg) was dissolved in 0.4 ml lithium cacodylate buffer (0.05 M). To this solution, 3 µl of 0.1 M NH4OH, 15 µl of a 10 mM cobalt hexammine chloride solution and 8 µl of 50% methylpentanediol (MPD) solution were added. The solution was allowed to equilibrate at 4°C by vapour diffusion against a reservoir containing a solution of similar composition except with MPD at a higher concentration (25%). Small yellowish hexagonal crystals developed in one or two weeks. They are very stable at 4°C (diffracting well even after two years) but much less so at room temperature.
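As a rough guide to the starting composition of the crystallisation solution, the final concentrations can be estimated by simple dilution. The sketch below assumes that the listed volumes were combined into a single mixture and that volumes are additive; neither assumption is stated in the text, and the variable and function names are illustrative only.

```python
# Volumes (in microlitres) of each component, as quoted in the text.
volumes_ul = {
    "DNA in 0.05 M lithium cacodylate": 400.0,
    "0.1 M NH4OH": 3.0,
    "10 mM cobalt hexammine chloride": 15.0,
    "50% MPD": 8.0,
}

def final_concentration(stock, volume_ul, total_ul):
    """Dilution rule: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_ul

# Assumption: volumes are additive on mixing.
total_ul = sum(volumes_ul.values())

cobalt_mM = final_concentration(10.0, 15.0, total_ul)    # stock in mM
mpd_percent = final_concentration(50.0, 8.0, total_ul)   # stock in %
nh4oh_mM = final_concentration(100.0, 3.0, total_ul)     # 0.1 M = 100 mM
dna_mg_per_ml = 1.0 / (total_ul / 1000.0)                # 1 mg of nonamer

print(f"total volume: {total_ul:.0f} ul")
print(f"cobalt hexammine: {cobalt_mM:.2f} mM, MPD: {mpd_percent:.2f} %")
print(f"NH4OH: {nh4oh_mM:.2f} mM, DNA: {dna_mg_per_ml:.2f} mg/ml")
```

Under these assumptions the mixture starts at roughly 1% MPD and equilibrates against the 25% MPD reservoir.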

Data collection and processing

Despite many attempts, it was not possible to solve the structure using molecular replacement methods with either a canonical B model or starting from different known B-DNA structures. Consequently, MAD data were collected about the bromine K-edge at four different wavelengths. Both of the crystals used were crystallographically aligned: one with the c* axis parallel and the other with the c* axis perpendicular to the spindle. The X-ray wavelengths were chosen from the X-ray fluorescence spectra of the bromine K-edge recorded directly from the crystals in order to optimise the anomalous dispersion effects from the bromine atom (Table 2). The MAD data were recorded with an 18 cm diameter image plate system on the new experimental station DW21b, situated at the end station of the LURE-DCI wiggler, Orsay, France. The crystal-detector distance was 210 mm and rotation frames of 2° were used for data collection. Exposure times per frame were 180–300 s. Diffraction images and data were processed using MOSFLM, SCALA and the CCP4 suite of programs [101]. Data were put on to a common scale in two steps: internal scaling was applied to the data of each wavelength of each crystal to correct for sample decay and incident beam fluctuations; and then data at all wavelengths were scaled as a smoothly varying function along the detector and spindle position utilising the long wavelength remote (lowest f″) as a reference. Scale factors between the two crystals were determined at the heavy-atom refinement stage.

Following the structure determination, a high-resolution data set was recorded on the large MAR Research image plate system (diameter 300 mm) available at the W32 wiggler beam line at LURE. The nominal resolution was 2.1 Å (for a ratio I/σ(I) = 4 in the last data shell 2.2–2.1 Å). This data set comprises 1832 independent structure factors over a total of 21,345 measured reflections (redundancy 11.6; Rsym 5.3%).
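The redundancy quoted for the high-resolution data set follows directly from the reflection counts, and a merging R value of the Rsym type can be computed from repeated intensity measurements. The sketch below uses the conventional definition Rsym = ΣhklΣi |Ii − ⟨I⟩| / ΣhklΣi Ii, which is an assumption here and may differ in detail from the expression used by SCALA or given with Table 2; the intensity values are invented purely for illustration.

```python
from collections import defaultdict

def redundancy(n_measured, n_unique):
    """Average number of measurements per unique reflection."""
    return n_measured / n_unique

def r_sym(observations):
    """Conventional merging R value.

    observations: iterable of (hkl, intensity) pairs, where hkl is a
    tuple of Miller indices already reduced to the asymmetric unit.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = 0.0
    denominator = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        numerator += sum(abs(v - mean) for v in values)
        denominator += sum(values)
    return numerator / denominator

# Counts quoted in the text for the 2.1 A data set.
high_res_redundancy = redundancy(21345, 1832)

# Invented toy measurements (not experimental data).
toy = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
       ((0, 1, 2), 50.0), ((0, 1, 2), 50.0)]
toy_r_sym = r_sym(toy)
```

The first value reproduces the stated redundancy of about 11.6 once rounded.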

Phasing at the bromine edge

Anomalous Patterson maps were calculated for each individual wavelength and crystal [102,103], as well as for the set of |°F_A| values determined from the algebraic equations implemented in the program MADLSQ [104]. This allowed us to follow the evolution of the Patterson peaks in the maps with respect to the change in wavelength. Only one distinct set of peaks appears in the Patterson maps. The corresponding site and its enantiomorph were entered as the bromine atom of the cytosine for refinement using the program SHARP [105], followed by solvent flattening with the program SOLOMON [106]. The enantiomorph was chosen by inspecting the electron-density maps; the correct one was clearly evident, showing the macromolecular boundary and the stacking of bases along the c axis. Eight of the nine bases were easily identified and could be fitted into density, but tracing of the phosphate backbone was unclear at certain points. Along the crystallographic threefold axis, however, a cobalt hexammine was readily identified as it is known to be necessary for the growth of the crystals. After including this cobalt and its anomalous scattering as an additional ‘fragment’ into the second round of heavy-atom refinement with SHARP, the tracing of the entire DNA molecule including the phosphate backbone became straightforward. The exceptional quality of the map is such that the zipper-like motif became unambiguous and even some structural water molecules could be clearly seen. The initial model was built using an interactive graphics interface.

Table 2

Crystal data. Crystal 1: c* perpendicular to spindle; crystal 2: c* parallel to spindle.

| Parameter | Crystal 1, short λ remote | Crystal 1, peak | Crystal 1, edge | Crystal 1, long λ remote | Crystal 2, short λ remote | Crystal 2, peak | Crystal 2, edge | Crystal 2, long λ remote |
|---|---|---|---|---|---|---|---|---|
| Wavelength (Å) | 0.9119 | 0.9191 | 0.9200 | 0.9536 | 0.9119 | 0.9191 | 0.9196 | 0.9535 |
| Resolution limits (Å) | 33.4–2.43 | 33.4–2.43 | 33.4–2.43 | 33.4–2.45 | 28.9–2.33 | 28.9–2.36 | 28.9–2.36 | 28.9–2.46 |
| Nobs | 15,433 | 16,218 | 15,153 | 13,653 | 14,526 | 11,182 | 10,929 | 6357 |
| Nuniq | 1299 | 1297 | 1297 | 1261 | 1287 | 1231 | 1208 | 1076 |
| Redundancy | 11.9 | 12.5 | 11.7 | 10.8 | 11.3 | 9.1 | 9.0 | 5.9 |
| Completeness (%) | 100 | 100 | 100 | 97.2 | 96.6 | 92.2 | 90.3 | 79.8 |
| I > 3σI (%) | 90.2 | 93.1 | 89.6 | 92.2 | 91.0 | 88.7 | 88.1 | 87.4 |
| Rsym* | 0.045 | 0.037 | 0.048 | 0.034 | 0.052 | 0.054 | 0.052 | 0.048 |
| Ranom† | 0.052 | 0.056 | 0.029 | 0.020 | 0.051 | 0.056 | 0.034 | 0.025 |
| RCullis (centrics)‡ | 0.893 | 0.611 | 0.736 | 0.710 | 0.744 | 0.663 | 0.646 | 0.707 |
| Dispersive phasing power (centrics/acentrics)§ | 2.35/3.65 | 1.92/3.25 | 1.11/3.02 | 1.15/2.90 | 1.47/2.49 | 1.89/3.34 | 2.02/3.11 | 1.47/2.91 |
| Anomalous phasing power (acentrics) | 4.84 | 4.30 | 2.74 | 1.93 | 2.52 | 4.83 | 2.66 | 1.85 |
| Theoretical f′/f″ ¶ | –4.122/3.752 | –6.380/3.814 | –9.634/2.163 | –2.811/0.543 | –4.122/3.752 | –6.380/3.814 | –9.634/2.163 | –2.977/0.536 |
| Experimentally refined f′/f″ # | –4.122/3.752 | –10.931/8.186 | –14.687/2.913 | –2.811/0.543 | –4.122/3.752 | –13.604/4.561 | –9.486/1.952 | –2.977/0.536 |

The crystal is in space group P6₃22 with unit-cell dimensions a = b = 37.56 Å and c = 65.39 Å. There is one strand per asymmetric unit, corresponding to a volume of 739 Å³/residue; the calculated solvent content is approximately 36%.
*Rsym = ΣhklΣi |Ii – 〈I+/–〉| / Σhkl 〈I+/–〉.
†Ranom = Σhkl |〈I+〉 – 〈I–〉| / Σhkl (〈I+〉 + 〈I–〉) for I+ and I– on the same or adjacent images.
‡RCullis = 〈phase-integrated lack of closure〉 / 〈|FPH – FP|〉.
§Phasing power = 〈{|FH(calc)| / phase-integrated lack of closure}〉.
¶Theoretical f′ and f″ values estimated from [109].
#Experimentally refined values of f′ and f″ were calculated during the heavy-atom refinement procedure in SHARP. The f′ and f″ values were held constant for wavelengths remote from the Br K-edge, and they are strongly correlated with other parameters in the heavy-atom refinement (e.g. occupancy, thermal factors, scaling between crystals, resolution, etc).
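The volume per residue quoted with the crystal data for Table 2 can be checked from the unit-cell dimensions. The sketch below assumes the standard volume formula for a hexagonal cell, V = a²c sin 120°, and the 12 general equivalent positions of space group P6₃22; the variable names are illustrative only.

```python
import math

def hexagonal_cell_volume(a, c):
    """Volume of a hexagonal unit cell (gamma = 120 degrees), in cubic angstroms."""
    return a * a * c * math.sin(math.radians(120.0))

a, c = 37.56, 65.39          # unit-cell dimensions in angstroms (Table 2)
asymmetric_units = 12        # general positions in P6(3)22
residues_per_strand = 9      # one nonamer strand per asymmetric unit

cell_volume = hexagonal_cell_volume(a, c)
volume_per_strand = cell_volume / asymmetric_units
volume_per_residue = volume_per_strand / residues_per_strand

print(f"cell volume:        {cell_volume:.0f} A^3")
print(f"volume per strand:  {volume_per_strand:.0f} A^3")
print(f"volume per residue: {volume_per_residue:.1f} A^3")
```

The result, close to 740 Å³ per residue, agrees with the value of 739 Å³/residue given with the crystal data.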

Refinement of the structure

The refinement was performed with the data set recorded at high resolution (2.1 Å). Beyond the last phosphate group (position 9) the electron density is too feeble to build the last base, a thymine, which is disordered as it points towards a threefold axis. A first round of refinement was conducted with XPLOR [107] at the maximum resolution and then with the SHELX97 program [108]. Water molecules were introduced during the course of the refinement following electron-density map inspections on a graphics system. The final model comprises 185 atoms (including the disordered thymine), a third of a cobalt hexammine cation, a chloride ion and 18 water molecules. The R factor is 18.9% for all data (1832 structure factors); the corresponding Rfree is 24.1%. The root mean square (rms) deviations of bond distances and angles are 0.015 Å and 2.4°, respectively.
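For readers less familiar with the refinement statistics, the R factor compares observed and calculated structure-factor amplitudes. The sketch below uses the conventional definition R = Σ ||Fo| − |Fc|| / Σ |Fo|; Rfree is the same quantity evaluated over a set of reflections excluded from refinement. This is an illustration of the standard formula, not the internal procedure of XPLOR or SHELX97, and the amplitudes are invented.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R factor for paired amplitude lists."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

# Invented amplitudes for illustration only (not experimental data).
toy_f_obs = [100.0, 50.0, 25.0]
toy_f_calc = [90.0, 55.0, 25.0]
toy_r = r_factor(toy_f_obs, toy_f_calc)
```

Applied to a working set and a held-out test set, the same function would give values comparable in kind to the reported R factor and Rfree.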

Accession numbers

The coordinates of the d(GCGAAAGCT) structure have been deposited with the Nucleic Acid Data Bank (accession code UDIB70) and are on hold for one year.

Acknowledgements

We thank F Fossard, C Arrachart and C Maman for assistance during data collection and processing. We are also especially grateful to D Ragonnet and D Chandesris for aid and support on the DW21 beam line. We are grateful to Olga Kennard and the members of the CCDC laboratory (Cambridge, UK) where WBTC initiated this project.

References 1. Fiddes, J.C., Barrell, B.G. & Godson, G.N. (1978). Nucleotide sequences of the separate origins of synthesis of bacteriophage G4 viral and complementary DNA strands. Proc. Natl Acad. Sci. USA 75, 1081-1085. 2. Sims, J. & Dressler, D. (1978). Site-specific initiation of a DNA fragment: nucleotide sequence of the bacteriophage G4 negativestrand initiation site. Proc. Natl Acad. Sci. USA 75, 3094-3098. 3. Sims, J., Capon, D. & Dressler, D. (1979). DnaG (primase)-dependent origins of DNA replication. Nucleotide sequences of the negative strand initiation sites of bacteriophages St-1, phi K, and alpha 3. J. Biol. Chem. 254, 12615-12628. 4. Astell, C.R., Smith, M., Chow, M.B. & Ward, D.C. (1979). Structure of the 3′ hairpin termini of four rodent parvovirus genomes: nucleotide sequence homology at origins of DNA replication. Cell 17, 691-703. 5. Hirao, I., Ishida, M., Watanabe, K. & Miura, K. (1990). Unique hairpin structures occurring at the replication origin of phage G4 DNA. Biochim. Biophys. Acta 1087, 199-204. 6. Yoshizawa, S., Ueda, T., Ishido, Y., Miura, K., Watanabe, K. & Hirao I. (1994). Nuclease resistance of an extraordinarily thermostable minihairpin DNA fragment, d(GCGAAGC) and its application to in vitro protein synthesis. Nucleic Acids Res. 22, 2217-2221. 7. Hirao, I., Naraoka, T., Kanamori, S., Nakamura, M. & Miura, K. (1988). Synthetic oligodeoxyribonucleotides showing abnormal mobilities on polyacrylamide gel electrophoresis. Biochem. Int. 16, 157-162. 8. Tanikawa, J., Nishimura, Y., Hirao, I. & Miura, K. (1991). NMR spectroscopic study of single-stranded DNA fragments of d(CGGCGAAAGCCG) and d(CGGCAAAAGCCG). Nucleic Acids Symp. Ser. 25, 47-48. 9. Hirao, I., Nishimura, Y., Tagawa, Y., Watanabe, K. & Miura, K. (1992). Extraordinarily stable mini-hairpins: electrophoretical and thermal properties of the various sequence variants of d(GCGAAAGC) and their effect on DNA sequencing. Nucleic Acids Res. 20, 3891-3896. 10. 
Hirao, I., Nishimura, Y., Naraoka, T., Watanabe, K., Arata, Y. & Miura, K. (1989). Extraordinary stable structure of short single-stranded DNA fragments containing a specific base sequence: d(GCGAAAGC). Nucleic Acids Res. 17, 2223-2231. 11. Woese, C.R., Winker, S. & Gutell, R.R. (1990). Architecture of ribosomal RNA: constraints on the sequence of “tetra-loops”. Proc. Natl Acad. Sci. USA 87, 8467-8471.


12. Antao, V.P., Lai, S.Y. & Tinoco, I. Jr (1991). A thermodynamic study of unusually stable RNA and DNA hairpins. Nucleic Acids Res. 19, 5901-5905. 13. Heus, H.A. & Pardi, A. (1991). Structural features that give rise to the unusual stability of RNA hairpins containing GNRA loops. Science 253, 191-194. 14. Jucker, F.M., Heus, H.A., Yip, P.F., Moors, E.H. & Pardi, A. (1996). A network of heterogeneous hydrogen bonds in GNRA tetraloops. J. Mol. Biol. 264, 968-980. 15. Grady, D.L., Ratliff, R.L., Robinson, D.L., McCanlies, E.C., Meyne, J. & Moyzis, R.K. (1992). Highly conserved repetitive DNA sequences are present at human centromeres. Proc. Natl Acad. Sci. USA 89, 1695-1699. 16. Nakaseko, Y., Adachi, Y., Funahashi, S., Niwa, O. & Yanagida, M. (1986). Chromosome walking shows a highly homologous repetitive sequence present in all the centromere regions of fission yeast. EMBO J. 5, 1011-1021. 17. Nakaseko, Y., Kinoshita, N. & Yanagida, M. (1987). A novel sequence common to the centromere regions of Schizosaccharomyces pombe chromosomes. Nucleic Acids Res. 15, 4705-4715. 18. Masumoto, H., Masukata, H., Muro, Y., Nozaki, N. & Okazaki, T. (1989). A human centromere antigen (CENP-B) interacts with a short specific sequence in alphoid DNA, a human centromeric satellite. J. Cell Biol. 109, 1963-1973. 19. Wevrick, R., Earnshaw, W.C., Howard-Peebles, P.N. & Willard, H.F. (1990). Partial deletion of alpha satellite DNA associated with reduced amounts of the centromere protein CENP-B in a mitotically stable human chromosome rearrangement. Mol. Cell Biol. 10, 6374-6380. 20. Willard, H.F. (1990). Centromeres of mammalian chromosomes. Trends Genet. 6, 410-416. 21. Catasti, P., et al., & Bradbury, E.M. (1994). Unusual structures of the tandem repetitive DNA sequences located at human centromeres. Biochemistry 33, 3819-3830. 22. Chou, S.H., Zhu, L. & Reid, B.R. (1994). The unusual structure of the human centromere (GGA)2 motif. 
Unpaired guanosine residues stacked between sheared G·A pairs. J. Mol. Biol. 244, 259-268. 23. Zhu, L., Chou, S.H. & Reid, B.R. (1995). The structure of a novel DNA duplex formed by human centromere d(TGGAA) repeats with possible implications for chromosome attachment during mitosis. J. Mol. Biol. 254, 623-637. 24. Chou, S.H., Zhu, L. & Reid, B.R. (1996). On the relative ability of centromeric GNA triplets to form hairpins versus self-paired duplexes. J. Mol. Biol. 259, 445-457. 25. Gilbert, D.E. & Feigon, J. (1991). Structural analysis of drug–DNA interactions. Curr. Opin. Struct. Biol. 1, 439-445. 26. Cruse, W.B.T., Saludjian, P., Leroux, Y., Léger, G., El Manouni, D. & Prangé, T. (1996). A continuous transition from A to B DNA in the 1:1 complex between nogalamycin and the hexamer dCCCGGG. J. Biol. Chem. 271, 15558-15566. 27. Chou, S.H., Zhu, L. & Reid, B.R. (1997). Sheared purine x purine pairing in biology. J. Mol. Biol. 267, 1055-1067. 28. Pley, H.W., Flaherty, K.M. & McKay, D. (1994). The three-dimensional structure of a hammerhead ribozyme. Nature 372, 68-74. 29. Baeyens, K.J., De Bondt, H.L., Pardi, A. & Holbrook, S.R. (1996). A curved RNA helix incorporating an internal loop with G-A and A-A non Watson–Crick base pairing. Proc. Natl Acad. Sci. USA 93, 12851-12855. 30. Li, Y., Zon, G. & Wilson, W.D. (1991). Thermodynamics of DNA duplexes with adjacent G·A mismatches. Biochemistry 30, 7566-7572. 31. Aboul-ela, F., Koh, D., Tinoco, I. Jr & Martin, F.H. (1985). Base-base mismatches. Thermodynamics of double helix formation for dCA3XA3G + dCT3YT3G (X, Y = A,C,G,T). Nucleic Acids Res. 13, 4811-4824. 32. Cruse, W.B.T., Saludjian, P., Biala, E., Strazewski, P., Prangé, T. & Kennard, O. (1994). Structure of a mispaired RNA double helix at 1.6 Å resolution and implications for the prediction of RNA secondary structure. Proc. Natl Acad. Sci. USA 91, 4160-4164. 33. Zhu, L., Chou, S.H., Xu, J. & Reid, B.R. (1995). 
Structure of a singlecytidine hairpin loop formed by the DNA triplet GCA. Nat. Struct. Biol. 2, 1012-1017. 34. Gehring, K., Leroy, J.L. & Gueron, M. (1993). A tetrameric DNA structure with protonated cytosine·cytosine base pairs. Nature 363, 561-565. 35. Ahmed, S. & Henderson, E. (1992). Formation of novel hairpin structures by telomeric C-strand oligonucleotides. Nucleic Acids Res. 20, 507-511. 36. Ahmed, S., Kintanar, A. & Henderson, E. (1994). Human telomeric Cstrand tetraplexes. Nat. Struct. Biol. 1, 83-88.


37. Chen, L., Cai, L., Zhang, X. & Rich, A. (1994). Crystal structure of a fourstranded intercalated DNA: d(C4). Biochemistry 33, 13540-13546. 38. Kang, C.H., Berger, I., Lockshin, C., Ratliff, R., Moyzis, R. & Rich, A. (1994). Crystal structure of intercalated four-stranded d(C3T) at 1.4 Å resolution. Proc. Natl Acad. Sci. USA 91, 11636-11640. 39. Rohozinski, J., Hancock, J.M. & Keniry, M.A. (1994). Polycytosine regions contained in DNA hairpin loops interact via a four-stranded, parallel structure similar to the i-motif. Nucleic Acids Res. 22, 4653-4659. 40. Kang, C., Berger, I., Lockshin, C., Ratliff, R., Moyzis, R. & Rich, A. (1995). Stable loop in the crystal structure of the intercalated fourstranded cytosine-rich metazoan telomere. Proc. Natl Acad. Sci. USA 92, 3874-3878. 41. Berger, I., Kang, C., Fredian, A., Ratliff, R., Moyzis, R. & Rich, A. (1995). Extension of the four-stranded intercalated cytosine motif by adenine.adenine base pairing in the crystal structure of d(CCCAAT). Nat. Struct. Biol. 2, 416-425. 42. Leroy, J.L. & Gueron, M. (1995). Solution structures of the i-motif tetramers of d(TCC), d(5methylCCT) and d(T5methylCC): novel NOE connections between amino protons and sugar protons. Structure 3, 101-120. 43. Nonin, S. & Leroy, J.L. (1996). Structure and conversion kinetics of a bi-stable DNA i-motif: broken symmetry in the [d(5mCCTCC)]4 tetramer. J. Mol. Biol. 261, 399-414. 44. Nonin, S., Phan, A.T. & Leroy, J.L. (1997). Solution structure and base pair opening kinetics of the i-motif dimer of d(5mCCTTTACC): a noncanonical structure with possible roles in chromosome stability. Structure 5, 1231-1246. 45. Gallego, J., Chou, S.H. & Reid, B.R. (1997). Centromeric pyrimidine strands fold into an intercalated motif by forming a double hairpin with a novel T:G:G:T tetrad: solution structure of the d(TCCCGTTTCCA) dimer. J. Mol. Biol. 273, 840-856. 46. Zimmermann, G.R., Jenison, R.D., Wick, C.L., Simorre, J.P. & Pardi, A. (1997). 
Interlocking structural motifs mediate molecular discrimination by a theophylline-binding RNA. Nat. Struct. Biol. 4, 644-649. 47. Frederick, C.A., Coll, M., van der Marel, G·A., van Boom, J.H. & Wang, A.H. (1988). Molecular structure of cyclic deoxydiadenylic acid at atomic resolution. Biochemistry 27, 8350-8361. 48. Guan, Y., Gao, Y.G., Liaw, Y.C., Robinson, H. & Wang, A.H. (1993). Molecular structure of cyclic diguanylic acid at 1 Å resolution of two crystal forms: self-association, interactions with metal ion/planar dyes and modeling studies. J. Biomol. Struct. Dyn. 11, 253-276. 49. Wemmer, D.E., Chou, S.H., Hare, D.R. & Reid, B.R. (1985). Duplex–hairpin transitions in DNA: NMR studies on CGCGTATACGCG. Nucleic Acids Res. 13, 3755-3772. 50. Rinkel, L.J., van der Marel, G.A., van Boom, J.H. & Altona, C. (1987). Influence of N6-methylation of residue A(5) on the conformational behaviour of d(C-C-G-A-A-T-T-C-G-G) in solution studied by 1HNMR spectroscopy. 2. The hairpin form. Eur. J. Biochem. 163, 287-296. 51. Miller, M., et al., & Sussman, J.L. (1987). Conformational transitions of synthetic DNA sequences with inserted bases, related to the dodecamer d(CGCGAATTCGCG). Nucleic Acids Res. 15, 3877-3890. 52. Hald, M., Pedersen, J.B., Stein, P.C., Kirpekar, F. & Jacobsen, J.P. (1995). A comparison of the hairpin stability of the palindromic d(CGCG(A/T)4CGCG) oligonucleotides. Nucleic Acids Res. 23, 4576-4582. 53. Boulard, Y., et al., & Fazakerley, G.V. (1991). The solution structure of a DNA hairpin containing a loop of three thymidines determined by nuclear magnetic resonance and molecular mechanics. Nucleic Acids Res. 19, 5159-5167. 54. Summers, M.F., Byrd, R.A., Gallo, K.A., Samson, C.J., Zon, G. & Egan, W. (1985). Nuclear magnetic resonance and circular dichroism studies of a duplex—single-stranded hairpin loop equilibrium for the oligodeoxyribonucleotide sequence d(CGCGATTCGCG). Nucleic Acids Res. 13, 6375-6386. 55. Garcia, A.E., Gupta, G., Sarma, M.H. 
& Sarma, R.H. (1988). Stability and motion of a hairpin and the corresponding mismatched duplex: a theoretical exploration using molecular mechanics and normal mode analysis of 2D NMR results on d(GCCGCAGC). J. Biomol. Struct. Dyn. 6, 525-542. 56. Garcia, A.E., Gupta, G., Soumpasis, D.M. & Tung, C.S. (1990). Energetics of the hairpin to mismatched duplex transition of d(GCCGCAGC) on NaCl solution. J. Biomol. Struct. Dyn. 8, 173-186. 57. Santhana-Mariappan, S.V., et al., & Gupta, G. (1996). Solution structures of the individual single strands of the fragile X DNA triplets (GCC)n & (GGC)n. Nucleic Acids Res. 24, 784-792.

58. Catasti, P., Chen, X., Deaven, L.L., Moyzis, R.K., Bradbury, E.M. & Gupta, G. (1997). Cytosine-rich strands of the insulin minisatellite adopt hairpins with intercalated cytosine+·cytosine pairs. J. Mol. Biol. 272, 369-382. 59. Singh, S., Patel, P.K. & Hosur, R.V. (1997). Structural polymorphism and dynamism in the DNA segment GATCTTCCCCCCGGAA: NMR investigations of hairpin, dumbbell, nicked duplex, parallel strands, and i-motif. Biochemistry 36, 13214-13222. 60. Hirao, I., et al., & Miura, K. (1994). Most compact hairpin-turn structure exerted by a short DNA fragment, d(GCGAAGC) in solution: an extraordinarily stable structure resistant to nucleases and heat. Nucleic Acids Res. 22, 576-582. 61. Yoshizawa, S., Kawai, G., Watanabe, K., Miura, K. & Hirao, I. (1997). GNA trinucleotide loop sequences producing extraordinarily stable DNA minihairpins. Biochemistry 36, 4761-4767. 62. Gupta, G., et al., & Erdmann, V.A. (1987). DNA hairpin structures in solution: 500-MHz two-dimensional 1H NMR studies on d(CGCCGCAGC) and d(CGCCGTAGC). Biochemistry 26, 7715-7723. 63. Gao, Y.G., Robinson, H., van Boom, J.H. & Wang, A. (1995). Influence of counterions on the crystal structure of DNA decamers: binding of [Co(NH3)6]3+ and Ba2+ to A-DNA. Biophys. J. 69, 559-568. 64. Cate, J.H. & Doudna, J.A. (1996). Metal-binding sites in the major groove of a large ribozyme domain. Structure 4, 1221-1229. 65. Nunn, C.M. & Neidle, S. (1996). The high resolution crystal structure of a DNA decamer d(AGGCATGCCT). J. Mol. Biol. 256, 340-351. 66. Nunn, C.M. (1996). Crystal structure of d(AGGBrCATGCCT): implications for cobalt hexammine binding to DNA. J. Biomol. Struct. Dyn. 14, 49-56. 67. Geissner, R.V., Quigley, G.J, Wang, H.J., van der Marel, A., van Boom, J.H. & Rich, A. (1985). Structural basis for stabilization of Z-DNA by cobalt hexaammine and magnesium cations. Biochemistry 24, 237-240. 68. Peck, L.J., Nordheim, A., Rich, A. & Wang, A. (1982). 
Flipping of cloned d(pCpG).d(pCpG)n DNA sequences from right- to left-handed helical structure by salt, Co(III) or negative supercoiling. Proc. Natl Acad. Sci. USA 79, 4560-4564. 69. Brennan, R.G., Westhof, E. & Sundaralingam, M. (1986). Structure of a Z-DNA with two different backbone chain conformations. Stabilization of the decadeoxyoligonucleotide d(CGTACGTACG) by [Co(NH3)6]3+ binding to the guanine. J. Biomol. Struct. Dyn. 3, 649-665. 70. Pan, B., Ban, C., Wahl, M.C. & Sundaralingam, M. (1997). Crystal structure of d(GCGCGCG) with 5′-overhang G residues. Biophys. J. 73, 1553-1561. 71. Kieft, J.S. & Tinoco, I. Jr (1997). Solution structure of a metal-binding site in the major groove of RNA complexed with cobalt (III) hexammine. Structure 5, 713-721. 72. Hingerty, B.E., Brown, R.S. & Klug, A. (1982). Stabilization of the tertiary structure of yeast phenylalanine tRNA by [Co(NH3)6]3+. X-ray evidence for hydrogen bonding to pairs of guanine bases in the major groove. Biochim. Biophys. Acta 697, 78-82. 73. Ho, P.S., Frederick, C.A., Saal, D., Wang, A.H. & Rich, A. (1987). The interactions of ruthenium hexaammine with Z-DNA: crystal structure of a Ru(NH3)6+3 salt of d(CGCGCG) at 1.2 Å resolution. J. Biomol. Struct. Dyn. 5, 521-534. 74. Bauer, C. & Wang, A.H. (1997). Bridged cobalt amine complexes induce DNA conformational changes effectively. J. Inorg. Biochem. 68, 129-135. 75. Cheatham, T.E. 3rd & Kollman, P.A. (1997). Insight into the stabilization of A-DNA by specific ion association: spontaneous B-DNA to A-DNA transitions observed in molecular dynamics simulations of d[ACCCGCGGGT]2 in the presence of hexaamminecobalt(III). Structure 5, 1297-1311. 76. Robinson, H. & Wang, A.H. (1996). Neomycin, spermine and hexaamminecobalt (III) share common structural motifs in converting B- to A-DNA. Nucleic Acids Res. 24, 676-682. 77. Xu, Q., Shoemaker, R.K. & Braunlin, W.H. (1993). 
Induction of B–A transitions of deoxyoligonucleotides by multivalent cations in dilute aqueous solution. Biophys. J. 65, 1039-1049. 78. Duckett, D.R., Murchie, A.I.H. & Lilley, D.M.J. (1990). The role of metal ions in the conformation of the four-way DNA junctions. EMBO J. 9, 583-590. 79. Cate, J.H., Hanna, R.L. & Doudna, J.A. (1997). A magnesium ion core at the heart of a ribozyme domain. Nat. Struct. Biol. 4, 553-558. 80. Cate, J.H., et al., & Doudna, J.A. (1996). RNA tertiary structure mediation by adenosine platforms. Science 273, 1696-1699. 81. Hampel, A. & Cowan, J.A. (1997). A unique mechanism for RNA catalysis: the role of metal cofactors in hairpin ribozyme cleavage. Chem. Biol. 4, 513-517.

82. Grosschedl, R. & Hobom, G. (1979). DNA sequences and structural homologies of the replication origins of lambdoid bacteriophages. Nature 277, 621-627.
83. Denniston-Thompson, K., Moore, D.D., Kruger, K.E., Furth, M.E. & Blattner, F.R. (1977). Physical structure of the replication origin of bacteriophage lambda. Science 198, 1051-1056.
84. Meijer, M., et al., & Schaller, H. (1979). Nucleotide sequence of the origin of replication of the Escherichia coli K-12 chromosome. Proc. Natl Acad. Sci. USA 76, 580-584.
85. Sugimoto, K., et al., & Hirota, Y. (1979). Nucleotide sequence of Escherichia coli K-12 replication origin. Proc. Natl Acad. Sci. USA 76, 575-579.
86. Astell, C.R., Chow, M.B. & Ward, D.C. (1985). Sequence analysis of the termini of virion and replicative forms of minute virus of mice DNA suggests a modified rolling hairpin model for autonomous parvovirus DNA replication. J. Virol. 54, 171-177.
87. Willwand, K., Mumtsidu, E., Kuntz-Simon, G. & Rommelaere, J. (1998). Initiation of DNA replication at palindromic telomeres is mediated by a duplex-to-hairpin transition induced by the minute virus of mice nonstructural protein NS1. J. Biol. Chem. 273, 1165-1174.
88. Tattersall, P. & Ward, D.C. (1976). Rolling hairpin model for replication of parvovirus and linear chromosomal DNA. Nature 263, 106-109.
89. Costello, E., Sahli, R., Hirt, B. & Beard, P. (1995). The mismatched nucleotides in the 5′-terminal hairpin of minute virus of mice are required for efficient viral DNA replication. J. Virol. 69, 7489-7496.
90. Marvin, D.A. & Wachtel, E.J. (1976). Structure and assembly of filamentous bacterial viruses. Phil. Trans. R. Soc. Lond. (Biol. Sci.) B 276, 81-98.
91. Marvin, D.A., Nave, C., Ladner, J.E., Fowler, A.G., Brown, R.S. & Wachtel, E.J. (1981). Macromolecular structural transitions in Pf1 filamentous bacterial virus. In Structural Aspects of Recognition and Assembly in Biological Macromolecules. (Balaban, M., Sussman, J.L., Traub, W. & Yonath, A. eds), pp. 891-910, Balaban ISS, Rehovot, Israel.
92. Marvin, D.A., Nave, C., Bansal, M., Hale, R.D. & Salje, E.K.H. (1992). Two forms of Pf1 inovirus: X-ray diffraction studies on a structural phase transition and a calculated libration normal mode of the asymmetric unit. Phase Transitions 39, 45-80.
93. Viswamitra, M.A. & Pandit, J. (1983). A proposal for a specific double-helical structure in which the polynucleotide strands intercalate instead of forming base-pairs. J. Biomol. Struct. Dyn. 1, 743-753.
94. Polizzi, C. & Clarke, L. (1991). The chromatin structure of centromeres from fission yeast: differentiation of the central core that correlates with function. J. Cell Biol. 112, 191-201.
95. Takahashi, K., Murakami, S., Chikashige, Y., Funabiki, H., Niwa, O. & Yanagida, M. (1992). A low copy number central sequence with strict symmetry and unusual chromatin structure in fission yeast centromere. Mol. Biol. Cell 3, 819-835.
96. Muro, Y., Masumoto, H., Yoda, K., Nozaki, N., Ohashi, M. & Okazaki, T. (1992). Centromere protein B assembles human centromeric alpha-satellite DNA at the 17-bp sequence, CENP-B box. J. Cell Biol. 116, 585-596.
97. Yoda, K., Kitagawa, K., Masumoto, H., Muro, Y. & Okazaki, T. (1992). A human centromere protein, CENP-B, has a DNA binding domain containing four potential alpha helices at the NH2 terminus, which is separable from dimerizing activity. J. Cell Biol. 119, 1413-1427.
98. Cooke, C.A., Bernat, R.L. & Earnshaw, W.C. (1990). CENP-B: a major human centromere protein located beneath the kinetochore. J. Cell Biol. 110, 1475-1488.
99. Carbon, J. & Clarke, L. (1990). Centromere structure and function in budding and fission yeasts. New Biol. 2, 10-19.
100. Ferrer, N., Azorin, F., Villasante, A., Gutierrez, C. & Abad, J.P. (1995). Centromeric dodeca-satellite DNA sequences form fold-back structures. J. Mol. Biol. 245, 8-21.
101. Collaborative Computational Project Number 4. The CCP4 suite: programs for crystallography (1994). Acta Cryst. D 50, 760-763.
102. Peterson, M.R., et al., & Helliwell, J.R. (1996). MAD Phasing strategies explored with a brominated oligonucleotide crystal at 1.65 Å resolution. J. Synchr. Rad. 3, 24-34.
103. Fourme, R., Shepard, W. & Kahn, R. (1996). Application of the anomalous dispersion of X-rays to macromolecular crystallography. Prog. Biophys. Mol. Biol. 64, 167-199.
104. Hendrickson, W.A. (1991). Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science 254, 51-58.
105. de La Fortelle, E. & Bricogne, G. (1997). Methods in Enzymology: Macromolecular Crystallography. 276 (Sweet, R.M. & Carter, C.W., Jr., eds), pp. 472-494, Academic Press, New York, USA.

106. Abrahams, J.P. & Leslie, A.G.W. (1996). Methods used in the structure determination of bovine mitochondrial ATPase. Acta Cryst. D 52, 30-42.
107. Brünger, A. (1992). X-PLOR (version 3.0) Manual. The Howard Hughes Medical Institute, Department of Biophysics and Biochemistry, Yale University, New Haven CT, USA.
108. Sheldrick, G. (1997). SHELX97-Program for Refinement of Crystal Structures. University of Goettingen, Germany.
109. Sasaki, S. (1984). Anomalous Scattering Factors for Synchrotron Radiation Users Calculated Using the Cromer and Liberman Method. National Laboratory for High Energy Physics, KEK, Tsukuba, Japan.