DNA G-quartets in a 1.86 Å resolution structure of an Oxytricha nova telomeric protein-DNA complex


doi:10.1006/jmbi.2001.4766 available online at http://www.idealibrary.com on

J. Mol. Biol. (2001) 310, 367-377

DNA G-Quartets in a 1.86 Å Resolution Structure of an Oxytricha nova Telomeric Protein-DNA Complex

Martin P. Horvath* and Steve C. Schultz

Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309-0215, USA

The Oxytricha nova telomere end binding protein (OnTEBP) recognizes, binds and protects the single-stranded 3′-terminal DNA extension found at the ends of macronuclear chromosomes. The structure of this complex shows that the single strand GGGGTTTTGGGG DNA binds in a deep cleft between the two protein subunits of OnTEBP, adopting a non-helical and irregular conformation. In extending the resolution limit of this structure to 1.86 Å, we were surprised to find a G-quartet linked dimer of the GGGGTTTTGGGG DNA also packing within the crystal lattice and interacting with the telomere end binding protein. The G-quartet DNA exhibits the same structure and topology as previously observed in solution by NMR with diagonally crossing d(TTTT) loops at either end of the four-stranded helix. Additionally, the crystal structure reveals clearly visible Na+ ions and specific patterns of bound water molecules in the four non-equivalent grooves. Although the G-quartet:protein contact surfaces are modest and might simply represent crystal packing interactions, it is interesting to speculate that the two types of telomeric DNA-protein interactions observed here might both be important in telomere biology.

© 2001 Academic Press

*Corresponding author

Keywords: telomere-binding protein; DNA-protein interactions; DNA hydration; sodium ion; quadruplex DNA

Introduction

Telomere DNA is composed of tandemly repeated short sequences in which the 3′-terminal strand is rich in G and T residues, e.g. 5′-d(TTAGGG)-3′ in vertebrates, 5′-d(TG1-3)-3′ in Saccharomyces cerevisiae, 5′-d(TTGGGG)-3′ in Tetrahymena, and 5′-d(TTTTGGGG)-3′ in Oxytricha nova (reviewed by Zakian1). In all organisms studied to date, the 3′-terminal, G-rich strand extends past the duplex portion of the telomere to form a single strand 3′ end.2-4 In cells, this single strand DNA end has been observed to form complexes with specific proteins5,6 and to participate in three-stranded D-loop structures.7,8 In solution, single strand telomere DNA can form four-stranded structures stabilized by hydrogen-bonded G-quartets.9,10 It is not entirely clear whether G-quartet structures exist in vivo, but if G-quartet structures are important for some aspect of telomere biology it would explain, in part, why G-rich repeats in telomere DNA are conserved throughout evolution.

The heterodimeric telomere end binding protein from O. nova (OnTEBP) recognizes, binds, and protects the 16-nucleotide single strand d(TTTTGGGGTTTTGGGG)-3′ extension found at the ends of macronuclear chromosomes.11-13 Although the β subunit of OnTEBP promotes the formation of G-quartets in vitro,14 pre-formed G-quartets are not able to assemble into the α:β:ssDNA ternary complex in the absence of ssDNA.15 The 2.8 Å resolution crystal structure of OnTEBP complexed with ssDNA showed that the DNA is bound in a deep cleft between the α and β protein subunits and does not form G-quartet structures.16 Instead, the ssDNA at this site is folded into an irregular, largely extended and non-helical conformation that buries the bases into binding pockets in the protein (Figures 1(a) and 2(a)).

Here we have extended the resolution limit for the structure of the OnTEBP:ssDNA complex to 1.86 Å. Remarkably, the higher resolution electron density map reveals that a G-quartet structure is present, located at a distinct site removed from the ssDNA binding site (Figure 1(a)). In fact, the structure of the G-quartet DNA is the same as previously seen by NMR in solution.17,18 The G-quartet structure observed here interacts with three symmetry-related telomere end binding protein-ssDNA complexes in the crystal lattice, providing a view of how OnTEBP may recognize G-quartets. Additional aspects of this 1.86 Å resolution structure of the OnTEBP:ssDNA:G-quartet DNA complex will be reported separately.

Present addresses: M. P. Horvath, University of Utah, Biology, 257 South 1400 East, Salt Lake City, UT 84112-0840, USA; S. C. Schultz, Diné College, Department of Math and Sciences, P.O. Box 251, Tsaile, AZ 86556, USA.

Abbreviations used: OnTEBP, Oxytricha nova telomere end-binding protein; ssDNA, single strand DNA; 5Br-dU, 5-bromodeoxyuridine; S.A., simulated annealing.

E-mail address of the corresponding author: [email protected]

Figure 1. Structure of the G-quartet linked G4T4G4 DNA dimer in crystals of the Oxytricha nova telomere end-binding protein (OnTEBP) complexed with telomere DNA. (a) Stereoview of the OnTEBP:ssDNA:G-quartet DNA complex. The newly revealed G-quartet DNA structure is shown with gray deoxyribose backbones and orange bases in the upper left of the complex. The ssDNA bound between the α N-terminal domain and the β subunit is shown with a light gray backbone and dark gray bases. This Figure was made with MOLSCRIPT.37 (b) Stereoview of the G-quartet linked G4T4G4 DNA dimer observed in crystals of the OnTEBP:ssDNA:G-quartet DNA complex. One DNA is colored light gray and the other is dark gray. Phosphorus is colored yellow, phosphate oxygen atoms are colored red, and the 5′ and 3′ OH termini of the two DNA molecules are also colored red and are labeled. In purple are portions of a simulated annealing (S.A.) omit |Fo| − |Fc| electron density map calculated to 1.86 Å resolution and contoured at 2σ. In blue are portions of the same map that likely represent waters but might also represent locations that serve as weak sites of Na+ interaction (see the text).

Results and Discussion

Structure of the G-quartet linked G4T4G4 DNA dimer

Electron density maps calculated with data to 1.86 Å resolution clearly define the positions and conformations of all 24 nucleotides of a G-quartet linked dimer of G4T4G4 DNA in crystals of the OnTEBP:ssDNA complex. In this structure, the d(GGGG) bases form four sets of stacked G-quartets, and the d(TTTT) nucleotides loop diagonally across either end of the four-stranded helix creating two grooves with parallel strands and two grooves with antiparallel strands (Figures 1(b) and 2(b)). This diagonally looped topology is identical with that observed by NMR for a G4T4G4 DNA dimer in solution17,18 (Figure 2(c)). A different joined hairpin topology was reported for a previous X-ray structure19 of the G4T4G4 DNA in which the d(TTTT) nucleotides loop along the sides of the quartet creating four grooves all with antiparallel strands (Figure 2(d)). A third all-parallel topology with four equivalent grooves was observed in the X-ray crystal structure of a G-quartet linked tetramer of TGGGGT DNA.20 These alternative structures all share a stable, central core of stacked G-quartets. The diagonally looped topology observed in the OnTEBP:ssDNA:G-quartet DNA complex and in solution by NMR is thought to be particularly stable from kinetic considerations since all 16 G-G base-pairs must be broken to completely dissociate the DNA dimer.

The structural features of the G-quartet described here are very similar to those of the NMR structure,17,21 including the stacking of bases between quartets and the alternating syn/anti conformation about the N-glycosyl bond for the G bases. The high-resolution X-ray diffraction data additionally reveal the positions of the Na+ in the center of the helix and several well-ordered water molecules in each of the four grooves. Also, the context of this G-quartet within the crystals of a telomeric nucleoprotein complex shows how the telomere end binding protein of O. nova potentially interacts with G-quartet structures through protein-DNA contacts.

Structure of the G-quartets

The arrangement and interactions of the G nucleotides in G-quartets have been extensively discussed.17,19-22 Briefly, in the structure reported here, each G-quartet layer consists of four guanine bases that hydrogen bond N7 and O6 in one base with N2 and N1 in the adjacent base (Figure 3(a)). The geometry of this base-pairing scheme places all four O6 atoms pointing to the central axis of the helix, creating a monovalent ion-binding site which is occupied by Na+ (Figure 3(a)). The diagonally looped topology defines a wide (10 Å) groove, a narrow (3 Å) groove and two intermediate (5 Å) grooves. Both the wide and narrow grooves have antiparallel strands with 2-fold rotational symmetry relating one strand to the other in each groove. The two intermediate grooves have parallel strands, and 2-fold rotational symmetry relates one intermediate groove to the other but does not relate strands within a groove.

Figure 2. Comparison of available structures for G4T4G4 DNA. The bases are shown as yellow boxes for the 5′-most G bases, green boxes for the T bases, and blue boxes for the 3′-most G bases. (a) Irregularly folded, non-helical structure of the d(G4T4G4) molecule complexed with the OnTEBP heterodimer.16 (b) G-quartet linked DNA dimer as seen in the 1.86 Å resolution crystal structure reported here with diagonally crossing d(TTTT) loops and antiparallel/parallel strand topology. Contacts with the protein crystal lattice are shown to highlight the unique context of this structure (see also Figure 6). (c) Ensemble of G-quartet linked DNA structures as observed by NMR in solution with diagonal T loops and antiparallel/parallel strand topology17,18,21 identical with that observed in the crystal structure reported here. (d) Alternate G-quartet fold as observed in a 2.5 Å resolution crystal structure19 with two DNA hairpins associating in an all-antiparallel strand topology. Superposition of the two diagonally looped structures, the OnTEBP associated G-quartet linked DNA dimer reported here and shown in (b) and an ensemble of minimized NMR structures shown in (c), yields a root-mean-squared deviation (RMSD) of 1.3 Å. By contrast, superposition of the OnTEBP associated G-quartet linked DNA with the joined hairpin structures seen in a previous crystal structure and shown in (d) yields an RMSD of 8.6 Å.
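The RMSD values quoted in the Figure 2 legend (1.3 Å against the NMR ensemble, 8.6 Å against the joined hairpin structure) are root-mean-squared distances between matched atoms. As a reminder of what that number measures, here is a minimal sketch. It assumes the two coordinate sets have already been superposed and list the same atoms in the same order; the superposition step itself is not shown, the function name `rmsd` is illustrative, and the toy coordinates are hypothetical rather than taken from either structure.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-squared deviation between two lists of (x, y, z) tuples.

    Assumes the atoms are matched one-to-one and the two sets are already
    superposed; the result is in the same length unit as the input.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

# Hypothetical toy coordinates (Å): model_b is model_a shifted by 1 Å along z
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(f"RMSD = {rmsd(model_a, model_b):.2f} Å")
```

Because every atom contributes its squared displacement, a change of fold such as the one between the diagonally looped and joined hairpin topologies raises the value far more than the small coordinate differences expected between a crystal structure and an NMR ensemble of the same fold.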

The diagonally looped topology imparts to the G-quartets a particular pattern of syn and anti conformations about the N-glycosyl bond. Across the antiparallel wide and narrow grooves a G-G pair is alternately anti-syn or syn-anti, while across the two equivalent parallel intermediate grooves the G-G pair is alternately anti-anti or syn-syn. By comparison, each G-G pair for the joined hairpin topology is anti-syn or syn-anti,19 while the G-G pairs for the all-parallel strand topology are always anti-anti.20,22

In G-quartet linked DNA structures, the G-quartet layers stack closely together but the extent of stacking and the particular stacking geometry is unique for the diagonally looped topology. In the step between the two central G-quartet layers the six-membered pyrimidine rings overlap with each other, while in the two steps on either side each of the five-membered imidazole rings overlap. Imidazole ring stacking is similarly observed four times for the end-to-end packing of d(TGGGGT) tetrads in the crystal structure of this DNA20,22 and twice in the joined hairpin topology.19 The repeated occurrence of the five-membered ring stack indicates that this is a particularly favorable arrangement, and we further suggest that having eight such imidazole stacking interactions, which is more than in any other conformer, enhances the relative stability of the diagonally looped topology.

Structure of the d(TTTT) loops

Each nucleotide of the d(TTTT) loop is clearly visible in our high-resolution electron density map leaving little doubt as to the topology of the structure seen here. Because of previous debates over strand topology in G-quartet linked DNA structure, we nonetheless directly addressed this question by replacing T with 5-bromo-deoxyuridine (5Br-dU) individually at each of the four possible positions. Fourier |Fo| − |Fo| difference maps for each of these 5Br-dU substituted DNAs confirmed that the G-quartet found in these crystals is topologically equivalent to the structure of this DNA in solution solved by NMR,17,18 and ruled out the possibility of mixed structures.

In the d(TTTT) loop, the bases of G4, T5, T6, and T8 stack together across either end of the helix. The base of T7 flips out in the opposite direction and is sandwiched between the base of G1* and the bridging O4′ atom of the deoxyribose group of T6 (Figure 1(b), note: * is used to distinguish sequence positions in the second G4T4G4 DNA from those in the first of the dimer). Interestingly, each of the two d(TTTT) loops is involved in protein-DNA contacts, as described later, yet their conformations are very similar to those observed in solution by NMR, suggesting that the d(TTTT) loop is a stable structural component of this G-quartet linked G4T4G4 DNA dimer.

Figure 3. Sodium ion coordination in G-quartet linked DNA structures. (a) One of the four G-quartets shown with a S.A. omit electron density map contoured at 1σ. Atoms are colored with carbon and nitrogen gray, oxygen red, phosphorus yellow, and sodium violet. Hydrogen bonds are shown in yellow. The O6 atoms of the four bases in the quartet closely coordinate the central sodium ion. Pairs of water molecules (blue with blue electron density) bind to the grooves along the edges of the bases bridging N2 and C8 positions with O4′ and phosphate oxygen atoms of the backbone. Tyrosine 142 from the α subunit (magenta) also participates in this hydrogen-bond network. (b)-(d) Schematic comparing the monovalent ion positions in the G-quartets of (b) the sodium structure reported here, (c) the ammonium form of the same GGGGTTTTGGGG DNA dimer in solution,24 and (d) a sodium-stabilized 0.95 Å resolution structure of TGGGGT DNA.20 In (d) eight strands of TGGGGT DNA form two sets of four quartets, and these two quadruplex structures stack head-to-head to form eight continuously stacked layers of G-quartets that coordinate seven Na+.

Sodium ions in the G-quartet

Four sodium ions are clearly seen along the central axis of the G-quartet DNA in the OnTEBP:ssDNA:G-quartet DNA complex (Figure 1(b)).

Table 1. Sodium ion-oxygen valencies and bond lengths

Na+ site        Valence νNa+ (a)   Oxygens   Bond lengths (Å) (b)
1               1.05               7         2.32  2.36  2.43  2.52  2.55  2.68  3.30
2               1.01               8         2.26  2.35  2.48  2.49  2.49  3.60  3.69  3.74
3               1.11               8         2.13  2.27  2.38  2.42  2.81  3.41  3.68  3.74
4               0.95               7         2.25  2.40  2.47  2.67  2.73  2.89  3.42
H2O/Na+(?) a    0.18               6         3.43  3.45  3.88  3.91  4.04  4.10
H2O/Na+(?) b    0.43               6         2.95  3.47  3.48  3.67  3.88  4.38

(a) νNa+ = Σj (rj/ro)^(−n), where rj is the Na+-O bond length for the jth oxygen, ro is 1.622 Å and n = 4.29 (Nayal & Di Cera35). Authentic Na+ binding sites have an expected value of νNa+ = 1.0. Sites which bind water have a distribution of νNa+ values with a mean of νNa+ = 0.18.
(b) Bond lengths are for all oxygens within the coordination sphere of the site.
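The valence test described in the Table 1 footnote can be checked by hand. The sketch below sums (rj/ro)^(−n) over the Na+-O distances of a site, using the ro and n values quoted in the footnote. The function name `na_valence` is illustrative, the bond lengths are those listed in Table 1 for the four central ions, and the summation over all coordinating oxygens is an assumption about how the footnote formula is applied. Under that reading the sums should land close to the tabulated valences.

```python
# Bond-valence estimate for a candidate Na+ site, following the Table 1 footnote.
R0 = 1.622  # Å, reference Na+-O distance quoted in the footnote
N = 4.29    # exponent quoted in the footnote

def na_valence(bond_lengths):
    """Sum (r_j / R0)^(-N) over all Na+-O distances (Å) in the coordination sphere."""
    return sum((r / R0) ** -N for r in bond_lengths)

# Na+-O distances (Å) for the four ions on the helix axis, as listed in Table 1
SITES = {
    1: [2.32, 2.36, 2.43, 2.52, 2.55, 2.68, 3.30],
    2: [2.26, 2.35, 2.48, 2.49, 2.49, 3.60, 3.69, 3.74],
    3: [2.13, 2.27, 2.38, 2.42, 2.81, 3.41, 3.68, 3.74],
    4: [2.25, 2.40, 2.47, 2.67, 2.73, 2.89, 3.42],
}

for site, bonds in SITES.items():
    print(f"Na+ {site}: {len(bonds)} oxygens, valence = {na_valence(bonds):.2f}")
```

The steep exponent means that oxygens beyond about 3.3 Å contribute only a few hundredths each, so the valence is dominated by the four or five closest contacts. This is why a site surrounded only by distant oxygens scores well below 1.0 and is assigned as water.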


These ions were positioned on the basis of Fourier |Fo| − |Fc| difference electron density maps and identified as Na+ by oxygen valency considerations (Table 1). The two central sodium ions (labeled 2 and 3 in Figure 1(b)) are nearly coplanar with the G bases of the central quartets and are in distorted, octahedral coordination environments. The outer two sodium ions (labeled 1 and 4 in Figure 1(b)) are positioned above and below the planes of the outer quartets towards the T loops and, interestingly, are coordinated by two O2 atoms of bases T5 and T7 in addition to the O6 atoms of the G bases. Na+-O2 coordination has not been directly observed in the context of this G-quartet-linked DNA structure previously, but is consistent with the idea that changes in T nucleotide positions seen by NMR when sodium ions are exchanged for potassium or ammonium ions are driven by O2 coordination, which is feasible for Na+ but not for K+ or NH4+ ions because of steric considerations.23

Monovalent ions are known to stabilize G-quartets and have been observed previously in structures solved by X-ray crystallography20,22 and by NMR24 (Figure 3). As seen by NMR, three ammonium ions are found sandwiched between the G-quartet planes in the G4T4G4 DNA dimer with diagonally looped topology24 (Figure 3(c)). Ions with relatively large ionic radii like ammonium and potassium presumably reside between the G-quartet planes because they do not fit into the in-plane site. In a high-resolution crystal structure (0.95 Å) of parallel-stranded TGGGGT DNA depicted in Figure 3(d), sodium ions occupy three types of sites: bipyramidal coordination sites directly between two G-quartet layers, octahedral coordination sites where the ion is co-planar with the G bases, and intermediate less symmetric positions.20,22 Inter-Na+ distances in the parallel-stranded TGGGGT DNA tetramer structure and for the G4T4G4 DNA dimer described here are greater than the average distance between G-quartet layers (Table 2) as if the Na+ were repelling each other. The sodium ion is apparently less constrained by steric clashes than potassium or ammonium and, therefore, can occupy a range of positions that reduce electrostatic repulsion between adjacent ions.

Two closely coordinated waters cap the ends of the Na+-Na+-Na+-Na+ array in the structure reported here. These capping waters were positioned on the basis of modest (1σ) electron density in a simulated annealing (S.A.) composite omit map (data not shown). Located 4.0 Å past these capping water molecules are two strong (>2σ) peaks that might represent additional Na+ sites (labeled a and b in Figure 1(b)). These sites are rich in oxygen atoms contributed by water molecules and by the O2 and O4 atoms of T bases. The calculated Na+ valency for these two positions is low, indicating that these positions are most likely water molecules (Table 1). However, the strong electron density and the oxygen-rich coordination environment of these sites suggest that Na+ might reside at these positions at least some of the time. NMR experiments have shown that NH4+ ions move between G-quartet layers in a manner reminiscent of an ion channel.24 In an analogous manner, Na+ might also shuttle to and from the four G-quartet bound sites, perhaps via these outer H2O/Na+ sites.

Water interactions in the four grooves of the G-quartet

Water plays a crucial role in the folding, structure, and recognition of macromolecules. The high-resolution X-ray structure described here provides a look at ordered waters associated with a G-quartet linked G4T4G4 DNA dimer structure (Figures 3-5), nicely complementing structural information previously obtained by NMR. In general, the strongest peaks (>2σ) in the S.A. omit electron density maps are associated with water molecules that form two hydrogen bonds with the DNA, and weaker peaks (1-2σ) are observed for water molecules that form only one hydrogen bond with the DNA or that hydrogen-bond with other water molecules (Figure 4). The bidentate interactions are best understood in terms of the geometry of hydrogen-bond donors and acceptors within the DNA structure. The C8-H, N2-H, and N3 atoms of the G bases form the floor of the four grooves, and the phosphodiester backbones of the DNA strands line the grooves to create unique patterns for interactions with water molecules (Figure 4).
For instance, in the two equivalent intermediate grooves, the phosphate groups of the 5′-terminal strand point toward the floor of the groove so that water can hydrogen-bond to both a non-bridging oxygen atom of a phosphodiester group and a C8-H or N2-H atom of a G base (Figure 4(b) and (d)). In the wide groove, the phosphodiester backbones of the two antiparallel strands point away from the groove leaving the bases more exposed such that waters bridge between bases or with an O4′ oxygen of a deoxyribose group (Figure 4(a)). In the narrow groove, the phosphodiester backbones of the two antiparallel strands approach one another closely so that most of the potential hydrogen-bond donors and acceptors are inaccessible (Figure 4(c)).

Figure 4. Hydration of the G-quartet linked G4T4G4 DNA dimer. Electron density maps and schematic representations are shown side-by-side for water molecules bound in the four grooves. The electron density map is colored gray for the DNA grooves and blue for the water molecules. In the schematic phosphorus atoms are colored yellow, non-bridging phosphate oxygen atoms are red, N2 (and N3 if shown) atoms are green, the C8 atom is gray, water molecules are cyan for 1-2σ peaks in the S.A. omit electron density maps and dark blue for >2σ peaks. The deoxyribose group is a pentagon and the bases are represented as rectangles. The position in the 5′ → 3′ sequence and the syn/anti conformation about the N-glycosyl bond of each base is indicated. From top to bottom the grooves are the (a) wide groove (10 Å across), (b) intermediate I (4.6 Å across), (c) narrow (3.0 Å across), and (d) intermediate II (4.6 Å across). The two intermediate grooves and associated hydration patterns are pseudo 2-fold symmetric, so each DNA-water interaction is corroborated by two independent observations. In solution, the wide and narrow grooves are each pseudo 2-fold symmetric, but water interactions are not exactly repeated in the top and bottom halves of these two grooves presumably because of protein and lattice interactions.

Figure 5. Distribution of waters interacting with N3, N2, and C8 groups of the G bases. (a) All 16 G-G base-pairs are superimposed and shown as a generalized G-G pair. The number of observations for each hydration site is indicated. (b) The N2-water-O4′ bridge and the C8-water-OP bridge are shown for the anti-anti G-G configuration. (c) The N2-water-OP bridge and (d) the C8-water-O4′ bridge are other examples of bidentate water interactions and each of these bridges involves a base in the syn conformation.

Table 2. Inter-Na+ distances in Ångströms

G-quartet structure               Inter-Na+ distances (Å)
Diagonally looped [G4T4G4]2       3.82  4.34  4.00
All-parallel [TG4T]4:[TG4T]4      4.15  3.74  3.46  3.41  3.60  4.25

The overall distance from the first to last G-quartet layer is 10.2 Å for the four layers of the diagonally looped topology, and 22.5 Å for the 8 layers in the all-parallel stranded structure. So we estimate an average inter-plane distance between G-quartet layers of 3.3 Å.
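The comparison drawn from Table 2, that neighboring Na+ ions sit farther apart than neighboring G-quartet layers, is simple arithmetic and can be reproduced directly. The sketch below uses the inter-Na+ distances and first-to-last layer distances given in Table 2 and its footnote. Dividing the overall distance by the number of steps (layers minus one) is an assumption about how the average layer spacing was estimated, and the names `INTER_NA`, `STACKS` and `layer_spacing` are illustrative.

```python
from statistics import mean

# Inter-Na+ distances (Å) from Table 2
INTER_NA = {
    "diagonally looped [G4T4G4]2": [3.82, 4.34, 4.00],
    "all-parallel [TG4T]4:[TG4T]4": [4.15, 3.74, 3.46, 3.41, 3.60, 4.25],
}

# (first-to-last layer distance in Å, number of G-quartet layers) from the Table 2 footnote
STACKS = {
    "diagonally looped [G4T4G4]2": (10.2, 4),
    "all-parallel [TG4T]4:[TG4T]4": (22.5, 8),
}

def layer_spacing(span, n_layers):
    """Average rise per step; n stacked layers are separated by n - 1 steps."""
    return span / (n_layers - 1)

for name, distances in INTER_NA.items():
    span, n_layers = STACKS[name]
    print(f"{name}: mean Na+-Na+ = {mean(distances):.2f} Å, "
          f"mean layer spacing = {layer_spacing(span, n_layers):.2f} Å")
```

In both structures the mean ion-to-ion distance exceeds the mean layer spacing, which is the observation the text interprets as mutual repulsion between adjacent Na+ ions.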

Superpositions of water molecules that hydrate the base edges reveal a distinct preference for the N2 and C8 positions (Figure 5(a)). This preference contrasts with that of the B-DNA minor groove in which N3 is more frequently hydrated.25,26 The relatively dehydrated state of the N3 position in the G-quartet-linked DNA dimer appears to result from two considerations. First, in the syn conformation, N3 is covered by the deoxyribose group and, therefore, is inaccessible to water molecules. Second, in the anti conformation the deoxyribose-phosphate backbone points away from N3 and, therefore, potential sites for forming a bidentate interaction are too far away. Thus, even if N3 is accessible (e.g. in the wide groove) the backbone points away from the base so that a water molecule hydrogen-bonded to N3 cannot form a second interaction with the DNA. The N2-H and C8-H atoms, on the other hand, do point toward the deoxyribose backbone in all four grooves and each is accessible to water whether the base is syn or anti. Water molecules hydrogen-bonded to N2-H and C8-H can, therefore, form stable bridges with O4′ and phosphate oxygens and many different combinations are observed (Figures 4 and 5).

Three examples of water bridging N2 and O4′ of the 3′ nucleotide are found in the wide and intermediate grooves and these invariably involve an anti-syn 5′ → 3′ step. Five examples of water molecules bridging C8 and a partially charged phosphate oxygen atom are found in the narrow and intermediate grooves and the base involved is always anti. In two instances, the N2-water-O4′ bridge and the C8-water-OP bridge occur together in the intermediate groove, and one of these is pictured in Figure 5(b). The syn bases also form distinct bidentate interactions. Three examples of water bridging N2 and a 5′ phosphate oxygen atom are found in the intermediate and narrow grooves and the base is syn for each (Figure 5(c)). Finally, one example of a C8-water-O4′ bridge is found in the wide groove for a syn base (Figure 5(d)).

Hydrogen bonds involving the aromatic C8-H appear to play an important role in the hydration of the G-quartets with seven of the 25 water molecules bound in the grooves hydrogen-bonded to this position. Indeed, C8-H hydrogen bonds are increasingly discussed as generally important in the hydration of DNA.26,27 Each of the C8-H-bonded water molecules seen here is in-plane with the base and is located 3.3 Å from the C8 atom, a distance that is somewhat longer than that observed for hydrogen bonds between water and N2, O4′, or phosphate oxygen groups. Five of the water molecules hydrogen-bonded to C8-H also form a hydrogen bond with a partially charged phosphate oxygen atom, consistent with results from a statistical survey of B, A and Z-DNA phosphate oxygen hydration.27 The other two water molecules bridge the C8 and either O4′ or another water molecule, indicating that a phosphate interaction is not required for water to hydrogen-bond to the aromatic C8-H, as was also noted in a crystal structure of B-DNA at 0.74 Å resolution.28 The aromatic C6-H (from the pyrimidine bases) also hydrogen-bonds to water,28 and this type of interaction is observed here for the T bases (Figure 6(b)) which are involved in water-mediated protein-DNA contacts as described in the next section.

Figure 6. Protein:G-quartet interactions. (a) Electrostatic surface potential of the three OnTEBP-ssDNA complexes (I, II, and III) in the region of the G-quartet contact surfaces. Note the concentration of positive electrostatic potential (blue) surrounding the G-quartet structure (shown in white with phosphorus in yellow and phosphate oxygen atoms in red). Surfaces and electrostatic potential were calculated with GRASP.38 (b) Stereoview of G-quartet:protein contacts. The G-quartet molecule is shown in gray with yellow phosphorus and red phosphate oxygen atoms. The three symmetry-related telomere end binding protein-DNA complexes which associate with the G-quartet molecule are shown in different colors: complex I in magenta, complex II in blue, and complex III in green for the α subunit and cyan for the β subunit of the telomere end binding protein. Complex I presents amino acid side-chains which form direct and water-mediated hydrogen bonds with the phosphate groups and the bases of the G-quartet. Nearby, a peptide bond in complex II interacts with the bases of T6 and T8, and an arginine residue makes a water-mediated hydrogen bond with T8. The contacts with complex III are all water-mediated. Interestingly, the extended peptide loop of the β subunit (cyan) forms a ring of water-mediated G-quartet contacts, and side-chains from the α subunit reach up through this ring to make water-mediated contacts of their own.

Protein:G-quartet interactions

The G-quartet structure fits snugly into a small, positively charged cavity within the crystal lattice formed by the juxtaposition of three symmetry-related telomere end binding protein-DNA complexes (Figure 6). Interestingly, even though identical crystals can be obtained at both low (0.05 M) and high (1.25 M) NaCl concentrations, the G-quartet structure appears to be absolutely required for the low-salt crystals to form but is no longer present in the high-salt crystals (data not shown). The higher salt concentration likely screens the positive charges, effectively stabilizing the crystal lattice and at the same time weakening the protein:G-quartet association. Electrostatic interactions evidently play an important role in the OnTEBP:G-quartet associations described here.

In addition to positive electrostatic potential, each of the three protein-ssDNA complexes that form the G-quartet cavity (referred to as complexes I, II, and III) present side-chains and peptide backbone groups to form specific packing and hydrogen-bonding interactions with the G-quartet structure. The general scheme is that complex I interacts with the wide groove of the quadruplex DNA, complex II interacts with one of the d(TTTT) loops, and complex III interacts with the other d(TTTT) loop (Figure 6(b)). Interactions with complex I involve packing and hydrogen-bond interactions.
The side-chain of Tyr142 from the α subunit of complex I inserts itself into the wide groove of the G-quartet structure and packs with the deoxyribose group of nucleotide G4 (Figure 6(b)). The side-chain of Phe141 packs against the other side of this deoxyribose group creating a hydrophobic pocket that snugly surrounds this sugar moiety (Figure 6(b)). Hydrogen-bonding interactions include the OH group of Tyr142 which hydrogen-bonds with the base of G4 and via a water molecule with the base of T7 (Figure 6(b)). Also, the side-chains of Lys105 and Asn139 hydrogen-bond with 5′ phosphate oxygen atoms of G4 and G5, respectively.

Complex II interacts with the T6 and T8 nucleotides at the top of the G-quartet structure as pictured in Figure 6(b). The base of T6 stacks against the main-chain peptide bond between residues Asp437 and Gly438 of the α subunit. The O group of this peptide bond also hydrogen-bonds with the N3 atom of T8, and the side-chain of Asp437 hydrogen-bonds via water with the O4 group of T6. A leucine side-chain (Leu439) packs above the 437-438 peptide bond and against part of the base of T8. Additional hydrogen-bonding interactions include a water-mediated bond between Arg482 and the O2 group of T8 and, interestingly, water-mediated networks that link the aromatic C6-H of T6 and the backbone O and NH groups of complex II (Figure 6(b)).

Complex III interacts with T5*, T6*, and T8* in the other d(TTTT) loop of the G-quartet structure through an expansive network of exclusively water-mediated hydrogen bonds (Figure 6(b)). Interestingly, these interactions largely involve residues in the extended peptide loop of the β subunit (residues 164β-192β) which folds across the C-terminal domain of α (residues 316-495) in the α:β:ssDNA complex. For example, water molecules bridge O and NH backbone positions of residues 165β and 166β with the N3 position of T8*, the phosphate oxygen of T6* and, interestingly, the aromatic C6-H of T6*. Additional networks of water-mediated hydrogen bonds bridge O and NH backbone positions of residues 187β, 190β and 191β with the 5′ phosphate groups of T5* and T6*. Thus, several points along the β peptide loop form a ring of G-quartet contacts. Remarkably, residues Asp379 and Arg430 from the α subunit extend through this β subunit ring to hydrogen-bond via water molecules to the G-quartet as well (Figure 6(b)).

Implications for telomere function

The G-quartet linked DNA dimer observed in crystals of the OnTEBP:ssDNA complex could be a completely opportunistic guest of the crystal lattice. The GGGGTTTTGGGG DNA used in crystallization readily adopts a G-quartet linked dimeric structure in solution if 50 mM NaCl is present.9,29 Furthermore, with its 22 phosphate groups, this molecule would be strongly attracted by the large positive electrostatic potential found in this part of the crystal lattice.
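The electrostatic attraction invoked here, and the loss of the G-quartet from crystals grown at 1.25 M rather than 0.05 M NaCl, can be given a rough length scale with the Debye screening length of a 1:1 electrolyte. The sketch below is a textbook estimate rather than anything calculated in this work. It assumes water at 25 °C with a relative permittivity of 78.4, counts only NaCl toward the ionic strength (buffer components are ignored), and uses Debye-Hückel theory, which is only qualitative at molar salt concentrations. The function name `debye_length_angstrom` is illustrative.

```python
import math

# Physical constants in SI units
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
KB = 1.380649e-23        # Boltzmann constant, J/K
E = 1.602176634e-19      # elementary charge, C
NA = 6.02214076e23       # Avogadro constant, 1/mol

def debye_length_angstrom(ionic_strength_molar, temp_k=298.15, eps_r=78.4):
    """Debye screening length (Å) for a 1:1 electrolyte such as NaCl.

    temp_k and eps_r are assumed values for water near 25 °C; they are not
    taken from the paper.
    """
    number_density = ionic_strength_molar * 1000.0 * NA  # mol/L -> particles per m^3
    length_m = math.sqrt(EPS0 * eps_r * KB * temp_k / (2.0 * number_density * E ** 2))
    return length_m * 1e10

for salt in (0.05, 1.25):  # NaCl concentrations (M) quoted in the text
    print(f"{salt:.2f} M NaCl: Debye length of about {debye_length_angstrom(salt):.1f} Å")
```

Under these assumptions the screening length falls from roughly 14 Å at 0.05 M NaCl to roughly 3 Å at 1.25 M. Charge-charge attraction between the phosphate backbone and the positively charged cavity would therefore be felt across the whole contact surface at low salt but would be largely damped at high salt, in line with the interpretation given in the text.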
Favorable electrostatic interactions and specific protein contacts with the G-quartet structure might be taking the place of some other interaction in natural telomeres, such as recognition of the 5′-most T nucleotides of the telomere ssDNA extension or non-specific interaction of OnTEBP with the duplex region of telomeric DNA. Consistent with this idea, the side-chain of Tyr142, which features prominently in the protein:G-quartet interactions, crosslinks to ssDNA containing 5-bromodeoxyuridine substituted for the second T of the 16-nucleotide 5′-TTTTGGGGTTTTGGGG-3′ DNA.30

On the other hand, it is also possible that the O. nova telomere end binding protein is designed to interact both with extended, non-helical, single strand telomere DNA and with telomere DNA that has folded into a G-quartet structure. This might be a stable or a transient interaction in assembling telomere complexes. Hairpin structures have been detected in complex assembly by resonance Raman spectroscopy,31,32 and such hairpins could additionally serve as intermediates to G-quartet formation, or G-quartets might be mimicking these hairpin structures. Since the G-quartet binding site comprises three OnTEBP-ssDNA complexes (Figure 6), each of which includes a natural ssDNA telomere 3′-end, G-quartets could assist in assembling higher order chromosome structures which contain some chromosome ends joined by G-quartets and other chromosome ends capped by the OnTEBP heterodimer.

G-quartet DNA structures have been proposed to participate in telomere biology since it was first deduced that G-rich telomere repeat DNAs could form these structures in solution.9,10,29 The finding reported here that a telomere end binding protein can associate with G-quartet structures even while complexed with a distinctly different conformation of the single strand telomere DNA further suggests that multiple DNA structures may play a role in telomere biology.

Table 3. Data collection and refinement statistics

A. Data collection
  Wavelength (Å)                     1.0000
  Resolution (Å)                     1.86
  Completeness (%)                   99.1
  Rsym (%)a                          6.5
  I/σ overall (highest shell)        16.9 (2.2)
  Independent reflections            92,060
  Total reflections                  903,239

B. Refinement
  Rcryst (%)b                        23.0
  Rfree (%)b                         24.6
  Number atoms                       6203
  Number waters                      503
  Average B-factors (Å^2)
    α                                29.6
    β 28 kDa core                    46.4
    ssDNA                            25.6
    G-quartet                        40.0
    Water molecules                  40.2
  r.m.s. deviations
    Bond lengths (Å)                 0.007
    Bond angles (deg.)               1.28

a $R_{\mathrm{sym}} = \sum |I_h - \langle I_h \rangle| / \sum I_h$, where $\langle I_h \rangle$ is the average intensity over symmetry equivalents.
b $R_{\mathrm{cryst}}$ and $R_{\mathrm{free}} = \sum \big| |F_o| - |F_c| \big| / \sum |F_o|$, where $F_o$ and $F_c$ are the observed and calculated structure factor amplitudes. $R_{\mathrm{free}}$ was calculated with 10 % of the reflections not used during refinement.
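The residual definitions in the Table 3 footnotes can be made concrete with a short sketch. The Python below is illustrative only: it assumes amplitudes and intensities are already scaled, the function names `r_factor` and `r_sym` are hypothetical, and the toy numbers are invented for the example rather than taken from the deposited data or from the HKL or CNS programs.

```python
from statistics import mean


def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over the supplied reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_sym(groups):
    """Rsym = sum |I_h - <I_h>| / sum I_h.

    Each element of `groups` holds the repeated (symmetry-equivalent)
    intensity measurements of one unique reflection h.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in groups:
        average = mean(intensities)
        numerator += sum(abs(i - average) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Hypothetical toy data, not values from this structure.
toy_fo = [100.0, 50.0, 25.0, 25.0]
toy_fc = [90.0, 55.0, 30.0, 20.0]
toy_groups = [[10.0, 12.0], [20.0, 20.0, 23.0]]

print(r_factor(toy_fo, toy_fc))
print(r_sym(toy_groups))
```

Rcryst and Rfree share the same formula; the only difference is which reflections are passed in, with Rfree evaluated on the 10 % of reflections held out of refinement.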

Materials and Methods

Crystals of the OnTEBP complexed with single strand DNA, containing the full-length α subunit (495 amino acid residues), the 28 kDa N-terminal core domain of the β subunit (260 amino acid residues), and the 12 nucleotide GGGGTTTTGGGG DNA, were obtained by the hanging drop vapor diffusion method as described,16 except that crystals were grown at 4 °C instead of at room temperature. Native and 5-Br-deoxyuridine (5Br-dU)-substituted DNAs were purchased from Operon (www.operon.com) and purified by reverse-phase HPLC. Crystals with native 12-mer DNA or with 5Br-dU-substituted DNA were harvested into solutions containing 30 % ethylene glycol, 12 % polyethylene glycol 4000, 40 mM MES buffer (pH 6.6), 50 mM NaCl, 0.002 % NaN3, and 2 mM DTT and stored at 4 °C. Crystals were flash frozen in liquid propane and then stored in frozen propane under liquid nitrogen prior to mounting in a stream of evaporated nitrogen maintained at 100 K (Oxford Cryosystems).

X-ray data to 1.86 Å resolution from three crystals were collected at the Advanced Photon Source, beamline B-14-C using the Quantum-4 CCD detector (ADSC). X-ray data were indexed, integrated and merged with the HKL package.33 The previously reported 2.8 Å resolution structure (PDB accession code 1otc) was initially rigid body refined followed by simulated annealing and positional refinement using CNS.34 The G-quartet structure was built into $2|F_o| - |F_c|$ and $|F_o| - |F_c|$ electron density maps one nucleotide at a time so as to avoid preconceived notions of strand topology. Inclusion of the G-quartet structure resulted in a substantial (5 %) drop in R and Rfree. Final refinement yielded values of R = 23.0 % and Rfree = 24.6 % (Table 3). Bromine peaks in the $|F_o| - |F_o|$ Fourier difference electron density maps, calculated with data obtained from crystals containing 5Br-dU-substituted DNAs, were located at corresponding 5-methyl positions of T bases in the native DNA and were >8 σ in height. Simulated annealing composite omit maps were calculated with CNS and each annealing cycle started with a temperature of 2000 K.

All 506 non-hydrogen atoms of the G4T4G4 DNA dimer were used for superposition and RMSD calculations that were performed with CNS. To compare the one OnTEBP associated G-quartet structure with an ensemble of eight NMR structures or with the two structures seen in a previous crystal structure, superpositions were calculated for individual members of the ensemble and an overall RMSD was calculated as the square root of the mean of the individually squared RMSD values. Sodium ion-oxygen bond valencies were calculated according to the method of Nayal and Di Cera35 using parameters of Brown and Wu.36

Coordinates

The atomic coordinates and structure factors for the 1.86 Å resolution structure of OnTEBP complexed with ssDNA and with G-quartet have been deposited in the RCSB PDB with the accession code 1JB7.
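The ensemble comparison described in Materials and Methods, in which an overall RMSD is the square root of the mean of the individually squared RMSD values, can be sketched as follows. This is a minimal illustration and not the CNS procedure itself: it assumes each ensemble member has already been superposed on the reference and lists its atoms in the same order, and the function names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two pre-superposed, identically ordered (x, y, z) lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def overall_rmsd(reference, ensemble):
    """Square root of the mean of the squared per-member RMSD values."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one member")
    individual = [rmsd(reference, member) for member in ensemble]
    return math.sqrt(sum(r * r for r in individual) / len(individual))


# Hypothetical two-atom toy structures, not coordinates from 1JB7.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ensemble = [
    [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)],  # every atom displaced by 3 along z
    [(0.0, 4.0, 0.0), (1.0, 4.0, 0.0)],  # every atom displaced by 4 along y
]

print(overall_rmsd(reference, ensemble))
```

Averaging the squared values, rather than the RMSD values themselves, weights poorly fitting ensemble members more heavily, so the overall figure is never smaller than the plain mean of the individual RMSDs.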

Acknowledgments

M.P.H. was a postdoctoral fellow of the Helen Hay Whitney Foundation. This work was funded by the American Cancer Society under grant number RPG-93011-04-NP and the NIH under grant number 1R01CA81109. Use of the Advanced Photon Source was supported by the US Department of Energy under contract number W-31-109-Eng-38. Use of the BioCARS Sector 14 was supported by the National Institutes of Health, National Center for Research Resources, under grant number RR07707. We also wish to thank the people at the beamline BioCARS 14-BM-C of the APS and at beamline X25 of the NSLS for their kind assistance.

References

1. Zakian, V. A. (1995). Telomeres: beginning to understand the end. Science, 270, 1601-1607.
2. Klobutcher, L. A., Swanton, M. T., Donini, P. & Prescott, D. M. (1981). All gene-sized DNA molecules in four species of hypotrichs have the same terminal sequence and an unusual 3′ terminus. Proc. Natl Acad. Sci. USA, 78, 3015-3019.
3. Henderson, E. R. & Blackburn, E. H. (1989). An overhanging 3′ terminus is a conserved feature of telomeres. Mol. Cell Biol. 9, 345-348.
4. McElligott, R. & Wellinger, R. J. (1997). The terminal DNA structure of mammalian chromosomes. EMBO J. 16, 3705-3714.
5. Gottschling, D. E. & Zakian, V. A. (1986). Telomere proteins: specific recognition and protection of the natural termini of Oxytricha macronuclear DNA. Cell, 47, 195-205.
6. Price, C. M. & Cech, T. R. (1987). Telomeric DNA-protein interactions of Oxytricha macronuclear DNA. Genes Dev. 1, 783-793.
7. Griffith, J. D., Comeau, L., Rosenfield, S., Stansel, R. M., Bianchi, A., Moss, H. & de Lange, T. (1999). Mammalian telomeres end in a large duplex loop. Cell, 97, 503-514.
8. Murti, K. G. & Prescott, D. M. (1999). Telomeres of polytene chromosomes in a ciliated protozoan terminate in duplex DNA loops. Proc. Natl Acad. Sci. USA, 96, 14436-14439.
9. Sundquist, W. I. & Klug, A. (1989). Telomeric DNA dimerizes by formation of guanine tetrads between hairpin loops. Nature, 342, 825-829.
10. Williamson, J. R., Raghuraman, M. K. & Cech, T. R. (1989). Monovalent cation-induced structure of telomeric DNA: the G-quartet model. Cell, 59, 871-880.
11. Price, C. M. & Cech, T. R. (1989). Properties of the telomeric DNA-binding protein from Oxytricha nova. Biochemistry, 28, 769-774.
12. Hicke, B. J., Celander, D. W., MacDonald, G. H., Price, C. M. & Cech, T. R. (1990). Two versions of the gene encoding the 41-kilodalton subunit of the telomere binding protein of Oxytricha nova. Proc. Natl Acad. Sci. USA, 87, 1481-1485.
13. Gray, J. T., Celander, D. W., Price, C. M. & Cech, T. R. (1991). Cloning and expression of genes for the Oxytricha telomere-binding protein: specific subunit interactions in the telomeric complex. Cell, 67, 807-814.
14. Fang, G. & Cech, T. R. (1993). The beta subunit of Oxytricha telomere-binding protein promotes G-quartet formation by telomeric DNA. Cell, 74, 875-885.
15. Raghuraman, M. K. & Cech, T. R. (1990). Effect of monovalent cation-induced telomeric DNA structure on the binding of Oxytricha telomeric protein. Nucl. Acids Res. 18, 4543-4552.
16. Horvath, M. P., Schweiker, V. L., Bevilacqua, J. M., Ruggles, J. A. & Schultz, S. C. (1998). Crystal structure of the Oxytricha nova telomere end binding protein complexed with single strand DNA. Cell, 95, 963-974.
17. Smith, F. W. & Feigon, J. (1992). Quadruplex structure of Oxytricha telomeric DNA oligonucleotides. Nature, 356, 164-168.
18. Smith, F. W. & Feigon, J. (1993). Strand orientation in the DNA quadruplex formed from the Oxytricha telomere repeat oligonucleotide d(G4T4G4) in solution. Biochemistry, 32, 8682-8692.
19. Kang, C., Zhang, X., Ratliff, R., Moyzis, R. & Rich, A. (1992). Crystal structure of four-stranded Oxytricha telomeric DNA. Nature, 356, 126-131.
20. Phillips, K., Dauter, Z., Murchie, A. I., Lilley, D. M. & Luisi, B. (1997). The crystal structure of a parallel-stranded guanine tetraplex at 0.95 Å resolution. J. Mol. Biol. 273, 171-182.
21. Schultze, P., Smith, F. W. & Feigon, J. (1994). Refined solution structure of the dimeric quadruplex formed from the Oxytricha telomeric oligonucleotide d(GGGGTTTTGGGG). Structure, 2, 221-233.
22. Laughlan, G., Murchie, A. I., Norman, D. G., Moore, M. H., Moody, P. C., Lilley, D. M. & Luisi, B. (1994). The high-resolution crystal structure of a parallel-stranded guanine tetraplex. Science, 265, 520-524.
23. Schultze, P., Hud, N. V., Smith, F. W. & Feigon, J. (1999). The effect of sodium, potassium and ammonium ions on the conformation of the dimeric quadruplex formed by the Oxytricha nova telomere repeat oligonucleotide d(G4T4G4). Nucl. Acids Res. 27, 3018-3028.
24. Hud, N. V., Schultze, P., Sklenar, V. & Feigon, J. (1999). Binding sites and dynamics of ammonium ions in a telomere repeat DNA quadruplex. J. Mol. Biol. 285, 233-243.
25. Schneider, B., Cohen, D. M., Schleifer, L., Srinivasan, A. R., Olson, W. K. & Berman, H. M. (1993). A systematic method for studying the spatial distribution of water molecules around nucleic acid bases. Biophys. J. 65, 2291-2303.
26. Schneider, B. & Berman, H. M. (1995). Hydration of the DNA bases is local. Biophys. J. 69, 2661-2669.
27. Schneider, B., Patel, K. & Berman, H. M. (1998). Hydration of the phosphate group in double-helical DNA. Biophys. J. 75, 2422-2434.
28. Kielkopf, C. L., Ding, S., Kuhn, P. & Rees, D. C. (2000). Conformational flexibility of B-DNA at 0.74 Å resolution: d(CCAGTACTGG)2. J. Mol. Biol. 296, 787-801.
29. Sen, D. & Gilbert, W. (1988). Formation of parallel four-stranded complexes by guanine-rich motifs in DNA and its implications for meiosis. Nature, 334, 364-366.
30. Hicke, B. J., Willis, M. C., Koch, T. H. & Cech, T. R. (1994). Telomeric protein-DNA point contacts identified by photo-cross-linking using 5-bromodeoxyuridine. Biochemistry, 33, 3364-3373.
31. Laporte, L. & Thomas, G. J., Jr (1998). A hairpin conformation for the 3′ overhang of Oxytricha nova telomeric DNA. J. Mol. Biol. 281, 261-270.
32. Laporte, L., Benevides, J. M. & Thomas, G. J., Jr (1999). Molecular mechanism of DNA recognition by the alpha subunit of the Oxytricha telomere binding protein. Biochemistry, 38, 582-588.
33. Otwinowski, Z. & Minor, W. (1996). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307-326.
34. Brunger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P. & Grosse-Kunstleve, R. W., et al. (1998). Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta. Crystallog. sect. D, 54, 905-921.
35. Nayal, M. & Di Cera, E. (1996). Valence screening of water in protein crystals reveals potential Na+ binding sites. J. Mol. Biol. 256, 228-234.
36. Brown, I. D. & Wu, K. K. (1976). Empirical parameters for calculating cation-oxygen bond valence. Acta. Crystallog. sect. B, 32, 1957-1959.
37. Kraulis, P. J. (1991). MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallog. 24, 946-950.
38. Nicholls, A., Sharp, K. A. & Honig, B. (1991). Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins: Struct. Funct. Genet. 11, 281-296.

Edited by I. Tinoco (Received 17 January 2001; received in revised form 10 May 2001; accepted 10 May 2001)