Trinucleotide repeat DNA structures: dynamic mutations from dynamic DNA
Christopher E Pearson*† and Richard R Sinden*
Models for the disease-associated expansion of (CTG)n·(CAG)n, (CGG)n·(CCG)n, and (GAA)n·(TTC)n trinucleotide repeats involve alternative DNA structures formed during DNA replication, repair and recombination. These repeat sequences are inherently flexible and can form a variety of hairpins, intramolecular triplexes, quadruplexes, and slipped-strand structures that may be important intermediates and result in their genetic instability.
hyper-reactivity of even a single (CTG)·(CAG) trinucleotide in DNA to chloroacetaldehyde modification. This hyper-reactivity, which is dependent upon Zn2+ or Co2+ ions and DNA supercoiling, suggests a DNA structure in which cytosines possess unpaired character. Subsequently, unusual electrophoretic mobility, hyperflexibility, and unusual properties of nucleosome assembly have been identified.
Addresses
*Center for Genome Research, Institute of Biosciences and Technology, Texas A&M University, Texas Medical Center, 2121 West Holcombe, Houston, Texas 77030-3303, USA
†Present address: Department of Genetics, Hospital for Sick Children and Department of Molecular and Medical Genetics, University of Toronto, 555 University Avenue, Toronto, Ontario, Canada, M5G 1X8
Correspondence: Richard R Sinden
(CTG)n·(CAG)n and (CGG)n·(CCG)n repeats exhibit unusually fast mobility on polyacrylamide gels
Current Opinion in Structural Biology 1998, 8:321-330
http://biomednet.com/elecref/0959440X00800321
© Current Biology Ltd ISSN 0959-440X
Abbreviations
DM: myotonic dystrophy
FRAXA: fragile X mental retardation
h: helix repeat
Jm: molar cyclization factor
Papp: apparent persistence length
SCA: spino-cerebellar ataxia
Introduction
The expansion of triplet repeats in genomic DNA is associated with human disease (reviewed in [1•]). Nine different disease loci have unstable (CAG)n·(CTG)n repeats, five chromosomal fragile sites contain unstable (CGG)n·(CCG)n repeats, and another disease is associated with an unstable (GAA)n·(TTC)n repeat (Table 1). The expansion mutation is strongly dependent upon the repeat tract length, such that expanded alleles are more likely to undergo further expansion [2]. In addition, the purity (or homogeneity) of the repeat tract can affect its genetic stability (Table 1). Models for expansion invoke the participation of alternative DNA structures in aberrant DNA replication, repair and recombination [1•]. Here we review the unusual helical properties of the repeats and the variety of alternative DNA structures formed by them (Table 2), characteristics that may be biologically relevant in their genetic instability.
Unusual helical characteristics associated with (CTG)n·(CAG)n and (CGG)n·(CCG)n repeats
Many unusual properties have been identified for DNA containing (CTG)n·(CAG)n and (CGG)n·(CCG)n repeats. Kohwi et al. [3] initially presented evidence for an unusual (CTG)·(CAG) helix, demonstrating the
Chastain et al. [4] reported that duplex (CTG)n·(CAG)n-containing DNA from the myotonic dystrophy (DM) locus and (CGG)n·(CCG)n-containing DNA from the fragile X mental retardation (FRAXA) locus migrated faster than expected in polyacrylamide gels (up to 20% faster for (CTG)255·(CAG)255). This unusual mobility is dependent upon repeat length, gel temperature, and the percentage of acrylamide. The physical basis for this unusually rapid electrophoretic mobility, which has rarely been observed previously (see, for discussion, [4,5•]), is not yet understood. Based on a reptation model for electrophoresis [6], Chastain et al. [4] suggested that this increase in mobility may have resulted from a 20% increase in the apparent persistence length of the DNA, from Papp = 43 nm to 51 nm, which is possibly indicative of a straighter and perhaps less flexible helix than typical B-DNA. More recent data suggest, however, that the rapid mobility may be a result of the presence of a long, flexible helical region (see below) [5•].
(CTG)n·(CAG)n and (CGG)n·(CCG)n repeats are inherently flexible
Two recent studies have shown that (CTG)n·(CAG)n and (CGG)n·(CCG)n repeats are inherently flexible. (CTG)n·(CAG)n and (CGG)n·(CCG)n repeats exhibit unusually high molar cyclization factors (Jm), suggesting that they are more flexible or curved molecules than normal mixed sequence B-DNA [5•,7••]. By analyzing the cyclization kinetics for a series of molecules with increasing numbers of repeats, Bacolla et al. [7••] reported bending moduli of 1.13 × 10⁻¹⁹ and 1.27 × 10⁻¹⁹ erg·cm for (CTG)n·(CAG)n and (CGG)n·(CCG)n, respectively. These values are 40% less than that for random B-DNA, and suggest persistence length values about 60% of that for mixed sequence B-DNA. Values for the torsional moduli and helix repeat (h) were essentially unchanged from those for mixed sequence B-DNA. When measured using a topological band shift method, h varied dramatically as a function of the repeat length, from 9.6 to 10.3 for (CTG)n·(CAG)n and from 10.0 to 11.1 for (CGG)n·(CCG)n [7••]. This deviation in h with respect to repeat length could be explained if the different lengths of repeats had a variable twist, which is inconsistent
Nucleic acids
Table 1
Trinucleotide repeats associated with human disease.
Disorder/disease locus | Repeat | Normal length | Premutation length | Full expansion length | Interruption
Fragile XA/FRAXA | (CGG)n·(CCG)n | 6-52 | 59-230 | 230-2000 (pure) | AGG (threshold > 34 pure repeats)
Fragile XE/FRAXE | (CCG)n·(CGG)n | 4-39 | *(31-61) | 200-900 | None
Fragile XF/FRAXF | (CGG)n·(CCG)n | 7-40 | * | 306-1008 | (GCCGTC)3-4
FRA16A | (CCG)n·(CGG)n | 16-49 | * | 1000-1900 | complex
FRA11B/Jacobsen syndrome | (CGG)n·(CCG)n | 11 | 80 | 100-1000 | None
Kennedy's disease/SBMA | (CAG)n·(CTG)n | 14-32 | * | 40-55 | None
Myotonic dystrophy/DM | (CTG)n·(CAG)n | 5-37 | 50-80 | 80-1000, 2000-3000 (congenital) | None
Huntington's disease/HD | (CAG)n·(CTG)n | 10-34 | 36-39 | 40-121 | None
Spino-cerebellar ataxia 1/SCA1 | (CAG)n·(CTG)n | 6-39 | None | 40-81 (pure) | CAT (threshold ≥ 40 pure repeats)
Spino-cerebellar ataxia 2/SCA2 | (CAG)n·(CTG)n | 14-31 | None | 34-59 (pure) | CAA
Spino-cerebellar ataxia 3/SCA3, Machado Joseph disease/MJD | (CAG)n·(CTG)n | 13-44 | * | 60-84 | None
Spino-cerebellar ataxia 6/SCA6 | (CAG)n·(CTG)n | 4-18 | * | 21-28 | None
Spino-cerebellar ataxia 7/SCA7 | (CAG)n·(CTG)n | ?-17 | * | 38-130 | None
Haw river syndrome/HRS (also DRPLA) | (CAG)n·(CTG)n | 7-25 | * | 49-?5 | None
Friedreich's ataxia/FRDA | (GAA)n·(TTC)n | 6-29 | *(> 34-40) | 200-900 | (GAGGAA) (threshold ≥ 34-40 pure repeats)
*A possible mutagenic intermediate length. Not all diseases are associated with premutation and protomutation clinical or repeat length status.
with the Jm calculations, or if the repeat had a quite different writhe from that of mixed sequence DNA. Bacolla et al. [7••] suggested that (CTG)n·(CAG)n and (CGG)n·(CCG)n repeats are both more flexible and more highly writhed than random B-DNA. Chastain and Sinden showed that short (CTG)n·(CAG)n tracts are inherently flexible by examining the effect of triplet repeats upon the electrophoretic mobility of curved and noncurved DNA polymers [5•]. Triplet repeats interspersed between regions of straight DNA showed a slightly reduced electrophoretic mobility, excluding the possibility that (CTG)n·(CAG)n repeat tracts possess static curvature. When (CTG)4·(CAG)4 was interspersed between A-tract curves, the reduced electrophoretic mobility due to DNA curvature was dramatically abrogated, consistent with interspersion of a torsionally flexible and/or bendable region that would destroy the helical phasing required for A-tract curvature. The effect of (CTG)4·(CAG)4 was almost identical to that afforded by two consecutive T·T mismatches [(TT)·(TT)]. This study shows that even the very short (CTG)4·(CAG)4 tract imparts a very large degree of flexibility to a DNA helix [5•].
Differential assembly of (CTG)n·(CAG)n and (CGG)n·(CCG)n repeats into nucleosomes
(CTG)n·(CAG)n and (CGG)n·(CCG)n tracts exhibit opposite behavior with respect to nucleosome assembly.
(CTG)n·(CAG)n repeats assemble into nucleosomes with much higher efficiency than any other DNA sequences
Table 2
Summary of the properties and structures of (CGG)n·(CCG)n, (CTG)n·(CAG)n and (GAA)n·(TTC)n repeats.
Repeat | Alternative structure | Properties of duplex DNA
(CGG)n·(CCG)n | S-DNA, SI-DNA | Fast electrophoretic gel mobility; hyperflexible; resists assembly into nucleosomes
(CGG)n | Intrastrand hairpins, interstrand duplexes, quadruplex DNA |
(CCG)n | Intrastrand hairpins, interstrand duplexes |
(CTG)n·(CAG)n | S-DNA, SI-DNA | Fast electrophoretic gel mobility; hyperflexible; preferential assembly into nucleosomes
(CTG)n | Intrastrand hairpins, interstrand duplexes |
(CAG)n | Intrastrand hairpins |
(GAA)n·(TTC)n | Intramolecular triplex DNA |
(GAA)n | ? |
(TTC)n | ? |
yet identified [8-10], whereas (CGG)n·(CCG)n repeats assemble into nucleosomes with the least efficiency of any known sequence [11,12]. This difference is surprising given the general observation that flexible DNAs and DNAs that contain inherently curved regions typically assemble preferentially into nucleosomes [13]. Presumably the inherent flexibility of the (CTG)n·(CAG)n repeats allows the sequence to easily adopt the conformation required to bend stably around nucleosomes. (CGG)n·(CCG)n repeats are inherently flexible, although they may not easily adopt the stable radius of curvature required for the assembly of DNA into nucleosomes. Length-dependent differences have been reported for the effects of methylation at CpG dinucleotides within (CGG)n·(CCG)n repeats; methylated short repeats (n = 13) and long repeats (n = 74-76) assemble into nucleosomes more and less easily than nonmethylated repeats, respectively [12,14]. Both triplet repeats are flexible [5•,7••], although they almost certainly have very different biological properties in terms of chromatin organization upon expansion in the disease state, and these properties also depend upon the state of methylation.
Inter- and intramolecular association of triplet repeats: mismatched duplexes and hairpins
Mismatched duplexes and hairpins
Single-stranded oligonucleotides of (CCG)n or (CGG)n repeats can form intermolecular duplexes and intramolecular hairpin duplexes (Figure 1). On native polyacrylamide gels, the migration of intramolecular triplet hairpins is characteristically faster than that of their complementary base-paired duplexes [15-23], whereas the migration of the intermolecular duplexes is slower than that of their corresponding Watson-Crick duplexes [20,23,24]. Two alternative schemes of inter- or intrastrand (hairpin) mispairing exist for (CGG)n repeats: one in which the G of the GpC step participates in a (G)·(G) mismatch
Figure 1 [panels: (a) Watson-Crick; (b) Interstrand, including the e-motif; (c) Hairpins]
Structures assumed by the individual single strands of triplet repeats. (a) Canonical Watson-Crick (CGG)n·(CCG)n and (CTG)n·(CAG)n duplexes between complementary strands containing triplet repeats (i and ii, respectively). (b) Intermolecular duplexes formed between individual (CGG)n, (CCG)n, and (CTG)n strands (i, ii and iii, respectively). Intermolecular duplexes have not been reported for (CAG)n strands. Single strands of (CCG)n can form an interstrand duplex, called an e-motif, in which the Cs of the GpC dinucleotide step are extrahelical. (CTG)n and (CGG)n duplexes contain T·T and G·G mispairs, respectively. (c) A variety of intramolecular duplexes (hairpins) have been reported for (CGG)n, (CCG)n, (CTG)n (v) and (CAG)n (vi) repeats. (CGG)n and (CCG)n repeats can adopt two different folding patterns in which the G·G or C·C mispair is flanked by either (CG)·(CG) or (GC)·(GC) (i or ii and iii or iv). The preferred structures are indicated in bold. The larger font denotes the mispaired or extrahelical bases.
(Figure 1ci); the other in which the G of the CpG step participates in a (G)·(G) mismatch (Figure 1cii). The former is the preferred conformation [20,21]. Mitas and colleagues [16] reported structures with the latter mispairing scheme, however, and this exception may be due to the length of the repeat tract or the fact that the authors used sequences with nonrepeating flanking sequences that hydrogen bond, 'forcing' the observed structural alignment. For repeat lengths of n = 5-11, the equilibrium between the (CGG)n hairpin and the [(CGG)n]2 duplex is affected by n and the salt concentration. Low values of n (n < 10) and high salt concentration (200 mM NaCl) favor the duplex conformation [20]. With (CGG)11 only the hairpin conformation exists [24]. The loops of the CGG hairpins are sensitive to diethyl pyrocarbonate modification [22]. (CCG)n hairpins can contain mismatched Cs that involve the C of the CpG and GpC steps (Figure 1ciii and iv). The former appears to be the preferred conformation [20], but this may be influenced by the length [21]. (CCG)n, with n = 2-5, forms a staggered interstrand duplex stabilized by Watson-Crick (CG)·(CG) base pairs, in which the Cs of the GpC dinucleotide are extrahelical and located diagonally across the minor groove (called an e-motif) (Figure 1bii) [25]. The extrahelical Cs appear to be in equilibrium with the protonated stacked form. The longer (CCG)15 tract appears to form an intramolecular hairpin [26•] with characteristics very similar to an e-motif in which the C residues are unpaired and extrahelical. (CCG)n hairpins are reportedly more stable than (CGG)n hairpins [20]. Single-stranded (CTG)10 or (CTG)30 oligonucleotides can form inter- and intramolecular duplexes (Figure 1b,c) [23]. (CAG)10 and (CAG)30, however, only form intrastrand hairpin structures. The hairpin loop typically contains three or four bases [27].
The tips of the hairpin loops of both (CTG)n and (CAG)n hairpins are sensitive to P1 nuclease [17,18]. Only at high temperatures (60-70°C) were the (T)·(T) mismatches in the (CTG)n hairpin sensitive to potassium permanganate modification [18], suggesting a B-like conformation for the hairpin stem.
Hairpin base·base mispairs
In (CGG)n hairpins with n = 11-15, the (G)·(G) mismatches appear to be hydrogen bonded [22,24], whereas with n = 3, the mismatches appear to be unpaired and flexible in various dynamic conformational isomers [21]. Additionally, the mismatches alternate G(syn)·G(anti) throughout the CGG hairpin [16,20,24]. Although no C·C interactions should exist in the e-motif, hydrogen bonds have been detected in (CCG)n hairpins with n = 5-7. This is consistent with the presence of a (C)·(C) mispair [21,24]. Longer (CCG)n molecules, with n > 15, appear to form an e-motif-like intramolecular hairpin [26•], in which the cytosine residues are unpaired and extrahelical.
In (CTG)n hairpins, the (T)·(T) mismatches are stacked into the helix and appear to be stabilized by two hydrogen bonds [21,27,28], similar to a wobble (G)·(T) mispair [29]. In (CAG)n hairpins, the (A)·(A) mismatches exhibit conformational instability, showing little, if any, base-base interaction [21,28]. These results provide an explanation for the reduced biophysical stability of CAG hairpins compared with CTG hairpins [17,23,28,30]. In either the (CGG)n or (CCG)n intermolecular duplexes or intramolecular hairpins, the cytosine residues of the CpG step are either (C)·(C) mismatches, or are destabilized by either the adjacent mispaired (G)·(G) residues or the adjacent extrahelical cytosine residues (e-motif) (Figure 1). Interestingly, the cytosine residues of the CpG step are methylated in expanded FRAXA repeats. Presumably, the destabilization facilitates the flipping of the cytosine away from the helix and into the active site of the CpG methylase.
Thermodynamics of mispaired duplexes
Analysis of the thermodynamic parameters (including Tm and ΔG at 37°C) of single-stranded repeats revealed that the formation of intrastrand structures (hairpins) from Watson-Crick duplexes is favored for the disease-associated (CTG)n·(CAG)n and (CGG)n·(CCG)n repeats relative to the nondisease-associated (GTC)n·(GAC)n repeats [21]. Petruska et al. [23] found the Tm values of (CTG)n and (CAG)n hairpins to be nearly independent of length: between n = 10 and n = 30 the Tm differed by 0°C and 1°C, respectively. Similarly, only a minor difference of 7°C existed between the Tm values for (CTG)10·(CAG)10 and (CTG)30·(CAG)30 duplexes. These findings contrast with the predicted direct relationship between repeat length and Tm [17,30]. The actual length independence of Tm suggests that rather than forming a single large hairpin, the longer tract may form multiple short hairpin structures [23].
Triplex DNA structures in (GAA)n·(TTC)n
Homopurine·homopyrimidine tracts that contain mirror repeat symmetry can form four different intramolecular triplex DNA structural isomers (see, for review, [31]). This sequence requirement is met by the (GAA)n·(TTC)n tract, which forms intramolecular triplex DNA (Figure 2a) [32]. Within this sequence, the 3' half of the TTC strand folds into the major groove of the other half of the Pu·Py tract, thus forming Hoogsteen hydrogen bonds and creating C+·G·C and T·A·T base triads (the first-listed base of each triad represents the third strand). The formation of this DNA structure is favored by a low pH (as a result of the required protonation of the third strand cytosine) and DNA supercoiling. These structures can exist in cells [31]. An intermolecular triplex may also form if an appropriate single-stranded triplet-containing strand is available (Figure 2b).
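The sequence requirement stated above (a homopurine·homopyrimidine tract with mirror repeat symmetry) can be illustrated with a minimal Python sketch. It assumes the usual definition of a mirror repeat as a stretch that reads identically in both directions along a single strand; the function names `is_homopurine` and `longest_mirror_repeat` are hypothetical helpers introduced here for illustration and are not taken from the cited studies.

```python
PURINES = set("AG")


def is_homopurine(seq: str) -> bool:
    """True if every base of a non-empty sequence is a purine (A or G)."""
    seq = seq.upper()
    return len(seq) > 0 and all(base in PURINES for base in seq)


def longest_mirror_repeat(seq: str) -> str:
    """Longest stretch that reads the same 5'->3' and 3'->5' on ONE strand.

    This is a plain string palindrome (a mirror repeat), not an inverted
    repeat, which would involve the reverse complement.
    """
    seq = seq.upper()
    best = ""
    for centre in range(len(seq)):
        # Try odd-length (centre, centre) and even-length (centre, centre + 1) cores.
        for lo, hi in ((centre, centre), (centre, centre + 1)):
            while lo >= 0 and hi < len(seq) and seq[lo] == seq[hi]:
                lo -= 1
                hi += 1
            candidate = seq[lo + 1:hi]
            if len(candidate) > len(best):
                best = candidate
    return best


# Illustrative tract only: five GAA repeats on the purine strand.
tract = "GAA" * 5
print(is_homopurine(tract))          # True
print(longest_mirror_repeat(tract))  # AAGAAGAAGAAGAA
```

In this toy check, nearly the whole (GAA)5 purine strand is mirror symmetric, whereas a (CTG)n strand is neither homopurine nor mirror symmetric beyond single bases, which is in line with triplex formation being discussed here only for (GAA)n·(TTC)n.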
cluded the formation of any other quadruplex, and it is not known whether this would be a preferred folding scheme within longer repeats. Many triplet repeat tracts clearly form hairpins, and these triplet repeats may fold into quadruplex or other higher order structures. A rigorous demonstration of any such structures, however, is not yet available.
Figure 2 [panels: (a) intramolecular triplex; (b) intermolecular triplex]
Slipped-stranded DNA structures
There are two types of slipped-stranded DNA structures: 'homoduplex' slipped structures (S-DNA) [36], which are formed between two complementary strands with the same number of repeats paired in an out-of-register fashion, and 'heteroduplex' slipped intermediates (SI-DNA) [37•], in which the strands have different numbers of repeats (Figure 4). SI-DNA may arise through replication slippage, resulting in repeat expansions or deletions (Figure 4a), whereas strand slippage in the absence of replication will result in S-DNA (Figure 4b).
The triplex DNA structures formed by (GAA)n·(TTC)n repeats. (GAA)n·(TTC)n repeats can form (a) intramolecular triplex DNA structures in which the 3' half of the TTC tract folds into the major groove of the 5' half of the repeat tract [31]. (b) Intermolecular triplex structures may also form provided a single pyrimidine-rich strand is available. A filled bullet (•) indicates Watson-Crick hydrogen bonds and a hollow bullet (○) indicates Hoogsteen hydrogen bonds.
Formation of quadruplex DNA structures within triplet repeats
Several reports have shown that, under some conditions, quadruplex DNA structures (Figure 3) can form within single-stranded CGG tracts. Fry and Loeb [33] showed that short oligonucleotides composed of CGG residues (but not CCG) formed complexes in the presence of K+, Na+, or Li+ ions that migrated as species of four times their molecular weight on polyacrylamide gels. Usdin and Woodford [34] have interpreted specific pauses in in vitro replication as an indication of quadruplex formation. The pause required the presence of K+ ions and the CGG template strand, conditions required for quadruplex formation. Chemical modification studies, including protection of the N7 of guanine [33,34], have supported the contention that guanines are involved in Hoogsteen hydrogen bonding, a characteristic of quadruplex DNA (Figure 3e). Kettani et al. [35] detected the quadruplex with two G·C·G·C tetrads flanked by two G·G·G·G tetrads (G quartets) shown in Figure 3d. The Kettani structure, which was expected from studies of the folding of an intramolecular hairpin (Figure 3f), is different from that proposed by Usdin and Woodford [34] (Figure 3e). The design of the Kettani sequence probably pre-
Although long proposed (see, for reviews, [1•,31]), the first biochemical evidence for the existence of slipped-stranded DNA structures (S-DNA) was obtained using DNA containing (CTG)n·(CAG)n and (CGG)n·(CCG)n repeats [36]. S-DNA formed following the denaturation and renaturation (reduplexing) of DNA. S-DNAs migrate in polyacrylamide gels as a broad distribution of distinct products ranging up to twice their actual size. This reduced gel mobility is consistent with that expected for DNA containing bends introduced from the three- and four-way junctions generated by the looped-out strands [38]. The broad distribution was as expected from the multiple possibilities for strand slippage involving different lengths of loop-outs at multiple locations throughout the repeat tract (see Figure 4b). In fact, the complexity of the S-DNA populations, as well as the percentage of S-DNA formed, increased with increasing length of the repeat tract [36,37•,39••,40•]. The effect of length on the propensity of S-DNA formation correlates with the effect of length on genetic stability. Electron microscopic analyses confirm that S-DNA secondary structure occurs within the repeat tract, that the S-DNA is subtended by linear mixed-sequence duplex DNA, and that the S-DNAs are a heterogeneous population of products [39••]. S-DNA does not require superhelical tension for formation or stability; the structures are remarkably thermostable, and display minimal interconversion between isomers and the linear duplex form under physiological conditions [36,37•,39••]. If the looped-out regions of the slipped-stranded DNA were unpaired, then branch migrations could occur, resulting in the reformation of linear duplex DNA. The remarkable stability of (CTG)n·(CAG)n and (CGG)n·(CCG)n S-DNAs may result from the formation of intrastrand hairpins, as branch migration would require breaking base pairs at the slipped-out junction and within the slipped-out hairpin.
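The link between tract length and the complexity of the S-DNA population can be made concrete with a deliberately simple counting sketch in Python. The model is an assumption made here for illustration and is not taken from the cited work: each strand of an n-repeat tract carries exactly one loop-out, both loop-outs contain the same number of repeats (so the strands stay equal in length), and the two loop-outs sit at different junctions between paired repeats. The function name `count_slipped_isomers` and the parameter `max_loop` are hypothetical.

```python
from typing import Optional


def count_slipped_isomers(n_repeats: int, max_loop: Optional[int] = None) -> int:
    """Count out-of-register pairings in a toy model of S-DNA.

    Assumed model (illustrative only): one loop-out per strand, both of
    the same size, placed at two different junctions of the paired region.
    """
    if n_repeats < 2:
        return 0  # a single repeat cannot pair out of register
    if max_loop is None:
        max_loop = n_repeats - 1
    total = 0
    for loop in range(1, min(max_loop, n_repeats - 1) + 1):
        paired = n_repeats - loop   # repeats left in Watson-Crick pairs
        sites = paired + 1          # junctions where a loop-out can sit
        total += sites * (sites - 1)  # distinct sites for the two strands
    return total


for n in (10, 30):
    print(n, count_slipped_isomers(n))
# 10 330
# 30 8990
```

Even under these restrictive assumptions the number of possible isomers grows roughly with the cube of the repeat number, which is qualitatively in line with the broad, length-dependent distribution of S-DNA products described above. Real S-DNA populations may contain several loop-outs per strand, so the counts are not estimates of actual isomer numbers.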
(CTG)50·(CAG)50 and (CTG)255·(CAG)255 S-DNAs have recently been visualized by electron microscopy [39••]. The
Figure 3 [panels: (a)-(f)]
The quadruplex structures formed by triplet repeats. Guanine-rich DNA sequences can form a variety of quadruplex DNA structures including those structures shown in (a-c) (see, for a review, [31]). Short tracts of thymine (not shown for clarity) typically intersperse guanine tracts. (d) The quadruplex structure detected by Kettani et al. [35]. (e) The quadruplex structure suggested by Usdin and Woodford [34]. This could form within a single (CGG)n tract. The duplex/hairpin precursor shown, involving G·G Hoogsteen hydrogen bonds, has not been observed and quadruplex formation probably occurs without formation of this intermediate. The precursor shown with the Kettani-type quadruplex (f) is the preferred conformation of a (CGG)n intrastrand hairpin. The quadruplex shown in (e) may be more probable than the Kettani-type quadruplex due to a higher proportion of G-quartets.
major heterogeneous (CTG)50·(CAG)50 S-DNA population showed single and multiple bends and kinks as expected for short slipped-out regions (15 repeats). A slower migrating minor (CTG)50·(CAG)50 S-DNA population and the (CTG)255·(CAG)255 S-DNA population both contained visible branches and loops, three-way, and four-way (cruciform-like) junctions with variable arm lengths (involving 15-50 repeats). For all molecules, the structures were observed at random locations throughout the repeat tract, as expected of a heterogeneous population of S-DNAs (Figures 4b and 5). S-DNAs have a reduced contour length relative to linear duplex DNAs, indicating that the strands are in fact slipped with respect to each other [39••]. All data are consistent with S-DNAs being slipped-stranded structures.
The effect of sequence interruptions upon S-DNA formation
Sequence interruptions within several trinucleotide repeat tracts confer an increased genetic stability upon the repeat
Figure 4 [panel labels: (a) 1st round of replication, 2nd round of replication, expansion product, deletion product; (b) S-DNA isomers (i)-(iii)]
S-DNA and SI-DNA. (a) Replication slippage model showing SI-DNA. During synthesis of the triplet repeat, slippage of the nascent strand forwards (i) or backwards (ii) on the template strand results in the formation of heteroduplex slipped intermediate DNAs (SI-DNAs). Following continued replication, the DNA contains an excess of repeats in one strand. This will subsequently lead to expansion (i) or deletion (ii) products. Thin lines represent repeat tracts, and broad lines are non-repetitive flanks. (b) Various structural isomers of slipped-stranded DNAs (S-DNAs) can form following denaturation and renaturation of a triplet repeat tract in an out-of-register fashion (i-iv). Looped-out regions can be of variable size or number, and they can be positioned throughout the repeat tract (see, for discussion, [36]).
tracts (Table 1) [1•]. In the SCA1, SCA2 and FRAXA loci, the repeat tracts (CAG)n·(CTG)n, (CAG)n·(CTG)n and (CGG)n·(CCG)n are normally interrupted with (CAT)·(ATG), (CAA)·(TTG), and (AGG)·(CCT), respectively (Table 1). Loss of these interruptions results in a longer length of the pure tract, and correlates with genetic instability, expansion and disease. Sequence interruptions dramatically reduce the propensity of S-DNA formation and the complexity (or heterogeneity) of the S-DNAs formed [40•]. In fact, sequence interruptions can abolish or reduce the amount of certain S-DNA isomers present. The reduced propensity of S-DNA formation may be a result of the reduced stability of the DNA that contains the mispaired interruptions either in interstrand slippages or in intrastrand hairpins (Figure 5a). In this fashion, the interruptions would effectively anchor the repeat tracts in an in-frame Watson-Crick base-pairing conformation. This would limit the number of S-DNA isomers formed within the interrupted repeat tracts (compare Figure 5b and c). The inhibitory structural phenomenon specific to interruptions of either (CAG)n·(CTG)n or (CGG)n·(CCG)n repeats suggests a possible molecular explanation for the biology associated with the protective mechanism of interruptions; interruptions may inhibit the formation of slipped mutagenic intermediates during replication and, in this fashion, inhibit genetic length changes. Thus, the effects of both the length and the purity of the repeat tract on the propensity of S-DNA formation correlate with their effects on genetic stability and disease.
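The idea that interruptions shorten the pure tract available for slippage can be sketched in a few lines of Python. The allele layouts below are hypothetical and chosen only for illustration; they are not sequences reported in the source, and `split_triplets` and `longest_pure_run` are helper names introduced here.

```python
from typing import List


def split_triplets(tract: str) -> List[str]:
    """Split a repeat tract into consecutive, in-frame triplets."""
    tract = tract.upper()
    if len(tract) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    return [tract[i:i + 3] for i in range(0, len(tract), 3)]


def longest_pure_run(tract: str, unit: str = "CAG") -> int:
    """Length, in repeats, of the longest uninterrupted run of `unit`."""
    longest = 0
    current = 0
    for triplet in split_triplets(tract):
        current = current + 1 if triplet == unit else 0
        longest = max(longest, current)
    return longest


# Hypothetical alleles, both 29 triplets long.
pure = "CAG" * 29
interrupted = "CAG" * 12 + "CAT" + "CAG" + "CAT" + "CAG" * 14

print(longest_pure_run(pure))         # 29
print(longest_pure_run(interrupted))  # 14
```

In this sketch, two CAT interruptions roughly halve the longest pure run even though the overall tract length is unchanged. Under the model in the text, it is the longer pure tract that results from loss of such interruptions that correlates with S-DNA formation and instability.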
Slipped intermediate DNA structures (SI-DNA)
SI-DNAs have been formed by reannealing DNAs containing two different lengths of repeats [37•]. SI-DNAs migrate more slowly in polyacrylamide gels than their respective linear duplexes or S-DNAs that are formed from either length of repeat. Two sister SI-DNAs composed of (CTG)m·(CAG)n and (CTG)n·(CAG)m (where n ≠ m) could be electrophoretically separated, suggesting that they are structurally distinct [37•]. Similarly, two sister (CGG)·(CCG) SI-DNAs could also be electrophoretically separated (CE Pearson, unpublished data). The structural differences between SI-DNAs containing either looped-out CTG or CAG repeats may reflect the different single-stranded character of the looped-out strands [36,37•]. In S-DNA, the CAG strand is preferentially susceptible to mung bean nuclease compared to the CTG strand [36], indicating the greater single-stranded character of the CAG strand (consistent with the lower thermal stability of the CAG hairpin relative to the CTG hairpin). Structural differences between sister SI-DNAs have also been detected by the binding of the human mismatch repair protein hMSH2 [37•].
Conclusions
DNA is not a passive participant in its procreation. Through subtle variations in helix shape, DNA controls its intimate association with regulatory proteins. Through variations in helix trajectory, stiffness, and flexibility, the DNA can specify either its pattern of conjugal assembly with nucleosomes, or its organization into specific three-dimensional structures that regulate gene expression or facilitate genetic recombination. Within the past seven years, a novel type of repeat expansion mutation leading to human disease, called dynamic mutation [2], has been identified in (CTG)n·(CAG)n, (CGG)n·(CCG)n, and (GAA)n·(TTC)n repeats. The working hypothesis in this field is that an unusual property or structure of the helix, or the formation of DNA secondary structures such as a hairpin or slipped-stranded structures, is responsible for the unusual biology associated with these repeats. In addition to genetic instability in humans, these repeats exhibit orientation-dependent deletion in yeast and bacteria [41-44]. Their stability is influenced by mutations in proteins that are involved in DNA replication and repair [45-47]; they can pause replication forks in vivo [48•], and can direct intramolecular strand switching during replication in vitro [49]. The progress summarized here reveals previously unrecognized dynamic properties of DNA including
Figure 5 [panels: (a) Slippage inhibition: linear duplex, interstrand slip-outs, interstrand slippage; (b) Interrupted tract: linear duplex, limited slipped isomers with short slip-outs; (c) Pure tract: linear duplex, multiple slipped isomers]
Models for the protective mechanism of repeat sequence interruptions. The open circles represent (CAG)·(CTG) and (CGG)·(CCG) repeats, and the filled circles represent the (CAT)·(ATG) and (AGG)·(CCT) interruptions. (a) The reduced percentage of S-DNA formed by an interrupted repeat tract relative to a pure (uninterrupted) tract may result from the reduced stability of the inter- and intrastrand slipped DNAs containing these interruptions. These sequence interruptions will destabilize both interstrand and intrastrand duplexes. (b) Sequence interruptions may limit the position, and thereby the length, of looped-out regions. Loop-outs in sequence-interrupted repeat tracts may occur preferentially between the sequence interruptions or form with the sequence interruptions at the tip of a hairpin loop (not shown). (c) Pure repeat tracts can have loop-out strands of variable lengths throughout the repeat tract.
unusual helical properties, such as very flexible DNA, and alternative DNA secondary structures, including a variety of intrastrand hairpins containing mispairs, slipped-stranded DNA, and intramolecular triplex and quadruplex structures. Clearly, (CTG)n·(CAG)n, (CGG)n·(CCG)n, and (GAA)n·(TTC)n repeats represent dynamic DNA associated with dynamic mutations. Much has been revealed, although any direct relationship between DNA structure and locus-specific expansion leading to human disease remains, at present, an enigma.
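The interruption model of Figure 5 can be illustrated with a toy calculation. The Python sketch below is a minimal illustration under strong assumptions, not the authors' method: a tract is treated as a list of repeat units, slippage is modelled as shifting the complementary strand by a whole number of units, and a slipped register counts as fully paired only if every overlapping unit still faces its complement. The function names (`slipped_mismatches`, `perfect_slip_registers`) and the example tracts are hypothetical, and the sketch ignores the energetics that determine real S-DNA stability.

```python
from typing import List


def slipped_mismatches(units: List[str], shift: int) -> int:
    """Count mismatched unit pairs when the complementary strand slips by `shift` units.

    Assumption (toy model): the bottom-strand unit at index j is the exact
    complement of top-strand unit j, so top unit i pairs correctly with
    bottom unit i + shift only if units[i] == units[i + shift].
    """
    if not 0 <= shift < len(units):
        raise ValueError("shift must satisfy 0 <= shift < number of units")
    return sum(1 for a, b in zip(units, units[shift:]) if a != b)


def perfect_slip_registers(units: List[str]) -> List[int]:
    """Return the non-zero shifts at which every overlapping unit pair still matches."""
    return [k for k in range(1, len(units)) if slipped_mismatches(units, k) == 0]


# Hypothetical example tracts: ten pure CAG units, and the same tract
# with a single CAT interruption as the fifth unit (index 4).
pure = ["CAG"] * 10
interrupted = ["CAG"] * 4 + ["CAT"] + ["CAG"] * 5

print(len(perfect_slip_registers(pure)))    # count of mismatch-free registers, pure tract
print(perfect_slip_registers(interrupted))  # mismatch-free registers, interrupted tract
```

In this toy model every slipped register of the pure tract remains mismatch-free, whereas the interrupted tract is mismatch-free only in registers where the interruption falls outside the re-paired region, in the spirit of panels (b) and (c).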
Note added in proof
While this manuscript was in preparation, two publications reported the formation of triplex DNA structures in (GAA)·(TTC) repeats [50,51], and one demonstrated the in vivo formation of hairpins from long inverted repeats having (CGG)·(CCG) repeats at their centre [52].
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. • Pearson CE, Sinden RR: Slipped strand DNA, dynamic mutations, and human disease. In Genetic Instabilities and Hereditary Neurological Diseases. Edited by Wells RD, Warren ST. New York: Academic Press; 1998:585-621.
This article contains a thorough review of triplet-repeat diseases, the DNA sequences involved, and expansion models. It is part of a comprehensive book describing the current state of this field.
2. Richards RI, Sutherland GR: Heritable unstable DNA sequences and human genetic disease. Cell 1992, 70:709-712.
3. Kohwi Y, Wang H, Kohwi-Shigematsu T: A single trinucleotide, 5'AGC3'/5'GCT3', of the triplet-repeat disease genes confers metal ion-induced non-B DNA structure. Nucleic Acids Res 1993, 21:5651-5655.
4. Chastain PD, Eichler EE, Kang S, Nelson DL, Levene SD, Sinden RR: Anomalous rapid electrophoretic mobility of DNA containing triplet repeats associated with human disease genes. Biochemistry 1995, 34:16125-16131.
5. • Chastain PD, Sinden RR: CTG repeats associated with human genetic disease are inherently flexible. J Mol Biol 1998, 275:405-411.
This paper demonstrates that (CTG)·(CAG) repeats are unusually flexible. Interspersion of (CTG)4·(CAG)4 between A-tract curves abrogates overall DNA curvature to the same extent as interspersion of two consecutive T·T mismatches. This is a remarkable degree of flexibility for a very short repeat tract.
6. Levene SD, Zimm BH: Understanding the anomalous electrophoresis of bent DNA molecules: a reptation model. Science 1989, 245:396-399.
7. •• Bacolla A, Gellibolian R, Shimizu M, Amirhaeri S, Kang S, Ohshima K, Larson JE, Harvey S, Stollar BD, Wells RD: Flexible DNA: genetically unstable CTG·CAG and CGG·CCG from human hereditary neuromuscular disease genes. J Biol Chem 1997, 272:16783-16792.
This very careful analysis of cyclization kinetics measurements for series of (CTG)n·(CAG)n and (CGG)n·(CCG)n repeats provides the bending and torsional moduli, persistence length, and helix repeat values. This paper presents the first demonstration of the flexibility of these triplet repeats.
8. Wang Y-H, Amirhaeri S, Kang S, Wells RD, Griffith JD: Preferential nucleosome assembly at DNA triplet repeats from the myotonic dystrophy gene. Science 1994, 265:669-671.
9. Wang Y-H, Griffith JD: Expanded CTG triplet blocks from the myotonic dystrophy gene create the strongest known natural nucleosome positioning elements. Genomics 1995, 25:570-573.
10. Godde JS, Wolffe AP: Nucleosome assembly on CTG triplet repeats. J Biol Chem 1996, 271:15222-15229.
11. Wang Y-H, Gellibolian R, Shimizu M, Wells RD, Griffith J: Long CCG triplet repeat blocks exclude nucleosomes - a possible mechanism for the nature of fragile sites in chromosomes. J Mol Biol 1996, 263:511-516.
12. Godde JS, Kass SU, Hirst MC, Wolffe AP: Nucleosome assembly on methylated CGG triplet repeats in the fragile X mental retardation gene 1 promoter. J Biol Chem 1996, 271:24325-24328.
13. Shrader TE, Crothers DM: Effects of DNA sequence and histone-histone interactions on nucleosome placement. J Mol Biol 1990, 216:69-84.
14. Wang Y-H, Griffith J: Methylation of expanded CCG triplet repeat DNA from fragile X syndrome patients enhances nucleosome exclusion. J Biol Chem 1996, 271:22937-22940.
15. Mitchell JE, Newbury SF, McClellan JA: Compact structures of d(CNG)n oligonucleotides in solution and their possible relevance to fragile X and related human genetic diseases. Nucleic Acids Res 1995, 23:1876-1881.
16. Mitas M, Yu A, Dill J, Haworth IS: The trinucleotide repeat sequence d(CGG)15 forms a heat-stable hairpin containing Gsyn·Ganti base pairs. Biochemistry 1995, 34:12803-12811.
17. Mitas M, Yu A, Dill J, Kamp TJ, Chambers EJ, Haworth IS: Hairpin properties of single-stranded DNA containing a GC-rich triplet repeat: (CTG)15. Nucleic Acids Res 1995, 23:1050-1059.
18. Yu A, Dill J, Mitas M: The purine-rich trinucleotide repeat sequences d(CAG)15 and d(GAC)15 form hairpins. Nucleic Acids Res 1995, 23:4055-4057.
19. Yu A, Dill J, Wirth SS, Huang G, Lee VH, Haworth IS, Mitas M: The trinucleotide repeat sequence d(GTC)15 adopts a hairpin conformation. Nucleic Acids Res 1995, 23:2706-2714.
20. Chen X, Mariappan SVS, Catasti P, Ratliff R, Moyzis RK, Laayoun A, Smith SS, Bradbury EM, Gupta G: Hairpins are formed by the single strands of the fragile X triplet repeats: structure and biological implications. Proc Natl Acad Sci USA 1995, 92:5199-5203.
21. Zheng M, Huang X, Smith GK, Yang X, Gao X: Genetically unstable CXG repeats are structurally dynamic and have a high propensity for folding. An NMR and UV spectroscopic study. J Mol Biol 1996, 264:323-336.
22. Nadel Y, Weisman-Shomer P, Fry M: The fragile X syndrome single strand d(CGG)n nucleotide repeats readily fold back to form unimolecular hairpin structures. J Biol Chem 1995, 270:28970-28977.
23. Petruska J, Arnheim N, Goodman MF: Stability of intrastrand hairpin structures formed by the CAG/CTG class of DNA triplet repeats associated with neurological diseases. Nucleic Acids Res 1996, 24:1992-1998.
24. Mariappan SV, Catasti P, Chen X, Ratliff R, Moyzis RK, Bradbury EM, Gupta G: Solution structures of the individual single strands of the fragile X DNA triplets (GCC)n·(GGC)n. Nucleic Acids Res 1996, 24:784-792.
25. Gao X, Huang X, Smith GK, Zheng M, Liu H: A new antiparallel duplex motif of DNA CCG repeats that is stabilized by extrahelical bases symmetrically localized in the minor groove. J Am Chem Soc 1995, 117:8883-8884.
26. • Yu A, Barron MD, Romero RM, Christy M, Gold B, Dai JL, Gray DM, Haworth IS, Mitas M: At physiological pH, d(CCG)15 forms a hairpin containing protonated cytosines and a distorted helix. Biochemistry 1997, 36:3687-3699.
This paper presents a careful analysis of the characteristics of a d(CCG)15 intramolecular hairpin, which forms the 'e-motif'. The methods used in this paper are representative of those used in many similar analyses of this and other triplet repeats published during 1995 and 1996.
27. Mariappan SV, Garcia AE, Gupta G: Structure and dynamics of the DNA hairpins formed by tandemly repeated CTG triplets associated with myotonic dystrophy. Nucleic Acids Res 1996, 24:775-783.
28. Smith GK, Jie J, Fox GE, Gao X: DNA CTG triplet repeats involved in dynamic mutations of neurologically related gene sequences form stable duplexes. Nucleic Acids Res 1995, 23:4303-4311.
29. Rabinovich D, Haran T, Eisenstein M, Shakked Z: Structures of the mismatched duplex d(GGGTGCCC) and one of its Watson-Crick analogues d(GGGCGCCC). J Mol Biol 1988, 200:151-161.
30. Gacy AM, Goellner G, Juranic N, Macura S, McMurray CT: Trinucleotide repeats that expand in human disease form hairpin structures in vitro. Cell 1995, 81:533-540.
31. Sinden RR: DNA Structure and Function. San Diego: Academic Press; 1994.
32. Ohshima K, Kang S, Larson JE, Wells RD: Cloning, characterization, and properties of seven triplet repeat DNA sequences. J Biol Chem 1996, 271:16773-16783.
33. Fry M, Loeb LA: The fragile X syndrome d(CGG)n nucleotide repeats form a stable tetrahelical structure. Proc Natl Acad Sci USA 1994, 91:4950-4954.
34. Usdin K, Woodford KJ: CGG repeats associated with DNA instability and chromosome fragility form structures that block DNA synthesis in vitro. Nucleic Acids Res 1995, 23:4202-4209.
35. Kettani A, Kumar RA, Patel DJ: Solution structure of a DNA quadruplex containing the fragile X syndrome triplet repeat. J Mol Biol 1995, 254:638-656.
36. Pearson CE, Sinden RR: Alternative DNA structures within the trinucleotide repeats of the myotonic dystrophy and fragile X locus. Biochemistry 1996, 35:5041-5053.
37. • Pearson CE, Ewel A, Acharya S, Fishel RA, Sinden RR: Human MSH2 binds to trinucleotide repeat DNA structures associated with neurodegenerative diseases. Hum Mol Genet 1997, 6:1117-1123.
This paper presents the purification, characterization, and mechanism of binding of the human mismatch repair protein (hMSH2) to SI-DNA and S-DNA. SI-DNA (slipped intermediate DNA) contains heteroduplex triplet repeat regions in which the lengths of the repeat tracts in the complementary strands are different. Sister heteroduplexes migrate to different positions on a polyacrylamide gel. hMSH2 preferentially binds to SI-DNAs containing looped-out CAG residues compared with CTG strands.
38. Lilley DMJ, Clegg RM: The structure of branched DNA species. Q Rev Biophys 1993, 26:131-175.
39. •• Pearson CE, Wang Y-H, Griffith JD, Sinden RR: Structural analysis of slipped strand DNA (S-DNA) formed in (CTG)n·(CAG)n repeats from the myotonic dystrophy locus. Nucleic Acids Res 1998, 26:816-823.
A paper that continues the description of the properties and characteristics of the S-DNA structures formed in (CTG)n·(CAG)n repeats. Following denaturation and renaturation, a complex, heterogeneous distribution of S-DNA isomers was formed. These structures were characterized biochemically and visualized using electron microscopy. The majority of S-DNAs from (CTG)50·(CAG)50 repeats contain short slipped-out regions that are visible only as kinks. Other structures, containing the loops and three- and four-way junctions expected of slipped-stranded structures, are visible, especially in (CTG)255·(CAG)255 S-DNAs.
40. •• Pearson CE, Eichler EE, Lorenzetti D, Kramer SF, Zoghbi HY, Nelson DL, Sinden RR: Interruptions in the triplet repeats of SCA1 and FRAXA reduce the propensity and complexity of slipped strand DNA (S-DNA) formation. Biochemistry 1998, 37:2701-2708.
This paper demonstrates that sequence interruptions within triplet repeat tracts reduce both the percentage and the complexity of the S-DNA formed following denaturation and renaturation. A model is proposed for this effect that correlates with the effect of repeat interruptions upon the stability of repeats in humans.
41. Kang S, Jaworski A, Ohshima K, Wells RD: Expansion and deletion of CTG triplet repeats from human disease genes are determined by the direction of replication in E. coli. Nat Genet 1995, 10:213-218.
42. Maurer DJ, O'Callaghan BL, Livingston DM: Orientation dependence of trinucleotide CAG repeat instability in Saccharomyces cerevisiae. Mol Cell Biol 1996, 16:6617-6622.
43. Freudenreich CH, Stavenhagen JB, Zakian VA: Stability of a CTG/CAG trinucleotide repeat in yeast is dependent on its orientation in the genome. Mol Cell Biol 1997, 17:2090-2098.
44. Miret JJ, Pessoabrandao L, Lahue RS: Instability of CAG and CTG trinucleotide repeats in Saccharomyces cerevisiae. Mol Cell Biol 1997, 17:3382-3387.
45. Rosche WA, Jaworski A, Kang S, Kramer SF, Larson JE, Giedroc DP, Wells RD, Sinden RR: Single-stranded DNA binding protein enhances the stability of CTG triplet repeats in Escherichia coli. J Bacteriol 1996, 178:5042-5044.
46. Jaworski A, Rosche WA, Gellibolian R, Kang S, Shimizu M, Bowater RP, Sinden RR, Wells RD: Mismatch repair in E. coli enhances instability of (CTG)n triplet repeats from human hereditary diseases. Proc Natl Acad Sci USA 1995, 92:11019-11023.
47. Schweitzer JK, Livingston DM: Destabilization of CAG trinucleotide repeat tracts by mismatch repair mutations in yeast. Hum Mol Genet 1997, 6:349-355.
48. • Samadashwily GM, Raca G, Mirkin SM: Trinucleotide repeats affect DNA replication in vivo. Nat Genet 1997, 17:298-304.
This paper reveals stalling of the replication fork within the repeat tract in Escherichia coli. Pausing is dependent upon repeat length and orientation with respect to the origin of plasmid replication. This paper is representative of many showing an in vivo biological effect of triplet repeats, and the authors suggest that the formation of an unusual DNA structure is responsible. To date, however, these papers have not presented biochemical data consistent with the formation of any alternative triplet repeat DNA structure in vivo. This is likely to be an important area for future research.
49. Ohshima K, Wells RD: Hairpin formation during DNA synthesis primer realignment in vitro in triplet repeat sequences from human hereditary disease genes. J Biol Chem 1997, 272:16798-16806.
50. Bidichandani SI, Ashizawa T, Patel PI: The GAA triplet-repeat expansion in Friedreich ataxia interferes with transcription and may be associated with an unusual DNA structure. Am J Hum Genet 1998, 62:111-121.
51. Gacy AM, Goellner GM, Spiro C, Chen X, Gupta G, Bradbury EM, Dyer RB, Mikesell MJ, Yao JZ et al.: GAA instability in Friedreich's ataxia shares a common, DNA-directed and intra-allelic mechanism with other trinucleotide diseases. Mol Cell 1998, 1:583-593.
52. Darlow JM, Leach DRF: Evidence for two preferred hairpin folding patterns in d(CGG)·d(CCG) repeat tracts in vivo. J Mol Biol 1998, 275:17-23.