J. Mol. Biol. (1968) 37, 445-466
Theory of Thermal Transitions in Low Molecular Weight RNA Chains

NEVILLE R. KALLENBACH

Department of Biology, University of Pennsylvania, Philadelphia, Pa., U.S.A.

(Received 23 May 1968, and in revised form 23 August 1968)

A statistical mechanical theory of thermal transitions in short polynucleotide chains which achieve their secondary structure primarily by intramolecular complementary base-pair interactions is described. Thermal transition profiles for a large class of possible arrangements of base-pairs in short RNA chains can be calculated from the theory, using parameters derived from the thermal denaturation of high molecular weight RNA double-helices. Effects of length, base composition and configuration are discussed quantitatively. The theory is applied to calculating melting curves of three yeast tRNA species, for which experimental data are available. In the case of yeast serine tRNA, close agreement is found between the experimental profile and that predicted from the "clover-leaf" structural model. The alanine and tyrosine models lead to profiles sharper than those observed, although the data are not so extensive for these latter species. Certain quantities of interest apart from those usually measured in thermal denaturation studies can be calculated from the theory.
1. Introduction

Studies of the denaturation of helical polynucleotides by a variety of treatments have provided fundamental information concerning the interactions which stabilize these systems in solution. Several theoretical treatments have been presented which analyze this process in terms of a one-dimensional lattice model, in which a number of elements, base-pairs, are assumed to be in either of two alternative states, bonded or unbonded, with a probability which varies during the course of the denaturation (Hill, 1959; Zimm, 1960; Lifson & Zimm, 1963). By fitting calculated thermal transitions for such models to the experimental melting profiles of certain synthetic polynucleotides, Crothers & Zimm (1964) were able to estimate the free energy of stacking of base-pairs within the DNA helix. Their results suggested that explicit recognition of the presence in most native DNA's of two base-pair species, A-T and G-C, with intrinsically different stabilities, was necessary before reasonable agreement between calculated and observed transition profiles could be achieved (Crothers, Kallenbach & Zimm, 1966; Crothers & Kallenbach, 1966). From these calculations it has been shown that the over-all composition of a particular helical structure with respect to A-T or G-C base-pairs, as well as their sequential arrangement in the helix, affect both the breadth and mid-point of the denaturation profiles exhibited by that structure. In particular, where the size of the helical structure is known to be small, the number and arrangement of the two kinds of base-pairs can influence the predicted thermal transitions to a profound extent (Kallenbach & Crothers, 1966).
These theoretical treatments have so far been concerned specifically with double-stranded helical systems. The existence of an appreciable secondary structure in single-stranded polynucleotides has been observed for some years (Doty, Boedtker, Fresco, Haselkorn & Litt, 1959; Spirin, 1960). More recent evidence suggests the presence, in naturally occurring and most synthetic RNA's, of at least two distinct kinds of base interactions which are involved in such secondary structure (see Felsenfeld & Miles, 1967). The first is an extremely short-range stacking of bases which occurs in many oligonucleotides, including dimers, as well as polynucleotides, and which exhibits an essentially non-co-operative absorbance or optical rotation profile on denaturation (Leng & Felsenfeld, 1966; Brahms, Michelson & Van Holde, 1966; Applequist & Damle, 1966). This structure can exist in the absence of any capability for hydrogen bonding (Griffin, Haslam & Reese, 1964; Van Holde, Brahms & Michelson, 1965). The second kind of structure clearly involves hydrogen bonding as originally proposed by Doty et al. (1959) and Spirin (1960), and arises from the presence of regions of mutually complementary nucleotide sequences along the chain (Felsenfeld & Sandeen, 1962; Fresco, Klotz & Richards, 1963; Englander & Englander, 1965; Cantor, Jaskunas & Tinoco, 1966). In addition, higher kinds of structure may be present, in which mutual interaction between the secondary structural elements may be involved, although there are presently no data available concerning the thermodynamics of these interactions (Fresco, Adams, Ascione, Henley & Lindahl, 1966).

In the present communication, a statistical mechanical theory is developed for the case of low-molecular weight single-stranded polynucleotides in which there is appreciable potential for intramolecular helix formation due to interactions between sequences of mutually complementary base-pairs. The effects of helix size and the sequential arrangement of base-pairs within helical regions are discussed, using thermodynamic data on thermal transition temperatures in high molecular weight double-stranded RNA's to estimate the stability difference between A-U and G-C base-pairs. Finally, the theory is applied to the prediction of thermal transition profiles for a number of proposed structures corresponding to several short chain RNA species of known primary structure (Holley et al., 1965; Madison, Everett & Kung, 1966; Zachau et al., 1966; RajBhandary et al., 1967) and a preliminary comparison with experimental profiles for certain of these is attempted.
2. Theory

Consider a solution of single-stranded polynucleotides capable of adopting intramolecular configurations with complementary base-pair formation, at some temperature. We assume that the configurational partition function, $Q$, for this system depends only on the presence or absence of two kinds of interactions: (1) bonding between base-pairs which are complementary in the usual sense; and (2) the stacking interaction between neighboring base-pairs in an ordered region. The bonding interaction contributed by complementary base-pairs is described by a set of equilibrium constants, $s_\alpha$, where the subscript $\alpha$ refers to each species of bases involved, A-U or G-C for instance. In each case, the value of $s_\alpha$ is dependent on the temperature, obeying the classical van't Hoff law,

$$\frac{\partial \ln s_\alpha}{\partial T} = \frac{\Delta H_\alpha}{RT^2} \qquad (1)$$

where $\Delta H_\alpha$ is the molar enthalpy per base-pair for formation of a helix of the pure $\alpha$th base-pair species from the corresponding coil. If $T_m(\alpha)$ denotes the midpoint of the thermal transition for a very high molecular weight polymer of species $\alpha$, $s_\alpha = 1$ at $T = T_m(\alpha)$, so that the free energy of the helix-coil conversion is zero at this point. Equation (1) can then be integrated with this condition to give

$$\ln s_\alpha = \frac{\Delta H_\alpha}{R}\left[\frac{1}{T_m(\alpha)} - \frac{1}{T}\right] \qquad (2)$$

which relates the equilibrium constant for pairing of each species to the temperature, providing $\Delta H_\alpha$ is known for each species. For polymers containing only the A-U base-pair species, data on thermal denaturation and its dependence on ionic strength of the solvent are available for both the rA:rU and rAU:rAU copolymers (Stevens & Felsenfeld, 1964; Chamberlin, Baldwin & Berg, 1963). In the pure G-C case, data for the rG:rC copolymer only have been reported (Chamberlin, 1965). Thermal denaturation curves in 0.2 M-Na+ solution, which serves as a useful reference concentration, have been obtained for a number of high molecular weight viral RNA's which are either natively double-stranded or can be isolated in this form as a replicative intermediate. These data and their sources are summarized in Table 1, and Figure 1 represents a plot of the reciprocal of the temperature against mole-fraction G-C ($\nu_G$) for these RNA's together with the pure A-U and G-C copolymer values. The linear relationship of $1/T_m$ to $\nu_G$ for double helical RNA suggests that, as for the DNA case, one may write

$$s_G = k s_A \qquad (3)$$
where species A is A-U and G denotes G-C (Crothers et al., 1965), ignoring nearest neighbor effects which clearly exist, but are small in magnitude compared to the primary effect of G-C content alone.

TABLE 1
Melting temperatures ($T_m$) and GC content of double-stranded RNA species numbered in Fig. 1

No.  Species                        Mole per cent G+C   T_m (°C)†   Reference
 1.  rA:rU                            0                  62.5       Stevens & Felsenfeld, 1964
 2.  rAU                              0                  71.7       Chamberlin, Baldwin & Berg, 1963
 3.  Wound tumor virus               38                  91         Gomatos & Tamm, 1963
 4.  Tobacco mosaic virus‡           44                  97         Burdon et al., 1964
 5.  Reovirus                        44.3                94         Gomatos & Tamm, 1963
 6.  Encephalomyocarditis virus‡     47                  96         Montagnier & Sanders, 1963
 7.  Newcastle disease virus‡        48.6               100         Kingsbury, 1966
 8.  Phage fr‡                       60.7               102         Kaerner & Hoffmann-Berling, 1964
 9.  Phage β‡                        51.6                99         Nonoyama & Ikeda, 1964
10.  Phage MS2‡                      62                 103         Billeter, Weissmann & Warner, 1966
11.  rG:rC§                         100                 140         Chamberlin, 1965

† Determined in saline citrate buffer (Marmur, 1961) or equivalent salt concentration (0.2 M-sodium ion).
‡ Double-stranded replicative intermediate form.
§ Extrapolated from data at low ionic strength.

FIG. 1. The relation between melting temperature ($T_m$), plotted as the reciprocal of the absolute temperature (°K), and mole-fraction G+C for the high molecular weight double-stranded RNA species of Table 1. The DNA relation is taken from Marmur & Doty (1962).

The constant $k$ may be evaluated from the slope of the line in Figure 1, if it is assumed in addition that $\Delta H_A = \Delta H_G = \Delta H$†, since, from equations (2) and (3),

$$\ln k = \frac{\Delta H}{R}\left[\frac{1}{T_m(G)} - \frac{1}{T_m(A)}\right]$$
Taking $\Delta H = -8$ kcal. per mole of base pairs (Bunville, Geiduschek, Rawitscher & Sturtevant, 1965), one finds $k$ = [value illegible in source]. It is evident from Figure 1 that the difference in stability between A-U and G-C base-pairs in the RNA case considerably exceeds that between A-T and G-C pairs in DNA, for which a value of $k = 3.7$ (Kallenbach & Crothers, 1966) has previously been estimated. For species of base-pairs other than A-U and G-C, few data are available, although several copolymers of substituted or otherwise modified nucleotide bases have been studied (see the review by Michelson, Massoulie & Guschlbauer, 1967). High molecular weight complexes of a number of potentially bonding species of bases known to occur naturally in tRNA (Brown, 1963) would be of direct interest in this connection.

† There is presently little information concerning the dependence of $\Delta H$ for polynucleotide reactions on the GC content, so that the severity of this assumption is difficult to assess. However, if $\Delta H_G = \Delta H_A + \Delta H'$, say, then the value of $k$ becomes dependent on temperature:

$$\ln k' = \ln k + \frac{\Delta H'}{R}\left[\frac{1}{T_m(G)} - \frac{1}{T}\right]$$

where $k'$ is the corrected value of $k$, and $k$ is given by the equation in the text with $\Delta H = \Delta H_A$, which is nearly -8 kcal./mole at 0.2 M-Na+ (Krakauer & Sturtevant, 1966). The values of $k'$ are not grossly altered however: if $\Delta H' = 3$ kcal., for instance, $k' = 3.7$ at $T = T_m(A)$, some 70°C below $T_m(G)$. Hence the qualitative behavior described here will remain valid. Measurements of $\Delta H_G$ and its dependence on $T_m$ obviously would permit improvement in these calculations.

The second interaction, involving stacking between neighboring base-pairs in a helix, is that which determines the co-operativity of the transition. While each base-pair probably interacts differently depending on the pairs adjacent to it, this effect is assumed to be small, so that the stacking is independent of the species. Following Lifson & Zimm (1963), each set of adjacent bonded base-pairs is assigned a weight $\sigma_0 = 1$. Since the chains considered here are presumed to be short enough so that melting is confined to the ends of each helical region, the configurations which must be accounted for in single-stranded molecules are those produced by folding over of a chain to give a loop of a given number of bases, limited by a number of paired bases, in the manner of a "hair-pin", as is schematically represented in Figure 2(a) for the case of a single complementary pair of bases. The equilibrium constant for this condensation can be written as

$$K = \sigma_m s \qquad (4)$$

for a chain in which $m$ non-bonded bases are formed in the ring (see Kallenbach & Crothers, 1966). The factor $\sigma_m$ is a measure of the loss of configurational freedom when the ends of the chain are apposed for bond formation. In general $\sigma_m$ is a function both of $m$ itself (Jacobson & Stockmayer, 1950) and of the parameter usually designated by $\tau$, the equilibrium constant for stacking two adjacent base-pairs upon each other (Lifson & Zimm, 1963). To assign a value to the constant $\tau$ requires specification of the form of $\sigma_m$, and is uncertain because the dependence of $\sigma_m$ on $m$ is not known, particularly in the case of small loops. An estimate of the form of $\sigma_m$ is provided by the Jacobson-Stockmayer (1950) approximation, $\sigma_m \propto m^{-3/2}/\tau$, and the numerical computations presented here use this result, although the equations derived hold generally for any functional form of $\sigma_m$. While the Jacobson-Stockmayer approximation is probably a reasonable one only for very large loops, the calculations for molecules in the size range of tRNA's are quite insensitive to the form of $\sigma_m$ adopted. An additional point to consider is the fact that the equilibrium constant for base-pair stacking, $\tau$, is taken to be invariant with temperature over the rather broad transition regions observed, an assumption difficult to avoid in the absence of further experimental data. The parameter $\tau$ estimated from the transition breadths of synthetic copolymers (Crothers & Zimm, 1964) applies to a comparison between the two states illustrated in Figure 2(b). Using the Jacobson-Stockmayer approximation, values for $\tau$ somewhat in excess of $10^3$ have been estimated previously. Scheffler, Elson & Baldwin (personal communication, 1968) have investigated oligomers of dAT capable of assuming hairpin configurations of the kind shown in Figure 2(a), and suggest a value of $\tau$ between $10^2$ and $10^3$, depending on the ionic strength. In all cases the parameter actually evaluated is a composite of several quantities besides the true equilibrium constant for stacking of adjacent base-pairs (Kallenbach & Crothers, 1966).
FIG. 2. Schematic representation of (a) base pairing between nucleotides located at the ends of a linear polynucleotide chain and (b) two different arrangements of base-pairs linking two chains.
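The temperature dependence of the stability constants in equations (2) and (3), and the loop-closure constant of equation (4), can be illustrated with a minimal numerical sketch. The function names are hypothetical, the loop weight is taken in the specific form $(m/2)^{-3/2}/\tau$ (one choice consistent with the Jacobson-Stockmayer proportionality), and $k$ is computed here from the two end-point melting temperatures of Table 1 rather than from the fitted slope of Figure 1, so the resulting number should not be read as the value adopted in the text.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def stability_constant(T, T_m, dH=-8.0):
    """Equation (2): ln s = (dH/R) * (1/T_m - 1/T); temperatures in kelvin."""
    return math.exp((dH / R) * (1.0 / T_m - 1.0 / T))

def k_ratio(T_A, T_G, dH=-8.0):
    """s_G / s_A when both species share one dH; independent of temperature."""
    return math.exp((dH / R) * (1.0 / T_G - 1.0 / T_A))

def loop_weight(m, tau=1.0e3):
    """Assumed loop weight (m/2)**(-3/2) / tau for a loop of m unpaired bases."""
    return (m / 2.0) ** -1.5 / tau

def hairpin_closure_constant(T, T_m, m, tau=1.0e3, dH=-8.0):
    """Equation (4): K = sigma_m * s for closing a loop with a single pair."""
    return loop_weight(m, tau) * stability_constant(T, T_m, dH)

if __name__ == "__main__":
    T_A = 62.5 + 273.15   # rA:rU midpoint from Table 1 (assumed reference)
    T_G = 140.0 + 273.15  # rG:rC midpoint from Table 1 (extrapolated value)
    print("k from end points:", k_ratio(T_A, T_G))
    print("K for m = 3 at 25 C:", hairpin_closure_constant(298.15, T_A, 3))
```

With a negative $\Delta H$ the constant $s$ exceeds unity below the polymer midpoint, while the small value of the loop weight is what keeps a single-pair hairpin unstable.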
This evident uncertainty in the magnitude and form of $\sigma_m$ in equation (4) does not seriously affect the calculations for molecules in the size range of interest here. The effect of varying the value of $\tau$ is primarily one of shifting the transitions in temperature rather than an alteration in the form of the profiles (Fig. 7). Moreover, for a given functional form of $\sigma_m$, a value of $\tau$ can be estimated by comparing the theoretical profiles to experimental ones, for example in the case of yeast serine tRNA.

The configuration partition function, $Q$, for a solution of single-stranded molecules can now be written in terms of the above interactions provided that the geometry of the system is specified, and all states accessible to the molecule in the course of the transition are enumerated and assigned appropriate free energies. For a molecule in which both A-U and G-C complementary base-pairs are present, $Q$ will depend on $s = s_A$ and $ks = s_G$, so that in general $Q = Q(s, k, \tau)$. The fully unpaired chain which represents the high temperature form of the molecule is assigned a factor of unity so that

$$Q = 1 + Q_b$$

where $Q_b$ denotes the partition function for all states with at least one bonded base-pair. The average number of bonded base-pairs, $\bar{l}$, in the system is then given by

$$\bar{l} = \frac{\partial \ln Q}{\partial \ln s}$$

and if $L$ is the maximum number of base-pairs of both species available at the low temperature extreme, the average fraction of bonded base-pairs of both species is

$$\theta = \frac{\bar{l}}{L} = \frac{1}{L}\frac{\partial \ln Q}{\partial \ln s} \qquad (5)$$

The average number of G-C pairs bonded is given by

$$\bar{l}_G = \frac{\partial \ln Q}{\partial \ln k}$$

since each factor $ks$ in $Q$ corresponds to a G-C pair, and if $L = L_A + L_G$, the sum of $L_A$ A-U pairs and $L_G$ G-C pairs, the average fraction bonded of this species is by analogy

$$\theta_G = \frac{\bar{l}_G}{L_G} = \frac{1}{L_G}\frac{\partial \ln Q}{\partial \ln k} \qquad (6)$$

If only A-U and G-C pairs are present,

$$\theta_A = \frac{L\theta - L_G\theta_G}{L_A} \qquad (7)$$

represents the average fraction of bonded A-U pairs. If other species are present, these equations can be extended with no difficulty. In similar fashion the average number of helical regions present, $\bar{n}_h$, can be written

$$\bar{n}_h = \frac{\partial \ln Q}{\partial \ln (1/\tau)} \qquad (8)$$
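The derivative relations (5) to (8) lend themselves to a direct numerical check. The sketch below is not the calculation used in this paper: it assumes a deliberately simple two-state partition function, $Q = 1 + \sigma s^{L_A}(ks)^{L_G}$, for a hypothetical hairpin with three A-U and two G-C pairs, and evaluates the averages by central finite differences of $\ln Q$. All names and parameter values are illustrative.

```python
import math

L_A, L_G, M = 3, 2, 3  # hypothetical hairpin: 3 A-U pairs, 2 G-C pairs, loop of 3

def ln_Q(ln_s, ln_k, ln_inv_tau):
    """ln of the toy partition function Q = 1 + sigma * s**L_A * (k*s)**L_G."""
    ln_sigma = -1.5 * math.log(M / 2.0) + ln_inv_tau  # sigma = (M/2)**-1.5 / tau
    ln_Qb = ln_sigma + (L_A + L_G) * ln_s + L_G * ln_k
    return math.log1p(math.exp(ln_Qb))

def partial(f, args, i, h=1e-6):
    """Central finite-difference derivative of f with respect to args[i]."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (f(*up) - f(*dn)) / (2.0 * h)

def averages(ln_s, ln_k, ln_inv_tau):
    args = (ln_s, ln_k, ln_inv_tau)
    L = L_A + L_G
    theta = partial(ln_Q, args, 0) / L             # equation (5)
    theta_G = partial(ln_Q, args, 1) / L_G         # equation (6)
    theta_A = (L * theta - L_G * theta_G) / L_A    # equation (7)
    n_h = partial(ln_Q, args, 2)                   # equation (8)
    f_open = math.exp(-ln_Q(*args))                # open chain has weight 1, so 1/Q
    return theta, theta_G, theta_A, n_h, f_open

if __name__ == "__main__":
    print(averages(ln_s=0.5, ln_k=2.0, ln_inv_tau=-math.log(1.0e3)))
```

For this all-or-none toy model every average equals one half when the bonded weight equals unity; a real loop-helix partition function would replace `ln_Q` while leaving the derivative machinery unchanged.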
Finally, the average fraction of fully opened chains in the system appears as the quantity

$$\frac{1}{Q} \qquad (9)$$

since the unbonded chain has weight unity. Equations (5) to (9) provide a basis for comparison of the theoretical calculations with experimental data, in combination with equations (2) and (3) which relate the stability constants to temperature. Since the ordered forms are hypochromic with respect to the denatured species, the normalized increase in absorbance during the transition can be tentatively equated with the quantity $1 - \theta$, for example. This procedure makes the gross assumption that the hypochromism exhibited by each base-pair in the molecule is the same. There is evidence that the hypochromism of A-U and G-C base-pairs is different at most wavelengths (Felsenfeld & Sandeen, 1962; Fresco, Klotz & Richards, 1963). Moreover, for a given base-pair species, the hypochromism may depend on the size of the helix in which it is located (Rich & Tinoco, 1960; Applequist, 1967). Measurements of the wavelength dispersion of the hypochromism permit resolution of the relative contributions from A-U and G-C pairs, so that these may be taken as proportional to the quantities $1 - \theta_A$ and $1 - \theta_G$, respectively, provided the size dependence is not severe. In any case, it must be recognized that the above assignments are approximate, although there are data which suggest that the thermal profiles of absorbance, optical rotation and viscosity for unfractionated tRNA are not very different (Millar & Steiner, 1966). With the availability of data on the size dependence of the hypochromism, more precise specification of these quantities should be possible.

The theory is limited to the class of structures consisting of any number of loop-helix subunits, arranged linearly or in branches in a simply-connected manner, as in the examples in Figure 4. Excluded are structures of the kind in which two or more helices share common base-pairs, although equations applicable to such situations can be derived.

(a) Single loop-helix structures

The partition function is derived by application of a set of rules for assigning statistical weighting factors corresponding to the interactions described above to each state accessible to the denaturing molecule. These are (1) a factor $\sigma_{m0} = (m/2)^{-3/2}/\tau$ to a folded loop containing a minimum number $m$ of non-bonding bases, as discussed above, (2) a factor $\sigma_{mj} = (m/2 + j)^{-3/2}/\tau$ for a folded loop in which $j$ base-pairs adjacent to the minimum loop are unpaired, and (3) factors of $s$ for each A-U pair and $ks$ for each G-C pair in an ordered or helical region. Application of the methods for obtaining the statistical weights of the bonded states of a helix with $L = 3$ is illustrated in Figure 3. In this case $Q_b = \sigma_{m0}(ks^3 + ks^2 + s) + \sigma_{m1}(ks^2 + ks) + \sigma_{m2}s$. Note that successive columns of the diagram represent all configurations terminating at the last, next to last, etc., base-pair in the original helix, while the rows correspond to states with fixed loop-size and varying numbers of bonded pairs. If a sequence vector of $L$ elements $T_j = 0$ or 1 is defined according to whether the $j$th base-pair species in the helix is A-U or G-C, then the column sums can be obtained in general as
$$C_j = \sum_{\nu=0}^{L-j} \sigma_{m\nu}\, s^{L-(j+\nu-1)} \exp\left[\ln k \sum_{p=\nu+1}^{L-j+1} T_p\right] \qquad (10)$$

and

$$Q^{(1)}(m, L) = 1 + \sum_{j=1}^{L} C_j. \qquad (11)$$
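The column sums of equation (10) and the partition function of equation (11) translate directly into a short enumeration. The following sketch is illustrative only (the function names are not from the paper) and uses the weighting rule $\sigma_{mj} = (m/2 + j)^{-3/2}/\tau$ stated above. It can be checked against the explicit expression for $Q_b$ given for the three-pair example with $T = (0, 1, 0)$; the loop size $m = 4$ is inferred from the ten-nucleotide chain of Figure 3.

```python
def loop_weight(m, j, tau):
    """sigma_mj = (m/2 + j)**(-3/2) / tau: loop of m bases plus j opened pairs."""
    return (m / 2.0 + j) ** -1.5 / tau

def column_sum(j, m, seq, s, k, tau):
    """C_j of equation (10): helices ending at pair L-j+1 (pair 1 is next to the loop).

    seq[p] is 0 for an A-U pair and 1 for a G-C pair; nu counts pairs opened
    adjacent to the loop.
    """
    L = len(seq)
    total = 0.0
    for nu in range(0, L - j + 1):
        n_pairs = L - (j + nu - 1)           # bonded pairs nu+1 .. L-j+1
        n_gc = sum(seq[nu:L - j + 1])        # G-C pairs among them
        total += loop_weight(m, nu, tau) * s ** n_pairs * k ** n_gc
    return total

def q_single(m, seq, s, k, tau):
    """Equation (11): Q = 1 + sum of the column sums C_1 .. C_L."""
    return 1.0 + sum(column_sum(j, m, seq, s, k, tau)
                     for j in range(1, len(seq) + 1))

if __name__ == "__main__":
    s, k, tau, m = 2.0, 5.0, 1.0e3, 4
    q = q_single(m, (0, 1, 0), s, k, tau)
    explicit = (1.0
                + loop_weight(m, 0, tau) * (k * s**3 + k * s**2 + s)
                + loop_weight(m, 1, tau) * (k * s**2 + k * s)
                + loop_weight(m, 2, tau) * s)
    print(q, explicit)
```

The enumeration visits each contiguous run of bonded pairs exactly once, which is the "melting from the ends only" restriction expressed in code.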
It is important to note in equation (11) that the argument $m$ specified on the left-hand side represents the minimum value of the number of bases looped out, while $L$ corresponds to the maximum number of pairs formed. For the example of Figure 3, $T = (0, 1, 0)$, and it is readily verified that the partition function calculated in this way agrees with the expression given previously.

FIG. 3. The bonded states and corresponding statistical weights generated in the denaturation of a hypothetical chain of ten nucleotides capable of forming a maximum of three base-pairs, two A-U and one G-C, in the arrangement shown in the top left-hand sketch. The weights of the six bonded states are $\sigma_{m0}s^2(ks)$, $\sigma_{m0}s(ks)$, $\sigma_{m0}s$, $\sigma_{m1}s(ks)$, $\sigma_{m1}ks$ and $\sigma_{m2}s$.

When the ordered region is homogeneous, consisting only of one base-pair species, it is found that
$$Q_b^{(L,m)}(s) = \sum_{j=0}^{L-1} \sigma_{mj} \sum_{n=1}^{L-j} s^{n} \qquad (12)$$

if configurations in which sliding of the two chains relative to each other occurs are neglected. These can be included, so that instead of (12) there results

[equation (13); not legible in the source]

in which $L' = L - \lambda$ and $m' = m + \lambda$. Equation (13) applies only to the case in which the bases on each arm of the helix are identical. The result for arms of alternating base arrangements can be obtained similarly.

(b) Multiple loop-helix structures
FIG. 4. Schematic representation of three different structural arrangements of base-pairs within single polynucleotide chains. (a) Two loop-helix units; (b) three loop-helix units, the third emanating from the right-hand loop in (a) and (c) linear arrangement of three loop-helix units.

(i) Branched chains

Equations for structures involving several loop-helix subunits are most easily generated by a recursive scheme, in which the addition of a single loop-helix unit constitutes a step. Consider the structure shown in Figure 4(a), and let the subscript 1 refer to the right-hand unit, and 2 to the left, so that $m_1$ and $m_2$ denote the numbers of non-bonding bases adjacent to helices 1 ($L = L_1$) and 2 ($L = L_2$), respectively, with sequence vectors $T_1$ and $T_2$. If a superscript next to the partition function denotes the number of loop-helix subunits contained in the structure to which it corresponds, then it can be verified that
$$Q^{(2)}(m_1, L_1; m_2, L_2) = Q^{(1)}(m_1 + m_2 + 2L_2, L_1) + \sum_{i=1}^{L_2} C_{2i}\, Q^{(1)}(m_1 + i - 1, L_1)$$
where $Q^{(1)}(m, L)$ is given by (11), and $C_{2i}$ by (10) with appropriate subscripts. As successive loop-helix units 3, ..., $K$ are introduced into the right-hand loop, resulting in "clover-leaf" arrangements of the kind illustrated in Figure 4(b), it is found that
$$Q^{(K)}(m_1, L_1; m_2, L_2; \ldots; m_K, L_K) = Q^{(K-1)}(m_1 + m_K + 2L_K, L_1; \ldots; m_{K-1}, L_{K-1}) + \sum_{i=1}^{L_K} C_{Ki}\, Q^{(K-1)}(m_1 + i - 1, L_1; \ldots; m_{K-1}, L_{K-1}). \qquad (14)$$
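The recursion of equation (14) can be sketched as a short recursive function. This is an illustration under stated assumptions, not the program used for the calculations reported here: equation (14) is read as $Q^{(K)} = Q^{(K-1)}(m_1 + m_K + 2L_K, \ldots) + \sum_{i=1}^{L_K} C_{Ki}\,Q^{(K-1)}(m_1 + i - 1, \ldots)$, the loop weights take the $(m/2 + j)^{-3/2}/\tau$ form, and the names `sigma`, `column_sum` and `q_branched` are hypothetical.

```python
def sigma(m, j, tau):
    """Loop weight (m/2 + j)**(-3/2) / tau."""
    return (m / 2.0 + j) ** -1.5 / tau

def column_sum(j, m, seq, s, k, tau):
    """C_j of equation (10) for one loop-helix unit (seq: 0 = A-U, 1 = G-C)."""
    L = len(seq)
    return sum(sigma(m, nu, tau) * s ** (L - (j + nu - 1)) * k ** sum(seq[nu:L - j + 1])
               for nu in range(0, L - j + 1))

def q_branched(units, s, k, tau):
    """Partition function for branched units, following equation (14) as read above.

    units is a list of (m, seq) tuples; units[0] is unit 1, whose loop carries
    all the other units, and units[-1] is the unit added last.
    """
    *rest, (m_K, seq_K) = units
    L_K = len(seq_K)
    if not rest:  # single loop-helix unit: equation (11)
        return 1.0 + sum(column_sum(j, m_K, seq_K, s, k, tau)
                         for j in range(1, L_K + 1))
    (m_1, seq_1), *others = rest
    # Unit K fully open: its 2*L_K + m_K bases are added to the loop of unit 1.
    total = q_branched([(m_1 + m_K + 2 * L_K, seq_1)] + others, s, k, tau)
    # Unit K bonded: column sum C_Ki times the remaining structure.
    for i in range(1, L_K + 1):
        inner = q_branched([(m_1 + i - 1, seq_1)] + others, s, k, tau)
        total += column_sum(i, m_K, seq_K, s, k, tau) * inner
    return total

if __name__ == "__main__":
    arms = [(5, (0, 1)), (3, (0,)), (4, (1, 1, 0))]  # hypothetical three-arm structure
    print(q_branched(arms, s=1.7, k=6.0, tau=500.0))
```

The cost of this naive recursion grows quickly with the number of units; for structures of tRNA size it remains small, and memoizing the inner calls would be a natural refinement.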
The states generated by adding a new loop-helix unit to a chain with $K - 1$ such units can therefore be derived from the partition function of the latter structure. Successive application of equation (14) generates the partition function for all possible branched chains of the kind in Figure 4(b), that is, with all arms emanating from a single loop adjacent to the free end of the chain.

(ii) Linear chains

If instead of introducing new subunits onto one loop as in (i), successive loop-helix units are attached linearly to the right-hand unit, as in the example diagrammed in Figure 4(c), there results similarly

$$\begin{aligned}
Q^{(K)}(m_1, L_1; \ldots; m_K, L_K) ={}& Q^{(1)}\Big(\sum_{i=1}^{K} m_i + 2\sum_{i=2}^{K} L_i,\; L_1\Big) \\
&+ Q^{(2)}\Big(m_1, L_1;\; \sum_{i=2}^{K} m_i + 2\sum_{i=3}^{K} L_i,\; L_2\Big) + \ldots \\
&+ Q^{(K-1)}\big(m_1, L_1; \ldots; m_{K-1} + m_K + 2L_K,\; L_{K-1}\big) \\
&+ \sum_{i=1}^{L_K} C_{Ki}\, Q^{(K-1)}\big(m_1, L_1; \ldots; m_{K-1} + i - 1,\; L_{K-1}\big)
\end{aligned} \qquad (15)$$
in which the first $K - 1$ terms on the right-hand side represent new states in which the $K$th newly added member loop-helix fuses with each of the preceding $K - 1$ units successively. By appropriately combining equations (14) and (15), the partition function for arrangements which consist of both linear and branched structures can be derived. While cumbersome in appearance, these forms are particularly suitable for numerical computation procedures. It should be noted that in all the equations given it is assumed that melting occurs only from the ends of any given helical segment, due to the magnitude of the parameter $\tau$ (Crothers et al., 1965).
3. Results

(a) Single loop-helix structures

(i) Length effects in homogeneous molecules

The transition profiles of very short, ordered configurations are extremely sensitive to variation in the number of base-pairs in the ordered structure (Kallenbach & Crothers, 1966). This effect is shown in Figure 5, which illustrates the dependence of transitions for single loop-helix structures with pure A-U or G-C base-pairs in the helical region on the maximum number of base-pairs which can be bonded, $L$. In each case it is assumed that a minimum of three unpaired bases exists in the loop, and the calculations are performed using equation (13), so that only configurations with identical bases on each arm,

AAAA      GGGG
||||  or  ||||
UUUU      CCCC

are considered.

FIG. 5. Theoretical transitions for single loop-helix structures as a function of the number of base-pairs in the helix. (a) Pure A-U base pair helices. (b) Pure G-C base pairs. In each case $m = 3$ and $\tau = 10^3$.

For a given value of $L$, the midpoint of the transition, $T_m = T_m(L)$, is determined by the maximum term in $Q_b$, which is $\sigma_{m0}s^L$ for sufficiently large values of $s$. If $L$ is small, the value of $s$ at the midpoint of the transition is $s \gg 1$, so that from equation (4) and the relation (2) between $s$ and the temperature, there results

$$\frac{1}{T_m(L)} = \frac{1}{T_m(\infty)} + \frac{R}{L\,\Delta H}\ln \sigma_{m0} = \frac{1}{T_m(\infty)} - \frac{R}{L\,\Delta H}\ln\left[\tau\,(m/2)^{3/2}\right] \qquad (16)$$
if the value of $\sigma_{m0} = (m/2)^{-3/2}/\tau$ is substituted. A similar argument applied to the quantity $\theta$ of equation (5) gives for the dependence of the transition breadth, measured by the slope of the profile at $T = T_m$, on $L$ the form

$$\left(\frac{\partial \theta}{\partial T}\right)_{T=T_m} = \frac{L\,\Delta H}{4RT_m^2} \qquad (17)$$

accounting for the sharpening of the profiles with increasing $L$. Since the equilibrium constant for A-U pairing, $s_A$, and that for G-C pairing, $s_G$, are taken to be related by the constant $k$ in equation (3), similar behavior of the transition midpoints and slopes is predicted for an analogous molecule with G-C bases, apart from the increase in stability seen in Figure 5(b). Comparison of transitions in either case computed from equation (12) instead of (13) reveals that, for $L \sim 10$, the contribution of states arising from "sliding" the chain with respect to the paired region is negligible, a particularly important result for the more complicated cases considered below.
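The length dependence expressed by equations (16) and (17) is easily tabulated. The sketch below is illustrative: it assumes $\Delta H = -8$ kcal./mole, $\tau = 10^3$, $m = 3$, and takes the rA:rU midpoint of Table 1 (62.5°C) as $T_m(\infty)$ for an A-U helix; the function names are hypothetical, and no claim is made that the numbers reproduce Figure 5.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def midpoint(L, T_inf, m=3, tau=1.0e3, dH=-8.0):
    """Equation (16): 1/T_m(L) = 1/T_m(inf) - R/(L*dH) * ln[tau * (m/2)**(3/2)]."""
    inv_T = 1.0 / T_inf - R / (L * dH) * math.log(tau * (m / 2.0) ** 1.5)
    return 1.0 / inv_T

def midpoint_slope(L, T_m, dH=-8.0):
    """Equation (17): slope of theta at the midpoint, L*dH / (4*R*T_m**2)."""
    return L * dH / (4.0 * R * T_m ** 2)

if __name__ == "__main__":
    T_inf = 62.5 + 273.15  # assumed infinite-chain midpoint for A-U pairs, kelvin
    for L in (4, 6, 8, 10, 15):
        T_m = midpoint(L, T_inf)
        print(L, round(T_m - 273.15, 1), midpoint_slope(L, T_m))
```

Because $\Delta H$ is negative and the logarithm is positive, the correction term raises $1/T_m$, so short helices melt below the polymer value and approach it as $L$ grows, while the magnitude of the slope increases roughly in proportion to $L$.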
(ii) Base composition and sequence effects

Consider a chain containing both A-U and G-C base pairs, in which a single paired region exists. If the helix contains a fraction $\nu$ of G-C pairs, the average equilibrium constant for the region is $k^{\nu}s$ (Crothers et al., 1965) and as before, for small $L$, $Q_b \cong \sigma_{m0}(k^{\nu}s)^L$, from which a slightly modified version of equation (16) can be obtained,

$$\frac{1}{T_m} = \frac{1}{T_m(A)} + \frac{R}{L\,\Delta H}\ln \sigma_{m0} + \frac{\nu R}{\Delta H}\ln k \qquad (18)$$
the last term on the right being the contribution of the base composition. The effect of base-pair arrangement, given a particular composition of G-C and A-U pairs, is not severe for $L \cong 5$ base-pairs, due to the dominance of the simple size effect. However, sequence becomes potentially more important as the chain length of the paired region increases, and the sequence heterogeneity can become proportionally greater. This is shown in Figure 6. The transitions in (a) correspond to structures of $L = 6$, composed of a minimum loop of three unpaired bases with a helix of 50% G-C base composition. In one case the arrangement of base pairs is (loop)-AAAGGG, while in the second it is the reverse, the G-C block being adjacent to the loop. The difference between the profiles in the two cases is not very great, although noticeable.

FIG. 6. Theoretical transitions for single loop-helix structures of constant base composition but varying arrangement of bases within a helix. (a) Helix of $L = 6$ base pairs, 50% G-C content. (b) Helix of $L = 12$ base pairs, 50% G-C content. The loop is assumed to be at the left of the structures indicated, with A designating an A-U pair and G a G-C pair. Again $m = 3$ and $\tau = 10^3$.
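The comparison of the two block arrangements can be sketched numerically by enumerating the end-melting states of a single loop-helix unit and forming $\theta$ as the weighted mean number of bonded pairs. The parameter choices below are assumptions made for illustration only: $\Delta H = -8$ kcal./mole, $\tau = 10^3$, $m = 3$, and $k$ computed from the rA:rU and rG:rC midpoints of Table 1 rather than from the fitted line of Figure 1. The curves are therefore not expected to coincide with those of Figure 6.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def fraction_bonded(T, seq, m=3, tau=1.0e3, T_A=335.65, T_G=413.15, dH=-8.0):
    """theta of equation (5) for one loop-helix unit that melts from its ends only.

    seq lists the pairs outward from the loop: 0 for A-U, 1 for G-C.
    """
    s = math.exp((dH / R) * (1.0 / T_A - 1.0 / T))    # equation (2) for A-U
    k = math.exp((dH / R) * (1.0 / T_G - 1.0 / T_A))  # assumed end-point estimate
    L = len(seq)
    Q, pairs = 1.0, 0.0                               # open chain has weight 1
    for start in range(L):                            # pairs opened next to the loop
        sigma = (m / 2.0 + start) ** -1.5 / tau
        for end in range(start, L):                   # last bonded pair, inclusive
            n = end - start + 1
            w = sigma * s ** n * k ** sum(seq[start:end + 1])
            Q += w
            pairs += n * w
    return pairs / (L * Q)

if __name__ == "__main__":
    a_block_at_loop = (0, 0, 0, 1, 1, 1)   # (loop)-AAAGGG
    g_block_at_loop = (1, 1, 1, 0, 0, 0)   # (loop)-GGGAAA
    for T_c in range(0, 101, 10):
        T = T_c + 273.15
        print(T_c,
              round(fraction_bonded(T, a_block_at_loop), 3),
              round(fraction_bonded(T, g_block_at_loop), 3))
```

Plotting $1 - \theta$ against temperature for the two sequence vectors gives the kind of comparison described in the text; extending `seq` to twelve pairs allows the longer case to be examined in the same way.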
When, however, these two arrangements are compared for a molecule of $L = 12$, again with 50% G-C composition in the helix, the tendency for the transition to polarize into a "biphasic" curve becomes evident (Fig. 6(b)), as does the difference between the two profiles. It is concluded therefore that distinctly biphasic transition profiles are only possible in molecules of one loop-helix unit when the length of the helix exceeds 10 base-pairs or so, in which case the sequential arrangements necessarily comprise substantial blocks of G-C or A-U pairs. This is not the case for multiple loop-helix structures, in which different helices can melt almost independently of other regions in the same molecule.

(iii) Loop size and variation in the parameter $\tau$

The minimum number of unbonded bases capable of folding to produce a "hair-pin" chain configuration has been estimated as three, using molecular models (Spencer, Fuller, Wilkins & Brown, 1962). In the various proposed structures for tRNA, the number of unpaired bases in such loops varies from 3 to 12 (see Fig. 8). From equation (16), increasing the loop-size, $m$, decreases the midpoint of the thermal transition (the dependence of this effect on $m^{3/2}$ is a consequence of using the Jacobson-Stockmayer approximation for the statistical weighting rules). The result of varying $m$ is shown in Figure 7, in which transition profiles for a single loop-helix unit with a maximum of $L = 5$ A-U base-pairs present and $m = 3$, 10 and 50 bases in the minimum sized loops available are presented. Note that little alteration in the form of the profile results from more than a tenfold increase in $m$. Variation in the value adopted for the parameter $\tau$ exhibits the same qualitative behavior, since the product $\tau m^{3/2}$ is the quantity affected. Thus increasing the value of $\tau$ produces the same result as a comparable increase in $m^{3/2}$, i.e. lowering $T_m$ while roughly preserving the transition breadth. This result is found to hold for multiple loop-helix configurations as well. Using other approximations for the entropy contributions of loops of different sizes should not appreciably alter these conclusions, as has been noted before.

FIG. 7. Effect of variation of the number of unbonded bases, $m$, in the loop region of chains with single loop-helix units. A structure with $L = 5$ A-U base pairs is assumed in each case, and $\tau = 10^3$.
(b) Transitions in multiple loop-helix structures

Qualitatively the results of helix and loop size, and of base composition and sequence described above can be extended to multiple loop-helix containing molecules by the device of treating these configurations as sums of independently melting single loop-helix molecules. That is, the melting of a molecule such as illustrated in Figure 4(b) or (c) is assumed to correspond to a mixture of the separate loop-helix components, the individual transition behavior of which has been discussed. The behavior exhibited by structures of different number, size and composition of helices is perhaps best seen by analyzing the transitions predicted for the set of "clover-leaf" configurations illustrated in Figure 8.

FIG. 8. Proposed "clover-leaf" structures for four species of yeast tRNA's which have been sequenced: phenylalanine, alanine, serine and tyrosine specific tRNA's. The identity of nucleotides in loop regions is deliberately omitted. Where G-U base pairs have been assumed the bonding is shown with dotted lines. The symbol ψ denotes pseudouridine.

The identity of the nucleotides within unpaired regions has been omitted specifically because the present treatment makes no distinctions among non-bonded bases. That these proposed structures lead to distinguishable transition profiles in each case is shown by a plot of the over-all fraction denatured, $1 - \theta$, against the temperature (Fig. 9). The quantity $\theta$ is calculated from equation (5), using equation (14) to obtain the partition function. The solvent conditions are assumed to be equivalent to those of the transitions in Figure 1, very nearly 0.2 M-sodium ion and pH 7. A value of $\tau = 500$ is used, based on the best estimate obtained below from the experimental data available for serine-specific
FIG. 9. Theoretical transitions for the structures shown in Fig. 8 (abscissa: temperature). A value of τ = 500 is assumed.
tRNA isolated from yeast. Absorption optical transition data are available for the yeast serine, alanine and tyrosine species, and these will be discussed more fully in that order. It must be emphasized that the proportionality assumed between the measured fraction of bases denatured and 1 − θ implies that, at any particular wavelength, A-U and G-C bases exhibit identical hypochromicity, and neglects any dependence of hypochromism on helix size. If the effect of size is significant (Applequist, 1967), the optical profiles should always be sharper than the predicted transitions of 1 − θ. Furthermore, the infrequent G-U base-pairs which occur in the tRNA structures are approximated by a pair with the stability constant of A-U, an upper limit.

(i) Yeast serine tRNA

The sequences of two serine-specific tRNA molecules from yeast have been reported by Zachau et al. (1966), and the model in Figure 8 is redrawn from their structural model. Felsenfeld & Cantoni (1964) have presented transition data on a purified fraction of serine tRNA from yeast, prepared by partition chromatography (Cantoni, Ishikura, Richards & Tanaka, 1963). The points in Figure 10 are from measurements of Felsenfeld (personal communication) at a sodium ion concentration of 0.01 M. The line in Figure 10 represents the 1 − θ curve from Figure 9, shifted 10 deg. C
FIG. 10. Comparison of experimental and theoretical transition curves for serine-specific tRNA from yeast (abscissa: temperature). The points represent experimental data of Felsenfeld, the line the 1 − θ transition curve for serine tRNA in Fig. 9 shifted 10 deg. C to the left to compensate for the different salt concentration.
in temperature to compensate for the difference in salt concentration. The magnitude of the temperature shift required was arbitrarily chosen to adjust the T_m of the theoretical profile to that of the experimental data, since the behavior of thermal transitions of tRNA solutions as a function of cation strength is anomalous with respect to high molecular weight RNA species (Kallenbach, Goldstein & Englander, manuscript in preparation), and may also differ from one amino acid-specific tRNA molecule to another. If the four loop-helix units of the serine tRNA (Fig. 8) are treated as individual molecules, as might arise, for example, by cleavage of the central ring at appropriate points, the transition curves numbered 2 to 5 in Figure 11 are obtained. If the helical arm near the ends of the chain is treated as an additional loop-helix unit, the loop being of the dimension of the central ring of the structure illustrated, the transition numbered 1 results†. The numbering of the transitions corresponds to that on the drawing (Fig. 8). Superimposed on these profiles in Figure 11 is the variation of the quantity n̄_h, the average number of helices present, calculated for the total molecule, as a function of
FIG. 11. Theoretical transition curves for the five individual loop-helix units in the structure for serine tRNA shown in Fig. 8 and numbered as in that Figure. In addition, the temperature dependence of the quantities n̄_h and 1 − ξ for the complete serine model in Fig. 8 are shown (τ = 500). (------) 1 − θ; (- - - -) n̄_h; (. . . .) 1 − ξ. Abscissa: temperature, 0 to 80.
temperature, using equation (8). Initially the value is 5, reaching 2 near 60°C when the individual arms 1, 2 and 3 have melted, leaving the two stable loop-helices 4 and 5, which subsequently undergo transition in the region where n̄_h decreases to zero. Thus it can be predicted, for instance, that above 75°C only region 4 remains appreciably paired in the model for serine-specific tRNA. The profile on the far right of Figure 11 exhibits the temperature dependence of the quantity 1 − ξ, calculated for the complete molecule from equation (9). This represents the average fraction of fully opened chains (no bonded base-pairs present) in the system at the temperatures indicated. The results are clearly in accord with the behavior of n̄_h and 1 − θ for the system, in that below 60°C there are effectively no fully denatured chains present, and that only after melting of region 4 are all the complete serine chains fully unbonded.

† The calculation performed assumed a value of m = 14, representing the minimum value for the effective loop size of this structure. In view of the results of Fig. 11, it is clear that this value is too small, since loop-helices 2 and 3 have undergone appreciable melting at the temperature where helix 1 melts. The result of including this effect would simply be to lower the transition in temperature by less than 10 deg. C. However, the point of the figure is to illustrate the approximate independence of the transitions in the individual arms of the model.

The most sensitive test of the agreement between the theoretical profile from the model and the data for the serine tRNA results from comparing the theoretical profiles of 1 − θ_A and 1 − θ_G, calculated from equations (6) and (7) using the same parameters as above, with the results of the analysis of the wavelength dispersion of the hyperchromism exhibited by serine tRNA on thermal denaturation (Felsenfeld & Cantoni, 1964). Since the difference spectrum between native and denatured rA : rU or rAU differs appreciably from that for rG : rC, the course of the transition measured at several different wavelengths can be used to assay the relative amounts of A-U and G-C base-pairs denatured at any given temperature. The points in Figure 12 are
FIG. 12. Comparison between the results of wavelength dispersion analysis of the hyperchromism of serine tRNA (Felsenfeld & Cantoni, 1964), shown as points, and theoretical transitions 1 − θ_A (AU melting) and 1 − θ_G (GC melting) for the five-arm structure in Fig. 8 (abscissa: temperature). The theoretical transition curve is shifted to compensate for salt concentration as in Fig. 10, and τ = 500.
taken from the paper by Felsenfeld & Cantoni (1964), and represent the normalized separate A-U and G-C denaturation spectra for serine tRNA. The three different sets of points correspond to using the extinction coefficient for rA : rU, rAU, or an average of these two values as a basis for the dispersion analysis. The lines in Figure 12 are the theoretical transition profiles of 1 − θ_A and 1 − θ_G for the serine model illustrated in Figure 8. Again the theoretical profiles have been shifted 8 deg. C lower to compensate for the difference in T_m due to the difference in salt concentration between the experimental data (0.01 M-NaCl) and the solvent conditions assumed in the theoretical calculation, equivalent to 0.2 M-NaCl. The agreement in form between the theoretical curves and experimental points is obvious. It is found, moreover, that the difference in T_m of the A-U and G-C profiles depends on the value of the parameter τ adopted. The value τ = 500 used is that which makes this difference in T_m best fit the experimental results. The actual forms of the profiles
are relatively insensitive to the value of τ used, between the limits 50 < τ < 5000 which encompass the range of values estimated for this parameter. The agreement between the profile predicted for the “clover-leaf” arrangement (Fig. 8) and the denaturation data (Felsenfeld & Cantoni, 1964) does not rule out a number of alternative configurations for the serine-specific molecule. For example, the melting profile of an alternative model for serine tRNA similar to the “extended model” described by Zachau et al. (1966) corresponds in form to that of the five-arm model (Fig. 8), although with slightly lower T_m. The transition computed in Figure 13 corresponds to a three-arm structure consisting only of helices 1, 3 and 4 (Fig. 8).
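Returning to the loop parameter discussed at the start of this paragraph, a short numerical sketch may help to show how wide the quoted range of values is in energetic terms. It assumes, purely for illustration, that the statistical weight for closing a loop of m unpaired bases has the form 1/(τm^(3/2)), so that only the product of the loop parameter and m^(3/2) matters. This functional form and the function names are assumptions made for the sketch and are not the equations of the present treatment.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def loop_weight(m: float, tau: float) -> float:
    """Assumed statistical weight for closing a loop of m unpaired bases."""
    return 1.0 / (tau * m ** 1.5)


def equivalent_loop_size(m: float, tau_factor: float) -> float:
    """Loop size that gives the same weight as multiplying tau by tau_factor."""
    return m * tau_factor ** (2.0 / 3.0)


def loop_free_energy(m: float, tau: float, temp_c: float) -> float:
    """Free energy (kcal/mol) implied by the assumed weight, -RT ln w."""
    return -R_KCAL * (temp_c + 273.15) * math.log(loop_weight(m, tau))


if __name__ == "__main__":
    m = 3.0  # smallest loop size considered in the text
    for tau in (50.0, 500.0, 5000.0):
        print(tau, round(loop_free_energy(m, tau, 40.0), 2))
    # A tenfold increase in tau acts like this larger loop at fixed tau.
    print(round(equivalent_loop_size(m, 10.0), 2))
```

Under this assumption the whole range from 50 to 5000 spans a loop free energy of RT ln 100, about 2.9 kcal/mol at 40°C, and a tenfold change in the loop parameter is equivalent to scaling the loop size by 10^(2/3).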
FIG. 13. Comparison between theoretical transition profiles for two alternative serine models (abscissa: temperature). The solid line corresponds to the structure in Fig. 8 with all loop-helix units intact, while the dashed line represents the profile of a structure consisting only of the units numbered 1, 3 and 4.
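The kind of comparison shown in Figure 13 can be imitated with a toy calculation in which a molecule is treated as a mixture of independently melting loop-helix units. In the sketch below each unit is replaced by a two-state (all-or-none) van 't Hoff transition, which is a simplification and not the partition function of equation (14). The number of base-pairs, mid-point and enthalpy assigned to each arm are hypothetical values chosen only so that arms 1 to 3 melt before arms 4 and 5; they are not taken from the models of Figure 8.

```python
import math
from dataclasses import dataclass
from typing import Iterable

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


@dataclass(frozen=True)
class Unit:
    """One loop-helix unit; every number used below is hypothetical."""
    n_pairs: int     # base-pairs in the helix
    tm_c: float      # mid-point of the unit's transition, deg C
    dh_kcal: float   # van 't Hoff enthalpy of melting, kcal/mol


def unit_melted(u: Unit, temp_c: float) -> float:
    """Two-state fraction of molecules in which this unit is open."""
    t, tm = temp_c + 273.15, u.tm_c + 273.15
    return 1.0 / (1.0 + math.exp((u.dh_kcal / R_KCAL) * (1.0 / t - 1.0 / tm)))


def fraction_denatured(units: Iterable[Unit], temp_c: float) -> float:
    """Base-pair-weighted mixture of unit curves (analogue of 1 - theta)."""
    units = list(units)
    total = sum(u.n_pairs for u in units)
    return sum(u.n_pairs * unit_melted(u, temp_c) for u in units) / total


def mean_helices(units: Iterable[Unit], temp_c: float) -> float:
    """Average number of intact helices per molecule."""
    return sum(1.0 - unit_melted(u, temp_c) for u in units)


def fraction_fully_open(units: Iterable[Unit], temp_c: float) -> float:
    """Fraction of chains with every unit open, assuming independence."""
    return math.prod(unit_melted(u, temp_c) for u in units)


# Hypothetical five-arm model, keyed by arm number.
ARMS = {
    1: Unit(7, 50.0, 50.0),
    2: Unit(3, 40.0, 25.0),
    3: Unit(4, 45.0, 30.0),
    4: Unit(5, 75.0, 45.0),
    5: Unit(5, 65.0, 40.0),
}
FIVE_ARM = list(ARMS.values())
THREE_ARM = [ARMS[i] for i in (1, 3, 4)]

if __name__ == "__main__":
    for t in range(20, 100, 10):
        print(t,
              round(fraction_denatured(FIVE_ARM, t), 3),
              round(fraction_denatured(THREE_ARM, t), 3),
              round(mean_helices(FIVE_ARM, t), 2),
              round(fraction_fully_open(FIVE_ARM, t), 3))
```

Because the fully open fraction is a product over units, it stays near zero until the most stable unit begins to melt, which is the qualitative behavior described for the serine model.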
The extended model of Zachau et al. (1966) has, in addition, two small helical regions, one with 2 and the other with 3 base-pairs. Depending on the stabilization gained by these regions due to stacking onto helix 1, these residues may modify the profile. However, neither the behavior of 1 − θ nor the 1 − θ_A or 1 − θ_G transitions is altered very much. There is thus no claim made that, of all possible arrangements for the sequence of serine tRNA, the five-arm model is unique in agreeing with experiment, nor that it is the most stable possible arrangement, since there exist configurations where extensive intermediate stacking free energy contributions must be considered, for which thermodynamic data are presently lacking (see Kallenbach & Crothers, 1966). Hence it is difficult to evaluate certain linear arrangements where these interactions may play a major role.

(ii) Yeast alanine and tyrosine tRNA’s

The experimental data so far published on these species are not as detailed as in the case of serine tRNA, and these are therefore discussed together. Transitions corresponding to fraction A-U and G-C melted as a function of temperature are presented in Figure 14 for the models of Figure 8. Mahler, Dutton & Mehrotra (1963) and Fresco et al. (1963) have reported thermal transition data for alanine and tyrosine tRNA purified by Holley et al. (1965). In 0.15 M-NaCl, 0.015 M-sodium citrate, Mahler et al. (1963) find that alanine tRNA exhibits a T_m of 68°C, with Δ2/3 (the temperature range over which 2/3 of the observed transition occurs, measured symmetrically about the T_m) being 36°C, while for tyrosine tRNA the T_m is 63°C and Δ2/3 approximately 40°C. The theoretical profiles for the structures in Figure 8 give, for alanine tRNA, T_m = 68°C and Δ2/3 = 17°C, and for tyrosine tRNA, T_m = 64°C
and Δ2/3 = 32°C. The agreement between the T_m values in each case is good, whereas the theoretical profile for tyrosine is sharper than the experimental data, and that for alanine very much so. This is also indicated in the case of the profiles for these species determined by Fresco et al. (1963) at 280 mμ, which should be comparable to the 1 − θ_G transitions in Figure 14, since the A-U denaturation spectrum contributes relatively little at this
FIG. 14. Theoretical transition curves of 1 − θ_A, AU fraction melted, and 1 − θ_G, GC fraction melted, for the alanine, tyrosine, and phenylalanine models shown in Fig. 8. Panels (a), (b) and (c); abscissa: temperature, 20 to 80.
wavelength. Experimentally, Δ2/3 is roughly 48°C for tyrosine and 42°C for alanine, whereas the theoretical curves give values of 36°C for tyrosine and 18°C for alanine tRNA. While the predicted behavior of the tyrosine tRNA structural model is not completely in agreement with the experimental results, the discrepancy is much less serious than for alanine tRNA, in which the differences are sufficiently serious to warrant further explanation. First, the model may be incorrect by including too many base-pairs in each helix, thereby unduly sharpening the profile, or by neglecting a considerable quantity of less-stable ordered regions which melt at a lower temperature and thus broaden the actual profile. Second, the contribution of other factors to the optical properties of this molecule may be stronger than in the case of serine or tyrosine tRNA, so that the agreement in these latter cases may be fortuitous. Finally, the assumptions made in equating theoretical parameters with optical data, which have been discussed previously, could be in error. It seems clear that more extensive data on purified aminoacyl tRNA’s are required before any real evaluation of these possibilities, or of a number of obvious alternatives, can be made.
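The mid-point and breadth used in these comparisons can be read off any sampled transition curve by interpolation. The sketch below is one way of doing so, under the assumption that "measured symmetrically about the T_m" means the interval between 1/6 and 5/6 of the transition; that reading, and the function names, are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def crossing(temps: Sequence[float], frac: Sequence[float], level: float) -> float:
    """Linearly interpolate the temperature at which frac first reaches level.

    temps must be increasing and frac is assumed to rise with temperature.
    """
    for i in range(len(temps) - 1):
        f0, f1 = frac[i], frac[i + 1]
        if f0 <= level <= f1 and f1 > f0:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (level - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve does not cross the requested level")


def tm_and_breadth(temps: Sequence[float], frac: Sequence[float]) -> Tuple[float, float]:
    """Return (T_m, breadth), with breadth taken between 1/6 and 5/6 melted."""
    tm = crossing(temps, frac, 0.5)
    breadth = crossing(temps, frac, 5.0 / 6.0) - crossing(temps, frac, 1.0 / 6.0)
    return tm, breadth


if __name__ == "__main__":
    # A made-up linear ramp from 0 to 1 between 0 and 100 deg C.
    temps = [float(t) for t in range(0, 101, 10)]
    frac = [t / 100.0 for t in temps]
    print(tm_and_breadth(temps, frac))
```

For a curve that rises linearly over 100 deg. C this gives a breadth equal to two thirds of the full range, which is a convenient check on the definition adopted.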
4. Discussion

The role of a number of parameters which govern the thermal transitions of short RNA chains in which the secondary structure arises because of intramolecular complementary base-pairing has been analyzed in terms of a statistical theory. In particular, the effects of length, base composition and sequence on the T_m and form of the transition profiles have been quantitated. The treatment described here can be applied to predict the transition profiles of a large class of possible structural models for such chains. In the case of the yeast serine-specific tRNA model suggested by Zachau et al. (1966), the predicted transition profile is in remarkable agreement with experimental data obtained by Felsenfeld & Cantoni (1964) on a serine tRNA fraction from yeast prepared by different chromatographic methods. The agreement between the theoretical profiles for models of yeast alanine and tyrosine-specific tRNA’s and experimental transitions is not as good, although the available data in these latter cases are not extensive. The ability of the theory to compute several quantities in addition to those ordinarily measured experimentally should be noted. For instance, predicted profiles of average numbers of helical regions per molecule and average fraction of completely unbonded chains have been calculated for the structures shown in Figure 8. These latter quantities should have particular application to the analysis of hydrodynamic transition data of the kind presented by Fresco et al. (1966). The theory presented here is clearly not comprehensive, in that several relevant effects have not been taken into account or have been incompletely dealt with. The contribution of base stacking between adjacent unpaired bases to the stability of the molecule is neglected. Apart from the effects of these interactions on the optical absorption and rotatory dispersion transition data (see e.g.
Vournakis & Scheraga, 1966; Cox & Kanagalingam, 1967), this structure has a possible direct effect on base-pairing as well, in that formation of a base-pair may be difficult or even impossible if one base is strongly stacked upon a neighbor. Moreover, unbonded regions of the chain which are presently assigned equivalent statistical weights at all temperatures represent different structures at each temperature as a consequence of these interactions. The existence of higher structure, involving interactions between already paired regions (e.g. Fresco et al., 1966), has also been neglected. The several examples of transitions between ordered alternative forms of tRNA which are active or inactive in accepting amino acids may reflect structure at either level (Gartland & Sueoka, 1966; Ishida & Sueoka, 1967). In this case the absence of thermodynamic data precludes any detailed investigation, and the possibility remains that such interactions may appreciably influence the form of thermal transition curves. Finally, a discussion of effects of monovalent or divalent cations on these systems has been avoided. The effects of magnesium ion on RNA structure are profound (Boedtker, 1960; Stevens & Felsenfeld, 1964) and probably reflect substantial changes in the structure present. It seems plausible that the “clover-leaf” models may represent the conformation of the chain under limited conditions of salt concentration only, and that alternative configurations can exist in the presence of high magnesium ion concentrations. From the present study certain definite statements concerning secondary structure in tRNA can be made. If in unfractionated tRNA preparations there exist appreciable amounts of molecules (10% or so) which possess secondary structures containing
uninterrupted helices of twenty base-pairs or more, as in the model proposed by Cantoni et al. (1963), these would be detectable as a very sharp tail transition at high temperature because of the length effect described above. From the transition profiles of tRNA which have been measured, no discontinuities at high temperatures are evident (see, e.g., Felsenfeld & Sandeen, 1962), so that the contribution of such structures must be small. Hence species containing perfectly stacked helical regions with lengths substantially in excess of those in the molecules already sequenced must be rare, implying a homogeneity extending to all these molecules in accord with the obvious similarities among the sequenced species which have been pointed out by Jukes (1966), Crick (1966), and others. The extent of the agreement in the serine tRNA case illustrates the potential applicability of the theoretical approach to the resolution of individual transitions in a molecule where these transitions can overlap extensively. This could lead to a means of identifying the functions associated with an individual region or group of regions from their relative thermal stabilities. With the availability of more complete transition data, it is anticipated that the description presented will be of some utility in elucidating the secondary structure of tRNA, 5 S RNA (Brownlee, Sanger & Barrell, 1967) and similar species of biological interest.

This work was supported by NSF grants GB-4200 and GB-6545. I wish to thank Dr Donald Crothers for many helpful discussions and Dr Gary Felsenfeld for generously making available his unpublished data. The computations in this article were performed on an IBM 360 computer at the University of Pennsylvania Computer Center.

REFERENCES
Applequist, J. (1967). In Conformation of Biopolymers, vol. 1, p. 403, London: Academic Press.
Applequist, J. & Damle, V. (1966). J. Amer. Chem. Soc. 88, 3895.
Billeter, M. A., Weissmann, C. & Warner, R. C. (1966). J. Mol. Biol. 17, 145.
Boedtker, H. (1960). J. Mol. Biol. 2, 171.
Brahms, J., Michelson, A. M. & Van Holde, K. E. (1966). J. Mol. Biol. 15, 467.
Brown, G. L. (1963). Prog. Nucl. Acid Res. 2, 260.
Brownlee, G. G., Sanger, F. & Barrell, B. G. (1967). Nature, 215, 735.
Bunville, L., Geiduschek, E. P., Rawitscher, M. & Sturtevant, J. M. (1965). Biopolymers, 3, 213.
Burdon, R. H., Billeter, M. A., Weissmann, C., Warner, R. C., Ochoa, S. & Knight, C. A. (1964). Proc. Nat. Acad. Sci., Wash. 52, 768.
Cantoni, G. L., Ishikura, A., Richards, H. H. & Tanaka, K. (1963). Cold Spr. Harb. Symp. Quant. Biol. 28, 123.
Cantor, C. R., Jaskunas, S. R. & Tinoco, I., Jr. (1966). J. Mol. Biol. 20, 39.
Chamberlin, M. J. (1965). Fed. Proc. 24, 1446.
Chamberlin, M., Baldwin, R. L. & Berg, P. (1963). J. Mol. Biol. 7, 334.
Cox, R. A. & Kanagalingam, K. (1967). Biochem. J. 103, 749.
Crick, F. H. C. (1966). J. Mol. Biol. 19, 548.
Crothers, D. M. & Kallenbach, N. R. (1966). J. Chem. Phys. 45, 917.
Crothers, D. M., Kallenbach, N. R. & Zimm, B. H. (1965). J. Mol. Biol. 11, 802.
Crothers, D. M. & Zimm, B. H. (1964). J. Mol. Biol. 9, 1.
Doty, P., Boedtker, H., Fresco, J. R., Haselkorn, R. & Litt, M. (1959). Proc. Nat. Acad. Sci., Wash. 45, 482.
Englander, S. W. & Englander, J. J. (1965). Proc. Nat. Acad. Sci., Wash. 53, 370.
Felsenfeld, G. & Cantoni, G. L. (1964). Proc. Nat. Acad. Sci., Wash. 51, 818.
Felsenfeld, G. & Miles, H. T. (1967). Ann. Rev. Biochem. 36, 407.
Felsenfeld, G. & Sandeen, G. (1962). J. Mol. Biol. 5, 687.
Fresco, J. R., Adams, A., Ascione, R., Henley, D. & Lindahl, T. (1966). Cold Spr. Harb. Symp. Quant. Biol. 31, 527.
Fresco, J. R., Klotz, L. C. & Richards, E. G. (1963). Cold Spr. Harb. Symp. Quant. Biol. 28, 83.
Gartland, W. J. & Sueoka, N. (1966). Proc. Nat. Acad. Sci., Wash. 55, 948.
Gomatos, P. J. & Tamm, I. (1963). Proc. Nat. Acad. Sci., Wash. 50, 878.
Griffin, B. E., Haslam, W. J. & Reese, C. B. (1964). J. Mol. Biol. 10, 353.
Hill, T. L. (1959). J. Chem. Phys. 30, 383.
Holley, R. W., Apgar, J., Everett, G. A., Madison, J. T., Marquisee, M., Merrill, S. H., Penswick, J. R. & Zamir, A. (1965). Science, 147, 1462.
Ishida, T. & Sueoka, N. (1967). Proc. Nat. Acad. Sci., Wash. 58, 1080.
Jacobson, H. & Stockmayer, W. (1950). J. Chem. Phys. 18, 1600.
Jukes, T. H. (1966). Biochem. Biophys. Res. Comm. 24, 744.
Kaerner, H. C. & Hoffmann-Berling, H. (1964). Nature, 202, 1012.
Kallenbach, N. R. & Crothers, D. M. (1966). Proc. Nat. Acad. Sci., Wash. 56, 1018.
Kingsbury, D. W. (1966). J. Mol. Biol. 18, 204.
Krakauer, H. & Sturtevant, J. M. (1968). Biopolymers, 6, 491.
Leng, M. & Felsenfeld, G. (1966). J. Mol. Biol. 15, 455.
Lifson, S. & Zimm, B. H. (1963). Biopolymers, 1, 15.
Madison, J. T., Everett, G. A. & Kung, H. K. (1966). Cold Spr. Harb. Symp. Quant. Biol. 31, 409.
Mahler, H. R., Dutton, G. & Mehrotra, B. D. (1963). Biochim. biophys. Acta, 68, 199.
Marmur, J. (1961). J. Mol. Biol. 3, 208.
Marmur, J. & Doty, P. (1962). J. Mol. Biol. 5, 109.
Michelson, A. M., Massoulié, J. & Guschlbauer, W. (1967). Prog. Nucl. Acid Res. and Mol. Biol. 6, 83.
Millar, D. B. & Steiner, R. F. (1966). Biochemistry, 5, 2289.
Montagnier, L. & Sanders, F. K. (1963). Nature, 199, 664.
Nonoyama, M. & Ikeda, Y. (1964). J. Mol. Biol. 9, 763.
RajBhandary, U. L., Chang, S. H., Stuart, A., Faulkner, R. D., Hoskinson, R. M. & Khorana, H. G. (1967). Proc. Nat. Acad. Sci., Wash. 57, 751.
Rich, A. & Tinoco, I., Jr. (1960). J. Amer. Chem. Soc. 82, 6409.
Spencer, M., Fuller, W., Wilkins, M. H. F. & Brown, G. L. (1962). Nature, 194, 1014.
Spirin, A. S. (1960). J. Mol. Biol. 2, 436.
Stevens, C. L. & Felsenfeld, G. (1964). Biopolymers, 2, 293.
Van Holde, K. E., Brahms, J. & Michelson, A. M. (1966). J. Mol. Biol. 12, 726.
Vournakis, J. N. & Scheraga, H. A. (1966). Biochemistry, 5, 2997.
Zachau, H. G., Dutting, D., Feldmann, H., Melchers, F. & Karau, W. (1966). Cold Spr. Harb. Symp. Quant. Biol. 31, 417.
Zimm, B. H. (1960). J. Chem. Phys. 33, 1349.