[21] RNA Pseudoknots: Structure, Detection, and Prediction


By C. W. A. PLEIJ and L. BOSCH

RNA pseudoknots are structural elements of the RNA architecture. Although they were first detected in tRNA-like structures at the 3' ends of various plant viral RNAs,1-4 pseudoknots occur much more widely,5-8 and pseudoknotting may be regarded as one of the basic principles by which RNA chain segments fold in space. It should be emphasized that our knowledge of the spatial folding of RNA in general is limited. Crystal structures of only a few tRNA species have been elucidated,9,10 and, although these analyses have revealed a wealth of new intramolecular interactions, it has not become clear whether these structural features have a more general significance or are used more frequently to achieve the intricate folding of RNA. It is conceivable, however, that a few more general building principles will become apparent when the three-dimensional structures of a larger number of naturally occurring RNA species become available. Clearly, the determination of the higher order structures of RNA is a great challenge and will probably prove difficult, because these molecules remain refractory to crystallization. A basic feature of the secondary structure of RNA is the classical stem-loop structure. Pseudoknots arise when the single-stranded loops enclosed by these stems base pair with complementary unpaired regions elsewhere in the RNA chain. In this chapter, we describe the principle underlying this folding of the secondary structure into the higher order pseudoknot. Some of the steric problems encountered in this folding will be addressed, and some of the characteristic properties of pseudoknots will be discussed. Methods for the detection and prediction of pseudoknots will also be presented and illustrated by a number of well-documented examples.

FIG. 1. Elementary secondary structure elements of RNA.

1 K. Rietveld, K. Linschooten, C. W. A. Pleij, and L. Bosch, EMBO J. 3, 2613 (1984).
2 A. van Belkum, J. P. Abrahams, C. W. A. Pleij, and L. Bosch, Nucleic Acids Res. 13, 7673 (1985).
3 C. W. A. Pleij, K. Rietveld, and L. Bosch, Nucleic Acids Res. 13, 1717 (1985).
4 C. W. A. Pleij, J. P. Abrahams, A. van Belkum, K. Rietveld, and L. Bosch, UCLA Symp. Mol. Cell. Biol. [N.S.] 54, 299 (1986).
5 H. F. Noller, Annu. Rev. Biochem. 53, 119 (1984).
6 I. C. Deckman and D. E. Draper, J. Mol. Biol. 221, 235 (1987).
7 R. W. Davies, R. B. Waring, J. A. Ray, T. A. Brown, and C. Scazzocchio, Nature (London) 300, 719 (1982).
8 D. S. McPheeters, G. D. Stormo, and L. Gold, J. Mol. Biol. 201, 517 (1988).
9 J. P. Robertus, J. E. Ladner, J. T. Finch, D. Rhodes, R. S. Brown, B. F. C. Clark, and A. Klug, Nature (London) 250, 546 (1974).
10 S. H. Kim, F. L. Suddath, G. J. Quigley, A. McPherson, J. L. Sussman, A. H. J. Wang, N. C. Seeman, and A. Rich, Science 185, 435 (1974).

METHODS IN ENZYMOLOGY, VOL. 180
Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved.

Principle of Pseudoknotting

Standard secondary RNA structures, such as the well-known cloverleaf of tRNA, are characterized by a small number of typical features, as depicted in Fig. 1. Besides double-helical or stem regions and hairpins, which result from intramolecular base pairing, interior and bulge loops, bifurcations, and bifurcation loops can be recognized. All current computer programs predict these features.11 However, as first realized by Ninio12 and later by others,13 the single-stranded regions of the various loops can in turn participate in base pairing with complementary single-stranded regions elsewhere in the chain. The result of such base pairing was initially called a knotted structure, because of the potential formation of real knots, in particular when stretches of 10 or more nucleotides are involved.14 Presently, the term pseudoknot, originally proposed by Studnicka et al.,13 is used.

11 M. Zuker and P. Stiegler, Nucleic Acids Res. 9, 133 (1981).
12 J. Ninio, Biochimie 53, 458 (1971).
13 G. M. Studnicka, G. M. Rahn, I. W. Cummings, and W. A. Salser, Nucleic Acids Res. 5, 3365 (1978).

A general definition of a pseudoknot may be given as follows: a structural element of RNA formed when nucleotides within one of the four loops of an orthodox secondary structure (see Fig. 1) base pair with nucleotides outside that loop. According to this definition, base pairing is not restricted to interactions between one of the loops and the free single-stranded stretches at either the 3' or 5' terminus of the RNA chain, as depicted in Fig. 1. Loop-loop interactions also satisfy the definition, for instance, that of the hairpin loop with either side of the interior loop. In some cases such an interaction may be difficult to realize physically for steric reasons. Clearly, steric limitations have to be considered when searching for pseudoknots of any kind (see Searching for Pseudoknots in RNA). In principle, the location and the number of nucleotides within the various loops involved in pseudoknotting are not subject to limitations, but for reasons outlined below, we restrict ourselves to interactions involving at least three base pairs. This means that the well-known interaction between the D- and T-loops of tRNA, involving, among others, the conserved G·C and G·Ψ base pairs, will not be considered in this chapter. We further assume that the double-helical segments in pseudoknots are of the RNA-A type and that the complementary strands interact in an antiparallel fashion, as in standard secondary structures. Before discussing methods to find and predict pseudoknots in RNA, we describe the characteristics of a relatively simple type of pseudoknot, because of its widespread occurrence and because its geometric properties are reasonably well understood.
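In terms of base-pair indices, the definition above means that a pseudoknot is present whenever two base pairs (i, j) and (k, l) interleave as i < k < j < l: one partner of the second pair then lies inside the loop closed by the first pair and the other lies outside it. The Python sketch below illustrates this restatement. The function names are illustrative, and the dot-bracket notation with a second bracket type ("[]") for pseudoknotted pairs is an assumed convention that is not used in this chapter.

```python
from itertools import combinations


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return sorted 0-based base pairs; '()' and '[]' are matched independently.

    Assumes a balanced string (an unmatched closing bracket raises IndexError).
    """
    opening = {"(": [], "[": []}
    closing = {")": "(", "]": "["}
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol in opening:
            opening[symbol].append(pos)
        elif symbol in closing:
            pairs.append((opening[closing[symbol]].pop(), pos))
    return sorted(pairs)


def crossing_pairs(pairs: list[tuple[int, int]]):
    """List every pair of base pairs (i, j), (k, l) with i < k < j < l."""
    ordered = sorted((min(p), max(p)) for p in pairs)
    return [((i, j), (k, l))
            for (i, j), (k, l) in combinations(ordered, 2)
            if i < k < j < l]


# Toy H-type topology: 3-bp stem, 2-nt loop, 3-bp stem, 2-nt loop.
h_type = "(((..[[[)))..]]]"
print(len(crossing_pairs(parse_dot_bracket(h_type))))           # 9
print(crossing_pairs(parse_dot_bracket("(((...)))..((...))")))  # []
```

Every base pair of the first stem crosses every base pair of the second (3 × 3 = 9), whereas two nested or side-by-side hairpins produce no crossings. The three-base-pair minimum adopted in this chapter would be an additional filter on top of this test.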
Representative Example of Pseudoknot

Figure 2 shows one of the simplest examples of a pseudoknot, displaying all the characteristics of this structural element. A stretch of three or more nucleotides of a hairpin loop base pairs with a complementary region outside the hairpin. The loop sequence participating in this tertiary interaction borders directly on the stem region of the hairpin. This particular type of pseudoknot will be called the H pseudoknot (H standing for hairpin loop). The H pseudoknot was first detected as an essential part of the aminoacyl acceptor arm of the tRNA-like structure at the 3' terminus of turnip yellow mosaic virus RNA (TYMV RNA).15 This plant viral RNA is able to accept valine specifically, analogous to its standard tRNA counterpart. This suggests that the 3'-terminal structure of TYMV RNA closely resembles the structure of tRNAVal. Probing end-labeled 3'-terminal fragments of TYMV RNA for their secondary structure did not reveal a cloverleaf structure, however.15,16 The techniques used in studies of this type include chemical modification with diethyl pyrocarbonate (DEPC) and dimethyl sulfate (DMS) according to Peattie and Gilbert17 and enzymatic digestion with S1 nuclease, RNase T1, and the double-stranded-specific nuclease from the venom of the cobra Naja naja oxiana. These and a few other probes constitute the standard tools for mapping RNA structures and will not be described in detail in this chapter. The reader is referred to Ehresmann et al.18 for a recent review and to the chapter by Knapp19 in this volume.20

FIG. 2. The formation of a pseudoknot of type H. S1 and S2 represent stem regions obtained by Watson-Crick base pairing, and L1 and L2 represent the single-stranded connecting loops. (A) Conventional secondary structure presentation. (B) Schematic presentation in which the RNA sequence is given in a circular form. (C) A "two and one-half-dimensional" presentation. (D) Artistic view of the three-dimensional folding, illustrating the coaxial stacking of the eight base pairs.

In the case of the 3' end of TYMV RNA, the secondary structure illustrated in Fig. 3A was found. This structure consists of four regular hairpins but lacks the conventional aminoacyl acceptor stem of tRNA generated by base pairing of the 3' and 5' ends.15,16 Indications that the secondary structure of Fig. 3A can be folded into a tertiary structure resembling the familiar L shape of canonical elongator tRNA were provided by the following data:

1. The ACCA end is immediately followed by a stem region (stem I), reminiscent of the aminoacyl acceptor arm of tRNA.
2. The double-stranded-specific cobra venom nuclease cuts in the triple G sequence (G-13-G-15) of the loop of hairpin I and, somewhat more weakly, in the triple C sequence adjacent to stem II (C-25-C-27), suggesting base-pairing interactions.
3. The triple C sequence is protected from chemical modification with DMS under so-called native conditions (10 mM Mg2+), but not under semidenaturing conditions [1 mM ethylenediaminetetraacetic acid (EDTA)].
4. The complementarity of the possible triple G-triple C interaction is conserved among related viral RNAs.

FIG. 3. Secondary and tertiary structure of the 3' terminus of TYMV RNA. (A) Secondary structure. Hairpins are indicated with roman numerals (I-IV). Numbering of the nucleotides is from the 3' end. (B) L arrangement of the tRNA-like structure of TYMV RNA. The aminoacyl acceptor arm contains the RNA pseudoknot.

Base pairing of the triple G and triple C sequences enables building of a model in which stem I, the (G-13-G-15)·(C-25-C-27) segment, and stem II are stacked coaxially on top of each other. This model closely resembles the aminoacyl acceptor arm of canonical tRNA (Fig. 3B) (compare also Refs. 15 and 21). Note that the model contains a pseudoknot of the same type as the one shown schematically in Fig. 2. An additional feature is that the pseudoknot itself is stacked on top of stem II. It should be realized that the discovery of the pseudoknot at the 3' end of TYMV RNA, as outlined above, was strongly guided by the aim of finding a resemblance to the classical three-dimensional structure of tRNA. It is doubtful whether the outcome would have been the same without knowledge of the tRNA data. Pseudoknots of the type illustrated in Figs. 2 and 3 are relatively widespread in the noncoding sequences of viral RNAs, as became apparent when the 3' termini of various plant viral RNAs were studied.4,22 These phylogenetic comparisons also permit the derivation of some general rules concerning the steric requirements for pseudoknot formation.3 These rules are supported and extended by model building using computer graphics.21

14 C. R. Cantor, in "Ribosomes" (G. Chambliss, G. Craven, J. Davies, K. Davis, L. Kahan, and M. Nomura, eds.), p. 23. University Park Press, Baltimore, Maryland, 1980.
15 K. Rietveld, R. van Poelgeest, C. W. A. Pleij, J. H. van Boom, and L. Bosch, Nucleic Acids Res. 10, 1929 (1982).
16 C. Florentz, J. P. Briand, P. Romby, L. Hirth, J. P. Ebel, and R. Giege, EMBO J. 1, 269 (1982).
17 D. Peattie and W. Gilbert, Proc. Natl. Acad. Sci. U.S.A. 77, 4769 (1980).
18 C. F. Ehresmann, F. Baudin, M. Mougel, P. Romby, J. P. Ebel, and B. Ehresmann, Nucleic Acids Res. 15, 9109 (1987).
19 G. Knapp, this volume [16].
20 C. W. A. Pleij and L. Bosch, this volume [21].
21 P. Dumas, D. Moras, C. Florentz, R. Giege, P. Verlaan, A. van Belkum, and C. W. A. Pleij, J. Biomol. Struct. Dyn. 4, 707 (1987).
22 C. W. A. Pleij, unpublished results (1988).

Properties of H Pseudoknot

The main properties of the H pseudoknot are briefly reviewed in this section, since they are relevant when searching for pseudoknots in other RNAs. They illustrate the importance of the double-helix geometry of RNA in considerations of pseudoknots. The H pseudoknot of Fig. 2 always consists of two stem regions, S1 and S2, and two single-stranded loops, L1 and L2. S1 and S2 are thought to be coaxially stacked, so that a quasi-continuous double helix is formed. It should be emphasized that this coaxial stacking, though strongly suggested by the models of the tRNA-like structures, is an assumption for which direct proof is still lacking. The minimal number of base pairs found so far in each stem is three. When one stem contains three base pairs, five to seven base pairs have been found in the other stem.3,4 To date, we have not been able to derive any rule for the nature or sequence of the base pairs in either S1 or S2, except that a short run of three base pairs usually contains two or three G·C pairs. It is crucial to realize that, owing to the polarity of the RNA chain and the geometry of the RNA-A double helix, the connecting loops L1 and L2 are not equivalent. L1 always crosses the deep groove, and L2 crosses the shallow groove of the quasi-continuous helix formed by S1 and S2. Analysis of the RNA-A double helix shows that surprisingly few nucleotides are needed for L1 to span the deep groove. No steric problems are encountered when an L1 of two nucleotides has to span five to seven base pairs, and even an L1 of a single nucleotide can do so.3,21 In the latter case, for reasons unknown to date, this single nucleotide is often a G residue. An example of an H pseudoknot with a single A residue in L1 is found in Ref. 8. The distance to be bridged across the shallow groove is larger, which is why L2 has to consist of two or more nucleotides. In principle, there is no upper limit to the size of L1 and L2. These single-stranded loops can be hundreds of residues long and can possess an elaborate secondary structure of their own.7,23 As we will see below, the steric properties of these connecting loops are responsible for the anomalous behavior of their bases in structure mapping experiments.
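The segment order of Fig. 2 and the size limits listed above can be encoded as a quick plausibility check. This is a minimal sketch, assuming a zero-nucleotide junction between the coaxially stacked stems (the simplest case in Fig. 2) and the dot-bracket convention with "[]" for S2, which this chapter does not use; the class name is hypothetical, and the limits are the empirical observations quoted in the text, not physical laws. With the 5'-S1-L1-S2-S1'-L2-S2'-3' order, L1 starts at the stacking junction and spans S2, while L2 spans S1.

```python
from dataclasses import dataclass


@dataclass
class HPseudoknot:
    """Hypothetical H-type pseudoknot: 5'-S1-L1-S2-S1'-L2-S2'-3' (0-nt junction assumed)."""
    s1: int  # base pairs in stem S1
    l1: int  # nucleotides in L1, which spans S2 across the deep groove
    s2: int  # base pairs in stem S2
    l2: int  # nucleotides in L2, which spans S1 across the shallow groove

    def dot_bracket(self) -> str:
        return ("(" * self.s1 + "." * self.l1 + "[" * self.s2 +
                ")" * self.s1 + "." * self.l2 + "]" * self.s2)

    def rule_violations(self) -> list[str]:
        """Compare against the size limits reported in the text (empirical)."""
        problems = []
        if min(self.s1, self.s2) < 3:
            problems.append("each stem needs at least 3 bp")
        if self.l1 < 1:
            problems.append("L1 needs at least 1 nt to cross the deep groove")
        if self.l2 < 2:
            problems.append("L2 needs at least 2 nt to cross the shallow groove")
        if min(self.s1, self.s2) == 3 and not 5 <= max(self.s1, self.s2) <= 7:
            problems.append("a 3-bp stem has so far been found with a 5-7 bp partner stem")
        return problems


pk = HPseudoknot(s1=3, l1=2, s2=6, l2=3)
print(pk.dot_bracket())      # (((..[[[[[[)))...]]]]]]
print(pk.rule_violations())  # []
```

A call such as `HPseudoknot(3, 1, 3, 1).rule_violations()` reports two problems: L2 is too short and the 3-bp/3-bp combination has not been observed.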

Searching for Pseudoknots in RNA

In some cases, pseudoknots can be traced by the following simple approach. Starting from an orthodox secondary structure, one visually inspects the loop regions for potential base pairing with other single-stranded regions. Although this method appears time-consuming and unsophisticated, it is certainly rewarding when the RNA is relatively small, when the secondary structure is well established (e.g., rRNAs), or when pseudoknots can be expected (e.g., noncoding sequences of viral RNAs). This approach has revealed the existence of various consecutive pseudoknotted structures in a number of plant viral RNAs.4,22 Finding pseudoknots in this way is not a rare event. Statistically, one can expect roughly 1 out of every 10 hairpin loops to be able to form a pseudoknot of the H type, provided that the connecting loops L1 and L2 do not exceed some 10 nucleotides. Support for pseudoknots found in this way should come from other techniques, such as structure mapping or sequence comparisons (see Detecting Pseudoknots and Phylogenetic Sequence Comparisons).

23 K. Rietveld, C. W. A. Pleij, and L. Bosch, EMBO J. 2, 1079 (1983).

The search for H pseudoknots can be expedited by using the program of Zuker and Stiegler11 in the mode that yields a so-called open structure (Zuker-2). If one also varies the maximal length of the hairpins to be folded, one obtains a collection of hairpin structures that can be rapidly screened for possible pseudoknots. A more sophisticated method is to develop a program or a subroutine that searches for extra base pairing in an otherwise fixed secondary structure. This was done by Salser, who enumerated such tertiary interactions in rabbit β-globin mRNA.24

Finally, we provide some additional suggestions for finding pseudoknots of the H type. Good candidates for forming pseudoknots are regular hairpins having 5-10 nucleotides in the loop and 5-8 base pairs in the stem. Bulge loops are seldom present in S1 and S2 (Fig. 2), the only documented examples being a pseudoknot in TMV RNA2 and the rather complicated pseudoknot in Escherichia coli α mRNA reported by Deckman and Draper.6 In some cases, disrupting the upper base pair of a hairpin stem, thereby increasing the hairpin loop size by two nucleotides, can be helpful. Similarly, a temporary disruption of other structural elements can sometimes reveal an interesting alternative structure of the pseudoknot type. When screening hairpin loops, one should not infer from Fig. 2A that only the 3' side of a loop can be involved in base pairing. Had the hairpin with the other stem segment (S2 in Fig. 2) been drawn (or found) first, the 5' side of the loop would have been part of the base-pairing interaction. The final result, however, remains the same. Thus, one should examine both the 3' and 5' sides of any hairpin loop under consideration. Stacking of the loop nucleotides on top of the S1 double helix occurs toward the 3' end, whereas stacking of the loop nucleotides on top of the S2 double helix occurs toward the 5' end.
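The screening advice above can be partly automated for one hairpin at a time. The Python sketch below uses hypothetical function names and assumes that the hairpin coordinates are already known (for example, from a separately computed secondary structure). It checks both borders of the loop for a stretch of at least three nucleotides that is Watson-Crick complementary to a nearby region, using the bounds quoted in this chapter (L1 of at least 1 nt, L2 of at least 2 nt, and at most about 10 nt each). It ignores G·U pairs and does not verify that the partner region is actually single-stranded.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def pairs_antiparallel(x: str, y: str) -> bool:
    """True if x and y (both written 5'->3') can form a Watson-Crick duplex."""
    return len(x) == len(y) and all(
        (a, b) in WATSON_CRICK for a, b in zip(x, reversed(y)))


def h_partners(seq, hp_start, loop_start, loop_end, hp_end,
               min_pairs=3, max_loop=10):
    """Scan both borders of one hairpin loop for H-type pairing partners.

    0-based inclusive indices: the hairpin runs hp_start..hp_end and its loop
    is loop_start..loop_end. Returns (side, stretch_start, partner_start,
    n_pairs, L1, L2) tuples.
    """
    hits = []
    loop_len = loop_end - loop_start + 1
    for k in range(min_pairs, loop_len):  # keep >= 1 loop nt unpaired
        # 3' side of the loop pairs downstream; remaining loop nts form L1.
        stretch = seq[loop_end - k + 1:loop_end + 1]
        l1 = loop_len - k
        for l2 in range(2, max_loop + 1):
            p = hp_end + 1 + l2
            if p + k > len(seq):
                break
            if l1 <= max_loop and pairs_antiparallel(stretch, seq[p:p + k]):
                hits.append(("3'", loop_end - k + 1, p, k, l1, l2))
        # 5' side of the loop pairs upstream; remaining loop nts form L2.
        stretch = seq[loop_start:loop_start + k]
        l2 = loop_len - k
        if not 2 <= l2 <= max_loop:
            continue
        for l1 in range(1, max_loop + 1):
            p = hp_start - l1 - k
            if p < 0:
                break
            if pairs_antiparallel(seq[p:p + k], stretch):
                hits.append(("5'", loop_start, p, k, l1, l2))
    return hits


# Toy sequence: hairpin GCG-AAGGG-CGC, then AA (L2), then CCC pairing with GGG.
print(h_partners("GCGAAGGGCGCAACCC", hp_start=0, loop_start=3, loop_end=7, hp_end=10))
# [("3'", 5, 13, 3, 2, 2)]
```

Scanning both borders reflects the point made above: whether the 3' or the 5' side of a loop appears to pair depends only on which stem is drawn as the hairpin first.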
A last point concerns which connecting loop crosses the deep groove. In H-type pseudoknots, the first connecting loop encountered when following the RNA chain from the 5' end always crosses the deep groove.

24 W. Salser, Cold Spring Harbor Symp. Quant. Biol. 42, 985 (1977).

Detecting Pseudoknots

Currently available chemical and enzymatic probes can provide information specific to pseudoknots, because the tertiary character of these structural elements can be exploited. Lowering the salt concentration or raising the temperature characteristically leads to relatively rapid unfolding of tertiary interactions and thus of pseudoknots. On pseudoknotting, the RNA chain adopts a more compact conformation, which requires screening of the negatively charged phosphate groups (e.g., by Mg2+). Structure mapping carried out under varying ionic conditions or at varying temperatures may thus reveal which RNA sequences are involved in tertiary interactions and in pseudoknotting. In particular, the reader is referred to the study of yeast tRNAPhe by Peattie and Gilbert.17 Probing with DMS or DEPC can be performed at 37° in the presence of 10 mM Mg2+ (native conditions) or 1 mM EDTA (semidenaturing conditions). Chemical probing is also possible under so-called denaturing conditions at 90°. A good example of this experimental approach is the study of the tRNA-like structure of brome mosaic virus RNA,23 which revealed the disappearance of the pseudoknot under semidenaturing conditions, with concomitant transition of the RNA chain segment involved into a new secondary structure. Chemical probing in the presence and absence of Mg2+ is therefore strongly recommended. Probing the structure at Mg2+ concentrations covering a whole range may give a clearer picture and may also provide some insight into the dynamic behavior of the RNA molecule.

Similar information may be derived from melting transitions observed by chemical modification and nuclease digestion at various temperatures. Studies of this type were first reported by de Bruyn and Klug, who were able to propose a three-dimensional model for a mammalian mitochondrial tRNA, mainly on the basis of base modifications with DEPC and DMS at temperatures from 0 to 90°.25 Similar experiments were carried out for ribosomal 5S RNA and for the so-called 3'-terminal cloacin fragment of 16S rRNA from E. coli.26,27 The method can be extended to other chemical probes, such as sodium bisulfite, which converts unpaired cytosine residues into uridines. A drawback of bisulfite is that the modification reaction has to be carried out at high Na+ concentrations, which hampers comparison of the results with those obtained by other means.28 Information concerning the proper reaction conditions for each of the chemical probes mentioned above may be found in Refs. 25 and 28.

Unfolding of the RNA secondary and tertiary structure generally renders the bases more accessible to modifying agents. An interesting exception has been found for the connecting loop L2, spanning the shallow groove of the H-type pseudoknot (compare Fig. 2). When this loop is short, as in the pseudoknot at the 3' end of TYMV RNA (three nucleotides), its bases are forced to point outward into the solvent, which makes them very accessible to probes such as DMS or DEPC. This rather unusual conformation of unstacked bases, not accommodated in a more hydrophobic environment, is clearly visualized by model building.21 The A residue at position 10 (compare Figs. 3 and 4) is highly reactive with DEPC at 10 mM Mg2+, even at 0°. Similarly, C residues have been found to be exposed when probed with DMS or sodium bisulfite.15,28 On unfolding, however, the adenine becomes much more shielded against DEPC, probably as the result of its incorporation into the stem of a new hairpin. A similar behavior has been found for an adenine residue in TMV RNA2 (see Fig. 5) and for an adenine in an identical position in a pseudoknot at the 3' end of tobacco rattle virus RNA.29 Such anomalous behavior of a single base can therefore indicate the presence of a pseudoknot.

FIG. 4. Stereoscopic view of the pseudoknot in the aminoacyl acceptor arm of the tRNA-like structure of TYMV RNA, as modeled with computer graphics. The model represents residues A-1 to C-30 (see Fig. 3). The two connecting loops [(A-10 to U-12) and (U-21 to C-24)] are indicated by thick lines. The arrows indicate residue A-10.

FIG. 5. Pseudoknots in the 3'-terminal noncoding region of TMV RNA, located upstream of the tRNA-like structure. (A) Conventional secondary structure presentation. The dashed lines represent the pseudoknotting. Roman numerals (I-VI) indicate the six stem segments that can be stacked into one quasi-continuous double helix. Numbering is from the 3' terminus of TMV RNA. (B) Probing of adenosine residues with DEPC under denaturing (lane 1), native (lane 2), and semidenaturing (lane 3) conditions. The numbering of adenosine residues corresponds to that given in A. Note the strong accessibility of A-144 in lane 2.

25 M. H. L. de Bruyn and A. Klug, EMBO J. 2, 1309 (1983).
26 T. Pieler, M. Digweed, and V. A. Erdmann, J. Biomol. Struct. Dyn. 3, 495 (1985).
27 H. A. Heus and P. H. van Knippenberg, J. Biomol. Struct. Dyn. 5, 951 (1988).
28 A. van Belkum, P. Verlaan, J. Bing Kun, C. W. A. Pleij, and L. Bosch, Nucleic Acids Res. 16, 1931 (1988).
29 A. van Belkum, B. Cornelissen, H. Linthorst, J. Bol, C. W. A. Pleij, and L. Bosch, Nucleic Acids Res. 15, 2837 (1987).

Enzymes such as nuclease S1 and RNase T1 can also be used for temperature-dependent probing.26,28 RNase T1, though cutting only after single-stranded G residues, appears to be superior to nuclease S1: RNase T1 is active up to 80° and in the presence or absence of Mg2+, whereas nuclease S1 is inactive above 60°, needs Zn2+, and has a pH optimum in the acidic range. It is noteworthy that the two connecting loops, L1 and L2, in the H-type pseudoknot, contrary to what one might expect, usually are not very sensitive to the single-strand-specific nuclease S1 and RNase T1. This is more remarkable for the loop spanning the shallow groove. Possibly the protruding bases and the strong steric constraints make this loop less accessible to the nucleases. As such, this property may serve as another characteristic to distinguish loops in pseudoknots from other, more orthodox loops.
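The Mg2+-dependent comparison described in this section can be reduced to a rule of thumb. A base that is more reactive under native conditions than under semidenaturing conditions is a candidate member of a short connecting loop such as L2. The following is a minimal Python sketch of that rule. The reactivity values are invented numbers that only mimic the qualitative pattern reported for A-144 in Fig. 5B, and the twofold threshold is an arbitrary assumption.

```python
def flag_anomalous(native: dict[int, float], semidenaturing: dict[int, float],
                   min_ratio: float = 2.0) -> list[int]:
    """Return residues at least `min_ratio` times more reactive under native
    conditions (10 mM Mg2+) than under semidenaturing conditions (1 mM EDTA).

    A residue missing from `semidenaturing` is treated as unreactive there.
    """
    flagged = []
    for residue, r_native in native.items():
        r_semi = semidenaturing.get(residue, 0.0)
        if r_native > 0 and r_native >= min_ratio * r_semi:
            flagged.append(residue)
    return sorted(flagged)


# Invented DEPC band intensities (arbitrary units) for three adenines.
native = {144: 0.9, 150: 0.2, 160: 0.1}
semidenaturing = {144: 0.2, 150: 0.6, 160: 0.5}
print(flag_anomalous(native, semidenaturing))  # [144]
```

Most residues behave like 150 and 160 and become more reactive on unfolding. Only the residue whose reactivity drops on unfolding is flagged, and the flag is a hint that should be followed up with sequence comparisons, not proof of a pseudoknot.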

Phylogenetic Sequence Comparisons

Probably the most powerful approach for establishing the secondary structure of RNA is based on phylogenetic sequence comparisons.30 Double-stranded regions are considered proved when an equivalent pairing is maintained in related homologous RNAs despite differences in nucleotide sequence. To date, the most reliable secondary structures are largely based on this principle. A detailed discussion of this method is given elsewhere in this volume.31 The demonstration of so-called covariations, or compensatory base changes, is also of the utmost importance for proposing pseudoknotted structures. Pseudoknots differ essentially, however, from other structural elements in that they can be considered proved only when covariations are found in at least two stem regions (S1 and S2 of Fig. 2). The criteria for proof or disproof of a helical segment in a pseudoknot do not differ from those of any other putative helix. The rule adopted to date is that at least two independent covariations are needed per stem region.31 This means that four or more covariations have to be found in a pseudoknotted structure. As outlined previously, sequence comparisons in combination with computer-aided predictions of secondary structure can be quite successful in detecting pseudoknots and can be sufficient, provided that enough related sequences are available. A number of pseudoknots were thus revealed in the noncoding regions of plant viral RNAs.4,22 The evidence for a proposed pseudoknot is reinforced when the complementarity is preserved not only among various strains of a particular viral RNA, but also among duplications of a pseudoknot in one RNA chain. This was found to be the case in the 5'-noncoding region of foot-and-mouth disease virus (FMDV) RNA.32
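The covariation criterion for a pseudoknot, at least two independent covariations in each of S1 and S2, can be expressed as a short check over an alignment. This Python sketch rests on several simplifying assumptions. The alignment is gapless, and its first sequence is the reference. A covariation is a change of both partners of a base pair that preserves pairing (Watson-Crick or G·U). "Independent" is approximated by counting distinct position/base-pair combinations, whereas real phylogenetic analyses weigh independence across lineages more carefully. The sequences are made up.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def covariations(alignment: list[str], stem: list[tuple[int, int]]) -> set:
    """Distinct compensatory changes in one stem, relative to alignment[0].

    A change counts when both columns of a base pair differ from the
    reference and the new combination still pairs (Watson-Crick or G.U).
    """
    reference = alignment[0]
    events = set()
    for seq in alignment[1:]:
        for i, j in stem:
            if (seq[i] != reference[i] and seq[j] != reference[j]
                    and (seq[i], seq[j]) in PAIRS):
                events.add((i, j, seq[i] + seq[j]))
    return events


def pseudoknot_supported(alignment, s1, s2, needed_per_stem=2) -> bool:
    """Both stems must each show at least `needed_per_stem` covariations."""
    return all(len(covariations(alignment, stem)) >= needed_per_stem
               for stem in (s1, s2))


s1 = [(0, 7), (1, 6)]   # column pairs forming stem S1
s2 = [(2, 9), (3, 8)]   # column pairs forming stem S2
alignment = ["GGCCAACCGG", "AGUCAACUGA", "GCCAAAGCUG"]
print(pseudoknot_supported(alignment, s1, s2))      # True
print(pseudoknot_supported(alignment[:2], s1, s2))  # False
```

With all three sequences, each stem shows two compensatory changes (four in total), so the criterion is met. Dropping the third sequence leaves only one covariation per stem.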
30 G. E. Fox and C. R. Woese, Nature (London) 256, 505 (1975).
31 B. D. James, G. J. Olsen, and N. R. Pace, this volume [18].
32 B. E. Clarke, A. L. Brown, K. M. Currey, S. E. Newton, D. J. Rowlands, and A. R. Carroll, Nucleic Acids Res. 15, 7067 (1987).

Predicting Pseudoknots

One of the major aims in molecular biology is to predict a functionally meaningful three-dimensional structure of a protein or nucleic acid from sequence data alone. Especially for RNA, this aim has yet to be realized. Pseudoknots have so far been ignored or disallowed in all attempts to deduce an RNA structure from its sequence, either because of the inherent complexity of the algorithms required or because naturally occurring examples were not known at the time11,33 (see elsewhere in this volume33). Based on our findings on some plant viral RNAs, Abrahams et al.34 attempted to develop a computer program that considers pseudoknotting as a possible step during the folding of an RNA chain. A summary of the principles and the merits of this program follows.

The algorithm developed by Abrahams et al. turns out to be basically the same as the one reported by Martinez,35 in that it simulates the RNA folding process itself. It is postulated that this folding is equivalent to adding one stem at a time to a growing structure. Abrahams and co-workers assumed that each next stem to be added is the one that contributes most to lowering the free energy and that is compatible with the intermediate structure. In calculating the ΔG of the stem regions to be added to the intermediate structure, stacking interactions with the structure are taken into account. This procedure leads to a series of intermediate structures that are well defined, stable, and not subject to rearrangements. An essential feature of this algorithm is that the formation of a pseudoknotted structure is allowed on the addition of the next double-helical stem region. In a pseudoknot of type H (see Fig. 2), the two segments are stacked, and this energy gain is taken into account by the program. The structure thus obtained does not necessarily have the global energy minimum, as found by the algorithm developed by Zuker and Stiegler.11,33 More specifically, the following three steps can be discerned in the algorithm:

1. List all possible stems in a given sequence.
2. Incorporate into the structure the stem that adds most to its stability.
3. Update the list of remaining stems, dependent on the new intermediate structure formed, by trimming incompatible stems and by calculating the new ΔG values.

These steps are repeated until no further energy gain is possible.

33 M. Zuker, this volume [20].
34 J. P. Abrahams, E. van Batenburg, and C. W. A. Pleij, to be published.
35 H. M. Martinez, Nucleic Acids Res. 12, 323 (1984).

The calculation of the free energies associated with the stem regions uses currently available thermodynamic parameters for base-pair stacking and loop formation. An important exception was made for large loops. For instance, energy values for hairpin loops containing more than six nucleotides were not derived from the Jacobson-Stockmayer equation (see Ref. 24), but rather from a logarithmic extrapolation of the experimental values for small loop sizes. As a consequence, long-range interactions carry an extra penalty. A fundamental problem in such an algorithm is to introduce energy parameters for each structural motif created by pseudoknotting. Hardly any experimental values have been published so far; a first attempt to collect data for the pseudoknot of the H type was published recently.36 Based on the experimental work on the pseudoknotted structures in the noncoding regions of some plant viral RNAs, such as TMV RNA,2 the loop energies for both L1 and L2 in the H-type pseudoknot (see Fig. 2) are estimated to be of the order of 3-5 kcal/mol. The value currently used is 4.1 kcal/mol. This seems reasonable when these so-called connecting loops are compared to the unstacked regions in normal hairpin loops. Apart from possible modifications, the seven-membered anticodon loop of tRNA should have a ΔG of 4.5 kcal/mol.24 In this loop, five of the seven bases are stacked, and the two unstacked ones are more or less equivalent to L1 in the H-type pseudoknot (Fig. 2). The findings about the minimal size of L1 and L2, as outlined previously, were also introduced into the program. Some types of more complicated pseudoknotted structural elements were ruled out by assigning them high, positive ΔG values. The RNA structure-predicting program of Abrahams and co-workers, written in APL, was adapted to run on an IBM or IBM-compatible personal computer and is able to fold a sequence of 1000 nucleotides overnight. This program was tested on a large number of different RNAs, some of which contained pseudoknots.
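The three steps listed above amount to a greedy search. The Python sketch below illustrates that idea only. It is not the APL program of Abrahams and co-workers, and it rests on explicit assumptions:

- Invented per-base-pair energies stand in for real stacking parameters.
- Hairpin-loop penalties and coaxial-stacking bonuses are omitted.
- Only maximal stems are listed.
- A stem that crosses one existing stem pays the 4.1 kcal/mol quoted above for each of its two connecting loops.
- A stem crossing more than one existing stem is simply excluded. This loosely mirrors the high, positive ΔG assigned to more complicated knots.

```python
# Toy per-base-pair energies in kcal/mol (invented; not real stacking parameters).
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0, ("A", "U"): -2.0,
               ("U", "A"): -2.0, ("G", "U"): -1.0, ("U", "G"): -1.0}
CONNECTING_LOOP = 4.1  # kcal/mol per connecting loop, the value quoted in the text


def list_stems(seq, min_pairs=3, min_loop=3):
    """Step 1: maximal stems (i, j, n), pairing i+t with j-t for t < n."""
    stems = []
    for i in range(len(seq)):
        for j in range(i + min_loop + 1, len(seq)):
            if i > 0 and j + 1 < len(seq) and (seq[i - 1], seq[j + 1]) in PAIR_ENERGY:
                continue  # (i, j) lies inside a longer stem; keep only the maximal one
            n = 0
            while (j - n) - (i + n) - 1 >= min_loop and (seq[i + n], seq[j - n]) in PAIR_ENERGY:
                n += 1
            if n >= min_pairs:
                stems.append((i, j, n))
    return stems


def stem_energy(seq, stem):
    i, j, n = stem
    return sum(PAIR_ENERGY[(seq[i + t], seq[j - t])] for t in range(n))


def nucleotides(stem):
    i, j, n = stem
    return set(range(i, i + n)) | set(range(j - n + 1, j + 1))


def crosses(a, b):
    """True if two stems interleave (i < k < j < l), i.e., form a pseudoknot."""
    (i, j, _), (k, l, _) = sorted([a, b])
    return i < k < j < l


def fold(seq):
    """Repeat steps 2 and 3 until no stem lowers the free energy any further."""
    candidates = list_stems(seq)
    structure = []
    while True:
        used = set().union(*(nucleotides(s) for s in structure))
        # Step 3: trim stems that clash with the intermediate structure, then rescore.
        candidates = [s for s in candidates if not nucleotides(s) & used]
        scored = []
        for s in candidates:
            n_cross = sum(crosses(s, t) for t in structure)
            if n_cross > 1:
                continue  # more complicated knots are simply excluded here
            dg = stem_energy(seq, s) + 2 * CONNECTING_LOOP * n_cross
            if dg < 0:
                scored.append((dg, s))
        if not scored:
            return sorted(structure)
        structure.append(min(scored)[1])  # Step 2: the most stabilizing stem


print(fold("GCGAAGGGCGCAACCC"))  # [(0, 10, 3), (5, 15, 3)]
```

In this toy run, the hairpin stem (0, 10, 3) is chosen first. The GGG·CCC stem (5, 15, 3) crosses it but is still accepted, because its pairing energy (-9.0) outweighs the two connecting-loop penalties (8.2), which yields an H-type pseudoknot. Like the procedure described in the text, such a stepwise greedy search need not reach the global free-energy minimum.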
A better score was obtained for the prediction of the cloverleaf of tRNAs than with the widely used program of Zuker and Stiegler.11 The current version of the program of Abrahams and co-workers is able to predict the pseudoknot in the tRNA-like structure of TYMV RNA (Fig. 3) and finds five of the six stems of the three consecutive pseudoknots in the 3'-noncoding region of TMV RNA (Fig. 5A). The program is therefore reasonably successful in predicting pseudoknots of type H. The program also revealed some other new pseudoknot structures in both plant viral and animal viral RNAs, which could be confirmed by sequence comparisons.22,32 However, loop-loop interactions that appear sterically less likely are occasionally predicted. Clearly, the current program needs further improvement on this and other points, which in turn depends on new experimental data or on new pseudoknots found in natural RNAs.

36 J. D. Puglisi, J. R. Wyatt, and I. Tinoco, Jr., Nature (London) 331, 283 (1988).

Concluding Remarks

There is little doubt that pseudoknots are important structural elements in RNA that can play a decisive role in higher order folding. Pseudoknots occur in a variety of natural RNAs, such as ribosomal RNA,5 mRNAs,6,8 ribozymes,37 and plant viral RNAs.4 The main techniques that provide evidence for their existence are chemical modification, enzymatic digestion, computer-aided prediction, and phylogenetic sequence comparison. Of these techniques, the search for covariations in the double-helical stem regions of the pseudoknot is probably the most powerful method to date. Reliable proposals can usually be made only when a combination of these methods is used, as is also true for most other elements of an RNA (secondary) structure. A serious drawback in the analysis of RNA pseudoknots is the almost complete absence of structural data from biophysical techniques such as nuclear magnetic resonance (NMR) or X-ray diffraction. These methods may not only prove pseudoknotting itself, but also answer important questions about the stacking of the constituent double-helical segments or about the conformation of the connecting loops. These techniques may also yield the thermodynamic parameters that are required for successful prediction of pseudoknots in a given RNA sequence.

Acknowledgments

We thank Krijn Rietveld, Alex van Belkum, and Paul Verlaan of our laboratory for their valuable contributions to the study of RNA pseudoknots. The pseudoknot-predicting program was developed in collaboration with Eke van Batenburg of the Department of Theoretical Biology, Leiden University, and Jan Pieter Abrahams of the Department of Biochemistry, Leiden University. We gratefully acknowledge the collaboration with Richard Giege and Dino Moras of the Institut de Biologie Moleculaire et Cellulaire du Centre National de la Recherche Scientifique (CNRS), Strasbourg, France.

37 B. D. James, G. J. Olsen, J. Liu, and N. R. Pace, Cell (Cambridge, Mass.) 52, 19 (1988).