Biochimica et Biophysica Acta, 1050 (1990) 110-118
Elsevier BBAEXP 92153
Secondary structure at the 3' terminal region of RNA coliphages: comparison with tRNA
Malti R. Adhin, Jacqueline Alblas and Jan van Duin
Department of Biochemistry, University of Leiden, Gorlaeus Laboratories, Leiden (The Netherlands)
(Received 16 May 1990)
Key words: Phage RNA; Helix stability; tRNA comparison
Secondary structure models for the 3' non-coding region of the four groups of coliphage RNA are proposed based on comparative sequence analysis and on previously published data on the sensitivity of nucleotides in MS2 RNA to chemical modification and enzymes. We report the following observations. (1) In contrast to the coding regions, the structure at the 3' terminus is characterized by stable regular helices. We note the occurrence of the loop sequences 5'-CUUCGG and 5'-CGAAAG, which are reported to confer exceptional stability to stem structures. These features are probably present to promote the segregation of mother and daughter strands during replication. (2) Comparison of homologous helices indicates that only those base pair substitutions are allowed that maintain the thermodynamic stability. (3) We have compared the structure of phage RNA with tRNA. Overall similarity is low, but one common element may exist. It is a quasi-continuous helix of 12 basepairs that could be the equivalent of the 12 basepair long coaxially stacked helix formed by the TΨC arm and the aminoacyl acceptor arm in tRNA. As in tRNA, this structure element starts after the fourth nucleotide from the 3' end. (4) Phage RNA contains a large variable region of about 35 nucleotides bulging out from the quasi-continuous helix. We speculate that the variable loop in present-day tRNA could be the remnant of the variable region found in phage RNA. The variable region contains overlapping binding sites for the replicase enzyme and the maturation protein. This common binding site may serve as a switch from replication to packaging.
Abbreviation: EF, elongation factor.
Correspondence: J. van Duin, Department of Biochemistry, University of Leiden, Gorlaeus Laboratories, P.O. Box 9502, 2300 RA Leiden, The Netherlands.

Introduction
The single-stranded RNA phages have been classified into four distinct groups by several biophysical, serological and genetic criteria [1]. The genetic map of a representative of each group is presented in Fig. 1 and shows the same genetic organization for groups I and II (group A) on the one hand and for groups III and IV (group B) on the other. Infection of male Escherichia coli bacteria occurs via the F-pili and replication proceeds by formation of a free (-) strand intermediate, which in turn produces progeny (+) strands. The replicase responsible for these copying reactions has been studied best in phage Qβ (group III) and consists of four subunits, of which only subunit II, an RNA-dependent RNA polymerase, is encoded by the phage. Subunit I is ribosomal protein S1, which in the uninfected host is required for messenger RNA binding. Subunits III and IV are the protein synthesis elongation factors (EF)-Tu and -Ts, respectively. Subunit I is necessary for replication of the (+) but not of the (-) strand. The fifth component of the replicase complex is called the host factor (HF) and is also required only to copy the (+) strand. The role of HF in the uninfected cell is unknown. More detailed information can be found in a recent review [2].
Interest in the RNA phages has recently been revived by speculations of Weiner and Maizels [3] that present-day tRNA may be derived from ancient genomic RNA, where its presence initially served to earmark RNA species for replication. Several plant viral RNAs indeed have a tRNA-like structure at their 3' end [4,5]. In vitro studies show that these RNAs can even be charged with an amino acid and that their CCA-OH termini can be exchanged by nucleotidyl transferase [5]. The conservation of a CCA-OH terminus and the presence of EF-Tu and EF-Ts in the replicase complex suggest that bacteriophage RNA may also harbour some tRNA-like structure at its 3' end. Indeed, some evidence
0167-4781/90/$03.50 © 1990 Elsevier Science Publishers B.V. (Biomedical Division)
Fig. 1. Genetic map of groups I-IV RNA coliphages. Sequences are taken from Refs. 12, 14, 15, 16. Labels recoverable from the map: group A (groups I and II, the latter represented by GA) with maturation (A), coat, replicase and lysis genes; group B (group III, Qβ, and group IV, SP) with maturation (A2), coat, read-through (A1) and replicase genes.
for the exchange of the CCA-OH terminus by tRNA nucleotidyl transferase has been presented for fragmented phage RNA [6]. On the other hand, amino acid charging could not be demonstrated [6].
Here, we propose a consensus secondary structure for the 3' terminal regions of RNA coliphages. The model is primarily based on phylogenetic sequence comparison using the sequences of the phages shown in Fig. 1. Furthermore, we recently determined the 3' terminal sequence of group I phage fr [7], and this information has been indispensable in deducing the folding of group I RNA and in setting up the alignment with group II. Our proposal also incorporates data concerning the sensitivity of MS2 RNA to chemical modification [8,9] and enzymatic digestion [10].
In the past, several models for the secondary structure of the 3' end of RNA phages have been put forward [8,10-13], but comparative analysis could not be applied and these models therefore show inconsistencies among themselves, although some structural elements have been deduced correctly. The phylogenetic approach provides an improved model with a high probability of being correct. Nevertheless, some of the details, as well as the nature of possible tertiary interactions, still remain to be solved.

Results
Structure of the 3' extracistronic region in group A
An important contribution to the deduction of the secondary structure of the 3' end is provided by the recently elucidated sequence of the group I phage fr [7]. Its alignment with MS2 is straightforward and the pattern of base substitutions and deletions suggested a probable secondary structure that was compared to a provisional structure for GA (group II). These structures were used to improve the preliminary group I-group II sequence alignment which, in turn, served to refine the secondary structure further. This stepwise process is not repeated here; instead, we directly present the deduced sequence match for the group A phages (Fig. 2a). The alignment confirms that MS2 and fr are close relatives, whereas GA is more distant, as witnessed by the large deletion. The region indicated as V is highly variable among groups I and II and alignment has not been attempted. Relative positioning of bases in this region is therefore arbitrary.
The secondary structure model compatible with this alignment and with the chemical and enzymatic sensitivity data is shown in Fig. 3 for the group A phages MS2, fr and GA, respectively. To facilitate the discussion of the various helices, they have been numbered. The occurrence of basepair conservative substitutions (covariations) and basepair deletions is considered as evidence for the existence of a stem structure. In addition, when deletions or insertions of one or a few nucleotides occur in loop regions, this is taken as support for the existence of a hairpin. Throughout Fig. 3, covariations occurring within group A are boxed by a solid line and basepair insertions are marked by a dashed box. Nucleotide insertions are indicated by a dashed circle.
Helix 1. Helix 1 of group A RNA contains two
Fig. 2. Sequence alignment of the 3' non-translated regions in RNA coliphages. (a) Groups I and II. (b) Groups III and IV. The fr sequence is presented in the preceding paper [7]. The numbers between the arrows above the sequence correspond to the helices shown in Figs. 3 and 4. Alignment of the variable (V) region is not attempted and the relative positioning of bases is arbitrary. Homology hyphens have been omitted in these regions.
covariations and the pattern of T1 cleavage is consistent with the structure proposal. Support is also provided by the fact that the insertion of a single U residue in GA occurs in the loop of this helix. Further evidence is derived from comparison with group B RNA, where a similar hairpin is present (see below). It is not clear from the group A sequences where exactly this helix ends. As shown in Fig. 3, it could be extended by two basepairs in group I and by one pair in group II. Since these extensions are not possible in group B (Fig. 4), we have chosen the 10 basepair stem that can be formed in all groups (see, however, Discussion).
Helix 2. Helix 2 is based on three covariations occurring between group I and group II. In this comparison we have taken into account the partially known sequence of group II phage KU1 [13]. The group A sequences allow an extension of helix 2 and our choice for the short version is based on comparison with the
Fig. 3. Secondary structure models for the 3' noncoding regions of group A RNA phages. Basepair conservative substitutions are indicated by solid boxes, basepair insertions by dashed boxes. Nucleotide insertions are encircled by a dashed line. The termination codon of the replicase gene is indicated by a solid line. (a) MS2 RNA. Solid and open arrows indicate RNase T1 and RNase A cleavage sites, respectively [10]. The number of feathers is proportional to the intensity of cutting. C residues marked by an asterisk are sensitive to methoxyamine [9]. Methoxyamine data were obtained by treatment of a T1 fragment consisting of residues 3483-3550. This fragment contains the 5' half of helix 1 and the two modifications found in this half helix are evidently irrelevant and have been omitted from this figure. Encircled residues are substitutions found in azure mutants or their revertants after treatment of phage RNA with nitrous acid [8]. It should be pointed out that only substitutions that yield viable phages were recovered. The absence of substitutions at other loop positions, therefore, does not conflict with our model. The dashed line in the V region indicates the sequence bound to the maturation protein. The model presented here is similar to the one suggested in Ref. 10, but quite different from later proposals [8,12]. (b) fr RNA. (c) GA RNA. For helix 2 the sequence and structure of group II phage KU1 is presented.
Fig. 4. Secondary structure models for the 3' noncoding regions of group B phages. For the meaning of symbols see legend to Fig. 3. (a) Qβ RNA. The solid line in the V region indicates the T1 fragment bound to proteins S1 and host factor in a filter binding assay following RNase treatment of the protein-RNA complex. On the right side of the V region is a partial VK sequence (also group III). The structure proposed is, except for interaction 5, identical to that published by Senear and Steitz [11]. (b) SP RNA. The sequence and possible structure for helix 2 are given for phage TW28 (group IV).
corresponding structure in group B (see Fig. 4). The absence of RNase T1 and RNase A cuts in this region provides some support for the model.
Helix 3. Inspection of the alignment of groups I and II shows that helix 3 has undergone a two nucleotide (AA) insertion in the loop region of group I. In effect, this insertion extends the helix by one basepair and it can thus be viewed as a basepair insertion. Furthermore, an A in the loop of MS2 RNA was found substituted by a G in a phage mutant obtained by mutagenesis of the RNA in vitro with nitrous acid. Basically, this reflects a chemical modification which is known to occur preferentially in single-stranded regions. Although the nitrous acid treatment may have changed more single-stranded A or C residues, only those changes that lead to viable mutants were recorded in these experiments [8]. Thus, the absence of modification for most loop nucleotides or other single-stranded residues does not argue against our proposal. The enzymatic cuts in the loop also support this helix.
Helix 4. This helix contains the termination codon that marks the end of the replicase gene. The loop is sensitive to T1 RNase and to nitrous acid. Between MS2 and fr one covariation is apparent (3rd pair from the top) and fr contains an additional nucleotide in the loop. Further support for this helix is derived from comparison with GA: an additional covariation between groups I and II shows up at the second pair from the top and a C-G basepair is inserted in the middle of the stem in group I. We also note the insertion of two and three nucleotides, respectively, in the loops of MS2 and fr as compared to GA (not indicated in Fig. 3).
Helices 6, 7 and 8. This whole region is deleted in GA and we can only rely on comparison between MS2 and fr and on the modification data. Helix 7 is confirmed by a basepair insertion in fr at the top of the stem. Furthermore, the T1 sensitivity data and the A→G substitution in the loop obtained by modification with nitrous acid support the proposal. Helix 8 carries a basepair insertion in fr and a covariation at the top of the stem. Helix 6 is only supported by a chemically induced substitution in the loop. Circumstantial evidence for the probability of this structure is the fact that the presence of the now established neighbouring helices 2 and 7 does not leave any realistic alternative to fold this region. Besides, similar to helices 2 and 7, helix 6 contains the loop motif CGAAAG, reported to confer exceptionally high stability to a stem region (Uhlenbeck, O., personal communication and Ref. 17).
The variable region (V region). This region has been drawn in a conspicuously bent manner to set it apart from the other elements. In fact, the V region is proposed to consist of two stems that may be stacked on to each other. The right side (5') helix contains three consecutive covariations between MS2 and fr. Furthermore, modification studies on the T1 fragment 3433-3550 of MS2 RNA showed that the C residues marked by an asterisk are reactive with methoxyamine [9]. The C residue in the loop of the left side (3') helix was reactive with both methoxyamine and nitrous acid. The enzyme cleavage sites are also consistent with our proposal.
A somewhat similar structure can be drawn for the corresponding RNA sequence in GA. In fact, two slightly different foldings are possible (Fig. 3c). A meaningful alignment for this region between groups I and II has not been achieved because of large sequence variability. Our proposal for GA therefore lacks phylogenetic support. Nevertheless, there is circumstantial evidence that we have correctly identified the GA equivalent of the V region. Firstly, the substitutions in the V region of group II phage KU1 all occur in the proposed loops. Secondly, one finds non-canonical basepairs (G·A, C·A, A·A) in this region, which are virtually absent in the other helices. Thirdly, pyrimidines and purines display a tendency to cluster in the V region. Another reason is the presence of loops containing more than five nucleotides, which are absent from the other 3' terminal helices in group A. Finally, we note that in GA it is also possible to stack the two helices coaxially.

Structure of the 3' extracistronic region in group B
In Fig. 2b we have aligned the Qβ and SP sequences according to the principle outlined above. Homology between Qβ and SP is low outside the 35 terminal nucleotides. As in the other phages, the V region cannot be properly matched. Alignment of regions 2 and 3 is, to a certain degree, based on the potential to form identical stem-loop structures. The deduced secondary structures are displayed in Fig. 4a and b.
Helix 1. As in group A, we notice the presence of a helix starting after the 6th nucleotide from the 3' end.
There is only one covariation between Qβ and SP, but it is likely that we are dealing with the counterpart of helix 1 in group A, since it occurs at the same position from the 3' end and harbours the same conserved GCUU sequence in the loop. We have already noted in group A that the number of nucleotides in the loop of helix 1 is flexible (GA has an extra U). This property also appears in group B, where four nucleotides have been inserted with respect to group I. Helix 1 of the group A phages has the potential to expand at the bottom with an extra G·C pair. This is not true for group B. Assuming that all phages have similar secondary structures at their 3' ends, helix 1 of the group A phages is drawn with ten basepairs to fit the group B size (Fig. 3a,b,c).
Helix 2. This helix shows three consecutive covariations between Qβ and SP, next to a basepair insertion in SP. The published partial sequence of group IV phage TW28 [13] provides further phylogenetic proof for helix 2 by yielding an additional covariation, a basepair deletion compared to SP and an insertion in the loop. The existence of this helix, therefore, seems well established. Its correspondence to helix 2 in group A is based on its position relative to both interaction 5 (see below) and the variable region.
Helix 3. In group B this helix seems present as a stem of 11 basepairs containing the termination codon of the replicase gene (indicated by a bar). Sequence similarity between SP and Qβ is very low in this region and the alignment is in part based on the proposed secondary structure. The thermodynamic stability of these hairpins makes their existence likely. Support is also provided by the fact that helix 3 is found at the same position vis-à-vis interaction 5 in both phages, and this forms the basis for considering helices 3 in groups A and B as corresponding structures.
The variable region (V region). The number of nucleotides between the conserved helices 1 and 2 in both Qβ and SP is small, and so is the number of possibilities to form local RNA secondary structure. Our proposal for Qβ (Fig. 4a) is supported by one covariation with group III phage VK [13]. In addition, Qβ RNA has the potential to form two coaxially stacked helices and contains a large A-rich loop analogous to that found in group A. Phylogenetic support for our SP model of the V region is not available and slightly different alternative foldings are possible by shifting the long purine and pyrimidine tracts with respect to each other. The V region in SP contains an unusual C·A base pair, a large loop rich in A residues, and purine and pyrimidine clusters. These particular features common to all V regions, together with their location immediately 5' of helix 1, suggest that they represent homologous structures serving the same purpose in all four groups. (In the Discussion we will argue that the variable regions form part of the binding sites for the maturation protein and the replicase.)
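The covariation criterion applied to the helices above can be illustrated with a short script. What follows is a minimal Python sketch under stated assumptions, not the procedure used to build the models: the function names and the two toy helices are hypothetical, G·U wobble pairs are counted as pairing, and the distinction drawn between a double change ("covariation") and a single change that keeps the pair is made here only for illustration.

```python
# Illustrative sketch only: helper names and toy sequences are hypothetical
# and are not taken from the alignment in Fig. 2.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(five, three):
    """Return True for a Watson-Crick pair or a G-U wobble pair."""
    return (five, three) in WATSON_CRICK or (five, three) in WOBBLE


def helix_pairs(strand_5p, strand_3p):
    """Pair two strands (both written 5'->3') in antiparallel fashion."""
    if len(strand_5p) != len(strand_3p):
        raise ValueError("both strands of a helix must have equal length")
    return list(zip(strand_5p, reversed(strand_3p)))


def classify_change(pair_a, pair_b):
    """Classify how one homologous base pair differs between two phages."""
    if pair_a == pair_b:
        return "conserved"
    if not (can_pair(*pair_a) and can_pair(*pair_b)):
        return "disrupted"
    changed = sum(x != y for x, y in zip(pair_a, pair_b))
    # Both partners changed and pairing is kept: evidence for a stem.
    return "covariation" if changed == 2 else "single change, pairing kept"


def compare_helices(helix_a, helix_b):
    """Compare two aligned helices of equal length, pair by pair."""
    pairs_a = helix_pairs(*helix_a)
    pairs_b = helix_pairs(*helix_b)
    if len(pairs_a) != len(pairs_b):
        raise ValueError("helices must be aligned to the same length")
    return [classify_change(a, b) for a, b in zip(pairs_a, pairs_b)]


# Toy example: the third pair changes from C-G to U-A.
toy_phage_1 = ("GGCAC", "GUGCC")
toy_phage_2 = ("GGUAC", "GUACC")
print(compare_helices(toy_phage_1, toy_phage_2))
```

In this toy case only the third pair is reported as a covariation; basepair insertions and deletions, which the models also use as evidence, would require an explicit gapped alignment and are not handled by the sketch.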
Interaction 5 in groups A and B
The secondary structure models for groups A and B leave an unpaired stretch with the sequence UGGG(U/A) between helices 2 and 3. The sequence similarity and the positioning within the model suggest that we are dealing with a homologous element. Basepairing of this element with the 3' terminus should be thermodynamically favoured and is supported by the boxed covariation existing between groups A and B. This interaction 5 is adopted by two models [12,13], but two points suggest that we should consider alternatives for this region. One is the occurrence of rather strong enzymatic cleavages in interaction 5 (Fig. 3a); the other is that at least some nucleotides at the 3' terminus are expected to be unpaired to allow the start of the copying reaction. Accordingly, we have designed an alternative for interaction 5: the two lower basepairs of helix 1 are dissolved and the slack created is used to shift the 3' terminus three nucleotides to the left with respect to the UGGG(U/A) sequence. The resulting structures (Fig. 5) are shown only for GA and SP as they are the same for MS2 and Qβ, respectively. This folding fits the enzymatic cuts better, while the covariation reappears. In addition, this alternative shows more similarity with tRNA because of the four protruding terminal nucleotides (see Discussion).

Fig. 5. Alternative structure proposal for the 3' non-coding regions of phage RNA. (a) GA (group A) and (b) SP (group B). The difference with the model proposed in Figs. 3 and 4 is a three nucleotide shift in the basepairing scheme at interaction 5.

Discussion
Regular and stable helices are a hallmark of the 3' non-translated region
A noteworthy feature of the folding in the 3' extracistronic region of bacteriophage RNA is that, apart from the V region, all helices are regular. They lack bulges or internal loops and contain as a rule only Watson-Crick pairs. This pattern is also observed in all 6S RNA templates [18-20], but contrasts strongly with what is found in the coding regions of phage RNA [12,21], where uninterrupted helices are virtually absent. A similar distinction in helix shape has also been observed for the coding and noncoding regions in plant viral RNA (Pleij, C., personal communication). The preference for stable structures is further illustrated by helix 8 of MS2 and fr carrying the loop sequence 5'-CUUCGG, which was shown to confer unexpected stability [22]. Group A further contains the helices 2, 6 and 7 in which the loop sequence is 5'-CGAAAG.
There is evidence that this feature also bestows exceptionally high thermodynamic stability on a stem region in RNA and DNA (Uhlenbeck, O., Hilbers, C.W., personal communication, and Ref. 17). One reason for their presence could be that helices with high stability facilitate segregation of mother and daughter strands [23] and this may be particularly important when nascent chains are still short. The absence of such regular and therefore stable helices from the coding regions suggests that translation is hampered by secondary structure, but there is no experimental evidence for this idea.

Maintenance of helix stability
We draw attention to the fact that base substitutions between homologous helices appear constrained in the sense that the stability of a helix must be maintained. As it is not generally known how the identity of loop nucleotides contributes to the stability, we can only illustrate this point using helices with identical loops. Helix 1 in MS2 and fr has two different pairs (2nd and 7th from the top), but the stability is virtually the same (-12.6 and -12.2 kcal/mol, respectively). In helix 7, fr has an extra basepair with respect to MS2 but this is neutralized by several C·G → U·G changes elsewhere in the stem. The calculated stabilities of helix 7, including terminal stacks and using the parameters given in Ref. 24, are -5.1 and -4.8 kcal/mol for MS2 and fr, respectively. Helix 8 also has an extra basepair in fr, but the expected higher stability is compensated for by smaller stacking energies, resulting in an energy difference of only 0.2 kcal/mol between the two helices. Helix 2 does not seem to fit this pattern and we suspect that the destabilization by the A·C pair in phage GA is overestimated. Evolutionary conservation of hairpin stability has previously been observed by comparing homologous helices in coding regions [21]. Also in that study, a C·A pair seemed to destabilize a helix less than a mismatch did. Our observation concurs with a report that a C·A mismatch may not destabilize a helical region much more than does a U·G pair [25].

Comparison with tRNA
By comparing the two structure proposals of Fig. 5, a common folding pattern can be derived, which is shown in Fig. 6a. The most striking element is the eight basepair long helix 1 carrying the conserved GCUU sequence in the loop. Helices 2 and 3 are of variable length and the potential to form interaction 5 is present in all phages, albeit including the non-Watson-Crick pairs G·U and G·A. The variable region as drawn in Fig. 6a refers, strictly speaking, only to groups II, III and IV. In group I, the extra helices 6, 7 and 8 are present, which do not belong to the variable region.
In Fig. 6d we display the consensus structure for tRNA in the cloverleaf version and Fig. 6c shows the coaxially stacked TΨC and aminoacyl acceptor arms. Comparison with the generalized phage RNA structure suggests that helix 1 corresponds to the TΨC arm and that interaction 5 is the equivalent of the aminoacyl acceptor stem. In tRNA, these combined arms are always 12 basepairs long and leave the four 3' terminal nucleotides XCCA-OH protruding from the stem. This feature can also be observed in phage RNA. In Fig. 6b helix 1 and interaction 5 are redrawn in a coaxially stacked fashion. The combined stem is 12 basepairs long and leaves the four terminal bases unpaired. The difference with tRNA is the number and kind of nucleotides in the loop of helix 1 and the length of the individual helices. This is 5 and 7 basepairs, respectively, for the TΨC and aminoacyl acceptor arms, whereas the tentative homologues in phage RNA are, respectively, 8 and 4 basepairs long. When EF-Tu is bound to aminoacyl-tRNA it strongly protects the four unpaired bases as well as the combined aminoacyl acceptor and TΨC stems from enzymatic and chemical attack [26]. Phage RNA could thus potentially form a structure recognizable by EF-Tu. However, caution should be exercised, since the binding of EF-Tu to tRNA containing a free terminal adenosine is weak and efforts to crosslink the elongation factor to Qβ RNA in an initiation complex have consistently failed [27].
It is not possible to find the equivalents in phage RNA of the dihydrouridine or the anticodon arm. On the other hand, it is an interesting possibility that the variable region of phage RNA and the variable loop of tRNA may be homologous structures. They share equivalent positions 5' to the TΨC arm and in evolutionary terms the variable loop could be the remnant of a structure that once had a defined meaning (see below).
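The length comparison between the tRNA stack and its tentative phage RNA counterpart reduces to simple arithmetic, which the following minimal Python sketch makes explicit. The arm lengths are those quoted in this section (5 and 7 basepairs for the tRNA arms; 8 and 4 basepairs for helix 1 and interaction 5 in phage RNA, with four unpaired 3' terminal nucleotides in both cases); the class, field and function names are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoaxialStack:
    """Two helices stacked end to end, followed by an unpaired 3' tail."""
    name: str
    loop_arm_bp: int         # TΨC arm in tRNA; helix 1 in phage RNA
    terminal_arm_bp: int     # aminoacyl acceptor arm; interaction 5
    unpaired_3prime_nt: int  # nucleotides protruding at the 3' end

    @property
    def total_bp(self) -> int:
        """Length of the combined, quasi-continuous helix in basepairs."""
        return self.loop_arm_bp + self.terminal_arm_bp


def same_stack_geometry(a: CoaxialStack, b: CoaxialStack) -> bool:
    """True when combined stem length and 3' overhang both agree."""
    return (a.total_bp == b.total_bp
            and a.unpaired_3prime_nt == b.unpaired_3prime_nt)


# Values as quoted in the text of this section.
trna = CoaxialStack("tRNA", loop_arm_bp=5, terminal_arm_bp=7,
                    unpaired_3prime_nt=4)
phage = CoaxialStack("phage RNA (Fig. 6b)", loop_arm_bp=8,
                     terminal_arm_bp=4, unpaired_3prime_nt=4)

for stack in (trna, phage):
    print(stack.name, stack.total_bp, "bp,",
          stack.unpaired_3prime_nt, "unpaired nt")
print("same overall geometry:", same_stack_geometry(trna, phage))
```

The sketch only confirms that the two stacks agree in total length and 3' overhang while differing in how the 12 basepairs are divided between the two arms; it says nothing about loop composition or tertiary structure.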
The variable (V) region is the binding site for the A protein and the replicase
We have seen that all phages possess a highly variable sequence of about 35 nucleotides, immediately upstream of helix 1. Experimental data implicate the V region of MS2 RNA in A protein binding. This protein is present at one copy per virion and is necessary for proper packaging of the RNA and for infection. Upon contact with the pilus, the A protein is cleaved and the RNA is released from the virion. In A protein nonsense mutants, the 3' end of phage RNA dangles out from the capsid and is degraded rapidly by RNases. Binary complexes between phage RNA and the A protein are, in contrast to naked RNA, infectious. Shiba and Suzuki [28] have used these infectious complexes to determine the A protein binding sites on MS2 RNA using an RNase protection assay. One binding site is localized in the gene encoding the A protein and the other lies in the variable region, as indicated by the dashed line in Fig. 3a. The isolated RNA fragment from the V region is a very active competitor of intact MS2 RNA for binding the A protein, indicating that the target of the protein is single-stranded RNA.
Surprisingly, the V region of Qβ RNA is the site where, under initiation conditions, the replicase holoenzyme binds [29]. Binding is probably mediated by the auxiliary proteins S1 and the host factor (HF), since these proteins protect the same stretch of Qβ RNA against RNase in the binary complex [11,30]. The protected Qβ RNA region is indicated by a solid line in Fig. 4a. Binding probably involves melting of the RNA, since Qβ replicase binds the same fragment when the RNA is predigested with T1 RNase [29]. The above findings agree well with the observation that ribosomal protein S1 has melting properties and binds preferentially to single-stranded RNA rich in pyrimidines [31], whereas the host factor has a high affinity for A-rich regions [32].
Assuming that the results of the protein-RNA interactions studied in group I (MS2) and group III (Qβ) can be generalized and that we have correctly identified the V regions as homologous structure elements, the implication is that the maturation protein and the replicase have overlapping RNA binding sites. This property could serve as a device to switch from replication to packaging.

Fig. 6. Possible structure homologies of phage RNA and tRNA. (a) Generalized structure of the 3' terminal region of phage RNA for groups II, III and IV. (b) Phage RNA redrawn to resemble tRNA. (c) Generalized structure of the stacked TΨC and aminoacyl acceptor arms of tRNA. (d) Generalized cloverleaf structure of tRNA.
Acknowledgement

We thank Dr. Skripkin for valuable discussions.
References

1 Furuse, K. (1987) in Phage Ecology (Goyal, S.M., Gerba, C.P. and Bitton, G., eds.) pp. 87-115, John Wiley and Sons, New York.
2 Van Duin, J. (1988) in The Bacteriophages Vol. 1 (Fraenkel-Conrat, H. and Wagner, R., eds.) pp. 117-167, Plenum Publishing, New York.
3 Weiner, A.M. and Maizels, N. (1987) Proc. Natl. Acad. Sci. USA 84, 7383-7387.
4 Pleij, C.W.A., Rietveld, K. and Bosch, L. (1985) Nucleic Acids Res. 13, 1717-1731.
5 Haenni, A.L., Joshi, S. and Chapeville, F. (1982) Progr. Nucleic Acids Res. Mol. Biol. 27, 85-104.
6 Prochiantz, A., Bénicourt, C., Carré, D. and Haenni, A. (1975) Eur. J. Biochem. 52, 17-23.
7 Adhin, M.R., Avots, A., Berzin, V., Overbeek, G.P. and Van Duin, J. (1990) Biochim. Biophys. Acta 1050, 104-109.
8 Iserentant, D., Van Montagu, M. and Fiers, W. (1980) J. Mol. Biol. 139, 243-263.
9 Iserentant, D. (1981) Ph.D. Thesis, University of Ghent, Ghent.
10 Van den Berghe, A., Min Jou, W. and Fiers, W. (1975) Proc. Natl. Acad. Sci. USA 72, 2559-2562.
11 Senear, A.W. and Steitz, J.A. (1976) J. Biol. Chem. 251, 1902-1912.
12 Fiers, W., Contreras, R., Deurinck, F., Haegeman, G., Iserentant, D., Merregaert, J., Min Jou, W., Molemans, F., Raeymakers, A., Van den Berghe, A., Volckaert, G. and Ysebaert, M. (1976) Nature 260, 500-507.
13 Inokuchi, Y., Hirashima, A. and Watanabe, I. (1982) J. Mol. Biol. 158, 711-730.
14 Inokuchi, Y., Takahashi, R., Hirose, T., Inayama, S., Jacobson, A.B. and Hirashima, A. (1986) J. Biochem. 99, 1169-1180.
15 Inokuchi, Y., Jacobson, A.B., Hirose, T. and Hirashima, A. (1988) Nucleic Acids Res. 16, 6205-6220.
16 Mekler, P. (1981) Ph.D. Thesis, University of Zürich, Zürich.
17 Hirao, I., Nishimura, Y., Naraoka, T., Watanabe, K., Arata, Y. and Miura, K. (1989) Nucleic Acids Res. 17, 2223-2231.
18 Schaffner, W., Rüegg, K.J. and Weissmann, C. (1977) J. Mol. Biol. 117, 877-907.
19 Mills, D.R., Kramer, F.R., Dobkin, C., Nishihara, T. and Spiegelman, S. (1975) Proc. Natl. Acad. Sci. USA 72, 4252-4256.
20 Mills, D.R., Kramer, F.R. and Spiegelman, S. (1973) Science 180, 916-927.
21 Skripkin, E.A., Adhin, M.R., De Smit, M.H. and Van Duin, J. (1990) J. Mol. Biol. 211, 447-464.
22 Tuerk, C., Gauss, P., Thermes, C., Groebe, D.R., Gayle, M., Guild, N., Stormo, G., d'Aubenton-Carafa, Y., Uhlenbeck, O.C., Tinoco, J., Brody, E.N. and Gold, L. (1988) Proc. Natl. Acad. Sci. USA 85, 1364-1368.
23 Priano, C., Kramer, F.R. and Mills, D. (1987) Cold Spring Harbor Symp. Quant. Biol. 52, 321-330.
24 Freier, S.M., Kierzek, R., Jaeger, J.A., Sugimoto, N., Caruthers, M.H., Nelson, T. and Turner, D.T. (1986) Proc. Natl. Acad. Sci. USA 83, 9373-9377.
25 Tibayenda, N., De Bruin, S.H., Haasnoot, C.A.G., Van der Marel, G.D., Van Boom, J.H. and Hilbers, C.W. (1984) Eur. J. Biochem. 139, 19-27.
26 Faulhammer, H.G. and Joshi, R.L. (1987) FEBS Lett. 217, 203-211.
27 Blumenthal, T. and Carmichael, G.G. (1979) Annu. Rev. Biochem. 43, 525-543.
28 Shiba, R. and Suzuki, Y. (1981) Biochim. Biophys. Acta 654, 249-255.
29 Meyer, F., Weber, H. and Weissmann, C. (1981) J. Mol. Biol. 153, 631-660.
30 Goelz, S. and Steitz, J.A. (1977) J. Biol. Chem. 252, 5177-5179.
31 Subramanian, A.R. (1983) Progr. Nucleic Acids Res. Mol. Biol. 28, 101-142.
32 Carmichael, G.G., Weber, K., Niveleau, A. and Wahba, A.J. (1975) J. Biol. Chem. 250, 3607-3612.