Studies on the bacteriophage MS2

J. Mol. Biol. (1971) 57, 597-613

Studies on the Bacteriophage MS2

IX. The Heptanucleotide Sequences Present in the Pancreatic Ribonuclease Digest of the Viral RNA

G. HAEGEMAN, W. MIN JOU AND W. FIERS

Laboratory of Molecular Biology and Laboratory of Physiological Chemistry, State University of Ghent, Belgium

(Received 21 July 1970)

The heptanucleotides present in the pancreatic ribonuclease hydrolysate of MS2 RNA were isolated by chromatography according to chain length at neutral pH, followed by chromatography at pH 3.0. Except for the sequence isomers, they were all resolved from each other, and all were obtained in nearly one molar yield. In total 15 heptanucleotides were present, and their complete nucleotide sequences were established. Several examples of internal homology were encountered. Some sequences could be part of the coat protein cistron. In addition, the 5'-terminal oligonucleotide was again isolated, and the sequence confirmed as pppGGGU. A quantitative analysis based on the 24 oligonucleotides so far sequenced suggests a chain length of 3500 (± 200) monomers for the viral RNA.

1. Introduction

Complete digestion of bacteriophage MS2 RNA with a specific enzyme such as ribonuclease T1 allowed the isolation of a 3'-terminal fragment (De Wachter & Fiers, 1967) and of internal oligonucleotides corresponding to amino acid sequences of the coat protein (Adams, Jeppesen, Sanger & Barrell, 1969; Robinson, Frist & Kaesberg, 1969; Jeppesen, Steitz, Gesteland & Spahr, 1970). Hydrolysis with pancreatic ribonuclease allowed the isolation of the 5'-terminal end, pppGGGU (De Wachter, Verhassel & Fiers, 1968), and of sequences with the general structure (Pup)nPyp. Among the latter, those with chain length eight and more were previously sequenced (Min Jou & Fiers, 1969). This study has now been extended to the heptanucleotides. Of these, 15 are present in the total digest, and their nucleotide sequences were established. Some of these heptanucleotides may code for a few amino acids of the phage coat protein. In addition, several possible examples of internal homology were encountered.

2. Materials and Methods

Preparation of 32P-labelled bacteriophage MS2 RNA, pancreatic ribonuclease hydrolysis, column chromatography, effluent monitoring and desalting were as described by Min Jou & Fiers (1969), except where indicated otherwise. For the separation according to chain length, the elution volume was 2 l. Pancreatic ribonuclease was further purified by chromatography (Min Jou & Fiers, 1969), while ribonuclease T1 was purchased from Sankyo Ltd., Tokyo, Japan.

† Paper VIII in this series is Slegers & Fiers, 1970.

(a) Chromatography at pH 3.0

A column (1 cm x 50 cm) of DEAE-Sephadex A25 (Pharmacia, Uppsala), previously equilibrated with the starting concentration, was packed under hydrostatic pressure. After loading the heptanucleotide mixture, to which 10 O.D. units of mononucleotides had been added, elution was carried out with a linear sodium chloride gradient from 0 to 0.4 M (total volume 2 l.) in the presence of 7 M-urea and adjusted to pH 3.0 with HCl (pH meter reading). The residual radioactive material was eluted with a 1 M-NaCl solution in 7 M-urea, pH 3.0, and finally the column was purged with 2 M-Na2CO3.

(b) Purification of oligonucleotides

After desalting, the heptanucleotides were precipitated by addition of 2 vol. ethanol. They were taken up in 0.2 ml. of 0.05 M-NH4HCO3 and treated with freshly equilibrated phenol (containing also 0.1 to 0.5 vol. chloroform and 0.5% sodium dodecyl sulphate). If the radioactivity stayed predominantly in the phenol phase, more chloroform or even ether was added. The aqueous phases were distributed in several samples and the oligonucleotides were precipitated again. Heptanucleotides rich in Ap sometimes proved difficult to precipitate. In these cases isopropanol (2 vol.) was used instead of ethanol.

(c) Structure determination

The sequencing methods of Sanger and co-workers (Sanger, Brownlee & Barrell, 1965; Sanger & Brownlee, 1967; Brownlee & Sanger, 1967) were used, with some modifications when necessary. One viral RNA preparation yielded about 30 µg of an individual oligonucleotide and this amount was distributed in 2, 3 or 4 portions for different experiments:

(i) Nucleotide composition

Heptanucleotides, or their degradation products obtained from enzymic digests, were incubated in 20 µl. of 0.2 N-NaOH for 18 hr at 37°C and applied to Whatman no. 52 paper for ionophoresis at pH 3.5 and 100 v/cm for 40 min. The buffer system used was 0.5% pyridine-5% acetic acid (v/v) when the mononucleotides were localized by autoradiography. If, however, reference mononucleotides were added in order to visualize the spots under ultraviolet light, a buffer containing 5% acetic acid adjusted to pH 3.5 with morpholine was used.

(ii) Degradation with ribonuclease T1

Purified heptanucleotides were dissolved in 20 µl. of 0.02 M-Tris chloride-0.002 M-EDTA (pH 7.4), and digested with ribonuclease T1 for 30 min at 37°C, with an enzyme-to-substrate ratio of 1 : 20. Components rich in Gp degraded more slowly, presumably because of aggregation. In these cases higher enzyme concentrations (ratio of 1 : 10) were used. The digestion mixture was applied to DEAE paper (Whatman Chromedia DE81) for high-voltage ionophoresis at pH 1.9 (2.5% formic acid-8.7% acetic acid, v/v) and 30 v/cm.

(iii) Partial digestion with spleen phosphodiesterase

The components were heated for 3 min at 85°C in 0.3 M-ammonium succinate-0.002 M-EDTA-0.05% Tween 80, pH 6.5, in order to minimize aggregation. After cooling they were incubated at 37°C with 1 to 2 µg spleen phosphodiesterase (or more for mixtures of isomers). Samples were taken at different times (e.g. 10, 30 and 60 min) and applied to DEAE paper of 56- or 85-cm length. Ionophoresis was carried out at pH 1.9 or, for products rich in Gp, in 7% formic acid (v/v), and 30 v/cm. The M-values (Sanger et al., 1965) for separations in the 7% HCOOH system were found to be similar to those obtained in the pH 1.9 system.

(iv) Partial digestion with venom phosphodiesterase

Some heptanucleotides could not be completely sequenced using the preceding methods, mainly because of their high tendency to aggregate. Complementary information was then obtained by dephosphorylation followed by partial digestion with venom phosphodiesterase. In an attempt to reduce the aggregation the heptanucleotide was taken up in 20 µl. buffer (0.02 M-ethanolamine-0.01 M-magnesium chloride, adjusted to pH 10 with HCl) and heated for 3 min at 85°C. After cooling to 37°C and pH adjustment if necessary, 10 µl. was added of a suspension containing 10 mg bacterial alkaline phosphatase/ml. in 0.65 saturated ammonium sulphate solution. Next, 1 to 2 µg of venom phosphodiesterase was added, and samples were taken at appropriate times (e.g. 10, 30 and 60 min). Ionophoresis on DEAE paper was carried out in the system 0.5% acetic acid-morpholine, pH 3.5. Care must be taken to keep the pH of the reaction mixture constant during the digestion (the buffer is volatile), especially after the addition of the phosphatase-ammonium sulphate solution.

3. Results

(a) Separation according to chain length

A typical separation of a pancreatic ribonuclease digest of MS2 RNA was shown by Min Jou & Fiers (1969). The first peak consists of cyclic pyrimidine mononucleotides, followed by the mono-, di-, tri- and tetranucleotides (Fiers, De Wachter, Lepoutre & Vandendriessche, 1965; Vandenbussche & Fiers, 1966). Min Jou & Fiers (1969) previously studied the octa-, nona-, deca- and undecanucleotides. The quantitative distribution of some pancreatic ribonuclease digests, used in the present study, is reported in Table 1. In general, the results are in good agreement with our previously published data. On the basis of such distributions a former estimate for the chain length of MS2 RNA (Fiers, Lepoutre & Vandendriessche, 1965) can be improved. Indeed in six experiments, including those reported in Table 1, it was found that 5.01, 5.27, 6.01, 5.45, 5.57 and 4.67%, respectively, of the total eluted material was present in the peaks 7 to 11. Various arguments have previously been discussed (Min Jou & Fiers, 1969), which indicate that under the conditions used here the distribution is not grossly distorted, e.g. due to aspecific cleavage, incomplete hydrolysis or non-random loss of material. As the number of individual sequences present in each isoplethic peak has been unambiguously determined, and as all are present in almost integer ratios, we may conclude that an average of 5.33%† of the total material is accounted for by 24 sequences of known structure, representing a total of 186 P atoms. Hence, the total chain corresponds to about 3500 monomers; the error in the above estimate is probably lower than ± 200 mononucleotides.

TABLE 1

Distribution of (Pup)nPyp tracts

         Radioactivity (% of total)             No. of          No. of tracts/molecule
Peak                                            nucleotides/
no.      Expt 1   Expt 2   Expt 3   Average     molecule a      Calc.     Found

1b       25.24    25.21    25.19    25.21       882             882.0
2        21.92    24.20    22.89    23.00       805             402.5
3        19.93    19.37    22.18    20.49       717             239.0
4        11.61    12.04    12.01    11.85       415             104.0
5        10.63     8.59     8.94     9.39       329              65.8
6         5.31     3.86     3.61     4.26       149              24.8
7         3.14     3.12     2.87     3.04       106              15.1      16c
8         0.91     0.81     0.68     0.80        28               3.5       3
9         0.49     0.64     0.41     0.51        18               2.0       2
10        0.47     0.36     0.30     0.38        13               1.3       1
11        0.44     0.64     0.42     0.50        18               1.6       2
>11       -        1.18     0.49     0.84        29               -

32P-labelled MS2 RNA was digested with pancreatic ribonuclease, and the resulting products were separated according to chain length on a DEAE-cellulose column (Min Jou & Fiers, 1969). The peaks were pooled and assayed for radioactivity.
a Calculated on a total number of 3500 nucleotides for the whole RNA chain (cf. text).
b The cyclic mononucleotides have been included.
c Includes the 5'-beginning pppGGGU sequence.

(b) Separation according to base composition

The chromatography at acidic pH of the heptanucleotide fraction reveals 11 well-separated peaks (Fig. 1). Although some preliminary experiments were run at pH 2.7, a more regular pattern was obtained at pH 3.0. Indeed, at the former pH, the first components (nos. 1, 2, 3, 4, 5) were almost not retarded and the resolution was poor.
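The chain-length estimate of section 3(a) can be retraced numerically. The sketch below only illustrates the arithmetic and is not part of the original analysis: the function and variable names are assumptions, the six percentages are those quoted in the text, and the count of 16 sequences carrying seven phosphates follows Table 1 (15 heptanucleotides plus pppGGGU).

```python
# Number of distinct sequenced oligonucleotides per chain length (in P atoms).
# Length 7 holds the 15 heptanucleotides plus pppGGGU, which also carries
# 7 phosphates and co-chromatographs with them at neutral pH.
SEQUENCES_PER_LENGTH = {7: 16, 8: 3, 9: 2, 10: 1, 11: 2}

# Percentage of the total eluted radioactivity found in peaks 7 to 11,
# one value per experiment, as quoted in the text.
PERCENT_IN_PEAKS_7_TO_11 = [5.01, 5.27, 6.01, 5.45, 5.57, 4.67]


def total_phosphates(sequences_per_length):
    """Sum of P atoms over all sequenced oligonucleotides."""
    return sum(length * count for length, count in sequences_per_length.items())


def estimate_chain_length(sequences_per_length, percentages):
    """Chain length = P atoms in known sequences / fraction of label they hold."""
    mean_fraction = sum(percentages) / len(percentages) / 100.0
    return total_phosphates(sequences_per_length) / mean_fraction


if __name__ == "__main__":
    print("sequences:", sum(SEQUENCES_PER_LENGTH.values()))
    print("P atoms:", total_phosphates(SEQUENCES_PER_LENGTH))
    print("estimated chain length:",
          round(estimate_chain_length(SEQUENCES_PER_LENGTH, PERCENT_IN_PEAKS_7_TO_11)))
```

With these inputs the 24 sequences account for 186 P atoms and the estimate falls just under 3500 nucleotides, in line with the rounded figure given in the text.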

FIG. 1. Rechromatography of the heptanucleotide peak from a pancreatic ribonuclease digest of MS2 RNA. Elution was carried out with a 2000 ml. gradient from 0 to 0.4 M-NaCl, in the presence of 7 M-urea, adjusted to pH 3.0 with HCl. Abscissa: fraction no.

The relative amount of material present in the different peaks corresponds nearly to integral numbers (Table 2). On this basis, and considering the chain length and the percentage amount of total heptanucleotides, we may conclude that peaks 1, 2, 3, 6, 8, 9, 10 and 11 correspond to unique sequences. Peaks 4 and 5 contain three and peak 7 two unresolved isomers. The peaks of the 1 M-sodium chloride or 2 M-sodium carbonate wash contained only a negligible amount of radioactivity (together less than 2%). In the presence of urea, the order of elution is in agreement with the net negative charge at pH 3.0, determined by the base composition (Table 3). Up-terminal oligonucleotides run three peak positions behind their Cp-terminal homologues, while an Ap by Gp replacement retards the sequence by two positions‡.

† The same result is obtained if the (atypical) highest and lowest values are left out.
‡ This statement is only strictly true if the hypothetical peaks of (A6)U and (G6)C are also considered, which in fact did not occur in the MS2 RNA digest.
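The elution rule stated above (an Up terminus costs three peak positions, each Ap by Gp replacement two) can be checked against the peak numbering. The sketch below is a bookkeeping device only, not a charge calculation: the additive score `2 * n_G + 3 * is_U` and all names are assumptions made for illustration, and the compositions are those deduced for peaks 1 to 10 (peak 11, pppGGGU, is left out).

```python
# Peak number -> (number of G residues, 3'-terminal pyrimidine) for the
# heptanucleotide peaks 1 to 10, e.g. peak 1 is (A5,G)C.
OBSERVED = {
    1: (1, "C"), 2: (2, "C"), 3: (1, "U"), 4: (3, "C"), 5: (2, "U"),
    6: (4, "C"), 7: (3, "U"), 8: (5, "C"), 9: (4, "U"), 10: (5, "U"),
}

# Compositions that do not occur in the digest but are needed for the rule
# to hold strictly: (A6)U and (G6)C.
HYPOTHETICAL = [(0, "U"), (6, "C")]


def retention_score(n_g, terminal):
    """Illustrative additive score: 2 per G residue, 3 for a U terminus."""
    return 2 * n_g + (3 if terminal == "U" else 0)


def elution_order(compositions):
    """Compositions sorted from first-eluted to last-eluted."""
    return sorted(compositions, key=lambda comp: retention_score(*comp))


def position_shift(order, first, second):
    """Number of peak positions separating two compositions."""
    return order.index(second) - order.index(first)


if __name__ == "__main__":
    predicted = sorted(OBSERVED, key=lambda peak: retention_score(*OBSERVED[peak]))
    print("predicted peak order:", predicted)
    full = elution_order(list(OBSERVED.values()) + HYPOTHETICAL)
    print("C -> U shift:", position_shift(full, (1, "C"), (1, "U")))
    print("A -> G shift:", position_shift(full, (1, "C"), (2, "C")))
```

Sorting by this score reproduces the observed numbering of peaks 1 to 10, and the two shifts come out as three and two positions once the two hypothetical compositions are included in the series.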

TABLE 2

Distribution of heptanucleotides

         Expt 1 (pH 2.7)                 Expt 2 (pH 3.0)                 Deduced molar
Peak     Radioactivity    No. of         Radioactivity    No. of         frequency of
no.      (% of total)     tracts a       (% of total)     tracts a       heptanucleotides

1         6.36            1.0             6.86            1.1            1
2         6.72            1.1             5.60            0.9            1
3         5.92            0.9             8.25            1.3            1
4        19.68            3.1            18.98            3.0            3
5        23.21            3.7b           22.09            3.5b           3
6         6.36            1.0             4.93            0.8            1
7        10.41            1.7            13.41            2.1            2
8         5.44            0.9             5.10            0.8            1
9         5.79            0.9             5.36            0.9            1
10        4.06            0.6             5.88            0.9            1
11        6.15            1.0             3.52            0.6            1

The heptanucleotide peak (cf. Table 1) was concentrated and rechromatographed according to net charge at pH 3.0 (cf. Fig. 1).
a Calculated on the basis of a total number of 16.
b The recovery of this component was better than average; detailed sequence analysis led unambiguously to the conclusion that the molar frequency was in fact 3.
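The "No. of tracts" column of Table 2 is the percentage of radioactivity scaled to a total of 16 tracts (footnote a). A minimal sketch of that conversion follows; the names are illustrative, and the last three percentages of Expt 2 are read as 5.36, 5.88 and 3.52, which is an assumption about digits that are damaged in the extracted text.

```python
# Percentage of radioactivity per peak (peaks 1 to 11), Expt 2 at pH 3.0.
PERCENT_EXPT2 = [6.86, 5.60, 8.25, 18.98, 22.09, 4.93, 13.41, 5.10, 5.36, 5.88, 3.52]

# Total number of tracts in the heptanucleotide fraction (Table 2, footnote a).
TOTAL_TRACTS = 16


def tracts_per_peak(percentages, total=TOTAL_TRACTS):
    """Scale each percentage to a number of tracts, rounded to one decimal."""
    return [round(p * total / 100.0, 1) for p in percentages]


if __name__ == "__main__":
    for peak, value in enumerate(tracts_per_peak(PERCENT_EXPT2), start=1):
        print(f"peak {peak:2d}: {value}")
```

The deduced molar frequencies in the last column are not obtained by blind rounding: peak 5 is assigned 3 on the strength of the sequence analysis (footnote b).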

TABLE 3

Nucleotide composition

Peak     Molar yield of a                               Composition
no.      pppG b    C       A       G       U            deduced

1                  0.89    4.94    1.04    0.14         (A5,G)C
2                  0.90    3.97    2.03    0.10         (A4,G2)C
3                  0.12    4.83    1.21    0.83         (A5,G)U
4                  0.83    3.01    2.98    0.18         (A3,G3)C
5                  0.16    3.83    2.08    0.93         (A4,G2)U
6                  0.93    2.08    3.82    0.17         (A2,G4)C
7                  0.10    2.87    3.03    1.00         (A3,G3)U
8                  0.96    1.14    4.79    0.12         (A,G5)C
9                  0.06    1.92    3.98    1.05         (A2,G4)U
10                 0.03    0.97    5.00    1.00         (A,G5)U
11c      0.91              0.02    1.91    0.91         pppGGGU

a The compositions are expressed as the molar yield of each nucleotide; all products contain 7 P atoms.
b By analogy with the mononucleotides, the 3'(2')-phosphate is omitted; thus pppG stands for 5'-triphosphoryl-3'(2')-monophosphoryl-guanosine.
c As there was some trailing from the preceding peaks, this component was rechromatographed on DEAE-paper (cf. text) before analysis.
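Turning the molar yields of Table 3 into a composition such as (A5,G)C amounts to rounding each yield to the nearest integer and writing the purines in parentheses before the single 3'-terminal pyrimidine. The helper below is a hypothetical illustration of that step (its name and error handling are assumptions); it presumes a pancreatic ribonuclease product, i.e. exactly one pyrimidine per oligonucleotide.

```python
def deduce_composition(yields):
    """Format rounded molar yields as '(purines)pyrimidine', e.g. '(A5,G)C'.

    `yields` maps base letters to molar yields, e.g.
    {"C": 0.89, "A": 4.94, "G": 1.04, "U": 0.14}.
    """
    counts = {base: round(value) for base, value in yields.items()}
    pyrimidines = [base for base in ("C", "U") if counts.get(base, 0) > 0]
    if len(pyrimidines) != 1 or counts[pyrimidines[0]] != 1:
        raise ValueError("expected exactly one 3'-terminal pyrimidine")
    purines = []
    for base in ("A", "G"):
        n = counts.get(base, 0)
        if n == 1:
            purines.append(base)
        elif n > 1:
            purines.append(f"{base}{n}")
    return f"({','.join(purines)}){pyrimidines[0]}"


if __name__ == "__main__":
    peak_1 = {"C": 0.89, "A": 4.94, "G": 1.04, "U": 0.14}
    peak_7 = {"C": 0.10, "A": 2.87, "G": 3.03, "U": 1.00}
    print(deduce_composition(peak_1))
    print(deduce_composition(peak_7))
```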


(c) Sequence analysis

The nucleotide sequences were determined by hydrolysis with ribonuclease T1 (Plate I and Tables 4 and 5) and by partial degradation with spleen phosphodiesterase (Plate II and Tables 6, 7, 8 and 9). Partial digests with snake venom phosphodiesterase were sometimes used for additional confirmation (Table 10). The structure determination of the different components is presented in order of their elution from the acidic column.

TABLE 4

Products obtained by digestion of single components with ribonuclease T1

Peak
no.      Yield of degradation products a           Structure

1        AAG(3.12)-AAAC(3.69)-X(0.19)b             AAGAAAC
2        AAAAG(4.94)-G(1.11)-C(0.95)               (AAAAG,G)C
3        AAAAG(5.06)-AU(1.93)                      AAAAGAU
6        AG(2.00)-G(3.30)-AC(1.70)                 (AG,G3)AC
8        AG(1.95)-G(4.14)-C(0.92)                  (AG,G4)C
9        AG(4.17)-G(2.09)-U(0.74)                  ((AG)2,G2)U
10       AG(1.98)-G(3.?3)-U(1.14)                  (AG,G4)U

a The yield is expressed on the basis of nucleotide equivalents (total equal to 7).
b Unidentified spot.

TABLE 5

Products obtained by digestion of the mixtures of isomers with ribonuclease T1

Peak     Yield of degradation products a
no.      Observed                                Deduced                  Structure

4        AAG(92%)-AG(97%)-G(87%)                                          (AAG,AG,G)C
         AAC(100%)-AC(100%)-C(100%)                                       (AAG,G2)AC
                                                                          (AG,G2)AAC

5        AAAAG(4.13)-AAAG(0.06)b                 AAAAG(5)                 AAGAAGU
         AAG(8.40)-AAU(3.17)                     AAG(9)-AAU(3)            (AAAAG,G)U
         AG(0.53)b-G(2.88)-U(1.83)               G(2)-U(2)                (AAG,G)AAU

7        AAAG+AAAU(7.56)c-AG(0.18)b              AAAG+AAAU(8)             GGGAAAU
         AU(0.28)b-G(5.23)-U(0.75)               G(5)-U(1)                (AAAG,G2)U

a For peak 4 the extent of digestion was incomplete and unequal for the different sequences present in the mixture. Consequently, the pyrimidine-containing products (AAC, AC and C) were taken as a basis for the calculation of the other products, considering that each heptanucleotide has the composition (A3,G3)C. For peaks 5 and 7 the yields are expressed as nucleotide equivalents. They are calculated on the basis of 21 nucleotide units (3 heptanucleotides) for peak 5 and on 14 nucleotide units (2 heptanucleotides) for peak 7.
b Spot presumably arising from non-specific degradation (e.g. AG from AAG).
c These products were separated by a second ionophoresis at pH 3.5 on Whatman no. 52 paper.
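The "deduced" yields in Tables 4 and 5 follow from the specificity of ribonuclease T1, which cleaves after every G residue. The sketch below simulates that cleavage and sums the expected nucleotide equivalents over a mixture of isomers; the function names are illustrative assumptions, and complete, fully specific digestion is presumed (the observed yields deviate from it, as the footnotes note).

```python
from collections import Counter


def t1_digest(sequence):
    """Split a sequence after every G, as complete T1 digestion would."""
    fragments, current = [], ""
    for base in sequence:
        current += base
        if base == "G":
            fragments.append(current)
            current = ""
    if current:  # 3'-terminal fragment ending in the pyrimidine
        fragments.append(current)
    return fragments


def nucleotide_equivalents(sequences):
    """Expected yield per T1 product, in nucleotide units, for a mixture."""
    totals = Counter()
    for sequence in sequences:
        for fragment in t1_digest(sequence):
            totals[fragment] += len(fragment)
    return dict(totals)


if __name__ == "__main__":
    print(t1_digest("AAGAAAC"))  # peak 1
    peak_5 = ["AAAAGGU", "AAGAAGU", "GAAGAAU"]
    print(nucleotide_equivalents(peak_5))
    peak_7 = ["GGGAAAU", "GAAAGGU"]
    print(nucleotide_equivalents(peak_7))
```

For the three isomers of peak 5 the totals add up to the 21 nucleotide units used as the basis in Table 5, and for peak 7 to 14.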

PLATE I. Radioautograph of T1 digests of the components, fractionated by ionophoresis on DEAE paper at pH 1.9. B stands for blue marker.

PLATE II. Radioautograph of partial spleen phosphodiesterase digests of peak 4 (a) and peak 5 (b), respectively, separated by ionophoresis on DEAE paper at pH 1.9.

TABLE 6

Partial digestion of single components with spleen phosphodiesterase

Peak    Spot    Nucleotide composition a            M-value b    Structure
no.     no.     C       A       G       U

2       2       0.98    0.17    0.77    0.08                     GC
        3       0.92    1.08    0.80    0.21        0.6          AGC
        4       1.23    1.92    0.76    0.08        0.5          AAGC
        5       1.12    2.92    0.95    0           0.6          AAAGC
        6c      -       -       -       -           0.5          AAAAGC
        7c      -       -       -       -           1.2          GAAAAGC

6       3       0.90    1.05    1.05    0                        GAC
        4       0.89    1.03    2.07    0.01        1.1          GGAC
        5d      -       -       -       -           1.4          GGGAC
        6       0.86    1.84    3.05    0.25        0.2          AGGGAC
        7       0.81    2.05    3.85    0.28        2.2          GAGGGAC

8       3       0.87    0.33    1.40    0.39                     GGC
        4       0.98    0.97    1.80    0.25        0.3          AGGC
        5       0.87    1.04    2.67    0.46        1.2          GAGGC
        6       0.88    0.92    3.90    0.29        3.6          GGAGGC
        7       0.91    1.06    4.82    0.21        -e           GGGAGGC

9       2       0       0.04    0.95    1.01                     GU
        3       0.03    0.12    1.74    1.11        1.9          GGU
        4       0.01    0.91    1.96    1.12        0.3          AGGU
        5       0.05    0.75    2.74    1.50        1.3          GAGGU
        6f      -       -       -       -           1.9          GGAGGU
        7       -       -       -       -                        AGGAGGU

10      5       0.16    0.63    3.23    0.98        -
        6       0.06    0.84    4.00    1.10
        7       0.07    0.95    4.93    1.06

a The compositions are expressed as the molar yield of each nucleotide for a sequence of indicated chain length.
b M-values have been defined by Sanger et al. (1965). Only sequences which still move slower than the blue marker upon removal of one residue are considered.
c Amount not sufficient for nucleotide analysis.
d The base analysis of this component failed; the M-value, as well as analysis of a similar spot in another experiment, indicates that the sequence is GGGAC.
e The undegraded material almost remained at the origin; consequently no M-value could be calculated.
f This spot was a mixture of the hexa- and heptanucleotide and no unambiguous composition could be determined. Part of the spot could be identified tentatively as the hexanucleotide and on this basis an M-value was deduced and a sequence tentatively proposed. Further confirmation is shown in Table 10.
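The ladders in Table 6 reflect the action of spleen phosphodiesterase, an exonuclease that removes residues one at a time from the 5' end, so that every partial product keeps the original 3' terminus. The sketch below generates such a ladder and, conversely, rebuilds a sequence from the compositions of successively longer products by stepwise addition of one mononucleotide. The function names are assumptions, and the compositions fed to the second function are the rounded values for peak 6 (the pentanucleotide entry is the one inferred from its M-value).

```python
from collections import Counter


def spleen_ladder(sequence, shortest=2):
    """Products of stepwise removal from the 5' end, longest first."""
    return [sequence[i:] for i in range(len(sequence) - shortest + 1)]


def rebuild_from_compositions(core, compositions):
    """Extend a known 3'-terminal core by one residue per composition.

    Each composition is a base-count mapping for the next longer product;
    the residue it gains over the current sequence is added at the 5' end.
    """
    sequence = core
    for composition in compositions:
        gained = Counter(composition) - Counter(sequence)
        if sum(gained.values()) != 1:
            raise ValueError("each step must add exactly one residue")
        sequence = next(iter(gained)) + sequence
    return sequence


if __name__ == "__main__":
    print(spleen_ladder("GAGGGAC"))
    peak_6_steps = [
        {"G": 1, "A": 1, "C": 1},
        {"G": 2, "A": 1, "C": 1},
        {"G": 3, "A": 1, "C": 1},
        {"G": 3, "A": 2, "C": 1},
        {"G": 4, "A": 2, "C": 1},
    ]
    print(rebuild_from_compositions("AC", peak_6_steps))
```

Starting from AC, the five steps lead back to GAGGGAC, the sequence assigned to peak 6.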

0.80

0.88

0.92

0.95

9

10

11

12

1.83

2.91

2.03

2.98

2.92

2.73

1.21

0.16

0.12

0.19

0.16

0.02

I.93

2.05

0.02

1.14

1.99

I:13

0.06

0.03

0.03

0.04

U

0.97

0.91

O-98

l-10

0.08

0.88

G

1.86

0.03

A

compositiona

(Ad&&

L&WC

&&4C Wh)C

@JL)C

&,GF

(A&W

(UN

A&

GC

AC0

-

AG(91%)-G(105%) C(lOO%)-AC(lOO’j&) AAC( 100%)

AG(lOO%)-G(99%) C(lOO~O)-AC(lOO~o)

G(2.37)AAC(2.63)

AG(98%)-G(105yo) C(lOO%)-AC(lOO%) >

>

B

0.50

9.56

2.00

--

1.76

0.57 -

AG(Z*OO)-C( 1.01) d

-

G(1*05)-AC(1.95) G(1.20)AAC(2.81)

-

-

T1 hydrolysat@

0.50

0.50

P-96

-

2.00

OTd3

2.08

-

-

1.15

2.01

-

-

--

-

M-value

AAGGAGC

AGGAGC

GGAGC

-

GAGC

AGC -

-

GC -

-

AAGGGAC

AGGGAC

GGGAC

-

GGAC

“.-

AGGGAAC

GGGAAC

GGAAC

GAAC -

-

AAC -

GAC -

AC -

deduced

AC -

Structure

a Expressed as the molar yield of each nucleotide for a sequence of indicated chain length.
b Number of nucleotide equivalents, based on the partial structure indicated in the previous column.
c Identified by its unique position on the ionopherogram.
d Analysis of the C-containing products indicated a mixture of two sequences, 61% ending in C and 39% ending in AC. The latter were taken as a reference (100%) and the yields of the other components normalized on this basis.
e Idem as for spot 8; 60% of the material ended in C (and consequently must have the structure (G2,AG)C) and 40% ended in AC (structure (G3)AC). These C-containing products were again taken as a reference in order to normalize the yield obtained of the other products.
f Analysis indicated 59% of the material ended in C (hence must have the structure (G,(AG)2)C), 27% ended in AC (structure (G2,AG)AC) and 13% ended in AAC (sequence GGGAAC). The C-containing products were set equal to 100% and the yield of the other products calculated on this basis.

l-00

0.87

6

0.92

1.08

5

8

1.04

7

1.04

4

c

Nucleotide

3

2----

spot no.

TABLE 7

Partial digestion of compound 4 with spleen phosphodiesterase

2.77 -

0.07 -

0.05 -

6

8

(LGdU

(A&W -f

CUWJ -

(A&W

(-4,G)U

-

AAG(3.31).AG(1.69)~U(2.02) AAAG(3.64)-G(1.36)

AAG(2*94)-G(l*lO)-U(O.96)

AG(l~S4)-G(l~18)-U(O~98) AAG+AAU

AG(1.95).AAU(3.05)

>

k

2.69

-

0.30

0.94

-

-

AAG(3.01)-U(0.99)

2.25 -

>

0.68 -

G(0*93)-AAU(3.0’7)

AG(2,00)-U(O.99)

-

T, hydrolysateb

0.63

-

0.62

0.57

0.45

(0.73)8

0.62

0.57

2.15

-

-

1.17 -

-

(3.42)8 -

A.Au -

GAAGAAU

-

-

AAGAAU

AGAAU

-_

GAAU -

AU

-

-

.-

-

__-..

M-value

deduced

AAAAGGU

AARGGU

AAGGV

AGGV

-

GGU

-

GU -

-

Structure

-

AAGAAGU

AGAAGU

GAAGU”

-

-

AAGU -

AGU -

GU

a Expressed as the molar yield of each nucleotide for a sequence of indicated chain length.
b Number of nucleotide equivalents based on the partial structure indicated in the previous column.
c In longer runs this spot was resolved in the two components, as shown in Plate II(b).
d This spot was very weak and could not be directly identified. It was presumably GGU.
e These M-values are based on the assumption that spot 7 is GGU.
f This spot did not give a simple nucleotide composition; indeed it consisted of two sequences with different chain length.
g This sequence is as shown, and not GAGU, because no AG is present in the total T1 digest of these heptanucleotides.
h The relative amounts of these sequence isomers were not determined.
i The composition was deduced from the T1 digestion products.
j Except for U, it was possible to calculate the yield of each sequence, as the products were all separated (AAG and AG for one sequence, and AAAG and G for the other). The sequences, however, were not present in equal amount.

.-

12

-

-

11

-

-

-

o-97 -

0.93 -

o-91

-

1.21 -

1.29 -

1.07

om

0.80 --

u

-

l-95

1.05

0.16 -

G

10

9

76

1.70 -

0.07

5

I.05

0.12

4

I.05 -

A

0.02 -

c

Nuoleotide oompo&tion*

3”

2

Spot no.

TABLE 8

Partial digestion of compound 5 with spleen phosphodiesterase

l-05

0.01

0.07

O*OL

1.96

0.74

0.93 0.96

l-04

0.88

0.95

161

2.94

1.88

2.71

separationa

1.02

0.92 0.89 o-97 O-96

0.87

U

0.03 0.02

I.01

l-82 2.01 2.25 3.01

0.09 0.13 -

iono@oresis

G

composition”

0.03 0.07 0.16

0 0.01 0*02

Fwo-dimensional

2.78 2w

0.01 o-10

(W

1.96

l-25

1.03 2-90 -

One-dimensional

A

0.35 0.06

-

0.06

0.01 -

(4

C

Nueleotide

G,U

GU

AJJ AsU bMWJ

AU

(UWU &,WU

tA%U (&JWU

AU &U

3.1

1.9 --

0.8

0.7

-_

1.33

l-91

0.68 0.61 1.25

-

M-value

-

1.33

0.52 0.40 0.36

2.59

AU AAU AAAU GAAAU GU GGU

GGAAAU GGGAAAU

-

AU AAU AAAU GAAAU

Structure

a Expressed as the molar yield of each nucleotide for a sequence of indicated chain length.
b No unambiguous composition could be determined; indeed these were mixtures of products with different nucleotide composition.
c The sequence of these oligonucleotides follows from the fact that no AG or AAG is present in the total T1 hydrolysate (Table 5).
d Two-dimensional electrophoresis, first at pH 3.5 on cellulose acetate and subsequently on DEAE paper at pH 1.9 (Sanger et al., 1965).

6 7 8 9

2 3b 4 5b

no.

spot

Partial digestion of compound 7 with spleen phosphodiesterase

TABLE 9

,-.

GU GGU AGGUO AAGGUa AAAGGU GAAAGGU

deduced

Peak 1: (A5,G)C. T1 digestion of this component gives AAG and AAAC (Table 4). Hence, the sequence is AAGAAAC.

Peak 2: (A4,G2)C. Digestion of this component with ribonuclease T1 led to the partial structure (AAAAG,G)C. The sequence was further established by partial hydrolysis with spleen phosphodiesterase (Table 6). Most degradation products were identified by alkaline hydrolysis, while M-values confirmed the nature of the nucleotide split off at the successive steps. The deduced sequence is GAAAAGC.

Peak 3: (A5,G)U. The nucleotide fragments AAAAG and AU, found after ribonuclease T1 hydrolysis (Table 4), establish the sequence as AAAAGAU.

Peak 4: (A3,G3)C. It is obvious from the principle of separation at acidic pH (Fig. 1) that the composition, determined by alkaline hydrolysis, is valid not only for the total peak, but also for each individual sequence present in the mixture. The quantitative distribution (Table 2) suggests the presence of three isomers. Indeed, hydrolysis with ribonuclease T1 also releases three different pyrimidine-containing products (Table 5). They were produced, however, in unequal amounts, viz. a molar ratio of 0.57 for the C-terminal sequence, 0.21 for the AC-terminal sequence and 0.22 for the AAC-terminal sequence. A large amount of undegraded material stayed at the origin (Plate I). By using higher enzyme concentrations or longer incubation times no improved distribution could be obtained, as non-specific hydrolysis became appreciable. We tentatively conclude that the high G-content of these oligonucleotides led to aggregation (vide infra), which in turn caused resistance towards enzymatic degradation. This effect was most severe with the oligonucleotides ending in AC and AAC, as these contained a row of three adjacent G residues. By taking the three C-containing products as a reference and knowing that 3 A residues are present in each sequence, we could assign the other T1 products to the three presumed sequences. Their molar yield corresponded indeed closely to the theoretical value expected on this basis (Table 5). This means that the resistance of the oligonucleotide, due to aggregation, is an all or none effect; once attacked by the nuclease, all end products were obtained in quantitative yield. In this way the partial sequences (AAG,AG,G)C, (AAG,G2)AC and (AG,G2)AAC were deduced.

The complete sequence was established by partial digestion with spleen phosphodiesterase (Plate II(a) and Table 7). Degradation products were identified by alkaline hydrolysis and by digestion with ribonuclease T1. From the results, three sequences could be reconstructed logically by stepwise addition of a mononucleotide. The sequence of two products was already known at the pentanucleotide level (indeed ...GGAGC and ...GGGAC must be completed to AAGGAGC and AAGGGAC respectively, for their composition is (A3,G3)C). The presence of the hexanucleotide GGGAAC in spot 11 is already evident from the total nucleotide composition and the presence of AAC. Also the M-values of the AAC-containing spots point to the same hexanucleotide sequence. This means that the third heptanucleotide of the mixture is AGGGAAC. Confirmation of these sequences was obtained by partial digestion with snake venom phosphodiesterase (Table 10). Starting from the dinucleoside monophosphate ApX, two different series of 5'-end products were developed: AAGG... and AGG....

Peak 5: (A4,G2)U. The quantitative analysis of the T1 hydrolysate of this mixture clearly showed a ratio of two molar equivalents of U for one of AAU (Table 5). Hence it was concluded that three heptanucleotide tracts were present. As one of these contained two equivalents AAG, its nucleotide sequence was fixed as AAGAAGU.

-

0

0.02 0.02 0.01 0

Fi e

9a b D cl

1.88

0.82 o-95 0.91

1.27 1.81

1.56 049 1.85

-t

A

Nucleotide

-

2.02 2.0-I

1.01

0.16

+ 0.97 + 0.99

0 0.02 0.05 O-04

0.09 0.01 0.02 0.06 0.07

-

0.33 I.07 I,12 1.67 2.06

u

G

composition*

G AG G A& G

A A AG A& AU, A&z

digestion

-. 1.66 2.50

AC(1.76)G(1.24) -

1.56

-

2.71

-

-

GpApApA

1.50

-

GPX

GpGpX GpGpGpA

-

GpX

APX -ApGpX .ApGpGpG -

deducedC

APX ApGpX. ApGpGpA APGPGPAPG

ApApGpGpG

APAPGPG

APX ApApX. -

Structure

GPADX -

--

3.21

_-

-

-

M-valuesb

AG(I+io)-G(0~22)

-

-

AG AAG AG,G RAG,G 2.58

-

-

‘I’, hydrolysate

with snake venom phosphodiesterase

10

a Expressed as the molar yield of each nucleotide for a sequence of indicated chain length.
b Only M-values referring to products moving slower than the blue marker are indicated.
c Those products correspond to dephosphorylated oligonucleotides. Hence, a small p is used to denote the (internucleotide) phosphate. The 3'-terminal residue, released as an (unlabelled) nucleoside after alkaline hydrolysis, is indicated by X, except when it can be identified on the basis of an M-value.

-

0 --

b

o-17 0.03 0.02 0 0.05

-

C

7a

spot no.

Partial

TABLE

The complete structure for the other two isomers was established by partial digestion with spleen phosphodiesterase (Plate II(b) and Table 8). Starting from the di- and trinucleotides, three sequences could be logically reconstructed. One of these was AAGAAGU, already discussed above. There was one link almost missing, viz. GGU; presumably the enzyme is retarded by a preceding row of Ap residues. The next higher sequence, AGGU, was fully identified, such that the derived total nucleotide sequence is unambiguous.

Peak 6: (A2,G4)C. As ribonuclease T1 digestion gives less sequence information with increasing G content, the structure of this component was mainly determined by partial degradation with spleen exonuclease (Table 6). Starting from AC (Table 4), the sequence could be built up stepwise to the heptanucleotide GAGGGAC. This component, too, was often difficult to degrade enzymatically, presumably due to the GGG sequence, which causes aggregation.

Peak 7: (A3,G3)U. Two sequences were present (Table 2). Hydrolysis with ribonuclease T1 (Table 5) already allows the conclusion that one has the sequence GGGAAAU. The other was solved again by means of spleen phosphodiesterase (Table 9). In spite of the fact that some mixtures remained unidentified (GU + AAU, GGU + GAAAU), we were able to deduce the sequences from the results of a one-dimensional separation. Indeed, considering the T1-product composition (Table 5), the tetranucleotide AGGU and the pentanucleotide AAGGU can only be completed to GAAAGGU. Nevertheless, a two-dimensional analysis was also carried out (Table 9), which allowed the unambiguous identification of all the smaller components. Furthermore, partial digestion with venom phosphodiesterase independently confirmed the established sequences (Table 10).

Peak 8: (A,G5)C. The sequence GGGAGGC was deduced by sequential degradation with spleen exonuclease (Table 6). With this component, too, aggregation was often a problem in obtaining smooth enzymic degradation.

Peak 9: (A2,G4)U. Partial digestion with spleen phosphodiesterase (Table 6) allowed the reconstruction of the sequence up to the pentanucleotide GAGGU. The sequence of the next higher products, the hexa- and heptanucleotide, could only be guessed from an M-value. Direct and unambiguous confirmation of the proposed structure, however, was obtained by partial snake venom phosphodiesterase digestion (Table 10).

Peak 10: (A,G5)U. This component proved by far to be the most difficult to degrade enzymically; in fact clearly more refractory than peak 8, which contains the same number of Gp residues. Digestion with ribonuclease T1 and identification of products obtained by sequential degradation with spleen exonuclease allowed construction of the partial structure GG(A,G2)GU. Degradation with venom exonuclease was only successful when a preceding heat treatment at pH 10 was used (cf. Materials and Methods). On the ionophoretogram four spots were detected, all of which released only Gp after alkaline hydrolysis. Presumably they were GpX, GpGpX, GpGpGpX and GpGpGpGpX. This would indicate that the heptanucleotide has the sequence GGGGAGU. Nevertheless, more evidence is needed to establish this structure firmly.

Peak 11. It was readily realized that this component was not a heptanucleotide, but the 5'-terminal sequence described by De Wachter et al. (1968), and known to cochromatograph with the heptanucleotides at neutral pH. After elution from the acidic column, the material was rechromatographed on DEAE paper with a Tris-acetate gradient in the presence of 8 M-formamide (De Wachter et al., 1968). In this system, component 11 moves faster than the heptanucleotides, namely about as fast as the pentanucleotides. The products obtained by alkaline hydrolysis were separated by descending paper chromatography on Whatman 3 MM paper. The results (Table 3) clearly confirm the composition, and hence the sequence, reported previously.
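The confirmations by snake venom phosphodiesterase rest on the opposite polarity to the spleen enzyme: after dephosphorylation, residues are removed from the 3' end, so the partial products share the 5' terminus. The sketch below lists such products and writes them in the notation of Table 10, where the unlabelled 3'-terminal nucleoside is shown as X. The function names are illustrative assumptions, and the masking ignores the cases in which the terminal residue was identified from an M-value.

```python
def venom_ladder(sequence, shortest=2):
    """Partial products sharing the 5' end, shortest first."""
    return [sequence[:length] for length in range(shortest, len(sequence))]


def mask_3_prime(fragment):
    """Write a dephosphorylated fragment with its 3'-terminal residue as X."""
    return "p".join(fragment[:-1]) + "pX"


if __name__ == "__main__":
    # Peak 9, AGGAGGU: the first products read ApX, ApGpX, ...
    for fragment in venom_ladder("AGGAGGU"):
        print(fragment, mask_3_prime(fragment))

    # Peak 4: the three isomers give only two distinct 5'-end series.
    peak_4 = ["AAGGAGC", "AAGGGAC", "AGGGAAC"]
    print(sorted({sequence[:3] for sequence in peak_4}))
```

For the three isomers of peak 4 only two 5'-terminal series appear, beginning AAG and AGG, which matches the two series described for that peak.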

4. Discussion

All the heptanucleotides present in a pancreatic ribonuclease digest of MS2 RNA were obtained in nearly molar yield after two successive column chromatographic steps. This implies a constant and unique structure of the viral RNA. The sequence of all (except perhaps the last one) could be determined unambiguously, mainly by T1 ribonuclease digestion and/or sequential degradation with spleen exonuclease. In a few instances, further information was provided by partial degradation with venom phosphodiesterase. The results are summarized in Table 11. The M-values introduced by Sanger et al. (1965) were often used as a guideline and as an independent check. The values for the spleen exonuclease digestion products ranged from 0.2 to 0.9 (on one occasion 1.2 was found; Table 8) for A (27 observations), and from 1.1 to 3.6 for G (27 observations). Similar values apply for the pH 1.9 and for the 7% formic acid systems. The M-values for digestion with venom phosphodiesterase were in agreement with our previous results (Min Jou & Fiers, 1969).

TABLE 11

Heptanucleotides of MS2 and of related bacteriophages

Composition

MS2 RNA

AAGAAAC GAAAAGC

AAGGGAC AGGGAAC AAAAGGU AAGAAGU GAAGAAU GAGGGAC GGGAAAU GAAAGGU GGGAGGC AGGAGGU

Sequences present in R17 RNA (a)

AAGAAAC GilAAAti

M12 RNA

AAGAAAC AAAGAGC AAAAGAU AAGAGGC

AAAAGGU

GAGAGGU

GAGAGGU

(GGGGAGU)

a Identical sequences are underlined; results on R17 RNA and M12 RNA are from Thirion & Kaesberg (1970).
b A component with the same nucleotide composition, but with the sequence AGGAGGU (as found for MS2 RNA), was reported by Steitz (1969) as a part of the ribosomal binding site of the A protein.

Seven of the 15 heptanucleotides were isolated in pure form. The others were present as mixtures of two or three sequence isomers. No attempt was made to separate the latter, as analysis of the mixture nevertheless allowed unambiguous sequence determination of all components. The most serious difficulty encountered in this study was aggregation, presumably due to G-G interactions (Ralph, Connors & Khorana, 1962; Lipsett & Heppel, 1963; De Wachter & Fiers, 1969). It seemed as if part of the material was present in a form completely resistant to enzymic degradation; a heat treatment was only moderately successful in rendering the product more susceptible. It is presumably not a coincidence that serious difficulties arose only with heptanucleotides containing an uninterrupted row of three G residues, and that the only heptanucleotide which (almost certainly) contains a row of four G residues proved almost impossible to sequence.

An inspection of the heptanucleotide sequences reported in this paper, and also of the higher polypurine sequences published previously (Min Jou & Fiers, 1969), reveals several examples of apparent internal homology (Table 12). It is particularly remarkable that three different parts of the decanucleotide sequence GGGAAAAGGU are repeated in three heptanucleotide sequences. These repetitions are far above what one would expect on a random basis, and it is not excluded that they are due to duplication of segments during the evolution of the viral genome. Nevertheless, curious but irrelevant coincidences are bound to be found in these long RNA molecules. Internal homology has also been observed in the 5 S ribosomal RNAs of Escherichia coli (Brownlee, Sanger & Barrell, 1967) and KB cells (Forget & Weissman, 1967) and in the 16 S and 23 S ribosomal RNAs of E. coli (Fellner & Sanger, 1968; Fellner, Ehresmann & Ebel, 1970).

The close relationship between the bacteriophages MS2, R17 and M12 is reflected in their polypurine sequences, as was already demonstrated by Min Jou & Fiers (1969) and by Thirion & Kaesberg (1970). This is now further confirmed after the structure determination of the heptanucleotides present in MS2 RNA. Among the heptanucleotides sequenced so far in R17 RNA and in M12 RNA, several turn out to be identical with MS2 sequences (Table 11). The oligonucleotide AGGAGGU was found in the ribosomal binding site of the maturation protein cistron in R17 RNA (Steitz, 1969), and it is very likely that this sequence occurs in an identical (or very analogous) position in the MS2 RNA molecule.

Translation of the amino acid sequence of the MS2 coat protein (Lin, Tsung & Fraenkel-Conrat, 1967; Weber & Koningsberg, 1967) into a nucleotide sequence (which is not unique, because of the degeneracy of the code) revealed several candidates which might form part of this cistron among the heptanucleotides determined (Table 13). Longer oligonucleotide sequences, however, will be needed in order to establish unambiguously the correlation between the genetically encoded information and the resulting translation product.

TABLE 12

Possible evidence for internal homology in MS2 RNA

hepta:     AAAAGGU
deca:   GGGAAAAGGU

hepta:    GAAAAGC
deca:   GGGAAAAGGU

hepta:  GAGGGAC
hepta:  AAGGGAC

Three hexanucleotide sequences and two heptanucleotide sequences appear twice in MS2 RNA. The decanucleotide sequence GGGAAAAGGU may be an example of multiple internal homology. The probability of finding two identical heptanucleotide tracts due to chance amounts to 0.15, and that of finding two pairs to 0.03. These calculations are based on an experimentally determined total number of 23 sequences of the form (Pup)nPyp. The probability of observing 3 or more matching tracts of the form (Pup)n or (Pup)nPyp amounts to approx. 0.45, and it is likely that at least some of the latter repeats are due to chance.
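The repeated tracts summarized in Table 12 can be checked mechanically. The following Python sketch is only an illustration, not part of the original analysis: it takes the sequences quoted in the text, compares them pairwise by brute force, and reports pairs sharing a contiguous tract of at least six residues. The function name and the threshold of six are assumptions made for this example.

```python
from itertools import combinations

def longest_shared_tract(a, b):
    """Return the longest contiguous stretch of `a` that also occurs in `b`."""
    best = ""
    for i in range(len(a)):
        # Only try stretches longer than the best one found so far.
        for j in range(i + len(best) + 1, len(a) + 1):
            if a[i:j] in b:
                best = a[i:j]
            else:
                break  # a longer stretch starting at i cannot match either
    return best

# Sequences quoted in the text; the labels are for display only.
oligos = {
    "hepta AAAAGGU": "AAAAGGU",
    "hepta GAAAAGC": "GAAAAGC",
    "hepta GGGAAAU": "GGGAAAU",
    "hepta GAGGGAC": "GAGGGAC",
    "hepta AAGGGAC": "AAGGGAC",
    "deca GGGAAAAGGU": "GGGAAAAGGU",
}

MIN_LENGTH = 6  # illustrative threshold (hexanucleotide tracts or longer)
for (name_a, seq_a), (name_b, seq_b) in combinations(oligos.items(), 2):
    tract = longest_shared_tract(seq_a, seq_b)
    if len(tract) >= MIN_LENGTH:
        print(f"{name_a} / {name_b}: {tract}")
```

A search of this kind only lists coincidences; whether they exceed chance expectation is the separate statistical question addressed in the legend to Table 12.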

TABLE 13

Different heptanucleotides which may be part of the coat protein cistron

Heptanucleotide(s)                    Amino acid sequence    Residues
PyA.GGG.AAC, PyG.GGA.AAU              Thr.Gly.AsN            15 to 17
PyA.AGG.AGC (a), PyA.AGA.AGU (a)      Ser.Arg.Ser            37 to 39
PyG.GGA.GGC, Py(A.GGA.GGU) (b)        Val.Gly.Gly            72 to 74
PyA.AAA.GAU                           Leu.Lys.Asp            112 to 114

The corresponding amino acid sequences, for which they could perhaps code, are given alongside, together with the positions of the first and last residue.
a New data on longer sequences rule out these assignments (unpublished results).
b This sequence is presumably a part of the A protein ribosomal binding site, as in the case of R17 RNA (Steitz, 1969).
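The kind of compatibility check behind Table 13 can be sketched in Python. The sketch assumes the reading frame used in the table: the heptanucleotide is preceded by a pyrimidine (as it must be in a pancreatic ribonuclease digest), the first purine of the heptanucleotide is the third base of the first codon, and the base before the pyrimidine is unknown. The function name `could_encode` is hypothetical, the standard genetic code is used, and peptides are written in one-letter symbols (AsN in the table corresponds to N).

```python
from itertools import product

# Standard genetic code, bases ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def could_encode(hepta, peptide):
    """True if N-Py-hepta, read as three codons, can give the tripeptide.

    Assumed frame: (N, Py, hepta[0]) (hepta[1:4]) (hepta[4:7]),
    where N is any base and Py is C or U.
    """
    if CODON_TABLE[hepta[1:4]] != peptide[1]:
        return False
    if CODON_TABLE[hepta[4:7]] != peptide[2]:
        return False
    return any(
        CODON_TABLE[n + py + hepta[0]] == peptide[0]
        for n in BASES
        for py in "CU"
    )

# Candidates from Table 13, written without the leading pyrimidine.
candidates = [
    ("AGGGAAC", "TGN"),
    ("GGGAAAU", "TGN"),
    ("AAGAAGU", "SRS"),
    ("GGGAGGC", "VGG"),
    ("AAAAGAU", "LKD"),
]
for hepta, peptide in candidates:
    print(hepta, peptide, could_encode(hepta, peptide))
```

Such a check shows only that a heptanucleotide is compatible with a tripeptide in one frame; as the text stresses, longer sequences are needed to establish an assignment.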

Finally, we may mention that this study has resulted in a better estimate for the chain length of the viral RNA, viz. 3500 mononucleotides (± 200). Also, the 5’-terminal oligonucleotide was isolated in an independent way, and the sequence pppGGGU unequivocally confirms the previous result (De Wachter et al., 1968).

This work was supported by grants from the U.S. National Institutes of Health (GM 1130495) and from the Fonds voor Fundamenteel Kollektief Onderzoek (no. 841). Two of the authors (G.H. and W.M.) thank the Nationaal Fonds voor Wetenschappelijk Onderzoek of Belgium for Fellowships. Mrs M. Borremans-Bensch expertly took care of the biological aspects.

REFERENCES

Adams, J. M., Jeppesen, P. G., Sanger, F. & Barrell, B. G. (1969). Nature, 223, 1009.
Brownlee, G. G. & Sanger, F. (1967). J. Mol. Biol. 23, 337.
Brownlee, G. G., Sanger, F. & Barrell, B. G. (1967). Nature, 215, 735.
De Wachter, R. & Fiers, W. (1967). J. Mol. Biol. 30, 507.
De Wachter, R. & Fiers, W. (1969). Nature, 221, 233.
De Wachter, R., Verhassel, J. P. & Fiers, W. (1968). FEBS Letters, 1, 93.
Fellner, P., Ehresmann, C. & Ebel, J. P. (1970). Nature, 225, 26.
Fellner, P. & Sanger, F. (1968). Nature, 219, 236.
Fiers, W., De Wachter, R., Lepoutre, L. & Vandendriessche, L. (1965). J. Mol. Biol. 13, 451.
Fiers, W., Lepoutre, L. & Vandendriessche, L. (1965). J. Mol. Biol. 13, 432.

Forget, B. G. & Weissman, S. M. (1967). Science, 158, 1695.
Jeppesen, P. G., Steitz, J. A., Gesteland, R. F. & Spahr, P. F. (1970). Nature, 226, 230.
Lin, J.-Y., Tsung, C. M. & Fraenkel-Conrat, H. (1967). J. Mol. Biol. 24, 1.
Lipsett, M. N. & Heppel, L. A. (1963). J. Amer. Chem. Soc. 85, 118.
Min Jou, W. & Fiers, W. (1969). J. Mol. Biol. 40, 187.
Ralph, R. K., Connors, W. J. & Khorana, H. G. (1962). J. Amer. Chem. Soc. 84, 2266.
Robinson, W. E., Frist, R. M. & Kaesberg, P. (1969). Science, 166, 1291.
Sanger, F. & Brownlee, G. G. (1967). In Methods in Enzymology, ed. by L. Grossman & K. Moldave, vol. 12, part A, p. 361. New York: Academic Press.
Sanger, F., Brownlee, G. G. & Barrell, B. G. (1965). J. Mol. Biol. 13, 373.
Sledgers, H. & Fiers, W. (1970). Biopolymers, 9, 1373.
Steitz, J. A. (1969). Nature, 224, 957.
Thirion, J. P. & Kaesberg, P. (1970). J. Mol. Biol. 47, 193.
Vandenbussche, P. & Fiers, W. (1966). Biochim. biophys. Acta, 114, 182.
Weber, K. & Koningsberg, W. (1967). J. Biol. Chem. 242, 3563.

Note added in proof: Recent results also rule out the sequences GGGAGGC and AGGAGGU as components of the coat protein cistron.