VIROLOGY 103, 178-190 (1980)

Partial Amino Acid Sequences of Sindbis and Semliki Forest Virus-Specific Core Proteins

ULRIKE BOEGE,*,1 GERD WENGLER,* GISELA WENGLER,* AND BRIGITTE WITTMANN-LIEBOLD†

*Institut für Virologie, Justus-Liebig-Universität Giessen, and †Max-Planck-Institut für Molekulare Genetik, Abt. Wittmann, Berlin-Dahlem, Federal Republic of Germany

Accepted February 9, 1980
Tryptic peptides of Semliki Forest (SF) virus and Sindbis (SIN) virus core proteins have been investigated. About 88% of the total amino acids of SIN virus-specific and 67% of those of SF virus-specific core proteins, respectively, have been sequenced. Homologous sequence regions are present in the peptides of both proteins, comprising at least 22% of the total amino acids. One peptide of each of the proteins did not contain a free N-terminal amino acid, indicating that it represents the blocked N terminus. These peptides derived from the SIN virus and SF virus-specific core proteins have the structures b (Asx, Met, Arg) and b (Asx, Thr, Glx, Pro, Met, Ile, Tyr, Phe), respectively; the nature of the blocking group b remains to be determined. Digestion of SIN virus core protein with carboxypeptidases A and B indicates that the C-terminal sequence is -Phe-Arg.

INTRODUCTION
The structure and replication of alphaviruses have been studied in great detail (see Kääriäinen and Söderlund, 1978 for a review). The ribonucleoprotein (RNP) core of the virus particle consists of the 42 S infectious single-stranded viral RNA genome (Cheng, 1958; Richter and Wecker, 1963) complexed to a single species of virus-specific core protein, the latter having a molecular weight of about 30,000 (Strauss et al., 1968; Acheson and Tamm, 1970). The RNP is surrounded by a lipid envelope which, in the case of the alphavirus prototype Sindbis (SIN) virus, contains the two glycoproteins E1 and E2 (Schlesinger et al., 1972; Garoff et al., 1974). A third species of virus-specific glycoprotein, E3, is associated with the virus particles of the alphavirus Semliki Forest (SF) virus (Garoff et al., 1974). All viral structural proteins are translated from a single species of virus-specific mRNA which sediments at about 26 S on sucrose density gradients (Cancedda and Schlesinger, 1974; Simmons and Strauss, 1974; Wengler et al., 1974; Clegg and Kennedy, 1975a). Only a single site of protein synthesis initiation has been detected on this RNA (Clegg and Kennedy, 1975b; Glanville et al., 1976). The N-terminal amino acid sequences of the SIN virus-specific glycoproteins E1 and E2 have been determined, and it has been shown that the N terminus of the SIN virus-specific core protein presumably is blocked (Bell et al., 1978).

Recent progress in the methodology of protein sequencing makes it possible to determine partial amino acid sequences of proteins which are available only in nanomole quantities (Chang et al., 1978). By making use of these techniques we have studied the structure of fragments of the SIN and SF virus-specific core proteins. The results of these experiments are presented in this report. The data obtained allow us to identify homologous peptides present in both proteins and to characterize three of the four termini of the molecules.

1 To whom reprint requests should be addressed.

0042-6822/80/070178-13$02.00/0
Copyright © 1980 by Academic Press, Inc.
All rights of reproduction in any form reserved.

MATERIALS AND METHODS
Materials and reagents. 14C-Labeled amino acid hydrolysate, obtained by acid hydrolysis of algal protein, was purchased from Amersham Buchler. Kodak X-ray films XR 5 were used for autoradiography. Solvents and reagents used for analyses of amino acid composition or sequence were obtained from Pierce (Netherlands), except 4-N,N-dimethylaminoazobenzene 4'-isothiocyanate (DABITC), which was obtained from Fluka (Switzerland). TPCK-treated trypsin was purchased from Serva (Germany), and carboxypeptidases A and B, both treated with PMSF, from Worthington. Carboxypeptidase A was prepared from the crystalline suspension according to the method of Fraenkel-Conrat et al. (1955). Cellulose thin-layer plates (Cel 300, 20 x 20 cm) and polyamide sheets (F 1700, 15 x 15 cm) were purchased from Macherey & Nagel (Düren, Germany) and Schleicher & Schüll (Dassel, Germany), respectively. All other materials and chemicals were obtained from Merck (Germany).

Virus growth and purification of the viral RNP. BHK 21 cells were grown in roller bottles in Eagle's minimum essential medium modified according to Dulbecco and Freeman (1959), containing 10% fetal bovine serum. Cells were infected with SIN virus (strain Sa-AR-86) or SF virus (strain Osterrieth), respectively, at a m.o.i. of about 50 PFU/cell for 45 min at 37° in serum-free growth medium (Wengler et al., 1977). Fetal bovine serum was added to a final concentration of 10% (v/v) after the adsorption period. For the preparation of 14C-labeled virus, growth medium containing glutamine as the sole unlabeled amino acid, 1% fetal bovine serum, and 14C-labeled amino acid hydrolysate at a concentration of 4 µCi/ml was added to the cells at the end of the adsorption period. About 10 hr after infection the medium was collected and cell debris was spun down for 10 min at 2000 g. After addition of Na2-EDTA to the supernatant to a final concentration of 4 mM, the virus was pelleted by centrifugation for 190 min at 19,000 rpm in the Spinco No. 19 rotor. The pellet was immediately resuspended by homogenization in buffer containing 20 mM triethanolamine, 100 mM NaCl, 1 mM Na2-EDTA, 0.05% mercaptoethanol, pH 7.4 (TNEM), layered onto tartrate gradients (15-40% w/w K2-tartrate) in TNEM, and subjected to centrifugation (2 hr, 30,000 rpm, 2°, SW 41 Spinco rotor). The opalescent virus-containing material, visible after centrifugation, was aspirated with a Pasteur pipet, diluted with TNEM, and the virus was pelleted by a further centrifugation (14 hr, 15,000 rpm, 2°, SW 27 Spinco rotor). The pelleted material was suspended in buffer containing 20 mM triethanolamine, 100 mM NaCl, pH 7.4 (TN), incubated at 25° for 5 min after addition of NP 40 to a final concentration of 1%, and centrifuged on sucrose density gradients (10-30% w/w sucrose in TN, 100 min, 40,000 rpm, 2°, SW 41 Spinco rotor). The viral ribonucleoprotein cores were located by absorbance analysis at 260 nm, pooled, and recovered by centrifugation (150 min, 40,000 rpm, 2°, SW 41 Spinco rotor). Except where indicated, all steps involved in the purification of virus and viral ribonucleoprotein were done at 4°.

Isolation of the core proteins. RNP cores were suspended in water at a concentration of approximately 10 mg/ml for 20 min at room temperature, brought to 5% acetic acid by dropwise addition of 10% acetic acid, and kept at -25° overnight. After dilution to about 2.5 mg/ml with 5% acetic acid, solid urea was added to a final concentration of 8 M and dissociation of the RNP was allowed to proceed at room temperature for 30 min. LiCl was then added to the solution to give a final concentration of 3 M using a 10 M LiCl stock solution, and after a further incubation at 2° for 75 min the suspension was centrifuged (15 min, 15,000 rpm, 2°, SS 34 Sorvall rotor) in order to remove the insoluble RNA. The clear supernatant containing the core protein was dialyzed against decreasing molarities of urea and acetic acid until a solution of 1% acetic acid containing no urea was reached. Then the protein was lyophilized, redissolved in water, and stored at -25° until use.

Trypsin treatment. Protein was dissolved in 0.1 M ethylmorpholineacetate pH 8.1 to a final concentration of 4 mg/ml. TPCK-trypsin was added in an enzyme-to-substrate ratio of 1:50 and digestion carried out at 37° for 4 hr. The peptides were lyophilized and redissolved in water. Only minimal amounts of insoluble material were obtained.

Fingerprint technique. About 4 nmol of an aqueous solution of peptides was applied onto cellulose thin-layer plates which had been equilibrated in the electrophoresis chamber for 30 min. Electrophoresis was carried out in the first dimension in pyridine:acetic acid:acetone:water (1:2:8:40, v/v/v/v) at 400 V and 12 ± 2 mA for 135 min. The plates were dried in a hood before chromatography in butanol:acetic acid:pyridine:water (15:3:10:12, v/v/v/v) in the second dimension. The spots containing peptides were detected by autoradiography.

Isolation of the tryptic peptides. Material containing peptides was scraped off from the thin-layer sheets with a spatula and extracted two times with 150 µl of 50% acetic acid with vigorous mixing. The cellulose was spun down by centrifugation for 5 min in a Beckman microfuge B. The peptides were dried under vacuum in the presence of NaOH and used for amino acid analysis or sequencing.

Amino acid analysis. For analyses of the intact proteins, 2-3 nmol of protein was dissolved in 100 µl of 6 N HCl containing 0.02% mercaptoethanol, sealed under vacuum, and hydrolyzed at 110° for 24 and 48 hr. Cysteine was determined after oxidation with performic acid according to Hirs (1956). Tryptophan was determined after hydrolysis in 4 N methanesulfonic acid containing 0.02% 3-(2-aminoethyl)indole according to Liu and Chang (1971) and by spraying the fingerprints with Ehrlich's reagent. For analysis of peptides, 1-4 nmol was hydrolyzed for 20 hr in 6 N HCl as indicated above. All analyses were performed in a Durrum D 500 analyzer. The maximal absorption was 1.0 to 2.0 OD for the whole proteins and 0.5 OD for the peptides.

Sequence analysis. Peptides (1-4 nmol) were sequenced by use of the double-coupling method with DABITC and phenylisothiocyanate (PITC) according to Chang et al. (1978), except that the conversion reaction was performed in 50% trifluoroacetic acid. 4-N,N-Dimethylaminoazobenzene 4'-thiohydantoin amino acids were identified by two-dimensional chromatography on 3 x 3-cm polyamide sheets using 4-N,N-dimethylaminoazobenzene 4'-thiocarbamyl (DABTC)-diethylamine and DABTC-ethanolamine as synthetic markers. Leucine and isoleucine are not separated from each other by this method and therefore cannot be distinguished. Therefore Ile/Leu is given in the peptide sequences derived by this method whenever a peptide contained both amino acids as shown by amino acid analysis.

Carboxypeptidase digestion. About 2 mg of protein was incubated at 37° with carboxypeptidase A or B in 0.2 M ethylmorpholineacetate pH 8.5, at a concentration of about 4 mg/ml and an enzyme-to-substrate ratio of 1:40. At different times during digestion 10-nmol aliquots were removed, quickly frozen, and lyophilized. The free amino acids were determined at 0.5 maximal absorption in a Durrum D 500 analyzer.

RESULTS

As a first step in the characterization of the core proteins of SIN and SF virus the amino acid compositions of both proteins were determined. The results obtained (Table 1) can be compared with data that have been reported earlier for SIN virus (Burke and Keegstra, 1976; Bell et al., 1979) and SF virus (Simons and Kääriäinen, 1970; Kennedy and Burke, 1972). The data concerning the protein of SF virus are in general agreement with those reported in the literature cited above. The data obtained for the SIN virus core protein are in agreement with the results of Bell et al. (1979). Both proteins are rich in basic amino acids and proline, but obviously differ from each other in amino acid composition. Cysteine, for example, is present in about 1.24 mol% in SF virus core protein but is absent from hydrolysates of the corresponding protein of SIN virus.

Since our studies on the core proteins of both viruses by a new, very sensitive sequencing method which is suitable for the N-terminal sequencing of proteins (Chang et al., 1978) indicated that the N termini of both proteins are blocked, we have fragmented both proteins by proteolytic digestion with trypsin and studied the structure of the individual peptides. Two-dimensional maps of the tryptic peptides of the core proteins of SIN and SF virus are presented in Figs. 1A and 2A, respectively. All peptides identifiable on these maps were eluted
TABLE 1
AMINO ACID COMPOSITION OF THE CORE PROTEINS OF SIN AND SF VIRUSES^a

Amino acid   Sindbis virus   Semliki Forest virus
Asp          6.14            8.46
Thr          6.78^b          6.40^b
Ser          4.85^b          4.05^b
Glu          10.50           9.60
Pro          10.63           8.23
Cys          None^c          1.24^c
Gly          9.40            8.56
Ala          8.58            9.15
Val          5.62^d          6.68^d
Met          3.43            2.93
Ile          2.58^d          3.67^d
Leu          5.37            3.25
Tyr          1.52            2.40
Phe          3.58            1.99
His          2.62            2.88
Lys          9.84            14.16
Arg          9.17            5.50
Trp          0.97^e          1.13^e

^a Expressed as mole percentages.
^b The values were extrapolated to zero time.
^c Determined as cysteic acid.
^d No corrections for incomplete hydrolysis were made. Only the 48-hr value is given.
^e Determined after hydrolysis in methanesulfonic acid. The values are not corrected for destruction.
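The statement that both proteins are rich in basic amino acids and proline can be checked directly against Table 1. The Python sketch below is illustrative only: the dictionary layout and the function name `basic_fraction` are assumptions introduced here, and counting histidine as a basic residue alongside lysine and arginine is a choice made for this example rather than something the text specifies.

```python
# Mole percentages copied from Table 1 (only the residues needed here).
# The dictionary layout and all names are illustrative assumptions.
TABLE_1 = {
    "SIN": {"Lys": 9.84, "Arg": 9.17, "His": 2.62, "Pro": 10.63},
    "SF": {"Lys": 14.16, "Arg": 5.50, "His": 2.88, "Pro": 8.23},
}

BASIC_RESIDUES = ("Lys", "Arg", "His")  # assumption: His counted as basic


def basic_fraction(composition):
    """Sum the mole percentages of the basic residues."""
    return sum(composition[residue] for residue in BASIC_RESIDUES)


for virus, composition in TABLE_1.items():
    print(f"{virus}: basic {basic_fraction(composition):.2f} mol%, "
          f"Pro {composition['Pro']:.2f} mol%")
```

On these figures the basic residues account for somewhat more than one fifth of each protein, although the balance between lysine and arginine differs markedly between the two viruses.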
and subjected to amino acid analysis and sequential N-terminal sequencing according to the method of Chang et al. (1978). In Figs. 1B and 2B, all spots that are present in sufficient yield and purity for structural analyses are indicated by numbers. The data obtained are presented in Tables 2 and 3. It can be seen from these data that many peptides have been sequenced completely in spite of the small amounts of material available from the tryptic fingerprints. Since the concentration of the individual peptides was further reduced by nonspecific cleavages, it was not possible to determine the total sequence of all peptides. Also, there was not sufficient material available to purify, by other methods such as column chromatography, those peptides which were not separated on the fingerprint plates. Quantitative analysis of the sequences present in independent peptides shows that 237 amino acids of SIN virus and 182 amino acids of SF virus core protein, respectively, have been sequenced. Since the molecular weight of each of the core proteins is approximately 30,000 (Strauss et al., 1968; Acheson and Tamm, 1970), which corresponds to about 270 amino acids, about 88% of the sequences present in SIN virus-specific core protein and about 67% of those present in the core protein of SF virus are comprised in these tryptic peptide sequences.

Visual inspection of the peptide patterns presented in Figs. 1A and 2A indicates that the tryptic peptides of SIN virus and SF virus core protein, respectively, are rather different from each other. A comparison of the structural analyses of the peptides presented in Tables 2 and 3 shows that most of the peptides are present in only one of the core proteins. A small number of identical or very closely related sequence regions, however, are present in both core proteins, e.g., one identical tryptic peptide of 11 residues (Table 4). These peptides contain a total of 59 identical amino acid residues, comprising about 22% of the total amino acids of each of the core proteins.

From the data presented in Tables 2 and 3 it can be seen that peptide 78 of SF virus and peptide 41 of SIN virus core protein, respectively, could not be sequenced although these peptides are generated in high yields during tryptic digestion. Peptide 41 of SIN virus core protein contains the amino acids aspartic acid or asparagine, methionine, and arginine in equimolar amounts. Since each of the possible N-terminal amino acids given by this composition can be sequenced by the method of Chang et al. (1978) without difficulty, the inability to determine the sequence of this peptide strongly suggests that it does not contain a free N-terminal amino acid. Though no reactive amino acid was found, three complete cycles of N-terminal sequencing were performed. The residue was then hydrolyzed and subjected to amino acid analysis in order to show that the peptide was still present. Equimolar amounts of aspartic acid, methionine, and arginine were found in this analysis. These results indicate that peptide 41 derived from the core protein
TABLE
2
STRUCTURALDATAOFTHETRYPTICPEPTIDESOF spot No.
Comments
(Lys),
Arg Pro-Arg His-Val-Lys“ Gly-Val-Gly-Gly-Arg Gln-Ala-Ala-Pro-Met-Pro-Ala-Arg Leu-Glu-Ala-Asp-Arg
lla”
ifj-(
llb
Val-Ile-Gly-Gln-Ala-
Lys-Pro-Lys’ LYS Arg-
Arg
g)-(
llc 12 13, I p 13,,,“,”
[‘y)-/Eiji-Pro-Arg
14 15 16 17a 1’7b 18
Thr-Ile-Pro-Arg-Lys’ Leu- Lys’ Met-Ala-Leu-Lys”-Lys’ Phe-Thr-Ile-Pro-Arg Arg-Pro-Phe-Pro-(Thr,-, Pro, Ala, Met,Leu-Phe-Asp-Val-Lys’ Leu-Gly-Ala-t?) Asn-Val-Ile-(?) Asn-Glu-Val-Gly-Asp-Val-Ile-Gly-Lys’ Gly-Lys’ Gln-Pro-Pro-(Pro, Lys,) Gln-Pro-(Ser,-, Gly,~ I Ala, Lys,+,) Gln- Lys’ Thr-Gln-Glu-Lys’ Val-Met-Lys-Pro-Leu-His-Val-Lys’ Gln-Lys”-Lys’ His-Gln-(?) Gln-Pro-Ala-Pro-(Ser,. , Gly,+, Lys,) Lys-Gin-Pro-Ala-(Pro2 Gly, Lys,) Ala-Leu-Ala-Met-Glu-Gly-Lys’ Gly-Asp-Ser-Gly-Arg-Pro-Ile-Met-Asp-Asn-Ser-Gly-Arg Ser- Lys’
42’
Part
of 28
Part
of 49,2,
k?i-Arg-(Lys,)
Thr-Arg-Pro-Gln-Pro-Pro-(Arg-Pro-Arg-Pro-Pro-Arg)U Pro-Gly-(Ser, Glx, Arg, Lys,. ,) Gly-Thr-Ile/Leu-Asp-His-Pro-Val-IleiLeu-Ser-Lys’ Phe-Thr-Lys’
20Q,l’ 21 23 24a 24b 25 26 28 29 30 31 32 33 34 36 37 38 39 40 41
VIRUSCORE PROTEIN"
Structure”
1 2 41,” 3<2it’ 4 5 6 8 9 10
20,,,”
SIN
I)
Includes 6 25 + 3,,,
(LYSL Asn-Ser-Lys’ Thr-Thr-Pro-Glu-Gly-Thr-Glu-Glu-Trp’ Thr-Ser-Glu-His-(Glx,+, (Asx, Met, Arg,) (Thr,
Pro,
Ser, Glx,
Gly,
Pro,
Tyr,
Gly,
Phe,)
Ala,
Met,
Ile, Leu,
His,
Lys,
Arg)
Similar Similar
to 32 to 31
Includes
55
TABLE 2-Continued

Spot No.
43 44 4.5 46 47 48 49,, )r’ a,, 50 54 55 61 76
Comments
Structure” Ser-Glu-Ala-Phe-Thr-Tyr Ser-Gly-Ala-Tyr-Asp-Met-Glu-Phe-Ala-Gln-Leu-Pro-Arg Gly-Ala-Leu-Ser-Val-(Thr, Phe,) Asn-Gly-Val-Ala-Ser-(Thr, Glx, Leu,) Ser-(?) Asn-Met-Leu-Gly-Arg Asn-Trp-His-His-Gly-Ala-Val-Gln-(Tyr,) Arg-Pro-Phe-Pro-Ala-Pro-(?) Thr-IleiLeu-Pro-Arg-(Ser, Glx, Pro, Gly, Ala, Gly-Gly-Ala-Asp-Glu-Gly-Thr-Arg Pro-Ile-Met-Asp-(Asx, Ser, Gly, Arg,) Gly-Asp-(Asx,,+, Ser,+, Gly,-, His, Arg,) Thr-Ala-IleiLeu-Ser-Val-Val-(Thr, A!a, Val,
Includes Val,
IleiLeu,
Arg,,+,
Part IleiLeu,
17b
) of 34
Leu,)
^a Peptides were eluted from peptide maps similar to that shown in Fig. 1A. The number of the peptides refers to the numbers in Fig. 1B.
^b All peptides were subjected to amino acid analysis and to at least two cycles of N-terminal sequence analysis according to Chang et al. (1978). For those peptides which were not sequenced completely, amino acids which were detected by amino acid analysis only are indicated in parentheses.
^c C-Terminal lysine is deduced from the amino acid composition.
^d The spot contains two to three peptides in significantly different relative concentrations. Therefore the amino acids detected after each sequencing cycle could be correlated to the two peptides.
^e The spot contains two to three peptides in similar concentrations. Association with certain peptides was not possible.
^f The presence of tryptophan is deduced from the reaction of the spot after spraying of tryptic peptide maps with Ehrlich's reagent.
^g The sequence in parentheses is uncertain because of the accumulation of proline residues.
of SIN virus contains the blocked N terminus of this protein and has the structure b (Asx, Met) Arg, b indicating the N-terminal blocking group, the nature of which is presently under investigation. In the same way peptide 78 of SF virus core protein was investigated; the finding that all amino acids of this peptide are still present in the same relative concentrations after eight sequencing cycles again indicates that peptide 78 represents the blocked N terminus of SF virus core protein.

SIN virus core protein was available in amounts sufficient to characterize the C-terminal sequence, too. In order to identify the amino acids located at or near the C terminus, the protein was incubated with either carboxypeptidase A or B. The finding that no amino acid was released during 24 hr digestion with carboxypeptidase A (data not shown) indicates that either arginine or lysine is the C-terminal amino acid. The results of analyses of amino acids released during incubation with carboxypeptidase B are presented in Fig. 3. It can be seen that phenylalanine and arginine are released quickly, followed by leucine, lysine, and methionine. The fact that the molar ratio of arginine released per mole of core protein increases to a value of 1.4 after release of leucine, lysine, and methionine indicates that a second arginine residue is located in the C-terminal region. The release of amino acids other than arginine and lysine by carboxypeptidase B is due to the intrinsic A-type activity associated with the B enzymes from different sources (Ambler, 1972). Since phenylalanine and arginine are both released in nearly equimolar amounts within the first minutes, their correct sequence cannot be derived with certainty from the data presented in Fig. 3. The finding that no amino acid is released by carboxypeptidase A indicates that the C-
TABLE
STRUCTURALDATAONTHE spot No. 1 2 4a 4b 5 7 8
3
TRYPTICPEPTIDESOFSF
Structure”
Comments
(LYSL Lys-Lys LYS Gly-Arg-(Lys, Arg,) Arg Val-Met-Lys-Pro-Ala-His-Val-Lys’ He - Lys’
Same as 18, I )
11 12 13 14 15,,,” 15,?,” 16a,,,” 16a,*,” 16b 17 a,,”
Thr-Met-Arg Tyr-Gly-Arg Tyr-Thr-(Asx, GlxR Pro, Gly, Ala, Val, Tyr, Tyr-Thr-(Glx, Pro, Gly, Tyr, His, Lys,) Thr-Gln-Pro-Lys’ Ser-Gly-Gly-Arg Gln-Asn-Ala-Ile-Ala-Pro-Ala-Arg-Pro-Pro-Lys(?)-Pro-Lys’ Ser-Gly-(Glyj-Arg Ser-Ser-Lys” Gly-Arg Val-Met-(Pro, Ala, Val, His, Lys,)
18,,,” 19 20 21 22 23 24 26 27 28 29 30 31,,,” 31,,,” 32 33 34 35
Gly- Arg Gly- Lys’ Lys-Pro-Gly-Lys’ Thr-Thr-Lys-(Pro2 Lys,) Lys-Thr-(Pro, Lys, 1 Lys-Lys-Pro-Gly-Lys’ Lys-Asp-Lys” Asn- Lys’ His-Glu-Gly-Lys” Thr-Gln-Gln-Gln-Lys’ Gln-Ala-Asp-(Lys,-:,I Asp-(?) Gly-Gly-Ala-Asn-Glu-Gly-Ser-Arg Gln-Ala-Asp-Lys’ Ser-Asp-Ala-Ser-Lys” Asp-Lys’ Glu-Arg Asp-Met-Val-Thr-Arg
36a 37
Phe-Thr-(Asx, Thr,,-, Ser,,+, Pros Gly:, Ala, Ile, Lys,) Phe-Thr-Ile-Pro-Thr-Gly-Ala-Gly-I~ys-Pro-Gly-Asp-Ser-Gly-ArgPro-Ile-Phe-Asp-Asn-Pro-(Lys,) Leu-Ala-Phe-(Lys+,) Asn-Trp-His-His-Gly-Ala-Val-(Glx, Tyr,) Asp-Met-Val-Thr-Arg Gly-Val-(Asx,, Alap Ile, Leu, Lys,) Gly-Val-IleiLeu-Asp-Asn-Ala-Asp-IleiLeu-Ala-Lys’
38a 39 41 44 45
VIRUSCORE PROTEIN"
4 His,
Lys,
Trp’)
14 + 39 Part of 13
Same as 18,,, Same as 7 (Met oxidized) Same as 17 Part
of 23
20 + 4a 33 + 4a
:I1121 t -la
Part Part
of 47 of 29
Part
of 24
Same as 41 (Met oxidized)
Part of 13 Same as. 95 SC L Same as 43
TABLE 3-Continued

Spot No.   Structure^b                                                            Comments
46         Val-Thr-Pro-Glu-Gly-Ser-Glu-Glu-Trp^f
47         Val-Val-Ala-Ile/Leu-Val-Ile/Leu-Gly-Gly-Ala-Asn-Glu-Gly-Ser-Arg        Includes 31(1)
48         Pro-Ala-Ala-Arg-Pro-(Pro, Leu, Trp^f)
50         Pro-Ala-(Glx, Val, Leu)                                                Similar to 52
52         Pro-Ala-(Glx, Val, Leu)
58         Val-Gly-(Asx, Lys)
59         Pro-Gly-(Asx, Ser, Ile, Lys)
61         Gln-Ala-(?)
63         Val-Thr-(?)
77         Thr-Ala-Ile/Leu-Ser-Val-Val-(Thr, Ala, Val, Ile/Leu, Leu)
78         (Asx, Thr, Glx, Pro, Met, Ile, Tyr, Phe)
^a Peptides were eluted from peptide maps similar to that shown in Fig. 2A. The number of the peptides refers to the numbers in Fig. 2B.
^b All peptides were subjected to amino acid analysis and to at least two cycles of N-terminal sequence analysis according to Chang et al. (1978). For those peptides which were not sequenced completely, amino acids which were detected by amino acid analysis only are indicated in parentheses.
^c C-Terminal lysine is deduced from the amino acid composition.
^d The spot contains two peptides in significantly different relative concentrations. Therefore the amino acids detected after each sequencing cycle could be associated with both peptides.
^e The spot contains two peptides in similar concentrations. Association with certain peptides was not possible.
^f The presence of tryptophan is deduced from the reaction of the spot after spraying of tryptic peptide maps with Ehrlich's reagent.
terminal sequence of the SIN virus-specific core protein is -Phe-Arg. Probably this terminus is preceded by a sequence containing one of each of the four amino acids leucine, lysine, methionine, and arginine, but since the latter amino acids are released rather slowly it cannot be excluded that one or more of these amino acids are generated from internal breaks introduced into the substrate during carboxypeptidase digestion. Final proof for the C-terminal sequence suggested above must come from sequence analysis of the C-terminal peptide.

DISCUSSION

The sequence studies of tryptic peptides presented above show that some sequence homologies exist between the core proteins of SF and SIN viruses. It has been estimated by quantitative analysis of the data presented in Table 4 that about 22% of a total of 270 amino acid residues present in the core proteins of SF and SIN viruses were already found in the homologous sequence areas. The exact percentage of amino acids present in homologous regions of both proteins could be expected to be higher, since the total sequence content of the peptides analyzed comprises only 67 to 88% of the sequences present in these proteins and since during tryptic digestion homologous regions have been fragmented and therefore could not be recognized. Some of the homologous sequences identified probably contain the determinants responsible for the immunological cross-reactions existing between both proteins (Dalrymple et al., 1973). It is interesting to note that many of these sequences have specific properties, as for example the identical peptides SF 77 and SIN 76, which are highly hydrophobic (Figs. 1A, 2A), or the closely related peptides SF 46 and SIN 39, both of which are enriched in glutamic acid (Tables 2 and 3). Further sequence work will allow the localization of the homologous sequences within the primary sequence of both core proteins; this might be helpful for the deter-
TABLE 4
SEQUENCE HOMOLOGIES OF THE TRYPTIC PEPTIDES OF SF VIRUS AND SIN VIRUS CORE PROTEINS^a

Spot No.   Sequence
SF 1 SIN 28 SF 15,,, SIN 26 SF 15,,, SIN X .SF 16a SIN 9 SF SIN SIN SIN
31 17a 8 34
SF 39      Asn-Trp-His-His-Gly-Ala-Val-(Glx, Tyr)
SIN 49     Asn-Trp-His-His-Gly-Ala-Val-Gln-(Tyr)
SF 45
SIN 12
SF 41
SIN 11b
SF 77      Thr-Ala-Ile/Leu-Ser-Val-Val-(Thr, Ala, Val, Ile/Leu, Leu)
SIN 76     Thr-Ala-Ile/Leu-Ser-Val-Val-(Thr, Ala, Val, Ile/Leu, Leu)
^a The peptides presented in Tables 2 and 3 have been compared for identical sequences. Sequence areas with three or more identical residues were considered; further single amino acid replacements were included to indicate homologies. For quantitative analysis only identical amino acids were considered; in the case of sequences homologous to two different peptides the greatest homology was considered for quantitation.
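The quantitation described in this footnote, together with the percentages quoted in the text, can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, residues are compared position by position without gaps (the footnote does not say how the peptides were aligned), and only the sequenced portions of SF 39 and SIN 49 as listed in Tables 2 and 3 are used as input.

```python
PROTEIN_LENGTH = 270  # residues; the text's estimate for a protein of about 30,000 MW


def count_identical(peptide_a, peptide_b):
    """Count positions carrying the same residue in an ungapped comparison."""
    residues_a = peptide_a.split("-")
    residues_b = peptide_b.split("-")
    return sum(1 for a, b in zip(residues_a, residues_b) if a == b)


def percent_of_protein(n_residues, protein_length=PROTEIN_LENGTH):
    """Express a residue count as a percentage of the assumed protein length."""
    return 100.0 * n_residues / protein_length


# Sequenced portions only; residues known from composition alone are left out.
sf_39 = "Asn-Trp-His-His-Gly-Ala-Val"
sin_49 = "Asn-Trp-His-His-Gly-Ala-Val-Gln"

print(count_identical(sf_39, sin_49))   # 7 identical residues
print(round(percent_of_protein(59)))    # 22 (homologous residues)
print(round(percent_of_protein(237)))   # 88 (SIN residues sequenced)
print(round(percent_of_protein(182)))   # 67 (SF residues sequenced)
```

Because the denominator of 270 residues is itself an estimate from the molecular weight, the resulting percentages should be read as approximate.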
mination of the three-dimensional structure of these proteins and in turn for the evaluation of the role of the homologous sequences in virus replication and assembly.

The analyses described above have led to the detection of the peptides which probably contain the blocked N-terminal amino acid sequence of the core protein. The peptides derived from the SF virus and the SIN virus-specific core protein have the structures b (Asx, Thr, Glx, Pro, Met, Ile, Tyr, Phe) and b (Asx, Met, Arg), respectively. Further information on the exact structure of these peptides can be obtained by analyses of fragments generated from these peptides. Such experiments are currently being undertaken in our laboratories.

Analyses of the nucleotide sequences present on the 26 S RNA molecules coding for the alphavirus-specific structural proteins will be of great importance for the determination of the primary structure of these proteins, but since these proteins are generated from a single precursor molecule
[Fig. 3 plot: amino acids released (curve labels include Leu, Lys, and Met) versus time (minutes).]
FIG. 3. Time course of release of amino acids from SIN virus-specific core protein by carboxypeptidase B. The conditions of enzyme treatment and the technique used for amino acid analysis are described under Materials and Methods. The data given in this figure have been corrected for the amount of free amino acids generated during incubation of enzyme in the absence of substrate.
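The correction mentioned in the legend amounts to subtracting an enzyme-only blank before expressing the release per mole of protein. A minimal Python sketch follows; the function name and all numerical inputs are hypothetical placeholders chosen for illustration, not measurements from this study, and clamping negative differences to zero is an assumption of the sketch.

```python
def moles_released_per_mole(sample_nmol, blank_nmol, protein_nmol):
    """Blank-corrected amino acid release per mole of core protein."""
    # Assumption of this sketch: a negative difference is treated as no release.
    corrected = max(sample_nmol - blank_nmol, 0.0)
    return corrected / protein_nmol


# Hypothetical numbers for one 10-nmol aliquot (not data from Fig. 3).
ratio = moles_released_per_mole(sample_nmol=14.5, blank_nmol=0.5, protein_nmol=10.0)
print(f"{ratio:.1f} mol released per mol of protein")
```

A ratio above 1.0 for a given residue, like the value of 1.4 reported in the text for arginine, is what suggests a second copy of that residue near the C terminus.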
by proteolytic cleavage (see Kääriäinen and Söderlund, 1978 for a review) their terminal sequences can only be determined by direct studies of the proteins. Recently a protein of about 9800 molecular weight secreted into the growth medium by SIN virus-infected cells has been described by Welch and Sefton (1979); this protein probably corresponds to the structural glycoprotein E3 present in SF virus particles. No analyses of the terminal sequences of the three viral glycoproteins E1, E2, and E3 of SF virus and of the C-terminal sequence of the corresponding core protein have been published up to now. As far as the structural proteins of SIN virus are concerned, the terminal sequences of the 9800 molecular weight protein described above and the C-terminal sequences of the two viral structural glycoproteins E1 and E2 remain to be characterized.

ACKNOWLEDGMENTS
We thank Dr. R. Rott for his encouragement and support during this work. This study was supported by the Sonderforschungsbereich 47 (Virologie).

REFERENCES

ACHESON, N. H., and TAMM, I. (1967). Replication of Semliki Forest virus: An electron microscopic study. Virology 32, 123-143.
AMBLER, R. P. (1972). Enzymatic hydrolysis with carboxypeptidases. In "Methods in Enzymology" (C. H. W. Hirs, ed.), Vol. 25, pp. 143-154. Academic Press, New York/London.
BELL, J. R., HUNKAPILLER, M. W., HOOD, L. E., and STRAUSS, J. H. (1978). Amino-terminal sequence analysis of the structural proteins of Sindbis virus. Proc. Nat. Acad. Sci. USA 75, 2722-2726.
BELL, J. R., STRAUSS, E. G., and STRAUSS, J. H. (1979). Purification and amino acid compositions of the structural proteins of Sindbis virus. Virology 97, 287-294.
BURKE, D. J., and KEEGSTRA, K. (1976). Purification and composition of the proteins from Sindbis virus grown in chick and BHK cells. J. Virol. 20, 676-686.
CANCEDDA, R., and SCHLESINGER, M. J. (1974). Formation of Sindbis virus capsid protein in mammalian cell-free extracts programmed with viral messenger RNA. Proc. Nat. Acad. Sci. USA 71, 1843-1847.
CHANG, J. Y., BRAUER, D., and WITTMANN-LIEBOLD, B. (1978). Micro-sequence analysis of peptides and proteins using 4-N,N-dimethylaminoazobenzene 4'-isothiocyanate/phenylisothiocyanate double coupling method. FEBS Lett. 93, 205-214.
CHENG, P. Y. (1958). Infectivity of ribonucleic acid from mouse brains infected with Semliki Forest virus. Nature (London) 181, 1800.
CLEGG, C., and KENNEDY, I. (1975a). Translation of Semliki-Forest-virus intracellular 26-S RNA: Characterisation of the products synthesized in vitro. Eur. J. Biochem. 53, 175-184.
CLEGG, J. C. S., and KENNEDY, S. I. T. (1975b). Initiation of synthesis of the structural proteins of Semliki Forest virus. J. Mol. Biol. 97, 401-411.
DALRYMPLE, J. M., VOGEL, S. N., TERAMOTO, A. Y., and RUSSELL, P. K. (1973). Antigenic components of group A arbovirus virions. J. Virol. 12, 1034-1042.
DULBECCO, R., and FREEMAN, G. (1959). Plaque purification by the polyoma virus. Virology 8, 396-397.
FRAENKEL-CONRAT, H., HARRIS, J. I., and LEVY, A. L. (1955). Recent developments in techniques for terminal and sequence studies in peptides and proteins. Methods Biochem. Anal. 2, 359-425.
GAROFF, H., SIMONS, K., and RENKONEN, O. (1974). Isolation and characterisation of the membrane proteins of Semliki Forest virus. Virology 61, 493-504.
GLANVILLE, N., RANKI, M., MORSER, J., KÄÄRIÄINEN, L., and SMITH, A. E. (1976). Initiation of translation directed by 42S and 26S RNAs from Semliki Forest virus in vitro. Proc. Nat. Acad. Sci. USA 73, 3059-3063.
HIRS, C. H. W. (1956). The oxidation of ribonuclease with performic acid. J. Biol. Chem. 219, 611-621.
KÄÄRIÄINEN, L., and SÖDERLUND, H. (1978). Structure and replication of α-viruses. In "Current Topics in Microbiology and Immunology" (W. Arber et al., eds.), Vol. 82, pp. 15-69. Springer-Verlag, Berlin/Heidelberg/New York.
KENNEDY, S. I. T., and BURKE, D. C. (1972). Studies on the structural proteins of Semliki Forest virus. J. Gen. Virol. 14, 87-98.
LIU, T.-Y., and CHANG, Y. H. (1971). Hydrolysis of proteins with p-toluenesulfonic acid. J. Biol. Chem. 246, 2842-2848.
RICHTER, A., and WECKER, E. (1963). The reaction of EEE virus preparations with sodium desoxycholate. Virology 20, 263-268.
SCHLESINGER, M. J., SCHLESINGER, S., and BURGE, B. W. (1972). Identification of a second glycoprotein in Sindbis virus. Virology 47, 539-541.
SIMMONS, D. T., and STRAUSS, J. H. (1974). Translation of Sindbis virus 26 S RNA and 49 S RNA in lysates of rabbit reticulocytes. J. Mol. Biol. 86, 397-409.
SIMONS, K., and KÄÄRIÄINEN, L. (1970). Characterization of the Semliki Forest virus core and envelope protein. Biochem. Biophys. Res. Commun. 38, 981-988.
STRAUSS, J. H., BURGE, B. W., PFEFFERKORN, E. R., and DARNELL, J. E. (1968). Identification of the membrane protein and "core" protein of Sindbis virus. Proc. Nat. Acad. Sci. USA 59, 533-537.
WELCH, W. J., and SEFTON, B. M. (1979). Two small virus-specific polypeptides are produced during infection with Sindbis virus. J. Virol. 29, 1186-1195.
WENGLER, G., BEATO, M., and HACKEMACK, B.-A. (1974). Translation of 26S virus-specific RNA from Semliki Forest virus-infected cells in vitro. Virology 61, 120-128.
WENGLER, G., WENGLER, G., and FILIPE, A. R. (1977). A study of nucleotide sequence homology between the nucleic acids of different alpha-viruses. Virology 78, 124-134.