VIROLOGY 174, 399-409 (1990)

Identification of Conserved Domains in the Cell Attachment Proteins of the Three Serotypes of Reovirus

ROY DUNCAN,* DUFF HORNE,* L. WILLIAM CASHDOLLAR,† WOLFGANG K. JOKLIK,‡ AND PATRICK W. K. LEE*,1

*Department of Microbiology and Infectious Diseases, University of Calgary Health Sciences Centre, Calgary, Alberta, Canada T2N 4N1; †Department of Microbiology, Medical College of Wisconsin, Milwaukee, Wisconsin 53226; and ‡Department of Microbiology and Immunology, Duke University Medical Center, Durham, North Carolina 27710

Received July 25, 1989; accepted October 6, 1989
Sequence analysis of reovirus serotype 1 (ST1) and 2 (ST2) S1 genome segment cDNAs identified several differences from previously reported versions of their sequences. The sequences reported here comprise 1463 and 1440 base pairs, respectively; for comparison, the ST3 S1 genome segment is 1416 nucleotides long. The serotype 1 and 2 σ1 proteins are predicted to contain 470 and 462 amino acids, respectively; the ST3 σ1 protein is 455 amino acids long. As previously observed, the ST1 and ST2 σ1 proteins are much more closely related to each other than to that of ST3 (about 48 and 25% similarity, respectively, using a computer program that finds about 14% similarity among unrelated proteins). The sequences of the three S1 genome segments have diverged very extensively in all three codon positions, in some cases almost to the extent of randomness. Despite this, not only function but also shape and configuration have been retained (since the three σ1 proteins can be incorporated efficiently into completely heterologous capsids). Seventy-nine amino acid residues are conserved among all three serotypes, many of them clustered into five regions in which one-third or more of the residues are triply conserved. These regions may represent functionally conserved domains involved in oligomerization, cell attachment, and hemagglutination. © 1990 Academic Press, Inc.
1 To whom requests for reprints should be addressed.
2 Abbreviations used: ST, serotype; SDS, sodium dodecyl sulfate; PAGE, polyacrylamide gel electrophoresis; LORF, long open reading frame; SORF, short open reading frame.

0042-6822/90 $3.00
Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.

INTRODUCTION

Mammalian reoviruses are classified into three serotypes (ST1,2 ST2, and ST3) based on neutralization and hemagglutination tests (Rosen, 1960). Genetic reassortment analysis revealed that type-specificity is determined by the S1 genome segment, which encodes the minor outer capsid protein σ1 (Weiner and Fields, 1977). Besides eliciting the formation of neutralizing antibodies and inducing cell-mediated immune responses (Finberg et al., 1979; Fontana and Weiner, 1980; Weiner et al., 1980), protein σ1 also defines tissue tropism (Weiner et al., 1977), a function explained by the fact that it is the cell attachment protein (Lee et al., 1981). It is noteworthy, in this regard, that although the three serotypes manifest different tissue tropisms in whole animals, they apparently bind to the same receptors on mouse L fibroblasts (Lee et al., 1981). Protein σ1 is also the reovirus hemagglutinin since it also interacts with erythrocyte-surface receptors (Weiner et al., 1978; Paul and Lee, 1987; Yeung et al., 1987). Epitope mapping and direct binding studies using truncated σ1 proteins have shown that the host cell binding and hemagglutination functions are controlled by different regions (Burstin et al., 1982; Spriggs et al., 1983; Nagata et al., 1987). Although the exact nature of the interactions between protein σ1 and receptors on host cells and erythrocytes is presently unknown, it is clear that carbohydrate moieties on receptor molecules are important recognition signals in both cases (Armstrong et al., 1984; Gentsch and Pacitti, 1987; Pacitti and Gentsch, 1987).

The σ1 protein is located at the vertices of the viral icosahedron (Lee et al., 1981) and exists in virus particles in tetrameric form (Bassel-Duby et al., 1987; Banerjea et al., 1988). Analysis of its sequence suggests that its amino-terminal portion exists in the form of an α-helical, coiled-coil structure which may be involved in anchoring the protein in the virus particle, while its carboxy-terminal portion is globular and harbors the domains involved in receptor recognition (Bassel-Duby et al., 1985). These predictions have been confirmed by electron microscopic studies of purified protein σ1 (Furlong et al., 1988; Banerjea et al., 1988) and by receptor binding studies using truncated σ1 proteins and tryptic fragments of σ1 (Nagata et al., 1987; Yeung et al., 1989).

As a first step toward understanding the molecular basis of protein σ1-cell recognition, the S1 genome segments of all three reovirus serotypes have been
cloned and sequenced (Nagata et al., 1984; Bassel-Duby et al., 1985; Cashdollar et al., 1985). The results indicate that the ST3 S1 gene is 1416 nucleotides long and contains a long open reading frame (LORF) of 1365 nucleotides which encodes the 455 amino acid long σ1 protein. A second, short overlapping ORF (SORF) of 360 nucleotides encodes the 120 amino acid long σ1s protein (also referred to as σ1bNS or p14), which has been detected not only among the translation products of melted S1 genome segments in in vitro protein synthesizing systems but also in reovirus-infected cells (Ernst and Shatkin, 1985; Jacobs et al., 1985; Sarkar et al., 1985). The ST1 S1 genome segment has been reported to be 1458 or 1462 nucleotides long and to possess an LORF of 1254 nucleotides (418 codons) and an SORF of 357 nucleotides (Cashdollar et al., 1985; Munemitsu et al., 1986). The ST2 S1 genome segment has been reported to be 1442 nucleotides long with an LORF of 1197 nucleotides (399 codons) and an SORF of 375 nucleotides (Cashdollar et al., 1985).

A curious feature of these sequences is that while the ST3 S1 genome segment maximizes its coding potential with a 5'-noncoding region of 12 nucleotides and a 3'-noncoding region of only 36 nucleotides, the ST1 and ST2 S1 genome segments appear to contain relatively long 3'-noncoding regions (188 and 229 nucleotides, respectively). Thus, the predicted molecular weights of the ST1 and ST2 σ1 proteins are smaller than that of the ST3 σ1 protein, which does not agree with their relative molecular weight estimates using SDS-PAGE, where the ST3 σ1 protein migrates faster than either the ST1 or the ST2 σ1 protein (Lee et al., 1981). A second curious feature involves the distribution of conserved sequences among the three genome segments. The largest stretch of such similarity occurs in a 97-nucleotide region located near the 3'-terminus of the plus strands of all three genome segments (Cashdollar et al., 1985). The significance of this similarity is unclear, since in the case of the ST1 and ST2 genome segments this region resides in the 3'-noncoding region.

In view of these incongruities, we undertook to resequence cDNA clones of the ST1 and ST2 S1 genome segments. We report here the sequences that were obtained. The LORFs of the ST1 and ST2 genome segments were found to be 1410 and 1386 nucleotides long, encoding proteins 470 and 462 amino acids long, respectively. These results reconcile the above-mentioned anomalies. We have also analyzed the evolutionary divergence patterns of the three S1 genome segments with respect to nucleotide replacement by codon position and compared this pattern with that of several other reovirus genome segments. It has also been possible to identify several rather well conserved domains that are present in all three σ1 proteins.
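The length bookkeeping behind these anomalies can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that each plus strand consists of a 5'-noncoding region, the LORF (stop codon not counted, since 1365 nucleotides encode 455 residues), a 3-nucleotide stop codon, and a 3'-noncoding region, and it assumes a 13-nucleotide 5'-noncoding region for both ST1 and ST2. The function names are hypothetical.

```python
# Sketch: plus-strand layout = 5' noncoding + LORF + stop codon + 3' noncoding.
# Assumption: the LORF lengths quoted in the text exclude the stop codon.
STOP_CODON_NT = 3

def codons(lorf_nt: int) -> int:
    """Number of amino acids encoded by an ORF of lorf_nt nucleotides."""
    if lorf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return lorf_nt // 3

def three_prime_noncoding(total_nt: int, five_prime_nt: int, lorf_nt: int) -> int:
    """Nucleotides left over downstream of the stop codon."""
    return total_nt - five_prime_nt - lorf_nt - STOP_CODON_NT

# (total length, assumed 5' noncoding length, LORF length)
earlier_reports = {"ST1": (1458, 13, 1254), "ST2": (1442, 13, 1197), "ST3": (1416, 12, 1365)}
this_study = {"ST1": (1463, 13, 1410), "ST2": (1440, 13, 1386), "ST3": (1416, 12, 1365)}

for label, table in (("earlier reports", earlier_reports), ("this study", this_study)):
    for serotype, (total, five_prime, lorf) in table.items():
        print(label, serotype, codons(lorf), "codons,",
              three_prime_noncoding(total, five_prime, lorf), "nt 3' noncoding")
```

Under these assumptions the earlier reports reproduce the 188- and 229-nucleotide 3'-noncoding regions quoted above, whereas the revised LORFs leave 3'-noncoding regions comparable in length to the 36 nucleotides of ST3.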
MATERIALS AND METHODS

Template DNA

The full-length ST2 S1 cDNA clone in pBR322 was the same as that previously sequenced (Cashdollar et al., 1985). It was cleaved with BglII and BamHI (at nucleotides 45 and 1403, respectively) and subcloned in both orientations into M13 mp18. The DNA was sequenced in both directions using the universal sequencing primer (United States Biochemical Corp.) and eight synthetic primers (four primers for each orientation). The primers were synthesized on an Applied Biosystems 380A DNA synthesizer and purified by gel electrophoresis. All sites complementary to primers were sequenced from the opposite strand.

The ST1 S1 cDNA sequenced was a hybrid clone. ST1 S1-specific cDNA was synthesized using a synthetic primer corresponding to nucleotides 14 to 28 and containing an EcoRI site at the 5'-terminus. This procedure was intended to generate cDNA suitable for direct subcloning into a prokaryotic expression vector. The several recombinants that were produced were found to be 3'-terminally truncated. In order to produce a full-length recombinant, the 5'-terminal region of one such clone that extended to the XhoI site at position 253 was ligated to the fragment of the clone previously sequenced (Cashdollar et al., 1985) that extended from this XhoI site to the PstI site at its 3'-terminus. The cDNA was then subcloned into M13 mp18 and mp19 for sequencing. The 13 5'-terminal nucleotides of the ST1 S1 genome segment were not present in this recombinant, and sequence analysis revealed that the 16 3'-terminal nucleotides were also absent. The sequence of this hybrid ST1 S1 genome segment was determined from a series of subclones generated by standard subcloning procedures.

Dideoxy sequencing

Single-stranded, recombinant phage DNA was purified from 1.5-ml cultures grown in Escherichia coli strain DH5α (Bethesda Research Laboratories), using phenol extraction and ethanol precipitation according to standard procedures (Messing, 1983). The single-stranded DNA was used as template for DNA sequencing using the dideoxy chain termination procedure and modified T7 DNA polymerase (United States Biochemical Corp.) according to published procedures (Tabor and Richardson, 1987). The DNA was labeled with [35S]dATP (400 Ci/mmol; Amersham) and fractionated in 33 cm × 40 cm × 0.4 mm 6% acrylamide-8 M urea buffer gradient gels (Biggin et al., 1983) or in standard 4% acrylamide-8 M urea gels. Every base was identified by sequencing in both directions.
[Figure 1: maps of the ST1 (T1) and ST2 (T2) S1 cDNAs drawn 5' to 3' on a scale of 200 to 1400 nucleotides, with restriction sites and sequencing arrows; image not reproduced.]
FIG. 1. The sequencing strategy for determining the sequences of the cDNA versions of the ST1 and ST2 S1 genome segments. The arrows indicate the direction and the length of the sequences determined from a series of subclones of ST1 S1 cDNA or generated using a series of synthetic oligonucleotide primers for ST2 S1 cDNA. The restriction sites that were used are indicated. For further details see Materials and Methods.
Computer analysis

The cDNA sequences were analyzed using the University of Wisconsin Genetics Computer Group programs (Devereux et al., 1984) and the Pustell DNA sequence analysis programs (International Biotechnologies Incorporated; Pustell and Kafatos, 1984).

RESULTS

Sequence analysis of the S1 genome segments of ST1, ST2, and ST3
The sequencing strategies are shown in Fig. 1. As described under Materials and Methods, the 5'-terminal 13 nucleotides of the ST1 S1 genome segment (up to the initiation codon of the σ1 coding region) and the 3'-terminal 16 nucleotides were not present in the clone sequenced in this study, but were determined previously (Li et al., 1980; Cashdollar et al., 1985). Similarly, the 5'-terminal 45 nucleotides (up to the BglII site) and the 3'-terminal 32 nucleotides (from the BamHI site) of the ST2 S1 genome segment were determined previously (Cashdollar et al., 1985) and were not sequenced in this study.

The sequences of the ST1 and ST2 S1 genome segments are presented in Fig. 2, along with the previously published ST3 S1 sequence (Nagata et al., 1984; Bassel-Duby et al., 1985; Cashdollar et al., 1985). The genome segments are 1463, 1440, and 1416 nucleotides long, respectively. They are aligned so as to maximize amino acid identities in all three σ1 proteins as described in the legend to Fig. 3. As reported previously, all three S1 genome segments are bicistronic and contain an LORF that encodes the σ1 protein, as well as an overlapping SORF that encodes the σ1s protein (Nagata et al., 1984; Cashdollar et al., 1985; Bassel-Duby et al., 1985).

The sequences of the ST1 and ST2 genome segments presented here differ in several respects from those reported previously (Cashdollar et al., 1985; Munemitsu et al., 1986). For the ST1 S1 genome segment the differences are as follows: there is a U instead of a G at position 285 (unless otherwise indicated, the numbers indicate the nucleotide positions of the originally reported sequences (Cashdollar et al., 1985)), which results in a gly to val substitution in the σ1 coding region and a val to ile substitution in the σ1s coding region; there is a C instead of a U at position 519, which results in a phe to ser change; there is a U following position 1240 and a C following position 1312, which result in two frame shifts in the σ1 coding region, which now extends to a termination codon at position 1423 (new number), so that the LORF is 470 codons long; and there is a triplet CCC following position 1436 in the 3'-noncoding region. The C at position 519, the addition of the C following position 1312, and the addition of the CCC following position 1436 had also been noted by Munemitsu et al. (1986), who also reported a G to A change at position 1277. For the ST2 S1 genome segment two differences were found: absence of a G following position 1120 and absence of another G following position 1257 (both original numbers), which result in frame shifts so that the coding region extends to a termination codon at position 1399 (new number), thereby causing the ST2 σ1 LORF to be 462 codons long.
As previously reported and discussed in some detail (Cashdollar et al., 1985), the ST1 and ST2 S1 genome segments are much more closely related to each other (57% matches) than to the ST3 S1 genome segment (37 and 39% matches, respectively) (Table 1). These numbers apply to the sequences aligned for maximizing triply conserved amino acid residues as shown in Fig. 3. Corrected for random coincidence (25%), the true similarity percentages are 32, 12, and 14, respectively; that is, the S1 genome segments have diverged 68, 88, and 86%, respectively, toward randomness.

Comparison of the σ1 protein sequences of the three reovirus serotypes

The amino acid sequences of the three σ1 proteins, aligned for maximizing triple identities using the algorithm of Needleman and Wunsch (1970), are presented in Fig. 3. The GenBank Similarity program for optimally
[Figure 2: aligned plus-strand nucleotide sequences of the ST1, ST2, and ST3 S1 genome segments (1463, 1440, and 1416 nucleotides, respectively), shown codon by codon with running position numbers; the sequence listing is not recoverable from this extraction.]
FIG. 2. The sequences of the plus strands of the reovirus ST1, 2, and 3 S1 genome segments. The ST3 sequence is that published previously (Nagata et al., 1984; Cashdollar et al., 1985; Bassel-Duby et al., 1985). The sequences are aligned so as to match the amino acid sequence alignments as shown in Fig. 3.
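The pairwise relatedness figures quoted for the aligned LORFs (57, 37, and 39% matches; 32, 12, and 14% after correction; 68, 88, and 86% divergence) follow from simple arithmetic on the match counts reported in Table 1. The sketch below assumes, as the reported numbers imply, that the correction is a plain subtraction of the 25% random-match baseline from the rounded percentage; the function names are illustrative.

```python
# Sketch: percentage matches, baseline-corrected similarity, and divergence
# for the three pairwise LORF comparisons (match counts from Table 1).
RANDOM_BASELINE = 25  # percent matches expected by chance for four nucleotides

def percent_matches(matches: int, aligned_positions: int) -> int:
    """Percentage of aligned positions that match, rounded to a whole percent."""
    return round(100 * matches / aligned_positions)

def corrected_similarity(pct: int, baseline: int = RANDOM_BASELINE) -> int:
    """Similarity above the chance baseline (assumed to be a plain subtraction)."""
    return pct - baseline

lorf_matches = {"1:2": (802, 1413), "1:3": (535, 1428), "2:3": (547, 1410)}

for pair, (matches, aligned) in lorf_matches.items():
    pct = percent_matches(matches, aligned)
    true_pct = corrected_similarity(pct)
    print(f"ST{pair}: {pct}% matches, {true_pct}% corrected, {100 - true_pct}% diverged")
```

The same subtraction with a 14% baseline gives the corrected amino acid similarities reported for the σ1 proteins.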
aligning the ST1 and 2 σ1 proteins (the more closely related pair) and then aligning the ST3 σ1 sequence for optimizing triple identities yields a slightly different alignment scheme with 80 triple identities instead of 79. Since this is not significantly different, and since the Needleman and Wunsch program introduces slightly fewer insertions/deletions than the Similarity program, it was preferred.

TABLE 1
RELATEDNESS PATTERNS AMONG THE LORFs OF THE THREE S1 GENOME SEGMENTS AND AMONG THE THREE σ1 PROTEINS(a)

                                             Serotype pair
                                       1:2         1:3         2:3
Nucleotide sequences
  Number of matches                  802/1413    535/1428    547/1410
  Percentage matches                    57          37          39
  "True" percentage similarity(b)       32          12          14
Amino acid sequences
  Number of matches                  227/471     118/475     117/468
  Percentage matches                    48          25          25
  "True" percentage similarity(c)       34          11          11

(a) On the basis of sequences aligned as shown in Fig. 3.
(b) Corrected for 25% random matches.
(c) Corrected for the fact that the alignment programs used find about 14% similarity in unrelated sequences of similar length.

It should be noted that unrelated sequences of the same length yield 10-13 triple identities when aligned with the same programs (and only about 1 when not aligned). Thus the three σ1 proteins are very significantly related. The relatedness of the three σ1 proteins is analyzed further in Table 1, which indicates that, aligned in the manner described, the ST1 and 2 σ1 proteins are 48% related to each other, and 25% each to the ST3 σ1 protein. These figures are similar to those presented before (Wiener and Joklik, 1989). It should be noted that alignment programs of the type used here find about 14% similarity among unrelated proteins of similar length. Thus the true or functional extents of similarity are 34% for the ST1/2 σ1 protein pair and 11% for the ST1/3 and ST2/3 σ1 protein pairs. These figures are strikingly similar to those found above for the three S1 genome segment pairs. An analysis of the evolutionary divergence of nucleotides in each of the three codon positions will be presented below.

The amino acid compositions of the three σ1 proteins are presented in Table 2. They are slightly acidic proteins, rather low in his, and contain only one cys. Since all three σ1 proteins compete for the same receptor on L cells (Lee et al., 1981) and are capable of assembling into virus particles composed of heterologous capsid proteins (Weiner et al., 1978), certain functional domains such as those specifying cell attachment, hemagglutination, oligomerization, and association with the projections/spikes composed of protein
[Figure 3: three-way alignment of the ST1 (T1), ST2 (T2), and ST3 (T3) σ1 amino acid sequences, ending at residues 470, 462, and 455, with the conserved domains A to E marked; the alignment itself is not recoverable from this extraction.]
FIG. 3. The amino acid sequences of the σ1 proteins of reovirus serotypes 1, 2, and 3. The sequences were optimally aligned using the algorithm of Needleman and Wunsch (1970). Deletions are indicated by (.); amino acids triply conserved are indicated by (*). Regions designated A, B, C, D, and E represent conserved domains.
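For readers unfamiliar with the alignment method named in the legend, a minimal pairwise sketch of Needleman and Wunsch global alignment follows. It is not the program used in this study: the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, and the three-way alignment that maximizes triple identities requires additional steps not shown here. Gaps are written as (.) to match the figure convention.

```python
# Sketch of pairwise global alignment by dynamic programming
# (Needleman and Wunsch, 1970). Scoring parameters are illustrative assumptions.
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("."); i -= 1
        else:
            out_a.append("."); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

# Two ten-residue fragments taken from the σ1s sequences of ST1 and ST2 (Fig. 5).
aligned_a, aligned_b, total = needleman_wunsch("RAYPHWVTES", "RVYPSWVTES")
print(aligned_a)
print(aligned_b)
print("score:", total)
```

Because the alignment is global, every residue of both sequences appears in the output, and the choice of gap penalty determines how readily insertions/deletions are introduced.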
λ2 (Lee et al., 1981) are presumably conserved among the three σ1 protein sequences. In fact, five 22- to 34-residue-long domains of rather extensive sequence conservation can be discerned. They are labeled A to E in Fig. 3. In region D 46% of the amino acid residues are shared by all three proteins; for each of the other four regions, this fraction is 32% (compared to an overall similarity of 17% for the three proteins).

The tertiary structures of the three σ1 proteins resemble each other. This is best shown by plotting their α-helix contents as predicted by the algorithm of Garnier et al. (1978) (Fig. 4). All three proteins have an α-helix content of about 17%, a β-sheet content of about 37%, a random coil content of about 30%, and a turn content of about 16%. The ratio of residues in α-helix to β-sheet configuration (0.465) is the lowest of any
TABLE 2

                                  σ1 proteins                          σ1s proteins
Amino acid               ST1     ST2     ST3   % for ST3      ST1     ST2     ST3   % for ST3
A                         20      25      29      6.4           6       6       2      1.7
R                         26      30      29      6.4          13       9       9      7.5
N                         33      35      28      6.2           5       6       5      4.2
D                         25      25      28      6.2           5       5       6      5.0
C                          1       1       1      0.2           0       1       1      0.8
Q                         25      17      20      4.4           6       7       9      7.5
E                         21      23      17      3.7           5       8       5      4.2
G                         38      37      38      8.4           2       1       4      3.3
H                          2       5       5      1.1           3       2       4      3.3
I                         30      34      31      6.8           7       7       5      4.2
L                         46      48      53     11.6          16      18      20     16.7
K                         12      10       7      1.5           4       6       5      4.2
M                         11       7       8      1.8           4       5       6      5.0
F                         13      15      12      2.6           0       0       0      0
P                         11      12      14      3.1           5       5       4      3.3
S                         53      45      52     11.4          16      17      17     14.2
T                         47      42      37      8.1           7       8       6      5.0
W                          8       7       5      1.1           6       4       5      4.2
Y                         10       6       8      1.8           3       5       2      1.7
V                         38      38      33      7.3           6       5       5      4.2

Total                    470     462     455                  119     125     120

Acidic (D + E)            46      48      45      9.9          10      13      11      9.2
Basic (R + K + H)         40      45      41      9.0          20      17      18     15.0
Aromatic (F + W + Y)      31      28      25      5.5           9       9       7      5.8
Hydrophobic
 (Aromatic + I + L
  + M + V)               156     155     150     33.0          42      44      43     35.8

Molecular weight      51,427  50,471  49,101               13,990  14,561  13,993
reovirus protein. Not surprisingly, the α-helix plots for the serotype 1 and 2 proteins resemble each other more closely than either resembles that of the ST3 protein. For the former two proteins about two-thirds of the α-helices are in the N-terminal one-third of the molecule; for the ST3 σ1 protein this fraction is about 90%.

Comparison of the three σ1s protein sequences

The sequences of the three σ1s proteins, aligned for the maximum number of triply conserved amino acids by the Needleman and Wunsch algorithm, are shown
in Fig. 5. Eighteen amino acids are conserved among all three proteins (overall similarity of 15%); most of them are located in two relatively conserved domains, A and B, in which they account for 27 and 33% of the total number of amino acids, respectively. Although the fraction of triply conserved amino acids in the σ1s proteins is almost the same as in the σ1 proteins (15 and 17%, respectively), the σ1s proteins are significantly less related than the σ1 proteins. Thus the ST1 and 2, ST1 and 3, and ST2 and 3 σ1s proteins share only 37, 18, and 23% amino acids, respectively; since these sequences were aligned according to the Needleman and Wunsch algorithm, the "true" similarity percentages are only about 23, 4, and 9%, respectively (see above). All three σ1s proteins are highly basic, with isoelectric points ranging from 9.83 to 11.85. Their secondary structures, however, as predicted by the algorithm of Garnier et al. (1978), are surprisingly different. According to this algorithm the percentages of amino acid residues in α-helix (A), β-sheet (B), random coil (C), and turn (T) configuration for the three proteins are A 28, 30, 20; B 23, 21, 25; C 25, 25, 28; and T 24, 24, 28, respectively. The ratios of amino acids in α-helix to β-sheet configuration for the three proteins are 1.22, 1.46, and 0.8, respectively, values that are remarkably different.

FIG. 4. The α-helix contents of the three σ1 proteins as predicted by the algorithm of Garnier et al. (1978). [Plot not reproduced; horizontal axis: amino acid position, 50 to 400.]

    A
T1  MAPS-Q---- KKSRKSRNKS RSTLMISGLP                          25
T2  --MENQPTRN TRSRKLRNKL KTSLLMSTGS                          28
T3  MEHHCQKGLN QGSRRSRRRL KYTLILSSGS                          30
         *       **  *       *  *

T1  ILNSTDLEDR LLTSAIASQP LSQDWVRWII DLWVSRVRSR NYLTQLARTL    75
T2  VTSLIQSKDN WVDYLYACQP LNRELVRTAI ELIDSSEMSP AYRLALAESI    78
T3  PRDSMMQTNE SSLLSKVGMT WLHQSVMLNL QSPDWKALSE PSKQLSMDLI    80
                               *             *

    B
T1  RAYPHWVTES MLSNHELTVW IRSRLISLDE HPLWRQMLEA YGQN---      119
T2  RVYPSWVTES MLQNSELASW IQSRIISLSE HQDWKLKYQP LLMTLDH      125
T3  RVLPSWVLEW DNLRQDLQTY ALITTISLRE WILQNVTLDH -------      120
    *  * ** *        *         *** *

FIG. 5. The amino acid sequences of the σ1s proteins of reovirus serotypes 1, 2, and 3. The proteins were optimally aligned using the algorithm of Needleman and Wunsch (1970). Amino acids triply conserved are indicated by (*).

DISCUSSION

The S1 genome segment sequences reported here indicate that the σ1 proteins of the three reovirus serotypes possess 470, 462, and 455 amino acids, respectively, with predicted molecular weights of about 51.5K, 50.5K, and 49K. These results explain what was previously considered to be anomalous σ1 protein migration behavior in SDS polyacrylamide gels. The previously reported sequences (Cashdollar et al., 1985; Munemitsu et al., 1986) predicted the ST3 σ1 protein to be the largest of the three σ1 proteins; the modified sequences reported here correctly predict that the ST1 σ1 protein is slightly larger than the ST2 σ1 protein, which in turn is slightly larger than the ST3 σ1 protein.

The major point of interest of these sequences is that although the three σ1 proteins have diverged very extensively (Table 3) they still share 79 out of about 470 amino acid residues. Interestingly, these residues are significantly clustered. If one ignores clusters of fewer than five triply conserved residues and places a lower limit of 30% on the triply conserved residue content of any given sequence stretch, one finds that 49 of the 79 triply conserved residues are located in five domains that are labeled A-E in Fig. 3. In one of these domains, D, no fewer than 46% of residues are common to all three σ1 proteins in a 34-residue-long region. Remarkably, these five conserved domains do not coincide with regions in which conservative amino acid replacements (i.e., D or E; A, S, or T; F, Y, or W; etc.) are particularly frequent. There are five such replacements in domain A, four in B, three in E, and only two in C and D; in each case, these numbers are lower, mostly much lower, than the number of identical residues. When conservative amino acid replacements are included, the five domains exhibit residue similarity contents that vary from 39 (domain C) to 55% (domain B).

Little can be surmised concerning the functions specified by the five conserved domains. Domain A is predominantly α-helical in nature; it represents the C-terminal limit of the predicted coiled-coil structure (Bassel-Duby et al., 1985; Furlong et al., 1988) that is the hallmark of the N-terminal portion of all three σ1 proteins. This region may play a key role in stabilizing σ1 tetramer formation, but is too far from the amino terminus to play a role in anchoring σ1 onto the virion (Mah and Lee, unpublished observations). Domain B exists predominantly in a random coil/β-sheet configuration with several turns. This region may play a role in the hemagglutinating function because truncated σ1 lacking residues 123-223 does not bind glycophorin, the reovirus receptor on human erythrocytes, but retains the ability to bind to L cells (Nagata et al., 1987). Domains C, D, and E all reside in the carboxy-terminal region of protein σ1 previously shown to harbor the host cell attachment domain.
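Counts of triply conserved residues such as those discussed above can be reproduced directly from an aligned set of sequences. The sketch below uses the σ1s alignment of Fig. 5 as input, because it is short enough to type in full; the sequences are transcribed from that figure as extracted here, so individual residues should be treated as provisional. The function name is illustrative.

```python
# Sketch: count alignment columns in which all three sequences carry the
# same residue (gap columns never count). Input: σ1s alignment of Fig. 5.
def triple_identities(a: str, b: str, c: str, gap: str = "-") -> int:
    if not (len(a) == len(b) == len(c)):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y, z in zip(a, b, c) if x == y == z and x != gap)

# Each block lists the T1, T2, and T3 rows; spaces are only for readability.
blocks = [
    ("MAPS-Q---- KKSRKSRNKS RSTLMISGLP",
     "--MENQPTRN TRSRKLRNKL KTSLLMSTGS",
     "MEHHCQKGLN QGSRRSRRRL KYTLILSSGS"),
    ("ILNSTDLEDR LLTSAIASQP LSQDWVRWII DLWVSRVRSR NYLTQLARTL",
     "VTSLIQSKDN WVDYLYACQP LNRELVRTAI ELIDSSEMSP AYRLALAESI",
     "PRDSMMQTNE SSLLSKVGMT WLHQSVMLNL QSPDWKALSE PSKQLSMDLI"),
    ("RAYPHWVTES MLSNHELTVW IRSRLISLDE HPLWRQMLEA YGQN---",
     "RVYPSWVTES MLQNSELASW IQSRIISLSE HQDWKLKYQP LLMTLDH",
     "RVLPSWVLEW DNLRQDLQTY ALITTISLRE WILQNVTLDH -------"),
]

per_block = [triple_identities(*(row.replace(" ", "") for row in block))
             for block in blocks]
print(per_block, sum(per_block))
```

The same function applied to the full σ1 alignment would give the triple-identity count for those proteins; windowing the per-column results is one way to locate clusters of conservation.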
Although the exact nature of the protein σ1-receptor interaction is unclear, recent evidence has provided interesting clues. The observation that the three reovirus serotypes compete for binding to mouse L cell receptors (Lee et al., 1981) suggests that a common recognition mechanism is involved. An additional clue came from the finding that carbohydrate moieties, sialic acid in particular, on the surface of L cells play an important role in reovirus cell attachment (Armstrong et al., 1984; Gentsch and Pacitti, 1987; Pacitti and Gentsch, 1987). Recent studies in our laboratory have further shown that reovirus is capable of binding to sialic acid alone (Paul et al., 1989). Thus it appears very likely that a sialic acid binding domain is present in all three σ1 proteins. On the other hand, Fields and co-workers (Weiner et al., 1977) have demonstrated that different tissue tropisms manifested by the three reovirus serotypes are a function of the σ1 protein, which suggests the presence of distinct cell receptors. It is likely, therefore, that in addition to the protein-sialic acid interaction, another type of interaction, one more specific in nature, is involved in the recognition of receptors by σ1 under certain circumstances. This viewpoint has led us to speculate that domains that are conserved among the three serotypes are involved in the recognition of the common L cell receptor and possibly represent a sialic acid-binding site. At the same time, certain regions on σ1 may have diverged sufficiently to create unique, serotype-specific cell attachment domains that are responsible for the observed distinct tissue tropisms of ST1 and ST3 reovirus. In this regard, it is interesting to note that, through the use of anti-idiotypic antibodies and synthetic peptides, the region encompassing residues 317 to 322 has been implicated as the ST3-specific receptor recognition domain (Bruck et al., 1986; Williams et al., 1988). Paradoxically, this region lies in domain C. It is also interesting in this regard that recent deletion mapping studies have shown that removal of regions D and E destroys L cell binding (Nagata et al., 1987). These regions may therefore be involved in attachment to the common L cell receptor or in binding to sialic acid. Further studies are currently aimed at elucidating the role of these domains in sialic acid binding, receptor recognition, membrane penetration, and protein folding.

TABLE 3
ANALYSIS OF NUCLEOTIDE MISMATCHES BETWEEN THE THREE S1 GENOME SEGMENTS BY CODON POSITION

A. Distribution of mismatches among codon positions

                                 Serotype pair
Percentage mismatches in         1:2    1:3    2:3
First base codon position        32     34     35
Second base codon position       23     27     28
Third base codon position        46     38     37

B. Number/percentage of mismatches in each codon position

                                 Serotype pair 1:2       Serotype pair 1:3       Serotype pair 2:3
Nucleotide mismatches in         Number    Percentage    Number    Percentage    Number    Percentage
All base codon positions         611                     893                     863
First base codon position        193/471   41            307/476   64            303/470   64
Second base codon position       139/471   30            244/476   51            240/470   51
Third base codon position        279/471   59            342/476   72            320/470   68
The second major point of interest of the sequences reported here is that they permit a detailed analysis of the relatedness of the three S1 genome segments and of the proteins that they encode. Table 3 presents an analysis of nucleotide mismatches between the three S1 genome segments by and within codon positions. The distribution of mismatches among the three codon positions is remarkable. Whereas for all other genome segments that we have analyzed (S3, M2, and L1) (Wiener and Joklik, 1989) the overwhelming proportion of mismatches has been in third base codon positions (from 69 to 88%; average for the three serotype pairs, 75% for the S3 genome segment, 87% for the M2 genome segment, and 79% for the L1 genome segment), the corresponding figure here is 40%; and whereas for the other three genome segments an average of only 15 and 4% of mismatches are in first and second base codon positions, respectively, for the three S1 genome segments these values are 34 and 26%, respectively. Since there is no reason for supposing that certain individual reovirus genome segments are more unstable genetically than others, this indicates, in all probability, that the σ1 protein is far more able to accept alterations in its sequence without losing function than the other reovirus proteins that have been analyzed, namely σNS, μ1, and λ3. Table 3 also lists the percentage mismatches in each codon position for each of the three serotype pairs. Since 25% matches would be found when comparing completely unrelated sequences, it is informative to express mismatches as a percentage of (codons × 0.75), which provides an estimate of the extent of divergence toward complete randomness. These figures are presented in Table 4, and they may be compared with the figures in Tables 5 and 7 of a previous publication (Wiener and Joklik, 1989), which also presented figures for many of the other genome segments. It is clear that the third base codon positions have diverged to an extreme extent for the S1 genome segment, more than for any other reovirus genome segment. For example, for the S1 genome segments the third base codon positions of the most closely related serotype pair have diverged 79%, whereas for the S3, M2, and L1 genome segments the third base codon positions of the most closely related serotype pairs have diverged only 48, 53, and 13%. The difference in evolutionary divergence patterns is even greater for the first and second base codon positions, where the highest divergence figure for any first base codon position for the S3, M2, and L1 genome segments is 25%, and for the second base codon position, 8%; by contrast, the corresponding figures for the S1 genome segments vary from 40 to 85%. Clearly evolutionary divergence has been much more extensive for the S1 genome segment than for any other reovirus genome segment, and the most likely reason is that the structure of the σ1 protein has been far more tolerant of sequence change. It is also of considerable interest to note that whereas for the S3 and M2 genome segments the majority of mismatches for the most closely related serotype pairs are transitions (81.5%) and are also significantly biased in favor of transitions for the less closely related serotype pairs (about 52%), for the S1 genome segment the percentage of transition mismatches for the most closely related serotype 1/serotype 2 pair is only slightly above that expected by chance (43%), and that for the other two serotype pairs is 40 and 35%, respectively.

TABLE 4
EVOLUTIONARY DIVERGENCE PATTERN OF THE THREE S1 GENOME SEGMENTS^a

                                 Serotype pair
                                 1:2    1:3    2:3
First base codon position        55     85     85
Second base codon position       40     68     68
Third base codon position        79     96     91

^a For details of calculations, see text.
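The relation between Table 3, part B, and Table 4 can be checked with a few lines of arithmetic. The sketch below is an illustration and not the authors' program; the variable and function names are introduced here. It divides the rounded percentages of Table 3, part B, by 0.75, which reproduces the Table 4 entries; starting from the raw mismatch counts instead gives values within one percentage point of them. That the published figures were derived from the rounded percentages is an inference from this arithmetic, not something the text states.

```python
RANDOM_MISMATCH_FRACTION = 0.75  # unrelated sequences still match at 25% of positions

# Table 3, part B: codons compared, mismatch counts, and published (rounded)
# mismatch percentages for the first, second, and third base codon positions.
TABLE_3B = {
    "1:2": {"codons": 471, "counts": (193, 139, 279), "percent": (41, 30, 59)},
    "1:3": {"codons": 476, "counts": (307, 244, 342), "percent": (64, 51, 72)},
    "2:3": {"codons": 470, "counts": (303, 240, 320), "percent": (64, 51, 68)},
}


def divergence_from_percent(percent):
    """Mismatch percentage rescaled so that 75% mismatch equals 100% divergence."""
    return round(percent / RANDOM_MISMATCH_FRACTION)


def divergence_from_counts(count, codons):
    """Mismatches as a percentage of (codons x 0.75), from raw counts."""
    return round(100.0 * count / (codons * RANDOM_MISMATCH_FRACTION))


table_4 = {
    pair: tuple(divergence_from_percent(p) for p in row["percent"])
    for pair, row in TABLE_3B.items()
}
from_counts = {
    pair: tuple(divergence_from_counts(c, row["codons"]) for c in row["counts"])
    for pair, row in TABLE_3B.items()
}

for pair in TABLE_3B:
    print(pair, table_4[pair], from_counts[pair])
```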
Thus, the more closely related the genome segments, the higher the proportion of transitions among mismatches. We have argued, from the mismatch percentages in third base codon positions where most of the mismatches are neutral, that the various reovirus genome segments started diverging at different times, with the S1 genome segments having diverged for the longest period of time (Wiener and Joklik, 1989). It is known that polymerase errors result mostly in transitions (see, for example, Kuge et al., 1989) and that many chemical mutagenic processes and mutagens like deamination, alkylation, nitrosoguanidine, and bisulfite also cause primarily transitions. The nature of the mechanism(s) that induce(s) transversions after the initial accumulation of transitions is not known.
There remains the interesting question of the coiled-coil configuration of the amino-terminal one-third portion of protein σ1, the region in which extensive conformational similarity is essential since it is this region that is thought to be inserted into the projections or spikes composed of protein λ2 and that is responsible for anchoring the σ1 tetramers, the equivalents of the adenovirus fiber, to them. Starting with residue 28, the ST1 σ1 protein has 21 heptads in 18 of which every 7th amino acid is I, L, or V. For the ST2 σ1 protein there are 20 such heptads, and only 2 of the relevant residues are not I, L, or V; and in the ST3 σ1 protein there are also 21 such heptads, and only 5 of the relevant residues are not I, L, or V. In 14 of the 21 heptads in ST1 and ST3 σ1 each 4th residue is I, L, or V also; and the same is true for 15 of the 20 ST2 σ1 heptads. Interestingly, none of the 20 or so residues that form the heptad boundaries are conserved among all three proteins; all result from conservative amino acid replacements. The retention of a highly specific configuration, namely the coiled-coil structure, over a large portion (more than 150 amino acids) of protein σ1 provides a fascinating example of structural and functional conservation. No doubt further analysis will provide insight into the nature of the evolutionary mechanisms used to accomplish this.
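The heptad tally described above (counting heptads in which a given position is occupied by I, L, or V) can be sketched as follows. This is an illustration only: the sequence is an invented toy string and not a σ1 sequence, and the function name, the 1-based start convention, and the choice of which residue within each heptad is examined (the `offset` parameter) are assumptions introduced here.

```python
HYDROPHOBIC = frozenset("ILV")


def heptad_hits(seq, start, n_heptads, offset=0):
    """Count heptads whose residue at position `offset` (0-6) is I, L, or V.

    `start` is the 1-based residue number at which the first heptad begins.
    """
    if not 0 <= offset <= 6:
        raise ValueError("offset must be between 0 and 6")
    begin = start - 1
    if begin + 7 * n_heptads > len(seq):
        raise ValueError("sequence too short for the requested heptads")
    hits = 0
    for h in range(n_heptads):
        if seq[begin + 7 * h + offset] in HYDROPHOBIC:
            hits += 1
    return hits


# Invented toy sequence (hypothetical): two leading residues, then four heptads.
TOY = "MA" + "LKEVQDS" + "IRNLAET" + "VSQKENA" + "AEDIRKT"

first_position_hits = heptad_hits(TOY, start=3, n_heptads=4)             # L, I, V, A
fourth_position_hits = heptad_hits(TOY, start=3, n_heptads=4, offset=3)  # V, L, K, I
print(first_position_hits, fourth_position_hits)
```

Applied to a real σ1 sequence, the same function would be called with `start=28` and the heptad counts quoted in the text; the sequences themselves are not reproduced in this section.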
ACKNOWLEDGMENT
We thank Jon Wiener for running computer searches and Mike Roner for help in formatting Fig. 2. This work was supported by the Medical Research Council of Canada. L.W.C. was supported by a grant from the National Science Foundation (DCB-8518044). R.D. and D.H. were fellows of the Alberta Heritage Foundation for Medical Research (AHFMR). P.W.K.L. is an AHFMR Scholar.
REFERENCES
ARMSTRONG, G. D., PAUL, R. W., and LEE, P. W. K. (1984). Studies on reovirus receptors of L cells: Virus binding characteristics and comparison with reovirus receptors of erythrocytes. Virology 138, 37-48.
BANERJEA, A. C., BRECHLING, K. A., RAY, C. A., ERICKSON, H., PICKUP, D. T., and JOKLIK, W. K. (1988). High-level synthesis of biologically active reovirus protein σ1 in a mammalian expression vector system. Virology 167, 601-612.
BASSEL-DUBY, R., JAYASURIYA, A., CHATTERJEE, S., SONENBERG, N., MAIZEL, J. V., and FIELDS, B. N. (1985). Sequence of reovirus hemagglutinin predicts a coiled-coil structure. Nature (London) 315, 421-423.
BASSEL-DUBY, R., NIBERT, M. L., HOMCY, C. J., FIELDS, B. N., and SAWUTZ, D. G. (1987). Evidence that the σ1 protein of reovirus serotype 3 is a multimer. J. Virol. 61, 1834-1841.
BIGGIN, M. D., GIBSON, T. J., and HONG, G. F. (1983). Buffer gradient gels and 35S label as an aid to rapid DNA sequence determination. Proc. Natl. Acad. Sci. USA 80, 3963-3965.
BRUCK, C., CO, M. S., SLAOUI, M., GAULTON, G. N., SMITH, T., FIELDS, B. N., MULLINS, J. I., and GREENE, M. I. (1986). Nucleic acid sequence of an internal image-bearing monoclonal anti-idiotype and its comparison to the sequence of the external antigen. Proc. Natl. Acad. Sci. USA 83, 6578-6582.
BURSTIN, S. J., SPRIGGS, D. R., and FIELDS, B. N. (1982). Evidence for functional domains on the reovirus type 3 hemagglutinin. Virology 117, 146-155.
CASHDOLLAR, L. W., CHMELO, R. A., WIENER, J. R., and JOKLIK, W. K. (1985). Sequence of the S1 genes of the three serotypes of reovirus. Proc. Natl. Acad. Sci. USA 82, 24-28.
DEVEREUX, J., HAEBERLI, P., and SMITHIES, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12, 387-395.
ERNST, H., and SHATKIN, A. J. (1985). Reovirus hemagglutinin mRNA codes for two polypeptides in overlapping reading frames. Proc. Natl. Acad. Sci. USA 82, 48-52.
FINBERG, R., WEINER, H. L., FIELDS, B. N., BENACERRAF, B., and BURAKOFF, S. J. (1979). Generation of cytolytic T lymphocytes after reovirus infection: Role of S1 gene. Proc. Natl. Acad. Sci. USA 76, 442-446.
FONTANA, A., and WEINER, H. L. (1980). Interaction of reovirus with cell surface receptors. II. Generation of suppressor T cells by hemagglutinin of reovirus type 3. J. Immunol. 125, 2660-2664.
FURLONG, D. B., NIBERT, M. L., and FIELDS, B. N. (1988). σ1 protein of mammalian reoviruses extends from the surfaces of viral particles. J. Virol. 62, 246-256.
GARNIER, J., OSGUTHORPE, D. J., and ROBSON, B. (1978). Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120, 97-120.
GENTSCH, J. R., and PACITTI, A. F. (1987). Effect of neuraminidase treatment of cells and effect of soluble glycoproteins on type 3 reovirus attachment to murine L cells. J. Virol. 56, 356-364.
JACOBS, B. L., ATWATER, J. A., MUNEMITSU, J. M., and SAMUEL, C. E. (1985). Biosynthesis of reovirus-specified polypeptides: The S1 mRNA synthesized in vivo is structurally and functionally indistinguishable from in vitro-synthesized S1 mRNA and encodes two polypeptides, σ1a and σ1bNS. Virology 147, 9-18.
KUGE, S., KAWAMURA, N., and NOMOTO, A. (1989). Strong inclination toward transition mutation in nucleotide substitutions by poliovirus replicase. J. Mol. Biol. 207, 175-182.
LEE, P. W. K., HAYES, E. C., and JOKLIK, W. K. (1981). Protein σ1 is the reovirus cell attachment protein. Virology 108, 156-163.
LI, J. K.-K., SCHEIBLE, P. P., KEENE, J. D., and JOKLIK, W. K. (1980). Nature of the 3'-terminal sequences of the plus and minus strands of the S1 gene of reovirus serotypes 1, 2, and 3. Virology 105, 41-51.
MESSING, J. (1983). New M13 vectors for cloning. In “Methods in Enzymology” (R. Wu, L. Grossman, and K. Moldave, Eds.), Vol. 101, pp. 20-78. Academic Press, New York.
MUNEMITSU, J.-M., ATWATER, J. A., and SAMUEL, C. E. (1986). Biosynthesis of reovirus-specified polypeptides: Molecular cDNA cloning and nucleotide sequence of the reovirus serotype 1 Lang strain bicistronic s1 mRNA which encodes the minor capsid polypeptide σ1a and the nonstructural polypeptide σ1bNS. Biochem. Biophys. Res. Commun. 140, 508-514.
NAGATA, L., MASRI, S. A., MAH, D. C. W., and LEE, P. W. K. (1984). Molecular cloning and sequencing of the reovirus (serotype 3) S1 gene which encodes the viral cell attachment protein σ1. Nucleic Acids Res. 12, 8699-8710.
NAGATA, L., MASRI, S. A., PON, R. T., and LEE, P. W. K. (1987). Analysis of functional domains on reovirus cell attachment protein σ1 using cloned S1 gene deletion mutants. Virology 160, 162-168.
NEEDLEMAN, S. B., and WUNSCH, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443-453.
PACITTI, A. F., and GENTSCH, J. R. (1987). Inhibition of reovirus type 3 binding to host cells by sialylated glycoproteins is mediated through the viral attachment protein. J. Virol. 61, 1407-1415.
PAUL, R. W., and LEE, P. W. K. (1987). Glycophorin is the reovirus receptor on human erythrocytes. Virology 159, 94-101.
PAUL, R. W., CHOI, A. H. C., and LEE, P. W. K. (1989). The α-anomeric form of sialic acid is the minimal determinant recognized by reovirus. Virology 172, 382-385.
PUSTELL, J., and KAFATOS, F. C. (1984). A convenient and adaptable package of computer programs for DNA and protein sequence management, analysis, and homology determination. Nucleic Acids Res. 12, 643-655.
ROSEN, L. (1960). Serologic grouping of reoviruses by hemagglutination-inhibition. Amer. J. Hyg. 71, 242-249.
SARKAR, G., PELLETIER, J., BASSEL-DUBY, R., JAYASURIYA, A., FIELDS, B. N., and SONENBERG, N. (1985). Identification of a new polypeptide coded by reovirus gene S1. J. Virol. 54, 720-725.
SPRIGGS, D. R., KAYE, K., and FIELDS, B. N. (1983). Topological analysis of the reovirus type 3 hemagglutinin. Virology 127, 220-224.
TABOR, S., and RICHARDSON, C. C. (1987). DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proc. Natl. Acad. Sci. USA 84, 4767-4771.
WEINER, H. L., DRAYNA, D., AVERILL, D. R., JR., and FIELDS, B. N. (1977). Molecular basis of reovirus virulence: Role of the S1 gene. Proc. Natl. Acad. Sci. USA 74, 5744-5748.
WEINER, H. L., and FIELDS, B. N. (1977). Neutralization of reovirus: The gene responsible for the neutralization antigen. J. Exp. Med. 146, 1305-1310.
WEINER, H. L., RAMIG, R. F., MUSTOE, T. A., and FIELDS, B. N. (1978). Identification of the gene coding for the hemagglutinin of reovirus. Virology 86, 581-584.
WEINER, H. L., GREENE, M. I., and FIELDS, B. N. (1980). Delayed hypersensitivity in mice infected with reovirus. I. Identification of host and viral gene products responsible for the immune response. J. Immunol. 125, 278-282.
WIENER, J. R., and JOKLIK, W. K. (1989). The sequences of the reovirus serotype 1, 2, and 3 L1 genome segments and analysis of the mode of divergence of the reovirus serotypes. Virology 169, 194-203.
WILLIAMS, W. V., GUY, H. R., RUBIN, D. H., ROBEY, F., MYERS, J. M., KIEBER-EMMONS, T., WEINER, D. B., and GREENE, M. I. (1988). Sequences of the cell-attachment sites of reovirus type 3 and its anti-idiotypic/antireceptor antibody: Modeling of their three-dimensional structures. Proc. Natl. Acad. Sci. USA 85, 6488-6492.
YEUNG, M. C., GILL, M. J., SULEIMAN, S. A., SHAHRABADI, M. S., and LEE, P. W. K. (1987). Purification and characterization of the reovirus cell attachment protein σ1. Virology 156, 377-385.
YEUNG, M. C., LIM, D., DUNCAN, R., SHAHRABADI, M. S., CASHDOLLAR, L. W., and LEE, P. W. K. (1989). The cell attachment proteins of type 1 and type 3 reovirus are differentially susceptible to trypsin and chymotrypsin. Virology 170, 62-70.