Identification of conserved domains in the cell attachment proteins of the three serotypes of reovirus


VIROLOGY 174, 399-409 (1990)

ROY DUNCAN,* DUFF HORNE,* L. WILLIAM CASHDOLLAR,† WOLFGANG K. JOKLIK,‡ AND PATRICK W. K. LEE*,¹

*Department of Microbiology and Infectious Diseases, University of Calgary Health Sciences Centre, Calgary, Alberta, Canada T2N 4N1; †Department of Microbiology, Medical College of Wisconsin, Milwaukee, Wisconsin 53226; and ‡Department of Microbiology and Immunology, Duke University Medical Center, Durham, North Carolina 27710

Received July 25, 1989; accepted October 6, 1989

Sequence analysis of reovirus serotype 1 (ST1) and 2 (ST2) S1 genome segment cDNAs identified several differences from previously reported versions of their sequences. The sequences reported here comprise 1463 and 1440 base pairs, respectively; for comparison, the ST3 S1 genome segment is 1416 nucleotides long. The serotype 1 and 2 σ1 proteins are predicted to contain 470 and 462 amino acids, respectively; the ST3 σ1 protein is 455 amino acids long. As previously observed, the ST1 and ST2 σ1 proteins are much more closely related to each other than to that of ST3 (about 48 and 25% similarity, respectively, using a computer program that finds about 14% similarity among unrelated proteins). The sequences of the three S1 genome segments have diverged very extensively in all three codon positions, in some cases almost to the extent of randomness. Despite this, not only function but also shape and configuration have been retained, since the three σ1 proteins can be incorporated efficiently into completely heterologous capsids. Seventy-nine amino acid residues are conserved among all three serotypes, many of them clustered into five regions in which one-third or more of the residues are triply conserved. These regions may represent functionally conserved domains involved in oligomerization, cell attachment, and hemagglutination. © 1990 Academic Press, Inc.

¹ To whom requests for reprints should be addressed.
² Abbreviations used: ST, serotype; SDS, sodium dodecyl sulfate; PAGE, polyacrylamide gel electrophoresis; LORF, long open reading frame; SORF, short open reading frame.

0042-6822/90 $3.00
Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.

INTRODUCTION

Mammalian reoviruses are classified into three serotypes (ST1, ST2, and ST3)² based on neutralization and hemagglutination tests (Rosen, 1960). Genetic reassortment analysis revealed that type specificity is determined by the S1 genome segment, which encodes the minor outer capsid protein σ1 (Weiner and Fields, 1977). Besides eliciting the formation of neutralizing antibodies and inducing cell-mediated immune responses (Finberg et al., 1979; Fontana and Weiner, 1980; Weiner et al., 1980), protein σ1 also defines tissue tropism (Weiner et al., 1977), a function explained by the fact that it is the cell attachment protein (Lee et al., 1981). It is noteworthy, in this regard, that although the three serotypes manifest different tissue tropisms in whole animals, they apparently bind to the same receptors on mouse L fibroblasts (Lee et al., 1981). Protein σ1 is also the reovirus hemagglutinin, since it also interacts with erythrocyte-surface receptors (Weiner et al., 1978; Paul and Lee, 1987; Yeung et al., 1987). Epitope mapping and direct binding studies using truncated σ1 proteins have shown that the host cell binding and hemagglutination functions are controlled by different regions (Burstin et al., 1982; Spriggs et al., 1983; Nagata et al., 1987). Although the exact nature of the interactions between protein σ1 and receptors on host cells and erythrocytes is presently unknown, it is clear that carbohydrate moieties on receptor molecules are important recognition signals in both cases (Armstrong et al., 1984; Gentsch and Pacitti, 1987; Pacitti and Gentsch, 1987). The σ1 protein is located at the vertices of the viral icosahedron (Lee et al., 1981) and exists in virus particles in tetrameric form (Bassel-Duby et al., 1987; Banerjea et al., 1988). Analysis of its sequence suggests that its amino-terminal portion exists in the form of an α-helical, coiled-coil structure which may be involved in anchoring the protein in the virus particle, while its carboxy-terminal portion is globular and harbors the domains involved in receptor recognition (Bassel-Duby et al., 1985). These predictions have been confirmed by electron microscopic studies of purified protein σ1 (Furlong et al., 1988; Banerjea et al., 1988) and by receptor binding studies using truncated σ1 proteins and tryptic fragments of σ1 (Nagata et al., 1987; Yeung et al., 1989).

As a first step toward understanding the molecular basis of protein σ1-cell recognition, the S1 genome segments of all three reovirus serotypes have been cloned and sequenced (Nagata et al., 1984; Bassel-Duby et al., 1985; Cashdollar et al., 1985). The results indicate that the ST3 S1 gene is 1416 nucleotides long and contains a long open reading frame (LORF) of 1365 nucleotides, which encodes the 455-amino-acid σ1 protein. A second, short overlapping ORF of 360 nucleotides encodes the 120-amino-acid σ1s protein (also referred to as σ1bNS or p14), which has been detected not only among the translation products of melted S1 genome segments in in vitro protein-synthesizing systems but also in reovirus-infected cells (Ernst and Shatkin, 1985; Jacobs et al., 1985; Sarkar et al., 1985). The ST1 S1 genome segment has been reported to be 1458 or 1462 nucleotides long and to possess an LORF of 1254 nucleotides (418 codons) and an SORF of 357 nucleotides (Cashdollar et al., 1985; Munemitsu et al., 1986). The ST2 S1 genome segment has been reported to be 1442 nucleotides long, with an LORF of 1197 nucleotides (399 codons) and an SORF of 375 nucleotides (Cashdollar et al., 1985). A curious feature of these sequences is that while the ST3 S1 genome segment maximizes its coding potential with a 5′-noncoding region of 12 nucleotides and a 3′-noncoding region of only 36 nucleotides, the ST1 and ST2 S1 genome segments appear to contain relatively long 3′-noncoding regions (188 and 229 nucleotides, respectively). Thus, the predicted molecular weights of the ST1 and ST2 σ1 proteins are smaller than that of the ST3 σ1 protein, which does not agree with their relative molecular weight estimates from SDS-PAGE, where the ST3 σ1 protein migrates faster than either the ST1 or the ST2 σ1 protein (Lee et al., 1981). A second curious feature involves the distribution of conserved sequences among the three genome segments. The largest stretch of such similarity occurs in a 97-nucleotide region located near the 3′-terminus of the plus strands of all three genome segments (Cashdollar et al., 1985). The significance of this similarity is unclear, since in the case of the ST1 and ST2 genome segments this region resides in the 3′-noncoding region.

In view of these incongruities, we undertook to resequence cDNA clones of the ST1 and ST2 genome segments. We report here the sequences that were obtained. The LORFs of the ST1 and ST2 genome segments were found to be 1410 and 1386 nucleotides long, encoding proteins 470 and 462 amino acids long, respectively. These results reconcile the above-mentioned anomalies. We have also analyzed the evolutionary divergence patterns of the three S1 genome segments with respect to nucleotide replacement by codon position and compared this pattern with that of several other reovirus genome segments. It has also been possible to identify several rather well-conserved domains that are present in all three σ1 proteins.
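The comparison of nucleotide replacement by codon position mentioned above can be illustrated with a minimal sketch. It assumes two coding sequences that are already aligned codon-for-codon, without gaps and in the same reading frame; the function name and the toy sequences are hypothetical, and how gaps were handled in the actual analysis is not described in this section.

```python
from typing import Dict


def mismatches_by_codon_position(seq_a: str, seq_b: str) -> Dict[int, float]:
    """Fraction of codons that differ at codon position 1, 2 and 3.

    Hypothetical helper: assumes both sequences are aligned in frame,
    contain no gap characters, and have the same length (a multiple of 3).
    """
    if len(seq_a) != len(seq_b) or not seq_a or len(seq_a) % 3 != 0:
        raise ValueError("expected two equal-length, in-frame coding sequences")
    n_codons = len(seq_a) // 3
    diffs = {1: 0, 2: 0, 3: 0}
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a != b:
            diffs[i % 3 + 1] += 1  # index 0 -> codon position 1, and so on
    return {pos: count / n_codons for pos, count in diffs.items()}


# Toy in-frame pair (hypothetical): AUG GCU CUA GCC vs AUG GCA CUG GCU
print(mismatches_by_codon_position("AUGGCUCUAGCC", "AUGGCACUGGCU"))
# {1: 0.0, 2: 0.0, 3: 0.75}
```

In this toy pair every change is at the third position and is synonymous (GCU/GCA and GCC/GCU both encode Ala, CUA/CUG both encode Leu). For sequences of uniform base composition, complete randomization would push each position toward a mismatch fraction of 0.75, the complement of the 25% random-coincidence level used for nucleotide comparisons in this paper.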

MATERIALS AND METHODS

Template DNA

The full-length ST2 S1 cDNA clone in pBR322 was the same as that previously sequenced (Cashdollar et al., 1985). It was cleaved with BglII and BamHI (at nucleotides 45 and 1403, respectively) and subcloned in both orientations into M13mp18. The DNA was sequenced in both directions using the universal sequencing primer (United States Biochemical Corp.) and eight synthetic primers (four for each orientation). The primers were synthesized on an Applied Biosystems 380A DNA synthesizer and purified by gel electrophoresis. All sites complementary to primers were sequenced from the opposite strand.

The ST1 S1 cDNA sequenced was a hybrid clone. ST1 S1-specific cDNA was synthesized using a synthetic primer corresponding to nucleotides 14 to 28 and containing an EcoRI site at the 5′-terminus. This procedure was intended to generate cDNA suitable for direct subcloning into a prokaryotic expression vector. The several recombinants that were produced were found to be 3′-terminally truncated. In order to produce a full-length recombinant, the 5′-terminal region of one such clone, extending to the XhoI site at position 253, was ligated to the fragment of the previously sequenced clone (Cashdollar et al., 1985) that extended from this XhoI site to the PstI site at its 3′-terminus. The cDNA was then subcloned into M13mp18 and mp19 for sequencing. The 13 5′-terminal nucleotides of the ST1 S1 genome segment were not present in this recombinant, and sequence analysis revealed that the 16 3′-terminal nucleotides were also absent. The sequence of this hybrid ST1 S1 genome segment was determined from a series of subclones generated by standard subcloning procedures.
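The restriction sites used in the subcloning steps above can be located with a simple scan. This is a minimal sketch under stated assumptions: the recognition sequences listed are the standard ones for these enzymes, the function name and the test sequence are hypothetical, and positions are numbered from 1 at the first base of whatever sequence is supplied, which need not match the numbering of the S1 clones. Because all five recognition sequences are palindromic, scanning one strand finds every site.

```python
from typing import Dict, List

# Recognition sequences (5'->3') of the enzymes named in the Methods.
# Each is palindromic, so a single-strand scan finds sites on both strands.
RECOGNITION_SITES: Dict[str, str] = {
    "BglII": "AGATCT",
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
    "PstI": "CTGCAG",
    "EcoRI": "GAATTC",
}


def find_sites(seq: str) -> Dict[str, List[int]]:
    """Return 1-based start positions of each recognition sequence in seq.

    Accepts plus-strand RNA (U) or cDNA (T) notation; enzymes with no
    site in the sequence are omitted from the result.
    """
    dna = seq.upper().replace("U", "T")
    hits: Dict[str, List[int]] = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = [i + 1 for i in range(len(dna) - len(site) + 1)
                     if dna[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


# Hypothetical toy sequence containing one BglII and one BamHI site.
print(find_sites("AAGAUCUUUGGAUCCAA"))  # {'BglII': [2], 'BamHI': [10]}
```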

Dideoxy sequencing

Single-stranded recombinant phage DNA was purified from 1.5-ml cultures grown in Escherichia coli strain DH5α (Bethesda Research Laboratories), using phenol extraction and ethanol precipitation according to standard procedures (Messing, 1983). The single-stranded DNA was used as template for DNA sequencing by the dideoxy chain termination procedure with modified T7 DNA polymerase (United States Biochemical Corp.) according to published procedures (Tabor and Richardson, 1987). The DNA was labeled with [35S]dATP (400 Ci/mmol; Amersham) and fractionated in 33 cm × 40 cm × 0.4 mm 6% acrylamide-8 M urea buffer-gradient gels (Biggin et al., 1983) or in standard 4% acrylamide-8 M urea gels. Every base was identified by sequencing in both directions.

[Fig. 1 image: sequencing-strategy maps of the ST1 and ST2 S1 cDNAs, drawn 5′ to 3′ on a nucleotide scale marked at 200-nucleotide intervals up to 1400]

FIG. 1. The sequencing strategy for determining the sequences of the cDNA versions of the ST1 and ST2 S1 genome segments. The arrows indicate the direction and the length of the sequences determined from a series of subclones of ST1 S1 cDNA or generated using a series of synthetic oligonucleotide primers for ST2 S1 cDNA. The restriction sites that were used are indicated. For further details see Materials and Methods.

Computer analysis

The cDNA sequences were analyzed using the University of Wisconsin Genetics Computer Group programs (Devereux et al., 1984) and the Pustell DNA sequence analysis programs (International Biotechnologies Incorporated; Pustell and Kafatos, 1984).

RESULTS

Sequence analysis of the S1 genome segments of ST1, ST2, and ST3

The sequencing strategies are shown in Fig. 1. As described under Materials and Methods, the 5′-terminal 13 nucleotides of the ST1 S1 genome segment (up to the initiation codon of the σ1 coding region) and the 3′-terminal 16 nucleotides were not present in the clone sequenced in this study, but were determined previously (Li et al., 1980; Cashdollar et al., 1985). Similarly, the 5′-terminal 45 nucleotides (up to the BglII site) and the 3′-terminal 32 nucleotides (from the BamHI site) of the ST2 S1 genome segment were determined previously (Cashdollar et al., 1985) and were not sequenced in this study. The sequences of the ST1 and ST2 S1 genome segments are presented in Fig. 2, along with the previously published ST3 S1 sequence (Nagata et al., 1984; Bassel-Duby et al., 1985; Cashdollar et al., 1985). The genome segments are 1463, 1440, and 1416 nucleotides long, respectively. They are aligned so as to maximize amino acid identities in all three σ1 proteins, as described in the legend to Fig. 3. As reported previously, all three S1 genome segments are bicistronic and contain an LORF that encodes the σ1 protein, as well as an overlapping SORF that encodes the σ1s protein (Nagata et al., 1984; Cashdollar et al., 1985; Bassel-Duby et al., 1985).

The sequences of the ST1 and ST2 genome segments presented here differ in several respects from those reported previously (Cashdollar et al., 1985; Munemitsu et al., 1986). Unless otherwise indicated, the numbers below refer to nucleotide positions in the originally reported sequences (Cashdollar et al., 1985). For the ST1 S1 genome segment the differences are as follows: there is a U instead of a G at position 285, which results in a gly-to-val substitution in the σ1 coding region and a val-to-ile substitution in the σ1s coding region; there is a C instead of a U at position 519, which results in a phe-to-ser change; there is a U following position 1240 and a C following position 1312, which result in two frame shifts in the σ1 coding region, which now extends to a termination codon at position 1423 (new numbering), so that the LORF is 470 codons long; and there is a triplet CCC following position 1436 in the 3′-noncoding region. The C at position 519, the addition of the C following position 1312, and the addition of the CCC following position 1436 had also been noted by Munemitsu et al. (1986), who also reported a G-to-A change at position 1277. For the ST2 S1 genome segment two differences were found: absence of a G following position 1120 and absence of another G following position 1257 (both original numbering), which result in frame shifts so that the coding region extends to a termination codon at position 1399 (new numbering), thereby making the ST2 σ1 LORF 462 codons long.
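How a single missing or extra nucleotide changes the length of an open reading frame can be shown with a toy example. The sketch below is a minimal illustration, assuming plus-strand RNA notation and the standard stop codons UAA, UAG, and UGA; the sequences are hypothetical and far shorter than an S1 segment, and the function name `orf_codons` is introduced here only for illustration.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def orf_codons(seq: str, start: int = 0) -> Optional[int]:
    """Count sense codons read from `start` (0-based) up to the first
    in-frame stop codon; return None if no in-frame stop is found."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return None


# Toy reading frame: AUG GCU CUA GCC UGG, then the stop codon UAA.
corrected = "AUGGCUCUAGCCUGGUAA"
# The same sequence lacking the C at index 6: the frame shifts and
# UAG is read immediately, truncating the ORF.
with_deletion = corrected[:6] + corrected[7:]

print(orf_codons(corrected))      # 5
print(orf_codons(with_deletion))  # 2
```

The same arithmetic links nucleotide and codon counts throughout this paper: as the figures given imply, an LORF length quoted without its stop codon divides exactly by three (for example, 1410 / 3 = 470 codons for ST1).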
As previously reported and discussed in some detail (Cashdollar et al., 1985), the ST1 and ST2 S1 genome segments are much more closely related to each other (57% matches) than to the ST3 S1 genome segment (37 and 39% matches, respectively) (Table 1). These numbers apply to the sequences aligned for maximizing triply conserved amino acid residues as shown in Fig. 3. Corrected for random coincidence (25%, the match frequency expected by chance for four equally frequent nucleotides), the true similarity percentages (percentage matches minus 25) are 32, 12, and 14, respectively; that is, the S1 genome segments have diverged 68, 88, and 86%, respectively, toward randomness.

Comparison of the σ1 protein sequences of the three reovirus serotypes

The amino acid sequences of the three σ1 proteins, aligned for maximizing triple identities using the algorithm of Needleman and Wunsch (1970), are presented in Fig. 3. Optimally aligning the ST1 and ST2 σ1 proteins (the more closely related pair) with the GenBank Similarity program and then aligning the ST3 σ1 sequence for optimizing triple identities yields a slightly different alignment scheme, with 80 triple identities instead of 79. Since this difference is not significant, and since the Needleman and Wunsch program introduces slightly fewer insertions/deletions than the Similarity program, the Needleman and Wunsch alignment was preferred. It should be noted that unrelated sequences of the same length yield 10-13 triple identities when aligned with the same programs (and only about 1 when not aligned). Thus the three σ1 proteins are very significantly related.

The relatedness of the three σ1 proteins is analyzed further in Table 1, which indicates that, aligned in the manner described, the ST1 and ST2 σ1 proteins are 48% related to each other and 25% each to the ST3 σ1 protein. These figures are similar to those presented before (Wiener and Joklik, 1989). It should be noted that alignment programs of the type used here find about 14% similarity among unrelated proteins of similar length. Thus the true or functional extents of similarity (percentage matches minus 14) are 34% for the ST1/ST2 σ1 protein pair and 11% for the ST1/ST3 and ST2/ST3 σ1 protein pairs. These figures are strikingly similar to those found above for the three S1 genome segment pairs. An analysis of the evolutionary divergence of nucleotides in each of the three codon positions will be presented below.

TABLE 1
RELATEDNESS PATTERNS AMONG THE LORFs OF THE THREE S1 GENOME SEGMENTS AND AMONG THE THREE σ1 PROTEINSᵃ

               Nucleotide sequences                          Amino acid sequences
Serotype   Number of    Percentage   "True" percentage   Number of    Percentage   "True" percentage
pair       matches      matches      similarityᵇ         matches      matches      similarityᶜ
1:2        802/1413     57           32                  227/471      48           34
1:3        535/1428     37           12                  118/475      25           11
2:3        547/1410     39           14                  117/468      25           11

ᵃ On the basis of sequences aligned as shown in Fig. 3.
ᵇ Corrected for 25% random matches.
ᶜ Corrected for the fact that the alignment programs used find about 14% similarity in unrelated sequences of similar length.

The amino acid compositions of the three σ1 proteins are presented in Table 2. They are slightly acidic proteins, rather low in histidine, and contain only one cysteine. Since all three σ1 proteins compete for the same receptor on L cells (Lee et al., 1981) and are capable of assembling into virus particles composed of heterologous capsid proteins (Weiner et al., 1978), certain functional domains, such as those specifying cell attachment, hemagglutination, oligomerization, and association with the projections/spikes composed of protein λ2 (Lee et al., 1981), are presumably conserved among the three σ1 protein sequences. In fact, five 22- to 34-residue-long domains of rather extensive sequence conservation can be discerned. They are labeled A to E in Fig. 3. In region D, 46% of the amino acid residues are shared by all three proteins; for each of the other four regions, this fraction is 32% (compared to an overall similarity of 17% for the three proteins).

[Fig. 2 image: aligned plus-strand nucleotide sequences of the ST1 (1463 nt), ST2 (1440 nt), and ST3 (1416 nt) S1 genome segments]

FIG. 2. The sequences of the plus strands of the reovirus ST1, 2, and 3 S1 genome segments. The ST3 sequence is that published previously (Nagata et al., 1984; Cashdollar et al., 1985; Bassel-Duby et al., 1985). The sequences are aligned so as to match the amino acid sequence alignments as shown in Fig. 3.

[Fig. 3 image: aligned amino acid sequences of the ST1 (470 residues), ST2 (462 residues), and ST3 (455 residues) σ1 proteins, with triply conserved residues starred and conserved domains A to E marked]

FIG. 3. The amino acid sequences of the σ1 proteins of reovirus serotypes 1, 2, and 3. The sequences were optimally aligned using the algorithm of Needleman and Wunsch (1970). Deletions are indicated by (.); amino acids triply conserved are indicated by (*). Regions designated A, B, C, D, and E represent conserved domains.
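The Needleman and Wunsch (1970) procedure named in the Fig. 3 legend is a dynamic-programming global alignment. The sketch below aligns two sequences only, with a simple identity score (+1 for a match, -1 for a mismatch, -1 per gap position); these scores, the function name, and the toy sequences are assumptions for illustration, since the scoring scheme used for the σ1 alignments is not given in this section.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global pairwise alignment by dynamic programming.

    Returns (score, aligned_a, aligned_b). Gaps are written as '.',
    the symbol Fig. 3 uses for deletions. Scores are illustrative.
    """
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(".")
            i -= 1
        else:
            out_a.append(".")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A.GT')
```

Extending this to three sequences, as in Fig. 3, requires either a three-dimensional score table or a progressive approach (align the closest pair first, then add the third), which is how the text describes the alternative alignment made with the GenBank Similarity program.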

The tertiary structures of the three σ1 proteins resemble each other. This is best shown by plotting their α-helix contents as predicted by the algorithm of Garnier et al. (1978) (Fig. 4). All three proteins have an α-helix content of about 17%, a β-sheet content of about 37%, a random coil content of about 30%, and a turn content of about 16%. The ratio of residues in α-helix to β-sheet configuration (0.465) is the lowest of any reovirus protein. Not surprisingly, the α-helix plots for the serotype 1 and 2 proteins resemble each other more closely than either resembles that of the ST3 protein. For the former two proteins, about two-thirds of the α-helices are in the N-terminal one-third of the molecule; for the ST3 σ1 protein this fraction is about 90%.

Comparison of the three σ1s protein sequences

TABLE 2

                     σ1 proteins                              σ1s proteins
Amino acid       ST1      ST2      ST3      % of ST3      ST1     ST2     ST3     % of ST3
A                20       25       29       6.4           6       6       2       1.7
R                26       30       29       6.4           13      9       9       7.5
N                33       35       28       6.2           5       6       5       4.2
D                25       25       28       6.2           5       5       6       5.0
C                1        1        1        0.2           0       1       1       0.8
Q                25       17       20       4.4           6       7       9       7.5
E                21       23       17       3.7           5       8       5       4.2
G                38       37       38       8.4           2       1       4       3.3
H                2        5        5        1.1           3       2       4       3.3
I                30       34       31       6.8           7       7       5       4.2
L                46       48       53       11.6          16      18      20      16.7
K                12       10       7        1.5           4       6       5       4.2
M                11       7        8        1.8           4       5       6       5.0
F                13       15       12       2.6           0       0       0       0
P                11       12       14       3.1           5       5       4       3.3
S                53       45       52       11.4          16      17      17      14.2
T                47       42       37       8.1           7       8       6       5.0
W                8        7        5        1.1           6       4       5       4.2
Y                10       6        8        1.8           3       5       2       1.7
V                38       38       33       7.3           6       5       5       4.2
Total            470      462      455                    119     125     120
Acidic (D + E)   46       48       45       9.9           10      13      11      9.2
Basic (R + K + H)
                 40       45       41       9.0           20      17      18      15.0
Aromatic (F + W + Y)
                 31       28       25       5.5           9       9       7       5.8
Hydrophobic (aromatic + I + L + M + V)
                 156      155      150      33.0          42      44      43      35.8
Molecular weight 51,427   50,471   49,101                 13,990  14,561  13,993
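The molecular weights at the foot of Table 2 can be approximated from the compositions alone by summing average residue masses and adding one water molecule for the free termini. The sketch below assumes commonly used average residue masses in daltons; the mass table actually used for Table 2 is not stated, so agreement is expected only to within a few daltons. The function and variable names are hypothetical; the counts are those of the ST3 σ1 and σ1s columns of Table 2.

```python
from typing import Dict

# Average residue masses in daltons (amino acid minus one water); commonly
# used values, assumed here because the mass table behind Table 2 is not stated.
RESIDUE_MASS: Dict[str, float] = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "Q": 128.1307, "E": 129.1155, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153  # added once for the free N- and C-termini


def molecular_weight(composition: Dict[str, int]) -> float:
    """Approximate average molecular weight of a polypeptide from its composition."""
    return sum(RESIDUE_MASS[aa] * n for aa, n in composition.items()) + WATER


def residue_classes(composition: Dict[str, int]) -> Dict[str, int]:
    """Residue classes grouped as in the lower rows of Table 2."""
    c = composition
    aromatic = c["F"] + c["W"] + c["Y"]
    return {
        "acidic": c["D"] + c["E"],
        "basic": c["R"] + c["K"] + c["H"],
        "aromatic": aromatic,
        "hydrophobic": aromatic + c["I"] + c["L"] + c["M"] + c["V"],
    }


ORDER = "ARNDCQEGHILKMFPSTWYV"  # row order of Table 2
ST3_SIGMA1 = dict(zip(ORDER, [29, 29, 28, 28, 1, 20, 17, 38, 5, 31,
                              53, 7, 8, 12, 14, 52, 37, 5, 8, 33]))
ST3_SIGMA1S = dict(zip(ORDER, [2, 9, 5, 6, 1, 9, 5, 4, 4, 5,
                               20, 5, 6, 0, 4, 17, 6, 5, 2, 5]))

for name, comp in [("ST3 sigma1", ST3_SIGMA1), ("ST3 sigma1s", ST3_SIGMA1S)]:
    print(name, sum(comp.values()), round(molecular_weight(comp)), residue_classes(comp))
```

With these mass values the estimates fall within about 10 daltons of the Table 2 entries (49,101 and 13,993), and the residue-class counts reproduce the acidic, basic, aromatic, and hydrophobic rows for ST3.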

The sequences of the three σ1s proteins, aligned for the maximum number of triply conserved amino acids by the Needleman and Wunsch algorithm, are shown

in Fig. 5. Eighteen amino acids are conserved among all three proteins (overall similarity of 15%); most of them are located in two relatively conserved domains, A and B, in which they account for 27 and 33% of the total number of amino acids, respectively. Although the fraction of triply conserved amino acids in the al S proteins is almost the same as in the al proteins (15 and 1SO/o,respectively), the al S proteins are significantly less related than the al proteins. Thus the ST1 and 2, ST1 and 3, and ST2 and 3 al S proteins share only 37, 18, and 23% amino acids, respectively; since these se-

200

250

300

350

400

Amino Acid FIG. 4. The a-helix contents of the three ~1 proteins as predicted

by the algorithm of Garnier eta/. (1978).

DUNCAN

406

A NAPS-Q---- KKSRKSRNKSRSTLMISGLP 25 --MENQPTRNTRSRKLRNKLKTSLLMSTGS 28 MEHHCQKGLN QGSRRSRRRL KYTLILSSGS 30 * ** * * x

Tl T2 T3

Tl ILNSTDLEDR LLTSAIASQP LSQDWVRWIIDLWVSRVRSR NYLTQLARTL 75 T2 VTSLIQSKDNWVDYLYACQP LNRELVRTAI ELIDSSEMSPAYRLALAESI 78 T3 PRDSMMQTNE SSLLSKVGMTWLHQSVMLNL QSPDWKALSE PSKQLSMDLI 80 * *

B Tl RAYPHWVTES MLSNHELTVW IRSRLISLDE HPLWRQMLEA YGQN--T2 RVYPSWVTES MLQNSELASW IQSRIISLSE HQDWKLKYQP LLMTLDH T3 RVLPSWVLEW DNLRQDLQTYALITTISLRE WILQNVTLDH-------

124 127 120

FIG. 5. The amino acid sequences of the 01 S proteins of reovirus serotypes 1, 2, and 3. The proteins were optimally aligned using the algorithm of Needleman and Wunsch (1970). Amino acids triply conserved are indicated by (*).

quences were aligned according to the Needleman and Wunsch algorithm, the “true” similarity percentages are only about 23, 4, and 9%, respectively (see above). All three σ1s proteins are highly basic, with isoelectric points ranging from 9.83 to 11.85. Their secondary structures, however, as predicted by the algorithm of Garnier et al. (1978), are surprisingly different. According to this algorithm, the percentages of amino acid residues in α-helix (A), β-sheet (B), random coil (C), and turn (T) configuration for the three proteins are A 28, 30, 20; B 23, 21, 25; C 25, 25, 28; and T 24, 24, 28, respectively. The ratios of amino acids in α-helix to β-sheet configuration for the three proteins are 1.22, 1.46, and 0.8, respectively, values that are remarkably different.

DISCUSSION

The S1 genome segment sequences reported here indicate that the σ1 proteins of the three reovirus serotypes possess 470, 462, and 455 amino acids, respectively, with predicted molecular weights of about 51.5K, 50.5K, and 49K. These results explain what was previously considered to be anomalous σ1 protein migration behavior in SDS-polyacrylamide gels. The previously reported sequences (Cashdollar et al., 1985; Munemitsu et al., 1986) predicted the ST3 σ1 protein to be the largest of the three σ1 proteins; the modified sequences reported here correctly predict that the ST1 σ1 protein is slightly larger than the ST2 σ1 protein, which in turn is slightly larger than the ST3 σ1 protein.

The major point of interest of these sequences is that although the three σ1 proteins have diverged very extensively (Table 3), they still share 79 of about 470 amino acid residues. Interestingly, these residues are significantly clustered. If one ignores clusters of fewer than five triply conserved residues and sets a lower limit of 30% on the triply conserved residue content of any given sequence stretch, one finds that 49 of the 79 triply conserved residues are located in five domains, labeled A-E in Fig. 3. In one of these domains, D, no fewer than 46% of the residues in a 34-residue-long region are common to all three σ1 proteins. Remarkably, these five conserved domains do not coincide with regions that are particularly rich in conservative amino acid replacements (i.e., D or E; A, S, or T; F, Y, or W; etc.). There are five such replacements in domain A, four in B, three in E, and only two in C and D; in each case these numbers are lower, mostly much lower, than the number of identical residues. When conservative amino acid replacements are included, the residue similarity of the five domains ranges from 39% (domain C) to 55% (domain B).

Little can be surmised concerning the functions specified by the five conserved domains. Domain A is predominantly α-helical; it represents the C-terminal limit of the predicted coiled-coil structure (Bassel-Duby et al., 1985; Furlong et al., 1988) that is the hallmark of the N-terminal portion of all three σ1 proteins. This region may play a key role in stabilizing σ1 tetramer formation, but it is too far from the amino terminus to play a role in anchoring σ1 onto the virion (Mah and Lee, unpublished observations). Domain B exists predominantly in a random coil/β-sheet configuration with several turns. This region may play a role in the hemagglutinating function, because truncated σ1 lacking residues 123-223 does not bind glycophorin, the reovirus receptor on human erythrocytes, but retains the ability to bind to L cells (Nagata et al., 1987). Domains C, D, and E all reside in the carboxy-terminal region of protein σ1, which was previously shown to harbor the host cell attachment domain.
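The clustering rule described above (at least five triply conserved residues, making up at least 30% of a sequence stretch) can be expressed as a scan over three pre-aligned sequences. The Python sketch below is one possible reading of that rule, written as a greedy extension from each conserved column; the authors' exact procedure is not described here, so the function names, the greedy strategy, and the exclusion of gap columns are assumptions.

```python
from typing import List, Tuple


def triply_conserved(s1: str, s2: str, s3: str) -> List[bool]:
    """Mark alignment columns where all three residues are identical (gaps excluded)."""
    if not len(s1) == len(s2) == len(s3):
        raise ValueError("sequences must already be aligned to equal length")
    return [a == b == c and a != "-" for a, b, c in zip(s1, s2, s3)]


def conserved_domains(mask: List[bool], min_count: int = 5,
                      min_fraction: float = 0.30) -> List[Tuple[int, int, int]]:
    """Greedy scan for stretches rich in triply conserved columns.

    Hypothetical heuristic: starting at a conserved column, keep absorbing the
    next conserved column while conserved columns still make up at least
    `min_fraction` of the stretch; report stretches holding at least
    `min_count` conserved columns as (start, end, count), 0-based and inclusive.
    """
    hits = [i for i, ok in enumerate(mask) if ok]
    domains = []
    k = 0
    while k < len(hits):
        start = end = hits[k]
        count = 1
        while k + 1 < len(hits):
            nxt = hits[k + 1]
            if (count + 1) / (nxt - start + 1) < min_fraction:
                break  # adding the next hit would dilute the stretch too much
            end, count, k = nxt, count + 1, k + 1
        if count >= min_count:
            domains.append((start, end, count))
        k += 1
    return domains


if __name__ == "__main__":
    # Ten aligned columns taken from Fig. 5, used only as a toy input.
    mask = triply_conserved("RAYPHWVTES", "RVYPSWVTES", "RVLPSWVLEW")
    print(mask)
    print(conserved_domains(mask))  # [(0, 8, 5)]
```

On the ten aligned columns from Fig. 5 (RAYPHWVTES, RVYPSWVTES, RVLPSWVLEW), columns 0, 3, 5, 6, and 8 are triply conserved and the scan reports a single stretch, (0, 8, 5). These are σ1s residues rather than σ1 residues, so the toy result only demonstrates the mechanics and says nothing about domains A-E.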
Although the exact nature of the protein σ1-receptor interaction is unclear, recent evidence has provided interesting clues. The observation that the three reovirus serotypes compete for binding to mouse L cell receptors (Lee et al., 1981) suggests that a common recognition mechanism is involved. An additional clue came from the finding that carbohydrate moieties on the surface of L cells, sialic acid in particular, play an important role in reovirus cell attachment (Armstrong et al., 1984; Gentsch and Pacitti, 1987; Pacitti and Gentsch, 1987). Recent studies in our laboratory have further shown that reovirus is capable of binding to sialic acid alone (Paul et al., 1989). Thus it appears very likely that a sialic acid-binding domain is present in all three σ1 proteins. On the other hand, Fields and co-workers (Weiner et al., 1977) have demonstrated that the different tissue tropisms manifested by the three reovirus serotypes are a function of the σ1 protein, which suggests the presence of distinct cell receptors. It is likely, therefore, that in addition to the protein-sialic acid interaction, another, more specific type of interaction is involved in the recognition of receptors by σ1 under certain circumstances. This view has led us to speculate that the domains conserved among the three serotypes are involved in recognition of the common L cell receptor and possibly represent a sialic acid-binding site. At the same time, certain regions of σ1 may have diverged sufficiently to create unique, serotype-specific cell attachment domains that are responsible for the distinct tissue tropisms observed for ST1 and ST3 reovirus. In this regard, it is interesting to note that, through the use of anti-idiotypic antibodies and synthetic peptides, the region encompassing residues 317 to 322 has been implicated as the ST3-specific receptor recognition domain (Bruck et al., 1986; Williams et al., 1988). Paradoxically, this region lies in domain C, one of the conserved domains. It is also interesting in this regard that recent deletion mapping studies have shown that removal of regions D and E destroys L cell binding (Nagata et al., 1987). These regions may therefore be involved in attachment to the common L cell receptor or in binding to sialic acid. Further studies are currently aimed at elucidating the roles of these domains in sialic acid binding, receptor recognition, membrane penetration, and protein folding.

TABLE 3
ANALYSIS OF NUCLEOTIDE MISMATCHES BETWEEN THE THREE S1 GENOME SEGMENTS BY CODON POSITION

A. Distribution of mismatches among codon positions

                                  Serotype pair
Percentage mismatches in          1:2    1:3    2:3
  First base codon position        32     34     35
  Second base codon position       23     27     28
  Third base codon position        46     38     37

B. Number/percentage of mismatches in each codon position

                                  Serotype pair
                                  1:2              1:3              2:3
Nucleotide mismatches in          Number    %      Number    %      Number    %
  All base codon positions        611              893              863
  First base codon position       193/471   41     307/476   64     303/470   64
  Second base codon position      139/471   30     244/476   51     240/470   51
  Third base codon position       279/471   59     342/476   72     320/470   68
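The tallies in Table 3 come down to classifying each nucleotide mismatch by its position within the codon. The Python sketch below assumes two gap-free, in-frame coding sequences of equal length (how alignment gaps were handled for Table 3 is not stated here), and its function names are hypothetical. As a consistency check, feeding it the 1:2 counts from Table 3B reproduces the 1:2 column of Table 3A.

```python
from typing import Dict

POSITIONS = ("first", "second", "third")


def mismatches_by_codon_position(x: str, y: str) -> Dict[str, int]:
    """Count mismatches at codon positions 1-3 between two aligned coding sequences.

    Assumes both sequences are gap-free, in the same reading frame, and of
    equal length (a multiple of 3).
    """
    if len(x) != len(y) or len(x) % 3 != 0:
        raise ValueError("expected equal-length, in-frame coding sequences")
    counts = {name: 0 for name in POSITIONS}
    for i, (a, b) in enumerate(zip(x.upper(), y.upper())):
        if a != b:
            counts[POSITIONS[i % 3]] += 1
    return counts


def distribution(counts: Dict[str, int]) -> Dict[str, int]:
    """Share of all mismatches at each codon position, in whole percent (Table 3A style)."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0 for name in counts}
    return {name: round(100 * n / total) for name, n in counts.items()}


def percent_of_codons(n_mismatches: int, n_codons: int) -> int:
    """Mismatches at one codon position as a percentage of codons compared (Table 3B style)."""
    return round(100 * n_mismatches / n_codons)


if __name__ == "__main__":
    print(mismatches_by_codon_position("ATGGCTAAA", "ATGGCCAGA"))
    # {'first': 0, 'second': 1, 'third': 1}
    print(distribution({"first": 193, "second": 139, "third": 279}))
    # {'first': 32, 'second': 23, 'third': 46}
```

With the 1:2 counts, `distribution` gives 32, 23, and 46%, and `percent_of_codons(193, 471)`, `percent_of_codons(139, 471)`, and `percent_of_codons(279, 471)` give 41, 30, and 59%, matching the table.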

The second major point of interest of the sequences reported here is that they permit a detailed analysis of the relatedness of the three S1 genome segments and of the proteins that they encode. Table 3 presents an analysis of nucleotide mismatches between the three S1 genome segments by and within codon positions. The distribution of mismatches among the three codon positions is remarkable. For all other genome segments that we have analyzed (S3, M2, and L1) (Wiener and Joklik, 1989), the overwhelming proportion of mismatches has been in third base codon positions (from 69 to 88%; the averages for the three serotype pairs are 75% for the S3 genome segment, 87% for the M2 genome segment, and 79% for the L1 genome segment), whereas the corresponding figure here is 40%. Similarly, whereas for the other three genome segments an average of only 15 and 4% of mismatches are in first and second base codon positions, respectively, for the three S1 genome segments these values are 34 and 26%, respectively. Since there is no reason to suppose that certain individual reovirus genome segments are genetically more unstable than others, this indicates, in all probability, that the σ1 protein is far more able to accept alterations in its sequence without losing function than the other reovirus proteins that have been analyzed, namely σNS, μ1, and λ3. Table 3 also lists the percentage mismatches in each codon position for each of the three serotype pairs. Since 25% matches (i.e., 75% mismatches) would be found when comparing completely unrelated sequences, it is informative to express mismatches as a percentage of (codons × 0.75), which provides an estimate of the extent of divergence toward complete randomness. These figures are presented in Table 4, and they may be compared with the figures in Tables 5 and 7 of a previous publication (Wiener and Joklik, 1989), which also presented figures for many of the other genome segments.

TABLE 4
EVOLUTIONARY DIVERGENCE PATTERN OF THE THREE S1 GENOME SEGMENTS(a)

                                  Serotype pair
                                  1:2    1:3    2:3
First base codon position          55     85     85
Second base codon position         40     68     68
Third base codon position          79     96     91

(a) For details of calculations, see text.

It is clear that the third base codon positions of the S1 genome segment have diverged to an extreme extent, more than for any other reovirus genome segment. For example, the third base codon positions of the most closely related pair of S1 genome segments have diverged 79%, whereas for the S3, M2, and L1 genome segments the third base codon positions of the most closely related serotype pairs have diverged only 48, 53, and 13%, respectively. The difference in evolutionary divergence patterns is even greater for the first and second base codon positions: the highest divergence figure for any first base codon position of the S3, M2, and L1 genome segments is 25%, and for any second base codon position 8%; by contrast, the corresponding figures for the S1 genome segments range from 40 to 85%. Clearly, evolutionary divergence has been much more extensive for the S1 genome segment than for any other reovirus genome segment, and the most likely reason is that the structure of the σ1 protein has been far more tolerant of sequence change. It is also of considerable interest that, whereas for the S3 and M2 genome segments the majority of mismatches for the most closely related serotype pairs are transitions (81.5%) and those for the less closely related serotype pairs are also significantly biased in favor of transitions (about 52%), for the S1 genome segment the percentage of transition mismatches for the most closely related serotype 1/serotype 2 pair is only 43%, just slightly above that expected by chance, and those for the other two serotype pairs are 40 and 35%, respectively.
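The two quantities used in the preceding paragraph can be made explicit. Two unrelated sequences with equal base frequencies mismatch at 3 of every 4 positions, so dividing a per-position mismatch percentage by 0.75 expresses how far that position has moved toward complete randomness. A transition replaces a purine with a purine (A/G) or a pyrimidine with a pyrimidine (C/T); since only one of the three possible replacements of any base is a transition, about one-third of random mismatches would be transitions under the same equal-frequency assumption. The Python sketch below uses hypothetical function names. Applied to the rounded 1:2 percentages of Table 3B (41, 30, 59), the rescaling gives the 1:2 column of Table 4 (55, 40, 79); working from the raw counts instead gives 39 rather than 40 for the second position, which suggests that Table 4 was computed from the rounded percentages.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def divergence_toward_randomness(mismatch_pct: float) -> int:
    """Rescale a mismatch percentage by the 75% expected between unrelated sequences."""
    return round(mismatch_pct / 0.75)


def is_transition(a: str, b: str) -> bool:
    """True if a -> b is a purine<->purine or pyrimidine<->pyrimidine change."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False
    return (a in PURINES and b in PURINES) or (a in PYRIMIDINES and b in PYRIMIDINES)


def transition_fraction(x: str, y: str) -> float:
    """Fraction of mismatched, ungapped aligned positions that are transitions."""
    pairs = [(a, b) for a, b in zip(x.upper(), y.upper())
             if a != b and a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(is_transition(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Rounded 1:2 percentages from Table 3B (first, second, third positions).
    print([divergence_toward_randomness(p) for p in (41, 30, 59)])  # [55, 40, 79]
    # A->G, C->T, G->A are transitions; T->G is a transversion.
    print(transition_fraction("ACGT", "GTAG"))  # 0.75
```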
Thus, the more closely related the genome segments, the higher the proportion of transitions among mismatches. We have argued, from the mismatch percentages in third base codon positions, where most of the mismatches are neutral, that the various reovirus genome segments started diverging at different times, with the S1 genome segments having diverged for the longest period of time (Wiener and Joklik, 1989). It is known that polymerase errors result mostly in transitions (see, for example, Kuge et al., 1989) and that many chemical mutagenic processes and mutagens, such as deamination, alkylation, nitrosoguanidine, and bisulfite, also cause primarily transitions. The nature of the mechanism(s) that induce(s) transversions after the initial accumulation of transitions is not known.

There remains the interesting question of the coiled-coil configuration of the amino-terminal one-third of protein σ1. Extensive conformational similarity is essential in this region, since it is thought to be inserted into the projections or spikes composed of protein λ2 and to be responsible for anchoring the σ1 tetramers, the equivalents of the adenovirus fiber, to them. Starting with residue 28, the ST1 σ1 protein has 21 heptads, in 18 of which every 7th amino acid is I, L, or V. For the ST2 σ1 protein there are 20 such heptads, and only 2 of the relevant residues are not I, L, or V; in the ST3 σ1 protein there are also 21 such heptads, and only 5 of the relevant residues are not I, L, or V. In 14 of the 21 heptads in ST1 and ST3 σ1, each 4th residue is also I, L, or V; the same is true for 15 of the 20 ST2 σ1 heptads. Interestingly, none of the 20 or so residues that form the heptad boundaries are conserved among all three proteins; all result from conservative amino acid replacements. The retention of a highly specific configuration, namely the coiled-coil structure, over a large portion (more than 150 amino acids) of protein σ1 provides a fascinating example of structural and functional conservation. No doubt further analysis will provide insight into the nature of the evolutionary mechanisms used to accomplish this.
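The heptad counts above follow a simple rule: step through the sequence seven residues at a time from a chosen starting residue and ask whether particular positions in each heptad carry I, L, or V. The Python sketch below implements that count. Which residue of each heptad counts as the "4th" and the "7th" (the heptad register), the function name, and the synthetic demonstration sequence are assumptions; the sketch is not run on the σ1 sequences themselves.

```python
from typing import Tuple

HYDROPHOBIC = frozenset("ILV")


def heptad_scan(seq: str, start: int, n_heptads: int) -> Tuple[int, int]:
    """Count heptads whose 7th and 4th residues are I, L, or V.

    `start` is a 1-based residue number; heptad k (0-based) covers residues
    start + 7k through start + 7k + 6. This register is an assumption.
    Returns (heptads with I/L/V at position 7, heptads with I/L/V at position 4).
    """
    seventh = fourth = 0
    for k in range(n_heptads):
        begin = start - 1 + 7 * k
        heptad = seq[begin:begin + 7]
        if len(heptad) < 7:
            break  # the sequence ends before n_heptads complete heptads
        if heptad[6] in HYDROPHOBIC:
            seventh += 1
        if heptad[3] in HYDROPHOBIC:
            fourth += 1
    return seventh, fourth


if __name__ == "__main__":
    # Synthetic sequence: 27 filler residues, then three heptads starting at residue 28.
    demo = "M" * 27 + "AAAVAAI" + "AAAVAAI" + "AAAEAAK"
    print(heptad_scan(demo, start=28, n_heptads=3))  # (2, 2)
```

Starting at residue 28 mirrors the text; on a real σ1 sequence the same call with `n_heptads=21` would return counts comparable to "18 of 21" only if the assumed register matches the one used by the authors.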

ACKNOWLEDGMENT

We thank Jon Wiener for running computer searches and Mike Roner for help in formatting Fig. 2. This work was supported by the Medical Research Council of Canada. L.W.C. was supported by a grant from the National Science Foundation (DCB-8518044). R.D. and D.H. were fellows of the Alberta Heritage Foundation for Medical Research (AHFMR). P.W.K.L. is an AHFMR Scholar.

REFERENCES

ARMSTRONG, G. D., PAUL, R. W., and LEE, P. W. K. (1984). Studies on reovirus receptors of L cells: Virus binding characteristics and comparison with reovirus receptors of erythrocytes. Virology 138, 37-48.
BANERJEA, A. C., BRECHLING, K. A., RAY, C. A., ERICKSON, H., PICKUP, D. T., and JOKLIK, W. K. (1988). High-level synthesis of biologically active reovirus protein σ1 in a mammalian expression vector system. Virology 167, 601-612.
BASSEL-DUBY, R., JAYASURIYA, A., CHATTERJEE, S., SONENBERG, N., MAIZEL, J. V., and FIELDS, B. N. (1985). Sequence of reovirus hemagglutinin predicts a coiled-coil structure. Nature (London) 315, 421-423.
BASSEL-DUBY, R., NIBERT, M. L., HOMCY, C. J., FIELDS, B. N., and SAWUTZ, D. G. (1987). Evidence that the σ1 protein of reovirus serotype 3 is a multimer. J. Virol. 61, 1834-1841.
BIGGIN, M. D., GIBSON, T. J., and HONG, G. F. (1983). Buffer gradient gels and 35S label as an aid to rapid DNA sequence determination. Proc. Natl. Acad. Sci. USA 80, 3963-3965.
BRUCK, C., CO, M. S., SLAOUI, M., GAULTON, G. N., SMITH, T., FIELDS, B. N., MULLINS, J. I., and GREENE, M. I. (1986). Nucleic acid sequence of an internal image-bearing monoclonal anti-idiotype and its comparison to the sequence of the external antigen. Proc. Natl. Acad. Sci. USA 83, 6578-6582.
BURSTIN, S. J., SPRIGGS, D. R., and FIELDS, B. N. (1982). Evidence for functional domains on the reovirus type 3 hemagglutinin. Virology 117, 146-155.
CASHDOLLAR, L. W., CHMELO, R. A., WIENER, J. R., and JOKLIK, W. K. (1985). Sequence of the S1 genes of the three serotypes of reovirus. Proc. Natl. Acad. Sci. USA 82, 24-28.
DEVEREUX, J., HAEBERLI, P., and SMITHIES, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12, 387-395.
ERNST, H., and SHATKIN, A. J. (1985). Reovirus hemagglutinin mRNA codes for two polypeptides in overlapping reading frames. Proc. Natl. Acad. Sci. USA 82, 48-52.
FINBERG, R., WEINER, H. L., FIELDS, B. N., BENACERRAF, B., and BURAKOFF, S. J. (1979). Generation of cytolytic T lymphocytes after reovirus infection: Role of S1 gene. Proc. Natl. Acad. Sci. USA 76, 442-446.
FONTANA, A., and WEINER, H. L. (1980). Interaction of reovirus with cell surface receptors. II. Generation of suppressor T cells by hemagglutinin of reovirus type 3. J. Immunol. 125, 2660-2664.
FURLONG, D. B., NIBERT, M. L., and FIELDS, B. N. (1988). σ1 protein of mammalian reoviruses extends from the surfaces of viral particles. J. Virol. 62, 246-256.
GARNIER, J., OSGUTHORPE, D. J., and ROBSON, B. (1978). Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120, 97-120.
GENTSCH, J. R., and PACITTI, A. F. (1987). Effect of neuraminidase treatment of cells and effect of soluble glycoproteins on type 3 reovirus attachment to murine L cells. J. Virol. 56, 356-364.
JACOBS, B. L., ATWATER, J. A., MUNEMITSU, J. M., and SAMUEL, C. E. (1985). Biosynthesis of reovirus-specified polypeptides: The S1 mRNA synthesized in vivo is structurally and functionally indistinguishable from in vitro-synthesized S1 mRNA and encodes two polypeptides, σ1a and σ1bNS. Virology 147, 9-18.
KUGE, S., KAWAMURA, N., and NOMOTO, A. (1989). Strong inclination toward transition mutation in nucleotide substitutions by poliovirus replicase. J. Mol. Biol. 207, 175-282.
LEE, P. W. K., HAYES, E. C., and JOKLIK, W. K. (1981). Protein σ1 is the reovirus cell attachment protein. Virology 108, 156-163.
LI, J. K.-K., SCHEIBLE, P. P., KEENE, J. D., and JOKLIK, W. K. (1980). Nature of the 3'-terminal sequences of the plus and minus strands of the S1 gene of reovirus serotypes 1, 2, and 3. Virology 105, 41-51.
MESSING, J. (1983). New M13 vectors for cloning. In "Methods in Enzymology" (R. Wu, L. Grossman, and K. Moldave, Eds.), Vol. 101, pp. 20-78. Academic Press, New York.
MUNEMITSU, J.-M., ATWATER, J. A., and SAMUEL, C. E. (1986). Biosynthesis of reovirus-specified polypeptides: Molecular cDNA cloning and nucleotide sequence of the reovirus serotype 1 Lang strain bicistronic S1 mRNA which encodes the minor capsid polypeptide σ1a and the nonstructural polypeptide σ1bNS. Biochem. Biophys. Res. Commun. 140, 508-514.
NAGATA, L., MASRI, S. A., MAH, D. C. W., and LEE, P. W. K. (1984). Molecular cloning and sequencing of the reovirus (serotype 3) S1 gene which encodes the viral cell attachment protein σ1. Nucleic Acids Res. 12, 8699-8710.
NAGATA, L., MASRI, S. A., PON, R. T., and LEE, P. W. K. (1987). Analysis of functional domains on reovirus cell attachment protein σ1 using cloned S1 gene deletion mutants. Virology 160, 162-168.
NEEDLEMAN, S. B., and WUNSCH, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443-453.
PACITTI, A. F., and GENTSCH, J. R. (1987). Inhibition of reovirus type 3 binding to host cells by sialylated glycoproteins is mediated through the viral attachment protein. J. Virol. 61, 1407-1415.
PAUL, R. W., and LEE, P. W. K. (1987). Glycophorin is the reovirus receptor on human erythrocytes. Virology 159, 94-101.
PAUL, R. W., CHOI, A. H. C., and LEE, P. W. K. (1989). The α-anomeric form of sialic acid is the minimal determinant recognized by reovirus. Virology 172, 382-385.
PUSTELL, J., and KAFATOS, F. C. (1984). A convenient and adaptable package of computer programs for DNA and protein sequence management, analysis, and homology determination. Nucleic Acids Res. 12, 643-655.
ROSEN, L. (1960). Serologic grouping of reoviruses by hemagglutination-inhibition. Amer. J. Hyg. 71, 242-249.
SARKAR, G., PELLETIER, J., BASSEL-DUBY, R., JAYASURIYA, A., FIELDS, B. N., and SONENBERG, N. (1985). Identification of a new polypeptide coded by reovirus gene S1. J. Virol. 54, 720-725.
SPRIGGS, D. R., KAYE, K., and FIELDS, B. N. (1983). Topological analysis of the reovirus type 3 hemagglutinin. Virology 127, 220-224.
TABOR, S., and RICHARDSON, C. C. (1987). DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proc. Natl. Acad. Sci. USA 84, 4767-4771.
WEINER, H. L., DRAYNA, D., AVERILL, D. R., JR., and FIELDS, B. N. (1977). Molecular basis of reovirus virulence: Role of the S1 gene. Proc. Natl. Acad. Sci. USA 74, 5744-5748.
WEINER, H. L., and FIELDS, B. N. (1977). Neutralization of reovirus: The gene responsible for the neutralization antigen. J. Exp. Med. 146, 1305-1310.
WEINER, H. L., RAMIG, R. F., MUSTOE, T. A., and FIELDS, B. N. (1978). Identification of the gene coding for the hemagglutinin of reovirus. Virology 86, 581-584.
WEINER, H. L., GREENE, M. I., and FIELDS, B. N. (1980). Delayed hypersensitivity in mice infected with reovirus. I. Identification of host and viral gene products responsible for the immune response. J. Immunol. 125, 278-282.
WIENER, J. R., and JOKLIK, W. K. (1989). The sequences of the reovirus serotype 1, 2, and 3 L1 genome segments and analysis of the mode of divergence of the reovirus serotypes. Virology 169, 194-203.
WILLIAMS, W. V., GUY, H. R., RUBIN, D. H., ROBEY, F., MYERS, J. M., KIEBER-EMMONS, T., WEINER, D. B., and GREENE, M. I. (1988). Sequences of the cell-attachment sites of reovirus type 3 and its antiidiotypic/antireceptor antibody: Modeling of their three-dimensional structures. Proc. Natl. Acad. Sci. USA 85, 6488-6492.
YEUNG, M. C., GILL, M. J., SULEIMAN, S. A., SHAHRABADI, M. S., and LEE, P. W. K. (1987). Purification and characterization of the reovirus cell attachment protein σ1. Virology 156, 377-385.
YEUNG, M. C., LIM, D., DUNCAN, R., SHAHRABADI, M. S., CASHDOLLAR, L. W., and LEE, P. W. K. (1989). The cell attachment proteins of type 1 and type 3 reovirus are differentially susceptible to trypsin and chymotrypsin. Virology 170, 62-70.