VIROLOGY 165, 601-605 (1988)
SHORT COMMUNICATIONS
EIAV Genomic Organization: Further Characterization by Sequencing of Purified Glycoproteins and cDNA¹
JUDITH M. BALL,* SUSAN L. PAYNE,* CHARLES J. ISSEL,† AND RONALD C. MONTELARO*,²
*Department of Biochemistry and †Departments of Veterinary Science and Veterinary Microbiology, Louisiana State University and Louisiana Agricultural Experiment Station, Baton Rouge, Louisiana 70803
Received December 21, 1987; accepted April 19, 1988
Nucleotide sequence analyses of two different proviral clones of equine infectious anemia virus (EIAV), designated λ12 (K. Rushlow et al., 1986, Virology 155, 309-321) and 1369 (T. Kawakami et al., 1987, Virology 158, 300-312), indicate significant differences in the organization of two critical regions of the viral genome, i.e., in the short open reading frames in the pol-env intergenic region and in the 5’-end of the env gene. To determine the correct structure of the EIAV genome, we have performed nucleotide sequence analyses of cDNA clones produced from viral RNA and direct sequencing of purified EIAV envelope glycoproteins (gp90 and gp45). The results of the cDNA sequencing confirm the presence of two short open reading frames in the pol-env intergenic region, as reported previously for the λ12 clone. The protein sequencing data correlated exactly with the amino-terminal sequences of gp90 and gp45 deduced from λ12 nucleotide sequences. However, the protein sequencing also revealed that the putative signal sequence of EIAV gp90 is not removed during processing. Thus, EIAV apparently contains short open reading frames analogous to human immunodeficiency virus, but differs in its mode of env polyprotein processing. © 1988 Academic Press, Inc.
¹ Approved for publication by the Director of the Louisiana Agricultural Experiment Station as publication number 88-12-2035.
² To whom requests for reprints should be addressed.
0042-6822/88 $3.00 Copyright © 1988 by Academic Press, Inc. All rights of reproduction in any form reserved.

Based on similarities in morphology, serology, target cells, genomic organization, and gag and pol gene sequences, the lentivirus subfamily currently consists of visna virus of sheep, caprine arthritis-encephalitis virus of goats, equine infectious anemia virus (EIAV) of horses, and the human (HIV) and simian (SIV) immunodeficiency viruses (1-5). Recent isolates of feline T-lymphotropic virus (6) and bovine immunodeficiency virus (7) apparently will also be classified as lentiviruses. Two properties that distinguish lentiviruses from other retroviruses and that evidently are important for viral persistence and pathogenicity are env gene variability and the short open reading frames observed in lentivirus genomes. Thus special attention has been focused on these gene regions.

We previously reported the nucleotide sequence of the LTR, env gene, and pol-env intergenic region of a proviral clone λ12 derived from cells infected with the Malmquist cell-adapted strain of EIAV (8). These sequences, together with the gag-pol sequences reported by Stephens et al. (4) for the same provirus clone, provided the first complete elucidation of the EIAV genome. The results of these studies demonstrated a genomic organization for EIAV similar to that reported for HIV, i.e., overlapping gag and pol genes, nonoverlapping env gene, two short open reading frames (orfs), S1 and S2, in the pol-env intergenic region, and a third orf (S3) contained in the 3’-env gene sequences.

More recently Kawakami et al. (9) reported the nucleotide sequence of an independent EIAV provirus clone (designated 1369) also derived from the Malmquist cell-adapted strain of EIAV. The latter proviral sequences differ from the initial λ12 sequences in two significant aspects. First, the S2 orf identified by Rushlow et al. (8) is not present in the proviral clone described by Kawakami et al. (9). Second, a comparison of deduced amino acid sequences for the respective env genes of the two provirus clones reveals marked differences (90%) in the first 11 amino acids. These differences can be attributed to a single base change in the nucleic acid sequence.

The EIAV genome mediates trans-activation by transcriptional and post-transcriptional mechanisms and this activity has been mapped to the region encompassing the 3’- and 5’-termini of the pol and env genes, respectively (10, 11). Given the importance of the EIAV orf and env genes, we sought to achieve a definitive elucidation of the EIAV genome structure and to resolve the differences in reported sequences. This was accomplished by direct amino acid sequencing of purified env gene products, gp90 and gp45, and by sequencing cDNA clones representative of the pol-env intergenic region.

Absolute confirmation of deduced EIAV env gene sequences was obtained by direct protein sequencing of
the mature viral env glycoproteins. The EIAV gp90 and gp45 glycoproteins were purified to homogeneity from the prototype strain of EIAV (8, 7) employing a recently described high-pressure liquid chromatography (HPLC) procedure (12). The SDS-PAGE profile of the purified glycoproteins (Fig. 1) indicates a single protein species representative of the purity of these protein preparations. The isolated proteins were subsequently subjected to protein sequencing with an Applied Biosystems 470-A gas-phase sequencer. Resultant phenylthiohydantoin (PTH)-amino acid derivatives were identified by reverse-phase HPLC. Figure 2 shows the chromatograms of cycles 1-4 and 10 for each glycoprotein. The first 18 N-terminal amino acids of gp90 (YGGIPGGISTPITQQSEK) and the first 20 N-terminal amino acids of gp45 (DFGISAIVAAIVAATAIAAS) were readily identified using these techniques. The presence of a single amino acid peak in each cycle confirms the purity of the protein preparations. The sequencing method utilized is sensitive to the picomole level, thus minor contaminants could easily be identified. As seen in Fig. 2, no contaminating peaks are visible, indicating the high degree of purity of both glycoproteins.

FIG. 1. SDS-PAGE analysis of EIAV glycoproteins purified by reverse-phase HPLC (12). Protein bands were visualized by Coomassie blue staining. Lane A contains purified gp90 and lane B contains gp45. EIAV (V) and protein molecular weight standards (S) are indicated and included as references.

The amino-terminal sequences determined for each purified glycoprotein correspond exactly to the sequences deduced from our proviral clone (Fig. 3), confirming the identity of the env gene as predicted by nucleotide sequencing. In addition, direct protein sequencing reveals processing events that occur during the production of infectious virions. An unexpected finding is that the N-terminal residue of gp90 is tyrosine, the 7th residue encoded by the env gene. The gp90 amino terminus was previously predicted to be glutamate, the 16th residue, based on the assumption that the initial 15 hydrophobic or uncharged residues constitute a signal peptide removed during polyprotein processing. The elucidation of the gp45 N-terminal residue as aspartate confirms the previously predicted cleavage site between the env gp90 and gp45 components (8).

It has generally been assumed that the signal peptides of lentiviruses can be identified from nucleotide sequence data and that these signal sequences are removed completely during processing of the envelope polyprotein to mature viral proteins. However, a number of signal peptides are retained in a variety of secreted or membrane-associated proteins (13). In HIV, the entire 30 amino acid signal peptide is indeed removed during processing of gp160 to gp120 and gp41 as confirmed by amino acid sequencing (14).
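The position of the gp90 amino terminus within the env reading frame can be checked mechanically. The following Python sketch translates the 5’ end of the env open reading frame as printed in Fig. 3 and locates the sequenced gp90 peptide within it. The names `CODON_TABLE`, `translate`, `ENV_5PRIME`, and `GP90_N_TERMINUS` are illustrative choices, not part of the original work, and the nucleotide string is a transcription of the figure, so the result is only as reliable as that transcription.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate the complete codons of `dna`, reading from its first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# 5' end of the env reading frame (transcribed from Fig. 3),
# beginning at the initiator ATG. Assumed to be transcribed correctly.
ENV_5PRIME = (
    "ATGGTCAGCATCGCATTCTATGGGGGGATCCCAGGGGGAATC"
    "TCAACCCCTATTACCCAACAGTCAGAAAAA"
)

# N-terminal gp90 sequence determined by Edman degradation.
GP90_N_TERMINUS = "YGGIPGGISTPITQQSEK"

env_protein = translate(ENV_5PRIME)
# str.index is 0-based; add 1 to report a 1-based residue number.
first_residue = env_protein.index(GP90_N_TERMINUS) + 1

print(env_protein)
print("gp90 begins at env residue", first_residue)
```

Under these assumptions the sequenced peptide is found beginning at the tyrosine that is the 7th residue of the deduced env product, in line with the statement above that the residues preceding it are not part of a fully removed signal peptide.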
The first 25 to 35 amino acids of the SIV gp110 are hydrophobic in nature, but lack the hydrophilic residues that usually precede the hydrophobic residues of a signal peptide (15). In contrast, in visna virus the putative signal sequence is a stretch of 25 hydrophobic amino acids that is preceded by 78 hydrophilic residues. Analysis of the amino-terminal amino acid sequences of these respective envelope proteins will help determine if complete or partial signal peptide sequence removal is the usual route of lentivirus env protein processing. In the case of EIAV, the gp90 amino terminus appears to be a highly conserved region among serologically distinct virus isolates (16), making it an interesting candidate for vaccine development.

FIG. 2. Amino acid sequencing of purified EIAV glycoproteins. HPLC purified gp90 and gp45 (cf. Fig. 1) were analyzed by protein sequencing on an automated Applied Biosystems 470-A gas-phase sequencer and the resultant phenylthiohydantoin (PTH)-amino acid derivatives identified by RP-HPLC, as described (12). Edman degradation cycles 1 through 4 and cycle 10 are presented for each glycoprotein. Cycle 10 was included to demonstrate the low background and high repetitive yield after 10 cycles. (a) N-terminal sequence analysis of gp90. Eighteen residues from the N-terminus (YGGIPGGISTPITQQSEK) were identified and were in complete agreement with the deduced amino acid sequence (8). (b) N-terminal sequence analysis of gp45. Twenty residues from the N-terminus (DFGISAIVAAIVAATAIAAS) were identified and were in complete agreement with the deduced amino acid sequence (8). [Chromatogram panels not reproduced; horizontal axis: time (minutes).]

Confirmation of env gene proviral sequences and the pol-env intergenic region was further accomplished by sequencing cDNA clones generated from gradient-purified viral RNA isolated from the prototype strain of
virus previously used for provirus isolation (7, 8, 16). EIAV-specific cDNA was synthesized by reverse transcription of the viral genomic RNA using oligo(dT) primers, cloned into M13 vectors, and sequenced by the dideoxy chain termination method as described (16). The results of these cDNA sequencing studies (Fig. 3) reveal an env gene region with 99% homology to that reported previously for the λ12 proviral clone, including orf S1 and S2. The deduced amino acid sequences for orfs S1, S2, and the amino terminus of gp90 share 100, 98, and 100% homology, respectively, between the proviral and cDNA clones (Fig. 3). The cDNA nucleotide and glycoprotein amino acid sequencing data presented here confirm the genomic sequences reported previously for the λ12 proviral clone of EIAV (8). Thus, it appears that the proviral sequences reported by Kawakami et al. (9) may represent a defect in that particular proviral clone. This is not surprising, as lentivirus proviral clones are frequently defective (3, 4, 8, 9, 11). In this regard, a comparison of the two reported proviral sequences reveals a single base deletion in the nucleotide sequence of the 1369 clone relative to the λ12 proviral clone. The resulting frame shift leads to an altered 5’-env gene sequence and the termination of orf S2 after 34 codons; addition of a G residue at position 5333 of the clone 1369 sequence of Kawakami et al. (9) opens the S2 reading frame through 66 codons and changes the amino-terminal env sequences to match exactly with the λ12 clone.

In summary, the data presented here clarify the genomic organization of EIAV and confirm previous conclusions of its similarity to HIV, particularly with regard to the presence of the two short open reading frames in the pol-env intergenic region. Moreover, the gp90 and gp45 amino acid sequencing data provide a complete model for the processing of these important viral antigens.

Positions 1-90 (POL STOP at 37-39)
λ12 provirus  CCATTAACCAGGACTAAGTTACTAATAAAGCCAAATTGAGTATTGTTGCAGGAAGCAAGACCCAACTACCATTGTCAGCTGTGTTTCCTG
translation   ProLeuThrArgThrLysLeuLeuIleLysProAsnEndValLeuLeuGlnGluAlaArgProAsnTyrHisCysGlnLeuCysPheLeu
cDNA          identical to provirus

Positions 91-180 (S1 START is marked in this block)
λ12 provirus  AGGTCTCTAGGAATTGATTACCTCGATGCTTCATTAAGGAAGAAGAATAAACAAAGACTGAAGGCAATCCAACAAGGAAGACAACCTCAA
translation   ArgSerLeuGlyIleAspTyrLeuAspAlaSerLeuArgLysLysAsnLysGlnArgLeuLysAlaIleGlnGlnGlyArgGlnProGln
cDNA          identical to provirus

Positions 181-270 (S2 START and ENV START mark the beginnings of the S2 and env rows; env initiation Met at 229-231; ▼ gp90 amino terminus, Tyr at 247-249)
λ12 provirus  TATTTGTTATAAGGTTTGATATATGGGATTATTTGGTAAAGGGGTAACATGGTCAGCATCGCATTCTATGGGGGGATCCCAGGGGGAATC
translation   TyrLeuLeuEnd
S2                               TyrMetGlyLeuPheGlyLysGlyValThrTrpSerAlaSerHisSerMetGlyGlySerGlnGlyGluSe
env                                                  ArgGlyAsnMetValSerIleAlaPheTyrGlyGlyIleProGlyGlyIle
cDNA          identical to provirus except for a single G substitution (Val)

Positions 271-360
λ12 provirus  TCAACCCCTATTACCCAACAGTCAGAAAAATCTAAGTGTGAGGAGAACACAATGTTTCAACCTTATTGTTATAATAATGACAGTAAGAAC
env           SerThrProIleThrGlnGlnSerGluLysSerLysCysGluGluAsnThrMetPheGlnProTyrCysTyrAsnAsnAspSerLysAsn
S2            rGlnProLeuLeuProAsnSerGlnLysAsnLeuSerValArgArgThrGlnCysPheAsnLeuIleValIleIleMetThrValArgTh
cDNA          identical to provirus

Positions 361-450
λ12 provirus  AGCATGGCAGAATCGAAGGAAGCAAGAGACCAAGAAATGAACCTGAAAGAAGAATCTAAAGAAGAAAAAAGAAGAAATGACTGGTGGAAA
env           SerMetAlaGluSerLysGluAlaArgAspGlnGluMetAsnLeuLysGluGluSerLysGluGluLysArgArgAsnAspTrpTrpLys
S2            rAlaTrpGlnAsnArgArgLysGlnGluThrLysLysEnd
cDNA          identical to provirus

FIG. 3. Comparison of nucleotide sequence data obtained from proviral and cDNA clones of EIAV. The sequence of the pol-env intergenic region and the first 77 codons of the env gene are shown. Dots (.) indicate sequence identity between the cDNA clone and the provirus clone. Arrows mark the start of the S1, S2, and env open reading frames. The env gene initiation methionine is underlined. The amino terminus of gp90 (as determined by protein sequencing) is indicated with an inverted triangle (▼).
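The frameshift argument made for clone 1369 can be illustrated with the S2 region shown in Fig. 3. The Python sketch below translates orf S2 from the λ12 sequence and from a copy lacking one G, then counts codons up to and including the stop codon. Several things here are assumptions made for illustration only: the position of the missing base in clone 1369 is not mapped onto Fig. 3 in the text, so the sketch simply deletes a G from the first run of six G residues; codons are counted from the S2 ATG; and all names (`translate_orf`, `LAMBDA12`, and so on) are hypothetical. The nucleotide string is transcribed from the figure.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(dna: str, start: int) -> str:
    """Translate from `start` up to and including the first stop codon ('*')."""
    residues = []
    for i in range(start, len(dna) - 2, 3):
        residues.append(CODON_TABLE[dna[i:i + 3]])
        if residues[-1] == "*":
            break
    return "".join(residues)


# Positions 181-400 of the lambda-12 sequence, transcribed from Fig. 3.
LAMBDA12 = (
    "TATTTGTTATAAGGTTTGATATATGGGATTATTTGGTAAAGGGGTAACATGGTCAGCATCGCATTCTATGGGGGGATCCCAGGGGGAATC"
    "TCAACCCCTATTACCCAACAGTCAGAAAAATCTAAGTGTGAGGAGAACACAATGTTTCAACCTTATTGTTATAATAATGACAGTAAGAAC"
    "AGCATGGCAGAATCGAAGGAAGCAAGAGACCAAGAAATGA"
)

# S2 initiator ATG (first ATG in this window).
s2_start = LAMBDA12.index("ATGGGATTA")

# ASSUMPTION: model the single-base deletion by removing one G from the
# first run of six G residues; the true position in clone 1369 may differ.
g_run = LAMBDA12.index("GGGGGG")
one_g_deleted = LAMBDA12[:g_run] + LAMBDA12[g_run + 1:]

intact_s2 = translate_orf(LAMBDA12, s2_start)
shifted_s2 = translate_orf(one_g_deleted, s2_start)

print("intact S2 codons (incl. stop): ", len(intact_s2))
print("shifted S2 codons (incl. stop):", len(shifted_s2))
```

With this counting convention the sketch gives 66 codons for the intact frame and 34 for the frame lacking one G, the same figures quoted in the text. Deleting any single G from that G-rich stretch yields the same downstream reading frame, which is why the exact choice of base does not change the outcome of this illustration.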
ACKNOWLEDGMENTS
This study was supported by the Louisiana Agricultural Experiment Station, the Louisiana State University School of Veterinary Medicine, Public Health Service Grant CA-38851 from the National Cancer Institute, Grant GAM 6502128 from the United States Department of Agriculture, and Public Health Service Grant AI-25850 from the National Institute of Allergy and Infectious Diseases. We acknowledge Dr. V. S. V. Rao for his excellent technical assistance with automatic protein sequencing.
REFERENCES
1. CHIU, I. M., YANIV, A., DAHLBERG, J. E., GAZIT, A., SKUNTZ, S. F., TRONICK, S. R., and AARONSON, S. A., Nature (London) 317, 366-368 (1985).
2. MURPHEY-CORB, M., MARTIN, L. N., RANGAN, S. R. S., BASKIN, G. B., GORMUS, B. J., WOLF, R. H., ANDES, W. A., WEST, M., and MONTELARO, R. C., Nature (London) 321, 435-437 (1986).
3. SONIGO, P., ALIZON, M., STASKUS, K., KLATZMANN, D., COLE, S., DANOS, O., RETZEL, E., TIOLLAIS, P., HAASE, A., and WAIN-HOBSON, S., Cell 42, 369-382 (1985).
4. STEPHENS, R. M., CASEY, J. W., and RICE, N. R., Science 231, 589-594 (1986).
5. YANIV, A., DAHLBERG, J. E., TRONICK, S. R., CHIU, I. M., and AARONSON, S. A., Virology 145, 340-345 (1985).
6. PEDERSEN, N. C., HO, E. W., BROWN, M. L., and YAMAMOTO, J. K., Science 235, 790-793 (1987).
7. GONDA, M. A., BRAUN, M. J., CARTER, S. G., KOST, T. A., BESS, JR., J. W., ARTHUR, L. O., and VAN DER MAATEN, M. J., Nature (London) 330, 388-390 (1987).
8. RUSHLOW, K., OLSEN, K., STIEGLER, G., PAYNE, S. L., MONTELARO, R. C., and ISSEL, C. J., Virology 155, 309-321 (1986).
9. KAWAKAMI, T., SHERMAN, L., DAHLBERG, J., GAZIT, A., YANIV, A., TRONICK, S. R., and AARONSON, S. A., Virology 158, 300-312 (1987).
10. DERSE, D., DORN, P. L., LEVY, L., STEPHENS, R. M., RICE, N. R., and CASEY, J. W., J. Virol. 61, 743-747 (1987).
11. SHERMAN, L., GAZIT, A., YANIV, A., KAWAKAMI, T., DAHLBERG, J. E., and TRONICK, S. R., J. Virol. 62, 120-126 (1988).
12. BALL, J. M., RAO, V. S. V., ROBEY, W. G., ISSEL, C. J., and MONTELARO, R. C., J. Virol. Methods 19, 265-277 (1988).
13. WICKNER, W. T., and LODISH, H. F., Science 230, 400-407 (1985).
14. ALLAN, J. S., COLIGAN, J. E., BARIN, F., MCLANE, M. F., SODROSKI, J. G., ROSEN, C. A., HASELTINE, W. A., LEE, T. H., and ESSEX, M., Science 228, 1091-1094 (1985).
15. HIRSCH, V., RIEDEL, N., and MULLINS, J. I., Cell 49, 309-319 (1987).
16. PAYNE, S. L., FANG, F., LIU, C., DHRUVA, B. R., RWAMBO, P., ISSEL, C. J., and MONTELARO, R. C., Virology 161, 321-331 (1987).