VIROLOGY 177, 401-405 (1990)

Genome Organization and Taxonomic Position of Human Papillomavirus Type 47 Inferred from Its DNA Sequence¹

TOHRU KIYONO,* AYUMI ADACHI,† AND MASAHIDE ISHIBASHI*,²

*Laboratory of Viral Oncology, Research Institute, Aichi Cancer Center, Chikusa-ku, Nagoya 464, Japan; and †Department of Dermatology, Nagoya University, School of Medicine, 65, Tsurumai-cho, Showa-ku, Nagoya 466, Japan

Received January 5, 1990; accepted March 16, 1990
The complete nucleotide sequence of human papillomavirus type 47 (HPV-47) DNA isolated from a lesion of epidermodysplasia verruciformis (EV) was determined. A computer-aided comparison of HPV-47 with other EV-associated viruses, using the available sequence data on them, revealed that HPV-47 resembles both HPV-5 and HPV-8 as much as HPV-5 and HPV-8 resemble each other, and it led us to regard these three viruses as one cluster and HPV-19 and HPV-25 as another. The conclusion implies that HPV-47, as well as HPV-5 and HPV-8, is associated with the occurrence of cancer in EV. Two sets of splicing donor and acceptor sequences in HPV-47, which were previously shown to work in vivo, are also conserved in HPV-5 and HPV-8. One of them allows formation of an ORF predicted to encode an E1/E4 fused protein. © 1990 Academic Press, Inc.
Human papillomavirus type 47 (HPV-47) was first isolated from scrapes of the benign lesion of a patient who had suffered from epidermodysplasia verruciformis (EV) accompanied by skin cancer, and our comparison of the viral DNA with other papillomavirus DNAs using DNA hybridization techniques (1) led us to conclude that this virus belongs to one of three subgroups of EV-associated papillomaviruses, D1, to which viruses such as HPV-5 and HPV-8 also belong (2). To confirm the above conclusion and to infer the genetic organization of the virus, which is essential for its molecular-biological study (3), we determined the complete nucleotide sequence of the viral DNA. Here we report the sequence and describe several inferences drawn from its comparison with the sequences of several other papillomaviruses.

Four segments of HPV-47 DNA delimited by PstI sites were subcloned into pTZ18R (U.S. Biochemical Corp.) in either direction, and plasmids containing nested deletions in the viral DNA segment were prepared by the exonuclease III method (4). Double- or single-stranded plasmid DNAs (5) were sequenced by the dideoxy chain termination method (6) employing the Klenow enzyme or the modified T7 DNA polymerase (Sequenase kit of U.S. Biochemical Corp.). Thus, both strands of viral DNA were sequenced completely. The sequence data were compiled and analyzed using the Genetyx software package (Software Development Corp.).

The nucleotide sequence of HPV-47 DNA (Fig. 1) consists of 7726 nucleotide pairs, 42% of which are G:C pairs. When the whole nucleotide sequence of HPV-47 was aligned with that of HPV-5 (7) or HPV-8 (8) using an algorithm of Wilbur and Lipman (9), HPV-47 showed a high degree of homology to HPV-5 (79%) and HPV-8 (77%), a degree comparable to that between HPV-5 and HPV-8 (77%).

Like the other papillomavirus DNAs whose total nucleotide sequence has been elucidated, HPV-47 DNA has large open reading frames (ORFs) preferentially on one of the two strands (Fig. 2), in good agreement with the exclusive hybridization of viral RNAs with the other strand of viral DNA (3). Thus, we have called this strand the sense strand and the other the antisense strand, and have given individual ORFs on the sense strand the same names (Table 1) as those of HPV-5 and HPV-8 (7, 8) with which they show significant homology (see below). Like HPV-8, HPV-47 has no equivalents to ORF E5 or the ORF on the antisense strand of HPV-5 described by Zachow et al. (8). Consequently, it has become clear that HPV-47 ORFs are quite similar to those of HPV-5 and HPV-8 in terms of size and relative location on the viral genome. In addition, the upstream regulatory region (URR), which we operationally define as the segment delimited by the 3'-end of L1 and the first ATG of E6, is also similar to that of HPV-5 and HPV-8 in terms of size and organization of several signal sequences, such as the relative location of the CAAT box, TATA box, and PyGCCAA direct repeats (7, 8, 10).
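The base composition quoted above (42% G:C) is a simple count over the sequence. A minimal Python sketch of that calculation follows. The function name `gc_fraction` and the toy input are illustrative assumptions and are not part of the original analysis, which was carried out with the Genetyx package.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G or C among the unambiguous bases of seq."""
    # Keep only A, C, G, T so that whitespace, N, or gap symbols are ignored.
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T")
    return sum(b in "GC" for b in bases) / len(bases)


# Toy input (hypothetical, not taken from HPV-47).
toy = "GGCC AATT"
print(f"G+C fraction of toy sequence: {gc_fraction(toy):.2f}")
```

Applied to the full deposited sequence (Accession No. M32305), the same count would be expected to give a value close to the 0.42 reported in the text.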
¹ Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. M32305.
² To whom reprint requests should be addressed.
0042-6822/90 $3.00
Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.
SHORT COMMUNICATIONS

FIG. 1. The complete nucleotide sequence of the sense strand of HPV-47 DNA. Position 1 is defined by alignment with HPV-5 and HPV-8. (The 7726-nucleotide sequence listing, printed in the original in rows of 100 nucleotides numbered from 1 to 7701, is not reproduced here.)
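Given a sense-strand sequence such as the one in Fig. 1, the positions of initiation and termination codons in each reading frame (the quantity plotted in Fig. 2) can be tabulated with a short script. The sketch below is a hedged illustration only: `codon_positions` is a hypothetical helper, it scans the three forward frames of one strand, and the toy sequence is not HPV-47 data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def codon_positions(seq: str) -> dict:
    """Map each forward frame (0, 1, 2) to 1-based ATG and stop codon positions."""
    seq = seq.upper()
    result = {}
    for frame in range(3):
        starts, stops = [], []
        # Step through complete codons only; a trailing partial codon is skipped.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == START_CODON:
                starts.append(i + 1)  # report the position of the codon's first base
            elif codon in STOP_CODONS:
                stops.append(i + 1)
        result[frame] = {"start": starts, "stop": stops}
    return result


# Toy sequence (hypothetical).
for frame, hits in codon_positions("ATGAAATAGATGTAA").items():
    print(frame, hits)
```

An ORF in a given frame is then any stretch between two consecutive stop positions; the antisense strand could be handled by running the same function on the reverse complement.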
FIG. 2. The genome organization of HPV-47 DNA. Upper and lower panels show the distribution of termination codons (longer vertical lines) and initiation codons (shorter vertical lines) in all possible reading frames. Only the sites of signal sequences that were shown to be utilized in vivo (3), together with some restriction sites (PstI, BamHI, and EcoRI), are presented on the linear map drawn between the panels: early promoter sequence, splicing donor (D), splicing acceptor (A), and polyadenylation signal.

In order to determine the interrelations of HPV-5, HPV-8, and HPV-47 more precisely, the similarity of amino acid sequences coded by individual ORFs was analyzed and expressed as indices (Table 2a). It is clear that indices between HPV-47 and HPV-5 (or HPV-8) are roughly the same as those between HPV-5 and
HPV-8 for all the ORFs. These analytical results indicate that HPV-47 resembles both HPV-5 and HPV-8 as much as HPV-5 and HPV-8 resemble each other, and they rule out the possibility that HPV-47 is an intratypic variant of either one of them.

Since partial nucleotide sequences including the URR of two additional D1 subgroup viruses, HPV-19 and HPV-25, have been reported (10), the interrelation among HPVs -5, -8, -19, -25, and -47 was assessed by comparing the URR of these viruses (Table 2b). The results led us to divide these five viruses into two clusters. HPV-5, HPV-8, and HPV-47, related closely to each other (70.7-71.2%), are regarded as one cluster, and HPV-19 and HPV-25, related more closely to each other (83.1%), are regarded as another cluster; viruses of different clusters are related less closely (60.5-66.8%) than those in the same cluster. This conclusion is further corroborated by the degree of conservation of the palindromic sequence, ACCN6GGT, which functions as the core sequence of the binding site of E2 protein in URR (7, 8, 10); i.e., HPVs of the former cluster conserve the palindrome at all four sites in URR, whereas those of the latter conserve it only at the third site. However, the conclusion must be further confirmed when more nucleotide sequences for the two viruses become available.

Recently, we detected HPV-47 DNA by the polymerase chain reaction method (12) in small clusters of malignant cells which appeared to proliferate in the lower dermis or in the muscle layer of the patient from whom HPV-47 was first isolated (in preparation). This finding, together with the close homology of HPV-47 to HPV-5 and HPV-8, suggests that HPV-47 is also associated with malignant conversion of EV lesions.

TABLE 1
OPEN READING FRAMES OF HPV-47

                 Position
ORF    5'-end   First ATG   3'-end   No. of bases   No. of amino acidsᵃ
E6       145       208        675        468              156
E7       623       668        976        309              103
E1       918       966       2780       1815              605
E2      2701      2725       4242       1518              506
E4      3083      3086       3997        912              304
L2      4322      4334       5887       1554              518
L1      5891      5903       7444       1542              514

ᵃ Numbers of amino acid residues constituting polypeptides which are expected to be synthesized if translation takes place from the first ATG of individual ORFs.
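The last two columns of Table 1 appear to follow from the position columns by simple arithmetic: the number of bases equals the 3'-end position minus the first-ATG position plus one, and the number of amino acids is that value divided by three. This relation is inferred from the tabulated numbers rather than stated in the text, and the helper name `orf_size` below is hypothetical. A short Python check:

```python
# ORF: (first ATG, 3'-end, no. of bases, no. of amino acids), as printed in Table 1.
TABLE_1 = {
    "E6": (208, 675, 468, 156),
    "E7": (668, 976, 309, 103),
    "E1": (966, 2780, 1815, 605),
    "E2": (2725, 4242, 1518, 506),
    "E4": (3086, 3997, 912, 304),
    "L2": (4334, 5887, 1554, 518),
    "L1": (5903, 7444, 1542, 514),
}


def orf_size(first_atg: int, end3: int) -> tuple:
    """Return (bases, codons) for an ORF given inclusive 1-based coordinates."""
    bases = end3 - first_atg + 1
    if bases % 3:
        raise ValueError("coordinates do not span a whole number of codons")
    return bases, bases // 3


for name, (atg, end3, bases, residues) in TABLE_1.items():
    status = "ok" if orf_size(atg, end3) == (bases, residues) else "MISMATCH"
    print(name, orf_size(atg, end3), status)
```

Every row of Table 1 is consistent with this relation, which is a convenient sanity check when transcribing coordinates from the printed table.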
TABLE 2a
SIMILARITIES OF HPV-5, HPV-8, AND HPV-47 FOR AMINO ACID SEQUENCES CODED BY INDIVIDUAL ORFs

ORF            HPV-47 vs HPV-5   HPV-47 vs HPV-8   HPV-5 vs HPV-8
E6     %ᵃ           75.8              69.2              70.0
       gapᵇ         1 (1)             2 (5)             1 (2)
E7     %            79.6              81.5              75.7
       gap          0 (0)             0 (0)             0 (0)
E1     %            88.1              84.3              83.6
       gap          2 (3)             2 (2)             2 (3)
E2     %            71.1              70.6              68.8
       gap          6 (14)            8 (18)            4 (18)
E4ᶜ    %            62.7              55.9              63.3
       gap          7 (12)            8 (20)            8 (22)
L2     %            87.2              86.6              86.6
       gap          0 (0)             0 (0)             0 (0)
L1ᵈ    %            91.0              87.5              87.2
       gap          1 (2)             2 (2)             2 (2)

TABLE 2b
SIMILARITIES OF FIVE SUBGROUP D1 PAPILLOMAVIRUSES FOR THEIR NUCLEOTIDE SEQUENCES OF URR

                  HPV-25     HPV-19     HPV-8      HPV-5
HPV-47    %        60.5       62.7       71.2       70.7
          gap     24 (50)    25 (76)    20 (48)    15 (40)
HPV-5     %        63.8       63.6       70.7
          gap     21 (44)    20 (50)    19 (62)
HPV-8     %        64.8       66.8
          gap     21 (68)    16 (56)
HPV-19    %        83.1
          gap     12 (18)
ᵃ Percentage expression of the number of matched amino acids, when amino acid sequences deduced from the nucleotide sequences of an ORF of a pair of viruses are aligned by a computer program, "Maximum match," of Genetyx. The program is based on Needleman and Wunsch's algorithm (13), where an amino acid match between two sequences is weighted with -1, a mismatch with 1, and a gap with n + 2 (n = the gap length), and the program is designed to search for the alignment in which the sum of weights is minimum.
ᵇ Figures indicate numbers of gaps in pairs of aligned sequences, and figures in parentheses indicate the sum of gap lengths over all the gaps.
ᶜ The 5'-terminal segment (positions 3083-3277) of HPV-47 E4 ORF including nine ATGs was eliminated so that only the segment which corresponded to E4 ORFs of HPV-5 and HPV-8, which had no internal ATG, was used for the computation. For details, see text.
ᵈ The amino acid sequence starting from the second methionine of HPV-8 L1 was used for the computation, because it corresponds to that from the first methionine of the other HPVs' L1.
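The weighting scheme described in the first footnote (match -1, mismatch +1, a gap of length n weighted n + 2) can be made concrete by scoring an alignment that has already been built. The Python sketch below does only that: it does not search for the minimum-weight alignment and is not a reimplementation of the Genetyx "Maximum match" program. The function name, the `-` gap symbol, and the toy peptides are assumptions made for illustration.

```python
GAP = "-"


def alignment_weight(a: str, b: str) -> tuple:
    """Score two gapped, equal-length sequences.

    Returns (weight, matches, number of gaps, summed gap length), with
    match = -1, mismatch = +1, and a gap of length n = n + 2.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    weight = matches = gaps = gap_total = 0
    run, run_side = 0, None  # length and owner of the gap run currently open
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            raise ValueError("column with a gap in both sequences")
        side = 0 if x == GAP else 1 if y == GAP else None
        if side != run_side and run:
            # The open gap run ends here: charge n + 2 for it.
            weight += run + 2
            gaps += 1
            gap_total += run
            run = 0
        run_side = side
        if side is None:
            if x == y:
                weight -= 1
                matches += 1
            else:
                weight += 1
        else:
            run += 1
    if run:  # a gap run that reaches the end of the alignment
        weight += run + 2
        gaps += 1
        gap_total += run
    return weight, matches, gaps, gap_total


# Toy peptides (hypothetical): one mismatch and one gap of length 3.
print(alignment_weight("MKTAALV", "MR---LV"))
```

The last two returned values correspond to the "gap" entries of Table 2 (number of gaps, with the summed gap length in parentheses). The denominator used for the percentage of matched residues is not specified in the footnote, so it is deliberately not computed here.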
TABLE 3
ORGANIZATION OF E1/E4 OPEN READING FRAMES

                       Position
HPV    5'-end   First ATG   Donor   Acceptor   3'-end   No. of amino acidsᵃ
47       918       966        981     3324       3997          230
5        913       961        982     3322       4019          240
8        783       951        966     3303       3952          222

ᵃ Numbers of amino acid residues constituting polypeptides which are expected to be synthesized if translation takes place from the ATG in the fused ORF.

Two sets of splicing donor and acceptor signals and a polyadenylation signal (AATAAA) (Fig. 2) have been shown to work in rat cells transfected with HPV-47 DNA (3). The RNA spliced at the signals of the first set forms a fused ORF E1/E4. The signals of the second set are located at positions 1358 and 2678, respectively. Consensus sequences for splicing at the above four sites and the polyadenylation signal are also conserved in HPV-5 and HPV-8, indicating that these signal sequences are also used in them. If the above notion is correct, it is very likely that the E4 ORFs of these three viruses are expressed as E1/E4 fused proteins (Table 3) and that none of the nine in-frame ATGs located in HPV-47 ORF E4 functions as a translational initiation codon. Since the acceptor signal of the latter set is located immediately upstream of the first ATG of ORF E2, the splicing may contribute to efficient expression of E2 protein.

From the sequences deduced from their ORFs, it has been noted that papillomaviruses have conserved the N-terminal and C-terminal segments of E2 proteins relatively well during evolution, whereas they have diversified the middle segment not only in terms of its length but also in terms of its amino acid composition (2, 11; Table 4). Now, it has become clear that the three EV-associated viruses whose DNA sequences have been completely determined share common features for this middle segment; i.e., their middle segment is far longer and richer in arginine and serine residues than those of other papillomaviruses (Table 4). The segment makes the protein more hydrophilic and more positive in charge than the corresponding proteins of other papillomaviruses. It remains
to be elucidated whether the segment affects the manner of protein binding to the target sequences in URR.

ACKNOWLEDGMENTS

We are grateful to Ms. T. Yoshida for her expert technical assistance. This work was supported in part by a Grant-in-Aid for Scientific Research from the Ministry of Education, Science, and Culture, Japan.

TABLE 4
AMINO ACID RESIDUES OF E2 PROTEINS AND THEIR SEGMENTS

          Total             N-terminal segmentᵃ   Middle segment   C-terminal segment
HPV-47    506ᵇ (45, 93)ᶜ    187 (24, 18)          242 (13, 66)     77 (8, 14)
HPV-8     498 (44, 96)      187 (24, 19)          234 (12, 63)     77 (8, 14)
HPV-1a    401 (45, 55)      188 (24, 18)          136 (11, 24)     77 (9, 13)
HPV-16    365 (38, 37)      186 (26, 18)          106 (8, 9)       73 (4, 10)
BPV-1     410 (50, 45)      188 (24, 22)          144 (21, 13)     78 (5, 10)

ᵃ Polypeptide segment of BPV-1 E2 protein starting from the N-terminus to the 188th amino acid residue, valine, which is conserved in all the papillomavirus E2 proteins to date (see Ref. (2)). Corresponding polypeptide segments of other papillomaviruses were operationally regarded as N-terminal segments. Similarly, the polypeptide segment of BPV-1 E2 protein starting from the 333rd amino acid residue, glycine, which is also conserved in all the papillomavirus E2 proteins, to the C-terminus and corresponding segments of other papillomavirus E2 proteins were regarded as C-terminal segments. The segment between the N-terminal and C-terminal segments was regarded as the middle segment.
ᵇ Number of amino acid residues constituting the polypeptide.
ᶜ The first figure in parentheses is the sum of aspartic acid and glutamic acid residues in the polypeptide, and the second is the sum of arginine and lysine residues in the polypeptide.
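The segment definitions used for Table 4 imply that the middle-segment length is the total length minus the N-terminal and C-terminal lengths, and the statement that the middle segment makes the EV-associated E2 proteins more positively charged can be read off as the excess of basic over acidic residues. The Python sketch below works through both using the values as printed in Table 4. The names `middle_length` and `basic_excess` are hypothetical, and the basic-minus-acidic count is only a rough proxy for charge that is assumed here for illustration.

```python
# virus: (total, N-terminal, middle, C-terminal); each segment is
# (residues, acidic Asp+Glu, basic Arg+Lys), as printed in Table 4.
TABLE_4 = {
    "HPV-47": ((506, 45, 93), (187, 24, 18), (242, 13, 66), (77, 8, 14)),
    "HPV-8": ((498, 44, 96), (187, 24, 19), (234, 12, 63), (77, 8, 14)),
    "HPV-1a": ((401, 45, 55), (188, 24, 18), (136, 11, 24), (77, 9, 13)),
    "HPV-16": ((365, 38, 37), (186, 26, 18), (106, 8, 9), (73, 4, 10)),
    "BPV-1": ((410, 50, 45), (188, 24, 22), (144, 21, 13), (78, 5, 10)),
}


def middle_length(total: int, n_term: int, c_term: int) -> int:
    """Length of the middle segment implied by the other three lengths."""
    return total - n_term - c_term


def basic_excess(segment: tuple) -> int:
    """Basic minus acidic residue count for one (residues, acidic, basic) entry."""
    _, acidic, basic = segment
    return basic - acidic


for virus, (total, n_seg, mid, c_seg) in TABLE_4.items():
    implied = middle_length(total[0], n_seg[0], c_seg[0])
    print(virus, "middle:", mid[0], "implied:", implied,
          "basic excess:", basic_excess(mid))
```

The segment lengths are internally consistent for all five viruses; only the lengths are cross-checked here, not the acidic and basic residue sums.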
REFERENCES

1. ADACHI, A., YASUE, H., OHASHI, M., and ISHIBASHI, M., Japan. J. Cancer Res. 77, 978-984 (1986).
2. PFISTER, H., In "Papovaviridae 2" (N. P. Salzman, Ed.), pp. 1-38. Plenum, New York, 1987.
3. KIYONO, T., NAGASHIMA, K., and ISHIBASHI, M., Virology 173, 551-565 (1989).
4. HENIKOFF, S., In "Methods in Enzymology" (R. Wu, Ed.), Vol. 155, pp. 156-165. Academic Press, London, 1987.
5. VIEIRA, J., and MESSING, J., In "Methods in Enzymology" (R. Wu and L. Grossman, Eds.), Vol. 153, pp. 3-11. Academic Press, London, 1987.
6. SANGER, F., NICKLEN, S., and COULSON, A. R., Proc. Natl. Acad. Sci. USA 74, 5463-5467 (1977).
7. FUCHS, P. G., IFTNER, T., WENINGER, J., and PFISTER, H., J. Virol. 58, 626-634 (1986).
8. ZACHOW, K. R., OSTROW, R. S., and FARAS, A. J., Virology 158, 251-254 (1987).
9. WILBUR, W. J., and LIPMAN, D. J., Proc. Natl. Acad. Sci. USA 80, 726-730 (1983).
10. KRUBKE, J., KRAUS, J., DELIUS, H., CHOW, L., BROKER, T., IFTNER, T., and PFISTER, H., J. Gen. Virol. 68, 3091-3103 (1987).
11. BAKER, C. C., In "Papovaviridae 2" (N. P. Salzman, Ed.), pp. 321-385. Plenum, New York, 1987.
12. SAIKI, R. K., GELFAND, D. H., STOFFEL, S. J., HIGUCHI, R., HORN, G. T., MULLIS, K. B., and ERLICH, H. A., Science 239, 487-491 (1988).
13. NEEDLEMAN, S. B., and WUNSCH, C. D., J. Mol. Biol. 48, 443-453 (1970).