VIROLOGY
153, 96-112 (1986)
Nucleotide Sequence and Genetic Map of the 16-kb Vaccinia Virus HindIII D Fragment

EDWARD G. NILES,¹ RICHARD C. CONDIT, PAUL CARO, KIRK DAVIDSON, LINDA MATUSICK, AND JANNY SETO

Biochemistry Department, State University of New York, Buffalo, New York 14214

Received February 25, 1986; accepted April 8, 1986

We have determined the nucleotide sequence of the 16,059-bp HindIII D fragment from vaccinia virus strain WR. Translation in all 6 reading frames reveals a set of 22 open reading frames (ORFs), which are capable of encoding proteins ranging from 61 to 844 amino acids in length. With one exception, ORF 12, we have divided them into two primary sets according to their size. The minor group contains eight members ranging in length from 61 to 84 amino acids. The major group has thirteen members varying from 146 to 844 amino acids in length, and, in addition, due to its location on the DNA, one small ORF, 61 amino acids long. The neighboring major ORFs are closely packed along the DNA, being separated by 42 or fewer base pairs. In several instances the ends of adjoining ORFs overlap for up to 11 triplet codons. In three cases, 1 or 2 bases are shared between translation start and stop signals in adjacent ORFs. Regions of both strands of the DNA are transcribed. Two sets of temperature-sensitive mutations, totaling 17, which map to the HindIII D fragment, have been combined into eight complementation groups. The results of marker rescue analysis map one or more members of each group to a site in the HindIII D fragment within a defined open reading frame. © 1986 Academic Press, Inc.
INTRODUCTION

Poxviruses are a family of large, enveloped, double-stranded DNA viruses that have the peculiar property of replicating within the cytoplasm of infected cells (Moss, 1985). In order to carry on a cytoplasmic life cycle, they encode many, if not all, of the enzyme activities required to express and to regulate the expression of their early and late genes, and to replicate their DNA. Vaccinia virus, the quintessential poxvirus, has been the preferred subject for research during the past 2 decades and is the source of much of our knowledge of poxvirus molecular biology. Since the vaccinia genome is large, 180 kb, and encodes up to 200 gene products, we felt that it would be worthwhile to develop a detailed genetic and physical map of a limited but substantial segment of the genome. Through this approach we would define a template which would serve as a focal point for our future studies on gene regulation, DNA replication, and DNA recombination.

The choice of the HindIII D fragment was based on the following considerations. Belle-Isle et al. (1981) reported that the 16-kb HindIII D fragment encodes at least 15 gene products, 11 early and 4 late. Condit and Motyczka (1981), Condit et al. (1983), and Ensinger and Rovinsky (1983) reported that from their collections, 11 and 13 ts mutants, respectively, mapped in the HindIII D fragment. Together, these data describe the HindIII D fragment as a segment of the vaccinia genome which encodes up to 15 proteins, both early and late, has an extensive set of ts mutants, and has a size, 16 kb, which is amenable to the determination of the entire DNA sequence. In this report, we present the complete nucleotide sequence of the vaccinia virus HindIII D fragment DNA, along with a refined genetic map of the eight complementation groups which reside in the DNA fragment.

¹ To whom requests for reprints should be addressed.
0042-6822/86 $3.00 Copyright © 1986 by Academic Press, Inc. All rights of reproduction in any form reserved.
METHODS AND MATERIALS

Cells. Bacterial strains Escherichia coli JM 83 and JM 101 were purchased from BRL. Bacteria containing the plasmids pUC 13 and pUC 19, and the bacteriophage M13mp10, M13mp11, M13mp18, and M13mp19 were purchased from BRL. Bacteria and M13 were grown as described (Messing, 1983). The HindIII D fragment (Belle-Isle et al., 1981), inserted into pBR 322, was provided by Dr. Bernard Moss, NIH. Wild-type vaccinia virus WR and selected temperature-sensitive mutants were propagated as described (Condit and Motyczka, 1981; Condit et al., 1983).

Recombinant DNA techniques. The restriction endonuclease cleavage sites for the enzymes SalI, PstI, BglII, BamHI, and EcoRI were mapped by a combination of single and double digestions of the isolated HindIII D fragment. The digestion products were inserted into the plasmid pUC 13, cut with the appropriate endonucleases, ligated, and employed in the transformation of E. coli JM 83. Plasmid DNA was prepared from cultures of transformed cells by the Triton lysis method described by Clewell and Helinsky (1972).

DNA sequence analysis. The 16-kb HindIII D fragment was divided into seven subsections (Fig. 1A), 774, 775, 787, 801, 804, 707, and 722, and the DNA sequence was determined for each segment independently. The shotgun cloning approach (Messing, 1983) was used for all fragments except 707. In that case, the exonuclease III treatment described by Henikoff (1984) was employed. Both methods were applied to the 775 fragment. The chain termination method of Sanger (1977) was used to carry out the sequencing. A total of 16 primers were synthesized in order both to sequence across the junctions of the adjacent fragments and to confirm the sequence in areas that were ambiguous. Each base was sequenced a minimum of three independent times on one strand or at least one time on both strands. Greater than 90% of the bases were sequenced more than three independent times. The data were analyzed by software written by Roger Staden (1982) for the DEC VAX 780.

Genetic analysis. Complementation analysis carried out by Condit et al. (1983) and Ensinger and Rovinsky (1983) demonstrated that there are 6 and 7 comple-
[Fig. 1 graphic not recoverable from the extracted text. Legible labels include the restriction-site key (SalI, PstI, BglII, BamHI, EcoRI), plasmid subclone numbers in panel A, ORF numbers in panel B, and complementation groups 1 to 8 with their C- and E-prefixed mutants in panel C.]
FIG. 1. Combined physical and genetic map of the vaccinia virus HindIII D DNA fragment. (A) Restriction endonuclease cleavage sites for the enzymes PstI, SalI, BglII, BamHI, and EcoRI are presented. The numbers designate the plasmid subclones constructed from the HindIII D fragment. (B) The size of each major open reading frame is proportional to the line length. The 3' end of the ORF is the pointed end of the arrow. (C) The current genetic map determined by marker rescue is presented. Each bar represents the length of the DNA fragment, and its map coordinates, which successfully rescues one or more members of each associated complementation group.
NILES ET AL.
1 AGCTTAAAAT AGCTCTAGCT AAAGGCATAG ATTACGAATA TAT GCTTGTTAAT 61 AACTMATGA AAAAAAACTA GIGGTTTATA ATMMCACG A'l9WA1 C CAACGTAGTA 121 TCATCTTCTA CTATTGCGAC GTATATAGAC GCTTTAGCGA AGAATGCTTC GGMTTAGAA 181 CAGAGCTCTA CCGCATACGA MTAMTAAT GAATTGGAAC TACTATTTAT TAAGCCGCCA 241 TTGATTACTT TCACAAATGT AGTGAATATC TCTACGATTC AGGAATCGTT TATTCGATTT 301 ACCGTTACTA ATAAGGAAGC TGTTAAAATr ACAACTAAGA TTCCATTATC TAAGGTACAT 361 CCTCTAGATG TAAAAAATGT ACAGTTACTA GATGCTATAC ATAACATAGT TTOCGAAAAG 421 AAATCATTAG TGATGGAAGA TCGTCTTCAC AAAGAATGCr TGTTGAGACT ATCOACAGAG 481 GAACGTCATA TATTTTTGGA TTACMGAM TATCGATCCT CTATCCGACT AGMTTAGTC 541 MTCTTATTC AAGCAAAMC MAAMCTTT ACCATACACT TTAAGCTAM ATATTTTCTA 601 GCATCCCCTC CCCACTCTAA AACTTCTTTA TTACACCCTA TTAATCATCC AAAGTCAAGG 661 CCTAATACAT CTCTCCAAAT AGMTTTACA CCTAGACACA ATGAAACAGT TCCATATCAT 721 GAACTAATAA AGGAATTGAC GACTCTCTCG CGTCATATAT TTATGGCTTC TCCAGAGAAT 781 GTAATTCTTT CTCCGCCTAT TAACGCGCCT ATAAAAACCT TTATGTTGCC TAAACAAGAT 841 ATAGTAGCIT TCCATCTCCA AMTCTATAT GCCCTMCTA AGACTGACGG CATTCCTATA 901 ACTATCGGAG TTACATCAAA CGGGTTCLAT TGTTATTTTA CACATCTTGG TTATATTATT 961 AGATATCCTC TTAACAGAAT AATAGATTCC GAAGTACTAG TCr1TGGTGA GGCACTTMC 1021 GATAAGAACT GGACCGTATA TCTCATTAAG CTAATAGAGC CTGTGMTGC MTCAATGAT 1081 AGACTAGAAG AAAGTAAGTA TGTTCAATCT AAACTAGTGG ATATTTGTGA TCGGATAGTA 1141 TTCMGTCAA ACAAATACGA AGGTCCGTTT ACTACMCTA CTGAAGTCGT CGATATGTTA 1201 TCTACATATT TACCAMCCA ACCAGAACGT GTTATTCTGT TCTATTCAM CGGACCTAM 1261 TCTAACATTG ATTTTAAAAT TAAAAACGAA AATACTATAG ACCAMCTGC AAATCTAGTA 1321 TTTAGGTACA TGTCCAGTGA ACCAATTATC TTTGGAGAGT CGTCTATCTT TGTAGACTAT 1381 AAGAAATTTA CCMCCATAA AGGCTTTCCT AAAGAATATC GTTCTGGTAA GATTGTGTTA 1441 TATMCGGCC TTMTTATCT AMTAATATC TATTGTTTGG MTATATTAA TACACATAAT 1501 GMCTCGCTA TTAACTCCGT GGTTGTACCT ATTAAGTTTA TAGCAGAATT CTTAGTTAAT 1561 GGAGAAATAC TTAMCCTAG AATTGATAM ACCATCAMT ATATTMCTC AGMGATTAT 1621 TATCGAAATC AACATAATAT CATACTCGM CATTTAAGAG ATCAAAGCAT CAAMTAGGA 1681 GATATCTTTA ACGAGGATAA ACTATCGGAT GTGGGACATC AATACGCCAA TAATCATAM 1741 
TTTAGATTAA ATCCACAAGT TACTTATTTT ACCMTAMC GMCTAGAGG ACCGTTGCGA 1801 ATTTTATCM ACTACGTCAA CACTCTTCTT ATTTCTATGT ATTGTTCCM MCATTTTTA 1861 GACGATTCCA ACAAACGAAA CGTATTGCCC ATTGATTTTC GAMCGGTCC TGACCTGGM 1921 AMTATTTTT ATGGAGAGAT TCCGTTATTG GTAGCGACGG ATCCGCATGC TGATGCTATA 1981 GCTACAGGAA ATGAAAGATA CAACAMTTA AACTCTGGAA TTAAAACCAA CTACTACAAA 2041 TTTCACTACA TTCAGGAMC TATTCGATCC GATACATTTG TCTCTACTGT CACACMCTA 2101 TTCTATTTTG CAAAGTTTM TATCATCGAC TGGCAGTTTG CTATCCATTA TTCTTTTCAT 2161 CCGAGACATT ATGCTACCGT CATGAATAAC TTATCCGAAC TAACTGCTTC TGCAGCCAAC 2221 CTATTAATCA CTACCATVGA CGGAGACAAA TTATCAAAAT TAACAGATAA AMCACTTTT 2281 ATAATTCATA ACAATTTACC TAGTACCCM MCTATATGT CTGTAGMAA AATAGCTGAT 2341 CATAGAATAG TGGTATATAA TCCATCAACA ATGTCTACTC CAATCACTGA ATACATTATC 2401 AAAAAGMCG ATATAGTCAG AGTGTTTAAC GMTACCGAT TTCTTCTTCT ACATMCGTT 2461 GATTTCCCTA CMTTATAGA ACGMGTAM AAGTTTATTA ATCGOGCATC TACMTGGAA 2521 GATAGACCAT CTACAACAM CTTTTTCGM CTAAATAGAG GAGCCATTM ATGTGAAGGT 2581 TTAGATGTCG AAGACMT TAGTTACTAT GTTGTTTATG TCTTTTCTAA GCGGf.TA 2641 ATAATATGGT ATGGGTTCTC ATCTCCCAGT TCTAAATGCA TTAMTMTT CCAATAGAGC 2701 GATTTTTGTT CCTATAGGAC CTTCCAACTG 3GGATACTCT CTATTGTTAA TAGATATATT 2761 AATACTTTTC TCGCCTMCA GAGGTTCTAC GTCTTTTAAA AATAAAAGTT TGATMCATC 2821 TGOCCTGTTC ATMATAAAA ACTTTGCGAT TCTATATATA CTCTTTTTAT CAMTCTAGC 2881 CATTGTCTTA TAGATCTGAC CTACTGTACG TGTACCATTT GATTTTCTTT CTAATACTAT 2941 ATATTTCTCT CGAAGAAGTT CTTCCACAT CTCGGMT AAAATACTAC TCTTCACTM TTT ATAGTTAAGG ATAATAAGTA 3001 ATCACTTATT TTTTTTATAT CGATAT 3061 TTCCAAGTTA GATAACGACC ATAACGM ATTTATACTT TTAGGAAATC ACAATGACTT 3121 TATCAGATTA AAATTAACAA AATTAMGGA GCATGTATTT TTTTCTGMT ATATTGTGAC 3181 TCCACATACA TATGGATCTT TATCCGTCGA ATTAMTCGG TCTACTTTTC ACCACCCCGG 3241 TAGATATATA GAGGTGGACG MTTTATAGA TGCTGGAAGA CAAGTTAGAT CGTCTTCTAC 3301 ATCCAATCAT ATATCTAAAG ATATACCCGA AGATATGCAC ACTCATMAT TTGTCATTTA 3361 TGATATATAC ACTTTTGACC CTTTCAACM TAAACGATTG GTATTCGTAC ACGTACCTCC 3421 GTCGTTAGGA CATGATAGTC ATTTCACTAA TCCGTTATTG TCTCCGTATT ATCGTMTTC 3481 
AGTAGCCACA CAAATCCTCA ATGATATCAT TTTTMTCM GATTCATITr TAAA TATTT 3541 ATTAGAACAT CTGATTAGAA GCCACTATAG AGTTTCTAAA CATATAACM TAGTTAGATA 3601 CMGGATACC GAAGMTTM ATCTMCCAC AATATGTTAT MTAGACATA AGTTTAACCC 3661 GTTTGTATTC CATTGTCTTA ACGCCCTT C CGAAAATCM AAGGTACTAC ATACGTATM 3721 MAGGTATCT AATTTGAT TT A GTGACTCTAT CACACCCCCC ATATACTATT 3781 ACTTATCACC ATGATTGGCA ACCAGTMTG AGTCAATTGG TACAATTTTA TAACGAAGTA 3841 GCCAGTTGCC TCCTACGAGA CGAGACCTCG CCTATTCCTG ATMGTTCTT TATACAGTTG 3901 MACMCCGC TTAGAMTAA ACCAGTATGT GTCTCCC0TA TAGATCCGTA TCCGAAAGAT 3961 MCCTACCTTCGA ATCACCMT TTACM ATCATTA AA GGAGATAGCT 4021 TCATCTATAT CTACATTMC CGAAGAAGTT GATTATAAAG CTTATAACCT TAATATAATA 4081 GACGGGGTTA TACCCTGGAA TIAITACTTA ACTTGTAMT TAGGAGAAAC AAAAAGTCAC 4141 GCCATCTACT GGGATAACAT TTCCAAGTTA CTCCTGCACC ATATAACTM ACACGTTACT 4201 CTTCTTTATT GTTTGGGTM AACACATTTC TCGAATATAC GGGCCAAGTT AGAATCCCCG 4261 GTAACTACCA TAGTCGGATA TCATCCAGCG CCTACAGACC GCCMTTCGA GAAAGATAGA CT 4321 TCATTTGAM TTATCMCGT TTTACTCCAA TTAGACAACA AGGCACCTAT AMTT C 4381 CAAGGGTTTA TTTAT~TG CTTTAGTGAA ATTTTAACTT GTGTTCT 4441 TATTACACCT AATCATCTTA TCTTTGTTCT TAACACTATA GGTGTCCCGT CAGCGTCCAG 4501 ACAAMTCAA GATCCAAGAT TTGTACAACC ATTTAMTGC GACGAGTTAG AAAGATATAT
FIG. 2. The DNA sequence of the HindIII D fragment. The sequence is presented from left to right along the virus genome. Bases 1 and 16059 result from HindIII cleavage of the virus DNA. Translation start and stop codons for the major ORFs are marked with boxes. The direction of translation of each ORF is indicated by an arrow at the ATG.
4561 TCAGAATAAT CCAGMTCTA CACTATTCGA AAGTCTTAGG CATGAGGAAG CATACTCTAT 4621 AGTCAGAATT TCATAGATG TAGATTTTTAGA CCCCTCTCTA GACGAAATAG ATTATMAAC 4681 GCCTATTCM GATTTTATTA TCGAGGTGTC MACTGTGTA GCTAGATTCG CGTTTACAGA 4741 ATGCCGCGCC ATTCATGAAA ATCTAATAAA ATCCATCAGA TCTAATTTTT CATTGACTAA 4801 GTCTACAMT AGAGATAAAA CAAGTTTTCA TATTATCT7r TTACACACGT ATACCACTAT 4861 CGATACATTC ATAGCTATCA AACCMCACT ATTAGAATTA AGTAGATCAT CTGAAAATCC 4921 ACTAACAACA TCCATAGACA CTGCCGTATA TAGGAGAAAA ACMCTCMC GGGMGTAGG 4981 TACTAGCAM MTCCAMTT GCGACACTAT TCATGTMTC CMCCACCGC ATGATMTAT 5041 AGAACATTAC CTATTCACTT ACGTGGATAT GMCMCAAT ACTTATTACT TTTCTCTACA 5101 ACAACCATTG CAGGATTTA TTCCTGATAA GTTATGGGM CCACCGTTTA TTTCATTCGA 5161 AGACCCTATA AAAAGAGTTT CAAMATATT CATTMTTCT ATMTAAACT TTMTGATCT 5221 CCATGAAAAT AATTTTACAA CCCTACCACT GGTCATAGAT TACCTAACAC CTTCTGCATT 5281 ATGTAAAAAA CGATCGCATA AACATCCGCA TCAACTATCC TTGGMAATG GTGCTATTAG 5341 AATTTACAAA ACTGCTAATC CACATAGTTG TAAACTTAAA ATTGTTCCCT TGGATGGTM 5401 5461 5521 5581
TAAACTGTTT AGGAGACCAT AACAAAACTA CTCTCCAAGA
ORF
5
ORF
6
AATATTCCAC AAAGAATTTT AGACACTMC TCTGTTTTAT TAACCGAACG ATAGTTTOGA TTMTMTTC ATGCAAATTT AACAGCGAAG MCCCTTGAT ATTTTGTCAA TAACACATA ACTACCTMG GMTATTCM GCGAATTACT AAACGAMGA CTGTAGAAGC TAACATACGA CACATGTTAC TAGATTCAGT
5641 AGAGACCGAT ACCTATCCGG ATAMCTTCC GTTTAAAMT GGTGTATTGC ACCTGGTACA 5701 CGGAATCTTT TACTCTGCAC ATGATCCTM AAMTATACG TGTACTGTAT CAACCGGATT 5761 TAAATTTGAC GATACTAGAT TCGTCCMGA CACTCCAGAA ATCGAACACT TAATCMTAT 5821 CATTAACCAT ATCCMCCAT TMCGGATGA AAATAAGAAA AATAGAGAGC TATATGAAAA 5881 AACATTATCT ATTTATTTAT CCCCTGCTAC CAAAGGATGT TTAACATTCT TTTTTCGAGA 5941 AACTCCAACT GGAMGTCGA CAACCAMCG TTTGTTAAAG TCTGCTATCG GTGACCTGTT 6001 TGTTGAGACC GGTCAMCM TTTTMCAGA TGTATTCGAT MAGGACCTA ATCCATTTAT 6061 CCCTAACATG CATTTGAAAA GATCTGTATT CTCTAGCCAA CTACCTCATT TTCCCTGTAG 6121 TCCATCAAAG AAMTTAGAT CTGACAATAT TAAAAACTTG ACACAACCTT GTGTCATTGG 6181 6241 6301 6361
AACACCGTGT TTCTCCMTA AAATTAATM TAGAMCCAT GCCACAATCA TTATCCATAC TAATTACAM CCTGTTTITC ATAGGATACA TAACGCATTA ATGAGAAGM TTGCGCTCGT CCGATTCAGA ACACACITRT CTCAACCTTC TGCTAGAGAC GCTGCTGAAA ATAATGACGC GTACCATAAA CTCAAACTAT TACACGACCG GTTACATCGT AAMTACAM ATMTAGATA
6421 TACATTCGCA TTCTATACT TGTTGGTGM ATGGTACAGA AAATATCATG TTCCTATTAT 6481 GAMCTATAT CCTACACCCG AAGAGATTCC TGACTTTGCA TTCTATCTCA AAATAGGTAC 6541 TCTGTTAGTA TCTACCTCTC TAAAGCATT TCCATTMTG ACGGACCTCT CCAAAAACGG 6601 ATATATATTG TACGATAATG TGGTCACTCT TCCGTTCACT ACTTTCCMC AGAAMTATC 6661 CMCTATTTT MTTCTAGAC TATTTCCACA CGATATACAC AGCTTCATCA ATAGACATA4 6721 GAMTTTGCC AATGTTACTG ATGAATATCT GCMTATATA TTCATAGAGG ~A~TATR7'CATC 6781 6841 6901 6961
TCCtATA TATGCTCATA TATTTATAGA AGATATCACA TATCTAA MT .000GM TCATAGATTT ATTTCATMT CATGTTGATA GTATACCMC TATATTACCT CATCACTTAG CTACTCTAGA TTATCTAGTT ACMCTATCA TAGATGAGM CAGAAGCCTC TTATTCTTCC ATATTATGGG ATCAGGTAM ACMTMTCC CTTTGTTGTT CCCCTTGCTA GCTTCCAGAT
7021 TTMMAGGT TTACATTCTA GTGCCTMTA TCAACATTTT GAAAATTTTr AATTATAATA 7081 TGGCTGTAGC TATGMCTTC TTTAATGACG AATTCATAGC TAAGMTATC TTTATTCATT 7141 7201 7261 7321 7381
CCACAACAAC CTCGCTACAA CTGCACAACT GATCTCCCAT AGACGATACA
TTTTTATTCT CMAATTATA ACGATAACCT CATTAATTAT AACGGATTAT TAACTCTATT TTTATCGTTG ATGAGGCACA TAATATCT T GGGAATMTA TATGACCGTG ATAAAAAATA AAACAACAT TCCTTTTCTA CTATTGTCTG TACTMCACA CCTMTACTC TTCCTCATAT TATACATTTA ATGTCCCAAC TTTTGGTGAG ATTATTAGTC CTGCTAAGAA AGTAATTCAC ACACTTCTTA
7441 ACGAACGCCG TGTGAATGTA CTTAAGGATT TGCTTAAACC AACAATATCA TATTACGMA 7501 TOCCTGATAA AGATCTACCA ACGATMGAT ATCACGGACG TAAGTMCTA GATACTAGAC 7561 TAGTATATTC TCACATCTCT AAACTTCAAG AGACACATTA TATGAACT AGACGACACC 7621 7681 7741 7801
TATCTTATCA AACTTMTCT CAMTCTCAA GTTCCAMTT
TGAAATGTTT GATAAAMTA TGTATAACGT GTCAATCGCA GTATTGCCAC GATGAATMT TTAGATACTT TATTTCAGGA ACAGCATAAG CMTTGTACC AATAAATAAT CGCGTGTAT ACGGAGAAGA ATTGCTMCC TTAMCATTA TAMTACTTT ATTMTCCGA TACACACACT CMCGGAAAA CATTTTATAT
7861 ACTTTTCTAA TTCTACATAT GCCGCATTGG TAATTAAATA TATCATGCTC AGTMTGGAT 7921 ATTCTGAATA TAATCGTTCT CAGGCMCTA ATCCACATAT GATAMCGGC AAACCAAAM 7981 8041 8101 8161
CATTTCCTAT CCTTACTAGT AAAATGAMT CGTCTTTAGA GGATCTATTA GATCTGTATA ATTCTCCTGA AAACGATGAT GGTCATCGAT TGATGTTTTT GTTTTCCTCA MCATTATCT CCGAATCCTA TACTCTGMA GAGGTAAGCC ATATTTCCTT TATGACTATC CCAGATACTT TTTCTCMTA CAACCAAATT CTTGCACCAT CTATTACAAA ATTCTCTTAC GCCGATATTT
8221 8281 8341 8401 8461
CTGAACCAGT TMTGTATAT CTTTTAGCCG CCGTATATTC CGATTTCAAT GACGAATAA CGTCATTAAA CGATTACACA CACGATGMT TGATTAATCT TTTACCATTT GACATCAAM AGCTGTTATA TCTAAMTTT MCACTAAAG AAACGMTAG MTATTCTAT ATTCTTCMC ACATGTCTCA MCGTATTCT CTTCCACCAC ATCCATCAAT TGTAAAAGTT TTATTGGGAG AATTGGTCAG ACMTTTTTT TATAATAATT CTCGMTTAA GTATMCGAC TCCAAGTTAC
8521 TTAAMTGGT TACATCACTT ATAAAAAATA AAGAAGACGC TAGGMTTAC ATAGATCATA 8581 TTGTAMCGG TCACTTCTTT GTATCGAATA AAGTATTTGA TAAATCTCTT TTATACAMT 8641 ACGAAAACCA TATTATTACA GTACCGTTTA GACTTTCC'~TA~CsG 'A ' ACCATTT GTTTGGGGAG 8701 TTAACTTTCG GMTAT MCGTGGTAT CTTCTCCCTGATGA AATATATAM 8761 CAMTA! IT".~GCTTTG TTACCAATCG ATACCTTCCA GTTACATTGG AACCACACGA 8821 CCTGACCTTA GACATAAMA CTAATATTAG CMTGCCCTA TATAAGACCT ATCTACATAG BB81 ACAAATTAGT GGTAAAATCG CCAAGAAAAT AGAMTTCCT GAACACGTCC AATTACCTCT 8941 CGGCGAMTA GTTAATAATT CTCTACTTAT AMCGMCCG TCTGTMTM CCTACCCGTA 9001 TTATCACGTT GGGCATATAG TCAGACGMC ATTAMCATC GAAGATGMT CAMTCTMC
FIG. 2-Continued
ORF
7
NILES ET AL . 9061 TATTCAATGT GGACATTTM TCTGTAAACT AAGTAGAGAT TCGCCTACTG TATCATTTAG 9121 CGATTCAMG TACTCCTTTT TTCGAAATGG TAATCCCTAT CAC MTCCCA GCGAAGTCAC 9181 TGCCGTTCTA ATGCAGCCTC AACAAGGTAT CCM1tMT TTT iIAn1L TCGCGAATAT 9241 CGTCGACTCA Qd$.AAAQAC MTACCGGTA ACTATAAACA CGAATACTAT GGCAATAATT 9301 GCGAATGITT TATTCTCFTC GATATATTTT TGATAATATG AAAAACATGT CTCTCTCAAA 9361 TCGCACAACC ATCTCATAAA ATAGTTCTCG CGCGCTGGAG AGGTAGTTGC TCCTCGTATA 9421 ATCTCCCCAG AATAATATAC TTGCGTGTCG TCGTTCMTT TATACGGATT TCTATACTTC 9481 TCTGTTATAT AATGCGGTTT TCCATCATGA TTAGACGACC ACAATACTGT TCTGMTTTA 9541 CATAATTGAT CACAATGAAT GTTTATTGGC CTTGGAAAAA TTATCCATAC AGCGTCTGCA 9601 GAGTGCTTGA TAGTTGTTCC TAGATATGTA AMTAATCCA ACTTACTAGG CACCMATTG 9661 TCTAGATAAA ATACTGAATC AAACGGTGCA GACGTATTGG CGGATCTMT GGMTCCMT 9721 TGATTAACTA TCTTTTGAM ATATACATTT TTATGATCCA ATACTTGTAA CAATATAGAA 9781 ATMTGATAA GrCCATCATC GTGTTTTTTT GCCTCTTCAT AAGAACTATA TTTTTTCTTA 9841 TTCCAATGAA CAAGATTAAT CTCTCCAGAG TATTTGTACA CATCTATCAA GTGATTGGAT 9901 CCATAATCCT CTTCCTTTCC CCAATATATA TGTACTGATG ATAACACATA TTCATTGGGG 9961 AGAAACCCrC CACTTATATA TCCTCCCTTA AAATTAATCC TTACTAG1Tf TCCAGTGTTC 10021 TGGATAGTGG TIGGTTTCGA CTCATTATAA TGMATGTCTA ACGGCTTCM TTGCGCGTTA 10081 GAAATTGCTT TTTTAGTTTC TATATTAATA G~WT GnT~CC(C AGTAAAMTG 10141 MATGATMC TCITTAAAM TAGCTCTTAG ACAATCGATG AGGAACTGAT •~Y 10201 ATTTGMACT CCTACACMT TAATATCTAT TAAACGAATA AAAGATATTC CAAGATCAAA 10261 AGACACGCAC GTCrTTCCTG CGTGTATAAC AAGTGACGGA TATCCCITAA TAGGAGCTAC 10321 AAGAACTTCA TTCGCGTTCC AGGCGATATT ATCTCAACM AATTCAGATT CTATCTTTAG 10381 AGTATCCACT AAACTATTAC =MAMA CTACMTGM CTAAGAGAM TCITTAGACG 10441 GTTGAGAAAA GGTTCTATCA ACMTATCCA TCCTCACTTT GAAGAGTTAA TATTATTGCG 10501 TGGTAAACTA GATAAAAAGG MTCTATTAA AGATTGTTTA AGAAGAGAAT TAAAACACAA 10561 AAGTGATGAA CGTATAACAC TAAAAGAATT 000AAATGTA ATTCTAAAAC TrACAACGCG 10621 CGATAAATTA TCTAATAAAG TATATATAGG TTATTGCATG GCGTGT1TTA TTMTCAATC 10681 GTTGGAGGAT TTATCGCATA CTAGTATTTA CMTGTAGM ATTAGAAAGA TTAAATCATT 10741 AMTGATTGT ATAMTACGA ATATCTGTCT 
TATATTTATA ATATGCTAGT 10801 TMTAGTAA~ACTTTT CAGATCTAGT ATMTTAGTC AGATTATTM GTATMTAGA 10861 CGACTAGCTA AGTCTATTAT TTGCGCCGAT GACTCTCAM TTATTACACT CACGCCATTC 10921 GTTAACCAAT GCCTATGGTG TCATAAACGA GTATCCGTGT CCGCTATTTT ATTMCTACf 10981 GATAACAAAA TATTAGTATG TAACACACGA GATAGTTTTC TCTATTCTGA AATAATTACA 11041 ACTAGAAACA TGTTTAGAM GAAACGATTA TTTCTGMTT ATTCCMTTA TTTGAACAM 11101 CAGCAAACAA GTATACTATC GfCATTTTTT TCTCTAGATC CAGCTACTCC TAATATTGAT 11161 AGAATAGACG CTATTTATCC GGGTGGCATA CCCAAAAGGG GTGAGAATCT TCCAGACTCT 11221 TTATCCTGGG AMTTAAAGA AGAAGTTAAT ATAGACAATT CTTTTGTATT CATAGACACT 11281 CGGTTTTTTA TTCATGGCAT CATAGAAGAT ACCATTATTA ATAAATTTTT TGAGGTAATC 11341 TTCTTTCrCG GMCMTATC TCTMCGAGT GATCAAATCA TTGATACATT TAAAAGTAAT 11401 CATGAMTCA ACGATCTMT ATTTTTAGAT CCGMTTCAG GTMTCGACT CCAATACGM 11461 ATTGCAMAT ATGCTTTAGA TACTGCMM CTCMATCIT ATGGCCATAG ACGATGTTAT 11521 TAGGAATCAT TAAAAAMTT MCTGAGGAT CA' T AAAATATAM TTMTTTACC 11581 ATCCTGTATT TTTATAACGG GATTCTCCGC CATATCATGT AGATAGTTAC CGTCTACATC 11641 GTATACTCGA CCATCTACGC CTTTAAATCC TCTATTTATT GACATTAATC TATTAGMTT 11701 GGAATACCAA ATATTACTAC CCTCAATTAG TTTATTGGTA ATATTTTTGT TAGACGATAG 11761 ATGGATGCCT CTTGAAACCA AGGTTTTCCA ACCGGACTCA TTCTCGATCG GTGAGAAGTC 11821 TITTTCATTA GCATGAATCC ATTCTMTGA TCTATGTTTA AACACTCTM ACMTTGGAC 11881 AAATTCTTTT GATTTGCTTT GAATGATTTC AAATAGGTCT TCCTCTACAG TACGCATACC 11941 ATTAGATMT CTAGCCATTA TAAAGTGCAC GTTTACATAT CTACCTTCTC GAGGAGTAAG 12001 AACGTGACTA TTGAGACGAA TGGCTCTTCC TACTATCTGA CGAAGAGACG CCTCGTTCCA 12061 TGTCATATCT AAMTGAAGA TATCATTGAT TGAGAACAAG CTAATACCCT CGCCTCCACT 12121 ACAACAGMT ACCCATGTTT TTATCCATTC TCCGTTAGTG TTTGATTCTT GGTTAAACTC 12181 ACCCACCCCC TTGATTCTAC TATCTTTTGT TCTAGATCAC AACTCTATAT TAGAGATACC 12241 AAAGACTTTG AAATATAGTA ATAAGATTTC TATTCCTGAC TGATTAACM ATGGTTCAM 12301 GACTAGACAT TTACCATGGG ATGCTAATAT TCCCAAACAT ACATCTATAA ATTTGACGCT 12361 TTTCTCTTTT AATTCAGTAA ATAGAGAGAT ATCAGCCGCA CTAGCATCCC CTCCCAATAG 12421 TTCTCCCCTT TTAAACGTAT CTAATGCAGA TTTAGAAAAT TCTCTATCTC TTAATGAATT 12481 
TTTAAMTCA TTATATAGTG TTGCTATCTC TTGCGCGTAT TCGCCCGGAT CACGATTTTG 12541 TCTTTCAGGA AACCTATCCA ACCIMACCr AGrAGCCATA CGTCTCAGM TTCTAAATGA 12601 TGATATACCT GTTiTTATrT CAGGGAGTTT AGCCTTTTGA TAAATTTCTT CTTGCTTTTT 12661 CGACATATTA ACCTATCCCA TTAATACTGT TTTCTTAGCG MTGATGCAG ACCCTTCTAC 12721 CTCATCAAAA ATACAAAACT CGTTATTMC TATGTACGM CATAGCCCTC CTAGTTTGGA 12781 GACTMTTCT TTTTCATCAA CTACACGITT ATTCTCAMT AGCGATTGGT GTTGTAAGGA 12841 TCCIGGTCGT AGTAAGTTAA CCAACATGGT GMTTCTTGC ACACTATTGA CGATACGTGT 12901 AGCCGATMA CAAATCATCf TATGGITTTT TAATGCGATG GTCTTAGATA AAAAATTATA 12961 TACTGAACGA GTAGGACGGA TCTTACCATC TTCTTTGATT AATGATTTAG AMTCMGTT 13021 ATGACATTCA TCAATAATGA CGCATATTCT ACTCTTGGM TTAATAGTTT TGATATTAGT 13081 AAAAMMA TTTCTAAMT TTTGATCATC CTAATTAATA AAAATACMT CCTTCGTTAT 13141 CTCTGGAGCG TATCTGAGTA TAGTCITCAT CCAAGGATCT TCTATCAAAG CCTTTTTCAC 13201 CAATAAGATA ATAGCCCAAT TCGTATAAAT ATCCTTAAGA TGT'ITGAGAA TATATACACT 13261 TGTT TTACCGACAC CGTTTCATGG AACAATAAAA GAGAATGCAT ACTGTCTAAT 13321 CCfAAGMAA CTCTTGCTAC AMATGTTGA TAATCCTTGA GGCGTACTAC GTCCGACCCC 13381 ATCATTTCM CAGCCATATT AGTAC1TCTG CGCAATGCAT MTCGATATA GCCCGCGTCT
ORF 8
ORF 9
FIG. 2-Continued
ORF 10
ORF 11
ORF 12
MAPPING VACCINIA VIRUS HindIII D FRAGMENT 13441 GATTTAC ATGAGTC ATAAGTAATA ACTATGTTTT AAAAAE&A GCAGTAGTTT 13501 AACTAGTCTT CTCTGATGTT TGTTTTCCAT ACTTTTTGAA TCAGAACTCA TACTAGAATA 13561 AAGCAACGAG TGAACGTMT AGAGAGCTTC GTATACTCTA TTCGAAAACT CTAAGAACTT 13621 ATTAATGAAT TCCGTATCCA CTCCATTGTT TAAMTACTA AATTGAACAC TGTTCACATC 13681 CTTCCAAGM GAAGACTTAG TGACGGACTT AACATGAGAC ATAMTAATT CCAMTTTTT 13741 TTTACAMCA TCACTACCCA CCATMTGGC GCTATCTTTT MCCAGCTAT CGCTTACGCA 13801 TTTTAGCACT CTAACATTTT TAAACAGACT ACMTATATT CTCATAGTAT CGATTACACC 13861 TCTACCCAAT AM(,TTGGM GrTTMTAAT ACMTATTTT TCGTTTACM MTCAMTAA 13921 TGGTCCAAAC ACGTCGMGG TTAACATCTI ATMTCGCTA ATGTATAGAT TGTTTTCAGT 13981 GAGATGATTA TTAGATTTAA TAGCATCTCG TTCACGTTTG MCAGTTTAT TGCGTGCGCT 14041 GAGGTCGGCA ACTACGCCGT CCGCTTTAGT ACTCCTCCCA TAATACTTTA CGCTATTMT 14101 CTTTAAAATT TCATAGACTT TATCTAGATC GGTTTCTGTT AACATGATAT CATGTGTAM 14161 AAGTTTTMC ATGTCGGTCC GCATTCTATT TACATCATTA ACTCTAGAAA TCTGMGAM 14221 GTMTTAGCT CCGTATTCCA GACfAGGTM TGCGCTmA CCTAGAGACA GATTMCTTC 14281 TGGCAATCTT TCATAAMTG GAAGAAGGAC ATGCCTTCCC TCCCGGATAT TTTTTACMT 14341 r~r_,ATCff TACTATTCTA TAGTTTCTTT TCATTATTA~TTATTAT CTCCCATAAT 14401 TGUIMTA CTTACCCCTT GATCGTAAGA TACCTTATAC AGGTCATTAC ATACMCTAC 14461 CMTTGTTTT TGfACATMT AGATTGGATG GTTGACATCC ATGCTGCAAT AMCTACTCC 14521 AACAGATAGT TTATCTTTCC CCCTAGATAC ATTAGCCGTA ATAGTTGTCG GCTAMGM 14581 TATCTTTGGf GTAMGITM AAGTTAGCGT TCTTCTTCCA TTATTGCTTT TTGTCACTAG 14641 TTCATTATM ATTCTCGAGA TGGGTCCGTT CTCTGMTAT AGAACATCAT TTCCAMTCT 14701 MCTTCTAGr CTAGMATM TATCGGTCTT ATTCTTAAAA TCTATTCCCT TGATGAAGGG 14761 ATCCTTAATG MCAMTCCT TGGCCTTTGA TTCGGCTGAT CTATTATCTC CGTTATAGAC 14821 GrrACGTTGA CTAGLCCAM GACTTACAGG MTACATGTA TCGATGATGT TGATACTATG 14881 TCATATGTGA GCAMGATTC TTCTCTTAGT GGCATCACTA TATGTTCCAC TMTGGCGCA 14941 AMCTTTTTA GMATGTTAT ATATAAAAGA ATTTTTTCGT GTTCCAMCA TTACCAGATT 15001 AGTATGAAGA TAMCACTCA TATTATCACC MCATTATCA ATTTTTACAT ACACATCACC 15061 ATCTTGMTA GAAACCATAC CATCTTCTGC MCCPCTACC ATCTCCCCAC. 
ACTCCCCATA 15121 ACCAGTCCGT CGACCATCGC TMCAATMC TAGATCATCC AACMTGTAC TCACATATGC 15181 ATCTATATM TCTTTTTCAT CTTGTGAGTA CCCTAGATAC GAMTAMTT TATTATCCGT 15241 ATTTCCATAA TAAGGTTTAG TATMACAGA GAGCGATGTT GCCGCATGAA CTTCAGTTAC 15301 AGTCGCCGTT GGITCGTTTA TTTCACCTAT TACTCTCCTA GGTTTCTCTA TAMTGATCC 15361 TTTMTTTGT ACATTCTTM CCATATATCC AATAMCCTC MTTCAGGM CATAMCAM 15421 TTCTTTCTTG MCCTTTCAA ACTCCMCCA ACACfCACCA ATMCGATAT CCGATACTCC 15481 ATTGAAGGTT ACCGTTACGG TMTTTTTCA ATCGGATACT TTMCACTCC TGMTCTATC 15541 TTCCACATCA MCCCAGTTT TAATATAAAC GTATACTGTA CATCGGTCTT TMTAGTGTC 15601 ATTAGGACTT ACGCCAATAC AAATATCATT AACTTCACTA GAATATCCAG AGTGTTTCAA 1566L AGCAATTCIA TTATTGATAC MTTATTATA TMTTCTTCC CCCTCAATTT CCCAMTMC 15721 ACCGTTACAC GAAGACATAG ATACGTGAI1 MTACATTTA TATCCMCAT ATCGTACCTA 15781 ACCGAATCTT CCCATACCTT TMCTTCTCG AAGTTCCAM CTCAGMCCA MTGATTAM. 15841 CGCAGTMTA TACTCATCCC TAATTTCGM GCTAGCGATA CCCTGATTGT CTGGACCATC 15901 GTTTCTCATA ACTCCGGATA GACAAATATA TTGCGGCATA TATAMGTTG CMTTTGACT 15961 ATCGACTGCG AAGACATTAG ACCGTTTAAT ACAGTCATCC CCACCGATCA AAGMTTAAT 16021 GATAGTATTA TT®rTTCT ATTTAAMTC CMAMCCT
ORF 13
ORF 14
FIG. 2-Continued
mentation groups, respectively, in their collections, which map in the HindIII D fragment. A single member of each group was tested by the spot complementation test (Condit and Motyczka, 1981) in order to merge the two sets of ts mutants. Growth of two coinfecting viruses at the nonpermissive temperature, 40°, was assayed by plaque formation on BSC 40 cells in 24-well plates. Poor plaque formation at 40° was interpreted to mean that the two mutants belong in the same complementation group. Mapping the ts mutants by one-step marker rescue was carried out as described (Condit et al., 1983; Thompson and Condit, 1986). A positive result was scored when a subcloned DNA fragment permitted the growth of a ts mutant at the nonpermissive temperature. Mutants derived from the Condit and Ensinger collections are prefixed with either a C or an E, respectively.
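The grouping rule used in the spot complementation test can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure or software used in the study: the function name `complementation_groups` and the input format are hypothetical, and the example pairs are limited to relationships mentioned in the text (E17 fails to complement C36; E52 and E94 complement every mutant tested). Pairs that fail to complement are joined transitively into one group.

```python
def complementation_groups(mutants, non_complementing_pairs):
    """Group mutants so that any pair that fails to complement
    (poor plaque formation at 40 degrees) ends up in the same group."""
    parent = {m: m for m in mutants}

    def find(m):
        # Follow parent links to the group representative, shortening the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in non_complementing_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for m in mutants:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


# Illustrative input only (hypothetical subset of the mutant collections).
example = complementation_groups(
    ["E17", "C36", "E52", "E94"],
    [("E17", "C36")],
)
print(example)
```

A mutant that complements all others, such as E52 in this toy input, is returned as a group of its own.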
RESULTS
DNA sequence analysis. The strategy adopted in sequencing this DNA fragment was to divide it into seven smaller subfragments, determine the sequence of each, and join the sequences through the use of synthetic oligonucleotide primers. We proceeded from left to right on the physical map. The result is a sequence 16,059 bp in length (Fig. 2). The sequence was translated in all 6 possible reading frames, yielding a group of 13 major ORFs which predict proteins greater than 145 amino acids in length (Table 1, Fig. 3). In addition there are 8 possible minor ORFs in a second group that predict polypeptides ranging from 61 to 84 amino acids in length (Table 2, Fig. 3). Although ORF 12 is small in size and, based on its length alone, it should be listed among the
TABLE 1
STATISTICAL SUMMARY OF THE MAJOR ORFs

                 Translation
Reading frame    Start          Stop           Number of amino acids   Mol mass (Da)   pI
ORF 1            103-105        2635-2637      844                     96,708          8.9
ORF 2            3034-3036      2596-2598      146                     16,926          10.1
ORF 3            3029-3031      3740-3742      237                     27,971          6.5
ORF 4            3742-3744      4396-4398      218                     25,032          9.3
ORF 5            4430-4432      6785-6787      785                     90,332          6.7
ORF 6            6828-6830      8739-8741      637                     68,362          7.0
ORF 7            8768-8770      9251-9253      161                     17,892          5.5
ORF 8            10128-10130    9216-9218      304                     35,426          9.7
ORF 9            10172-10174    10811-10813    213                     24,974          9.7
ORF 10           10810-10812    11554-11556    248                     28,893          9.5
ORF 11           13264-13266    11557-11559    569                     65,232          9.7
ORF 12           13449-13451    13263-13265    62                      7,154           9.8
ORF 13           14347-14349    13486-13488    287                     33,331          9.8
ORF 14           16033-16035    14380-14382    551                     61,868          5.9
minor ORFs, it is included in the group of major ORFs for reasons discussed below. There are several spaces not lettered in Fig. 3 which appear similar in length to spaces denoted as being an ORF. These regions are incapable of being translated into a polypeptide of 60 amino acids in length when translation is initiated at the first available methionine. We chose to disregard this set of possible open reading frames because of their small size. The ORFs in the major group can be aligned end to end along the length of the DNA fragment (Fig. 1B). The 5' and 3' ends of the adjoining ORFs in all but one case are separated by 42 or fewer base pairs. In several instances, the coding regions of adjacent ORFs overlap (Fig. 4). Between ORF 11 and ORF 13 is a gap of 220 bp that we feel forms a small ORF which would encode a polypeptide of 61 amino acids in length. Based on its arrangement on the DNA, we designate this ORF 12, in spite of its small size.
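The coordinates in Table 1 can be checked against the listed protein lengths with simple arithmetic: the span from the first base of the start codon to the last base of the stop codon should be a whole number of codons, one of which is the stop. The sketch below is an illustration only. The names `MAJOR_ORFS` and `residues` are hypothetical, each entry gives the lowest and highest base covered regardless of strand, and the start codon of ORF 2 is assumed to lie at 3034-3036, the reading that is consistent with its 146 residues.

```python
# (lowest base, highest base, residues listed in Table 1).
# Both limits include the start and stop codons.
# Assumption: ORF 2 starts at 3034-3036.
MAJOR_ORFS = {
    1: (103, 2637, 844),
    2: (2596, 3036, 146),
    3: (3029, 3742, 237),
    4: (3742, 4398, 218),
    5: (4430, 6787, 785),
    6: (6828, 8741, 637),
    7: (8768, 9253, 161),
    8: (9216, 10130, 304),
    9: (10172, 10813, 213),
    10: (10810, 11556, 248),
    11: (11557, 13266, 569),
    12: (13263, 13451, 62),
    13: (13486, 14349, 287),
    14: (14380, 16035, 551),
}


def residues(low, high):
    """Number of residues encoded between two limits, excluding the stop codon."""
    span = high - low + 1
    codons, remainder = divmod(span, 3)
    if remainder:
        raise ValueError("span is not a whole number of codons")
    return codons - 1


# ORFs whose computed length differs from the listed length.
mismatches = {
    n: (residues(low, high), listed)
    for n, (low, high, listed) in MAJOR_ORFS.items()
    if residues(low, high) != listed
}

# Coordinate difference between the end of ORF 11 and the start of ORF 13.
gap_11_13 = MAJOR_ORFS[13][0] - MAJOR_ORFS[11][1]

print(mismatches)  # {}
print(gap_11_13)   # 220
```

With these coordinates every computed length agrees with the listed one, and the difference between the limits of ORF 11 and ORF 13 reproduces the 220 bp quoted in the text. The same arithmetic gives 62 residues for ORF 12, the value listed in Table 1.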
FIG. 3. Identification of open reading frames. The complete DNA sequence was translated in all six possible reading frames using the Analyseq program developed by Roger Staden (1982). Vertical lines indicate the position of translation stop codons. Numbers refer to the major group of ORFs and letters to the group of minor ORFs.
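The six-frame search summarized in Fig. 3 can be sketched as follows. This is a minimal illustration and not a reimplementation of Analyseq: the function `find_orfs`, its return format, and the toy sequence are hypothetical. In each frame an ORF is taken to run from the first ATG after a stop codon to the next stop codon, and the default cut-off of 61 residues is an assumption based on the smallest ORF reported in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_residues=61):
    """Scan all six frames for ORFs that begin at the first ATG after a stop.

    Returns tuples (strand, frame, start, end, residues); start and end are
    1-based positions on the strand being read, and end includes the stop codon.
    """
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            first_atg = None
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon == "ATG" and first_atg is None:
                    first_atg = i
                elif codon in STOP_CODONS:
                    if first_atg is not None:
                        n = (i - first_atg) // 3
                        if n >= min_residues:
                            found.append((strand, offset + 1, first_atg + 1, i + 3, n))
                    first_atg = None
    return found


# Toy sequence (hypothetical): one 4-residue ORF in frame 3 of the given strand.
toy = "CCATGAAACCCGGGTAACC"
print(find_orfs(toy, min_residues=3))
```

Lowering `min_residues` on a real sequence would also report the short unlettered stretches that the authors chose to disregard.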
TABLE 2
STATISTICAL SUMMARY OF THE MINOR ORFs

ORF    Translation start    Translation stop    Number of amino acids    Mol mass (Da)
a      2891-2893            2651-2653           80                       8512             11.1
b      12281-12283          12074-12076         69                       7754             5.9
c      13952-13954          13766-13768         62                       7001             12.2
d      13204-13206          13021-13023         61                       7211             10.6
e      14527-14529          14311-14313         72                       8639             7.0
f      2313-2315            2565-2567           84                       9134             7.0
g      6013-6015            6229-6231           72                       7859             7.0
h      6523-6525            6763-6765           80                       9300             10.6

Coding tricks. The adjacent ORFs are closely packed along the DNA. In several cases, one or two bases are shared between the translation control signals of adjoining ORFs (Fig. 4). For instance, the UAA of ORF 3 shares the A with the AUG of ORF 4; the UGA of ORF 9 shares the UG with the AUG of ORF 10; and the UGA of ORF 12 shares the UG with the AUG of ORF 11. The 3' ends of ORFs 1 and 2, and of ORFs 7 and 8, overlap, as do the 5' ends of ORFs 2 and 3. There are eight possible minor ORFs in group 2 (Table 2). Each is embedded in an alternate reading frame with respect to one of the major ORFs. Although the arrangement of the major ORFs is pleasing, and we have evidence that most if not all are expressed, there is no reason to assume at this time that the minor ORFs are not also expressed during virus infection.

Genetic analysis. Two collections of temperature-sensitive mutants are available for the WR strain of vaccinia virus. Condit et al. (1983) have placed 11 mutants into six complementation groups, and Ensinger and Rovinsky (1983) have formed seven complementation groups from 13 mutants, which map in the HindIII D fragment. We have merged the two collections of mutants by carrying out complementation testing employing one member of each complementation group from each collection (Figs. 5a, b). Enhanced plaque formation in cells coinfected at 40° is interpreted as positive complementation. For example, in
Fig. 5a, the mutant E17 complements all mutants except C36. Therefore, E17 and C36 must be in the same complementation group. Some mutants, such as E52 and E94, complement all other mutants and must therefore form their own groups. A total of eight complementation groups result from this analysis.

The location of each complementation group on the DNA fragment was determined by the one-step marker rescue method (Condit et al., 1983; Fig. 6). In this analysis, cells are infected with a ts mutant, transfected with a plasmid subcloned from the wild-type HindIII D fragment, and placed at 40°. If the transfecting DNA recombines with the viral genome in the region of the ts lesion, wild-type progeny form plaques at the nonpermissive temperature. The number of virus plaques formed varies widely depending both on the ts virus and on the transfecting DNA fragment. Viruses bearing leaky mutations are difficult to rescue using this approach. The data in Fig. 6 demonstrate that for these 10 viruses, the number of virus plaques formed at the nonpermissive temperature is in all instances well above the 0 DNA control. We present only the successful rescues recorded with the smallest DNA fragment employed to date. Each rescue is part of a set in which larger DNA fragments were also used. For instance, members of group 2 are also rescued with the HindIII D fragment and with the plasmids 714 and 793 (Figs. 1A, C).
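The grouping rule used above, in which mutants that fail to complement one another are placed in the same group, can be sketched as a union-find over non-complementing pairs. The sketch assumes that non-complementation is transitive, which real data (for example, intragenic complementation) can violate. The function name is hypothetical, and the example uses only the four mutants whose behavior is stated explicitly in the text.

```python
# Sketch: derive complementation groups from pairwise test results.
# Assumption: two mutants that fail to complement belong to the same group,
# and group membership is transitive.

def complementation_groups(mutants, non_complementing):
    """Return sorted groups given mutant names and non-complementing pairs."""
    parent = {m: m for m in mutants}

    def find(m):
        # Follow parent links to the group representative (path halving).
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in non_complementing:
        parent[find(a)] = find(b)

    groups = {}
    for m in mutants:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # From the text: E17 fails to complement C36; E52 and E94 complement all others.
    mutants = ["E17", "C36", "E52", "E94"]
    failures = [("E17", "C36")]
    for group in complementation_groups(mutants, failures):
        print(group)
```

Applied to the full spot-test matrix, the same routine would take every "-" entry as a non-complementing pair.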
Carboxyl termini of ORF 1 and ORF 2 overlap
2592 AGACTTACTTAGTTACTATGTTGTTTATGTCTTTTCTAAGCGGTAAATAA 2641

AUGs of ORF 2 and ORF 3 overlap
3022 GATATTGATGGACATTTTTATAGTTAAGGA 3051

UAA of ORF 3 and AUG of ORF 4 share an A
3732 ATTTGATATAATGAATTCAG 3751

UAA of ORF 4 and AUG of ORF 5 are 32 bp apart
4392 TTATTAATGCTTTAGTGAAATTTTAACTTGTGTTCTAAATGGATGCGGCT 4441

ORF 5 and ORF 6 are 42 bp apart
6782 CCGTAAATATATGCTCATATATTTATAGAAGATATCACATATCTAAATGA 6831

ORF 6 and ORF 7 are 28 bp apart
8732 TTCTCCATAAAACTGATGAAATATATAAAGAAATAAATGT 8771

FIG. 4. Coding tricks in the HindIII D fragment. Nucleotide sequence arrangement in the areas between adjacent ORFs is highlighted in order to emphasize the interesting coding structures in the DNA which allow the major ORFs to be closely packed. The upper strand of each panel is shown between its position labels.
These results demonstrate that the location of one member of each complementation group correlates well with an observed ORF (Figs. 1A-C).

DISCUSSION

Extensive physical mapping studies of the vaccinia genome carried out by several groups have demonstrated that viral mRNAs are colinear with their genes and that the genes are closely packed along the viral DNA (Cooper et al., 1981; Bajszar et al., 1983; Morgan and Roberts, 1984; Golini and Kates, 1984; Mahr and Roberts, 1984a, 1984b). DNA sequence analysis of a 7.6-kb segment of vaccinia DNA supports this general organization of the viral genes (Plucienniczak et al., 1985). In this report, we present the primary sequence of a segment of the vaccinia DNA representing about 9% of the genome. The gene arrangement proposed by earlier studies is confirmed by these results. ORFs are present along the entire length of the DNA. The ORFs are packed so closely that 42 or fewer base pairs separate adjacent ORFs. In several instances, the ends of adjoining ORFs overlap. It is interesting to observe that in three cases one or two nucleotides are part of two different translation signals in adjacent ORFs.
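Shared translation signals of this kind can be located with a short scan for stop and start codons whose positions overlap. The Python sketch below is illustrative only: the function names are hypothetical, it examines one strand, and it reports every overlapping stop/ATG pair regardless of reading frame, so each hit still has to be checked against the ORF map. The input is the ORF 3/ORF 4 junction printed in Fig. 4 (positions 3732-3751).

```python
# Sketch: find stop codons and ATG codons that share one or two bases.
# Assumption: standard stop codons; only the given strand is scanned.
STOPS = {"TAA", "TAG", "TGA"}


def codon_hits(seq, codons):
    """Return (offset, codon) for every occurrence of the given codons."""
    return [(i, seq[i:i + 3]) for i in range(len(seq) - 2) if seq[i:i + 3] in codons]


def shared_signals(seq):
    """Return (stop codon, stop offset, ATG offset, bases shared) tuples."""
    seq = seq.upper()
    starts = [i for i, _ in codon_hits(seq, {"ATG"})]
    found = []
    for stop_pos, stop in codon_hits(seq, STOPS):
        for atg_pos in starts:
            shared = 3 - abs(atg_pos - stop_pos)
            if 0 < shared < 3:
                found.append((stop, stop_pos, atg_pos, shared))
    return found


if __name__ == "__main__":
    junction = "ATTTGATATAATGAATTCAG"  # Fig. 4, positions 3732-3751
    for stop, stop_pos, atg_pos, shared in shared_signals(junction):
        print(f"{stop} at {stop_pos} and ATG at {atg_pos} share {shared} base(s)")
```

On this input the scan reports the TAA/ATG pair that shares a single A, which is the ORF 3/ORF 4 arrangement described in the text. It also reports an ATG/TGA pair sharing two bases that does not correspond to an ORF boundary named by the authors, which is why the hits are only candidates.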
Carboxyl termini of ORF 7 and ORF 8 overlap
9212 GAATCTAGTTTTGTTTTTCTCGCGAATATCGTCGACTCATAAAAAAGAGA 9261

ORF 8 and ORF 9 are separated by 41 bp
10131 GCATAGTAAAAATGAAATGATAACTGTTTAAAAATAGCTCTTAGTATGGGAATTACAATG 10186

UGA of ORF 9 shares UG with AUG of ORF 10
10802 AATAGTAAATGAACTTTTAC 10821

UGA of ORF 10 is adjacent to UAA of ORF 11
11552 ATTGATTAGAAAATATAAAT 11571

AUG of ORF 11 and UGA of ORF 12 overlap
13251 TATATACAGTAGTCATTGTTTTACCGACAC 13280

ORF 12 and ORF 13 are separated by 34 base pairs
13445 GATTTACTCATTTATGAGTGATAAGTAATAACTATGTTTTAAAAATCACAGCAGTAGTTT 13500

ORF 13 and ORF 14 are separated by 34 base pairs
14345 TTCATCCATTTACAACTCTATAGTTTGTTTTCATTATTATTAGTTATTAT 14390

FIG. 4-Continued
We do not yet know the sites of transcription initiation or termination for any of these ORFs. Data from other vaccinia genes show that transcription initiation occurs within 78 nucleotides of the translation start signal, often as close as 1 to 6 bases from the AUG (Venkatesan et al., 1981, 1982; Bajszar et al., 1983; Bertholet et al., 1985; Weir and Moss, 1984; Rosel and Moss, 1985). The 3' ends of several early transcripts map to sites located as close as 33 nucleotides downstream from the translation stop signal (Venkatesan et al., 1981, 1982; Bajszar et al., 1983). If we assume that each of the major ORFs is transcribed and that the general transcriptional organization of the genes in this region of the genome is the same as elsewhere, then we must conclude that the transcriptional control signals for each ORF are in many instances embedded in the neighboring genes.

We arbitrarily divided the ORFs into two groups, major and minor, based primarily on their size. The arrangement of the major ORFs is remarkable in that they lie end to end along the DNA. The minor ORFs, save one, are found in alternate reading frames within a major ORF. This fact does not preclude their expression during infection or, for that matter, guarantee the expression of the major ORFs. The answer awaits the results of the transcript mapping analyses now being carried out. We will invoke the sequencer's license in order to upgrade one small ORF, ORF 12, into the group of major ORFs, because of its location on the DNA. It fits nicely between the major ORFs 11 and 13, filling what otherwise would be a gap of 220 bp. The translation start signal for ORF 12 is 34 nucleotides downstream from the UGA of ORF 13, and the UGA of ORF 12 shares the UG with the AUG of ORF 11. This arrangement is identical to that exhibited by other ORFs found in this fragment (Fig. 4).

[Fig. 5a: photograph of the complementation spot-test plate, labeled with the mutants C5, C17, C21, C33, C36, C46, E17, E69, E45, E52, E93, E94, and E101, and a 0 control.]

Fig. 5b:
        C5    C17   C21   C33   C36   C46
E17     +     +     +     +     -     +
E69     +     -     +     +     +     +
E45     +     +     -     +     +     +
E52     +     +     +     +     +     +
E93     +     +     +     +     +     -
E94     +     +     +     +     +     +
E101    +     +     +     -     +     +

FIG. 5. Complementation spot test. (a) The results of the complementation spot test, in which one member of each complementation group in the Condit collection was tested against one member of each group in the Ensinger collection. Mutants in the same complementation group fail to form plaques at the nonpermissive temperature, 40°. (b) The interpretation of the spot test in Fig. 5a. + denotes a positive complementation result; - indicates that no complementation was observed.

[Fig. 6 panels. Each panel shows a dish transfected with the plasmid listed and a dish labeled 0.]
Group      Mutant(s)          Plasmid DNA
1          C50                722
2          C17, C24, E69      797
3 and 5    C21, C46           801
4          E52                791
6          E101               776
7          E94                775
8          C5                 775

FIG. 6. Marker rescue. Photographs demonstrating positive marker rescue results are presented. In each group, two dishes are infected with a ts virus, one is transfected with plasmid DNA, and both are incubated at 40°. After 3 days the dishes are stained with crystal violet and photographed. Plaque formation in the presence of a DNA fragment is interpreted as a positive result. Mutant E52 is leaky, resulting in the mottled control monolayer.

The genetic analysis of the 17 ts mutants that map in the HindIII D fragment shows that there are eight complementation groups. Marker rescue results demonstrate that each group maps to a single defined region of the HindIII D fragment. A comparison of the location of each complementation group and the arrangement of the ORFs is satisfying in that each group maps to an available ORF (Figs. 1B, C). In the left 3.7 kb there are three major ORFs, 1-3, and three complementation groups, 4, 7, and 8. Group 2 mutants map to a DNA fragment that contains only two ORFs, 4 and 5. Since the plasmid 793, which rescues C17, cuts in the 3' region of ORF 4, ORF 5 is a likely site for complementation group 2. Groups 3 and 5 are rescued by the plasmid 801, which has only two ORFs, 6 and 7. A genetically silent region follows that contains ORFs 8-11. The final two complementation groups, 1 and 6, are mapped to fragment 722, which has three ORFs, 12, 13, and 14. Further genetic analysis is being carried out in order to assign each ts mutant to a segment of each ORF by repeating the marker rescue analysis employing both ORF-specific subclones and the single-stranded DNA from M13 phage (Carol Thompson, personal communication) used to generate the DNA sequence.

A temperature-sensitive mutation in a gene demonstrates that the gene is essential for virus growth. There are at least 13 major ORFs in the HindIII D fragment, yet there are only eight complementation groups. Therefore, either the other ORFs are nonessential for virus growth, or the genetic map has not yet been saturated. There are certainly other nonessential genes (Moss et al., 1981) or conditionally nonessential genes (Weir et al., 1982) in vaccinia, so the nonessential nature of several of these genes would not be surprising. However, there is every reason to believe that only a portion of the essential genes have been identified by the random approach of mutant generation applied to date (Condit et al., 1983). A directed mutagenesis approach will be taken in order to determine whether each ORF that is not already assigned to a complementation group is an essential gene.

The map positions of the major ORFs correlate nicely with genes already mapped to the HindIII D fragment. Morgan et al. (1984) identified an 84-kDa protein encoded by a 3-kb mRNA derived from the left end of the HindIII D fragment, and showed that it behaved like the large subunit of the viral guanylyltransferase. Although there is some discrepancy in the size of the protein, these data agree reasonably well with the location of ORF 1. Weinrich et al. (1985) identified a 65-kDa late protein that maps to the region of ORF 14. Tartaglia and Paoletti (1985) have demonstrated that resistance to rifampicin maps to the same site. They reported the sequence of 445 bp from the mutant DNA, which compares with our data from position 14,208 to 14,653.

Interestingly, the sequence reported by Tartaglia and Paoletti (1985) differs from that presented here in 7 positions. The differences, and their hypothetical consequences, can be summarized as follows. (1) Relative to our sequence, the Tartaglia and Paoletti (1985) sequence has a 3-base deletion, ATT (Fig. 2), at 14623-14625, which removes half of what in our sequence is a direct repeat, ATTATT, and results in the loss of one amino acid. (2) The published sequence has a 1-base insertion, G, at position 14491 (Fig. 2), and two single-base deletions, A, at both 14456 and 14463 (Fig. 2). Collectively, these mutations result in a frameshift affecting the carboxyl-terminal 36 amino acids of our predicted ORF 14 sequence. Their sequence predicts an ORF 14 protein which is 11 amino acids shorter and has an altered carboxyl-terminal 25 amino acids. (3) The published sequence deletes an additional A, at 14342 (Fig. 2), which changes the reading frame of our ORF 13 at the third amino acid, resulting in a truncated protein. In order to preserve our predicted ORF 13 reading frame, an available upstream ATG could be used in their virus, which would add 8 amino acids to the amino terminus of their ORF 13 protein.

These sequence differences undoubtedly represent virus strain differences. Although the viruses in question originated from the same source, up to that point the virus preparation had been serially passaged over many years without plaque purification.
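The reading-frame consequences of the three sets of differences follow from simple arithmetic on insertion and deletion lengths, which the Python sketch below makes explicit. It is an illustration only: the function names are hypothetical, and the net change describes the frame downstream of all changes in a set, not the frame between individual changes.

```python
# Sketch: classify sets of insertions/deletions by their effect on reading frame.
# Each entry: (description from the text, length changes in bases;
# positive = insertion, negative = deletion, relative to the sequence reported here).
DIFFERENCES = [
    ("ATT deletion at 14623-14625", [-3]),
    ("G insertion at 14491, A deletions at 14456 and 14463", [+1, -1, -1]),
    ("A deletion at 14342", [-1]),
]


def net_change(changes):
    """Net change in length, in bases."""
    return sum(changes)


def preserves_frame(changes):
    """True when the net length change is a multiple of three."""
    return net_change(changes) % 3 == 0


def positions_affected(differences):
    """Total number of inserted or deleted bases across all sets."""
    return sum(abs(c) for _, changes in differences for c in changes)


if __name__ == "__main__":
    for label, changes in DIFFERENCES:
        status = "in frame" if preserves_frame(changes) else "frameshift"
        print(f"{label}: net {net_change(changes):+d} -> {status}")
    print("positions that differ:", positions_affected(DIFFERENCES))
```

The 3-base deletion leaves the frame intact and removes one codon, whereas sets (2) and (3) each have a net change of one base and therefore shift the frame, in line with the consequences described above.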
Stocks passaged in this fashion can be heterogeneous in composition (Wittek et al., 1978), and the viruses that we have used for sequence analysis were plaque purified from such stocks at different times and in different laboratories. In fact, the virus used by Tartaglia and Paoletti (1985) was originally characterized by Panicali et al. (1981) as being a deletion variant relative to previously characterized WR strains of vaccinia. Finally, it is noteworthy that our sequence at and around the site identified by Tartaglia and Paoletti (1985) as the rifampicin locus is the same as that of their "wild-type" rifampicin-sensitive strain.

All of the major ORFs have been analyzed for homology to the protein sequences at the Protein Identification Resource of the National Biomedical Research Foundation at the Georgetown University Medical Center. The only striking homology observed is between ORF 8 and carbonic anhydrase from horse. The two sequences share 36% identical amino acids within a 238-amino acid overlap. The significance of this homology can only be properly addressed by genetic analysis of the ORF 8 gene.

One final observation should be made. There are two sets of divergent open reading frames present on this DNA fragment. ORFs 2 and 3 actually overlap by 8 nucleotides at their N-terminal ends, while ORFs 8 and 9 are separated by 41 bp (Fig. 4). Genetic mapping in the region of ORF 2 and ORF 3 (Fig. 1C), and the preliminary transcript mapping data, provide evidence that these ORFs are expressed. Therefore, two sets of divergent transcription initiation signals must exist, and each is likely to provide an intriguing transcription regulatory scheme worthy of further study.

ACKNOWLEDGMENTS

We express our appreciation to Nigel Godson, Geoffrey Smith, Rob Carter, Bob Kong, and Frank Rees for their helpful suggestions. We would like to thank Marcia Ensinger for supplying virus samples from her collection. This work was supported by NSF PCM 8311864.
REFERENCES

BAJSZAR, G., WITTEK, R., WEIR, J., and MOSS, B. (1983). Vaccinia virus thymidine kinase and neighboring genes: mRNAs and polypeptides of wild-type virus and putative nonsense mutants. J. Virol. 45, 62-72.
BELLE-ISLE, H., VENKATESAN, S., and MOSS, B. (1981). Cell-free translation of early and late mRNAs selected by hybridization to cloned DNA fragments derived from left 14 to 72 million daltons of the vaccinia virus genome. Virology 112, 306-317.
BERTHOLET, C., DRILLIEN, R., and WITTEK, R. (1985). One hundred base pairs of 5' flanking sequence of a vaccinia virus late gene are sufficient to temporally regulate late transcription. Proc. Natl. Acad. Sci. USA 82, 2096-2100.
CLEWELL, D. B., and HELINSKI, D. R. (1972). Effect of growth conditions on the formation of the relaxation complex of supercoiled ColE1 deoxyribonucleic acid and protein in Escherichia coli. J. Bacteriol. 110, 1135-1146.
CONDIT, R. C., and MOTYCZKA, A. (1981). Isolation and preliminary characterization of temperature sensitive mutants of vaccinia virus. Virology 113, 224-241.
CONDIT, R. C., MOTYCZKA, A., and SPIZZ, G. (1983). Isolation, characterization and physical mapping of temperature sensitive mutants of vaccinia virus. Virology 128, 429-443.
COOPER, J. A., WITTEK, R., and MOSS, B. (1981). Extension of the transcriptional and translational map of the left end of the vaccinia virus genome to 21 kilobase pairs. J. Virol. 39, 733-745.
ENSINGER, M. J., and ROVINSKY, M. (1983). Marker rescue of temperature sensitive mutations of vaccinia virus WR: Correlation of genetic and physical maps. J. Virol. 48, 419-428.
GOLINI, F., and KATES, J. R. (1984). Transcriptional and translational analysis of a strongly expressed early region of the vaccinia virus genome. J. Virol. 49, 459-470.
HENIKOFF, S. (1984). Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28, 351-359.
MAHR, A., and ROBERTS, B. E. (1984a). Arrangement of late RNAs transcribed from a 7.1-kilobase EcoRI vaccinia virus DNA fragment. J. Virol. 49, 510-520.
MAHR, A., and ROBERTS, B. (1984b). Organization of six early transcripts synthesized from a vaccinia virus EcoRI DNA fragment. J. Virol. 49, 497-509.
MONROY, G., SPENCER, E., and HURWITZ, J. (1978). Purification of mRNA guanylyltransferase from vaccinia virions. J. Biol. Chem. 253, 4481-4489.
MORGAN, J. R., COHEN, L. K., and ROBERTS, B. E. (1984). Identification of the DNA sequences encoding the large subunit of the mRNA-capping enzyme of vaccinia virus. J. Virol. 52, 206-214.
MORGAN, J. R., and ROBERTS, B. E. (1984). Organization of RNA transcripts from a vaccinia virus early gene cluster. J. Virol. 51, 283-297.
MOSS, B. (1985). Replication of poxviruses. In "Virology" (B. N. Fields, D. M. Knipe, R. M. Chanock, J. L. Melnick, B. Roizman, and R. E. Shope, eds.), pp. 685-704. Raven Press, New York.
MOSS, B., WINTERS, E., and COOPER, J. A. (1981). Deletion of a 9000 base pair segment of the vaccinia virus genome that encodes nonessential polypeptides. J. Virol. 40, 387-395.
PANICALI, D., DAVIS, S. W., MERCER, S. R., and PAOLETTI, E. (1981). Two major DNA variants present in serially propagated stocks of the WR strain of vaccinia virus. J. Virol. 37, 1000-1010.
PLUCIENNICZAK, A., SCHROEDER, E., ZETTLMEISSL, G., and STREECK, R. E. (1985). Nucleotide sequence of a cluster of early and late genes in a conserved segment of the vaccinia virus genome. Nucleic Acids Res. 13, 985-998.
ROSEL, J., and MOSS, B. (1985). Transcriptional and translational mapping and nucleotide sequence analysis of a vaccinia virus gene encoding the precursor of the major core polypeptide 4b. J. Virol. 56, 830-838.
SANGER, F., NICKLEN, S., and COULSON, A. R. (1977). Proc. Natl. Acad. Sci. USA 74, 5463-5467.
STADEN, R. (1982). Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. Nucleic Acids Res. 10, 4731-4751.
TARTAGLIA, J., and PAOLETTI, E. (1985). Physical mapping and DNA sequence analysis of the rifampicin resistance locus in vaccinia virus. Virology 147, 394-404.
THOMPSON, C. L., and CONDIT, R. C. (1986). Marker rescue mapping of vaccinia virus temperature-sensitive mutants using overlapping cosmid clones representing the entire virus genome. Virology 149, in press.
VENKATESAN, S., BAROUDY, B. M., and MOSS, B. (1981). Distinctive nucleotide sequences adjacent to multiple initiation and termination sites in an early vaccinia virus gene. Cell 25, 805-813.
VENKATESAN, S., GERSHOWITZ, A., and MOSS, B. (1982). Complete nucleotide sequences of two adjacent early vaccinia virus genes located within the inverted terminal repetition. J. Virol. 44, 637-646.
WEINRICH, S. L., NILES, E. G., and HRUBY, D. E. (1985). Transcriptional and translational analysis of the vaccinia virus late gene L65. J. Virol. 55, 450-457.
WEIR, J. P., BAJSZAR, G., and MOSS, B. (1982). Mapping of the vaccinia virus thymidine kinase gene by marker rescue and by cell-free translation of selected mRNA. Proc. Natl. Acad. Sci. USA 79, 1210-1214.
WEIR, J. P., and MOSS, B. (1984). Regulation of expression and nucleotide sequence of a late vaccinia virus gene. J. Virol. 51, 662-669.
WITTEK, R., MULLER, H. K., MENNA, A., and WYLER, R. (1978). Length heterogeneity in the DNA of vaccinia virus is eliminated on cloning of the virus. FEBS Lett. 90, 41-46.