Gene, 152 (1995) 201-204 © 1995 Elsevier Science B.V. All rights reserved. 0378-1119/95/$9.50
GENE 08504
Sequence analysis and transcriptional mapping of the orf-2 gene of Autographa californica nuclear polyhedrosis virus
(Gene characterization; nucleotide sequence; amino-acid sequence)
Marc Ohresser a,b, Nathalie Morin c, Martine Cerutti b and Claude Delsert a
a Institut Français de Recherche pour l'Exploitation de la Mer and Centre National de la Recherche Scientifique UMR 9947, Université de Montpellier 2, 34095 Montpellier Cedex 5, France; b Institut National de la Recherche Agronomique and Centre National de la Recherche Scientifique UA 1184, 30380 Saint Christol-lez-Alès, France. Tel. (33-66) 783-703; and c Centre National de la Recherche Scientifique UPR 9008 and Institut National de la Santé et de la Recherche Médicale U249, Route de Mende, 34000 Montpellier, France. Tel. (33-67) 613-330
Received by A. Kohn: 10 July 1994; Revised/Accepted: 9 September/15 September 1994; Received at publishers: 13 October 1994
SUMMARY
Sequencing of the dnapol promoter region of Autographa californica nuclear polyhedrosis virus (AcNPV) revealed an overlapping open reading frame (ORF) in an antisense orientation, referred to as ORF-2. Analysis of the ORF-2 deduced amino-acid sequence revealed two short regions of homology with a similar ORF from Lymantria dispar nuclear polyhedrosis virus (LdNPV). Two 3' processing signals of this gene, expressed late during infection, were shown to be located on the orf-2 stop codon and 162 nucleotides further downstream.
INTRODUCTION
The Baculoviridae, a family of large double-stranded DNA viruses infecting insects (Blissard and Rohrmann, 1990), is known to express a DNA polymerase activity (Miller et al., 1981). Sequencing of a 3.6-kb region of the AcNPV DNA, from 39.5 to 42.5 map units, allowed the identification of an ORF presenting some degree of homology with DNA polymerases from mammalian viruses (Larder et al., 1987; Tomalski et al., 1988). A distantly related baculovirus, LdNPV, was shown to contain a related ORF, which shares 48% aa sequence homology with AcNPV dnapol (Bjornson et al., 1992). Analysis of the organization of the dnapol-containing region revealed a second ORF, referred to as ORF-2, in the LdNPV genome (Bjornson et al., 1992). ORF-2 is located upstream from dnapol and oriented in the opposite direction, with the two initiation codons separated by only a few nucleotides (Bjornson et al., 1992). Sequence determination of the AcNPV dnapol promoter region revealed a similar organization with an ORF starting at the same coordinates (Tomalski et al., 1988; Ohresser et al., 1994). Moreover, a consensus sequence for the initiation of transcription of baculovirus late genes, GTAAG, is present 14 nt downstream from the dnapol initiation codon in both the AcNPV and LdNPV baculoviruses (Bjornson et al., 1992; Ohresser et al., 1994). Here we report the complete sequence of the AcNPV orf-2 gene, the analysis of its encoded protein and the mapping of its 3' processing signals.
Correspondence to: Dr. C. Delsert, Défense et Résistance chez les Invertébrés Marins, UMR9947, IFREMER/CNRS, Université de Montpellier 2, CP80, 2 Place E. Bataillon, 34095 Montpellier Cedex 5, France. Tel. (33-67) 144-625; Fax (33-67) 144-622; e-mail: [email protected]
SSDI 0378-1119(95)00715-2
Abbreviations: aa, amino acid(s); AcNPV, Autographa californica nuclear polyhedrosis virus; bp, base pair(s); dnapol, gene encoding the DNA polymerase; kb, kilobase(s) or 1000 bp; LdNPV, Lymantria dispar nuclear polyhedrosis virus; nt, nucleotide(s); ORF (orf), open reading frame (and gene); p.i., post-infection; poly(A), polyadenylated tail; tsp, transcription start point(s).
EXPERIMENTAL AND DISCUSSION
(a) Cloning and sequencing of the orf-2 region
The SacI-F DNA restriction fragment of the AcNPV genome (strain 1.2, a kind gift of G. Croizier) (Ohresser et al., 1994), which contains the entire dnapol gene and its upstream region, was subcloned as the EcoRI-U and EcoRI-V fragments, and as a 235-bp fragment, EcoRI-235 (Fig. 1A). EcoRI-235 was derived from an EcoRI site not present in the AcNPV L1 strain. DNA sequencing revealed a 954-nt long ORF in an antisense orientation, termed ORF-2 (Fig. 1C).
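As a quick consistency check on the ORF length, the short sketch below converts a nucleotide count into codon and amino-acid counts. It assumes that the 954-nt figure includes the stop codon; the function name is illustrative and is not part of the original analysis.

```python
from typing import Tuple


def orf_codon_counts(orf_length_nt: int) -> Tuple[int, int]:
    """Return (total codons, amino acids) for an ORF whose length includes the stop codon."""
    if orf_length_nt <= 0 or orf_length_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    total_codons = orf_length_nt // 3
    # The stop codon terminates translation and encodes no amino acid.
    return total_codons, total_codons - 1


codons, amino_acids = orf_codon_counts(954)
print(codons, amino_acids)
```

Under that assumption, a 954-nt ORF corresponds to 318 codons, that is, 317 aa followed by a stop codon.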
(b) Sequence comparison and analysis of the ORF-2 protein
The orf-2 gene encodes a protein of 317 aa corresponding to a predicted molecular mass of 36.3 kDa and with a pI of 7.1. Thus, although the beginning of both ORFs is precisely the same in AcNPV and LdNPV, AcNPV ORF-2 contains an additional 126 aa at the C terminus and the percentage of conservation between both viruses is considerably lower than previously estimated (Bjornson et al., 1992) (Fig. 2A). However, based upon the alignment of the N-terminal part of AcNPV and LdNPV ORF-2, a 25-aa motif with 48% conservation (10 aa identical out of 25 and two conservative changes from T to S) was observed starting at aa 6 (Fig. 2B). As previously reported (Bjornson et al., 1992), a highly conserved Tyr-rich motif, THN(I/F)NYKYDYN, is present at aa 86 (Fig. 2B). Additionally, the AcNPV ORF-2 region located between aa 106 and aa 144 contains 56% of Pro (21 out of 38 aa) against 37% for its LdNPV counterpart (15 out of 39 aa between aa 127 and 166) (Fig. 2B). Such a Pro content gives a predicted structure (Taylor and Thornton, 1984) rich in β-turns and β-sheets (data not shown). Computer analysis (Dessen et al., 1990) revealed an N-terminal region rich in hydrophilic peptides and two main antigenic determinants (data not shown), while the C-terminal half is mostly hydrophobic (Fig. 2C) (Hopp and Woods, 1981). Two putative sites
[Figure 1: (A) map of the orf-2 and dnapol genes with the SacI and EcoRI sites and the EcoRI-U, EcoRI-235 and EcoRI-V fragments; (B) sequencing strategy; (C) orf-2 nt sequence, coordinates -74 to +1271 (EMBL accession No. X78446).]
Fig. 1. Schematic representation and sequence of the AcNPV orf-2 gene. (A) Respective location of the orf-2 (hatched box) and dnapol (open box) genes. Broken arrows are for early and late tsp. +1 corresponds to the origin of nt numbering. Asterisks are for restriction sites and the restriction fragments are marked U, 235 and V for EcoRI-U, EcoRI-235 and EcoRI-V fragments, respectively. (B) Schematic representation of the sequencing strategy with arrows figuring independently determined sequences. Sequence was determined for U, 235 and V fragments subcloned in pBluescript and on the pBdp1 to 5 plasmid series, carrying truncations in the dnapol promoter region as previously described (Ohresser et al., 1994). Additionally, specific primers were used on the SacI-F fragment. (C) orf-2 nt sequence. Start codon (the A of the orf-2 ATG is defined as coordinate +1) and stop codon are boxed. Lower-case letters indicate the non-coding sequence. Regions corresponding to specific sequencing primers are underlined by arrows. The tsp are indicated by bent arrows, while putative polyadenylation signals are underlined. EcoRI sites are in bold characters. The nt sequence appears in the EMBL Sequence Database under accession No. X78446.
[Figure 2: (A) dot matrix of AcNPV (Ac) versus LdNPV (Ld) ORF-2; (B) aa sequence alignment of Ac and Ld ORF-2; (C) hydrophilicity profile along the 317-aa AcNPV ORF-2 protein.]
Fig. 2. Sequence comparison and analysis of the ORF-2 protein of AcNPV. (A) Dot matrix representation of the comparison of the deduced ORF-2 protein sequences of AcNPV and LdNPV. Comparison was performed with the MacMolly program (Soft Gene, Berlin, Germany) with a window size of seven aa (one mismatch accepted). (B) AcNPV and LdNPV ORF-2 aa sequences were aligned using the main sequence homology stretch. Numbering starts on the first aa. Asterisks are for perfect match and dashes for gaps. (C) Hydrophilicity profile for the ORF-2 protein (Hopp and Woods, 1981).
for N-glycosylation were identified at aa 13 (NTR) and 235 (NIR), whereas no nuclear localization sites were found. Analysis of the AcNPV ORF-2 protein against the GenPro, SWISS-PROT and NBRF-PIR protein data banks neither revealed significant homology with any known protein nor allowed the identification of particular motifs (data not shown) which might suggest a function for this protein. Thus, the complex organization of this region, conserved in two distantly related baculoviruses, remains intriguing, since it is not involved in the regulation of dnapol transcription (Ohresser et al., 1994) and since the ORF-2 aa sequence presents little conservation.
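The dot-matrix comparison of Fig. 2A (window of seven aa, one mismatch accepted) can be illustrated with a minimal sketch. This is not the MacMolly implementation, whose internals are not described here; the function name and the toy input are assumptions. The toy input uses the two variants of the Tyr-rich motif THN(I/F)NYKYDYN rather than the full ORF-2 sequences.

```python
from typing import List, Tuple


def dot_matrix(seq_a: str, seq_b: str, window: int = 7,
               max_mismatch: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based (i, j) window starts where the two windows
    differ at no more than max_mismatch positions."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            win_a = seq_a[i:i + window]
            win_b = seq_b[j:j + window]
            mismatches = sum(a != b for a, b in zip(win_a, win_b))
            if mismatches <= max_mismatch:
                dots.append((i, j))
    return dots


# Toy example (illustrative only): the I and F variants of the Tyr-rich motif.
motif_i = "THNINYKYDYN"
motif_f = "THNFNYKYDYN"
dots = dot_matrix(motif_i, motif_f)
print(dots)
```

Each returned pair is one dot in the matrix; a run of dots along the diagonal, as produced by this toy input, marks a conserved stretch.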
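[Figure 3: (A) RNase protection gel, lanes 0 to 24 h p.i., c, a and P; (B) diagram of the 311-nt riboprobe and of the protected fragments.]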
Fig. 3. Mapping of the orf-2 3' processing signals. (A) RNase protection analysis of RNA extracted from AcNPV-infected Sf9 cells at various times p.i. (0, 2, 4, 6, 8, 9, 10, 11, 12 and 24 h), as indicated on the above line, and at 24 h p.i. when treated with cycloheximide (lane c) or aphidicolin (lane a). Undigested products were electrophoresed on a 7 M urea-6% polyacrylamide gel. When hybridized to RNA extracted after 6 h p.i., the 311-nt riboprobe gave protection to fragments of approx. 129/139 nt and 231/241 nt, corresponding to RNA terminated at the level of the stop codon or further downstream, respectively. Lane P is for undigested probe. (B) Schematic representation of the orf-2 3'-end specific riboprobe and of the protected fragments it generates in the RNase protection assay. Numbers indicate the size in nt. E, EcoRI; poly(A), putative 3' processing sites; AAA tracks, polyadenylated tail.
(c) Mapping of the 3' processing signal
While the orf-2 tsp have been previously identified (Ohresser et al., 1994), the 3' processing signals remained to be mapped. Analysis of the 349-bp sequence downstream from the stop codon revealed two polyadenylation-related sequences, AATTAA, one overlapping the stop codon and the other located 163 bp downstream from the stop codon (Fig. 1C). Such sequences are known as 3' processing signals for certain baculovirus genes in both AcNPV and Bombyx mori nuclear polyhedrosis virus (Guarino and Summers, 1987; Huybrechts et al., 1992). To test these putative sites, we performed an RNase protection assay using a radiolabelled 311-nt riboprobe specific for the 3' end of orf-2, transcribed in vitro from a HindIII-linearized pBKS2 plasmid (Stratagene, La Jolla, CA, USA) carrying the AcNPV EcoRI-235 fragment mentioned above (Fig. 1A). RNase protection analysis was done as previously described (Ohresser et al., 1994) on 5 µg of total RNA extracted (Chomczynski and Sacchi, 1987) from AcNPV-infected Sf9 cells at various times p.i. (0, 4, 6, 8, 9, 10, 11, 12 and 24 h). In this assay, the 311-nt riboprobe gave protection to two fragments of approx. 231/241 nt and 129/139 nt, when RNA was extracted later than 6 h p.i. (Fig. 3A). The 129/139-nt signal is specific for a transcript polyadenylated 27 to 37 nt further downstream than the polyadenylation signal overlapping the stop codon. This value is in good agreement with the typical distance (10 to 35 nt) observed between a conventional polyadenylation signal, AATAAA, and the corresponding poly(A) tail (Wahle and Keller, 1992). The 231/241-nt signal corresponds to a full protection of the homologous part of the riboprobe (Fig. 3B) by transcripts terminated further away, presumably at the termination signal we found 162 bp downstream from the orf-2 stop codon. Because the specific radioactivity of a riboprobe is proportional to its length and because the 129/139-nt and 231/241-nt signals are of similar intensity (Fig. 3A), our data show that approx. 60% of the orf-2 transcripts terminate near the stop codon. When done on RNA from cells treated with cycloheximide or aphidicolin which, respectively, inhibit delayed early and late transcription, RNase protection provided no signal. These data are in agreement with previous results indicating that the orf-2 gene is expressed as a late gene (Ohresser et al., 1994).
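The estimate of approx. 60% can be reproduced with a short calculation. The sketch below assumes, as stated above, that the signal of a protected fragment is proportional to its length multiplied by its molar amount. The fragment lengths (134 and 236 nt, the midpoints of the reported ranges), the intensity ratio of exactly 1 and the function name are illustrative assumptions, not measured values.

```python
def short_transcript_fraction(short_len_nt: float, long_len_nt: float,
                              intensity_short: float = 1.0,
                              intensity_long: float = 1.0) -> float:
    """Molar fraction of transcripts giving the short protected fragment.

    Signal is taken as proportional to (molar amount x fragment length),
    so the molar amount is proportional to intensity / length.
    """
    molar_short = intensity_short / short_len_nt
    molar_long = intensity_long / long_len_nt
    return molar_short / (molar_short + molar_long)


# Assumed inputs: midpoints of 129/139 nt and 231/241 nt, equal band intensities.
fraction = short_transcript_fraction(134, 236)
print(f"{fraction:.0%}")
```

With equal intensities the fraction reduces to 236 / (134 + 236), or about 64%, which is consistent with the approximate value quoted in the text.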
ACKNOWLEDGEMENTS
We thank G. Croizier and G. Devauchelle for helpful comments during the course of this work and M. Lopez for critical reading of the manuscript. This research was supported by the grant 7807 (to M.O.) from Société Nationale Elf Aquitaine.
REFERENCES
Bjornson, R.M., Glocker, B. and Rohrmann, G.F.: Characterization of the nucleotide sequence of Lymantria dispar nuclear polyhedrosis DNA polymerase gene region. J. Gen. Virol. 73 (1992) 3177-3183.
Blissard, G.W. and Rohrmann, G.F.: Baculovirus diversity and molecular biology. Annu. Rev. Entomol. 35 (1990) 127-155.
Chomczynski, P. and Sacchi, N.: Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162 (1987) 156-159.
Dessen, P., Fondrat, C., Valencien, C. and Mugnier, C.: A French service for access to biomolecular sequence databases. Comp. Appl. Biosci. 6 (1990) 355-356.
Guarino, L.A. and Summers, M.D.: Nucleotide sequence and temporal regulation of a baculovirus regulatory gene. J. Virol. 61 (1987) 2091-2099.
Hopp, T.P. and Woods, K.R.: Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. USA 78 (1981) 3824-3828.
Huybrechts, R., Guarino, L., Van Brussel, M. and Vulsteke, V.: Nucleotide sequence of a transactivating Bombyx mori nuclear polyhedrosis virus immediate early gene. Biochim. Biophys. Acta 1129 (1992) 328-330.
Larder, B.A., Kemp, S.D. and Darby, G.: Related functional domain in virus DNA polymerases. EMBO J. 6 (1987) 169-175.
Miller, L.K., Jewell, J.J. and Browne, D.: Baculovirus induction of a DNA polymerase. J. Virol. 40 (1981) 305-308.
Ohresser, M., Morin, N., Cerutti, M. and Delsert, C.: Temporal regulation of a complex and unconventional promoter by viral products. J. Virol. 68 (1994) 2589-2597.
Taylor, W.R. and Thornton, J.M.: Recognition of super-secondary structure in proteins. J. Mol. Biol. 173 (1984) 487-514.
Tomalski, M.D., Wu, J. and Miller, L.K.: The location, sequence, transcription and regulation of a baculovirus DNA polymerase gene. Virology 167 (1988) 591-600.
Wahle, E. and Keller, W.: The biochemistry of 3' end cleavage and polyadenylation of messenger RNA precursor. Annu. Rev. Biochem. 61 (1992) 419-440.