Expression of adenoviral genetic information in productively infected cells




Biochimica et Biophysica Acta, 651 (1982) 175-208
Elsevier Biomedical Press

BBA 87097

S.J. FLINT

Department of Biochemical Sciences, Princeton University, Princeton, NJ 08544 (U.S.A.)

(Received August 12th, 1981)

Contents

I. Introduction
II. Structure of adenoviral early genes
    A. Early regions E1-E4
        1. Region E1A
        2. Region E1B
        3. Function of type C adenoviral region 1
        4. Region E1 of other serotypes
        5. Region E2A
        6. Region E3
        7. Region E4
    B. Expression of additional genes during the early phase of infection
        1. Region E2B
        2. The IVa2 gene
        3. Classical late genes
        4. Additional genes in the region 17-30 units
III. Synthesis of adenoviral early mRNA species
IV. Regulation of adenoviral early gene expression
V. Organization and expression of adenoviral late genes
    A. The r-strand transcriptional unit
    B. Other late genes
    C. Structure and splicing of adenoviral late leader segments
VI. The early to late transition
VII. Structure and function of adenoviral 'promoter' sites
VIII. The VA-RNA genes
Acknowledgements
References

I. Introduction

The mechanisms that mediate the synthesis of viral mRNA species in permissive human cells infected by human adenoviruses, especially serotypes 2 and 5, have been the subject of ever more intense scrutiny over the past two decades. These viruses, whose genomes comprise linear, double-stranded DNA molecules of (20-25) × 10^6 daltons and thus are of moderate complexity among DNA-containing viruses, encode no known biosynthetic enzymes; consequently, adenoviruses depend absolutely on their hosts' synthetic machinery for replication of their DNA and synthesis of their mRNA species and polypeptides. Adenovirus-infected cells, therefore, provide a convenient model system for the study of mechanisms of gene expression in eukaryotic cells, a feature of especial attraction before the advent of molecular cloning methods. Of course, adenoviruses share this property with other DNA-containing viruses, such as the papovaviruses, and with viruses whose life-cycle depends on a DNA intermediate, the retroviruses. Compared to these systems, adenovirus-infected cells offer the unique advantage of efficient suppression of cellular macromolecular synthesis as infection proceeds, facilitating recovery and analysis of virus-specific macromolecules. Moreover, the intricate governing circuits to which adenoviral gene expression appears to be subject in productively infected cells provide opportunities to dissect the molecular mechanisms that regulate gene expression in eukaryotic cells. The molecular biology of adenovirus infections has been the subject of numerous reviews, including several of recent origin [1,2]. This article is therefore most concerned with advances that have improved our understanding of the mechanisms and regulation of adenoviral gene expression.

Abbreviation: DBP, DNA-binding protein.
0304-419X/82/0000-0000/$02.75 © 1982 Elsevier Biomedical Press

II. Structure of adenoviral early genes

IIA. Early regions E1-E4

Four regions of the adenoviral genome encoding early RNA sequences, defined as those synthesized in the absence of viral DNA replication, were identified originally by saturation hybridization experiments [3,4] and designated E1, E2, E3 and E4, reading from left to right across the genome. Mapping of individual mRNA species to specific restriction endonuclease fragments of adenovirus type 2 (Ad2) DNA and of early promoter sites (see Ref. 1) soon revealed that the E1 region in fact comprises two closely spaced transcriptional units, E1A and E1B. It is now clear from detailed analysis of the structure of early mRNA species by the nuclease S1/exonuclease VII mapping technique [5] or electron microscopic examination of DNA:RNA heteroduplexes [6,7] that each of these five classical early regions encodes multiple, related mRNA species.

IIA-1. Region E1A

Early region E1A occupies the r-strand from 1.3 to 4.6 map units and specifies at least three mRNA species [5-8], designated E1A-a, b and c in Fig. 1. Species E1A-b is the major form made at 6 h after infection in the absence of drugs that inhibit DNA synthesis or in the presence of such drugs, cytosine arabinoside or cycloheximide [5,6]. This mRNA species contains an internal splice from 2.6 to 3.3 map units. Species E1A-a contains a very short internal splice, 0.1-0.3% of the genome, and was defined by nuclease S1 mapping [5]: it certainly corresponds to a species described as unspliced by Chow and her colleagues [6], for the very small intervening sequence loop formed in a heteroduplex between E1A-a mRNA and its template would be very difficult to discern in the electron microscope. The third species, E1A-c, contains the largest intervening sequence, some 1.7 units, and constitutes a relatively minor proportion of the total E1A mRNA population present at 6 h after infection, or in infected cells held in the early phase by cytosine arabinoside or cycloheximide [6]. Its relative concentration, however, increases as adenovirus infection progresses [6]. Such conclusions about the relative concentrations of individual E1A mRNA species based on quantitative electron microscopy agree well with the results of size fractionation and quantitation of individual, labelled mRNA species [9]: E1A-c 9 S mRNA, for example, was not observed until later times after infection than the other two E1A mRNAs.

Fig. 1. Messenger RNA species complementary to early regions E1A and E1B of type C adenovirus. The left-hand region of the type C adenoviral genome is represented by the solid horizontal line divided into map units at the bottom of the figure. The mRNA species complementary to early regions E1A and E1B are depicted as horizontal arrows, drawn in the direction of transcription, in which the caret symbols represent sequences removed during splicing. The species are designated both in terms of their sedimentation coefficients and following the nomenclature of Chow et al. [6]. The coding sequences of each mRNA species are shown by the horizontal boxes drawn below it. In this representation, the frame read in at the 5' end of E1A mRNA species is open, whereas the reading frame shifted by two nucleotides is shown stippled. The molecular weights of the encoded polypeptides are listed below each mRNA species. The sources of these data are given in the text.
[Figure labels: E1A-a, 13 S (53k, 44k); E1A-b, 12 S (47k, 35k); E1A-c, 9 S (28k); E1B-a, 22 S (55-65k); E1B-b, 13 S (15-19k); E1B-c, 9 S (IX); scale, 1-11 map units.]

The nucleotide sequence of the E1A region has been determined completely for a number of adenovirus serotypes, type 5 [10,11], type 7 [12] and type 12 [13], as have the sequences of cDNA copies of major E1A mRNA species made in Ad2- [14] or Ad12- [15] infected cells. The common 5'-terminus of Ad2 E1A mRNA species (see Fig. 1) has been located in the DNA sequence by determination of the sequence of its 5'-terminal, capped T1-oligonucleotide [16]. Comparison of genomic and mRNA sequences, therefore, provides a complete description of the type C adenoviral E1A region: type 2 and type 5 adenoviral DNA are 98% homologous, at least at the left-hand ends of their genomes [11], permitting comparisons among their sequences. The 5'-end of E1A mRNA lies at nucleotide 499 from the left-hand end of the genome [16]. Two cloned cDNAs copied from species E1A-a and E1A-b, shown in Fig. 1, analysed by Perricaudet et al. [14] share sequences from this point to nucleotide 974, including an initiator AUG codon at nucleotide 560. In the smaller mRNA species, E1A-b, this site is spliced to nucleotide 1229, beyond which the mRNA sequence continues to nucleotide 1632, with a termination codon at nucleotide 1543 [11,14]. The larger mRNA species bears this same 3' portion but includes a smaller splice, from nucleotide 1112 to the common 3' splice site at nucleotide 1229.
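The splice coordinates just quoted lend themselves to a simple arithmetic check. The following Python sketch is an illustration added here, not part of the analyses cited; it assumes that each 5' splice site coordinate (974 or 1112) is the last nucleotide retained in the first exon, that nucleotide 1229 is the first nucleotide of the second exon, and that nucleotide 1543 is the first nucleotide of the termination codon. All names in the code are hypothetical.

```python
# Coordinates quoted in the text, counted from the left-hand end of the genome.
# How each coordinate is interpreted is an assumption (see the comments).
AUG = 560        # first nucleotide of the initiator codon
STOP = 1543      # assumed: first nucleotide of the termination codon
ACCEPTOR = 1229  # assumed: first nucleotide of the second exon (common 3' splice site)
DONORS = {"E1A-a": 1112, "E1A-b": 974}  # assumed: last nucleotide of the first exon


def describe(donor):
    """Summarize one spliced reading frame for a given 5' splice site."""
    intron_length = ACCEPTOR - donor - 1   # nucleotides removed by splicing
    exon1_coding = donor - AUG + 1         # coding nucleotides before the splice
    exon2_coding = STOP - ACCEPTOR         # coding nucleotides after it, stop codon excluded
    total = exon1_coding + exon2_coding
    return {
        "intron_length": intron_length,
        "frame_shift": intron_length % 3,  # change of genomic reading frame across the splice
        "codons": total // 3,
        "in_frame": total % 3 == 0,        # True if the stop codon falls in the spliced frame
    }


summary = {name: describe(donor) for name, donor in DONORS.items()}
extra_residues = summary["E1A-a"]["codons"] - summary["E1A-b"]["codons"]

for name, values in summary.items():
    print(name, values)
print("additional codons in E1A-a:", extra_residues)
```

Under these assumptions, both intervening sequences have lengths that leave a remainder of 2 on division by 3, the termination codon lies in frame in both spliced sequences, and the two reading frames differ by 46 codons.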
The 5' splice site defining species E1A-c has not been located experimentally, but possible sites lie at nucleotide 584 and nucleotide 638. In all three mRNA species, then, the region between nucleotides 1113 and 1228, which includes A-T-rich sequences, is deleted during processing. Moreover, splicing introduces a 2 nucleotide shift in the reading frame used in the 3'-terminal compared to the 5'-terminal segment preceding the splice point. Splicing occasions the same change in reading frame in each E1A mRNA species, so that all three mRNA species in fact employ the same frames before and after the splice point (see Fig. 1). Thus, the products of translation of, for example, E1A-a and E1A-b mRNA share both their N-terminal and C-terminal segments and differ only in the presence of 46 additional, internal amino acids in the E1A-a products. These sequence data also permit the prediction of the amino acid sequences of the polypeptides specified by E1A-a and E1A-b mRNAs [11,14]. Both polypeptides must be rich in proline and glutamic acid residues and would be expected to exhibit molecular weights of 32000 and 26000, respectively. However, at least four related polypeptides have been assigned to E1A by translation in vitro of RNA selected by hybridization to restriction endonuclease fragments specific to E1A or by immunoprecipitation of labelled cell extracts [17-21]. These polypeptides exhibit apparent molecular weights variously reported as 38000-53000 or 34000-43000, clearly larger than the values calculated from sequence information. This discrepancy probably reflects the unusually high proline content of the E1A polypeptides, leading to their aberrant migration in SDS-polyacrylamide gels. The most obvious question, though, is how three mRNA species encode some six polypeptides. Peptide mapping reveals that the four major members of the 34000-54000 dalton family are related to one another and, in fact, can be grouped in two highly related pairs [18,22,23]. Size fractionation of mRNA and selection by hybridization to specific adenoviral DNA fragments followed by in vitro translation has shown that E1A-a mRNA encodes polypeptides of 53 and 44 kDa, whereas E1A-b mRNA directs the synthesis of 47 and 35 kDa proteins [19,24]. The source of the heterogeneity among the products of translation of each mRNA species has not been identified, but generally is believed to reflect some form of post-translational modification. However, the possibility that each mRNA class includes two species with minor differences has not yet been excluded. Interestingly, mutants of Ad5 carrying deletions of sequences to the right of position 3.8 units produce, as expected, truncated E1A mRNA species and polypeptides, but only one shortened polypeptide from each mRNA [24], an observation suggesting that sequences to the right of 3.8 units are the source of the heterogeneity. It should also be noted that at least six highly related 32-58 kDa proteins can be resolved among the E1A products made in the cell [18,25]. The E1A-c mRNA species appears to specify only one polypeptide, exhibiting an apparent molecular weight of 28000 [19,24].

IIA-2. Region E1B

Three mRNA species complementary to early region E1B, whose structures are depicted in Fig. 1, have been defined by electron microscopic heteroduplexing and the nuclease S1 assay [5-8]. Species E1B-a, complementary to the region 4.6-11.2 units with a small splice from approximately 9.3-9.7 units, was the form most readily detected by the nuclease S1 assay [5]. This species appears as a continuous transcript in electron microscopic analysis of DNA:RNA heteroduplexes. Under most conditions of infection, however, species E1B-b, comprising sequences from 4.6 to 6.1 units spliced to those complementary to the region 9.9-11.2 units, is the major form of E1B mRNA observed [6,26]. The relative concentration of species E1B-a can be raised by the inclusion of cytosine arabinoside or cycloheximide in the culture medium. Synthesis of E1B-b mRNA apparently begins at a rate lower than that of E1B-a and continues well into the late phase of infection (when little E1B-a mRNA is made) at a stimulated rate [6,9,24]. A third E1B mRNA, species E1B-c in Fig. 1, corresponding closely to the 3'-terminal segments of the other two E1B mRNA species, that is approx. 9.8-11.2 units with no splices, can be observed in early RNA preparations although its synthesis begins later than that of E1B-a or E1B-b mRNA [6].
This mRNA specifies virion polypeptide IX [9,27]. Synthesis of the structural protein itself does occur in the absence of viral DNA replication [28], but polypeptide IX mRNA is synthesized at increased rates as an infection continues, in parallel to species E1B-b [9,24].
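Positions in this section are given both in map units and, where sequence data exist, in nucleotides. Because one map unit is 1% of the genome length, the two scales can be interconverted approximately. The Python sketch below is illustrative only: the genome length of 36 000 base pairs is an assumed round figure that is not stated in the text, and the function names are hypothetical.

```python
# Assumed round figure for the length of a type C adenoviral genome, in base pairs.
# This value is NOT given in the text; it is used here only for illustration.
GENOME_LENGTH_BP = 36_000


def map_units_to_nt(units, genome_length=GENOME_LENGTH_BP):
    """Convert a position in map units (percent of genome length) to a nucleotide number."""
    return round(units / 100 * genome_length)


def nt_to_map_units(nt, genome_length=GENOME_LENGTH_BP):
    """Convert a nucleotide number to map units, rounded to one decimal place."""
    return round(100 * nt / genome_length, 1)


# Boundaries quoted for the E1A mRNA species, expressed on the map-unit scale.
print(nt_to_map_units(499), nt_to_map_units(1632))
# Boundaries quoted for region E1B (4.6-11.2 units), expressed as nucleotide numbers.
print(map_units_to_nt(4.6), map_units_to_nt(11.2))
```

With this assumed length, nucleotides 499 and 1632 correspond to roughly 1.4 and 4.5 map units, close to the 1.3 to 4.6 map units quoted for region E1A; small differences are expected, because both the genome length and the map coordinates are approximate.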

The complete nucleotide sequences of region E1B of Ad5 DNA [11], of the left-hand part of Ad12 E1B [13] and of cloned cDNA copies of E1B-a and E1B-b [29] and polypeptide IX [27] mRNA of Ad2 have been determined. The sequences originally published indicated a different distribution of nonsense codons, and thus of coding sequences, in Ad12 compared to Ad5 DNA. Such a difference was difficult to reconcile with the striking similarity of the Ad5 and Ad12 E1B-encoded polypeptides [18-20,30,31]. Reexamination and correction of the DNA sequences have resolved these apparent discrepancies. The arrangement of the two spliced Ad5 E1B mRNA species is analogous to that of E1A mRNAs: the E1B-a and E1B-b mRNA species share a 3'-terminal segment from nucleotide 3596 to a polyadenylation site at nucleotide 4038. Both mRNA species also include a 5'-terminal sequence from nucleotide 1701 or 1703 [32] to a 5' splice site at nucleotide 2257. In E1B-b mRNA, nucleotide 2256 is joined to a 3' splice site at nucleotide 3600, whereas E1B-a mRNA includes the sequence from nucleotide 2257 to a second 5' splice site at nucleotide 3511, sufficient to specify 417 amino acids. Nucleotide 3511 of E1B-a mRNA is ligated to the common 3' splice site at nucleotide 3600. From these points, the mRNA sequence extends continuously to the polyadenylation site near nucleotide 4038 [27]. Two major polypeptides, variously described as 55-65 and 15-19 kDa, have been assigned to the type C adenoviral E1B region [17,18,20,30]. These share no methionine-containing peptides [33], although an 18 kDa acidic polypeptide related to the larger E1B protein has been described [20]. Translation in vitro of size-fractionated mRNA has established that the 2.2 kilobase E1B-a mRNA specifies the 55-65 kDa polypeptide, whereas species E1B-b encodes the smaller polypeptide [19,24].
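As a rough consistency check on these coordinates, the lengths of the two spliced E1B mRNA bodies can be summed from their exons. The Python sketch below is an added illustration with hypothetical names; it assumes a 5' end at nucleotide 1701, 5' splice sites whose last retained nucleotides are 2256 (E1B-b) and 3511 (E1B-a), a common 3' splice site whose first retained nucleotide is 3600, and a 3' end at nucleotide 4038, and it ignores the poly(A) tail.

```python
def spliced_length(exons):
    """Total length of an mRNA body given (first, last) nucleotide pairs, inclusive."""
    return sum(last - first + 1 for first, last in exons)


# Coordinates taken from the text; their exact interpretation is an assumption.
CAP_SITE = 1701       # 5' end of both spliced E1B mRNA species (1701 or 1703 in the text)
ACCEPTOR = 3600       # common 3' splice site
POLY_A_SITE = 4038    # polyadenylation site

E1B_EXONS = {
    "E1B-a": [(CAP_SITE, 3511), (ACCEPTOR, POLY_A_SITE)],
    "E1B-b": [(CAP_SITE, 2256), (ACCEPTOR, POLY_A_SITE)],
}

lengths = {name: spliced_length(exons) for name, exons in E1B_EXONS.items()}
for name, length in lengths.items():
    print(f"{name}: {length} nucleotides")
```

On these assumptions the E1B-a body is about 2.25 kilobases, in line with the 2.2 kilobase size quoted above, and the E1B-b body is about 1.0 kilobase.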
The N-terminal sequence of the 15-19 kDa protein has been determined [34] and can be located within the DNA sequence, initiating at an AUG codon at nucleotide 1714. From this start point, the sequence contains a reading frame open to nucleotide 2242, sufficient to specify a polypeptide of 21 kDa [23]. As the terminating TGA codon precedes the E1B-b mRNA splice site at nucleotide 2257, none of the C-terminal portion added as a result


of splicing is actually expressed from E1B-b mRNA. The E1B-a mRNA must include this same coding segment (see Fig. 1) and Bos et al. [23] have suggested that it too can be translated into the smaller E1B polypeptide. Arguments in favor of this notion include the observation that the 15-19 kDa polypeptide is synthesized in certain lines of transformed cells in which E1B-b mRNA is not detected [33,35] and the synthesis in vitro of the smaller E1B polypeptide from mRNA that is larger than 2.0 kilobases, as well as from the 1.0 kilobase E1B-b mRNA [24]. In addition, however, E1B-a mRNA includes a long open reading frame, nucleotides 2019-3507, sufficient to specify a protein of 55 kDa. This frame is shifted by two nucleotides with respect to that in which the 15-19 kDa polypeptide is encoded (see Fig. 1). Although no sequences of the larger E1B polypeptide are available, there is no other open frame of sufficient size to encode it, nor could splicing create one. It must be concluded, therefore, that the second AUG of the E1B-a mRNA at the start of the 55 kDa polypeptide coding sequence can be recognized and utilized efficiently within the cell, an unusual occurrence [36]. Polypeptide IX is specified by the unique mRNA species E1B-c depicted in Fig. 1, and is encoded by nucleotide 3609 to nucleotide 4028. Translation in vitro of mRNA selected by hybridization to E1B DNA and size-fractionated has also identified an 18 kDa polypeptide specified by an mRNA species slightly larger than E1B-b [24]. Such an RNA molecule, with a shorter intervening sequence, has been observed occasionally among E1B RNA:DNA heteroduplexes [6]. A novel polypeptide of some 22 kDa appears to be specified uniquely by E1B RNA prepared from cells harvested during the late phase of infection: its appearance is concomitant with a decline in the 18 kDa polypeptide mRNA, but no specific mRNA encoding it has been identified [24].

IIA-3. Function of type C adenoviral region 1

It has been apparent for some time that early region 1 of the type C adenoviral genome encodes all functions necessary and sufficient for initiation and maintenance of transformation of semi-permissive rodent cells (see Tooze [1]). An understanding of the functions performed by gene products specified by region 1 is, therefore, of considerable interest. The use of mutants carrying lesions within the E1 region has provided most of the information on this subject that we now possess. One class of host-range mutants of Ad5 that has lost the ability to grow in HeLa cells bears lesions within region E1A [37]. Such mutants, epitomized by H5hr1, are defective in transformation of rodent cells and in DNA replication in permissive cells [38,39]. Non-permissive HeLa cells infected by H5hr1 also fail to accumulate significant quantities of cytoplasmic RNA sequences complementary to regions E1B, E2, E3 or E4 [40]. Interestingly, the H5hr1 mutant specifies truncated versions of the E1A polypeptides encoded by E1A-a mRNA (see Fig. 1) [24]. Because only the products of the largest E1A mRNA are affected by this presumed nonsense mutation, the mutation must lie in coding sequences unique to this mRNA, that is, between 2.6 and 3.3 units (see Fig. 1). Mutants from whose genomes most of region E1A has been deleted [41] also fail to synthesize substantial quantities of other early mRNA species or polypeptides. However, mutants such as H5dl311, in which only the C-terminal portion of E1A, to the right of 3.8 units, is missing, exhibit a normal phenotype in this respect [24,42]. Thus, it can be concluded that the products of E1A-a, either alone or in combination with other E1A polypeptides, mediate regulation of early gene expression and that this function is encoded by sequences between 2.6 and 3.8 map units. The fact that an E1A product(s) is required for accumulation of mRNA sequences complementary to the other classical early regions of course raises the possibility that E1A proteins in fact have no direct role in either replication or transformation; rather, they would simply be required for expression of the early proteins that actually perform these functions.
It is, however, well established that DNA fragments comprising the E1A region, whose products are expressed in all transformed cell lines examined [1,17,43,44], are sufficient to induce abortive or partial transformation of rodent cells [30], implicating one, or several, E1A-encoded polypeptides in initiation of transformation.


Moreover, the mutant H5dl311, which is deleted only in sequences between 3.8 and 4.2 units, is defective in replication [41]; this mutation has essentially no effect on the expression of RNA sequences directly complementary to the other early regions (see previous paragraph), implicating sequences encoded to the right of 3.8 units of E1A in replication. The phenotypes of host-range and deletion mutants of Ad5 carrying lesions in E1B [37,38,41,42,45] reveal that E1B products also participate in both replication of viral DNA and transformation. Little is known of the nature of the replication function performed by an E1B product. Comparison of the transforming capabilities of adenoviral DNA fragments that comprise subsections of E1 [30,46] suggests that, whereas E1A can immortalize semi-permissive rodent cells, E1B product(s) are required for full expression of the phenotype typical of transformed cells. Little is known of the molecular properties of proteins specified by E1A or E1B, although a protein kinase has been reported recently to be present in infected or transformed cell extracts immunoprecipitated by sera that recognize only E1A and E1B proteins [47,48]. This activity does not seem to represent a non-specifically trapped cellular enzyme and phosphorylates both the IgG heavy chain and a 55-65 kDa polypeptide [48], the major antigenic determinant of most adenoviral tumor sera (see Tooze [1]). A similar result has been obtained in analogous experiments with Ad12 [48,49].

IIA-4. Region E1 of other serotypes

It is apparent from the preceding discussion that regions E1A and E1B of the type C adenoviruses have received a great deal of attention, reflecting the participation of their products in transformation. Given the differing tumorigenic potential of the three major groups of adenovirus serotypes (see Tooze [1]), it has also been of considerable interest to compare the E1 regions of the type C (non-tumorigenic) serotypes with those of group A (highly oncogenic) and group B (weakly oncogenic). Mapping of cytoplasmic RNA sequences originally suggested that the overall organization of the genomes of Ad12 or Ad7, members of group A and B, respectively, is very similar to that of the group C serotypes, 2 and 5 [50-52]. Moreover, transformation of rodent cells with defined subgenomic fragments of Ad12 or Ad7 DNA gave results identical to those obtained with Ad2 or Ad5 DNA: fragments comprising the left-hand 4.5-4.7% of the Ad12 or Ad7 genome can transform such cells to display an incompletely transformed phenotype, whereas fragments comprising the region 0-8 units, or larger segments, induce 'complete' transformation [53-57]. More detailed experiments have, however, revealed some interesting differences in the structure and expression of E1 among the different serotypes. Nuclease S1-mapping of Ad12 early mRNA has identified four species complementary to region E1A, generated by the combination of two alternative initiation sites with two alternative 5' splice sites [35,58], that is, the four species comprise two pairs whose corresponding members are identical except at their 5'-termini. Species with 5'-termini at nucleotide 445 from the left-hand end of the genome appear to be the major forms made under most conditions of infection [35] and are analogous to the E1A-a and E1A-b species of group C serotypes. It should be noted that minor forms of the type C adenoviral E1A mRNAs with 5' ends further upstream than those shown in Fig. 1 have also been described [6]. The sequence of the transforming region of Ad12 DNA has been determined [13], as have sequences of cloned cDNA copies of Ad12 E1A mRNA [15], permitting detailed comparisons of the structure of this region among the serotypes [11,13]. The 5'-terminus common to the major Ad12 E1A mRNA species lies at nucleotide 445 and is preceded by a TATA box sequence (see Section VII). 5' splice sites lie at nucleotide 974 and nucleotide 1069 and one 3' splice site is located at nucleotide 1144.
Both major mRNA species are open for translation before and after the splice point to generate proteins that differ only in internal sequences. The major organizational difference between the E1A regions of Ad12 and Ad2 or 5 is the location of the first 5' splice site, shifted 45 nucleotides to the right in the former compared to the latter sequences. Comparison of the Ad5, Ad7 [12] and Ad12 DNA sequences and of the amino acid sequences predicted for the polypeptides encoded by the major mRNA species reveals an overall homology of 31-35% for the DNA sequence, 50-55% homology between any two strains, as well as more highly conserved features. These include: the sequences surrounding the 5'-termini of E1A mRNAs, including the RNA start site itself and TATA box sequences (see Section VII); a region of homology extending over some 200 nucleotides and starting approximately 100 nucleotides beyond the initiating AUG codon, which is more highly conserved than the region lying between the 3' splice site and the polyadenylation site; and three regions of high conservation in the amino acid sequence, indicative of a common ancestor for these viruses. Apart from differences in the primary nucleotide sequence, these comparisons also reveal a deletion within the C-terminal sequence and an insertion and deletion in homologous regions of Ad7 and Ad12 compared to Ad5. The closer relationship of Ad12 and Ad7 both in this respect and in terms of the location of the promoter-proximal 5' splice site suggests that these two serotypes shared an ancestor from which the type C serotypes did not evolve [15]. Less information is currently available about the type A or B E1B mRNA species. Species corresponding to polypeptide IX mRNA and E1B-b of Ad2 or Ad5 have been described by Sawada and Fujinaga [35], but could be detected only once Ad12-infected cells had entered the late phase. The Ad12 species corresponding to type C E1B-a (see Fig. 1) has been observed occasionally in small quantities in productively infected cells, again at intermediate times after infection, but appears to be made in large amounts in Ad12-transformed cells [35,59]. Similarly, no cDNA clones representing copies of Ad12 E1B mRNAs could be detected among some 10,000 produced from early mRNA and screened by Perricaudet et al. [15], a procedure that yielded several E1A cDNA clones.
Any biological significance of this apparent difference in expression of Ad12, compared to Ad2 or 5, E1B sequences remains to be established.

IIA-5. Region E2A

The coding sequences of early region E2A occupy the segment 61.7-66.6 units in the l-strand of Ad2 DNA (see Refs. 1, 5-7). As illustrated in Fig. 2, this mRNA body can be attached to various 5' leader sequences. The major version of E2A mRNA found during the early phase of infection has its 5' end near 75.2 units and includes a short set of sequences from this site. In the mature mRNA, this set of 5' sequences is linked to a second, short set of leader sequences, some 170 nucleotides in total, lying near 68.4 units and finally to the 5' end of the mRNA body at 66.6 units, species E2A-a in Fig. 2. A second type of E2A mRNA molecule, more readily distinguished by the nuclease S1 assay [5], but also seen in the electron microscope [6], is identical to species E2A-a, except that it carries a small splice (some 50 nucleotides) just within the mRNA body near 66.3 units, species E2A-b in Fig. 2. The relative amounts of species E2A-a and E2A-b made during the early phase of productive infection have not been determined accurately. As an infection continues, these forms decrease in concentration and are replaced by a new form of E2A mRNA, E2A-c in Fig. 2, whose 5' end and first leader sequences lie near 72.0 rather than 75.2 units; no other differences between the early and late forms of E2A mRNA have been detected [6].

Fig. 2. Messenger RNA species complementary to early region E2A in the Ad2 genome. The mRNA species complementary to region E2A of the Ad2 genome are represented as in Fig. 1, except that the polypeptide specified by this region is shown above the mRNA species.
[Figure labels: 72k polypeptide; early and late mRNA forms; scale, 61-75 map units.]

Minor variants of E2A mRNA with alternative 5' ends or splice patterns have been observed [6,60], but are extremely rare and probably of little significance to the adenovirus life cycle. Transcription of E2A initiates later during the early phase than that of


other early regions (see Section IV) and accumulation of E2A mRNA continues well into the late phase [6,61].

The nucleotide sequence of EcoRI fragment F of Ad2, which includes the early and late E2 promoter sites, has been determined [62], as have the 5'-terminal sequences of E2 mRNA species made at early and late times after infection [33,63]. The early E2 leader segment is located between 75.1 and 74.8 units and ends at a 5' splice site that is a good approximation to the eukaryotic consensus sequence (see Section III). The 5' end of the late E2 first leader segment lies near 71.5 map units, nucleotide 320 in the sequence in Ref. 62. Approximately 80 nucleotides downstream lies the sequence AAGGTACCG, which is very similar to the eukaryotic consensus 5' splice site shown in Fig. 7. A second related sequence, TAGGTTGCA, is a poorer match to this consensus sequence and, at some 130 nucleotides from the mRNA start site, also appears to be too far away to agree with estimates of a leader length of less than 100 nucleotides [6].

Only one polypeptide, the 72000 dalton DNA-binding protein (DBP), a phosphoprotein [65-68], has ever been assigned to early region 2 [17]. This protein in fact binds both to single-stranded DNA [64] and to the ends of double-stranded DNA [69] and is required for elongation during adenoviral DNA synthesis [70,71]. Although analysis in two-dimensional gel systems reveals considerable heterogeneity in DBP purified from infected cells, indicative of variable post-translational modification, all forms examined bind to DNA with equal efficiency [72]. In addition to its functions in viral DNA synthesis, the 72000 dalton protein plays a role in regulation of adenoviral early gene expression (see Section IV). Mutations that relieve the block to efficient growth of human adenoviruses in monkey cells (see Ref. 1) also lie within the 72000 dalton gene [73], suggesting a role for this protein in some step required for efficient synthesis of normal fibre mRNA, which is severely depressed in adenovirus-infected monkey cells [74,75].

IIA-6. Region E3

In terms of the number of mRNA species complementary to it, E3 is by far the most complex

early region in the adenoviral genome [5-7]. As illustrated in Fig. 3, four major E3 mRNA species and at least four minor ones have been described: species E3-a, d, f and h are the most abundant forms detected by electron microscopic examination of DNA:RNA heteroduplexes [6]. Inspection of Fig. 3 reveals that all E3 mRNA species share 5'-terminal sequences complementary to the region 76.6-77.6 units, which are spliced to sequences near 78.6 units. Beyond this point, the various E3 mRNA species differ in both their splicing patterns and their 3'-terminal sequences: this set of mRNA species is synthesized using one of three poly(A)-addition sites, one possible additional 5' splice site and one of five possible 3' splice sites, two of which appear to correspond to poly(A)-addition sites (see Fig. 3). It would therefore seem that when a more distal poly(A)-addition site is recognized, 5' splice site sequences can be joined

Fig. 3. Messenger RNA species complementary to early region E3 in the Ad2 genome. The mRNA species complementary to region E3 are depicted as described in the legend to Fig. 2 and designated according to the nomenclature of Chow et al. [6]. At the bottom of the figure, the frames open for translation of r-strand sequences are indicated [76] by the horizontal boxes. The frame which encodes the 19 kDa (19k) glycoprotein precursor is shown by open boxes and frames 2 and 3 relative to this frame are shown as crosshatched and stippled, respectively. A vertical line within a box represents the site of a termination codon. A 14.5 kDa polypeptide has also been assigned to E3 (see text), but not yet to a specific mRNA.

to 3' splice sites near more proximal poly(A)-addition sites. The use of three different poly(A)-addition sites in processing of E3 RNA is unique among the families of early mRNA species and in this sense the organization of E3 mRNA species is more like that of the late transcriptional unit described in Subsection VA. It is also apparent from the mRNA structures depicted in Fig. 3 that, although the E3 mRNA species share 5'-terminal sequences, each bears a unique combination of C-terminal sequences, an arrangement that suggests that E3 has the potential to encode multiple, fairly distantly related products.

The quantitative analyses of Chow et al. [6] suggest that no one E3 mRNA species is the major form made under any conditions of infection. Nor could much change in the relative concentrations of the individual E3 mRNA species be induced by perturbation of infections in the presence of drugs, or be observed during the course of an infection [6]. On the other hand, species E3-a was observed as the major form in early RNA preparations from cells infected in the presence of cytosine arabinoside [5,8]. Although the reasons for these differences are not yet known, the detailed and exhaustive quantitative analysis of Chow and her colleagues leaves little doubt that E3 mRNA species are less subject to temporal regulation than those complementary to regions E1 or E2.

The nucleotide sequence of EcoRI fragment D, 75.9-83.4 units, has been determined [76], as has that of the 5'-terminal, capped T1-oligonucleotides of early E3 mRNA selected by hybridization to the fragment 75.9-78.15 units [32]. Comparison of these sequences locates the 5'-termini common to all E3 mRNA species (see Section VII) at nucleotides 237 and 239 from the left-hand end of EcoRI fragment D, that is, at 76.55 map units, in very good agreement with the mapping studies described in preceding paragraphs. The first initiating AUG codon encountered downstream from this start site lies at nucleotide 527, suggesting that E3 mRNAs bear a long 5' untranslated region of at least 290 nucleotides. Some 80 nucleotides beyond this AUG codon, at nucleotide 608, lies the sequence CGGTGAG, the only potential 5' splice site for the first intervening sequence in this region [76]. Thus, the first segment of E3 mRNA species occupies 372 nucleotides, an estimate in good agreement with the electron microscope analyses summarized in Fig. 3. This segment is open for translation from the AUG codon at nucleotide 527 to the putative splice point and could encode 27 amino acids which would be common to all E3 products. As illustrated in Fig. 3, the r-strand of EcoRI fragment D contains blocks of open, translatable sequence in all three reading frames, all relatively short [76]. Insufficient E3 mRNA or polypeptide sequences are available to permit either assignment of many of these stretches of coding sequence to discrete polypeptide products or deduction of the precise location of all splice points.

Polypeptides of 13, 14, 15.5-16 and 19-21 kDa have been assigned to early region E3 [17,23,77,78]. The largest polypeptides, variously described as 19 and 21 kDa or 17.5 and 19 kDa, are glycoproteins [79,80] and are associated with membrane fractions of infected cells [81,82]. Tryptic peptide analysis reveals that the 16 kDa polypeptide is related to the E3 glycoprotein, 19 kDa [83]. When E3 mRNA is translated in vitro in the presence of a microsomal fraction, the 19 kDa glycoprotein is indeed synthesized [83], confirming that the 16 kDa polypeptide is its unglycosylated precursor. The 16 kDa polypeptide is in fact some 1.5 kDa larger than the unglycosylated form of the 19 kDa protein synthesized in vivo in the presence of the inhibitor of glycosylation, tunicamycin [83]. This observation suggests that the 16 kDa polypeptide carries a hydrophobic signal sequence, as might be expected from the membrane association of the E3 glycoprotein in the cell [82,84]. Translation and examination of the structure of size-fractionated E3 mRNA species have shown that the 16 kDa glycoprotein precursor is specified by E3-a mRNA and less frequently by species E3-b and E3-c [83]; these three species possess identical 5'-terminal but distinct C-terminal portions (see Fig. 3).
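The coordinate arithmetic quoted for the first E3 leader segment can be retraced in a few lines of Python. This is only an illustrative sketch: the nucleotide positions are those given above for EcoRI fragment D, whereas the genome length used to convert nucleotides into map units (35,937 bp for Ad2) is an assumption that does not appear in the text, and the function names are hypothetical.

```python
# Coordinates quoted in the text (nucleotides from the left end of EcoRI fragment D).
CAP_SITES = (237, 239)      # 5'-termini common to all E3 mRNA species
FIRST_AUG = 527             # first AUG downstream from the cap site
SPLICE_5P = 608             # position quoted for the candidate 5' splice site CGGTGAG
FRAGMENT_D_LEFT_END = 75.9  # map position of the left end of EcoRI fragment D

# ASSUMPTION (not from the text): length of the Ad2 genome in base pairs.
AD2_GENOME_BP = 35_937
BP_PER_MAP_UNIT = AD2_GENOME_BP / 100.0  # one map unit is 1% of the genome


def to_map_units(nucleotide: int) -> float:
    """Convert a position within EcoRI fragment D into genome map units."""
    return FRAGMENT_D_LEFT_END + nucleotide / BP_PER_MAP_UNIT


def leader_summary(cap: int) -> dict:
    """Lengths implied by the quoted coordinates for a given cap site."""
    return {
        "untranslated_nt": FIRST_AUG - cap,    # cap site to first AUG
        "first_segment_nt": SPLICE_5P - cap,   # cap site to 5' splice site
        "codons_before_splice": (SPLICE_5P - FIRST_AUG) // 3,
    }


if __name__ == "__main__":
    for cap in CAP_SITES:
        print(cap, round(to_map_units(cap), 2), leader_summary(cap))
```

With the cap site at nucleotide 237, the sketch gives 290 untranslated nucleotides, a first segment of 371 nucleotides and 27 codons before the splice point. The small difference from the 372 nucleotides quoted depends on where within CGGTGAG the splice point is placed.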
Determination of the N-terminal sequence of the 16 kDa polypeptide [83] has permitted alignment of its coding sequence with the DNA sequence of Hérissé et al. [76]. It is clear that the 16 kDa polypeptide coding sequence begins at nucleotide 1440 in the latter sequence and continues to nucleotide 1917, to specify a polypeptide of 159 amino acids with a theoretical molecular weight of 18439. Thus, it must be that the AUG present at nucleotide 527 in the 5' leader segment of all E3 mRNAs is not recognized during translation of the E3-a, b or c mRNA species, presenting an unusual example of failure of eukaryotic ribosomes to initiate translation at the first AUG codon encountered beyond the 5' end of an mRNA (see Ref. 36 for a review). It is also apparent that these E3 mRNA species must bear an extremely long 5' leader: although the precise 3' splice point of the first E3 intervening sequence has not been located experimentally, it must lie in the location of nucleotides 1000-1005 (see Hérissé et al. [76]). Thus, both the 372 nucleotides of the leader segment and the segment of some 440 base pairs from the 3' splice point to the 16 kDa polypeptide AUG codon at nucleotide 1440 contribute to the 5' untranslated region of those E3 mRNA species that encode the glycoprotein precursor. Finally, it should be noted that the 16 kDa sequence defined as described above does include a presumptive signal sequence, 18 amino acids in length and rich in hydrophobic residues.

The 14 and 14.5 kDa polypeptides also specified by region E3 are related to one another, but not to the glycoprotein [83]. The 14 kDa polypeptide appears to be translated from species E3-h, but its coding sequences have not been located within the DNA sequence. This polypeptide has also been purified [85], but little is known of its function. Indeed, the whole question of function of the E3 polypeptides remains mysterious: it has been appreciated for some time that region E3 is not essential for growth of Ad2 in tissue culture. Deletion of parts of the E3 sequence in the Ad2-SV40 hybrid viruses, in the most extreme case such that essentially no E3 sequences are expressed in cytoplasmic RNA [86], or by in vitro manipulation [87], has little effect on the growth properties of the resulting viruses (see Tooze [1]).
Whether E3 products perform some role(s) that aid(s), but is not essential to, productive infection in culture or whether they are important in natural infections remains to be established.

IIA-7. Region E4

Early region E4, encompassing sequences from about 91.3-99.0 units in the l-strand, also encodes a large set of mRNA species, whose structures are summarized in Fig. 4. All mRNA species have in common 5'-termini that lie near 99.2 units. In all but one mRNA, species E4-f, a small segment, less than 50 nucleotides [5], from this region is retained to form a 5'-terminal leader segment. All E4 mRNA species also share a 3'-terminus at 91.3 units, but the six species listed in Fig. 4 vary in the number and locations of splice points. The 5' proximal leader segment may be spliced to any one of five 3' splice sites (the sixth mRNA species is colinear with the DNA sequence in this region) and four of the six species bear a second, large, internal deletion (see Fig. 4). The major mRNA species complementary to region E4 detected both by electron microscopy [6,7] and biochemical assays [5] is E4-a. In the absence of any drugs, a reasonable proportion of E4 mRNA is also observed in species E4-e, but the remaining four species appear to be very minor components of the total E4 mRNA population: the concentration of such rare species is somewhat enhanced by the addition of cytosine arabinoside or cycloheximide [6]. Under most conditions of infection, E4 mRNA species are less abundant than those complementary to regions E1, E2A, E1B or E3; consequently, less detailed information about the relative concentrations of individual species is available. The E4 mRNA species comprise the set whose synthesis is most affected by the inclusion of cycloheximide after infection, when their concentration is increased considerably [6,77,88] (see Section IV).

The nucleotide sequences of the right-hand end of Ad2 and Ad5 DNA, including the sites of initiation of transcription of region E4, have been determined [89,90], as have the sequences of 5'-terminal, capped T1 oligonucleotides released from E4 RNA [16,32]. These studies have located 5'-termini of E4 RNA (see Section VII) within the region 325-330 nucleotides from the right-hand end of the adenoviral genome. Potential 5' splice sites lie at nucleotide 379 and nucleotide 485, within SmaI fragment K. Although the latter more closely resembles the eukaryotic splice junction consensus sequence (see Section III), its distance from the RNA start site, some 160 nucleotides, would seem to be greater than suggested by the RNA mapping studies described in previous paragraphs. Sequencing of E4 mRNA, or cDNA copies, will be required to determine which of the two potential splice sites is actually used. This is of more than academic interest, for the sequence from nucleotide 325 to the first potential 5' splice site encountered contains no AUG codons, whereas two occur between this point, nucleotide 379, and the second possible 5' splice site at nucleotide 485.

Polypeptides exhibiting apparent molecular weights of 11000, 17000, 19000, 21000 and 24000 have been assigned to region E4 [17,23,78]. These polypeptides have not yet been assigned to individual mRNA species, nor is anything known of the functions of E4 products.

Fig. 4. Messenger RNA species complementary to early region E4 in the Ad2 genome. The mRNA species are depicted and designated as described in the legend to Fig. 3, except that the polypeptides assigned to E4 (see text), which have not yet been assigned to specific mRNA species, are simply listed at the top of the figure.


IIB. Expression of additional genes during the early phase of infection

The mapping of Ad5 group N early mutations to the region 18-22 units in the adenoviral genome [91,92], well outside any of the classical early regions described in Subsection IIA, has spurred a detailed and more sensitive reexamination of expression of adenoviral genetic information when viral DNA synthesis is precluded in infected cells. Several additional genes that are expressed under such conditions have now been identified.

IIB-1. Region E2B

Careful hybridization analysis of early cytoplasmic RNA with relatively small DNA probes that span the region 11-36 units in the type 2 adenoviral genome revealed the presence of sequences complementary to the l-strand within this region [91]. These mRNA sequences are present at extremely low concentrations, 20-60 copies/cell

Fig. 5. Early mRNA species complementary to the region 10-14 map units in the Ad2 genome. The type 2 adenoviral genome is represented by the horizontal line divided into map units in the centre of the figure. Note that the region between 39 and 61 units is omitted. The mRNA species are represented by horizontal arrows, drawn in the direction of transcription, above or below appropriate regions of the genome. Sequences deleted during splicing are shown here as gaps. Polypeptides that have been assigned to specific mRNAs are given immediately above the mRNA. The limits of sequences encoding polypeptides not yet assigned to an mRNA species are given by the dashed lines. The sources of these data are given in the text.


whether the infection be held in the early phase by cytosine arabinoside, cycloheximide, anisomycin or the early mutation H5ts125 [91,93]. By contrast, sequences complementary to regions E1-E4 are present at 300 to over 1000 copies/cell [62,93]. Electron microscopic examination of heteroduplexes formed between early cytoplasmic RNA and Ad2 DNA sequences covering this region has identified the mRNA species whose structures are depicted in Fig. 5 [94]. The major species comprises a body mapping from 11.3 to 26 units in the l-strand, one upstream leader of about 200 nucleotides encoded near 39 units, a second leader encoded at 68.3-68.5 units and a 5'-terminal leader located near 75.0 units (see Fig. 5). The latter two leader segments are indistinguishable from those found attached to early forms of E2A mRNA (see Fig. 2). This observation suggests that these more recently discovered l-strand sequences are also expressed from the E2 promoter and this region has, therefore, been termed E2B [94]. A second form of mRNA, E2B-b in Fig. 5, is identical to the first except that its body extends from 11.3 to 30.3 units. A rarer form, E2B-c, has a body spanning the region 10.8-23.1 units [94]. The concentration of E2B mRNA sequences, measured both by hybridization and by in vitro translation of E2B mRNA, is little affected by the inclusion of drugs in the culture medium or by the H5ts125 mutation [93].

Translation in vitro of E2B mRNA purified by hybridization selection has revealed that E2B encodes the 87 kDa precursor to the 55 kDa terminal protein [93,94], and possibly additional polypeptides of 105 and approx. 75 kDa. The group N adenoviral mutants fail to initiate viral DNA synthesis under non-permissive conditions of infection [95], implicating one of these proteins in that process. In light of the likely role of the terminal protein in initiation [96,97], it seems more than probable that these mutations lie in the sequence specifying the 87 kDa terminal protein precursor, the polypeptide that must actually function in the cell. This point, however, remains to be proven experimentally. The relationship of the 87 kDa terminal protein precursor to the 105 and approx. 75 kDa polypeptides has not been elucidated, nor are the functions of these polypeptides known. However, partial peptide mapping indicates that the approx. 75 kDa polypeptide, synthesis of whose mRNA is considerably enhanced by anisomycin, is in fact the 72 kDa DBP, the product of E2A [93]. This observation suggests that uncommon or aberrant processing of E2 transcripts, that is, the use of the E2A-specific splice point at 66.6 units in combination with the E2B poly(A)-addition site at 11.3 units (see Fig. 5), can be enhanced in the presence of anisomycin. It may be significant that high concentrations of anisomycin also induce the appearance of a novel E1A mRNA, specifying a 25 kDa polypeptide and apparently not spliced to remove termination codons near 3.2 units [98].

IIB-2. The IVa2 gene

The same saturation hybridization experiments that originally identified the early E2B mRNA sequences also demonstrated that the l-strand between 11.3 and 16 units is expressed during the early phase of infection when the major late genes are not [91]. This region encodes the minor virion protein IVa2 [99,100] and is expressed during the late phase of infection as one spliced mRNA species [101] whose structure is illustrated in Fig. 5. It seems clear from electron microscopic analysis of DNA:RNA duplexes that IVa2 sequences comprise the 3'-terminal segment of the E2B mRNA species described in the previous section [94]. Thus, the IVa2 and E2B mRNAs appear to share a poly(A)-addition site, located near 11.3 units in the l-strand of Ad2 DNA (Fig. 5). On the other hand, IVa2 mRNA prepared during the late phase of infection carries a capped 5'-terminus which has been located near 15.9 units [32]. These observations might be interpreted to suggest that IVa2 sequences present in the cytoplasm prior to the onset of viral DNA replication cannot be translated because they are present only as 3'-terminal segments of E2B mRNA. If this inference be correct, then the early to late transition must include activation of the IVa2 promoter site defined by Baker and Ziff [32] to permit synthesis of the late form of IVa2 mRNA. Alternatively, it might be argued that the 5' end of the IVa2 and 3' end of E2B mRNA lie so close together that these species appear colinear in the electron microscope. This possibility implies an activation of splicing as an infection enters the late phase, for the sequences complementary to the IVa2 gene observed in E2B mRNA by Stillman et al. [94] do not include the splice typical of the late form of IVa2 mRNA (see Fig. 5). Clearly, biochemical analysis of the size and structure of IVa2 RNA sequences made at different times during productive infection should distinguish between these possibilities.

IIB-3. Classical late genes

It is now firmly established that at least one classical late gene (see Section V) is expressed as functional mRNA during the early phase of adenovirus infection. Electron microscopic heteroduplex analysis of early RNA hybridized to Ad2 DNA has identified a species mapping to the r-strand between 30.5 and 39.0 units, the late L1 region ([6,60]; see Figs. 5 and 8). As illustrated in Fig. 5, all such molecules carry the three 5'-terminal segments that comprise the late tripartite leader (see Section V), but the majority also include a fourth leader segment, 22.0-23.2 units, the i leader [6,60]. Hybridization of Ad2-infected cell nuclear and cytoplasmic RNA, pulse-labelled at different times after infection, to restriction endonuclease fragments of Ad2 DNA [102,103] confirms that transcription from the late promoter near 16.45 units (see Sections V and VII) can be detected as early as 1 h after infection. Similarly, cytoplasmic RNA complementary to L1 is made from the beginning of infection, in parallel with classical early mRNA sequences, such as those complementary to E1. The L1 mRNA appears to be present at relatively low concentrations during the early phase of infection, but accumulates as infection progresses [98]. This mRNA species encodes a 52-55 kDa polypeptide [98,104], of unknown function.

IIB-4. Additional genes in the region 17-30 units

Translation in vitro of early RNA sequences selected by hybridization to restriction endonuclease fragments of Ad2 DNA that span the region 17-30 units has identified early mRNA species encoding 13.5 and 13.6 kDa polypeptides [98]. The 13.5 kDa mRNA is selected by a 17.0-21.5 unit fragment and is complementary to the r-strand of the genome [22,98], but its precise structure is not known. The mRNA for the 13.6 kDa polypeptide includes sequences from 21.5-22.0 and 31.5-37.3 units [98]. It has been suggested that the 13.6 kDa polypeptide is in fact the product of translation of those L1 mRNA species described in the previous section that carry the i leader: forms that carry the conventional tripartite leader, which contains no initiating AUG codon (see Subsection VC), would specify the 52-55 kDa polypeptide. This arrangement remains to be tested experimentally. Finally, Lewis and colleagues [22] have described two polypeptides of 16.5 and 17.0 kDa, whose mRNAs are selected from early RNA preparations by DNA fragments lying between 11.6 and 17.0 units in the adenoviral genome. At the present time, no further information about the structure of these mRNA species or functions of their products is available.

The summary of these recently discovered mRNA species presented in Fig. 5 emphasises that the adenoviral genome between 17 and 30 units, previously believed to be rather quiescent in terms of its expression, is fully packed with genes expressed during the early phase of infection. Further work is now required to elucidate the arrangement of these coding sequences and the roles of their products in infected cells.
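The map coordinates quoted in Subsections IIB-1 to IIB-4 can be gathered into a small table and queried for overlap with the 17-30 unit interval. The sketch below is only a bookkeeping aid under stated assumptions: the `Segment` type and function names are hypothetical, the E2B-c body is read as 10.8-23.1 units and the 13.5 kDa selecting fragment as 17.0-21.5 units, and a fragment that selects an mRNA is treated as a stand-in for the unknown limits of that mRNA.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    name: str
    start: float  # left boundary, map units
    end: float    # right boundary, map units


# Coordinates as quoted in Section IIB (mRNA bodies, leaders or selecting fragments).
SEGMENTS: List[Segment] = [
    Segment("E2B-a body", 11.3, 26.0),
    Segment("E2B-b body", 11.3, 30.3),
    Segment("E2B-c body", 10.8, 23.1),
    Segment("IVa2", 11.3, 16.0),
    Segment("i leader", 22.0, 23.2),
    Segment("L1 body", 30.5, 39.0),
    Segment("13.5 kDa mRNA (selecting fragment)", 17.0, 21.5),
    Segment("13.6 kDa mRNA, first segment", 21.5, 22.0),
    Segment("13.6 kDa mRNA, second segment", 31.5, 37.3),
    Segment("16.5/17.0 kDa mRNAs (selecting fragments)", 11.6, 17.0),
]


def overlap(segment: Segment, lo: float, hi: float) -> float:
    """Length, in map units, of the part of a segment lying within [lo, hi]."""
    return max(0.0, min(segment.end, hi) - max(segment.start, lo))


def segments_within(lo: float, hi: float) -> List[str]:
    """Names of all segments that overlap the interval [lo, hi]."""
    return [s.name for s in SEGMENTS if overlap(s, lo, hi) > 0.0]


if __name__ == "__main__":
    for name in segments_within(17.0, 30.0):
        print(name)
```

Run over 17-30 units, the query returns the three E2B bodies, the i leader and the 13.5 and 13.6 kDa entries, whereas IVa2 and the L1 body fall just outside the interval.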

III. Synthesis of adenoviral early mRNA species

As mentioned previously, independent promoter sites for E1A, E1B, E2, E3 and E4 have been identified functionally by ultraviolet transcriptional mapping [105,106], or hybridization analysis of short, nascent, labelled RNA molecules [107-109]. Except in the case of E2, whose 5' end lies near 75 units (see Subsection IIA-5), early promoters are located close to the 5' end of the corresponding mRNA sequences. Determination of the 5'-terminal sequences of adenoviral early mRNAs and of the surrounding DNA sequences [16,32] has permitted exact location of the sites at which transcription of each early pre-mRNA is initiated: the features of these sites are discussed in Section VII.

Analysis of the size of pulse-labelled, early RNA originally revealed the presence of nuclear transcripts, complementary to each of the five classical early regions, larger than their cytoplasmic counterparts [110-112]. Not surprisingly, unspliced RNA molecules complementary to each of these early regions have also been detected in nuclear RNA preparations, as have potential processing intermediates [7,40]. Thus, transcription must initiate at the cap site at the 5' end of each gene and continue across both coding and intervening sequences to some point at, or beyond, its 3' end. The sites of transcription termination in the adenoviral genome have not been well defined: it remains possible that discrete, absolute termination sites do not exist, analogous to the situation observed with papovaviruses (see Tooze [1]). Discrete 3' ends are, however, generated by polyadenylation, an event that must usually precede splicing because unspliced, poly(A)-containing early RNA molecules are detected readily in infected cell nuclei, whereas spliced, non-polyadenylated molecules are not [113-115].

The most detailed studies of processing of adenoviral early RNA yet performed have concentrated on region E2, for the primary product of transcription of this gene is considerably larger than the mature mRNA fashioned from it (see Fig. 2). Pulse-labelled, early, nuclear RNA contains a polyadenylated 28 S species which includes all sequences from the 3' end of E2 mRNA at 63.6 units to its 5' end at 75.2 units [113,114]. Species of nuclear RNA of 23 S and 20 S complementary to E2 are also present in early RNA labelled in vivo. Both species lack sequences of the large E2 intervening sequence 75.1-68.6 units (see Fig. 6). The 23 S species retains sequences of the smaller intervening sequence whereas the 20 S RNA does not and, in fact, is identical to mature E2 mRNA [114]. These observations indicate that the E2 28 S precursor RNA is spliced sequentially in a 5'-3' direction (see Fig. 6). More formal proof of a precursor-product relationship between these RNA species comes from continuous labelling experiments: the accumulation of labelled 28 S and 23 S nuclear E2 RNA ceases after 30 min of labelling, whereas the concentration of labelled E2 20 S RNA increases over a 3 h period. Conversion of prelabelled 28 S RNA to E2 20 S RNA that lacks intervening sequences has also been demonstrated in isolated

Fig. 6. Processing of E1A and E2A RNA sequences. The regions of the Ad2 genome spanning E2A and E1A are represented by the solid horizontal lines divided into map units at the top of each part of the figure. Immediately below is shown the polyadenylated primary transcript drawn in the direction of transcription and with the arrowhead indicating the poly(A) sequence. The spliced species fashioned from each transcript are represented as in Figs. 1-4.

nuclei [113,116]. In one such system, a 1.5-2.0 kDa RNA species complementary to the region 70.7-75.9 units could also be detected [113]. This presumably contains sequences of the large E2 intervening sequence (see Fig. 6) and its existence suggests that removal of this intervening sequence is achieved by one, or very few, splicing steps, rather than many small ones.

Although other putative splicing intermediates have been described (see, for example, Refs. 6 and 7), we know little of the processing pathways by which other adenoviral early mRNA species are matured. As these other regions encode multiple, overlapping mRNA species it is of particular interest to learn what mechanism(s) govern the choice among several potential splice sites and in some instances, for example processing of transcripts complementary to regions E3 or E4, the number of splice sites employed during processing of a given precursor molecule. Not only do these choices determine which members of a set of related, mature, mRNA species are made at any one time, but it is also apparent that the recognition of splice sites within an mRNA precursor can change as infection continues to mediate the temporal regulation of individual members of early mRNA families (see, for example, Subsections IIA-1 and IIA-2). Little is known of these processes at the present time. In the case of E1A transcripts, however, it does seem to be established that the smaller mRNA species are not necessarily generated from the larger ones. This conclusion is based on the observation that mutations at the E1A promoter-distal 5'-splice point which abolish the synthesis of E1A-b mRNA have no effect on the synthesis of E1A-c mRNA [117]. It might be inferred, therefore, that, at least in the case of an E1A polyadenylated transcript, introduction of one splice, chosen among three potential sites (see Fig. 1), is sufficient to exclude recognition of any remaining potential splice junction sequences. Thus, as illustrated in Fig. 6B, synthesis of the three E1A mRNA species would occur independently. Other potential splice sites might be obscured by introduction of sufficient conformational change in the RNA (and its associated proteins) when the first splice is completed to render additional sites inaccessible to the splicing machinery. This notion is certainly consistent with the suggestion that the conformation of an intervening sequence can influence quite profoundly the frequencies with which splicing reactions are performed [118,119].

On the other hand, it must be noted that removal of intervening sequences in eukaryotic mRNA precursors can proceed in discrete steps: during maturation of α-2 collagen mRNA, for example, the first step in removal of one of the many intervening sequences joins the 3' splice site to a site within the intervening sequence that resembles the 5' splice site consensus sequence [120]. This ligation recreates a sequence resembling the 3' splice site consensus sequence which, in the second step, is joined to the 5' splice site at the boundary between mRNA-specific and intervening sequence-specific residues. The experiments of Solnick [117] do not completely rule out such a mechanism for maturation of E1A transcripts, but do show that E1A-c mRNA can be made by a pathway that does not include an intermediate with the structure of E1A-b mRNA. It might also be argued that a stepwise mechanism is no longer appropriate for sets of overlapping viral mRNA

species where the product of each splicing event specifies a polypeptide. Interestingly, this argument might be reversed to suggest that this commonly encountered arrangement of overlapping viral mRNA species (see Flint [157]) evolved from the stepwise splicing of intervening sequences in eukaryotic mRNA precursors. Be that as it may, further work is clearly necessary to permit elaboration of the processing pathways followed by adenoviral early transcripts, particularly those that can be fashioned into several mRNA species.

The biosynthetic machinery that mediates synthesis of adenoviral mRNA species is far from completely understood. Transcription of adenoviral early genes is by cellular RNA polymerase form II (see Tooze [1]). It seems likely that the virus also relies on the cellular splicing machinery during the early phase of infection: splice junction sequences present in adenoviral early transcripts closely resemble the eukaryotic splice junction consensus sequences deduced from examination of some 80-90 eukaryotic mRNA splice junction sequences, as illustrated in Fig. 7 [121-123]. Correct splicing of adenoviral RNA has been observed in isolated nuclei [113,116,124], but soluble systems that perform splicing of adenoviral early RNA precursors have been developed only very recently [125,126]. These provide some hope for the purification and characterization of the apparatus that splices mRNA precursors. One nuclear system has been employed to demonstrate a requirement for small, nuclear (sn) RNP containing U1 RNA, whose 5' end contains a sequence complementary to the consensus sequences ([121]; see Reddy and Busch [127] for a review), in splicing of adenoviral E1A, E1B and E2A mRNA [128], but the nature of this requirement remains to be established.

[Figure 7: table of 5' and 3' splice junction sequences for early (AdC E1A, Ad12 E1A, AdC E1B, Ad2 E2A, Ad2 E3) and late (l1→l2, l2→l3, y→IV, l3→y, l3→II, l3→pVI, l3→52/55K, l3→IV) adenoviral RNA splices, with the eukaryotic consensus sequences in the centre.]

Fig. 7. Splice junction sequences in adenoviral RNA. The sequences across 5' and 3' splice junctions in the adenoviral RNA splices listed in column 1 are shown in columns 2 and 3, respectively, arranged according to the GU...AG rule (see text). Those sequences marked with an asterisk have not been confirmed by sequencing of the appropriate mRNA species. In the centre of the figure are shown the eukaryotic consensus sequences, in which X is any nucleotide, Y is a pyrimidine and the percentages of splice junction sequences analyzed that contain that nucleotide are as follows: 5'-A42 or C41, A59, G61, G100, U100, A56, A68, G86, U62-3' at the 5' splice site and 5'-C74, A100, G100, G47-3' immediately at the 3' splice site. The bottom part of the figure lists similar sequences for splice junctions in late RNA. The sequences joined are indicated in column 1, in which l1, l2 and l3 refer to the first, second and third segments of the tripartite leader and mRNA species are designated by the polypeptides they encode. The sources of these data are as follows: AdC E1A [11,14], Ad12 E1A [15], AdC E1B [29], Ad2 E2A a and b [63], Ad2 E2A c [62], Ad2 E3 [76], l1→l2 and l2→l3 [160,162], y→IV and l3→y [160], l3→II [162], l3→pVI [172], l3→52/55K, Akusjärvi and Persson quoted in Ref. 167.
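The comparison with the consensus sequences can be made concrete with a short sketch. It assumes the layout implied by the Fig. 7 legend: 5' splice-site sequences are written as 9 nucleotides (3 before the splice point, 6 after it) and 3' splice-site sequences as 15 nucleotides with the invariant AG at positions 10-11. The function names and the example sequences chosen (as transcribed from Fig. 7) are illustrative assumptions, not part of the original analysis.

```python
# Consensus at the 5' splice site, from the Fig. 7 legend:
# (A or C) A G | G U A A G U, where GU is invariant.
CONSENSUS_5_PRIME = ("AC", "A", "G", "G", "U", "A", "A", "G", "U")


def has_gu_donor(site: str) -> bool:
    """True if a 9-nt 5' splice-site sequence has GU immediately after the splice point."""
    return len(site) == 9 and site[3:5] == "GU"


def has_ag_acceptor(site: str) -> bool:
    """True if a 15-nt 3' splice-site sequence has AG immediately before the splice point."""
    return len(site) == 15 and site[9:11] == "AG"


def consensus_matches(site: str) -> int:
    """Number of positions (out of 9) at which a 5' site agrees with the consensus."""
    return sum(base in allowed for base, allowed in zip(site, CONSENSUS_5_PRIME))


# Example 5' sites as transcribed from Fig. 7 (assumed to be read correctly).
five_prime_sites = {
    "AdC E1A a": "ACAGUAAGU",
    "AdC E1A b": "AGGGUGAGG",
    "Ad2 E3": "CCGGUGAGU",
}

for name, site in five_prime_sites.items():
    print(name, has_gu_donor(site), consensus_matches(site))
```

Every site that obeys the GU...AG rule passes the first two checks regardless of how well its flanking nucleotides fit; the match count gives a crude measure of that fit.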

IV. Regulation of adenoviral early gene expression

It has become painfully apparent over the last few years that regulation of adenoviral early gene expression is a complicated business: we are still a long way from constructing a coherent picture of early gene expression that integrates all the phenomena described, and, as implied in previous sections, lack an appreciation of the mechanisms whereby the various governing circuits operate. Some progress has, however, been made in describing various regulatory phenomena. It is now well established that transcription of

all adenoviral early genes is not initiated simultaneously; rather, several subclasses of early genes can be defined in terms of the time of their transcriptional activation and the dependence of their expression on viral gene products. Initiation of transcription of individual early genes has been studied by hybridization of nuclear RNA, labelled for relatively short periods of time, to restriction endonuclease fragments of adenoviral DNA [102,129]. Transcription of early regions E1A, E1B, and of sequences from the L1 promoter to somewhere to the right of 50 units can be detected from the beginning of infection, whereas sequences complementary to E2A, E4, E2B and IVa2 appear somewhat later. Each region also displays different kinetics of expression, measured in terms of the maximal rate of transcription: regions E1A, E3 and E4 are transcribed at maximal rates by 3-4 h after infection, whereas maximal transcription of E1B and E2A was not observed by Nevins et al. [129] until 7 h after infection. After reaching a maximal rate of transcription during the early phase, the expression of most classical early regions declines but apparently increases again as an infection enters the late phase, at least until 11 h after infection [102]. However, no transcription of E1A, E1B or E4 sequences could be detected in nuclei isolated 18 h after infection [130], suggesting that transcription of these early genes is inhibited eventually. In considering these facts in relation to the regulation of viral mRNA synthesis, it is important to bear in mind that regulatory phenomena can intervene during post-transcriptional processing such that even though a particular early gene be transcribed, the mature mRNA fashioned from it might not reach the cytoplasm.
Indeed, this seems to be the case for sequences complementary to the IVa2 region, which can be detected in the nucleus from 3 h after infection but not in cytoplasmic, poly(A)-containing RNA until some 6 h after infection [102]: whether IVa2 sequences simply are not processed or are processed yet held in the nucleus remains to be determined. In summary, such labelling experiments indicate that the L1 promoter is active from the beginning of infection and provide evidence for an ordered expression of adenoviral early genes. Further insight into such


temporal regulation has come from studies of the expression of early genes in cells infected either by various mutants of adenovirus 5 or in the presence of inhibitors of protein synthesis. No cytoplasmic RNA complementary to E1B, E2A, E3 or E4 is present in cells infected by host-range or deletion mutants of Ad5 whose lesions lie within the E1A gene [40,42]. In cells infected by deletion mutants such as dl312, deleted from 1.5-4.5 units [41], no E1A mRNA can be made either. Cells infected by group 1 host-range mutants such as H5hr1 do, however, make normal E1A mRNAs [40]. These observations implicate a product of E1A, that affected by the H5hr1 mutation (see Subsection IIA-3), alone or in combination with other E1A products, in the synthesis of stable, cytoplasmic mRNA complementary to the other classical early regions in the adenoviral genome. Region E1A has therefore been defined as a pre-early gene. Nuclease S1 analysis of nuclear RNA complementary to classical early regions made in infected cells detected only very low amounts of nucleus-specific, discrete RNA species [40]. Similarly, hybridization of nuclear RNA made in dl312-infected cells to 32P-labelled adenoviral DNA fragments specific for various early regions showed that early RNA sequences are represented in the nucleus in the steady-state RNA population, but in significantly reduced amounts compared to wild-type infections [42]. These data alone cannot distinguish a role for the E1A products in potentiation of transcription of the affected early regions, in direct stabilization of their nuclear transcripts or in their processing, which in turn achieves stabilization and permits transport to the cytoplasm. However, pulse-labelling experiments have recently established that the E1A product(s) must affect transcription: in H5hr1- or dl312-infected cells, RNA sequences complementary to E2A, E3 or E4 are labelled at only 0.9-7.4% of wild-type levels in a 5 min period [131].
Interestingly, synthesis of E1A-specific RNA at only 32% of the level observed in control infections was observed in H5hr1-infected cells, suggesting that the E1A product might be to some extent autoregulatory. Comparison of the kinetics of synthesis in wild-type and dl312-infected cells has established that RNA sequences

complementary to regions other than E1A can be made in the absence of a functional E1A product(s), although in greatly reduced amounts and at much later times than normal [131]: labelled E4 sequences, for example, attain values of 8 and 35% of the wild-type level by 11 and 23 h after infection, respectively, at 20 plaque-forming units/cell. This effect is magnified when infection is performed at 200 plaque-forming units/cell, although the wild-type level or rate is never attained. Thus, it appears that the E1A product(s) mutated in H5hr1 (see Subsection IIA-1) enhances transcription of other early regions, rather than being absolutely required for their expression. Moreover, the results of additional experiments suggest that the E1A product(s) functions indirectly. The phenotype of the E1A mutants predicts that complete inhibition of protein synthesis during infection should prevent transcription of the other classical early genes. When this experiment has been performed, using the inhibitors emetine or anisomycin added prior to infection [129,131,132], close to normal levels of transcription of regions E1A, E1B and E4 and somewhat reduced transcription of regions E2 and E3 have been observed. To account for the apparent paradox in these observations, it has been proposed that transcription of adenoviral early regions, except E1A, is normally inhibited by a short-lived, cellular factor: when cellular proteins are made normally the E1A product(s) is necessary to antagonize its effects. The E1A product(s) would be dispensable when synthesis of the unstable factor was prevented by inhibition of protein synthesis. The observations that transcription of regions other than E1A occurs in puromycin-treated 293 cells [132] and in cycloheximide-treated cells infected by dl312 [131] can both be explained by this model.
Nevertheless, it seems prudent, in light of the rather drastic treatments to which cells are subjected in such experiments, to reserve final judgement on this model until direct proof, such as identification of the putative cellular inhibitory factor, can be obtained. The experiments discussed so far in this section define at least three classes of early genes: pre-early, including E1A, whose expression is relatively unaffected by the E1A product(s); early,
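The logic of the proposed inhibitor model can be laid out as a small truth table. The sketch below is a deliberately qualitative toy, not a description of any measured mechanism: the function name, the two boolean inputs and the two outcome labels are all assumptions introduced for illustration, and the model ignores the partial reduction of E2 and E3 transcription seen with inhibitors.

```python
def early_region_transcription(e1a_functional: bool, protein_synthesis: bool) -> str:
    """Qualitative outcome for early regions other than E1A under the inhibitor model."""
    # The hypothetical short-lived cellular factor is present only while
    # cellular proteins are being synthesized.
    inhibitor_present = protein_synthesis
    # An E1A product can be made only if the gene is intact and translation is possible.
    e1a_product_present = e1a_functional and protein_synthesis
    # The E1A product antagonizes the inhibitor.
    inhibitor_active = inhibitor_present and not e1a_product_present
    return "reduced and delayed" if inhibitor_active else "close to normal"


conditions = {
    "wild type, untreated": (True, True),
    "E1A mutant, untreated": (False, True),
    "wild type, protein synthesis inhibited": (True, False),
    "E1A mutant, protein synthesis inhibited": (False, False),
}

for label, (e1a, synthesis) in conditions.items():
    print(f"{label}: {early_region_transcription(e1a, synthesis)}")
```

Under these assumptions the only condition in which transcription is depressed is an E1A mutant infection with protein synthesis intact, which is the pattern the model was proposed to explain.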

such as E2A, E3 and E4, whose maximal expression requires the E1A products; and intermediate, whose expression commences only toward the end of the early phase, such as the IX and IVa2 genes. Whether the expression of all early genes but E1A itself depends upon the E1A product(s) is presently a matter of controversy. When protein synthesis in infected cells is inhibited from the beginning of infection by addition of anisomycin, a potent drug that reduces protein labelling to 0.2-0.5% of control levels, the levels of E1B, E2A, E3 and E4 mRNA species, assayed by hybridization-selection and translation in vitro, are reduced [98]. This is the result expected if a functional E1A product whose synthesis is prevented completely by the drug were necessary to achieve maximal expression of the other early regions. However, the levels of the early mRNA species encoding the 13.5 kDa (17.0-21.5 units) and the L1 52-55 kDa polypeptide were not altered significantly in the presence of high concentrations of anisomycin [98]. In direct contrast, Nevins [131] has reported that deletion of E1A in the dl312 mutant affects early L1 transcription as it does the classical early regions, that is, it is considerably depressed and delayed. These contradictory reports have yet to be reconciled. Regulation of adenoviral early gene expression appears to be a more complicated business than merely a temporal cascade of expression of specific sets of genes: a large body of evidence (most of it confusing) points to quantitative regulation of viral gene expression. One of the more clear-cut observations is the increased accumulation of early mRNA sequences in cells infected at a non-permissive temperature by the H5ts125 mutant [133], an observation that implicates the E2 72 kDa DBP in repression of early RNA synthesis (see Carter and Blanton [134,135]). However, the repressive effect of the DBP appears to be exerted by means of more than one mechanism.
Carter and Blanton originally reported that the levels of cytoplasmic RNA sequences complementary to all five classical early regions are increased in H5ts125-infected cells at a non-permissive compared to a permissive temperature, or relative to wild-type infections. These authors also observed an increase in the synthesis of early RNA sequences in nuclei isolated

from H5ts125-infected cells maintained at a non-permissive temperature [136], although measurements of RNA complementary to individual early regions were not made in these experiments. More recently, Nevins and Winkler [137] compared the amounts of RNA complementary to the classical adenoviral early regions that are pulse-labelled in a 5-min period in wild-type and H5ts125-infected cells at 41°C, at different times after infection. Interestingly, the only transcriptional program affected by the H5ts125 mutation is that of the E4 region: during wild-type infections, transcription of this region attains its maximal rate by about 4 h after infection and then declines, as expected if repression of transcription had occurred. In H5ts125-infected cells at 41°C, however, no such decline is observed, implicating the E2 72 kDa DBP in repression of E4 transcription, almost certainly at the level of initiation [137]. The mechanism of this repression has not been investigated, but now that in vitro transcriptional systems have been developed (see Section VII) it should be understood before too long. Although transcription of the E2 region displays a similar pattern to that of E4 sequences in wild-type infections, that is, a considerable decrease in the rate of transcription once the maximum has been reached at about 6 h after infection, this decline is not relieved by the H5ts125 mutation [137]. Nor does the DBP influence the pattern of transcription of region E1A. At the present time, the apparent conflict between these results and those of Carter and Blanton described previously is presumed to reflect some additional role of the DBP in post-transcriptional processing and/or stabilization of transcripts complementary to early genes such as E1A and E2 itself. Independent evidence for stabilization of early RNA sequences has been obtained recently.
The inclusion of cycloheximide in the culture medium 2-3 h after virus adsorption leads to accumulation of cytoplasmic RNA sequences complementary to regions E1A-E4 [17,26,28]. Cycloheximide inhibits protein synthesis to approximately 95% [22] so it is generally assumed that this degree of inhibition in conjunction with relatively late addition of the drug leads to only slight changes consequent upon changes in the synthesis of the E1A product(s)


discussed previously. Measurement of the rates of transcription of these genes in the absence and presence of cycloheximide, by methods like those described in previous paragraphs, has established that inclusion of the drug prolongs transcription of only E2 and E4 sequences [129,138]: the initial rates of transcription of E1A, E1B and E3 are, if anything, depressed slightly in the presence of inhibitors of protein synthesis. These experiments therefore suggest that increased transcriptional activity in the presence of cycloheximide cannot account for the accumulation (in the cytoplasm) of early mRNA sequences complementary to all classical early regions. Thus, the effects of a 95% inhibition of protein synthesis on viral early mRNA accumulation must be mediated in some other way, such as an increased efficiency of processing or stabilization of RNA sequences. Indeed, it has been reported that in the presence of cycloheximide the normally very short half-life of cytoplasmic RNA sequences complementary to E1A and E1B, 15-20 min, is increased some 4-fold compared to that measured in the absence of the drug [138]. A similar stabilization of E4 cytoplasmic RNA sequences was observed, which together with the continued transcription of this region when cycloheximide is present, can account for the larger observed increase in E4 compared to other early sequences [138]. That this stabilization is a consequence of a failure to synthesize the E2-encoded DBP, as suggested by the observations of Carter and Blanton [134,135], remains to be demonstrated. The continued transcription of the E2 gene during the later part of the early phase observed in the presence of cycloheximide [129,138] but not in H5ts125-infected cells maintained at a non-permissive temperature [137] implies that some gene product, other than the E2 DBP itself, normally represses transcription of E2.
This product has not been identified, but it does not seem to be encoded by regions E1A or E2B for mutations that lie in them do not lead to an overproduction of E2 RNA sequences [40,42,139]. Finally, it should be noted that modulation of adenoviral early gene expression at the level of translation has also been invoked to explain decreased synthesis of virus-specific proteins when infected cells are released from a puromycin or

anisomycin block [132]. Thus, the data obtained using inhibitors of protein synthesis have suggested regulation of transcription, stabilization and translation of adenoviral early RNA sequences. In most cases, attempts were made to demonstrate that the drugs employed do not derange cellular metabolism grossly, at least within the time frame of the experiment. Nevertheless, it is difficult to escape the conclusion that this tool lacks the precision necessary for dissection of this subtle system. This, in retrospect, is not too surprising: the available data suggest that several proteins, possibly both viral and cellular in origin, may participate in regulation of adenoviral early gene expression, such that the precise pattern of viral RNA synthesis and accumulation within an infected cell at any one time must depend upon a delicate balance between a number of governing circuits. As disruption of any one mechanism might be expected to produce a complicated result, it is not too surprising that the consequences of a general inhibition of protein synthesis are far-reaching. Clearly, more delicate approaches, such as the continued use of mutants with lesions in specific genes and eventually reconstituted systems, are needed to improve our understanding both of the individual regulatory circuits and the ways in which they interlock.
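The argument that stabilization alone can account for accumulation of E1A and E1B sequences in the presence of cycloheximide can be illustrated with a simple steady-state calculation. The sketch assumes constant synthesis and first-order decay, so that the steady-state level equals the synthesis rate divided by the decay constant ln 2 / half-life. These kinetic assumptions, the arbitrary synthesis rate and the function name are illustrative and are not taken from the studies cited above.

```python
import math


def steady_state_level(synthesis_rate: float, half_life_min: float) -> float:
    """Steady-state RNA level for constant synthesis and first-order decay."""
    decay_constant = math.log(2) / half_life_min  # per minute
    return synthesis_rate / decay_constant


# Half-life of about 15 min without the drug, increased some 4-fold with it;
# the synthesis rate (arbitrary units per minute) is held constant.
untreated = steady_state_level(synthesis_rate=1.0, half_life_min=15.0)
with_cycloheximide = steady_state_level(synthesis_rate=1.0, half_life_min=60.0)

fold_increase = with_cycloheximide / untreated
print(f"fold increase in steady-state level: {fold_increase:.1f}")
```

In this simplified picture a 4-fold longer half-life gives a 4-fold higher steady-state level without any change in the rate of transcription; continued transcription, as reported for E4, would add to this effect.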

V. Organization and expression of adenoviral late genes

VA. The r-strand transcription unit

Once viral DNA synthesis begins, additional viral genes are expressed to produce large quantities of late mRNA and polypeptides. In addition to the genes specifying the virion proteins IVa2 and IX, defined by this criterion as late genes but actually expressed just prior to the onset of viral DNA synthesis (see Subsections IIA-2 and IIB-2), some 17-18 late mRNA species are encoded in the r-strand of the adenoviral genome to the right of position 30 units. All species are included within one, polycistronic transcriptional unit, whose transcription initiates near 16.45 units [140] and continues to close to the right-hand end of the r-strand [141]. Capping of this transcript, the addition of m7G linked via a 5'-5' phosphodiester bond to the 5'-terminal nucleotide, occurs very soon after, or possibly concomitant with, initiation of transcription [140,142,143]. The major late mRNA species, which form five families of 3'-coterminal species as illustrated in Fig. 8 [144-147], are fashioned from primary products of transcription of the r-strand unit by addition of poly(A) at one of five sites that lie near 39, 49.5, 61.5, 78.3 and 91.5 units (ibid) and splicing. Addition of poly(A) appears to occur before transcription of the r-strand unit is complete: RNA sequences that lie immediately to the 5'-side of each of the five poly(A)-addition sites shown in Figs. 8 and 9 are labelled at approximately equal rates in periods of time shorter than that necessary for the transcriptional machinery to traverse the entire r-strand unit. Were cleavage to generate the 3'-termini of mature mRNA species and their subsequent polyadenylation dependent on complete transcription of the unit, RNA sequences adjacent to the promoter-distal, that is right-hand, poly(A) sites would be labelled under these conditions before those adjacent to promoter-proximal poly(A) sites. Thus, it has been concluded that nascent transcripts of the r-strand must be the substrates for the cleavage/polyadenylation reactions [148]. When a promoter-proximal poly(A) addition site is recognized and utilized, presumably a short time after its transcription, the remaining promoter-distal portion of the unit is nevertheless transcribed, for all RNA sequences complementary to the r-strand to the right of position 30 units are represented in close to equimolar amounts in pulse-labelled, late, nuclear RNA [148]. This kind of mechanism of poly(A) addition implies the existence of some regulatory phenomenon that permits each poly(A) site to be recognized only about once in every five times that it is copied by the transcriptional machinery. Whether selection of poly(A) sites is under the control of an active regulatory mechanism [148] is not yet established.

The final step in the maturation of the late mRNA species is splicing as illustrated in Fig. 9. Two types of splicing steps may be defined: first, those that mediate synthesis of the tripartite leader carried by the great majority of late mRNA molecules (this leader comprises three short segments, whose sequences are encoded near 16.45, 19.6 and 26.8 units in the r-strand [149-152]); and second, ligation of the leader to one of several potential 3' splice sites within each polyadenylated intermediate to create a mature, late mRNA species. The sizes of adenoviral, late, polyadenylated intermediates [124,148,153] and the presence of the intervening sequence that lies between the first and second components of the tripartite late leader (see Fig. 9) in nuclear, poly(A)-containing RNA [140] are consistent with the idea that addition of poly(A) precedes splicing. Analysis of the structure of poly(A)-containing and -lacking late RNA species synthesized in isolated nuclei [124,153] (unpublished data) has confirmed that the second kind of splicing listed, leader to mRNA-body ligation, does indeed depend upon prior polyadenylation of the intermediate. However, these experiments also establish that splicing reactions that fashion the tripartite leader can occur independently of polyadenylation.

Fig. 8. Adenoviral late mRNA species. The adenoviral genome is represented by the solid horizontal line divided into 100 map units near the bottom of the figure. The late mRNA species are represented as horizontal lines, drawn in the direction of transcription. Solid lines indicate sequences found in mRNA whereas gaps represent sequences removed during splicing. Arrowheads indicate poly(A) sequences. The sizes of leader segments are exaggerated for clarity. The location of VA genes within the late transcriptional unit is shown by the bar. The sources of these data are given in the text. The most likely assignments of late polypeptides to individual mRNA species, based upon hybridization-selection and hybrid-arrested translation [104] are also shown: virion polypeptides are designated by Roman numerals and non-structural, late polypeptides by their apparent molecular weight.

Fig. 9. Synthesis of adenoviral L3 and L5 late mRNA species. The adenoviral genome and mature late mRNA species, at the top and bottom, respectively, of the figure, are represented as described in the legend to Fig. 8. In step a, transcription is initiated in the direction indicated by the arrow drawn complementary to the 5' end of the late transcriptional unit. Very soon thereafter, the 5'-terminal nucleotide is capped, step b where • represents m7G. Transcription continues (step c) and splicing of some 5' leaders begins in the absence of polyadenylation. Once transcription has passed a site of poly(A) addition, those of the L3 and L5 families in step c, cleavage and polyadenylation can occur (step d) to generate the mature 3'-termini indicated by the solid arrows. The dashed lines shown distal to the poly(A) sites represent transcribed sequences that cannot be fashioned into mature mRNA species. After polyadenylation, the final splicing reactions are performed, step e, to generate mature, late mRNA species. The data upon which this scheme is based are discussed in the text.
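The ordering constraints in the pathway of Fig. 9 can be summarized as a small state model. This is a sketch under stated assumptions: the class and method names are hypothetical, the assignment of the five poly(A) sites to families L1-L5 in left-to-right order follows Fig. 8, and the requirement that the leader be complete before leader-to-body ligation is a simplification rather than an established rule.

```python
from dataclasses import dataclass
from typing import Optional

# Approximate map positions (units) of the five poly(A)-addition sites.
POLY_A_SITES = {"L1": 39.0, "L2": 49.5, "L3": 61.5, "L4": 78.3, "L5": 91.5}


@dataclass
class LateTranscript:
    """Processing state of one transcript of the r-strand late unit."""
    capped: bool = False
    leader_assembled: bool = False
    poly_a_family: Optional[str] = None
    mature: bool = False

    def cap(self) -> None:
        # Capping occurs very soon after initiation.
        self.capped = True

    def assemble_leader(self) -> None:
        # Leader splicing can occur with or without prior polyadenylation.
        self.leader_assembled = True

    def polyadenylate(self, family: str) -> None:
        # Only one poly(A) site is used per transcript.
        if family not in POLY_A_SITES:
            raise ValueError(f"unknown poly(A) family: {family}")
        if self.poly_a_family is not None:
            raise ValueError("transcript is already polyadenylated")
        self.poly_a_family = family

    def join_leader_to_body(self) -> None:
        # Leader-to-body ligation depends upon prior polyadenylation.
        if self.poly_a_family is None:
            raise ValueError("leader-to-body splicing requires prior polyadenylation")
        if not self.leader_assembled:
            raise ValueError("leader has not been assembled")  # simplifying assumption
        self.mature = True


transcript = LateTranscript()
transcript.cap()
transcript.assemble_leader()
transcript.polyadenylate("L3")
transcript.join_leader_to_body()
print(transcript)
```

Because only one poly(A) site and one leader are available per transcript, the model yields at most one mature mRNA per primary transcript.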
Such a mechanism can be rationalized in the sense that it is the choice of poly(A) site and the choice of the 3' splice site to which the mature leader is ligated that determine which adenoviral late mRNA species will be fashioned from a given primary transcript. Each of the five poly(A)-addition sites appears to be used with approximately equal frequency [148], so that it must be concluded that it is the latter choice which is paramount in quantitative regulation of late mRNA species [61]. It should also be apparent

upon inspection of the summary of the pathway of adenoviral late mRNA synthesis presented in Fig. 9 that the observed dependence of leader-to-body joining upon polyadenylation is an essential prerequisite for orderly production of mature mRNA species. Each of the late mRNA species derived from the major r-strand transcriptional unit shown in Fig. 8 contains the 5'-tripartite leader segment, implying that only one mature mRNA molecule can be fashioned from each primary transcript, even though the latter contains the sequences of some 17 mRNAs, whose coding regions do not overlap (see Fig. 8; [100,104,154]). This inference is supported by the fact that only approximately 20% of adenoviral, nuclear RNA sequences made during the late phase of infection enter the cytoplasm [61,148]. In this context, it should be noted that although promoter-proximal regions of the major late transcriptional unit are copied more frequently than the remainder, the result of premature termination of transcription [107,141,148,155,156], there exists no evidence to suggest that such short transcripts provide an additional source of leader sequences. The mode of adenoviral late gene expression depicted in Fig. 9 permits post-transcriptional determination of the nature, and quantity, of late mRNA species. At the transcriptional level, expression of this set of genes, which includes those specifying the major structural proteins of the virion (see Fig. 8), is subject to control at only one promoter site. It must be assumed that the use of but one such control signal during the late phase of infection coupled with a large array of post-transcriptional processing choices is of advantage to the virus. And this advantage must be of considerable worth to offset the apparent profligacy of the mechanism of viral late gene expression shown in Fig. 9: for reasons discussed in previous paragraphs, the majority of sequences comprising each primary transcript cannot be conserved in cytoplasmic mRNA.
It might perhaps be speculated that regulation of expression of individual viral genes by choice of processing sites permits more subtle control, but it is difficult to argue such a case with much force in the absence of an understanding of the mechanisms that mediate,


and regulate, splicing of late RNA sequences. It is, however, worthy of note that a strategy analogous to that shown in Fig. 9 has been adopted by other viruses (see Flint [157]), suggesting that it does indeed offer some inherent advantage.

VB. Other late genes

In addition to the adenoviral late genes included in the r-strand transcriptional unit, the genes encoding the virion polypeptides IX and IVa2 have been regarded as late genes. Polypeptide IX is clearly a virion constituent, although its function is not well understood. However, it is also established firmly that the promoter of the polypeptide IX gene is active prior to the onset of viral DNA synthesis and the polypeptide IX mRNA (species E1B-c) enters the cytoplasm and is translated (see Subsection IIA-2). By these criteria, the gene specifying polypeptide IX can be classified as an early gene and would appear to fall into the intermediate subclass, for it is not expressed until towards the end of the early phase and attains its maximal rate of expression during the late phase (see Subsection IIA-2). The situation with respect to the IVa2 gene, whose product may be involved in virion maturation [158], is not so clear cut: sequences complementary to the IVa2 gene, 15.1-11.2 units in the l-strand, can be detected among early mRNA populations. However, the E2B mRNAs share the l-strand polyadenylation site at 11.3 units with IVa2 mRNA, so that it is not yet established that functional IVa2 mRNA is made during the early phase (see Subsection IIB-2). During the late phase of infection, a discrete mRNA species, comprising a 5'-terminal segment complementary to the region 15.7-16.1 units linked to a body of 15.1-11.2 units is present [101] and specifies the IVa2 protein in vitro [100].

VC. Structure and splicing of adenoviral late leader segments

As discussed in Subsection VA, the majority of adenoviral, late mRNA molecules carry the 5'-terminal tripartite leader illustrated in Fig. 8 (those that do not possess variant leaders, discussed subsequently). The sequences of the leaders present in hexon and fiber mRNAs (see Fig. 8) have been determined using cloned cDNA segments, as have some genomic sequences flanking leader segments [148,159-162]. Such studies have shown that the tripartite leader is on the order of 202 nucleotides in length, probably comprised of segments of 42 (near 16.45 units), 71 (near 19.6 units) and 89 (near 26.5 units) nucleotides: the ambiguity in these length estimates stems from the presence of short, tandemly repeated sequences at the borders of each set of sequences removed during splicing. Such ambiguities also render impossible the location to the nucleotide of the splice points in the various junctions analysed. Nevertheless, all the adenoviral, late RNA splice junction sequences so far determined can be arranged such that they contain the sequence GU at the 5' side and AG at the 3' end of the region deleted during splicing (the term 'intervening sequence' is not employed in this discussion of late RNA for although it is appropriate for the sequences between the three leader segments, it is not for the sequences removed between the 3' end of the third leader segment and the 5' end of a late mRNA coding segment: such segments obviously include coding sequences of other late genes; Fig. 9). The late splice junction sequences arranged in this fashion are included in Fig. 7, to facilitate comparison with early splice junction sequences and the eukaryotic consensus sequence. Such a comparison suggests that although the late splice junction sequences have features in common with the eukaryotic consensus sequences, most notably their 5' splice site sequences, they do not conform quite as well as do other viral splice junction sequences [121,157]. In addition to the tripartite leader, variant leaders that contain additional segments have been observed. Species of late mRNA encoded by promoter-distal regions of the r-strand transcriptional unit, such as fiber mRNA (see Fig.
8), quite frequently carry additional leader segments, termed x, y and z, derived from the regions 76.6-77.3, 78.6-79.1 and 84.7-85.1 units, respectively, in the adenoviral genome [101,163]. Such ancillary leaders may be present singly, in pairs, or all together in individual fiber mRNA molecules, but forms that


have only the y leader are the most frequent: they comprise some 25% of the polysomal population of fiber mRNA at 22 h after infection, all but a tiny fraction of the remaining 75% having the usual tripartite leader [101]. As illustrated in Fig. 7, the splice junctions at either border of the y leader are similar to other late splice junction sequences [160]. The presence of the y leader has no effect whatsoever upon translation of fiber mRNA in vitro [163]. The second commonly observed class of variant leaders contains additional sequences complementary to regions of the adenoviral genome to the left of position 30. One such type includes a fourth leader segment, termed the i leader, 22.0-23.2 units, inserted between the second and third components of the conventional tripartite leader [6,60]. Adenoviral mRNA molecules bearing the i leader are especially prominent at intermediate times, 13-16 h after infection, when they comprise as much as 30% of mRNA species complementary to the r-strand transcriptional unit [6,60]. A rarer variant leader has the normal set of third-segment leader sequences near 26.5 units, but this set of sequences extends further upstream, compared to the standard leader, for various lengths [6,60]. Adenoviral RNA molecules with similar leaders have been observed among populations of nuclear RNA isolated during the late phase of infection, as have other species that appear to be spliced incompletely, for example forms in which the first and second components of the leader have been joined but have not been subjected to further splicing [124,142,153,164]. The observed structures of such presumptive processing intermediates suggest that splicing, at least those reactions necessary to fashion the mature 5'-tripartite leader, proceeds with 5'→3' polarity: molecules that have complete, promoter-distal splices but are immature at their 5' ends have not been detected.
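The composition of the standard and variant leaders described above can be tabulated in a few lines. The sketch assumes that leader segments are always joined in the order in which they lie on the r-strand, an inference from the arrangements described rather than a stated rule. The segment names l1, l2 and l3 follow the Fig. 7 legend, the map positions are the approximate left-hand coordinates given in the text, and the helper name is hypothetical.

```python
# Approximate map positions (units) of leader segments in the r-strand.
LEADER_MAP_POSITION = {
    "l1": 16.45,
    "l2": 19.6,
    "i": 22.0,
    "l3": 26.5,
    "x": 76.6,
    "y": 78.6,
    "z": 84.7,
}

# Probable lengths (nucleotides) of the three standard segments.
TRIPARTITE_LENGTHS = {"l1": 42, "l2": 71, "l3": 89}


def leader_order(segments):
    """Arrange leader segments 5' to 3', assuming genomic order is preserved."""
    return sorted(segments, key=LEADER_MAP_POSITION.__getitem__)


standard = leader_order(["l1", "l2", "l3"])
with_i_leader = leader_order(["l1", "l2", "l3", "i"])
fiber_with_y = leader_order(["l1", "l2", "l3", "y"])

tripartite_length = sum(TRIPARTITE_LENGTHS[name] for name in standard)

print(standard, tripartite_length)
print(with_i_leader)
print(fiber_with_y)
```

The three probable segment lengths sum to the 202 nucleotides quoted for the tripartite leader; lengths for the i, x, y and z segments are not given in the text and are therefore left out.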
It also seems that splicing of the leader sequences takes place in small, discrete steps, for many nucleus-specific, uniquely spliced molecules whose splice sites do not correspond to any that remain in mature mRNAs, as well as forms like those described in the previous paragraph, are observed in both steady-state and newly synthesized late RNA preparations [124,164]. Removal of an intervening sequence in a stepwise fashion, using intervening sequence-specific splice junctions, has also been reported for an α-2 collagen mRNA precursor [120], but in this case a 3' → 5' polarity is followed. It is of interest that splicing of other cellular gene products, such as vitellogenin [165] and ovomucoid [166], does not follow an absolute pathway, in terms of the order in which individual intervening sequences are excised, but rather several preferred pathways. The various aberrant, spliced, adenoviral late mRNA molecules discussed in previous paragraphs and elsewhere [60] suggest that leader-to-body splicing too can follow alternative pathways to generate the same final product and also proceeds via a defined series of 'pauses'.

At the present time nothing is known of the nature of the molecular machinery that mediates splicing of adenoviral late RNA molecules, which is perhaps not surprising in view of the general ignorance of mRNA splicing mechanisms. Similarly, any function the leader segment might provide remains obscure: the standard tripartite leader contains no initiating AUG codon, but does include, near its 5' end, sequences that could base-pair with the 3' end of 18 S rRNA [140,160,162].

VI. The early to late transition

In terms of adenoviral gene expression the characteristic change that defines the late phase of infection is the onset of synthesis of very large amounts of the mRNA species that encode the major structural proteins of the virion, that is, efficient and complete expression of the r-strand, late transcriptional unit. Until quite recently, this change in adenoviral gene expression was believed to reflect activation of the late promoter at 16.45 units, a consequence of viral DNA replication. It is now apparent that this promoter is, in fact, utilized from the very beginning of the infectious cycle and that synthesis of the classical r-strand, late mRNA species in part occurs because of altered transcription of the r-strand, late unit during the late phase of infection and is mediated in part by altered processing events.

As discussed in detail in Subsection IIB-3, the largest L1 mRNA species, with coding sequences complementary to the region 30.5-39.0 units, is made from the earliest times after infection. The L1 mRNA made during the early phase carries the three segments that comprise the 5'-terminal tripartite leader, but the majority of such molecules, 76-80%, also include the i leader described in Subsection VC [60]. These L1 52-55 kDa mRNA sequences can be detected in cytoplasmic poly(A)-containing RNA fractions from 1 h after infection and increase in concentration in parallel with those of E1A and E1B mRNA sequences. By contrast, cytoplasmic RNA sequences complementary to the L2 or L3 family (see Fig. 8) do not appear until some 5 h later [102]. At early times after infection in the absence of any drugs, or in the presence of cycloheximide or anisomycin to inhibit viral DNA synthesis, 30 min pulse-labelled nuclear RNA contains sequences complementary to the r-strand from the late promoter site through the 3' end of the L1 family near 39 units to somewhere between 50 and 60 units [102,103,167]. Such RNA carries the 5'-terminal capped sequence definitive of initiation at the late promoter site [102]. It is therefore clear that before the onset of viral DNA replication, transcription of the r-strand unit, initiated near 16.45 units, terminates at some point before the genomic sequences of the L4 and L5 regions and that a major consequence of the early to late transition must be the equimolar transcription of the complete r-strand unit. Lower amounts of RNA sequences complementary to L2 or L3 than to L1 are made in a 30 min period of continuous labelling at early times after infection [102], although all three regions appear to be transcribed at equal rates during a 5-min label [103]. This difference indicates that sequences to the right of the 3' end of the L1 region are not stabilized after their transcription, in contrast to the L1 sequences themselves.
Analysis of the labelling of L1, L2 and L3 sequences immediately adjacent to poly(A) in nuclear and cytoplasmic RNA suggests that the L1 poly(A) site near 39.0 units is recognized preferentially during the early phase of infection. Moreover, even when L2 and L3 nuclear RNA sequences are, less frequently, polyadenylated, they do not appear to be transported to the cytoplasm with anything like the efficiency devoted to L1 poly(A)-containing RNAs [103]. Thus, the early to late transition must also involve expansion of the number of poly(A) sites within the r-strand transcript that are recognized frequently and presumably some other alteration(s) that permits efficient exit of all late mRNA species.

Finally, the early to late transition is accompanied by altered splicing of L1 transcripts: during early and intermediate phases of infection, most L1 transcripts are spliced to produce the largest message of the L1 family, 30.5-39.0 units [60,103,167]. During the late phase, however, at least two additional L1 mRNA species, whose 3' splice sites (to which the tripartite leader is ligated) lie further downstream than that of the 52/55 kDa mRNA (see Fig. 8), are made. This provides a clear example of temporal regulation of adenoviral gene expression by choice of splice sites. Other examples of this phenomenon have been alluded to in previous sections, namely the changing patterns of accumulation of certain early mRNA species with time (see Subsections IIA-1 and IIA-2). A particularly striking example is provided by E1A mRNAs: the smallest mRNA, species E1A-c, does not begin to accumulate until later times than species E1A-a or E1A-b and continues to be made well into the late phase. In this case, it is the choice of 5' splice site that determines which species will be made from a given primary transcript (see Fig. 1). It will be of considerable interest to learn how such modulations of splicing are mediated, but their existence would seem to provide a strong hint that adenovirus infection does modify the splicing machinery of its host cell.

In addition to the changes in both the extent of transcription of the late r-strand unit and post-transcriptional processing of sequences complementary to this unit documented in previous paragraphs, the late phase is characterized by the synthesis of increasing, and ultimately massive, quantities of classical late RNA sequences.
Whether such accelerated transcription of the r-strand late unit reflects potentiation of the late promoter site at 16.45 units or recruitment of newly synthesized viral DNA molecules into the transcriptional machinery is not yet established. Of course, both mechanisms might pertain. The same can be said of the increased synthesis of polypeptide IX mRNA from its promoter site near 9.8 units (see Fig. 1), also characteristic of the late phase of infection.

It is clear that the majority of changes in adenoviral gene expression that occur during the course of an infection are not mediated simply by the activation of promoter sites. Nevertheless, at least one such activation does occur. During the early phase of infection, region E2A (and presumably E2B) is transcribed from a promoter site near 75.2 units in the l-strand (see Fig. 2). As infection continues, however, the proportion of cytoplasmic mRNA bearing leader sequences from this region declines, as a new form of mRNA with 5'-terminal sequences encoded near 72.0 units accumulates [6]. Species of E2A mRNA made at early or late times after infection carry different 5'-terminal, capped oligonucleotides [32], strongly suggesting that their synthesis is initiated at different sites in the genome. Thus, it seems very likely that a new promoter near 72.0 units for the E2A, and presumably E2B, gene is activated as the late phase commences. Whether this switch is related to the continued expression of these genes during intermediate to late times after infection [6,61], to form 18-30% of the total viral mRNA species at 13-16 h after infection [6], is not known. Nor is it clear whether the switch to the E2 late promoter site and activation of the polypeptide IX promoter site occur simultaneously. The switch in use of E2 promoter sites is inhibited by cycloheximide [6], implying a role, direct or indirect, for a viral protein not yet identified. Interestingly, the appearance of the discrete IVa2 mRNA species described in Subsection VB also requires viral protein synthesis, that is, it is inhibited by cycloheximide, and does not occur under normal conditions until intermediate to late times after infection [60]. It is therefore possible that activation of the IVa2 promoter site (see Subsections IIB-2 and VB) takes place as the cycle enters the late phase.
No discussion of the early to late transition can be complete without a consideration of the role of viral DNA replication. It has, of course, been recognized for many years that inhibition of viral DNA synthesis, by drugs or early mutations, precludes initiation of the late phase, typified by the synthesis of mRNA species encoding the major structural proteins of the virus. Our understanding of the role of replication has advanced slowly over the years and remains far from complete. The most obvious possibility is that replication of adenoviral DNA is necessary to generate a suitable template for complete transcription of late genes. It is well-established, however, that replicating DNA itself is not an obligate, or the usual, template from which late RNA sequences are transcribed. When cells infected by H5ts125 that have been permitted to enter the late phase at a permissive temperature are shifted to a non-permissive temperature, late transcription continues for extended periods, even though viral DNA synthesis ceases rapidly [168,169]. Under these conditions, cycles of replication initiated at the permissive temperature are completed [71], such that replicative intermediates do not persist. This observation therefore indicates that actively replicating adenoviral DNA molecules are not a prerequisite of transcription of late RNA sequences. Nor is the 72 kDa DBP itself. In line with this conclusion is the observation that nucleoprotein complexes that contain adenoviral DNA and are active in either replication or transcription in vitro are physically separable [170]. Thus, it seems clear that most transcription of adenoviral late genes does not take place on molecules that are also replicating, although forms that are undergoing both replication and transcription are observed occasionally in the electron microscope (A. Bayer, personal communication). An elegant series of coinfection experiments has confirmed this conclusion and revealed that production of classical late mRNA sequences involves non-diffusible, cis-acting elements.
Thus, when cells infected by one adenovirus serotype for sufficient periods of time to reach the late phase of infection are superinfected by a second serotype, some of whose late polypeptides can be distinguished from those of the first virus in SDS-polyacrylamide gels, proteins characteristic of the second, superinfecting virus are made only when synthesis of the DNA of that superinfecting virus is permitted, despite the fact that replication of the first virus and synthesis of its polypeptides are well under way [169]. These results clearly demonstrate that adenoviral genomes that have not undergone replication cannot act as templates from which late genes are expressed and that diffusible trans-acting factors (supplied by the original infecting virus), while probably necessary, are not sufficient to induce the early to late transition.

Thus, the most plausible hypothesis at the present time must be that the structure of the ingoing viral DNA template restricts transcription of the r-strand, late transcriptional unit to sequences to the left of approximately 60 units: replication would then be imagined to induce a conformational change in the template, perhaps the consequence of association of a new set of polypeptides with the viral DNA, to permit complete transcription of this unit to the right-hand end of the genome. Such a conformational change might also render the E2 late promoter near 72.0 units, the IVa2 promoter near 16.1 units and the polypeptide IX promoter near 9.8 units more accessible to the transcriptional machinery. Our current understanding of the nucleoprotein structure and organization of adenoviral DNA within virions or within infected cells is very limited, but this hypothesis should be readily testable using the elegant methods recently developed for analysis of the structure of specific chromosomal genes in eukaryotic cells [171,172]. This hypothesis can account satisfactorily for the changes in transcriptional activity that characterize the early to late transition (although roles for virus-specific factors that mediate some of the potentiation of promoter activity mentioned previously cannot be precluded, as described in Section VII), but it does not address the observed modulation of post-transcriptional processing. It is difficult to believe that changes in template structure, and possibly the transcriptional machinery, can influence such steps as splicing, which are presumed therefore to be subject to separate regulatory mechanisms.

VII. Structure and function of adenoviral 'promoter' sites

Adenoviral sequences surrounding the sites at which transcription of RNA initiates, defined as promoter sites for the purposes of this discussion, were identified originally by location of the 5'-terminal, capped T1-oligonucleotides of viral early

Gene        Location   Sequence
Ad2 E1A       1.4      GT~TATTTATACCCGGTGAGTTCCTCAAGAGGCCACTCT
Ad2 E1B       4.7      GGGTATATAATGCGCCGTGGGCTAATCTTGGTTACATCT
Ad2 E3       76.6      GGGTATAACTCACCTGAAAATCAGAGGGCGAGGTATTCA
Ad2 E4       99.1      TCCTATATATACTCGCTCTGTACTTGGCCCTTTTTACAC
Ad5 E4       99.1      TCCTATATATACTCGCTCTGCACTTGGCCCTTTTTTACA
Ad2 Late     16.4      GGCTATAAAAGGGGGTGGGGGCGCGTTCGTCCTCACTCT
Ad2 IX        9.8      GAATATATAAGGTGGGGGTCTCATGTAGTTTT~TATCTG
Ad2 E2L      72        AGGTACAAATTTGCGAAGGTAAGCCGACGTCCACAGCCC
Ad2 E2E      75        TAGTCCTTAAGAGYCAGCGCGCAGTATTTGCTGAAGAGA
Ad2 IVa2     15.9      TCCTTCGTGCTGGCCTGGACGCGAGCCTTCGTCTCAGAG

G-GTATAAA A G G TT ~-34 to -26

Fig. 10. Adenoviral DNA sequences preceding sites of initiation of transcription. The sequences preceding the sites at which transcription of the adenoviral genes listed in column 1 initiates are aligned to facilitate comparison of the TATA box sequences, shown in larger face type. The nucleotides at which RNA synthesis begins are underlined. These data are taken from Baker and Ziff [32]. Below the viral sequences is shown the canonical TATA box sequence [177].
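The comparison invited by Fig. 10 can be sketched in a few lines of Python. The example below is an illustrative sketch only, not a method used in the work reviewed here: it assumes the hexamer TATAAA (the element named in the text for the Ad2 late promoter) as a stand-in consensus, and scores each promoter by its best ungapped match to that hexamer. The names `CONSENSUS`, `PROMOTERS` and `best_match` are hypothetical, and only four sequences from Fig. 10 that survived extraction without ambiguous characters are included.

```python
# Assumed consensus hexamer, used here purely for illustration.
CONSENSUS = "TATAAA"

# Sequences copied from Fig. 10 (entries without ambiguous characters only).
PROMOTERS = {
    "Ad2 E1B":  "GGGTATATAATGCGCCGTGGGCTAATCTTGGTTACATCT",
    "Ad2 Late": "GGCTATAAAAGGGGGTGGGGGCGCGTTCGTCCTCACTCT",
    "Ad2 E2L":  "AGGTACAAATTTGCGAAGGTAAGCCGACGTCCACAGCCC",
    "Ad2 IVa2": "TCCTTCGTGCTGGCCTGGACGCGAGCCTTCGTCTCAGAG",
}


def best_match(seq, motif=CONSENSUS):
    """Return (score, offset, window) for the best ungapped match of motif in seq.

    score counts identical positions; offset is the 0-based index in seq of
    the first (leftmost) window that reaches the best score.
    """
    best = (-1, -1, "")
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        score = sum(a == b for a, b in zip(window, motif))
        if score > best[0]:  # strict comparison keeps the leftmost best window
            best = (score, i, window)
    return best


if __name__ == "__main__":
    for name, seq in PROMOTERS.items():
        score, offset, window = best_match(seq)
        print(f"{name:9s} best window {window} at offset {offset}: "
              f"{score}/{len(CONSENSUS)} matches")
```

Under these assumptions the Ad2 late sequence contains an exact match, the E1B and late E2 sequences match at five of six positions, and the IVa2 sequence matches at no more than three, in line with the absence of a recognizable TATA box before the IVa2 start sites. A simple identity count of this kind ignores position relative to the cap site, so it should be read as a rough illustration rather than a promoter predictor.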

and late RNA species in the genomic sequence [16,32,63]. This definition assumes that the 5'-terminal nucleotide that is capped is indeed the first residue transcribed and is not liberated by post-transcriptional cleavage, an assumption that has been demonstrated to be correct by analysis of the 5'-terminal sequences of RNA chains synthesized de novo in in vitro transcriptional systems [142,173-176]. A summary of the adenoviral promoter site sequences is given in Fig. 10 (Baker and Ziff [32]), in which the sequences are aligned to facilitate comparison of sequences a short distance upstream from the RNA start site(s). The most obvious feature of these sequences is the presence in all but two promoter regions, those of the early start sites of the E2 and IVa2 genes, of a sequence based on the canonical form, 5'TATAVAAVA3', the so-called TATA or Goldberg-Hogness box. In the case of the adenoviral sequences shown in Fig. 10, the TATA box sequences range from 5 to 9 nucleotides in length, occur some 25-30 nucleotides upstream from the mRNA start site(s) and all, except that preceding the late E2 start sites, include the sequence TAT. Thus, these adenoviral sequences are very similar to those found in equivalent locations in a variety of cellular genes (see Refs. 32 and 123 for reviews).

The rather general occurrence of obviously similar sequences a fixed distance before mRNA start sites suggests they play a crucial role in promoter site recognition, analogous to the Pribnow box of prokaryotic promoter sites [177]. This hypothesis has been tested in several systems by analysis of the effects of deletion of, or alterations within, sequences flanking RNA cap sites upon expression of the mutated gene in vivo or in in vitro systems that possess the ability to initiate RNA synthesis correctly [173,178]. Deletion, for example, of sequences upstream of position -47 (when the first nucleotide transcribed is +1) or downstream of position -12 from the region preceding the Ad2 late promoter site at 16.45 units does not abolish its transcription in vitro. By contrast, deletions that extend into, or beyond, the TATAAA sequence at -31 to -25 reduce transcription to less than 10% of the normal rate [179,180]. Similar results have been obtained from deletion mapping experiments using transcription in vitro as a functional assay for promoter site sequences for conalbumin [179,181], ovalbumin [182] and SV40 early [183] genes. Indeed, even a T → G transversion within the TATA box sequence drastically reduces the efficiency of in vitro transcription [184,185]. These experiments emphasize the importance of the TATA box in efficient initiation of transcription in vitro and implicate some 60 nucleotides, from about -51 to +5 [180], as contributing to promoter site recognition and/or utilization. It is noteworthy that the centre of the TATA box lies some three turns of the DNA helix upstream from the RNA start site, such that both sites could be recognized by a protein that viewed only one side of the DNA molecule.

Although the interpretation of these experiments appears obvious, it is now only too apparent that the picture of eukaryotic promoter sites must include more elements than simply the TATA box sequence. Furthermore, the actual function of the TATA box within the cell is more subtle than suggested by analysis of transcription in vitro. It is immediately obvious upon inspection of Fig.
10 that not all adenoviral promoter regions include a TATA box sequence, immediately raising the question of the importance of such sequences in vivo. Indeed, examination of expression in vivo of genes from which TATA box sequences have been deleted reveals that, in general, the efficiency of initiation of transcription is not affected: rather the location of RNA start sites is changed, such that cap site specificity is replaced by RNA start sites spread over an area of as much as 40 nucleotides [186-190]. Thus, the TATA box appears to be the determinant of selection of the site of initiation of transcription, independent of the actual sequence of the start site. Such in vivo experiments have identified additional sets of 'promoter-site' sequences, including those well upstream from the cap site beyond -85 nucleotides, whose deletion decreases transcription of the sea-urchin H2A gene 15-20-fold [187], and a sequence of 72 nucleotides that lies more than 100 nucleotides upstream from the SV40 early RNA start site and which is essential for expression of this gene in the cell [191,192]. Such controlling elements that influence the rate of transcription have been termed modulators by Grosschedl and Birnstiel [187], in contrast to sequences surrounding and including the TATA box that position the 5' ends of the RNA transcribed, termed selectors.

A second striking feature of the sequence information given in Fig. 10 is the microheterogeneity of the sites at which adenoviral RNA synthesis may be initiated, a characteristic that is not restricted to adenoviral transcripts (see Ref. 32). The majority of adenoviral transcripts can, as illustrated in Fig. 10, initiate at more than one nucleotide in the genome: in the most extreme case, region E4 of Ad2 or Ad5, such microheterogeneity extends over 6 or 7 adjacent nucleotides, respectively [32]. All RNA termini, except those of E4 and IVa2 RNA, are uniquely purine residues and, overall, purines are strongly preferred over pyrimidine nucleotides. Thus, when a cap site includes the sequence PuPyPu, the two purines, but not the pyrimidine residue, may serve as RNA start sites, for example the 5'-termini of E1B, IX, the late form of E2A and E3 RNA shown in Fig. 10.
Capped pyrimidine 5'-termini are present among E3 and IVa2 RNA species, but at a much lower molarity [32,193]. The sequences of the 5'-termini of adenoviral RNA species presented in Fig. 10 suggest that the preference for initiation of transcription and formation of a capped 5'-terminal nucleotide is A > G >> U >> C. With the exception of the E4 gene, the region in which synthesis of adenoviral RNA species can start is limited to 1-3 nucleotides lying 25-30 nucleotides from the TATA box (see Fig. 10). Thus, it would seem that RNA polymerase form II molecules are positioned tightly in an initiation complex, in most cases presumably the result of recognition of the TATA box sequence. Whether the cap site itself also constitutes an important recognition element, and thus an essential component of a eukaryotic promoter site, is not clear. Inspection of Fig. 10 reveals little obvious homology among adenoviral cap sites. Moreover, when a normal RNA start site is deleted, but the corresponding TATA box is not, the sequences that become positioned approximately 30 nucleotides downstream from the TATA box as the result of the deletion serve to specify capped RNA termini [186,190,192].

Of the adenoviral 'promoter sites' listed in Fig. 10, all, except those of the IVa2 and late E2 genes, are active in in vitro systems, although the activity of the early E2 site is very low [173,175-177,181]. Thus activity in vitro correlates perfectly with the presence of a sequence that is a good match to the TATA-box sequence, as expected on the basis of findings discussed previously. As those promoters that function poorly, if at all, in vitro are clearly rather active within the cell, at least from the end of the early phase of infection (see Subsections IIA-5 and IV), it must be concluded that the in vitro systems as currently constituted are incomplete. Their dependence upon TATA box sequences for efficient initiation of transcription, in marked contrast to the state of affairs within the cell, also emphasizes their deviance.
In this context, it may be of significance that the relative efficiencies with which adenoviral 'promoter sites' are recognized in an in vitro system can be influenced by changes in the template DNA concentration or, more interestingly, by the use of extracts prepared from infected cells harvested during the late phase of infection compared to uninfected cells: under such conditions, transcription from the major late and polypeptide IX promoter sites is enhanced some 10-fold relative to expression from an early promoter [176]. It is also clear that in vitro transcriptional systems can respond to factors known to regulate transcription in the cell, most notably SV40 large T antigen [194]. It is, therefore, not unreasonable to hope for the purification of factors that appear to permit discrimination between adenoviral early promoter sites and those maximally active at later times after infection, based on their ability to induce in uninfected cell extracts the characteristics of extracts from Ad2-infected cells.

It may be as well to bear in mind when considering observations made in vitro that the template supplied has always been naked DNA, often cloned fragments of the adenoviral genome, structures that bear little resemblance to natural templates. Although reports of some kind of chromatin-like organization of intracellular adenoviral DNA sequences have appeared [195-197], our understanding of the details of such structures and their functional significance is minimal. Nor have templates other than naked adenoviral DNA yet been included in in vitro transcriptional systems. It is, therefore, clear that much further work will be required to elucidate the parameters that control viral promoter site activity during adenoviral productive infection.

VIII. The VA-RNA genes

In addition to the mRNA species discussed in previous sections, the adenoviral genome encodes at least two small RNA species, termed virus-associated, or VA-RNA. The VA-RNA species present in largest amounts in infected cells, termed VA-RNAI, has been recognized for many years (see Tooze [1] for a review of early literature) and characterized in detail. Type 2 adenoviral VA-RNAI is 157-160 nucleotides in length and has been sequenced [198-202]. The sequence admits extensive base pairing and, therefore, secondary structures, most notable of which is an extended 'cloverleaf'-type configuration. Variants of VA-RNAI bearing short 5'- or 3'-terminal extensions have been described [200,203,204]. A single gene, located near 29.0 units in the r-strand in the Ad2 genome, encodes VA-RNAI [202,205,206]. Close to the right of this gene, separated by a spacer of some 98 nucleotides (spacer heterogeneity arises from the heterogeneity of the 3'-termini of VA-RNAI), lies a gene for a second form of VA-RNA, VA-RNAII [202,204,205,207].

The sequences of VA-RNAI and VA-RNAII are unique, but include scattered regions of homology [202,204]. Moreover, the two VA-RNA species can, at least in principle, adopt secondary conformations that are quite similar in structure and stability, although a more stable configuration than the common one is permitted to VA-RNAII [202]. These observations suggest that the modern VA-RNA genes, which are quite divergent in their primary nucleotide sequence, arose by duplication of an ancestral VA-RNA gene.

Both VA-RNA genes are transcribed in a rightward direction and, although their sequences are, as illustrated in Fig. 8, included in the major late transcriptional unit, these RNAs are synthesized by form III RNA polymerase: transcription of VA-RNAs in isolated nuclei is inhibited only by concentrations of α-amanitin sufficiently high to inhibit this enzyme [207-210]. The most common forms of both VA-RNA species initiate with G residues, although minor forms that carry 5'-terminal A have been described [200,211]. VA-RNAI and VA-RNAII are transcribed from independent promoter sites, for both forms of VA-RNA bearing triphosphorylated 5'-termini, that is 5'pppG..., are synthesized in vivo and in vitro [203,207,211,212].

The accurate transcription of adenoviral VA-RNA genes in soluble systems [213,214] has permitted the identification of sequences controlling their expression, by methods like those outlined in Section VII [214-216]. Such experiments have defined two elements controlling initiation of transcription in the VA-RNAI gene. The first is an intragenic region, extending from +9 to +72, relative to the site at which transcription of the major form of VA-RNAI begins, nucleotide 1. Deletion within this region abolishes transcription of VA-RNAI in vitro. A similar intragenic control region is present in the VA-RNAII gene [215], but its boundaries have not been defined. In this respect the adenoviral VA-RNA genes resemble cellular genes transcribed by form III RNA polymerase, such as 5 S and tRNA genes [217-221]. However, the VA intragenic control elements bear little resemblance to those of Xenopus 5 S RNA genes in either location within the coding region or primary nucleotide sequence. On the other hand, striking homologies do exist among the sequences at the 5' and 3' ends of the VA-RNAI gene control region and various segments of eukaryotic tRNA genes [215], including sequences that appear to comprise analogous, intragenic control regions of tRNA gene expression [220-223]. The similarity between adenoviral VA and eukaryotic tRNA genes extends to the factors required for their transcription in vitro, leading to the suggestion that adenoviral VA-RNA genes have evolved from a host tRNA gene acquired by the virus [215].

The location of the second region controlling VA-RNA transcription in vitro, within the 5'-flanking sequences, is less surprising [214-216]. Alterations within the region from -33 to the RNA start site affect both the choice of start site and the rate of transcription [215]. It should be noted that although both the intragenic and 5'-flanking control sequences of the VA-RNAI and VA-RNAII genes share many homologies [202,215], the differences between them must be of functional significance, for the VA-RNAII gene is expressed at a lower rate both in the cell and in vitro [207,214,215]: in infected cells, both VA-RNA species are made in the presence of drugs that prevent viral DNA synthesis, but once an infection enters the late phase, synthesis of VA-RNAI accelerates whereas that of VA-RNAII continues at a constant rate [207].

During productive infection, very large amounts of VA-RNA, especially VA-RNAI, most of which is found in the cytoplasm, are made [205]. The function(s) of these RNA species is still not completely established. It has been suggested that VA-RNAI, by virtue of regions of sequence complementarity, could play a role in splicing of adenoviral late RNA [224]: two sets of sequences within the VA-RNAI molecule could base-pair with sequences near splice junctions of hexon pre-mRNA, as illustrated in Fig. 11, to form a structure in which the sequences to be joined during splicing are brought into close proximity.
Such base-pairing would involve sequences of the precursor that are both mRNA-specific and unique to the precursor, as illustrated in Fig. 11A. Such an interaction differs from that proposed to occur between U1 RNA (in snRNP) and eukaryotic consensus splice junction sequences [121,122], illustrated in Fig. 11B, in that this latter pairing would involve exclusively intervening sequences of an mRNA precursor. Observations in support of such a role for VA-RNA include the apparent association of VA-RNA with high molecular weight RNA within the cell (see, for example, Ref. 205) and the demonstration that VA-RNAI can indeed hybridize to cloned cDNA copies of adenoviral late mRNAs that include the complete tripartite leader segment [225]. On the other hand, viable mutants of Ad5 in which synthesis of VA-RNAI is prevented by deletion within its intragenic control region apparently induce normal synthesis of late mRNA species (T. Shenk, personal communication), an observation that is difficult to reconcile with a role in splicing. The late mRNA species made in the VA-RNAI mutant-infected cells are not, however, translated at anything like normal efficiency (T. Shenk, personal communication). Thus, the data currently available support a role for VA-RNAI in efficient translation of late mRNA species; what VA-RNAII might do remains a mystery.

Fig. 11. Postulated roles of small, nuclear RNA species in splicing. Base-pairing interactions between an mRNA precursor and small nuclear RNA species that might align splice junctions (see text) are illustrated. A, VA-RNAI and a precursor for hexon mRNA, showing the sequences of the splice sites at the 3' end of the third tripartite leader segment and the 5' end of hexon mRNA sequences. The VA-RNAI chain is represented by the dashed line, including the two sets of sequences that could interact with the pre-hexon mRNA sequence. Adapted from Murray and Holliday [224] to show the actual 3'-termini of VA-RNAI species, which vary in the number of 3'-terminal U residues, represented by (U). B, U1 RNA and the eukaryotic consensus splice junction sequence in an mRNA precursor. Adapted from Lerner et al. [121]. In both parts of the figure, the splice site is indicated by the crossover and Watson-Crick base-pairs by dashes.

Acknowledgements

I thank Noel Mann for her skill and patience in preparing this manuscript. Work from the author's laboratory cited in this proposal was supported by a grant from the American Cancer Society, No. NP 239B.

References

1 Tooze, J. (ed.) (1980) The Molecular Biology of Tumor Viruses. II, DNA tumor viruses. 2nd edn., Cold Spring Harbor Laboratory, Cold Spring Harbor
2 Ziff, E. (1980) Nature (London) 287, 491-499
3 Sharp, P.A., Gallimore, P.H. and Flint, S.J. (1974) Cold Spring Harbor Symp. Quant. Biol. 39, 457-474
4 Tibbetts, C., Pettersson, U., Johansson, K. and Philipson, L. (1974) J. Virol. 13, 370-377
5 Berk, A.J. and Sharp, P.A. (1978) Cell 14, 694-712
6 Chow, L.T., Broker, T.R. and Lewis, J.B. (1979) J. Mol. Biol. 134, 265-303
7 Kitchingman, G.R. and Westphal, H. (1980) J. Mol. Biol. 137, 23-48
8 Kitchingman, G.R., Lai, S.P. and Westphal, H. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 4392-4395
9 Spector, D.J., McGrogan, M. and Raskas, H.J. (1978) J. Mol. Biol. 126, 395-414
10 Van Ormondt, H., Maat, J., De Waard, A. and Van der Eb, A.J. (1978) Gene 4, 309-328
11 Van Ormondt, H., Maat, J. and Dijkema, R. (1980) Gene 12, 63-76
12 Dijkema, R., Dekker, B.M.M. and Van Ormondt, H. (1980) Gene 9, 141-156
13 Sugisaki, H., Sugimoto, K., Takanami, M., Shiroki, K., Saito, I., Shimojo, H., Sawada, Y., Uemizu, Y., Uesugi, S. and Fujinaga, K. (1980) Cell 20, 777-786
14 Perricaudet, M., Akusjärvi, G., Virtanen, A. and Pettersson, U. (1979) Nature 281, 694-696
15 Perricaudet, M., LeMoullec, J.M., Tiollais, P. and Pettersson, U. (1980) Nature 288, 174-176
16 Baker, C.C. and Ziff, E. (1979) Cold Spring Harbor Symp. Quant. Biol. 44, 415-428
17 Lewis, J.B., Atkins, J.F., Baum, P.R., Solem, R., Gesteland, R.F. and Anderson, C.W. (1976) Cell 7, 141-151
18 Harter, M.L. and Lewis, J.B. (1978) J. Virol. 26, 736-749

19 Halbert, D.N., Spector, D.S. and Raskas, H. (1979) J. Virol. 31, 621-629
20 Brackmann, K.H., Green, M., Wold, W.S.M., Cartas, M., Matsuo, T.J. and Hashimoto, S. (1980) J. Biol. Chem. 255, 6772-6779
21 Lupker, J.H., Davis, A., Jochemsen, H. and Van der Eb, A.J. (1981) J. Virol. 37, 524-529
22 Lewis, J.B., Esche, H., Smart, J.E., Stillman, B.W., Harter, M.L. and Mathews, M.B. (1979) Cold Spring Harbor Symp. Quant. Biol. 44, 493-508
23 Green, M., Wold, W.S.M., Brackmann, K. and Cartas, M.A. (1979) Cold Spring Harbor Symp. Quant. Biol. 44, 457-470
24 Esche, H., Mathews, M.B. and Lewis, J.B. (1980) J. Mol. Biol. 142, 399-417
25 Schrier, P.L., Van den Elsen, P.J., Hertoghs, J.J.L. and Van der Eb, A.J. (1979) Virology 99, 372-385
26 Eggerding, F. and Raskas, H.J. (1978) J. Virol. 25, 453-458
27 Alestrom, P., Akusjärvi, G., Perricaudet, M., Mathews, M.B., Klessig, D.F. and Pettersson, U. (1980) Cell 19, 671-681
28 Persson, H., Pettersson, U. and Mathews, M.B. (1978) Virology 90, 67-79
29 Perricaudet, M., LeMoullec, J.M. and Pettersson, U. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 3778-3782
30 Van der Eb, A.J., Van Ormondt, H., Schrier, P.J., Lupker, J.H., Jochemsen, H., Van den Elsen, P.J., Deleys, R.J., Maat, J., Van Beveren, C.P., Dijkema, R. and De Waard, A. (1979) Cold Spring Harbor Symp. Quant. Biol. 44, 383-400
31 Jochemsen, H., Daniels, G.S.G., Lupker, J.H. and Van der Eb, A.J. (1980) Virology 105, 551-563
32 Baker, C.C. and Ziff, E. (1981) J. Mol. Biol. 148, 189-222
33 Bos, J.L., Polder, L.J., Bernards, R., Schrier, P.L., Van den Elsen, P.J., Van der Eb, A.J. and Van Ormondt, H. (1981) Cell 27, 121-131
34 Anderson, C.W. and Lewis, J.B. (1980) Virology 104, 27-41
35 Sawada, Y. and Fujinaga, K. (1980) J. Virol. 36, 639-651
36 Kozak, M. (1980) Cell 22, 7-8
37 Frost, E. and Williams, J. (1978) Virology 91, 39-50
38 Harrison, T.J., Graham, F. and Williams, J.F. (1977) Virology 77, 319-329
39 Graham, F.G., Harrison, T.J. and Williams, J.F. (1978) Virology 86, 10-21
40 Berk, A.J., Lee, F., Harrison, T., Williams, J. and Sharp, P.A. (1979) Cell 17, 935-944
41 Jones, N. and Shenk, T. (1979) Cell 17, 683-689
42 Jones, N. and Shenk, T. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 3665-3669
43 Wold, W.S.M. and Green, M. (1979) J. Virol. 30, 297-310
44 Jochemsen, H., Hertoghs, J.J.L., Lupker, J.H., Davis, A. and Van der Eb, A.J. (1981) J. Virol. 37, 530-534
45 Lassam, N.J., Bayley, S.T. and Graham, F.L. (1978) Virology 87, 463-467
46 Houweling, A., Van den Elsen, P.J. and Van der Eb, A.J. (1980) Virology 105, 537-550

47 Lassam, N.J., Bayley, S.T., Graham, F.L. and Branton, P.E. (1979) Nature (London) 277, 241-243
48 Branton, P.E., Lassam, N.J., Downey, J.F., Yee, S.-P., Graham, F.L., Mak, S. and Bayley, S.T. (1981) J. Virol. 37, 601-608
49 Raška, K., Geis, A. and Föhring, B. (1979) Virology 99, 174-178
50 Ortin, J., Scheidtmann, K.H., Greenberg, R., Westphal, H. and Doerfler, W. (1976) J. Virol. 20, 355-372
51 Smiley, J.R. and Mak, S. (1978) J. Virol. 28, 227-239
52 Yoshida, K., Seikikawa, K. and Fujinaga, K. (1979) J. Virol. 32, 339-344
53 Shiroki, K., Handa, H., Shimojo, H., Yano, S., Ojima, S. and Fujinaga, K. (1977) Virology 82, 462-471
54 Shiroki, K., Segawa, K. and Shimojo, H. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 2274-2278
55 Seikikawa, K., Shiroki, K., Shimojo, H., Ojima, S. and Fujinaga, K. (1978) Virology 88, 1-7
56 Dijkema, R., Dekker, B.M.M., Van der Feltz, J.M. and Van der Eb, A.J. (1979) J. Virol. 32, 943-950
57 Yano, S., Ojima, S., Fujinaga, K., Shiroki, K. and Shimojo, H. (1977) Virology 82, 214-220
58 Segawa, K., Saito, I., Shiroki, K. and Shimojo, H. (1980) Virology 107, 61-70
59 Saito, I., Sato, J.-I., Handa, H., Shiroki, K. and Shimojo, H. (1981) Virology 114, 379-398
60 Chow, L.T., Lewis, J.B. and Broker, T.R. (1979) Cold Spring Harbor Symp. Quant. Biol. 44, 401-414
61 Flint, S.J. and Sharp, P.A. (1976) J. Mol. Biol. 106, 749-771
62 Galibert, F., Hérissé, J. and Courtois, G. (1979) Gene 6, 1-22

63 Baker, C.C., Hérissé, J., Courtois, G., Galibert, R. and Ziff, E. (1978) Cell 18, 569-580
64 Van der Vliet, P.C. and Levine, A.J. (1973) Nature New Biol. 246, 170-174
65 Russell, W.C. and Blair, G.E. (1977) J. Gen. Virol. 34, 19-35
66 Jeng, J.H., Wold, W.S.M., Sugawara, K., Gilead, Z. and Green, M. (1977) J. Virol. 22, 402-411
67 Levinson, A., Levine, A.J., Anderson, S., Osborn, M., Rosenwirth, B. and Weber, K. (1976) Cell 7, 575-584
68 Axelrod, N. (1978) Virology 87, 366-683
69 Fowlkes, D.M., Lord, S.T., Linné, T., Pettersson, U. and Philipson, L. (1979) J. Mol. Biol. 132, 163-180
70 Van der Vliet, P.C. and Sussenbach, J.S. (1975) Virology 67, 415-426
71 Van der Vliet, P.C., Landberg, J. and Jansz, H.S. (1977) Virology 80, 98-110
72 Linné, T. and Philipson, L. (1980) Eur. J. Biochem. 103, 259-270
73 Klessig, D.F. and Grodzicker, T. (1979) Cell 17, 957-966
74 Klessig, D.F. and Anderson, C.W. (1975) J. Virol. 16, 1650-1668
75 Klessig, D.F. and Chow, L.T. (1980) J. Mol. Biol. 139, 221-242
76 Hérissé, J., Courtois, G. and Galibert, F. (1980) Nucl. Acid Res. 8, 2173-2192

77 Harter, M.L., Shanmugam, G., Wold, W.S. and Green, M. (1976) J. Virol. 19, 232-242
78 Ross, S., Flint, S.J. and Levine, A.J. (1980) Virology 100, 419-432
79 Persson, H., Signas, C. and Philipson, L. (1979) J. Virol. 29, 938-948
80 Ishibashi, M. and Maizel, J.V. (1974) Virology 58, 345-361
81 Chin, W.W. and Maizel, J.V. (1976) Virology 7, 518-580
82 Persson, H., Janssen, M. and Philipson, L. (1980) J. Mol. Biol. 136, 375-394
83 Persson, H., Jörnvall, H. and Zabielski, J. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 6349-6353
84 Kvist, S., Ostberg, L., Persson, H., Philipson, L. and Peterson, P.A. (1978) Proc. Natl. Acad. Sci. U.S.A. 75, 5674-5678
85 Persson, H., Oberg, B. and Philipson, L. (1978) J. Virol. 28, 119-139
86 Flint, S.J., Werwerka-Lutz, Y., Levine, A.S., Sambrook, J. and Sharp, P.A. (1975) J. Virol. 16, 662-671
87 Jones, W. and Shenk, T. (1978) Cell 13, 181-188
88 Craig, E.A. and Raskas, H.J. (1974) J. Virol. 14, 26-32
89 Steenberg, P.H., Maat, J., Van Ormondt, H. and Sussenbach, J.S. (1977) Nucl. Acid Res. 4, 4371-4389
90 Shinagawa, M., Padmanabhan, R.V. and Padmanabhan, R. (1980) Gene 9, 99-114
91 Galos, R., Williams, J., Binger, M.H. and Flint, S.J. (1979) Cell 17, 945-956
92 Williams, J.F., Galos, R.S., Binger, M.H. and Flint, S.J. (1979) Cold Spring Harbor Symp. Quant. Biol. 44, 353-366
93 Binger, M.-H., Rekosh, D. and Flint, S.J. (1982) J. Virol., in the press
94 Stillman, B.W., Lewis, J.B., Chow, L.T., Mathews, M.B. and Smart, J.E. (1981) Cell 23, 497-508
95 Wilkie, N.M., Ustacelebi, S. and Williams, J.F. (1973) Virology 51, 499-503
96 Rekosh, D.M.K., Russell, W.C., Bellett, A.J.D. and Robinson, A.J. (1977) Cell 11, 283-295
97 Lichy, J.H., Horwitz, M.S. and Horwitz, J. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 2678-2682
98 Lewis, J.B. and Mathews, M.B. (1980) Cell 21, 303-313
99 Lewis, J.B., Atkins, J.F., Anderson, C.W., Baum, P.R. and Gesteland, R.F. (1975) Proc. Natl. Acad. Sci. U.S.A. 72, 1344-1348
100 Lewis, J.B., Anderson, C.W. and Atkins, J.R. (1977) Cell 12, 37-44
101 Chow, L.T. and Broker, T.R. (1978) Cell 15, 497-510
102 Shaw, A.R. and Ziff, E. (1980) Cell 22, 905-916
103 Nevins, J.R. and Wilson, M.C. (1981) Nature (London) 290, 113-118
104 Miller, I.S., Ricciardi, R.P., Roberts, B.E., Paterson, B.M. and Mathews, M.B. (1980) J. Mol. Biol. 142, 455-488
105 Berk, A.J. and Sharp, P.A. (1977) Cell 12, 45-55
106 Wilson, M.C., Fraser, N.W. and Darnell, J.E. (1979) Virology 94, 175-184
107 Weber, J., Jelinek, W. and Darnell, J.E. (1977) Cell 10, 611-616
108 Evans, R.M., Fraser, N., Ziff, E., Weber, J., Wilson, M. and Darnell, J.E. (1977) Cell 12, 733-739

109 Sehgal, P.B., Fraser, N.W. and Darnell, J.E. (1979) Virology 94, 185-191
110 Craig, E.A. and Raskas, H.J. (1976) Cell 8, 205-213
111 Craig, E.A., Sayavedra, M. and Raskas, H.J. (1977) Virology 77, 545-555
112 Bachenheimer, S.L. (1977) J. Virol. 22, 577-582
113 Blanchard, J.M., Weber, J., Jelinek, W. and Darnell, J.E. (1975) Proc. Natl. Acad. Sci. U.S.A. 75, 5344-5348
114 Goldenberg, C.J. and Raskas, H.J. (1979) Cell 16, 131-138
115 Weber, J., Blanchard, J.M., Ginsberg, H. and Darnell, J.E. (1980) J. Virol. 33, 287-291
116 Goldenberg, C.J. and Raskas, H.J. (1980) Biochemistry 19, 2719-2723
117 Solnick, D. (1981) Nature (London) 291, 508-510
118 Khoury, G., Gruss, P., Dhar, R. and Lai, G.J. (1979) Cell 18, 85-92
119 Villareal, L.P., White, R.I. and Berg, P. (1979) J. Virol. 29, 209-219
120 Avvedimento, V.E., Vogeli, G., Yamada, Y., Maizel, J.V., Pastan, I. and De Crombrugghe, B. (1980) Cell 21, 689-696
121 Lerner, M.R., Boyle, J.A., Mount, S.M., Wolin, S.L. and Steitz, J.A. (1980) Nature (London) 283, 220-224
122 Rogers, J. and Wall, R. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 1877-1879
123 Breathnach, R. and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383
124 Yang, V.W. and Flint, S.J. (1979) J. Virol. 32, 394-407
125 Weingärtner, B. and Keller, W. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 4092-4096
126 Goldenberg, C.J. and Raskas, H.J. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 5430-5434
127 Reddy, R. and Busch, H. (1981) The Cell Nucleus 8, 261-306
128 Yang, V.W., Lerner, M.R., Steitz, J.A. and Flint, S.J. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 1371-1375
129 Nevins, J.R., Ginsberg, H.S., Blanchard, J.M., Wilson, M.C. and Darnell, J.E. (1979) J. Virol. 32, 727-733
130 Yang, V.W., Binger, M.H. and Flint, S.J. (1980) J. Biol. Chem. 255, 2097-2108
131 Nevins, J.R. (1981) Cell 26, 213-220
132 Persson, H., Monstein, H.J., Akusjärvi, G. and Philipson, L. (1981) Cell 23, 485-496
133 Ginsberg, H.S., Ensinger, M.J., Kaufman, R.S., Mayer, A.J. and Lundholm, U. (1975) Cold Spring Harbor Symp. Quant. Biol. 39, 419-426
134 Carter, T.H. and Blanton, R.A. (1978) J. Virol. 25, 664-674
135 Carter, T.H. and Blanton, R.A. (1978) J. Virol. 28, 450-456
136 Blanton, R.A. and Carter, T.H. (1979) J. Virol. 29, 458-465
137 Nevins, J.R. and Winkler, J.J. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 1893-1897
138 Wilson, M.C., Nevins, J.R., Blanchard, J.M., Ginsberg, H.S. and Darnell, J.E. (1979) Cold Spring Harbor Symp. Quant. Biol. 44, 447-456
139 Berget, S.M., Flint, S.J., Williams, J.F. and Sharp, P.A. (1976) J. Virol. 19, 879-889
140 Ziff, E. and Evans, R. (1978) Cell 15, 1463-1475
141 Fraser, N., Nevins, J.R., Ziff, E. and Darnell, J.E. (1979) J. Mol. Biol. 129, 643-656

142 Manley, J.L., Sharp, P.A. and Gefter, M.L. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 160-164
143 Babich, A., Nevins, J.R. and Darnell, J.E. (1980) Nature 287, 246-248
144 Fraser, N. and Ziff, E. (1978) J. Mol. Biol. 124, 27-51
145 Ziff, E. and Fraser, N. (1978) J. Virol. 25, 897-906
146 McGrogan, M. and Raskas, H.J. (1978) Proc. Natl. Acad. Sci. U.S.A. 75, 625
147 Nevins, J.R. and Darnell, J.E. (1978) J. Virol. 25, 811-823
148 Nevins, J.R. and Darnell, J.E. (1978) Cell 15, 1477-1493
149 Berget, S.M., Moore, C. and Sharp, P.A. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 3171-3175
150 Chow, L.T., Gelinas, R.E., Broker, T.R. and Roberts, R.J. (1977) Cell 12, 1-8
151 Klessig, D.F. (1977) Cell 12, 9-21
152 Dunn, A.R. and Hassell, J.A. (1977) Cell 12, 23-36
153 Manley, J.L., Sharp, P.A. and Gefter, M.L. (1979) J. Mol. Biol. 135, 171-197
154 Ricciardi, R., Miller, J.S. and Roberts, B.E. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 4927-4931
155 Fraser, N., Sehgal, P.B. and Darnell, J.E. (1978) Nature 272, 590-593
156 Evans, R., Weber, J., Ziff, E. and Darnell, J.E. (1979) Nature 278, 267-370
157 Flint, S.J. (1981) Curr. Top. Microbiol. Immun. 93, 47-79
158 Persson, H., Mathiesen, B., Philipson, L. and Pettersson, U. (1979) Virology 93, 198-207
159 Zain, B.S. and Roberts, B.J. (1979) J. Mol. Biol. 131, 341-352
160 Zain, S., Sambrook, J., Roberts, R.J., Keller, W., Fried, M. and Dunn, A.R. (1979) Cell 16, 851-861
161 Akusjärvi, G. and Pettersson, U. (1979) Cell 16, 841-850
162 Akusjärvi, G. and Pettersson, U. (1979) Proc. Natl. Acad. Sci. U.S.A. 75, 5822-5826
163 Dunn, A.R., Mathews, M.B., Chow, L.T., Sambrook, J. and Keller, W. (1978) Cell 15, 511-526
164 Berget, S.M. and Sharp, P.A. (1979) J. Mol. Biol. 129, 547-565
165 Ryfell, G.V., Wyler, T., Muellner, D.B. and Weber, R. (1980) Cell 19, 53-61
166 Tsai, M.-J., Ting, A.C., Nordstrom, J.L., Zimmer, W. and O'Malley, B.W. (1980) Cell 22, 219-230
167 Akusjärvi, G. and Persson, H. (1981) Nature (London) 292, 420-426
168 Carter, T.H. and Ginsberg, H.S. (1976) J. Virol. 18, 156-166
169 Thomas, G.P. and Mathews, M.B. (1980) Cell 22, 523-533
170 Brison, O., Kédinger, C. and Chambon, P. (1979) J. Virol. 32, 91-97
171 Wu, C., Bingham, P.M., Livak, K., Holmgren, R. and Elgin, S.C.R. (1979) Cell 16, 797-806
172 Wu, C. (1980) Nature (London) 286, 797-806
173 Weil, P.A., Luse, D.S., Segal, J. and Roeder, R.G. (1979) Cell 18, 469-484
174 Hagenbüchle, O. and Schibler, U. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 2283-2286
175 Lee, D.C. and Roeder, R.G. (1981) Mol. Cell Biol. 1, 635-651

176 Fire, A., Baker, C.C., Manley, J.L., Ziff, E.G. and Sharp, P.A. (1981) J. Virol. 40, 703-719
177 Pribnow, D. (1979) Biological Regulation and Development, Vol. 1, pp. 219-277, Plenum Press, New York
178 Manley, J.L., Fire, A., Campo, A., Sharp, P.A. and Gefter, M.L. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 3855-3859
179 Corden, J., Wasylyk, B., Buchwalder, A., Sassone-Corsi, P., Kédinger, C. and Chambon, P. (1980) Science 209, 1406-1414
180 Hu, S. and Manley, J.L. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 820-824
181 Wasylyk, B., Kédinger, C., Corden, J., Brison, O. and Chambon, P. (1980) Nature (London) 285, 367-373
182 Tsai, S.T., Tsai, M.J. and O'Malley, B.W. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 879-883
183 Mathis, D.J. and Chambon, P. (1981) Nature (London) 290, 310-315
184 Wasylyk, B., Derbyshire, R., Guy, A., Molko, D., Roget, A., Teolue, R. and Chambon, P. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 7024-7028
185 Talkington, C.A., Nishioka, Y. and Leder, P. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 7132-7136
186 Gluzman, Y., Sambrook, J.F. and Frisque, R.J. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 3898-3902
187 Grosschedl, R. and Birnstiel, M.L. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 7102-7106
188 Benoist, C. and Chambon, P. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 3865-3869
189 Dierks, P., Van Ooyen, A., Mantel, N. and Weissmann, C. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 1411-1415
190 Ghosh, P.K., Lebowitz, P., Frisque, R.J. and Gluzman, Y. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 100-104
191 Gruss, P., Dhar, R. and Khoury, G. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 943-947
192 Benoist, C. and Chambon, P. (1981) Nature (London) 290, 304-310
193 Hashimoto, S. and Green, M. (1980) J. Biol. Chem. 255, 6780-6788
194 Rio, D., Robbins, A., Myers, R. and Tjian, R. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 5706-5710
195 Tate, V. and Philipson, L. (1979) Nucl. Acid Res. 6, 2769-2785
196 Sergent, A., Tigges, M.A. and Raskas, H.J. (1979) J. Virol. 29, 888-898
197 Brown, M. and Weber, J. (1980) Virology 107, 306-310
198 Ohe, K. and Weissman, S.M. (1970) Science 167, 879-881
199 Celma, M.L., Pan, J. and Weissman, S. (1977) J. Biol. Chem. 252, 9032-9042
200 Celma, M.L., Pan, J. and Weissman, S. (1977) J. Biol. Chem. 252, 9043-9046
201 Pan, J., Celma, M.L. and Weissman, S.M. (1977) J. Biol. Chem. 252, 9047-9054
202 Akusjärvi, G., Mathews, M.B., Andersson, P., Vennström, B. and Pettersson, U. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 2424-2428
203 Harris, B. and Roeder, R.G. (1978) J. Biol. Chem. 253, 4120-4127

204 Mathews, M.B. and Pettersson, U. (1978) J. Mol. Biol. 119, 293-229
205 Mathews, M.B. (1975) Cell 6, 223-229
206 Pettersson, U. and Philipson, L. (1975) Cell 6, 1-4
207 Söderlund, J., Pettersson, U., Vennström, B., Philipson, L. and Mathews, M.B. (1976) Cell 7, 585-593
208 Price, R. and Penman, S. (1972) J. Mol. Biol. 70, 435-450
209 Weinmann, R., Raskas, H.J. and Roeder, R.G. (1974) Proc. Natl. Acad. Sci. U.S.A. 71, 3426-3438
210 Weinmann, R., Brendler, T.G., Raskas, H.J. and Roeder, R.G. (1976) Cell 7, 557-566
211 Vennström, B., Pettersson, U. and Philipson, L. (1978) Nucl. Acid Res. 5, 195-206
212 McGuire, P.M., Piatak, M. and Hodge, L.D. (1976) J. Mol. Biol. 101, 379-396
213 Wu, G.J. (1978) Proc. Natl. Acad. Sci. U.S.A. 75, 2157-2179
214 Weil, P.A., Segal, J., Harris, B., Ng, S.Y. and Roeder, R.G. (1979) J. Biol. Chem. 254, 6163-6173
215 Fowlkes, D.M. and Shenk, T. (1980) Cell 22, 405-413

216 Guilfoyle, R. and Weinmann, R. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 3373-3382
217 Bogenhagen, D.F., Sakonju, S. and Brown, D.D. (1980) Cell 19, 27-35
218 Engelke, D.R., Ng, S.-Y., Shastry, B.S. and Roeder, R.G. (1980) Cell 19, 717-728
219 Sakonju, S., Bogenhagen, D.F. and Brown, D.D. (1980) Cell 19, 13-25
220 Koski, R.A., Clarkson, S.G., Kurjan, J., Hall, B.D. and Smith, M. (1980) Cell 22, 415-425
221 Hofstetter, H., Kressmann, A. and Birnstiel, M.L. (1981) Cell 24, 573-585
222 Kressmann, A., Hofstetter, H., DiCapua, E., Grosschedl, R. and Birnstiel, M.L. (1979) Nucl. Acids Res. 7, 1749-1763
223 De Franco, D., Schmidt, O. and Söll, D. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 3365-3368
224 Murray, V. and Holliday, R. (1979) FEBS Lett. 106, 5-7
225 Mathews, M.B. (1980) Nature (London) 285, 575-577