Cell, Vol. 4, 77-93, February 1975, Copyright ©1975 by MIT
Units of Transcription and Translation: Sequence Components of Heterogeneous Nuclear RNA and Messenger RNA

Benjamin Lewin
Cell, The MIT Press, 28 Carleton Street, Cambridge, Massachusetts 02142
Defining the units in which the eucaryotic genome is transcribed and translated is central to any analysis of eucaryotic gene expression. The relationship between heterogeneous nuclear RNA and messenger RNA raises the question of whether the primary transcript may be more complex than the sequence which is translated; as I concluded last month in the first part of this review, kinetic analyses of these two RNA populations provide some suggestive indications but cannot prove whether the nuclear population includes messenger precursors that are much longer than mature cytoplasmic messengers (Lewin, 1975). Here I discuss recent analyses of the sequence components present in hnRNA and mRNA and how they may be related to each other and to the organization of the genome.

Functions of Nonrepetitive and Repetitive Sequences

That the nonrepetitive component of DNA includes the structural genes has seemed an attractive hypothesis ever since Britten and Kohne (1968) demonstrated that eucaryotic DNA comprises not only sequences present in one copy in every haploid genome but also sequences repeated in many related but not identical copies (see Lewin, 1974a). As the data of Table 1 show, when duplex formation of denatured DNA is followed by retention on hydroxyapatite columns under standard conditions, the principal component is nonrepetitive and much of the remainder contains moderately repeated sequences. (Only a small proportion usually is contained in the highly repetitive sequences that may form satellite DNAs and appear to have some structural role at the centromere.) The proposal of Britten and Davidson (1969) that the moderately repetitive component may provide control elements casts an interesting perspective from which to view recent observations that much of the eucaryotic genome may have a structure in which nonrepetitive sequences alternate with repetitive sequences.
[See Lewin (1974b,c) for discussion of the characterization of control elements and structural genes.]

From an analysis of the renaturation of labeled DNA preparations of various defined lengths with an excess of a standard preparation of unlabeled 450 nucleotide long fragments, Davidson et al. (1973a,b) and Graham et al. (1974) showed that about half of the genome comprises an alternation of nonrepetitive sequences of an average length of 1000 bases in the sea urchin Strongylocentrotus purpuratus and of 800 bases in Xenopus laevis with repetitive sequences on average about 300 bases long. A further 25% of each genome may have a similar, but longer interspersion pattern, with nonrepetitive sequences longer than 4000 bases separated by repetitive sequences of undefined length. Only some 6% of each genome constitutes what may be clusters of repetitive sequences. This means that some 75% of the repetitive sequences are interspersed with nonrepetitive sequences. Some 20% of each genome may represent largely or entirely nonrepetitive sequences. Similar observations have been made with the DNAs of the surf clam and oyster, in which at least 90% of the 2500 nucleotide long fragments contain nonrepetitive sequences interspersed with repetitive sequences (Davidson, personal communication).

The similarities in the interspersion patterns of Xenopus and sea urchin, two genomes only very distantly related in evolution, and observations consistent with similar interspersions in other eucaryotes, including mammals (Davidson et al., 1974), immediately suggest that some functional significance underlies this arrangement. An obvious speculation is that the principal interspersed component of the genome includes nonrepetitive sequences of structural genes separated by shorter repetitive sequences that provide control elements (for review see Davidson and Britten, 1973; Lewin, 1974a).

The apparent proportion of repetitive components depends upon the conditions used in renaturation analysis; not the least important implication of interspersion is that the proportion of the repetitive component may be much lower than suggested by the kinetics of renaturation on hydroxyapatite.
As Table 1 reports, 41% of Xenopus and 46% of sea urchin DNA contains repetitive sequences when renaturation is followed by retention on hydroxyapatite of 450 nucleotide-long molecules containing duplex regions. Because the retained fraction may include molecules that are only partially duplex, comprising unrenatured single strand tails of nonrepetitive DNA attached to renatured repetitive sequences, the true proportion of repetitive sequences is lower than suggested by these values. At a fragment length of 330 nucleotides, only 30% of Xenopus DNA is retained on hydroxyapatite at the cot values characteristic of repetitive sequences; and extrapolation to zero fragment length suggests a value of 25% for the repetitive component (Davidson et al., 1973b). Using conditions under which single strands attached to renatured repetitive duplexes are destroyed, Graham et al. (1974) showed that only 27% of sea urchin DNA is repetitive. Potential coding sequences may therefore occupy 70-75% of the genome and the putative control elements represent only 25-30%.

A very different pattern of interspersion has been observed in the DNA of Drosophila melanogaster. Using electron microscopic characterization of renatured duplexes as well as the kinetics of renaturation, Manning, Schmid, and Davidson (1975) found that long stretches of repetitive DNA are interspersed with longer lengths of nonrepetitive DNA. Using DNA fragments of 4200, 11,600, or 17,400 nucleotide length, the types of duplex structure formed by renaturation at intermediate cot values (that is between moderately repetitive sequences)
were examined in the electron microscope; the most common class of structure is a duplex with one single strand tail at each end, but three tailed and four tailed structures also are found. The lengths of duplex in the four tailed structures indicate the lengths of the reassociating repetitive elements; these are more or less uniformly distributed over a range from 150 to 13,000 base pairs, with a number average of 5600 base pairs in the 17,400 nucleotide preparation. The attached single strand lengths give a minimum estimate for the length of the nonrepetitive sequences adjacent to the repetitive element, on average more than 13,000 base pairs.

Table 1. Sequence Components of Eucaryotic Genomes
(Fractions bound on hydroxyapatite at high, middle, and low cot)

                                Base Pairs in High Cot          Middle Cot (Repetitive) Low Cot
                                Haploid       (Nonrepetitive)                           (Highly Repetitive)
Species                         Genome        (%)               (%)       Frequency     (%)                   Reference
Bacterium, E. coli              4.2 x 10^6    100                                                             Standard of definition
Slime mold, D. discoideum       3.0 x 10^7    70                30        45            none reported         Firtel and Bonner (1972)
Nematode worm, C. elegans       8.0 x 10^7    83                17        250           none reported         Sulston and Brenner (1974)
Fruit fly, D. melanogaster      1.4 x 10^8    74                13        70            13                    Manning et al. (1975)
Sea urchin, S. purpuratus       8.0 x 10^8    50                27        20            none reported         Graham et al. (1974)
                                                                19        160
Mosquito, A. albopictus         9.1 x 10^8    70                20        2,500         10                    Spradling et al. (1974)
Snail, N. obsoleta              2.9 x 10^9    38                >12       20            18                    Davidson et al. (1971)
                                                                >15       1,000
Toad, X. laevis                 3.1 x 10^9    54                10        20            6                     Davidson et al. (1973a)
                                                                31        1,600
Mouse, M. musculus              2.7 x 10^9    70                15        10,000        10                    Britten and Kohne (1968)
Cow, B. domesticus              3.2 x 10^9    60                38        60,000        2                     Britten and Kohne (1968)

Values given for the number of base pairs in the haploid genome are those determined by chemical analysis; very similar values may be calculated from the cot1/2 of renaturation of the nonrepetitive component. The nucleotide content of the D. melanogaster genome has been estimated as 1.4 x 10^8 by Laird (1971) and as 1.8 x 10^8 by Rasch, Barr, and Rasch (1971); components almost identical to those shown above were resolved also in the renaturation analysis of Schachat and Hogness (1973). Most renaturation experiments were carried out under standard conditions of DNA fragment length of 450 nucleotides, 0.12 M phosphate buffer and 60°C temperature, with reaction monitored by retention of duplex molecules on hydroxyapatite. Results obtained under other conditions were converted before analysis of data to the equivalent cot values that would have been obtained under these standard conditions. The values given in this table for the proportion of repetitive DNA thus comprise the proportion of total DNA represented by the 450 nucleotide long fragments that bind to hydroxyapatite at middle cots; because repetitive and nonrepetitive sequence elements are interspersed, these fragments may also contain unrenatured nonrepetitive sequences, so that the true proportion of the genome occupied by repetitive sequences may be appreciably less, and the proportion occupied by nonrepetitive sequences appreciably more, than the values given. For example, in Xenopus the repetitive DNA constitutes about 30% rather than the total of 41% given above. The repetitive DNA of each species renatures over a wider cot range than would be shown by a single component and thus appears to be heterogeneous in reiteration frequency; the average frequencies of reiteration given in the table therefore do no more than establish a mid point for the distribution of frequencies. For example, although two repetitive components have been resolved in Xenopus the data do not preclude the interpretation that there is a continuous spectrum of reiteration frequencies from 20-6000 times; and in mouse the frequencies vary from 10^3-10^5 repetitions. Highly repetitive components include satellite DNAs and consist of comparatively short sequences repeated many times, often in excess of 10^6. The "zero binding DNA" that is retained on hydroxyapatite columns without incubation may be included in the values given for this component, for example for A. albopictus and N. obsoleta above.

The relative frequencies with which the different structures occur are consistent with this model. And when excess amounts of 400 nucleotide long fragments enriched in repetitive DNA were associated with longer labeled tracer molecules, the fraction of tracer in the repetitive component increased with fragment length, from 21% at 4000 bases, to 30% at 8400 bases, reaching 36% at 12,400 bases. From these results it is possible to calculate an average length for the interspersed repetitive sequence of about 6000 bases. Most (more than 60%) and perhaps all of the repetitive sequences are interspersed, although the results do not exclude the possibility that some may be clustered to form long stretches of repetitive DNA. This model for the genome differs from that previously constructed by Wu, Hurn, and Bonner (1972) from electron microscopic analysis, where for technical reasons an apparent interspersion of short repetitive elements was inferred.

The significance of the difference in interspersion pattern between Drosophila and other eucaryotes such as Xenopus or the sea urchin cannot be assessed until the functions of these components have been resolved; but if interspersion represents an alternation of control and coding sequences, the operation of the control systems may be different in Drosophila. Drosophila is clearly the organism which offers the best prospects of correlating interspersion pattern with genetic function. If 13% of the DNA is moderately repetitive (see Table 1), there must be 1.8 x 10^7 base pairs in this fraction, which could be divided into about 3000 sequences of the average interspersed repetitive length of 5600 base pairs. This number is very similar to the number of polytene bands, although at present it is not possible to say whether any significance should be attributed to this coincidence.
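The arithmetic behind these Drosophila estimates can be checked with a short Python sketch. It uses only values quoted in the text and in Table 1 (genome size, moderately repetitive fraction, number-average repeat length, and average reiteration frequency); the function and variable names are illustrative and are not part of the original analysis. Without rounding, the figures come out somewhat above the rounded values used in the text (roughly 3250 elements rather than 3000), which is well within the uncertainty of the measurements.

```python
# Back-of-envelope check of the Drosophila estimates, using values from the
# text and Table 1. All names here are illustrative, not from the source.
GENOME_BP = 1.4e8             # haploid genome size (Table 1)
REPETITIVE_FRACTION = 0.13    # moderately repetitive fraction (Table 1)
MEAN_REPEAT_LENGTH_BP = 5600  # number-average interspersed repeat length
MEAN_REITERATION = 70         # average reiteration frequency (Table 1)


def repetitive_estimates(genome_bp, fraction, repeat_len_bp, reiteration):
    """Return (repetitive bp, number of repeat elements, number of families)."""
    repetitive_bp = genome_bp * fraction
    n_elements = repetitive_bp / repeat_len_bp
    # Each family contributes `reiteration` elements on average.
    n_families = n_elements / reiteration
    return repetitive_bp, n_elements, n_families


if __name__ == "__main__":
    bp, elements, families = repetitive_estimates(
        GENOME_BP, REPETITIVE_FRACTION, MEAN_REPEAT_LENGTH_BP, MEAN_REITERATION
    )
    print(f"repetitive DNA: {bp:.2e} bp")
    print(f"repeat elements: {elements:.0f}")
    print(f"repeat families: {families:.0f}")
```

Dividing the number of elements by the reiteration frequency gives a few tens of families, the same order of magnitude as the estimate of about 40 obtained from the rounded figures.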
When both the average length of the repeating element and the average degree of repetition are known, it is possible to calculate the number of different repetitive families, which is thus about 3000/70 ≈ 40. Analysis of the distribution in the genome of the individual members of each family may become possible with the technique developed by Wensink et al. (1974). In principle this comprises the construction of hybrid plasmids, each of which possesses the sequences of a single chromosome fragment from Drosophila. After growth in E. coli and recovery of plasmid DNA, the Drosophila sequences present in the hybrids can be characterized by their kinetics of renaturation and localized by in situ hybridization. Of the three sequences so far analyzed, two are present in a single band at low or no repetition, and one is repeated about 90 times in the genome and is present in 15 bands as well as in the heterochromatin of the chromocenter. Extending this analysis may allow the organization of related repetitive sequences to be determined.

The questions of the extent to which each class of component of the eucaryotic genome is represented in nuclear and cytoplasmic RNA, and of what functions their transcripts exercise, can in principle be resolved by analysis of the sequences present in hnRNA and mRNA populations. The sequence comprising a primary transcript may be taken to represent the unit of transcription (although this need not necessarily include all the control elements present in DNA); although primary transcripts cannot be distinguished from possible intermediate precursors in the hnRNA population (Lewin, 1975), analysis of the sequence components present in the nuclear precursors to messenger RNA should reveal the organization of structural gene sequences in the genome. The sequences of the messenger templates in the polysomes define the unit of translation and the regions of the messengers that code for proteins directly represent the structural genes. What may be the role of any sequences present in a precursor hnRNA but not represented in its mature messenger is not revealed by such analysis, but an obvious speculation is that at least some may provide elements that control the stages of gene expression between transcription and translation.

Sequence Components Represented in Messenger RNA

In only a small number of cases has it been possible to isolate specific messenger templates whose protein products can be identified. It is therefore difficult to arrive at any firm conclusions on the structure of eucaryotic mRNA.
But although obviously incomplete, the data of Table 2 suggest two speculations: messengers are not long enough to code for more than one protein and each must therefore have a single, monocistronic coding region; and, even allowing for the length of poly(A) that may be present at the 3' terminus, each messenger is longer than needed simply to code for its protein, so the molecule must include sequences additional to the coding region. Some tentative general support for these conclusions is lent by comparison of the distributions in HeLa cells of protein and messenger sizes, which appear fairly closely parallel; and other data supporting this view of messenger structure have been presented by Davidson and Britten (1973).

When a single messenger can be isolated, it is possible to investigate the frequency with which its sequence is represented in the haploid genome. Although the limitations of hybridization reactions mean that the data usually are not precise enough to define the exact number of genes representing a protein, Table 3 shows that the number is small; it is probable that globin, ovalbumin, and fibroin proteins each are coded by single genes, a concept pleasingly in agreement with what genetic evidence is available in eucaryotes (for review see Lewin, 1974a,c). Histone messengers clearly are derived from genes present in repetitive DNA, a duplication perhaps necessary because of the large amount of histone present in the genetic apparatus that must be synthesized in each cell cycle. No doubt other genes of this nature will be identified (the ribosomal protein genes, for example, are good candidates), but it seems increasingly probable that repetition of structural genes represents an exception to a general rule that most genes exist in only one copy in the haploid genome.

As the final intermediate between structural gene and protein, the messenger RNA of the polysomes defines the genes that are being expressed. The reiteration frequencies of the genes from which mRNA is transcribed can be determined by following the hybridization of labeled mRNA (or the cDNA derived from it by reverse transcription) in the presence of a large excess of unlabeled cellular DNA.
Table 2. Lengths of Isolated Messenger RNAs

Cell                            Protein             Coding      mRNA Length       Poly(A)       Reference
                                                    Length                        Length
Rabbit red blood cell           globin              430         550, 610, 650     40            Labrie (1969); Hunt (1973);
                                                                                                Gould and Hamlyn (1973);
                                                                                                Gaskill and Kabat (1971)
Mouse red blood cell            globin                                            40, 60, 100   Gorski et al. (1974)
Duck red blood cell             globin                                            100           Bishop, Rosbash, and Evans (1974)
Mouse myeloma                   light Ig            660         1200, 1250, 1300  200           Stavnezer et al. (1974);
                                                                                                Brownlee et al. (1973);
                                                                                                Honjo et al. (1974)
Mouse myeloma                   heavy Ig            1350        1800              150-200       Stevens and Williamson (1973)
Chick oviduct                   ovalbumin           1164        1670, 2640        not known     Haines, Carey, and Palmiter (1974)
Calf lens                       αA2-crystallin      520         1460              200           Berns, Janssen, and Bloemendaal (1974)
Calf lens                       δ-crystallin        1260        2000              not known     Zelenka and Piatigorsky (1974)
Bombyx mori silk gland          fibroin             14,000      16,000            100           Lizardi, Williamson, and Brown (1975)
Lytechinus pictus (sea urchin)  histone f2a1        310         370-400           none          Grunstein et al. (1973)
HeLa, by number of molecules    50% of total        <1000       <1400             150-200       Davidson and Britten (1973)
                                25% of total        1000-2000   1400-3000         150-200
                                25% of total        >2000       >3000             150-200
HeLa, by mass                   50% of total        <2100       <2200             150-200
                                50% of total        >2100       >2200             150-200
Coding lengths are the number of nucleotides required to specify each protein, estimated from its number of amino acids or molecular weight. The lengths of the mRNAs are those determined experimentally; where more than one value is shown, each represents an independent determination. The length of poly(A) on a messenger is not constant but declines with age; thus the apparent length depends on whether determined by steady state or pulse labeling and this explains the variation in measured globin mRNA poly(A) lengths. Not known indicates that poly(A) is present but its length has not been determined. The distribution of HeLa protein and messenger sizes is only approximate; an estimate of the number of molecules in each size class suggests a median coding length in mRNA of about 1200 nucleotides [1400 less the poly(A) content] and an estimate of the mass of protein or mRNA in each size class suggests a number-average molecular weight for the coding length of 2000 [that is 2200 less the poly(A) content]. The uses of median and number-average molecular weights are discussed in Table 6.
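The second speculation drawn from Table 2, that each messenger is longer than its coding region plus poly(A), can be checked by simple subtraction. The Python sketch below is illustrative only: it takes the shortest reported mRNA length and the longest reported poly(A) length for each messenger where Table 2 gives all three values, so that the result is a conservative minimum for the additional noncoding sequence. The dictionary and function names are hypothetical.

```python
# (coding length, shortest reported mRNA length, longest reported poly(A) length),
# all in nucleotides, taken from Table 2. Only messengers for which all three
# values are reported are included.
TABLE_2 = {
    "rabbit globin": (430, 550, 40),
    "mouse light Ig": (660, 1200, 200),
    "mouse heavy Ig": (1350, 1800, 200),
    "calf alpha-A2-crystallin": (520, 1460, 200),
    "silk fibroin": (14000, 16000, 100),
    "sea urchin histone f2a1": (310, 370, 0),
}


def min_noncoding_length(coding, mrna, poly_a):
    """Minimum length of mRNA that is neither coding sequence nor poly(A)."""
    return mrna - coding - poly_a


if __name__ == "__main__":
    for name, (coding, mrna, poly_a) in TABLE_2.items():
        extra = min_noncoding_length(coding, mrna, poly_a)
        print(f"{name}: at least {extra} additional nucleotides")
```

Even on these conservative choices every messenger retains some sequence beyond the coding region and poly(A), from a few tens of nucleotides for the histone messenger to more than a thousand for fibroin.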
If the excess of DNA is sufficiently large, usually with 10^2 times more copies of a given sequence in DNA than in RNA, hybridization of the RNA depends only on the frequency with which the sequence is represented in DNA, that is, corresponds to the cot applied to the DNA. As Table 4 demonstrates, such experiments indicate that most, in some species virtually all, of the mRNA hybridizes at high cot values close to those characteristic of nonrepetitive DNA; this confirms the impression that most structural genes reside in the nonrepetitive DNA component. A smaller proportion of the mRNA sequences may represent transcripts of the repetitive fraction; it is then necessary to determine whether the repetitive transcripts comprise a separate population of molecules or represent sequences that are covalently linked to the nonrepetitive transcripts.

The messengers of mammalian cells appear to be derived largely, but not entirely, from the nonrepetitive component of the genome. Estimates vary for the proportion of RNA molecules that represents transcripts of repetitive DNA sequences. It is important to note that in many of these experiments the mRNA was purified by a step such as retention on oligo(dT)-cellulose and therefore represents only poly(A)-containing molecules; such experiments yield no information about the reiteration frequency of genes represented in any fraction of poly(A)- messengers. Since, however, those experiments
which have been performed with total messenger fractions have given results very similar to those with poly(A)-containing messengers, there is no reason to suppose that the general properties of any poly(A)- mRNA population should be dissimilar from those of the poly(A)+ population.

In both HeLa cells and rat myoblasts, the minor fraction of mRNA that hybridizes at low cot values appears to represent a population of molecules separate from the nonrepetitive sequence transcripts. That is, these messengers represent transcripts of genes residing in the repetitive component of the genome. That the mRNA hybridizing at low cot in the experiments of Klein et al. (1974) is derived from repetitive sequences and does not represent transcripts present in great abundance (which would render the excess of DNA insufficient) was confirmed by the effect of increasing the DNA:RNA ratio from 5 x 10^3 to 5 x 10^4; as Table 4 shows, this has only a small effect in increasing the amount of RNA hybridized at low cot. That the repetitive and nonrepetitive sequences in HeLa cell poly(A)-containing mRNA represent separate populations, and are not covalently linked, is suggested by the effect of treating the hybrid duplexes with RNAase before they are assayed. This treatment does not reduce the amount of RNA in the repetitive fraction, showing that the hybrids contain no attached single strand tails of nonrepetitive transcripts; they must consist solely of sequences derived from repetitive DNA.

Table 3. Reiteration Frequencies of Genes for Isolated Messengers

Cell                     Messenger  Reaction Components  Excess       cDNA    Reaction Parameter                Gene Number       Reference
                                                         Ratio        Length
Duck erythroblast        globin     excess cDNA x DNA    varying      249     analytical complexity = 633       2-3 for α + β     Bishop and Freeman (1973)
                                                                              physical complexity = 249
Duck erythroblast        globin     cDNA x excess DNA    1.2 x 10^5   450     cot1/2 = 200                      2-3 α             Bishop and Rosbash (1973)
                                                                              nonrepetitive cot1/2 = 500        2-3 β
Mouse reticulocyte       globin     cDNA x excess DNA    ~10^7                cot1/2 = 800                      1 α               Harrison et al. (1974)
                                                                              nonrepetitive cot1/2 same         1 β
Chick oviduct            ovalbumin  cDNA x excess DNA    1.5 x 10^7   330     cot1/2 = 480                      1                 Harris et al. (1973)
                                                                              nonrepetitive cot1/2 = 660
Bombyx mori silk gland   fibroin    excess mRNA x DNA                         saturation at 0.0022% DNA         <3                Suzuki, Gage, and Brown (1972)
Psammechinus miliaris    histones   mRNA x excess DNA    10^5                 cot1/2 = 2.5                      1200 for each     Weinberg et al. (1972)
(sea urchin)                                                                  E. coli cot1/2 = 15.9,            histone
                                                                              DNA 200 times less complex

The globin mRNA used to prepare cDNA in these experiments included both α and β sequences, so that comparison of its analytical complexity with physical length gives the number of genes coding for both proteins; measurements of the cot1/2 of reaction give the number of genes coding for each protein. There is probably one gene for each globin chain; in addition to the α and β chains, other globin genes are present in the eucaryotic genome and it is not known how much cross reaction may take place between the α and β sequences and these other genes. Although cDNA represents only the 3' region of the messenger sequence, the nonrepetitive nature of globin mRNA sequences means that the reverse transcript should react only with the DNA to which its messenger corresponds.

By treating poly(A)-containing messengers of rat myoblasts with alkali, Campo and Bishop (1974) were able to ask whether repetitive sequence transcripts are restricted to any particular region of the molecule, for the 3' ends of the broken molecules contain poly(A) and so can be separated by their retention on oligo(dT)-cellulose. The hybridization profiles of the original preparation and of preparations derived from molecules cleaved an average of one or three times with alkali were identical. Repetitive sequence transcripts thus cannot be clustered at either end of messenger molecules whose remaining sequences are nonrepetitive.

Alternative explanations of this result are that repetitive and nonrepetitive sequence transcripts form separate populations of mRNA or that they form a single population in which the messengers contain regularly interspersed sequences of each class. These were distinguished by experiments utilizing a preparation of unlabeled repetitive DNA which was denatured and linked to sepharose. Some 13-18% of a labeled mRNA population binds to this DNA preparation. When recovered, it shows less than one break in each molecule and displays a single transition at a low cot1/2 when hybridized in DNA excess. The mRNA which does not bind to the DNA-sepharose also displays a single transition when hybridized in excess DNA, but at a high cot1/2, close to that typical of nonrepetitive DNA. The messenger population can thus be separated into two fractions, one including repetitive sequence transcripts and one representing nonrepetitive sequence transcripts.

More accurate estimates of the reiteration frequencies of the genes represented in the two populations can be made from experiments utilizing cDNA. These reverse transcripts, about 500 bases in length, represent the 3' terminal parts of the messengers.
About 15% of the cDNA hybridizes at low cot; and a further 55% reacts at high cot. When the cDNA preparation is separated into fractions that bind or fail to bind to DNA-sepharose, 10% falls into the repetitive class and 90% does not bind. When hybridized in excess DNA, the bound fraction shows a single transition with a low cot1/2; and almost all this fraction is bound at completion. The unbound fraction anneals in DNA excess with the kinetics characteristic of nonrepetitive DNA, about 70% forming hybrids at the highest cot used (that is about 60% of the total cDNA preparation). These results imply also that the repetitive and nonrepetitive sequences present in 3' terminal regions of mRNA are generally representative of the population; however, the repetitive mRNA component has a cot1/2 (about 10) slightly lower than that of cDNA (about 20), although the RNA-DNA hybridization should proceed less rapidly than the DNA-DNA renaturation, giving the mRNA the higher cot1/2. This may be due to some preferential copying of moderately repetitive transcripts into cDNA (Bishop, personal communication).

The messengers isolated from the polysomes of sea urchin gastrulae by Goldberg et al. (1973) constitute an extreme case where all the sequences appear to be derived from the nonrepetitive component of the genome. This conclusion has been confirmed by an experiment in which Galau, Britten, and Davidson (1974) isolated the DNA that hybridizes with mRNA and showed that its renaturation kinetics are those of nonrepetitive DNA. All the genes expressed at this stage of embryonic development must therefore be present in only one copy in the haploid genome. Although repetitive sequences lie adjacent to nonrepetitive sequences in the sea urchin genome, they therefore fail to be represented in the messenger population, a situation that must prevail also in the HeLa cell and rat myoblast, where nonrepetitive sequence transcripts comprise the major part of the messenger population.

An alternative model for messenger RNA, which postulates a structure in which most of the molecule is derived from nonrepetitive DNA but a short sequence at the 5' end represents a repetitive sequence, has been proposed in two systems. Crippa, Meza, and Dona (1973) reported that repetitive "tags" of 50-60 nucleotides are present at the 5' ends of Xenopus neurula messengers whose remaining sequences are nonrepetitive in origin; and by comparing the lengths of nuclear precursors and cytoplasmic messengers in growing cells of Dictyostelium discoideum, Lodish, Firtel, and Jacobson (1973) formulated a model in which most but not all of a 5' terminal repetitive sequence is lost during maturation.
These data do not permit an unambiguous conclusion to be drawn that repetitive sequences always lie at the 5' ends of messengers of nonrepetitive sequences, however, and this model can be excluded for the sea urchin and HeLa messengers, where repetitive tags of more than 20-50 nucleotides would have been detected under the conditions used for hydroxyapatite retention.

Extent and Rate of Reaction in RNA-Driven Hybridization

Calculation of the proportion of the messenger population that is derived from each genome component is difficult because hybridization reactions in DNA excess do not proceed to completion for the RNA component; one problem in driving the reaction is that it is not always possible to obtain RNA highly labeled enough to allow a sufficient excess of DNA to be attained. Table 4 shows that in most experiments only 40-50% of the input RNA is able to hybridize. In experiments utilizing cDNA, up to 70-80% of the labeled minor component may be able to react.

Another problem in interpreting these data is that the intrinsic rate of hybridization of RNA with DNA may be less than the rate of renaturation; since estimates vary for the extent to which hybridization is retarded relative to renaturation, no single correction factor has been calculated that can be applied to these results. Apparent cot1/2 values for transitions in RNA-DNA hybridization may therefore be in error by a factor which depends upon the rate of retardation, apparently very small in the experiments of Hutton and Wetmur (1973), about two fold in those of Melli et al. (1971), and about 3-5 fold in those of Davidson et al. (1975). From the retardation of the reaction between sea urchin mRNA and the nonrepetitive DNA from which it is entirely derived, Davidson et al. (1975) calculated that the reaction does not follow the second-order kinetics previously attributed to it; the equation which they fitted to the data suggests that the extent of reaction at termination is greater than that produced by the second-order curve, for example about 50% instead of close to 40% in the first sea urchin experiment shown in Table 4. Both the proportion of each sequence component represented in RNA and the cot1/2 that characterizes its transition are therefore only rather approximate, as noted in Table 4. The proportions given in this table represent original results and
Table 4 . Reiteration Frequencies of Sequences Represented in Messenger RNA Populations
Cell
Source of mRNA
HeLa
polysomal poly(A)+
Nonrepetitive (%)
Repetitive
% Reaction RNA
DNA
DNA excess
(%)
Frequency
25
15
103
40
90
>105
polysomal poly(A)+
25
6
103
31
81
5 x 103
polysomal poly(A)+
29
7
103
36
81
5 x 104
cDNA 500 bases long
>70
10
103
>80
90
>105
polysomal poly(A)+ polysomal poly(A)+ cytoplasmic poly(A)+ actinomycin label cDNA 500 bases long
25
15
103
40
90
>105
32
18
103
50
80-90 3-5 x 104
60
10
102
70
80-90 1 .5 x 106
Mouse L cell
polysomal
33-55
7-15
104
40-70
100
Aedes culture
polysomal poly(A)+
45
8
102
53
85
>105
polysomal puromycin polysomal puromycin
38 45
none (<3%) none (<3%)
38 45
95 95
1-2 x 104 106
polysomal puromycin
65
none
65
95
2-8 X 10 4
cytoplasmic po ly(A)+
63
10
73
95
Rat myoblast
S . purpuratus gastrula
Dictyostelium
102
200-600
Reference Spradling et al . (1974) Klein et al . (1974) Klein et al . (1974) Bishop et al . (1974) Spradling et al. (1974) Campo and Bishop (1974) Campo and Bishop (1974) Greenberg and Perry (1971) Spradling et al, (1974) Goldberg et al, (1973) Goldberg et al . (1973) Davidson et al . (1975) Lodish et al (1973)
These experiments were performed by hybridization of labeled mRNA, or cDNA derived from it, with an excess of DNA. The source of the mRNA indicates whether the preparation was obtained from isolated polysomes or unfractionated cytoplasm, whether it was purified by reaction of poly(A) with oligo(dT) or by puromycin release from polysomes, or whether labeling was accomplished in the presence of low levels of actinomycin. Preparations of cDNA represent only the 3' parts of mRNAs for the length shown. The cot1/2 values determined for many of these transitions are only approximate because of the incomplete nature of the reaction. The nonrepetitive messenger component hybridizes at a cot1/2 close to that shown during the renaturation of nonrepetitive DNA; the repetitive messenger component hybridizes more rapidly and a very approximate reiteration frequency can be calculated from its cot1/2. Because of the uncertainties in determination of these cot1/2 values, these reiteration frequencies do no more than tentatively identify an order of magnitude for the average repetition frequency. The RNA and DNA columns under % reaction indicate the total extent of hybridization and renaturation, respectively; since the hybridization data have not been corrected for the incomplete renaturation of DNA (except where 100 is shown in the % DNA reaction), the nonrepetitive sequences represented in mRNA are underestimated by 10-20%. In all these experiments except the analysis of mouse L cell RNA, the extent of hybridization was judged from the resistance to ribonuclease of hybridized mRNA; Tm data were used to judge the authenticity of L cell RNA-DNA hybrids. DNA excess is given w/w in all tables; it is much less in terms of reacting sequences.

have not been normalized, since the assumptions made in different calculations may make results so treated inconsistent with each other. The transition of the slowly reacting component usually takes place at a high cot1/2 value close to that of nonrepetitive DNA; the simplest interpretation is to suppose that these transcripts are derived from sequences present in only one copy in each haploid genome, although the data usually do not exclude the possibility that these sequences may be repeated a small number of times. The transition of the more rapidly reacting component is usually characterized by an apparent cot1/2 value two or more orders of magnitude lower; the frequencies of repetition for this component which have been calculated in the table are therefore very approximate and intended only to give some indication of an order of magnitude. Because DNA renaturation does not proceed to completion in these experiments, in most cases the content of nonrepetitive transcripts is underestimated (by 10-20%). After allowing for this correction, two views may be taken of the nature of the sequences of RNA that fail to hybridize. One is to suppose that all the repetitive sequence transcripts succeed in hybridizing with DNA but that only some of the nonrepetitive transcripts are able to react. According to this interpretation, all the unhybridized molecules are transcripts of nonrepetitive sequences. The argument that these sequences may be present in great abundance in the messenger population (more than 5000 copies each per cell), so that the excess of DNA is insufficient to ensure availability of sequences with which they can react, has been advanced by Goldberg et al. (1973) and Klein et al. (1974).
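
The quantities discussed above can be illustrated with a minimal sketch. It assumes ideal second-order kinetics (which, as noted, RNA-DNA hybridization may not follow exactly), takes the reiteration frequency as the ratio of the single-copy cot1/2 to the cot1/2 of the faster component, and assumes that the correction for incomplete DNA renaturation is a simple division by the fraction of DNA that renatures. The function names and the cot1/2 values are hypothetical; only the 25% and 90% figures are taken from Table 4.

```python
def fraction_reacted(cot, cot_half):
    """Fraction in duplex form at a given cot for an ideal second-order reaction."""
    return cot / (cot + cot_half)


def reiteration_frequency(cot_half_single_copy, cot_half_component):
    """Approximate repetition frequency of a component relative to single-copy DNA."""
    return cot_half_single_copy / cot_half_component


def correct_for_incomplete_renaturation(percent_rna, percent_dna_reaction):
    """Scale an observed % hybridization by the extent of DNA renaturation (assumed form)."""
    return percent_rna * 100.0 / percent_dna_reaction


# Hypothetical cot1/2 values three orders of magnitude apart.
frequency = reiteration_frequency(1000.0, 1.0)

# Nonrepetitive component of 25% with 90% DNA renaturation (first HeLa row of Table 4).
corrected = correct_for_incomplete_renaturation(25.0, 90.0)

print(frequency, round(corrected, 1))
```

With DNA renaturation of 81-90%, this assumed correction raises the nonrepetitive estimate by roughly 10-20%, in line with the underestimate quoted in the text.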
As Table 4 shows, at a DNA excess of 1-2 x 10^4 fold, the maximum amount of sea urchin mRNA driven into hybrid form is about 38%; that at least some of the unhybridized molecules fail to react because they are present in high concentrations is suggested by the effect of increasing the DNA:RNA ratio to 10^5-10^6, when reaction advances to 45% of input RNA. Similar results are shown for HeLa cell mRNA at excess ratios of 5 x 10^3 and 5 x 10^4. Klein et al. (1974) have observed also that retrieving the unhybridized RNA and reannealing with a fresh excess of DNA does not allow further sequences to react, suggesting that there are no unreacted repetitive sequence transcripts in the unhybridized fraction. According to this interpretation, the proportion of input RNA annealing at low cot represents the true extent of transcription of repetitive sequences; the remaining molecules are derived entirely from nonrepetitive DNA.

An alternative explanation has been proposed by Spradling et al. (1974), who observed that reactions with a clear excess of DNA, such as that of ribosomal RNA, may fail to proceed to completion; they noted also that large changes in the amount of excess DNA in their experiments did not drive an increased proportion of the mRNA into hybrid form. The idea that the unhybridized RNA includes both repetitive and nonrepetitive sequences, because some artifact of the system equally prevents their reactions, implies that the true extent of repetitive sequence transcription should be represented by the proportion of total hybridizing RNA that reacts at low cot. This would mean that from 20-30% of HeLa or rat myoblast messengers would be repetitive, rather than the lower values relative to total input RNA given in the table. But whatever form of calculation is used, it is clear that most eucaryotic messengers are derived solely from nonrepetitive sequences, with a smaller number transcribed from genes that must be repeated in DNA.

Sequences Represented in hnRNA Populations

If the population of hnRNA molecules contains the precursors to mRNA, all the sequences found in cytoplasmic messengers should be present in nuclear RNA. In view of the discrepancy in size between hnRNA and mRNA, and given the extensive turnover of RNA in the nucleus, it is important to determine what proportion of the sequences represented in hnRNA is found in mRNA and what fraction may be restricted to the nucleus. The early experiments of Shearer and McCarthy (1967, 1970) with L cells and of Scherrer et al. (1970) with HeLa cells, in which an excess of RNA was annealed with filter-bound DNA to reach a plateau of hybridization, achieved about five fold greater retention of nuclear RNA than cytoplasmic RNA.
Because hybridization on membrane filters does not reach cot values great enough to allow reaction of nonrepetitive sequences, and given the difficulties of interpreting levels of hybridization with repetitive sequences, these experiments allow only the qualitative conclusion that only some of the repetitive sequences transcribed in the nucleus manage to enter the cytoplasm. Saturation of nonrepetitive DNA with total cell RNA under conditions allowing more extended reaction generally reaches levels of about 5%; since transcription in the cell is presumably asymmetric, this corresponds to utilization of about 10% of the genome. Saturation with isolated messenger RNA, however, usually reaches little more than 1%, corresponding to about 2% expression of the genome. (A summary of these data has been given by Davidson and Britten, 1973.) More recent experiments allow the sequence complexities of hnRNA and mRNA to be analyzed in more detail and suggest that hnRNA is about 10 fold more complex than
Table 5. Reiteration Frequencies of Sequences Represented in hnRNA Populations

Cell | Source of labeled hnRNA | Nonrepetitive (%) | Repetitive (%) | Repetitive frequency | % Reaction RNA | % Reaction DNA | DNA excess | Reference
HeLa | actinomycin | 25 | 15 | 10^3 | 40 | 90 | >10^5 | Spradling et al. (1974)
Rat myoblast | actinomycin, >50S | 25 | 15 | 10^3 | 40 | 90 | >10^5 | Spradling et al. (1974)
Rat ascites | >80S | 30 | 25 | 10^2 | 55 | 90 | 4 x 10^5 | Melli et al. (1971)
Mouse L cell | actinomycin | 27 | 13 | 10^4 | 40 | 100 | 300 | Greenberg and Perry (1971)
Aedes | actinomycin | 38 | 15 | 10^2 | 53 | 85 | >10^5 | Spradling et al. (1974)
S. purpuratus | pulse label, >40S | 19 | 8 | 10^2 | 27 | 90 | 300 | Smith et al. (1974)
S. purpuratus | pulse label, >40S | 25 | 8 | 10^2 | 33 | 90 | 1.5 x 10^4 | Smith et al. (1974)

These experiments were performed by hybridization of labeled hnRNA with the excess of DNA shown. The source of the hnRNA indicates whether specific labeling was achieved by suppression of rRNA synthesis with actinomycin or by use of a pulse label and whether a specific size class of hnRNA was utilized. Other conditions are the same as described in Table 4; the transition on the cot curve representing repetitive sequences is again not well characterized so that estimates of the amount of this component and its reiteration frequency are only very approximate.

mRNA in sea urchin gastrulae (Smith et al., 1974) and about four fold more complex in L cells (Hames and Perry, personal communication). The difference presumably represents sequences that are transcribed within the nucleus and turn over there without ever reaching the cytoplasm (for review see Davidson and Britten, 1973; Lewin, 1974c). The implication of this discrepancy is that selection of sequences for transport from nucleus to cytoplasm may be important in the control of gene expression; and it becomes of obvious import to identify the sequences restricted to the nucleus and determine whether they represent protein-coding sequences that are not utilized or fulfill some altogether different function.

Both repetitive and nonrepetitive sequence transcripts are found in the population of hnRNA molecules. As Table 5 reports, some 15% of the input hnRNA usually hybridizes at low cot values, in general corresponding to a frequency of reiteration rather similar to that represented in the repetitive component of messenger RNA. That a wide spectrum of repetitive sequences is represented in HeLa hnRNA is indicated by the observations of Darnell et al. (1970) and Pagoulatos and Darnell (1970) that in reactions with filter-bound DNA, components can be identified which have rates of reaction varying over two orders of magnitude. Comparison between Table 4 and Table 5 suggests that the repetitive content of hnRNA is never less than that of mRNA and probably usually is greater. Since the hnRNA population contains many more sequences than are present in mRNA, nonrepetitive as well as repetitive sequence transcripts must be included in the fraction that turns over within the nucleus.

The location of sequences in hnRNA can be determined by isolating 3' terminal fragments of various sizes and comparing their contents of some marker. The technique used by Molloy et al. (1974) was to cleave the poly(A)-containing hnRNA of
HeLa cells randomly by alkaline hydrolysis; the poly(A)-containing fragments released by this treatment represent the 3' parts of the broken molecules. Small 3' terminal fragments, of about 3000 nucleotides, show the same initial rate of hybridization with filter-bound DNA as that displayed by mRNA; longer fragments hybridize more rapidly, close to the initial rate characteristic of intact hnRNA. This is consistent with a model for hnRNA in which repetitive sequences are rare in the potential messenger sequences at the 3' ends, but are interspersed with nonrepetitive sequences in the parts of the molecule on the 5' side of the messenger sequence. Some of these repetitive sequences may form regions of secondary structure, for Jelinek and Darnell (1972) and Jelinek et al. (1974) showed that duplex fragments recovered after ribonuclease treatment behave as repetitive sequence transcripts.

In the sea urchin gastrula embryo, in which messenger RNA is virtually entirely derived from the nonrepetitive component of the genome, most of the hnRNA molecules contain a repetitive sequence. After hybridizing labeled hnRNA to a cot of 40 in excess DNA, Smith et al. (1974) isolated the duplex hybrids on hydroxyapatite. Some 23% of the hnRNA was retained. After treatment with ribonuclease, however, only some 8% of the hnRNA remained in hybrid form; only these sequences therefore represent RNA-DNA duplex regions and the remaining sequences that are retained in the absence of RNAase treatment must represent single strands that remain attached to the duplex hybrid regions. This conclusion was confirmed by recovering the RNA bound to DNA at a cot of 13; in a rehybridization, all this RNA bound again at low cot and extension of the reaction to a cot of 4000 allowed more than half of the sequences to bind, showing the presence of nonrepetitive sequences.
The size of the hnRNA in these experiments was reduced from an average of 3000 nucleotides before hybridization to 1100 nucleotides after. If the distribution of repetitive sequences in hnRNA were random (an assumption that cannot be verified but which is useful nonetheless for illustrative purposes as a basis of calculation), then the observation that 23% of the 1100 nucleotide molecules contain a repetitive sequence would imply that at least 50% of the 3000 nucleotide molecules must do so. An average of one third of the length of the 1100 nucleotide sequence is derived from repetitive DNA, that is, a total average length per molecule of about 300 bases.

Most of the hnRNA anneals only at high cot values. At a cot of 4000, from 50-70% of the input hnRNA may enter duplex form; a minimum of 70% and perhaps up to 90% of the sequences in hnRNA appear to be transcribed from nonrepetitive sequences. Because increases in DNA:RNA ratios cause a rise in the total level of hybridization similar to that seen with mRNA (as noted in Tables 4 and 5), the sequences of hnRNA also appear to fall into more than one abundance class; a small number of sequences may provide many of the molecules and much of the sequence complexity may be contained in transcripts represented only infrequently in the population. Rough calculations show that the hnRNA sequences are some ten fold more complex than those of mRNA.

The extent to which rat ascites cell hnRNA hybridizes with an excess of DNA after it has been fragmented to different lengths also suggests an interspersion of repetitive and nonrepetitive sequences. Holmes and Bonner (1974) found that about 70% of the hnRNA molecules of the size class 15-30 x 10^3 daltons form hybrids at low cot that are retained on filters when there is no treatment with ribonuclease; the use of ribonuclease, however, reduces retention to 12-24%. Most of these hnRNA molecules must therefore contain at least one repetitive sequence.
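
The illustrative random-distribution calculation for sea urchin hnRNA can be sketched as follows. This is one possible reading, not necessarily the calculation actually performed: it assumes that repetitive elements are scattered along the RNA as a Poisson process, so that the chance of a fragment of length L carrying at least one element is 1 - exp(-d L) for some density d. The function names are hypothetical; the 23%, 8%, 1100 and 3000 figures are those quoted in the text.

```python
import math


def repeat_density(fraction_with_repeat, fragment_length):
    """Repetitive elements per nucleotide implied by the fraction of fragments retained."""
    return -math.log(1.0 - fraction_with_repeat) / fragment_length


def fraction_with_repeat(density, length):
    """Probability that a molecule of the given length carries at least one element."""
    return 1.0 - math.exp(-density * length)


# 23% of the 1100 nucleotide fragments are retained without ribonuclease treatment.
density = repeat_density(0.23, 1100)

# Extrapolate to the 3000 nucleotide molecules present before hybridization.
long_fraction = fraction_with_repeat(density, 3000)

# Share of each retained fragment that survives ribonuclease (8% out of 23%).
duplex_share = 8.0 / 23.0

print(round(long_fraction, 2), round(duplex_share, 2))
```

Under these assumptions roughly half of the 3000 nucleotide molecules would carry a repetitive element, and about one third of each retained fragment is in duplex form, which agrees with the figures given above.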
Shearing the hnRNA preparation to a length of 10 x 10^3 nucleotides produced little reduction in retention in the absence of ribonuclease; and 20-40% of the fragments of length 1200 nucleotides contain a repetitive sequence. This suggests that there may be extensive interspersion of repetitive and nonrepetitive sequences in this mammalian hnRNA.

One point that should be stressed is that the metabolism of hnRNA may not be the same in all eucaryotes. In contrast with the apparently extensive processing of mammalian hnRNA, the primary transcript synthesized at the BR2 locus of Chironomus tentans is transported intact to the cytoplasm; to what extent this may be typical of other loci in this and other Diptera, including Drosophila, is not
known (for review see Daneholt, 1975). Only a comparatively small part of the molecule appears to be removed from nuclear messenger precursors in cells of the slime mold Dictyostelium (Lodish et al., 1973). Whether these observations are related to the smaller sizes of the genomes of these organisms can be a subject only for speculation.

Identification of Messenger Sequences in hnRNA

Since the molecules of hnRNA populations that contain the sequences of the corresponding messenger populations have not yet been isolated, it is not possible directly to compare precursor and product and to determine what sequences in addition to the messenger length may be part of the unit of transcription. By using the cDNA prepared by reverse transcription of mRNA, it is possible to test hnRNA molecules for their content of messenger sequences; but the experiments published at present have been performed only with specific messenger sequences (see Lewin, 1975). Using cDNA prepared from duck globin message, Imaizumi, Diggelmann, and Scherrer (1973) reported that hnRNA molecules of various very large sizes appear to contain globin messenger sequences; but Macnaughton, Freeman, and Bishop (1974) have since identified a somewhat smaller hnRNA precursor, about three times the length of globin mRNA, as the predominant and perhaps sole precursor to mRNA in this system. By testing the ability of hnRNA molecules of chick oviduct to react with cDNA prepared from ovalbumin messenger, McKnight and Schimke (1974) showed that ovalbumin message sequences can be identified only in molecules of the single size class represented in the mRNA itself. Experiments of this nature with populations of hnRNA and cDNA prepared from polysomal mRNA, although technically more difficult, offer an approach that may allow some of the difficult questions about the relationship of hnRNA and mRNA to be answered.
Another method for isolating sequences complementary to mRNA which can be used to probe hnRNA populations for messenger precursors has been developed by Hames and Perry (personal communication). After hybridizing mRNA of mouse L cells with nonrepetitive DNA at relatively high ratios of mRNA to DNA, the hybridized DNA can be treated with S1 nuclease to remove unrenatured single strands and isolated as a sample of "mDNA". The presence of complementary sequences in an RNA population can then be tested by using this mDNA preparation as the excess component in a DNA-driven hybridization. In the control reaction between mDNA and its mRNA template, 53% of the mRNA was hybridized; relative to this level of reaction, different fractions
of nuclear RNA showed hybridization of 24 ± 5% for large (>45S) poly(A)- hnRNA, 27 ± 8% for small (<28S) poly(A)- hnRNA, 27 ± 5% for large poly(A)+ hnRNA, and 56 ± 7% for small poly(A)+ hnRNA. The implication of these experiments is that both large and small classes of hnRNA include messenger sequences; about 25% of the poly(A)- and the large poly(A)+ hnRNA sequences correspond to messengers, but more than half of the small poly(A)+ hnRNA sequences represent messengers found in the cytoplasm. One of the most interesting aspects of these results is that poly(A)- as well as poly(A)+ hnRNA contains sequences of the poly(A)-containing messengers of the polysomes. The small poly(A)+ hnRNA molecules that are enriched in messenger sequences might either represent a class of primary transcripts or comprise intermediates produced by the processing of large molecules.

By hybridizing nonrepetitive DNA with an excess of hnRNA, Hames and Perry found that the nuclear RNA population comprises sequences present in two abundance classes. The major component of the hnRNA, about 90% of the RNA driving the hybridization reaction, saturates 1.4% of the DNA, corresponding to 2.0% expression of the genome; and the minor component, 10% of the driver RNA, saturates 11% of the DNA, corresponding to expression of 15% of the genome. This is analogous to the results obtained when mRNA is used to drive a hybridization reaction with nonrepetitive DNA, which also show the presence of two abundance classes. As noted in Table 6, the major mRNA component consists of a small number of sequences corresponding to 0.55% of the genome; and the minor component contains a larger number of sequences, corresponding to 2.8% of the genome.
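
The conversion from DNA saturation to genome expression used for these figures can be sketched as below. The sketch assumes asymmetric transcription (only one strand is copied, so the saturation value is doubled) and a nonrepetitive fraction of about 0.70 for the mouse genome; that fraction is not stated in the text and is inferred here from the paired values quoted above, so it should be treated as an assumption. The function name is hypothetical.

```python
def genome_expression(percent_dna_saturated, nonrepetitive_fraction=0.70):
    """Percent of the haploid genome expressed, given % of nonrepetitive DNA saturated.

    Doubling allows for asymmetric transcription; the nonrepetitive fraction
    (assumed, not given in the text) rescales to the whole genome.
    """
    return percent_dna_saturated * 2.0 * nonrepetitive_fraction


# Saturation values quoted for the two hnRNA and two mRNA abundance classes of L cells.
for label, saturation in [
    ("hnRNA major class", 1.4),
    ("hnRNA minor class", 11.0),
    ("mRNA major class", 0.39),
    ("mRNA minor class", 2.0),
]:
    print(label, round(genome_expression(saturation), 2))
```

With the assumed fraction the results fall close to the quoted 2.0%, 15%, 0.55% and 2.8%; a somewhat different nonrepetitive fraction would shift all four values in proportion.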
The ratios of the complexities of the corresponding abundance classes of mRNA and hnRNA are 0.55/2.0 = 27% and 2.8/15 = 19%; so that on average the complexity of the messenger RNA population corresponds to about 23% of the complexity of the hnRNA population. This value is close to the proportion of sequences in hnRNA that reacts with the mDNA preparation for all classes except the messenger-enriched small poly(A)+ hnRNA. This lends some confidence to the conclusion that about one quarter of the sequences of hnRNA comprise messenger sequences, although of course it does not reveal their organization in hnRNA and the relationship between messenger and other sequences.

A similar discrepancy between the complexities of hnRNA and mRNA has been observed in Friend mouse cells. Getz et al. (1975) found that when cDNA is transcribed from poly(A)+ hnRNA, only 10% hybridizes with an excess of unlabeled total mouse DNA at low cot; the remaining 90% of the
hybridizing cDNA hybridizes with nonrepetitive DNA. Thus the 3' terminal sequences adjacent to the poly(A) are almost entirely derived from nonrepetitive DNA. When an excess of hnRNA was annealed with this cDNA, two transitions were displayed in the rot curve: about 25% of the RNA has a complexity corresponding to about 0.01% of the genome and is present at an order of about 200 copies per nucleus; about 75% of the RNA represents about 3% of the genome and is present in only about 6 copies per nucleus. This contrasts with the three abundance classes of mRNA reported in these cells by Birnie et al. (1974), whose total sequence complexity is at the most 20% of that of the hnRNA, with the major (77%) component displaying a complexity less than 5% of that of the major hnRNA component.

Organization of Message Sequences in DNA

Identification of the sequences of the genome that represent structural genes can be achieved by hybridizing either messengers or the cDNA derived from them with their complements in DNA. Such experiments indicate that nonrepetitive messenger-coding sequences lie adjacent in the sea urchin genome to repetitive sequences, a conclusion which has obvious implications for the significance of the interspersed organization of repetitive and nonrepetitive sequences observed in several eucaryotic genomes. The nonrepetitive sequences adjacent to repetitive sequences of DNA were isolated by Davidson et al. (1975) by a protocol in which fragments of average length 1800 nucleotides were renatured to low cot and those containing a duplex region (of minimum length 50 base pairs) retained on hydroxyapatite. The unbound fraction represents nonrepetitive sequences; the bound molecules, which take the form of four-tailed duplexes, include nonrepetitive sequences that lie within 1750 bases of the repetitive duplexes responsible for retention.
After repeating this step a second time, the bound fraction was sheared to a length of 450 nucleotides and renatured to low cot to remove the repetitive sequences; repeating this step a second time allows elution from the hydroxyapatite of the repeat-contiguous fraction of DNA, that is, the nonrepetitive sequences that reside close to repetitive sequences. Not all the interspersed nonrepetitive sequences are represented equally in this fraction, since those very close to and those farther from the repetitive sequence are present at a lower frequency than those located at intermediate distances; the range of concentrations, however, varies by less than an order of magnitude. The repeat-contiguous fraction of DNA renatures more rapidly than the bulk nonrepetitive DNA component, consistent with the conclusion that it represents a specific part of the genome. About 70% of the isolated repeat-contiguous DNA renatures as though corresponding to about one third of the total sea urchin nonrepetitive DNA. This constitutes the prevalent sequence complexity; another 2-5% of the fraction renatures more slowly, presumably representing contamination with genome sequences more distant from repetitive elements, but because this is present in much reduced concentration it is the prevalent sequence complexity alone with which messenger RNA should hybridize. About 15% of the repeat-contiguous DNA contains repetitive sequences, but these also are irrelevant to the reaction with mRNA, for the messenger population is derived solely from the nonrepetitive component.

The proportion of labeled mRNA that hybridizes with an excess of the repeat-contiguous DNA is 50 ± 5%; this compares with the 65% that hybridizes with total nonrepetitive DNA (the third sea urchin experiment shown in Table 4). Thus 50/65 = 77% of the hybridizing mRNA is represented in the repeat-contiguous sequences. It is possible to calculate that if structural genes were randomly positioned in nonrepetitive DNA, some 12-22% of the hybridizing RNA would react with the repeat-contiguous fraction; if every expressed structural gene lies adjacent to a repetitive sequence at one end this value becomes 80-100%, and is larger yet if repetitive sequences lie on both sides of the structural genes. Of the genes represented in the sea urchin gastrula mRNA population, most (from 80-100%) must constitute nonrepetitive sequences adjacent to repetitive sequences. These genes are contained in the prevalent sequence complexity of repeat-contiguous DNA, which represents only about one third of the total nonrepetitive DNA. Experiments suggesting a similar conclusion have been performed by Bishop (personal communication) with duck fibroblasts.
By isolating nonrepetitive sequences that lie adjacent to repetitive sequences, a significant enrichment (5-6 fold) is achieved for sequences complementary to messenger RNA. This complements the analysis which Bishop and Freeman (1973) reported for the duck erythroblast. When duck DNA is renatured at low cot values, the duplex molecules (formed with a cot1/2 of 30) contain single strand regions of nonrepetitive DNA. When retrieved and hybridized with cDNA, globin coding sequences are found in this fraction. Since the cDNA represents the 3' end of the messenger, this suggests that repetitive sequence elements (belonging to a family repeated 10-20 times) lie beyond the messenger sequence in the genome. A hint that a similar situation may prevail in the mouse is provided by the observation of Harrison et al. (1974) that DNA must be fragmented to less than 120 nucleotides before globin cDNA
anneals with the cot1/2 expected for nonrepetitive sequences; with longer fragment lengths, more cDNA enters duplex molecules at much lower cot values, possibly because adjacent repetitive sequences promote early renaturation.

Sequence Complexity of Messenger Populations

When RNA is present in great excess and DNA is the minor labeled component, a hybridization reaction depends only upon the diversity of sequences in RNA; RNA-driven reactions may therefore be used to reveal the number of different sequences present in a messenger population, that is, the number of genes which are represented. Both the saturation level (Table 6) and kinetics of hybridization (Table 7) have been used to follow the reaction. Saturation experiments are usually performed by hybridizing an excess of mRNA with nonrepetitive DNA; when saturation has been reached, the DNA taken into hybrid form should correspond to the genes under translation. Reaction of excess mRNA with labeled cDNA can be used to compare the rot1/2 of reaction with that of a standard RNA of known complexity; this gives the analytical complexity (that is, the total length of unique sequences) of the RNA population, and dividing this value by the physical length of the molecules gives the number of different sequences present. A limitation inherent in RNA-driven hybridization is that only RNA sequences derived from the nonrepetitive component of the genome can be assayed; transcripts derived from repetitive DNA may anneal with many sequences in addition to those from which they were transcribed, making it impossible to estimate the number which represents them in the genome.

Since the messenger population of sea urchin gastrulae is derived exclusively (within experimental limitations) from the nonrepetitive component of the genome, an accurate estimate can be made of its complexity through RNA-driven hybridization. Galau et al. (1974) followed the reaction of a 3-40 fold excess of mRNA with labeled nonrepetitive DNA.
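
The kinetic route from rot1/2 to a number of sequences can be sketched as follows. The sketch assumes that rot1/2 scales linearly with complexity, and that the observed rot1/2 of a class making up only a fraction of the RNA is inflated by the reciprocal of that fraction; both are standard simplifications and may not match the original calculations in detail. The function names are hypothetical. The numerical inputs are the HeLa transitions and the globin standard quoted later in this review (rot1/2 of 6 x 10^-4 for 1200 nucleotides, mean messenger length 2000 nucleotides).

```python
def class_complexity(rot_half, fraction_of_rna, std_rot_half, std_complexity):
    """Complexity (nucleotides) of one abundance class, calibrated against a standard.

    rot_half * fraction_of_rna is the rot1/2 the class would show if it were pure.
    """
    return std_complexity * rot_half * fraction_of_rna / std_rot_half


def sequence_number(complexity, mean_length):
    """Number of different sequences of the given mean length in a class."""
    return complexity / mean_length


GLOBIN_ROT_HALF = 6e-4      # rot1/2 of the globin cDNA x mRNA control
GLOBIN_COMPLEXITY = 1200    # nucleotides
MEAN_MRNA_LENGTH = 2000     # nucleotides, number-average

# (fraction of hybridizing cDNA, observed rot1/2) for the three HeLa transitions.
hela_transitions = [(0.22, 0.07), (0.28, 1.2), (0.50, 60.0)]

numbers = [
    sequence_number(
        class_complexity(rot_half, fraction, GLOBIN_ROT_HALF, GLOBIN_COMPLEXITY),
        MEAN_MRNA_LENGTH,
    )
    for fraction, rot_half in hela_transitions
]
print([round(n) for n in numbers])
```

Under these assumptions the three classes come out at roughly 15, 340 and 30,000 sequences, which is close to the values quoted for HeLa cells.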
Hybridization has a rot1/2 of about 70 mol sec/l and is complete by a rot value of 300 mol sec/l. That the reaction shown in Table 6 had proceeded to completion was indicated by an experiment in which unhybridized DNA was recovered and challenged with a fresh excess of RNA; virtually none reacted. Allowing for incomplete renaturation of DNA under these conditions, 1.35% of the nonrepetitive DNA is complementary to mRNA. Assuming that transcription is asymmetric in the cell, this represents 2.7% of the nonrepetitive component of the genome or 2.0% of the total genome sequences. This gives a total sequence complexity for the mRNA of 1.6 x 10^7 nucleotides. (Since only 1% of
the labeled DNA hybridizes with the RNA, the true RNA:DNA excess ratio in these experiments is 300-4000.) Taking the number-average length of mRNA to be 2000 nucleotides, the number of genes represented in the polysomes must be about 8000 (± 20%). The median molecular weight of 1200 nucleotides used by Galau et al. (1974) in this calculation yields a gene number of 14,000. (The difference between the number-average and median is discussed in Table 6; I have used the number-average to calculate gene numbers.) Some of the messengers may have been synthesized in the oocyte prior to fertilization and stored for later use, others may be newly synthesized, but the importance of the estimate of messenger complexity is that it represents the number of genes that are translated into protein at the gastrula stage of embryogenesis.

Most of this mRNA population appears to consist of a small number of sequences represented a large number of times. The rot1/2 expected of an RNA population of 1.6 x 10^7 nucleotides is 5.6; and the
ratio of this value to the observed rot,/2 gives the proportion of mRNA driving the reaction, 8 ± 4% . Since the kinetics of hybridization show only a single transition, this minor fraction of the messenger population must contain most of its sequence complexity ; because no transition is observed at lower rot values, the remaining 92% of the mRNA molecules must contain a relatively small number of sequences, with a complexity less than 5-10% of that observed for the driver (8%) population, each sequence represented many times . The total amount of messenger RNA in the gastrula embryo is about 7 x 10 10 nucleotides ; if 8% of these nucleotides are contained in the mRNA molecules driving the reaction, then 5 .6 x 10 9 nucleotides include a sequence of length 1 .6 x 107 nucleotides, so that there must be an average of 5 .6 x 10 9 / 1 .6 x 10 7 = 340 messengers representing each gene in the polysomes . The gastrula embryo contains about 600 cells, so this must imply some sequestration of messengers to different cells . No cal-
Table 6 . Sequence Complexity of Messenger RNA Populations Determined by Saturation of Nonrepetitive DNA with Excess mRNA
Cell
Source of mRNA
S . purpuratus polysomal, puromycin-released
% RNA Driver
% Available Nonrepetitive DNA Saturated
8 ± 4 92 t 4
1 .35 2 .0 8,000 cannot be estimated but is of low complexity
% Genome Expression
Gene Number
Copy Number 340
Reference Galau et al . (1974)
HeLa
cytoplasmic poly(A)+
50 50
1 .40 2 .0 30,000 10 less complex as shown by cDNA x mRNA hybridization
Bishop et al . (1974a)
Mouse L cell
polysomal poly(A)+
2-5 95-98
2 .0 0 .39
2 .8 0 .55
36,000 7,000
Harries and Perry (personal communication)
Drosophila culture (Schneider S3) cytoplasmic poly(A)+
1 .9
2 .7
2,200
Xenopus oocyte
0 .6-0 .9
1 .0
15,000
total
<1 40
lzquierdo and Bishop (personal communication) 104
Davidson and Hough (1971)
The level of saturation of nonrepetitive DNA in excess RNA is the proportion of available DNA that was bound, that is, the bound proportion corrected for any incomplete reaction when isolated nonrepetitive DNA is renatured . The per cent of genome expression assumes that transcription is asymmetric in the cell, so that only half of the DNA is available for reaction with mRNA and also takes into account the proportion of the genome that is nonrepetitive . These calculations all refer to the haploid genome ; the values taken for HeLa and L cells are those of man and mouse, respectively, although these cells are no longer diploid . Estimates for gene number assume a number-average molecular weight for mRNA of 2000 nucleotides (Bishop et al ., 1974) . The use of this value relies upon the argument that the complexity of the RNA population can be represented as EN,M,, where N ; is the number EN,M; and M, the length of the messenger RNA molecules in each size class . Since the number average molecular weight is defined as E N; the division of complexity by number average molecular weight gives EN,, the total number of different messengers (Rosbash, personal communication) . In nontechnical terms, the use of the number average takes into account the greater contribution made to the sequence complexity by the larger molecules . However, Davidson and Britten (1973) have advanced the argument that an alternative parameter, the median molecular weight, should be used to characterize the mRNA size distribution ; this is defined as the messenger length compared with which 50% of the molecules are shorter and 50% longer, about 1200 nucleotides for animal mRNA [after deducting the length of 200 nucleotides of poly(A) added posttranscriptionally] . See Table 2 for comparison of the number-average and median molecular weights . 
Use of the median molecular weight gives gene number estimates of 14,000 for sea urchin gastrula, 45,000 for HeLa, 60,000 and 12,000 for L cell, 3500 for Drosophila, and 26,000 for Xenopus oocyte messengers. Copy number estimates are calculated by dividing the total number of nucleotides in the appropriate fraction of mRNA by the estimated total sequence complexity of the fraction. This parameter is thus independent of the estimate used for messenger length. It represents only an average value for the number of copies of each messenger; the copy number varies considerably for different genes, for example by some 50-fold in the Xenopus oocyte.
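The relation between sequence complexity, number-average length, and gene number described in this legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented here, and the three-class size distribution is hypothetical, chosen so that its number average comes out at 2000 nucleotides.

```python
# Illustrative sketch only; names and the size distribution are hypothetical.

def sequence_complexity(counts, lengths):
    """Complexity in nucleotides: sum over size classes of N_i * M_i."""
    return sum(n * m for n, m in zip(counts, lengths))


def number_average_length(counts, lengths):
    """Number-average length: (sum of N_i * M_i) / (sum of N_i)."""
    return sequence_complexity(counts, lengths) / sum(counts)


def gene_number(complexity_nt, mean_length_nt):
    """Number of different messengers implied by a complexity and a mean length."""
    return complexity_nt / mean_length_nt


def rescale_gene_number(estimate, assumed_length_nt, new_length_nt):
    """Convert a gene number made with one assumed length to another length."""
    return estimate * assumed_length_nt / new_length_nt


# Hypothetical distribution: N_i different messengers of length M_i nucleotides.
counts = [500, 300, 200]
lengths = [1000, 2000, 4500]

complexity = sequence_complexity(counts, lengths)
average = number_average_length(counts, lengths)
print(complexity, average, gene_number(complexity, average))

# Re-expressing a 15,000-gene estimate (2000-nucleotide average) with a
# 1200-nucleotide median length.
print(rescale_gene_number(15_000, 2000, 1200))
```

Dividing the complexity by the number average recovers the number of different messengers exactly, which is the argument made in the legend. Dividing by the shorter median length instead raises every estimate by the ratio 2000/1200, so the rounded 15,000 quoted for the Xenopus oocyte becomes about 25,000, close to the 26,000 given above.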
culations can yet be made of the complexity and hence number of copies of messengers representing the major (92%) class. The concept that messengers vary greatly in abundance in the population, however, is consistent with the data in Table 4, which show that increasing DNA:RNA ratios increase the extent of hybridization in DNA excess, presumably by making available more DNA sequences complementary to the RNA sequences present in high frequency. Kinetic analysis of the messengers of HeLa cells suggests the presence of three abundance classes of mRNA, representing in all about 30,000 genes. Bishop et al. (1974a) found that when HeLa cDNA is hybridized with an excess of the template mRNA, some 85% hybridizes by a R0t of 100. The three transitions given in Table 7 define the reaction: 20% of the cDNA hybridizes at a R0t1/2 of 0.07; 25% hybridizes at a R0t1/2 of 1.2; and 45% hybridizes with a R0t1/2 of 60. In control experiments with globin cDNA and mRNA, 90% of the cDNA is available for reaction at a R0t1/2 of 6 x 10^-4; this represents a sequence length of 1200 nucleotides. Comparison of the HeLa cDNA x mRNA transition values with the globin control (as detailed in Table 7) suggests
that 22% of the hybridizing cDNA includes only 15 sequences, 28% contains 340 sequences, and 50% represents 30,000 sequences of an average length of 2000 bases. Because the cDNA may contain some 10% of repetitive sequences (see Table 4), the values of the first two transitions may include hybridization due to this component and so may be less accurate than the third transition point. The amount of mRNA in a HeLa cell is about 1.25 x 10^9 nucleotides, and so it is possible to calculate the number of copies present of each messenger. As Table 7 shows, there are 12,000 copies of each sequence in the most abundant class, 700 copies of each mRNA in the intermediate class, and only 13 copies of each messenger in the major class of sequence complexity. Of course, these are only average values and the distribution of messenger abundances in each class may be quite broad. The number of sequences present in the messenger population can also be calculated from hybridization between nonrepetitive labeled DNA and an excess of mRNA. By using the same R0t value which is sufficient for complete hybridization between cDNA and excess mRNA, it is possible to be sure that the reaction has proceeded to completion. As
Table 7. Sequence Complexity of Messenger RNA Populations Determined by Kinetics of Reaction of cDNA with Excess mRNA

Cell | Source of mRNA | % cDNA Hybridized | % in Transition | R0t1/2 Observed | R0t1/2 Corrected | Gene Number | Copy Number | Reference
HeLa | polysomal poly(A)+ | 20 | 22 | 0.07 | 0.015 | 15 | 12,000 | Bishop et al. (1974a)
 | | 25 | 28 | 1.2 | 0.335 | 340 | 700 |
 | | 45 | 50 | 60 | 30 | 30,000 | 13 |
Rat myoblast | cytoplasmic poly(A)+ | 40 | 54 | 0.2 | 0.108 | 110 | | Campo and Bishop (personal communication)
 | | 35 | 46 | 20 | 9.2 | 9,000 | |
Drosophila culture | cytoplasmic poly(A)+ | 30 | 36 | 0.01 | 0.0036 | 4 | | Izquierdo and Bishop (personal communication)
 | | 25 | 30 | 0.5 | 0.15 | 150 | |
 | | 28 | 34 | 8 | 2.7 | 2,700 | |
Xenopus oocyte | total poly(A)+ | 75 | 100 | 10 | 10 | 10,000 | 5 x 10^6 | Rosbash and Bishop (personal communication)
Xenopus ovary | total poly(A)+ | 75 | 100 | 10 | 10 | 10,000 | | Rosbash and Bishop (personal communication)
Triturus ovary | total poly(A)+ | 75 | 100 | 10 | 10 | 10,000 | | Rosbash and Bishop (personal communication)
The R0t1/2 values determined in these experiments are compared with the R0t1/2 of a control reaction between globin cDNA and an excess of globin mRNA (including sequences for both α and β chains), which has a value of 6 x 10^-4. If the combined length of sequences present in both globin messengers is assumed to be 1200 nucleotides, then a R0t1/2 of 10^-3 will correspond to a 2000 nucleotide-long messenger. Dividing an experimental R0t1/2 by 10^-3 thus gives the number of 2000 base sequences, within about 10%. (The R0t1/2 values observed for HeLa mRNA differ from those published to take account of the true mRNA content in the population.) The observed R0t1/2 values must be corrected for the presence of other components before they can be compared with the control; the corrected R0t1/2 values are obtained by multiplying the observed R0t1/2 values by the proportion of total hybridizing cDNA represented in each transition. This assumes that the reaction has proceeded to completion so that any unreacted cDNA does not belong to any particular class; if the unreacted cDNA is part of the most complex (last hybridizing) class, the number of genes in this class will be slightly (about 10%) greater and the number of genes in the less complex classes will be slightly (about 10%) lower. When only one transition is observed, there is presumably an approximately even representation of all transcribed genes in messengers. These estimates for the number of expressed genes again assume the number-average molecular weight of 2000 bases; and the copy number is calculated as described in Table 6 so that it depends only on the total complexity of the mRNA and the total mass present in the cell.
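The correction described in this legend can be followed step by step in a short Python sketch. It is an illustration rather than the procedure actually used by the cited authors: the function and variable names are invented here, and the inputs are the HeLa values quoted in Table 7 (per cent of cDNA hybridized and observed R0t1/2 for each transition).

```python
# Sketch of the Table 7 arithmetic; function and variable names are illustrative.

GLOBIN_R0T_HALF = 6e-4      # control R0t1/2 for globin cDNA x excess globin mRNA
GLOBIN_LENGTH_NT = 1200     # assumed combined length of the two globin messengers
MESSENGER_LENGTH_NT = 2000  # assumed number-average messenger length

# R0t1/2 expected for a single 2000-nucleotide sequence (about 1e-3).
R0T_HALF_PER_MESSENGER = GLOBIN_R0T_HALF * MESSENGER_LENGTH_NT / GLOBIN_LENGTH_NT


def analyse_transitions(percent_hybridized, observed_r0t_half):
    """Return (fraction in transition, corrected R0t1/2, number of sequences)."""
    total = sum(percent_hybridized)
    results = []
    for percent, r0t_half in zip(percent_hybridized, observed_r0t_half):
        # Proportion of the total hybridizing cDNA found in this transition.
        fraction = percent / total
        # Observed value scaled by that proportion.
        corrected = r0t_half * fraction
        # Number of 2000-nucleotide sequences by comparison with the control.
        n_sequences = corrected / R0T_HALF_PER_MESSENGER
        results.append((fraction, corrected, n_sequences))
    return results


hela = analyse_transitions([20, 25, 45], [0.07, 1.2, 60])
for fraction, corrected, n_sequences in hela:
    print(f"{fraction:.0%} of hybridizing cDNA, "
          f"corrected R0t1/2 {corrected:.3g}, about {n_sequences:.0f} sequences")
```

With unrounded fractions the three HeLa transitions come out at roughly 16, 330, and 30,000 sequences, in line with the rounded 15, 340, and 30,000 of the table; the small differences arise because the table uses the rounded proportions 22%, 28%, and 50%.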
reported in Table 6, 1.4% of the nonrepetitive DNA forms a hybrid at a R0t of 350; allowing for incomplete reaction and asymmetric transcription, this represents 2.8% of the nonrepetitive DNA or 2.0% of the total genome. Since this corresponds to 6 x 10^7 bases of RNA, it could constitute about 30,000 sequences of 2000 bases; this reaction presumably is driven by the 50% component that contains most of the sequence complexity in the cDNA x excess mRNA reaction. Of course, since isolated nonrepetitive DNA was used, this estimate does not include any transcripts derived from repetitive sequences. Because the experiments with HeLa mRNA all utilized cytoplasmic sequences purified by reaction with oligo(dT)-cellulose, these estimates of sequence complexity and abundance apply only to the poly(A)-containing messengers; in addition to this class, an appreciable number of messengers lacking poly(A) may also be present in the cell (see Lewin, 1975). The use of the cytoplasmic poly(A)+ RNA fraction assumes that all the poly(A)-containing molecules are messengers; but if some of this fraction were not associated with polysomes, the number of translated genes might be smaller than these estimates. The estimates for expressed gene numbers given in Tables 6 and 7 are not precise but probably are accurate within about 20%; and of course they rest upon the assumptions that are detailed in the tables. The sequence complexity of the poly(A)-containing messengers of mouse L cell polysomes is therefore very close to that of the HeLa mRNA population; Harries and Perry (personal communication) identified a total of more than 40,000 messengers, in two abundance classes; a small fraction of the RNA includes most of its complexity and represents messengers present in very low abundance whereas the majority of the RNA represents a smaller number of sequences present in larger amounts (see Table 6).
The presence of two abundance classes also is a feature of the rat myoblast, although its total number of messengers is less, about 10,000 (see Table 7).

Implications of Estimates for Gene Number and Messenger Abundance

If messenger abundances represent the number of proteins synthesized, the HeLa cell must depend largely on a comparatively small number of genes whose protein products, including both structural proteins and metabolic enzymes, are present in large amount. By far the great majority of genes represented in the polysomes are expressed at a much lower level; these may represent proteins which are sufficient at much lower levels or which are unnecessary but whose control is perhaps "leaky". Of course, these results do not imply that
all cells have an identical content of messenger sequences, so that the less abundant messengers might be present in some cells but not in others. A hint that this situation may prevail in the L cell is provided by the very low abundance of the most complex class; the more abundant class includes many more genes than the number in the abundant HeLa mRNA classes, with a correspondingly lower average abundance. Of course, a culture of these cells might display some variation in phenotype and genotype. And since HeLa and L cells are not diploid and have been adapted to grow in culture, it is far from clear that they require the same number of gene products as a normal cell of the organism; it is therefore difficult to draw any further conclusions about utilization of the genome from these estimates. That variation in messenger abundance may be a feature of normal cells, however, is suggested by the complexity analysis of sea urchin messengers. It is reasonable to suppose that the majority of sequences (92%) that failed to react may represent a comparatively small number of sequences (less than 500) required by all cells and thus expressed extensively. The more complex 8% of mRNA driving the reaction may represent genes concerned with the specialization of cell types, which are therefore expressed in only some cells of the embryo, as is suggested by the discrepancy between the average number of messengers (340) and the number of cells (600). It seems reasonable to suppose the gastrula embryo expresses only a small proportion of the genes utilized in toto through all stages of development, making the gene number of the sea urchin very much greater than 8000. Some confidence that these gene number estimates are close to the number of expressed functions is lent by the analysis of Xenopus and Triturus ovaries and oocytes.
The results of Rosbash and Bishop (personal communication) summarized in Table 7 show that ovary cells of both species contain about 10,000 messengers; the number of messengers stored in the Xenopus oocyte is the same when measured by the kinetics of reaction, a result close to the 15,000 sequences suggested by the earlier saturation analysis of Davidson and Hough (1971). During oogenesis in amphibia, lampbrush chromosomes are formed in which loops protruding from the axis appear to represent sites of synthesis of messenger RNA; the number of loops in the oocyte is similar to the gene number estimated from the sequence complexity of messenger RNA, supporting the idea that complexity measurements represent the number of genes that are active in a cell. The total number of genes in the eucaryotic genome remains a paradox (see Lewin, 1974a,c). The amount of nonrepetitive DNA is large enough to
code for an enormous number of proteins; nonrepetitive Drosophila DNA could represent 50,000 genes of 2000 base pairs and nonrepetitive mammalian DNA could be divided into one million sequences of this length. And organisms that are related may possess widely different contents of DNA. Although the genomes of Triturus viridescens and Xenopus laevis differ in complexity by almost an order of magnitude, the number of genes represented in the poly(A)-containing RNA of the ovaries is the same, an indication that the size of the genome does not necessarily reflect the number of genes expressed, at least in this tissue (see Rosbash, Ford, and Bishop, 1974). Equation of the number of bands in Drosophila polytene chromosomes with the number of genome functions has recently received support from estimates of the number of lethal complementation groups (see Lewin, 1974d); this would suggest a gene number of about 5000. The discrepancy between the number of functional groups and the coding potential of nonrepetitive DNA suggests that much of it may not code for protein; what function it may exercise remains a matter for speculation. The genetic estimate of 5000 genes is not too much greater than the estimate from sequence complexities of Izquierdo and Bishop (personal communication) that there are between 2000 and 3000 genes expressed in cultured Drosophila cells of the Schneider S3 line, again in several abundance classes (see Tables 6 and 7). One explanation of this coincidence is that a large proportion of the total number of genes may be expressed (albeit at rather low levels); although of course it remains possible that the number of lethal complementation groups underestimates the number of genes. In view of these results, it would be especially interesting to analyze the sequence complexities of messenger RNA populations from Drosophila melanogaster cells of defined phenotypes.
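The size of the discrepancy for Drosophila can be made explicit with a few lines of arithmetic. The sketch below uses only figures quoted in the paragraph above (a coding capacity of 50,000 genes of 2000 base pairs, a genetic estimate of about 5000 genes, and 2000 to 3000 genes expressed in Schneider S3 cells); the function names are invented for illustration.

```python
# Illustrative arithmetic only; function names are hypothetical.

GENE_LENGTH_BP = 2000  # gene length assumed in the text


def implied_nonrepetitive_bp(max_genes, gene_length_bp=GENE_LENGTH_BP):
    """Amount of nonrepetitive DNA implied by a stated coding capacity."""
    return max_genes * gene_length_bp


def fraction_of(estimate, reference):
    """Ratio of one gene number estimate to a reference number."""
    return estimate / reference


drosophila_capacity = 50_000    # genes the nonrepetitive DNA could encode
genetic_estimate = 5_000        # from lethal complementation groups
expressed_range = (2_000, 3_000)  # expressed in cultured Schneider S3 cells

print(implied_nonrepetitive_bp(drosophila_capacity))       # base pairs
print(fraction_of(genetic_estimate, drosophila_capacity))  # share of capacity
print([fraction_of(n, genetic_estimate) for n in expressed_range])
```

On these figures the genetic estimate accounts for about one tenth of the coding potential of the nonrepetitive DNA, while the cultured cells express roughly 40 to 60% of the genetically estimated gene number.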
The sequence complexities of messenger RNA populations summarized in Table 6 correspond to about 2% of the genome. Yet some half of the sea urchin and Xenopus genomes may display the interspersion pattern of short (about 300 base) repetitive sequences and long (about 1000 base) nonrepetitive sequences that provides a putative alternation of control and protein-coding sequences. That 80% of the sea urchin gastrula mRNA hybridizes with the 30% of the nonrepetitive sequences that lie adjacent to (within 1750 nucleotides of) repetitive sequences is consistent with such a model. If interspersion reflects functional organization, the 2% of the genome represented in the messengers should therefore correspond to only a small part of the total protein-coding sequences; this would imply that the number of genes must be much
greater than represented in the messengers of any individual cell phenotype (with the corollary that the basis for genetic estimates of gene number is mistaken). If, however, the genes expressed in these cells represent an appreciable proportion of the total gene number, then at least a large proportion of the interspersed sequences cannot represent an alternation of control and coding functions. If a single messenger sequence forms only a short part of each primary transcript, it is not attractive to postulate that all the nonrepetitive interspersed sequences code for proteins, since each hnRNA would correspond to several units of interspersion. Reversing this argument, to say that each interspersed nonrepetitive sequence represents a structural gene is to imply that any large hnRNA precursor contains more than one messenger sequence, although of course it is possible to speculate that controls over processing may ensure that only a single one of such potential messenger sequences ever reaches the cytoplasm in any particular cell phenotype. If hnRNA molecules much larger than mRNA act as messenger precursors, it is tempting to speculate that at least some repetitive sequences act as elements that control processing; this concept is consistent with the observation that most hnRNA molecules of the sea urchin gastrula or HeLa cell contain a repetitive sequence transcript but that such elements are absent from messenger RNA, largely or entirely derived from the nonrepetitive sequences. One approach to the critical problem of defining the relationship between mRNA and genome, that is, how structural genes are expressed, may be to define the structure of the precursor hnRNAs comprising the primary transcripts, so that the unit of transcription can be compared with both the unit of translation and with the organization of the genome.
Acknowledgment

I am extremely grateful to John Bishop, Eric Davidson, and Robert Perry for their very helpful comments upon this review and to Michael Rosbash for interesting discussions.

References

Berns, A., Janssen, P., and Bloemendaal, H. (1974). Biochem. Biophys. Res. Commun. 59, 1157-1164.
Birnie, G. D., MacPhail, E., Young, B. D., Getz, M. J., and Paul, J. (1974). Cell Differentiation, in press.
Bishop, J. O., and Freeman, K. B. (1973). Cold Spring Harbor Symp. Quant. Biol. 38, 707-716.
Bishop, J. O., and Rosbash, M. (1973). Nature New Biol. 241, 204-207.
Bishop, J. O., Morton, J. C., Rosbash, M., and Richardson, M. (1974a). Nature 241, 204-207.
Bishop, J. O., Rosbash, M., and Evans, D. (1974b). J. Mol. Biol. 85, 75-86.
Britten, R. J., and Davidson, E. H. (1969). Science 165, 349-357.
Britten, R. J., and Kohne, D. E. (1968). Science 161, 529-540.
Britten, R. J., Graham, D. E., and Neufeld, B. R. (1974). In Methods in Enzymology, 29E, S. P. Colowick and N. O. Kaplan, eds. (New York: Academic Press), pp. 363-418.
Brownlee, G. G., Cartwright, E. M., Cowan, N. J., Jarvis, J. M., and Milstein, C. (1973). Nature New Biol. 244, 236-239.
Campo, M. S., and Bishop, J. O. (1974). J. Mol. Biol. 90, 649-664.
Crippa, M., Meza, I., and Dina, D. (1973). Cold Spring Harbor Symp. Quant. Biol. 38, 933-942.
Daneholt, B. (1975). Cell 4, 1-9.
Darnell, J. E., Pagoulatos, G. N., Lindberg, U., and Balint, R. (1970). Cold Spring Harbor Symp. Quant. Biol. 35, 555-560.
Davidson, E. H., and Britten, R. J. (1973). Quart. Rev. Biol. 48, 565-613.
Davidson, E. H., and Hough, B. R. (1971). J. Mol. Biol. 56, 491-506.
Davidson, E. H., Hough, B. R., Chamberlin, M., and Britten, R. J. (1971). Develop. Biol. 25, 445-463.
Davidson, E. H., Hough, B. R., Amenson, C. S., and Britten, R. J. (1973a). J. Mol. Biol. 77, 1-24.
Davidson, E. H., Graham, D. E., Neufeld, B. R., Chamberlin, M. E., Amenson, C. S., Hough, B. R., and Britten, R. J. (1973b). Cold Spring Harbor Symp. Quant. Biol. 38, 295-301.
Davidson, E. H., Hough, B. R., Smith, M. J., Graham, D. E., Klein, W. H., Galau, G. A., Chamberlin, M. E., and Britten, R. J. (1974). In The Eucaryotic Chromosome, W. J. Peacock and R. D. Brock, eds. (Canberra: Australian National University Press), in press.
Davidson, E. H., Hough, B. R., Klein, W. H., and Britten, R. J. (1975). Cell 4, in press.
Firtel, R. A., and Bonner, J. (1972). J. Mol. Biol. 66, 339-362.
Galau, G. A., Britten, R. J., and Davidson, E. H. (1974). Cell 2, 9-22.
Gaskill, P., and Kabat, D. (1971). Proc. Nat. Acad. Sci. USA 68, 72-75.
Getz, M. J., Birnie, G. D., Young, B. D., MacPhail, E., and Paul, J. (1975). Cell 4, 121-129.
Goldberg, R. B., Galau, G. A., Britten, R. J., and Davidson, E. H. (1973). Proc. Nat. Acad. Sci. USA 70, 3516-3520.
Gorski, J., Morrison, M. R., Merkel, C. G., and Lingrel, J. B. (1974). J. Mol. Biol. 86, 363-372.
Gould, H. J., and Hamlyn, P. H. (1973). FEBS Lett. 30, 301-304.
Graham, D. E., Neufeld, B. R., Davidson, E. H., and Britten, R. J. (1974). Cell 1, 127-137.
Greenberg, J. R., and Perry, R. P. (1971). J. Cell Biol. 50, 774-787.
Grunstein, M., Levy, S., Schedl, P., and Kedes, L. (1973). Cold Spring Harbor Symp. Quant. Biol. 38, 717-724.
Haines, M. E., Carey, N. H., and Palmiter, R. D. (1974). Eur. J. Biochem. 43, 549-560.
Harris, S. E., Means, A. R., Mitchell, W. M., and O'Malley, B. W. (1973). Proc. Nat. Acad. Sci. USA 70, 3776-3780.
Harrison, P. R., Birnie, G. D., Hell, A., Humphries, S., Young, B. D., and Paul, J. (1974). J. Mol. Biol. 84, 539-554.
Holmes, D. S., and Bonner, J. (1974). Proc. Nat. Acad. Sci. USA 71, 1108-1112.
Honjo, T., Packman, S., Swan, D., Nau, M., and Leder, P. (1974). Proc. Nat. Acad. Sci. USA 71, 3659-3663.
Hunt, J. (1973). Biochem. J. 131, 327-333.
Hutton, J. R., and Wetmur, J. G. (1973). J. Mol. Biol. 77, 495-500.
Imaizumi, T., Diggelmann, H., and Scherrer, K. (1973). Proc. Nat. Acad. Sci. USA 70, 1122-1126.
Jelinek, W., and Darnell, J. E. (1972). Proc. Nat. Acad. Sci. USA 69, 2537-2541.
Jelinek, W., Molloy, G., Munoz, R. F., Salditt, M., and Darnell, J. E. (1974). J. Mol. Biol. 82, 361-370.
Klein, W. H., Murphy, W., Attardi, G., Britten, R. J., and Davidson, E. H. (1974). Proc. Nat. Acad. Sci. USA 71, 1785-1789.
Labrie, F. (1969). Nature 221, 1217-1222.
Laird, C. D. (1971). Chromosoma 32, 378-406.
Lewin, B. (1974a). Cell 1, 107-112.
Lewin, B. (1974b). Gene Expression, 1, Bacterial Genomes. (London and New York: John Wiley and Sons).
Lewin, B. (1974c). Gene Expression, 2, Eucaryotic Chromosomes. (London and New York: John Wiley and Sons).
Lewin, B. (1974d). Nature 251, 373-375.
Lewin, B. (1975). Cell 4, 11-20.
Lizardi, P. M., Williamson, R., and Brown, D. D. (1975). Cell 4, in press.
Lodish, H. F., Firtel, R. A., and Jacobson, A. (1973). Cold Spring Harbor Symp. Quant. Biol. 38, 899-914.
McKnight, G. S., and Schimke, R. T. (1974). Proc. Nat. Acad. Sci. USA 71, 4327-4331.
Macnaughton, M., Freeman, K. B., and Bishop, J. O. (1974). Cell 1, 117-125.
Manning, J. E., Schmid, C. W., and Davidson, N. (1975). Cell 4, 141-155.
Melli, M., Whitfield, C., Rao, K. V., Richardson, M., and Bishop, J. O. (1971). Nature New Biol. 231, 8-12.
Molloy, G. R., Jelinek, W., Salditt, M., and Darnell, J. E. (1974). Cell 1, 43-54.
Pagoulatos, G. N., and Darnell, J. E. (1970). J. Mol. Biol. 54, 517-536.
Rasch, E. M., Barr, H. J., and Rasch, R. W. (1971). Chromosoma 33, 1-18.
Rosbash, M., Ford, P. J., and Bishop, J. (1974). Proc. Nat. Acad. Sci. USA 71, 3746-3750.
Schachat, F. H., and Hogness, D. S. (1973). Cold Spring Harbor Symp. Quant. Biol. 38, 371-381.
Scherrer, K., Spohr, G., Granboulan, N., Morel, C., Grosclaude, J., and Chezzi, C. (1970). Cold Spring Harbor Symp. Quant. Biol. 35, 539-554.
Shearer, R. W., and McCarthy, B. J. (1967). Biochemistry 6, 283-289.
Shearer, R. W., and McCarthy, B. J. (1970). J. Cell. Physiol. 75, 97-106.
Smith, M. J., Hough, B. R., Chamberlin, M. E., and Davidson, E. H. (1974). J. Mol. Biol. 85, 103-126.
Spradling, A., Penman, S., Campo, M. S., and Bishop, J. O. (1974). Cell 3, 23-30.
Stavnezer, J., Huang, R. C. C., Stavnezer, R., and Bishop, M. (1974). J. Mol. Biol. 88, 43-64.
Stevens, R. H., and Williamson, A. R. (1973). Proc. Nat. Acad. Sci. USA 70, 1127-1131.
Sulston, J. E., and Brenner, S. (1974). Genetics 77, 95-104.
Suzuki, Y., Gage, L. P., and Brown, D. B. (1972). J. Mol. Biol. 70, 637-656.
Weinberg, E. S., Birnsteil, M. L., Purdom, I. F., and Williamson, R. (1972). Nature 240, 225-228.
Wensink, P. C., Finnegan, D. J., Donelson, J. E., and Hogness, D. S. (1974). Cell 3, 315-325.
Wu, J., Hurn, J., and Bonner, J. (1972). J. Mol. Biol. 64, 211-220.
Zelenka, P., and Piatigorsky, J. (1974). Proc. Nat. Acad. Sci. USA 71, 1896-1900.