The Evolution
of a Complex
Bert W. O’Malley, Two
thirds
of the
surrounding however,
natural
junctions.
the sequences
surrounding
the other
a primordial
of the ovomucoid
ovomucoid
were
Finally,
the positions
during
are unambiguous.
present
for functional the process
in the
gene
of the introns domains
by two
primordial
The three
has
P. Stein, and Anthony been
junction
contain
sequences.
separate
This analysis
intragenic gene
the ovomucoid and provide
birds
gene support on the
all exons four
and
of the
and thus
to the GT-AG
sequence
to examine
suggests
that
the theory manner
that
introns
genes
redundant;
sites
We
at either
compare
the
of the evolution
300
separate
sequences
are
gene
analyses
about
eucaryotic
intron
ovomucoid
sequence
diverged,
by which
rule.
theories
the present
Furthermore,
and mammals
the
introns
the splicing
conforms
duplications.
before
insight
including
no redundancies
protein
Gene
R. Means
surrounding
in all cases
the ovomucoid
suggest
million
gene
years
segments
were
of
evolved that ago. that
constructed
of evolution.
T
HE ORGANIZATION of eucaryotic genes has been an area of acute interest to us for several years. One particularly attractive model system for studying the structural organization of functionally related genes in a single tissue has been the hen oviduct.’ The chicken ovalbumin gene has been cloned and sequenced.*” More recently, the chicken ovomucoid gene was cloned and partially sequenced.’ ” We discovered that both of these oviduct genes exhibited a surprisingly complex structure; each gene contained seven nontranslated regions of DNA sequence (intervening sequences or introns) interspersed among eight genomic DNA regions (exons) that code for the mature polypeptide chains. A number of laboratories in addition to our own have identified the existence of these intervening sequences in diverse eucaryotic genes. ‘* *’ Even though the existence of these intervening sequences seems to be the rule rather than the exception in eucaryotic genes, their function and origin remain an intriguing mystery. The ovomucoid gene afforded us a unique opportunity to examine the raison d’etre of intervening sequences. We have sequenced most of the chicken ovomucoid gene, including all of the intron/exon junctions.” During the course of our investigation, the sequence of the secreted chicken ovomucoid protein was published.29 We are thus able to analyze the structure of the ovomucoid gene in light of the protein sequence. The results of this analysis, which we present here, have allowed us to propose a theory of the function of intervening sequences in the organization of the ovomucoid gene. This in turn has led us to
From the Department of Cell Biology. Baylor College of Me& tine, Houston, Texas, Division of Endocrinology, University of
Texas Health Science Center, Houston, Texas. Address reprint requests to Bert W. O’Malley. Department sf Cell Biology, Baylor College of Medicine. Houston. Texas 77030. @ (982 by Grune & Stratton. Inc. 0026-0495/82/3107~002$01.00/0
646
sequenced, sequences
introns
The splicing
ovomucoid
within
of proteins
gene
gene with
as well as the origin of intervening
introns code
introns
organization
ovomucoids from
ovomucoid
intron/exon
end of these three structural
chicken
all fourteen
Joseph
Eucaryotic
propose a mechanism eucaryotic genes.
for the evolution
of complex
‘The nucleotide sequence of ovomucoid mRNA was determined from two recombinant DNA plasmids. pOM 100 and pOMS02. (These recombinant plasmids were derived from the parent bacterial plasmid. pBR322; the OM refers to the fact that ovomucoid DNA sequences were inserted, and the three digit numbers simply refer to the chronological isolation sequence.) Preparation of the recombinant plasmid pOMlO0, which contains cDNA synthesized from ;I partially purified mRNA,,,,,. has been described.. This plasmid contained a 650 base pair (bp) DNA,,,,, insert. but lacked - I50 bp corresponding to the 5’ end of mRNA,,,,,. To obtain cDNA sequences containing the 5’ end of the mRNA sequence. a 204 bp fragment removed from the middle of the ovomucoid cDNA sequence in pOMlO0 by cutting with the restriction endonuclease Pst I. was purified and hybridized with ;t partially purified preparation of mRNA,,,,,. DNA complementary to the 5’ end of mRNA,,,>, was synthesized by reverse transcriptase in the absence of an oligo deoxythymidilate primer. The use of a sequence specific primer prevented the polymerization of cDNA from mRNAs other than mRNA,,,. Double-stranded cDNA,,, was prepared and cloned in the Pst I site of pBR322.” The largest resulting recombinant plasmid was designated pOM502. This plasmid contained :I DNA,,,, insert of 538 nucleotides, excluding the CC‘ tails at either end of the cDNA,, sequence. These two plasmids were sequenced, using the method of Maxam and Gilbert.‘” Together, they contained all 821 nucleotides of the cDNA,,,. The ovomucoid structural gene sequence determined from these plasmids is schematically presented in Fig. 1. The 82 I nucleotides that represent the mature ovomucoid mRNA can be divided into four regions. There is a 53
MetaboCsm.Vol.
31,
No
I iJulyl,
1982
647
EVOLUTION OF A COMPLEX EUCARYOTIC GENE
OVOMUCOIDSTRUCTURALGENE
SEQUENCE
82 1 nucleotides 558
I
“Ph
I
&II
SIGNAL
138
1 HPh I
Hph I
5’ LEADER
I
I UQA
.skI
SEQUENCE
PEPTIDE
OVOMUCOID 3’ NONCODING
SEQUENCE
PROTEIN
SEQUENCE
SEQUENCE
Fig. 1. The Ovomucoid Structural Gene Sequence. The lengths of various regions of the mRNA. in nucleotides, are given at the top of the figure. Region 1. the 5’ leader sequence, refers to the portion of the mRNA at the 5’ end that occurs before the AUG initiation signal and hence is not translated into a protein sequence. Region 2. the signal peptide sequence, is that portion of the mRNA which is translated into the signal peptide. the N-terminal portion of the protein which is clipped off during the secretion of the protein. Region 3, the ovomucoid protein sequence, is that portion of the mRNA that codes for the secreted protein sequence. Region 4, the 3’ noncoding sequence, is that portion of the mRNA that follows the UGA termination signal and hence is not translated into a protein sequence.
nucleotide 5’ leader sequence prior to the AUG initiator codon. This is followed by a 72 nucleotide signal peptide sequence, which is translated but removed from the secreted polypeptide during the secretion process.3’ The 558 nucleotides that code for the secreted ovomucoid are next, followed by the 138 nucleotides of 3’ noncoding sequence which are located after the UGA translation termination signal. A poly (A) tail of unspecified length follows this sequence. The cloned ovomucoid structural gene sequences were nick-translated with labeled nucleotides and used to isolate the ovomucoid gene from a library of chicken oviduct DNA fragments. The structural organization of the ovomucoid native gene was then determined by restriction endonuclease, electronmicroscopic mapping, and limited DNA sequence analysis. This physical map of the entire ovomucoid gene is shown in Fig. 2. The gene is 5.6 kilobases (kb) in length, with seven intervening sequences of various lengths interspersed among the eight coding (exon) segments. Remarkably, only 15% of the ovomucoid gene is represented in the mature mRNA.
These junctions demonstrate two different types of splicing sites: those with a redundant sequence at the adjacent junctions and those with no redundancy. There is an extensive terminally redundant sequence surrounding introns A and B and a very limited redundancy surrounding introns F and G. The splice point could occur at any position within the boxes and still yield the correct mature mRNA sequence. Terminally redundant sequences such as these have been found at all splicing sites examined prior to ovomucoid, although the redundant length varied from one to five nucleotides.32 However, the junctions at either end of three ovomucoid introns, C, D, and E, have no redundant sequences. Since the mRNA sequence is known” Structure of the Ovomucoid
71
Gene
AUG
v UGA
A
8
CDE
F
G1
Sequence Analysis of the Ovomucoid Gene
The DNA sequencing method of Maxam and Gilbert” was used to sequence more than 70% of the chicken ovomucoid gene, including all of the exons, the sequences surrounding all fourteen intron/exon junctions, and several complete introns. Because the ovomucoid mRNA was sequenced, we were able to determine exactly where in the coding sequence the introns are inserted. The DNA sequences surrounding these fourteen intron/exon junctions are listed in Fig. 3.
tp
5.6Kbp4
0
Structural
m
Intervening
-
Flanking
Gene Sequence Gene Sequence DNA Sequence
Fig. 2. Structural Organization of the Chicken Ovomucoid Gene. The ovomucoid exon sequences are shown as open bars, the intron sequences es filled bars, and the flanking genomic DNA as e solid line. The seven introns are labeled A through G. The overall length of the ovomucoid gene is 5.6 kb.
O’MALLEY.
sequence
Exon ---
lntron
I
Length 111 -900
A II
20 6
Ill/C
725
TGCCTACAGCATGTGTGTACTGCA
C/IV
TTCCCTCTTCAGlAGAATTTGGAAC
iv/D
GAAACTGTTCCTGTAAGTGAAACC
DIV
CTCCTTCCACAGATGAACTGCAGT
VIE
TGCCCACAAAGTGTTATTGTACCG
E/VI
TTTTCCTTTCAOAGAGCAGGGGGC
VI/F
GCTGCTGTGAG
F/VII
GTGCTTTTGCA
VII/G
CAATGCAGTCG
G/VIII
CCTCGCTTTCAG
TACGTACAGCC AAAGCAACGG
t
STEIN, AND
MEANS
were completing the sequence analysis of the ovomucoid gene, this group published the complete amino acid sequence of the secreted chicken ovornucoid.‘” Furthermore, these investigators noted ;I general sequence homology among avomucoids and several mammalian submandibular and seminal plasma trypsin inhibitors. Specifically. the sequence of chicken ovomucoid shown in Fig. 4, could be divided into three structural domains, I, II, and 111. Each domain ih apparently capable of binding one molecule of trypsin or other serine protease, so there is a good correlation of functional and structural domains. Whereas all three domains contain three disulfide bonds in almost identical positions in each domain, the sequence homology between domains I and II is far stronger than that between domains I and Ill or II and III. Kato ct al.” used these observations to propose that distinct intragenic doubling events of a primordial gene
Fig. 3. DNA Sequences at all Fourteen Intron/Exon Junctions of the Ovomucoid Gene. The DNA sequences at each of the intronlexon junctions ere listed at the left, with exon sequences underlined. The sequences are aligned so that the putative splice points ere above the errow. Terminally redundant sequences at either end of a particular intron are shown in boxes. The actual length of each intron and exon, in nucleotide pairs, is shown at the right.
(underlined in Fig. I), there is only one dinucleotide bond at each junction that could be cleaved and joined with the adjacent exon sequence to yield the correct mature mRNA sequence. In each case, splicing at the indicated position yields an intron whose sequence begins with a GT and ends with an AG dinucleotide. The sizes of the exon sequences are listed in the columns at the right of Fig. 3. These coding segments vary widely in length, the largest, exon VIII, being nine times longer than the shortest, exon II. Exon II is remarkable in that it is only 20 nucleotides in length. This is among the shortest structural gene segments (exons) described to date; only the D gene segment of the immunoglobulin heavy chain gene, which might consist of as few as 13 nucleotides, is as small as exon 11.” The size of such short exons would seem to put a severe limitation on any model of RNA splicing that involves a specific protein and/or RNA interaction with the coding (exon) sequence. The Sequence
of the Secreted
C-terminus
Hen Ovomucoid
Avian ovomucoids are a family of glycoproteins that account for about 10% of the protein content of all bird egg whites.32’33 Chicken ovomucoid has been shown to be responsible for most of the trypsin inhibitory activity of chicken egg white.34 Laskowski and coworkers have for several years been investigating the comparative biochemistry of these avian ovomucoids. While we
Fig. 4. Amino Acid Sequence of the Secreted Chicken Ovomucoid. The single-letter amino acid code by Dayhoff is used. The drawing is meant to depict three similar structural regions held together by disulfide bonds (solid lines) separated by short connecting peptides.
649
EVOLUTION OF A COMPLEX EUCARYOTIC GENE
resulted in the present ovomucoid gene. This hypothesis can now be examined more critically by comparing the ovomucoid genomic DNA sequence with the amino acid sequence of the secreted polypeptide. Ovomucoid Intervening Sequences Functional Domains
DOMAIN _
I
DOMAIN
II
Specify
The illustration of the amino acid sequence in Fig. 4 is depicted so that the functional, trypsin-binding domains of Kato et a1.29are clearly delineated. They are depicted (in one dimension) as globular regions held together by disulfide bonds and separated by short connecting peptides. An analysis of the positions of the intervening sequences within the ovomucoid gene determined that several introns interrupted the portions of the gene sequence that code for the connecting peptides. We used the locations of the introns to redefine the three functional domains of chicken ovomucoid. These domains are depicted separately in Fig. 5. They are aligned to show the structural similarities as well as the similar positions of the introns within the domains. Each domain is separated from the next by one intervening sequence. All domains also contain one internal intervening sequence at identical positions within the three domains. As the lower part of Fig. 5 shows, 46% of the amino acids of domains I and II are identical. Sixty-six percent of the mRNA sequences coding for domains I and II are identical. The homology of domains I and II with domain III is less extensive, but still significant. This suggests that the domains are related, a point which will be discussed in more detail below. Although introns B, D, and F apparently separate the genomic domains coding for the functional peptide domains shown in Fig. 5, what about the other three introns, C, E, and G? These three introns occur at identical positions within the three domains, I, II and III, shown in Fig. 5. These “internal” introns divide the subdomains that code for the trypsin-binding sites, the N-terminal halves of the peptide domains in Fig. 5, from the C-terminal subdomains. It is quite possible that a primordial ovomucoid domain consisted only of the 5’ subdomain, and that the internal introns were created in the course of the addition of the 3’ subdomain. By adding another disulfide bond to the primordial ovomucoid polypeptide (a single domain), this may have resulted in increased stability and increased resistance to digestion by the protease bound by the ovomucoid protein. If these subdomains were separated and the disulfide bridge broken, the functional integrity of the inhibitor would probably be destroyed. Thus, a strong argument can be made that the intervening sequences of the ovomucoid gene divide the gene into structural domains. These genomic domains
InWon E
DOMAIN
Ill
t lntron F
) lntron G % HOMOLOGY
DOMAINS l/II
A.A. 46
N.A. 66
ll/lll
30
42
l/Ill
33
50
Fig. 5. Alignment of the Three Domains of the Secreted Ovomucoid Showing the Similar Positions of Introns. The top portion of the figure shows the secondary structure of the three domains of the secreted ovomucoid polypeptide chain, aligned to emphasize the structural similarities. Disulfide bonds are indicated by the solid lines. The positions of the six introns which interrupt the DNA sequence coding for the three domains are shown with errors. The homology between the amino acid sequence of the domains and the mRNA sequence coding for these domains is shown at the bottom of the figure. The amino acid homologies were calculated by comparing the amino acids at each position of the three domains, beginning with the first amino acid following introns B, D. and F. as shown. The nucleic acid homologies were calculated in the same manner, by comparing each nucleotide that codes for the amino acids shown in the figure. (Therefore, there are three times as many comparisons as for the amino acid homology.) A nine amino acid deletion after the ninth position of domain Ill is assumed to maximize homology with domains I and II.
code for structural domains of the ovomucoid polypeptide, which appear to correspond as well to functional polypeptide domains. We conclude, therefore, that the ovomucoid intervening sequences specify functional domains of the protein, and suggest that this is likely a general concept of eucaryotic gene evolution. Evolution
of the Ovomucoid
Gene
The hypothesis that the present ovomucoid is the result of two intragenic doubling events is quite tenable
650
in light of our investigations of the structural organization of the ovomucoid gene. All three functional polypeptide domains can be aligned so that the secondary structures are remarkably similar. All three domains contain three disulfide bonds in virtually identical positions. Three introns, B, D, and F. delineate the three functional domains of the secreted ovomucoid polypeptide as well as a separate domain that contains the signal peptide. All three domains contain an internal intron at identical positions within the genomic coding sequence. A more detailed analysis of the DNA sequences surrounding the six splice points indicated in Fig. 5 is even more revealing. The splice points (B. D. and F) that separate each domain occur between two codons of the genomic DNA sequence. The internal splice points (C, E, and G) within each domain, however, occur between the second and third nucleotides ofa codon of the genomic DNA sequence. Thus, not only are the positions of the introns within the amino acid sequence the same for all three domains, but the pattern of intron interruption of the codon sequence is also identical. These observations are only plausible if the present ovomucoid gene is the result of two intragenic duplications of a primordial ovomucoid gene which itself contained introns. The primordial ovomucoid gene probably consisted only of the 5’ subdomain of domain I I I lntron G would have been created in the course of addition of the 3’ subdomain, which would have added stability to the primordial protease inhibitor, as discussed above. This recombinational event would have resulted in the creation of the original ovomucoid domain, represented as domain III in Fig. 5. Two genomic duplications, with the retention of noncoding sequence within and at the 5’ end of the primordial gene, would subsequently have led to the present ovomucoid gene. After the initial duplication event, in which domain III gave rise to domain II, there must have been a deletion of twentyseven nucleotides (coding for nine amino acids, see Fig. 5) in the 5’ region of domain 111. The duplication of domain II to form domain I must have occurred much more recently on the evolutionary scale, as judged by the significant homology between the nucleotides at corresponding positions of the genomic sequences coding for domains I and II. An approximate time when this second duplication must have occurred can be estimated by comparing the avian ovomucoids with mammalian protease inhibitors. For example, the dog submandibular protease inhibitor has been sequenced and was shown to consist of two polypeptide domains; the dog domain II (the carboxyterminal domain) shows close homology to the ovomucoid (carboxy terminal) domain III.29 Since all avian species examined to date contain ovomucoids
O’MALLEY.
STEIN. AND MEANS
with three domains, the duplication to form the newest domain ( I ) must have occurred after the divergence of birds from mammals (about 300 million years ago). The evolutionary ancestor to birds and mammals must. therefore, have contained an ancestral ovomucoid with two domains, and we can conclude that the intervening sequences must have existed in the ovomucoid gene well before 300 million and probably over 500 million years ago. Up to this point. our discussion has not included the signal peptidc because it is not part of the secreted ovomucoid, and is, of course, significantly different than the three polypeptide domains. lntron B separates the genomic DNA coding for the bulk of the secreted ovomucoid from the gcnomic domain that codes for the signal peptide and the first two amino acids of the secreted ovomucoid polypeptide. However. intron A separates the DNA coding for the tirst I9 amino acids of the signal peptide from the DNA coding for the remainder of the signal peptide and the first two amino acids of the secreted protein. If each cxon corresponds to a functional or structural domain, then the first 19 amino acids of the ovomucoid signal peptide should comprise a domain that contains the information necessary for translocation, while the remainder of the signal peptide might comprise a domain that contains the peptidasc specificity. In support of this prediction. the initial 14 amino acids which constitute the signal peptide sequence are encoded in exon 1 of the conalbumin gene” as are the first I5 amino acids of the mouse immunoglobin light chain leader sequence.“.” However, this does not appear to be the case for the rat preproinsulin genes. “2 In our evolutionary scheme. the genomic ovomucoid domain containing these first two cxons must have been added to the 5’ end of the primordial ovomucoid gene at the time of the development of the capacity for secretion of the ovomucoid polypeptide. We can now formulate a theory for the evolution ot the ovomucoid gene, and its resultant protein structure, as shown in Fig. 6. The top line shows a single exon sequence. which codes for an ancestral protease inhibitor. The inhibitor was probably only moderately successful as its chosen task, the inhibition of proteases, due to digestion of the inhibitor by the protease. However, with the addition of another appropriate exon to the 3’ end of the original exon (step I). the primordial ovomucoid gene was created, which now coded for a stable inhibitor due to the addition of the strategically located third disultide bond, as discussed above. This polypeptide corresponds to the present day ovomucoid domain 111. Step 2 represents the initial duplication of the primordial ovomucoid gene, perhaps by unequal crossing-over to form an ovomucoid gene
651
EVOLUTION OF A COMPLEX EUCARYOTIC GENE
Ll
Fig. 6. The Proposed Evolution of the Chicken Ovomucoid Gene. The column on the left represents the ovomucoid gene structure at various times during its evolution, and the protein structure for which it codes is shown in the column to the right. The 5’ and 3’ represent the transcription initiation and termination sites, respectively. The solid boxes representexon sequences.Between exons, the thin lines represent intervening sequences; outside of the exons, the thin lines represent untranslated or flanking sequences.
coding for a stable inhibitor with two active sites, domains II and III. It was at this point in evolution, about 300 million years ago, when the avian and mammalian lineages diverged. Mammals have retained this two-domain protease inhibitor, while the avian gene evolved further. Step 3 represents the duplication of domain II, again most likely an unequal crossover event, to create a new domain, the present polypeptide domain I. The result was an ovomucoid gene that codes for the three-domain inhibitor that birds have today. Because all avian ovomucoids examined to date have this three-domain structure,29 the partial genomic duplication of step 3 must have occurred before the avian speciation, or 80-300 million years ago. Step 4 represents the addition of the exons coding for the signal peptide, which created an ovomucoid that could be secreted by the cells responsible for its synthesis (e.g., tubular gland cells of the oviduct). However, as noted in step 1, this addition could have occurred at any point in the evolution of the gene, and indeed must have occurred before the secretion of the protease inhibitor became a necessity. INTRONS AND EUCARYOTIC GENES
In conclusion, our results on the structural organization of the ovomucoid gene have led us to formulate a unified theory of the evolution of the ovomucoid gene. The structure of this gene is consistent with the model
stableblata. 3 Achre
sites
secreted ovomucoid
proposed by Gilbert and ourse1ves38+39*40 that exons may represent protein domains that were “marshalled” together during evolution. Furthermore, we have extended these ideas to make certain predictions concerning the evolution of complex eucaryotic genes in general. Intervening sequences apparently specify structural/functional domains of proteins. This is certainly the case for the ovomucoid, immunog10bin4’ and insulin genes,24 and apparently also for the globin gene.42 The origin and function of intervening sequences cannot be proven, but we can make some educated guesses now as to how and why they arose. We have deduced that the ovomucoid introns are very old; in fact, they probably date from the time when the genes were first formed. As shown in Fig. 6, introns were formed when two exon segments, which already had a built-in function, were brought together to form a primordial gene. The recombinational advantage that these intron sequences afforded is shown in more detail in Fig. 7. Column I shows the problem inherent in the assembly of complex genes from diverse exon segments: the recombination must occur at precise sites at either end of the exons. Otherwise, the reading frame of the resulting mRNA would be altered and the functional integrity of the resultant polypeptide possibly destroyed. Since the recombination process itself is random, a requirement for absolute precision in break-
652
O’MALLEY.
STEIN,
AND
MEANS
Hypothesis for the Origin of lntwwning Sequmes
An EvoUii
Fig.
7.
pothesis vening
Origin
represent
of DNA
lines
introns
or
from
elements
without while
the
DNA
exon
benefit
column
of
segene
diverse
of
II repre-
alternative
generation
thin
either
I shows
assembly
counted
the
flanking
assembly
gene
open regions
while
Column
sents
The
represent
quences.
introns,
Hy-
of Inter-
coding
(exons),
solid
age (or excision) of the segments of DNA to be of low assembled makes the event an occurrence process would be probability. Thus, the evolutionary lengthened greatly. Column II depicts the same event, as we now propose it actually occurred, leading to the development of modern complex eucaryotic genes. Recombination, to bring together the diverse exon segments, could have occurred at any one of numerous sites in the flanking DNA sequence on either side of the exon sequences. This model would greatly facilitate genetic evolution since the probability of occurrence would be orders of magnitude more. In this manner, eucaryotes would be afforded the capacity for greater
the
Sequences.
boxes
(Protein Synthesis)
An Evolutionary for
method in
of
which
the
introns
is
ac-
for.
genetic development per unit time. After rccombination. an intervening sequence would have been created between the two exons. Provided that an early mechanism existed (or developed) for splicing these introns out of the primary RNA transcripts. this process would allow the rapid development of a diverse functional gene. Recent evidence from a number of laboratories has shown that such a splicing enzyme does exist and that introns do not provide a barrier to functional gent expression.” 4XThus, our experimental data and the line of reasoning described above. lead us to believe that intervening sequences are intimately related to the origin and evolution of complex eucaryotic gents.
REFERENCES 1. O’Malley mechanism
BW, McGuire
WL.
Kohler PO, et al: Studies on the
of steroid hormone regulation
of synthesis of specific
preliminary
Woo SLC.
Molder
characterization
et al: Preparation
LA, O’Malley
BW: The ovalbumin
gene: in vitro enzymatic synthesis and characterization. 4. McReynolds
LA, Catterall
JR, O’Malley
BW: The ovalbumin
in a bacterial
plasmid. Gene
Dugaiczyk
A. Tsai MJ. et al: The ovalbumin gene:
cloning of the natural gene. Proc Nat1 Acad Sci USA 75:3688%3692. 6. Woo SLC,
Beattie
WC,
Catterall
JR, et al: The complete
sequence of the chicken chromosomal
and its biological significance. J Biol Chem I98 7. Stein JP, Catterall
Biochem 17:5763%577X
8. Lai EC. Stein JP, Catterall flanking
nucleotide
9. Catterall
JR, Stein
gene
press) cloning of
purified ovomucoid mes-
JF, et al: Molecular
structure and
chicken
ovomucoid
I
~687. 1980 RNA
Krihto
messenger
tary DNA.
J Cell Biol 87:480-487.
Breathnach
R, Mandel
split in chicken DNA. 13. Weinstock
I’. et al.
as determined
ovomucoid
R. Sweet
bcquencc
270:314
P: Ovalbumin
gene i:,
319. 1977
R. Weiss M, et al: Intragcnic
the ovalbumin
01
19X0
JL. Chambon
Nature
Primary
from cloned cumplemen-
gene. Proc Natl
DhA
.Acad Sci USA
75: 1299.. 1303. I978 14. Garapin
AC.
2731349.-354, IS.
Ixpennec
ofa fragment
JP. Roskarn W. CI ‘11. Iwlarion
b!
of the split ovalbumin gene. Yaturc
1978
Mandel JL, Breathnach
coding and intervening gene. Cell 14:641-653. 16. Lindenmaier
R. Gerlinger
P, et al: OrganiTa~wn
sequences In the chicken ovalbumin
ol
split
1978
W. Nguyen-Huu
MC,
Lurz R. et al: Arrange-
Proc Nat1 Acad Sci USA. 76:6 196-6200. 17. Cachet
M.
Gannon
F. Hen
R.
1979 et al: Organization
sequence studies of the 17.piece chicken conalbumin
1979 JP,
Intervening
ment of coding and intervening sequences of chicken lysolyme gene.
1978
sequences of the natural
gene. Cell 18:829-842,
1 (in
JF, Woo SLC, et al: Molecular
ovomucoid gene sequences from partially senger RNA.
ovalbumin
JF, Kribto P. et al: Ovomuw~d
JR. Stein JP.
molecular cloning
1978 nucleotide
Cattcrall
spacers interrupt
1977
5. Woo SLC,
phism. Cell 21:68
I?.
gene: cloning of a complete ds-cDNA 2:217-231,
J Biol Chem
1976
778:3X
sequences specify functional domains and generate protein polymor-
I I.
4:69%78. I975
JJ, McReynolds
251:7355-7362.
and
of purified ovalbumin messenger RNA
from the hen oviduct. B&hem 3. Monahan
JW,
scqucncc~. %aturc
327. 1979 10. Swin JP. Cattcrall
proteins. Recent Prog Horm Res 25: 105-I 60, 1969 2. Rosen JM,
gene contains ai least 51x lntervcning
Lai EC, et al: The chick ovomucoid
282:567--575,
1979
and
gene. Mature
653
EVOLUTION OF A COMPLEX EUCARYOTIC GENE
18. Brack C, Tonegawa S: Variable and constant parts of the immunoglobin light chain gene of a mouse myeloma cell are 1250 nontranslated bases apart. Proc Nat] Acad Sci USA 74:5652-5656, 1977 19. Maki R, Traunecker A, Sakano H, et al: Exon shuffling generates an immunoglobin heavy chain gene. Proc Nat] Acad Sci USA 77:2138-2142,198O 20. Early P, Huang H, Davis M, et al: An immunoglobin heavy chain variable region gene is generated from three segments of DNA: V,, D and Jn. Cell 19:981-992,198O 21. JetTreys AJ, Flavell RA. The rabbit &globin gene contains a large insert in the coding sequence. Cell 12:1097-l 108, 1977 22. Tilghman SM, Tiemeier DC, Seidman JF, et al: Intervening sequences of DNA identified in the structural portion of a mouse j?-globin gene. Proc Nat1 Acad Sci USA 75:725-729, 1978 23. Valenzuela P, Venegas A, Weinberg F et al: Structure of yeast phenylalanine-tRNA genes: an intervening DNA segment within the region coding for the tRNA. Proc Nat] Acad Sci USA 75:19&194,1978 24. Lomedicao P, Rosenthal N, Efstratiatis A, et al: The structure and evolution of the two nonallelic rat preproinsulin genes. Cell 18:545-558, 1979 25. Bell GI, Pictet RL, Rutter WJ, et al: Sequence of the human insulin gene. Nature 284:2633, 1980 26. Nunberg JH, Kaufman RJ, Chang ACY, et al: Structure and genomic organization of the mouse dihydrofolate reductase gene. Cell 19:355-364, 1980 27. Gorin MB, Tilgham SM: Structure of the cr-fetoprotein gene in the mouse. Proc Nat1 Acid Sci USA 77:1351-1355, 1980 28. Fyrberg EA, Kindle KL, Davidson N, et al: The actin genes of drosophila: a dispersed multigene family. Cell 19:365-378, 1980 29. Kato I, Kohr WJ, Laskowski MJ: Evolution of avian ovomucoids. Proceedings of the 1Ith FEBS Meeting 47:197-206, in Magnusson S, Ottesen M, Foltman B, et al (eds): Oxford, Pergamon press, 1978 30. Maxam AM, Gilbert W: A new method for sequencing DNA. Proc Nat] Acad Sci USA 74:506-564, 1977 31. Blobel G, Dobberstein B: Transfer of proteins across membranes II. Reconstitution of functional rough microsomes from heterologous components. J Cell Biol 67:852-862, 1975 32. Feeney RE, Allison RG: Inhibitors of proteolytic enzymes. In Evolutionary Biochemistry of Proteins. New York, Wiley Interscience. 1969 33. Feeney RE: The non-bond splitting mechanism of action of inhibitors of proteolytic enzymes-the conservative interpretation. Proceedings of the International Research Conference on Proteinase
Inhibitors, lst, Munich 1970, in Fritz H, Tschesche H (eds): Berlin, DeGruyter, 1971, pp 162-168 34. Lineweaver H, Murray CW: Identification of the trypsin inhibitor of egg white with ovomucoid. J Biol Chem 171:565-572, 1947 35. Bernard 0, Hozumi N, Tonegawa S: Sequences of mouse immunoglobin light chain genes before and after somatic changes. Cell 15:1133-1144,1978 36. Tonegawa S, Maxam AM, Tizard R, et al: Sequence mouse germ-line gene for a variable region of an immunoglobin chain. Proc Natl Sci USA 75: 1485-l 589, 1978
of a light
37. Cordell B, Bell G, Tischer E, et al: Isolation and characterization of a cloned rat insulin gene. Cell 18:533-543, 1979 38. Gilbert W: Why genes in pieces? Nature 271:501, 1978 39. Gilbert W: Eucaryotic gene regulation. ICN-UCLA Symposia on Molecular and Cellular Biology, 14, in Axe] R, Maniatis T, and Fox CF (eds): New York, Academic Press, 1979 40. O’Malley BW, Roop DR, Lai EC, et al: The ovalbumin gene: organization, structure, transcription, and regulation. Recent Prog Horm Res 35:l, 1979 41. Sakano H, Rogers JH, Huppi K, et al: Domains and the hinge region of an immunoglobin heavy chain are encoded in separate DNA segments. Nature 277:627-633, 1979 42. Craik CS, Buckman SR, Beychok S: Characterization of globin domains: heme binding to the central exon product. Proc Nat] Acad Sci USA 77:138&1388, 1980 43. Knapp G, Beckmann JS, Johnson PF, et al: Transcription and processing of intervening sequences in yeast tRNA genes. Cell 14:221-236, 1978 44. Blanchard JM, Weber J, Jelinek W, et al: In vitro RNARNA splicing in adenovirus 2 mRNA formation. Proc Nat1 Acad Sci USA 7553445348, 1978 45. Kinniburgh AJ, Ross J: Processing of the mouse @-globin mRNA precursor: at least two cleavage-ligation reactions are necessary to exercise the larger intervening sequence. Cell 17:915-921, 1979 46. Hamer DH, Leder P: Splicing and the formation of stable RNA. Cell l&1299-1302, 1979 47. Halbreich A. cytochrome b mRNA splicing steps and an 329, 1980 48. Seif I, Khoury analysis of preferred 6:3387-3398.1979
Pajot P, Foucher M, et al: A pathway of processing in yeast mitochondria: specific intron-derived circular RNA. Cell. 19:321G, Dhar R: BKV splice sequences based on donor and acceptor sites. Nucl Acids Res