Biochimica et Biophysica Acta 827 (1985) 283-297 Elsevier
BBA32116
Relationships between exons and the predicted structure of membrane-bound proteins
Patrick Argos and J.K. Mohana Rao *
Department of Biological Sciences, Purdue University, West Lafayette, IN 47907 (U.S.A.)
(Received May 17th, 1984) (Revised manuscript received November 5th, 1984)
Key words: Exon; Intron; Splice junction; Membrane-bound protein; Protein structure prediction
A prediction algorithm designed to detect transmembrane helices and connecting exposed turn regions was applied to the primary sequences of five different lipid-bound proteins whose intron/exon structures are known. It was found that the splice junctions largely map to the predicted surface segments and that the number of junctions correlates with the length of the surface spans in the five proteins. Implications for the evolutionary development and mechanism of exon utilization are discussed.
* To whom correspondence should be addressed.
0167-4838/85/$03.30 © 1985 Elsevier Science Publishers B.V.

Introduction
In the 1970's it was discovered that many eukaryotic genes do not possess a contiguous DNA region that codes for proteins. Instead, non-coding segments (introns) were interspersed amongst coding regions (exons) such that untranslated spans were excised from the initial RNA transcript to yield a message translatable at the ribosome into the final protein product. The intron/exon boundaries at the nucleotide level have been referred to as splice junctions. At the protein level, the position of the two amino acids coded for by the termination region of one exon and by the initial segment of the following exon will be referred to as exon/exon junctions. Considerable controversy has developed as to the purpose, etiology and mechanism of such splice junctions. One hypothesis states that exons represent structural and functional domains in proteins and therefore constitute the basic genetic building blocks fused through intron-mediated recombination [1,2]. The intrusion model allows the disruption of preexisting genes through the insertion of an exon flanked by introns. Craik et al. [3] have recently proposed for soluble proteins the 'sliding' junction hypothesis where genetic mutation alters a splice junction, resulting in an increased length for a surface exposed polypeptide loop. Reasons for the splice junctions range from eukaryotic attempts at functional diversity and complexity to stabilization of duplicate genes by acting as barriers to recombination [4]. It has also been suggested that the junctions existed in a primitive genetic structure which the eukaryotes have maintained but the prokaryotes have since eliminated [5-7]. The issues have been further complicated by the intricacy of soluble protein tertiary architecture consisting of various patterns of four structural types: helices, β-strands, turn conformations and coil structures. It is often difficult to imagine how exons which code for extensive secondary structures, or just a few, or which terminate coding in the middle of a β-strand or helix, might result in a functional and structurally stable protein. Craik et al. [3,8] have recently postulated, through an examination of the exonic and tertiary structures of
selected soluble proteins, that sliding splice junctions map to the protein surface and therefore provide minimal disturbance to the central core region largely responsible for protein folding integrity. Membrane-bound proteins display a much simpler architecture, at least as far as structural units are concerned. They are thought to be generally composed of successive transmembrane helical segments connected by surface exposed loops or turn regions. The only lipid-embedded polypeptide topology known to a reasonably high resolution has been determined by electron-scattering techniques for the Halobacterium halobium purple membrane protein (bacteriorhodopsin) whose seven rods of electron density are believed to be α-helical segments traversing the membrane [9,10]. Prediction algorithms [11,12] have been developed to detect the 20-to-25-residue helical segments which are largely composed of hydrophobic residues. Several structures have been proposed (cf. Refs. 13, 14) which are in agreement with present empirical evidence. In the present work, the prediction method of Argos et al. [11] was applied to five functionally distinct lipid-bound proteins whose exonic structure is known in an attempt to shed further light on the mechanism and etiology of splice junctions. It was found that the exon/exon boundaries almost exclusively occur in the predicted exposed regions, in agreement with the observations of Craik et al. [3] on soluble proteins. Furthermore, many of the exon boundaries are within the longest exposed segments. The structural content and positioning of the exon junctions are discussed relative to the three prevailing models (domain, intrusion and sliding junction hypotheses) of splice junction mechanism and development.

Methods and data
Prediction algorithm
A method has been developed by Argos et al. [11] to delineate hydrophobic regions within a primary sequence that are likely candidates for transmembrane helices. Though details and applications of the technique are given by Argos et al. [11] and Mohana Rao et al. [15], a brief summary will be given here.
The only membrane-bound polypeptide topology known to a reasonably high resolution has been determined by electron-scattering techniques for the purple-membrane protein of Halobacterium halobium [9,10]. Through an examination of electron densities calculated from electron and neutron diffraction data as well as model building analysis, seven specific regions in the bacteriorhodopsin primary structure have been delineated as transmembrane helices [16,17]. In the present prediction algorithm, weighted 'smoothed' curves of five residue physical characteristics versus the bacteriorhodopsin amino acid sequence number were combined to give the best fit to a theoretical curve that corresponded to the bacteriorhodopsin model of lipid-bound helices and exposed turn regions connecting the helical structures. The smoothing process was performed by a linear least-squares fit over successive, contiguous seven-point clusters. The five physical parameters used would be expected to detect membrane-buried and exposed turn spans: amino acid hydration potential, polarity, lipid-buried transfer free energy, residue turn propensity, and residue bulk [11]. The final weighted, summed and smoothed curve was normalized to a value of 0.0. The two rules (one of which must be satisfied) used to delineate the helical segments in bacteriorhodopsin included (i) a positive peak with maximum value greater than 1.0 and containing at least a 13-residue span which could be expanded to at least 16 residues (if necessary) until encountering a charged residue and (ii) a positive peak with maximum less than 1.0 and containing at least 18 residues. A 16-amino acid α-helix, representing about 25 Å along its axis, was chosen as the minimum-length helix required to traverse the lipid portion of a typical membrane. Furthermore, no more than three charged (K, R, D, E) residues were allowed in any one transmembrane region. If the peak value was less than 1.0, no more than three charged or strongly polar (Q, N) residues were allowed. The bacteriorhodopsin physical characteristic curve is shown in Fig. 1, while Table I displays the helical spans determined from application of the above rules and by Trewhella et al. [17] as well as the helical lengths, charged and polar residues contained within them, peak maxima and helical designations corresponding to the characteristic
peaks of Fig. 1. The two models clearly agree in seven transmembrane spans and select the same sequence positions at the 85% agreement level. The same rules and prediction algorithm were applied to the other membrane-associated proteins discussed in this paper to detect their putative helical regions. In a previous paper applying the same prediction algorithm discussed here, Argos et al. [11] ascertained the predicted bacteriorhodopsin helical boundaries by using only the positive algorithm values. In the present prediction, the aforementioned formalized rules were used. Nonetheless, the helical boundaries did not differ by more than one residue position on the average.
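The two delineation rules and the residue limits described above can be written as a short check. The Python below is a minimal sketch under stated assumptions, not the authors' program: the function names, the half-open (start, end) indexing, the alternating left/right order of expansion, and the handling of a peak maximum of exactly 1.0 under rule (i) are illustrative choices that the text does not specify.

```python
CHARGED = set("KRDE")          # charged residues named in the text
STRONGLY_POLAR = set("QN")     # strongly polar residues named in the text
MIN_HELIX_LENGTH = 16          # minimum membrane-spanning helix length


def expand_span(seq, start, end, target=MIN_HELIX_LENGTH):
    """Grow the half-open span [start, end) one residue at a time, alternating
    left and right, until it reaches `target` residues. Growth on a side stops
    when a charged residue or the sequence end is met (assumed reading)."""
    can_left, can_right = True, True
    while end - start < target and (can_left or can_right):
        if can_left:
            if start > 0 and seq[start - 1] not in CHARGED:
                start -= 1
            else:
                can_left = False
        if end - start < target and can_right:
            if end < len(seq) and seq[end] not in CHARGED:
                end += 1
            else:
                can_right = False
    return start, end


def accept_helix(seq, start, end, peak_max):
    """Return the accepted (start, end) span, or None if the peak fails.

    seq      : one-letter amino acid sequence
    start,end: half-open span of the positive peak (hypothetical convention)
    peak_max : maximum of the smoothed characteristic curve within the peak
    """
    if peak_max >= 1.0:                      # rule (i); tie at 1.0 is an assumption
        if end - start < 13:
            return None
        start, end = expand_span(seq, start, end)
        if end - start < MIN_HELIX_LENGTH:
            return None
    elif end - start < 18:                   # rule (ii)
        return None

    segment = seq[start:end]
    n_charged = sum(r in CHARGED for r in segment)
    n_polar = sum(r in STRONGLY_POLAR for r in segment)
    if n_charged > 3:
        return None
    if peak_max < 1.0 and n_charged + n_polar > 3:
        return None
    return start, end
```

A 13-residue peak in an uncharged stretch is widened to 16 residues, whereas the same peak flanked by charged residues on both sides is rejected because it cannot be expanded.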
Lipid-associated protein data base
The membrane-associated proteins used in this study, along with references for their primary sequences and exon/intron structures, are bovine rhodopsin [18], human α-subunit precursor of muscle acetylcholine receptor [14], rat liver phenobarbital-inducible cytochrome P-450 [19], subunit I of Saccharomyces cerevisiae (strain D273-10B) cytochrome oxidase [20], and cytochrome b of S. cerevisiae (strain D273-10B) [21].
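[Fig. 1 plot. Panel title: Bacteriorhodopsin, Halobacterium halobium. Predicted helices are numbered 1 to 7 above the curve. Horizontal axis: amino acid sequence number (0 to 250). Vertical axis: characteristic value (-3.0 to 2.0).]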
Fig. 1. Plot of the amino acid sequence number for purple membrane protein from H. halobium [74] versus a weighted five-parameter characteristic value for a given residue. The smoothed curve has been normalized to a value of 0.0. Details of the prediction method are given by Argos et al. [11].
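The construction of the curve in Fig. 1 (smoothing, weighted summation, normalization) can be illustrated as follows. This is only a sketch under several assumptions: the seven-point least-squares smoothing is read here as a sliding window with a straight-line fit evaluated at the window centre, normalization to 0.0 is read as subtraction of the curve mean, and the function names, profiles and weights are hypothetical placeholders rather than the published parameter scales of Argos et al. [11].

```python
def fit_line_value(xs, ys, x0):
    """Least-squares straight line through (xs, ys), evaluated at x0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:                       # single-point window: no slope defined
        return mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y + slope * (x0 - mean_x)


def smooth(values, window=7):
    """Smooth a per-residue profile with a sliding linear least-squares fit.
    Windows are truncated at the sequence ends (an assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        xs = list(range(lo, hi))
        smoothed.append(fit_line_value(xs, values[lo:hi], i))
    return smoothed


def characteristic_curve(profiles, weights):
    """Weighted sum of smoothed profiles, shifted so that its mean is 0.0.

    profiles: list of per-residue property profiles of equal length
              (five in the paper; the values themselves are not given here)
    weights : one hypothetical weight per profile
    """
    smoothed = [smooth(p) for p in profiles]
    n = len(profiles[0])
    combined = [sum(w * s[i] for w, s in zip(weights, smoothed)) for i in range(n)]
    mean = sum(combined) / n
    return [c - mean for c in combined]
```

Positive stretches of the resulting curve would then be the candidate peaks to which the helix delineation rules are applied.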
Soluble protein data base
The list of soluble proteins and their species used to compile amino acid compositions at exon boundaries is given in Table II. References which relate the associated nucleotide sequence exon/intron structure are also listed. The residues counted include the amino acids on either side of an
TABLE I
THE HELICAL SEQUENCE SPANS DELINEATED BY THE PREDICTION ALGORITHM (FIG. 1) FOR PURPLE MEMBRANE PROTEIN FROM HALOBACTERIUM HALOBIUM
The notation used is described in the caption of Table IIIa. The numbers in parentheses given alongside the helical segments are those given by Trewhella et al. [17] from their bacteriorhodopsin model.

Helical spans          Helical   Charged    Polar      Peak     Designation
                       length    residues   residues   height   (Fig. 1)
(9)Nt
10-29α (9-29)          20        0          0          1.08     1
(14)t
44-69α (42-62)         26        0          0          1.14     2
(16)t
86-101α (79-99)        16        D          0          1.00     3
(6)t
108-128α (107-127)     21        D          0          0.61     4
(6)t
135-156α (136-156)     22        0          0          1.77     5
(19)t
176-193α (176-196)     18        0          N          0.64     6
(9)t
203-224α (204-224)     22        E, D, K    0          0.98     7
(25)Ct
TABLE II
LIST OF GENES AND ASSOCIATED REFERENCES USED TO COMPILE THE AMINO ACID COMPOSITIONS NEAR EXON/EXON JUNCTIONS
Gene: Actin, α; Actin; Amylase; Antitrypsin, α1; Carboxypeptidase A; Chorion; Chymotrypsin B; Corticotropin/β-lipotropin; Dihydrofolate reductase; E1A (transforming protein); Elastase II; Fetoprotein, α; Globin, α; Growth hormone; Immunoglobulin (heavy chain Cv); Immunoglobulin (heavy chain "t~); Immunoglobulin (heavy chain μ); Immunoglobulin (heavy chain, J region); Insulin; Lysozyme; Metallothionein-I; Ovalbumin; Ovomucoid; Preproinsulin; Pro α2(I) collagen; Prolactin; T-antigen; Transplantation antigens; Trypsin I; X gene
Species: chicken; Drosophila; maize and soybean; S. cerevisiae; rat; human; rat; silkmoth; rat; bovine; mouse; adenovirus type 12; rat; mouse; chicken; human; rat; mouse; mouse; mouse; mouse; human; chicken; mouse; chicken; chicken; rat; chicken; chicken; rat; SV40; mouse; rat; chicken
Ref.: 22; 23; 24; 25; 26; 27; 28; 29; 3; 30; 31; 32; 3; 33; 34; 35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48, 49; 50; 51; 52; 3; 44

exon/exon boundary. If the code for a residue was interrupted by a splice junction, then it and the two adjacent residues in the protein sequence were utilized in the count. The total number of amino acids satisfying the above conditions was 290.

Results
Application of prediction algorithm
The exon/intron structure is known for five membrane-associated proteins: bovine rhodopsin, human α-subunit precursor of muscle acetylcholine receptor, rat liver phenobarbital-inducible
cytochrome P-450, subunit I of S. cerevisiae (strain D273-10B) cytochrome oxidase, and cytochrome b from S. cerevisiae (strain D273-10B). References describing the primary and exon/intron structures are given in the Methods and Data section. A prediction algorithm, which attempts to delineate membrane-buried helical regions and exposed connecting turn spans, was applied to the primary sequences of the five proteins. The procedures and rules of the technique are given in the Methods and Data section. The resulting weighted, summed and smoothed residue physical characteristic versus sequence number curves are shown in
[Fig. 2 plots, panels a to e. Panel titles: (a) Bovine rhodopsin; (b) Acetylcholine receptor α-subunit, human; (c) Cytochrome P-450, rat liver; (d) Cytochrome oxidase subunit I, yeast S. cerevisiae; (e) Cytochrome b, yeast S. cerevisiae. Horizontal axis: amino acid sequence number. Vertical axis: characteristic value. Predicted helices are numbered above each curve; S and X mark the segments described in the caption.]
Fig. 2. (a) Plot of the amino acid sequence number for bovine rhodopsin versus a weighted five-parameter characteristic value for a given residue. The smoothed curve has been normalized to a value of 0.0. Details of the prediction method are given by Argos et al. [11]. An asterisk (*) within the curve indicates an exon/exon junction in the primary structure. The helical regions are numerically designated above the curve as also given in Table IIIa. (b) As (a) except for the human α-subunit precursor of muscle acetylcholine receptor. The helices are numerically designated above the curve as also given in Table IIIb. The symbol S refers to the predicted signal sequence helix. (c) As (a) except for the rat liver phenobarbital-induced cytochrome P-450. The helices are numerically designated above the curve as also given in Table IIIc. The symbol S refers to the predicted signal sequence helix. (d) As (a) except for subunit I of S. cerevisiae (strain D273-10B) cytochrome oxidase. The helices are numerically designated above the curve as also given in Table IIId. The symbol X refers to sequence spans that did not conform to the rules (discussed in the Methods and Data section) used in accepting a region as a membrane-buried helix; however, these spans came very close to satisfying the criteria as they were 15 residues long with the rules requiring 16. The regions could possibly be transmembrane helices but are not regarded as such in the text. (e) As (a) except for cytochrome b of S. cerevisiae (strain D273-10B). The helices are numerically designated above the curve as also given in Table IIIe.
Fig. 2; the exon/exon junctions within the primary structures are shown as asterisks on the plots. The lipid-buried helical regions predicted are given in Tables IIIa-e. It is noteworthy that the algorithm was able to identify 85% of the helical residues in purple membrane protein of Halobacterium halobium, the best known membrane-buried structure to date [17]. The method suggests seven transmembrane helices in bovine rhodopsin which is supported by considerable empirical evidence: circular dichroism [53], linear dichroism [54,55] and neutron diffraction [56]. The four-helical model for the acetylcholine receptor (excluding the signal
span) is also suggested by experiment as discussed by Noda et al. [14,57]. There are two regions (designated X in Table III and Fig. 2d) in cytochrome oxidase that were just shy (15 residues) of passing the 16-residue length criterion; these regions are not considered as predicted helices, which
TABLE IIIa
THE SEQUENCE SPANS DELINEATED BY THE PREDICTION ALGORITHM (FIG. 2a) FOR BOVINE RHODOPSIN
The symbols (α) and (t) refer respectively to predicted lipid-buried helical and exposed turn regions. The length of the transmembrane regions, charged (Lys, K; Arg, R; Asp, D; Glu, E) and strongly polar (Gln, Q; Asn, N) amino acids contained within the predicted helices (zero if none), physical characteristic peak maxima and helical designation (S, signal sequence; X, marginal prediction according to rules of helical delineation) corresponding to the curve of Fig. 2a are also given. The numbers in parentheses between the helical spans are the lengths of the predicted exposed turn regions, with N and C designating respectively the surface N- and C-terminal segment lengths. The primary sequence positions for the exon/exon junctions are given as a single number if the intron interrupts the nucleotide coding triplet, and as two position numbers separated by a slash if the intron interrupts after a complete amino acid code.

Helical     Exon/exon    Helical   Charged    Polar      Peak     Designation
spans       boundary     length    residues   residues   height   (Fig. 2a)
(36)Nt
37-60α                   24        0          N          2.40     1
(14)t
75-94α                   20        D          N          1.54     2
(20)t
115-139α    121          25        E, E, R    0          1.70     3
(13)t
153-173α                 21        0          0          1.90     4
(28)t       177
204-230α                 27        0          Q          2.56     5
(22)t       232/233
253-276α                 24        0          0          2.90     6
(9)t
286-307α                 22        K          N          0.68     7
(42)Ct      312/313
TABLE IIIb
As Table IIIa except for the human α-subunit precursor of muscle acetylcholine receptor. The results given here were taken from the prediction curve of Fig. 2b.

Helical     Exon/exon                  Helical   Charged    Polar      Peak     Designation
spans       boundary                   length    residues   residues   height   (Fig. 2b)
(2)Nt
3-21α       15                         19        0          0          1.99     S
(208)t      63, 78/79, 115, 180/181
230-255α                               26        0          N          2.53     1
(7)t        260
263-284α                               22        E          0          3.26     2
(12)t
299-316α                               20        0          0          3.33     3
(112)t      334/335, 414/415
429-448α                               20        0          0          3.33     4
(11)Ct
TABLE IIIc
As Table IIIa except for rat liver phenobarbital-induced cytochrome P-450. The results given here were taken from the prediction curve of Fig. 2c. Helices 4 and 5 are not predicted to have a buried turn segment between them.

Helical     Exon/exon            Helical   Charged    Polar      Peak         Designation
spans       boundary             length    residues   residues   height       (Fig. 2c)
(2)Nt
3-22α                            20        R          0          4.10         S
(42)t       57/58
65-84α                           20        R, D       0          0.95         1
(82)t       112, 162
167-185α                         19        0          Q, N       2.00         2
(9)t
195-214α                         20        R, E, R    0          1.13         3
(73)t       215/216, 274/275
288-322α                         35        E, R, K    0          2.03, 1.51   4, 5
(117)t      322, 384/385, 431
440-460α                         21        R, E       Q, N, N    1.81         6
(32)Ct

TABLE IIId
As Table IIIa except for subunit I of S. cerevisiae (strain D273-10B) cytochrome oxidase. The results given here were taken from the prediction curve of Fig. 2d.

Helical     Exon/exon           Helical   Charged    Polar      Peak     Designation
spans       boundary            length    residues   residues   height   (Fig. 2d)
(14)Nt
15-36α                          22        0          0          1.56     1
(18)t       55/56
55-78α      66/67               24        0          0          2.32     2
(20)t       78/79
99-117α                         19        0          0          1.40     3
(26)t
144-166α                        23        0          N          0.72     4
(13)t
180-208α                        29        0          0          2.09     5
(32)t       237/238
241-255α                        15        E          0          0.83     X
(14)t
270-289α                        20        0          0          1.25     6
(16)t
306-325α                        20        K          0          0.72     7
(6)t
332-357α                        26        0          0          1.91     8
(17)t       374/375
375-389α                        15        0          0          0.91     X
(20)t
410-429α                        20        0          Q, N       1.42     9
(20)t
450-471α                        22        0          0          1.67     10
(31)Ct      475/476, 485/486
TABLE IIIe
As Table IIIa except for cytochrome b from S. cerevisiae (strain D273-10B). The results given here were taken from the prediction curve of Fig. 2e.

Helical     Exon/exon           Helical   Charged    Polar      Peak     Designation
spans       boundary            length    residues   residues   height   (Fig. 2e)
(31)Nt
32-52α                          21        0          Q          1.79     1
(27)t
80-98α                          19        0          N          1.28     2
(15)t
114-134α                        21        0          N          2.10     3
(46)t       139, 143/144, 169
181-201α                        21        0          0          2.12     4
(26)t
230-248α                        19        0          0          2.72     5
(40)t       252/253, 270/271
289-307α                        19        0          0          2.53     6
(12)t
320-337α                        18        K          N          1.88     7
(11)t
349-373α                        25        0          Q          1.81     8
(12)Ct
has little effect on the general conclusions reached in this paper.
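The assignment of each exon/exon boundary to a predicted helix or to an exposed turn can be checked directly against the spans of Table III. The sketch below uses the bovine rhodopsin values of Table IIIa. It is illustrative only: the function names are hypothetical, and the one-residue tolerance at a helix/turn boundary is one possible reading of the assignment rule used in the Results, not a procedure stated in full by the source.

```python
# Predicted helical spans (first, last residue) for bovine rhodopsin, Table IIIa
RHODOPSIN_HELICES = [(37, 60), (75, 94), (115, 139), (153, 173),
                     (204, 230), (253, 276), (286, 307)]

# Exon/exon boundaries from Table IIIa: a single number means the intron
# interrupts that residue's codon; "a/b" means it falls between residues a and b
RHODOPSIN_JUNCTIONS = ["121", "177", "232/233", "312/313"]


def junction_residues(junction):
    """Residue positions named by a junction label such as '121' or '232/233'."""
    return [int(p) for p in junction.split("/")]


def classify(junction, helices, tolerance=1):
    """Return 'helix' if every residue of the junction lies inside one helix
    and more than `tolerance - 1` residues away from both helix ends;
    otherwise return 'turn'. With tolerance=1, a junction touching the first
    or last residue of a helix is assigned to the turn (assumed reading)."""
    positions = junction_residues(junction)
    for first, last in helices:
        if all(first + tolerance <= p <= last - tolerance for p in positions):
            return "helix"
    return "turn"


if __name__ == "__main__":
    for j in RHODOPSIN_JUNCTIONS:
        print(j, classify(j, RHODOPSIN_HELICES))
```

Under this reading only the junction at residue 121 falls within a predicted rhodopsin helix (helix 3), and a junction such as 55/56 against a helix spanning 55-78 is assigned to the turn.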
Exon junctions in exposed turn regions
There are 32 exon/exon boundaries in the five lipid-associated proteins; 29 of these appear in predicted exposed turn segments. If a junction occurred within one residue of a predicted helix/turn boundary, of which there were three examples, the junction was assigned to the turn segment. Of the three exon boundaries appearing in predicted helical segments, one was within six residues of a helix C-terminus (the signal helix of acetylcholine receptor) and one within six residues of a helix N-terminus (helix 3 of bovine rhodopsin), while only one was in the middle of a helix (helix 2 of cytochrome oxidase). It is possible that the former two junctions are within exposed turn regions and the prediction algorithm is simply inaccurate. The latter splice junction is within a strongly predicted transmembrane span. There is a tendency for the exon/exon boundaries to be within the longest turn segments. Table IV lists the predicted exposed spans in each
protein by length; it is clear that only one of the 29 exposed exon/exon boundaries does not occur within the longest turn spans. Furthermore, the structure associated with 12 of the exons is predicted as a sequentially contiguous surface feature. Another indication that exon/exon junctions may be in surface exposed regions is derived from an analysis of the amino acid compositions near the junction. The two residues adjacent to a boundary were counted; if the intron insertion interrupted a nucleotide coding triplet for an amino acid, then that residue was also placed in the statistics along with the two neighboring residues, resulting in a total count of 74 residues for both cases. Three amino acids were responsible for 35% of the total composition: glycine (14.9%), asparagine (10.8%) and glutamine (9.4%). The conformational preferences (normalized to a value of 1.0) for these amino acids to be in a turn configuration [58] are, respectively, 1.76, 1.48 and 1.00 for soluble proteins. Turn segments are almost universally found on the exposed surfaces of proteins. Furthermore, glutamine and asparagine are strongly polar residues. Similar statistics were
TABLE IV
LIST OF RESIDUE LENGTHS OF SEQUENCE SEGMENTS PREDICTED NOT TO BE MEMBRANE-BURIED IN THE FIVE PROTEINS EXAMINED
Asterisks (*) indicate the number of exon/exon boundaries within the sequentially contiguous surface structure.

Cytochrome    Cytochrome b   Cytochrome   Rhodopsin   Acetylcholine
oxidase I                    P-450                    receptor α
65 *          46 ***         117 ***      42 *        208 ****
56 *          40 **          103 **       36          112 **
31 **         31             73 **        28 *        12
26            27             42 *         22 *        11
20 *          26             32           20          7 *
18 *          15             9            14          2
16            12             2            13
14            11                          9
13
6

also gathered for soluble proteins which are listed and referenced in Table II. The total count was 290 residues with glycine (13.1%), serine (9.0%), and glutamic acid (8.6%) providing 31% of the total composition. Their turn preferences are 1.76, 1.29, and 0.78 respectively. Serine is polar and the glutamate is charged, once again pointing to the possible surface exposure of the amino acids in the protein tertiary architecture. Craik et al. [3,8] have examined splice junctions in proteins whose atomic resolution structures have been determined by X-ray crystallographic techniques; they observe that the intron/exon splice junctions map at the protein surface. There is a tendency for the exon junctions in the predicted structures of the five lipid-buried proteins to map near helical termini. For each protein the number of helical termini, including those of signal peptides, was divided into the total number of amino acids. The number of residues per terminus was then averaged over the five proteins resulting in a mean of 17 which would be the expected average distance of a randomly selected position from any helical boundary. The average distance of the 29 junctions from the nearest helical terminus was about ten residues, which is clearly less than the expected 17. The longest residue separations (42 and 50) were in acetylcholine receptor. If the latter were eliminated from the statistics, the average over 27 exon boundaries would be only eight residues. Thirteen junctions
are within six residues of a predicted helical boundary. The non-buried segments that are wholly contained within one exon are much shorter in length than those broken by exon/exon junctions. The average length of predicted exposed spans (of which there are 41) is 35 residues, while the mean length of such predicted segments appearing within an exon (of which there are 24) is 17 residues, half the average length of all the exposed regions. These statistics suggest that, when an exon contains two or more transmembrane helices with connecting surface links, there appears to be little attempt to tamper with or lengthen such exposed loops. There appears to be no clear-cut pattern that exons in lipid-associated proteins code for a single transmembrane helical segment flanked by turn regions; i.e., no basic structural unit is inferred for building the proteins by domain addition. A count was made of the number of complete predicted transmembrane helices contained within a single exon. For the three cases where an exon boundary predicted to be in a helical interior was within six residues of a helix terminus, the helix was considered within the exon where a majority of its residues resided; for the one exon boundary which divides a helix, half was assigned to one exon and the other half to the following exon. The statistics revealed 15 exons not containing predicted helices (α), two with 0.5 α, nine with 1 α, seven with 2 α, and four with 3 α.
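The distance of a junction from the nearest predicted helical terminus, and the residues-per-terminus reference value used above, can be computed as sketched below. The example uses the bovine rhodopsin spans and boundaries of Table IIIa; the function names are hypothetical, and measuring the distance from whichever residue of an "a/b" junction lies closest to a terminus is an assumption, since the text does not state the convention.

```python
# Predicted helical spans (first, last residue) for bovine rhodopsin, Table IIIa
RHODOPSIN_HELICES = [(37, 60), (75, 94), (115, 139), (153, 173),
                     (204, 230), (253, 276), (286, 307)]
RHODOPSIN_JUNCTIONS = ["121", "177", "232/233", "312/313"]


def nearest_terminus_distance(junction, helices):
    """Smallest separation, in residues, between any residue named by the
    junction label and any helix end (first or last residue of a span)."""
    positions = [int(p) for p in junction.split("/")]
    termini = [t for span in helices for t in span]
    return min(abs(p - t) for p in positions for t in termini)


def residues_per_terminus(n_residues, helices):
    """Total number of residues divided by the number of helical termini,
    the per-protein quantity that the text averages over the five proteins."""
    return n_residues / (2 * len(helices))


if __name__ == "__main__":
    for j in RHODOPSIN_JUNCTIONS:
        print(j, nearest_terminus_distance(j, RHODOPSIN_HELICES))
```

Applying the same functions to the spans of Tables IIIb-e would give the per-junction distances that enter the averages quoted in the text.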
For two of the five proteins examined, bacterial analogues exist: bacteriorhodopsin with bovine rhodopsin and rat cytochrome P-450 with Pseudomonas putida P-450. Gotoh et al. [59] have attempted to align the cytochrome sequences, while helical and turn regions in the two rhodopsins can be compared by virtue of their seven-helical predicted structures. Table V displays the differences in lengths of the comparable exposed regions. The eukaryotic protein appears to show a lengthened exposed segment relative to the prokaryotic protein. There is also a correlation with the appearance of exons and the extent to which the eukaryotic spans are expanded. However, since there are only two examples, any inferences must be viewed with reservation.

Functional role of exonic structures
Though the structure-function relationships in the five membrane-bound proteins considered here are not well known, almost all of the primary sequence regions or specific residues thought to be functionally significant are near exon/exon junctions. The supporting evidence for each protein will be given in the succeeding paragraphs. The rhodopsin chemistry, structure, and topography have been recently reviewed by Hargrave [60]. Phosphorylation of rhodopsin, which is light dependent, appears to play an important role in the molecule's function. Models have been proposed that correlate rhodopsin phosphorylation and closing of Ca2+ channels opened as a result of bleaching. Another model relates phosphorylation to the regulation of rod outer cell segment cyclic nucleotide metabolism along with its relationship to visual transduction which involves the attachment of GTP-binding protein to bleached rhodopsin. The sites of phosphorylation are threonines and serines spanning the rhodopsin C-terminal residues 334 to 343 which are contained within the putatively exposed C-terminal exon initiating at residue 313. This exon also provides the longest contiguous region on the surface of the rhodopsin's suggested structure. Physical, chemical and kinetic studies on Paracoccus denitrificans cytochrome c oxidase complex
TABLE V
THE LENGTH OF PREDICTED EXPOSED SEGMENTS WITHIN THE IDENTIFIED PROTEINS FROM THEIR N- TO C-TERMINI
The successive helical regions are designated merely by (α) and the turn spans by (t). An asterisk (*) indicates an exon/exon junction within the eukaryotic surface segment while (Δt) is the (eukaryotic-prokaryotic) length. The numbers in parentheses given for the P. putida P-450 show the shortening of the prokaryotic helical lengths according to the alignment of Gotoh et al. [59], which may suggest a different structure for this protein.

Bovine          H. halobium     Δt(a - b)
rhodopsin (a)   rhodopsin (b)
36Nt            9Nt             +27
α               α
14t             14t             0
α               α
20t             16t             +4
α               α
13t             6t              +7 *
α               α
28t             6t              +22 *
α               α
22t             19t             +3
α               α
9t              9t              0
α               α
42Ct            25Ct            +17 *

Rat             P. putida       Δt(c - d)
P-450 (c)       P-450 (d)
2Nt             1Nt             +1
α               α(0)
42t             36t             +6 *
α               α(3)
82t             67t             +15 **
α               α(10)
9t              5t              +4
α               α(8)
73t             62t             +11 **
α               α(10)
117t            95t             +22 ***
α               α(0)
32Ct            34Ct            -2
which contains only two subunits suggest that the heme a3-Cua3 oxygen binding site is contained within subunit I [61,62]. Through the use of secondary structure prediction techniques, residue physical characteristics, and a comparison of conserved histidine sequences in the three mammalian cytochrome oxidase units with those spans in globin superfamilies, Welinder and Mikkelsen [63] propose histidines 233 and 376 as the respective distal (oxygen-binding) and proximal ligands to the protoporphyrin IX iron. In the S. cerevisiae (strain D273-10B) subunit I primary structure the comparable histidines are located at positions 232 and 375. These residues are close to exon boundaries (237 junction between exons 4 and 5 and the 374 junction for exons 5 and 6), implying that the exons may have been inserted to provide the heme binding functions. Welinder and Mikkelsen [63] also note that a comparable exon junction relative to the distal histidine exists in the globin structures.

Though the structural and functional chemistry of cytochrome P-450 is far from delineated, several suggestions have been made for the important functional loci in the molecule. Through the use of secondary structure predictions, residue physical characteristics and amino acid sequence homologies amongst rat and rabbit liver microsomal cytochrome P-450 and camphor-hydroxylating P-450 of P. putida, Gotoh et al. [59] have proposed that cysteines-152 and -436 of the rat sequence are likely candidates for one of the ligands to the heme iron. They suggest further that neighboring residues (430 to 450 and 142 to 159) could provide the proper heme attachment pocket. Black et al. [64] also propose cysteine-152 for heme liganding by adding the sequence of liver microsomal isozyme 2 cytochrome P-450 from rabbit to the comparison statistics. Heinemann and Ozols [65] indicate that the largely hydrophilic region (residues 320 to 443) of the rabbit primary structure may bind the reductase, while Black and Coon [66], through an examination of rabbit reductase cleavage products, suggest a cluster of reductase basic residues (33 to 36) as the interaction region with the P-450 molecule. It is noteworthy that cysteine-152, putatively near the heme, is surrounded by glutamate residues [64] which could attach to the reductase basic region. A similar model has been proposed for bovine cytochrome b5 and its reductase [67]. Once again, exon boundaries are near the suggested functional residues. Exon 2 for the rat P-450 spans an exposed predicted turn region (112 to 162) and its C-terminal end corresponds well to residues 142 to 159 in the putative heme pocket environment around cysteine-152. This region also contains the glutamates possibly responsible for reductase recognition. Similarly the N-terminal position of exon 9, initiated at residue 431 and containing cysteine-436, spans residues 431 to 540, which compose the other possible heme-binding environment. The predicted signal helix constitutes the only transmembrane element in the P-450 exon 1.

Primary structures for the four subunits (α, β, γ, and δ) of fish acetylcholine receptors as well as the calf and human α-subunits have all been recently determined [14,57,68-70]. All the researchers propose a four-helical transmembrane model (excluding the signal region) for the highly homologous subunit sequences [57] and present some empirical evidence (e.g., proteolysis) for its correctness. Their model is in good agreement with the one presented here. Noda et al. [57] favor the first transmembrane segment (human α-subunit residues 230 to 255) as the most highly specific and functional of the four helices due to the strong conservation of its residues (especially the three prolines and one cysteine) in all four subunits. Exon 6 of the human precursor is devoted to an exposed turn segment and this helix. All the researchers point to the functional significance of the conserved S-S bridge (cysteines 148 and 162 of the human α precursor) and the conserved glycosylated asparagine-161. These amino acids are contained within a putatively exposed exonic region spanning positions 115 to 180 in the human subunit.
The acetylcholine receptor binding site has been suggested to be near the S-S region of exon 5 or near the main immunogenic region spanning residues 181 to 186 in the human protein [14,57]. The latter segment forms the N-terminus of exon 6. Thus, exons 5 and 6 may code for important functional regions of the receptor and possess differentiable tasks. A majority of the predicted signal peptide helix is contained in the first acetylcholine receptor exon. There appears to be insufficient knowledge regarding possible exposed sequence areas important
to the function of the photosynthetic cytochrome b. However, it is noteworthy that all the cytochrome b exon junctions occur within the two longest exposed predicted turn regions, which may point to their functional significance. Widger et al. [71] have recently proposed a cytochrome b model involving the binding of two hemes by four histidines contained in predicted transmembrane helices 2 and 4 of this work.
Discussion and evolutionary implications

It might be argued that, since structural prediction methods based on only one reasonably well-known membrane-bound structure (bacteriorhodopsin) are utilized, any inferences drawn from the predicted models are invalid. However, it is noteworthy that the algorithm delineates a seven-helical model for bovine rhodopsin and a four-helical structure for the acetylcholine receptor subunit, both of which are supported by considerable empirical evidence. It might also be argued that the helical boundaries cannot be delineated with sufficient accuracy to examine their correlation with exon/exon sites. It is unlikely that the predicted helical termini could vary by more than five residue positions, which represents about 30% of the minimal required length for a hydrophobic helical span considered here. Of the 32 exon/exon sites, 17 (which are predicted to be in exposed regions) are removed from predicted helical termini by six or more residue positions. In seven further exon sites which are within five residues of a predicted helical boundary and one predicted to be in a loop, there are at least five charged (K, D, E, R) and polar (Q, N, T, S) amino acids contained within ten successive residue positions centered at the exon/exon position. In the final eight exon/exon cases, one is clearly predicted near a helical midpoint, while two are predicted within helices and are within six residues of a boundary; the remaining five exon sites are predicted to be outside of buried helices but within five residues of the boundary. Thus 75% (24 cases) of the exon/exon sites do not appear to be within hydrophobic spans, while nearly 16% (five cases) are close to a predicted helical terminus and yet not predicted within the helix. Since only 39% of the residue sample is predicted to be within the membrane, the 75-91% prediction of exon/exon sites not contained within predicted buried helices is at least a factor of 2 greater than the random-expected value. There appears to be no a priori reason that predicted structural conformations and relative positions of exon/exon locations should display any relationship. Nonetheless, since only five proteins can be examined and prediction methods are used, the trends observed here should be viewed with some caution.

The statistical observations made here on lipid-buried proteins can be summarized by the following trends. (i) The splice junctions frequently occur in primary structural segments predicted not to be within the lipid bilayer. (ii) Most of the exon/exon boundaries are associated with the largest exposed segments, which may be important for protein function, especially as external recognition sites. (iii) Where functional oligopeptides or residues are known or suggested for the membrane-bound proteins, they are often in the primary structural proximity of splice junctions. (iv) It is possible that the exon interface residues are surface exposed, as suggested by the composition of amino acids adjacent to the splice points. (v) The exon/exon junctions often map near the predicted transmembrane helical termini. (vi) Interconnecting loops wholly contained within an exon tend to be the shortest turn regions. (vii) There appears to be no consistent trend for exons to code for a particular number of helical structural units. (viii) For two protein types where eukaryotic and prokaryotic sequences are known, the eukaryotic linking loops are generally longer, and, where the exposed segments are longest relative to the prokaryotic spans, exon/exon boundaries are often found. (ix) The mean number of residues in exons coding for only exposed segments is near 40, while the average for exons coding for at least one transmembrane segment is over 80.
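As an illustrative check on the tallies given in the first paragraph of this Discussion, the following Python sketch recomputes the quoted percentages and applies the hydrophilicity criterion (at least five of K, D, E, R, Q, N, T, S within ten successive residues centered on the junction) to a made-up peptide. The function and variable names, the test peptide, the convention of locating a junction by the last residue of the upstream exon, and the binomial null model (each site falling independently outside a helix with probability 0.61) are assumptions introduced here for illustration only; they are not part of the prediction algorithm used in this work.

```python
from math import comb

# Tallies quoted in the Discussion for the five proteins.
TOTAL_SITES = 32
EXPOSED_FAR = 17      # exposed, six or more residues from a helical terminus
HYDROPHILIC = 7       # further sites lying in a hydrophilic ten-residue window
OUTSIDE_NEAR = 5      # outside a helix but within five residues of a terminus
P_OUTSIDE = 1 - 0.39  # 39% of residues are predicted to be within the membrane

POLAR_OR_CHARGED = set("KDERQNTS")


def hydrophilic_count(sequence, last_upstream, window=10):
    """Count charged/polar residues in a window centered on a junction.

    last_upstream is the 1-based number of the last residue coded by the
    upstream exon (an assumed convention); the window covers window/2
    residues on each side of the junction, clipped at the chain ends.
    """
    half = window // 2
    start = max(0, last_upstream - half)
    end = min(len(sequence), last_upstream + half)
    return sum(aa in POLAR_OR_CHARGED for aa in sequence[start:end])


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p); the null model is an assumption."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


not_in_span = EXPOSED_FAR + HYDROPHILIC              # 24 sites
lower = not_in_span / TOTAL_SITES                     # 0.75
upper = (not_in_span + OUTSIDE_NEAR) / TOTAL_SITES    # 29/32
tail = binomial_upper_tail(TOTAL_SITES, not_in_span, P_OUTSIDE)

if __name__ == "__main__":
    print(f"sites outside hydrophobic spans: {lower:.0%} to {upper:.0%}")
    print(f"fraction expected outside at random: {P_OUTSIDE:.0%}")
    print(f"P(at least {not_in_span} of {TOTAL_SITES} outside by chance): {tail:.3f}")
    # Hypothetical peptide with a junction after residue 8.
    print(hydrophilic_count("LLLLLKDERQLLLLL", 8) >= 5)
```

The tail probability is only as meaningful as the independence assumption behind it; with five proteins and predicted rather than observed helices, it should be read as a rough guide.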
At least three basic mechanisms and etiologies have been proposed for the use of splice junctions in eukaryotes [3]. The domain model [1,2] allows recombinant genetic events through intron mediation such that exons coding for functional and structural domains are shuffled and fused to create new structurally viable proteins that contribute to functional diversity and complexity in eukaryotic organisms. The intron intrusion model involves
disruption of a preexisting gene through addition of a coding exon flanked by noncoding introns, resulting in an altered structure and function for the protein polypeptide. Craik et al. [3] have recently proposed the sliding junction model for protein length polymorphism, in which a mutation alters a splice junction, resulting in an extension of the original exon and an additional loop segment in the polypeptide chain. The domain hypothesis does not appear to be supported universally by the results presented here. It would be expected for the lipid-bound proteins that splice junctions would flank a natural structural unit such as a single transmembrane helix with adjacent turn segments or perhaps a helical hairpin (two helices and a connecting loop), which is the building block suggested by Engelman and Steitz [72] from energetic considerations. Nearly half the exons do not code for a membrane-buried helix and the other half are distributed rather evenly over a one-, two- or three-helical content. Craik et al. [3] and Blake [7] list those soluble protein systems where the domain model appears consistent, but they also note several which are not. Exons in soluble proteins can be as short as seven amino acids [45]; a five-residue exon is found in yeast cytochrome b (Table III). Such lengths would considerably dilute the domain definition, which typically requires an integral unit capable of globular folding and a specific function. Nonetheless, the domain hypothesis can explain single splice junctions contained within one sequentially contiguous exposed loop. The structures contained within exons delineated by such junctions consist of one to four transmembrane helices and could constitute a domain. Building genes with such entities would be particularly amenable for alterations in lipid-buried protein core functions such as retinal binding in rhodopsin or channel formation in acetylcholine receptor.
Alternatively, any coincidence between splice junctions and domain boundaries might simply be fortuitous with their insertion into protein surface features being the overriding principle. It is possible that the domain exons were originally built from smaller units encoding for perhaps one transmembrane helix followed by self-excision of introns resulting in the presently observed exons as suggested by Blake [7].
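The helical content per exon and the placement of junctions relative to helical termini, both used in the argument above, can be tallied mechanically once exon and helix boundaries are given as residue ranges. The Python sketch below is one possible way to do so. All coordinates are hypothetical and do not correspond to any of the five proteins; the function names, the assignment of a helix to the exon containing its midpoint, the use of the last upstream residue as the junction position, and the five-residue tolerance (taken from the boundary uncertainty quoted in this Discussion) are assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive residue range (start, end)


def helices_per_exon(exons: List[Interval], helices: List[Interval]) -> List[int]:
    """For each exon, count the helices whose midpoint lies inside it."""
    counts = []
    for ex_start, ex_end in exons:
        n = 0
        for h_start, h_end in helices:
            midpoint = (h_start + h_end) // 2
            if ex_start <= midpoint <= ex_end:
                n += 1
        counts.append(n)
    return counts


def classify_junction(pos: int, helices: List[Interval], tol: int = 5) -> str:
    """Place a junction (last residue of the upstream exon) relative to helices."""
    for h_start, h_end in helices:
        if h_start <= pos <= h_end:
            depth = min(pos - h_start, h_end - pos)
            return "in-helix-near-terminus" if depth <= tol else "buried"
    # Outside every helix: distance to the nearest helical terminus.
    nearest = min(
        (h_start - pos) if pos < h_start else (pos - h_end)
        for h_start, h_end in helices
    )
    return "exposed-near-terminus" if nearest <= tol else "exposed"


if __name__ == "__main__":
    # Hypothetical exon and helix coordinates, for illustration only.
    exons = [(1, 50), (51, 110), (111, 118), (119, 200)]
    helices = [(20, 44), (60, 84), (90, 108), (130, 154)]
    print(helices_per_exon(exons, helices))
    for ex_start, ex_end in exons[:-1]:
        print(ex_end, classify_junction(ex_end, helices))
```

With these invented coordinates the third exon codes for no helix and the second for two, the kind of uneven distribution that argues against a fixed structural unit per exon.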
The intron intrusion model finds considerable support from the present prediction results. Several of the exons are contained in the predicted exposed segments; the exon boundaries are near helical termini; the comparable spans for prokaryotes, which do not presently avail themselves of splice junctions, are shorter; the exon interfaces appear in the longer exposed extensions; and the greatest number of complete exons are found in the longest non-membrane segments. However, only seven of the exons in the predicted surface regions contain 5 to 38 residues which could fold into fully exposed structures, while the remaining five exons are 47 to 81 amino acids in length. These latter peptides could fold as normal soluble proteins with inner hydrophobic cores and hydrophilic surfaces, making them susceptible to the weaknesses of the intrusion model discussed by Craik et al. [3], the major one being the single splice junction in a single loop. The sliding junction model is supported by the several observations of single splice junctions within one helical interconnecting segment and the longer exons in the predicted surface segments. However, the model cannot explain the origin of the junction. Craik et al. [3,8] have observed that splice junctions map on the surfaces of soluble proteins whose exonic and tertiary architectures are known. The membrane-bound proteins appear similar in this feature given the few instances of exon/exon boundaries within predicted transmembrane regions. As also noted by Craik et al. [3], interruptions in the central core of a protein structure could be devastating to its structural integrity. Additions or alterations in the exposed areas of eukaryotic membrane-bound proteins would allow functional changes as well as increasing functional complexity, such as interaction with external proteins. This is also supported by the observation of the proximity of exon junctions and residues suggested to be significant in the function of the lipid-bound proteins.
The issue concerning the origins of splice junctions remains arcane. It is easiest to assume that prokaryotes never adopted splice junctions, though others [5-7] have proposed that prokaryotes possessed them and then lost them due to the necessity of nucleic acid economy and improved tRNA
reading mechanisms. It has also been suggested that introns may exist in the ribosomal RNA genes of archaebacteria [73]. Eukaryotes could have adopted splice junctions during or after their evolutionary split from prokaryotes; the specialization of transposable elements displayed by both sets of organisms could provide the vehicle resulting in an intron intrusion mechanism. However, since in some cases a single exon junction is contained within a single contiguous connecting loop region, the sliding junction model would also provide a mode for structural and therefore functional alteration. Perhaps, under the intrusion model, two such junctions did exist in a loop region but through genetic mutation, upon which the sliding hypothesis also relies, one of the junction pairs could have been eliminated. Since prokaryotes also possess complex proteins with sophisticated structure (e.g., the seven-helical bacteriorhodopsin comparable to bovine rhodopsin), the eukaryotes may have adopted the splice junctions to achieve functional complexity and diversity. The appearance of exon boundaries in the longer, exposed segments of lipid-bound proteins would provide an ideal vehicle for increasing functional attributes (e.g., recognition sites) without disturbing the structurally stable, multihelical transmembrane core inherited during very early evolution. Whatever the mechanism, it appears not amenable to preserving residue stretches with severe selection constraints, such as the 20 to 25 hydrophobic amino acids required for transmembrane helices. The splice junction process of introducing additional residues is apparently a trial-and-error method involving random sequences which are best integrated by the exterior protein surfaces. The observation that one to five predicted helices contain junctions, of which there are 32 in total, suggests that core regions can be altered, albeit with lower probability of success.
Acknowledgements

The authors are indebted to Dr. Abelardo M. Silva for helpful discussions. Preparation of the manuscript required the diligent aid of Ruth Rafferty and the word processing staff at Purdue University (Lucy Winchester and Marge Miles).
References

1 Gilbert, W. (1978) Nature (London) 271, 501
2 Blake, C.C.F. (1979) Nature (London) 277, 598
3 Craik, C.S., Rutter, W.J. and Fletterick, R. (1983) Science 220, 1125-1129
4 Tiemeier, D.C., Tilghman, S.M., Polsky, F.I., Seidman, J.G., Leder, A., Edgell, M.H. and Leder, P. (1978) Cell 14, 237-245
5 Darnell, J.E., Jr. (1978) Science 202, 1257-1260
6 Doolittle, W.F. (1978) Nature (London) 272, 581-582
7 Blake, C.C.F. (1983) Nature (London) 306, 535-537
8 Craik, C.S., Sprang, S., Fletterick, R. and Rutter, W.J. (1982) Nature (London) 299, 180-182
9 Henderson, R. and Unwin, P.N.T. (1975) Nature (London) 257, 28-32
10 Michel, H., Oesterhelt, D. and Henderson, R. (1980) Proc. Natl. Acad. Sci. USA 77, 338-342
11 Argos, P., Mohana Rao, J.K. and Hargrave, P.A. (1982) Eur. J. Biochem. 128, 565-575
12 Kyte, J. and Doolittle, R. (1982) J. Mol. Biol. 157, 105-132
13 Hargrave, P.A., McDowell, J.H., Curtis, D.R., Wang, J., Juszczak, E., Fong, S.L., Mohana Rao, J.K. and Argos, P. (1983) Biophys. Struct. Mech. 9, 235-244
14 Noda, M., Furutani, Y., Takahashi, H., Toyosato, M., Tanabe, T., Shimizu, S., Kikyotani, S., Kayano, T., Hirose, T., Inayama, S. and Numa, S. (1983) Nature (London) 305, 818-823
15 Mohana Rao, J.K., Hargrave, P.A. and Argos, P. (1983) FEBS Lett. 156, 165-169
16 Engelman, D.M., Henderson, R., McLachlan, A.D. and Wallace, B.A. (1980) Proc. Natl. Acad. Sci. USA 77, 2023-2027
17 Trewhella, J., Anderson, S., Fox, R., Gogol, E., Khan, S., Engelman, D. and Zaccai, G. (1983) Biophys. J. 42, 233-241
18 Nathans, J. and Hogness, D.S. (1983) Cell 34, 807-814
19 Mizukami, Y., Sogawa, K., Suwa, Y., Muramatsu, M. and Fujii-Kuriyama, Y. (1983) Proc. Natl. Acad. Sci. USA 80, 3958-3962
20 Bonitz, S.G., Coruzzi, G., Thalenfeld, B.E., Tzagoloff, A.T. and Macino, G. (1980) J. Biol. Chem. 255, 11927-11941
21 Nobrega, F.G. and Tzagoloff, A. (1980) J. Biol. Chem. 255, 9828-9837
22 Fornwald, J.A., Kuncio, G., Peng, I. and Ordahl, C.P. (1982) Nucleic Acids Res. 10, 3861-3878
23 Fyrberg, E.A., Bond, B.J., Hershey, N.D., Mixter, K.S. and Davidson, N. (1981) Cell 24, 107-116
24 Shah, D.M., Hightower, R.C. and Meagher, R.B. (1983) J. Mol. Appl. Genet. 2, 111-126
25 Gallwitz, D. and Sures, I. (1980) Proc. Natl. Acad. Sci. USA 77, 2546-2550
26 MacDonald, R.J., Crerar, M.M., Swain, W.F., Pictet, R.L., Thomas, G. and Rutter, W.J. (1980) Nature (London) 287, 117-122
27 Leicht, M., Long, G.L., Chandra, T., Kurachi, K., Kidd, V.J., Mace, M., Jr., Davie, E.W. and Woo, S.L.C. (1982) Nature (London) 297, 655-659
28 Quinto, C., Quiroga, M., Swain, W.F., Nikovits, W.C., Jr., Standring, D.N., Pictet, R.L., Valenzuela, P. and Rutter, W.J. (1982) Proc. Natl. Acad. Sci. USA 79, 31-35
29 Jones, C.W. and Kafatos, F.C. (1980) Cell 22, 855-867
30 Nakanishi, S., Teranishi, Y., Watanabe, Y., Notake, N., Noda, M., Kakidani, H., Jingami, H. and Numa, S. (1981) Eur. J. Biochem. 115, 429-438
31 Crouse, G.D., Simonsen, C.C., McEwan, R.N. and Schimke, R.T. (1982) J. Biol. Chem. 257, 7887-7893
32 Perricaudet, M., Akusjarvi, G., Virtanen, A. and Pettersson, U. (1979) Nature (London) 281, 694-696
33 Eiferman, F.A., Young, P.R., Scott, R.W. and Tilghman, S.M. (1981) Nature (London) 294, 713-718
34 Dodgson, J.B., McCune, K.C., Rusling, D.J., Krust, A. and Engel, J.D. (1981) Proc. Natl. Acad. Sci. USA 78, 5998-6002
35 DeNoto, F.M., Moore, D.D. and Goodman, H.M. (1981) Nucleic Acids Res. 9, 3719-3730
36 Barta, A., Richards, R.I., Baxter, J.D. and Shine, J. (1981) Proc. Natl. Acad. Sci. USA 78, 4867-4871
37 Sakano, H., Rogers, J.H., Huppi, K., Brack, C., Traunecker, A.T., Maki, R., Wall, R. and Tonegawa, S. (1979) Nature (London) 277, 627-633
38 Yamawaki-Kataoka, Y., Miyata, T. and Honjo, T. (1981) Nucleic Acids Res. 9, 1365-1381
39 Kawakami, T., Takahashi, N. and Honjo, T. (1980) Nucleic Acids Res. 8, 3933-3945
40 Early, P., Huang, H., Davis, M., Calame, K. and Hood, L. (1980) Cell 19, 981-992
41 Ulrich, A., Dull, T.J., Gray, A., Brosius, J. and Sures, I. (1980) Science 209, 612-615
42 Jung, A., Sippel, A.E., Grez, M. and Schutz, G. (1980) Proc. Natl. Acad. Sci. USA 77, 5759-5763
43 Durnam, D.M., Perrin, F., Gannon, F. and Palmiter, R.D. (1980) Proc. Natl. Acad. Sci. USA 77, 6511-6515
44 Heilig, R., Perrin, F., Gannon, F., Mandel, J.L. and Chambon, P. (1980) Cell 20, 625-637
45 Stein, J.P., Catterall, C.F., Kristo, P., Means, A.R. and O'Malley, B.W. (1980) Cell 21, 681-687
46 Lomedico, P., Rosenthal, N., Efstratiadis, A.E., Gilbert, W., Kolodner, R. and Tizard, R. (1979) Cell 18, 545-558
47 Perler, F., Efstratiadis, A.E., Lomedico, P., Gilbert, W., Kolodner, R. and Dodgson, J. (1980) Cell 20, 555-566
48 Dickson, L.A., Ninomiya, Y., Bernard, M.P., Pesciotta, D.M., Parsons, J., Green, G., Eikenberry, E.F., deCrombrugghe, B., Vogeli, G., Pastan, I., Fietzek, P.P. and Olsen, B.R. (1981) J. Biol. Chem. 256, 8407-8415
49 Wozney, J., Hanahan, D., Tate, V., Boedtker, H. and Doty, P. (1981) Nature (London) 294, 129-135
50 Gubbins, E.J., Maurer, R.A., Lagrimini, M., Erwin, C.R. and Donelson, J.E. (1980) J. Biol. Chem. 255, 8655-8662
51 Chu, G. and Sharp, P.A. (1981) Nature (London) 289, 378-382
52 Steinmetz, M., Moore, K.W., Frelinger, J.G., Sher, B.T. and Shen, F.W. (1981) Cell 25, 683-692
53 Stubbs, G.W., Smith, H.G., Boyse, E.A., Hood, L. Jr. and Litman, B.J. (1976) Biochim. Biophys. Acta 426, 46-56
54 Michel-Villaz, J., Saibil, H.R. and Chabre, M. (1979) Proc. Natl. Acad. Sci. USA 76, 4405-4408
55 Rothschild, K.J., Sanches, R., Hsiao, T.L. and Clark, N.A. (1980) Biophys. J. 31, 53-64
56 Saibil, H., Chabre, M. and Worcester, D. (1976) Nature (London) 262, 266-270
57 Noda, M., Takahashi, H., Tanabe, T., Toyosato, M., Kikyotani, S., Furutani, Y., Hirose, T., Takashima, H., Inayama, S., Miyata, T. and Numa, S. (1983) Nature (London) 302, 528-532
58 Levitt, M. (1978) Biochemistry 17, 4277-4285
59 Gotoh, O., Tagashira, Y., Iizuka, T. and Fujii-Kuriyama, Y. (1983) J. Biochem. 93, 807-817
60 Hargrave, P.A. (1982) Prog. Retinal Res. 1, 1-51
61 Ludwig, B. (1980) Biochim. Biophys. Acta 594, 177-189
62 Solioz, M., Carafoli, E. and Ludwig, B. (1982) J. Biol. Chem. 257, 1579-1582
63 Welinder, K.G. and Mikkelsen, L. (1983) FEBS Lett. 157, 233-239
64 Black, S.D., Tarr, G.E. and Coon, M.J. (1982) J. Biol. Chem. 257, 14616-14619
65 Heinemann, F.S. and Ozols, J. (1983) J. Biol. Chem. 258, 4195-4201
66 Black, S.D. and Coon, M.J. (1982) J. Biol. Chem. 257, 5929-5938
67 Mathews, F.S., Levine, M.R. and Argos, P. (1979) In The Porphyrins (Dolphin, D., ed.), Vol. VII, Part B, pp. 107-147, Academic Press, New York
68 Noda, M., Takahashi, H., Tanabe, T., Toyosato, M., Furutani, Y., Hirose, T., Asai, M., Inayama, S., Miyata, T. and Numa, S. (1982) Nature (London) 299, 793-797
69 Devillers-Thiery, A., Giraudat, J., Bentaboulet, M. and Changeux, J.P. (1983) Proc. Natl. Acad. Sci. USA 80, 2067-2071
70 Claudio, T., Ballivet, M., Patrick, J. and Heinemann, S. (1983) Proc. Natl. Acad. Sci. USA 80, 1111-1115
71 Widger, W.R., Cramer, W.A., Herrmann, R. and Trebst, A. (1984) Proc. Natl. Acad. Sci. USA 81, 674-678
72 Engelman, D.M. and Steitz, T.A. (1981) Cell 23, 411-422
73 Kaine, B.P., Gupta, R. and Woese, C.R. (1983) Proc. Natl. Acad. Sci. USA 80, 3309-3312
74 Khorana, H.G., Gerber, G.E., Herlihy, W.C., Gray, C.P., Anderegg, R.J., Nihei, K. and Biemann, K. (1979) Proc. Natl. Acad. Sci. USA 76, 5046-5050