ARCHIVES OF BIOCHEMISTRY AND BIOPHYSICS Vol. 210, No. 2, September, pp. 633-642, 1981
Comparison of the Primary Structure of the Acidic Polypeptides of Glycinin¹
M. A. MOREIRA,*,†,² M. A. HERMODSON,‡ B. A. LARKINS,† AND N. C. NIELSEN*,³
*United States Department of Agriculture, Science and Education Administration, and the Departments of †Botany and Plant Pathology, ‡Biochemistry, and §Agronomy, Purdue University, West Lafayette, Indiana 47907

Received October 28, 1980

Some essential features of the primary structures of five acidic polypeptides from the major 11 S soybean storage protein were studied. Each purified polypeptide was cleaved at methionine using cyanogen bromide, and then each resulting fragment was purified. A comparison of the NH2-terminal amino acid sequences of each fragment, coupled with an identification of both the carboxyl and amino terminal fragments, permitted ordering of the fragments along the polypeptides. It was found that the five acidic polypeptides were synthesized at the direction of a family of homologous genes. Evidence was also obtained which suggested that there were repeated domains of amino acid sequence within each of the five molecules.
The storage proteins are an important group of plant proteins which provide nitrogen and amino acids for developing seedlings. Legume seed proteins are of special interest because they occur in large amounts and constitute a major source of protein for both humans and other animals. Recent research on these proteins has emphasized improving their nutritional quality. The major nutritional limitation of legume seed proteins is directly related to their low levels of the sulfur-containing amino acids, methionine and cysteine. There are two obvious approaches to altering the sulfur amino acid content of the legume seed proteins. One is through classical breeding techniques, and the other is through utilization of recent developments in genetic engineering. Efforts to improve nutritional quality using either approach will be facilitated by careful biochemical characterization of the structure of these storage proteins and the genes directing their synthesis.

Two globulins account for about 70% of the total seed protein in selected soybean cultivars (1). They have been given the trivial names glycinin (Mr ≈ 350,000) and β-conglycinin (Mr ≈ 140,000). Of these two globulins, glycinin contains the highest level of sulfur amino acids (2). Reports from a number of laboratories have also established that glycinin contains about 12 polypeptides, half of which have acidic isoelectric points (Mr ≈ 37,000-44,000) and half of which are basic (Mr ≈ 17,000-22,000) (3, 4). However, the precise number of different acidic and basic subunits that are found in glycinin remains to be established.

¹ Cooperative research of the USDA-SEA and the Purdue Agricultural Experiment Station. This is Journal No. 8267 from the Purdue Agricultural Experiment Station. Mention of a trademark or proprietary product does not constitute a guarantee or warranty of the product by either the USDA or Purdue University.
² Recipient of a predoctoral scholarship from Coordenacao do Aperfeicomento de Pessoal de Nivel Superior (CAPES) of Brazil. Present address: Departmento de Quimica, Instituto do Ciencias Exatas e Tecnologicas, Universidade Federal de Vicosa, 36.750 Vicosa-Minas Gerais, Brazil.
³ Supported by the USDA Competitive Grants Program (Grant 5901-0510-8-0020-0) and the American Soybean Association Research Foundation.
0003-9861/81/100633-10$02.00/0 Copyright © 1981 by Academic Press, Inc. All rights of reproduction in any form reserved.
In an effort to rigorously define the structure of the glycinin molecule, we have purified each of its component polypeptides. An earlier report (5) described the amino acid composition and NH2-terminal sequence analysis of six acidic and four basic polypeptides (see Fig. 2 for summary of NH2-terminal sequences). This communication extends those earlier observations, and describes essential features of the internal primary structure for five acidic proteins.

MATERIALS AND METHODS

Chemical cleavage at methionine residues. The acidic and basic polypeptides of glycinin were purified from the soybean cultivar CX635-1-1-1 (Glycine max (L.) Merr.) as previously described (5). Each polypeptide (20 to 40 mg) was dissolved individually in 2 ml of 70% aqueous formic acid. Cleavage of the purified polypeptides at methionine was accomplished by treating each of these solutions with cyanogen bromide at a K&fold molar excess over methionine (6). The reaction was allowed to proceed in the dark for 18 h at room temperature, and then the mixture was diluted with water and lyophilized.

Cleavage at tryptophan residues. Cleavage at tryptophan was accomplished using o-iodosobenzoic acid as described by Mahoney and Hermodson (7). Forty milligrams of reduced and alkylated polypeptide were dissolved in 4 ml of 80% acetic acid and 4 M guanidine HCl which contained 80 mg of freshly dissolved o-iodosobenzoic acid. The cleavage reaction proceeded in the dark at room temperature for 24 h, and then the mixture was dialyzed in Spectrapor No. 3 membrane tubing (Spectrum Medical Industries, Inc., Los Angeles, Calif.) against 10% acetic acid and lyophilized.

Isolation and characterization of the cyanogen bromide fragments. Nearly all of the cyanogen bromide fragments of the acidic polypeptides were purified by gel filtration using 2.5 × 100-cm columns of Sephadex G-75 that were equilibrated with 9% formic acid. In the cases of A1a, A2, A3, and A4, the first peaks of protein emerging were due to uncleaved polypeptides and overlap fragments which were in the excluded volume. Polypeptide A1b cleaved essentially completely and lacked the excluded fraction. For polypeptide A2, the last peak to emerge from the column contained two fragments, A2F2 and A2F3. They were separated using a DEAE-Sephadex A50 column (1.0 × 20 cm) that had been equilibrated with 4 M urea in 0.1 M Tris-HCl, pH 8.5.
The column was developed with a linear gradient between 0 and 0.2 M of sodium chloride in the same buffer to separate the fragments.
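
The cleavage chemistry described above can be sketched in a few lines of Python. This is only an illustration, assuming that cyanogen bromide cleaves on the COOH side of every methionine and converts that residue to homoserine (or its lactone). The lowercase "h" used for homoserine, the function names, and the toy sequence are all hypothetical and are not data from this study.

```python
# Illustrative sketch of in-silico cyanogen bromide (CNBr) cleavage.
# Assumption: every Met (M) is cleaved on its COOH side and is converted
# to homoserine, written here as lowercase "h" (our notation, not the paper's).

def cnbr_fragments(sequence: str) -> list[str]:
    """Return the fragments expected from complete cleavage after each Met."""
    fragments = []
    current = []
    for residue in sequence:
        if residue == "M":
            current.append("h")            # Met becomes homoserine at the new COOH end
            fragments.append("".join(current))
            current = []
        else:
            current.append(residue)
    if current:                            # trailing piece with no Met at its end
        fragments.append("".join(current))
    return fragments


def lacks_homoserine(fragment: str) -> bool:
    """A fragment without homoserine is the candidate COOH-terminal fragment."""
    return "h" not in fragment


# Hypothetical toy sequence with two methionines (not a glycinin sequence).
toy_sequence = "FSSREQPQQNEMQKLNALKPDMNRIESEGG"
toy_fragments = cnbr_fragments(toy_sequence)
cooh_terminal = [f for f in toy_fragments if lacks_homoserine(f)]
print(toy_fragments)
print("COOH-terminal candidate:", cooh_terminal)
```

A chain with n cleavable methionines gives n + 1 fragments under these assumptions, and only the fragment from the COOH terminus lacks homoserine, which is the criterion used in the Results to place fragments.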
SDS⁴-gel electrophoresis was performed on slab gels using a modification of the Laemmli system (8, 9). For analyses of the peptide fragments, running gels of 25% acrylamide (150:1 by weight, acrylamide:bisacrylamide) were used. NH2-terminal sequence analyses of the fragments were carried out using a Beckman 890-C sequencer as described by Hermodson et al. (10, 11). Identification of the amino acid phenylthiohydantoins was accomplished by high-pressure liquid chromatography using a modification of the procedure of Zimmerman et al. (12). Occasionally, spot tests were used to confirm identification of arginine and histidine (11). The quantities of the phenylthiohydantoins were determined at selected residues during the degradations. The early cycles produced at least 70% of the expected amounts of the amino acid derivatives based on the weight of lyophilized polypeptide. The stepwise yields of the degradations were greater than 90%. No identification was made unless the peak to background ratio exceeded 3 and the quantity of amino acid was consistent with those of preceding cycles. No degradation was used where any detectable contaminating sequence was observable above 10% of the yields of the predominant sequence unless specifically noted in the results. Where the above conditions were not met, but where a low yield of an amino acid was consistently observed with no other amino acid appearing in the same cycle, tentative identifications were made. Such tentative identifications are indicated by parentheses in the figures.

RESULTS
Structural Analysis of the Cyanogen Bromide Fragments from the Acidic Polypeptides

The five acidic polypeptides of glycinin (A1a, A1b, A2, A3, and A4) were compared by cleaving each of them with cyanogen bromide, and then purifying and characterizing the resulting fragments using SDS-gel electrophoresis, amino acid analysis, and NH2-terminal sequence analysis. A summary of the results of this work is shown in Fig. 1. Each unique fragment of each polypeptide has been numbered consecutively (F1, F2, F3, etc.) beginning at its NH2 terminus. The alignment of the sequences has been shifted horizontally with respect to one another in order to obtain maximum sequence identity. Points of homology between adjacent sequences in the figure have been denoted by dots between them.

FIG. 1. Comparison of the primary structures of six acidic polypeptides of glycinin as deduced from the NH2-terminal sequence analysis of their cyanogen bromide fragments. The standard one-letter nomenclature identifies each amino acid. Fragments originating from the COOH-terminal end of the polypeptides were identified by their lack of homoserine (HS).

⁴ Abbreviations used: SDS, sodium dodecyl sulfate; IEF, isoelectric focusing; PTH, phenylthiohydantoin.

Polypeptides A1a and A1b were nearly identical and each gave three unique cyanogen bromide fragments. One fragment from each had an Mr ≈ 11,000 and gave the same NH2-terminal amino acid sequence as the respective intact molecule. The two remaining fragments from each polypeptide had apparent molecular weights of
TABLE I
AMINO ACID COMPOSITION OF THE CYANOGEN BROMIDE FRAGMENTS OF THE A1a POLYPEPTIDE

Amino acid    A1aF1   A1aF2   A1aF3   Total   A1a
Asx           12.8    5.7     13.4    31.9    36.8
Thr           3.8     3.3     4.3     11.4    12.0
Ser           5.9     4.9     8.3     19.1    18.3
Hse           1.0     1.7     0       2.7     -
Glx           19.8    17.9    49.9    86.7    85.3
Pro           7.8     9.5     9.0     26.3    24.0
Gly           8.9     7.2     17.3    33.4    31.0
Ala           5.0     3.5     6.0     14.5    14.4
Val           1.5     3.3     4.3     9.1     11.9
Ile           7.0     4.7     5.2     16.9    17.6
Leu           7.5     2.0     9.3     18.8    20.1
Tyr           2.5     3.6     3.1     9.2     7.3
Phe           3.4     3.8     6.6     13.8    12.2
Lys           5.1     2.0     15.4    22.5    21.2
Arg           7.8     7.2     7.8     22.8    18.1
His           0       1.7     4.8     6.5     6.0
Met           -       -       -       -       3.6
Cys           -       -       -       -       4.5

Note. Duplicate samples of each cyanogen bromide fragment were hydrolyzed in 6 N HCl at 110°C for 24 h and the amino acids were determined using standard techniques. The number of residues per fragment was calculated assuming an average molecular weight of 110 for each amino acid, and using the apparent molecular weight obtained from SDS-gel electrophoresis. Number of amino acids per fragment: A1aF1, 100; A1aF2, 82; and A1aF3, 164.
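
The arithmetic behind the Table I note can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes apparent molecular weights of 11,000, 9,000, and 18,000 for A1aF1, A1aF2, and A1aF3 (the 9,000 figure is the value the text quotes for the central fragments of the A1 subfamily), and it uses only four rows of Table I. The function and variable names are hypothetical.

```python
# Sketch of the Table I calculations; names are illustrative, not from the paper.
AVERAGE_RESIDUE_MASS = 110  # mean residue weight assumed in the Table I note


def residues_from_mr(apparent_mr: float) -> int:
    """Estimate the residue count of a fragment from its apparent Mr."""
    return round(apparent_mr / AVERAGE_RESIDUE_MASS)


# Apparent molecular weights from SDS-gel electrophoresis (assumed values, see lead-in).
apparent_mr = {"A1aF1": 11_000, "A1aF2": 9_000, "A1aF3": 18_000}
residue_counts = {name: residues_from_mr(mr) for name, mr in apparent_mr.items()}

# Residues per fragment (F1, F2, F3) for four amino acids, copied from Table I.
table_rows = {
    "Asx": (12.8, 5.7, 13.4),
    "Thr": (3.8, 3.3, 4.3),
    "Ser": (5.9, 4.9, 8.3),
    "Pro": (7.8, 9.5, 9.0),
}
# Values for the intact A1a polypeptide, also from Table I.
intact_a1a = {"Asx": 36.8, "Thr": 12.0, "Ser": 18.3, "Pro": 24.0}

fragment_totals = {aa: round(sum(values), 1) for aa, values in table_rows.items()}
differences = {aa: round(fragment_totals[aa] - intact_a1a[aa], 1) for aa in intact_a1a}

print(residue_counts)
print(fragment_totals)
print(differences)
```

The residue counts reproduce the 100, 82, and 164 given in the note, and the per-residue differences show how closely the summed fragments track the intact polypeptide.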
TABLE II
AMINO ACID COMPOSITION OF THE CYANOGEN BROMIDE FRAGMENTS OF THE A1b POLYPEPTIDE

Amino acid    A1bF1   A1bF2   A1bF3   Total   A1b
Asx           10.3    5.0     13.0    28.3    34.8
Thr           3.4     3.8     5.3     12.5    12.4
Ser           5.0     5.1     9.2     19.3    19.0
Hse           1.0     1.6     0       2.6     -
Glx           15.0    15.5    51.6    82.1    85.3
Pro           6.2     8.0     10.3    24.5    25.0
Gly           7.3     7.3     14.3    28.9    27.3
Ala           5.2     3.9     7.1     16.2    15.9
Val           1.5     2.2     5.6     9.3     12.4
Ile           5.4     4.3     5.2     14.9    16.6
Leu           5.6     2.8     9.6     18.0    18.7
Tyr           2.2     3.0     4.0     9.2     8.7
Phe           4.8     5.8     7.0     17.6    17.9
Lys           2.7     3.5     10.0    16.2    15.9
Arg           7.0     6.0     9.3     22.3    21.2
His           0       3.1     2.6     5.7     4.8
Met           -       -       -       -       4.1
PECys         2.2     1.1     1.0     4.3     -

Note. Duplicate samples of each cyanogen bromide fragment were analyzed as described in Table I. Number of amino acids per fragment: A1bF1, 100; A1bF2, 82; and A1bF3, 164.
18,000 and 8000, respectively. Since the larger of the two fragments lacked homoserine and the smaller one had homoserine (Tables I and II), the larger one originated from the COOH-terminal region of the molecule and the smaller one was from its central portion. While the cyanogen bromide cleavage patterns of A1a and A1b were identical, the two polypeptides could clearly be distinguished from one another based on amino acid substitutions found in both the NH2-terminal and central fragments.

Polypeptide A2 yielded four unique cyanogen bromide fragments. One fragment (A2F1) had the same NH2-terminal amino acid sequence as the intact molecule (5), and had an apparent molecular weight equal to that of the NH2-terminal fragments of A1a and A1b (Mr ≈ 11,000). The second cyanogen bromide fragment, A2F2, was both homologous with and equal in size to the central fragments of the A1 subfamily (Mr ≈ 9,000). The two remaining A2 fragments together accounted for an apparent molecular weight of about 18,000, which was equal in size to the COOH-terminal fragments of A1a and A1b. The smaller fragment, A2F3, was homologous to the COOH-terminal fragments of the A1 subfamily, whereas A2F4 exhibited no homology with any other fragment sequenced. It was concluded that A2F4 originated from cleavage at a methionine located toward the COOH terminal of A2 which was not present in the A1 subfamily of polypeptides.

Fragment A2F4 presented difficulties during its analysis. It migrated with Mr ≈ 13,900 on SDS-gel electrophoresis, but a molecular weight of about 8000 was estimated from its amino acid composition (Table III). Since A2F4 was more insoluble than the other fragments encountered during this study and had a tendency to aggregate, it may have been incompletely denatured by SDS. This could have contributed to the overestimation of its size by SDS-gel electrophoresis. In addition, two major NH2-terminal sequences were obtained for A2F4 which together accounted for about 80% of the yield of phenylthiohydantoins
expected at each cycle of the analysis. One sequence was YNNEDTVVAVSII, while the second was XYNNEDTVVAVSII. The second sequence was identical to the first except that it was displaced one residue to the right and began with a derivative which did not correspond to any of our standards. The second sequence may be an overlap fragment derived from a methionine-methionine sequence in A2, in which case the first derivative would be PTH-homoserine. This interpretation was consistent with the presence of homoserine in A2F4 (Table III).

Three unique cyanogen bromide fragments were recovered from A3. Fragment A3F1 had the same NH2-terminal amino acid sequence as the intact molecule (5) and was homologous with both the A1 subfamily and A2. However, A3F1 had a smaller apparent molecular weight (Mr ≈ 9000) than the corresponding NH2-terminal fragments of the other acidic polypeptides of glycinin (Mr ≈ 11,000). A second fragment (A3F2) obtained from A3 was homologous with the central fragments from A1a, A1b, and A2, although the positions of homology between A3F2 and the corresponding parts of the other polypeptides began 14 amino acids into A3F2 from its NH2 terminus. The 13 amino acid sequence at the beginning of A3F2 accounted for the larger size of the central fragment (Mr ≈ 13,500) and the smaller size of the NH2-terminal fragment of A3 as compared to A1a, A1b, and A2. Both A3F1 and A3F2 contained homoserine (Table IV). The third fragment from A3 (A3F3) contained no homoserine, yielded a single band on SDS-gel electrophoresis, and was blocked to NH2-terminal sequence analysis. Fragment A3F3 therefore originated from the COOH terminal of A3. It was likely that the blocked NH2 terminal was pyroglutamate, since all glycinin polypeptides contain large amounts of glutamine (5).

Two unique cyanogen bromide fragments were recovered from A4. Fragment
TABLE III
AMINO ACID COMPOSITION OF THE CYANOGEN BROMIDE FRAGMENTS OF THE A2 POLYPEPTIDE

Amino acid    A2F1    A2F2    A2F3    A2F4    Total   A2
Asx           11.0    5.2     6.2     14.3    36.7    42.1
Thr           3.3     3.8     0       4.0     11.1    12.3
Ser           3.6     4.3     5.6     4.6     18.1    16.4
Hse           0.9     2.0     1.2     1.2     5.3     -
Glx           18.0    16.9    28.3    15.2    78.4    86.4
Pro           6.5     7.1     2.8     3.8     20.2    21.3
Gly           8.4     8.0     11.6    3.9     31.9    29.2
Ala           5.4     4.5     4.7     3.2     17.8    18.1
Val           1.7     5.2     1.5     4.0     12.4    15.3
Ile           5.2     2.9     1.4     2.5     12.0    15.3
Leu           6.3     2.6     5.9     5.1     19.9    20.0
Tyr           2.0     1.6     2.8     2.9     9.3     6.6
Phe           3.0     2.6     7.7     2.0     15.3    12.3
Lys           3.9     2.0     6.0     2.0     13.9    14.9
Arg           6.9     9.4     3.3     3.0     22.6    22.7
His           0       2.3     0       0.7     3.0     2.6
Met           -       -       -       -       -       5.8
Cys           -       -       -       -       -       4.3
PECys         2.4     1.1     0       0       3.5     -

Note. Duplicate samples of each cyanogen bromide fragment were analyzed as described in Table I. Number of amino acids per fragment: A2F1, 100; A2F2, 82; A2F3, 90; and A2F4 (estimated from amino acid analysis), 71.
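
The one-residue displacement between the two NH2-terminal sequences reported for A2F4 can be checked with a short script. This is a minimal sketch, assuming a simple ungapped comparison in which the second sequence is shifted by 0 to 3 positions; the function name and the shift limit are hypothetical choices, and only the two sequences themselves come from the text.

```python
# Sketch: find the shift that best aligns two NH2-terminal sequences.
# Ungapped comparison only; "X" marks the unidentified first derivative.

def best_shift(first: str, second: str, max_shift: int = 3) -> tuple[int, int]:
    """Return (shift, identical positions) for the best ungapped alignment.

    `shift` is the number of leading residues skipped in `second`.
    """
    best = (0, -1)
    for shift in range(max_shift + 1):
        matches = sum(a == b for a, b in zip(first, second[shift:]))
        if matches > best[1]:
            best = (shift, matches)
    return best


first_sequence = "YNNEDTVVAVSII"    # first A2F4 sequence, from the text
second_sequence = "XYNNEDTVVAVSII"  # second A2F4 sequence, from the text

shift, matches = best_shift(first_sequence, second_sequence)
print(f"best shift: {shift}, identical positions: {matches} of {len(first_sequence)}")
```

With a shift of one residue all 13 positions of the first sequence are matched, which is the relationship expected if the second sequence is an overlap fragment carrying one extra residue at its NH2 terminus.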
TABLE IV
AMINO ACID COMPOSITION OF THE CYANOGEN BROMIDE FRAGMENTS OF THE A3 POLYPEPTIDE

Amino acid    A3F1    A3F2    A3F3    Total   A3
Asx           10.0    15.7    20.5    46.2    45.5
Thr           4.6     5.7     6.6     16.9    15.5
Ser           8.0     8.0     13.1    29.1    27.1
Hse           1.0     0.9     0       1.9     -
Glx           13.8    22.7    50.3    86.8    91.6
Pro           6.8     12.1    12.4    31.3    33.9
Gly           5.3     10.3    15.4    31.0    29.5
Ala           2.6     4.7     3.7     11.0    10.9
Val           3.7     7.5     4.9     16.1    17.4
Ile           2.4     5.9     2.6     10.9    12.2
Leu           9.4     6.9     7.2     23.5    21.8
Tyr           2.4     2.8     3.3     8.5     5.6
Phe           1.3     6.0     4.7     12.0    12.0
Lys           2.7     3.9     9.3     15.9    14.8
Arg           4.1     6.2     13.5    23.8    22.2
His           3.7     3.5     8.0     15.2    14.1
Met           -       -       -       -       2.4

Note. Duplicate samples of each cyanogen bromide fragment were analyzed as described in Table I. Number of amino acids per fragment: A3F1, 82; A3F2, 123; and A3F3, 196.
A4F1 had the same NH2 terminal as intact A4 (5) and was about the same size as the NH2-terminal fragments from A1a, A1b, and A2. The second fragment, A4F2, was blocked to sequence analysis, but had an apparent molecular weight of about 28,000, which with the NH2-terminal fragment accounts for the entire A4 molecular weight.

Two sets of data suggested that the fragments recovered after cyanogen bromide treatment of each of the acidic polypeptides accounted for most of the length of their respective proteins. First, the sum of apparent molecular weights of the fragments closely approximated the sizes of the intact molecules (Fig. 2). Second, an amino acid analysis was conducted on each of the purified fragments, and the number of residues of each amino acid in each of them was calculated (Tables I to IV). The sums of the various residues in all of the fragments from each individual polypeptide were nearly the same as the values calculated from a similar analysis of the intact molecules. Discrepancies between the expected and calculated totals rarely exceeded one to two residues, which was within the error of these determinations. Thus, if additional parts of the acidic polypeptides existed, they were small fragments which went undetected during purification of the major fragments.

FIG. 2. Evidence for internal repeated sequences in the acidic polypeptides. Points of homology are denoted by an asterisk between sequences. TA3(2) refers to a tryptophan fragment of A3 generated upon cleavage with o-iodosobenzoic acid. It was the second fragment emerging from a Sephadex G-75 column equilibrated with 9% formic acid.

Repeated Sequences in the Acidic Polypeptides

Repeated sequences were found within A1a, A1b, A2, and A3. As summarized in Fig. 2, one such repeat centered around the pentapeptide EQPQQ, which began 5 residues from the NH2 terminus and 11 residues into the central fragments of A1a (Fig. 2). Similar regions were also evident in A1b and A2 (Fig. 1). Another more extensive region of internal homology was found in A3. As seen in Fig. 2, fragment TA3(2), which was obtained following cleavage of A3 at tryptophan, had homology to A3F3 at 11 of 15 overlapping residues. Sufficient differences between the two fragments existed, however, to eliminate the possibility that they both originated from the same part of the molecule.

Stoichiometry of Acidic and Basic Polypeptides in Glycinin
Knowledge of the amino acid sequence of each of the major polypeptides contained in glycinin permitted identification of specific cycles in the sequence that were characteristic of individual polypeptides and that could be quantitated to determine the stoichiometry between them. To define the ratios between the acidic polypeptides, the phenylthiohydantoins generated during the first and third cycles of the sequence were analyzed as follows (see Fig. 2). In the first cycle, phenylalanine reflected A1a plus A1b, leucine reflected A2, and isoleucine reflected A3 plus F2(2). In the third cycle, glycine reflected A4, and phenylalanine estimated A1b. The relative amount of A1a was obtained by subtracting the phenylalanine content of the third cycle from that obtained in cycle 1. The data obtained were consistent with a ratio of 1 A1a : 1 A2 : 1 (A3 + F2(2)) : 1/4 A4 : 1/8 A1b. Since earlier studies indicated that F2(2) was a minor component of glycinin (5), the isoleucine content in cycle 1 was considered to be contributed mainly by A3. To define the ratios between the basic polypeptides, the phenylthiohydantoins in the second sequencer cycle were quantitated. Isoleucine reflected B1 + B2, and valine was due to B3 + B4. A ratio of 2 (B1 + B2) : 1 (B3 + B4) was found. Finally, using both these data and those for quantitating the acidic polypeptides, a 1:1 ratio between acidic and basic polypeptides was obtained as expected (2).
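
The subtraction scheme used for the acidic polypeptides can be written out as a short calculation. The sketch below is illustrative only: the phenylthiohydantoin yields are invented numbers chosen to reproduce the reported ratio, not measurements from this study, and the function and key names are hypothetical. It also assumes that yields from cycles 1 and 3 are directly comparable, which ignores the small stepwise loss between cycles.

```python
# Sketch of the stoichiometry bookkeeping for the acidic polypeptides.
# All yields below are hypothetical (arbitrary units), not data from the paper.

def acidic_ratios(cycle1: dict[str, float], cycle3: dict[str, float]) -> dict[str, float]:
    """Express each acidic polypeptide relative to A1a."""
    a1b = cycle3["Phe"]              # Phe in cycle 3 estimates A1b
    a1a = cycle1["Phe"] - a1b        # Phe in cycle 1 is A1a + A1b
    amounts = {
        "A1a": a1a,
        "A2": cycle1["Leu"],         # Leu in cycle 1 reflects A2
        "A3 + F2(2)": cycle1["Ile"], # Ile in cycle 1 reflects A3 plus F2(2)
        "A4": cycle3["Gly"],         # Gly in cycle 3 reflects A4
        "A1b": a1b,
    }
    return {name: value / a1a for name, value in amounts.items()}


hypothetical_cycle1 = {"Phe": 9.0, "Leu": 8.0, "Ile": 8.0}
hypothetical_cycle3 = {"Gly": 2.0, "Phe": 1.0}

ratios = acidic_ratios(hypothetical_cycle1, hypothetical_cycle3)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

With these invented yields the function returns 1 : 1 : 1 : 0.25 : 0.125, the same form as the ratio reported above.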
FIG. 3. Isoelectric focusing analysis of purified glycinin polypeptides. The gel contained 7.5% acrylamide (acrylamide:bisacrylamide, 759, 3.7% LKB ampholines, pH 3.5-10.0 and 6 M urea. Samples were mounted on filter paper strips lying toward the cathode and focused 6 h at 0°C and 800 V. Each sample contained 30 µg of protein in 0.01 M phosphate (pH 7.0) containing 6 M urea.
IEF Analysis of the Glycinin Polypeptides

The purified acidic and basic polypeptides exhibited considerable charge heterogeneity when subjected to isoelectric focusing under denaturing conditions in gels prepared using ampholytes in the pH range 3.5 to 10.0 (Fig. 3). Three, four, three, and two major bands, along with several minor ones, were found in the case of A1a, A2, A3, and A4, respectively. Polypeptides B1 and B2 had similar IEF patterns, with each generating two major bands and several minor ones. Polypeptides B3 and B4 were strongly basic, and did not focus well in the pH range used (3.5-10.0). However, at least two unresolved bands could be identified in both cases.

DISCUSSION
An earlier report from this laboratory described the purification of six acidic and four basic polypeptides from glycinin, and presented the NH2-terminal sequences for each of them (5). The present study has extended those observations to include a substantial amount of internal sequence for five acidic subunits. These studies provide a description of the essential features of the glycinin polypeptides and define the relationships between the molecules unambiguously. The data have permitted identification of mutants having altered glycinin polypeptides (Nielsen, unpublished observations). Genetic studies are underway to identify linkage relationships between the various polypeptides, as well as their linkage to genetic markers in the soybean genome.

Comparison of the primary structures of the five acidic polypeptides revealed that there was considerable homology among these functionally related molecules, indicating they had been synthesized at the direction of a gene family. It has been presumed that such gene families originated as duplications of a common ancestral gene, although the subsequent evolutionary history of the duplicated genes may vary considerably. In cases such as those represented by the hemoglobin (13) and immunoglobulin (14) multigene families, the genes have remained clustered together on one or two chromosomes. In other cases, however, the genes are dispersed throughout the genome to varying extents (15). Comparable data distinguishing between these two possible extremes in gene organization for glycinin are not available.

Sequence analysis of the fragments of A1a and A1b showed that they were very closely related. Polypeptide A2 differed from the A1 subfamily in having at least one extra methionine located toward the COOH-terminal end of the molecule. A2 was, however, more similar to the A1 subfamily than it was to A3 and A4. Based on the available sequence data, it was difficult to accurately gauge the extent of homology that A3 and A4 had with A1a and A2. Both A3 and A4 contained less methionine than the other three polypeptides, so fewer cyanogen bromide fragments could be generated and correspondingly less total sequence obtained. To compound that difficulty, A3 and A4 each had a fragment that was blocked to Edman degradation, presumably because of NH2-terminal pyroglutamate.
Despite the reduced amount of sequence information available for A3 and A4, several features of each of these polypeptides bear mentioning that justify including them in the same gene family as A1a, A1b, and A2. Nearly 30% of the amino acids in the NH2-terminal sequence of A4 were homologous with those of the A1 subfamily, and of the remaining nonhomologous amino acids, many could be explained by single base changes in the codons specifying them. Further, the single methionine found in A4 was located at about the same distance from its NH2 terminus as the first methionine from the NH2 terminals of A1a, A1b, and A2. Polypeptide A3 was about 5000 Mr larger than the other acidic polypeptides (5). Cyanogen bromide treatment of A3 caused different sized fragments to be generated, such that the NH2-terminal fragment of A3 was smaller than the ones from A1a, A1b, A2, and A4, while the central fragment of A3 was larger than the others. While these fragments were of different size, substantial sequence homology was evident between the homologous fragments. The data suggest that A3 underwent structural rearrangement during its evolutionary history.

Amino acid analysis predicted five, five, seven, three, and two fragments for A1a, A1b, A2, A3, and A4, respectively (5). In the case of A3 and A4, the expected number of fragments was recovered. However, only three of the five expected fragments were obtained for A1a and A1b and only four of the seven expected were purified in the case of A2. Several explanations could be advanced to rationalize the low recovery of cyanogen bromide fragments for the A1 subfamily and A2. Small fragments could have been produced which went undetected during purification of the major fragments. Alternatively, both methionine-serine and methionine-threonine bonds have been reported to be resistant to cleavage with cyanogen bromide (16, 17). In this event, certain cyanogen bromide fragments could contain more homoserine plus homoserine lactone than expected from a single COOH-terminal residue.
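
The comparison between predicted and recovered fragment numbers can be tabulated in a few lines. This is a hedged sketch: it assumes the predicted number of fragments is the methionine content rounded to the nearest integer plus one, it takes the methionine values for A1a, A1b, A2, and A3 from Tables I to IV, and it sets A4 to 1.0 on the basis of the single methionine mentioned in the text rather than a tabulated value. The variable and function names are hypothetical.

```python
# Sketch: predicted versus recovered CNBr fragments per polypeptide.
# Met values for A1a, A1b, A2, A3 are from Tables I-IV; the A4 value of 1.0
# is an assumption based on the "single methionine" statement in the text.
met_residues = {"A1a": 3.6, "A1b": 4.1, "A2": 5.8, "A3": 2.4, "A4": 1.0}

# Unique fragments actually recovered, as reported in the Results.
recovered = {"A1a": 3, "A1b": 3, "A2": 4, "A3": 3, "A4": 2}


def predicted_fragments(met_content: float) -> int:
    """Complete cleavage at n methionines yields n + 1 fragments."""
    return round(met_content) + 1


predicted = {name: predicted_fragments(met) for name, met in met_residues.items()}
unaccounted = {name: predicted[name] - recovered[name] for name in predicted}

for name in predicted:
    print(f"{name}: predicted {predicted[name]}, recovered {recovered[name]}, "
          f"unaccounted {unaccounted[name]}")
```

Under these assumptions the script reproduces the predicted five, five, seven, three, and two fragments, and shows two fragments unaccounted for in each of A1a and A1b and three in A2.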
Interestingly, A1aF2, A1bF2, and A2F2 were consistently found to have homoserine plus lactone contents approaching two residues per fragment, but only a single sequence (see Tables I, II, and III, respectively). In addition, the fragment assigned to the COOH-terminal region of A2 contained homoserine. As pointed out earlier, this could have been due to a methionine-methionine linkage in A2. However, these data must be interpreted cautiously since homoserine emerged from the ion-exchange column for amino acid analysis as a small peak on the leading edge of the large one due to glutamate, and this could easily have resulted in an overestimation of homoserine content.

Charge heterogeneity was observed during characterization of the polypeptides from glycinin. As shown in Fig. 3, two or more major charged species of each purified subunit were detected by analytical IEF despite the fact that only one major phenylthiohydantoin was released at each cycle in the sequence. Under our analytical criteria, we would have detected minor sequences present at 10% or more of the major sequence. Since we did not, it is conceivable that the charge heterogeneity was due to point mutations in structural genes that encoded a family of closely related molecules which were not separated by our chromatographic techniques. This would suggest that heterogeneity will become evident in the primary structure of these molecules as more sequence data become available. Another possible explanation for the apparent heterogeneity seen in Fig. 3 would be polymerization accompanied by charge burial, since these peptides do associate noncovalently in their native state and are at their isoelectric point. A third explanation for the charge heterogeneity is that there had been partial deamidation of either asparagine or glutamine residues in the polypeptides. This would not be unlikely since both the acidic and basic polypeptides of glycinin had a high content of these two amino acids (5). Croy et al. (22) have asserted that the heterogeneity among the 11 S polypeptides from pea was the result of post-translational modification rather than different structural genes.
The heterogeneity was suggested to arise when a precursor of Mr 60,000 was cleaved to yield the acidic (Mr 40,000) and basic (Mr 20,000) polypeptides associated with legumin. We have also observed a precursor of glycinin of Mr 60,000 (Turner et al., submitted for publication), but cannot explain the substantial differences in the primary structures of its various acidic and basic molecules (cf. Fig. 2 and Ref. (5)) on the basis of post-translational modification. Were only post-translational modification involved, heterogeneity would be observed at the ends of the polypeptides and not internally. The amino acid substitutions found in both the NH2-terminal and central fragments of A1a, A1b, and A2 (Fig. 2) show that the proteins are the products of different structural genes. Further, substantial differences in the number and position of methionines in the acidic polypeptides studied clearly indicate that they are all the products of different structural genes.

Kitamura and Shibasaki (18) also purified and studied the acidic polypeptides from glycinin using polypeptide immunological cross-reactivity, the gel filtration profile of subunit cyanogen bromide fragments, and ion-exchange chromatographic profiles of subunit tryptic digests as criteria. From these studies they concluded that the acidic polypeptides shared a high degree of sequence homology. A direct comparison of our data with theirs, however, revealed differences. The polypeptides they referred to as A1 and A2 were likely our A1a and A1b, respectively, since both contained NH2-terminal phenylalanine. The molecule designated A3 by the Japanese group probably corresponded to our A2 since both contained NH2-terminal leucine. Their subunit A4 was undoubtedly comparable to our A3 since both were the 42,000 molecular weight component and had NH2-terminal isoleucine. What we have designated A4, and which contained NH2-terminal arginine, was absent from their DEAE-Sephadex elution profile. Likewise, what we have designated F2(2) (5) was also missing from their DEAE-Sephadex elution profiles. However, F2(2) and A4 accounted for minor amounts of the polypeptides from glycinin.
Varietal differences in the soybeans used for the two studies exist, and this contributes to the differences observed (Staswick et al., submitted for publication).

Upon comparing the available sequence data for the five acidic polypeptides, several repeated amino acid sequences were found in their primary structures. Such regions of internal homology have been reported in proteins from a wide variety of sources, but the plant protease inhibitors are particularly well-studied examples. Regions of internal homology have been reported in the lima bean inhibitor (19), the Bowman-Birk inhibitor of soybean (20), and the garden bean inhibitors (21). These proteins have two homologous domains, each containing a catalytic site. The repeated domains could be considered to be extensions of shorter ancestral polypeptides which originated by gene duplication and unequal crossing over between the resulting tandemly arranged structural genes. The presence of these internal regions of homology within the glycinin polypeptides and between the homologous members of the glycinin family of proteins implies that they evolved as a consequence of a complicated series of gene duplications.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the skillful technical assistance from Sandy Spruill and Judy Taylor. Special thanks go to Walt Mahoney for his valuable suggestions during this work.

REFERENCES

1. HILL, J. E., AND BREIDENBACH, R. W. (1974) Plant Physiol. 53, 742-746.
2. DERBYSHIRE, E., WRIGHT, D. J., AND BOULTER, D. (1976) Phytochemistry 15, 3-24.
3. BADLEY, R. A., ATKINSON, D., HAUSER, H., ODANI, D., GREEN, J. P., AND STUBBS, J. M. (1975) Biochim. Biophys. Acta 412, 214-228.
4. KITAMURA, K., TAKAGI, T., AND SHIBASAKI, K. (1976) Agr. Biol. Chem. 40, 1837-1844.
5. MOREIRA, M. A., HERMODSON, M. A., LARKINS, B. A., AND NIELSEN, N. C. (1979) J. Biol. Chem. 254, 9921-9926.
6. NUTE, P. E., AND MAHONEY, W. C. (1979) Biochemistry 18, 467-472.
7. MAHONEY, W. C., AND HERMODSON, M. A. (1979) Biochemistry 18, 3810-3814.
8. LAEMMLI, U. K. (1970) Nature (London) 227, 680-685.
9. LARKINS, B. A., AND HURKMAN, W. S. (1978) Plant Physiol. 62, 256-263.
10. HERMODSON, M. A., SCHMER, G., AND KURACHI, K. (1977) J. Biol. Chem. 252, 6267-6279.
11. HERMODSON, M. A., ERICSSON, L. H., TITANI, K., NEURATH, H., AND WALSH, K. A. (1972) Biochemistry 11, 4493-4502.
12. ZIMMERMAN, C. C., APPELLA, E., AND PISANO, J. J. (1977) Anal. Biochem. 77, 569-573.
13. DEISSEROTH, A., NIENHUIS, A., LAWRENCE, J., GILES, R., TURNER, P., AND RUDDLE, F. H. (1978) Proc. Nat. Acad. Sci. USA 75, 1456-1460.
14. HOOD, L., CAMPBELL, J. H., AND ELGIN, S. C. R. (1975) Annu. Rev. Genet. 9, 305-353.
15. FINNEGAN, D. J., RUBIN, G. M., YOUNG, M. W., AND HOGNESS, D. S. (1977) Cold Spring Harbor Symp. Quant. Biol. 42, 1053-1063.
16. SCHROEDER, W. A., SHELTON, J. B., AND SHELTON, J. R. (1969) Arch. Biochem. Biophys. 130, 551-556.
17. KUMAR, A. A., BLANKENSHIP, D. T., KAUFMAN, B. T., AND FREISHEIM, J. H. (1980) Biochemistry 19, 667-678.
18. KITAMURA, K., AND SHIBASAKI, K. (1977) Agr. Biol. Chem. 41, 351-357.
19. TAN, C. G. L., AND STEVENS, F. C. (1971) Eur. J. Biochem. 18, 515-523.
20. ODANI, S., AND IKENAKA, T. (1972) J. Biochem. 71, 839-848.
21. WILSON, K. A., AND LASKOWSKI, M., SR. (1975) J. Biol. Chem. 250, 4261-4267.
22. CROY, R. R. D., GATEHOUSE, J. A., EVANS, I. M., AND BOULTER, D. (1980) Planta 148, 49.