Biochimica et Biophysica Acta, 950 (1988) 435-440 Elsevier
BBA 90112
BBA Report
Molecular cloning of the barley seed protein CMd: a variant member of the α-amylase/trypsin inhibitor family of cereals
Nigel G. Halford (a), Nicholas A. Morris (a), Peter Urwin (a), Martin S. Williamson (a), Donald D. Kasarda (b), Ellen J.-L. Lew (b), Martin Kreis (a) and Peter R. Shewry (a)
(a) AFRC Institute of Arable Crops Research, Biochemistry Department, Rothamsted Experimental Station, Harpenden (U.K.) and (b) USDA/ARS Western Regional Research Center, Albany, CA (U.S.A.)
(Received 11 March 1988)
Key words: Seed protein; Enzyme inhibitor; Molecular cloning; Amino acid sequence; (Barley seed)
The nucleotide and deduced amino-acid sequences of a cDNA clone encoding the barley seed protein CMd are described. The sequence is homologous with those of a family of inhibitors of α-amylase and trypsin, except for two short insertions. The longer of these (14 residues) lies at the junction between two of the three proposed ancestral regions that make up this family of proteins, and has limited identity with α-amylases of bacterial origin.
Although the CM proteins of cereal grains were originally defined and named on the basis of their extraction in mixtures of chloroform and methanol (see ref. 1), subsequent work has shown that they are also salt-soluble [2] and belong to a family of inhibitors of α-amylase and trypsin [3,4], at least one of which is bifunctional [5]. They also have limited sequence identity with other proteins, including the 2S storage globulins of dicotyledonous plants and specific domains of cereal seed prolamins [3,4]. Of the five major CM proteins present in barley, two (CMc, CMe) are trypsin inhibitors and one (CMa) is an inhibitor of insect α-amylase [6]. Although the two remaining proteins (CMb, CMd) have no known inhibitory activity, Sanchez-Monge et al. [7] have suggested that they combine with CMa to form a tetrameric protein with increased inhibitory activity against insect α-amylase. We report here the amino-acid sequence of CMd, determined from a cloned cDNA and confirmed by comparison with partial protein sequences. Sequence comparisons show that CMd differs from related inhibitors in lacking a reactive (inhibitory) site for trypsin and in containing two inserts. The position of the larger insert (14 residues) supports the hypothesis [4] that this family of proteins evolved by triplication of a single ancestral domain, while its amino-acid sequence shows an intriguing similarity to α-amylases of bacterial origin.
The sequence data in this paper have been submitted to the EMBL/GenBank Data Libraries. Correspondence: P.R. Shewry, Biochemistry Department, Rothamsted Experimental Station, Harpenden, Herts., AL5 2JQ, U.K.
0167-4838/88/$03.50 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)
CMd was prepared from the mutant barley line Risø 56 [3]. The N-terminal amino-acid sequence was determined initially using a Beckman Model 890B automated sequencer [3] and confirmed and extended using a solid-phase system [8]. Protein (10 mg) was digested with 65 µg of Staphylococcus aureus V8 proteinase [9], and the digest was fractionated on a column of Sephadex G-50 in 0.1 M acetic acid. The N-terminal amino-acid sequence of one peptide was determined using the Beckman 890B sequencer. A cDNA library was constructed in the plasmid pUC9 using a size-fractionated poly(A)-rich RNA fraction (enriched for small messages) from membrane-bound polysomes prepared from developing endosperms of the barley line Hiproly (ref. 10, method B). Part of the library was transformed into E. coli JM83, and 1600 colonies were screened with a 32P-labelled 18-base oligonucleotide. One hybridizing colony was detected, containing an insert of about 700 bp. Restriction fragments were subcloned into the M13 vectors mp18 and mp19, and the nucleotide sequences were determined in both directions by the dideoxy method.
Paz-Ares et al. [11] reported the characterization of two partial cDNA clones (pUP13 and pUP38) related to the CM proteins of barley. Both were truncated within the 5' end of the coding region, and no relationship with the directly determined N-terminal amino-acid sequences of CMa-e [3,6,12] was detected. To facilitate the identification of a cDNA clone for CMd, we had previously isolated a peptide from a V8 proteinase digest of the protein and determined its N-terminal amino-acid sequence. Of the 37 unambiguously determined residues, 33 were identical to part of the deduced sequence of pUP38 (see Fig. 1), indicating that pUP38 encoded CMd. An oligonucleotide corresponding to bases 127-144 of pUP38 (see Fig. 1) was used to isolate a single cDNA clone from a library prepared using membrane-bound polysomal poly(A)-rich RNA from developing barley endosperms.
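Two simple sequence operations underlie the identification of the clone: scoring the V8 peptide against the deduced pUP38 sequence while leaving out cycles that gave no unambiguous residue (33 identities in 37 comparisons), and taking bases 127-144 of pUP38 (144 - 127 + 1 = 18 bases) as the probe. The minimal Python sketch below illustrates both, assuming 1-based inclusive numbering and 'X' for unassigned cycles; the function names and toy sequences are hypothetical, since neither the peptide nor the pUP38 sequence is reproduced in this text. Because the caption to Fig. 1 describes the probe as complementary to the underlined sequence, the sketch returns both the selected window and its reverse complement.

```python
# Minimal sketch (hypothetical helper names): score a directly determined
# peptide against a deduced sequence, and cut an oligonucleotide probe window.
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_identities(peptide: str, deduced: str, unknown: str = "X") -> tuple[int, int]:
    """Return (identical, compared) for two equal-length, pre-aligned sequences.

    Positions where the peptide has `unknown` (no unambiguous residue at that
    Edman cycle) are left out of the comparison.
    """
    if len(peptide) != len(deduced):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(p, d) for p, d in zip(peptide, deduced) if p != unknown]
    identical = sum(p == d for p, d in pairs)
    return identical, len(pairs)


def probe_window(seq: str, start: int, end: int) -> tuple[str, str]:
    """Return bases start..end (1-based, inclusive) and their reverse complement."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("window lies outside the sequence")
    sense = seq[start - 1:end].upper()
    antisense = sense.translate(COMPLEMENT)[::-1]
    return sense, antisense


if __name__ == "__main__":
    # Toy data only; the real peptide and pUP38 sequences are not reproduced here.
    print(count_identities("AXTDCSPG", "AATDCAPG"))  # (6, 7)
    print(probe_window("ATGGCCACCGTCATGGTCTC", 3, 8))  # ('GGCCAC', 'GTGGCC')
    # Bases 127-144 span 144 - 127 + 1 = 18 nucleotides, the probe length used.
```

Skipping unassigned cycles rather than counting them as mismatches mirrors the way the comparison is reported, as identities among unambiguously determined residues.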
The insert encoded a protein of 160 residues, of which residues 15-51 were identical to the directly determined N-terminal amino-acid sequence of CMd (Fig. 1). We therefore concluded that residues 1-14 belong to a signal peptide. This is consistent with the use of membrane-bound polysomal RNA to construct the library, and with the observation of Paz-Ares et al. [13] that CMd is synthesised on membrane-bound polysomes as a precursor of higher Mr that is processed co-translationally. However, the 14-residue signal peptide of pCMd is almost certainly truncated, as most eukaryotic signal peptides are longer than 17 residues [14], including those of cereal prolamins, which have 18-21 residues [15]. There is no obvious sequence identity with the putative signal peptide of pUP13, which was deduced by Paz-Ares et al. [11] according to the method of von Heijne [16]. Notably, the cleavage site in pCMd lies after the fifth of a run of nine alanine residues. No preview of residues 5 and 6 (threonine and aspartic acid) was observed when the N-terminal sequence of the mature protein was determined, and there was no alanine at cycle 5 beyond that expected from the normal lag that develops during sequencing (the alanine yield in cycle 5 was only 6% of that in cycle 1 and 8% of that in cycle 4). This indicates that the signal sequence was cleaved precisely after the fifth alanine. What determines such precise cleavage almost in the middle of a run of alanine residues is unknown. Although the signal peptides of cereal proteins are frequently cleaved after alanine [15], our results indicate that other features of signal peptide structure also help to determine the precise site.
The mature protein encoded by pCMd consists of 146 residues. Its calculated Mr of 16 032 is in excellent agreement with the Mr of about 16 000 determined by SDS-PAGE [11,17]. Apart from three single amino-acid substitutions, residues 39-146 are identical to the deduced amino-acid sequence of pUP38, and residues 76-113, apart from one substitution, are identical to the sequence of the V8 proteinase peptide (Fig. 1). Cleavage by V8 proteinase after Glu-75 is consistent with the reported specificity of this enzyme.
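The argument for precise cleavage can be made explicit by translating the 5' end of pCMd and listing the residues that each candidate cleavage site would place in the first Edman cycles. The minimal Python sketch below assumes that the 60-nucleotide fragment (the first 20 complete codons of the clone) has been transcribed correctly from Fig. 1; the helper names are hypothetical. Only cleavage after residue 14 gives the observed N-terminus AAAATD. Cleavage one residue earlier would put an extra alanine at cycle 5, and cleavage one residue later would bring threonine forward to cycle 4, which would appear as a preview.

```python
# Minimal sketch (hypothetical helper names): translate the 5' end of the pCMd
# coding sequence and list the residues expected in the first Edman cycles for
# candidate signal-peptide cleavage sites. The fragment is read from Fig. 1.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# First 20 complete codons of the pCMd insert (assumption: transcribed from Fig. 1).
PCMD_5P = (
    "TTGGCCACCGTCATGGTCTCCGTCTTC"  # L A T V M V S V F
    "GCCGCCGCTGCCGCTGCCGCCGCCGCC"  # nine alanines
    "ACCGAC"                       # T D
)


def translate(dna: str) -> str:
    """Translate in-frame DNA, ignoring a trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def edman_cycles(precursor: str, cleave_after: int, cycles: int = 6) -> str:
    """Residues read in the first `cycles` Edman cycles if the signal peptide
    is removed after residue `cleave_after` (1-based precursor numbering)."""
    return precursor[cleave_after:cleave_after + cycles]


if __name__ == "__main__":
    precursor = translate(PCMD_5P)
    print(precursor)
    for site in (13, 14, 15):
        print(site, edman_cycles(precursor, site))
```

The loop prints `13 AAAAAT`, `14 AAAATD` and `15 AAATD`; the last has only five residues because the translated fragment ends at Asp.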
The three sequences were derived from different lines of barley (Risø 56, Hiproly and Bomi), so the differences may reflect genetic variation. None of the substitutions involves a charged residue, which may account for the absence of genotypic variation in the mobility of CMd on two-dimensional electrophoresis (pH-gradient electrophoresis followed by electrophoresis at pH 3.2) [18].
Fig. 1. Comparison of the nucleotide and deduced amino-acid sequences of pCMd with those of pUP38 [11] and with the directly determined N-terminal amino-acid sequences of the whole protein and of a V8 proteinase peptide. The continuous nucleotide sequence is that of pCMd, and its deduced amino-acid sequence is shown above it. The directly determined amino-acid sequences are shown above the pCMd sequences. The start of pUP38 is indicated, and positions where its nucleotide and deduced amino-acid sequences differ from those of pCMd are shown below. The nucleotide sequence of pUP38 extends beyond that of pCMd at the 3' end. Variant amino-acid residues are boxed. An oligonucleotide complementary to the underlined sequence was used to screen the cDNA library. Standard single-letter codes for amino acids are used, and the stop codon is shown by *.
The amino-acid sequence of CMd is aligned with those of related inhibitors of α-amylase and trypsin in Fig. 2. The sequences are divided into three regions, called A, B and C. Kreis et al. [4,15] showed that these regions have related amino-acid sequences, and suggested that they originated from triplication of a single ancestral domain. CMd is most closely related to the trypsin inhibitor of rye (53 identical residues), and less closely to the barley trypsin inhibitor CMe (48 identical residues) and the bifunctional trypsin/α-amylase inhibitor of ragi (47 identical residues). Its identity with the maize trypsin inhibitor and with the three α-amylase inhibitors of wheat is weaker, although 8 of the 10 cysteine residues of CMe are present in all the proteins. The trypsin reactive (inhibitory) site has been identified as Arg-33-Leu-34 in the barley inhibitor and Arg-34-Leu-35 in the ragi inhibitor [5,12], and similar sites are present in the rye and maize inhibitors. CMd has serine at the corresponding position (residue 37), which may account for its lack of activity against trypsin. The active sites of the α-amylase inhibitors are not known, but CMd shows little sequence identity with them, especially in region C.
[Fig. 2: three panels, Region A, Region B and Region C; the 14-residue insert in CMd, TSAELNYPGQPYLA, is marked.]
Fig. 2. Alignment of the deduced amino-acid sequence of CMd with those of the rye trypsin inhibitor (RTI) [25], the barley trypsin inhibitor CMe (BTI) [12], the ragi bifunctional inhibitor (RBI) [5], the maize trypsin inhibitor (MTI) [26], and the wheat α-amylase inhibitors (WAI) 0.19 [27], 0.28 [28] and 0.53 [29]. The sequences were aligned manually to maximise homology, resulting in some gaps. Residues present in CMd and in other proteins are boxed. Standard single-letter abbreviations are used, and cysteine residues are additionally indicated by *. The reactive (inhibitory) sites of BTI and RBI (Arg-Leu) correspond to residues 37-38 of CMd.
CMd is also longer than the other proteins in Fig. 2, which, with the exception of the maize inhibitor (112 residues), consist of 120 to 125 residues. This is due to two additional blocks of amino acids, which presumably result from insertion events. One of these is in region B (residues 83-87). The second consists of 14 residues and lies at the junction between regions A and B. The position of this insert is consistent with our previously proposed model for the origin and evolutionary relationships of this group of proteins. Regions with sequence identity to A, B and C can also be recognised in other seed proteins, including the 2S globulins of dicotyledonous plants (castor bean, oilseed rape, lupin and Brazil nut) [15,19-22], specific structural domains of the major prolamin storage proteins of barley, wheat and rye [15], and the minor prolamins of maize [23,24]. The sequences of these regions (notably the cysteine residues) are generally conserved, with divergence occurring by insertion of sequences between and flanking them. Although inserted sequences are present between regions A and B in the 2S globulins (where they contain a site for proteolytic cleavage into two subunits) and in the prolamins, they are not related to the inserted sequence in CMd.
To determine whether sequences related to the 14-residue insert of CMd occur in other proteins, the deduced amino-acid sequences were compared with those in the Brookhaven protein sequence database and the GenBank nucleotide sequence database. This revealed short related sequences in two α-amylases of bacterial origin, from Bacillus subtilis [30] and Streptomyces hygroscopicus [31] (Fig. 3), but not in α-amylases from other species of Bacillus (B. licheniformis, B. amyloliquefaciens, B. stearothermophilus), from mammalian saliva and pancreas, or from barley. The search of the Brookhaven database also identified related sequences in 12 other proteins with no obvious functional relationship to CMd. This is not surprising, given the large number of sequences in the database (5415, with a total length of 1 302 966 residues) and the short length of the region compared.
Fig. 3. Related amino-acid sequences in CMd and in α-amylases from Bacillus subtilis [30] and Streptomyces hygroscopicus [31]. Identical residues are boxed.
The in vivo function of CMd and the role, if any, of the inserted sequence are not known. It is well established that the subunit lacks inhibitory activity against trypsin and against α-amylases from mammals, an insect (Tenebrio molitor), a fungus (Aspergillus) and barley [6]. However, the report [7] that it combines with CMa and CMb to form a tetrameric protein with enhanced inhibitory activity against the T. molitor enzyme needs to be substantiated. In particular, the tetrameric protein must be purified to homogeneity, its subunit stoichiometry established, and the complex shown to form in vivo rather than during extraction and purification. The solubility of CM proteins in alcohol/water mixtures and in chloroform:methanol mixtures ranging from 2:1 to 7:1 indicates that hydrophobic interactions could stabilize a tetramer, whether formed in vivo or in vitro. Although it is tempting to suggest that the inserted sequence in CMd could take part in the interaction with CMa by mimicking the α-amylase sequence that CMa recognises, there is no evidence for related sequences in the α-amylase of T. molitor. Moreover, such mimicry might be expected to reduce rather than enhance the inhibitory activity of CMa. Further speculation on the role of the 'α-amylase-related' sequence in CMd must await conclusive studies of its mode of interaction with CMa, and of the mechanism by which CMa recognises and inhibits α-amylase.
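The search program and scoring scheme used for the database comparison are not stated, so the sketch below substitutes the simplest possible scan: slide the 14-residue insert (read from Fig. 2 as TSAELNYPGQPYLA, an assumption) along each sequence and report ungapped windows that reach a chosen number of identities. It also counts the windows examined. With 5415 sequences totalling 1 302 966 residues, and assuming every sequence is at least 14 residues long, there are about 1.23 million 14-residue windows, which illustrates why weak matches to unrelated proteins are to be expected. The function names, the threshold and the toy database are hypothetical.

```python
# Minimal sketch (hypothetical names): ungapped identity scan of a short query
# against a set of sequences, plus a count of the windows examined.
from __future__ import annotations

from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    name: str
    start: int       # 1-based start of the matching window
    identities: int  # identical residues over the window
    window: str


def scan(query: str, database: dict[str, str], min_identities: int) -> Iterator[Hit]:
    """Yield every ungapped window with at least `min_identities` identities."""
    k = len(query)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            identities = sum(q == s for q, s in zip(query, window))
            if identities >= min_identities:
                yield Hit(name, i + 1, identities, window)


def window_count(total_residues: int, n_sequences: int, k: int) -> int:
    """Number of length-k windows, assuming every sequence has at least k residues."""
    return total_residues - n_sequences * (k - 1)


if __name__ == "__main__":
    INSERT = "TSAELNYPGQPYLA"  # CMd insert as read from Fig. 2 (assumption)
    toy_db = {  # hypothetical sequences, not database entries
        "toy_1": "MKTSAELNYPGQPYLAK",
        "toy_2": "TSAEVNYPGKPYLG",
    }
    for hit in scan(INSERT, toy_db, min_identities=10):
        print(hit)
    print(window_count(1_302_966, 5415, len(INSERT)))  # 1232571
```

With the toy database, the scan reports an exact 14-identity hit starting at position 3 of `toy_1` and an 11-identity hit starting at position 1 of `toy_2`. A real search would normally use a substitution matrix and allow gaps; plain identity counting is used here only because it is the easiest to follow.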
We are grateful to Dr. J. Findlay and Dr. D.J. Pappin of the Protein Sequencing Unit, Department of Biochemistry, University of Leeds, for the solid-phase protein sequencing. The work was partly supported by contract BAP-0099-UK of the Biotechnology Action Programme of the EEC, and by NATO Travel Grant No. 324/82.
References
1 Garcia-Olmedo, F. (1984) Kulturpflanze 32, S21-S32.
2 Salcedo, G., Sanchez-Monge, R., Argamenteria, A. and Aragoncillo, C. (1980) Plant Sci. Lett. 19, 109-119.
3 Shewry, P.R., Lafiandra, D., Salcedo, G., Aragoncillo, C., Garcia-Olmedo, F., Lew, E.J.-L., Dietler, M.D. and Kasarda, D.D. (1984) FEBS Lett. 175, 359-363.
4 Kreis, M., Forde, B.G., Rahman, S., Miflin, B.J. and Shewry, P.R. (1985) J. Mol. Biol. 183, 499-502.
5 Campos, F.A.D. and Richardson, M. (1983) FEBS Lett. 152, 300-304.
6 Barber, D., Sanchez-Monge, R., Mendes, E., Lazaro, A., Garcia-Olmedo, F. and Salcedo, G. (1986) Biochim. Biophys. Acta 869, 115-118.
7 Sanchez-Monge, R., Gomez, L., Garcia-Olmedo, F. and Salcedo, G. (1987) FEBS Lett. 207, 105-109.
8 Shewry, P.R., Parmar, S. and Pappin, D.J.C. (1987) Biochem. Genet. 25, 309-325.
9 Shewry, P.R., Lew, E.J.-L. and Kasarda, D.D. (1981) Planta 153, 246-253.
10 Williamson, M.S., Forde, J., Buxton, B. and Kreis, M. (1987) Eur. J. Biochem. 165, 99-106.
11 Paz-Ares, J., Ponz, F., Rodriguez-Palenzuela, P., Lazaro, A., Hernandez-Lucas, C., Garcia-Olmedo, F. and Carbonero, P. (1986) Theor. Appl. Genet. 71, 842-846.
12 Odani, S., Koide, T. and Ono, T. (1983) J. Biol. Chem. 258, 7998-8003.
13 Paz-Ares, J., Ponz, F., Aragoncillo, C., Hernandez-Lucas, C., Salcedo, G., Carbonero, P. and Garcia-Olmedo, F. (1983) Planta 157, 74-80.
14 von Heijne, G. (1985) J. Mol. Biol. 184, 99-105.
15 Kreis, M., Shewry, P.R., Forde, B.G., Forde, J. and Miflin, B.J. (1985) in Oxford Surveys of Plant Cell and Molecular Biology (Miflin, B.J., ed.), Vol. 2, Oxford University Press, Oxford.
16 von Heijne, G. (1983) Eur. J. Biochem. 133, 17-21.
17 Festenstein, G.N., Hay, F.C. and Shewry, P.R. (1987) Biochim. Biophys. Acta 912, 371-383.
18 Molina-Cado, J.L., Fra-Mon, P., Salcedo, G., Aragoncillo, C., Roca de Togores, F. and Garcia-Olmedo, F. (1987) Theor. Appl. Genet. 73, 531-536.
19 Sharief, F.S. and Li, S.S.-L. (1982) J. Biol. Chem. 257, 14753-14759.
20 Crouch, M.L., Tenbarge, K.M., Simon, A.E. and Ferl, R. (1983) J. Mol. Appl. Genet. 2, 273-283.
21 Lilley, G.G. and Inglis, A.S. (1986) FEBS Lett. 195, 235-241.
22 Ampe, C., Van Damme, J., Castro, L.A.B., Sampaio, M.J.A.M. and Van Montagu, M. (1986) Eur. J. Biochem. 159, 597-604.
23 Marks, M.D., Lindell, J.S. and Larkins, B.A. (1985) J. Biol. Chem. 260, 16451-16459.
24 Prat, S., Cortadas, J., Puigdomenech, P. and Palau, J. (1985) Nucleic Acids Res. 13, 1493-1504.
25 Lyons, A., Richardson, M., Tatham, A.S. and Shewry, P.R. (1987) Biochim. Biophys. Acta 915, 305-313.
26 Mahony, W.C., Hermodsen, M.A., Jones, B., Powers, D.D., Corfman, R.S. and Reek, R.G. (1984) J. Biol. Chem. 259, 8412-8416.
27 Maede, K., Hase, T. and Matsubara, H. (1983) Biochim. Biophys. Acta 743, 52-57.
28 Kashlan, N. and Richardson, M. (1981) Phytochemistry 20, 1781-1784.
29 Maede, K., Kakabayashi, S. and Matsubara, H. (1985) Biochim. Biophys. Acta 828, 213-221.
30 Yang, M., Galizzi, A. and Henner, D. (1983) Nucleic Acids Res. 11, 237-249.
31 Hoshiko, S., Makabe, O., Nojiri, C., Katsumata, K., Satoh, E. and Nagaoka, K. (1987) J. Bacteriol. 169, 1029-1036.