Gene, 127 (1993) 221-225
© 1993 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/93/$06.00
GENE 07060
Nucleotide sequence and expression in Escherichia coli of cDNAs encoding papaya proteinase omega from Carica papaya
(Recombinant DNA; zymogen; isozymes; cysteine proteinase)
D.F. Revell, N.J. Cummings, K.C. Baker, M.E. Collins, M.A.J. Taylor, I.G. Sumner, R.W. Pickersgill, I.F. Connerton and P.W. Goodenough
Received by J. Knight: 20 October 1992; Revised/Accepted: 11 December/12 December 1992; Received at publishers: 29 January 1993
We have cloned and sequenced two similar, but distinct, cDNAs from both fruit and leaf tissues of Carica papaya. The C-terminal portion of the predicted amino acid (aa) sequence of one of the clones has complete identity with the mature enzyme sequence of the cysteine proteinase papaya proteinase omega (PpΩ). The second clone contains ten individual bp changes compared with the first and encodes a protein with three single-aa substitutions, only one of which is located in the mature sequence; most noticeably, it carries an additional 19-aa C-terminal extension. The clones encode pre-pro precursor isoforms of PpΩ. The former of these clones has been expressed in Escherichia coli using a T7 polymerase expression system to produce insoluble pro-enzyme, which has been solubilized and refolded to yield auto-activable pro-PpΩ.
INTRODUCTION
Cysteine proteinases (CyP) form a ubiquitous superfamily in plants and animals (reviewed extensively by Brocklehurst et al., 1987). The most commercially important members of the CyP superfamily come from plant sources, in particular C. papaya. The enzymes from C. papaya have been used since ancient times in the food industry and have been studied in biochemical detail. One of them, papain (EC 3.4.22.2), was only the second enzyme for which a three-dimensional structure was determined (Drenth et al., 1968). Considering this plethora of information, it is perhaps surprising that only one full-length cDNA from the four or five CyP present in papaya latex has been cloned (Cohen et al., 1986). In an earlier paper we reported the cloning of a cDNA for papain from leaf tissue and its expression in E. coli in quantities sufficient for the crystallographic analysis of mutants (Taylor et al., 1992). In this paper we report the cloning and expression in E. coli of full-length cDNAs from fruit and leaf tissues encoding the most abundant and basic CyP from C. papaya, PpΩ. We have also recently crystallized and solved the three-dimensional structure of PpΩ (Pickersgill et al., 1991) and compared it with papain (Pickersgill et al., 1992). Kinetic analyses of PpΩ indicate that it is quite distinct from papain (Topham et al., 1991; Sumner et al., 1993). Thus, variant cDNAs of PpΩ represent a valuable source of genetic material to enable protein engineering studies on the family of CyP from C. papaya.

Correspondence to: Dr. I.F. Connerton, AFRC Institute of Food Research, Whiteknights Rd., Reading RG6 2EF, UK. Tel. (44-734) 357~~; Fax (44-734) 357148; e-mail: CONNRTO~(~ARCB.AFRCACUK

Abbreviations: aa, amino acid(s); bp, base pair(s); C., Carica; cDNA, DNA complementary to RNA; CyP, cysteine proteinase(s); DTT, dithiothreitol; IPTG, isopropyl-β-D-thiogalactopyranoside; kb, kilobase(s) or 1000 bp; N, any nucleoside; nt, nucleotide(s); oligo, oligodeoxyribonucleotide; PAGE, polyacrylamide-gel electrophoresis; PCR, polymerase chain reaction; PpIV, papaya proteinase IV; PpΩ, papaya proteinase Ω; PpΩ, gene (DNA) encoding PpΩ; poly(A)+RNA, polyadenylated RNA; rDNA, ribosomal DNA; SDS, sodium dodecyl sulfate.
EXPERIMENTAL AND DISCUSSION
(a) cDNA clones
Two types of clones were selected from the fruit cDNA library in λZAP. The first was shown to have identity in its deduced aa sequence with that experimentally determined for PpΩ (Dubois et al., 1988); the second was noted to have an extended 3' non-coding region. However, both types of clone were found to be incomplete when compared with the pre-pro-encoding nt sequence of the cDNA for papain (Cohen et al., 1986), terminating prematurely beyond the end of the pro-encoding region. The leaf library generated two similar sets of clones, and these also proved to be incomplete, terminating after the pro-region. cDNAs containing the pre-region were amplified directly from fruit and leaf cDNAs using the PCR with antisense loop and pre-propapain-based primers. The PCR products were cloned blunt-ended into the EcoRV site of pBluescript, and those clones with sequence identity to the initial library-derived cDNAs (i.e., devoid of apparent PCR errors) were used to generate full-length cDNAs by splicing the clones together as partial restriction digests at an internal StyI site. As the extreme 5' ends of these clones were specified by the papain-based PCR primer, direct mRNA sequencing was performed to confirm these sequences. The full-length cDNA sequences of both clones are presented in Fig. 1, including the additional sequences found at the 3' end of the second clone (PpΩII). The two cDNAs are 82% and 79% homologous, respectively, with the cDNA reported for papain (Cohen et al., 1986). The clones are actually more closely related (90% homology) to the anonymous partial cDNA sequence (pLBPc 13) isolated from leaf material by McKee et al. (1986), which we
[Fig. 1 sequence panel: nt sequence of the PpΩI cDNA with its deduced aa sequence; the nt and aa differences in PpΩII, including the C-terminal extension, are shown on the lower lines.]
Fig. 1. The nt sequences of the cDNAs encoding isozymes of the CyP prepro-PpΩ from C. papaya. The continuous sequences represent the cDNA and deduced aa sequences of PpΩI; the lower lines show the nt changes observed and the corresponding deduced aa changes in PpΩII, including the C-terminal extension. Asterisks mark the stop codons. The heavy dots mark the termination of the two primary cDNAs isolated in λZAP. These sequence data will appear in the EMBL/GenBank/DDBJ Nucleotide Sequence Libraries under the accession Nos. X66060 and X69877. The cDNA libraries were constructed in λZAP (Stratagene, La Jolla, CA, USA) from leaf and fruit mRNAs. These were probed using two mixed degenerate 32P-labelled oligos (5'-~NAA~TCNGGN~NA~GG and 5'-GGNA~~AG~GGNGGNAA~~) based upon the reverse translation of the aa sequence of PpΩ that is absent in papain (aa 167-174 of mature PpΩ; Dubois et al., 1988; Pickersgill et al., 1991). Positive clones were rescued into pBluescript from the cDNA library; all were similarly deficient in the pre-region as compared with papain. The correct nt sequence of the loop region was determined from these clones (using a Genesis 2000 automatic DNA sequencer; Dupont, UK) and used to design a PpΩ-specific primer to initiate cDNA synthesis as a substrate for the PCR. PCR products were cloned into the EcoRV site of pBluescript(KS) and sequenced. The sequences of the PCR products confirmed those derived from cDNA clones and were used to construct full-length clones. Further, the sequence of the extreme 5' end of the PpΩ message was confirmed by direct sequencing of poly(A)+RNA (10 µg) using a 32P-end-labelled 5' site-specific antisense primer (5'-GGACACACTCATATG) and reverse transcriptase (Boehringer Mannheim, UK).
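The reverse translation used to design mixed degenerate oligos can be illustrated with a short sketch. It collapses the codons of each aa into IUPAC ambiguity symbols, position by position, and reports how many distinct sequences the resulting mixed oligo contains. The peptide in the example is hypothetical and is not the PpΩ loop sequence; the function names are introduced here for illustration only.

```python
from math import prod

# Standard genetic code: one-letter amino acid -> all codons encoding it.
CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAT", "AAC"],
    "D": ["GAT", "GAC"],
    "C": ["TGT", "TGC"],
    "Q": ["CAA", "CAG"],
    "E": ["GAA", "GAG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"],
    "I": ["ATT", "ATC", "ATA"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"],
    "M": ["ATG"],
    "F": ["TTT", "TTC"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "W": ["TGG"],
    "Y": ["TAT", "TAC"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
}

# IUPAC nucleotide ambiguity codes: set of bases -> symbol.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
BASES_FOR = {symbol: bases for bases, symbol in IUPAC.items()}


def degenerate_codon(aa: str) -> str:
    """Collapse all codons for one amino acid into a single IUPAC triplet.

    For six-codon amino acids (L, R, S) the per-position collapse also
    covers a few codons that encode other residues.
    """
    codons = CODONS[aa]
    return "".join(IUPAC[frozenset(c[i] for c in codons)] for i in range(3))


def reverse_translate(peptide: str) -> str:
    """Return the degenerate sense-strand oligo for a peptide."""
    return "".join(degenerate_codon(aa) for aa in peptide.upper())


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences present in the mixed oligo."""
    return prod(len(BASES_FOR[symbol]) for symbol in oligo)


if __name__ == "__main__":
    peptide = "GNP"  # hypothetical tripeptide, for illustration only
    oligo = reverse_translate(peptide)
    print(oligo, degeneracy(oligo))
```

Lower degeneracy generally gives a more specific probe, which is one reason to pick a stretch rich in Met, Trp and two-codon residues when a choice exists.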
note predicts an aa sequence identical to that determined for PpIV (Ritonja et al., 1989). Southern blots using the PpΩ clones as probes indicate that there are between 20 and 25 members of the multigene family in C. papaya. However, even under stringent conditions the PpΩ signals may not be distinguishable from PpIV given their close homology.
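The comparison of two closely related coding sequences, counting nt changes and asking which of them alter the encoded aa, can be sketched as follows. This is a minimal illustration that assumes two in-frame coding sequences of equal length; the toy sequences are hypothetical and are not taken from the PpΩI or PpΩII cDNAs, and the function names are introduced here for the example.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; trailing partial codons are ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, usable, 3))


def compare_cds(a: str, b: str):
    """Return (nt changes, aa changes, percent nt identity) for two equal-length CDSs.

    Positions are 1-based. Each change is a tuple (position, residue in a, residue in b).
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (no gaps handled here)")
    nt_changes = [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    aa_changes = [
        (i + 1, x, y)
        for i, (x, y) in enumerate(zip(translate(a), translate(b)))
        if x != y
    ]
    identity = 100.0 * (len(a) - len(nt_changes)) / len(a)
    return nt_changes, aa_changes, identity


if __name__ == "__main__":
    # Hypothetical toy sequences: one silent change and one replacement change.
    seq_1 = "ATGGCTATGATACCT"
    seq_2 = "ATGGCAATGTTACCT"
    nt, aa, ident = compare_cds(seq_1, seq_2)
    print(nt, aa, round(ident, 1))
```

Separating silent from replacement changes in this way is what allows a statement such as "ten bp changes, three aa substitutions" to be made for a pair of clones.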
(b) Northern blots
Northern transfers of total RNA (10 µg) from root, shoot, stem, leaf and fruit tissues were probed with labelled PpΩ cDNA fragments (Fig. 2). Similar mRNA levels were detected in stem, leaf and fruit tissues. However, young shoots demonstrated reduced levels of mRNA, and in roots the message was barely detectable. These data are consistent with the presence of the enzyme activity in the developing laticifer system above ground.
(c) Synthesis of PpΩ in E. coli
The cDNA for pro-PpΩI (without the C-terminal extension) was expressed from the inducible T7 promoter of pET3a in BL21(DE3) cells in two forms differing by just 5 aa. Initially, a convenient NdeI site over a potential start Met four aa upstream from the putative pro-region was used to subclone the cDNA as an NdeI-BamHI fragment (the BamHI site is present in the polylinker region of pBluescript) into the T7-polymerase-based expression vector pET3a. Upon induction with IPTG, E. coli BL21(DE3) carrying this construct together with the T7-lysozyme-producing plasmid pLysS (Studier, 1991) either failed to produce a protein of the corresponding size, as determined on SDS-PAGE, or died. The signal peptide region of pre-pro-PpΩ was therefore removed cleanly from the clone by PCR mutagenesis to generate a new start codon before Asp27. Recombinant E. coli carrying pro-PpΩ alone demonstrated a prominent protein band on SDS-PAGE corresponding to the correct size for pro-PpΩ (41 kDa; Fig. 3). This band and a second, much less intense band of approximately 26 kDa were found to be recognised by a discriminating
Fig. 2. Northern transfers of total RNAs (10 µg) isolated from root, shoot, stem, leaf and fruit (lanes A-E, respectively). Equal RNA loadings and transfer were ensured by probing with a 28S rDNA probe in parallel, as shown in the corresponding lanes below. Northern blots were performed using total RNA transferred from 1% agarose/2.2 M formaldehyde gels to nylon membranes and probed with 32P-labelled cDNA or rDNA fragments. C. papaya was grown from seed under controlled greenhouse conditions (16 h of daylight, 30°C and 60-70% humidity). RNA was extracted from root, shoot, stem, young leaf and fruit tissues after they were frozen and ground in liquid nitrogen in the presence of guanidinium isothiocyanate (Glišin et al., 1974). Total RNA was pelleted after CsCl centrifugation. Poly(A)+mRNA was selected using an oligo(dT) column. The integrity of the RNA was checked on 1% agarose gels and by in vitro translation.

Fig. 3. PAGE of recombinant pro-PpΩI solubilised from inclusion bodies made in Escherichia coli as described by Taylor et al. (1992). Lanes: A, molecular weight markers (BRL pre-stained markers) in kDa (107, 69.3, 45.8, 28.7, 18.2 and 15.4); B, pro-PpΩI after solubilization in 6 M guanidine·HCl/10 mM DTT/100 mM Tris·acetate pH 6.8, the formation of mixed disulfides in the presence of 0.1 M oxidized glutathione and refolding by dilution 100-fold into 100 mM Tris·acetate pH 8.6/3 mM L-cysteine; C, mature PpΩ generated post-autocatalytic activation of pro-PpΩI at 60°C in the presence of excess L-cysteine (20 mM); D, native mature PpΩ purified from papaya latex. Proteins were analysed by 0.1% SDS-12.5% PAGE and stained with Coomassie brilliant blue R250.
monoclonal antibody (PAP8, European Collection of Animal Cell Cultures No. 86080702) raised against purified PpΩ (Goodenough et al., 1986). It was noted that the smaller of the immunoreactive bands on SDS-PAGE approximated in size to the mature enzyme; however, proteinase activity was not detected at this stage. The majority of this material proved to be insoluble and was therefore concentrated by centrifugation before solubilization in 6 M guanidine·HCl/10 mM DTT/100 mM Tris·acetate pH 6.8. This solution was then diluted to refold the pro-enzyme, which could then be activated at 60°C in the presence of excess cysteine to yield active mature PpΩ, as described for papain by Taylor et al. (1992). The application of a folding regime to a pro-enzyme produced in E. coli may be of general use to the study of the CyP and may circumvent the problems of generating multiple forms of inactive enzyme, as experienced by Vernet et al. (1989) and Cohen et al. (1990).

(d) Conclusions
(1) Like papain, the PpΩ cDNAs demonstrate that PpΩ is initially synthesized as a zymogen, presumably to protect the plant cells from proteolytic disruption before the active product is sequestered to a membrane-bound organelle. The pre-sequence can apparently be dispensed with, as we have shown that pro-PpΩ can be produced in E. coli and activated. The pro-region may therefore serve to prevent and/or enable at a later stage the final active conformation of the protein to be achieved. The additional 19 aa found at the C terminus encoded by the second cDNA (PpΩII) may equally represent another cleavage site in the formation of the final protein product. In fact, two cDNAs isolated for actinidin (EC 3.4.22.14), a related CyP from kiwifruit (Actinidia chinensis), demonstrated 25-aa C-terminal extensions that were equally not present in the mature protein sequence (Praekelt et al., 1988). There is no obvious homology between the PpΩII and the actinidin C-terminal extensions, but both carry consecutive Ser residues. C-terminal extensions have been observed in other potentially hazardous gene products, including the animal CyP, cathepsin B (EC 3.4.22.1), for which protein targeting functions have been suggested (San-Segundo et al., 1985; Chan et al., 1986). It is therefore possible that there is yet another level of complexity in the maturation of some of the plant CyP, dependent on their pre-programmed intracellular targeting to a membrane-bound organelle, the site for their sequestration.

(2) Using immunocytochemical techniques, Smith et al. (1987) were able to detect the presence of the CyP from C. papaya in the latex of the laticifer system in young leaves (2-3 months post-germination). Consistent with these observations we detected reduced mRNA levels in young shoots (2-3 weeks post-germination) but similar levels in mature stem, leaf and fruit tissues. This is in contrast to actinidin from A. chinensis, where immunoreactive enzyme and abundant mRNA levels could only be detected in the fruit (Praekelt et al., 1988).

(3) There are 26 aa substitutions in the pro-region of PpΩI as compared with papain, and many of them preserve the general characteristics of the protein, although a few charge replacements can be observed. In addition to the substitutions, an in-frame deletion of 2 aa can be observed at codon 112, a feature common to both PpΩ sequences. It will be interesting to swap the pro-regions of papain and PpΩ to ascertain whether they are capable of functional chimeric cis-complementation. It is possible that they may have functional differences, as each has been honed for the specific activation of its individual proteinase. In evolutionary terms the time span since the ancestral divergence of these genes within a single lineage may not be great. However, the prepro-regions may have been subjected to considerable selective pressure in response to changes within the active enzyme sequence and/or protein targeting requirements. Those genes whose products are hazardous to the organism when active at an inappropriate time or location can therefore only co-evolve with their respective prepro-regions or be lost. Equally, those genes whose products cannot activate at all will become defunct in time due to drift. Considering these possibilities, the papaya CyP may constitute an interesting example of a dynamic, evolving multigene family.
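Counting substitutions alongside an in-frame deletion, as in conclusion (3), requires aligning the two aa sequences with gaps. The sketch below is a minimal global alignment (Needleman-Wunsch) followed by a tally of substituted and gapped columns. The scoring values (match +1, mismatch -1, gap -2) and the toy sequences are assumptions chosen for illustration; they are not the papain or PpΩ pro-regions, and a real comparison would normally use a substitution matrix such as BLOSUM62 with affine gap penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def summarize(aligned_a: str, aligned_b: str):
    """Return (number of substituted columns, number of gapped columns)."""
    pairs = list(zip(aligned_a, aligned_b))
    substitutions = sum(1 for x, y in pairs if "-" not in (x, y) and x != y)
    gaps = sum(1 for x, y in pairs if "-" in (x, y))
    return substitutions, gaps


if __name__ == "__main__":
    # Hypothetical toy peptides: one deleted residue and one substitution.
    al_a, al_b = global_align("ACDEFGHIK", "ACDFGHIR")
    print(al_a)
    print(al_b)
    print(summarize(al_a, al_b))
```

With a linear gap penalty each deleted residue is counted as one gapped column, so a 2-aa in-frame deletion would appear as two gapped columns in the summary.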
REFERENCES
Brocklehurst, K., Willenbrock, F. and Salih, E.: The cysteine proteinases. In: Neuberger, A. and Brocklehurst, K. (Eds.), Hydrolytic Enzymes. Elsevier, Amsterdam, 1987, pp. 39-158.
Chan, S.J., San-Segundo, B., McCormick, M.B. and Steiner, D.F.: Nucleotide and predicted amino acid sequences of cloned human and mouse preprocathepsin B cDNAs. Proc. Natl. Acad. Sci. USA 83 (1986) 7721-7725.
Cohen, L.W., Coghlan, V.C. and Dihel, L.C.: Cloning and sequencing of papain-encoding cDNA. Gene 48 (1986) 219-227.
Cohen, L.W., Fluharty, C. and Dihel, L.C.: Synthesis of papain in Escherichia coli. Gene 88 (1990) 263-267.
Drenth, J., Jansonius, J.N., Koekoek, R., Swen, H.M. and Wolthers, B.G.: Structure of papain. Nature 218 (1968) 929-932.
Dubois, T., Kleinschmidt, T., Schnek, A.G., Looze, Y. and Braunitzer, G.: The thiol proteinases from the latex of Carica papaya L. II. The primary structure of proteinase Ω. Biol. Chem. Hoppe-Seyler 369 (1988) 741-754.
Glišin, G.N., Crkvenjakov, R. and Byus, C.: Ribonucleic acid isolated by cesium chloride centrifugation. Biochemistry 13 (1974) 2633-2638.
Goodenough, P.W., Kilshaw, P.J., McEwan, F. and Owen, A.J.: Monoclonal antibodies to the two most basic papaya proteinases. Biosci. Rep. 6 (1986) 759-766.
McKee, R.A., Adams, S., Matthews, J.A., Smith, C.J. and Smith, H.: Molecular cloning of two cysteine proteinases from paw-paw (Carica papaya). Biochem. J. 237 (1986) 105-110.
Pickersgill, R.W., Rizkallah, P., Harris, G.W. and Goodenough, P.W.: Determination of the structure of papaya proteinase Ω. Acta Cryst. B47 (1991) 766-771.
Pickersgill, R.W., Harris, G.W. and Garman, E.: Structure of monoclinic papain at 1.6 Å resolution. Acta Cryst. B48 (1992) 59-67.
Praekelt, U.M., McKee, R.A. and Smith, H.: Molecular analysis of actinidin, the cysteine proteinase of Actinidia chinensis. Plant Mol. Biol. 10 (1988) 193-202.
Ritonja, A., Buttle, D.J., Rawlings, N.D., Turk, V. and Barrett, A.J.: Papaya proteinase IV amino acid sequence. FEBS Lett. 258 (1989) 109-112.
San-Segundo, B., Chan, S.J. and Steiner, D.F.: Identification of cDNA clones encoding a precursor of rat liver cathepsin B. Proc. Natl. Acad. Sci. USA 82 (1985) 2320-2324.
Smith, H., McKee, R.A. and Praekelt, U.M.: The molecular biology of plant thiol proteinases. In: Andrews, A.T. (Ed.), Chemical Aspects of Food Enzymes, Special Publication 63. Royal Society of Chemistry, London, 1987, pp. 196-207.
Studier, F.W.: Use of bacteriophage T7 lysozyme to improve an inducible T7 expression system. J. Mol. Biol. 219 (1991) 37-44.
Sumner, I.G., Vaughan, A., Eisenthal, R., Pickersgill, R.W., Owen, A.J. and Goodenough, P.W.: Kinetic analysis of papaya proteinase Ω. Biochim. Biophys. Acta (1993) in press.
Taylor, M.A.J., Pratt, K.A., Revell, D.F., Baker, K.C., Sumner, I.G. and Goodenough, P.W.: Active papain renatured and processed from insoluble recombinant propapain expressed in Escherichia coli. Protein Eng. 5 (1992) 455-459.
Topham, M., Salih, E., Frazao, C., Kowlessur, D., Overington, J.P., Thomas, M., Brocklehurst, S.M., Patel, M., Thomas, E.W. and Brocklehurst, K.: Structure-function relationships in the cysteine proteinases actinidin, papain and papaya proteinase Ω. Biochem. J. 280 (1991) 79-92.
Vernet, T., Tessier, D.C., Laliberté, F., Dignard, D. and Thomas, D.Y.: The expression in Escherichia coli of a synthetic gene coding for the precursor of papain is prevented by its own putative signal sequence. Gene 77 (1989) 229-236.