Gene, 149 (1994) 363-366 © 1994 Elsevier Science B.V. All rights reserved. 0378-1119/94/$07.00
GENE 08213
Cloning and sequence analysis of the gene encoding human lymphocyte prolyl endopeptidase (Prolyl oligopeptidase family; cytoplasmic serine endoprotease; primary structure; catalytic triad)
G. Vanhoofᵃ, F. Goossensᵃ, L. Hendriksᵇ, I. De Meesterᵃ, D. Hendriksᵃ, G. Vriendᶜ, C. Van Broeckhovenᵇ and S. Scharpéᵃ
ᵃLaboratory of Medical Biochemistry, Department of Pharmacy, University of Antwerp, Universiteitsplein 1, B-2610 Wilrijk, Belgium; ᵇLaboratory of Neurogenetics, Department of Biochemistry, University of Antwerp, Universiteitsplein 1, B-2610 Wilrijk, Belgium. Tel. (32-3) 820-2601; and ᶜDepartment of Biocomputing, EMBL, Meyerhofstrasse 1, D-6900 Heidelberg, Germany. Tel. (49-6221) 387-473
Received by J.K.C. Knowles: 8 November 1993; Revised/Accepted: 23 March/24 March 1994; Received at publishers: 27 June 1994
SUMMARY
The human cDNA encoding prolyl endopeptidase, a cytoplasmic endoprotease which hydrolyses the peptide bond at the C-terminal side of proline, was sequenced. After the isolation of the 3'-terminal fragment of the pep cDNA sequence from a human lymphocyte cDNA library, an approach based on the polymerase chain reaction (PCR) was undertaken to obtain the complete pep cDNA. Overlapping DNA fragments were generated by PCR from cDNA synthesized from human lymphocyte mRNA. The DNA fragments were subcloned and sequenced. The complete cDNA is 2562 nucleotides (nt) in length and contains an open reading frame coding for a protein of 710 amino acids (aa). Comparison of the primary PEP sequences from human lymphocyte and pig brain shows 97% identity. The aa sequence analysis shows homology with bacterial PEPs and with protease II from Escherichia coli. Asp641 probably participates in the active site of PEP.
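The two lengths quoted in the summary can be cross-checked with simple codon arithmetic. The sketch below assumes that the ORF consists of the 710 sense codons plus one stop codon and that everything else in the 2562-nt cDNA is non-coding; the function and variable names are illustrative and do not come from the paper.

```python
def orf_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Length in nt of an ORF that encodes n_residues amino acids."""
    codons = n_residues + (1 if include_stop else 0)
    return 3 * codons


cdna_nt = 2562     # complete cDNA length reported in the summary
protein_aa = 710   # length of the encoded protein reported in the summary

orf_nt = orf_length_nt(protein_aa)   # 3 * (710 + 1)
noncoding_nt = cdna_nt - orf_nt      # remainder outside the ORF

print(orf_nt, noncoding_nt)          # prints: 2133 429
```

Under these assumptions the result agrees with the 2133-nt ORF and 429-nt 3' UTR reported in section (a) of the paper.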
INTRODUCTION
Correspondence to: Dr. G. Vanhoof, Laboratory of Medical Biochemistry, Department of Pharmacy, University of Antwerp, Universiteitsplein 1, S-6, B-2610 Wilrijk, Belgium. Tel. (32-3) 820-2727; Fax (32-3) 820-2745; e-mail: [email protected]
Abbreviations: aa, amino acid(s); bp, base pair(s); kb, kilobase or 1000 bp; nt, nucleotide(s); ORF, open reading frame; PCR, polymerase chain reaction; pep, gene (DNA) encoding PEP; PEP, prolyl endopeptidase; PHA, phytohaemagglutinin; UTR, untranslated nt sequence.
SSDI 0378-1119(94)00424-4
Prolyl endopeptidase (PEP; EC 3.4.21.26) is a cytoplasmic serine protease that cleaves peptide bonds at the C-terminal side of Pro residues. The protease was first isolated as an enzyme which cleaves the pregnancy hormone oxytocin (Walter et al., 1971). In contrast to all other Pro-specific enzymes, PEP activity is confined to oligopeptides of < 10 kDa (Moriyama et al., 1988) and it has an absolute requirement for the trans configuration of the peptide bond preceding Pro (Lin et al., 1983). Although its physiological role is unknown, specific inhibitors of this enzyme have been shown to prevent scopolamine-induced amnesia (Yoshimoto et al., 1987). Recently the cDNA sequences of Flavobacterium meningosepticum (Yoshimoto et al., 1991), Aeromonas hydrophila (Kanatani et al., 1993) and pig brain PEP were determined, and the active site Ser residue was identified (Rennex et al., 1991; Chevallier et al., 1992). The linear arrangement of the active site residues (Ser-Asp-His) of dipeptidyl peptidase IV (DPP IV; EC 3.4.14.5), a Pro-specific enzyme with a protease domain which aligns with prolyl endopeptidase, has recently been established, and it differs from those of the trypsin (His-Asp-Ser) and subtilisin (Asp-His-Ser) families of serine proteases. It was our aim to obtain the aa sequence of human prolyl endopeptidase, since probes derived from this sequence can be valuable tools for in situ hybridization studies.
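The three linear active-site arrangements contrasted in the Introduction can be made concrete by sorting the catalytic residues by their sequence positions. This is only an illustrative sketch: the DPP IV positions are those named in the title of David et al. (1993), whereas the chymotrypsin and subtilisin positions are standard textbook numberings assumed here and are not taken from this paper. The helper `triad_order` is a hypothetical name.

```python
def triad_order(positions: dict[str, int]) -> str:
    """Return the catalytic residues ordered from N- to C-terminus."""
    ordered = sorted(positions, key=positions.get)
    return "-".join(ordered)


# Mouse DPP IV, positions from the title of David et al. (1993).
dpp4 = {"Ser": 624, "Asp": 702, "His": 734}
# Assumption: chymotrypsinogen numbering, used for the trypsin family.
chymotrypsin = {"His": 57, "Asp": 102, "Ser": 195}
# Assumption: subtilisin BPN' numbering.
subtilisin = {"Asp": 32, "His": 64, "Ser": 221}

print(triad_order(dpp4))          # Ser-Asp-His
print(triad_order(chymotrypsin))  # His-Asp-Ser
print(triad_order(subtilisin))    # Asp-His-Ser
```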
[Fig. 1: nt sequence listing of the human pep cDNA, numbered from nt -4 to nt 2562.]
Fig. 1. The nt sequence of the cDNA for human prolyl endopeptidase. Start and stop codons are boxed. The putative polyadenylation signals, aataaa, are underlined. The sequence has been deposited in the GenBank/EMBL Data Library under accession No. X74496.
Methods: Primer 5'-GTGCCTGAACATCGACTGGATTCCG, corresponding to nt 2301-2325 of pig pep (Rennex et al., 1991), was end-labeled with [γ-³²P]ATP and used to screen 10⁶ recombinants of a human PHA-stimulated lymphocyte cDNA library constructed in λgt11 (Clontech, Palo Alto, CA, USA). After plaque purification and two additional rounds of screening, one positive clone was isolated. PCR was used to determine the insert length of the isolated clone using λgt11 forward and reverse primers. The PCR fragment was isolated after gel electrophoresis and subcloned into the SmaI site of the pUC18 vector (Pharmacia Sure Clone Ligation kit, Pharmacia, Uppsala, Sweden). The nt sequence was determined by the dideoxy sequencing method using the Sequenase Version 2.0 kit (US Biochemical, Cleveland, OH, USA). Sequencing was done in both directions. The clone isolated from the cDNA library corresponds to nt 1799-2562 of the above sequence. To obtain the remaining cDNA sequence, poly(A)+ RNA was prepared from PHA-stimulated lymphocytes by the guanidinium isothiocyanate method using the Quick prep mRNA purification kit (Pharmacia, Uppsala, Sweden). First-strand total cDNA was synthesized using a combination of oligo(dT) primers and random hexamers (Superscript Preamplification system, Gibco BRL, Gaithersburg, MD, USA). Specific cDNA was enriched by PCR using the forward primers 5'-AGCCATGCTGTCCTTCCAGT (nt 192-211 of the pig pep cDNA; Rennex et al., 1991), 5'-TACAAAGAGAGAATGACTGAACTAT (nt 382-406 of the pig pep cDNA) and 5'-ATTCTCCCAACTACCGCCTG (nt 1115-1134 of the pig pep cDNA) and the reverse primers 5'-CAGGCGGTAGTTGGGAGAAT (nt 1115-1134 of the pig pep cDNA) and 5'XTGTTTATGGAATCCAGTCG (nt 2117-2137 of the human pep cDNA). The thermocycler was programmed as follows: a denaturation step of 5 min at 94°C; 35 cycles of 94°C, 30 s; 55°C, 90 s; 72°C, 90 s; and 1 cycle of 72°C, 2 min. Subcloning and sequencing were performed as above; apart from the first amplification of the PCR product, primers based on the pep cDNA sequence were used.
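The legend lists a forward and a reverse primer that both map to nt 1115-1134 of the pig pep cDNA, which implies that the two oligonucleotides should be reverse complements of each other. A minimal sketch of that check is given below; the primer sequences are copied from the legend, while the helper name `reverse_complement` and the variable names are illustrative.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primers from the Fig. 1 legend, both assigned to pig pep nt 1115-1134.
forward_1115 = "ATTCTCCCAACTACCGCCTG"
reverse_1115 = "CAGGCGGTAGTTGGGAGAAT"

print(len(forward_1115), len(reverse_1115))              # 20 20
print(reverse_complement(forward_1115) == reverse_1115)  # True
```

Both primers are 20 nt long, matching the 20-nt span 1115-1134, and the reverse primer is the exact reverse complement of the forward one.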
[Fig. 2: alignment panel with rows H, P, F, A and C.]
Fig. 2. Multiple sequence alignment of PEPs from human lymphocytes (H), porcine brain (P), Flavobacterium meningosepticum (F), Aeromonas hydrophila (A) and protease II from Escherichia coli (C). Sequences were extracted from the Swissprot Protein Sequence Database and aligned using the Pileup multiple sequence alignment software included in the Genetics Computer Group (Madison, WI, USA) package (Devereux et al., 1984). Residues identical in at least four proteins are boxed. Arrowheads indicate the consensus sequence of serine proteases, the active site His and the most plausible candidate for the active site Asp.
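The boxing rule in the legend (residues identical in at least four of the five proteins) can be expressed as a column-wise count over the alignment. The sketch below uses a short made-up alignment, not the real Fig. 2 data, and assumes that gaps are written as "."; the function name `conserved_columns` is hypothetical.

```python
from collections import Counter


def conserved_columns(aligned: list[str], min_count: int = 4, gap: str = ".") -> list[int]:
    """Return 0-based columns where one residue occurs in >= min_count rows."""
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    hits = []
    for col, residues in enumerate(zip(*aligned)):
        # Gaps never count towards identity.
        counts = Counter(r for r in residues if r != gap)
        if counts and counts.most_common(1)[0][1] >= min_count:
            hits.append(col)
    return hits


# Toy alignment (hypothetical data), rows in the order H, P, F, A, C.
toy = [
    "GGSNGL",
    "GGSNGL",
    "GASAGI",
    "GGSAGV",
    "GG.NAL",
]

print(conserved_columns(toy))  # [0, 1, 2, 4]
```

In the toy data, columns 3 and 5 are not reported because their most common residue occurs in only three rows.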
EXPERIMENTAL AND DISCUSSION
(a) Isolation and sequencing of the cDNA clones
One positive cDNA fragment comprising 335 nt of the 3' end of the pep sequence flanked by 429 nt of the 3' UTR was isolated from a human PHA-stimulated lymphocyte cDNA library as described in the legend to Fig. 1. Total mRNA was isolated from human PHA-stimulated lymphocytes and first-strand total cDNA was synthesized using a combination of oligo(dT) primers and random hexamers. Specific cDNA was enriched by PCR using the primers described in the legend to Fig. 1. Three overlapping fragments were obtained: one of 943 nt ranging from nt -4 to 939, one of 1218 nt ranging from nt 920 to 2137 and one of 1951 nt ranging from nt 187 to 2137. The combined sequence of the different cDNA clones provided a 2562-nt cDNA which contained a 2133-nt ORF flanked by a 429-nt 3' UTR. Putative polyadenylation signals (AATAAA) were identified at nt 2490-2495 and 2537-2542 of the human nt sequence (Fig. 1).
(b) Sequence analysis
The ORF encodes a 710-aa protein (80 745 Da) which shows 97% identity to the porcine brain PEP sequence. Sequence analysis shows homology with PEPs of bacterial origin and with protease II from E. coli (Fig. 2). The sequences flanking the active site Ser554 and His680 are conserved. From experiments with mutant PEP from F. meningosepticum (Kanatani et al., 1993), Asp”’ and Asp641 have been put forward as the two plausible candidates for the active site residue. Moreover, the Asp involved in the active site of DPP IV has been identified (David et al., 1993). Based on homology at the aa level we postulate Asp641 as the third member of the catalytic triad of PEP. PEP is classified in a new group of serine proteases which is called the prolyl oligopeptidase family (Barrett and Rawlings, 1992). Other members of this family include DPP IV and the acylaminoacyl peptidases.
Our findings give further support to the hypothesis that this novel class of peptidases exhibits a Ser-Asp-His linear sequence arrangement of the active site residues, which is distinct from those of the classical serine proteases.
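The fragment coordinates reported in section (a) can be checked for internal consistency. The sketch below assumes that the nt numbering has no position 0 (nt -1 is followed directly by nt 1), which is the convention under which the reported 943-nt length of the first fragment matches its coordinates. The helper names are illustrative, and the coordinates are those given in the text and in the legend to Fig. 1.

```python
def span_length(start: int, end: int) -> int:
    """Inclusive length of a span when the numbering has no position 0."""
    if start == 0 or end == 0 or start > end:
        raise ValueError("invalid span")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # position 0 does not exist, so it is not counted
    return length


def covers_without_gaps(spans: list[tuple[int, int]]) -> bool:
    """True if the spans, taken together, leave no position uncovered."""
    ordered = sorted(spans)
    covered_end = ordered[0][1]
    for start, end in ordered[1:]:
        next_needed = covered_end + 1
        if next_needed == 0:
            next_needed = 1  # skip the non-existent position 0
        if start > next_needed:
            return False
        covered_end = max(covered_end, end)
    return True


fragments = {
    "PCR fragment 1": (-4, 939),
    "PCR fragment 2": (920, 2137),
    "PCR fragment 3": (187, 2137),
    "library clone": (1799, 2562),
}

for name, (start, end) in fragments.items():
    print(name, span_length(start, end))   # 943, 1218, 1951, 764
print(covers_without_gaps(list(fragments.values())))  # True
```

The computed lengths reproduce the reported fragment sizes, the 764-nt library clone equals the 335 nt of coding sequence plus the 429-nt 3' UTR, and the four spans tile nt -4 to 2562 without a gap.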
ACKNOWLEDGEMENTS
This work was supported by grants VLAB/074b of the Flemish Biotechnology Program and by IUAP-II of the Belgian Science Policy Programme. C.V.B. is a research associate and I.D.M. a senior research assistant of the Belgian National Fund for Scientific Research. We are grateful to Dr E. Bosmans, M. Cruts, N. Lamoen and Y. Sim for valuable discussions and technical support.
REFERENCES
Barrett, A.J. and Rawlings, N.D.: Endopeptidases, and the emergence of the prolyl oligopeptidase family. Biol. Chem. Hoppe-Seyler 373 (1992) 353-360.
Chevallier, S., Pierrette, G., Thibault, P., Banville, D. and Gagnon, J.: Characterization of a prolyl endopeptidase from Flavobacterium meningosepticum. Complete sequence and localization of the active site serine. J. Biol. Chem. 267 (1992) 8192-8199.
David, F., Bernard, A., Pierres, M. and Marguet, D.: Identification of serine 624, aspartic acid 702, and histidine 734 as the catalytic triad residues of mouse dipeptidyl-peptidase IV (CD26). J. Biol. Chem. 268 (1993) 17247-17252.
Devereux, J., Haeberli, P. and Smithies, O.: A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12 (1984) 387-395.
Kanatani, A., Yoshimoto, T., Kitazono, A., Kokubo, T. and Tsuru, D.: Prolyl endopeptidase from Aeromonas hydrophila: cloning, sequencing, and expression of the enzyme gene, and characterization of the expressed enzyme. J. Biochem. 113 (1993) 790-796.
Lin, L. and Brandts, J.: Evidence showing that a proline-specific endopeptidase has an absolute requirement for a trans peptide bond immediately preceding the active bond. Biochemistry 22 (1983) 4480-4485.
Moriyama, A., Nakanishi, M., Takenaka, O. and Sasaki, M.: Porcine muscle prolyl endopeptidase: limited proteolysis of tryptic peptides from hemoglobin β-chains at prolyl and alanyl bonds. Biochim. Biophys. Acta 956 (1988) 151-155.
Rennex, D., Hemmings, B.A., Hofsteenge, J. and Stone, S.R.: cDNA cloning of porcine brain prolyl endopeptidase and identification of the active-site seryl residue. Biochemistry 30 (1991) 2195-2203.
Walter, R., Shlank, H., Glass, J.D., Schwartz, I.L. and Kerenyi, T.D.: Leucylglycinamide release from oxytocin by human uterine enzyme. Science 173 (1971) 827-829.
Yoshimoto, T., Kado, K., Matsubara, F., Koriyama, N., Kaneto, H. and Tsuru, D.: Specific inhibitors for prolyl endopeptidase and their anti-amnesic effects. J. Pharmacobio-Dyn. 10 (1987) 730-735.
Yoshimoto, T., Kanatani, A., Shimoda, T., Inaoka, T., Kokubo, T. and Tsuru, D.: Prolyl endopeptidase from Flavobacterium meningosepticum: cloning and sequencing of the enzyme gene. J. Biochem. 110 (1991) 873-878.