Biochimica et Biophysica Acta 1491 (2000) 289–294 www.elsevier.com/locate/bba
Short sequence-paper
Mouse cathepsin M, a placenta-specific lysosomal cysteine protease related to cathepsins L and P
Katia Sol-Church, Jennifer Frenck, Robert W. Mason *
Laboratory of Enzymology, Department of Research, Alfred I. duPont Hospital for Children, P.O. Box 269, Wilmington, DE 19899, USA
Received 30 November 1999; received in revised form 21 January 2000; accepted 31 January 2000
Abstract
The complete nucleotide sequence of a novel cathepsin cDNA derived from mouse placenta was determined and is termed cathepsin M. The predicted protein of 333 amino acids is a member of the C1A family of proteases and is related to mouse cathepsins L and P. Mouse cathepsin M is highly expressed in placenta, whereas no detectable levels were found in lung, spleen, heart, brain, kidney, thymus, testicle, liver, or embryo. Phylogenic analyses of the sequences of human and mouse cathepsins show that cathepsin M is most closely related to cathepsins P and L. However, the differences are sufficiently large to indicate that the enzymes will be found in other species. This is in contrast to human cathepsins L and V, which probably resulted from a gene duplication after divergence of mammalian species. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: Cathepsin; Placenta; Specificity; Evolution
In the past few years it has become apparent that the family of mammalian proteases related to papain by amino acid sequence (called C1A) is much larger than originally envisioned and several of the proteases are tissue-specific. The major C1A family members discovered to date are cathepsins B, C, F, H, K, L, O, P, Q, S, V, W and X [1–18]. Cathepsins B, C, F, H, L, O, and X appear to be expressed in a variety of tissues, but expression of the other enzymes is limited. Thus cathepsin V is expressed in corneal epithelial cells and thymocytes [8,19], cathepsin S is primarily expressed in cells of lymphatic origin [20], cathepsin K is primarily expressed in osteoclasts [21], cathepsin W is expressed in lymphocytes [17] and cathepsins P and Q are expressed in placenta [15,18]. The limited tissue expression of these latter enzymes suggests that they might have specific physiological roles, including MHC class II protein processing, bone remodeling and embryonic development. Cysteine proteases in extra-embryonic tissues are proposed to support fetal growth and nutrition [22,23]. Inhibition of these enzymes can result in developmental abnormalities and growth retardation [24,25]. Although the inhibitors target the ubiquitously expressed enzymes, cathepsins B and L, other enzymes are likely to be the critical targets as gene knockout experiments have shown that cathepsins B and L are not essential for normal fetal growth [26–28]. In this study we report the identification of a new placental protease, that we have called cathepsin M, and show that it is related by sequence to cathepsins P and L.
* Corresponding author. Fax: +1-302-651-68888; E-mail: [email protected]
0167-4781/00/$ – see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0167-4781(00)00030-0
BBAEXP 91370 29-3-00
In order to identify novel proteases, the cDNA sequence of mouse cathepsin P [15] was used to search the mouse EST database of the NCBI using the blastn and blastx programs [29]. Sequences from three EST clones, accession numbers AI508247, AA833244, and AA024360, were identified that gave overlapping sequences, suggesting that they are derived from the same gene and encode the 5′, middle and 3′ portions of a novel cathepsin. Translation of the composite sequence and alignment with the amino acid sequence of cathepsin P predicted a deletion of bases that encode approximately 40 amino acids, including the catalytic asparagine and histidine residues. This analysis would suggest that the mRNAs encode a non-functional protease. We considered the possibility that at least one of the sequences obtained from the EST clones was in error and decided to use information from the EST sequences to design primers for PCR amplification and sequencing of cDNA derived from mouse placenta.
First strand cDNA synthesis of 15.5 day mouse placental RNA (a generous gift from Carlisle Landel, Wilmington, DE, USA) was performed in the presence of T7-oligo-dT using a cDNA cycle kit from Invitrogen (Carlsbad, CA, USA). Primers for PCR amplification and sequencing were designed initially using the EST sequences. The new sequence information obtained was used to design additional primers to fully characterize the coding sequence of this novel protease. The primers were custom-made by Life Technologies (Gaithersburg, MD, USA). PCR conditions were as follows: 2 µl of the 30 µl reverse transcription reaction were amplified in a 50 µl volume containing 5 µl of 10× PCR buffer, 3 µl of 10 mM dNTPs, 1 µM of each specific primer and 2.5 units of Taq polymerase (Qiagen, Valencia, CA, USA). PCR reactions were run in a Twin block thermocycler (Ericomp, San Diego, CA, USA) for 30 cycles (30 s at 94°C, 30 s at 55°C and 1 min at 72°C).
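As a quick arithmetic check on the PCR conditions above, the final concentrations in the 50 µl reaction can be worked out with C1·V1 = C2·V2. The sketch below is only an illustration: the variable and function names are hypothetical, the text does not say whether "10 mM dNTPs" refers to each nucleotide or to the total, and the cycling time ignores ramp times and any initial denaturation or final extension steps, which are not described.

```python
# Values taken from the PCR conditions in the text; names are illustrative only.
REACTION_VOLUME_UL = 50.0


def final_concentration(stock, volume_ul, total_ul=REACTION_VOLUME_UL):
    """Solve C1*V1 = C2*V2 for C2; the result has the units of the stock."""
    return stock * volume_ul / total_ul


buffer_fold = final_concentration(10.0, 5.0)  # 10x buffer stock, 5 ul added
dntp_mm = final_concentration(10.0, 3.0)      # 10 mM dNTP stock, 3 ul added
template_fraction = 2.0 / 30.0                # share of the RT reaction used per PCR

# 30 cycles of 30 s + 30 s + 60 s (hold times only, no ramping).
cycle_seconds = 30 + 30 + 60
total_minutes = 30 * cycle_seconds / 60

print(f"buffer: {buffer_fold:g}x, dNTPs: {dntp_mm:g} mM")
print(f"template: {template_fraction:.1%} of the reverse transcription reaction")
print(f"cycling hold time: {total_minutes:g} min")
```

The primer concentration needs no conversion because the text already gives it as a final concentration (1 µM of each primer).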
For sequencing, PCR products were visualized on ethidium bromide-stained agarose gels and purified using a Quick Step PCR purification kit (Edge Bio Systems, Gaithersburg, MD, USA). Sequencing was performed using Big Dye Terminator technology as recommended by the manufacturer (PE Applied Biosystems, Foster City, CA, USA). Sequences were analyzed using MacVector, Oxford Molecular Ltd's Sequence Analysis Software.
Fig. 1. Sequence of cathepsin M. Cathepsin M was sequenced as described in the text. Sites of sequencing primers are underlined with direction of sequencing shown above. Primers used in PCR amplification for tissue expression studies are double underlined. The predicted amino acid sequence is shown below the nucleotide sequence.
Using this strategy, we were able to obtain a full-length cDNA sequence that had an open reading frame of 999 bases that encoded a novel cathepsin. The sequence was submitted to GenBank (accession no. AF202528) and is displayed in Fig. 1. We have termed this newly identified gene product cathepsin M. The predicted amino acid sequence is closely related to that of papain, making it a member of the C1A family of cysteine proteases. The sequence of cathepsin M predicts a protein of 333 amino acids that is 57% identical to mouse cathepsin P and 55% identical to mouse cathepsin L (Fig. 2). One polymorphic site was detected at bp 949 (G to C). This substitution would change Cys272 to Ser.
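The numbers in the paragraph above can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions rather than a reanalysis of the sequence: it uses only the four codons of the standard genetic code that are needed, the function and variable names are hypothetical, and it assumes that the G to C change falls at the second position of a Cys codon, which is the only single G to C change that converts Cys to Ser. It does not try to map bp 949 onto the reading frame, because the numbering of Fig. 1 is not reproduced here.

```python
# Open reading frame length and protein length as stated in the text.
ORF_LENGTH_NT = 999
n_codons = ORF_LENGTH_NT // 3
assert ORF_LENGTH_NT % 3 == 0  # the ORF must be a whole number of codons

# Subset of the standard genetic code (DNA alphabet) needed for this check.
CODON_TO_AA = {"TGT": "Cys", "TGC": "Cys", "TCT": "Ser", "TCC": "Ser"}


def substitute(codon, position, base):
    """Return the codon with the base at a 0-based position replaced."""
    return codon[:position] + base + codon[position + 1:]


# A G-to-C change at the middle position turns either Cys codon into a Ser codon.
for cys_codon in ("TGT", "TGC"):
    mutant = substitute(cys_codon, 1, "C")
    print(f"{cys_codon} ({CODON_TO_AA[cys_codon]}) -> {mutant} ({CODON_TO_AA[mutant]})")

# Position of codon 272 within the ORF, counting the first coding base as 1.
codon_272_start = 3 * (272 - 1) + 1
codon_272_middle = codon_272_start + 1
print(f"{n_codons} codons; codon 272 spans ORF bases "
      f"{codon_272_start}-{codon_272_start + 2}")
```

Because 999 / 3 = 333 exactly, the stated reading frame length corresponds to the 333 coding codons without a stop codon.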
Fig. 2. Comparison of the amino acid sequence of cathepsin M with other human and mouse cathepsins. The predicted amino acid sequence of mouse cathepsin M (mouse cat M) was aligned with those of mouse cathepsins P, L, S, H and K and human cathepsins L, V, S, H and K. Amino acids that are conserved in all of the different cathepsins are displayed. Residue numbers refer to mouse cathepsin M. Active site cysteine, histidine and asparagine are boxed and potential glycosylation sites are underlined. Residues of the pro-peptides that interact with the specificity sites of the mature enzyme are in a box. Database accession numbers of the proteins are shown in the legend to Fig. 3.
The predicted amino acid sequence of cathepsin M was aligned with both mouse and human cathepsins B, C, F, H, K, L, P, S and X and human cathepsin V using the MacVector CLUSTALW program. A phenogram was constructed using the PHYLIP package available on the server of The Internet Bioinformatics Group of the Internet Research and Development Unit of the National University of Singapore
(http://sdmc.krdl.org.sg:8080/~lxzhang/phylip/). Distance computation was performed using the PAM protein sequence program and the tree generated using the UPGMA tree construction program (Fig. 3). This analysis shows that cathepsin M is in a subfamily of cysteine proteases related to cathepsin L and is a little more distantly related to a second sub-family that includes cathepsins S and K. The
aligned sequences of the more closely related enzymes are shown in Fig. 2. An alignment of all of the human sequences is available elsewhere [3]. The differences in sequence between mouse cathepsins M, P and L are larger than would be expected for sequences that diverged since mammalian speciation; thus it is likely that mRNAs for these enzymes will be expressed in other mammals. This contrasts with the close similarity between human cathepsins L and V, indicating that the thymus-specific enzyme, cathepsin V, may be restricted to the primates or humans.
Fig. 3. Evolutionary relationship between mouse and human cathepsins. All of the currently known full-length sequences of mouse C1A cysteine proteases and their human counterparts were aligned using CLUSTALW and analyzed using the PHYLIP phylogenic analysis programs. Protein sequences were extracted using Entrez at the NCBI. Human cathepsins B, C, F, H, K, L, S, V and X have protein accession numbers P07858, P53634, NP_003784, P09668, P43235, P07711, P25774, BAA25909, and NP_001327, respectively. Mouse cathepsins B, C, F, H, K, L, P, S and X have numbers P10605, P97821, CAB42884, CAA77182, CAA64218, P06797, AAD41898, AAB94925, and CAB44494, respectively.
Cathepsin M is predicted to have a signal peptide of 20 amino acids, a pro-peptide of 93 amino acids and a mature protein of 220 amino acids. The essential active site residues Cys138, His276 and Asn300 are conserved, as are many structurally important residues (Fig. 2). There are three potential N-glycosylation sites in the mature portion of the protein. All three sites are found at the corresponding sites in cathepsin P and two are found at the corresponding sites in cathepsin L, indicating that the protein is probably post-translationally glycosylated for targeting to the endosomal-lysosomal compartment [30].
There are a number of unique features of cathepsin M that distinguish it from cathepsin L. In human cathepsin L, amino acids Met92-Asn93-Gly94-Phe95-Gln96 of the pro-peptide bind the enzyme specificity sites S2′–S3 [31]. In human cathepsin K the corresponding amino acids (Met91-Thr92-Gly93-Leu94-Lys95) are also bound in these sites [32]. This binding effectively blocks the active site of the enzymes. The Phe side-chain that fits into S2 of cathepsin L and Leu that fits in S2 of cathepsin K are consistent with the known specificities of these enzymes [33,34]. The glycine residue in S1 enables the hydrophobic residues to penetrate deep into the specificity pockets of cathepsins L and K. In cathepsin M, the amino acid corresponding to this glycine is Glu94, suggesting quite a different interaction with the mature enzyme. Cathepsin P also has a negatively charged amino acid in this part of the propeptide, and both enzymes have a positively charged residue in the S′ binding sites (Arg134 and Lys134 in cathepsins M and P, respectively). These enzymes may therefore prefer negatively charged residues in the S1–S1′ sites or may be carboxypeptidases. Several other residues found in the active-site cleft are also unique to cathepsin M, including Asn136, Thr275, Ser277 and Met302 (in the other enzymes these are Gly or Ala; Asn, Asp or Gly; Ala, Gly, Ile or Val; and Trp, respectively). Although the full significance of these unique features is not yet clear, it is likely that the substrate specificity of cathepsin M will be different to that of cathepsin L and the other enzymes. A fuller interpretation of these observations awaits determination of the crystal structure of cathepsin M.
Fig. 4. Expression of cathepsin M mRNA. RT-PCR was performed using specific primers to mouse cathepsin M (top panel), mouse cathepsin L (middle panel) and mouse actin (bottom panel). Primers used for amplification of cathepsin M are shown in Fig. 1. 5′-TGTGACTCCTGTGAAGAACC-3′ and 5′-CATACCCCATTCACTTCCC-3′ were used for amplification of cathepsin L, and 5′-TGTATGCCTCTGGTCGTACCAC-3′ and 5′-ACAGAGTACTTGCGCTCAGGAG-3′ were used for amplification of actin. The PCR products were separated by gel electrophoresis on 2% agarose gels. The first lane in each panel contains a 100 bp ladder marker (M). Lanes 1–10 contain RT-PCR products of lung, spleen, heart, brain, kidney, thymus, testicle, liver, embryo, and placenta, respectively. The gene-specific products are as indicated.
For analysis of expression of cathepsin M in tissues, cDNAs were synthesized from total RNA isolated from mouse tissues that were purchased from Ambion (Austin, TX, USA). Primers shown in Fig. 1 were used to amplify a 604 bp fragment of cathepsin M in mouse tissues (Fig. 4, top panel). For the internal control (actin) each of the cDNAs was diluted 10-fold prior to amplification. Cathepsin M could only be detected in placenta and not in mouse embryo or a range of adult tissues. This expression pattern is similar to that of cathepsins P and Q [15,18]. By contrast, cathepsin L was found to be expressed in all tissues examined (Fig. 4, middle panel).
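The cathepsin L and actin primer pairs listed in the legend to Fig. 4 can be summarized by length, GC content and a rough melting temperature. The sketch below is illustrative only. It uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a rule of thumb for short oligonucleotides and is not stated to be the method the authors used to choose the 55°C annealing step. The dictionary keys and function names are hypothetical labels, and the order of primers within each pair simply follows the legend.

```python
# Primer sequences as printed in the legend to Fig. 4 (5' to 3').
PRIMERS = {
    "cathepsin L, primer 1": "TGTGACTCCTGTGAAGAACC",
    "cathepsin L, primer 2": "CATACCCCATTCACTTCCC",
    "actin, primer 1": "TGTATGCCTCTGGTCGTACCAC",
    "actin, primer 2": "ACAGAGTACTTGCGCTCAGGAG",
}


def gc_count(seq):
    """Number of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq.upper())


def wallace_tm(seq):
    """Wallace rule estimate in degrees C: 2 per A/T plus 4 per G/C."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    gc_percent = 100 * gc_count(seq) / len(seq)
    print(f"{name}: {len(seq)} nt, {gc_percent:.0f}% GC, "
          f"Wallace Tm ~ {wallace_tm(seq)} C")
```

Estimates from this rule tend to run high for primers of about 20 nt, so they are best read as a relative comparison between primers rather than as absolute annealing temperatures.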
It is interesting to note that human cathepsin V, which is even more closely related in sequence to human cathepsin L than it is to mouse cathepsin L, also has a restricted tissue expression pattern, but in this case is primarily expressed in thymocytes and corneal epithelial cells [8,19].
In conclusion, a novel cysteine protease has been identified that is primarily expressed in placenta. Whilst the structure of the protein is closely related to that of the more ubiquitously expressed enzyme, cathepsin L, unique features of its sequence suggest that it will have a different proteolytic function.
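The phenogram of Fig. 3 was built by UPGMA clustering of pairwise distances. As a rough illustration of that clustering step, the sketch below runs a minimal UPGMA on a toy distance table. It is not the PHYLIP implementation, and it swaps in 1 minus fractional identity as a stand-in for the PAM distances used in the paper. Only the M–P (0.43) and M–L (0.45) values follow from the identities reported in the text; every other distance, and all function and variable names, are hypothetical and chosen only to make the example run.

```python
from itertools import combinations

LABELS = ["M", "P", "L", "S"]
DISTANCES = {
    ("M", "P"): 0.43,  # 1 - 0.57 identity (from the text)
    ("M", "L"): 0.45,  # 1 - 0.55 identity (from the text)
    ("P", "L"): 0.50,  # hypothetical
    ("M", "S"): 0.60,  # hypothetical
    ("P", "S"): 0.62,  # hypothetical
    ("L", "S"): 0.58,  # hypothetical
}


def upgma(labels, distances):
    """Return (nested-tuple tree, list of (cluster_a, cluster_b, height))."""
    clusters = {name: (name, 1) for name in labels}  # key -> (subtree, leaf count)
    d = {frozenset(pair): value for pair, value in distances.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # Distance to each remaining cluster is the leaf-count-weighted average.
        for other in clusters:
            d[frozenset((merged, other))] = (
                n_a * d[frozenset((a, other))] + n_b * d[frozenset((b, other))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
        merges.append((a, b, height))
    (tree, _), = clusters.values()
    return tree, merges


tree, merges = upgma(LABELS, DISTANCES)
print(tree)
for a, b, height in merges:
    print(f"join {a} + {b} at height {height:.4f}")
```

With these toy values M and P join first and L joins them next, which mirrors the grouping described in the text, but the branch heights carry no biological meaning because most of the inputs are invented.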
References
[1] S.J. Chan, B. San Segundo, M.B. McCormick, D.F. Steiner, Proc. Natl. Acad. Sci. USA 83 (1986) 7721–7725.
[2] A. Paris, B. Strukelj, J. Pungercar, M. Renko, I. Dolenc, V. Turk, FEBS Lett. 369 (1995) 326–330.
[3] I. Santamaria, G. Velasco, A.M. Pendas, A. Paz, C. Lopez-Otin, J. Biol. Chem. 274 (1999) 13800–13809.
[4] R. Fuchs, H.G. Gassen, Nucleic Acids Res. 17 (1989) 9471.
[5] G.P. Shi, H.A. Chapman, S.M. Bhairi, C. DeLeeuw, V.Y. Reddy, S.J. Weiss, FEBS Lett. 357 (1995) 129–134.
[6] S. Gal, M.M. Gottesman, Biochem. J. 253 (1988) 303–306.
[7] G.P. Shi, J.S. Munger, J.P. Meara, D.H. Rich, H.A. Chapman, J. Biol. Chem. 267 (1992) 7258–7262.
[8] W. Adachi, S. Kawamoto, I. Ohno, K. Nishida, S. Kinoshita, K. Matsubara, K. Okubo, Invest. Ophthalmol. Vis. Sci. 39 (1998) 1789–1796.
[9] I. Santamaria, G. Velasco, A.M. Pendas, A. Fueyo, C. Lopez-Otin, J. Biol. Chem. 273 (1998) 16816–16823.
[10] C.T.N. Pham, R.J. Armstrong, D.B. Zimonjic, N.C. Popescu, D.G. Payan, T.J. Ley, J. Biol. Chem. 272 (1997) 10695–10703.
[11] D.K. Nagler, T. Sulea, R. Menard, Biochem. Biophys. Res. Commun. 257 (1999) 313–318.
[12] M. Soderstrom, H. Salminen, V. Glumoff, H. Kirschke, H. Aro, E. Vuorio, Biochim. Biophys. Acta 1446 (1999) 35–46.
[13] J. Rantakokko, H.T. Aro, M. Savontaus, E. Vuorio, FEBS Lett. 393 (1996) 307–313.
[14] D.A. Portnoy, A.H. Erickson, J. Kochan, J.V. Ravetch, J.C. Unkeless, J. Biol. Chem. 261 (1986) 14697–14703.
[15] K. Sol-Church, J. Frenck, D. Troeber, R.W. Mason, Biochem. J. 343 (1999) 307–309.
[16] I. Santamaria, G. Velasco, A.M. Pendas, A. Fueyo, C. Lopez-Otin, J. Biol. Chem. 273 (1998) 16816–16823.
[17] C. Linnevers, S.P. Smeekens, D. Bromme, FEBS Lett. 405 (1997) 253–259.
[18] K. Sol-Church, J. Frenck, R.W. Mason, Biochem. Biophys. Res. Commun. 267 (2000) 791–795.
[19] I. Santamaria, G. Velasco, M. Cazorla, A. Fueyo, E. Campo, C. Lopez-Otin, Cancer Res. 58 (1998) 1624–1630.
[20] S. Petanceska, L. Devi, J. Biol. Chem. 267 (1992) 26038–26043.
[21] T. Inaoka, G. Bilbe, O. Ishibashi, K. Tezuka, M. Kumegawa, T. Kokubo, Biochem. Biophys. Res. Commun. 206 (1995) 89–96.
[22] D.A. Beckman, J.E. Pugarelli, M. Jensen, T.R. Koszalka, R.L. Brent, J.B. Lloyd, Placenta 11 (1990) 109–121.
[23] D.A. Beckman, R.L. Brent, J.B. Lloyd, Placenta 18 (1997) 79–82.
[24] S.J. Freeman, J.B. Lloyd, J. Embryol. Exp. Morphol. 78 (1983) 183–193.
[25] J.L. Ambroso, C. Harris, Teratology 50 (1994) 214–228.
[26] K. Sol-Church, J. Shipley, D.A. Beckman, R.W. Mason, Arch. Biochem. Biophys. 372 (1999) 375–381.
[27] J. Deussing, W. Roth, P. Saftig, C. Peters, H.L. Ploegh, J.A. Villadangos, Proc. Natl. Acad. Sci. USA 95 (1998) 4516–4521.
[28] T. Nakagawa, W. Roth, P. Wong, A. Nelson, A. Farr, J. Deussing, J.A. Villadangos, H. Ploegh, C. Peters, A.Y. Rudensky, Science 280 (1998) 450–453.
[29] S.F. Altschul, T.L. Madden, A.A. Schäffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman, Nucleic Acids Res. 25 (1997) 3389–3402.
[30] S. Kornfeld, I. Mellman, Annu. Rev. Cell Biol. 5 (1989) 483–525.
[31] R. Coulombe, P. Grochulski, J. Sivaraman, R. Menard, J.S. Mort, M. Cygler, EMBO J. 15 (1996) 5492–5503.
[32] J.M. LaLonde, B.G. Zhao, C.A. Janson, K.J. D'Alessio, M.S. McQueney, M.J. Orsini, C.M. Debouck, W.W. Smith, Biochemistry 38 (1999) 862–869.
[33] R.W. Mason, G.D.J. Green, A.J. Barrett, Biochem. J. 226 (1985) 233–241.
[34] D. Bromme, A. Steinert, S. Friebe, S. Fittkau, B. Wiederanders, H. Kirschke, Biochem. J. 264 (1989) 475–481.