Structure and Chromosomal Assignment of the Human Cathepsin K Gene

Structure and Chromosomal Assignment of the Human Cathepsin K Gene

SHORT COMMUNICATION Structure and Chromosomal Assignment of the Human Cathepsin K Gene BRUCE D. GELB,*,†,1 GUO-PING SHI,‡ MATTHEW HELLER,† STANISLAWA ...

154KB Sizes 0 Downloads 88 Views

SHORT COMMUNICATION Structure and Chromosomal Assignment of the Human Cathepsin K Gene BRUCE D. GELB,*,†,1 GUO-PING SHI,‡ MATTHEW HELLER,† STANISLAWA WEREMOWICZ,§ CYNTHIA MORTON,§ ROBERT J. DESNICK,† AND HAROLD A. CHAPMAN‡ *Division of Pediatric Cardiology and †Department of Human Genetics, Mount Sinai School of Medicine, New York, New York 10029; and ‡Department of Medicine and §Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts 02115 Received August 28, 1996; accepted January 14, 1997

Cathepsin K is a recently identified lysosomal cysteine proteinase that is the major protease responsible for bone resorption and remodeling. Mutations in this gene cause the sclerosing osteochondrodysplasia pycnodysostosis. To assess its evolutionary relatedness to other cysteine proteases and to facilitate mutation identification in patients with pycnodysostosis, a genomic clone, 74e16, containing the cathepsin K gene was isolated from a human PAC library, and the cathepsin K genomic structure was determined. The cathepsin K gene contained eight exons and spanned approximately 9 kb. The transcription initiation site, determined by primer extension analysis, was 169 nucleotides upstream from the translation initiation site. The 5*-flanking region lacked a TATA box but contained two AP1 sites. Comparison of genomic and cDNA sequences suggested that this flanking sequence may be the major promoter in osteoclasts and macrophages. Cathepsin K was mapped to chromosome 1q21 by fluorescence in situ hybridization and found to reside within 150 kb of an evolutionarily related cysteine protease, cathepsin S. These findings expand our understanding of the papain family lysosomal cysteine proteases and should facilitate mutation analysis in pycnodysostosis. q 1997 Academic Press

Cathepsin K (EC 3.4.22.38) is a lysosomal cysteine proteinase that has been strongly implicated as the major cathepsin in bone resorption and remodeling. The cathepsin K cDNA, cloned initially from rabbit osteoclasts (12) and subsequently from several human tissues (2, 6, 7, 9), encoded a polypeptide with the pre-, pro-, and mature peptide organization typical for the papain family of cysteine proteases. The cathepsin K sequence was highly homologous with those of cathepsins S and L (2, 9). Cathepsin K was highly expressed in osteoclastomas, with lower levels of expression in 1

To whom correspondence should be addressed at Mount Sinai School of Medicine, One Gustave L. Levy Place, New York, NY 10029. Telephone: (212) 241-8662. Fax: (212) 360-1809. E-mail: gelb@ms vax.mssm.edu.

GENOMICS

41, 258–262 (1997) GE974631

heart, lung, skeletal muscle, colon, ovary, and placenta (2, 6). Recently, mutations in the cathepsin K gene were shown to cause pycnodysostosis, a rare autosomal recessive skeletal dysplasia characterized by short stature, osteosclerosis, bone fragility, and abnormal bone and tooth development (4). Cathepsin K has been overexpressed in the baculovirus system, and the recombinant enzyme has been isolated and its physical and kinetic properties have been determined (1, 3). Cathepsin K was synthesized as a prepropeptide of 37 kDa and the mature enzyme was apparently a monomeric protein with an apparent molecular mass of about 29 kDa. The mature polypeptide has three putative N-glycosylation sites, which may facilitate its lysosomal targeting via the mannose-6phosphate receptor system. Cathepsin K activity had a pH optimum of 6.1 with a broad bell-shaped activity profile. The S2P2 substrate specificity for synthetic substrates was similar to that of cathepsin S, but differed from those of cathepsins L and B. The enzyme had strong collagenolytic, elastase, and gelatinase activities, exceeding those of cathepsins S and L and consistent with its proposed role in bone metabolism. To investigate the evolutionary relatedness of cathepsin K with the other cathepsins and to facilitate the analysis of cathepsin K mutations causing pycnodysostosis, the genomic organization of this gene was determined, including the exon and intron sizes, the exon–intron boundary sequences, the transcription initiation site, and the 5*-flanking region. In addition, the chromosomal location of the gene was determined by fluorescence in situ hybridization (FISH). To isolate a genomic clone containing the cathepsin K gene, a human PAC library (Genome Systems, St. Louis, MO) was screened by PCR using a cathepsin K sequence-tagged site (STS) from the 3*-untranslated region (UTR). PAC DNA from the positive clone, 74e16, was isolated by standard methods. Using the genomic organization data from cathepsins S and L (10), probable sites for exon–intron boundaries in the cathepsin K cDNA were selected. The seven predicted introns

258

ARTICLE NO.

0888-7543/97 $25.00 Copyright q 1997 by Academic Press All rights of reproduction in any form reserved.

AID

GENO 4631

/

6r2b$$$421

03-15-97 05:35:53

gnmxas

SHORT COMMUNICATION

259

FIG. 1. Genomic organization and exon–intron boundary sequences of the human cathepsin K gene. The eight exons are shown as black boxes with the translation initiation ATG and TGA stop codons indicated. The seven introns are indicated by the thick black lines. The cumulative size of the cathepsin K genomic sequence is shown on the scale. The sizes of the eight exons and seven introns are shown below along with the 5* and 3* splicing site sequences. The numbering of the cDNA position and codon phase is according to Ref. 9.

were PCR amplified from either human genomic or PAC 74e16 DNA with exonic oligonucleotides. The amplified products were sized on an agarose gel and cycle sequenced on an ABI 377 sequencer. The genomic sequence was divided into eight exons ranging from 103 to 665 nt (Fig. 1). Exon 2 contained the translation initiation site, all codons for the signaling prepeptide, and a portion of the propeptide, exon 4 encoded the Cterminus of the propeptide and the N-terminus of the mature enzyme, and exon 8 contained the stop codon. The seven introns ranged in size from 86 nt to 2.4 kb, with the entire gene being approximately 9 kb. As shown in Fig. 1, the exon–intron boundary sequences conformed to the GT/AG rule with a single exception (8). The 5* splicing donor site of exon 3 was nonconforming, containing the sequence TGgc, a pattern observed in approximately 1% of these sites (8). The boundary sequences were generally consistent with the 5* and 3* splice consensus sequences. Variations from the consensus sequences were all permissible for primate exon–intron boundary sequences. To determine the cathepsin K transcription initiation site, primer extension analysis was performed using a cathepsin K-specific antisense oligonucleotide primer that was end-labeled with [g-32P]ATP using T4 poly-

AID

GENO 4631

/

6r2b$$$421

03-15-97 05:35:53

nucleotide kinase. Either human brain total RNA or human lung mRNA was used with the radiolabeled primer for the reverse transcription reaction. The primer extension experiments were carried out as previously described (10). Three bands were identified with brain RNA, corresponding to positions 0169, 0102, and 040 (Fig. 2a). Identical results were obtained with the lung RNA (data not shown). The 5*-flanking region for the cathepsin K gene was subcloned into pGEM11z from PAC 74e16 after NsiI digestion and identified by colony hybridization with an end-labeled exon 1 oligonucleotide. Comparison of this 5*-flanking sequence cloned with the 5* UTRs from the cathepsin K cDNAs revealed sequence divergence (Fig. 2b). The 18-nt 5* UTR from the osteoclastoma cathepsin K cDNA (Accession No. U20280) (7) matched all other sequences and is presumed to represent a partial transcript. The 5* UTR of a macrophage-derived cDNA (Accession No. U13665) (9) was 105 nt and matched the genomic sequence, nearing the transcription start site at 040. Its first 12 nt did not match and presumably represents a cloning artifact. The cathepsin K 5* UTR cloned from arthritic bone (Accession No. X82153) (6) showed identity with 46 nt of exon 1 but its first 84 nt did not match the genomic sequence.

gnmxas

260

SHORT COMMUNICATION

FIG. 2. Identification of the human cathepsin K transcription initiation sites by primer extension and sequence of the human cathepsin K exon 1 and upstream flanking region in comparison to the published 5*-untranslated regions from cathepsin K cDNA clones. (a) Fifty micrograms of human brain total RNA was used for the primer extension (PE) reaction with an antisense primer from nucleotides /84 to /60. The primer extension products were determined to be 124, 186, and 253 bp by denaturing polyacrylamide gel electrophoresis using the adjacent DNA sequencing reaction (GATC) as a size marker. This corresponded to expected 5* UTRs of 40, 102, and 169 nt, respectively. (b) Sequence of the human cathepsin K exon 1 and upstream flanking region with comparison to the published 5*-untranslated regions from cathepsin K cDNA clones. Exon 1 (capitalized) and upstream flanking (lowercase) sequence was derived from PAC 74e16. The three transcription initiation sites from primer extension analysis are indicated in boldface type. The 5*-untranslated regions from four different cathepsin K cDNA clones are aligned to exon 1, with identical bases indicated. The regions of the bone cDNA 5* UTR that were identical to the cathepsin K coding region downstream at nt 864–876 and 786–856 are underlined and designated A and B, respectively. The sense primer sequence, CTSK43, used for PCR amplification from first-strand cDNA (see text), is underlined. Two putative AP1 binding sites in the upstream flanking region are also underlined.

Sequence comparison of these 84 nt revealed complete identity with the cathepsin K coding sequence at nt 864–876 and the reverse complement of 786–856. Interestingly, the latter region of identity corresponded to the 5* boundary of exon 7, suggesting a transcription or message splicing error rather than a cloning artifact.

AID

GENO 4631

/

6r2b$$$421

03-15-97 05:35:53

In contrast, the 141-nt 5* UTR from a spleen cathepsin K cDNA clone reported by Bro¨mme and Okamoto (Accession No. S79895) (2) was identical to 100 nt of exon 1, ending at the 0102 transcriptional start site, but failed to match the genomic sequence at its most 5* 40 nt (Fig. 2b). A BLASTn search of the nonredundant

gnmxas

261

SHORT COMMUNICATION

sequence and dbEST databases with these 40 nt matched only self. To determine the authentic 5* sequence, sense oligonucleotides, corresponding to this 5* UTR and to the predicted sequence near the transcription initiation site (primer CTSK43 in Fig. 2b), and an antisense primer in exon 2 were used to PCR amplify the 5* UTR from human brain and lung randomly primed first-strand cDNAs. A product of the predicted size was amplified using the primer matching the genomic sequence, but no product was amplified using the primer based on the Bro¨mme–Okamoto 5* UTR (data not shown). These results demonstrated that the 5*flanking region shown in Fig. 2b is the major, and perhaps only, promoter in lung and brain. Since the relevant portions of cathepsin K cDNA clones from bone and osteoclastomas showed sequence identity with exon 1, this 5* flanking region is likely the promoter controlling expression of cathepsin K in osteoclasts as well. Analysis of the genomic sequence upstream from the transcription initiation site revealed a rather featureless promoter that lacked the canonical TATA and CAAT box sequences. Promoters from other members of the papain family of cysteine proteinases, cathepsins S and B, have also lacked TATA boxes (5, 10). The cathepsin K promoter contained two AP1 sites at 0315 and 0346, was not particularly GC-rich, and lacked SP1 sites. This promoter was similar to the cathepsin S promoter, which contained multiple AP1 sites and few SP1 sites, but contrasted with the cathepsin B promoter, which was GC-rich and contained 15 SP1 sites, corresponding well with the fact that both cathepsins K and S are highly regulated while cathepsin B is expressed constitutively. The signals that regulate cathepsin K expression have not been delineated. Since the principal role for cathepsin K is in bone remodeling, putative estrogen response elements were specifically sought in this genomic sequence but none were found. Since such elements have been identified further upstream in promoters or in introns in other estrogenresponsive genes (11), additional studies are required to evaluate fully the possible role of estrogen in regulating the expression of cathepsin K in osteoclasts. The chromosomal localization of the cathepsin K gene was identified by FISH. Purified human cathepsin K cDNA was labeled with biotin-11–dUTP (Gibco BRL, Gaithersburg, MD) by nick-translation and hybridized to human metaphase chromosomes from two normal males as previously described (10). Probe signals were detected with avidin–fluorescein antibody (Oncor, Gaithersburg, MD), and chromosomes were stained with 4,6-diamidino-2-phenylindole dihydrochloride (DAPI; Oncor). Among the 22 metaphases scored, hybridization signal was observed at chromosome band 1q21 from one or both chromatids in 21 (data not shown). Since cathepsin S had also been mapped to 1q21 by FISH (10), the possibility was considered that cathepsins K and S might reside in tandem. Indeed,

AID

GENO 4631

/

6r2b$$$421

03-15-97 05:35:53

STSs for both cathepsins K and S were amplified from PAC 74e16 by PCR. Since PAC insert sizes range from 100 to 150 kb, these two cathepsin genes, each approximately 9 kb, lie in close proximity to one another. In summary, the cathepsin K gene was localized to within 150 kb of cathepsin S on chromosome 1q21. Like cathepsins S and L, with which it is highly homologous, the cathepsin K genomic structure comprised 8 exons that were interrupted by introns in the same coding sequence, suggesting that all three genes evolved from a common ancestral cysteine proteinase gene. Three transcription initiation sites were identified at nt 0164, 0102, and 040 in human brain and lung. Analysis of the 5*-flanking region, believed relevant for the high expression in osteoclasts, revealed a TATA-less promoter with two AP1 sites. These findings will permit further investigation of the regulation of cathepsin K, which is a critical proteinase in bone remodeling and, possibly, in other pathologic degradative processes. Also, knowledge of the cathepsin K genomic structure will facilitate mutation analysis in pycnodysostosis patients and further studies of the evolutionary relatedness of the lysosomal cysteine protease genes. ACKNOWLEDGMENTS This work was supported in part by National Institutes of Health research grants (R29 AR44231 to B.D.G., 5 R37 DK34045 and R01 DK31775 to R.J.D., and R01 HL44816 to H.A.C.), a grant from the National Center for Research Resources for the Mount Sinai General Clinical Research Center (5 M01 RR00071), and a grant for the Mount Sinai Child Health Research Center (5 P30 HD 28822) and by a Basic Research Award from the March of Dimes (B.D.G.). Note added in proof. While this paper was under review, a mutation in cathepsin K gene (CTSK) was reported by Johnson et al. (Genome Res. 6: 1050–1055, 1996).

REFERENCES 1. Bossard, M. J., Tomazek, T. A., Thompson, S. K., Amegadzie, B. Y., Hanning, C. R., et al. (1996). Proteolytic activity of human osteoclast cathepsin K. J. Biol. Chem. 271: 12517–12524. 2. Bro¨mme, D., and Okamoto, K. (1995). Human cathepsin O2, a novel cysteine protease highly expressed in osteoclastomas and ovary: Molecular cloning, sequencing and tissue distribution. Biol. Chem. Hoppe-Seyler 376: 379–384. 3. Bro¨mme, D., Okamoto, K., Wang, B. B., and Biroc, S. (1996). Human cathepsin O2, a matrix protein-degrading cysteine protease expressed in osteoclasts. J. Biol. Chem. 271: 2126–2132. 4. Gelb, B. D., Shi, G.-P., Chapman, H. A., and Desnick, R. J. (1996). Pycnodysostosis, a lysosomal disease due to cathepsin K deficiency. Science 273: 1137–1139. 5. Gong, Q., Chan, S. J., Bajkowski, A. S., Steiner, D. F., and Frankfater, A. (1993). Characterization of the cathepsin B gene and multiple mRNAs in human tissues: Evidence for alternative splicing of cathepsin B pre-mRNA. DNA Cell Biol. 12: 299– 309. 6. Inaoka, T., Bilbe, G., Osamu, I., Tezuka, K.-I., Kumegawa, M., and Kokubo, T. (1995). Molecular cloning of human cDNA for cathepsin K: Novel cysteine proteinase predominantly expressed in bone. Biochem. Biophys. Res. Commun. 206: 89–96.

gnmxas

262

SHORT COMMUNICATION

7. Li, Y.-P., Alexander, M., Wucherpfennig, A. L., Yelick, P., Chen, W., and Stashenko, P. (1995). Cloning and complete coding sequence of a novel human cathepsin expressed in giant cells of osteoclastomas. J. Bone Miner. Res. 10: 1197–1202.

10. Shi, G. P., Webb, A. C., Foster, K. E., Knoll, J. H., Lemere, C. A., et al. (1994). Human cathepsin S: Chromosomal localization, gene structure, and tissue distribution. J. Biol. Chem. 269: 11530–11536.

8. Shapiro, M. B., and Senapathy, P. (1987). RNA splice junctions of different classes of eukaryotes: Sequence statistics and functional implications in gene expression. Nucleic Acids Res. 15: 7155–7174. 9. Shi, G.-P., Chapman, H. A., Bhairi, S. M., DeLeeuw, C., Reddy, V. Y., and Weiss, S. J. (1995). Molecular cloning of human cathepsin O, a novel endoproteinase and homologue of rabbit OC2. FEBS Lett. 357: 129–134.

11. Sohrabji, F., Miranda, R. C. G., and Toran-Allerand, C. D. (1995). Identification of a putative estrogen response element in the gene encoding brain-derived neurotrophic factor. Proc. Natl. Acad. Sci. USA 92: 11110–11114. 12. Tezuka, K., Tezuka, Y., Maejima, A., Sato, T., Nemoto, K., et al. (1994). Molecular cloning of a possible cysteine proteinase predominantly expressed in osteoclasts. J. Biol. Chem. 269: 1106–1109.

AID

GENO 4631

/

6r2b$$$421

03-15-97 05:35:53

gnmxas