Gene 265 (2001) 55–60
www.elsevier.com/locate/gene
ERGL, a novel gene related to ERGIC-53 that is highly expressed in normal and neoplastic prostate and several other tissues
Noga Yerushalmi a,1,2, Andrea Keppler-Hafkemeyer a,2,3, George Vasmatzis a,2,4, Xiu Fen Liu a, Pär Olsson a, Tapan K. Bera a, Paul Duray b, Byungkook Lee a, Ira Pastan a,*
a Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, 37/4E16, 37 Convent Drive MSC 4255, Bethesda, MD 20892-4255, USA
b Division of Clinical Sciences, National Cancer Institute, National Institutes of Health, 37/4E16, 37 Convent Drive MSC 4255, Bethesda, MD 20892-4255, USA
Received 17 November 2000; received in revised form 29 December 2000; accepted 9 January 2001
Received by R. Di Lauro
Abstract
We have identified a new gene that is highly expressed in normal and neoplastic prostate and is also expressed in cardiac atrium, salivary gland, spleen and selected cells in the CNS. Database analyses of ESTs indicated prostate specificity, but experimental results showed expression in other tissues. The full-length transcript is 1800 bp with an open reading frame of 526 aa. The amino-terminal 230 residues of the encoded protein have high homology to a family of lectins, especially to the sugar-binding domain of ERGIC-53. We therefore designate the new gene ERGL (ERGIC-53-like). A transmembrane domain at amino acid positions 468–482 suggests that the product of ERGL is a type I membrane protein. In prostate there are two fully processed transcripts, one of which is a splice variant with a deletion in the region encoding the transmembrane domain of the protein.
© 2001 Elsevier Science B.V. All rights reserved.
Keywords: EST data mining; Prostate cancer; Alternate splicing; In situ hybridization
1. Introduction
Along with the elaborate effort to sequence the whole human genome, there is a very substantial effort to sequence expressed genes, locate their tissue distribution, and determine their activity. As a first step in that effort, clones isolated from cDNA libraries of different tissues are sequenced (expressed sequence tags, ESTs) and those sequences are deposited in dbEST, a division of GenBank. There are over 4 million entries to date, of which more than 1.8 million are sequences from human libraries. The database is publicly available at http://www.ncbi.nlm.nih.gov/dbEST/index.html. Although this database suffers from inaccuracies and incomplete sequences (Gerhold and Caskey, 1996), it is a major tool for discovering new genes (Okubo et al., 1997; Burke et al., 1998) and for confirming the identity of new genes discovered by other approaches (Frazer et al., 1997). Since most cDNA libraries are primed to hybridize to the poly(A) tail of mRNAs, ESTs usually contain untranslated regions (UTR) of the genes. Their length varies, and a single EST rarely covers a full-length gene, but its sequence can be a lead for isolating full-length transcripts from cDNA libraries or other sources of RNA. Together with the sequence information, there is a strong need for sophisticated computer analysis tools to perform the clustering and analysis of the ESTs. We have previously shown how database mining, using a computational approach to cluster sequences from the EST database, can identify new genes that are specifically expressed in prostate tissues (Vasmatzis et al., 1998; Brinkmann et al., 1998, 1999; Essand et al., 1999). This computational approach resulted in a list of potential new genes of interest.

Abbreviations: ERGL, ERGIC-53-like; ESTs, expressed sequence tags; RACE, rapid amplification of cDNA ends; UTR, untranslated regions.
* Corresponding author. Tel.: +1-301-496-4797; fax: +1-301-402-1344. E-mail address: [email protected] (I. Pastan).
1 Current address: Peptor, LTC, Building 16, Kiryat Weizmann, Rehovot 76326, Israel.
2 Equal contribution. GV performed the bioinformatics and modeling.
3 Current address: University Hospital, Department of Medicine, Division of Hematology/Oncology, Hugstetterstr. 55, D-79106 Freiburg, Germany.
4 Current address: Mayo Clinic, Cancer Center and Division of Experimental Pathology, 200 First Street SW, Hilton 800-B, Rochester, MN 55905, USA.
0378-1119/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0378-1119(01)00347-X
In this study, using a cluster of ESTs as a lead to screen and search human cDNA libraries, we identified a new gene that is highly expressed in normal and neoplastic prostate tissue and in several other tissues. The protein it encodes (ERGL, or ERGIC-53-like) shares a number of features with ERGIC-53, a 53 kDa transmembrane protein of the ER–Golgi intermediate compartment (Fiedler and Simons, 1994), which belongs to a family of plant and animal lectins that selectively bind to certain carbohydrate structures.

2. Materials and methods

2.1. EST clustering
Clustering of ESTs was done as previously described (Vasmatzis et al., 1998).

2.2. Library screening and full length cDNA clone
A 281 bp PCR fragment obtained from the EST nc75f10.r1 was radiolabeled by 32P random primer extension (Lofstrand Labs Limited, Gaithersburg, MD) and used to screen a normal prostate cDNA library (Clontech, Palo Alto, CA) by colony hybridization. Positive clones were isolated and sequenced. Rapid amplification of cDNA ends (RACE) was used to obtain 5′ cDNA sequence from normal prostate cDNA (Clontech, Palo Alto, CA & Origene, Gaithersburg, MD) with a gene-specific primer. Sequences were assembled using the Sequencher program (Gene Codes Corp., v.3.1 for Power Macintosh) and other programs (BLAST [Altschul et al., 1990; http://www.ncbi.nm.nih.gov/blast/blast.cgi] and GCGLITE [http://molbio.info.nih.gov/molbio/gcglite/]).

2.3. RNA dot blots and Northern blot hybridization
RNA hybridizations were performed on multiple tissue Northern blots (MTN, Clontech, Palo Alto, CA) and a Human Multiple Tissue Expression Array (Clontech, Palo Alto, CA, #7775-1) containing mRNA from 76 human tissues in separate dots. The same labeled PCR fragment that was used for library screening was used as a probe; this probe is designated `EST-probe'. A 249 bp probe (`gap-probe') specific for the `deleted fragment' (alternatively spliced exon) was also obtained by PCR.
Hybridizations were as previously described (Brinkmann et al., 1998).

2.4. PCR rapid scan gene expression panel
A rapid scan gene expression panel containing PCR-ready first-strand cDNA from 24 different tissues (OriGene, Rockville, MD) was used as a template for PCR with a primer pair (P1, ACTCAATAAGGACTCTGCCAAG; and P2, ATTAGCGGCCGCCTCTGAGGTGGGTCAGGCAGGCAT) that distinguishes between the sequences
with and without the `deletion fragment', giving a 230 bp PCR fragment for the secreted version of the protein and a 480 bp fragment for the membrane-bound form of the protein.

2.5. RNA in situ hybridization
A 1000 bp fragment of the 3′ end of the clone, excluding the poly(A) signal and including the 249 bp deletion fragment, was cloned into pBluescript II SK (Stratagene, La Jolla, CA). Anti-sense and sense 35S-riboprobes were transcribed by T7 and T3 RNA polymerase, respectively (Lofstrand Labs Limited, Gaithersburg, MD). In situ hybridizations were performed by Molecular Histology (Gaithersburg, MD) using standard conditions.

2.6. RT-PCR analysis of atrial tissue
Total RNA was prepared from cardiac atrial tissue obtained from left heart catheterization, and cDNA was then prepared with a cDNA synthesis kit (Pharmacia-Biotech, Piscataway, NJ). PCR was performed with the primer pair P1 and P2. Commercially available prostate cDNA (Clontech, Palo Alto, CA) was used as the positive control and untranscribed RNA as the negative control. The 480 bp PCR product was gel purified, T/A cloned (Invitrogen, Carlsbad, CA) and sequenced.

2.7. Chromosomal mapping
Chromosomal localization of ERGL was determined using the G3 Human/Hamster radiation hybrid panel (Research Genetics, Inc., Huntsville, AL). A 300 bp fragment of the ERGL gene was amplified by PCR using the primers 5′-TCGACAGCAGGCAGGAGCTGAA-3′ and 5′-AATCACCTTTAACGAGGTGGGA-3′. The data for the radiation hybrid panel were submitted to the Stanford RH server (http://shgc.stanford.edu/rhserver2/rhserver_form.html) to obtain the map position.

3. Results
We previously reported that EST analysis of the human database revealed many clusters of ESTs that appeared to be prostate specific (Vasmatzis et al., 1998).
In this study we analyzed a cluster of ESTs that we defined as cluster C14 (later designated ERGL), which contained only six ESTs (nc74f10.r1, nc74f10.s1, nc75f10.r1, nc75f10.s1, nj94a02.s1, wr37e05.x1). Four came from neoplastic prostate and two from normal prostate libraries. When assembled, the sequence was 320 nucleotides long and contained a polyadenylation site and a poly(A) stretch. No ESTs from other human libraries aligned with this cluster, suggesting that C14 was prostate specific.
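The in-silico call of prostate specificity rests on a simple tally of the libraries that contributed ESTs to the cluster. The Python sketch below reproduces that tally for C14 using only the counts stated above (four neoplastic and two normal prostate ESTs); the text does not say which EST name came from which library, so names are omitted. It also adds a hypothetical illustration, not taken from the authors, of why absence from other libraries is weak evidence: assuming independent sampling, a transcript present at fraction f is missed entirely by a library of N clones with probability (1 - f)^N. The frequency and library size used are made-up values.

```python
from collections import Counter

# Source libraries of the six ESTs in cluster C14, as (tissue, state) pairs.
# Counts follow the text; the mapping of EST names to libraries is not stated.
C14_SOURCES = [("prostate", "neoplastic")] * 4 + [("prostate", "normal")] * 2


def tissue_counts(sources):
    """Tally ESTs per source tissue, pooling normal and neoplastic libraries."""
    return Counter(tissue for tissue, _state in sources)


def looks_tissue_specific(sources, target):
    """Naive in-silico call: every EST in the cluster comes from `target` libraries."""
    counts = tissue_counts(sources)
    return bool(counts) and set(counts) == {target}


def prob_missed(freq, library_size):
    """Probability that none of `library_size` randomly sampled clones derives from
    a transcript present at fraction `freq` (assumes independent sampling)."""
    return (1.0 - freq) ** library_size


print(tissue_counts(C14_SOURCES))                      # Counter({'prostate': 6})
print(looks_tissue_specific(C14_SOURCES, "prostate"))  # True
# Hypothetical values: 1 transcript per 10,000 clones, a 5,000-clone library.
print(round(prob_missed(1e-4, 5_000), 3))              # 0.607
```

With these assumed values a 5,000-clone library would miss such a transcript roughly 61% of the time, which illustrates how small EST libraries from other tissues can make a gene appear tissue specific.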
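The RT-PCR results below rely on the P1/P2 primer pair (Section 2.4) separating the two splice forms by product size. As a rough consistency check, the Python sketch uses only figures stated in the paper: the 480 bp and 230 bp amplicons and the 249 bp deleted fragment. Removing 249 bp from the 480 bp product predicts about 231 bp, and because 249 is divisible by 3 the deletion is in frame and removes 83 codons, matching the 83-amino-acid deletion reported for the shorter variant. The band-matching tolerance is a hypothetical parameter chosen for illustration, not a value from the authors.

```python
# Sizes stated in the text (Sections 2.3 and 2.4).
FULL_AMPLICON_BP = 480     # P1/P2 product with the deleted fragment present (membrane-bound form)
SHORT_AMPLICON_BP = 230    # P1/P2 product with the fragment spliced out (secreted form)
DELETED_FRAGMENT_BP = 249  # length of the alternatively spliced region


def predicted_short_amplicon(full_bp: int, deleted_bp: int) -> int:
    """Size expected for the product that lacks the alternatively spliced exon."""
    return full_bp - deleted_bp


def is_in_frame(deleted_bp: int) -> bool:
    """A deletion preserves the downstream reading frame only if its length is a multiple of 3."""
    return deleted_bp % 3 == 0


def codons_removed(deleted_bp: int) -> int:
    """Number of amino acids removed by an in-frame deletion."""
    if not is_in_frame(deleted_bp):
        raise ValueError(f"{deleted_bp} bp deletion would shift the reading frame")
    return deleted_bp // 3


def classify_band(band_bp: int, tolerance_bp: int = 15) -> str:
    """Assign a gel band to a splice form; tolerance_bp is a hypothetical sizing error."""
    expected = {"membrane-bound": FULL_AMPLICON_BP, "secreted": SHORT_AMPLICON_BP}
    for form, size in expected.items():
        if abs(band_bp - size) <= tolerance_bp:
            return form
    return "unassigned"


print(predicted_short_amplicon(FULL_AMPLICON_BP, DELETED_FRAGMENT_BP))  # 231
print(codons_removed(DELETED_FRAGMENT_BP))                              # 83
print([classify_band(b) for b in (478, 232, 350)])
# ['membrane-bound', 'secreted', 'unassigned']
```

The small gap between the predicted 231 bp and the reported 230 bp is within the precision with which band sizes are usually quoted.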
3.1. Specificity analysis
To investigate the specificity of ERGL expression, a dot blot hybridization with RNAs from different tissues was performed (Fig. 1A). A very strong signal was present in prostate RNA (position E8), but signals were also found in spleen (C7), salivary gland (E9) and right and left cardiac atrium (C4 and D4). There was a weaker signal in cerebellum (B2), spinal cord (E3), appendix (G5), adrenal gland (C9), fetal spleen (E11) and testis (F8). A signal in dot blot hybridization does not necessarily mean that the identical transcript is present; cross-hybridization is a possibility. To further investigate specificity and transcript size, a Northern hybridization was carried out using the same probe (EST-probe) as for the dot blot. Three transcripts are detected in prostate (Fig. 1B). The strongest is a diffuse high molecular weight band; the others are distinct bands of 1.8 and 2.0 kb. In spleen there is a strong high molecular weight band and a faint low molecular weight (1.8 kb) transcript (Fig. 1B). A faint band was also detected in brain, and weak bands in heart (data not shown). The nature of the high molecular weight band is not clear; it may represent incompletely spliced transcripts, since the RNA was prepared from whole tissues. The two small transcripts are presumably alternatively spliced, fully processed transcripts.

A panel of cDNAs prepared from many different organs (Origene) was studied using a primer pair (Fig. 2A) that could distinguish between the two transcript forms. The data in Fig. 2B show that two specific bands, 480 and 230 bp in size, were present in prostate, as expected. Expression was also detected in spleen and salivary gland but not in many other tissues. A single specific band (230 bp) was detected in salivary gland and spleen but not in heart ventricle. Since heart atrium was not present in this panel, we obtained atrial RNA and prepared cDNA. Both bands were expressed in atrium (Fig. 2C). The 480 bp fragment from the atrium RT-PCR was sequenced and found to be identical to the prostate sequence. Thus expression of ERGL is not prostate specific, contrary to the prediction from the EST database analysis.

Fig. 1. Analysis of ERGL RNA expression. (A) RNA hybridization of a multiple tissue dot blot using a cDNA probe from the 3′ end of the ERGL transcript. Positive tissues are prostate (E8), spleen (C7), salivary gland (E9) and right and left cardiac atrium (C4 and D4). A weak signal is observed in cerebellum (B2), spinal cord (E3), appendix (G5), adrenal gland (C9), fetal spleen (E11) and possibly testis (F8). The film was exposed for 20 h. (B) Northern blot analysis showing differential expression and transcript sizes of ERGL in different normal tissues. A cDNA probe from the 3′ end of the ERGL transcript was used for hybridization. The film was exposed for 20 h. The transcripts expressed in prostate are 7.5, 2.0 and 1.8 kb. In spleen only the 7.5 kb transcript is observed.

3.2. Isolation of full length clones
A full-length cDNA was obtained by supplying appropriate primers to OriGene (Rockville, MD), which isolated the cDNA from one of its custom libraries. The deduced amino acid sequences of the full-length cDNA and of the alternatively spliced variant are shown in Fig. 3. The transcript is 1.8 kb in length and encodes a protein of 527 amino acids. It consists of a signal sequence, a long extracellular domain, a transmembrane domain and a short intracellular domain. There is one possible glycosylation site (aa 75–77) and two possible phosphorylation sites in the intracellular portion. The shorter variant has a deletion of 83 amino acids that contains the putative transmembrane domain. To investigate the possible involvement of the ERGL gene in human disease, chromosomal mapping of ERGL was carried out; it localizes to chromosome 15q22–q23. The ERGIC-53 gene has been reported to localize to 18q21.3–q22.

3.3. Possible structure of the ERGL protein
A BLAST search with the deduced full amino acid
sequence against the GenBank nr database resulted in ten significant hits, including the p58, ERGIC-53, VIP36 and Gp36 family of genes. All the hits have a homologous stretch of about 230 amino acids close to their amino termini. The most homologous among these is ERGIC-53, a 53 kDa membrane-bound animal lectin of the ER–Golgi intermediate compartment (Schindler et al., 1993; Fiedler and Simons, 1994). The sequence alignment between ERGL and this protein is shown in Fig. 3. The ERGIC-53 protein is made up of an amino-terminal signal sequence of 30 residues, a 240 residue domain homologous to legume lectins, a middle domain of about 210 residues, an 18 residue transmembrane segment, and a short cytoplasmic tail of 12 residues that contains an ER retention signal. The ERGL protein is of similar size and has a similar domain organization. The lectin and transmembrane domains are highly homologous, although the latter is spliced out in the shorter splice variant (Figs. 2 and 3). However, the middle domain is not homologous, and the cytoplasmic tail is longer and does not contain a recognizable ER retention motif.

Fig. 2. RT-PCR analysis of ERGL expression. (A) Schematic diagram of the ERGL cDNA, indicating the location of the initial EST cluster, the primers for PCR analysis, the ERGL open reading frame and the deleted region of a splice variant. (B) Tissue specificity of ERGL. RT-PCR analysis of 24 different human tissues (Rapid Scan panel, Origene). The location of the primers used is shown in (A). The primers were able to discriminate between the two splice variants. The positive control was commercial prostate cDNA. (C) RT-PCR analysis of atrium RNA.

3.4. In situ hybridization
To determine the prostatic cell type expressing ERGL, several in situ hybridization experiments were carried out. Using anti-sense RNA, there was a strong and specific signal over normal and neoplastic prostate epithelial cells (Fig. 4I). No differences in signal intensity were evident between normal and carcinomatous acini (Fig. 4I). No signal was detected over other cell types. In the other tissues examined, the acinar and emptying duct epithelia of the salivary glands showed a clear positive signal and both
the right and left cardiac atrial muscle sarcoplasm was diffusely positive, as were isolated subendocardial cells (Fig. 4II), but no signal was detected in the ventricle (data not shown). In the brain, neurons and astrocytes reacted intensely, while oligodendrocytes were variable, with many showing little signal (Fig. 4II). Purkinje cells of the cerebellum were also positive (data not shown). A striking hybridization signal was seen in motor neurons of the ventral horns of the spinal cord (Fig. 4II); cells of the dorsal cord showed much less intense signal. A specific signal was also observed in the spleen (data not shown). Its strength was as strong as that in the prostate carcinoma IS4, and it was restricted to the broad marginal zones of B-cell splenic follicles. The germinal centers and mantle zones showed no signal. There were no signals when radiolabelled sense RNA was used (data not shown).

Fig. 3. Alignment of the ERGL and ERGIC-53 amino acid sequences. In the ERGL sequence the putative signal sequence is shown in italics and the transmembrane domain in bold. The deleted region is underlined. The glycosylation and phosphorylation sites are shown in outlined letters (positions 76, 492 and 510). The homology of ERGL to ERGIC-53 and other lectins is high up to amino acid 260.

4. Discussion
We report here the identification and sequence of a new gene that is highly expressed in prostate but also in other tissues. This new gene, which we designate ERGL, was predicted by EST database analysis to be prostate specific, but hybridization and PCR-based screening showed that it is also expressed in cardiac atrium, salivary gland, spleen, brain and spinal cord. This result is in part explained by the small size of some EST libraries from different tissues. However, it was surprising not to find it in brain EST libraries, as they are quite large. This result should be taken into consideration when searching the database for tissue-specific genes. The new gene encodes a protein with high homology to a family of animal lectins (Figs. 2 and 3). Its sequence and domain organization are strikingly similar to those of ERGIC-53, a 53 kDa protein of the ER–Golgi intermediate compartment. ERGIC-53 carries a Lys-Lys-X-X ER retention motif at its carboxy terminus (Schindler et al., 1993), which is lacking in ERGL, suggesting that the ERGL protein is not retained in the ER. ERGL has a transmembrane domain, which is deleted in an alternatively spliced variant, indicating that a secretory form of the protein is also expressed. It seems possible that ERGL, like ERGIC-53, has a role in protein processing and secretion.

Acknowledgements
Pär Olsson is the recipient of a fellowship from the Swedish Cancer Society.
Fig. 4. In situ hybridization analysis of ERGL expression. (I) In situ hybridization of prostate tissue. Bright field on the left, corresponding dark field on the right. A: Prostate adenocarcinoma, Gleason grade 8. The dark field on the right confirms the signal that is not easily seen in the bright field at 5× magnification. Note the small islands of neoplastic epithelial reaction against the well-demarcated non-reactive stromal background. B: Normal prostate glandular epithelium with strong signal, contrasting with the background and gland luminal spaces. C: Normal prostate following hybridization with the sense ERGL probe, serving as a negative hybridization control. All samples had similar sense negative controls run in parallel. (II) In situ hybridization of salivary gland, cardiac atria, cerebral cortex and spinal cord. Bright field is on the left, corresponding dark field on the right. A: Salivary gland showing epithelial reaction of duct and acinar cells. B: Cardiac atrial field with diffuse, uniform signal in virtually all atrial cytoplasm. C: Frontal cerebral cortex demonstrating reaction in the triangular neurons and smaller astrocytes; note the low signal in the oligodendrocyte at lower left. D: Ventral horn of spinal cord: striking hybridization signal in the motor neurons.
References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Brinkmann, U., Vasmatzis, G., Lee, B., Yerushalmi, N., Essand, M., Pastan, I., 1998. PAGE-1, an X chromosome-linked GAGE-like gene that is expressed in normal and neoplastic prostate, testis, and uterus. Proc. Natl. Acad. Sci. USA 95, 10757–10762.
Brinkmann, U., Vasmatzis, G., Lee, B., Pastan, I., 1999. Novel genes in PAGE and GAGE family of tumor antigens found by homology walking in the dbEST database. Cancer Res. 59, 1445–1448.
Burke, J., Wang, H., Hide, W., Davison, D.B., 1998. Alternative gene form discovery and candidate gene selection from gene indexing projects. Genome Res. 8, 276–290.
Essand, M., Vasmatzis, G., Brinkmann, U., Duray, P., Lee, B., Pastan, I., 1999. High expression of a specific T-cell receptor gamma transcript in epithelial cells of the prostate. Proc. Natl. Acad. Sci. USA 96, 9287–9292.
Fiedler, K., Simons, K., 1994. A putative novel class of animal lectins in the secretory pathway homologous to leguminous lectins. Cell 77, 625–626.
Frazer, K.A., Ueda, Y., Zhu, Y., Gifford, V.R., Garofalo, M.R., Mohandas, N., Martin, C.H., Palazzolo, M.J., Cheng, J.F., Rubin, E.M., 1997. Computational and biological analysis of 680 kb of DNA sequence from the human 5q31 cytokine gene cluster region. Genome Res. 7, 495–512.
Gerhold, D., Caskey, C.T., 1996. It's the genes! EST access to human genome content. Bioessays 18, 973–981.
Okubo, K., Matsubara, K., 1997. Complementary DNA sequence (EST) collections and the expression information of the human genome. FEBS Lett. 40, 225–229.
Schindler, R., Marra, M.N., McKelligon, B.M., Lonnemann, G., Schulzeck, P., Schulze, M., Oppermann, M., Shaldon, S., 1993. Plasma levels of bactericidal/permeability-increasing protein (BPI) and lipopolysaccharide-binding protein (LBP) during hemodialysis. Clin. Nephrol. 40, 346–351.
Vasmatzis, G., Essand, M., Brinkmann, U., Lee, B., Pastan, I., 1998. Discovery of three genes specifically expressed in human prostate by expressed sequence tag database analysis. Proc. Natl. Acad. Sci. USA 95, 300–304.