Gene 382 (2006) 111 – 120 www.elsevier.com/locate/gene
CRELD2: Gene mapping, alternate splicing, and comparative genomic identification of the promoter region
Cheryl L. Maslen a,b,c,⁎, Darcie Babcock a, Jennifer K. Redig b, Katannya Kapeli b, Yassmine M. Akkari b, Susan B. Olson b
a Department of Medicine, Oregon Health and Science University, Portland Oregon 97239, USA
b Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland Oregon 97239, USA
c Heart Research Center, Oregon Health and Science University, Portland Oregon 97239, USA
Received 4 April 2006; received in revised form 20 June 2006; accepted 20 June 2006 Available online 7 July 2006 Received by K. Gardiner
Abstract
CRELD2 is the second member of the CRELD family of proteins. The only other CRELD family member, encoded by CRELD1, is also known as the AVSD2 gene as mutations in CRELD1 are associated with cardiac atrioventricular septal defects (AVSD). Like CRELD1, CRELD2 is ubiquitously expressed during development and by mature tissues. Recently, a specific CRELD2 isoform (CRELD2β) was implicated as a regulator of α4β2 nicotinic acetylcholine receptor expression, suggesting that the CRELD family has widely diverse biological roles in both developmental events and subsequent cell function. Here we report additional characterization of CRELD2, which was undertaken to further our understanding of this important family. Mapping of CRELD2 by FISH shows that it maps to 22q13 rather than the GenBank reported locus of 22p13. Comparative genomic analysis of upstream sequences shows a discrete region that is highly conserved among diverse species with hallmark features indicative of a promoter region. Functional analysis demonstrates that this region has promoter activity. Consistent with widespread expression of CRELD2, this region is GC-rich and lacks a TATA box. Overall, the highest levels of CRELD2 expression occur in adult endocrine tissues. However, alternative splicing of CRELD2 is extensive with positive identification of several splice variants expressed by most normal fetal and adult tissues. Confirmed splice variants encode 5 different CRELD2 isoforms that differ significantly in composition, indicating that CRELD2 function is varied and as yet poorly understood.
© 2006 Elsevier B.V. All rights reserved.
Keywords: AVSD2; Congenital heart defects; CRELD1; Endocrine system; Nicotinic acetylcholine receptor
Abbreviations: aa, amino acid; AVSD, atrioventricular septal defects; bp, base pairs; cds, coding sequence; CRELD, cysteine rich with EGF-like domains; CRELD2–10a, isoforms using exon 10a as the terminal exon; CRELD2–10b, isoforms using exon 10b as the terminal exon; ER, endoplasmic reticulum; EST, expressed sequence tag; FISH, fluorescence in situ hybridization; h., hours; hCRELD2, human CRELD2; kb, kilobase pairs; min, minutes; PCR, polymerase chain reaction; SDS, sodium dodecyl sulfate; SSC, 0.15 M NaCl/0.015 M Na3·citrate pH 7.6; nAChR, nicotinic acetylcholine receptor; UTR, untranslated region; var, mRNA variant.
⁎ Corresponding author. Oregon Health and Science University, L-465, 3181 SW Sam Jackson Park Rd., Portland, OR 97239, USA. Tel.: +1 503 494 2011; fax: +1 503 494 6986. E-mail address: [email protected] (C.L. Maslen).
0378-1119/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2006.06.016
1. Introduction
Identification of the CRELD family of proteins (Rupp et al., 2002) led to the characterization of CRELD1 as the first known susceptibility gene for cardiac atrioventricular septal defects (AVSD), a common form of congenital heart defect (Robinson et al., 2003). Multiple CRELD1 missense mutations that are specifically associated with AVSD have been identified in patients with isolated sporadic AVSD and in AVSD with heterotaxy syndrome (Robinson et al., 2003; Zatyka et al., 2005). In addition, there is evidence of a role for CRELD1 in cancer. There are two characterized CRELD1 isoforms produced by alternative splicing (Rupp et al., 2002). The major isoform of CRELD1 (CRELD1α) has a type III
C.L. Maslen et al. / Gene 382 (2006) 111–120
transmembrane domain that anchors the protein to the cell surface. CRELD1α has the characteristics of a cell adhesion molecule since it includes multiple EGF domains and has a membrane topology with the bulk of the protein projected from the cell surface out into the extracellular space (Hynes and Zhao, 2000). The transmembrane domain of CRELD1 is eliminated in the alternatively spliced product, CRELD1β, because of the presence of an alternatively spliced exon (exon 9b) that shifts the reading frame and utilizes an alternate termination codon. The resulting isoform is predicted to be secreted into the extracellular space as opposed to being tethered to the cell membrane. Since the composition of CRELD1β is otherwise identical to CRELD1α, it is likely that CRELD1β competes with CRELD1α function as it would have the capacity to bind the same extracellular ligands. CRELD1β is expressed at high levels by multiple tumors and cancer cell types, and we previously speculated that the CRELD1β isoform (previously reported as CRELD1–9b) may play a role in tumor progression or metastasis (Rupp et al., 2002). In addition, Dilley et al. recently reported that another CRELD1 variant (GenBank Accession AL050275) has a significantly increased level of expression in neuroendocrine tumors from patients with multiple endocrine neoplasia type 1 (MEN1) syndrome (Dilley et al., 2005). These studies indicate that the CRELD family plays a critical role in many aspects of human health. To better understand this important protein family we have further characterized the one other existing family member, designated CRELD2. Like CRELD1, CRELD2 is highly conserved with orthologues recognized in many vertebrates, and homologues found in widely diverse species including D. melanogaster and C. elegans. As part of the human genome project, there is some information about CRELD2 in public databases. 
The National Center for Biotechnology Information website reports that CRELD2 is located on chromosome 22p13, and that the gene produces numerous transcripts through alternative splicing (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?c=locusid&org=9606&l=79174). In addition, Ortiz et al. recently identified a CRELD2 isoform that interacts with the large cytoplasmic domain of human neuronal nicotinic acetylcholine receptor α4 and β2 subunits (Ortiz et al., 2005). This isoform is the product of alternate splicing of a previously unrecognized exon that is contiguous with exon 8, designated here as exon 8a. Utilization of exon 8a causes termination of the reading frame at the end of that exon, resulting in a truncated protein isoform known as CRELD2β. Northern blot analysis with a CRELD2β-specific probe showed that it is encoded by transcripts of 2.5 and 3.8 kb, which are present in all tissues tested but in lower abundance than the prominent 1.5 kb transcript visualized with a CRELD2 probe that does not distinguish between splice variants. The CRELD2β isoform was localized to the endoplasmic reticulum (ER), where it is thought to act as a specific regulator of α4β2 nicotinic acetylcholine receptor expression by retaining the α4 and β2 subunits in the ER, perhaps as a means of facilitating their assembly. However, given the ubiquitous expression of CRELD2 in the central nervous system, including regions that do not express α4β2 nicotinic acetylcholine receptor, it is likely that this function is related specifically to
the CRELD2β isoform, and that other CRELD2 isoforms have additional distinct functions. We have identified several features about CRELD2 that will be important to future studies involving molecular genetic analyses, including correct physical mapping of the CRELD2 locus, a detailed description of the genomic organization including identification of new alternatively spliced exons, identification of several previously unreported CRELD2 isoforms produced by non-malignant tissues, and characterization and functional analysis of the promoter region.
2. Materials and methods
2.1. Fluorescence in situ hybridization analysis (FISH)
A bacterial colony harboring the CITF22 fosmid clone (Sanger Institute), which encompasses the CRELD2 locus, was grown in LB medium with 25 μg/ml chloramphenicol, and the DNA isolated using the Qiagen plasmid maxi prep kit™ following the manufacturer's recommendation. DNA was then used as a probe for fluorescent in situ hybridization (FISH) on normal primary lymphocytes from a healthy donor. A fluorescent probe was produced by nick-translation using 1.2 μg of DNA and the Nick Translation Kit by Roche Applied Sciences (Indianapolis, IN) with Spectrum Green dUTP from Vysis (Downers Grove, IL). Cells were dropped onto slides, baked for 6 min at 97 °C, rinsed in 2× SSC at 37 °C for 30 min, and dehydrated using 70%, 80%, and 95% ethanol for 2 min each at room temperature. Immediately before hybridization, the probe was denatured at 75 °C for 10 min, and preannealed at 37 °C for 30 min. The probe was then added onto the slides and hybridization was done using HYBrite (Vysis); 72 °C denaturation temperature, 2 min denaturation time, and reannealing at 37 °C overnight. The next day, the slides were washed with 0.4× SSC/0.1% NP-40 at 73 °C for 2 min, with 2× SSC/0.1% NP-40 (Sigma) at room temperature for 1 min, and then counterstained with 125 μg/μl DAPI II (Vysis).
Cells were observed using a Nikon E800 fluorescence microscope, and captured using CytoVision software from Applied Imaging. The probe signal was determined to be on chromosome 22q using DAPI banding and the CytoVision software.
2.2. Northern blot analyses
Human Multiple Tissue Northern blots (MTN™, Clontech) and Multiple Tissue Expression Arrays (MTE™, Clontech) were probed with an ∼1 kb cDNA fragment encompassing exons 1–10 of CRELD2 generated by RT-PCR amplification from normal human cultured fibroblasts using primers in the 5′ UTR (5′-GCTCCTGCCGCTTCTGCTG-3′) and at the end of exon 10b (5′-GTAAGTCCGGCACATTACAG-3′). The composition of the fragment was confirmed by DNA sequence analysis. The fragment was labeled using the Gene Images labeling module (Amersham) to create the hybridization probe. Hybridization was carried out at 65 °C for 12 h. Blots were washed at a final stringency of 65 °C for 15 min in 0.1× SSC/0.1% SDS. CRELD2 transcripts were detected using the Gene Images CDP-Star detection module (Amersham). The blots were stripped and then probed under the same conditions with a β-actin probe to check for equality of the sample loading. Blots were exposed to Hyperfilm ECL (Amersham) for visualization of the signal.
2.3. Differential expression of CRELD2 splice variants
RT-PCR analysis was used to examine alternate splicing of differentially expressed exons in human tissues, using multiple tissue cDNA panels (MTC™ panels, Clontech). Primers in exons 5 and 7 were used to detect the presence of exon 6 (5′-AGCGGGAATGGCCACTGCAGC-3′ and 5′-GACGAGGGCGCCTGTGTGG-3′). The product size is 357 bp with exon 6 and 238 bp when exon 6 is skipped. Primers in exons 7 (5′-TGTGACGAGTCCTGCAAGAC-3′) and 10 (5′-CAGGACACACACAGACGTAG-3′) were used to detect alternate splicing of exons 8a and 9. The product size is 375 bp when exon 9 is included and 279 bp if exon 9 is spliced out. Exon 8a would increase the product sizes by 83 bp if present. PCR primers that specifically amplify exon 10a or exon 10b were used to determine which alternate terminal exon is used. Primers for exon 10a are (5′-TGTGGACGAGTGCTCACTAG-3′) and (5′-ATTACAGGTCTTCGCGGG-3′), and amplify a 240 bp product. Primers for exon 10b are (5′-TGTGGACGAGTGCTCACTAG-3′) and (5′-TTACAGGTATTCGCGGGAG-3′), and amplify a 190 bp product.
2.4. Protein structure analysis
The domain structure for the various CRELD2 isoforms was predicted using the Simple Modular Architecture Research Tool (http://smart.embl-heidelberg.de/). The amino acid sequence was also analyzed through the TMHMM algorithm to identify potential transmembrane domains and predict protein localization (http://www.cbs.dtu.dk/services/TMHMM-2.0/). Highly repetitive elements in the gene sequence were identified using the RepeatMasker program (http://repeatmasker.genome.washington.edu/cgi-bin/RepeatMasker). Sites for potential posttranslational modifications, in particular N-glycosylation and phosphorylation, were detected using the Center for Biological Sequence Analysis bioinformatics tools NetNGlyc (http://www.cbs.dtu.dk/services/NetNGlyc/) and NetPhos 2.0 (http://www.cbs.dtu.dk/services/NetPhos/), respectively.
2.5. Promoter sequence identification
A tBLASTx search was performed with the human mRNA sequence of CRELD2 (Rupp et al., 2002) to locate CRELD2 orthologues. This query resulted in the identification of a mouse mRNA (GenBank accession number BC047370, E-value 0), a rat mRNA (GenBank accession number XM_343308, E-value 9e−142) and a chimpanzee complete cds (GenBank accession number XM_515211, E-value 0). The amino acid sequences for these species were aligned in ClustalW (http://www.ebi.ac.uk/clustalw/) against the complete coding sequence for CRELD2–10a terminus (GenBank accession number MGC11256), CRELD2–10b terminus (GenBank accession number NP_077300), CRELD1 var1 (GenBank accession number NM_001031717) and CRELD1 var2 (accession number NM_015513) in order to confirm that the BLAST results were unique to CRELD2. To determine the respective chromosomal location of the above cds and mRNA sequences in each species, NCBI (http://www.ncbi.nlm.nih.gov/Genomes/) was searched. The affiliated genomic contigs were acquired as follows: mouse chromosome 15 (GenBank accession number NT_039621), rat chromosome 7 (GenBank accession number NW_047783) and chimp chromosome 22 (GenBank accession number NW_121221), in accordance with the chromosome naming system proposed by McConkey (McConkey, 2004). The mouse, rat and chimp nucleic acid sequences were aligned with their relevant genomic contigs using the Spidey algorithm (http://www.ncbi.nlm.nih.gov/spidey/) to establish the location of their start codons. The 5 kb 5′ flanking DNA of human CRELD2 (chromosome 22, GenBank accession number AL671710) was then aligned with MultiPipMaker to the chimp, mouse and rat upstream sequences (Schwartz et al., 2000). The PipMaker inquiry was limited to only 5 kb upstream because of the presence of an upstream ORF that, when subjected to Blastn, was identified as a human asparagine-linked glycosylation 12 homolog (GenBank accession number NM_024105). The 5′ flanking region to human CRELD2 was analyzed by EMBOSS CpG Finder to confirm the presence of any CpG islands (http://www.ebi.ac.uk/emboss/cpgplot/). As a final analysis the human genomic sequence 2 kb upstream of the translational start site was analyzed using Promoter Scan (PROSCAN), version 1.7 (http://thr.cit.nih.gov/molbio/proscan/), which predicts promoter regions based on scoring homologies with putative eukaryotic Pol II promoter sequences (Prestridge, 1995). An analysis to detect clusters of regulatory elements was done using the CLUSTER-BUSTER algorithm (http://zlab.bu.edu/cluster-buster/cbust.html), which detects sequences that regulate gene transcription.
Fig. 1. FISH mapping of CRELD2. (A) DAPI staining of a metaphase spread. Arrows refer to both copies of chromosome 22. (B) Same metaphase spread as above showing the localization of the CRELD2 probe (green) to the long arm of chromosome 22 using FISH with DAPI counterstain (blue). (C) Panel of human chromosomes 21 and 22: Right images (blue) are fluorescence in situ hybridization (FISH) using CRELD2 (green) as a probe. The CRELD2 signal is present on the distal long arm of both chromosomes 22. Left images (grey) are the same chromosomes viewed with DAPI banding. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 2. Diagrammatic representation of the genomic organization of the recognized coding region of CRELD2. Exons are rectangles, introns and alternate UTRs are lines. Alternate splice paths are indicated by lines connecting non-adjacent exons. The exons are numbered below. Exon 10a, which is contiguous with exon 10, is usually skipped, with most transcripts identified to date utilizing alternate exon 10b and the second 3′ UTR.
2.6. Promoter functional analysis
The putative promoter region of CRELD2 was cloned from human genomic DNA using oligonucleotide primers that inserted a 5′ Bgl II restriction site and a 3′ Hind III restriction site into the pCR 2.1-TOPO vector (Invitrogen) (5′-AGATCTCCCAGATCCCCAGCCCCCAGG and 5′-TTCGAATAGCGCTGCGGGAAGACGCAG, respectively). The putative promoter was digested from the TOPO vector at the introduced restriction sites and inserted into the pGL4.11[luc2P] vector (Promega) directly upstream of a luc2P gene. COS-1 cells, cultured in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum and 1% P/S, were maintained at 37 °C, 5% CO2. In a 12-well plate, 3 × 10^5 cells were seeded and grown to 90–95% confluency. Cells were transfected with 1.6 μg of the reporter DNA using Lipofectamine reagents (Invitrogen) according to the manufacturer's protocol. Cells were harvested and assayed for luciferase activity using Bright-Glo reagents (Promega) 24 h post-transfection.
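The RT-PCR design in Section 2.3 implies a small set of expected band sizes for the exon 7/exon 10 primer pair. The following is a minimal sketch of that arithmetic in Python, using only the product sizes stated in the text (279 bp, 375 bp, and the 83 bp contribution of exon 8a). The function and constant names are illustrative and are not part of the original protocol.

```python
from itertools import product

# Product sizes for the exon 7 / exon 10 primer pair, as stated in Section 2.3.
SIZE_WITHOUT_EXON9 = 279   # bp, exon 9 spliced out
SIZE_WITH_EXON9 = 375      # bp, exon 9 included
EXON_8A_INSERT = 83        # bp added when exon 8a is present


def expected_products():
    """Return expected amplicon sizes keyed by (exon 8a present, exon 9 present)."""
    sizes = {}
    for has_8a, has_9 in product((False, True), repeat=2):
        size = SIZE_WITH_EXON9 if has_9 else SIZE_WITHOUT_EXON9
        if has_8a:
            size += EXON_8A_INSERT
        sizes[(has_8a, has_9)] = size
    return sizes


if __name__ == "__main__":
    for (has_8a, has_9), size in sorted(expected_products().items()):
        print(f"exon 8a: {has_8a!s:5}  exon 9: {has_9!s:5}  product: {size} bp")
```

The difference between the two exon 9 products (375 − 279 = 96 bp) matches the size listed for exon 9 in Table 1, which is a quick consistency check on the primer design.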
Table 1
Positions of the exons comprising CRELD2 a

Exon   Exon size (bp)   Position       Intron size (bp)   Splice acceptor site (intron ′ exon boundaries)
1      166              28085–28213    331                ′ GGAGCTCCGG
2      83               28545–28627    417                ctcctcgcag ′ GGGATGGTGG
3      111              29045–29155    315                tgtgtttcag ′ CGAGATTCGC
4      92               29471–29562    1337               ccatcctcag ′ GAAGAGCGAA
5      177              30900–31076    535                tcccctaaag ′ CATGCCAGGG
6      147              31612–31758    168                ccctcagcag ′ TCAGGACCGG
7      96               31927–32022    526                tgccttccag ′ CCTGTGACGA
8      84               32549–32632    –                  tctgttccag ′ ATGTGGACGA
8a     83               32633–32715    959                ′ GTGGGCCAGG
9      96               33675–33770    961                cctgttgcag ′ AGTGTGACTC
10     141              34732–34872    –                  ttcattttag ′ ATGTGGACAG
10a    113              34873–34985    1584               ′ GTGAGTGGCA
10b    53               36570–36622    –                  tttctgacag ′ AAGCCACAGA

Splice donor site (exon ′ intron boundaries): GTTTAACCAG ′ gtgggaaggg; ACGAGTCCAG ′ gtgggtgccc; GGCTGCAGCT ′ gtgagtgcct; GACTGTCTCG ′ gtgcgtttct; ATCTGCACAG ′ gtacgggcta; TTCTCTCCAG ′ gttattaaaa; GCCTGTGTGG ′ gtgaggagcg; ACGTGCGAAG ′ gtgggccagg; GGCTTCTTAG; CAGTGTGCAG ′ gtcagtgacg; GAGATGGTGA; AGACCTGTAA
The three potential stop codons are in bold typeface. a Derived from the GenBank Homo sapiens genomic DNA reference sequence of chromosome 22 contig (AL671710). The position is the genomic DNA position along the chromosome 22 contig.
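The exon and intron sizes in Table 1 follow from the listed coordinates: an exon spans end − start + 1 bases, and an intron is the gap between one exon's end and the next exon's start. The sketch below, with illustrative names that are not from the paper, recomputes both from the positions transcribed from Table 1. Under this arithmetic every row agrees except exon 1, whose listed size (166 bp) exceeds the 129 bp spanned by its listed position. One plausible reading, which is an assumption here, is that the position starts at the translational start site (28,085) while the size also counts 5′ UTR sequence.

```python
# (exon, listed size in bp, start, end), transcribed from Table 1.
EXONS = [
    ("1", 166, 28085, 28213),
    ("2", 83, 28545, 28627),
    ("3", 111, 29045, 29155),
    ("4", 92, 29471, 29562),
    ("5", 177, 30900, 31076),
    ("6", 147, 31612, 31758),
    ("7", 96, 31927, 32022),
    ("8", 84, 32549, 32632),
    ("8a", 83, 32633, 32715),
    ("9", 96, 33675, 33770),
    ("10", 141, 34732, 34872),
    ("10a", 113, 34873, 34985),
    ("10b", 53, 36570, 36622),
]


def span(start, end):
    """Number of bases covered by an inclusive coordinate range."""
    return end - start + 1


def gaps(exons):
    """Gap (bp) between each exon and the next; 0 means the exons are contiguous."""
    out = {}
    for (name, _, _, end), (_, _, next_start, _) in zip(exons, exons[1:]):
        out[name] = next_start - end - 1
    return out


# Rows where the listed size differs from the coordinate span: {exon: (listed, computed)}.
SIZE_MISMATCHES = {
    name: (listed, span(start, end))
    for name, listed, start, end in EXONS
    if listed != span(start, end)
}
INTRON_SIZES = gaps(EXONS)

if __name__ == "__main__":
    print("size mismatches:", SIZE_MISMATCHES)
    for name, gap in INTRON_SIZES.items():
        print(f"after exon {name}: {gap} bp")
```

A gap of 0 after exons 8 and 10 corresponds to the "–" entries in the intron size column, reflecting that exons 8a and 10a are contiguous with the preceding exon.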
Fig. 3. Northern blot analysis of poly-A+ RNA from human fetal and adult tissues showing CRELD2 transcripts. (A) Adult tissue Northern blot, hybridized with a CRELD2-specific cDNA probe. Each lane contains 2 μg mRNA isolated from H, heart; B, brain; P, placenta; Lu, lung; Li, liver; SM, skeletal muscle; K, kidney; Pa, pancreas. (B) Fetal tissue northern blot also hybridized with a CRELD2-specific cDNA probe. Each lane contains 2 μg mRNA isolated from B, fetal brain; Lu, fetal lung; Li, fetal liver; K, fetal kidney. The numbers to the left of each blot indicate the positions of the size markers on each blot in kb. The numbers to the right of each blot indicate the approximate size of the adjacent band as calculated based on the position relative to the size markers. Note the prominent 1.5 kb transcript in all lanes. It is likely that this transcript represents the splice variant that utilizes exon 10b. There are also several larger transcripts of unknown origin in both the adult and fetal tissues. On the adult blot there is also a band for an unidentified minor transcript of about 1.0 kb. Under each blot is an image of the same blot hybridized with a human β-actin cDNA probe used as a control to assess the relative amounts of RNA present in each lane.
Fig. 4. RNA dot blot analysis to survey general CRELD2 expression in multiple tissues. The source of normalized cDNA for each position on the blot is identified on the grid beneath the blot. Negative controls are in column 12. Note that there is a hybridization signal for almost every tissue source, indicating that CRELD2 is ubiquitously expressed. The highest levels of expression in adult tissues are in tissues that are glandular. There are consistent, albeit relatively low, levels of expression in all fetal tissues represented, with the highest levels in the fetal liver and lung.
3. Results
3.1. Mapping of CRELD2 by FISH
Database references indicate that the CRELD2 locus is on the short arm of chromosome 22, reported as a result of radiation hybrid mapping. Mapping of CRELD2 by FISH clearly shows that it resides at 22q13, near the telomere of the long arm (Fig. 1). This is consistent with the position of the DNA sequence for CRELD2 within the assembled chromosome 22 sequence, which places the gene at 22q13.33, with the translational start site (AUG) beginning at genomic position 28,085.
3.2. Genomic organization and alternate splicing
Analysis of genomic DNA demonstrates that the originally reported CRELD2 coding region was that of an alternate splice variant (Rupp et al., 2002). Here we provide an updated sequence with the addition of two previously unidentified exons, which we now designate exons 6 and 9. The genomic organization for CRELD2 and the known alternative splicing pathways are shown in Fig. 2. In all, there are 13 identified coding exons, with the gene spanning approximately 9.8 kb. Exons 6, 8, 8a, 9, 10a and 10b are all subject to alternative splicing. All intron–exon boundaries have the appropriate splice site sequences and all intron sequences have polypyrimidine tracts and branchpoint consensus sequences characteristic of mammalian introns. The size of each exon and the junctional sequences are shown in Table 1. There are repetitive elements (SINE/Alu, LINE/L1 class) in introns 3, 4, 7, 9 and 3′ UTR-2. There are three alternate 3′ terminal exons, exons 8a, 10a, and 10b. Exon 8a was discovered as a result of the identification of the CRELD2β isoform (Ortiz et al., 2005). It is contiguous with exon 8 and is skipped in most transcripts. When it is utilized, translation is terminated at the in-frame stop codon at the end of exon 8a, with the following intron serving as alternate 3′ UTR-1.
Although there is no exact match to the polyadenylation signal sequence (aataaa) in 3′ UTR-1, there are several similar sequences that may suffice to promote polyadenylation of this mRNA. The two additional alternate 3′ terminal exons are designated here as 10a and 10b. Exon 10a is contiguous with exon 10, with the junction between them often treated as an exon–intron boundary, resulting in exon 10a being recognized as an intron. When exon 10a is spliced out, exon 10b, which is 1.5 kb downstream, is utilized in its place. There is an intervening alternate 3′ UTR (3′ UTR-2) separating exons 10a and 10b. Alternate stop codons occur at the end of 10a and 10b, and there are polyadenylation consensus sequences (g.35771aataaa, g.36820aataaa) in both potential 3′ UTRs. PCR analysis of multiple tissue cDNA panels shows that exon 10b is the preferentially used terminal exon (not shown). Exon 10b has a relative abundance of exon splicing enhancers compared to exon 10a, which may contribute to its high rate of utilization. In fact, at this point we have found exon 10a expressed only by cultured skin fibroblasts.
3.3. Transcript analysis and tissue-specific expression
Northern blot analyses show that there are many splice variants expressed by CRELD2. All adult and fetal tissues examined by Northern blot analysis show a complex splicing pattern with multiple transcripts (Fig. 3). The major transcript is approximately 1.5 kb, although the breadth of the band suggested that it might represent multiple transcripts that are all close to 1.5 kb in size. Ortiz et al. confirmed that the CRELD1α isoform is encoded by a 1.5 kb transcript (Ortiz et al., 2005). In addition there are several relatively abundant transcripts ranging from 5.5 to 1.0 kb, expressed by both adult and fetal tissues. The 5.5, 4.4, 4.0, 3.2 and 3.0 kb transcripts remain unidentified as to their precise composition. Normalized RNA dot blot analyses (multiple tissue expression arrays) were used to determine the extent of CRELD2 expression in a wide variety of tissues (Fig. 4). Like CRELD1, there is ubiquitous expression of CRELD2 in all tissues examined, although there was a broad range of signal intensity suggesting that the level of expression varies significantly between tissue types. For CRELD2, the most prominent signals in adult tissues occur in pancreas, stomach, duodenum, salivary gland, thyroid gland, appendix and trachea, which generally overlap with the expression pattern for CRELD1 (Rupp et al., 2002). Of the fetal tissues represented, the highest levels of CRELD2 expression are in the lung, liver, thymus, spleen and heart.
The levels of expression by various cancer cell lines are negligible, with the exception of the HeLa S3 cell line and the A549 lung carcinoma cell line, which show signals with the same relative intensity as the highest expressing fetal tissues. The dot blot does not distinguish which splice variants are expressed by these tissues.
3.4. Protein sequence analysis
Translation of the cDNA sequence for the 5 major CRELD2 transcripts and sequence analysis using the SMART algorithm shows that each isoform has several overlapping motifs. Two to five alternative domain organizations are predicted depending
Fig. 5. (A) Diagrammatic representations of the basic structure of the 5 recognized CRELD2 isoforms. The isoform designation is to the left of each diagram. All CRELD2 isoforms have a tryptophan–glutamic acid rich (WE) domain that is unique to the CRELD protein family and varying numbers and spacing of EGF domains, some with the calcium binding consensus sequences. The composition of the carboxyl-terminal tail differs depending on which terminal exon is utilized. Likewise, the size of the protein varies depending on which splice variant is expressed. CRELD2 is predicted to be a secreted protein, although some splice variants could potentially be retained inside the cell, with isoform β known to be localized to the ER. (B) The amino acid sequences of the 5 known CRELD2 isoforms, shown in single letter code. The isoform designation, number of amino acids (in parentheses), the identity of skipped exons (indicated by the Δ), and the identity of the terminal exon, are shown above each isoform sequence. In the amino acid sequences, the boxed N is a potentially N-glycosylated asparagine residue. Potential phosphorylation sites with probability scores of >0.90 are underlined. Amino acids that are at the junction of exon boundaries and are therefore encoded by split codons are in bold typeface. Note that the amino acid residue at a given position varies when exon skipping alters the codon. The sequence from the 3′-terminal exon is in italics.
Fig. 6. (A) Overview of MultiPipMaker results of the comparison of the 5′ upstream nucleic acid sequence of the CRELD2 orthologues. The chimp genomic sequence is highly conserved in the majority of the 5 kb upstream sequence from CRELD2, while the rat and mouse homology is mostly limited to the immediate 1 kb upstream sequences. The dark shading indicates regions of ≥90% identity over stretches of >100 bp, light shading indicates regions of ≥75% identity over stretches of >100 bp. (B) The CpG island upstream of CRELD2 as predicted by EMBOSS CpG Finder. This island stretches from −52 to −657 nucleotides upstream of the CRELD2 ATG start codon. (C) Annotated human genomic sequence showing the region of interspecies homology which extends −1 kb from the initiation codon. The ATG translation start codon is in bold and the first exon of CRELD2 is underlined. This region overlaps the CpG island (bold italics) and the predicted promoter region (shaded with reverse typeface). The most likely promoter region as predicted by PROSCAN is from nucleotide position 27,696 to 27,946. Numbering corresponds to the nucleic acid sequence of human chromosome 22 (GenBank accession number AL671710).
on the positioning of boundaries between domains. The models differ from each other in that they contain either tandem arrays of EGF and/or calcium binding EGF domains, or EGF/calcium binding EGF domains and furin cysteine-rich domains. They all have the conserved tryptophan, glutamic acid rich (WE) domain that is unique to the CRELD family. Diagrammatic representations of the predicted domain structures are shown in Fig. 5A. For the sake of clarity only one model is presented for each isoform. There are considerable differences between the various isoforms. The number, spacing and precise nature of the EGF-related domains vary depending on utilization of exons 6, 8a and 9. Junctional amino acids also vary with exon skipping since the codons are split between exons. However, the most significant variability is the nature of the carboxyl-terminus, which is dependent on which terminal exon is utilized. Proteins translated from transcripts utilizing exon 10a have a basic carboxyl-terminus with a pI of 8.25. In contrast, the splice variants that express the alternate 3′ exon 10b have an acidic carboxyl-terminus with a pI of 3.91. The CRELD2β isoform
appears to be unique, utilizing exon 8a as the terminal exon, resulting in a carboxyl-terminus that is relatively cysteine rich, suggesting a structure that is significantly different from the other CRELD2 isoforms. The substantial variation in the composition of the various CRELD2 isoforms is likely to have significant functional consequences to the protein.
Table 2
Regulatory elements identified in the putative promoter region

Motif    Location a    Weight b
Sp1      27776         6.02
Sp1      27788         7.19
Sp1      27790         6.66
Sp1      27812         8.12
SRF      27886         7.96
Sp1      27901         6.02
CCAAT    27913         11.9
Sp1      27944         12.9

a Location of the regulatory element based on genomic sequence numbering.
b Weights are log likelihood ratios.
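The genomic locations in Table 2 can be related to the translational start site, which Section 3.1 places at genomic position 28,085. The sketch below is illustrative only: the names are not from the paper, and it assumes the simplest convention, in which an offset is the position minus 28,085 so that the A of the ATG is 0 and upstream positions are negative. Under that assumed convention the PROSCAN interval 27,696 to 27,946 maps to −389 to −139, one base away from the −388 to −138 quoted in the text, so the text presumably counts from a slightly different reference point.

```python
# Genomic coordinate of the translational start site, as given in Section 3.1.
ATG_POSITION = 28085

# Predicted promoter interval from PROSCAN (genomic numbering, Fig. 6C).
PROSCAN_REGION = (27696, 27946)

# (motif, genomic location, weight), transcribed from Table 2.
MOTIFS = [
    ("Sp1", 27776, 6.02),
    ("Sp1", 27788, 7.19),
    ("Sp1", 27790, 6.66),
    ("Sp1", 27812, 8.12),
    ("SRF", 27886, 7.96),
    ("Sp1", 27901, 6.02),
    ("CCAAT", 27913, 11.9),
    ("Sp1", 27944, 12.9),
]


def offset_from_atg(position, atg=ATG_POSITION):
    """Signed distance from the A of the ATG (negative means upstream).

    Assumed convention: offset = position - atg, so the ATG itself is 0.
    """
    return position - atg


def motifs_in_region(motifs, region):
    """Return the motifs whose location falls inside an inclusive interval."""
    low, high = region
    return [m for m in motifs if low <= m[1] <= high]


if __name__ == "__main__":
    for name, position, weight in MOTIFS:
        print(f"{name:6} {position}  offset {offset_from_atg(position):5d}  weight {weight}")
```

All eight elements in Table 2 fall inside the PROSCAN interval, and six of the eight are Sp1 sites, which is the clustering described in the text.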
In addition to recognizable protein domains, other sequence-based elements are indicated. There is one consensus site for asparagine-linked (N-linked) glycosylation (NXS/T) in the EGF-like domain encoded by exon 8. There are several high probability phosphorylation sites for serine, threonine and tyrosine residues. The sequences for each isoform and the positions of the predicted glycosylation site and all phosphorylation sites with probability scores >0.90 are indicated in Fig. 5B.
3.5. Promoter sequence identification
BLAST searches against the mouse, rat and chimp genome databases using the hCRELD2 mRNA sequence identified orthologues for each of those species. The mouse and rat sequences were found to share the most homology with CRELD2–10b, while the chimp was in greatest relation to CRELD2–10a. The CRELD2 genes were found on mouse chromosome 15, rat chromosome 7 and chimp chromosome 22 (according to the chimp chromosome numbering system of McConkey (2004)). Alignment of the genomic sequences upstream of each CRELD2 orthologue revealed a highly conserved region among all 4 species extending from the translational start site to approximately 1 kb further upstream (Fig. 6A). This region is 97% identical between chimp and human, 76% identical between mouse and human, and 77% identical between rat and human. Analysis of sequence upstream of the hCRELD2 coding region using the EMBOSS CpG Finder program identified a CpG island that extends from position −52 to −657 upstream of the CRELD2 translational start site (Fig. 6B). The CpG island is within the region that is highly conserved across species. CpG islands are often associated with regulatory elements for gene expression, and the position of this CpG island relative to CRELD2 is consistent with that role. Analysis of the 2 kb upstream of the translational start site using PROSCAN identified a 250 bp region at −138 to −388 bp with a high density of Pol II promoter consensus sequences (Fig. 6C). In particular there is a dense clustering of Sp1 transcriptional elements (TEs) on the positive strand identified with weight scores of > 6 (Table 2). The weight is the relative frequency of the sequence occurring in a promoter versus non-promoter region (log likelihood ratios). CLUSTER-BUSTER analysis confirmed the presence of the promoter sequences and identified additional regulatory elements including a CCAAT box.
3.6. Promoter functional analysis
To investigate the transcriptional activity of the putative promoter region, a luciferase reporter assay was employed. The putative promoter was cloned into a reporter vector directly upstream of the luc2P gene and transfected into COS-1 cells. A statistically significant 18-fold increase (p < 0.00002) in luciferase expression was detected in COS-1 cells transfected with the CRELD2 promoter-containing vector compared with the promoterless vector (Fig. 7), indicating that the identified region encompasses sequences capable of activating transcription and is likely the CRELD2 promoter.
Fig. 7. Luciferase reporter gene expression driven by the CRELD2 promoter region identified by comparative genomics. Luciferase reporter gene activity was significantly greater (p = 0.00002) in cells transfected with the CRELD2 promoter region construct compared to cells transfected with the reporter plasmid with no insert, demonstrating promoter activity for the CRELD2 upstream fragment.
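The CpG island reported in Section 3.5 can be illustrated with the two statistics most commonly used to define such islands: GC content and the ratio of observed to expected CpG dinucleotides. The Python sketch below is not the EMBOSS program used in this study. The function names are hypothetical, and the thresholds (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6) are the widely used Gardiner-Garden and Frommer criteria, assumed here because the settings actually applied are not given. For simplicity it scores one sequence as a whole rather than scanning with a sliding window.

```python
from __future__ import annotations


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c = seq.count("C")
    g = seq.count("G")
    observed = seq.count("CG")      # CpG dinucleotides on this strand
    expected = c * g / n            # expectation if C and G were independent
    gc_fraction = (c + g) / n
    obs_exp = observed / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed length, GC-content and obs/exp thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return (len(seq) >= min_len
            and gc_fraction > min_gc
            and obs_exp > min_obs_exp)
```

Applied to the sequence upstream of a translational start site, a check of this kind flags GC-rich, CpG-dense stretches such as the one described for CRELD2.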
4. Discussion
Our previous discovery that missense mutations in CRELD1 are associated with cardiac atrioventricular septal defects (Robinson et al., 2003) implicated the CRELD gene family in the pathogenesis of congenital heart defects. However, expression of the CRELD proteins is widespread and the specific functions are unknown. Characterization of CRELD2, the only homologue of CRELD1, was undertaken in order to provide a foundation for further molecular genetic analyses of this protein family. FISH analysis confirmed that CRELD2 is on 22q13, which is now corrected in GenBank. Given the association of CRELD1 mutations with heart defects, it is of note that the CRELD2 locus is not part of the DiGeorge/velocardiofacial syndrome critical region (22q11), which is considerably more proximal.
Promoter analysis identified a functional promoter region embedded within a large CpG island just upstream of the CRELD2 coding region. The CpG island lies in the 1730 bp spacer region between CRELD2 and ALG12, which are oriented in a head-to-head configuration. This strongly suggests that these two genes share common regulatory elements for transcriptional activation, possibly via a bidirectional promoter. The ALG12 gene encodes the endoplasmic reticulum enzyme dolichylphosphate mannose (Dol-P-Man):Man(7)GlcNAc(2)-PP-dolichyl mannosyltransferase, which is involved in asparagine-linked glycosylation. Missense and nonsense mutations in ALG12 cause congenital disorder of glycosylation type Ig, which manifests with failure to thrive, hypotonia, facial dysmorphism and progressive microcephaly. It is unclear why CRELD2 and ALG12 expression should be coordinately regulated, although CRELD2 has a high-probability N-linked glycosylation site in the calcium-binding EGF domain encoded by exon 8. However, since ALG12 functions in the glycosylation of many other proteins, it is still uncertain why there would be a specific regulatory relationship with CRELD2.
To date, GenBank lists 172 spliced CRELD2 ESTs that display evidence of alternative exon use. Since ESTs tend to be skewed
towards the 3′ end of the gene, some of these may represent incomplete transcripts. However, there are 86 ESTs that extend through exon 1, suggesting that they are not derived from fragments. At this point all of these ESTs have multiple sequencing errors or ambiguous base calls, so an accurate translation to amino acid sequence is not currently possible. Hence it remains to be seen how many of these reflect unique transcripts. Also, many of these ESTs are derived from cancer cells or tumors and may not be an accurate reflection of normal CRELD2 transcription. However, it is clear that alternative splicing is extensive and produces a large number of CRELD2 isoforms that are diverse in function.
Exon skipping in CRELD2 has two general consequences. First, it varies the number and spacing of the EGF domains. This may be a means of fine-tuning CRELD2 function. Second, the greater variation results from alteration of the carboxyl-terminal end of the protein. Specifically, the carboxyl-termini of CRELD2 isoforms vary significantly depending on which 3′-terminal exon is utilized in the transcript. One transcript has been shown to terminate translation at exon 8a, resulting in the isoform CRELD2β (Ortiz et al., 2005). The CRELD2β isoform has a unique carboxyl-terminus that is relatively cysteine rich, suggesting a structure that is significantly different from the other CRELD2 isoforms. This unusual isoform has been shown to localize to the ER and bind to the α4 and β2 subunits of the nicotinic acetylcholine receptor, where it is thought to regulate assembly of nAChRs. The other CRELD2 isoforms use either exon 10a or 10b as the terminal exon. Exon 10b is preferentially used in most transcripts, with alternative splicing skipping exon 10a.
Isoforms that use exon 10b have a 17 amino acid residue carboxyl-terminus that is acidic, while utilization of exon 10a results in a basic carboxyl-terminus of 37 amino acids that includes a single cysteine residue which may be available to form a disulfide bond with other molecules. All of the alternate carboxyl-terminal sequences are unique to CRELD2 and, given the differences in composition, they most likely contribute to variation in CRELD2 isoform function, with the potential for many distinct roles in multiple tissues.
Sequence analysis suggests that CRELD1 is a cell adhesion molecule with most of the protein residing in the extracellular space, tethered to the cell surface by transmembrane domains. There are no recognized signaling motifs, but the potential for cell signaling has not been ruled out. The most significant difference between the major CRELD1 isoform (CRELD1α) and CRELD2 isoforms is that CRELD1α is bound to the cell surface by two type III transmembrane domains, whereas CRELD2 isoforms are predicted to be secreted freely into the extracellular space, with the exception of CRELD2β, which is localized to the ER. Consequently, most CRELD2 isoforms are similar to CRELD1β, which is the only known CRELD1 isoform that is not membrane bound. Otherwise CRELD1 and CRELD2 isoforms share similar domain structures in general, with the highly conserved WE region that is the hallmark feature of the CRELD protein family, followed by variations of EGF domains (Rupp et al., 2002). Therefore it appears that secreted CRELD2 isoforms could potentially modulate CRELD1 function by competitive binding of common ligands, with CRELD1 as a cell surface receptor and CRELD2 as an extracellular competitor. This raises the interesting possibility that CRELD2 regulates CRELD1 function. This could be further modified by interplay between different CRELD1 and CRELD2 isoforms. Given that CRELD1 and CRELD2 likely participate in the same biochemical pathway, CRELD2 is a compelling candidate gene for cardiac atrioventricular septal defects based on the association of CRELD1 with this common form of heart defect.
Acknowledgments
Thanks to Dr. Robert Glanville for helpful comments on the manuscript. Supported in part by PHS grant 5 M01 RR00334.
References
Dilley, W.G., Kalyanaraman, S., Verma, S., Cobb, J.P., Laramie, J.M., Lairmore, T.C., 2005. Global gene expression in neuroendocrine tumors from patients with the MEN1 syndrome. Mol. Cancer 4.
Hynes, R.O., Zhao, Q., 2000. The evolution of cell adhesion. J. Cell Biol. 150, F89–F96.
McConkey, E.H., 2004. Orthologous numbering of great ape and human chromosomes is essential for comparative genomics. Cytogenet. Genome Res. 105, 157–158.
Ortiz, J.A., et al., 2005. The cysteine-rich with EGF-Like domains 2 (CRELD2) protein interacts with the large cytoplasmic domain of human neuronal nicotinic acetylcholine receptor α4 and β2 subunits. J. Neurochem. 95, 1585–1596.
Prestridge, D.S., 1995. Predicting Pol II promoter sequences using transcription factor binding sites. J. Mol. Biol. 249, 923–932.
Robinson, S.W., et al., 2003. Missense mutations in CRELD1 are associated with cardiac atrioventricular septal defects. Am. J. Hum. Genet. 72, 1047–1052.
Rupp, P.A., et al., 2002. Identification, genomic organization and mRNA expression of CRELD1, the founding member of a unique family of matricellular proteins. Gene 293, 47–57.
Schwartz, S., et al., 2000. PipMaker—a web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586.
Zatyka, M., et al., 2005. Analysis of CRELD1 as a candidate 3p25 atrioventicular septal defect locus (AVSD2). Clin. Genet. 67, 526–528.