Structural characterization of Rasgrf1 and a novel linked imprinted locus


Gene 291 (2002) 287–297 www.elsevier.com/locate/gene

Aránzazu de la Puente a,1, Julia Hall a, Yue-Zhong Wu a, Gustavo Leone a, Jo Peters b, Bong-June Yoon c, Paul Soloway c, Christoph Plass a,*

a Division of Human Cancer Genetics, Department of Molecular Virology, Immunology and Medical Genetics, The Ohio State University, 420 West 12th Avenue, Columbus, OH 43210, USA
b Mammalian Genetics Unit, Medical Research Council, Harwell, Didcot, Oxfordshire, OX11 0RD, UK
c Department of Molecular and Cellular Biology, Roswell Park Cancer Institute, Buffalo, NY 14263, USA

Received 8 October 2001; received in revised form 25 March 2002; accepted 5 April 2002
Received by K. Gardiner

Abstract

Imprinted genes in mammals are expressed either from the maternally or the paternally inherited allele. Previously, a genome-wide scan identified novel imprinted genes based on their association with differentially methylated regions (DMRs). One of the identified genes, Rasgrf1, showed paternal expression in neonatal brain and was located on mouse chromosome 9. This gene is associated with a DMR located about 30 kb upstream of Rasgrf1 exon 1. In order to better understand and identify novel elements involved in the regulation of this gene, we have isolated and characterized genomic clones coding for mouse Rasgrf1 and human RASGRF1. The mouse gene consists of 26 exons spanning approximately 140 kb of genomic DNA, while the human gene has 28 exons. The human gene has an additional 39 bp exon inserted between exons 13 and 14, and exon 18 is split into two separate exons in human. The major transcription start site of Rasgrf1, as identified by primer extension, is 1324 bp upstream of the ATG translation start codon. Finally, a genomic region upstream of exon 1, spanning 489 bp, was determined to possess the essential promoter activity for the Rasgrf1 gene. A second gene, A19, located 10 kb upstream of the DMR, has been characterized. A19 is mainly expressed in testis and at lower levels in neonatal and adult brain tissue. The A19 transcript is non-coding and expressed in mouse testis and brain. A19 is imprinted, with expression occurring from just the paternal allele in brain. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Genomic imprinting; Guanine nucleotide exchange factors; Rasgrf1

Abbreviations: DMR, differentially methylated region; UTR, untranslated region
* Corresponding author. Division of Human Cancer Genetics, The Ohio State University, Medical Research Facility 464A, 420 West 12th Avenue, Columbus, OH 43210, USA. Tel.: +1-614-292-6505; fax: +1-614-688-4761. E-mail address: [email protected] (C. Plass).
1 Present address: Department of Biochemistry of Autonoma, University of Madrid, C/ Arzobispo Morcillo 4, 28029 Madrid, Spain.
0378-1119/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0378-1119(02)00601-7

1. Introduction

The growing list of imprinted genes and their physical locations indicates that many imprinted genes are found within clusters. Clusters of imprinted genes are located on mouse chromosomes 2, 6, 7, 11, 12 and 17 (Beechey et al., 2001) and contain up to 15 imprinted genes in the case of chromosome 7, including Ins2, Igf2 and H19 among others (Paulsen et al., 2000; Engemann et al., 2000). The clustered organization of imprinted genes is thought to reflect coordinated regulation of the genes in a chromosomal domain (Wolffe, 2000; Reik and Walter, 2001). Two important clusters of imprinted genes have been associated with human disease; Beckwith–Wiedemann syndrome is associated with the imprinted region on human chromosome 11p15.5 (Maher and Reik, 2000) and Prader–Willi and Angelman syndromes are associated with the imprinted region on human chromosome 15q11-13 (Nicholls et al., 1998). Mouse orthologs of these loci are located on distal and central chromosome 7, respectively, reflecting the general trend that imprinted genes as well as the clustered organization of these genes are conserved between mouse and human.

The guanine nucleotide exchange factor, Rasgrf1, was previously identified as the mouse homologue to the yeast Cdc25 gene (Cen et al., 1992; Martegani et al., 1992). Rasgrf1 is expressed mainly in adult brain (Sturani et al., 1997; Zippel et al., 1997) and plays a role in long-term memory formation (Brambilla et al., 1997) and growth control (Itier et al., 1998). Rasgrf1 is the only imprinted gene localized on mouse chromosome 9 so far, being paternally expressed. Paternal expression is found in neonatal and adult brain, while lung, ovary and testis show biallelic expression (Plass et al., 1996). Little is known so far about the regulation of imprinted expression as well as additional genes within this region. A differentially methylated region (DMR), located about 30 kb upstream of Rasgrf1 and methylated on the paternal allele, has been described. Next to this DMR is an unusual GC-rich repeat (Sp4 repeat) that is found in Mus musculus and Rattus but is absent in Peromyscus (Pearsall et al., 1999). The presence of this repeat sequence correlates with imprinted expression of Rasgrf1 in Mus and Rattus (Pearsall et al., 1999). The repeat has since been shown to regulate establishment of methylation of the Rasgrf1 DMR and the DMR functions as a methylation-sensitive enhancer blocking element (Yoon et al., 2002; unpublished data). At the moment it is unknown if RASGRF1 is imprinted in human and if the repeat sequence is present. The involvement of the DMR with the imprinted expression of Rasgrf1 needs additional studies and it is possible that this region may be involved in the regulation of yet unknown genes.

In this paper, we studied the genomic structure and the promoter region of mouse and human Rasgrf1 and RASGRF1 genes in order to better understand the underlying features that might control the expression of the imprinted mouse gene. In addition a novel gene, A19, potentially coding for an untranslated RNA was identified upstream of Rasgrf1. A19 produces four alternative transcripts in testis and brain. It is expressed from both alleles in mouse adult testis but exclusively from the paternal allele in brain.

2. Materials and methods

2.1. Genomic library analysis

Screening of the mouse BAC library (RPCI-22 129/SvEvTACBr) was performed with a pool of Rasgrf1 cDNA probes generated by RT-PCR with primers designed from the published cDNA sequence (Accession number: L20899): 1F (5′-CGCCCGATCTCAAAGTGTAG-3′), 1R (5′-GCATAAGCGCCTCATGTTCT-3′), 3F (5′-CAGCCTGGACTATGCCAAAT-3′), 3R (5′-TTGCTGAAGCGAATGTCAAC-3′), 4F (5′-CACAACGCCAAGCTTCTGTA-3′), 4R (5′-CTGGTCAGAGGAAAAGCCTG-3′), 5F (5′-ATGACGGAGGGTGTGAAGAC-3′) and 5R (5′-CTCATGTGGGGAGTTTTGGT-3′). All positive BAC clones were digested with HindIII and EcoRI. Southern blots were hybridized with primers derived from the cDNA sequence, in order to verify positive clones. The BAC contig was developed with probes designed from the end sequence of each BAC clone using primers T7 and Sp6. BAC end probes were used to rescreen the BAC library filters in order to obtain overlapping BAC clones. Mouse A19 was mapped into the BAC contig using the exon-trap clone 19-1 (Plass et al., 1996) as a probe.

The same approach was carried out for the human gene. In this case, RPCI-11, a human BAC library, was screened with probes generated by RT-PCR with primers from the cDNA sequence (Accession number: L26584): 1F (5′-GCTGCAGAACCTGCTCTTCT-3′), 1R (5′-ACACCACCTGGTTCCTCTTG-3′), 2F (5′-TGATCATCGAAGGCTGTGAG-3′), 2R (5′-TTGCTGAAGCGAATGTCAAC-3′), 4F (5′-GTTCTTCGGACAAGGATGGA-3′), 4R (5′-AGCTTCAGGTGGGGAGTTT-3′). Several BAC clones were positive and verified by Southern blot analysis. BACs 386O12 and 411M3 contain RASGRF1 and were used in all experiments.

2.2. Mapping of intron and exon borders

The exon–intron structure of the Rasgrf1, RASGRF1 and A19 genes was determined by directly sequencing exons plus exon–intron junctions in BAC clones with primers derived from the cDNA sequence and available intronic sequence. Exon–intron borders were inferred from comparison of cDNA and genomic sequences and identification of the acceptor and donor consensus sites. The sequencing was performed in the Genotyping Sequencing Unit of the Division of Human Cancer Genetics at The Ohio State University using an ABI PRISM 377 DNA Sequencer (Perkin-Elmer).

2.3. Transfection and reporter enzymes assay

Constructs were generated from a 2.4 kb genomic clone 84 (MGI:31822) (Plass et al., 1996) that includes sequences 5′ of Rasgrf1. Four fragments were ligated with the firefly luciferase expression cassette from pGL2-Basic (Promega) in both orientations. Constructs 84f and 84r contain the 2.4 kb fragment in 5′–3′ and 3′–5′ orientation, respectively. By digestion of clone 84 with EcoRI, two fragments of 1.6 kb and 781 bp were released and cloned, generating constructs 84ΔAf and 84ΔBCf in 5′–3′ orientation and 84ΔAr and 84ΔBCr in the opposite orientation. Constructs 84ΔACr (5′–3′) and 84ΔACf (3′–5′) were generated by PCR using clone 84 as a template with primers 3F (5′-GAATTCCAAGCCACCTGGGC-3′) and 2R1 (5′-CTCGTTGAAACCGGGAACGG-3′). The PCR products were cloned in TA-vector (Invitrogen) and subsequently into pGL2. All the constructs were sequence verified. Then 5 μg of each construct was co-transfected with 3 μg of pCMV-β-gal and 17 μg of pBluescript into the neuro-2a brain cell line using the calcium phosphate method. Forty-eight hours after transfection, the cells were harvested and lysed in reporter lysis buffer (Promega). The luciferase activity was measured using the luciferase assay system (Promega) and the β-gal activity was measured as described previously (Sears et al., 1997).

2.4. Primer extension analysis

The transcription start site was mapped by primer extension using a 30-mer oligonucleotide (5′-CTTAATGATAGATTTTCCGTTCTTATTTTT-3′) complementary to base pairs 1116–1146 upstream of the translation start site. This oligonucleotide was 5′ end-labeled with [γ-32P]dATP by T4 polynucleotide kinase. A total of 910,200 cpm were precipitated with 30 μg of total RNA from adult mouse brain and then dissolved in 10 μl of 120 mM KCl, 10 mM Tris (pH 8.5). The annealing was performed at 65 °C for 20 min. The hybridized primer was extended at 42 °C for 2 h 30 min using 15 units of an avian RNase H-minus reverse transcriptase from the ThermoScript kit (Life Technologies). The extended product was precipitated with 0.1 vol. 3 M NaAc and 2.5 vol. EtOH overnight at −20 °C, then dissolved in 10 μl of loading buffer and analyzed on a 6% sequencing gel. The size of the product was determined by comparison with a sequencing reaction that was performed with the 41F9 oligonucleotide (5′-CATATGACGCCAATCCCTCG-3′) on clone 84 DNA as template using a T7 Sequenase version 2.0 sequencing kit (USB).
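As an illustrative aside, the antisense primer-extension oligonucleotide can be reverse-complemented to obtain the sense-strand sequence it anneals to, which is the string one would look for in the 5′-flanking genomic sequence. The sketch below is a minimal example and not part of the original methods; the function name `reverse_complement` is our own.

```python
# Sequence of the 30-mer primer-extension oligonucleotide quoted in Section 2.4.
PE_PRIMER = "CTTAATGATAGATTTTCCGTTCTTATTTTT"

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Sense-strand sequence that the antisense primer is complementary to.
sense_target = reverse_complement(PE_PRIMER)
print(len(PE_PRIMER), sense_target)
```

Searching a genomic sequence for `sense_target` (for example with `str.find`) would give the primer's binding coordinates on the sense strand.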

2.5. Field inversion gel electrophoresis (FIGE) and Southern blot hybridization

BAC clones were isolated by a Qiagen-tip 100 purification system, digested with different enzymes and then analyzed on 1% Pulsed Field certified agarose (Bio-Rad) gels which were run with programs 3 and 4 of the FIGE mapper electrophoresis program (Bio-Rad). Gels were capillary blotted to Zeta-Probe GT membranes (Bio-Rad). Filters were hybridized to different probes generated by RT-PCR. Prehybridization and hybridization were performed in 0.5 M sodium phosphate, 7% SDS, at 65 °C. The filters were washed at high stringency (0.02 M sodium phosphate, 0.1% SDS, at 50–60 °C).

2.6. Long-range PCR

Long-range PCR assays were carried out by the Expand Long Template PCR System (Boehringer Mannheim).

2.7. Sequence analysis

Sequence analysis was performed using the GCG package (http://gene.med.ohio-state.edu/gcg-bin/seqweb.cgi) for dot plot analysis and available Baylor College of Medicine programs on the web (TESS, TSSG, TSSW, Transfac, Core Promoter; http://www.hgsc.bcm.tmc.edu/SearchLauncher/).

2.8. Exon trapping and cDNA analysis

Sequence data from exon-trap clone 19-1 (Plass et al., 1996) were used in a Blast search. This clone has high homology with the full-length mouse testis cDNA from the RIKEN library (Accession number: AK015891). Primers were designed in order to obtain the exon–intron structure of the A19 gene.

2.9. Northern blot assay

Mouse MTN blot from Clontech (7762-1) was hybridized with exon-trap clone 19-1 (Plass et al., 1996) following the instructions provided by the manufacturer.

2.10. RT-PCR assays

A19 RT-PCR primers were A19F1 (TTCAAGACCCAGGATCAGATG) located in exon 4, A19R1 (TGCATGAGATCATGGAAGAGC) located in exon 6, A19F2 (GCTCCTTCATTGCTCTCGTC) located in exon 2 and A19R2 (AGGAGCTGAGGGTTTGATGA) located in exon 3. To test expression in various tissues we used primers A19F2 and A19R2. Denaturing was done at 96 °C for 39 s, annealing at 60 °C for 30 s and extension at 72 °C for 40 s. The PCR was run for 35 cycles. Allele-specific expression was tested with [γ-32P]ATP end-labeled primers A19F2 and A19R2. For the analysis of the allele-specific expression of the A19 gene, 129/C57BL/6 females were crossed with PWK males and vice versa. Adult testis tissue and brain were collected from progeny of both crosses. A polymorphism between C57BL/6 and 129 (both C) and PWK (T) in position 876 of the published cDNA was found in exon 2. This polymorphism resulted in a polymorphic restriction site for BpmI that is present in the PWK DNA (CTCCAG) but not in the C57BL/6 or 129 DNA (CCCCAG). In addition we used a reciprocal cross between Mus spretus and Mus musculus castaneus. Again position 876 of the published cDNA was found to be polymorphic, resulting in a polymorphic restriction site for BpmI that is present in the Mus musculus castaneus DNA (CTCCAG) but not in the Mus spretus DNA (CCCCAG). Expression was also tested in RNAs from mutant animals [T(4;9)45H] that had either a paternal or maternal duplication of the distal portion of chromosome 9 in combination with a paternal or maternal duplication of the distal portion of chromosome 4. The distal part of chromosome 9 includes the A19 gene.

3. Results

3.1. Genomic structure of Rasgrf1 and RASGRF1

Several BAC clones were identified by screening the mouse 129 BAC library (RPCI-22). RT-PCR products spanning the whole cDNA were used as probes and positive BAC clones were confirmed by Southern blot analysis. Two positive clones, 459N11 and 500H5, containing all Rasgrf1 exons were used for subsequent studies. The genomic structure of Rasgrf1 was characterized by comparison of cDNA and genomic sequences obtained by direct sequencing of the BAC clones with primers designed from the published cDNA sequence. Rasgrf1 contains 26 exons spanning about 142 kb of genomic DNA (Fig. 1 and Table 1). The intron sizes were determined by long-range PCR with


Fig. 1. Structure of mouse and human Rasgrf1/RASGRF1 genes. (A) Exon–intron structure of mouse and human Rasgrf1/RASGRF1 genes. Schematic representation of the mouse and human Rasgrf1/RASGRF1 genomic structures. Exons are represented as boxes with the numbers above; they were determined by comparison of cDNA and genomic sequence, identifying the acceptor and the donor sites as well as the size of each exon. The sizes of the introns were determined by long-range PCR with primer pairs placed in the exonic sequence. Open boxes represent either 5′ or 3′ UTRs. The major transcription start site is indicated by an arrow and the dotted part of the 5′ UTR depicts the part without homology to the published cDNA. (B) Dot plot analysis of mouse and human promoter and exon 1 sequence of Rasgrf1/RASGRF1. The dot plot program from the GCG package (http://gene.med.ohio-state.edu/gcg-bin/seqweb.cgi) was used to align the 5′ end mouse sequence (Y-axis) and the 5′ end human sequence (X-axis). Both 5′ end regions are shown, indicating the position of exon 1 and the promoter region in both genes. The repeat sequence seen within both the human and mouse promoters is a (TG)n dinucleotide repeat.

primers flanking the exonic sequences. All intron–exon junctions were sequenced and follow the canonical AG/GT rules. The first exon contains the translation start codon and the last exon contains the termination codon, a 138 bp untranslated region (UTR) and the polyadenylation signal AATATA, which contains a mismatch with respect to the consensus (A to T conversion at the fifth position) (Cen et al., 1992).

To identify human BAC clones, we screened the RPCI-11 library with RT-PCR products generated with primers designed from cDNA sequence. Positive BAC clones were verified by Southern blot hybridization. Two clones, BAC 386O12 and 411M3, were used for direct sequencing with the primers designed from cDNA sequence. The comparison of cDNA and genomic sequence revealed that this gene contains 28 exons spanning about 160 kb of genomic DNA. Human exon 13a represents additional sequence in the human gene not found in mouse cDNA sequence. In addition, mouse exon 18 is split into exons 18a and 18b in the human gene (see Fig. 1A and Table 1). All intron–exon junctions were sequenced and follow the canonical AG/GT rules. The last exon contains the termination codon followed by a 64 bp UTR and the polyadenylation signal (AATACA), which also contains a mismatch with respect to the consensus sequence (A to C change at the fifth position).

3.2. Determination of the transcription start site

Primer extension analysis was carried out to determine the position of transcription initiation. For this purpose a 30-mer antisense oligonucleotide complementary to base pairs 1116–1146 upstream of the translation start site (Fig. 2A) was used in primer extension assays of mouse brain total RNA (Fig. 2B). The results revealed a specific fragment of 178 bp, suggesting that the transcription start site is located 1324 bp upstream of the ATG translation start codon. This

Table 1 Exon–intron distribution of mouse and human Rasgrf1/RASGRF1 a

Mouse
Exon   Size (kb)   Donor site    Intron   Size (kb)   Acceptor site
1      1.5         ACAGgtgaggg   1        16          tgtgcagCAAC
2      0.110       CCAGgtatgtc   2        20          tccaaagCTAC
3      0.154       AACTgtaagct   3        6.2         attgcagATTA
4      0.096       GAAGgtctgta   4        3.0         ttaccagGTTC
5      0.254       ACAGgtaagcg   5        13          ctttcagCGAG
6      0.080       CTGGgtgagta   6        2.8         tttgcagCGGA
7      0.191       CCAGgtgactg   7        0.428       ctcacagATCC
8      0.109       CCAGgtgaagt   8        3.9         tccccagGATC
9      0.119       CAAGgtagggt   9        2.1         cttacagGTTC
10     0.161       CAAGgtaaggc   10       4.0         gaaaaagTCTT
11     0.066       GAAGgtcagtg   11       0.9         ggaaaagCCAA
12     0.140       CCAGgtgagca   12       2.5         tttctagTGCG
13     0.084       TCAAgtaagtg   13       8.0         tttacagGTCT
14     0.250       CCAGgtagatg   14       2.9         ctcccagCAGG
15     0.376       CATAgtaagtg   15       3.5         cccccagGCTC
16     0.122       ACAGgtaggtt   16       2.2         tccacagAGTT
17     0.174       ACAGgtatgtg   17       0.984       ccgtcagGCTT
18     0.219       TCAGgtaagag   18       2.5         tccctagGACT
19     0.062       GATGgtaagca   19       4.3         ccctcagACGG
20     0.104       ATGAgtttgtg   20       1.3         cctgcagAGAG
21     0.085       CCATgtgagta   21       2.4         cttaaagGTCA
22     0.198       GCAGgtgcgga   22       4.2         cttccagACGA
23     0.080       GAAAgtaagtg   23       3.5         ttccaagTTGT
24     0.118       AATGgtaagtc   24       1.1         cctgcagATCT
25     0.069       AAAGgtaagag   25       5.9         ctttcagGTAA
26     0.900       -             -        -           -

Human
Exon   Size (kb)   Donor site    Intron   Size (kb)   Acceptor site   % homology
1      >0.264      ACAGgtgaggc   1        0.760       gatacagCATT     88
2      0.107       CCAGgtatgtc   2        5.8         tccacagCTAC     83
3      0.148       AGAGgtgactg   3        8.4         gttgcagATCA     84
4      0.093       GAAGgtcagtg   4        2.7         tgctcagGTGC     93
5      0.253       ACAGgtgagca   5        9.9         ctttcagCGAA     92
6      0.080       CTGGgtgagta   6        2.8         cttccagCTGA     90
7      0.194       CCGAgtggcgt   7        0.611       cccgcagATCC     90
8      0.109       CCAGgtgaggc   8        3.8         cccacagAATA     90
9      0.057       CAAGgtatggc   9        2.3         cccacagGTTC     89
10     0.161       CAAGgtaaggg   10       5.3         ttttcagAATG     84
11     0.064       GAAGgtgagtg   11       2.1         ctggaagCCAA     94
12     0.137       CCAGgtgagta   12       2.3         attccagTGTG     84
13     0.083       GCAGgtggggt   13       1.9         ttcacagGAGG     92
13a    0.039       GCAGgtggggt   13a      0.958       ctctcagGTCC     0
14     0.258       TGAGgtgccct   14       1.9         cccacagGTCG     87
15     0.380       CAGAgtgagtg   15       2.1         ctcccagGCTC     81
16     0.126       TCAGgtacaat   16       2.6         tctgcagAGTT     70
17     0.171       GCAGgtatgtg   17       1.0         ctgccagGGTT     85
18a    0.107       TCAGgtgggtg   18a      1.0         ccggcagGACT     92
18b    0.113       TCAGgtaaagg   18b      2.4         tgcctagGACT     84
19     0.061       GATGgtgagct   19       3.9         aaaccagGCTG     70
20     0.075       ATGAgtaagtg   20       1.4         cctacagGGAG     91
21     0.198       TGACgtgagta   21       5.3         cttaaagATCA     88
22     0.198       GCAGgtgcggg   22       4.2         cttctagACTA     88
23     0.081       AAAAgtaagtg   23       7.3         ttccaagTTGT     93
24     0.117       GATGgtgcgtt   24       1.3         cccacagATAT     92
25     0.069       AAAGgtaagct   25       2.1         ttctcagGTAA     94
26     0.136       -             -        -           -               87

a Exon sequences are given in uppercase letters and intron sequences are given in lowercase letters. Invariant splice acceptor (ag) and splice donor (gt) bases are shown in bold. Exon and intron sizes are depicted. The two additional exons in the human RASGRF1 gene are underlined. The homologies between exons of both genes are indicated in the last column.


Fig. 2. Identification of the transcription start site for Rasgrf1. (A) Structure of the 5′-flanking region of the mouse Rasgrf1 gene. The transcription start site was determined by primer extension assay and is indicated by an arrow. The ATG initiator and the EcoRI site used to generate the reporter constructs are indicated in bold. The location of the primer used in primer extension, pe4, is indicated. The locations of Oct1, AP1, and GC-box are given. The brackets indicate the sequence included in clone 84 (MGI:31822). (B) Primer extension analysis of the transcriptional start site of the mouse Rasgrf1. Lane 1 shows the primer-extended product synthesized with 30 μg of total RNA from adult mouse brain, using pe4 primer. The arrows indicate the major transcription start site at 178 bp and two minor transcription start sites at 166 and 161 bp. (C) Schematic representation of the luciferase reporter constructs. The top drawing depicts the original clone (clone 84) that was used to generate the transfection clones. Restriction sites for B-BamHI, and E-EcoRI are given. The transcription start sites are indicated by arrows. The deletion constructs are shown and arrows indicate the orientation of the fragment. Relative luciferase activity was measured 48 h after transfection in neuro-2a cells by the calcium phosphate method. Data are shown as fold increase or decrease relative to the activity of the empty vector (pGL2), which was set as 1.
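The band sizes in panel B can be converted into positions relative to the major start site with simple arithmetic: a product that is n nucleotides shorter than the major 178-nucleotide product begins n nucleotides further downstream. The following is a minimal sketch, assuming the conventional numbering in which the major start site is +1 and there is no position 0; the helper name `start_position` is hypothetical.

```python
# Length of the major primer-extension product reported in Fig. 2B.
MAJOR_PRODUCT_NT = 178


def start_position(product_nt: int, major_nt: int = MAJOR_PRODUCT_NT) -> int:
    """Map a primer-extension product length to a start-site coordinate.

    The major start site is numbered +1. Shorter products start downstream
    (positive coordinates), longer products upstream (negative coordinates);
    position 0 is skipped by convention.
    """
    offset = major_nt - product_nt
    return offset + 1 if offset >= 0 else offset


for size in (178, 166, 161):
    print(f"{size} nt product -> position {start_position(size):+d}")
```

With the three band sizes from the caption this gives +1, +13 and +18, consistent with the major site and the two minor start sites described in Section 3.2.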


position is referred to as position +1 in Fig. 2A. Two fainter bands at positions +13 and +18 were visible, indicating putative alternative transcription start sites. The sequence upstream of the transcription start site is GC-rich and lacks the conventional TATA-box, Inr and CAAT-box elements. A GC-box, typical for TATA-less promoter sequences, is found at position −25. The 1324 bp mouse 5′ UTR shows high homology with the human sequence, indicating that human exon 1 is larger than previously predicted based on the published cDNA sequence (see dot plot of mouse and human sequences in Fig. 1B).

3.3. Promoter characterization

Various promoter prediction programs indicated possible promoter elements within genomic clone 84, which contains the 5′ end of Rasgrf1 (regions A and B, Fig. 2C) and a portion of the 5′ UTR (region C, Fig. 2C). Sequence analysis identified a CpG island and dot plot analysis indicated a high degree of homology between the mouse and human sequences for 2.8 kb upstream of the translation start site. These analyses suggest that this region harbors the promoter sequence for Rasgrf1/RASGRF1 (Fig. 1B). It is interesting to note that this region contains a (TG)n repeat that is conserved between mouse and human but longer in the human sequence.

To further characterize this predicted promoter region for Rasgrf1, several luciferase reporter plasmid constructs containing various fragments upstream of the 5′ UTR were generated in pGL2 and tested for promoter activity in the murine neuroblastoma cell line (see Section 2.3 for details and Fig. 2C). Each construct was transfected together with the pCMV-β-gal vector. The β-gal activity was used as an internal control to normalize for the relative transfection efficiency, while the relative promoter activity was normalized to the results obtained from transfection with the empty vector, pGL2. Construct 84f, including regions A, B and C (see Fig. 2C), exhibited a ten-fold higher level of luciferase activity in 5′–3′ orientation than the empty vector. This activity was not seen with construct 84r, which has the same insert in the opposite orientation (see Fig. 2C). These data indicate that this 1270 bp sequence upstream of the transcription start site contains the promoter for the Rasgrf1 gene. Deletion of region A, position −1270 to −489 (construct 84ΔAf), resulted in about a 40% reduction of luciferase activity. Thus, region A contains elements required for increased expression of the locus but which by themselves are not sufficient for expression above the pGL2 control (see constructs 84ΔBCr, 84ΔBCf). These data are consistent with enhancer activity within region A. Furthermore, construct 84ΔACf, containing region B only, supports this assumption by showing reduced promoter activity. Taken together, these data suggest that region B contains the essential promoter of Rasgrf1. In addition, the orientation independence of regions B and C (−489 to +1130) suggests that region B contains a bi-directional promoter, and the results from construct 84ΔACr indicate a stronger activity of this bi-directional promoter element in the reverse orientation.
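The two normalization steps described above (luciferase activity divided by β-gal activity, then expressed relative to the empty pGL2 vector) can be sketched as follows. This is an illustration of the arithmetic only: the function name is our own, and the raw readings are invented placeholders chosen to make the example run, not measurements from this study.

```python
def fold_activity(luc: float, bgal: float,
                  luc_empty: float, bgal_empty: float) -> float:
    """Return promoter activity as fold change over the empty vector.

    Step 1: divide luciferase by beta-gal to correct for transfection efficiency.
    Step 2: divide by the same ratio for the empty vector, which is thereby set to 1.
    """
    if bgal <= 0 or bgal_empty <= 0 or luc_empty <= 0:
        raise ValueError("control readings must be positive")
    normalized = luc / bgal
    baseline = luc_empty / bgal_empty
    return normalized / baseline


# Hypothetical raw readings in arbitrary units: (luciferase, beta-gal).
readings = {
    "pGL2 (empty vector)": (200.0, 1.0),
    "construct_X": (3000.0, 1.5),
}

luc0, bgal0 = readings["pGL2 (empty vector)"]
for name, (luc, bgal) in readings.items():
    print(f"{name}: {fold_activity(luc, bgal, luc0, bgal0):.1f}-fold")
```

Dividing by the β-gal reading first means that a well with higher transfection efficiency is not mistaken for one with a stronger promoter.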

3.4. Mouse A19: mapping, Northern blot and exon–intron structure

Blast searches with exon-trap clone 19-1, previously identified in an exon trapping experiment using P1 clone sp4P1 (Plass et al., 1996), identified homology to a full-length cDNA clone A19 (Accession number: AK015891). This clone has a 1454 bp insert and was completely sequenced. Hybridization of exon-trap clone 19-1 as probe to the BAC contig shows that this novel gene is located about 40 kb upstream of Rasgrf1 and 10 kb upstream of the Rasgrf1 repeat sequence and DMR. A19 is transcribed in the same direction as Rasgrf1 (see Fig. 3A). Hybridization of the same probe against a mouse multiple tissue Northern blot showed two transcripts in testis RNA of 1.4 and 0.9 kb, respectively (Fig. 3B). RT-PCR on a set of adult and neonatal tissue RNAs confirmed the high expression in testis, but also identified low level expression in both adult and neonatal brain (Fig. 3C).

Next we determined the exon–intron structure for A19. Six exons were identified, all with exon–intron boundary sequences following the consensus sequence (Fig. 4A and Table 2). The last exon contains the polyadenylation signal (AATAAA). The A19 gene spans about 30 kb of genomic sequence and shows only a very short putative open reading frame (ORF) from nt 547 to 831 coding for a putative protein of 95 amino acids. The putative translation start site is not in good agreement with the published Kozak consensus sequence (Kozak, 1987), indicating that the transcript from this novel gene most likely is non-coding. RT-PCR reactions from testis RNA with various primer combinations identified alternatively spliced products eliminating exon 5 or part of exon 1, or a combination of both, resulting in four different transcripts of 1.4, 1.3, 0.9 and 0.8 kb in size (see Fig. 4A). None of the transcript forms has a larger ORF than the 1.4 kb transcript or a Kozak consensus translation start site (see Fig. 4A).
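Two of the statements above lend themselves to a quick check. The ORF coordinates (nt 547 to 831) span 285 nucleotides, which corresponds to 95 codons. The strength of a start-codon context is commonly judged by the two most influential Kozak positions, a purine at −3 and a G at +4 (with the A of ATG as +1). The sketch below is illustrative only: the text does not state which criterion was applied to A19, the example sequences are made up and are not the A19 sequence, and the function names are our own.

```python
def orf_codons(start_nt: int, end_nt: int) -> int:
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    length = end_nt - start_nt + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


def kozak_context(seq: str, atg_index: int) -> str:
    """Classify the context of an ATG as 'strong', 'adequate' or 'weak'.

    atg_index is the 0-based index of the A in ATG. Position -3 is three
    bases upstream of that A; position +4 is the base right after the G.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("not enough flanking sequence")
    purine_at_minus3 = seq[atg_index - 3] in "AG"
    g_at_plus4 = seq[atg_index + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


print(orf_codons(547, 831))                # codons spanned by nt 547-831
print(kozak_context("GCCACCATGGCG", 6))    # hypothetical consensus-like context
print(kozak_context("TTTTTTATGCTT", 6))    # hypothetical poor context
```

A start codon classified as weak by this rule would be in line with the description of a translation start site that does not agree well with the Kozak consensus.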

Table 2 Exon–intron structure of mouse A19 a

Exon   Size (bp)   Donor site    Intron   Size (kb)   Acceptor site
1      720         TATAgtacgtg   1        3.0         tgcacagACAC
2      205         AGAGgtgggta   2        2.9         tatacagGTAT
3      120         TAAGgtgagtc   3        19.8        ttctcagCTGC
4      109         GTAGgtttatc   4        1.86        cttctagGCCA
5      75          CACAgtctcgt   5        5.5         tcctcagGTAC
6      220         -             -        -           -

a Exon sequences are given in uppercase letters and intron sequences are given in lowercase letters. Invariant splice acceptor (ag) and splice donor (gt) bases are shown in bold. Exon and intron sizes are depicted.


Fig. 3. Mapping and Northern blot of the A19 gene. (A) Map of the Rasgrf1 and A19 genes. Recognition sites of restriction enzymes are depicted as vertical lines. BAC clone numbers are given next to each BAC. The black arrows indicate the Sp4 repeat, and the locations of the DMR and the CpG island are given. The methylated NotI site is depicted in black. The orientation of transcription and the genomic sizes for Rasgrf1 and A19 genes are indicated by arrows. (B) Northern blot analysis of the A19 gene. Poly(A)+ mRNA Northern hybridized with exon-trap clone 19-1 following the manufacturer's instructions shows expression of two splicing variants of A19 (1.4 and 0.9 kb, respectively) in mouse adult testis RNA. The Northern blot was rehybridized with a GAPDH probe as a loading control. (C) RT-PCR for the detection of A19 expression. G3pdh served as a control for the RNA. Lanes: M, size marker; 1, heart; 2, brain; 3, spleen; 4, lung; 5, liver; 6, kidney; 7, testis (PWK); 8, testis; 9, skin; 10, skeletal muscle; 11, neonatal heart; 12, neonatal brain; 13, neonatal spleen; 14, neonatal lung; 15, neonatal liver; 16, neonatal kidney; 17, neonatal skin; 18, neonatal bone and skeletal muscle; (−), negative control. All RNAs are derived from the 129 strain with the exception of RNA in lane 7, which was derived from PWK. RNA from adult skin (lane 9) was of low quality.

3.5. Mouse A19 is paternally expressed in brain, but not in testis

In order to investigate the expression of the A19 gene, RT-PCR assays were performed on adult testis RNAs from C57BL/6, PWK, 129/C57BL/6 × PWK and PWK × 129/C57BL/6. RT-PCR experiments using end-labeled primers A19F2 and A19R2 were carried out, followed by a restriction digest with BpmI (Fig. 4B). Following BpmI restriction digestion, the C57BL/6 and 129 alleles are cut once, whereas the PWK allele is cut twice (Fig. 4A,B). RT-PCRs for testis RNAs from F1 animals show both alleles, indicating biallelic expression of the A19 gene. However, the RNA from brain tissue shows expression of the paternal allele only (Fig. 4C). This result was confirmed by sequencing the RT-PCR product. The C57BL/6 sequence contains a C while the PWK sequence contains a T at position 876 of the sequence. This polymorphism results in a novel restriction site for BpmI (CTCCAG) in the PWK allele.

We also tested the interspecific cross between Mus spretus and Mus musculus castaneus. Both strains were polymorphic for the BpmI site. While the Mus musculus castaneus allele contains the BpmI (CTCCAG) restriction site, it is lacking in the Mus spretus allele. RT-PCR using RNAs isolated from brain tissues showed that the paternal copy of A19 is expressed (Fig. 4D). Further confirmation came from testing A19 expression in brain RNAs from heterozygotes for the reciprocal translocation [T(4;9)45H] that had either a paternal or maternal duplication of chromosome 9 distal to the breakpoint in 9D. Rasgrf1 maps distal to the breakpoint (Cattanach and Beechey, unpublished data) and thus this distal region of chromosome 9 includes the A19 gene. A19 was only expressed in animals with a paternal duplication, but not in animals with a maternal duplication of this region (data not shown).
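The restriction-based genotyping logic can be sketched in a few lines. BpmI recognizes 5′-CTGGAG-3′, whose reverse complement is the CTCCAG motif quoted in the text, so both strands have to be searched. The flanking bases in the two toy fragments below are invented for illustration; only the CTCCAG versus CCCCAG difference is taken from the text, and the function names are our own.

```python
# BpmI recognition sequence (top strand); CTCCAG is its reverse complement.
BPMI_SITE = "CTGGAG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_sites(seq: str, site: str = BPMI_SITE) -> int:
    """Count occurrences of a recognition site on either strand of seq."""
    seq = seq.upper()
    motifs = {site, reverse_complement(site)}
    width = len(site)
    return sum(seq[i:i + width] in motifs for i in range(len(seq) - width + 1))


# Hypothetical fragments around the polymorphic position (flanks are made up).
allele_with_site = "AAGT" + "CTCCAG" + "TTCA"     # T allele, e.g. PWK or castaneus
allele_without_site = "AAGT" + "CCCCAG" + "TTCA"  # C allele, e.g. C57BL/6, 129 or spretus

print(count_sites(allele_with_site), count_sites(allele_without_site))
```

Because a linear PCR product with n sites is cut into n + 1 fragments, the allele carrying the extra site gives one additional band after digestion, which is what distinguishes the parental alleles on the gel.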

4. Discussion

In this work, we have studied the structural organization of the mouse and human Rasgrf1/RASGRF1 genes and identified a novel, imprinted mouse gene, A19, located in this region. The mouse Rasgrf1 gene is composed of 26 exons and displays an overall organization similar to the human RASGRF1 gene although the human locus contains 28 exons. The sizes of exons and intron–exon junctions are highly conserved between both genes. However, sequence comparison indicated that the human gene possesses two unique features. The human exons 18a and 18b correspond to mouse exon 18. Furthermore, this gene bears an extra exon, called 13a, and analysis of the putative amino acid sequences indicated that the addition of 13a does not affect

Fig. 4. Allele expression analysis of the A19 gene. (A) Genomic organization of A19. The figure shows the structure of four alternative transcripts of the A19 gene. The exon numbers are indicated above each box. The black boxes indicate the largest putative open reading frames for each transcript. The location of the BpmI restriction sites is indicated (*polymorphic BpmI site). Arrows indicate the location of RT-PCR primers. (B) RT-PCR analysis in C57BL/6, PWK, F1(129/C57BL/6 × PWK) (mother × father) animals and F1(PWK × 129/C57BL/6) (mother × father) animals. RT-PCR with end-labeled primers A19F2 and A19R2 was used with either testis (t) or brain (b) RNAs as templates. Following a restriction digest with BpmI (+) or undigested (−), PCR products were separated on an acrylamide gel. The PWK allele has two BpmI sites whereas the C57BL/6 and 129 alleles have only a single BpmI site in the amplified product. (C) Chromatograms from sequences generated from PCR products amplified from C57BL/6, PWK, F1(129/C57BL/6 × PWK) or F1(PWK × 129/C57BL/6). The (*) indicates the polymorphic BpmI site. (D) RT-PCR analysis in Mus spretus (Spret), Mus musculus castaneus (Cast), F1(Cast × Spret) (mother × father) and F1(Spret × Cast) (mother × father). RT-PCR with end-labeled primers A19F2 and A19R2 was used with brain (b) RNAs as templates. Following a restriction digest with BpmI (+) or undigested (−), PCR products were separated on an acrylamide gel. The Cast allele has two BpmI sites whereas the Spret allele has only a single BpmI site in the amplified product.


the conserved protein domains or function. Analysis of the sequences upstream of exon 1 indicated that the mouse and human 5′ sequences are highly homologous, consistent with the presence of important regulatory elements in this region.

In previous work, Cen et al. (1992) reported at least six Rasgrf1 cDNAs with different 5′ ends. The authors suggested that these clones could have arisen by alternative splicing and/or different promoter usage. The cDNA sequence with a size of 4174 bp, represented by clone λHC4.17, showed the longest ORF. This clone included 225 bp of the 5′ UTR sequence. Our primer extension analysis suggests that the majority of transcription begins 1324 bp upstream of the translation start codon. The length of the 5′ UTR (1324 bp), together with the ORF and the 3′ UTR (3949 bp) and an estimated poly(A) tail of about 300 bp, adds up to the full transcript size of 5.5 kb seen in Northern blot experiments (Cen et al., 1992). Careful comparison of our sequence data with the cDNAs published by Cen et al. (1992) provides an explanation for these apparently conflicting data. While λHC2.13 does not have homology with the cDNA sequence, clones λHC1, λHC2.30, λHC2.20 and λHC5.20 show partial homologies with exons 10–13, exons 12 and 13, exons 8 and 9, and exons 1 and 2, respectively. The borders of these sequences do not end at consensus splice sites. In addition, analysis of the 5′ ends of these sequences identified homology with pBluescript vector (clone λHC1) or LINE elements (λHC5.20), indicating that these clones do not contain a complete cDNA sequence and that they are most likely cloning artifacts and not alternatively spliced isoforms of the Rasgrf1 gene.

We have mapped the major transcription start site in the mouse Rasgrf1 gene to a position located 1324 bp upstream of the ATG translation start codon. The promoter sequence immediately upstream of the gene lacks both a canonical TATA-box and an Inr consensus sequence.
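The transcript-length argument above is simple arithmetic and can be checked with a short Python sketch. All numbers are taken from the text; the variable and function names are illustrative only. Reading the 3949 bp figure as the 4174 bp clone minus its 225 bp of 5′ UTR is an inference from the quoted numbers, not something the text states explicitly.

```python
# Figures quoted in the Discussion, all in base pairs.
FIVE_PRIME_UTR = 1324        # major start site to ATG (primer extension)
CLONE_LENGTH = 4174          # cDNA clone with the longest ORF
CLONE_FIVE_PRIME_UTR = 225   # 5' UTR sequence contained in that clone
POLY_A_ESTIMATE = 300        # estimated poly(A) tail


def orf_plus_three_prime_utr(clone_length: int, clone_utr: int) -> int:
    """ORF + 3' UTR length, assumed here to be the clone minus its 5' UTR."""
    return clone_length - clone_utr


def estimated_transcript_length(utr5: int, orf_utr3: int, poly_a: int) -> int:
    """Sum of 5' UTR, ORF + 3' UTR and poly(A) tail."""
    return utr5 + orf_utr3 + poly_a


orf_utr3 = orf_plus_three_prime_utr(CLONE_LENGTH, CLONE_FIVE_PRIME_UTR)
total = estimated_transcript_length(FIVE_PRIME_UTR, orf_utr3, POLY_A_ESTIMATE)

print(orf_utr3)          # 3949
print(total)             # 5573
print(total / 1000)      # length in kb, to compare with the 5.5 kb Northern band
```

The sum of 5573 bp is close to the 5.5 kb band reported by Northern blot, given that the poly(A) length is only an estimate.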
Usually, genes whose promoter regions contain neither a TATA-box nor an Inr are ubiquitously expressed and often exhibit several transcription start sites (Ince and Scotto, 1995). In the case of the Rasgrf1 gene, we identified a main transcription start site 1324 bp upstream of the translation initiator codon and two neighboring minor transcription start sites. It is interesting to note that the promoter is located in a CpG island region. Previous work has shown that a NotI site located 3′ to this region is not methylated (Plass et al., 1996).

Transfection assays identified the minimum promoter region for Rasgrf1 within a 489 bp sequence upstream of the major transcription start site. Several additional points are of special interest. First, construct 84f, containing regions A, B and C, showed ten-fold higher promoter activity in the 5′–3′ orientation than the same construct in the 3′–5′ orientation (84r). The shorter constructs 84ΔAf (5′–3′) and 84ΔAr (3′–5′), both lacking region A, showed about 40% less promoter activity than the 84f construct. These results suggest that region A contains an enhancer element that increases promoter activity to the level seen in construct 84f. Second, 84ΔAr shows promoter activity similar to that of 84ΔAf (the same sequence in the opposite orientation), a finding that could be explained by the bi-directional activity of promoter sequences as described, for example, for the thymidylate synthase promoter (Dong et al., 2000). Moreover, the 84ΔCf construct supports this result. The activity of this enhancer blocker is visible in construct 84r, where the enhancer blocker would be located between the enhancer and the transcription start site and would thus abolish the activity. In the construct that lacks region A (84ΔAr), and thus both enhancer and enhancer blocker, the activity seen comes only from the bi-directional promoter. This putative enhancer blocker element could serve to prevent some of the putative upstream regulatory elements from acting upon Rasgrf1 in certain tissues and may provide the basis for relaxation of imprinting in tissues where Rasgrf1 is not imprinted. However, the description of the putative enhancer blocker element needs further investigation.

Clusters of imprinted genes may extend over several hundred kilobase pairs. Often both imprinted and biallelically expressed genes are found in these regions (Paulsen et al., 2000). Rasgrf1 and A19 remain the only imprinted genes identified on mouse chromosome 9 so far, and future studies will show whether additional imprinted genes are located in this region. A19 is located 40 kb upstream of the Rasgrf1 gene and 10 kb from the Rasgrf1 repeat sequence and DMR. A19 is expressed from both alleles in adult mouse testis and from the paternal allele in mouse brain. The gene has multiple alternatively spliced forms and most likely is not translated, suggesting a putative function at the RNA level. No human counterpart of A19 could be identified in the current databases.
Non-coding RNAs have been described in imprinted regions, including H19 (Ariel et al., 1997), Gtl2 (Schmidt et al., 2000) and Air (Sleutels et al., 2002); these genes also show imprinted expression and are involved in the complex regulation of their respective imprinting clusters. Further studies could identify the role of the A19 gene and define the mechanisms of its imprinted expression.

Acknowledgements

The authors would like to thank Dr Smiraglia for critically reading the manuscript. This work was supported in part by grants P30 CA16058 and GM58269 (C.P.) from the National Institutes of Health, Bethesda, MD. The authors thank the Genome Science Center at RIKEN, Japan, for the mouse full-length cDNA clone (accession number AK015891).

References

Ariel, I., Ayesh, S., Perlman, E.J., Pizov, G., Tanos, V., Schneider, T., Erdmann, V.A., Podeh, D., Komitowski, D., Quasem, A.S., de Groot, N., Hochberg, A., 1997. The product of the imprinted H19 gene is an oncofetal RNA. Mol. Pathol. 50, 34–44.
Beechey, C.V., Cattanach, B.M., Blake, A., 2001. MRC Mammalian Genetics Unit, Harwell, Oxfordshire. World Wide Web Site – Mouse Imprinting Data and References (http://www.mgu.har.mrc.ac.uk/imprinting/imprinting.html).
Brambilla, R., Gnesutta, N., Minichiello, L., White, G., Roylance, A.J., Herron, C.E., Ramsey, M., Wolfer, D.P., Cestari, V., Rossi-Arnaud, C., Grant, S.G., Chapman, P.F., Lipp, H.P., Sturani, E., Klein, R., 1997. A role for the Ras signalling pathway in synaptic transmission and long-term memory. Nature 390, 281–286.
Cen, H., Papageorge, A.G., Zippel, R., Lowy, D.R., Zhang, K., 1992. Isolation of multiple mouse cDNAs with coding homology to Saccharomyces cerevisiae CDC25: identification of a region related to Bcr, Vav, Dbl and CDC24. EMBO J. 11, 4007–4015.
Dong, S., Lester, L., Johnson, L.F., 2000. Transcriptional control elements and complex initiation pattern of the TATA-less bidirectional human thymidylate synthase promoter. J. Cell. Biochem. 77, 50–64.
Engemann, S., Strodicke, M., Paulsen, M., Franck, O., Reinhardt, R., Lane, N., Reik, W., Walter, J., 2000. Sequence and functional comparison in the Beckwith-Wiedemann region: implications for a novel imprinting centre and extended imprinting. Hum. Mol. Genet. 9, 2691–2706.
Ince, T.A., Scotto, K.W., 1995. A conserved downstream element defines a new class of RNA polymerase II promoters. J. Biol. Chem. 270, 30249–30252.
Itier, J.M., Tremp, G.L., Leonard, J.F., Multon, M.C., Ret, G., Schweighoffer, F., Tocque, B., Bluet-Pajot, M.T., Cormier, V., Dautry, F., 1998. Imprinted gene in postnatal growth role. Nature 393, 125–126.
Kozak, M., 1987. An analysis of 5′-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res. 15, 8125–8148.
Maher, E.R., Reik, W., 2000. Beckwith-Wiedemann syndrome: imprinting in cluster revisited. J. Clin. Invest. 105, 247–252.
Martegani, E., Vanoni, M., Zippel, R., Coccetti, P., Brambilla, R., Ferrari, C., Sturani, E., Alberghina, L., 1992. Cloning by functional complementation of a mouse cDNA encoding a homologue of CDC25, a Saccharomyces cerevisiae RAS activator. EMBO J. 11, 2151–2157.
Nicholls, R.D., Saitoh, S., Horsthemke, B., 1998. Imprinting in Prader-Willi and Angelman syndromes. Trends Genet. 14, 194–200.
Paulsen, M., El-Maarri, O., Engemann, S., Strodicke, M., Franck, O., Davies, K., Reinhardt, R., Reik, W., Walter, J., 2000. Sequence conservation and variability of imprinting in the Beckwith-Wiedemann syndrome gene cluster in human and mouse. Hum. Mol. Genet. 9, 1829–1841.
Pearsall, R.S., Plass, C., Romano, M.A., Garrick, M.D., Shibata, H., Hayashizaki, Y., Held, W.A., 1999. A direct repeat sequence at the Rasgrf1 locus and imprinted expression. Genomics 55, 194–201.
Plass, C., Shibata, H., Kalcheva, I., Mullins, L., Kotelevtseva, N., Mullins, J., Kato, R., Sasaki, H., Hirotsune, S., Okazaki, Y., Held, W.A., Hayashizaki, Y., Chapman, V.M., 1996. Identification of Grf1 on mouse chromosome 9 as an imprinted gene by RLGS-M. Nat. Genet. 14, 106–109.
Reik, W., Walter, J., 2001. Genomic imprinting: parental influence on the genome. Nat. Rev. Genet. 2, 21–32.
Schmidt, J.V., Matteson, P.G., Jones, B.K., Guan, X.J., Tilghman, S.M., 2000. The Dlk1 and Gtl2 genes are linked and reciprocally imprinted. Genes Dev. 14, 1997–2002.
Sears, R., Ohtani, K., Nevins, J.R., 1997. Identification of positively and negatively acting elements regulating expression of the E2F2 gene in response to cell growth signals. Mol. Cell. Biol. 17, 5227–5235.
Sleutels, F., Zwart, R., Barlow, D.P., 2002. The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature 415, 810–813.
Sturani, E., Abbondio, A., Branduardi, P., Ferrari, C., Zippel, R., Martegani, E., Vanoni, M., Denis-Donini, S., 1997. The Ras guanine nucleotide exchange factor CDC25Mm is present at the synaptic junction. Exp. Cell Res. 235, 117–123.
Wolffe, A.P., 2000. Transcriptional control: imprinting insulation. Curr. Biol. 10, R463–R465.
Yoon, B.J., Herman, H., Sikora, A., Smith, L.T., Plass, C., Soloway, P.D., 2002. Regulation of DNA methylation of Rasgrf1. Nat. Genet. 30, 92–96.
Zippel, R., Gnesutta, N., Matus-Leobovitch, N., Mancinelli, E., Saya, D., Vogel, Z., Sturani, E., Renata, Z., Nerina, G., Noa, M.L., Enzo, M., Daniella, S., Zvi, V., Emmapaola, S., 1997. Ras-GRF, the activator of Ras, is expressed preferentially in mature neurons of the central nervous system (published erratum appears in Brain Res. Mol. Brain Res., 1997, 52(1), 170). Brain Res. Mol. Brain Res. 48, 140–144.