Construction of a Transcription Map around the Gene for Ataxia Telangiectasia: Identification of at Least Four Novel Genes

GENOMICS 40, 267–276 (1997)
ARTICLE NO. GE964595

TATJANA STANKOVIC, PHILIP J. BYRD, PAUL R. COOPER, CARMEL M. MCCONVILLE, DAVID J. MUNROE,* JOHN H. RILEY,† GILES D. J. WATTS, HELEN AMBROSE, GERMAINE MCGUIRE, ALEXANDRA D. SMITH, ANDREW SUTCLIFFE, TRACY MILLS, AND A. MALCOLM R. TAYLOR¹

CRC Institute for Cancer Studies, The Medical School, University of Birmingham, Birmingham, B15 2TA, United Kingdom; *Center for Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; and †Zeneca, Alderley Park, Nr Macclesfield, Cheshire, SK10 4TG, United Kingdom

Received August 21, 1996; accepted December 16, 1996

We have constructed YAC, PAC, and cosmid contigs in the ataxia–telangiectasia gene region and used the assembled clones to isolate expressed sequences by exon trapping and hybridization selection. In the interval between D11S1819 and D11S2029, exons and cDNAs for potentially 13 different genes were identified. Three of these genes, F37, K28, and 6.82, are large novel genes expressed in a variety of tissues. K28 shows sequence homology to the Rab GTP-binding protein family, and gene 6.82 shows homology to the rabbit vasopressin-activated calcium mobilizing receptor, while gene F37 has no homology to any known sequence in the databases. Three further clones from the interval between D11S1819 and D11S2029, exon 6.41 and cDNAs K22 and E74, appear to be expressed endogenous retrovirus sequences. The fourth large novel gene, E14, together with two further possible novel genes, E13 and E3, was identified from exons and cDNAs in the more telomeric 300-kb interval between markers D11S2029 and D11S2179. These are in addition to the genes for mitochondrial acetoacetyl-CoA acetyltransferase (ACAT) and ATM in the same region. Genes E3, E13, and E14 do not show homology to any known genes. K28, 6.82, ACAT, and ATM all appear to have the same transcriptional orientation, toward the telomere. © 1997 Academic Press

INTRODUCTION

The isolation and accurate mapping of genes is the major goal of the human genome mapping initiative. The aim and expectation of this initiative is to develop a transcription map of such density that positional cloning of different disease genes will become an easy task (Collins, 1995). High-density transcription maps are currently being constructed in two ways: first, through systematic identification of transcribed sequences across the whole genome and, second, through positional cloning projects aimed at specific disease genes. While systematic searches for transcribed sequences (expressed sequence tag maps) provide partial sequences for a large number of genes, transcription maps derived from individual positional cloning projects are much more detailed, as they usually provide information not only on the number of genes in a particular region but also on their orientation, order, distribution, and evolutionary conservation, as well as on their pattern of expression in different tissues.

The gene for the recessively inherited disorder ataxia–telangiectasia (A-T) was cloned recently (Savitsky et al., 1995a). Its identification followed several years of genetic and physical mapping studies in different laboratories. A YAC contig of the A-T gene region (Rotman et al., 1994; Arai et al., 1996) and a genomic map (Ambrose et al., 1994) have been published previously. We have developed a refined physical map of 800 kb–1 Mb of the A-T region and have derived a number of regional transcripts. Our map contains contigs built with three cloning systems: YACs, cosmids, and PACs. Various methods for deriving transcripts have been developed and described in detail. We used mainly exon amplification (Buckler et al., 1991; Church et al., 1994; Burfoot and Campbell, 1994), direct selection (Lovett et al., 1991), and screening of arrayed libraries (Munroe et al., 1995), but we also used evolutionary conservation and CpG island identification (Larsen et al., 1992; Rouleau et al., 1993). By combining these approaches, we assembled a transcription map of a region that includes the ATM gene and several novel, widely expressed genes.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession Nos. X81882, X99961, X99962, and Y10269–Y10274.

¹ To whom correspondence should be addressed. Telephone: 0121-414-4488. Fax: 0121-414-4486.
0888-7543/97 $25.00
Copyright © 1997 by Academic Press
All rights of reproduction in any form reserved.

MATERIALS AND METHODS

Construction of YAC, cosmid, and PAC contigs. YACs from the region of the A-T gene were mostly obtained by screening the ICI YAC library (Anand et al., 1990) using regional STSs (Ambrose et al., 1994) developed from the ends of YACs (Riley et al., 1990). Additional YACs were identified by screening the chromosome 11 YAC library (Qin et al., 1993) with Alu-vectorette PCR products from the regional YACs, and mega-YACs were obtained by screening a mega-YAC library (Bellanne-Chantelot et al., 1992) with the same products. The degree of overlap between individual YACs was estimated by STS mapping and by comparison of YAC restriction maps. Orientation of the YAC contigs was assessed by pulsed-field gel electrophoresis (PFGE). The two YAC clones 20FE12 and 6E9, which are 720 and 450 kb, respectively, were used to generate human cosmid libraries. High-molecular-weight DNA was prepared from these YACs, partially digested with Sau3A, and ligated to the arms of the cosmid vector SuperCos 1 (Stratagene). The ligated DNA was packaged with λ phage packaging extracts (Stratagene) and propagated in the host NM554 (Stratagene), and the library was plated out. Colonies were lifted onto filters, which were hybridized with total human DNA to identify human clones. Additional cosmid clones were obtained by screening the chromosome 11 cosmid library (Los Alamos) with total Alu-PCR products of the YACs from the A-T region. Cosmid walking combined two methods. First, cosmids were screened for the presence of regional STSs, and initial minicontigs were obtained. The end clones of each minicontig were then used as hybridization probes to screen the remaining cosmids not previously assigned by PCR assay. Typically, several rounds of hybridization were necessary to join the initial minicontigs. A PAC library of 120,000 clones with an average insert size of 120 kb (Ioannou et al., 1994) was screened with eight cosmids that had been assigned to contigs from the A-T region, and PAC size was determined by PFGE.
The PAC contig was built up by STS mapping and by establishing the NotI restriction map of individual PACs.

Exon amplification. DNA from six groups of two overlapping cosmids and from the two YACs (6E9 and 13G9) of the A-T region was completely digested with BglII, BamHI, and PstI. Fifty nanograms of the digested DNA was subcloned into the splicing vector pSPL3. The ligated products were transformed into Escherichia coli DH5α, and plasmid DNA was purified from a mixture of the transformants. Transfection of these plasmid DNAs into COS7 cells, isolation of total cytoplasmic RNA, synthesis of cDNA, digestion of cDNA with BstXI, and amplification of spliced fragments were performed as described by Church et al. (1994). Amplified fragments were subcloned into pBluescript and sequenced. To eliminate artifacts of cryptic splicing, individual exons were analyzed by Southern blot hybridization with a probe corresponding to the intron sequence of the vector.

Screening of arrayed thymus and frontal cortex cDNA libraries. Two libraries, a human frontal cortex cDNA library and a thymus library (Stratagene), were arrayed by Munroe et al. (1995) and by us. PCR assays were used to identify wells positive for the exon or cDNA sequence. The PCR-positive well for each cDNA or exon was plated out at a density of 1000 plaques per plate, and plaques were lifted onto Hybond-N+ filters and hybridized with probes generated from the relevant exon or cDNA. The ExAssist Interference-Resistant Helper Phage kit from Stratagene was used for in vivo excision of the positive phage DNA. The cultures were tested by PCR for the presence of the exon or cDNA sequence, and DNA was prepared from positive clones.

Direct cDNA selection. mRNA was prepared from lymphoblastoid cell lines (LCLs) using a Pharmacia QuickPrep mRNA purification kit.
Poly(A)+ RNA (1.5 µg) from LCLs of five individuals was used as a template for oligo(dT)12–18- and random hexamer (pd(N)6)-primed cDNA synthesis (Pharmacia TimeSaver cDNA synthesis kit). A linker (linker 1), prepared as described by Futreal et al. (1994), was ligated to the cDNA, and the ligated product was column purified. Target cDNA was generated by PCR amplification of 6 µl of the ligated cDNA using a primer complementary to the linker sequence (5′-AGC AAG TTC AGC CTG GTT AAG) and purified using a QiaQuick PCR purification kit (Qiagen). Selector DNA was generated from YAC 6E9 and from three cosmid contigs containing 35 cosmids from the centromeric part of the A-T region. One hundred nanograms of DNA from each cosmid was partially digested with NdeII or Sau3A, and the digests were pooled into contig groups, purified, ligated to a linker containing an MboI site (linker 2) (Morgan et al., 1992), and column purified again. Selector DNA was generated by PCR amplification of 6 µl of the purified ligation, using a biotin-labeled primer, 5898 (5′-biotin-TGG TCT CAC GAA TTC GTC GA), complementary to the linker sequence, and purified using a QiaQuick PCR purification kit. Hybridization selection of contig-specific cDNAs was carried out by capture on streptavidin-coated magnetic beads (Promega), essentially as described by Futreal et al. (1994), except that hybridization was carried out for 24 h and the hybridization mix contained 1 µg of cDNA and 200 ng of cosmid DNA. Following two cycles of selection, eluted cDNAs were reamplified and blunt-end cloned into a Bluescript vector.

Characterization of amplified exons and selected cDNA clones. cDNA clones were PCR amplified using vector primers and used as hybridization probes on chromosome 11q22–q23-specific YACs and somatic cell hybrids to confirm their localization. Amplified exons and cDNA clones mapping to the A-T region were hybridized to the cosmid contig across the A-T region to establish their exact positions and their relation to each other. To obtain the expression patterns of individual exons, exon-specific primers were used to amplify Clontech cDNA libraries from different tissues. cDNAs and exons were also used as hybridization probes on multiple tissue Northern blots (Clontech) and Zoo blots (Clontech).

Rapid amplification of cDNA ends (RACE). Placental poly(A)+ mRNA (Clontech) was reverse transcribed using gene-specific primers.
An adaptor (Clontech Marathon cDNA kit) was ligated to the cDNA following second-strand synthesis. Long-range PCR was then performed using an adaptor-specific primer (AP1; Clontech Marathon cDNA kit) and primers specific for the individual genes: ATM (Byrd et al., 1996a), E14 (Byrd et al., 1996b), K28 (primer 9953: 5′-TAA CCC CTT CCC AGC CAT CCT GGA TAC), and 6.82 (Byrd, in press). The major amplification products were digested with a series of restriction enzymes and ligated to vectorette for sequencing.

Sequencing and analysis of cDNA and exon sequences. DNA sequencing was performed using the Applied Biosystems Prism Ready Reaction Dye Terminator Cycle Sequencing kit and an ABI 373A DNA sequencer. Sequenced exons and cDNA clones were examined for similarity to known genes by searching DNA and protein databases. DNA searches were performed on the combined GenBank 85 and EMBL 40 DNA databases using FASTA (Pearson et al., 1988). The protein database search was performed using the Blastx program of BLAST (Basic Local Alignment Search Tool).
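The identity figures reported in the Results (for example, 47.9% identity between K28 and Hu-Rab 2 over 190 amino acids) are ratios of identical aligned positions to aligned length. The Python sketch below computes percent identity from two already-aligned sequences. It assumes gaps are written as "-" and, as one common convention, ignores gapped columns; FASTA and BLAST each apply their own conventions, so this illustrates the ratio only and is not a reimplementation of either program. The example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences, ignoring gapped columns.

    Assumption: gaps are written as '-'; other tools may instead count
    gapped columns in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical toy alignment: 4 identical residues in 5 ungapped columns.
    print(percent_identity("ACDE-G", "ACDQHG"))  # 80.0
    # 91 identical residues over 190 aligned positions rounds to 47.9%.
    print(round(100.0 * 91 / 190, 1))  # 47.9
```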

RESULTS

Contigs across the A-T Region

Contigs of three types of cloned DNA (YAC, PAC, and cosmid) were constructed across 1 Mb of chromosome 11q22–q23 between markers D11S1819 and D11S1294, a region that linkage analysis had shown to be the most likely location of the A-T gene (Gatti et al., 1994). Figure 1 shows a minimal coverage of the A-T gene region with YAC, PAC, and cosmid clones. Twenty-five YACs were identified by screening three YAC libraries and mapped to the region of the A-T gene. Several of these YAC clones were shown to be chimeric or to contain repetitive insert-terminal sequences; extension of the complete YAC contig therefore required screening of more than one YAC library. Eventually, the whole A-T region was covered by three nonchimeric YACs: 6E9, 13G9, and 20FE12 (Fig. 1). Restriction mapping of the YAC and corresponding PAC clones within the A-T region identified two NotI sites, about 60 kb apart, between markers D11S2025 and D11S2026 (Fig. 1). Three CpG islands were identified: one associated with each of the two NotI sites and the third in the region 300 kb distal to the telomeric NotI site. The 60-kb region between the two NotI sites appeared to be both repetitive and unstable, resulting in rearrangements and deletions in many YAC and cosmid clones containing this region. In contrast, the PAC clones seemed to be unaffected by this genomic instability.

FIG. 1. Transcriptional map of the ataxia–telangiectasia region, showing the distribution of genes and individual transcripts in relation to the cosmid, PAC, and YAC contigs. Regional markers are shown at the top in centromeric-to-telomeric order from left to right; for brevity, "D11" is omitted from the designation of some markers. The region between the two NotI sites is also indicated at the top by a bold horizontal bar. The minimal representation of YAC clones across the A-T region includes YACs 6E9 and 13G9 from the chromosome 11 library (Qin et al., 1993) and YAC 20FE12 from the ICI library (Anand et al., 1990). PAC clones (32K14, 161D22, 214I17, 51A6, 94C11, 22P10, 314E20) were derived from the PAC library (Ioannou et al., 1995), and cosmids 131H11, 104H4, 94G8, 43B2, 31G7, 186B7, 179C1, 63E11, 65B10, 2B1, 59D5, 7F12, and 89F5 from the Los Alamos cosmid library. Cosmid CJ52.193 was obtained from C. Julier (Julier et al., 1990); the remaining cosmids were generated by subcloning YACs 6E9 and 20FE12. Genes in the A-T region are represented by horizontal bars, with their orientation indicated by arrows. Exons related to these genes are indicated by vertical arrows pointing downward. Individual transcripts not assigned to particular genes are shown as vertical arrows (exons) and broken arrows (cDNAs) pointing upward. At the very bottom, the locations of the CpG islands are given.

Identification of Exons and cDNAs from the Region of the A-T Gene

Transcripts from the A-T region were identified by two complementary methods: exon amplification and direct cDNA selection. Fifty-two potentially unique exons were derived from the A-T region by exon amplification from the YAC and cosmid clones: 35 from the region of YAC 20FE12 and 17 from the region of YAC 6E9. The lengths of the identified exons ranged from 29 to 420 bp. A database search indicated that four exons (three clones) were part of the gene for acetoacetyl-coenzyme A acetyltransferase (ACAT) (Masuno et al., 1992), previously mapped to the A-T region. One exon was homologous to the sequence of an endogenous retrovirus, and six appeared to show homology to repeat sequences (three to Alu and three to LINE-1 repetitive sequences). One exon, 6.82, showed homology to an EST from a testis cDNA library, and another, exon A8, was subsequently found to be part of the ATM gene (Table 1). The remaining 40 clones represented potentially unique sequences. Exon-specific primers were designed for screening arrayed cDNA libraries and for extending exons in the 5′ and 3′ directions by RACE.
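The paper does not describe its primer design criteria. As an illustration only, the Python sketch below computes three quantities commonly checked when designing such primers: G+C fraction, a rough melting temperature, and the reverse complement (useful when a primer must be complementary to a known linker or transcript). The Tm formula, 64.9 + 41(G+C − 16.4)/N, is a widely used approximation for oligonucleotides longer than about 13 nt and is an assumption of this sketch, not the authors' method; the example uses the K28 primer 9953 sequence given in Materials and Methods.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove the spaces used to print oligonucleotides in triplets."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse to read 5'->3' on the other strand."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


def basic_tm(seq: str) -> float:
    """Rough Tm in degrees C for oligos longer than ~13 nt (no salt correction)."""
    s = clean(seq)
    gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(s)


if __name__ == "__main__":
    k28_primer = "TAA CCC CTT CCC AGC CAT CCT GGA TAC"  # primer 9953 (Methods)
    print(len(clean(k28_primer)))              # 27
    print(f"{gc_fraction(k28_primer):.3f}")    # 0.556
    print(f"{basic_tm(k28_primer):.1f}")       # 62.8
    print(reverse_complement("AACG"))          # CGTT
```

Any real design would also check self-complementarity and primer-dimer formation, which this sketch leaves out.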
Exons E3, E10, and E14 were used to screen an arrayed human frontal cortex cDNA library (Munroe et al., 1995), and three corresponding cDNA clones were isolated. In addition, two exons from the centromeric part of the A-T region, 6.45 and 6.82, were used to screen the arrayed cDNA library, and three cDNA clones were identified. No cDNA clones were isolated after screening the human frontal cortex and thymus cDNA libraries with exons E12 and E13. RACE was used to extend the transcript from exon 6.82 and also to obtain the 5′ end of the ATM gene (Byrd et al., 1996a). YACs and cosmids from the A-T region were also used to generate cDNA clones by direct selection. Twenty-three clones identified by direct selection were analyzed, and 18 mapped back to the A-T region. However, 10 of these 18 clones were either identical to one another or different fragments of the same gene, F37 (Table 1). The remaining 8 clones appeared to represent six independent fragments (Table 1).

Assembly of a Transcription Map in the Region of the A-T Gene

A transcription map of the A-T region was established by aligning the identified exons and cDNAs to the cosmid and PAC contig backbone. This assembly allowed the identified transcripts to be clustered, with subsequent isolation of several genes (Fig. 1). In addition, amplified fragments from the 5′ and 3′ ends of the ATM gene (Savitsky et al., 1995a,b; Byrd et al., 1996) were used to place this gene within the cosmid/PAC contig.

The interval between D11S1819 and D11S2029. The cosmids that map to the 60-kb NotI fragment between markers D11S2025 and D11S2026 were found to contain repetitive sequences. Two exons were trapped from this fragment. The 354-bp exon 6.41 was found to be homologous (75% identical) to expressed retrovirus LTR sequences, while the 136-bp exon 1.89 was found to be unique. A number of exons (6.15, 6.32, and 6.84), together with the hybridization-selected cDNAs F37, M1, M17, M22, M43, M64, M72, M83, M93, and M96 and cDNA D4, which is specific to exon 6.84, were mapped centromeric to the 60-kb NotI fragment (Fig. 1, Table 1). These form a 1.5-kb cDNA contig representing a unique anonymous gene, F37, that extends in a telomeric direction to the nearest CpG island, associated with the centromeric NotI site on cosmid 104H4. We formed a contig of overlapping F37 clones on the basis of their sequences and hybridization patterns. Analysis of the aligned sequences of the overlapping clones established the order of exons 6.84, 6.15, and 6.32 from the telomere toward the centromere. The alignment of the exon and cDNA sequences, together with their hybridization patterns on the cosmid contig, established the transcriptional orientation of the F37 gene as telomere to centromere (Fig. 1).
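The reasoning used to orient F37 can be stated compactly: if exons listed 5′ to 3′ along the transcript occupy steadily decreasing positions on a contig measured from its centromeric end, the gene is transcribed from the telomere toward the centromere. The Python sketch below encodes that rule. The exon order 6.84, 6.15, 6.32 comes from the text and is taken as 5′ to 3′, consistent with F37 extending telomerically to a CpG island; the kilobase positions are hypothetical placeholders, since the paper reports only the order.

```python
from typing import Dict, List


def transcription_direction(exons_5_to_3: List[str],
                            position_kb: Dict[str, float]) -> str:
    """Infer transcriptional direction from exon order and contig positions.

    exons_5_to_3: exons in 5'->3' order along the cDNA.
    position_kb: each exon's position on the contig, measured from its
    centromeric end (larger values are more telomeric).
    """
    positions = [position_kb[e] for e in exons_5_to_3]
    steps = [b - a for a, b in zip(positions, positions[1:])]
    if steps and all(s > 0 for s in steps):
        return "centromere -> telomere"
    if steps and all(s < 0 for s in steps):
        return "telomere -> centromere"
    return "ambiguous"  # single exon, ties, or an inconsistent order


if __name__ == "__main__":
    # Hypothetical positions; only their relative order follows the text.
    f37_positions = {"6.84": 40.0, "6.15": 25.0, "6.32": 10.0}
    print(transcription_direction(["6.84", "6.15", "6.32"], f37_positions))
    # telomere -> centromere
```

Returning "ambiguous" rather than guessing mirrors the caution needed when exon order and hybridization data disagree.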
The F37 gene appeared to be widely expressed, producing a 3-kb message on Northern blots in a variety of tissues, including spleen, thymus, prostate, testis, ovary, small intestine, colon, and peripheral blood leukocytes (Fig. 2). FASTA and BLAST searches, however, identified no homology to any known sequence in the DNA or protein databases (Table 1). One further hybridization-selected cDNA, M27, and the related exon 6.45 (Table 1) appeared not to be part of the F37 gene (Fig. 1). The coding sequences for F37 were found on the nonoverlapping cosmids 131H11 and 104H4, which were linked through cosmid 138. Exon 6.45 and cDNA M27 were mapped to cosmid 138, which lies between cosmids 131H11 and 104H4; they may therefore identify a gene located in an intron of F37 or a gene transcribed from the antisense strand relative to F37.

TABLE 1
Characteristics of Genes and Individual Transcripts in the A-T Region

Gene/transcript  Method  Clone size (kb)  Transcript size (kb)  Sequence homologies
F37              S       1.5              3                     No homology
M27              S       1.1                                    No homology
6.41             E       0.354                                  Expressed retrovirus LTR sequences
1.89             E       0.136                                  No homology
E74              S       0.160                                  HERV-K class of endogenous retrovirus sequence
6.83             E       0.176                                  No homology
K28              S       1.3              2                     Rab family gene
K22              S       1.2                                    Integrase and dUTPase domains of HERV-L
4.14             E       0.113                                  No homology
M25              S       2.1                                    No homology
6.82             E       0.097            7.5                   Rabbit vasopressin-activated calcium mobilizing receptor gene (VACM-1)
K47              S       1.3                                    No homology
E23              E       0.144            1.5                   ACAT gene
E15              E       0.210            1.5                   ACAT gene
E2               E       0.065            1.5                   ACAT gene
E3               E       0.140                                  No homology
E13              E       0.249                                  No homology
E14              E       0.370            5.3, 6.25             No homology
A8               E       0.090                                  ATM gene
E34              E       0.093                                  No homology
E36              E       0.149                                  No homology

Sibling clones: 6.32, 6.15, 6.84, M1, M17, M22, M43, M64, M72, M83, M93, M96, and D4 (F37); 6.45 (M27); further entries printed in this column are K44 and 6.82, K35.
Cosmids or PACs (row assignment uncertain except where stated): 33 6E9, 131H11, 104H4 (F37); 138 6E9 (M27); 94G8; 143 6E9, 43B2; 143 6E9, 31G7, 59 6E9; 31G7; 143 6E9, 43B2, 59 6E9, 31G7, 99 6E9; 31G7, 99 6E9, 114 6E9; 114 6E9; 186B7; 114 6E9, 186B7, 64 6E9, 179C1, 63E11, 65B10; 179C1, 63E11; 2B1, 59D5; CJ52.193; X25b, 32a; CJ52.193, X26b (E3); X25b, 32a (E13); PAC 22P10; 89F5, Y4b (E34); 35a, 16a (E36).
Expression: LCL only for 1.89 and 4.14; wide for F37, K28, 6.82, ACAT, E14, and ATM and for one further entry.
Accession Nos.: X99961, Y10274, X99962, Y10273, X81882, Y10272, Y10271, Y10269, and Y10270; one entry cites Byrd et al., 1996.

Note. Clones are listed in centromeric-to-telomeric order from top to bottom. The characterization of the individual transcripts includes the method of isolation (cDNA selection, S; exon trapping, E), clone size, presence of overlapping (sibling) clones, position within the cosmid and PAC contig, pattern of expression, size of the mRNA, and sequence homology to other known sequences. The sequences of genes F37, K28, and 6.82 and of other transcripts have been submitted to the EMBL Nucleotide Sequence Database, and their accession numbers are given.

On the telomeric side of the 60-kb NotI fragment, an exon, 6.83, and two hybridization-selected cDNAs, E74 and K28, were mapped to cosmids that were close to or overlapped the telomeric NotI site. The 176-bp exon 6.83 appeared to be unique, while the small 160-bp cDNA E74 was found to be >85% homologous to the HERV-K class of endogenous retrovirus sequences (Table 1). Although they map to the same genomic region, exon 6.83 and cDNA clone E74 do not appear to be part of the K28 cDNA, by either sequence comparison or exon connection. The 5′ end of the 1.3-kb K28 cDNA was obtained by RACE, and a database search with the complete coding sequence was performed. Conceptual translation and BLAST and FASTA searches confirmed that K28 represents a new member of the Rab gene family. Identities of 42–49% were obtained with various Rab
genes, including 47.9% identity with Hu-Rab 2 over 190 amino acids. The alignment covers four domains involved in GTP/GDP binding (residues 15–22, 68–74, 124–131, and 155–161) that are common to both the Ras and Rab gene families, as well as three additional domains (residues 44–49, 62–67, and 74–96) that are highly conserved within the Rab protein family but diverge from the corresponding sequences in p21Ras (Fig. 3) (Zahraoui et al., 1989). One of these highly conserved Rab regions, the effector region (residues 44–49), is thought to be involved in the interaction of the protein with its putative effector. K28 shows only limited similarity to other Rab subclasses in another region (residues 110–120) that is thought to interact with GEFs (guanine nucleotide exchange factors) (Moore et al., 1995), suggesting that this gene is distinct from previously identified Rab family members (Fig. 3). All members of the Ras superfamily possess at least one cysteine near the COOH-terminus, which has been shown to be required for fatty acylation, membrane association, and biological activity. K28 contains a terminal Cys-X-Cys motif that is also found in Rabs 3a, 3b, 4, and 6 (Fig. 3) (Zahraoui et al., 1989).

FIG. 2. Northern blot analysis of genes K28 and F37 in multiple human tissues. (Top) Hybridization of PCR products corresponding to the body of genes K28 and F37 to multiple human tissue mRNA filters. (Bottom) Hybridization signal obtained with a β-actin probe on the same filter, for comparison of the amount of mRNA loaded in each lane.

The K28 gene seems to be widely expressed, producing a message of about 2 kb on Northern blots in a variety of tissues, including spleen, thymus, prostate, testis, ovary, small intestine, colon, and peripheral blood leukocytes (Fig. 2). The transcriptional orientation of the K28 Rab gene on chromosome 11 is from the centromere toward the telomere (Fig. 1). Immediately telomeric to the K28 Rab gene, we have mapped a 1.2-kb hybridization-selected cDNA, K22, and a 113-bp exon, 4.14. No homologies have been found for the exon, but K22 shows >80% homology to the integrase and dUTPase domains of the HERV-L class of endogenous retrovirus sequences (Cordonnier et al., 1995). Four hybridization-selected cDNAs, M25, K44, K47, and K35, have been mapped telomeric to K22. Two clones, M25 (2.1 kb) and K47 (1.3 kb), are unique and do not appear to be part of another large gene, designated 6.82, which extends from cosmid 114 to cosmid 2B1 (Fig. 1). The original 97-bp exon 6.82 that was used to identify this gene was extended by cDNA library screening and 5′ and 3′ RACE. A second 112-bp exon,
4.40, was found to be the exon preceding exon 6.82, and the cDNA clone K35 (0.9 kb) was found to represent 3′ sequences of the same gene, which is transcribed toward the telomere of the long arm of chromosome 11. Northern blotting identified a widely expressed 7.5-kb mRNA in tissues such as heart, brain, placenta, lung, liver, skeletal muscle, kidney, pancreas, spleen, thymus, prostate, testis, ovary, small intestine, colon, and peripheral blood leukocytes, and a BLAST search identified homology with the rabbit vasopressin-activated calcium mobilizing receptor gene (Byrd, in press).

The interval between D11S2029 and D11S1294. The gene for mitochondrial ACAT was mapped to 11q22–q23 by Masuno et al. (1992), and we subsequently localized it centromeric to the region of the A-T gene. Our previous radiation hybrid mapping (Ambrose et al., 1994) indicated that the ACAT gene was located proximal to D11S384 and within 50 kb of the end of YAC 20FE12 (D11S2029). This order was confirmed by PCR analysis of cosmids that spanned D11S2029. In addition, PCR analysis of cosmids using ACAT primers from the promoter and from different parts of the body of the gene showed that this gene is transcribed toward the telomere of chromosome 11. Between the ACAT gene and the ATM gene, we have evidence for potentially three novel genes. Five exons were trapped within this interval: E3, E10, E12, E13, and E14. Exon E3 (140 bp), which was trapped from the cosmid pair CJ52.193 (containing the marker D11S384) and X26b, was extended by screening cDNA libraries and by 5′ and 3′ RACE. Differentially spliced E3-specific cDNAs of up to 2.4 kb have been obtained from brain, thymus, and placenta. However, Northern blot analysis failed to identify a corresponding mRNA in 16 human tissues.
It is possible, therefore, that the E3 exon is part of a low-abundance gene, that it represents an exon that is differentially spliced in most tissues, or that it functions in a tissue-restricted or developmentally restricted way. Although E3 maps close to exons E10, E12, and E14, which are part of the E14 gene (see below), it is unlikely to belong to that gene, because E14 is expressed ubiquitously whereas no E3 mRNA could be detected. Exon E13 (249 bp), amplified from the cosmid pair X25b and 32a, had a high G+C content and a CpG frequency suggestive of a CpG island (Fig. 1); it also contained sites for the rare-cutting enzymes AscI (invariably found in CpG islands), BssHII, and NgoMI. A 9.5-kb EcoRI restriction fragment hybridizing with E13 was seen in genomic DNA from human and monkey, and a smaller 1-kb fragment was possibly seen in cow DNA (data not shown). No hybridizing restriction fragments were detected in genomic DNA from any of the other species. This exon was, however, not amplified from any of the tissues (liver, pancreas, heart, lung, skeletal muscle, kidney, brain, and placenta) included in the Clontech cDNA library panel and, like exon E3, may have restricted tissue or developmental expression.
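The CpG-island assessment of exon E13 rests on measurable properties: G+C content, the observed/expected ratio of CpG dinucleotides, and the presence of rare-cutter sites. The Python sketch below scores a single sequence against the widely cited Gardiner-Garden and Frommer thresholds (length of at least 200 bp, G+C above 50%, observed/expected CpG above 0.6) and locates NotI, AscI, and BssHII sites. The thresholds and the choice of enzymes are assumptions made for illustration; the paper does not state which criteria were applied, and genome-scale island finders scan sliding windows rather than one sequence. The test sequences are synthetic.

```python
import re

# Recognition sequences, top strand 5'->3'. All three are palindromic, so one
# strand suffices. Every AscI site (GGCGCGCC) also contains a BssHII site.
RARE_CUTTERS = {"NotI": "GCGGCCGC", "AscI": "GGCGCGCC", "BssHII": "GCGCGC"}


def cpg_stats(seq: str) -> dict:
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    observed = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected = c * g / n if n else 0.0
    return {
        "length": n,
        "gc_fraction": (c + g) / n if n else 0.0,
        "obs_exp_cpg": observed / expected if expected else 0.0,
    }


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    st = cpg_stats(seq)
    return (st["length"] >= min_len and st["gc_fraction"] > min_gc
            and st["obs_exp_cpg"] > min_oe)


def find_sites(seq: str, enzymes: dict = RARE_CUTTERS) -> dict:
    """0-based start of every recognition site; the lookahead allows overlaps."""
    s = seq.upper()
    return {name: [m.start() for m in re.finditer(f"(?={site})", s)]
            for name, site in enzymes.items()}


if __name__ == "__main__":
    test_seq = "AT" * 50 + "GGCGCGCC" + "AT" * 50  # synthetic, one AscI site
    print(find_sites(test_seq))  # {'NotI': [], 'AscI': [100], 'BssHII': [101]}
    print(looks_like_cpg_island("CG" * 100), looks_like_cpg_island("AT" * 100))
    # True False
```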


FIG. 3. Alignment of the amino acid sequence of K28 and other members of the Rab/Ras family of proteins. Sets of identical or conserved residues are boxed. The reference numbering is that of the K28 gene.
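The C-terminal Cys-X-Cys motif described for K28 (and for Rabs 3a, 3b, 4, and 6) is a simple sequence pattern. The Python sketch below tests whether a protein sequence ends in Cys, any one residue, Cys, and lists the cysteines near the COOH-terminus. The 10-residue window and the example sequences are hypothetical and are not taken from K28 or any Rab protein.

```python
import re

# Cys, any one residue, Cys, anchored at the very end of the sequence (\Z).
CXC_AT_C_TERMINUS = re.compile(r"C[A-Z]C\Z")


def has_c_terminal_cxc(protein: str) -> bool:
    return CXC_AT_C_TERMINUS.search(protein.strip().upper()) is not None


def c_terminal_cysteines(protein: str, window: int = 10) -> list:
    """1-based positions of Cys residues within the last `window` residues."""
    s = protein.strip().upper()
    start = max(0, len(s) - window)
    return [i + 1 for i in range(start, len(s)) if s[i] == "C"]


if __name__ == "__main__":
    toy = "MSKLDSTGCSC"  # hypothetical sequence ending in C-S-C
    print(has_c_terminal_cxc(toy), has_c_terminal_cxc("MSKLDCC"))  # True False
    print(c_terminal_cysteines(toy))  # [9, 11]
```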

Exon E14 identified a new gene. Extension of the 5′ end of the ATM cDNA (Savitsky et al., 1995a) on genomic DNA showed that the first exon was part of a CpG island (Byrd et al., 1996a). More than 2 kb of genomic sequence was determined upstream of the 5′ end of the ATM cDNA. Within this sequence, 681 bp 5′ to the start of the ATM gene, we found the first exon of a new gene, E14, represented by a cDNA that had been identified with trapped exon E14 (370 bp). The ATM and E14 genes were found to be oriented in opposite directions, suggesting that they are transcribed from a bidirectional promoter (Byrd et al., 1996a). This was confirmed recently and also shown to be the case in the mouse (Byrd et al., 1996b). The E14 gene is, therefore, transcribed in a telomere-to-centromere direction, giving transcripts of ~6.25 and ~5.3 kb in a variety of tissues (Byrd et al., 1996b). The complete amino acid sequence of E14 (also called NPAT; Imai et al., 1996) has been reported (Byrd et al., 1996; Imai et al., 1996), but no homologous sequences have been identified in the DNA or protein databases analyzed (Imai et al., 1996; Byrd et al., 1996b). Two exons, E34 (93 bp) from cosmid pair 89F5/Y4b and E36 (149 bp) from cosmid pair 35a/16a, were isolated telomeric to the ATM gene. These exons showed no homology to known sequences in the database and may be part of two novel genes.

DISCUSSION

We have constructed YAC, PAC, and cosmid contigs covering a 1-Mb region on chromosome 11q22–q23 that includes the gene for ataxia–telangiectasia (ATM) and used exon trapping and hybridization selection to identify genes within this region. The exons and cDNAs that we isolated have been mapped back to the cosmid and PAC clones, and a transcription map has been developed. The A-T region appears to be gene-rich. Exons and cDNAs for potentially 19 genes were isolated across 1 Mb of genomic DNA, of which 4 have so far been identified as new human genes. This represents a density of about 1 gene every 50 kb, which approximates the frequency expected if 70,000 genes are distributed throughout the human genome. There are, however, segments of the A-T region with a lower gene density, probably owing to the presence of the large genes F37, 6.82, E14, and ATM. The genes ACAT, K28, and E3, as well as many other exons and cDNAs that represent only fragments of larger, not fully identified transcripts, seem to occupy smaller genomic regions. Many of these smaller fragments do not appear to be connected to each other and are included in the transcription map as separate units. For some, such as exons 1.89 and 4.14, PCR screening of representative libraries from different tissues showed that, apart from cDNA made from lymphoblastoid cell mRNA, they were not expressed in any other tissue; they may therefore represent parts of tissue-specific genes. cDNA fragments K28, K22, and K47 and exon 1.83 seem to be parts of separate genes, since interfragment amplification from the cDNA libraries or directly from cDNA made from lymphoblastoid cell mRNA was not possible. In addition, the K28 cDNA shows homology to the human Rab gene family, while the other three cDNA fragments show no homology to any other human sequence. Similarly, exons E34 and E36 seem to be part of novel genes with no homologous sequences in the GenBank database. Several transcript-poor regions were identified during construction of the transcription map. From centromere to telomere, these are the region centromeric to the derived cosmid 33 6E9 and gene F37, the region between the two NotI sites, and the region telomeric to the ATM gene.
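The gene-density comparison in the discussion above is simple arithmetic. The Python sketch below reproduces it using the region size and gene counts given in the text, plus one assumption not stated in the paper: a haploid human genome of about 3,000 Mb.

```python
REGION_KB = 1_000        # ~1 Mb A-T region (from the text)
GENES_IN_REGION = 19     # potential genes identified (from the text)
GENOME_KB = 3_000_000    # assumption: ~3,000 Mb haploid human genome
GENOME_GENES = 70_000    # genome-wide gene estimate quoted in the text


def kb_per_gene(span_kb: float, n_genes: int) -> float:
    """Average spacing between genes, in kilobases."""
    return span_kb / n_genes


observed = kb_per_gene(REGION_KB, GENES_IN_REGION)
expected = kb_per_gene(GENOME_KB, GENOME_GENES)
print(f"A-T region: 1 gene per {observed:.1f} kb")               # 52.6
print(f"Genome-wide expectation: 1 gene per {expected:.1f} kb")  # 42.9
```

Both spacings fall within the same range, which is the sense in which roughly 1 gene per 50 kb approximates the genome-wide expectation.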
The first region, centromeric to cosmid 33 6E9, appears to be genuinely gene poor: generating transcripts from two sources, the cosmid contig and YAC 6E9, by both exon trapping and direct cDNA selection failed to identify any transcripts other than those belonging to gene F37. However, the complete coding sequence of gene F37 has not been obtained, so the centromeric boundary of this gene is not known. The second region, between the two NotI sites, is associated with remarkable genomic instability. The cosmid contig could not be extended across this region, and both we and others (Arai et al., 1996) identified deletions across this region in many YAC clones. The YAC 6E9 colony used for exon trapping and direct selection contained a NotI fragment of ~50 kb, the largest NotI fragment seen in 10 colonies. Physical mapping of human genomic DNA indicated that one of these two NotI sites was methylated, so it is not known whether the NotI fragment identified in the 6E9 colony used for
exon trapping and hybridization selection truly represented genomic DNA. The third region, telomeric to the ATM gene, contains only two potential genes, represented by exons E34 and E36. Again, without further characterization of these genes, including isolation of full-length cDNAs and mapping of those cDNAs onto the cosmid and PAC backbone, it is difficult to gauge the gene density in this part of the A-T region.

The A-T region contains three endogenous retrovirus insertion sites (6.41, K22, and E74) clustered within 60 kb, each close to one of the two CpG islands associated with the NotI sites in the centromeric part of the A-T region. The roles of such transcripts are generally unclear, but they are a recurring feature of many transcription maps recently developed across the human genome (Sedlacek et al., 1993). It has been suggested that endogenous retroviruses, if expressed, may be involved in autoimmunity (Query and Keene, 1987) and leukemias (Brodsky et al., 1993), and that their reinsertion at new sites in the genome could activate or inactivate genes (Kongsuwan et al., 1989).

Expression patterns of the individual transcripts and genes within the A-T region indicate a common, wide distribution for the majority of genes analyzed. Novel genes such as F37, 6.82, E14, and K28, as well as the previously reported genes ACAT and ATM, are expressed in a wide variety of tissues. On the other hand, apparent restricted expression, or lack of expression, of isolated exons and transcript fragments must be interpreted with caution, since individual exons are well known to be subject to tissue-specific alternative splicing. “Transcriptional domains” have been reported throughout the genome (Yeom et al., 1991; Bione et al., 1993). These are clusters of genes that share the same orientation and are expressed in the same tissues, and the genes within such a domain are believed to have related functions.
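The orientation criterion for a transcriptional domain can be stated as a simple check. The Python sketch below is hypothetical and encodes only the orientations reported in this Discussion: K28, 6.82, ACAT, and ATM transcribed from centromere toward telomere, and E14 transcribed in the opposite direction from the promoter it shares with ATM. The "+"/"-" strand convention and the function name `is_co_oriented_cluster` are assumptions for illustration, and genomic positions are left out because coordinates are not given here.

```python
from typing import Dict, Iterable, Tuple

# Strand convention assumed for this sketch:
#   "+" = transcribed 5'->3' from centromere toward telomere
#   "-" = transcribed in the opposite direction
ORIENTATION: Dict[str, str] = {
    "K28": "+",
    "6.82": "+",
    "ACAT": "+",
    "ATM": "+",
    "E14": "-",
}

# Gene pairs transcribed divergently from one shared (bidirectional) promoter.
BIDIRECTIONAL_PAIRS = [("E14", "ATM")]


def is_co_oriented_cluster(
    orientation: Dict[str, str],
    bidirectional_pairs: Iterable[Tuple[str, str]],
    majority: str = "+",
) -> bool:
    """Hypothetical check: True if every gene either runs in the majority
    direction or is the opposite-strand partner of a gene with which it
    shares a bidirectional promoter."""
    explained = set()
    for a, b in bidirectional_pairs:
        # A divergent pair has one partner on each strand.
        if {orientation[a], orientation[b]} == {"+", "-"}:
            minority_partner = a if orientation[a] != majority else b
            explained.add(minority_partner)
    return all(
        strand == majority or gene in explained
        for gene, strand in orientation.items()
    )


print(is_co_oriented_cluster(ORIENTATION, BIDIRECTIONAL_PAIRS))  # True
print(is_co_oriented_cluster(ORIENTATION, []))  # False
```

With the E14/ATM pair removed, the check fails, mirroring the point made in the text: E14 fits the pattern only because it is transcribed divergently from the bidirectional ATM promoter. A fuller treatment would also test co-expression in the same tissues, the second criterion for a transcriptional domain.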
The cloning of a single ATM gene ruled out the existence of several A-T genes, which had been postulated following the demonstration of different genetic complementation groups. Although transcriptional domains usually involve genes clustered within a small genomic region (up to 100 kb), it is interesting that five neighboring genes lying within a 400-kb stretch of the A-T region, distal to the telomeric NotI site, appear to share a related transcriptional orientation. Genes K28, 6.82, ACAT, and ATM are all transcribed from the centromere toward the telomere, and the E14 gene, although transcribed in the opposite direction, is transcribed from the same promoter as the ATM gene (Byrd et al., 1996a). In addition, the recently identified, ubiquitously expressed gene for a putative DEAD-box RNA helicase, which maps telomeric to marker D11S1294, is also oriented 5′→3′ from centromere to telomere (Savitsky et al., 1996). Prior to the identification of the ATM gene, which was found to be mutated in various A-T patients, any
of the widely expressed genes in the A-T region could have been considered a candidate for various aspects of the A-T phenotype. At present, however, there is no evidence that any of these genes has a role in the A-T phenotype; indeed, one gene, E14, was found not to be mutated in A-T patients (Byrd et al., 1996b; Imai et al., 1996). The widely expressed genes from the A-T region are likely to be involved in vital cellular functions. Interest in these genes is further sustained by the observation that the A-T region appears to be involved in loss of heterozygosity in some tumors, including breast cancer (Vorechovsky et al., 1996).

ACKNOWLEDGMENTS

We thank the Wellcome Trust, the Cancer Research Campaign, the U.K. A-T Society, the A-T Research and Support Trust, and the Medical Research Council for their continued support.

REFERENCES

Ambrose, H. A., Byrd, P. J., McConville, C. M., Cooper, P. R., Stankovic, T., Riley, J. H., Shiloh, Y., McNamara, J. O., Fukao, T., and Taylor, A. M. R. (1994). A physical map across chromosome 11q22–23 containing the major locus for ataxia telangiectasia. Genomics 21: 612–619.

Anand, R., Riley, J. H., Butler, R., Smith, J. C., and Markham, A. F. (1990). A 3.5 genome equivalent multi-access YAC library: Construction, characterisation, screening and storage. Nucleic Acids Res. 18: 1951–1956.

Arai, Y., Hosoda, F., Nakayama, K., and Ohki, M. (1996). A yeast artificial chromosome contig and NotI restriction map that spans the tumor suppressor gene(s) locus, 11q22.2–q23.3. Genomics 35: 196–206.

Bellanne-Chantelot, C., Lacroix, B., Ougen, P., Billault, A., Beaufils, S., Bertrand, S., Georges, I., Gilbert, F., Gros, I., Lucotte, G., Susini, L., Codani, J.-J., Gesnouin, P., Pook, S., Vaysseix, G., Lu-Kuo, J., Ried, T., Ward, D., Chumakov, I., LePaslier, D., Barillot, E., and Cohen, D. (1992). Mapping the whole human genome by fingerprinting yeast artificial chromosomes. Cell 70: 1059–1068.

Bione, S., Tamanini, F., Maestrini, E., Tribioli, C., Poustka, A., Torri, G., Rivella, S., and Toniolo, D. (1993). Transcriptional organisation of a 450-kb region of the human X chromosome in Xq28. Proc. Natl. Acad. Sci. USA 90: 10977–10981.

Brodsky, I., Foley, B., Haines, D., Johnston, J., Cuddy, K., and Gillespie, D. (1993). Expression of HERV-K proviruses in human leukocytes. Blood 81: 2369–2374.

Buckler, A. J., Chang, D. D., Graw, S. L., Brook, D., Haber, D. A., Sharp, P. A., and Housman, D. E. (1991). Exon amplification: A strategy to isolate mammalian genes based on RNA splicing. Proc. Natl. Acad. Sci. USA 88: 4005–4009.

Burfoot, M. S., and Gorski, J. L. (1993). Improved method of gene detection using exon amplification. Nucleic Acids Res. 22: 5510–5511.

Byrd, P. J., McConville, C. M., Cooper, P., Parkhill, J., Stankovic, T., McGuire, G. M., Thick, J. A., and Taylor, A. M. R. (1996a). Mutations revealed by sequencing the 5′ half of the gene for ataxia telangiectasia. Hum. Mol. Genet. 5: 145–149.

Byrd, P. J., Cooper, P. R., Stankovic, T., Kullar, H. S., Watts, D. J., Robinson, P., and Taylor, A. M. R. (1996b). A gene transcribed from the bidirectional ATM promoter coding for a serine rich protein: Amino acid sequence, structure and expression studies. Hum. Mol. Genet. 5: 1785–1791.

Byrd, P. J., Stankovic, T., McConville, C. M., Smith, A. D., Cooper, P. R., and Taylor, A. M. R. Identification and analysis of expression of
human VACM-1, a cullin gene family member, located on chromosome 11q22–23. Genome Res., in press.

Collins, F. S. (1995). Positional cloning moves from perditional to traditional. Nature Genet. 9: 347–350.

Cordonnier, A., Casella, J.-F., and Heidmann, T. (1995). Isolation of novel human endogenous retrovirus-like elements with foamy virus-related pol sequence. J. Virol. 69: 5890–5897.

Church, D. M., Stoler, C. J., Rutter, J. L., Murrell, J. R., Trofatter, J. A., and Buckler, A. J. (1994). Isolation of genes from complex sources of mammalian genomic DNA using exon amplification. Nature Genet. 6: 98–105.

Futreal, P. A., Cochran, C., Rosenthal, J., Miki, Y., Swenson, J., Hobbs, M., Bennett, L. M., Haugen-Strano, A., Marks, J., Barrett, J. C., Tavtigian, S. V., Shattuck-Eidens, D., Kamb, A., Skolnick, M., and Wiseman, R. (1994). Isolation of a diverged homeobox gene, MOX, from the BRCA1 region on 17q21 by solution hybrid capture. Hum. Mol. Genet. 3: 1359–1364.

Gatti, R. A., Lange, E., Rotman, G., Chen, X., Uhrhammer, N., Liang, T., Chiplunkar, S., Yang, L., Udar, N., Dandekar, S., Sheikhavandi, S., Wang, Z., Yang, H.-M., Polikow, J., Elashoff, M., Telatar, M., Sanal, O., Chessa, L., McConville, C., Taylor, M., Shiloh, Y., Porras, O., Borresen, A.-L., Wegner, R.-D., Curry, C., Gerken, S., Lange, K., and Concannon, P. (1994). Genetic haplotyping of ataxia-telangiectasia families localises the major gene to an ~850-kb region on chromosome 11q23.1. Int. J. Radiat. Biol. 66: S57–S62.

Imai, T., Yamauchi, M., Seki, N., Sugawara, T., Saito, T., Matsuda, Y., Ito, H., Nagase, T., Nomura, N., and Hori, T. (1996). Identification and characterization of a new gene physically linked to the ATM gene. Genome Res. 6: 439–447.

Ioannou, P. A., Amemiya, C. T., Garnes, J., Kroisel, P. M., Shizuya, H., Chen, C., Batzer, M. A., and de Jong, P. J. (1994). A new bacteriophage P1-derived vector for the propagation of large human DNA fragments. Nature Genet. 6: 84–89.

Julier, C., Nakamura, Y., Lathrop, M., O’Connell, P., Leppert, M., Litt, M., Mohandas, T., Lalouel, J.-M., and White, R. (1990). A detailed genetic map of the long arm of chromosome 11. Genomics 7: 335–345.

Kongsuwan, K., Allen, J., and Adams, J. M. (1989). Expression of HOX-2.4 homeobox gene directed by proviral insertion in a myeloid leukaemia. Nucleic Acids Res. 17: 1881–1892.

Larsen, F., Gundersen, G., Lopez, R., and Prydz, H. (1992). CpG islands as gene markers in the human genome. Genomics 13: 1095–1107.

Lovett, M., Kere, J., and Hinton, L. M. (1991). Direct selection: A method for the isolation of cDNAs encoded in large genomic regions. Proc. Natl. Acad. Sci. USA 88: 9628–9632.

Masuno, M., Kano, M., Fukao, T., Yamaguchi, S., Osumi, T., Hashimoto, T., Takahashi, E., Hori, T., and Orii, T. (1992). Chromosome mapping of the human mitochondrial acetoacetyl-coenzyme A thiolase gene to 11q22.3–q23.1 by fluorescence in situ hybridization. Cytogenet. Cell Genet. 60: 121–122.

Moore, I., Schell, J., and Palme, K. (1995). Subclass-specific sequence motifs identified in rab GTPases. Trends Biochem. Sci. 20: 10–12.

Morgan, J. G., Dolganov, G. M., Robbins, S. E., Hinton, L. M., and Lovett, M. (1992). The selective isolation of novel cDNAs encoded by the regions surrounding the human interleukin 4 and 5 genes. Nucleic Acids Res. 20: 5173–5179.

Munroe, D. J., Loebbert, R., Bric, E., Whitton, T., Prawitt, D., Vu, D., Buckler, A., Winterpacht, A., Zabel, B., and Housman, D. E. (1995). Systematic screening of an arrayed cDNA library by PCR. Proc. Natl. Acad. Sci. USA 92: 2209–2213.

Pearson, W. R., and Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85: 2444–2448.

Qin, S., Zhang, J., Isaacs, C. M., Nagafuchi, S., Sait, S. N. J., Abel, K. J., Higgins, M. J., Nowak, N. J., and Shows, T. B. (1993). A chromosome 11 YAC library. Genomics 16: 580–585.

Query, C. C., and Keene, J. D. (1987). A human autoimmune protein associated with U1 RNA containing a region of homology that is cross-reactive with retroviral P30GAG antigen. Cell 51: 211–220.

Riley, J., Butler, R., Ogilvie, D., Finniear, R., Jenner, D., Powell, S., Anand, R., Smith, J. C., and Markham, A. F. (1990). A novel, rapid method for isolation of terminal sequences from yeast artificial chromosome (YAC) clones. Nucleic Acids Res. 18: 2887–2890.

Rotman, G., Savitsky, K., Ziv, Y., Cole, C. G., Higgins, M., Bar-Am, I., Dunham, I., Bar-Shira, A., Vanagaite, L., Qin, S., Zhang, J., Nowak, N. J., Chandrasekharappa, S. C., Lehrach, H., Avivi, L., Shows, T., Collins, F. S., Bentley, D., and Shiloh, Y. (1994). A YAC contig spanning the ataxia-telangiectasia locus (groups A and C) at 11q22–q23. Genomics 24: 234–242.

Rouleau, G. A., Merel, P., Lutchman, M., Sanson, M., Zucman, J., Marineau, C., Hoang-Xuan, K., Demczuk, S., et al. (1993). Alteration in a new gene encoding a putative membrane-organizing protein causes neurofibromatosis type 2. Nature 363: 515–521.

Savitsky, K., Bar-Shira, A., Gilad, S., Rotman, G., Ziv, Y., Vanagaite, L., et al. (1995a). A single ataxia telangiectasia gene with a product similar to PI-3 kinase. Science 268: 1749–1753.

Savitsky, K., Sfez, S., Tagle, D. A., Ziv, Y., Sartiel, A., Shiloh, Y., and Rotman, G. (1995b). The complete sequence of the coding region of
the ATM gene reveals similarity to cell cycle regulators in different species. Hum. Mol. Genet. 4: 2025–2032.

Savitsky, K., Ziv, Y., Bar-Shira, A., Gilad, S., Tagle, D., Smith, S., Uziel, T., Sfez, S., Nahmias, J., Sartiel, A., Eddy, R. L., Shows, T. B., Collins, F. S., Shiloh, Y., and Rotman, G. (1996). A human gene (DDX10) encoding a putative DEAD-box RNA helicase at 11q22–23. Genomics 33: 199–206.

Sedlacek, Z., Korn, B., Konecki, D. S., Siebenhaar, R., Coy, J. F., Kioschis, P., and Poustka, A. (1993). Construction of a transcription map of a 300-kb region around the human G6PD locus by direct cDNA selection. Hum. Mol. Genet. 2: 1865–1869.

Vorechovsky, I., Rasio, D., Luo, L., Monaco, C., Hammarstrom, L., Webster, D. B., Zaloudik, J., Barbanti-Brodano, G., James, M., Russo, G., Croce, C., and Negrini, M. (1996). The ATM gene and susceptibility to breast cancer: Analysis of 38 breast tumours reveals no evidence for mutation. Cancer Res. 56: 2726–2732.

Yeom, Y. I., Abe, K., Bennett, D., and Artzt, K. (1991). Testes-associated embryo-expressed genes are clustered in the mouse H2K region. Proc. Natl. Acad. Sci. USA 89: 773–777.

Zahraoui, A., Touchot, N., Chardin, P., and Tavitian, A. (1989). The human Rab genes encode a family of GTP-binding proteins related to yeast YPT1 and SEC4 products involved in secretion. J. Biol. Chem. 264: 12394–12401.