Gene 239 (1999) 55–64 www.elsevier.com/locate/gene
Juxta-centromeric region of human chromosome 21 is enriched for pseudogenes and gene fragments
Myriam Ruault a, Valérie Trichet a, Sylvie Gimenez a, Shelagh Boyle b, Kathleen Gardiner c, Morgane Rolland a, Gérard Roizès a, Albertina De Sario a,*
a Séquences Répétées et Centromères Humains, CNRS UPR 1142, Institut de Biologie, 4, bv Henri IV, 34060 Montpellier, France
b MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, United Kingdom
c Eleanor Roosevelt Institute, 1899 Gaylord Street, Denver, CO 80206-1210, USA
Received 12 May 1999; received in revised form 19 July 1999; accepted 18 August 1999
Abstract
A physical map including four pseudogenes and 10 gene fragments and spanning 500 kb in the juxta-centromeric region of the long arm of human chromosome 21 is presented. cDNA fragments isolated from a selected cDNA library were characterized and mapped to the 831B6 YAC and to two BAC contigs that cover 250 kb of the region. An 85 kb genomic sequence located in the proximal region of the map was analyzed for putative exons. Four pseudogenes were found, including ψIGSF3, ψEIF3, and ψGGT-rel, whose functional copies map to chromosome 1p13, chromosome 2 and chromosome 22q11, respectively. The TTLL1 pseudogene corresponds to a new gene whose functional copy maps to chromosome 22q13. Ten gene fragments represent novel sequences that have related sequences on different human chromosomes and show 97–100% nucleotide identity to chromosome 21. These may correspond to pseudogenes on chromosome 21 and to functional genes in other chromosomes. The 85 kb genomic sequence was also analyzed for GC content, CpG islands, and repetitive sequence distribution. A GC-poor L isochore spanning 40 kb from satellite 1 was observed in the most centromeric region, next to a GC-rich H1 isochore that is a candidate region for the presence of functional genes. The pericentric duplication of a 7.8 kb region that is derived from the 22q13 chromosome band is described. We showed that the juxta-centromeric region of human chromosome 21 is enriched for retrotransposed pseudogenes and gene fragments transferred by interchromosome duplications, but we do not rule out the possibility that the region also harbors functional genes. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: BACs; cDNAs; Pericentric duplication; Pseudogenes
Abbreviations: BAC, bacterial artificial chromosome; bp, base pair(s); EST, expressed sequence tag; GC, molar fraction of guanine+cytosine in DNA; kb, kilobase(s); kDa, kilo Dalton(s); Mb, Megabase(s); PCR, polymerase chain reaction; PFGE, pulsed field gel electrophoresis; pfu, plaque-forming unit; RACE, rapid amplification of cDNA ends; RT-PCR, reverse transcription PCR; STS, sequence-tagged site; UTR, untranslated region; YAC, yeast artificial chromosome.
* Corresponding author: A. De Sario. Tel.: +33-4-67601123; fax: +33-4-67660306.
1. Introduction Although increasingly detailed physical and genetic maps have been produced for the human genome and the number of markers also continues to increase, juxtacentromeric regions are still absent or poorly described:
long-range contiguous clone coverage is frequently interrupted and/or constituted from rearranged clones, physical and genetic maps contain few markers and the junctions between centromeres and chromosome arms are not characterized. The absence of high quality physical maps in pericentromeric regions hinders the completion of the human genome project and is a major obstacle to cloning genes from these regions. The characterization of the functional genes located in the juxtacentromeric regions is also important for defining the structural and functional boundary between the centromeric heterochromatin and the euchromatic chromosome arms. To characterize genes from a human juxta-centromeric region, we chose to focus on chromosome 21. About 100 genes mapping to chromosome 21 have been cloned. Most of these are derived from the gene-rich
0378-1119/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0378-1119(99)00381-9
M. Ruault et al. / Gene 239 (1999) 55–64
telomeric band 21q22.3 (Gardiner, 1997) and from the intensively studied Down Syndrome Critical Region in the chromosome band 21q22.2 (Korenberg et al., 1994). However, the observation of Down Syndrome patients with triplication of only the proximal region of 21q (Korenberg et al., 1994) shows that this region is relevant to the complete phenotype. The most centromeric gene mapped to the long arm of chromosome 21 is STCH (microsome associated stress 70 protein chaperone p60), which is located in the chromosome band 21q11.1. The chromosome region distal to STCH and spanning through the Mx (Myxovirus resistance) genes is syntenic with mouse chromosome 16, but no information is available for the region proximal to STCH. Thus, cloning of active genes in the juxta-centromeric region of chromosome 21 is also a necessary step in completing the human–mouse comparative map and aids in designing new mouse models useful in Down Syndrome studies. To identify transcripts from the juxta-centromeric region of human chromosome 21, we have combined two methods: analysis of a selected cDNA library specific for the YAC 831B6 (Xu et al., 1995; Tassone et al., 1995) and computer-based exon prediction within an 85 kb genomic sequence. In-situ hybridization (Korenberg et al., 1995; Tassone et al., 1995) and STS analysis (Chumakov et al., 1992; Gardiner et al., 1995) showed that the YAC 831B6 maps to the chromosome band 21q11.1 and is not chimeric (Korenberg et al., 1995). This 1.7 Mb YAC consists of a 500 kb single copy DNA fragment adjacent to a >1 Mb DNA segment that hybridizes to alpha satellite and satellite 1 sequences (De Sario et al., 1997; Saupe et al., 1998). An 831B6 selected cDNA library made by Xu et al. (1995) had already been analyzed by Tassone et al. (1995), who isolated three partial cDNAs, #2, #3, and #5, from it.
We showed that cDNA fragment #2 derives from IGSF3 (Immunoglobulin Super Family 3), a 7 kb cDNA that maps to the 1p13 region but has related sequences in 2cen, 13cen and very likely in chromosomes Y and 21 (Saupe et al., 1998). Partial cDNA #3 has sequence identity to the 5′ untranslated region (UTR) of GGT-rel (Gamma Glutamyl Transferase-related) (Tassone et al., 1995), a multicopy gene family with functional copies located in the 22q11.1 chromosome band (Collins et al., 1997). In this work, we aimed to determine whether the juxta-centromeric region of the long arm of human chromosome 21 contains functional genes, and we present the first characterization of the region. 2. Materials and methods 2.1. cDNA library screening The 831B6 selected cDNA library, constructed using a pool of cDNAs obtained from fetal brain, whole fetus,
liver, thymus, testis, and spleen (Xu et al., 1995), was screened twice by plating 3000 and 7000 plaques at a concentration of 350 pfu/90 mm plate. Phages were grown overnight and then transferred to nylon membranes. During the first screening, membranes were hybridized with probes #2, #3, and #5, the cDNA fragments previously isolated from the 831B6 selected cDNA library by Tassone et al. (1995). During the second screening, these probes plus the newly isolated cDNAs were hybridized. In both experiments, negative plaques were picked. To calculate the frequency of the cDNAs in the original library, each clone was hybridized to 200–400 phage plaques derived from the total library. Amplified cDNA inserts were sequenced using C1/C2 λgt10 primers and the Thermo Sequenase radiolabeled terminator cycle sequencing kit (Amersham, USB). 2.2. Mapping 2.2.1. Southern blot Selected cDNA fragments were labeled by a random oligo primer kit (Amersham, USB) and hybridized to a Southern blot containing 10 µg of EcoRI-digested human, mouse, and GM8854A cell-line DNA. GM8854A (Coriell) is a somatic hybrid cell line containing human chromosome 21 in a background of mouse DNA. The selected cDNA fragments were also hybridized to a blot containing rare cutter digested DNA from the YAC 831B6 (Saupe et al., 1998). 2.2.2. PCR Specific primers designed for the partial cDNAs were used to amplify DNA from the monochromosome somatic hybrid cell lines (NIGMS mapping panel 2; Dubois and Naylor, 1993). 2.3. Expression analysis 2.3.1. PCR Thymus and placenta cDNA libraries (Clontech) were screened by PCR with specific primers designed for the selected cDNAs. 2.3.2. RT-PCR RT-PCR experiments were carried out on four human cell lines (HeLa, HepG2, CaCo2, and GM8854A). Total RNA was extracted with Trizol (Gibco-BRL). Poly(A)+ mRNAs were purified with the Oligotex kit (Qiagen).
Reverse transcription was carried out with the First-strand cDNA synthesis kit (Pharmacia) using M-MuLV reverse transcriptase. 2.3.3. RACE RACE experiments were performed with the Marathon cDNA amplification kit (Clontech) using the Expand Long Template PCR System (Boehringer)
according to the manufacturer’s recommendations. The resulting PCR products, after gel purification with Qiaex resin (Qiagen), were cloned using the pGEM-T PCR cloning kit (Promega). 2.3.4. Northern blots Northern blots containing 2 µg of poly(A)+ mRNA from 16 different adult human tissues (Cat #7760-1 and #7766-1, Clontech) and from four fetal human tissues (Cat #7756-1, Clontech) were hybridized with the partial cDNAs in ExpressHyb buffer (Clontech) at high stringency. 2.4. Computational analysis Computational analyses included EST annotation by GAIA (Bailey et al., 1998) at the server http://daphne.humgen.upenn.edu:1024/gaia/index.html and exon prediction by GRAIL (Uberbacher and Mural, 1991) at http://avalon.epm.ornl.gov and by Genscan (Burge and Karlin, 1997) at http://bioweb.pasteur.fr/seqanal/interfaces/genscan.html. The GC level was calculated by the program Window in the GCG package with a sliding window of 1 kb, and CpG islands were detected by GRAIL. The repetitive sequence distribution was obtained using the CENSOR program (Jurka et al., 1996). Partial cDNAs were analyzed by BLAST (Altschul et al., 1990) to perform searches on public databases. Sequences were aligned by CLUSTALW at http://www2.ebi.ac.uk/clustalw. 2.5. BAC library screening A chromosome 21-specific BAC library was obtained by cloning DNA from the somatic hybrid cell line WAV17 in pBeloBAC11 vector (Saupe et al., unpublished). The average size of the clones is about 80 kb and the coverage of the library is approximately two to three times the human chromosome 21 DNA. The library was screened by PCR with the primer pairs specific for the selected cDNA fragments. Positive clones were grown, and BAC ends were sequenced using long T7 (5′-TAATACGACTCACTATAGGGCGAATTCGAGCTCGG-3′) and SP6 (5′-GATTACGCCAAGCTATTTAGGTGACACTATAGAATAC-3′) primers.
The size of the BAC clones was measured by pulsed field gel electrophoresis (PFGE) analysis of NotI-digested DNA and/or by adding up all the EcoRI fragments observed on an ethidium bromide-stained agarose gel. EcoRI and NotI double-digested BAC DNA, after gel electrophoresis and Southern blot, was hybridized with probes corresponding to the partial cDNAs and BAC ends at high stringency according to a standard procedure (Sambrook et al., 1989).
2.6. FISH The BAC DNA was labeled by nick translation with Bio-16-dUTP (Boehringer). Hybridization due to repetitive sequences was suppressed using 100 ng of biotinylated BAC probe and 4 µg of Cot-1 DNA (Gibco) as competitor. Probes were hybridized to normal human metaphase chromosomes as described in Fantes et al. (1992). Hybridization signals were detected with successive layers of avidin FITC, biotinylated anti-avidin, and avidin FITC. The chromosomes were counterstained with DAPI (1 µg/ml in Vectashield). Hybridization signals were visualized using a Zeiss Axioplan epifluorescence microscope. Images were captured using Digital Scientific Smartcapture software.
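The sliding-window GC calculation mentioned in Section 2.4 can be illustrated with a minimal Python sketch. It is not the GCG Window program used in this work: the function names, the non-overlapping 1 kb step, and the toy input sequence are assumptions made only for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Return the molar fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def sliding_gc(seq: str, window: int = 1000, step: int = 1000):
    """Return (start, GC fraction) for every full window along the sequence.

    The step size is an assumption; the source only states a 1 kb window.
    """
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy input (hypothetical, not the AP000023 sequence): a GC-free block
# followed by a GC-rich block, mimicking a sharp change in GC level.
toy = "ATTA" * 500 + "GCGA" * 500  # 2000 bp + 2000 bp
profile = sliding_gc(toy, window=1000, step=1000)

for start, gc in profile:
    print(f"{start:>6} {gc:.2f}")
```

Plotting such a profile along a genomic sequence is one way to see where the GC level changes abruptly, which is the kind of transition interpreted as an isochore boundary in Section 3.4.1.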
3. Results 3.1. Identification of partial cDNAs To isolate new putative coding fragments from the juxta-centromeric region of human chromosome 21, we screened the 831B6 selected cDNA library (Xu et al., 1995) and picked clones that were negative with respect to the cDNAs (#2, #3, and #5) that had been previously isolated by Tassone et al. (1995). Sequencing of the newly isolated partial cDNAs confirmed the efficiency of the negative selection. Six cDNA fragments with a sequence identity either to the previously described IGSF3 (Saupe et al., 1998) or to GGT-rel (Collins et al., 1997) genes were different from the original #2 and #3 sequences. A total of eight cDNA fragments representing six to eight potentially independent transcript units were isolated from the 831B6-selected library by Tassone et al. (1995) and in this work. In parallel, the 85 kb gb:AP000023 genomic DNA sequence, which maps to 21q11.1 and to the YAC 831B6, was analyzed for the presence of putative coding regions (Fig. 2). The coding exons predicted by Grail and Genscan differed. Eight ESTs (AA912432, AJ003657, AA601302, AA928834, AA489537, AA485925, H95905, H95906) and the EIF3-related DNA sequence were also identified. Selected and annotated cDNAs were mapped to human chromosomes, compared to databases, and analyzed for expression (Tables 1 and 2). 3.2. Chromosome mapping 3.2.1. Southern The selected cDNAs derived from the YAC 831B6 were hybridized to EcoRI-digested total human DNA and DNA from chromosome 21 (GM8854A). Each of the partial cDNAs hybridized to several EcoRI bands in human DNA and to at least one band in the somatic hybrid cell line containing chromosome 21. This hybridization confirmed the sequence similarity to chromosome 21 (data not shown). The partial cDNAs were also localized on the physical map of the YAC 831B6 by hybridization (see Fig. 1). The #25 partial cDNA could not be localized because it was too short to be used as a probe.

Table 1
List of the partial cDNA fragments isolated by Tassone et al. (1995) (identified by an asterisk) and isolated in this work

Selected cDNAs | Percentage in the library^a | Size (bp) | Chromosomes^b | HeLa (RT-PCR) | CaCo2 (RT-PCR) | GM8854A (RT-PCR) | HepG2 (RT-PCR) | Thymus
#2* | 15 | 218 | 1, 2, 13, Y | | | | |
#3* | 20 | 175 | 13, 20, 22, Y | | | | |
#5* | 25 | 130 | 13, 15, 21, 22, Y | − | nd | + | + | +
#15 | 15 | 189 | 21, 22 | − | + | + | + | +
#25^c | 2 | 23 | nd | + | + | − | nd | nd
#78 | 5 | 417 | 2, 15, 21, 22 | | | | |
#156 | 0.8 | 128 | 13, 15, 21, 22, Y | + | + | + | nd | nd
#168 | 0.6 | 178 | nd | − | − | − | nd | nd

Annotated cDNAs | ESTs | Size (bp) | Chromosomes^b | HeLa (RT-PCR) | CaCo2 (RT-PCR) | GM8854A (RT-PCR) | HepG2 (RT-PCR) | Thymus
232457 | H95905/H95906 | 408 | 13, 15, 21 | + | − | − | nd | nd
1564624 | AA928834 | 950 | 14, 15, 21, 22, Y | + | − | + | nd | nd
1100759 | AA601302 | 1943 | 13, 14, 15, 21, 22, Y | + | − | + | nd | nd

a Percentage of each selected cDNA in the 831B6-specific library.
b Chromosome mapping by PCR on somatic hybrid cell lines.
c Primers were chosen on the 681 bp U84524 cDNA showing sequence identity to the #25 partial cDNA that was only 23 bp long (see Section 3.3).
nd, not determined.
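The chromosome assignments in Table 1 can be tallied with a short Python sketch to see which clones amplified from chromosome 21 and which map only to acrocentric chromosomes. The dictionary below is a transcription of the Chromosomes column of Table 1 as read here; the variable names are illustrative, and clones scored "nd" (#25, #168) are left out.

```python
# Human acrocentric chromosomes.
ACROCENTRIC = {"13", "14", "15", "21", "22"}

# Chromosome assignments transcribed from the Chromosomes column of Table 1
# (PCR on somatic hybrid cell lines). Clones scored "nd" are omitted.
assignments = {
    "#2": {"1", "2", "13", "Y"},
    "#3": {"13", "20", "22", "Y"},
    "#5": {"13", "15", "21", "22", "Y"},
    "#15": {"21", "22"},
    "#78": {"2", "15", "21", "22"},
    "#156": {"13", "15", "21", "22", "Y"},
    "232457": {"13", "15", "21"},
    "1564624": {"14", "15", "21", "22", "Y"},
    "1100759": {"13", "14", "15", "21", "22", "Y"},
}

# Clones whose PCR assay amplified from the chromosome 21 hybrid.
on_21 = [clone for clone, chroms in assignments.items() if "21" in chroms]

# Clones detected exclusively on acrocentric chromosomes.
only_acro = [clone for clone, chroms in assignments.items()
             if chroms <= ACROCENTRIC]

# Clones with at least one acrocentric and at least one other location.
mixed = [clone for clone, chroms in assignments.items()
         if chroms & ACROCENTRIC and chroms - ACROCENTRIC]

print("amplified from chromosome 21:", on_21)
print("acrocentric chromosomes only:", only_acro)
print("acrocentric plus other chromosomes:", mixed)
```

Under this transcription every listed clone has at least one acrocentric location, while only two are restricted to acrocentric chromosomes.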
Table 2
Blast analysis of the partial cDNAs^a

cDNA | BlastN/X | Accession No. | Percentage identity | P
Selected cDNAs
#2 | Ig-like membrane protein (IGSF3) mRNA | AF031174 | 100 | 1.2e−82
#3 | Gamma-glutamyl transferase-related (GGT-rel) mRNA | M64099 | 94 | 2.7e−27
 | Chromosome 22q11.2 cosmid | AC000051 | 94 | 1.1e−41
#5 | Genomic sequence from chromosome 22 | Z98749 | 100 | 4e−63
#15^b | Genomic sequence from chromosome 22 | AL022476 | 99 | 1.1e−68
 | Genomic sequence from chromosome 21 | AP000023 | 94 | 2.7e−49
#25 | VCF syndrome 22q11 region mRNA | U84522 | 100 | 9.4e−9
#78 | Translation initiation factor 3 47 kDa subunit mRNA | U94855 | 97 | 5.9e−91
 | Genomic sequence from chromosome 21 | AP000023 | 92 | 8.9e−133
#156^b | Genomic sequence from chromosome 22 | AL022476 | 99 | 4.4e−31
 | Genomic sequence from chromosome 21 | AP000023 | 91 | 1.9e−20
#168 | Genomic sequence from chromosome 22 | Z98749 | 100 | 1.2e−30
Annotated cDNAs
AA912432 | Tubulin tyrosine ligase-like 1 | AF104927 | 98 | 1e−165
 | Genomic sequence from chromosome 22 | AL022476 | 100 | 1e−114
 | Genomic sequence from chromosome 21 | AP000023 | 90 | 7e−87
AJ003657 | Genomic sequence from chromosome 21 | AP000023 | 99 | 3.4e−103
AA601302 | Genomic sequence from chromosome 21 | AP000023 | 98 | 1.1e−182
AA928834 | Genomic sequence from chromosome 21 | AP000023 | 100 | 1.3e−85
AA489537 | Genomic sequence from chromosome 21 | AP000023 | 100 | 2e−116
AA485925 | Genomic sequence from chromosome 21 | AP000023 | 98 | 7.7e−170
H95906 | Genomic sequence from chromosome 21 | AP000023 | 99 | 8.3e−104
H95905 | Genomic sequence from chromosome 21 | AP000023 | 97 | 3.3e−108

a Partial cDNAs were isolated from the YAC 831B6 selected library, and ESTs were annotated to the genomic sequence gb:AP000023. ESTs with a sequence identity to EIF3 were identified but are not listed because the full-length cDNA is available.
b cDNAs #15 and #156 also had 99% nucleotide identity to gb:AC000406, a genomic sequence that is assigned to human chromosome 11. Since the localization of these cDNA clones on chromosome 11 by PCR failed and gb:AC000406 was almost 100% identical over 60 kb to gb:AL022476 from chromosome 22, we believe that the gb:AC000406 sequence either was misassigned to chromosome 11 or was derived from a chimeric clone.
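The "Percentage identity" values in Table 2 come from BLAST reports. As a rough illustration of what such a figure measures, the following Python sketch computes percent identity over the columns of a pairwise alignment, assuming the usual convention of matches divided by alignment length with gap columns counted as mismatches. The function name and the toy alignments are hypothetical and are not taken from the sequences analyzed in this work.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns; '-' marks a gap.

    Columns in which both sequences carry a gap are ignored.
    Gap-versus-base columns count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignments (hypothetical): one substitution, then one gap.
substitution = percent_identity("ACGTACGTAC", "ACGTACGTAT")
with_gap = percent_identity("ACGT-CGT", "ACGTACGT")

print(substitution, with_gap)
```

On this definition, a cDNA that is 100% identical to one genomic copy and 90 to 94% identical to another, as for several clones in Table 2, differs from the second copy at roughly one position in every 10 to 17 aligned bases.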
Fig. 1. Physical and Transcript map. The YAC 831B6 physical map (B=BssHII, N=NotI, Nr=NruI, and M=MluI) is as in Saupe et al. (1998). The gray segment on the YAC left end (right-hand side of the figure) corresponds to a >1 Mb DNA fragment that hybridizes to centromeric repetitive sequences and is not shown to scale. Two BAC contigs are anchored on the YAC physical map. BAC orientation is represented as follows: a black square and a black circle correspond to SP6- and T7-ends, respectively. STSs are indicated by vertical bars. BAC ends without an STS correspond to repetitive sequences. Gene symbols are boxed, and gene orientations, when they are known, are shown by an arrow from 5′ to 3′. The gb:AP000023 genomic sequence is represented by a thick black line in the map.
3.2.2. PCR Specific primers were designed to both annotated and selected cDNAs and used to amplify DNA from monochromosome somatic hybrid cell lines. All the transcripts had related sequences in different chromosomes and mapped mostly, but not exclusively, to acrocentric chromosomes (Table 1). Primer pairs for #2 and #3 clones did not amplify DNA from chromosome 21, although the corresponding cDNA fragments did hybridize to DNA from chromosome 21 (Tassone et al., 1995). This result could be due to sequence divergence of the primers used in the PCR analysis. The #168 partial cDNA was not localized on the somatic hybrid panel because the PCR reaction failed on genomic DNA, but the cDNA fragment hybridized to the DNA from human chromosome 21. 3.3. BLAST searches and expression analysis BLAST (Altschul et al., 1990) searches were performed on public databases (Table 2). Three selected cDNAs (#3, #2, and #78) corresponded to known genes: GGT-rel was located in 22q11 (Collins et al., 1997), IGSF3 was localized in 1p13 (Saupe et al., 1998), and EIF3 was unmapped. Five selected cDNAs corresponded to novel sequences: four cDNA fragments (#5, #168, #15, #156) showed 99–100% nucleotide identity to genomic sequences derived from chromosome 22; the #25 partial cDNA, which was only 23 bp long, had 100% sequence identity to a segment of the 681 bp mRNA sequence (gb:U84524) derived from the human velo-cardio-facial syndrome 22q11 region.
One annotated cDNA (EST AA912432), which had 93% sequence identity to chromosome 21 and had 100% nucleotide identity to a genomic sequence from chromosome 22, belongs to human Tubulin tyrosine ligase-like 1 (TTLL1), a gene mapped to 22q13 (Trichet et al., in preparation). The other annotated cDNAs had 97–100% nucleotide sequence identity to chromosome 21 (Table 2) and corresponded to six independent cDNAs. Four cDNAs were purchased from the IMAGE Consortium and sequenced, one cDNA (AA485925 and AA489537) was not available, and one cDNA (AJ003657) was disregarded because it was entirely constituted of repetitive sequences. The full sequences were collinear with the chromosome 21 genomic sequence and had no identity/similarity to any known protein. The percentage of nucleotide identity to chromosome 21 was unchanged compared to that calculated on the EST sequences. Given their 5′-to-3′ orientation in the genomic sequence (see Fig. 2) and the presence of a poly(A) tail, the annotated cDNAs may correspond to four or five independent genes. In one case (H95905/H95906), the 15-nucleotide poly(A) observed in the sequence of the cDNA was also present in the chromosome 21 genomic sequence. This poly(A) suggests that the chromosome 21-related sequence is a retrotransposed pseudogene. Partial cDNAs corresponding to novel sequences were analyzed by RT-PCR in different human cell lines (Table 1). All the clones were expressed in at least one cell line. The selected cDNA #5 was expanded to 2 kb by RACE on thymus cDNAs. The RACE product had no open reading frame, had a sequence similarity to
Fig. 2. Annotated gb:AP000023 genomic sequence: combination of the computational analysis (GAIA, Bailey et al., 1998; Grail, Uberbacher and Mural, 1991; and Genscan, Burge and Karlin, 1997) and cDNA selection. ESTs showing a sequence identity to EIF3 are not represented. In Grail analysis, only exons having a score ≥0.6 are reported. EST orientations are shown by an arrow from 5′ to 3′. Alu and L1 repetitive sequences were detected by CENSOR (Jurka et al., 1996). The GC level was measured with a GCG program, and CpG islands were detected by Grail.
several ESTs, and was collinear with genomic sequences derived from chromosomes 13, 15, 21, 22 (gb:Z98749), and Y. Inspection of 600 bp of the corresponding genomic sequences obtained by amplification of somatic hybrid cell lines showed that the RACE product was 100% identical to chromosome 22 and only 96–98% identical to chromosomes Y, 13, 15, and 21. Thus, very likely, the functional gene corresponding to partial cDNA #5 is located in chromosome 22. The expanded partial cDNA #5 hybridized to two mRNA species of 9 and 7 kb derived from fetal brain and lung poly(A)+ mRNAs, but it did not hybridize to mRNAs derived from adult tissues. The cDNA corresponding to EST AA601302 failed to hybridize to Northern blots. The other annotated cDNAs were not hybridized to Northern blots because they were mainly constituted by repetitive sequences. 3.4. Genomic sequence analysis 3.4.1. GC level and CpG islands Compositional analysis showed that the gb:AP000023 genomic sequence was characterized by two distinct domains: the proximal 40 kb region had a GC level of 40% corresponding to a GC-poor L isochore and extended to the centromere constituted by AT-rich tandemly repetitive DNA sequences; the 45 kb telomeric region was characterized by a GC level of 45%, representing a GC-rich H1 isochore. The boundary between the two isochores was located at position 45 500, where a sharp change of the GC level was observed. Three CpG islands were detected by Grail: two of them
corresponding to Alu sequences were not represented; the third CpG island was located in the GC-poor L isochore where it produced a local increase of the GC level (Fig. 2). 3.4.2. Repetitive sequences Satellite 1 sequences were detected in the most proximal 3 kb region. Satellite 1 sequences map next to the centromere in the q arm of chromosome 21 (Trowell et al., 1993). The SINE and LINE distribution was very uneven, as expected in a region constituted by two different isochores (Soriano et al., 1983). Alu sequences had a very high density (1 Alu/kb) in the GC-rich H1 isochore and a much lower concentration (1 Alu/3.7 kb) in the GC-poor L isochore. This latter concentration was close to the average observed in the human genome. L1 sequences had a density of 1 L1/15 kb in the GC-rich H1 isochore and 1 L1/8 kb in the GC-poor L isochore. Other middle repetitive sequences were observed, among which was a 2 kb cluster of (GATA)n repeats from position 40 000 to 42 000, close to the boundary between the H1 and L isochores. No similar repeats have been previously reported close to isochore borders. 3.5. Pseudogenes Four pseudogenes were mapped to the juxta-centromeric region of the long arm of human chromosome 21 (Fig. 1). ψEIF3 is located 4 kb telomeric to satellite 1 sequences on chromosome 21 and the corresponding gene, coding for the 47 kDa subunit of the EIF3 human translation factor, is located on chromosome 2. The
presence of a sequence related to EIF3 on chromosome 21 was revealed both by the computational analysis (Genscan, GAIA) and by cDNA selection (partial cDNA #78). A full-length EIF3 cDNA was already known (gb:U94855), but neither the genomic (intron/exon) structure nor the mapping location of the gene had been described. Primers designed to the EIF3 cDNA were used to amplify DNA from monochromosome hybrid cell lines. A 1222 bp cDNA was obtained from the DNA of human chromosome 2: the gene consists of a single exon encoding the EIF3 47 kDa subunit. In contrast, the PCR products obtained from chromosomes 15, 21 and 22 and from the YAC 831B6 were pseudogenes because they were truncated in the 5′ region, and their sequences had several deleterious mutations such as stop codons and insertion frameshifts. About 300 kb telomeric to this pseudogene, we detected a second EIF3-related sequence by hybridization of the YAC 831B6 with the #78 partial cDNA. ψTTLL1 maps 71 kb telomeric to ψEIF3. Human TTLL1 (gb:AF104927) maps to the chromosome band 22q13 and comprises 11 exons (Trichet et al., in preparation). The exon/intron structure of the functional gene was deduced by comparing the full-length cDNA to a genomic sequence (gb:AL022476) from chromosome 22. In this work, we compared the genomic sequence gb:AP000023 with the TTLL1 gene, and we showed that the chromosome 21-related sequence is a pseudogene consisting of the last exon and 300 bp derived from the last intron (see also Section 3.7). Distal to ψTTLL1, there are sequences related to IGSF3 and GGT-rel genes. Functional copies of IGSF3 (Saupe et al., 1998) and GGT-rel (Collins et al., 1997) are located in 1p13 and 22q11 chromosome bands, respectively. Eight independent partial cDNAs of these genes were isolated from the selected library. They mapped either to the 5′ UTR of GGT-rel or to exons 5, 6 and 7 of IGSF3, a gene that comprises 11 exons.
Amplification of gene regions different from these failed, suggesting that the YAC 831B6 contains only fragments of the GGT-rel and IGSF3 genes. Less than 50 kb from the right end of the YAC 831B6, we localized a trapped exon, hmc01A06 (Chen et al., 1996), that maps to different acrocentric chromosomes (data not shown). TPTE (sp:P56180; Antonarakis, 1998), the corresponding full-length cDNA expressed in human testis, encodes transmembrane phosphatase tensin. Since the cDNA sequence was not accessible at the time this work was carried out, we could not determine whether the TPTE copy on chromosome 21 is functional or not. 3.6. BAC contig A chromosome 21-specific BAC library (Saupe et al., unpublished) was screened with the cDNA fragments
isolated from the 831B6 selected cDNA library and with the hmc01A06 trapped exon (Chen et al., 1996) that was mapped to the YAC. BAC ends derived from the positive clones were sequenced to search for overlapping clones by chromosome walking. EcoRI-digested DNA patterns were compared to measure BAC overlaps. Two BAC contigs were obtained and anchored on the YAC map by hybridization (Fig. 1). One contig is located in the proximal part of the 500 kb physical map and covers about 130 kb. This overlaps with the AP000023 genomic sequence. The second contig, more distal in the 21q arm, comprises a 120 kb region and spans beyond the right end of the YAC 831B6. To confirm the localization of the two contigs, B1L1C6 and B6L1C3 BACs were hybridized in situ on human metaphase chromosomes. B1L1C6, belonging to the proximal contig, hybridizes to the centromere or slightly distal on the short arm of chromosome 21. A signal is also present in the centromeric region of chromosomes 13 and 22. B6L1C3, which belongs to the distal contig, maps to the centromere or slightly telomeric to the centromere on the long arm of chromosome 21 (Fig. 3A and B). A signal is also observed on 13cen. B7L1C4 contains STSs that are located 300 kb apart in the YAC map. Thus, either the original 300 kb BAC insert was reduced to the current size by a deletion, or the DNA of the somatic hybrid WAV17, which was used to construct the BAC library, is rearranged. The presence of IGSF3 and GGT-rel gene fragments (see Section 3.5) suggests that the region that is not covered by the BAC contigs is unstable because of chromosome duplications. No clone merging this >200 kb gap was found in either chromosome 21-specific BAC or cosmid (Lawrence Livermore) libraries. The two BAC contigs were sequenced by the German Chromosome 21 Sequencing Consortium (Genbank Accession Nos AL078476, AL0449849, AL078472, AL078471). 3.7. 
Interchromosome duplication The chromosome 21 gb:AP000023 and chromosome 22 gb:AL022476 genomic sequences, both containing DNA segments that have nucleotide identity to the #15 and #156 cDNA fragments and to the last exon of the TTLL1 gene (Trichet et al., in preparation), were compared by BLAST2 alignment. A 7.8 kb paralogous domain was identified between the telomeric 22q13 chromosome band, where the functional TTLL1 gene is located, and the 21q11.1 juxta-centromeric region that contains a TTLL1 pseudogene (ψTTLL1). The same genomic intervals were observed among the three transcribed fragments within both genomic sequences, and the average nucleotide sequence identity was 93%. The duplicon is delimited by two breakpoints: the first is located in intron 10 of the TTLL1 gene at positions
Fig. 3. (A) B6L1C3 FITC-labeled BAC hybridized in situ on human metaphase DAPI stained chromosomes. (B) Corresponding black and white DAPI G banding. B6L1C3 maps to 21cen-q11.1 and to 13cen.
12 435 and 148 824 of the chromosome 21 and 22 genomic sequences, respectively; the second breakpoint is located in a cluster of nine tandemly arranged Alu sequences at positions 4732 and 141 891 of the chromosome 21 and 22 genomic sequences, respectively.
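The breakpoint coordinates given in Section 3.7 allow a quick arithmetic check of the interval sizes on each chromosome. The Python sketch below simply subtracts the two coordinates in each genomic sequence; the dictionary keys are illustrative labels, and treating the coordinates as simple positions (without inclusive/exclusive adjustment) is an assumption.

```python
# Breakpoint coordinates quoted in Section 3.7, as (Alu-cluster breakpoint,
# intron 10 breakpoint) in each genomic sequence.
breakpoints = {
    "chr21 (AP000023)": (4732, 12435),
    "chr22 (AL022476)": (141891, 148824),
}

# Interval length between the two breakpoints on each chromosome (bp).
spans = {name: end - start for name, (start, end) in breakpoints.items()}

# Difference in interval length between the two copies (bp).
difference = spans["chr21 (AP000023)"] - spans["chr22 (AL022476)"]

for name, span in spans.items():
    print(f"{name}: {span} bp ({span / 1000:.1f} kb)")
print(f"difference: {difference} bp")
```

The coordinate differences come out at about 7.7 kb on chromosome 21 and 6.9 kb on chromosome 22. How these intervals relate to the 7.8 kb size quoted for the paralogous domain is not stated in the text.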
4. Discussion This work describes one of the few systematic searches for genes in human pericentromeric regions. Previously characterized pericentromeric regions include a 9.75 Mb map across the centromere of human chromosome 10: a PFGE map of the centromeric satellite arrays was integrated with two YAC contigs spanning the p and q arms (Jackson et al., 1999). Similarly, in the centromere of human chromosome 5, the organization of the different alphoid domains was established, and a YAC contig spanning 4.6 Mb in the p and q arm of this chromosome was constructed (Puechberty et al., 1999). In both studies, the position of known genes or cDNA fragments was given with respect to centromeric repetitive sequences, but a systematic search for the genes located in the pericentromeric regions was not carried out. The most proximal genes in chromosomes 10 and 5 are located more than 1.5 Mb from the centromeric alphoid sequences. Here, we present the first characterization of 500 kb in the juxta-centromeric region of the long arm of human chromosome 21. Long-range maps showed that chromosomes 13 and 21 share a similar organization of
the satellite array (Trowell et al., 1993). A BAC map covering 3.4 Mb in the proximal region of human chromosome 21 showed that sequence conservation between chromosome 21 and the other acrocentric chromosomes extends approximately 1.5 Mb from the centromere to the D21S258 marker (Groet et al., 1998). This BAC contig does not include the centromeric repetitive sequences and does not overlap with our map. The juxta-centromeric region of human chromosome 21 harbors paralogous domains due to interchromosome duplications. We identified a 7.8 kb paralogous domain between 22q13, where functional TTLL1 is located, and 21q11.1, which contains the corresponding pseudogene (ψTTLL1). Similarly, the YAC region hybridizing to GGT-rel and IGSF3 probes could correspond to other duplications from chromosomes 1 and 22, but a sequence-ready contig is not yet available. Two interchromosome duplicated loci were already known in the 21q11 chromosome band: KGF, keratinocyte growth factor gene (Zimonjic et al., 1997) and NF1, neurofibromatosis (Regnier et al., 1997). An intrachromosome duplication was also reported (Dutriaux et al., 1994). These results corroborate the fact that pericentromeric regions are the preferential targets for interchromosome duplication of large DNA segments carrying complete or partial genes (Eichler, 1998). In chromosomes 15 and 22, pericentromeric duplications have been associated with genome instability, and this instability is responsible for human diseases. Human juxta-centromeric regions should be sequenced to identify other paralogous segments and to investigate the mechanism of genome instability (Eichler, 1998).
Compositional analysis of 85 kb spanning from satellite 1 sequences to the long arm of chromosome 21 showed that a GC-poor L isochore is located in the most proximal part, and a GC-rich H1 isochore, which begins about 40 kb telomeric to the satellite 1 sequences, spans the rest of the sequence. In a previous study, the GC level of 11 YACs mapped to the proximal part of the chromosome 21 long arm was calculated from their buoyant density (De Sario et al., 1997). The GC content of YAC 831B6 was not calculated because the tandemly repetitive sequences in the YAC biased the measurement of the buoyant density. Thus, the presence of this GC-rich H1 isochore next to the centromeric GC-poor L isochore was not detected in that study.

Results from the sequence analysis of selected cDNA fragments, ESTs and genomic sequence suggest that the juxta-centromeric region is enriched for pseudogenes. We identified four pseudogenes whose functional copies map elsewhere in the genome. From centromere to telomere, we mapped and characterized ΨEIF3 next to satellite 1 sequences on chromosome 21, but we could not establish whether the pseudogene was retrotransposed or arose by a chromosome duplication: because a single exon codes for the protein, the absence of introns cannot distinguish the two mechanisms. We also described ΨTTLL1, an unprocessed pseudogene that results from a genomic duplication, and ΨGGT-rel and ΨIGSF3, which may result from similar genomic duplications. Similarly, five of the 10 selected cDNA fragments, while clearly having closely related sequences on chromosome 21, show 100% identity to sequences of chromosome 22. This finding again suggests potential pseudogenes on chromosome 21 with functional copies residing on chromosome 22. Lastly, the EST data also cannot confirm the presence of functional genes. The ESTs found here have 97–100% identity with the chromosome 21 genomic sequence and are expressed sequences, as verified by RT-PCR. However, none of these ESTs is associated with exon patterns that are predicted by both Grail and Genscan.
This could be due to ESTs derived solely from 3′ UTRs and to failures of the exon prediction programs in this region; it may also be due to pseudogenes.

In this work, we aimed to determine whether the juxta-centromeric region of human chromosome 21 contains functional genes. We conclude that the juxta-centromeric region of the long arm of human chromosome 21 is enriched for retrotransposed pseudogenes and gene fragments transferred by chromosome duplications. Our results do not rule out the possibility that functional genes exist next to the centromere of human chromosome 21, and the adjacent GC-rich H1 isochore is a candidate region in which to search for them.

Interestingly, a similarly high concentration of pseudogenes was observed at the end of the short arm of human chromosome 16, close to the telomeric repeat. The most telomeric 37 kb region of 16p has hits for
several ESTs, but the genomic sequence from chromosome 16 contains multiple stop codons in all potential coding frames (Flint et al., 1997). Subtelomeric and juxta-centromeric regions may share a similar chromosome organization consisting of repetitive sequences next to clustered pseudogenes flanking functional genes. Other juxta-centromeric regions should be sequenced to determine whether this chromosome organization can be generalized.
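The observation that a genomic sequence carries stop codons in all potential coding frames can be illustrated with a short script. The following is only a minimal sketch, not the procedure used in the studies cited above: it assumes the standard genetic code (TAA, TAG, TGA as stop codons), counts stops in the three forward and three reverse-complement frames, and uses hypothetical function names and toy sequences chosen for illustration.

```python
from typing import Dict

# Stop codons of the standard genetic code (assumption: nuclear DNA, standard code).
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Translation table used to complement unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def stop_codons_per_frame(seq: str) -> Dict[str, int]:
    """Count stop codons in each of the six reading frames.

    Keys are '+1', '+2', '+3' (forward strand) and '-1', '-2', '-3'
    (reverse-complement strand); values are stop codon counts.
    """
    seq = seq.upper()
    counts: Dict[str, int] = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Step through complete codons only, starting at this frame offset.
            n = sum(
                1
                for i in range(offset, len(s) - 2, 3)
                if s[i:i + 3] in STOP_CODONS
            )
            counts[f"{strand}{offset + 1}"] = n
    return counts


def all_frames_interrupted(seq: str, min_stops: int = 1) -> bool:
    """True if every one of the six frames contains at least min_stops stop codons."""
    return all(n >= min_stops for n in stop_codons_per_frame(seq).values())


if __name__ == "__main__":
    toy = "ATGTAAGC"  # hypothetical toy sequence, not real chromosome data
    print(stop_codons_per_frame(toy))
    print(all_frames_interrupted(toy))
```

On a real genomic segment, stop codons in every frame argue against a long uninterrupted open reading frame, but they do not by themselves exclude spliced genes whose exons are short; that is one reason exon prediction programs and expression data are considered alongside such counts.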
Acknowledgements

This work was supported by grants from AFM (Association Française contre les Myopathies), ARC (Association Française de Recherche contre le Cancer) and the BIOMED 2 EEC program to G.R. K.G. acknowledges the National Institute for Health for grant HD14479. M.R. is a fellow of the French MRT (Ministère de la Recherche et Technologie). V.T. was supported by a Poste Rouge CNRS. We thank Oliver Clay and Jean Derancourt for helpful suggestions and Patrick Atger for the artwork.
References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic Local Alignment Search Tool. J. Mol. Biol. 215, 403–410.
Antonarakis, S.E., 1998. 10 years of genomics, chromosome 21 and Down Syndrome. Genomics 51, 1–16.
Bailey, L.C., Fischer, S., Schug, J., Crabtree, J., Gibson, M., Overton, C., 1998. GAIA: framework annotation of genomic sequence. Genome Res. 8, 234–250.
Burge, C., Karlin, S., 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94.
Chen, H., Chrast, R., Rossier, C., Morris, M.A., Lalioti, M.D., Antonarakis, S.E., 1996. Cloning of 559 potential exons of genes of human chromosome 21 by exon trapping. Genome Res. 6, 747–760.
Chumakov, I., et al., 1992. Continuum of overlapping clones spanning the entire human chromosome 21q. Nature 359, 380–387.
Collins, J.E., Mungall, A.J., Badcock, K.L., Fay, J.M., Dunham, I., 1997. The organization of the γ-glutamyl transferase genes and other low copy repeats in human chromosome 22q11. Genome Res. 7, 522–531.
De Sario, A., Roizès, G., Allegre, N., Bernardi, G., 1997. A compositional map of the cen-q21 region of human chromosome 21. Gene 194, 107–113.
Dubois, B.L., Naylor, S., 1993. Characterization of NIGMS human/rodent somatic cell hybrid mapping panel 2 by PCR. Genomics 16, 315–319.
Dutriaux, A., Rossier, J., Van Hul, W., Nizetic, D., Theophille, D., Delabar, J.M., Van Broeckhoven, C., Potier, M.C., 1994. Cloning and characterization of a 135- to 500-kb region of homology on the long arm of human chromosome 21. Genomics 22, 472–477.
Eichler, E.E., 1998. Masquerading repeats: paralogous pitfalls of the human genome. Genome Res. 8, 758–762.
Fantes, J.A., Bickmore, W.A., Fletcher, J.M., Ballesta, F., Hanson, I.M., van Heyningen, V., 1992. Submicroscopic deletions at the WAGR locus, revealed by nonradioactive in situ hybridization. Am. J. Hum. Genet. 51, 1286–1294.
Flint, J., Thomas, K., Micklem, G., Raynham, H., Clark, K., Doggett, N.A., King, A., Higgs, D.R., 1997. The relationship between chromosome structure and function at a human telomeric region. Nat. Genet. 15, 252–257.
Gardiner, K., Graw, S., Ichikawa, H., Ohki, M., Joetham, A., Gervy, P., Chumakov, I., Patterson, D., 1995. YAC analysis and minimal tiling path construction for chromosome 21q. Somat. Cell Mol. Genet. 21, 399–414.
Gardiner, K., 1997. Clonability and gene distribution on human chromosome 21: reflections of junk DNA content? Gene 205, 39–46.
Groet, J., Ives, J.H., South, A.P., Baptista, P.R., Jones, T.A., Yaspo, M.L., Lehrach, H., Potier, M.C., Van Broeckhoven, C., Nizetic, D., 1998. Bacterial contig map of the 21q11 region associated with Alzheimer’s disease and abnormal myelopoiesis in Down Syndrome. Genome Res. 8, 385–398.
Jackson, M., Rocchi, M., Thompson, G., Hearn, T., Crosier, M., Guy, J., Kirk, D., Mulligan, L., Ricco, A., Piccininni, S., Marzella, R., Viggiano, L., Archidiacono, N., 1999. Sequences flanking the centromere of human chromosome 10 are a complex patchwork of arm-specific sequences, stable duplications and unstable sequences with homologies to telomeric and other centromeric locations. Hum. Mol. Genet. 8, 205–215.
Jurka, J., Klonowski, P., Dagman, V., Pelton, P., 1996. CENSOR a program for identification and elimination of repetitive elements from DNA sequences. Comput. Chem. 20, 119–122.
Korenberg, J.R., et al., 1994. Down syndrome phenotypes: the consequences of chromosome imbalance. Proc. Natl. Acad. Sci. USA 91, 4997–5001.
Korenberg, J.R., Chen, X.N., Mitchell, S., Fannin, S., Gerwehr, S., Cohen, D., Chumakov, I., 1995. A high-fidelity physical map of human chromosome 21q in Yeast Artificial Chromosomes. Genome Res. 5, 427–443.
Puechberty, J., Laurent, A.M., Gimenez, S., Billault, A., Brun, M.E., Calenda, A., Marçais, B., Prades, C., Ioannou, P., Yurov, Y., Roizès, G., 1999. Genetic and physical analyses of the centromeric and pericentromeric regions of human chromosome 5: recombination across 5cen. Genomics 56, 274–287.
Regnier, V., Meddeb, M., Lecointre, G., Richard, F., Duvergner, A., Nguyen, V., Dutrillaux, B., Danglot, G., 1997. Emergence and scattering of multiple neurofibromatosis (NF1)-related sequences during hominoid evolution suggest a process of pericentromeric interchromosomal transposition. Hum. Mol. Genet. 6, 9–16.
Sambrook, J., Fritsch, E.F., Maniatis, T., 1989. Molecular Cloning: A Laboratory Manual. 2nd edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Saupe, S., Roizès, G., Peter, M., Boyle, S., Gardiner, K., De Sario, A., 1998. Molecular cloning of a human cDNA IGSF3 encoding an Immunoglobulin-like membrane protein: expression and mapping to chromosome 1p13. Genomics 52, 305–311.
Soriano, P., Meunier-Rotival, M., Bernardi, G., 1983. The distribution of interspersed repeats is non uniform and conserved in the mouse and human genomes. Proc. Natl. Acad. Sci. USA 80, 1816–1820.
Tassone, F., Xu, H., Burkin, H., Weissman, S., Gardiner, K., 1995. cDNA selection from 10 Mb of chromosome 21 DNA: efficiency in transcriptional mapping and reflections of genome organization. Hum. Mol. Genet. 4, 1509–1518.
Trowell, H., Nagy, A., Vissel, K.H., Choo, A., 1993. Long-range analyses of the centromeric regions of human chromosomes 13, 14 and 21: identification of a narrow domain containing two key centromeric DNA elements. Hum. Mol. Genet. 2, 1639–1649.
Uberbacher, E.C., Mural, R., 1991. Locating protein coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc. Natl. Acad. Sci. USA 88, 11261–11265.
Xu, H., Wei, H., Tassone, F., Graw, S., Gardiner, K., Weissman, S.M., 1995. A search for genes from the dark band regions of human chromosome 21. Genomics 27, 1–8.
Zimonjic, D.B., Kelley, M.J., Rubin, J.S., Aaronson, S.A., Popescu, N.C., 1997. Fluorescence in situ hybridization analysis of keratinocyte growth factor gene amplification and dispersion in evolution of great apes and humans. Proc. Natl. Acad. Sci. USA 94, 11461–11465.