doi:10.1006/geno.2001.6681, available online at http://www.idealibrary.com on IDEAL
Article
Physical and Transcript Map of the Hereditary Prostate Cancer Region at Xq27
Dietrich A. Stephan,1,* Gareth R. Howell,2 Tanya M. Teslovich,1 Alison J. Coffey,2 Lorie Smith,3 Joan E. Bailey-Wilson,4 Lindsay Malechek,1 Derek Gildea,1 Jeffrey R. Smith,5 Elizabeth M. Gillanders,1 Johanna Schleutker,6 Ping Hu,1 Helen E. Steingruber,2 Pawandeep Dhami,2 Christiane M. Robbins,1 Izabela Makalowska,7 John D. Carpten,1 Raman Sood,1 Steve Mumm,8 Rolland Reinbold,8 Tom I. Bonner,9 Agnes Baffoe-Bonnie,4,10 Lukas Bubendorf,1 Mervi Heiskanen,1 Olli P. Kallioneimi,1 Andreas D. Baxevanis,1 Shirin S. Joseph,2 Ileana Zucchi,8 Robert D. Burk,3 William Isaacs,11 Mark T. Ross,2 and Jeffrey M. Trent1
1 Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
2 The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
3 Albert Einstein College of Medicine, Bronx, New York 10461, USA
4 Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, Maryland 21224, USA
5 Vanderbilt University Medical Center, Department of Medicine, Division of Genetic Medicine, Nashville, Tennessee 37232, USA
6 Laboratory of Cancer Genetics, Tampere University Hospital, 33521 Tampere, Finland
7 Gene Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
8 Istituto di Tecnologie Biomediche, Consiglio Nazionale delle Ricerche, 20090 Segrate, Milan, Italy
9 Laboratory of Genetics, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892, USA
10 Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA
11 Department of Urology, Johns Hopkins University, Baltimore, Maryland 21287, USA
* To whom correspondence and reprint requests should be addressed. Fax: (202) 884-6014. E-mail: [email protected].
We have recently mapped a locus for hereditary prostate cancer (termed HPCX) to the long arm of the X chromosome (Xq25–q27) through a genome-wide linkage study. Here we report the construction of an ~9-Mb sequence-ready bacterial clone contig map of Xq26.3–q27.3. The contig was constructed by screening BAC/PAC libraries with markers spaced at ~85-kb intervals. We identified overlapping clones by end-sequencing framework clones to generate 407 new sequence-tagged sites, followed by PCR verification of overlaps. Contig assembly was based on clone restriction fingerprinting and the landmark information. We identified a minimal overlap contig for genomic sequencing, which has yielded 7.7 Mb of finished sequence and 1.5 Mb of draft sequence. The transcriptional mapping effort localized 57 known and predicted genes by database searching, STS content mapping, and sequencing, followed by sequence annotation. These transcriptional units represent candidate genes for HPCX and multiple other hereditary diseases at Xq26.3–q27.3.
Key words: physical mapping, genome mapping, genome sequencing, contig mapping, bacterial artificial chromosomes, P1 artificial chromosomes, prostate cancer, Xq26.3–q27.3, sequence annotation, repetitive elements
GENOMICS Vol. 79, Number 1, January 2002 Copyright © 2002 by Academic Press. All rights of reproduction in any form reserved. 0888-7543/01 $35.00
INTRODUCTION
Prostate cancer is a major health concern, with over 200,000 new cases diagnosed in the United States each year. In the US alone, it accounts for more than 35% of all cancer cases affecting men and results in 40,000 deaths annually. There is thought to be substantial genetic heterogeneity underlying hereditary prostate cancer (HPC), as several prostate cancer susceptibility loci have been reported, including 1q24–q25 [1], 1q42.2–q43 [2], 16q23.2 [3], and 20q13 [4]. Previously, we carried out a genome-wide scan of US, Swedish, and Finnish families at high risk for prostate cancer and revealed evidence of a major prostate cancer susceptibility locus (HPCX; MIM 300147) on Xq [5]. The HPCX locus is implicated in ~40% of hereditary prostate cancers in Finland, and thus is perhaps the most prevalent etiologic gene for the disease in this population. Prostate cancer linkage to markers from the Xq26–q28 interval has been confirmed by additional independent data sets [6,7]. These findings suggest that the HPCX gene defect may account for a substantial proportion of the world incidence of hereditary prostate cancer.
These linkage analyses set the stage for identification of causative and susceptibility genes in some families. Identification of a major prostate cancer susceptibility gene would be a significant step towards successful molecular diagnostics and may assist in targeted therapeutics for hereditary prostate cancer cases. To this end, there have been reports of association between mutations in several genes and the onset of prostate cancer, but these cases are either sporadic or occur in smaller familial cohorts. For example, the gene ELAC2 has recently been implicated in hereditary prostate cancer in some families from Utah [8], but this finding has not been replicated [9,10]. Despite the certainty of genetic heterogeneity, it seems reasonable to assume that the HPCX region will be found to harbor a gene that accounts for a proportion of hereditary prostate cancer cases. Accordingly, we have focused significant effort on identifying this susceptibility gene through physical mapping and sequencing of this region of the genome.
In addition to HPCX, several other hereditary disease loci have been genetically mapped to Xq26.3–q27.3, including the albinism-deafness syndrome locus (ADFN; MIM 300700). Albinism-deafness syndrome is characterized by partial albinism, or the “piebald” phenotype, and congenital deafness [11,12]. It has been suggested that albinism-deafness syndrome is an X-linked dominant form of Waardenburg syndrome type II [13]. Another example is anophthalmos-1 (ANOP1; MIM 301590), whose locus has been genetically mapped to Xq27–q28. Anophthalmos shows X-linked recessive inheritance and is characterized by ankyloblepharon, radiologically demonstrable underdevelopment of the bony orbits, and mental retardation (IQ less than 50) [14,15]. In addition, the testicular germ cell tumor-1 locus (TGCT1; MIM 300228) has been mapped to Xq27–q28. Testicular germ-cell tumors affect 1 in 500 men and are the most common cancer in males aged 15 to 40 in western European populations. The risk to brothers of men with TGCT is nearly twice the relative risk to fathers and sons, consistent with X-linked inheritance [16].
High-resolution physical mapping and sequencing of this region of the genome also provide a resource to identify the etiologic mutations in these disorders. To this end, a yeast artificial chromosome (YAC) contig covering the region Xq26–qter was previously constructed, and sequence tagged sites (STSs) were mapped at ~85-kb intervals across the contig [17]. We built upon this framework STS map to generate our BAC/PAC contig.
BAC and PAC clones have been assembled to form an ~9-Mb contig map covering Xq26.3–q27.3 between the loci DXS1192 and DXS548. There are three gaps within the contig map, each with an average size of one clone length, or 150–200 kb, as shown by fluorescent in situ hybridization to extended interphase fibers (fiber-FISH). A minimum tiling set of 89 clones has been identified and is currently being sequenced, with most of the region now available as finished or draft sequence. Several known genes and potential novel transcripts have been precisely mapped by analysis of finished sequence. These mapped transcriptional units serve as candidate genes for HPCX, as well as several other hereditary diseases.
TABLE 1: Markers used to generate a framework BAC/PAC physical map based on the Zucchi et al. [17] map
Known genes: CDR1, FMR1, LDOC1, MCF2, FMR2, FRAXAC1, HPRT1
cDNA clones: AA169138, H67143, R83022
STSs: A006J30, A009C30, AFM136YB10, AFMa113zf5, CHLC.ATA24C11, CHLC.ATA25B04, CHLC.ATA27F11, CHLC.GATA74D04, CIT-HSP-433M19, CIT-HSP-507I15, D3S2390, DXS105, DXS119, DXS1192, DXS1193, DXS1200, DXS1205, DXS1211, DXS1215, DXS1227, DXS1232, DXS1286, DXS1289, DXS1324, DXS1337, DXS1341, DXS1343, DXS1344, DXS152, DXS185, DXS259, DXS292, DXS293, DXS295, DXS296, DXS297, DXS312, DXS369, DXS465, DXS532, DXS533, DXS548, DXS6709, DXS6729, DXS6738, DXS6751, DXS6798, DXS6806, DXS7004E, DXS7006E, DXS7049, DXS7087, DXS7089, DXS7094, DXS7096, DXS7143, DXS7158, DXS7262, DXS7281, DXS7302, DXS7305, DXS7306, DXS7371, DXS7373, DXS7375, DXS7376, DXS7377, DXS7378, DXS7379, DXS7381, DXS7382, DXS7383, DXS7385, DXS7386, DXS7388, DXS7389, DXS7391, DXS7395, DXS7396, DXS7397, DXS7398, DXS7400, DXS7401, DXS7402, DXS7403, DXS7404, DXS7408, DXS7409, DXS7410, DXS7411, DXS7413, DXS7414, DXS7416, DXS7419, DXS7444E, DXS7482, DXS7503, DXS7524, DXS7536, DXS7553, DXS7635, DXS7825, DXS7832, DXS7833, DXS7834, DXS7846, DXS7847, DXS7857, DXS7874, DXS7875, DXS7876, DXS7892, DXS7893, DXS7902, DXS7908, DXS7917, DXS8013, DXS8043, DXS8045, DXS8073, DXS8084, DXS8091, DXS8106, DXS8148, DXS8151E, DXS8215, DXS8229, DXS8232, DXS8272, DXS8273, DXS8287, DXS8288, DXS8289, DXS8295, DXS8303, DXS8309, DXS8312, DXS8313, DXS8316, DXS8317, DXS8319, DXS9317, DXS9739, DXS98, DXS984, DXS998, RP_L10, SGC30410, SGC32232, SGC32493, SGC34657, SHGC-17233, SHGC-31764, sts-D45526, stSG10280, StSG12776, stSG13258, stSG15553, stSG15731, stSG16157, StSG16307, stSG1667, stSG2066, stSG22504, stSG2682, StSG28412, stSG29330, stSG29555, stSG29823, stSG30310, StSG31543, stSG38950, stSG3962, stSG4230, stSG43255, StSG4528, stSG4748, stSG4810, stSG8133, stSG8423, StSG8451, stSG8474, stSG8804, stSG9235, sts-H78267, sts-H93110, sts-H98521, sts-L08893, sts-M11309, sts-N21327, sts-N33366, sts-N34966, sts-R87104, sts-T83641, sts-T90453, sts-V00530, sts-W44435, sWXD1117, sWXD1208, sWXD1341, sWXD1344, sWXD1447, sWXD1449, sWXD179, sWXD2238, sWXD2462, sWXD28, sWXD29, sWXD398, sWXD575, sWXD639, sWXD864, sWXD883, TIGR-A004F02, TIGR-A007G08, TIGR-A007J06, WI-11212, WI-11315, WI-11365, WI-11452, WI-11835, WI-12657, WI-12764, WI-13459, WI-13557, WI-14785, WI-14955, WI-16556, WI-16747, WI-16817, WI-18472, WI-18960, WI-18961, WI-20478, WI-3908, WI-4468
FIG. 1. An illustration of the contig map generated between DXS1192 and DXS548. Key landmark data are shown, and the contigs covering the region are represented by the red vertical bars. Smaller vertical bars represent the minimum set of clones identified for sequencing, and the color indicates the sequence status: white, no sequence available; yellow, working draft sequence available; black, finished sequence. GenBank/EMBL accession numbers are given where available. For clones that have not yet been assigned accession numbers, the official library name is given. The conversion from accession numbers to clone names is shown in Table 3. Gaps between contigs are represented by horizontal green bars.
RESULTS
Generation of Sequence-Ready Contigs
A bacterial clone contig map has been constructed across Xq26.3–q27.3 between the loci DXS1192 and DXS548 using a combination of landmark content mapping and restriction digestion fingerprinting. Both publicly available STSs and BAC/PAC end-STSs generated in-house were used to screen either PCR pools or gridded arrays of bacterial clone libraries. The framework map was constructed using previously published STS markers at an average spacing of 85 kb, as well as microsatellite and EST markers from UniGene (http://www.ncbi.nlm.nih.gov/UniGene/index.html; Table 1). All positive clones were confirmed by single-colony PCR.
A subset of the BAC and PAC clones identified by markers known to map to this interval was subjected to direct end-sequencing to generate new STSs, primarily to facilitate chromosome walking. A total of 269 new STSs were generated from BAC end-sequencing at NHGRI and have been subsequently mapped to the DXS1192–DXS548 interval (the novel STS primer sequences are available; see supplementary data). End-sequences were screened with RepeatMasker to identify repetitive elements and compared against the NCBI nonredundant nucleotide database to identify homologous sequences, including gene and EST homologies [18]. These new STSs were used to further screen BAC and PAC library DNA pools to provide more complete coverage. In total, 523 STSs (116 previously identified STSs and 407 STSs identified at NHGRI and the Sanger Centre) were used to identify 244 PACs and 457 BACs.
Clones were subjected to restriction digestion fingerprinting to establish the extents of overlaps between them. Additional clones from the interval were identified in the Washington University Human Fingerprinted Contigs Database (http://genome.wustl.edu/gsc/human/human_database.shtml), either by searching for shared clones or by experimental comparison of clone fingerprints between the two data sets. This approach has produced four contigs covering an estimated 9 Mb, or 96%, of the Xq26–q27.3 region (Fig. 1). The contig that extends to DXS1192 has been joined to a contig assembled separately that extends to DXS8093 in proximal Xq25. The three remaining gaps have been sized to approximately one clone length each (~150 kb) using FISH on extended DNA fibers (Fig. 2). Attempts are being made to close these remaining gaps, but the identification of bridging clones is complicated by the occurrence of several low-copy repeated sequences throughout the region.
FIG. 2. An image showing the gap size between Chr_Xctg104 and Chr_Xctg100. A clone from the end of each contig has been labeled and hybridized to extended DNA fibers. Gap sizes are estimated by comparing the length of the gap with the length of the signal for each clone.
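The fiber-FISH gap sizing described above rests on a simple proportionality: the measured length of the gap is scaled by the kb-per-unit ratio of a flanking clone signal of known insert size. The following is a minimal sketch of that arithmetic only. The function names, the image measurements, and the insert sizes are hypothetical illustrations, not values from this study.

```python
def estimate_gap_kb(gap_len, clone_signal_len, clone_size_kb):
    """Scale a measured gap length by the kb-per-unit ratio of one flanking clone.

    gap_len and clone_signal_len are in the same arbitrary image units.
    """
    if gap_len < 0 or clone_signal_len <= 0 or clone_size_kb <= 0:
        raise ValueError("lengths and clone size must be positive")
    kb_per_unit = clone_size_kb / clone_signal_len
    return gap_len * kb_per_unit


def estimate_gap_from_flanks(gap_len, flanks):
    """Average the estimates obtained from each flanking clone.

    flanks is a list of (signal_length, insert_size_kb) tuples.
    """
    estimates = [estimate_gap_kb(gap_len, sig, kb) for sig, kb in flanks]
    return sum(estimates) / len(estimates)


# Hypothetical measurements (arbitrary image units) for two flanking clones.
flanks = [(40.0, 160.0), (50.0, 180.0)]
gap_kb = estimate_gap_from_flanks(38.0, flanks)
print(f"estimated gap: {gap_kb:.1f} kb")
```

Averaging over both flanking clones is one possible design choice; it dampens the effect of uneven fiber stretching on either side of the gap.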
A minimum set of 89 overlapping clones was selected from the contigs for genomic sequencing. These were re-fingerprinted to confirm the integrity of the DNA and subjected to FISH to ensure that they mapped to the expected chromosome region. Figure 1 illustrates the extent of the contig map, showing the key landmarks, the clones identified for sequencing, and the current sequence status of each. (A detailed version of the contigs, including all clones identified and all marker data, can be viewed in the context of other sequence-ready contigs on the X chromosome via http://www.sanger.ac.uk/HGP/ChrX.)
Transcript Mapping and Identification
For the identification and mapping of transcriptional units within the Xq26.3–q27.3 contig, we performed database searching and sequencing. A thorough search of the NCBI human radiation hybrid mapping data GeneMap98 [19] was performed for an expanded region (DXS1192–DXS1193) due to the relatively low resolution of the Genebridge4 radiation hybrid panel used for EST mapping. Primers derived from EST or gene sequence were tested to ensure that they amplified genomic DNA and were mapped to the contig. All EST primers were tested against all 701 BAC/PAC clones and shown to reside within the interval; the corresponding transcripts thus serve as candidates for disorders linked to Xq26.3–q27.3. With the finished sequence now in hand, the results of PCR EST mapping have been reiterated and are available via the Sanger and Ensembl web sites (http://www.sanger.ac.uk and http://www.ensembl.org, respectively).
Of the 89 clones in the minimum tiling set, 86 have been subjected to genomic sequencing so far using the methods described by the International Human Genome Sequencing Consortium [20]: 76 clones have produced 7.7 Mb of finished sequence, and 10 clones have produced 1.5 Mb of draft sequence. Therefore, approximately 90% of the target region of DXS1192–DXS548 has sequence available. Sequence contigs were searched against the NCBI nonredundant nucleotide database using the PowerBlast algorithm [21,22].
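The minimum tiling set mentioned above can be thought of as an interval-cover problem: choose the fewest clones whose extents together span the contig. The text does not say how the 89 clones were actually chosen, so the sketch below is only an illustration using the classic greedy interval-cover strategy. The clone names and coordinates are hypothetical, and the sketch ignores practical constraints such as a required minimum overlap between neighbouring clones.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start_kb, end_kb) within a contig


def minimal_tiling_path(clones: List[Clone], start: int, end: int) -> List[str]:
    """Greedy interval cover of [start, end].

    At each step, among clones starting at or before the current position,
    pick the one that extends furthest to the right.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[str] = []
    pos = start
    i = 0
    while pos < end:
        best = None
        # Scan every clone that starts at or before the covered position.
        while i < len(ordered) and ordered[i][1] <= pos:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= pos:
            raise ValueError(f"gap in clone coverage at {pos} kb")
        path.append(best[0])
        pos = best[2]
    return path


# Hypothetical clone extents (kb) within one contig.
clones = [
    ("A", 0, 150),
    ("B", 100, 260),
    ("C", 120, 300),
    ("D", 280, 450),
    ("E", 290, 430),
]
print(minimal_tiling_path(clones, 0, 450))
```

When coverage is interrupted, the function raises an error rather than silently skipping the gap, which mirrors the way gaps between contigs have to be handled separately.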
TABLE 2: Known and predicted genes in Xq26.3–q27.3
Ensembl gene ID | Gene description | Location (bp) | Location (Mb) | Clone accession
ENSG00000123719 | [family] LDOC1 protein | 135659061 - 135659396 | 135.7 | AL136169
ENSG00000101926 | [family] high mobility group protein (HMG) | 135719209 - 135719687 | 135.7 | Z83826
ENSG00000101928 | DJ473B4.1 (novel protein similar to predicted human and worm genes) | 135781981 - 135790067 | 135.8 | Z83826
ENSG00000123720 | unknown | 135810726 - 135810869 | 135.8 | Z83826
ENSG00000123710 | unknown | 135828634 - 135833646 | 135.8 | AC004676
ENSG00000123711 | unknown | 135885143 - 135885184 | 135.9 | AC004676
ENSG00000101930 | unknown | 135885303 - 135909366 | 135.9 | AC004676
ENSG00000052406 | unknown | 136080146 - 136084069 | 136.1 | AC004387
ENSG00000101965 | hypoxanthine-guanine phosphoribosyltransferase (HGPRT) | 136181632 - 136208966 | 136.2 | AC004383
ENSG00000123715 | unknown | 136221960 - 136222130 | 136.2 | AC004383
ENSG00000101968 | unknown | 136256971 - 136304661 | 136.3 | AC002408
ENSG00000102193 | unknown | 136549559 - 136578819 | 136.5 | AC025232
ENSG00000129678 | [family] glyceraldehyde-3-phosphate (GAPDH) | 136568139 - 136568891 | 136.6 | AC025232
ENSG00000036440 | sodium/hydrogen exchanger 6 (NHE-6) | 136613227 - 136675353 | 136.6 | AC025232
ENSG00000129683 | unknown | 136749635 - 136750641 | 136.7 | AL445247
ENSG00000022267 | skeletal muscle LIM-protein 1 (SLIM 1) | 136825562 - 136830509 | 136.8 | AL078638
ENSG00000129680 | cDNA FLJ12649 fis, clone NT2RM4002044 | 136835975 - 136845106 | 136.8 | AL078638
ENSG00000102197 | cDNA FLJ12401 fis, clone MAMMA1002796 | 136847773 - 136860439 | 136.8 | AL078638
ENSG00000102235 | [family] protein kinase | 136915122 - 136915841 | 136.9 | AL078638
ENSG00000102239 | bombesin receptor subtype-3 (BRS-3) | 137107123 - 137111596 | 137.1 | Z97632
ENSG00000129681 | unknown | 137116473 - 137116658 | 137.1 | Z97632
ENSG00000102241 | HIV-1 transcriptional elongation factor TAT | 137116785 - 137131503 | 137.1 | Z97632
ENSG00000102243 | R76043 | 137151423 - 137175956 | 137.2 | Z97632
ENSG00000129677 | unknown | 137266954 - 137267169 | 137.3 | AL135783
ENSG00000102245 | CD40 ligand (CD40-L) | 137267337 - 137279533 | 137.3 | AL135783
ENSG00000129675 | KIAA0006 protein (fragment) | 137285138 - 137326137 | 137.3 | AL135783
ENSG00000018887 | unknown | 137327819 - 137386137 | 137.3 | AL135783
ENSG00000102250 | heterogeneous nuclear ribonucleoprotein G (hnRNP G) | 137428331 - 137488722 | 137.4 | AC022220
ENSG00000129676 | unknown | 138330843 - 138330941 | 138.3 | AL035443
ENSG00000091813 | zinc finger protein of the cerebellum ZIC3 | 138349331 - 138354729 | 138.3 | AL035443
ENSG00000129679 | [family] SNRPN upstream reading frame protein | 139134501 - 139134710 | 139.1 | AL035262
ENSG00000129682 | fibroblast growth factor-13 (FGF-13) | 139785574 - 139865234 | 139.8 | AL031386
ENSG00000101981 | coagulation factor IX precursor (Christmas factor) | 140529017 - 140561710 | 140.5 | AL033403
ENSG00000127648 | unknown | 140579719 - 140579831 | 140.6 | AL033403
ENSG00000101977 | proto-oncogene DBL precursor (contains MCF2) | 140580023 - 140630707 | 140.6 | AL033403
ENSG00000127646 | [family] guanine nucleotide exchange factor DBS | 140643820 - 140643909 | 140.6 | AL033403
ENSG00000101974 | [family] potential phospholipid transporting ATPase | 140727155 - 140766604 | 140.7 | AL356785
ENSG00000127647 | [family] potential phospholipid transporting ATPase | 140786444 - 140786595 | 140.8 | AL356785
ENSG00000127645 | [family] potential phospholipid transporting ATPase | 140794525 - 140795577 | 140.8 | AL356785
ENSG00000127644 | [family] potential phospholipid transporting ATPase | 140800488 - 140802851 | 140.8 | AL356785
ENSG00000102014 | nuclear-associated protein SPANXb | 142282454 - 142283409 | 142.3 | AC025400
ENSG00000102015 | cerebellar-degeneration-related antigen 1 (CDR34) | 142524890 - 142525560 | 142.5 | AL078639
ENSG00000119053 | [family] DNA segment, human EST 478828 (fragment) | 142788311 - 143260667 | 142.8 | AL357075
ENSG00000119055 | [family] C330027G06RIK protein fragment | 142811669 - 142811857 | 142.8 | AL357075
ENSG00000119052 | unknown | 142853967 - 142854263 | 142.9 | AL357075
ENSG00000119054 | unknown | 142858547 - 142858754 | 142.9 | AL357075
ENSG00000127015 | nuclear-associated protein SPANXb | 143253265 - 143254380 | 143.3 | AL031078
ENSG00000078685 | unknown | 143265292 - 143846084 | 143.3 | AL121881
ENSG00000127020 | ribosomal protein L44 | 143401172 - 143401864 | 143.4 | Z98950
ENSG00000127017 | cancer-testis-associated protein | 143504105 - 143505138 | 143.5 | AL109799
ENSG00000127019 | hypothetical protein CGI-79 | 143523715 - 143526825 | 143.5 | AL109799
ENSG00000046767 | nuclear-associated protein SPANXa | 143840257 - 143841194 | 143.8 | AL121881
ENSG00000127018 | nuclear-associated protein SPANXa | 143846253 - 143847283 | 143.8 | AL121881
ENSG00000102033 | [family] melanoma-associated antigen (MAGE antigen) | 144137584 - 144137976 | 144.1 | AL023279
ENSG00000127021 | melanoma-associated antigen (DA232G24.2) | 144161586 - 144165011 | 144.2 | AL023279
ENSG00000127014 | [family] melanoma-associated antigen (MAGE antigen) | 144429057 - 144429704 | 144.4 | AL023773
ENSG00000046774 | cancer-testis antigen CT10 | 144450767 - 144461462 | 144.5 | AL031073
Genes were predicted by the Ensembl analysis pipeline from either a GeneWise or Genscan prediction followed by confirmation of the exons by comparisons to protein, cDNA, and EST databases.
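To treat the entries of Table 2 as positional candidates in practice, one typically needs to ask which genes fall inside a given coordinate interval. The sketch below shows one way to do this, using four rows copied from Table 2. The class and function names and the query interval are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    ensembl_id: str
    description: str
    start: int  # bp, as listed in Table 2
    end: int    # bp, as listed in Table 2
    clone: str  # clone accession


# Four rows transcribed from Table 2.
GENES = [
    Gene("ENSG00000101965", "hypoxanthine-guanine phosphoribosyltransferase (HGPRT)",
         136181632, 136208966, "AC004383"),
    Gene("ENSG00000102245", "CD40 ligand (CD40-L)",
         137267337, 137279533, "AL135783"),
    Gene("ENSG00000091813", "zinc finger protein of the cerebellum ZIC3",
         138349331, 138354729, "AL035443"),
    Gene("ENSG00000101981", "coagulation factor IX precursor (Christmas factor)",
         140529017, 140561710, "AL033403"),
]


def genes_in_interval(genes: List[Gene], start: int, end: int) -> List[Gene]:
    """Return genes overlapping the closed interval [start, end], sorted by start."""
    hits = [g for g in genes if g.start <= end and g.end >= start]
    return sorted(hits, key=lambda g: g.start)


def span_kb(gene: Gene) -> float:
    """Genomic span of a gene in kb, counting both end coordinates."""
    return (gene.end - gene.start + 1) / 1000


# Hypothetical query interval: 137.0 to 139.0 Mb.
hits = genes_in_interval(GENES, 137_000_000, 139_000_000)
for g in hits:
    print(g.ensembl_id, g.description, f"{span_kb(g):.1f} kb")
```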
Before gene analysis, repeat sequences were removed using RepeatMasker (Arian Smit and Paul Green, unpublished data). Sequence data were organized and initially analyzed using WebBlast [18]. Sequence contigs were subsequently analyzed with several gene prediction programs, including GRAIL, GENSCAN, MZEF, and FGENES [23–26] using an analysis workbench called GeneMachine (http://genome.nhgri.nih.gov/genemachine). All BLAST information as well as exon prediction results were viewed and annotated using Sequin (http://www.ncbi.nlm.nih.gov/Sequin). Several known genes and novel transcripts were identified by sequencing. Annotation of the current X chromosome contigs has resulted in the precise localization of 57 genes (http://www.ensembl.org/perl/contigview?chr=X&vc_start=138400000&vc_end=153000000&imgmap=1&x=23&y=16), which can now be considered as positional candidates (Table 2).
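As a small illustration of what removing repeats before gene analysis can look like computationally, the sketch below converts a soft-masked sequence into a hard-masked one. It assumes the input marks repeats in lower case, which is a common convention but is not stated in the text, and it is not a substitute for RepeatMasker itself. The function names and the example sequence are hypothetical.

```python
def hard_mask(seq: str, mask_char: str = "N") -> str:
    """Replace lower-case (assumed repeat) bases with mask_char."""
    return "".join(mask_char if base.islower() else base for base in seq)


def masked_fraction(seq: str) -> float:
    """Fraction of bases marked as repeat (lower case) in a soft-masked sequence."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base.islower()) / len(seq)


# Hypothetical soft-masked fragment: the lower-case run stands for a repeat.
fragment = "ACGTacgtACGT"
print(hard_mask(fragment))
print(f"{masked_fraction(fragment):.2f}")
```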
DISCUSSION
Several hereditary disease loci have been shown to map to the Xq26.3–q27.3 interval, including HPCX, ODPF, EBM, ANOP1, and TGCT1. An approximately 9-Mb sequence-ready bacterial clone contig spanning the region between DXS1192 and DXS548 was constructed using a framework YAC contig [17]. BAC and PAC libraries were screened with markers placed approximately every 85 kb across the region. We identified 407 new STSs by end-sequencing BAC or PAC clones, and these, combined with restriction fingerprinting, were used to assemble a contig with only three small gaps. Before sequencing, a minimal tiling pathway was identified and chimeric clones were excluded from further analysis using metaphase fluorescent in situ hybridization. So far, our sequencing efforts with this contig have yielded 7.7 Mb of finished sequence and 1.5 Mb of draft sequence. Annotation of the region has been carried out as previously described and is available (http://www.sanger.ac.uk and http://www.ensembl.org).
Fiber-FISH results indicate that the three gaps in the contig are approximately 150 kb in length, the equivalent of one BAC/PAC clone (Fig. 2). YAC clones exist that cover the remaining gaps, but the regions encompassed by the gaps remain puzzling. The first gap, between Chr_Xctg104 and Chr_Xctg100, is covered by YAC clone yWXD310. This is a large clone of ~1000 kb and is only singly linked to the contigs. The second (between Chr_Xctg100 and Chr_Xctg23) and third (between Chr_Xctg23 and Chr_Xctg178) gaps seem to be within a duplicated segment. From the Zucchi et al. [17] YAC contig, there is a single YAC clone (yWXD1554) of ~250 kb that would span both of these gaps. This clone is also singly linked. The inability to identify multiple YAC clones that cross these gaps in the BAC/PAC contig may be due to either an unclonable region or the fact that there is a duplicated segment in this region. We are currently approaching gap closure through the use of TAR cloning.
TABLE 3: Minimal tiling path of BACs/PACs for sequencing: relationship between clone name and accession number
Those clones shown in yellow have draft sequence available, and those with no accession number are awaiting sequencing. All other clones are finished. Clones in red have been sequenced by other centers as part of the International Human Genome Mapping Consortium.
To get a sense of the overall integrity of our BAC/PAC contig, we compared the sequence derived from it with the available genetic, physical, and radiation hybrid maps with respect to STS marker order. The sequence comparison with the genetic and radiation hybrid maps can be found via http://genome.cse.ucsc.edu/goldenPath/mapPlots. There was no disagreement between the Zucchi et al. [17] YAC contig and our BAC/PAC contig with respect to STS order. There were no major discrepancies between the sequence and the Genethon genetic map, the GM99 GB4 or G3 radiation hybrid maps, and the Whitehead YAC contig. In our interval (represented by the 127 Mb to 159 Mb sequence bin in the December 2000 and April 2001 freezes), the genetic maps appear to exaggerate the chromosome length, most likely reflecting increased recombination rates in this sub-telomeric region. Additionally, there seems to be an inversion of substantial size (~13 Mb) in the Whitehead radiation hybrid map relative to the draft sequence and all other genetic and physical data. A smaller (~10 Mb) and partially overlapping inversion is seen in the TNG radiation hybrid map data. A duplicated segment (DXS1341–DXS1227–DXS7410–DXS7402) identified in this high-resolution version of the physical map could cause such a finding if it were inverted as well as duplicated. However, the approximate size of this event (~500 kb) does not match the larger putative inversion seen when comparing sequence with the RH mapping data.
Of interest in the Xq26.3–q27.3 region is the presence of multiple gene family clusters. These family members seem to be functional paralogs in most cases, as opposed to pseudogenes. The gene CXorf1 (at 147.8 Mb) has at least two paralogs, CXorf2 and CXorf6, several megabases in the telomeric direction (154.8 Mb and 151.7 Mb, respectively). We have shown that CXorf2 is highly expressed in brain and prostate tissue. In addition, there is an entire family of MAGE genes (MAGEC1, MAGEE1, MAGEA4, MAGEA5, MAGEA10, MAGEA2, MAGEA12, MAGEA3, MAGEA6, and MAGEA1), which encode putative cancer antigens expressed in melanoma. We have excluded the MAGE cluster as etiologic of HPC in our X-linked families by mutation screening. There are also at least seven members of the SPANX gene cluster at 143.8 Mb. These genes reside within 20-kb blocks, which are duplicated with extremely high homology across > 1 Mb of genomic DNA near DXS1205. These genes were shown to be expressed in testis and in spermatozoa. They are also described as “human cancer antigen,” and we have shown expression in prostate tissue (data not shown). Finally, there is evidence at the telomeric end of our region that the gene CTAG is present in two copies and is also expressed in the testis.
It is commonly known that the X chromosome is inactivated in female cells to compensate for dosage of gene products, but there may be a more complex evolutionary dosage mechanism that has produced these multiple gene families on Xq. In addition, it has always been unclear how a tumor suppressor gene could be present on the X chromosome in a hereditary cancer (only one allele, which would have to be mutated and inherited). The presence of multiple functional gene copies on the same chromosome would make it possible to have one inherited defective copy and have somatic mutations arise in the other copy.
The construction and sequencing of this contig have brought us significantly closer to full annotation of this critical region of the genome. Transcriptional mapping has resulted in the placement of 57 known and predicted genes and many more ESTs by BLAST searching. All of these genes are now being tested for expression in prostate and are candidates for harboring mutations that cause this devastating type of cancer.
Supplementary data for this article are available on IDEAL (http://www.idealibrary.com).
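The comparison of STS marker order between the sequence and other maps, discussed above, can be illustrated with a small sketch that reports runs of markers appearing in reversed order in one map relative to a reference order. This is only an assumption-laden illustration of the idea and is not the procedure used for the published map plots. The marker names M1 to M8 and both orders are hypothetical.

```python
from typing import List


def inverted_segments(ref_order: List[str], map_order: List[str],
                      min_len: int = 2) -> List[List[str]]:
    """Find runs of consecutive markers in map_order whose reference ranks descend.

    Markers absent from ref_order are ignored. Only runs of at least
    min_len markers are reported.
    """
    rank = {marker: i for i, marker in enumerate(ref_order)}
    markers = [m for m in map_order if m in rank]
    ranks = [rank[m] for m in markers]

    segments: List[List[str]] = []
    run_start = 0
    for i in range(1, len(ranks) + 1):
        descending = i < len(ranks) and ranks[i] < ranks[i - 1]
        if not descending:
            # The current run ends at index i - 1; keep it if long enough.
            if i - run_start >= min_len:
                segments.append(markers[run_start:i])
            run_start = i
    return segments


# Hypothetical marker orders: the second map carries an inverted block M3..M6.
reference = ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"]
other_map = ["M1", "M2", "M6", "M5", "M4", "M3", "M7", "M8"]
print(inverted_segments(reference, other_map))
```

A run-based scan like this flags candidate inversions only; whether a flagged block reflects a true rearrangement or a mapping artifact still has to be judged against the other available maps.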
MATERIALS AND METHODS PCR amplification. STS primer sequences were synthesized by Life Technologies as described [17]. PCR was carried out using 10 ng of template DNA from the Research Genetics BAC library DNA pools (CTB and CTC libraries) or the Genome Systems PAC library DNA pools (RPCI1 library) with 2.25 mM Mg2+, 10 mM dNTP mix (Gibco BRL), 10 pM each forward and reverse STS primer, PCR buffer II (Perkin-Elmer), and 0.6 units of AmpliTaq Gold Polymerase (Perkin-Elmer) in a 12.5 ml total PCR reaction volume. All PCRs were carried out in MJ 96-well block Tetrad machines (MJ Research) using the following cycling protocol: initial denaturation at 948C for 12 minutes; 948C for 30 seconds, 558C for 30 seconds, 728C for 1 minute for 35 cycles; and with a final extension at 728C for 10 minutes. All PCR products were separated by agarose gel electrophoresis using 2% agarose gels run in 13 TBE buffer (Life Technologies). BAC and PAC library screening. We identified single BAC/PAC clones by screening six libraries with primers derived from the Zucchi et al. [17] map and with EST primers derived from RH mapped EST clusters (UniGene; http://www.ncbi.nlm.nih.gov/UniGene/index.html). Screening was initially carried out in parallel at the NHGRI and at the Sanger Centre. At the NHGRI, the bacterial clone libraries screened were the Research Genetics BAC library (CTB/CTC), the Genome Systems PAC library (RPCI-1), and the RPCI-11 BAC filters (http://www.chori.org/bacpac). For the Research Genetics BAC library, positive pools were identified by electrophoresis of STS amplification products from the master plate clone pool, and then the appropriate sub-pools were screened using the identical PCR protocols (above) to identify positive clones. All positive clones were purchased from Research Genetics as bacterial stabs and plated on LB plates supplemented with 12.5 mg/ml chloramphenicol. 
Ten single colonies were re-streaked onto LB/chloramphenicol plates and PCRverified with the STS primers. A single positive clone was end-sequenced as described below. PAC clones were identified by screening the Genome Systems PAC library of gridded pools of clones in an identical fashion according to manufacturer’s instructions. Positive clones were purchased from Genome Systems, single-colony verified by PCR, and end-sequenced as below. RPCI11 library filters were screened with pools of STSs. STSs were amplified from control genomic DNA, electrophoresed through 2% agarose, excised from the gel, and purified using the QIAquick Gel Extraction kit (Qiagen). Purified amplicons were radiolabeled using the Random Primers DNA Labeling System (GibcoBRL) and [a-32P]dATP (ICN Biomedicals). Free nucleotides were removed by centrifugation through STE Midi Select-D G-50 columns (5 Prime → 3 Prime Inc.), and the radiolabeled probes were denatured and hybridized to gridded clone filters in Hybrisol I solution (Intergen) at 428C overnight. Blots were washed five times in 13 SSC, 0.1% SDS, and twice at 508C in 0.13 SSC, 0.1% SDS, followed by exposure to X-OMAT AR film (Kodak) overnight with an intensifying screen. Positive clones were identified, plated out from in-house stocks, and single colony purified as above. At the Sanger Centre, the RPCI-1, 3, 4, 5, and 6 PAC libraries and the RPCI11 and 13 BAC libraries (http://www.chori.org/bacpac) were screened by hybridization of radiolabeled PCR products to high-density gridded clone filters. Positive clones were verified by single-colony PCR. All methods are described at http://www.sanger.ac.uk/HGP/methods/mapping. End-walking to close gaps. At NHGRI, DNA for end-sequencing was prepared from 25 ml bacterial cultures using an AutoGen 850 automated DNA isolation system according to the manufacturer’s recommended protocols
(Autogen). Subsequently, the BAC/PAC DNA was resuspended in 600 µl dH2O, treated with RNase (Ambion), and purified over a Microcon 100 column (Amicon). BACs were sequenced using M13 forward and reverse primers, and PACs were sequenced using T7 and SP6 primers. Sequencing reactions were set up using the BigDye Terminator Chemistry (Perkin-Elmer) as follows: 500 ng BAC DNA, 10 pmol primer, and 16 µl BigDye Terminator Reagent Mix in a 40 µl total reaction volume. Cycle sequencing was performed in MJ Tetrad Thermocyclers using the following cycling conditions: 95°C for 5 minutes, followed by 35 cycles of 95°C for 30 seconds, 50°C for 20 seconds, and 60°C for 4 minutes. Free fluorescent nucleotides were removed using CentriSep columns according to the manufacturer’s recommendations (Princeton Separations). The PCR products were dried in a Speed Vac (Savant), re-dissolved/denatured in 3 µl loading buffer (95% formamide/50 mM EDTA) at 90°C for 3 minutes, and analyzed with an Applied Biosystems 377 XL automated DNA sequencer (Perkin-Elmer). Gel files were tracked and analyzed using Applied Biosystems DNA Analysis Sequencing Software 3.2 (Perkin-Elmer). For STSs generated at the Sanger Centre, detailed protocols for end-sequencing can be found at http://www.sanger.ac.uk/HGP/methods/mapping/chr_walking. Following repeat masking (RepeatMasker, http://ftp.genome.washington.edu/RM/RepeatMasker.html), end-STS primers were designed using PRIMER [27] (Table 3). These novel end-STSs were used to identify overlapping BAC/PAC clones by PCR or hybridization as described above.

Contig assembly and verification of chromosomal location. BAC and PAC clones were restriction fingerprinted as described [28] and assembled into contigs using IMAGE (http://www.sanger.ac.uk/Software/Image) and FPC [29] (http://www.sanger.ac.uk/Software/fpc). Landmark content of clones was also considered during contig assembly.
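As an illustration of the idea behind fingerprint-based contig assembly, the following minimal sketch (not the IMAGE or FPC code, and not part of the protocol used here) counts restriction bands shared by two clones within a size tolerance and estimates how likely that many matches would be between unrelated clones. The function names, the band positions, the tolerance, and the gel length are all hypothetical, and the binomial tail is an assumed scoring model chosen for simplicity.

```python
from math import comb

def shared_bands(bands_a, bands_b, tol):
    """Count bands in clone A that pair with a distinct band in clone B within tol."""
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matches += 1      # pair these two bands and move past both
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1            # a[i] is too small to match any remaining band in b
        else:
            j += 1            # b[j] is too small to match any remaining band in a
    return matches

def coincidence_probability(n_a, n_b, matches, tol, gel_len):
    """Binomial tail: chance of at least `matches` shared bands arising at random."""
    # Assumed model: each band of A independently hits some band of B with probability p.
    p = 1.0 - (1.0 - 2.0 * tol / gel_len) ** n_b
    return sum(comb(n_a, m) * p**m * (1.0 - p) ** (n_a - m)
               for m in range(matches, n_a + 1))

# Hypothetical band positions (arbitrary gel units) for two clones.
clone_a = [100, 250, 400, 610, 800]
clone_b = [102, 251, 399, 700, 805]
n_shared = shared_bands(clone_a, clone_b, tol=3)
score = coincidence_probability(len(clone_a), len(clone_b), n_shared, tol=3, gel_len=1000)
print(n_shared, score)
```

A small probability suggests that the two clones overlap; the threshold at which an overlap is accepted is a tuning choice and is not specified here.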
Full details of protocols are available at http://www.sanger.ac.uk/HGP/methods/mapping. Additional clones in areas of low-fold coverage were identified by searching the Washington University restriction fingerprint database [30] (http://genome.wustl.edu/gsc/human/human_database.shtml) and were incorporated into the contig by fingerprinting. From the contig map, a minimal tiling set of clones was selected for genomic sequencing (below). These clones were also used for fluorescence in situ hybridization (FISH) to control male metaphase chromosome spreads to confirm chromosome localization and to exclude clone chimerism. Detailed methods are described at http://www.sanger.ac.uk/HGP/methods/cytogenetics. Briefly, clone DNA was isolated using a standard alkaline lysis protocol (see web site). DNA was labeled with biotin-16-dUTP or digoxigenin-11-dUTP (Boehringer-Mannheim) by nick translation, and the probes were hybridized to either metaphase spreads or extended interphase fibers from a normal male lymphoblastoid cell line. These probes were detected by either Texas-red-conjugated avidin or FITC-conjugated anti-digoxigenin, respectively, and the slides were counterstained with DAPI. Any gaps in the contig map were sized by fiber-FISH. Briefly, fluorescently labeled BAC/PAC clones were hybridized to stretched DNA fibers, and the length of each gap was estimated by comparing its length with those of the clones used for hybridization. Detailed protocols can be found at http://www.sanger.ac.uk/HGP/methods/cytogenetics.

BAC/PAC insert sequencing. BAC and PAC inserts were sequenced as described [20].

Clone names and GenBank accession numbers.
RP1-36J3, Z82975; RP3-523M5, AL035262; RP1-93C23, AL008713; RP6-152C18, AL096887; RP6-88D7, AL033403; RP11-197K18, AL161777; RP13-206I21, AL356785; RP11-35F15, AL590077; RP11-364B14, AL589987; RP5-914F23, AL138892; RP11-189F23, AL449183; bWXD13, AC004070; bWXD105, AC002412; bWXD90, AC004075; RP11-359I11, AL137014; RP11-51C14, AL121875; RP4-595A18, AL137016; RP1177G6, AL078639; RP11-338I3, AL359393; RP11-298A8, AL451048; RP1-171K16, AL121881; GS1-164F24, AL050308; RP11-518F7, AL133273; GS1-82I1, AL109799; RP3-507I15, Z98950; RP3-376H23, AL031078; RP3-433M19, Z95703; RP5-1178I21, AL109852; RP4-552K20, AL049177; RP3-326L12, AL023279; RP6-232G24, AL022152; RP1-85A12, AL049858; RP3-406C18, AL023773; RP1-142F18, AL031073; GS1-54N10, AL109620; RP3-324C6, AL031586; RP1-73H14, AL080272; RP11-485I9, AL449265; RP1-48G12, AL031054; RP11-241O12, AL137840; RP3-357K22, AL022720; GS1-91O18, AL080238; RP1-231L4, AL022719; RP5-1189K21, AL030997; GS1-256O22, AL080239; RP11-29O6,
AL500522; RP3-526F5, AL109622; RP11-514L15, AL135920; RP11-319M16, AL590424; RP1-51J23, AL031312; RP1-110C15, AL138969; RP11-239L17, AL359884; RP1-169P22, AL049588; RP11-449O17, AL512285; RP1-29A6, AL133546; RP1-145B12, AL008706; RP11-480M11, AL159988; RP4-581F7, AL022164; RP11-570O20, AL354752; GS1-103B18, AL139112; RP1-315J21, AL356499; RP11-571O22, AC019230; RP13-111A12, AL391360; RP13-150K15, AL391256; RP11-550B3, AL589671; GS1-115M3, AL109653; RP11-36B11, AL589680; RP11-269F10, AL445258; RP11-387H19, AL358174; RP13-159A24, AL356503; RP13-5C2, AL358052; GS1-278N14, AL109654; RP11-183K14, AL109913; RP11-79A21, AL513491; RP11-243C2, AL109836; RP1-203P18, Z97180; GS1-152G24, AL356286; RP1-73A14, Z99497; RP11-522P6, AL589706; RP11-226B15, AL589669; RP13-485J5, AL591O22; RP11-42J13, AL137841; RP5-824H1, AL096861; RP5-1056D13, AL450486; RP3-433G13, AL009048; RP6-244C24, AC007538. Clones that have not yet been assigned accession numbers are RP11-963P9, RP11-206A13, and RP13-559O21.
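A list of clone and accession pairs in this form is easy to check mechanically. The sketch below is not part of the original methods. It splits such a list into pairs and flags accession strings that do not have one of the two classic GenBank nucleotide forms (one letter plus five digits, or two letters plus six digits). The function names are hypothetical, and the sample string holds four entries copied from the list as printed; flagged entries are candidates for manual checking, not corrections.

```python
import re

# Classic nucleotide accession forms: 1 letter + 5 digits, or 2 letters + 6 digits.
ACCESSION_RE = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})")

def parse_pairs(text):
    """Split 'clone, accession; clone, accession; ...' into (clone, accession) tuples."""
    pairs = []
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        clone, accession = (part.strip() for part in item.split(","))
        pairs.append((clone, accession))
    return pairs

def malformed(pairs):
    """Return the pairs whose accession does not fully match either classic form."""
    return [(clone, acc) for clone, acc in pairs if not ACCESSION_RE.fullmatch(acc)]

# Four entries taken verbatim from the printed list.
sample = "RP1-36J3, Z82975; RP3-523M5, AL035262; bWXD13, AC004070; RP13-485J5, AL591O22"
print(malformed(parse_pairs(sample)))
```

On this sample the check flags the RP13-485J5 entry, whose accession as printed contains the letter O where a digit is expected.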
ACKNOWLEDGMENTS

We thank Darryl Leja (Scientific Illustration Unit, National Human Genome Research Institute, NIH) for assistance with the preparation of figures, and Lucia Susani (Istituto di Tecnologie Biomediche, Consiglio Nazionale delle Ricerche, Milan, Italy) for assistance and technical support. This research is supported in part by grants from the Academy of Finland, Albert Einstein College of Medicine (R.D.B.), the Wellcome Trust, the Dr. Louis Sklarow Memorial Fund (R.D.B.), and a grant from the Department of the Army (DAMD17-01-1-0014). This is manuscript number 58 of the project Genoma 2000/ITBA funded by Cariplo.

RECEIVED FOR PUBLICATION JULY 13; ACCEPTED OCTOBER 31, 2001.
REFERENCES

1. Smith, J. R., et al. (1996). Major susceptibility locus for prostate cancer on chromosome 1 suggested by a genome-wide search. Science 274: 1371–1374.
2. Berthon, P., et al. (1998). Predisposing gene for early-onset prostate cancer, localized on chromosome 1q42.2-43. Am. J. Hum. Genet. 62: 1416–1424.
3. Paris, P. L., et al. (2000). Identification and fine-mapping of a region showing high frequency of allelic imbalance on chromosome 16q23.2 that corresponds to a prostate cancer susceptibility locus. Cancer Res. 60: 3645–3649.
4. Berry, R., et al. (2000). Evidence for a prostate cancer-susceptibility locus on chromosome 20. Am. J. Hum. Genet. 67: 82–91.
5. Xu, J., et al. (1998). Evidence for a prostate cancer susceptibility locus on the X chromosome. Nat. Genet. 20: 175–179.
6. Lange, E. M., et al. (1999). Linkage analysis of 153 prostate cancer families over a 30-cM region containing the putative susceptibility locus HPCX. Clin. Cancer Res. 5: 4013–4020.
7. Peters, M. A., et al. (2001). Genetic linkage analysis of prostate cancer families to Xq27-28. Hum. Hered. 15: 107–113.
8. Tavtigian, S. V., et al. (2001). A candidate prostate cancer susceptibility gene at chromosome 17p. Nat. Genet. 27: 172–180.
9. Xu, J., et al. (2001). Evaluation of linkage and association of HPC2/ELAC2 in patients with familial or sporadic prostate cancer. Am. J. Hum. Genet. 68: 901–911.
10. Vesprini, D., et al. (2001). HPC2 variants and screen-detected prostate cancer. Am. J. Hum. Genet. 68: 912–917.
11. Litvak, S., Zukas, H., and Heumann, J. E. (1987). Attending to America: personal assistance for independent living. Report of the National Survey of Attendant Care Programs in the United States. World Institute on Disability, Berkeley, CA.
12. Shiloh, Y., et al. (1990). Genetic mapping of X-linked albinism-deafness syndrome (ADFN) to Xq26.3-q27.1. Am. J. Hum. Genet. 47: 20–27.
13. Zlotogora, J. (1995). X-linked albinism-deafness syndrome and Waardenburg syndrome type II: a hypothesis. Am. J. Med. Genet. 59: 386–387.
14. Graham, C. A., et al. (1988). Linkage analysis in a family with X-linked anophthalmos. J. Med. Genet. 25: 643.
15. Graham, C. A., Redmond, R. M., and Nevin, N. C. (1991). X-linked clinical anophthalmos: localization of the gene to Xq27-Xq28. Ophthal. Paediat. Genet. 12: 43–48.
16. Rapley, E. A., et al. (2000). Localization to Xq27 of a susceptibility gene for testicular germ-cell tumours. Nat. Genet. 24: 197–200.
17. Zucchi, I., et al. (1996). YAC/STS map across 12 Mb of Xq27 at 25-kb resolution, merging Xq26-qter. Genomics 34: 42–54.
18. Ferlanti, E. S., Ryan, J. F., Makalowska, I., and Baxevanis, A. D. (1999). WebBLAST 2.0: an integrated solution for organizing and analyzing sequence data. Bioinformatics 15: 422–423.
19. Deloukas, P., et al. (1998). A physical map of 30,000 human genes. Science 282: 744–746.
20. International Human Genome Sequencing Consortium. (2001). Initial sequencing and analysis of the human genome. Nature 409: 860–921.
21. Altschul, S. F., et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
22. Zhang, J., and Madden, T. L. (1997). PowerBLAST: a new network BLAST application for interactive or automated sequence analysis and annotation. Genome Res. 7: 649–656.
23. Guan, X., Mural, R. J., Einstein, J. R., Mann, R. C., and Uberbacher, E. C. (1992). GRAIL: An Integrated Artificial Intelligence System for Gene Recognition and Interpretation. Proc., The Eighth IEEE Conference on AI Applications, 9–13.
24. Burge, C., and Karlin, S. (1997). Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268: 78–94.
25. Zhang, M. Q. (1997). Identification of protein coding regions in the human genome based on quadratic discriminant analysis. Proc. Natl. Acad. Sci. USA 94: 565–568.
26. Solovyev, V. V., and Salamov, A. A. (1997). The Gene-Finder computer tools for analysis of human and model organisms genome sequences. Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology, Halkidiki, Greece, AAAI Press, 294–302.
27. Lincoln, S. E., Daly, M. J., and Lander, E. S. (1991). PRIMER: A computer program for automatically selecting PCR primers. Available at http://www.genome.wi.mit.edu/ftp/distribution/software/primer.0.5, and via anonymous ftp to ftp-genome.wi.mit.edu, directory /pub/software/primer.0.5.
28. Gregory, S. G., Howell, G. R., and Bentley, D. R. (1997). Genome mapping by fluorescent fingerprinting. Genome Res. 7: 1162–1168.
29. Soderlund, C., Longden, I., and Mott, R. (1997). FPC: a system for building contigs from restriction fingerprinted clones. CABIOS 13: 523–535.
30. International Human Genome Mapping Consortium. (2001). A physical map of the human genome. Nature 409: 934–941.
GENOMICS Vol. 79, Number 1, January 2002 Copyright © 2002 by Academic Press. All rights of reproduction in any form reserved.