Physical and Transcript Map of the Hereditary Prostate Cancer Region at Xq27


doi:10.1006/geno.2001.6681, available online at http://www.idealibrary.com on IDEAL

Article

Dietrich A. Stephan,1,* Gareth R. Howell,2 Tanya M. Teslovich,1 Alison J. Coffey,2 Lorie Smith,3 Joan E. Bailey-Wilson,4 Lindsay Malechek,1 Derek Gildea,1 Jeffrey R. Smith,5 Elizabeth M. Gillanders,1 Johanna Schleutker,6 Ping Hu,1 Helen E. Steingruber,2 Pawandeep Dhami,2 Christiane M. Robbins,1 Izabela Makalowska,7 John D. Carpten,1 Raman Sood,1 Steve Mumm,8 Rolland Reinbold,8 Tom I. Bonner,9 Agnes Baffoe-Bonnie,4,10 Lukas Bubendorf,1 Mervi Heiskanen,1 Olli P. Kallioniemi,1 Andreas D. Baxevanis,1 Shirin S. Joseph,2 Ileana Zucchi,8 Robert D. Burk,3 William Isaacs,11 Mark T. Ross,2 and Jeffrey M. Trent1

1 Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
2 The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
3 Albert Einstein College of Medicine, Bronx, New York 10461, USA
4 Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, Maryland 21224, USA
5 Vanderbilt University Medical Center, Department of Medicine, Division of Genetic Medicine, Nashville, Tennessee 37232, USA
6 Laboratory of Cancer Genetics, Tampere University Hospital, 33521 Tampere, Finland
7 Gene Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
8 Istituto di Tecnologie Biomediche, Consiglio Nazionale delle Ricerche, 20090 Segrate, Milan, Italy
9 Laboratory of Genetics, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892, USA
10 Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA
11 Department of Urology, Johns Hopkins University, Baltimore, Maryland 21287, USA

* To whom correspondence and reprint requests should be addressed. Fax: (202) 884-6014. E-mail: [email protected].

We have recently mapped a locus for hereditary prostate cancer (termed HPCX) to the long arm of the X chromosome (Xq25–q27) through a genome-wide linkage study. Here we report the construction of an ~9-Mb sequence-ready bacterial clone contig map of Xq26.3–q27.3. The contig was constructed by screening BAC/PAC libraries with markers spaced at ~85-kb intervals. We identified overlapping clones by end-sequencing framework clones to generate 407 new sequence-tagged sites (STSs), followed by PCR verification of overlaps. Contig assembly was based on clone restriction fingerprinting and landmark content. We identified a minimal tiling path of overlapping clones for genomic sequencing, which has yielded 7.7 Mb of finished sequence and 1.5 Mb of draft sequence. The transcript mapping effort localized 57 known and predicted genes by database searching, STS content mapping, and sequencing, followed by sequence annotation. These transcriptional units represent candidate genes for HPCX and for several other hereditary diseases mapped to Xq26.3–q27.3.

Key words: physical mapping, genome mapping, genome sequencing, contig mapping, bacterial artificial chromosomes, P1 artificial chromosomes, prostate cancer, Xq26.3–q27.3, sequence annotation, repetitive elements

INTRODUCTION

Prostate cancer is a major health concern: over 200,000 new cases are diagnosed in the United States each year, where it accounts for more than 35% of all cancer cases in men and results in 40,000 deaths annually. Substantial genetic heterogeneity is thought to underlie hereditary prostate cancer (HPC), as several prostate cancer susceptibility loci have been reported, including 1q24–q25 [1], 1q42.2–q43 [2], 16q23.2 [3], and 20q13 [4]. Previously, we carried out a genome-wide scan of US, Swedish, and Finnish families at high risk for prostate cancer and found evidence of a major prostate cancer susceptibility locus (HPCX; MIM 300147) on Xq [5]. The HPCX locus is implicated in ~40% of hereditary prostate cancers in Finland and is thus perhaps the most prevalent etiologic gene for the disease in this population. Prostate cancer linkage to markers from the Xq26–q28 interval has been confirmed in additional independent data sets [6,7]. These findings suggest that the HPCX gene defect may account for a substantial proportion of hereditary prostate cancer worldwide.

GENOMICS Vol. 79, Number 1, January 2002 Copyright © 2002 by Academic Press. All rights of reproduction in any form reserved. 0888-7543/01 $35.00

TABLE 1: Markers used to generate a framework BAC/PAC physical map based on the Zucchi et al. [17] map

Known genes: CDR1, FMR1, LDOC1, MCF2, FMR2, FRAXAC1, HPRT1

cDNA clones: AA169138, H67143, R83022

STSs: A006J30, A009C30, AFM136YB10, AFMa113zf5, CHLC.ATA24C11, CHLC.ATA25B04, CHLC.ATA27F11, CHLC.GATA74D04, CIT-HSP-433M19, CIT-HSP-507I15, D3S2390, DXS105, DXS119, DXS1192, DXS1193, DXS1200, DXS1205, DXS1211, DXS1215, DXS1227, DXS1232, DXS1286, DXS1289, DXS1324, DXS1337, DXS1341, DXS1343, DXS1344, DXS152, DXS185, DXS259, DXS292, DXS293, DXS295, DXS296, DXS297, DXS312, DXS369, DXS465, DXS532, DXS533, DXS548, DXS6709, DXS6729, DXS6738, DXS6751, DXS6798, DXS6806, DXS7004E, DXS7006E, DXS7049, DXS7087, DXS7089, DXS7094, DXS7096, DXS7143, DXS7158, DXS7262, DXS7281, DXS7302, DXS7305, DXS7306, DXS7371, DXS7373, DXS7375, DXS7376, DXS7377, DXS7378, DXS7379, DXS7381, DXS7382, DXS7383, DXS7385, DXS7386, DXS7388, DXS7389, DXS7391, DXS7395, DXS7396, DXS7397, DXS7398, DXS7400, DXS7401, DXS7402, DXS7403, DXS7404, DXS7408, DXS7409, DXS7410, DXS7411, DXS7413, DXS7414, DXS7416, DXS7419, DXS7444E, DXS7482, DXS7503, DXS7524, DXS7536, DXS7553, DXS7635, DXS7825, DXS7832, DXS7833, DXS7834, DXS7846, DXS7847, DXS7857, DXS7874, DXS7875, DXS7876, DXS7892, DXS7893, DXS7902, DXS7908, DXS7917, DXS8013, DXS8043, DXS8045, DXS8073, DXS8084, DXS8091, DXS8106, DXS8148, DXS8151E, DXS8215, DXS8229, DXS8232, DXS8272, DXS8273, DXS8287, DXS8288, DXS8289, DXS8295, DXS8303, DXS8309, DXS8312, DXS8313, DXS8316, DXS8317, DXS8319, DXS9317, DXS9739, DXS98, DXS984, DXS998, RP_L10, SGC30410, SGC32232, SGC32493, SGC34657, SHGC-17233, SHGC-31764, sts-D45526, stSG10280, stSG12776, stSG13258, stSG15553, stSG15731, stSG16157, stSG16307, stSG1667, stSG2066, stSG22504, stSG2682, stSG28412, stSG29330, stSG29555, stSG29823, stSG30310, stSG31543, stSG38950, stSG3962, stSG4230, stSG43255, stSG4528, stSG4748, stSG4810, stSG8133, stSG8423, stSG8451, stSG8474, stSG8804, stSG9235, sts-H78267, sts-H93110, sts-H98521, sts-L08893, sts-M11309, sts-N21327, sts-N33366, sts-N34966, sts-R87104, sts-T83641, sts-T90453, sts-V00530, sts-W44435, sWXD1117, sWXD1208, sWXD1341, sWXD1344, sWXD1447, sWXD1449, sWXD179, sWXD2238, sWXD2462, sWXD28, sWXD29, sWXD398, sWXD575, sWXD639, sWXD864, sWXD883, TIGR-A004F02, TIGR-A007G08, TIGR-A007J06, WI-11212, WI-11315, WI-11365, WI-11452, WI-11835, WI-12657, WI-12764, WI-13459, WI-13557, WI-14785, WI-14955, WI-16556, WI-16747, WI-16817, WI-18472, WI-18960, WI-18961, WI-20478, WI-3908, WI-4468

These linkage analyses set the stage for identifying causative and susceptibility genes in some families. Identification of a major prostate cancer susceptibility gene would be a significant step towards molecular diagnostics and may assist in targeted therapeutics for hereditary prostate cancer. Associations between mutations in several genes and the onset of prostate cancer have been reported, but these cases are either sporadic or occur in small familial cohorts. For example, ELAC2 has recently been implicated in hereditary prostate cancer in some families from Utah [8], but this finding has not been replicated [9,10]. Despite the certainty of genetic heterogeneity, it seems reasonable to assume that the HPCX region harbors a gene that accounts for a proportion of hereditary prostate cancer cases. Accordingly, we have focused significant effort on identifying this susceptibility gene through physical mapping and sequencing of this region of the genome.

In addition to HPCX, several other hereditary disease loci have been genetically mapped to Xq26.3–q27.3, including the albinism-deafness syndrome locus (ADFN; MIM 300700). Albinism-deafness syndrome is characterized by partial albinism, or the “piebald” phenotype, and congenital deafness [11,12]. It has been suggested that albinism-deafness syndrome is an X-linked dominant form of Waardenburg syndrome type II [13]. Another example is anophthalmos-1 (ANOP1; MIM 301590), whose locus has been genetically mapped to Xq27–q28. Anophthalmos shows X-linked recessive inheritance and is characterized by ankyloblepharon, radiologically demonstrable underdevelopment of the bony orbits, and mental retardation (IQ less than 50) [14,15]. In addition, the testicular germ cell tumor-1 locus (TGCT1; MIM 300228) has been mapped to Xq27–q28. Testicular germ cell tumors affect 1 in 500 men and are the most common cancer in males aged 15 to 40 in western European populations. The relative risk to brothers of men with TGCT is nearly twice that to fathers and sons, consistent with X-linked inheritance [16]. High-resolution physical mapping and sequencing of this region therefore also provide a resource for identifying the etiologic mutations in these disorders.

To this end, a yeast artificial chromosome (YAC) contig covering the region Xq26–qter was previously constructed, and sequence-tagged sites (STSs) were mapped at ~85-kb intervals across the contig [17]. We built upon this framework STS map to generate our BAC/PAC contig. BAC and PAC clones have been assembled to form an ~9-Mb contig map covering Xq26.3–q27.3 between the loci DXS1192 and DXS548. The contig map contains three gaps, each about one clone length (150–200 kb), as shown by fluorescence in situ hybridization to extended interphase fibers (fiber-FISH). A minimum tiling set of 89 clones has been identified and is being sequenced, with most of the region now available as finished or draft sequence. Several known genes and potential novel transcripts have been precisely mapped by analysis of finished sequence. These mapped transcriptional units serve as candidate genes for HPCX, as well as for several other hereditary diseases.
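Choosing a minimum tiling set can be framed as an interval-cover problem. The Python sketch below assumes that every clone has already been placed on contig coordinates (in kb); this is a simplification, since in practice clone extents and overlaps were inferred from fingerprints and STS content rather than known directly. The function name, the clone tuples, and the `min_overlap` parameter are hypothetical. At each step the greedy rule takes, among clones starting within the covered region, the one that reaches furthest, which gives a smallest cover for intervals.

```python
from typing import List, Tuple

# Hypothetical clone record: (name, start_kb, end_kb) along the contig.
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], start: int, end: int,
                        min_overlap: int = 0) -> List[str]:
    """Greedily pick the fewest clones covering [start, end].

    Each newly chosen clone must overlap the previous one by at least
    min_overlap kb. Raises ValueError if a gap cannot be bridged.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path: List[str] = []
    covered = start
    i = 0
    while covered < end:
        # The first clone only has to reach the start of the target region;
        # later clones must start far enough inside the covered region.
        limit = covered - min_overlap if path else covered
        best = None
        while i < len(ordered) and ordered[i][1] <= limit:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no clone extends coverage beyond {covered} kb")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical clones: C is redundant because B reaches further.
clones = [("A", 0, 150), ("B", 100, 260), ("C", 120, 200), ("D", 240, 400)]
print(minimal_tiling_path(clones, 0, 400))  # ['A', 'B', 'D']
```

A gap that no clone bridges raises an error rather than silently producing a broken path, which mirrors the situation of the three unbridged gaps in this contig.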


FIG. 1. An illustration of the contig map generated between DXS1192 and DXS548. Key landmark data are shown, and the contigs covering the region are represented by the red vertical bars. Smaller vertical bars represent the minimum set of clones identified for sequencing, and the color indicates the sequence status: white, no sequence available; yellow, working draft sequence available; black, finished sequence. GenBank/EMBL accession numbers are given where available. For clones that have not yet been assigned accession numbers, the official library name is given. The conversion from accession numbers to clone names is shown in Table 3. Gaps between contigs are represented by horizontal green bars.

RESULTS

Generation of Sequence-Ready Contigs

A bacterial clone contig map has been constructed across Xq26.3–q27.3 between the loci DXS1192 and DXS548 using a combination of landmark content mapping and restriction digestion fingerprinting. Both publicly available STSs and BAC/PAC end-STSs generated in-house were used to screen either PCR pools or gridded arrays of bacterial clone libraries. The framework map was constructed using previously published STS markers at an average spacing of 85 kb, as well as microsatellite and EST markers from UniGene (http://www.ncbi.nlm.nih.gov/UniGene/index.html; Table 1). All positive clones were confirmed by single-colony PCR. A subset of the BAC and PAC clones identified by markers known to map to this interval was end-sequenced directly to generate new STSs, primarily to facilitate chromosome walking. A total of 269 new STSs were generated from BAC end-sequencing at NHGRI and subsequently mapped to the DXS1192–DXS548 interval (the novel STS primer sequences are available as supplementary data). End-sequences were screened for repetitive elements with RepeatMasker and compared against the NCBI nonredundant nucleotide database to identify homologous sequences, including gene and EST homologies [18]. These new STSs were used to further screen BAC and PAC library DNA pools to provide more complete coverage. In total, 523 STSs (116 previously identified STSs and 407 STSs generated at NHGRI and the Sanger Centre) were used to identify 244 PACs and 457 BACs.

Clones were restriction fingerprinted to establish the extent of overlap between them. Additional clones from the interval were identified in the Washington University Human Fingerprinted Contigs Database (http://genome.wustl.edu/gsc/human/human_database.shtml), either by searching for shared clones or by experimental comparison of clone fingerprints between the two data sets. This approach has produced four contigs covering an estimated 9 Mb, or 96%, of the Xq26–q27.3 region (Fig. 1). The contig that extends to DXS1192 has been joined to a separately assembled contig that extends to DXS8093 in proximal Xq25. The three remaining gaps have been sized to approximately one clone length each (~150 kb) using FISH on extended DNA fibers (Fig. 2). Attempts are being made to close these gaps, but the identification of bridging clones is complicated by several low-copy repeated sequences throughout the region.

FIG. 2. An image showing the gap size between Chr_Xctg104 and Chr_Xctg100. A clone from the end of each contig has been labeled and hybridized to extended DNA fibers. Gap sizes are estimated by comparing the length of the gap with the length of the signal for each clone.
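The gap sizing described in the Fig. 2 legend is essentially a proportion: the known insert size of each labeled end clone calibrates how far the DNA fiber was stretched, and that calibration converts the measured gap length into kilobases. The sketch below assumes hypothetical insert sizes and signal lengths in micrometres; real fiber-FISH estimates average measurements over many fibers, which is not modeled here.

```python
from statistics import mean
from typing import Sequence, Tuple


def estimate_gap_kb(flanking: Sequence[Tuple[float, float]], gap_um: float) -> float:
    """Estimate a contig gap size from fiber-FISH measurements.

    flanking: (insert_size_kb, signal_length_um) for each labeled end clone.
    gap_um: measured distance between the two clone signals (micrometres).
    """
    if not flanking or gap_um < 0:
        raise ValueError("need at least one clone and a non-negative gap length")
    # Average stretching factor (kb per micrometre) implied by the labeled clones.
    kb_per_um = mean(size_kb / length_um for size_kb, length_um in flanking)
    return gap_um * kb_per_um


# Hypothetical measurements: two ~165-kb end clones and a 45-um gap.
print(round(estimate_gap_kb([(160, 50.0), (170, 50.0)], 45.0), 1))  # 148.5
```

With these made-up numbers the gap comes out at roughly one clone length, the same order as the ~150-kb gaps reported above.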
A minimum set of 89 overlapping clones was selected from the contigs for genomic sequencing. These clones were re-fingerprinted to confirm the integrity of the DNA and analyzed by FISH to ensure that they mapped to the expected chromosome region. Figure 1 illustrates the extent of the contig map, showing the key landmarks, the clones identified for sequencing, and the current sequence status of each. (A detailed version of the contigs, including all clones identified and all marker data, can be viewed in the context of other sequence-ready contigs on the X chromosome via http://www.sanger.ac.uk/HGP/ChrX.)

Transcript Mapping and Identification

To identify and map transcriptional units within the Xq26.3–q27.3 contig, we performed database searching and sequencing. Because of the relatively low resolution of the GeneBridge4 radiation hybrid panel used for EST mapping, we searched the NCBI human radiation hybrid mapping data GeneMap98 [19] over an expanded region (DXS1192–DXS1193). Primers derived from EST or gene sequences were tested to ensure that they amplified genomic DNA and were then mapped to the contig. All EST primer pairs were tested against all 701 BAC/PAC clones (244 PACs and 457 BACs), which showed that the ESTs reside within the interval; they therefore serve as candidates for Xq26.3–q27.3-linked disorders. With the finished sequence now in hand, the PCR-based EST mapping results have been re-examined against the sequence and are available via the Sanger and Ensembl web sites (http://www.sanger.ac.uk and http://www.ensembl.org, respectively). Of the 89 clones in the minimum tiling set, 86 have so far been sequenced using the methods described by the International Human Genome Sequencing Consortium [20]: 76 clones have produced 7.7 Mb of finished sequence, and 10 clones have produced 1.5 Mb of draft sequence. In total, sequence is available for approximately 90% of the DXS1192–DXS548 target region. Sequence contigs were searched against the NCBI nonredundant nucleotide database using the PowerBlast algorithm [21,22].
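Once finished sequence is available, STS and EST primer pairs can also be placed on it in silico, a quick consistency check on the PCR-based mapping. The sketch below is a simplified version of the idea behind electronic PCR: it reports exact matches of one primer followed by the reverse complement of the other, on either strand, within an expected product-size range. Dedicated tools such as NCBI's e-PCR can also tolerate mismatches, which this sketch does not; the function names and the toy primers are hypothetical.

```python
from typing import Iterator, List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; other characters are not complemented."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _find_all(text: str, pattern: str) -> Iterator[int]:
    """Yield every (possibly overlapping) start position of pattern in text."""
    pos = text.find(pattern)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + 1)


def electronic_pcr(sequence: str, fwd: str, rev: str,
                   min_size: int, max_size: int) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for each predicted amplicon.

    Coordinates are 0-based and end-exclusive on the given sequence;
    strand is '+' if the forward primer matches the given strand.
    """
    seq = sequence.upper()
    hits = []
    for left, right, strand in ((fwd, rev, "+"), (rev, fwd, "-")):
        left_site = left.upper()
        right_site = reverse_complement(right)
        for s in _find_all(seq, left_site):
            for r in _find_all(seq, right_site):
                end = r + len(right_site)
                if r >= s and min_size <= end - s <= max_size:
                    hits.append((s, end, strand))
    return sorted(hits)


# Hypothetical 24-bp template carrying one STS on the '+' strand.
template = "TT" + "AAAAAAAA" + "CTCT" + reverse_complement("CCCCCCCC") + "TT"
print(electronic_pcr(template, "AAAAAAAA", "CCCCCCCC", 10, 30))  # [(2, 22, '+')]
```

Searching both primer orientations matters because an STS may lie on either strand of a finished clone sequence.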


TABLE 2: Known and predicted genes in Xq26.3–q27.3

Ensembl gene ID | Gene description | Location (bp) | Location (Mb) | Clone accession
ENSG00000123719 | [family] LDOC1 protein | 135659061–135659396 | 135.7 | AL136169
ENSG00000101926 | [family] high mobility group protein (HMG) | 135719209–135719687 | 135.7 | Z83826
ENSG00000101928 | DJ473B4.1 (novel protein similar to predicted human and worm genes) | 135781981–135790067 | 135.8 | Z83826
ENSG00000123720 | unknown | 135810726–135810869 | 135.8 | Z83826
ENSG00000123710 | unknown | 135828634–135833646 | 135.8 | AC004676
ENSG00000123711 | unknown | 135885143–135885184 | 135.9 | AC004676
ENSG00000101930 | unknown | 135885303–135909366 | 135.9 | AC004676
ENSG00000052406 | unknown | 136080146–136084069 | 136.1 | AC004387
ENSG00000101965 | hypoxanthine-guanine phosphoribosyltransferase (HGPRT) | 136181632–136208966 | 136.2 | AC004383
ENSG00000123715 | unknown | 136221960–136222130 | 136.2 | AC004383
ENSG00000101968 | unknown | 136256971–136304661 | 136.3 | AC002408
ENSG00000102193 | unknown | 136549559–136578819 | 136.5 | AC025232
ENSG00000129678 | [family] glyceraldehyde-3-phosphate dehydrogenase (GAPDH) | 136568139–136568891 | 136.6 | AC025232
ENSG00000036440 | sodium/hydrogen exchanger 6 (NHE-6) | 136613227–136675353 | 136.6 | AC025232
ENSG00000129683 | unknown | 136749635–136750641 | 136.7 | AL445247
ENSG00000022267 | skeletal muscle LIM-protein 1 (SLIM 1) | 136825562–136830509 | 136.8 | AL078638
ENSG00000129680 | cDNA FLJ12649 fis, clone NT2RM4002044 | 136835975–136845106 | 136.8 | AL078638
ENSG00000102197 | cDNA FLJ12401 fis, clone MAMMA1002796 | 136847773–136860439 | 136.8 | AL078638
ENSG00000102235 | [family] protein kinase | 136915122–136915841 | 136.9 | AL078638
ENSG00000102239 | bombesin receptor subtype-3 (BRS-3) | 137107123–137111596 | 137.1 | Z97632
ENSG00000129681 | unknown | 137116473–137116658 | 137.1 | Z97632
ENSG00000102241 | HIV-1 transcriptional elongation factor TAT | 137116785–137131503 | 137.1 | Z97632
ENSG00000102243 | R76043 | 137151423–137175956 | 137.2 | Z97632
ENSG00000129677 | unknown | 137266954–137267169 | 137.3 | AL135783
ENSG00000102245 | CD40 ligand (CD40-L) | 137267337–137279533 | 137.3 | AL135783
ENSG00000129675 | KIAA0006 protein (fragment) | 137285138–137326137 | 137.3 | AL135783
ENSG00000018887 | unknown | 137327819–137386137 | 137.3 | AL135783
ENSG00000102250 | heterogeneous nuclear ribonucleoprotein G (hnRNP G) | 137428331–137488722 | 137.4 | AC022220
ENSG00000129676 | unknown | 138330843–138330941 | 138.3 | AL035443
ENSG00000091813 | zinc finger protein of the cerebellum ZIC3 | 138349331–138354729 | 138.3 | AL035443
ENSG00000129679 | [family] SNRPN upstream reading frame protein | 139134501–139134710 | 139.1 | AL035262
ENSG00000129682 | fibroblast growth factor-13 (FGF-13) | 139785574–139865234 | 139.8 | AL031386
ENSG00000101981 | coagulation factor IX precursor (Christmas factor) | 140529017–140561710 | 140.5 | AL033403
ENSG00000127648 | unknown | 140579719–140579831 | 140.6 | AL033403
ENSG00000101977 | proto-oncogene DBL precursor (contains MCF2) | 140580023–140630707 | 140.6 | AL033403
ENSG00000127646 | [family] guanine nucleotide exchange factor DBS | 140643820–140643909 | 140.6 | AL033403
ENSG00000101974 | [family] potential phospholipid transporting ATPase | 140727155–140766604 | 140.7 | AL356785
ENSG00000127647 | [family] potential phospholipid transporting ATPase | 140786444–140786595 | 140.8 | AL356785
ENSG00000127645 | [family] potential phospholipid transporting ATPase | 140794525–140795577 | 140.8 | AL356785
ENSG00000127644 | [family] potential phospholipid transporting ATPase | 140800488–140802851 | 140.8 | AL356785
ENSG00000102014 | nuclear-associated protein SPANXb | 142282454–142283409 | 142.3 | AC025400
ENSG00000102015 | cerebellar-degeneration-related antigen 1 (CDR34) | 142524890–142525560 | 142.5 | AL078639
ENSG00000119053 | [family] DNA segment, human EST 478828 (fragment) | 142788311–143260667 | 142.8 | AL357075
ENSG00000119055 | [family] C330027G06RIK protein fragment | 142811669–142811857 | 142.8 | AL357075
ENSG00000119052 | unknown | 142853967–142854263 | 142.9 | AL357075
ENSG00000119054 | unknown | 142858547–142858754 | 142.9 | AL357075
ENSG00000127015 | nuclear-associated protein SPANXb | 143253265–143254380 | 143.3 | AL031078
ENSG00000078685 | unknown | 143265292–143846084 | 143.3 | AL121881
ENSG00000127020 | ribosomal protein L44 | 143401172–143401864 | 143.4 | Z98950
ENSG00000127017 | cancer-testis-associated protein | 143504105–143505138 | 143.5 | AL109799
ENSG00000127019 | hypothetical protein CGI-79 | 143523715–143526825 | 143.5 | AL109799
ENSG00000046767 | nuclear-associated protein SPANXa | 143840257–143841194 | 143.8 | AL121881
ENSG00000127018 | nuclear-associated protein SPANXa | 143846253–143847283 | 143.8 | AL121881
ENSG00000102033 | [family] melanoma-associated antigen (MAGE antigen) | 144137584–144137976 | 144.1 | AL023279
ENSG00000127021 | melanoma-associated antigen (DA232G24.2) | 144161586–144165011 | 144.2 | AL023279
ENSG00000127014 | [family] melanoma-associated antigen (MAGE antigen) | 144429057–144429704 | 144.4 | AL023773
ENSG00000046774 | cancer-testis antigen CT10 | 144450767–144461462 | 144.5 | AL031073

Genes were predicted by the Ensembl analysis pipeline from either a GeneWise or Genscan prediction followed by confirmation of the exons by comparisons to protein, cDNA, and EST databases.

Before gene analysis, repeat sequences were masked using RepeatMasker (Arian Smit and Paul Green, unpublished data). Sequence data were organized and initially analyzed using WebBlast [18]. Sequence contigs were subsequently analyzed with several gene prediction programs, including GRAIL, GENSCAN, MZEF, and FGENES [23–26], using an analysis workbench called GeneMachine (http://genome.nhgri.nih.gov/genemachine). All BLAST results and exon predictions were viewed and annotated using Sequin (http://www.ncbi.nlm.nih.gov/Sequin). Several known genes and novel transcripts were identified by sequencing. Annotation of the current X chromosome contigs has resulted in the precise localization of 57 genes (http://www.ensembl.org/perl/contigview?chr=X&vc_start=138400000&vc_end=153000000&imgmap=1&x=23&y=16), which can now be considered positional candidates (Table 2).
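Masking replaces or flags repeat-derived bases so that gene predictors and similarity searches are not swamped by interspersed repeats. RepeatMasker itself identifies the repeats and can write out the masked sequence (with N by default, or in lower case with its -xsmall option); the sketch below shows only the masking step, assuming the repeat coordinates are already known. The function name and the example intervals are hypothetical.

```python
from typing import Iterable, Tuple


def mask_repeats(sequence: str, repeats: Iterable[Tuple[int, int]],
                 soft: bool = False) -> str:
    """Mask repeat intervals (0-based, end-exclusive) in a DNA sequence.

    Hard masking replaces each repeat base with 'N'; soft masking
    lower-cases it so the bases remain available for later inspection.
    """
    chars = list(sequence)
    for start, end in repeats:
        if not 0 <= start <= end <= len(chars):
            raise ValueError(f"repeat interval ({start}, {end}) is out of range")
        if soft:
            chars[start:end] = [c.lower() for c in chars[start:end]]
        else:
            chars[start:end] = ["N"] * (end - start)
    return "".join(chars)


print(mask_repeats("ACGTACGTAC", [(2, 5), (8, 10)]))             # ACNNNCGTNN
print(mask_repeats("ACGTACGTAC", [(2, 5), (8, 10)], soft=True))  # ACgtaCGTac
```

Soft masking is often preferred when the same sequence will later be aligned, because the repeat bases are kept but can be ignored when seeding searches.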

DISCUSSION

Several hereditary disease loci have been shown to map to the Xq26.3–q27.3 interval, including HPCX, ODPF, EBM, ANOP1, and TGCT1. An approximately 9-Mb sequence-ready bacterial clone contig spanning the region between DXS1192 and DXS548 was constructed using a framework YAC contig [17]. BAC and PAC libraries were screened with markers placed approximately every 85 kb across the region. We identified 407 new STSs by end-sequencing BAC or PAC clones, and these, combined with restriction fingerprinting, were used to assemble a contig with only three small gaps. Before sequencing, a minimal tiling path was identified, and chimeric clones were excluded from further analysis using metaphase fluorescence in situ hybridization. So far, sequencing of this contig has yielded 7.7 Mb of finished sequence and 1.5 Mb of draft sequence. Annotation of the region has been carried out as described above and is available online (http://www.sanger.ac.uk and http://www.ensembl.org).

Fiber-FISH results indicate that the three gaps in the contig are each approximately 150 kb in length, the equivalent of one BAC/PAC clone (Fig. 2). YAC clones exist that cover the remaining gaps, but the regions spanned by the gaps appear to be problematic. The first gap, between Chr_Xctg104 and Chr_Xctg100, is covered by YAC clone yWXD310. This is a large clone of ~1000 kb and is only singly linked to the contigs. The second (between Chr_Xctg100 and Chr_Xctg23) and third (between Chr_Xctg23 and Chr_Xctg178) gaps seem to lie within a duplicated segment. In the Zucchi et al. [17] YAC contig, a single YAC clone (yWXD1554) of ~250 kb would span both of these gaps; this clone is also singly linked. The inability to identify multiple YAC clones that cross these gaps in the BAC/PAC contig may be due either to an unclonable region or to the duplicated segment in this region. We are currently approaching gap closure through the use of TAR cloning.

TABLE 3: Minimal tiling path of BACs/PACs for sequencing: relationship between clone name and accession number

Those clones shown in yellow have draft sequence available, and those with no accession number are awaiting sequencing. All other clones are finished. Clones in red have been sequenced by other centers as part of the International Human Genome Mapping Consortium.

To assess the overall integrity of our BAC/PAC contig, we compared the sequence derived from it with the available genetic, physical, and radiation hybrid maps with respect to STS marker order. The sequence comparison with the genetic and radiation hybrid maps can be found via http://genome.cse.ucsc.edu/goldenPath/mapPlots. There was no disagreement between the Zucchi et al. [17] YAC contig and our BAC/PAC contig with respect to STS order. There were no major discrepancies between the sequence and the Genethon genetic map, the GM99 GB4 or G3 radiation hybrid maps, and the Whitehead YAC contig. In our interval (represented by the 127- to 159-Mb sequence bin in the December 2000 and April 2001 freezes), however, the genetic maps appear to inflate the chromosome length, most likely reflecting increased recombination rates in this subtelomeric region. Additionally, there seems to be an inversion of substantial size (~13 Mb) in the Whitehead radiation hybrid map relative to the draft sequence and all other genetic and physical data. A smaller (~10 Mb), partially overlapping inversion is seen in the TNG radiation hybrid map data. The duplicated segment (DXS1341–DXS1227–DXS7410–DXS7402) identified in this high-resolution physical map could cause such a finding if it were inverted as well as duplicated. However, the approximate size of this event (~500 kb) is far smaller than the putative inversions seen when comparing the sequence with the RH mapping data, so it cannot by itself account for them.

Of interest in the Xq26.3–q27.3 region is the presence of multiple gene family clusters. In most cases these family members seem to be functional paralogs rather than pseudogenes. The gene CXorf1 (at 147.8 Mb) has at least two paralogs, CXorf2 and CXorf6, several megabases in the telomeric direction (154.8 Mb and 151.7 Mb, respectively). We have shown that CXorf2 is highly expressed in brain and prostate tissue. In addition, there is an entire family of MAGE genes (MAGEC1, MAGEE1, MAGEA4, MAGEA5, MAGEA10, MAGEA2, MAGEA12, MAGEA3, MAGEA6, and MAGEA1), which encode putative cancer antigens expressed in melanoma. We have excluded the MAGE cluster as etiologic of HPC in our X-linked families by mutation screening. There are also at least seven members of the SPANX gene cluster at 143.8 Mb.
These genes reside within 20-kb blocks that are duplicated with extremely high sequence identity across >1 Mb of genomic DNA near DXS1205. They have been shown to be expressed in testis and in spermatozoa. They are also described as “human cancer antigens,” and we have shown expression in prostate tissue (data not shown). Finally, there is evidence at the telomeric end of our region that the gene CTAG is present in two copies and is also expressed in the testis. It is well known that one X chromosome is inactivated in female cells to compensate for gene dosage, but a more complex evolutionary dosage mechanism may have produced these multiple gene families on Xq. In addition, it has long been unclear how a tumor suppressor gene on the X chromosome could underlie a hereditary cancer (males carry only one allele, which would have to be mutated and inherited). The presence of multiple functional gene copies on the same chromosome would make it possible to inherit one defective copy and acquire somatic mutations in the other copy.

The construction and sequencing of this contig has brought us significantly closer to full annotation of this critical region of the genome. Transcriptional mapping has resulted in the placement of 57 known and predicted genes and many more ESTs by BLAST searching. All of these genes are now being tested for expression in prostate and are candidates for harboring mutations that cause this devastating type of cancer. Supplementary data for this article are available on IDEAL (http://www.idealibrary.com).
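The comparison of STS order between maps described in the Discussion can be made systematic. The sketch below, assuming hypothetical marker names and sequence coordinates, walks along the marker order of one map (for example, a radiation hybrid map) and reports maximal runs of consecutive markers whose positions in the sequence decrease, that is, candidate inversions. It does not model the local ordering errors common in RH maps; a real comparison would need to tolerate those, for example through the minimum run length.

```python
from typing import Dict, List, Tuple


def inverted_blocks(map_order: List[str], seq_pos: Dict[str, int],
                    min_markers: int = 3) -> List[Tuple[str, str, int]]:
    """Report runs of consecutive map markers whose sequence positions decrease.

    map_order: marker names in the order given by the map under test.
    seq_pos: marker position (bp) in the sequence; unplaced markers are skipped.
    Returns (first_marker, last_marker, span_bp) for each run of at least
    min_markers markers.
    """
    placed = [m for m in map_order if m in seq_pos]
    blocks = []
    run_start = 0
    for i in range(1, len(placed) + 1):
        # The run continues while each marker lies upstream of its predecessor.
        continues = i < len(placed) and seq_pos[placed[i]] < seq_pos[placed[i - 1]]
        if not continues:
            if i - run_start >= min_markers:
                first, last = placed[run_start], placed[i - 1]
                blocks.append((first, last, seq_pos[first] - seq_pos[last]))
            run_start = i
    return blocks


# Hypothetical map order with three markers reversed relative to the sequence.
order = ["M1", "M2", "M3", "M4", "M5", "M6"]
positions = {"M1": 100, "M2": 200, "M3": 900, "M4": 700, "M5": 500, "M6": 1000}
print(inverted_blocks(order, positions))  # [('M3', 'M5', 400)]
```

The reported span gives a first estimate of inversion size, comparable in spirit to the ~13-Mb and ~10-Mb discrepancies noted for the Whitehead and TNG maps.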

MATERIALS AND METHODS

PCR amplification. STS primers were synthesized by Life Technologies as described [17]. PCR was carried out using 10 ng of template DNA from the Research Genetics BAC library DNA pools (CTB and CTC libraries) or the Genome Systems PAC library DNA pools (RPCI-1 library) with 2.25 mM Mg2+, 10 mM dNTP mix (Gibco BRL), 10 pM each forward and reverse STS primer, PCR buffer II (Perkin-Elmer), and 0.6 units of AmpliTaq Gold polymerase (Perkin-Elmer) in a 12.5 µl total reaction volume. All PCRs were carried out in MJ 96-well block Tetrad machines (MJ Research) using the following cycling protocol: initial denaturation at 94°C for 12 minutes; 35 cycles of 94°C for 30 seconds, 55°C for 30 seconds, and 72°C for 1 minute; and a final extension at 72°C for 10 minutes. All PCR products were separated by electrophoresis on 2% agarose gels run in 1× TBE buffer (Life Technologies).

BAC and PAC library screening. We identified single BAC/PAC clones by screening six libraries with primers derived from the Zucchi et al. [17] map and with EST primers derived from RH-mapped EST clusters (UniGene; http://www.ncbi.nlm.nih.gov/UniGene/index.html). Screening was initially carried out in parallel at the NHGRI and at the Sanger Centre. At the NHGRI, the bacterial clone libraries screened were the Research Genetics BAC library (CTB/CTC), the Genome Systems PAC library (RPCI-1), and the RPCI-11 BAC filters (http://www.chori.org/bacpac). For the Research Genetics BAC library, positive pools were identified by electrophoresis of STS amplification products from the master plate clone pool, and the appropriate sub-pools were then screened using the same PCR protocol (above) to identify positive clones. All positive clones were purchased from Research Genetics as bacterial stabs and plated on LB plates supplemented with 12.5 µg/ml chloramphenicol.
Ten single colonies were re-streaked onto LB/chloramphenicol plates and PCR-verified with the STS primers. A single positive clone was end-sequenced as described below. PAC clones were identified by screening the Genome Systems PAC library of gridded clone pools in an identical fashion, according to the manufacturer's instructions. Positive clones were purchased from Genome Systems, single-colony verified by PCR, and end-sequenced as below. RPCI-11 library filters were screened with pools of STSs. STSs were amplified from control genomic DNA, electrophoresed through 2% agarose, excised from the gel, and purified using the QIAquick Gel Extraction kit (Qiagen). Purified amplicons were radiolabeled using the Random Primers DNA Labeling System (GibcoBRL) and [α-32P]dATP (ICN Biomedicals). Free nucleotides were removed by centrifugation through STE Midi Select-D G-50 columns (5 Prime → 3 Prime Inc.), and the radiolabeled probes were denatured and hybridized to gridded clone filters in Hybrisol I solution (Intergen) at 42°C overnight. Blots were washed five times in 1× SSC, 0.1% SDS, and twice at 50°C in 0.1× SSC, 0.1% SDS, followed by exposure to X-OMAT AR film (Kodak) overnight with an intensifying screen. Positive clones were identified, plated out from in-house stocks, and single-colony purified as above. At the Sanger Centre, the RPCI-1, 3, 4, 5, and 6 PAC libraries and the RPCI-11 and 13 BAC libraries (http://www.chori.org/bacpac) were screened by hybridization of radiolabeled PCR products to high-density gridded clone filters. Positive clones were verified by single-colony PCR. All methods are described at http://www.sanger.ac.uk/HGP/methods/mapping.

End-walking to close gaps. At NHGRI, DNA for end-sequencing was prepared from 25 ml bacterial cultures using an AutoGen 850 automated DNA isolation system according to the manufacturer's recommended protocols (Autogen). The BAC/PAC DNA was then resuspended in 600 µl dH2O, treated with RNase (Ambion), and purified over a Microcon 100 column (Amicon). BACs were sequenced using M13 forward and reverse primers, and PACs were sequenced using T7 and SP6 primers. Sequencing reactions were set up using BigDye Terminator chemistry (Perkin-Elmer) as follows: 500 ng BAC DNA, 10 pmol primer, and 16 µl BigDye Terminator Reagent Mix in a 40 µl total reaction volume. Cycle sequencing was performed in MJ Tetrad thermocyclers using the following conditions: 95°C for 5 minutes, followed by 35 cycles of 95°C for 30 seconds, 50°C for 20 seconds, and 60°C for 4 minutes. Free fluorescent nucleotides were removed using CentriSep columns according to the manufacturer's recommendations (Princeton Separations). The reaction products were dried in a Speed Vac (Savant), redissolved and denatured in 3 µl loading buffer (95% formamide/50 mM EDTA) at 90°C for 3 minutes, and analyzed on an Applied Biosystems 377 XL automated DNA sequencer (Perkin-Elmer). Gel files were tracked and analyzed using Applied Biosystems DNA Analysis Sequencing Software 3.2 (Perkin-Elmer). For STSs generated at the Sanger Centre, detailed end-sequencing protocols can be found at http://www.sanger.ac.uk/HGP/methods/mapping/chr_walking. Following repeat masking (RepeatMasker, http://ftp.genome.washington.edu/RM/RepeatMasker.html), end-STS primers were designed using PRIMER [27] (Table 3). These novel end-STSs were used to identify overlapping BAC/PAC clones by PCR or hybridization as described above.

Contig assembly and verification of chromosomal location. BAC and PAC clones were restriction fingerprinted as described [28] and assembled into contigs using IMAGE (http://www.sanger.ac.uk/Software/Image) and FPC [29] (http://www.sanger.ac.uk/Software/fpc). Landmark content of clones was also considered during contig assembly.
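Fingerprint-based contig assembly rests on scoring how likely two clones are to share bands by chance. A minimal sketch of the binomial coincidence score (the "Sulston score") used by FPC-style tools follows, as we understand the published formulation: for the clone with fewer bands (n_L), each band misses all n_H bands of the other clone with probability (1 - 2t/G)^n_H for tolerance t and gel length G, and the score is the binomial probability of at least M matches. The greedy band matching, the units, and the example values are assumptions; FPC's own implementation and cutoffs may differ.

```python
from math import comb


def count_matches(bands_a, bands_b, tolerance):
    """Greedily pair bands within `tolerance`; each band is used at most once.

    Greedy pairing is a simplifying assumption, not FPC's exact rule.
    """
    small, large = sorted((sorted(bands_a), sorted(bands_b)), key=len)
    used = [False] * len(large)
    matches = 0
    for band in small:
        for i, other in enumerate(large):
            if not used[i] and abs(band - other) <= tolerance:
                used[i] = True
                matches += 1
                break
    return matches


def sulston_score(n_low, n_high, matches, tolerance, gel_length):
    """Binomial probability of >= `matches` coincidental shared bands."""
    # Probability that one band of the smaller clone matches none of n_high bands.
    p_miss = (1.0 - 2.0 * tolerance / gel_length) ** n_high
    p_hit = 1.0 - p_miss
    return sum(comb(n_low, j) * p_hit ** j * p_miss ** (n_low - j)
               for j in range(matches, n_low + 1))


def overlap_score(bands_a, bands_b, tolerance, gel_length):
    """Score a clone pair; smaller values mean the overlap is less likely chance."""
    n_low, n_high = sorted((len(bands_a), len(bands_b)))
    m = count_matches(bands_a, bands_b, tolerance)
    return sulston_score(n_low, n_high, m, tolerance, gel_length)
```

In use, a pair is accepted as overlapping when its score falls below a chosen cutoff; landmark (STS) content then helps resolve pairs whose fingerprint evidence is marginal.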
Full details of protocols are available at http://www.sanger.ac.uk/HGP/methods/mapping. Additional clones in areas of low-fold coverage were identified by searching the Washington University restriction fingerprint database [30] (http://genome.wustl.edu/gsc/human/human_database.shtml) and were incorporated into the contig by fingerprinting. From the contig map, a minimal tiling set of clones was selected for genomic sequencing (below). These clones were also used for fluorescence in situ hybridization (FISH) to control male metaphase chromosome spreads to confirm chromosomal localization and to exclude clone chimerism. Detailed methods are described at http://www.sanger.ac.uk/HGP/methods/cytogenetics. Briefly, clone DNA was isolated using a standard alkaline lysis protocol (see web site). DNA was labeled with biotin-16-dUTP or digoxigenin-11-dUTP (Boehringer-Mannheim) by nick translation, and the probes were hybridized to either metaphase spreads or extended interphase fibers from a normal male lymphoblastoid cell line. Biotin- and digoxigenin-labeled probes were detected with Texas Red-conjugated avidin and FITC-conjugated anti-digoxigenin, respectively, and the slides were counterstained with DAPI. Any gaps in the contig map were sized by fiber-FISH: fluorescently labeled BAC/PAC clones were hybridized to stretched DNA fibers, and the length of each gap was estimated by comparing it with the lengths of the clones used for hybridization.

BAC/PAC insert sequencing. BAC and PAC inserts were sequenced as described [20].

Clone names and GenBank accession numbers.
RP1-36J3, Z82975; RP3-523M5, AL035262; RP1-93C23, AL008713; RP6-152C18, AL096887; RP6-88D7, AL033403; RP11-197K18, AL161777; RP13-206I21, AL356785; RP11-35F15, AL590077; RP11-364B14, AL589987; RP5-914F23, AL138892; RP11-189F23, AL449183; bWXD13, AC004070; bWXD105, AC002412; bWXD90, AC004075; RP11-359I11, AL137014; RP11-51C14, AL121875; RP4-595A18, AL137016; RP1177G6, AL078639; RP11-338I3, AL359393; RP11-298A8, AL451048; RP1-171K16, AL121881; GS1-164F24, AL050308; RP11-518F7, AL133273; GS1-82I1, AL109799; RP3-507I15, Z98950; RP3-376H23, AL031078; RP3-433M19, Z95703; RP5-1178I21, AL109852; RP4-552K20, AL049177; RP3-326L12, AL023279; RP6-232G24, AL022152; RP1-85A12, AL049858; RP3-406C18, AL023773; RP1-142F18, AL031073; GS1-54N10, AL109620; RP3-324C6, AL031586; RP1-73H14, AL080272; RP11-485I9, AL449265; RP1-48G12, AL031054; RP11-241O12, AL137840; RP3-357K22, AL022720; GS1-91O18, AL080238; RP1-231L4, AL022719; RP5-1189K21, AL030997; GS1-256O22, AL080239; RP11-29O6, AL500522; RP3-526F5, AL109622; RP11-514L15, AL135920; RP11-319M16, AL590424; RP1-51J23, AL031312; RP1-110C15, AL138969; RP11-239L17, AL359884; RP1-169P22, AL049588; RP11-449O17, AL512285; RP1-29A6, AL133546; RP1-145B12, AL008706; RP11-480M11, AL159988; RP4-581F7, AL022164; RP11-570O20, AL354752; GS1-103B18, AL139112; RP1-315J21, AL356499; RP11-571O22, AC019230; RP13-111A12, AL391360; RP13-150K15, AL391256; RP11-550B3, AL589671; GS1-115M3, AL109653; RP11-36B11, AL589680; RP11-269F10, AL445258; RP11-387H19, AL358174; RP13-159A24, AL356503; RP13-5C2, AL358052; GS1-278N14, AL109654; RP11-183K14, AL109913; RP11-79A21, AL513491; RP11-243C2, AL109836; RP1-203P18, Z97180; GS1-152G24, AL356286; RP1-73A14, Z99497; RP11-522P6, AL589706; RP11-226B15, AL589669; RP13-485J5, AL591022; RP11-42J13, AL137841; RP5-824H1, AL096861; RP5-1056D13, AL450486; RP3-433G13, AL009048; RP6-244C24, AC007538. Clones that have not yet been assigned accession numbers are RP11-963P9, RP11-206A13, and RP13-559O21.

GENOMICS Vol. 79, Number 1, January 2002 Copyright © 2002 by Academic Press. All rights of reproduction in any form reserved.

ACKNOWLEDGMENTS

We thank Darryl Leja (Scientific Illustration Unit, National Human Genome Research Institute, NIH) for assistance with the preparation of figures, and Lucia Susani (Istituto di Tecnologie Biomediche, Consiglio Nazionale delle Ricerche, Milan, Italy) for assistance and technical support. This research is supported in part by grants from the Academy of Finland, Albert Einstein College of Medicine (R.D.B.), the Wellcome Trust, the Dr. Louis Sklarow Memorial Fund (R.D.B.), and a grant from the Department of the Army (DAMD17-01-1-0014). This is manuscript number 58 of the project Genoma 2000/ITBA funded by Cariplo.

RECEIVED FOR PUBLICATION JULY 13; ACCEPTED OCTOBER 31, 2001.

REFERENCES

1. Smith, J. R., et al. (1996). Major susceptibility locus for prostate cancer on chromosome 1 suggested by a genome-wide search. Science 274: 1371–1374.
2. Berthon, P., et al. (1998). Predisposing gene for early-onset prostate cancer, localized on chromosome 1q42.2-43. Am. J. Hum. Genet. 62: 1416–1424.
3. Paris, P. L., et al. (2000). Identification and fine-mapping of a region showing high frequency of allelic imbalance on chromosome 16q23.2 that corresponds to a prostate cancer susceptibility locus. Cancer Res. 60: 3645–3649.
4. Berry, R., et al. (2000). Evidence for a prostate cancer-susceptibility locus on chromosome 20. Am. J. Hum. Genet. 67: 82–91.
5. Xu, J., et al. (1998). Evidence for a prostate cancer susceptibility locus on the X chromosome. Nat. Genet. 20: 175–179.
6. Lange, E. M., et al. (1999). Linkage analysis of 153 prostate cancer families over a 30-cM region containing the putative susceptibility locus HPCX. Clin. Cancer Res. 5: 4013–4020.
7. Peters, M. A., et al. (2001). Genetic linkage analysis of prostate cancer families to Xq27-28. Hum. Hered. 15: 107–113.
8. Tavtigian, S. V., et al. (2001). A candidate prostate cancer susceptibility gene at chromosome 17p. Nat. Genet. 27: 172–180.
9. Xu, J., et al. (2001). Evaluation of linkage and association of HPC2/ELAC2 in patients with familial or sporadic prostate cancer. Am. J. Hum. Genet. 68: 901–911.
10. Vesprini, D., et al. (2001). HPC2 variants and screen-detected prostate cancer. Am. J. Hum. Genet. 68: 912–917.
11. Litvak, S., Zukas, H., and Heumann, J. E. (1987). Attending to America: personal assistance for independent living. Report of the National Survey of Attendant Care Programs in the United States. World Institute on Disability, Berkeley, CA.
12. Shiloh, Y., et al. (1990). Genetic mapping of X-linked albinism-deafness syndrome (ADFN) to Xq26.3-q27.1. Am. J. Hum. Genet. 47: 20–27.
13. Zlotogora, J. (1995). X-linked albinism-deafness syndrome and Waardenburg syndrome type II: a hypothesis. Am. J. Med. Genet. 59: 386–387.
14. Graham, C. A., et al. (1988). Linkage analysis in a family with X-linked anophthalmos. J. Med. Genet. 25: 643.
15. Graham, C. A., Redmond, R. M., and Nevin, N. C. (1991). X-linked clinical anophthalmos: localization of the gene to Xq27-Xq28. Ophthal. Paediat. Genet. 12: 43–48.
16. Rapley, E. A., et al. (2000). Localization to Xq27 of a susceptibility gene for testicular germ-cell tumours. Nat. Genet. 24: 197–200.
17. Zucchi, I., et al. (1996). YAC/STS map across 12 Mb of Xq27 at 25-kb resolution, merging Xq26-qter. Genomics 34: 42–54.
18. Ferlanti, E. S., Ryan, J. F., Makalowska, I., and Baxevanis, A. D. (1999). WebBLAST 2.0: an integrated solution for organizing and analyzing sequence data. Bioinformatics 15: 422–423.
19. Deloukas, P., et al. (1998). A physical map of 30,000 human genes. Science 282: 744–746.
20. International Human Genome Sequencing Consortium. (2001). Initial sequencing and analysis of the human genome. Nature 409: 860–921.
21. Altschul, S. F., et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
22. Zhang, J., and Madden, T. L. (1997). PowerBLAST: a new network BLAST application for interactive or automated sequence analysis and annotation. Genome Res. 7: 649–656.
23. Guan, X., Mural, R. J., Einstein, J. R., Mann, R. C., and Uberbacher, E. C. (1992). GRAIL: an integrated artificial intelligence system for gene recognition and interpretation. Proc. Eighth IEEE Conference on AI Applications, 9–13.
24. Burge, C., and Karlin, S. (1997). Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268: 78–94.
25. Zhang, M. Q. (1997). Identification of protein coding regions in the human genome based on quadratic discriminant analysis. Proc. Natl. Acad. Sci. USA 94: 565–568.
26. Solovyev, V. V., and Salamov, A. A. (1997). The Gene-Finder computer tools for analysis of human and model organisms genome sequences. Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology, Halkidiki, Greece, AAAI Press, 294–302.
27. Lincoln, S. E., Daly, M. J., and Lander, S. E. (1991). PRIMER: a computer program for automatically selecting PCR primers. Available at http://www.genome.wi.mit.edu/ftp/distribution/software/primer.0.5, and via anonymous ftp to ftp-genome.wi.mit.edu, directory /pub/software/primer.0.5.
28. Gregory, S. G., Howell, G. R., and Bentley, D. R. (1997). Genome mapping by fluorescent fingerprinting. Genome Res. 7: 1162–1168.
29. Soderlund, C., Longden, I., and Mott, R. (1997). FPC: a system for building contigs from restriction fingerprinted clones. CABIOS 13: 523–535.
30. International Human Genome Mapping Consortium. (2001). A physical map of the human genome. Nature 409: 934–941.
