GENOMICS 14, 241-248 (1992)
Mapping Human Chromosomes by Walking with Sequence-Tagged Sites from End Fragments of Yeast Artificial Chromosome Inserts
JUHA KERE,* RAMAIAH NAGARAJA,* STEVEN MUMM,* ALFREDO CICCODICOLA,† MICHELE D'URSO,† AND DAVID SCHLESSINGER*
*Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri; and †International Institute of Genetics and Biophysics, Naples, Italy
Received March 31, 1992; revised June 26, 1992
Sequence-tagged sites (STSs) derived from end fragments of chromosome-specific yeast artificial chromosomes (YACs) can facilitate the assembly of an overlapping YAC/STS map. Contigs form rapidly by iteratively screening YAC collections with end-fragment STSs from YACs that have not yet been detected by any previous STS. The map is rendered rapidly useful during its assembly by incorporating supplementary STSs from genes and genetic linkage probes with known locations. Methods for the systematic development and testing of the end-fragment STSs are given here, and a group of 100 STSs is presented for the X chromosome. The mapping strategy is shown to be successful in simulations with portions of the X chromosome already largely mapped into overlapping YACs by other means.
© 1992 Academic Press, Inc.
INTRODUCTION
Overlapping yeast artificial chromosomes (YACs) provide a route to the development of physical maps of entire human chromosomes. In many formulations, the maps are formatted with sequence-tagged sites (STSs; Olson et al., 1989), detected by primer pairs yielding unique PCR products, spaced on the average every 100 kb. In the initial formulations of "STS content mapping," emphasis was placed on the use of large collections of YACs made from total human genomic DNA. For each chromosome, a group of corresponding STSs is independently assembled to identify the cognate YACs. DNA sources for chromosome-specific STSs include flow-sorted chromosomes, hybrid cells containing only one human chromosome, and microdissected chromatin. PCR (for example with Alu primers) (Cole et al., 1991) as well as direct cloning (Green et al., 1991) can be used to isolate sequencing templates. During map assembly, YACs are overlapped with others into growing contigs by their common content of STSs, with screenings gradually shifting from the use of random STSs to the use of surrogate or true ends of YAC contigs to achieve long-range coverage of a region
(Green and Green, 1991). In a positive control case, a set of YACs has been assembled across the cystic fibrosis transmembrane regulator region, new STSs have been developed for chromosome 7 by several methods, and simulations that support the feasibility of the approach have been carried out (Green and Green, 1991). Attractive features of STS content mapping can be extended if instead of a library of YACs made from total human DNA, the collection is specific for the chromosome or chromosomal region of interest. One can then make STSs systematically from the ends of YACs to facilitate the assembly of overlapping clones. As discussed by Palazzolo et al. (1991), insert end-fragment STSs provide more complete coverage with less screening activity than a random marker strategy (Barillot et al., 1991). Here, that approach is incorporated into a strategy to yield a physical map with efficient STS production and less screening activity. With this strategy, the STSs need not be binned to specific cytogenetic localizations at the start. Rather, localization occurs subsequently, as STSs derived from YACs are supplemented with STSs from genes and linkage mapping probes with defined locations along the chromosome. To implement such a strategy, we describe here (1) a way to assemble a minimum collection of YAC libraries adequate for the mapping of the X chromosome and partially adequate for other chromosomes as well; (2) an efficient protocol for the recovery of YAC insert end fragments; (3) a set of 100 STSs specific for the X chromosome, including samples derived from end fragments; and (4) an outline of an efficient method for recovering individual YACs cognate for each STS by a completely PCR-based screening method.

MATERIALS AND METHODS
Enriched YAC libraries for X chromosome mapping. YACs were isolated from human Xq24-qter by making them from a hamster/human somatic cell hybrid, X3000.11 (Nussbaum et al., 1986), containing that portion of the X chromosome as its only content of human DNA. This is referred to as the "X3000 library." The 820 clones in a three-genome-equivalent library, with an average size of 250 kb, have been assembled into overlapping YAC coverage of about 50 Mb in that
region of the X chromosome by the application of a number of techniques, although end fragments were used only secondarily to extend contigs (Schlessinger et al., 1991). A YAC library made from human Xpter-q27 in a hamster cell line (Lee et al., 1992) was a kind gift from Dr. R. Nussbaum and consists of 2500 clones of an estimated average size of 250 kb. This library is referred to as the "X-only library." Both libraries made from hybrid cells show relatively low levels of chimerism, on the order of 15% of the clones. A third YAC library of 20,000 clones was constructed from a 49,XXXXX cell line (GM06061B, obtained from NIGMS Human Genetic Mutant Cell Repository) essentially according to published procedures (Imai and Olson, 1990; Brownstein et al., manuscript in preparation). This library is referred to here as the "5X library."
Sources of STS sequences and STS development. Sequences for STS development were derived mainly from four sources. (1) A plasmid library was made from flow-sorted X chromosomes from a 48,XXXX cell line, and clones were picked at random for sequencing (Ciccodicola et al., in preparation). (2) Published sequences specific for human X chromosomal genes and other DNA segments were retrieved from GenBank, and 100-500 bp at the 5' or 3' ends of the sequences were considered for primer pair design; exon sequences were used when possible. (3) Probes with assigned DXS numbers were obtained from investigators or from ATCC and partially sequenced. (4) Insert end segments from X chromosomal YACs from the X-only and Xq libraries were recovered by a ligation-mediated PCR method, as described in detail below. Two computer programs were employed in STS development. FASTA (Pearson and Lipman, 1988) was used to identify repeated sequences, which were then excluded from STS development, and OSP (Hillier and Green, 1991) was used to assist in primer selection for a given sequence.
Primers were synthesized using an Applied Biosystems DNA synthesizer. Each primer pair was tested using a panel of DNA templates (Fig. 3) and the conditions of PCR were optimized by choosing among three alternative buffers, containing 1.5 mM MgCl2, 5 mM NH4Cl, 10 mM Tris-HCl, pH 8.6, and 100 mM (TNK100), 50 mM (TNK50), or 25 mM (TNK25) KCl (V. Nowotny et al., work in progress). The testing templates included yeast genomic DNA, YAC pools, human genomic DNA, DNAs from hybrid cells containing a single human chromosome X or 7 (GM06318B and GM10791 from NIGMS), and X3000.11 DNA (Nussbaum et al., 1986). Temperature cycling conditions in Perkin-Elmer thermocyclers were identical for most primer pairs (94°C for 1 min, 55°C for 2 min, 72°C for 2 min for 35 cycles followed by 72°C for 7 min).
YAC insert-end recovery. Agarose beads or slices of plugs containing yeast DNA from a single clone were digested with RsaI, AluI, PvuII, EcoRV, or ScaI in 15-µl reactions. Ligation buffer, linker (Mueller and Wold, 1989) (2 pmol), and T4 DNA ligase (1-2 U) were added and the reactions were continued for at least 1 h at room temperature. One-half to two microliters of the ligated mixture was amplified by PCR in 10-µl reactions using the linker primer and a YAC vector arm-specific primer (5 pmol each or 10 pmol of vector-specific primer and 1 pmol of linker primer). Sequences of all primers and oligonucleotides employed are presented in Table 2. Temperature cycling conditions were 94°C for 1 min, 65°C for 2 min, and 72°C for 2 min for 35 cycles in TNK50 buffer. One microliter of the reaction mixture was diluted to 100-500 µl with water and reamplified using linker primer and an internal YAC vector arm-specific primer under identical conditions. Alternatively, primary PCR products were separated in a 1% low-melting-point agarose gel containing ethidium bromide, and the predominant PCR product was excised for secondary amplification.
The reamplification products were purified on a low-melting-point agarose gel, excised, and sequenced. Direct sequencing of gel-purified PCR products was carried out by a modified dideoxynucleotide method using 32P-end-labeled primers in temperature-cycled, Taq polymerase-catalyzed reactions essentially as described (Srivastava et al., 1992).
PCR-based screening for YACs. The X-only and 5X libraries were set up for entirely PCR-based screening (Brownstein et al., in preparation). In brief, DNA was extracted from YAC clones pooled in groups from microtiter tray arrays to yield a screening tree such that the most complex pools each contained about one-third of a genomic equivalent of X-specific DNA. Positive signals in any of those pools are followed by screening subpools of decreasing complexity, with the number of clones tested decreasing in each subpool by a factor of 2 to 4 (Brownstein et al., manuscript in preparation). Each screening then requires on the order of 150 PCR assays to recover five YACs cognate for an STS.
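As a rough illustration of why a pooled screening tree saves assays, the following Python sketch counts PCR assays when every positive pool is split into smaller subpools until single clones are reached. The function name, the fixed branching factor, and the integer clone labels are illustrative assumptions; the actual pooling scheme (Brownstein et al., in preparation) is not reproduced here.

```python
from typing import List, Set, Tuple


def screen_pool(pool: List[int], positives: Set[int],
                branching: int = 4) -> Tuple[List[int], int]:
    """Return (clones found, number of PCR assays) for one pool.

    pool      -- hypothetical clone identifiers in this pool
    positives -- clones that actually contain the STS
    branching -- factor by which each positive pool is subdivided
    """
    assays = 1  # one PCR on the DNA of this pool
    if not any(clone in positives for clone in pool):
        return [], assays          # negative pool: stop here
    if len(pool) == 1:
        return list(pool), assays  # a single positive clone is identified
    size = -(-len(pool) // branching)  # ceiling division
    found: List[int] = []
    for start in range(0, len(pool), size):
        hits, used = screen_pool(pool[start:start + size], positives, branching)
        found.extend(hits)
        assays += used
    return found, assays


found, assays = screen_pool(list(range(16)), {5}, branching=4)
print(found, assays)
```

In this toy case of 16 clones with one positive clone and fourfold subdivision, 9 assays identify the clone, compared with 16 if every clone were tested individually.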
DESIGN OF MAPPING FOR THE X CHROMOSOME
Detailed strategy. The STS content mapping proceeds in three phases (cf. Green and Green, 1991). In the first, end-fragment STSs from YACs in the X-only library are obtained and used to screen both X-only and 5X libraries for cognate clones. Screening continues with end fragments made from the next YACs in the X-only library that have not yet been detected by a previously isolated end-fragment STS. The second phase of the work begins when all the YACs in the X-only library have been used to derive STSs or detected by at least one STS. At that point, many contigs will have been assembled, and the terminal YACs in a contig become the next ones used for end-fragment isolation. Contigs overlapping only by a portion of the outermost YACs are thus identified and fused. The identification of terminal YACs will require initial ordering of the YACs on the basis of their STS content. At the beginning of the third phase, the assembled contigs should include essentially all the YACs in both X-only and 5X libraries. Any end-fragment STSs that have failed to find further YACs are "gaps" at which YACs containing the neighboring DNA are missing from the collection. This is often because they are statistically not represented in the YACs (Schlessinger et al., 1991). Such sequences can then be sought in cosmid or other YAC libraries, and techniques like in situ hybridization or radiation hybrid panel mapping can be applied to align and orient contigs and get information about the size and content of remaining gaps. To cover most of a chromosome of 150 Mb, about three genomic equivalents of DNA, or about 1800 YACs of 250 kb average size, are sufficient (cf. Schlessinger et al., 1991). The efficiency of the mapping will of course depend on available YAC technology. For example, newer libraries include more YACs in the megabase size range.
The larger YACs can then provide more rapid initial coverage of a chromosomal region, whereas clones of 300 kb or less are easier to analyze further, tend to include fewer chimeras (Lee et al., 1992), and provide more refined localizations of end-fragment STSs. It is not necessary to have a collection of YACs that is entirely from the chromosome of interest, as long as there is an adequately large group of chromosome-specific YACs. For example, to generate an adequate group of "seed YACs" for the entire X chromosome, the 2500 X-specific YACs of the X-specific library covering
Xpter-q27 (Lee et al., 1992) are sized, and 500 that are 300 kb or more in length, containing one equivalent of chromosome DNA content, are selected. The STSs are developed from end fragments of all of the seed YACs as rapidly as possible, since all of them will ultimately be employed systematically in formatting the map. For screening, the seed YACs are supplemented with the X-enriched 5X genomic library, and for the third phase, with other libraries. With this approach, most screening of already assembled contigs can be spared. For the Xq24-qter region, for example, STSs that identify the PCR pool of the earlier 820 clones already mapped can be held aside and if desired can be separately and rapidly assigned to their locations by a small number of assays across the already constructed contigs without screening the 5X library.
Possible advantages of STSs from YAC end fragments. The approach is based on prospective advantages of end fragments as a source of STSs. One advantage of such STSs is their precisely defined position along the YAC of origin, facilitating the accurate formatting of the final map and orienting YACs. To ensure that each round of screening on average reaches the greatest possible distance, each STS is made from the end of a YAC at least 300 kb in length. Moreover, during the process of STS development, end clones that are not from the X chromosome will be identified, so that chimeric YACs will be scored automatically as the nascent map is assembled. In the mapping process itself, there are two major additional advantages. One accrues because the STSs used for screening are selected from YACs that have not been detected by previous STSs. Thus, by design, new STSs reach out to new YACs not already identified.
This feature of the proposed scheme avoids the problem of mapping with large numbers of random probes, in which YACs already identified with previous STSs must be recursively screened with STSs developed later, and new probes tend more and more to fall into already mapped regions. The second advantage is that the development of more STSs to reach the proposed 100-kb average spacing becomes very efficient. Long-range coverage in overlapping YACs is achieved relatively rapidly. Additional STSs from end fragments of other YACs already in contigs can then be positioned by screening only a local cluster of YACs.
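The first phase of the walking strategy can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the authors' procedure: YACs are represented as hypothetical (start, end) intervals in kilobases on a single line, an STS is placed at each end of a source YAC, and an STS is taken to detect every YAC whose interval contains its position. Chimerism, repeats, and failed end recoveries are ignored.

```python
from typing import Dict, List, Tuple


def walk_with_end_stss(yacs: List[Tuple[int, int]]):
    """Phase-1 walking: derive end STSs only from not-yet-detected YACs.

    yacs -- hypothetical (start_kb, end_kb) intervals, in library order
    Returns (STS positions used, contigs as lists of YAC indices).
    """
    parent = list(range(len(yacs)))  # union-find forest over YAC indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    detected = set()
    stss: List[int] = []
    for i, (start, end) in enumerate(yacs):
        if i in detected:
            continue  # already found by an earlier STS: not used as a source
        for pos in (start, end):  # one STS from each insert end
            stss.append(pos)
            for j, (s2, e2) in enumerate(yacs):
                if s2 <= pos <= e2:  # YAC j contains this STS
                    detected.add(j)
                    parent[find(j)] = find(i)  # same contig as the source YAC

    contigs: Dict[int, List[int]] = {}
    for i in range(len(yacs)):
        contigs.setdefault(find(i), []).append(i)
    return stss, sorted(contigs.values())


stss, contigs = walk_with_end_stss([(0, 300), (200, 500), (450, 700), (1000, 1300)])
print(len(stss), contigs)
```

In this toy library the second YAC is detected by an end STS of the first and is therefore never used as an STS source, so six STSs (from three source YACs) place four YACs into two contigs.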
RESULTS AND DISCUSSION
Systematic End-Fragment Recovery and STS Development from YACs
To generate YAC insert ends for STS development, we used a modification of the primer-ligation method of Mueller and Wold (1989), which we have found to produce a higher rate of success with YACs and to be less
complex than other primer-ligation methods (Riley et al., 1990; Rosenthal and Jones, 1990; Lagerstrom et al., 1991). Figure 1 shows results with a set of clones from the X3000.11 library. All the clones were larger than 250 kb and were known to lie in Xq24-q26.1, but had not yet been further localized in contigs. The experiment was performed with 48 clones, using two enzymes (RsaI and AluI) to digest the DNA. Data are shown for 14 clones cut with RsaI; for each, reaction mixtures were prepared with a linker primer and vector primers from the left (L) or the right (R) YAC vector arm. In this group of clones, 12 yielded a unique L and 11 a unique R end fragment. In six cases, more than one product band was seen, but four of these represent cases in which restriction digestion proved to be incomplete, and the other two are clones that contained two YACs, one of which included hamster DNA. Thus, all six cases gave an unambiguous end fragment from the YAC of interest. Fragments with greater than 200 bp of unique sequence are suitable for the generation of STSs. In essentially all cases in which no product was seen or products less than 200 bp were recovered, a second enzyme gave a usable product. Up to 30% of the fragments contained repetitive sequence tracts or hamster DNA at one terminus, but more than 90% of the single-copy human sequences yielded STSs suitable for screening YAC libraries. The production of ends is exemplified in Fig. 2 for two YACs digested with four enzymes. For large-scale mapping, one person can process tens of clones per day and can recover and sequence on the order of 100 end fragments in a week. Efficiency can be further increased by using microtiter plates and tools to process 96 clones at a time (work in progress). To account for the "loss" of ends because of their repeated nature or chimerism, a correspondingly larger number of seed YACs need to be used for STS development.
This will reduce the initial contig size slightly, but will give detailed information about chimeric clones, and the estimation of contig sizes will become more accurate when the positions of end STSs within YACs are known. With STS content mapping, very few auxiliary activities are needed to confirm the overlaps due to the high specificity of the initial mapping methods (Green and Olson, 1990). To verify that the STSs come from the putative chromosome of origin, initial testing is carried out by PCR with a panel of DNAs analogous to that of Green et al. (1991) for chromosome 7 STSs. An example is shown in Fig. 3. Each STS must amplify the product of appropriate size from total human DNA, from a pool of anonymous YACs supplemented with human DNA, and from DNA of a somatic cell hybrid containing only the human X chromosome, and it should give no corresponding signal from yeast DNA, a pool of random YACs, or DNA of a hybrid cell containing human chromosome 7. An additional reaction is run with X3000.11 DNA to see if the STS falls into the region already mapped with YACs (Schlessinger et al., 1991); if so, screening for further
TABLE 1
One Hundred X-Chromosome-Specific STSs

STS | HGM | Localization | Primer A | Primer B | Size | Buffer

Random STSs
sWXD86 | | Xpter-q24 | AGGAAAAGATGCAATGTGG | CTTGTCATAATACAGAGGG | 143 | TNK100
sWXD93 | | Xpter-qter | CATAGAACAAGCAGAAGG | CAGAAAGAAGATATTGCTGG | 102 | TNK100
sWXD94 | | Xpter-qter | CAAAACTTTCCTACCTACC | CTGACCATACACATAATCC | 130 | TNK100
sWXD95 | | Xpter-qter | AATTTAGGCAAGAGCAGC | TTCTCCCCAAATAAATCCC | 60 | TNK100
sWXD96 | | Xpter-qter | ATCGTGCTGCTGTACTCC | GGCAGATATGAAACTGAGG | 128 | TNK100
sWXD97 | | Xpter-qter | GGAGGGAAGAAGAGAGGG | CAGCGAGAGTTAGTGAGG | 137 | TNK100
sWXD98 | | Xpter-qter | CAACTGGGATAAGTCACC | GTGATTGAGAATGAATGGG | 106 | TNK100
sWXD99 | | Xpter-qter | CCCTTCACTCACCTTCCC | CAGATAGTTCTTTATAGCAGTGCG | 102 | TNK100
sWXD100 | | Xpter-qter | CGTGCTTAGGCTTAATCCCC | GAACTGACTGTAGAGAAGG | 145 | TNK100
sWXD101 | | Xpter-qter | GAAATTCTTCACTACCTCC | AACACATCTCAGACATCC | 160 | TNK100
sWXD102 | | Xpter-qter | CTTTGATAGTTCAGGTTTGC | GAGAATCTTCTGTCTAGG | 122 | TNK100
sWXD115 | | Xpter-qter | GCTGTAGATTCACTTTCG | AAGACCTACCAAAGCTCC | 142 | TNK100
sWXD116 | | Xpter-qter | ACTCTTCATATCCTAATCCC | GCTATGGTGTCATTACTTTTGG | 68 | TNK100
sWXD117 | | Xpter-qter | CATTTTGTAGCTGAGAAAGG | GCAATTCAAGGAACATAACTGG | 79 | TNK100
sWXD118 | | Xpter-qter | CTCTTTTCCTTAATCCAACCC | CCACTGTGCTATACTGCC | 100 | TNK100
sWXD119 | | Xpter-qter | GATCAACACGGCTCTCGG | CTGGGCTCTTGGCTAAGG | 73 | TNK100
sWXD121 | | Xpter-qter | TCCTTTTATCCCCATATTTC | TTTCTCTCAGCACATTTATCC | 60 | TNK100
sWXD122 | | Xpter-qter | TACTAATTCCTTGTCTGATGGG | ACAATCCAGCAGAATGGGG | 60 | TNK100
sWXD123 | | Xpter-qter | CTAGGGAGCTAATTTCAC | CAAAGAGACCAAATCTATAAC | 94 | TNK100
sWXD124 | | Xpter-qter | CAACTGTAAAATGGTATCGG | GTAAGCATTTCTAGTGTTCC | 82 | TNK100
sWXD127 | | Xpter-qter | GAATCCAAACATCAGAAATC | TGCTTTCATTATGTGTCTAATAG | 69 | TNK100
sWXD128 | | Xpter-qter | CTAAGAATAAGGGCTTTTCC | CTGTTTTGTGTTCACATGG | 114 | TNK100
sWXD129 | | Xpter-qter | TCCGCTTCTCCATTATGC | CCAACTCAACTCCTACAC | 229 | TNK100
sWXD131 | | Xq24-qter | TCACTGGCTATACAAATAC | ATTCTAATCCCACTATTAC | 93 | TNK50
sWXD145 | | Xq24-qter | TTAACTCTTTATGGAGTTG | ATTTGGGGTCATTAAAATC | 61 | TNK50
sWXD146 | | Xq24-qter | CCACCTCCAAAATTCAATCC | AAAGCTCCATTGATGTTTGTTTCC | 133 | TNK100
sWXD147 | | Xq24-qter | GAATGACAAGTCAAGTCAAACC | GCATGTTCTACCTAGAGCC | 93 | TNK50
sWXD148 | | Xpter-q24 | GAATGGAGTGTTTCAAAGGGG | CAAGGGTAAAGGTTCTTAGATGGG | 132 | TNK100
sWXD149 | | Xpter-q24 | ACTGGACCCCCACTCCTTC | CAGAGGCTACTTTCAGCCAC | 80 | TNK100
sWXD150 | | Xpter-q24 | CAAAACTTCTCAATATCCAAATC | AAATTCAGAGCAGCAGTAG | 108 | TNK100
sWXD151 | | Xq24-qter | ACCCTCTCTCCTAACCATC | GATCAGGGAAAGTTTCACAG | 108 | TNK100
sWXD152 | | Xpter-q24 | GCAAAATGATTCAAATGCTTGG | AATCTGCACATAGAAGCTAAAG | 83 | TNK50
sWXD153 | | Xpter-q24 | TAAAGGGATCGCCAAGGAC | CTTACTCATTTGCTGGATTCTC | 85 | TNK100
sWXD159 | | Xq24-qter | GCTCAGAATCCTGCGGTTG | AGGGTTCCTGGCTCCAAAC | 82 | TNK50
sWXD160 | | Xpter-q24 | GTGACCAATGAAAAGTAGGAGAG | TCAGCCAGCCCCAAATGAAAAG | 177 | TNK50
sWXD161 | | Xpter-q24 | GATCTGCACCCTGTAAACC | GAGACCCCTCCTAAACCTG | 85 | TNK100
sWXD162 | | Xq24-qter | CTCTCAAGTCTTCAATAACCTC | CCAATTCCTCAGTTCACAGATG | 67 | TNK100
sWXD163 | | Xpter-q24 | CAGACACTCATTTTGAAAATTCAG | TGGGCAACACATTGTATTTAGC | 105 | TNK100
sWXD164 | | Xq24-qter | GAGAAGAGACGAGAGACAG | AACCAATACCCCAAAATGAAGAAC | 82 | TNK100
sWXD165 | | Xpter-q24 | GGGGGGGATGTAAAAGTATAG | GATTTTCAGTGGATAACAATGTAG | 67 | TNK100
sWXD166 | | Xpter-q24 | TGGCAGTGTGTATATTTAGCAG | TTTAGTGATAACCAACTCTTGTCC | 92 | TNK100
sWXD167 | | Xpter-q24 | CCAAGTTGAAGAACCACTG | GCTTAGAGTTAGACACACC | 73 | TNK100
sWXD168 | | Xq24-qter | GCCATTTTGTGTGATTTTGTTTTC | TTAGACTCCTTTGAGAATCTACTG | 72 | TNK100
sWXD169 | | Xpter-q24 | TTAAAAAATAGACTGCCAAGTG | TGGACAACAGAGTGAGATG | 73 | TNK100
sWXD170 | | Xpter-q24 | GAGCTAAAAGACTCTGAAATCC | GAAAGACCTTTCCTTATTTCATAC | 67 | TNK50
sWXD175 | | Xq24-qter | CCCCAAAGTTAATCTCCTC | TCCCTTCCTTGTGTTCTCC | 82 | TNK50
sWXD176 | | Xpter-q24 | TTACCAGTAGAGAGCACAG | TATCAAGAGAAAAGACCATCACAG | 87 | TNK100
sWXD177 | | Xq24-qter | ATGTCAAGTTAAGTAAGTC | AATCATCTCTAATAACTCC | 84 | TNK100
sWXD178 | | Xpter-q24 | GTTAATAGTAATGTCCTCTCTTTC | ACCTTTAGTTAGATTGATGAAGCC | 82 | TNK100
sWXD179 | | Xq24-qter | GAGAATGAGACTGAGAGAC | AAACAATACCCCAAAATGAAGAAC | 84 | TNK100
sWXD180 | | Xq24-qter | TAGTTGTCTGTGTTACCTTCC | CCTAACAAAATACAAATGATCTCC | 65 | TNK50
sWXD181 | | Xpter-q24 | GATTGAACCCTTTGGAGAC | ACTTGACTTTGCTTTTTGCTTTTG | 119 | TNK100
sWXD182 | | Xpter-q24 | TGTTGCATAGAATGCGTGG | GATTTTCAGATGACTCGGG | 94 | TNK50
sWXD183 | | Xpter-q24 | GTGGAGGTTTGAACTACAG | CCAAACTCCCCAACAAGAC | 94 | TNK100
sWXD184 | | Xpter-q24 | GCAAAATGATTCAAATGCTTGG | CAATCTGCACATAGAACGTAAAG | 84 | TNK25
sWXD185 | | Xpter-q24 | GCCTTTGTCCCTTAACCTC | GCTGTGTGTATCCCACCTG | 102 | TNK100
sWXD186 | | Xpter-q24 | ATCAGCTAAATGTCAGCCC | GTCTCCTCTCACAAGTTCC | 167 | TNK100
sWXD187 | | Xpter-q24 | TCCAATTACAGAAGCAAGACC | CGTTCTCAAGCTGAAATGCAC | 63 | TNK100
sWXD190 | | Xpter-q24 | GTAAGGGCTAAAACTGGAAAAC | TGCCAAATCCAAGGTAATGAAG | 70 | TNK100
sWXD191 | | Xq24-qter | TAGTTCAGTGTCCAGAGCC | CATCCCAGTCATTTTAGACC | 110 | TNK100
sWXD192 | | Xpter-q24 | TCATATATAGGTCAGACTCCAC | TGGCAACAATTTCCCAATAAGAAC | 111 | TNK50

Gene-specific STSs
sWXD5 | CCG1 | Xq13 | TCAGTAAGTCACTTCTGGGCGAC | AAAGGAAACCCGCTAAAGAAAATGG | 180 | TNK100
sWXD92 | PRPS2 | Xp22 | GCCTACTCTGACTTCTGAC | AACCTATCATTGCTGAATACCTAC | 70 | TNK50
sWXD103 | MIC2 | Xp22 | GCTCTATGTTTCCAAGAAG | GTTTACAGCCCTCTGAATG | 84 | TNK100
sWXD104 | PLP | Xq21-q22 | TCTCACTTCATGGCTTCTC | AAGATGTCCTGGAAACTTC | 85 | TNK100
sWXD105 | HRASP | Xpter-q26 | CTGAACCACCAGTGCTTCG | CACACCATCACAGACAGCC | 167 | TNK100
sWXD106 | AMD | Xq22-q28 | CTGTATCTGCCTCTATTTC | GTTACTAAAGTTCAGGTTCC | 139 | TNK100
sWXD107 | GF1 | Xp11 | AGCCCAGGTTAATCCCCAG | TGTGGAGGACACCAGAGCAG | 107 | TNK100
sWXD108 | POLA | Xp22-p21 | CAGGGAGTTTTGTATCTTC | CTTTTTCAGTCTTTCTAGGG | 83 | TNK100
sWXD109 | GLA | Xq21-q22 | CTAGAGCACTGGACAATGG | GTCAAGGTTGCACATGAAG | 80 | TNK100
sWXD110 | TBG | Xq21-q22 | CAGCGTTTTCATAATGTTGC | TAATATGGACAGGGAGTAG | 93 | TNK100
sWXD111 | L1CAM | Xq28 | TGAATACCCTCCCAGGCAC | ATCTTCCCAGGCATTTTAAG | 99 | TNK100
sWXD112 | COL4A5 | Xq22 | CAGGAGAAAAAGGTAGTAAAGG | TTTTGAGCCCAGAAGATTTG | 80 | TNK100
sWXD113 | F9 | Xq26-q27 | CTTCAGTACCTTAGAGTTCC | CCATATTTGCCTTTCATTGC | 221 | TNK100
sWXD114 | HPRT | Xq26 | AGCTTGCTGGTGAAAAGG | TCATTATAGTCAAGGGCATATC | 278 | TNK100
sWXD126 | RPS4X | Xq13-qter | CCATTGCTGAAGAGAGAGAC | AGGGACCCATTTCACCCAC | 64 | TNK50
sWXD130 | GAPDP1 | Xpter-qter | CATAAATGTCACAGTGTAGTGG | TGTAGAACAGGAGGAGCAG | 86 | TNK100
sWXD132 | BGN | Xq24-qter | GGCAACTACAAAAAGTAGAGG | AGGATGTCTGGCTGTGTTC | 80 | TNK100
sWXD133 | MYCL2 | Xpter-q24 | GTACATTTTTGTTACAGCAGG | TAAGTCAGGGAAGAGAGAG | 129 | TNK100
sWXD134 | "HUMXREP" | Xpter-q24 | AACACGCTCTCCTGGCTAC | GCAGAATTATCTCCACCGCTTCAC | 118 | TNK100
sWXD135 | "HUMP3A" | Xq24-qter | TCTGCTCAAACTCTGTCGG | AGAAGGGAAAGGCTAGGGG | 136 | TNK100
sWXD136 | TIMP | Xp11 | AGATAGCCTGAATCCTGCC | CTGGGTGGTAACTCTTTATTTC | 106 | TNK100
sWXD196 | PDHA1 | Xp22 | TCTGCTGGGGCACCTGAAG | AGCGGCGACTCCTCACAAC | 70 | TNK100

Linkage marker STSs
sWXD43 | DXS100 | Xq25 | GCTTAGAGTGCTAGTTTAGG | AGAGAATGGGGACTTTAAGAGG | 168 | TNK50
sWXD44 | DXS122 | Xq27-q28 | CTGATGCAGGAAGGAGGTGCATGT | CATGGGTAAACTGGTTTGCCA | 107 | TNK50
sWXD45 | DXS137 | Xq27-q28 | CTCTTACACTGCCATGGGACAGAT | GAGCATGTCTATTGTGAAACAACC | 197 | TNK50
sWXD47 | DXS37 | Xq25-q26 | GGATTGAGCAGCTAGGCTAAG | GATGCAGACTGTGTGTTCTCCTAG | 130 | TNK100

YAC insert-end STSs
sWXD1 | | Xpter-qter | CTAAAGTTTAGGAAGCATCG | CAAATGTAGGTGATTAAGGG | 128 | TNK100
sWXD2 | | Xpter-qter | CTCTTGAGGATTTTACTGAG | ATTCCAGTGTGTGAATTCTC | 125 | TNK100
sWXD3 | | Xpter-qter | ATCTACATTGATAGCTGGATTTGT | ACAGGAGATGGCAAATGAAGTTA | 162 | TNK100
sWXD4 | | Xpter-qter | GAATGCAAAGAGTTGTATAAGC | GTCTATCTTAACTTCTTTTGC | 101 | TNK100
sWXD7 | | Xq28 | CCCCCCTCCTTGATATACTG | CCTTTGTAGTCCCCTCCCTG | 88 | TNK100
sWXD53 | | Xq27 | TTGCTCGCTTTGGACTGTTC | GTAGTTACTGTCAGCTTGTG | 131 | TNK50
sWXD54 | | Xq27-q28 | CAGACTATGGTAACCATCC | ACTATCTCCACAGAAGTC | 176 | TNK50
sWXD56 | | Xq27-q28 | GCAATTCACTGACCAAAG | ATGCCTACTGTCAAGTAGAC | 179 | TNK50
sWXD57 | | Xq27-q28 | CCATGAGCATCCAGTAGGAATGCA | GGTCTTGGACTTTCTGTGCTAAGT | 250 | TNK50
sWXD171 | | Xq24-qter | CCTCTCTGTGTGTTTCCCC | TCCCCAAGCAAAACCAACATATTC | 170 | TNK50
sWXD172 | | Xq24-qter | GTCCACTTAATCAAACAAGTCTAC | TTTTCTTGACATCTCTTACTACTC | 84 | TNK50
sWXD173 | | Xq24-qter | ATTTTCCACTTTCCCATAAACATAC | GACTATTCCATCCTCTTCC | 267 | TNK50
sWXD174 | | Xq24-qter | CCACTGCCTTTAGAAAAACAC | ATTCACTGAGAACAAGACAAAAAC | 204 | TNK50
sWXD198 | | Xpter-q24 | CTAGTTGCACAGAAGAGCC | CCTTTCACTCATACACTTCC | 157 | TNK100
Note. The STSs are grouped according to the origin of the sequence information. STS sizes are in base pairs.

YACs is optional but unnecessary, since the STS can be rapidly placed along assembled contigs.
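The acceptance rule for a new primer pair on the template panel can be written down compactly. The sketch below is one possible encoding in Python; the template keys, the function name, and the returned labels are hypothetical, and a negative X3000.11 result is assumed to mean only that the STS lies outside the Xq24-qter region carried by that hybrid.

```python
from typing import Dict

# Templates that must give a product of the expected size (hypothetical keys).
MUST_AMPLIFY = ("human_genomic", "yac_pool_plus_human", "x_only_hybrid")
# Templates that must give no corresponding product.
MUST_NOT_AMPLIFY = ("yeast", "random_yac_pool", "chr7_hybrid")


def classify_sts(panel: Dict[str, bool]) -> str:
    """Classify a primer pair from PCR results (True = expected-size product)."""
    if not all(panel[name] for name in MUST_AMPLIFY):
        return "rejected: expected product missing"
    if any(panel[name] for name in MUST_NOT_AMPLIFY):
        return "rejected: product from a negative-control template"
    # X3000.11 carries Xq24-qter as its only human DNA.
    if panel["x3000_11"]:
        return "X-specific, within Xq24-qter"
    return "X-specific, outside Xq24-qter"


example = {
    "human_genomic": True, "yac_pool_plus_human": True, "x_only_hybrid": True,
    "yeast": False, "random_yac_pool": False, "chr7_hybrid": False,
    "x3000_11": True,
}
print(classify_sts(example))
```

The last test corresponds to the coarse localizations "Xq24-qter" and "Xpter-q24" used for many entries in the STS table.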
Additional Classes of STSs
Most investigators are interested in looking for genes, starting from YACs containing relevant, previously mapped sequences (Adams et al., 1991; Klinger, 1992). To increase the utility of the map, STSs can be systematically developed from genes, linkage probes, and other published sequences (for examples, see Table 1). They anchor the map to the genetic map and other physical maps as they develop and provide one route to contig alignment and orientation from the genetic order of probes that detect polymorphism. Such STSs, like others derived from random fragments of chromosomal DNA or end fragments (Table 1), also provide an early check on the possibility that some regions are missing from the collection of YACs. Thus random or end-fragment STSs need not be initially localized by hybrid cell panels or other means, since most of them become placed by the other STSs in the YAC contigs. The application of this strategy to the mapping of other chromosomes will depend on the rate at which gene-specific and linkage marker STSs accumulate, and thus refinements of the strategy may be necessary for mapping other chromosomes.

FIG. 1. Agarose gel electrophoresis of end fragments recovered by ligation-mediated PCR. End fragments are shown for a group of 14 clones cut with RsaI. For each clone, reactions were run as described under Materials and Methods with the L vector end of the YAC (lanes 1 and 2) and the R vector end (lanes 3 and 4). Lanes 1 and 3 show the products with the linker cassette primer and a vector primer 280 bp (L) or 136 bp (R) from the cloning site; lanes 2 and 4 show the products of a second-stage reaction that uses the linker cassette primer and vector primers nearer the insert junction [50 bp (L) or 24 bp (R) away] to achieve greater specificity with less vector sequence. Table 2 shows the sequences of the primers. Lanes M, 123-bp ladder (BRL) used as a size marker.
Simulation of the Proposed Strategy
As a test of the proposed approach, we have carried out two kinds of simulation. First, YAC contigs were analyzed retrospectively. Such simulations can determine whether existing YAC libraries can really yield an overlapping YAC/STS map by the proposed methods. Two contigs assembled within the X3000.11 YAC library were considered: an 8-Mb contig of 102 YACs covering the Factor IX and HPRT genes in Xq26.1-q27.1 (Little et al., 1992) and a 3-Mb contig of 40 YACs in
Xq27, extending from the DXS172 locus through the DXS369 locus (Zucchi et al., manuscript in preparation). Both contigs were originally assembled by hybridization with a variety of probes. YACs were used as putative sources for end STSs. The first YAC in the total collection (by accession number) that is larger than 250 kb was considered first, and those YACs actually overlapping each of its ends in current contigs were listed as "positive" for screening (i.e., these YACs would be detected by an STS from each of its end fragments). This process was subsequently repeated with all the YACs
FIG. 2. End-fragment recovery efficiency with several restriction enzymes. Agarose gel electrophoresis of PCR products of two YACs chosen at random, digested with the restriction enzymes RsaI (lane 1), AluI (lane 2), PvuII (lane 3), or ScaI (lane 4), and amplified with left (L) or right (R) vector primers and the linker primer. End fragments were recovered for each end with at least two enzymes. Lane M, 123-bp ladder used as a size marker.
FIG. 3. Polyacrylamide gel electrophoresis to test primer pair products for specificity and chromosome origin. A primer pair was tested in TNK100 buffer for the production of a PCR fragment of the expected size from human DNA and X-specific DNA, but not from yeast DNA, a pool of nonspecific YACs, or rodent DNA. Lane 1, yeast DNA; lane 2, a pool of indifferent YACs; lane 3, human DNA added to a pool of YAC DNAs; lane 4, human genomic DNA; lane 5, X-only cell line DNA; lane 6, X3000.11 cell DNA; lane 7, chromosome 7-only cell line DNA. The size marker (lane M) is pBR322 DNA digested with MspI.
TABLE 2
Oligonucleotides Used for the Ligation-Mediated PCR Method to Recover YAC Insert Ends (5' → 3')

L primer further from the cloning site | CACCCGTTCTCGGAGCACTGTCCGACCGC
R primer further from the cloning site | ATATAGGCGCCAGCAACCGCACCTGTGGCG
L primer nearer the cloning site | TCTCGGTAGCCAAGTTGGTTTAAGG
R primer nearer the cloning site | TCGAACGCCCGATCTCAAGATTAC
Linker long strand and linker primer | GCGGTGACCCGGGAGATCTGAATTC
Linker short strand | GAATTCAGATC

Note. All oligonucleotides are used unphosphorylated. Left (L) and right (R) refer to the large and small arms of YAC vector pYAC4 (Burke et al., 1987), respectively. For the ligation of linker to the digested YAC clone DNA (Materials and Methods), linker short and long strands are annealed.
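The relationship between the two linker strands listed in the table can be checked directly from their sequences. The short Python sketch below (function names are illustrative) confirms that the short strand is the reverse complement of the last 11 nucleotides of the long strand, so the annealed linker would be double-stranded and flush at that end, which would suit ligation to the blunt ends left by enzymes such as RsaI, AluI, PvuII, EcoRV, and ScaI.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_at_3prime_end(long_strand: str, short_strand: str) -> bool:
    """True if the short strand pairs with the 3' end of the long strand."""
    return long_strand.endswith(reverse_complement(short_strand))


LINKER_LONG = "GCGGTGACCCGGGAGATCTGAATTC"   # linker long strand and linker primer
LINKER_SHORT = "GAATTCAGATC"                # linker short strand

print(reverse_complement(LINKER_SHORT))
print(anneals_at_3prime_end(LINKER_LONG, LINKER_SHORT))
```

The first line printed is GATCTGAATTC, the 3'-terminal 11 nucleotides of the long strand, and the second is True.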
larger than 250 kb that had not yet been found by an end fragment. The result was that for the Xq26.1-q27.1 contig, 30 ends derived from 16 clones (two YACs were hamster-human chimeras) were sufficient to construct a fully contiguous set of 65 overlapping YACs. Comparably, for the 3-Mb region in Xq27, 14 ends from 7 YACs were enough to assemble 40 YACs in contiguous DNA. Calculating from these two contigs and comparable simulations with other portions of Xq24-qter, on the order of 300-350 clones (or about 600-700 ends) would be sufficient to assemble contigs across the complete 150-Mb chromosome.
In a further prospective test of the proposed method, YACs from the Xq24-qter YAC library that were larger than 250 kb and that had not yet been assigned to contigs were used for end-fragment isolation. For a group of 48 YACs, end fragments were obtained from every end sought, and 85 of the 96 ends yielded unique sequence from which an STS could be developed. Examples of end-fragment isolation are shown in Fig. 1. Probing the YAC collection by hybridization with the end fragments produced enough overlaps to (1) finish the assembly of possible contigs in the region and (2) determine which YAC ends abutted gaps in the collection and would have to be sought in neighboring YACs in other libraries (R. Nagaraja et al., work in progress).
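The extrapolation to the whole chromosome follows from simple proportional scaling of the two retrospective simulations. The Python sketch below reproduces that arithmetic, assuming (as a simplification) that the number of source clones needed scales linearly with the length of DNA covered; the function name is illustrative.

```python
def clones_for_chromosome(chromosome_mb: float, contig_mb: float,
                          source_clones: int) -> float:
    """Scale the number of source YACs linearly from a contig to a chromosome."""
    return chromosome_mb * source_clones / contig_mb


# 8-Mb Xq26.1-q27.1 contig: 16 source clones; 3-Mb Xq27 contig: 7 source clones.
low = clones_for_chromosome(150, 8, 16)
high = clones_for_chromosome(150, 3, 7)
print(low, high)          # clones needed for a 150-Mb chromosome
print(2 * low, 2 * high)  # two insert ends per clone
```

The two contigs give 300 and 350 clones, that is, 600 to 700 ends, in line with the range quoted in the text.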
These simulations can be generalized, since genome mapping in YACs requires an amount of work that corresponds with the size of the genome. For organisms with genome sizes up to about 200 Mb, end fragments from a single set of YACs can sustain global mapping. With mammalian genomes, however, where very large numbers of YACs must be screened for those cognate for a single STS, the task can be simplified, as here, by isolating a cohort of YACs for each chromosome and using insert end fragments to generate the bulk of the STSs.
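The iterative strategy tested in these simulations can be sketched on a toy collection. The code below is an illustration only, not the authors' simulation: YACs are modeled as intervals on a line (coordinates in kb), an end STS is treated as a single point that detects every YAC spanning it, and the choice to try the largest undetected YACs first is an assumption. The function name `walk_by_end_sts` and the example coordinates are hypothetical; the 250-kb size threshold comes from the text.

```python
from typing import List, Set, Tuple

Yac = Tuple[int, int]  # (start, end) of the insert in kb


def walk_by_end_sts(yacs: List[Yac], min_size: int = 250
                    ) -> Tuple[List[int], List[int], Set[int]]:
    """Pick YACs not yet detected by any STS, develop an STS from each
    insert end, and screen the whole collection with both STSs."""
    detected: Set[int] = set()   # indices of YACs hit by at least one STS
    sts_sites: List[int] = []    # positions of the end STSs developed
    seeds: List[int] = []        # indices of YACs whose ends were used

    # ASSUMPTION: candidates are tried largest first.
    by_size = sorted(range(len(yacs)),
                     key=lambda i: yacs[i][1] - yacs[i][0], reverse=True)
    for i in by_size:
        start, end = yacs[i]
        if i in detected or end - start < min_size:
            continue  # already found by an earlier STS, or too small
        seeds.append(i)
        for site in (start, end):
            sts_sites.append(site)
            # Screening: every YAC spanning the site is positive.
            for j, (s, e) in enumerate(yacs):
                if s <= site <= e:
                    detected.add(j)
    return seeds, sts_sites, detected


# Hypothetical five-clone collection covering 0-1100 kb.
collection = [(0, 400), (300, 700), (350, 500), (650, 1000), (900, 1100)]
seeds, sites, found = walk_by_end_sts(collection)
print(seeds, sites, sorted(found))
```

In this toy case the ends of two clones are enough to detect all five, because each new seed is chosen only among clones that no earlier STS has found. Real collections add complications that the sketch ignores, such as chimeric clones and ends that yield no unique sequence.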
ACKNOWLEDGMENTS
We thank Dr. Henry Huang for critical comments. The work was supported by Progetto Finalizzato Ingegneria Genetica (to M.D.) and NIH Grant HG00247 (to D.S.).
REFERENCES
Abidi, F. E., Wada, M., Little, R. D., and Schlessinger, D. (1990). YACs containing human Xq24-q28 DNA: Library construction and representation of probe sequences. Genomics 7: 363-376.
Adams, M. D., Kelley, J. M., Gocayne, J. D., Dubnick, M., Polymeropoulos, M. H., Xiao, H., Merril, C. R., Wu, A., Olde, B., Moreno, R. F., Kerlavage, A. R., McCombie, W. R., and Venter, J. C. (1991). Complementary DNA sequencing: Expressed sequence tags and human genome project. Science 252: 1651-1656.
Barillot, E., Dausset, J., and Cohen, D. (1991). Theoretical analysis of a physical mapping strategy using random single-copy landmarks. Proc. Natl. Acad. Sci. USA 88: 3917-3921.
Cole, C. G., Goodfellow, P. N., Bobrow, M., and Bentley, D. R. (1991). Generation of novel sequence tagged sites (STSs) from discrete chromosomal regions using Alu-PCR. Genomics 10: 816-826.
Green, E. D., and Green, P. (1991). Sequence-tagged site (STS) content mapping of human chromosomes: Theoretical considerations and early experiences. PCR Methods Appl. 1: 77-90.
Green, E. D., Mohr, R. M., Idol, J. R., Jones, M., Buckingham, J. M., Deaven, L. L., Moyzis, R. K., and Olson, M. V. (1991). Systematic generation of sequence-tagged sites for physical mapping of human chromosomes--Application to the mapping of human chromosome 7 using yeast artificial chromosomes. Genomics 11: 548-564.
Green, E. D., and Olson, M. V. (1990). Chromosomal region of the cystic fibrosis gene in yeast artificial chromosomes: A model for human genome mapping. Science 250: 94-98.
Hillier, L., and Green, P. (1991). OSP: A computer program for choosing PCR and DNA sequencing primers. PCR Methods Appl. 1: 124-128.
Imai, T., and Olson, M. V. (1990). Second-generation approach to the construction of yeast artificial-chromosome libraries. Genomics 8: 297-303.
Klinger, H. P., ed. (1992). "Human Gene Mapping 11: Eleventh International Workshop on Human Gene Mapping," Karger, Basel.
Lagerstrom, M., Parik, J., Malmgren, H., Stewart, J., Pettersson, U., and Landegren, U. (1991). Capture PCR: Efficient amplification of DNA fragments adjacent to a known sequence in human and YAC DNA. PCR Methods Appl. 1: 111-119.
Lee, J. T., Murgia, A., Sosnoski, D. M., Olivos, I. M., and Nussbaum, R. L. (1992). Construction and characterization of a yeast artificial chromosome library for Xpter-Xq27.3: A systematic determination of cocloning rate and X-chromosome representation. Genomics 12: 526-533.
Little, R. D., Pilia, G., Johnson, S., Zucchi, I., D'Urso, M., and Schlessinger, D. (1992). Yeast artificial chromosomes spanning 8 Mb and 10-15 centimorgans of human cytogenetic band Xq26. Proc. Natl. Acad. Sci. USA 89: 177-181.
Mueller, P. R., and Wold, B. (1989). In vivo footprinting of a muscle specific enhancer by ligation mediated PCR. Science 246: 780-786.
Nelson, D. L., Ledbetter, S. A., Corbo, L., Victoria, M. F., Ramirez-Solis, R., Webster, T. D., Ledbetter, D. H., and Caskey, C. T. (1989). Alu polymerase chain reaction: A method for rapid isolation of human-specific sequences from complex DNA sources. Proc. Natl. Acad. Sci. USA 86: 6686-6690.
Nussbaum, R. L., Airhart, S. D., and Ledbetter, D. H. (1986). A rodent-human hybrid containing Xq24-qter translocated to a hamster chromosome expresses the Xq27 folate-sensitive fragile site. Am. J. Med. Genet. 23: 457-466.
Olson, M. V., Hood, L., Cantor, C., and Botstein, D. (1989). A common language for physical mapping of the human genome. Science 245: 1434-1435.
Palazzolo, M. J., Sawyer, S. A., Martin, C. H., Smoller, D. A., and Hartl, D. L. (1991). Optimized strategies for sequence-tagged-site selection in genome mapping. Proc. Natl. Acad. Sci. USA 88: 8034-8038.
Pearson, W. R., and Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85: 2444-2448.
Riley, J., Butler, R., Ogilvie, D., Finniear, R., Jenner, D., Powell, S., Anand, R., Smith, J. C., and Markham, A. F. (1990). A novel, rapid method for the isolation of terminal sequences from yeast artificial chromosome (YAC) clones. Nucleic Acids Res. 18: 2887-2890.
Rosenthal, A., and Jones, D. S. C. (1990). Genomic walking and sequencing by oligo-cassette mediated polymerase chain reaction. Nucleic Acids Res. 18: 3095-3096.
Schlessinger, D., Little, R. D., Freije, D., Abidi, F., Zucchi, I., Porta, G., Pilia, G., Nagaraja, R., Johnson, S. K., Yoon, J. Y., Srivastava, A., Kere, J., Palmieri, G., Ciccodicola, A., Montanaro, V., Romano, G., Casamassimi, A., and D'Urso, M. (1991). Yeast artificial chromosome-based genome mapping: Some lessons from Xq24-q28. Genomics 11: 783-793.
Srivastava, A. K., Montanaro, V., and Kere, J. (1992). Simplified template preparation and improved direct sequencing using Taq polymerase. PCR Methods Appl. 1: 255-256.
Wada, M., Little, R. D., Abidi, F., Porta, G., Labella, T., Cooper, T., Della Valle, G., D'Urso, M., and Schlessinger, D. (1990). Human Xq24-Xq28: Approaches to mapping with yeast artificial chromosomes. Am. J. Hum. Genet. 46: 95-106.