GENOMICS 11, 799-805 (1991)
Genome Mapping with Anchored Clones: Theoretical Aspects
W. J. Ewens,*,† C. J. Bell,*,§ P. J. Donnelly,†,‡ P. Dunn,*,§ E. Matallana,*,§ and J. R. Ecker*,§

*Department of Biology and §Plant Science Institute, University of Pennsylvania, Philadelphia, Pennsylvania 19104; †Department of Mathematics, Monash University, Clayton, Victoria 3168, Australia; and ‡School of Mathematical Sciences, Queen Mary and Westfield College, London E1 4NS, England
Received April 12, 1991; revised July 15, 1991
As part of our effort to construct a physical map of the genome of Arabidopsis thaliana, we have made a mathematical analysis of our experimental approach of anchoring yeast artificial chromosome clones with genetically mapped RFLPs and RAPDs. The details of this analysis are presented, and their implications for mapping the Arabidopsis genome are discussed. © 1991 Academic Press, Inc.

0888-7543/91 $3.00. Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.

INTRODUCTION

A powerful method for furthering our understanding of the genetics and molecular biology of an organism is the construction of a physical map of its genome. Such a physical map could take several forms, but in principle it consists of an ordered series of overlapping segments of DNA spanning the entire genome. Constructing the map requires, first, the generation of a library of random overlapping DNA clones in which the entire genome is represented and, second, the placement of these clones in order based on their sequence content. A complete physical map would reveal the relationship between physical and recombinational distances throughout the genome. Once aligned with the genetic map of the organism, it would greatly facilitate map-based cloning of genes for which only a mutant phenotype and linkage map position are known. Furthermore, it would provide an opportunity to study the molecular structure of chromosomes by defining regions of interest such as centromeres. Physical maps have traditionally been constructed in bacteriophage λ and cosmid vectors, which can accommodate a maximum of 20 and 50 kb of insert DNA, respectively. More recently, yeast artificial chromosome vectors (YACs), in which much larger segments of DNA may be cloned, have been developed (Burke et al., 1987; reviewed by Schlessinger, 1990).

The ordering of clones may be carried out in several ways: (i) restriction mapping, (ii) fingerprinting, and (iii) anchoring. Restriction mapping provides the highest resolution of these methods, but it is suitable only when both the genome of interest and the size of the clones are relatively small (Kohara et al., 1987; Link and Olson, 1991). Fingerprinting of random cosmid clones (Coulson et al., 1986; Olson et al., 1986) also provides high resolution and is suitable for somewhat larger genomes, but it produces a large number of relatively small contigs. Efforts to link these into a complete map may be frustrated if the genome contains dispersed repetitive DNA elements. Further complications arise from the instability of certain sequences in Escherichia coli (Papp et al., 1991). Mapping with YACs potentially avoids both of these problems, since YACs are large enough to span certain repeated DNA elements, and sequences that are unstable in E. coli may well be stable in Saccharomyces cerevisiae. A PCR-based method of fingerprinting YACs with human DNA inserts, which relies on the frequent occurrence of certain repetitive DNA elements throughout the genome, has been developed (Nelson et al., 1989). A restriction fingerprinting method for YACs has also been developed (Bellane-Chantelot et al., 1991), but it first requires subcloning of the YAC insert into a cosmid library. The resolution of the resulting map is high, but the labor involved in constructing such a physical map of a whole genome would be very great. This method may develop into an excellent way of making a fine physical map once a region of interest on a chromosome has been identified. Both of the last two methods depend on the prevalence of dispersed repetitive DNA in the genome. In genomes where such repeats are scarce, an alternative mapping strategy must be used. Anchoring is a method by which very large DNA clones such as YACs are isolated by virtue of their containing short "anchor" sequences. These anchors
could consist of any short single-copy sequence from the genome, such as restriction fragment length polymorphisms (Botstein et al., 1980), arbitrarily primed PCR products (Williams et al., 1990; Welsh and McClelland, 1990), or sequence-tagged sites (Olson et al., 1989). A library of YACs could be screened for clones containing them either by colony hybridization or by PCR. By isolating YACs that contain two or more anchor sequences, contigs consisting of overlapping YACs that share common anchors may be built (Green and Olson, 1990). This method provides a means of constructing large contigs relatively quickly. Long-range continuity of the map might then be achieved by a directed approach toward linking the contigs with further YAC clones. Theoretical considerations for planning mapping projects based on fingerprinting have been published (Lander and Waterman, 1988). Arratia et al. (1991) and the present study give a mathematical analysis of a general scheme for mapping large genomes by anchoring. We are specifically interested in mapping the genome of the flowering plant Arabidopsis thaliana by anchoring YAC clones with genetically mapped molecular markers. Arabidopsis has been chosen as a model organism by a large number of plant scientists. Its unique features make anchoring a particularly promising mapping strategy: a very small genome, a low amount of dispersed repetitive DNA, accumulated genetic data (Meyerowitz, 1987), and a large number of mapped molecular probes (Chang et al., 1988; Nam et al., 1989; P. Scolnik and R. Reiter, personal communication). From an experimental standpoint, mapping the genome will take place in two stages. The first stage builds islands by anchoring YACs with all available probes. The second links these islands into a contiguous map by bridging the gaps between them by chromosome walking. Our mathematical analysis of the problem was approached in the same manner.
Where the calculations attempted in our analysis and that of Arratia et al. (1991) overlap, the conclusions agree, although the two sets of results were arrived at by different routes. Our calculations (below) predict the following for a YAC library (Ecker, 1990) of 2300 YAC clones of size 250 kb, used with a manageable number (500) of probes: we can expect to cover 87% of the Arabidopsis genome of about 100,000 kb with some 180 islands of average size about 550 kb. We also show that bridging the gaps between islands to achieve long-range continuity of the map is a realistic goal. This is due mainly to the parameters of this particular experimental system, i.e., a large number of mapped molecular markers (Chang et al., 1988; Nam et al., 1989) and the ratio of YAC insert size to genome size. With larger and more complex genomes, there may be too few anchors available to make a complete physical map using this strategy. However, the formulae presented here may be used to predict the utility of this method for any system. They could also be applied to regions of interest in complex genomes in which large numbers of anchors exist.
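The predictions quoted above come from the Poisson model set out under MATHEMATICAL BACKGROUND. As an illustrative check that is not part of the original analysis, the Python sketch below simulates that model directly. N clones of fixed length L and M point anchors are dropped uniformly on a genome of length G. Clones that share an anchor are merged into islands with a union-find structure, and the fraction of the genome lying outside islands is measured. Several choices are assumptions of this sketch: the name `simulate_islands`, fixed rather than Poisson-distributed clone and anchor counts, and only rough handling of the genome ends.

```python
import bisect
import random


def simulate_islands(G, L, N, M, rng):
    """Simulate one genome; return (number of islands, fraction not covered by islands).

    Assumptions of this sketch: exactly N clone left ends uniform on [0, G - L]
    and exactly M anchors uniform on [0, G]; end effects are only roughly handled.
    """
    starts = sorted(rng.uniform(0, G - L) for _ in range(N))
    anchors = sorted(rng.uniform(0, G) for _ in range(M))
    parent = list(range(N))  # union-find forest over clone indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    anchored = [False] * N
    for p in anchors:
        # Clones covering anchor p have left ends in [p - L, p], a contiguous slice of `starts`.
        lo = bisect.bisect_left(starts, p - L)
        hi = bisect.bisect_right(starts, p)
        for k in range(lo, hi):
            anchored[k] = True
            root_lo, root_k = find(lo), find(k)
            if root_lo != root_k:
                parent[root_k] = root_lo  # clones sharing an anchor belong to one island

    # Extent of each island: from its leftmost clone start to its rightmost clone end.
    extent = {}
    for k in range(N):
        if anchored[k]:
            r = find(k)
            left, right = extent.get(r, (starts[k], starts[k] + L))
            extent[r] = (min(left, starts[k]), max(right, starts[k] + L))

    # Merge overlapping island intervals and measure the covered length.
    covered = 0.0
    cur_left = cur_right = None
    for left, right in sorted(extent.values()):
        if cur_right is None or left > cur_right:
            if cur_right is not None:
                covered += cur_right - cur_left
            cur_left, cur_right = left, right
        else:
            cur_right = max(cur_right, right)
    if cur_right is not None:
        covered += cur_right - cur_left
    return len(extent), 1.0 - covered / G


rng = random.Random(1991)  # fixed seed so the run is reproducible
reps = [simulate_islands(100_000, 250, 2300, 500, rng) for _ in range(20)]
mean_islands = sum(n for n, _ in reps) / len(reps)
mean_uncovered = sum(u for _, u in reps) / len(reps)
print(f"mean islands {mean_islands:.1f}, mean uncovered fraction {mean_uncovered:.3f}")
```

Averaged over 20 replicates, the island count should come out near 180 and the uncovered fraction near 0.13 (roughly 87% coverage). Individual replicates fluctuate by several islands.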
ANCHORS
The anchors used in our system are of two types. The first is restriction fragment length polymorphisms (RFLPs) (Chang et al., 1988; Nam et al., 1989). The second is random amplified polymorphic DNAs (RAPDs) (Williams et al., 1990), also known as arbitrarily primed PCR products (AP-PCR) (Welsh and McClelland, 1990). For our purposes, the important feature of these anchors is that they are short, genetically mapped single-copy sequences from the Arabidopsis genome that may be used as hybridization probes. They also fulfill a requirement of our analysis in that they may be considered discrete points on the genome. In the remainder of this paper we use the term "anchor" to describe either RFLPs or RAPDs.
MATHEMATICAL BACKGROUND

For convenience, we use the same assumptions and notation as those of Arratia et al. (1991). We imagine a genome of length G, with M anchors and N YAC clones falling on the genome according to two independent Poisson processes. The length of each clone is L. If we put

$$a = LN/G, \qquad b = LM/G, \qquad \alpha = N/G, \qquad \beta = M/G,$$

then α and β are the Poisson rates of the YAC clone and anchor processes, respectively. We consider here only the case of constant clone size L. This allows all calculations to be made quickly once several basic mathematical formulae are obtained.

Let P be a randomly chosen point on the genome. The number of YAC clones crossing P then has a Poisson distribution with parameter a. Further, the left (and also right) projection of any YAC clone crossing P has a uniform (0, L) distribution. Suppose k (k ≥ 1) YAC clones cross the point P (see Fig. 1). We call the distance X the right projection of these YAC clones: this is the length to the right of P of the right-hand end of the rightmost YAC clone crossing P. The joint probability that at least one YAC clone crosses P and that this right projection takes the value x is found from order-statistics theory as
$$g(x) = \sum_{k=1}^{\infty} \frac{e^{-a}a^{k}}{k!}\, k x^{k-1} L^{-k} = a L^{-1} e^{-a + ax/L}, \qquad 0 < x \le L. \tag*{[1]}$$

FIGURE 1. [Diagram: YAC clones crossing the point P; image not recoverable from the source.]

The conditional density function f(x) of X, given that at least one YAC clone crosses P, is found by dividing g(x) by $(1 - e^{-a})$, giving

$$f(x) = a L^{-1} e^{-a + ax/L}\,(1 - e^{-a})^{-1}, \qquad 0 < x \le L. \tag*{[2]}$$

It is convenient to think of the left projection Y as Y = L - W, where W is, as shown in Fig. 1, the right projection of the leftmost YAC clone covering P. For any k greater than or equal to 2, order-statistics theory shows that X and W have joint density function $k(k-1)(x - w)^{k-2}L^{-k}$, $0 \le w \le x \le L$. From this, the joint density function of X and Y is

$$k(k-1)(x + y - L)^{k-2} L^{-k}, \qquad 0 \le x, y \le L;\quad x + y > L. \tag*{[3]}$$

Thus the probability that P is covered by at least two YAC clones, with left projection Y and right projection X, is

$$h(x, y) = \sum_{k=2}^{\infty} \frac{e^{-a}a^{k}}{k!}\, k(k-1)(x + y - L)^{k-2} L^{-k} = a^{2} L^{-2} e^{-2a + ax/L + ay/L}, \qquad 0 < x, y < L;\quad x + y > L. \tag*{[4]}$$

The "span" Z of the YAC clones covering P is defined by Z = X + Y. The probability that the point P is covered by at least two YAC clones, and that the span of these clones is z, is then found from [4] by standard methods as

$$j(z) = (2L - z)\, a^{2} L^{-2} e^{-2a + az/L}, \qquad L < z \le 2L. \tag*{[5]}$$

FORMULAE FOR PROPERTIES OF ISLANDS

An anchored island is a collection of one or more YAC clones, "stapled" together by anchors, as described above. An unanchored YAC clone has no anchor on it and does not form part of an island. In this section various properties of anchored islands (more briefly, islands) are derived almost immediately from the formulae in the preceding section. In each calculation we assume that a ≠ b; the case a = b is easily handled by letting a → b in each formula found. We also ignore end effects as being of negligible importance. We imagine that the genome is scanned from left to right, so that the word "following" means following to the right.

Mean Number of Islands

The mean can be found in two ways. We consider first the derivation from the point of view of YAC clones. Each island has a unique rightmost YAC clone, so the mean number of islands is equal to the mean number of rightmost YAC clones. Such a clone must be anchored, and further, its right-hand end must be crossed by no clone sharing an anchor with the clone in question. The required mean number thus is

$$N e^{-a}(1 - e^{-b}) + N \int_0^L g(x)\, e^{-\beta x}\left(1 - e^{-\beta(L - x)}\right) dx, \tag*{[6]}$$

where g(x) is given by [1] and the point P in Fig. 1 is taken as the right-hand end of the YAC clone in question. Evaluation of [6] leads to

$$\frac{Nb\,(e^{-a} - e^{-b})}{b - a} = \mu_1, \text{ say.} \tag*{[7]}$$

The calculation is even more straightforward from the point of view of anchors. The mean number of islands is the mean number of anchors on islands that have no anchor to their right on the same island. This is

$$M \int_0^L g(x)\, e^{-\beta x}\, dx,$$

and, noting that Ma = Nb, this leads directly to [7].

Mean Number and Length of Islands Having One Anchor Only

An island having one anchor only is formed in one of two ways. Either an anchor is covered by exactly one YAC clone that has no other anchor, or an anchor is covered by more than one YAC clone and no other anchor arises in the span of these YAC clones. Using [5], the mean number of such islands is

$$M a e^{-a} e^{-b} + M \int_L^{2L} j(z)\, e^{-\beta z}\, dz = M a e^{-a-b}\left\{\frac{a\,(e^{a-b} - 1)}{(a - b)^{2}} - \frac{b}{a - b}\right\}. \tag*{[8]}$$

It is interesting to note that Arratia et al. (1991) compute the mean number of islands having one clone only. They arrive at the same formula as [8] with the roles of N and M, and hence of a and b, reversed. This result is explained by the duality property of clones and anchors discussed by Arratia et al. (1991). The mean length of an island with one YAC clone only is, trivially, L. The mean length of an island with one anchor only, however, is not immediate and cannot be found by duality considerations. If the island has one YAC clone, its length must be L. If it has more than one clone, its length must be the span Z of the clones covering the single anchor. Using [5], the mean length for such islands is found by dividing

$$L a e^{-a-b} + \int_L^{2L} z\, j(z)\, e^{-\beta z}\, dz \quad \text{by} \quad a e^{-a-b} + \int_L^{2L} j(z)\, e^{-\beta z}\, dz.$$

The resulting calculation gives

$$\frac{L\left[(a - b)^{3} - a\left\{2e^{a-b} - 2(a - b)e^{a-b} + (a - b)^{2} - 2\right\}\right]}{(a - b)\left\{a\,(e^{a-b} - 1) - b\,(a - b)\right\}}.$$

As a → b, this cumbersome expression approaches the far simpler value $L\{1 + b/(6 + 3b)\}$. It is then interesting to note that as b (= a) → ∞, this expression approaches 4L/3.

Mean Proportion of the Genome Not Covered by Islands

The mean proportion of the genome not covered by islands is an important quantity in its own right. It also serves as an aid in computing other interesting quantities. It is found by calculating the probability that a given point falls in one of three cases: (i) it is not covered by any YAC clone; (ii) it is covered by exactly one YAC clone with no anchor on it; or (iii) it is covered by more than one YAC clone, with no anchor arising in the span of these YAC clones. This is

$$e^{-a} + a e^{-a} e^{-b} + \int_L^{2L} j(z)\, e^{-\beta z}\, dz = e^{-a} + \frac{a\,(b^{2} - ab - a)\,e^{-a-b}}{(b - a)^{2}} + \frac{a^{2} e^{-2b}}{(a - b)^{2}} = \mu_2, \text{ say.} \tag*{[9]}$$

The mean length of the genome not covered by islands is, of course, $G\mu_2$, and the mean length covered by islands is $G(1 - \mu_2)$.

The Probability That the Anchor to the Right of an Anchor on an Island Is Not on the Same Island

Suppose that an anchor lies at the point P in Fig. 1 and is covered by at least one YAC clone. The density function f(x) of the right projection X of the YAC clones covering this anchor is given by [2]. The probability that there is no further anchor on this right projection is then

$$\int_0^L f(x)\, e^{-\beta x}\, dx = \frac{a\,(e^{-b} - e^{-a})}{(a - b)(1 - e^{-a})}. \tag*{[10]}$$

This calculation can be used to prove an important "nonindependence" result. If an anchor is on an island, the probability that there are no further anchors on this island, either to the right or to the left, is

$$M^{-1}\{\text{mean number of single-anchor islands}\}/(1 - e^{-a}), \tag*{[11]}$$

which can be calculated immediately from [8]. The value so obtained is not equal to the square of the expression [10]. This implies nonindependence of the events that there is an anchor on the island to the left and to the right of the anchor in question. It follows that the geometric distribution cannot be used in the present (more complex) calculations, although it was used successfully in a similar problem by Lander and Waterman (1988) to derive results analogous to those obtained here. The next two calculations form the basis for finding a further quantity of considerable interest, namely, the mean island size.
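The closed forms derived so far can be checked numerically. The sketch below is a hypothetical helper written for illustration, not the authors' code. It evaluates [7], [8], the single-anchor island length, [9], [10], and [11] at given G, L, N, M. It then recomputes each quantity from the integral that defines it with a plain midpoint rule, using the densities g, f, and j of [1], [2], and [5]. Like the formulae, it assumes a ≠ b.

```python
import math


def midpoint(fn, lo, hi, n=20_000):
    """Composite midpoint rule; very accurate here because the integrands are smooth exponentials."""
    h = (hi - lo) / n
    return h * sum(fn(lo + (k + 0.5) * h) for k in range(n))


def island_formulae(G, L, N, M):
    """Evaluate [7]-[11] in closed form and, where an integral defines them, by quadrature."""
    a, b, beta = L * N / G, L * M / G, M / G
    d = a - b  # the formulae assume a != b
    E = math.exp
    g = lambda x: (a / L) * E(-a + a * x / L)                         # [1]
    f = lambda x: g(x) / (1 - E(-a))                                  # [2]
    j = lambda z: (2 * L - z) * (a / L) ** 2 * E(-2 * a + a * z / L)  # [5]

    out = {}
    # Mean number of islands: integral [6] versus closed form [7].
    out["mu1"] = N * b * (E(-a) - E(-b)) / (b - a)
    out["mu1_quad"] = N * E(-a) * (1 - E(-b)) + N * midpoint(
        lambda x: g(x) * E(-beta * x) * (1 - E(-beta * (L - x))), 0, L)

    # Mean number of single-anchor islands, [8].
    span_int = midpoint(lambda z: j(z) * E(-beta * z), L, 2 * L)
    out["single"] = M * a * E(-a - b) * (a * (E(d) - 1) / d**2 - b / d)
    out["single_quad"] = M * a * E(-a - b) + M * span_int

    # Mean length of a single-anchor island: ratio of the two expressions in the text.
    out["single_len"] = L * (d**3 - a * (2 * E(d) - 2 * d * E(d) + d**2 - 2)) / (
        d * (a * (E(d) - 1) - b * d))
    out["single_len_quad"] = (L * a * E(-a - b) + midpoint(
        lambda z: z * j(z) * E(-beta * z), L, 2 * L)) / (a * E(-a - b) + span_int)

    # Mean proportion of the genome not covered by islands, mu_2 in [9].
    out["mu2"] = (E(-a) + a * (b**2 - a * b - a) * E(-a - b) / (b - a) ** 2
                  + a**2 * E(-2 * b) / d**2)
    out["mu2_quad"] = E(-a) + a * E(-a - b) + span_int

    # [10]: no further anchor on the right projection; [11]: anchor is alone on its island.
    out["p10"] = a * (E(-b) - E(-a)) / (d * (1 - E(-a)))
    out["p10_quad"] = midpoint(lambda x: f(x) * E(-beta * x), 0, L)
    out["p11"] = out["single"] / M / (1 - E(-a))
    return out


r = island_formulae(G=100_000, L=250, N=2300, M=500)
for key in ("mu1", "single", "single_len", "mu2", "p10"):
    print(f"{key}: closed {r[key]:.4f}  quadrature {r[key + '_quad']:.4f}")
print(f"[11] = {r['p11']:.4f}  vs  [10]^2 = {r['p10'] ** 2:.4f}")
```

With the parameters used under NUMERICAL RESULTS, each closed form should agree with its quadrature value to many digits. The first three quantities reproduce the tabulated 181.01, 65.54, and 396.11 kb. The last line shows [11] differing slightly from the square of [10], which is the nonindependence noted in the text.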
Mean Distance between Anchors on the Same Island

Consider an anchor on an island, for example, at the point P in Fig. 1. The distance U to the next anchor on its right has density function $m(u) = \beta e^{-\beta u}$, u ≥ 0. This anchor will be on the same island as the original anchor if U is less than the distance from the original anchor to the rightmost projection of the YAC clones crossing it. This is the distance X in Fig. 1, whose density function f(x) is given in [2]. The mean value of the required distance is thus the integral of u m(u) f(x) over the domain 0 ≤ u ≤ x ≤ L, divided by the integral of m(u) f(x) over the same domain. The result is

$$\frac{L\left[(a - b)^{2} e^{a} + a\,(b^{2} + 2b - ab - a)\,e^{a-b} - b^{2}\right]}{b\,(a - b)^{2}(e^{a} - 1) - ab\,(a - b)(e^{a-b} - 1)} = \mu_3, \text{ say.} \tag*{[12]}$$
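Formula [12] is the ratio of two double integrals, which makes it easy to cross-check. In the sketch below (hypothetical function names; a ≠ b assumed), the inner integrals over u are taken in closed form and the outer integral over x by the midpoint rule. The constant factor in f(x) cancels from the ratio and is omitted.

```python
import math


def mu3_closed(a, b, L):
    """Closed form [12] for the mean distance between adjacent anchors on one island."""
    d = a - b
    num = L * (d**2 * math.exp(a) + a * (b**2 + 2 * b - a * b - a) * math.exp(d) - b**2)
    den = b * d**2 * (math.exp(a) - 1) - a * b * d * (math.exp(d) - 1)
    return num / den


def mu3_quadrature(a, b, L, n=2000):
    """Ratio of the integrals of u*m(u)*f(x) and m(u)*f(x) over 0 <= u <= x <= L."""
    beta = b / L
    h = L / n
    num = den = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        fx = math.exp(a * x / L)          # f(x) up to a constant factor
        tail = math.exp(-beta * x)
        den += fx * (1 - tail)            # integral over u in [0, x] of m(u)
        num += fx * (1 - tail * (1 + beta * x)) / beta  # integral over u in [0, x] of u*m(u)
    return num / den


print(round(mu3_closed(5.75, 1.25, 250), 3), round(mu3_quadrature(5.75, 1.25, 250), 3))
```

For a = 5.75, b = 1.25, L = 250, both functions should give a mean interanchor distance of roughly 87.5 kb.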
Mean Distance from the Rightmost Anchor on an Island to the Right-Hand End of the Island

This mean distance is calculated by a process that mirrors the previous one, since the quantity in question is the mean value of X, given that X < U. The required mean is thus found by a calculation similar in form to the one leading to [12]. The resulting value is

$$\frac{L}{1 - e^{b-a}} - \frac{L}{a - b} = \mu_4, \text{ say.} \tag*{[13]}$$

It is interesting to note that this result has been found by Arratia et al. (1991) using a duality argument.

Mean Island Size

Any island will have some number i (i ≥ 1) of anchors on it, and thus i - 1 "interanchor" distances. It will also have two further distances, to the left (right) of the leftmost (rightmost) anchor. The mean island size, using [12] and [13], is thus

$$\mu_3\, E(i - 1) + 2\mu_4. \tag*{[14]}$$

Now the mean of i is the mean number of anchors on islands, $M(1 - e^{-a})$, divided by the mean number $\mu_1$ of islands, found from [7]. Using [12], [13], and these values, [14] yields

$$\frac{L\left[(a - b)^{2} e^{a+b} + a\,(ab - b^{2} - a)\,e^{a} + b\,(2a - b)\,e^{b}\right]}{ab\,(a - b)\,(e^{a} - e^{b})} = \mu_5, \text{ say,} \tag*{[15]}$$

for the mean island size. This expression is identical to that of Arratia et al. (1991), although it was obtained by a completely different approach.

The Probability That an Island Is Followed by an Ocean

An ocean is a segment of the genome with no anchored island on it. Several important quantities can be calculated using the probability that an island is followed by an ocean, rather than by another island whose left-hand end overlaps the right-hand end of the island in question. Extending the analysis of Arratia et al. (1991), we find this as a conditional probability. Given that a certain YAC clone is the rightmost in any island, we require the probability that its right-hand end is not overlapped by a YAC clone, or is overlapped by exactly one unanchored YAC clone, or is overlapped by two or more unanchored YAC clones. Thus the probability required may be written in the form

$$\{P(0) + P(1) + P(\ge 2)\}/p_1, \tag*{[16]}$$

where the conditioning probability $p_1$ is (Arratia et al., 1991)

$$p_1 = \frac{b\,(e^{-a} - e^{-b})}{b - a}. \tag*{[17]}$$

We have, immediately,

$$P(0) = e^{-a}(1 - e^{-b}).$$

P(1) is the probability that a YAC clone has at least one anchor and that its right-hand end is covered by exactly one unanchored YAC clone. This probability is

$$P(1) = a e^{-a} e^{-b} \int_0^L L^{-1}\left(1 - e^{-\beta(L - x)}\right) dx = \frac{a e^{-a-b}\,(e^{-b} - 1 + b)}{b}. \tag*{[18]}$$

Similarly, $P(\ge 2)$ is the probability that a YAC clone has at least one anchor and has its right-hand end covered by two or more unanchored YAC clones. Using [4], this is

$$P(\ge 2) = \int_0^L \int_{L-x}^{L} h(x, y)\, e^{-\beta(x + y)}\left(1 - e^{-\beta(L - x)}\right) dy\, dx = \frac{a^{2} e^{-a-b}}{a - b}\left\{\frac{e^{a-b} - 1}{a - b} - 1\right\} - a e^{-a-b}\left\{\frac{e^{a-b} - 1}{a - b} - \frac{1 - e^{-b}}{b}\right\}. \tag*{[19]}$$

The required probability, from [16]-[19], is

$$\frac{(b - a)(1 - e^{-b})}{b\,(1 - e^{a-b})} + a e^{-b}\left\{\frac{1}{1 - e^{a-b}} + \frac{1}{a - b}\right\} = \pi, \text{ say.} \tag*{[20]}$$

Note that the first part of this expression is calculated by Arratia et al. (1991) as the probability that an island is followed by an "actual" ocean, that is, a stretch of the genome with no clone, anchored or unanchored, on it. For our purposes it is more interesting to calculate the probability that an island is followed by an ocean, "actual" or otherwise. This leads to the expression [20], which we use for the next two calculations. A simple extension of the above argument gives the probability that an island is followed by an ocean of size at least x, for any value of x.
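The relations in this part can also be verified numerically. The sketch below is a hypothetical helper and assumes a ≠ b. It computes μ4 from [13] and builds the mean island size through [14] from μ3, μ4, and E(i) = M(1 - e^{-a})/μ1, then compares the result with the closed form [15]. It also assembles π from P(0), P(1), P(≥2), and p₁ via [16]-[19] and compares it with [20]. The two routes in each pair should agree to floating-point rounding.

```python
import math


def island_size_and_ocean_probability(G, L, N, M):
    """Hypothetical helper: check [14] against [15] and [16]-[19] against [20] (assumes a != b)."""
    a, b = L * N / G, L * M / G
    d, E = a - b, math.exp

    mu1 = N * b * (E(-a) - E(-b)) / (b - a)                                      # [7]
    mu3 = L * (d**2 * E(a) + a * (b**2 + 2 * b - a * b - a) * E(d) - b**2) / (
        b * d**2 * (E(a) - 1) - a * b * d * (E(d) - 1))                           # [12]
    mu4 = L / (1 - E(b - a)) - L / d                                              # [13]
    mean_i = M * (1 - E(-a)) / mu1                                                # E(i)
    size_via_14 = mu3 * (mean_i - 1) + 2 * mu4                                    # [14]
    mu5 = L * (d**2 * E(a + b) + a * (a * b - b**2 - a) * E(a) + b * (2 * a - b) * E(b)) / (
        a * b * d * (E(a) - E(b)))                                                # [15]

    p1 = b * (E(-a) - E(-b)) / (b - a)                                            # [17]
    P0 = E(-a) * (1 - E(-b))
    P1 = a * E(-a - b) * (E(-b) - 1 + b) / b                                      # [18]
    P2 = (a**2 * E(-a - b) / d * ((E(d) - 1) / d - 1)
          - a * E(-a - b) * ((E(d) - 1) / d - (1 - E(-b)) / b))                  # [19]
    pi_via_16 = (P0 + P1 + P2) / p1                                               # [16]
    pi = (b - a) * (1 - E(-b)) / (b * (1 - E(d))) + a * E(-b) * (1 / (1 - E(d)) + 1 / d)  # [20]
    return {"mu4": mu4, "mu5": mu5, "size_via_14": size_via_14,
            "pi": pi, "pi_via_16": pi_via_16}


r = island_size_and_ocean_probability(G=100_000, L=250, N=2300, M=500)
print(f"mean island size: [15] {r['mu5']:.2f} kb, via [14] {r['size_via_14']:.2f} kb")
print(f"pi: [20] {r['pi']:.4f}, via [16]-[19] {r['pi_via_16']:.4f}")
```

At the NUMERICAL RESULTS parameters this should give a mean island size of about 548 kb and a value of π of about 0.376, in line with the table.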
Mean Number of Oceans and Mean Ocean Size

The mean number of oceans is, immediately, the mean number of islands multiplied by the probability that an island is followed by an ocean. This is $\mu_1\pi$, where $\mu_1$ and π are given by [7] and [20], respectively. The mean ocean size is the mean total ocean size divided by $\mu_1\pi$, or, from [9], $G\mu_2/(\mu_1\pi)$. The resulting expressions do not simplify, and the formulae are best left in this form.
Mean Number of Overlaps and Mean Overlap Size

The mean number of overlaps of islands is the mean number of islands multiplied by the probability that an island is followed by an overlapping island, or $\mu_1(1 - \pi)$. The mean overlap size is found by dividing the total overlap length by this amount. This is

$$[\mu_1\mu_5 - G(1 - \mu_2)]/[\mu_1(1 - \pi)], \tag*{[21]}$$

where $\mu_1\mu_5$ is the mean total size of all islands. The calculation recognizes that no more than two islands can overlap at a given point.

NUMERICAL RESULTS

In this section we present and discuss numerical results for the case G = 100,000, L = 250, M = 500, N = 2300 (so that a = 5.75, b = 1.25). These values are approximately appropriate for the Arabidopsis genome and for our library of YAC clones and probes, as discussed earlier. For these numerical values, we find:

Mean number of islands                                   181.01
Mean number of islands with one anchor                    65.54
Mean size of an island                                   547.95 kb
Mean size of an island with one anchor                   396.11 kb
Mean number of oceans                                     68.14
Mean ocean size                                          197.04 kb
Mean number of overlaps of islands                       112.87
Mean overlap size                                        111.72 kb
Probability that an island is followed by an ocean        0.3764
Mean proportion of genome not covered by islands          0.1344
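The whole table can be regenerated from the formulae. The sketch below is a hypothetical helper (`anchoring_summary` is not a name from the paper). For arbitrary G, L, N, M with a ≠ b, it evaluates [7], [8], the single-anchor length formula, [9], [15], [20], and [21], together with the ocean formulae μ1π and Gμ2/(μ1π). It could be used, for instance, to explore other library sizes or probe numbers.

```python
import math


def anchoring_summary(G, L, N, M):
    """Hypothetical helper: evaluate the island/ocean formulae for the constant-clone-length model."""
    a, b = L * N / G, L * M / G
    d, E = a - b, math.exp  # all formulae assume a != b
    mu1 = N * b * (E(-a) - E(-b)) / (b - a)                                               # [7]
    single = M * a * E(-a - b) * (a * (E(d) - 1) / d**2 - b / d)                           # [8]
    single_len = L * (d**3 - a * (2 * E(d) - 2 * d * E(d) + d**2 - 2)) / (
        d * (a * (E(d) - 1) - b * d))                                                      # one-anchor island length
    mu2 = E(-a) + a * (b**2 - a * b - a) * E(-a - b) / d**2 + a**2 * E(-2 * b) / d**2     # [9]
    mu5 = L * (d**2 * E(a + b) + a * (a * b - b**2 - a) * E(a) + b * (2 * a - b) * E(b)) / (
        a * b * d * (E(a) - E(b)))                                                         # [15]
    pi = (b - a) * (1 - E(-b)) / (b * (1 - E(d))) + a * E(-b) * (1 / (1 - E(d)) + 1 / d)  # [20]
    n_oceans = mu1 * pi
    n_overlaps = mu1 * (1 - pi)
    return {
        "Mean number of islands": mu1,
        "Mean number of islands with one anchor": single,
        "Mean size of an island (kb)": mu5,
        "Mean size of an island with one anchor (kb)": single_len,
        "Mean number of oceans": n_oceans,
        "Mean ocean size (kb)": G * mu2 / n_oceans,
        "Mean number of overlaps of islands": n_overlaps,
        "Mean overlap size (kb)": (mu1 * mu5 - G * (1 - mu2)) / n_overlaps,               # [21]
        "Probability that an island is followed by an ocean": pi,
        "Mean proportion of genome not covered by islands": mu2,
    }


summary = anchoring_summary(G=100_000, L=250, N=2300, M=500)
for name, value in summary.items():
    print(f"{name:<52} {value:10.4f}")
```

With G = 100,000, L = 250, N = 2300, M = 500, the printed values should match the table above to within about one unit in the last reported digit.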
DISCUSSION

The above calculations provide a very encouraging basis upon which to build a physical map of Arabidopsis. We anticipate that 500 anchoring experiments by colony hybridization can be carried out in a reasonable amount of time. If our expectation of approximately 180 islands is correct, closure of the map may be achieved in a similar period. About 60% of the oceans separating the islands will be false, but chromosome walking must still be carried out to determine which are in fact real. However, the estimated ocean size of 197 kb is smaller than the average YAC insert, so it is likely that many oceans will be bridged in just one step. It is also clear that since every successful bridging of one ocean connects the edges of two islands, the number of chromosome walks required to close the map will be one-half of the number of oceans.

Our analysis ignores potential obstacles to map closure. One is the absence of certain sequences from the library caused by long stretches of genomic DNA lacking EcoRI restriction sites (Ecker, 1990). Another is that some genomic sequences may turn out to be refractory to cloning in yeast, as has been shown with human DNA (Neil et al., 1990). Difficulties in obtaining such sequences might be overcome by using a YAC vector with a cloning site other than EcoRI (Ward and Jen, 1990; Grill and Somerville, 1991) and by a combined approach to difficult regions using additional genomic libraries constructed in alternative vectors. Throughout, we have based our calculations on a conservative estimate of 100,000 kb for the size of the Arabidopsis genome. If the actual genome size is closer to the published estimate of 70,000 kb (Leutweiler et al., 1984), we may expect the number of islands and the number of walking steps required to close the map to be correspondingly reduced.

Recently, Barillot et al. (1991) have published several formulae similar, but not identical, to some of ours. The difference arises because Barillot et al. regard an unanchored clone as an island. From our point of view this is not an appropriate assumption, and for our purposes we prefer the analysis given above. Further, Torney (1991) has considered the case in which clone length has a normal distribution. He has specialized some of his calculations to the case in which clones are all of the same size (the case considered here), and thus obtains, for example, our Eq. [9] (Torney's Eq. (4')). Thus the results of Torney (1991), Arratia et al. (1991), and this paper jointly provide substantial information on genome mapping with anchored clones.
ACKNOWLEDGMENTS

This work was supported in part by Grant HG00322 to J.R.E. from the National Center for Human Genome Research and by SERC Advanced Fellowship B/AF/1255 to P.J.D. The support and hospitality of the Department of Mathematics at Monash University to P.J.D. are also gratefully acknowledged.

REFERENCES

1. ARRATIA, R., LANDER, E. S., TAVARÉ, S., AND WATERMAN, M. S. (1991). Genomic mapping by anchoring random clones: A mathematical analysis. Genomics 11: 806-827.
2. BARILLOT, E., DAUSSET, J., AND COHEN, D. (1991). Theoretical analysis of a physical mapping strategy using random single-copy landmarks. Proc. Natl. Acad. Sci. USA 88: 3917-3921.
3. BELLANE-CHANTELOT, C., BARILLOT, E., LACROIX, B., LE PASLIER, D., AND COHEN, D. (1991). A test case for physical mapping of human genome by repetitive sequence fingerprints: Construction of a physical map of a 420 kb YAC subcloned into cosmids. Nucleic Acids Res. 19: 505-510.
4. BOTSTEIN, D., WHITE, R. L., SKOLNICK, M., AND DAVIS, R. W. (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32: 314-331.
5. BURKE, D. T., CARLE, G. F., AND OLSON, M. V. (1987). Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 236: 806-812.
6. CHANG, C., BOWMAN, J. L., DEJOHN, A. W., LANDER, E. S., AND MEYEROWITZ, E. M. (1988). Restriction fragment length polymorphism linkage map for Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 85: 6856-6860.
7. COULSON, A., SULSTON, J., BRENNER, S., AND KARN, J. (1986). Towards a physical map of the genome of the nematode Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA 83: 7821-7825.
8. ECKER, J. R. (1990). PFGE and YAC analysis of the Arabidopsis genome. Methods 1: 186-194.
9. GREEN, E. D., AND OLSON, M. V. (1990). Chromosomal region of the cystic fibrosis gene in yeast artificial chromosomes: A model for human genome mapping. Science 250: 94-98.
10. GRILL, E., AND SOMERVILLE, C. (1991). Construction and characterization of a yeast artificial chromosome library of Arabidopsis which is suitable for chromosome walking. Mol. Gen. Genet. 226: 484-490.
11. KOHARA, Y., AKIYAMA, K., AND ISONO, K. (1987). The physical map of the whole E. coli chromosome: Application of a new strategy for rapid analysis and sorting of a large genomic library. Cell 50: 495-508.
12. LANDER, E. S., AND WATERMAN, M. S. (1988). Genomic mapping by fingerprinting random clones: A mathematical analysis. Genomics 2: 231-239.
13. LEUTWEILER, L. S., HOUGH-EVANS, B. R., AND MEYEROWITZ, E. M. (1984). The DNA of Arabidopsis thaliana. Mol. Gen. Genet. 194: 15-23.
14. LINK, A. J., AND OLSON, M. V. (1991). Physical map of the Saccharomyces cerevisiae genome at 110-kilobase resolution. Genetics 127: 681-698.
15. MEYEROWITZ, E. M. (1987). Arabidopsis thaliana. Annu. Rev. Genet. 21: 93-111.
16. NAM, H.-G., GIRAUDAT, J., DEN BOER, B., MOONAN, F., LOOS, W. D. B., HAUGE, B. M., AND GOODMAN, H. M. (1989). Restriction fragment length polymorphism linkage map of Arabidopsis thaliana. Plant Cell 1: 953-960.
17. NEIL, D. L., VILLASANTE, A., FISHER, R. B., VETRIE, D., COX, B., AND TYLER-SMITH, C. (1990). Structural instability of human tandemly repeated DNA sequences cloned in yeast artificial chromosomes. Nucleic Acids Res. 18: 1421-1428.
18. NELSON, D. L., LEDBETTER, S. A., CORBO, L., VICTORIA, M. F., RAMIREZ-SOLIS, R., WEBSTER, T. D., LEDBETTER, D. H., AND CASKEY, C. T. (1989). Alu polymerase chain reaction: A method for rapid isolation of human-specific sequences from complex DNA sources. Proc. Natl. Acad. Sci. USA 86: 6686-6690.
19. OLSON, M. V., DUTCHIK, J. E., GRAHAM, M. Y., BRODEUR, G. M., HELMS, C., FRANK, M., MACCOLLIN, M., SCHEINMAN, R., AND FRANK, T. (1986). Random-clone strategy for genomic restriction mapping in yeast. Proc. Natl. Acad. Sci. USA 83: 7826-7830.
20. OLSON, M., HOOD, L., CANTOR, C., AND BOTSTEIN, D. (1989). A common language for physical mapping of the human genome. Science 245: 1434-1435.
21. PAPP, A., ROUGVIE, A. E., AND AMBROS, V. (1991). Molecular cloning of lin-29, a heterochronic gene required for the differentiation of hypodermal cells and the cessation of molting in C. elegans. Nucleic Acids Res. 19: 623-630.
22. SCHLESSINGER, D. (1990). Yeast artificial chromosomes: Tools for mapping and analysis of complex genomes. Trends Genet. 6: 248-259.
23. TORNEY, D. C. (1991). Mapping using unique sequences. J. Mol. Biol. 217: 259-264.
24. WARD, E. R., AND JEN, G. C. (1990). Isolation of single-copy-sequence clones from a yeast artificial chromosome library of randomly sheared Arabidopsis DNA. Plant Mol. Biol. 14: 561-568.
25. WELSH, J., AND MCCLELLAND, M. (1990). Fingerprinting genomes using PCR with arbitrary primers. Nucleic Acids Res. 18: 7213-7218.
26. WILLIAMS, J. G. K., KUBELIK, A. R., LIVAK, K. J., RAFALSKI, J. A., AND TINGEY, S. V. (1990). DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res. 18: 6531-6535.