Gene, 23 (1983) 355-367 Elsevier
GEN 00809
Comparison of sequence repetitiveness of human cDNA and genomic DNA using the miniplasmid vector, piVX
(Recombinant DNA; insert-mediated genetic recombination; repeated DNA sequences; human gene mapping)
Rachael L. Neve a and David M. Kurnit a,b
a Division of Clinical Genetics, Children’s Hospital Medical Center, Boston, MA 02115, Tel. (617) 135-7240, and b Division of Genetics, Brigham and Women’s Hospital, Boston, MA 02115 (U.S.A.) Tel. (617) 732-2296
(Received February 2nd, 1983)
(Revision received April 20th, 1983)
(Accepted April 29th, 1983)
SUMMARY
We studied the sequence repetitiveness of human cDNA and genomic DNA fragments inserted in the miniplasmid piVX. Sequence repetitiveness was assayed by the frequency with which a given insert mediated recombination between the chimeric miniplasmid and a recombinant bacteriophage library constructed from large random human genomic fragments. The methodology allows rapid analysis and isolation of sequences of a given copy number in the genome: few (1 to 10 copies), low-order repeated (10 to 100 copies) and more highly repeated (over 100 copies). In a model application of the method, the distribution of these classes of sequences was compared in cDNA and genomic DNA libraries constructed in piVX. The major difference observed between cDNA and genomic DNA repeat structure was the paucity of highly repeated elements in cDNA copies from high-molecular-weight cytoplasmic poly(A)⁺ RNA.
Abbreviations: bp, base pairs; EtBr, ethidium bromide; kb, kilobase pairs; pfu, plaque-forming units; poly(A)⁺, polyadenylated; REP number, normalized ratio of pfu on Su⁻ to pfu on Su⁺ hosts: see MATERIALS AND METHODS, section c; SDS, sodium dodecyl sulfate; SSC, 0.15 M NaCl, 0.015 M Na₃·citrate, pH 7.5; [ ], indicates plasmid-carrier state.

0378-1119/83/$03.00 © 1983 Elsevier Science Publishers B.V.

Although successful nucleic acid hybridization was first demonstrated with an in vivo biological assay (Marmur and Lane, 1960), sequence homology assays determined by hybridization (Rownd et al., 1963) have been performed almost exclusively using physicochemical criteria in vitro. Recently, Seed (1983) developed a methodology that utilizes recombination in vivo to detect nucleotide sequence homology between a DNA fragment cloned in a specialized miniplasmid and a mammalian genomic sequence represented in a recombinant bacteriophage library. The availability of this convenient assay to study nucleotide sequence homology in a biologically relevant manner should facilitate study of aspects of genome evolution that depend on recombination-based processes, including gene duplication, gene conversion, sister chromatid exchange and chromosome translocation. In this report, we detail the usefulness of this recombination-based assay for determining the degree of sequence repetitiveness of a cloned DNA sequence in the genome from which it was derived.

Nucleic acid hybridization studies of bulk genomic DNA from a variety of eukaryotes have permitted the ascertainment of different families of related sequences which may be tandemly clustered or widely scattered, and whose reiteration frequency within the genome can vary from 10¹ to 10⁶ copies (Waring and Britten, 1966; Britten and Kohne, 1968; Wetmur and Davidson, 1968; Skinner, 1977; Schmid and Deininger, 1975; Deininger et al., 1981).

Recombinant DNA technology has enabled the study of repeated sequences in eukaryotes to advance from the description of the existence of repeated families to the isolation and analysis of members of these families. Rapid determination of whether an individual cloned sequence has a copy number of at least 10² can be accomplished by hybridizing cloned DNA fixed to filters with radiolabeled total genomic DNA (Gusella et al., 1980; Shen and Maniatis, 1980; Crampton et al., 1981). However, analysis of lower-order repeats requires radiolabeling of each individual clone followed by reassociation kinetics (Britten and Kohne, 1968; Wetmur and Davidson, 1968), Southern (1975) blot, or dot blot (Brandsma and Miller, 1980) analyses. We demonstrate that an adaptation of the recombination-based approach of Seed (1983) allows more rapid discrimination of few copy (repeat frequency less than 10¹) from low-order repeated (10¹ to 10²) sequences than is feasible with nucleic acid hybridization technology.

Transcribed sequences represent a specific subset of the genome and include some, but not all, repeated sequence families (Klein et al., 1974; Ryffel and McCarthy, 1975; Crampton et al., 1981). In general, highly repeated, tandemly clustered, simple sequence DNAs are not transcribed (Flamm et al., 1969) except in unusual circumstances (Varley et al., 1980). The interspersed highly repeated sequences of the Alu family are transcribed, either as part of intervening sequences of larger genes by RNA polymerase II, or as 300-bp transcripts by RNA polymerase III (for review, see Jelinek and Schmid, 1982). Although some transcribed sequences comprise 50 to 1000 copies in the genome (Ryffel and McCarthy, 1975; Crampton et al., 1981), their origin is obscure, with the exception of particular gene families such as ribosomal DNA (Young et al., 1976) or some repetitive mobile elements (Potter et al., 1979; Cameron et al., 1979). Transcribed repeated sequences of still lower copy number include gene families with or without pseudogenes (Jacq et al., 1977), processed genes (Nishioka et al., 1980) and organ-specific identification sequences (Sutcliffe et al., 1982). Previous analyses of the sequence complexity of transcribed sequences utilized nucleic acid hybridization (Klein et al., 1974; Ryffel and McCarthy, 1975; Crampton et al., 1981). We confirm and extend the results of these studies by using recombination-based assays to compare the repeat sequence structure of cloned cDNA and genomic DNA fragments. These studies confirm the paucity of highly repeated sequences in high-molecular-weight cytoplasmic poly(A)⁺ RNA. They also permit the rapid isolation of individual transcribed sequences of few copy, low-order repeated and more highly repeated copy number, respectively.
MATERIALS AND METHODS

(a) Recombination miniplasmids and bacterial strains
The miniplasmids piVX and piAN7 were obtained from B. Seed. piAN7, constructed by H. Huang, differs from piVX (Seed, 1983) in the content of the ColE1 replicon and the orientation of the polylinker. These differences result in a five-fold greater copy number for piAN7 in the host bacterial cell. These miniplasmids were propagated in MC1061[p3], a Su⁻ strain (Casadaban and Cohen, 1980) which carries plasmid p3 (described in Neve et al., 1983). To detect bacteriophage carrying piVX or piAN7, the lacZ amber mutant LG75 was used as detailed previously (Neve et al., 1983). The Su⁺ strain used in these studies was LE392, which contains a chromosomal supF gene. Details of plating, growth and elution of phage were described previously.
(b) Construction of genomic and cDNA libraries in recombination miniplasmids
piVX was linearized with PstI and piAN with BamHI. Linearized molecules were isolated following electrophoresis in a 3.5% acrylamide gel. They were eluted in 0.5 M ammonium acetate buffer (Maxam and Gilbert, 1980), diluted to 0.2 M ammonium acetate and chromatographed on RPCV (Schleicher and Schuell). The DNA was recovered in 0.5 ml and ethanol-precipitated. PstI-digested piVX was dG-tailed (Michelson and Orkin, 1982) with terminal transferase to an average tail length of 15 to 20 bases. Tailing conditions were determined in three steps. PstI-linearized piVX was digested with BamHI and 5’-end labeled at this site. The shift in size of the small BamHI-PstI fragment was then monitored on an acrylamide gel after a model tailing reaction. Double-stranded dC-tailed cDNA from the cytoplasmic poly(A)⁺ RNA of adult human liver was a gift of E. Prochownik and D. Woods (Woods et al., 1982). This cDNA was sized to exclude fragments shorter than 500 bp. Total cellular lung fibroblast poly(A)⁺ RNA was isolated from a 20-week fetus with trisomy 21 (47,XX,+21) by guanidine hydrochloride extraction (Cox, 1968). Fetal material was obtained under a protocol approved by the Institutional Review Board at Brigham and Women’s Hospital (Kurnit et al., 1982). Double-stranded cDNA was prepared, sized to exclude fragments shorter than 500 bp and dC-tailed as described (Michelson and Orkin, 1982; Woods et al., 1982; Kurnit et al., 1982). Transformation of MC1061[p3] with dC-cDNA annealed to dG-piVX yielded 60 recombinant colonies/ng of tailed cDNA. Selection was on plates containing 50 µg/ml kanamycin, 10 µg/ml tetracycline and 12.5 µg/ml ampicillin. Only 3% of colonies represented nonrecombinants without cDNA inserts. BamHI-digested piAN was ligated using T4 DNA ligase to a complete MboI digest of human female DNA (gift of L. Kunkel). The ligated mixture was digested with BamHI to minimize nonrecombinants. It was then used to transform MC1061[p3] (Neve et al., 1983), yielding 20 colonies/ng of MboI-digested DNA; 15% of colonies were nonrecombinant. The average insert size of both the cDNA and the genomic DNA libraries ranged from 500 to 1000 bp. Insert size was determined by monitoring the size of endonuclease-digested plasmids on a 0.8% agarose gel.
(c) Analysis of recombination frequencies: the REP number
We assayed the sequence repetitiveness in the human genome of individual human DNA fragments. The assay measured the frequency with which a fragment occurred in a recombinant bacteriophage library carrying human genomic DNA inserts (Lawn et al., 1978). Amber mutant bacteriophage can be grown in cells harboring piVX or piAN with a human DNA insert. Phage which share human DNA sequence homology with that insert can then incorporate a copy of piVX or piAN by recombination between the homologous human DNA sequences. Such phage thus acquire the supF gene of piVX or piAN (Seed, 1983) and can grow on Su⁻ hosts. A total of 10⁶ pfu from the human recombinant bacteriophage library (Lawn et al., 1978) were plated on bacteria carrying piVX or piAN with the human DNA insert to be analyzed. Following confluent lysis and elution of the phage, the resulting phage were plated on both Su⁺ (LE392) and Su⁻ (LG75) hosts. The ratio of pfu on LG75 vs. LE392 should reflect the frequency of occurrence of sequences in the bacteriophage library which are homologous to the human DNA cloned in piVX or piAN7. This ratio is termed the REP number.

For recombinants in piVX, REP was expressed as the number of pfu on LG75 obtained from an aliquot of phage stock that yielded 5 × 10⁸ pfu on LE392. For recombinants in piAN7, REP was expressed as the number of pfu on LG75 from 10⁸ pfu on LE392. This five-fold difference reflects the increased copy number of piAN vs. piVX. The higher copy number makes piAN roughly five-fold more efficient than piVX for the same human DNA insert in this recombinational assay (unpublished data of D.M.K.). The REP number was zero in multiple control experiments using piVX or piAN alone without a human insert. The REP number scale was chosen so that a
REP number of one represented “single” copy repetitiveness. This scale was chosen empirically, but it agrees with the following theoretical considerations. Suppose a bacteriophage carrying a given DNA insert is propagated on bacteria harboring piVX containing a homologous insert. One out of 10² to 10³ of the resulting phage then incorporate piVX by homologous recombination and can therefore grow on LG75 (Seed, 1983; Table II of this paper and unpublished data of D.M.K.). Somewhat under 1 out of 10⁵ phage in a human genomic library would be expected to carry a given single copy fragment of human DNA (Lawn et al., 1978). Thus, the frequency of recombination of a “single” copy sequence in piVX with phage from a human genomic library is expected to be on the order of 1 out of 10⁸ phage. Reconstruction experiments agree with these arguments. In these experiments, a bacteriophage carrying a DNA insert homologous to that subcloned in piVX is diluted with large numbers of nonhomologous phage. The proportion of phage which incorporate piVX by recombination depends directly on the proportion of phage that contain the homologous DNA insert (unpublished data of B. Seed and D.M.K.).

RESULTS

(a) Distribution of repeated sequences in genomic DNA
The distribution of repeated sequences among 100 individual clones containing human genomic DNA inserted into piAN was analyzed. The average human DNA insert size was 500 to 1000 bp. Each clone was tested in a separate experiment, 100 in all. In each, phage from a recombinant bacteriophage library containing 15- to 20-kb DNA inserts (Lawn et al., 1978) were plated on bacteria carrying piAN with the human genomic fragment to be assayed. Following elution of the propagated phage, the proportion of phage that were viable on Su⁻ hosts was expressed as a REP number (MATERIALS AND METHODS, section c). This is the number of viable phage on a Su⁻ host per 10⁸ pfu on a Su⁺ host. The data are summarized in Table I. REP numbers for these fragments ranged over four orders of magnitude.

(b) Distribution of repeated sequences in transcribed DNA sequences (cDNA clones)
We determined the distribution of repeated sequences among two sets of cDNA clones inserted into piVX. These were 90 clones containing cytoplasmic liver cDNA and 50 clones containing total cellular lung fibroblast cDNA. The average human DNA insert size was 500 to 1000 bp. The data are summarized in Table I. REP numbers for these fragments varied over several orders of magnitude. Both the genomic and the cDNA libraries contain few copy, low-order repeated and moderately repeated sequences. However, highly repeated sequences are scarce in the cDNA libraries constructed using high-molecular-weight poly(A)⁺ RNA as template. The lung cDNA library, which was constructed using total cellular RNA, appears to contain more highly repeated sequences (REP number of 10³ or greater) than the liver cDNA library constructed using cytoplasmic RNA. This finding is in accord with other studies indicating that highly repetitive sequences are removed during processing of nuclear RNA into cytoplasmic RNA (for review, see Jelinek and Schmid, 1982). We confirmed this assertion by hybridizing a pool of four highly repeated Alu family members (Deininger et al., 1981) to filter-bound nuclear and cytoplasmic RNA from a human hepatoma cell line, Hep G-2. As depicted in Fig. 1, hybridization is greatest to high-molecular-weight nuclear RNA and virtually absent in the lane containing cytoplasmic RNA. These results contrast with our findings using HeLa RNA, where cytoplasmic RNA containing highly repetitive sequences was observed (Kurnit et al., 1982).

TABLE I
REP numbers of cDNA and genomic DNA clones

REP number    Group    Liver cDNA (%)    Lung cDNA (%)    Genomic DNA (%)
0             I        18                28               15
10⁰–10¹       I        44                29               20
10¹–10²       II       28                35               19
10²–10³       II        9                 2               12
over 10³      III       1                 6               34

The table shows REP numbers for cDNA clones inserted into piVX and genomic DNA clones inserted into piAN, from the experiments described in the text. Values were normalized as described in MATERIALS AND METHODS, section c, and are given as the percentage of clones in each library. A REP number of 0 indicates a clone that is not represented in the genomic library used (Lawn et al., 1978). Alternatively, the clone may have been diluted during amplifications of the library so that it is not discernible in this analysis.
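The REP normalization in MATERIALS AND METHODS, section c, and the single-copy expectation that motivates it reduce to a few lines of arithmetic. The Python sketch below illustrates that arithmetic; it is not the authors' procedure, and all function names are hypothetical. It assumes only the values stated in the text: 5 × 10⁸ pfu on LE392 for piVX and 10⁸ for piAN7, a library fraction of about 10⁻⁵ for a single-copy fragment (the text says "somewhat under"), and an incorporation frequency of 10⁻² to 10⁻³.

```python
"""Hypothetical sketch of REP-number arithmetic (names are illustrative only)."""

# Su+ (LE392) pfu to which each vector's REP number is normalized (section c).
NORMALIZING_PFU = {
    "piVX": 5e8,   # pfu on LG75 per 5 x 10^8 pfu on LE392
    "piAN7": 1e8,  # ~five-fold higher plasmid copy number, so 10^8 pfu
}


def rep_number(pfu_lg75: float, pfu_le392: float, vector: str) -> float:
    """Scale the observed Su- (LG75) to Su+ (LE392) plaque ratio to a REP number."""
    if pfu_le392 <= 0:
        raise ValueError("pfu on LE392 must be positive")
    return pfu_lg75 / pfu_le392 * NORMALIZING_PFU[vector]


def recombinant_fraction(rep: float, vector: str) -> float:
    """Invert rep_number: fraction of phage that acquired the miniplasmid."""
    return rep / NORMALIZING_PFU[vector]


def expected_single_copy_fraction(library_fraction: float, incorporation: float) -> float:
    """(fraction of library phage carrying the fragment) x (fraction that recombine)."""
    return library_fraction * incorporation


if __name__ == "__main__":
    # A single-copy insert: ~10^-5 of library phage, 10^-2 to 10^-3 of which recombine.
    for inc in (1e-2, 1e-3):
        print(f"{expected_single_copy_fraction(1e-5, inc):.0e}")
    # HyG5 x piSVA8 (Table II): a piVX REP of 5 x 10^6 implies ~1 in 100 phage recombined.
    print(recombinant_fraction(5e6, "piVX"))
```

Running the script prints 1e-07 and 1e-08 for the two incorporation frequencies, bracketing the order-of-10⁸ expectation. It then prints 0.01 for HyG5 with piSVA8, which matches the upper end of the one-in-10² to 10³ incorporation range.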
(c) Correlation between the REP number and actual sequence repetitiveness
To validate the recombination-based analysis against nucleic acid hybridization, we determined the copy number of individual clones which gave different REP numbers. Plasmid DNA mini-preps were prepared from 28 genomic clones and 45 liver cytoplasmic cDNA clones. A mini-prep was also made from piBLUR (Neve et al., 1983), a miniplasmid containing a highly repeated Alu family sequence. These were arranged in decreasing order of REP numbers, electrophoresed on agarose gels, transferred to nitrocellulose filters, and hybridized with radiolabeled total human DNA (Fig. 2). Positive hybridization is anticipated for sequences whose copy number is 10² or higher (Gusella et al., 1980; Shen and Maniatis, 1980; Crampton et al., 1981). Only those clones with REP numbers above 10² showed hybridization. Only two of the cDNA clones showed even faint hybridization, consistent with the finding that the liver cDNA library has few if any highly repeated sequences. Thus, only those sequences with the highest REP numbers were highly or moderately repeated in the human genome by hybridization criteria.
Fig. 1. Representation of Alu family sequences in human liver RNA. Nuclear (lane N) and cytoplasmic (lane C) RNA was prepared from human hepatoma Hep G-2 cells, and 15 µg of each was electrophoresed on a formaldehyde-agarose gel. Following transfer to nitrocellulose paper, the RNA was hybridized with a pool of nick-translated plasmid DNA samples containing the Alu family members BLUR6, BLUR8, BLUR11 and BLUR19. RNA markers of 45S, 28S and 18S are denoted on the right margin.
The size of the human DNA insert in each of the 28 human genomic clones was determined following linearization of the miniplasmid with EcoRI. DNA insert sizes ranged from a few hundred to one thousand bp in all groups of REP numbers (Table I). All four clones that had inserts larger than 1000 bp contained repetitive elements; their respective REP numbers were 1000, 500, 20 and 13.

We analyzed the correlation of the REP number with sequence repetitiveness at the lower end of the REP scale. Human DNA was digested with EcoRI, electrophoresed, transferred to nitrocellulose, and hybridized with a radiolabeled plasmid of known REP number. Fig. 3 shows that all sequences whose REP number was 3 or less were present in only a few copies in the genome. To date, all of 25 sequences examined with REP numbers less than 10¹ have yielded blots with five or fewer bands. Further, the REP numbers of three known few-copy sequences cloned in piVX are all less than 10 (Seed, 1983; unpublished data of D.M.K.). As the REP number increased to between 10 and 25, more complex hybridization patterns were seen. These were either multiple bands or a single intense band, often with increased lane smear. In general, sequences with REP numbers over 40 hybridized as a smear to the overall EcoRI digest. This pattern is consistent with a moderate to high degree of sequence complexity in the human genome.

Fig. 2. Comparison between REP numbers and presence of highly and moderately repeated sequences by nucleic acid hybridization. Samples of 50 to 100 ng of plasmid DNA from mini-preps were electrophoresed, transferred to nitrocellulose filters and hybridized with nick-translated total human DNA. The DNA was isolated from 1.5-ml overnight cultures; with piAN7, the yield was five-fold greater than with the vector piVX. Before transfer, the DNA was nicked with short-wave UV light in the presence of EtBr under conditions described previously. The top two panels depict plasmids with genomic human DNA inserts. The bottom panel depicts plasmids with adult human liver cytoplasmic cDNA inserts. Lanes denoted I contain DNA from plasmids giving REP numbers of 10 and under. Lanes denoted II contain DNA from plasmids giving REP numbers between 10 and 1000. Lanes denoted III contain DNA from plasmids giving REP numbers over 1000. The arrow denotes the lane containing DNA from piBLUR11 (Neve et al., 1983).

(d) Analysis of repeated sequence organization on bacteriophage containing transcribed human sequences
One virtue of the recombination-based analysis is that it results in the plaque purification of bacteriophage homologous to the human DNA inserted into piVX. We selected one random phage rescued by each of 46 liver cDNA miniplasmid clones and arrayed them in order of decreasing REP numbers. DNA from each plaque was transferred to nitrocellulose and hybridized with radiolabeled total human DNA (Benton and Davis, 1977) (Fig. 4A). Most, but not all, phage hybridized with the total human DNA probe, as is the case for random phage isolated from this library (Gusella et al., 1980; Tashima et al., 1981). One phage rescued by a cDNA clone with a REP number of 650 showed only weak hybridization in this experiment. This result indicates the absence of a highly repeated Alu family element on this phage. One hundred plaque-purified phage rescued by this cDNA clone were hybridized analogously with human DNA, and most but not all hybridized strongly (Fig. 4B). These results demonstrate that recombinant bacteriophage containing transcribed sequences do not differ grossly from random bacteriophage in their content of highly repeated sequences. Sequences homologous to a moderately repeated cDNA clone can be found on different phage with and without highly repetitive elements. There was no clear-cut correlation between the REP number of cDNA clones and the presence of Alu family members on the corresponding phage.

Fig. 3. Comparison between REP number and actual sequence complexity. The top panel shows DNA from piVX with adult liver cDNA inserts; the bottom panel shows DNA from piAN with genomic human DNA inserts. For piVX DNA preparations, the plasmid DNA insert was eluted from a 3.5% polyacrylamide gel (Maxam and Gilbert, 1980) and nick-translated. For piAN preparations, the supercoiled plasmid DNA from 3-ml overnight cultures was nick-translated directly. Each radiolabeled DNA was hybridized to a nitrocellulose filter to which EcoRI-digested total human DNA had been transferred (Southern, 1975). The numbers above each lane denote the REP number for the clone that was radiolabeled. In each experiment, hybridization was performed as described by Neve et al. (1983), with 500000 Cerenkov cpm/ml of hybridization buffer. Filters were washed in 0.1 × SSC + 0.1% SDS at 50°C.

(e) Effects of mismatching on REP number
The Alu family consists of over 10⁵ related but nonidentical 300-bp sequences scattered about the human genome. They are so widespread that 95% of phage with 15- to 20-kb human DNA inserts will contain at least one member of this family (Gusella et al., 1980; Tashima et al., 1981). We obtained the REP numbers of individual bacteriophage from a human recombinant phage library (Lawn et al., 1978) plated on piVX containing members of the Alu family. The Alu family members subcloned in piVX included piBLUR6 and piBLUR11 (Neve et al., 1983). They also included piSVA8, an Alu family member subcloned from HyG5, a recombinant bacteriophage originally isolated in a screen for genomic human γ-globin sequences. The recombinant bacteriophage tested were HyG5 and 20 plaque-purified bacteriophage picked at random from the human library of Lawn et al. (1978). In each experiment summarized in Table II, 10⁶ pfu of the individual phage were plated on bacteria harboring piBLUR6, piBLUR11 or piSVA8. The REP number was then determined as outlined in MATERIALS AND METHODS, section c. Recombination between the Alu family member in piSVA8 and HyG5, the bacteriophage from which it was derived, yielded a REP number of 5 × 10⁶. This value is consistent with perfect homology between the DNA sequence shared by the plasmid and phage. In contrast, recombination between HyG5 and the related, but nonidentical, Alu family members in piBLUR6 and piBLUR11 was depressed by three orders of magnitude (REP number = 5 × 10³) (Table II). Among the 20 random phage, only one showed no homology with the piBLUR clones, as evidenced by a REP number of 0. The REP numbers for the 19 other phages varied over several orders of magnitude. None of the phages appears to contain perfect matches to any of the Alu family members subcloned in piVX. All of their REP numbers were at least two orders of magnitude below the REP number obtained for piSVA8 with HyG5. We have shown in the preceding paper (Neve et al., 1983) that mismatching between rodent and human Alu family members is sufficiently great to inhibit recombination between piBLUR miniplasmids and bacteriophage carrying rodent DNA inserts. The effects of mismatching on recombination can therefore be exploited to retrieve bacteriophage carrying human DNA inserts from recombinant bacteriophage libraries constructed using the genomes of human-rodent somatic cell hybrids.

TABLE II
REP numbers of individual bacteriophages with three different Alu family members inserted in piVX

Bacteriophage         piBLUR6     piBLUR11    piSVA8
HyG5                  5 × 10³     5 × 10³     5 × 10⁶
Genomic library       5 × 10³     2 × 10⁴     3 × 10³
Individual phage
  1                   1 × 10⁴     2 × 10⁴     2 × 10⁴
  2                   2 × 10⁴     2 × 10⁴     2 × 10⁴
  3                   3 × 10s     5 × 10s     1 × 10⁴
  4                   1 × 10⁴     2 × 10s     3 × 10³
  5                   1 × 10⁴     2 × 10³     3 × 10³
  6                   2 × 10⁴     1 × 10⁴     4 × 10⁴
  7                   2 × 10⁴     5 × 10³     8 × 10³
  8                   1 × 10⁴     2 × 10⁴     4 × 10⁴
  9                   0           0           0
 10                   1 × 10⁴     2 × 10⁴     8 × 10s
 11                   1 × 10⁴     5 × 10³     2 × 10³
 12                   2 × 10⁴     5 × 10⁴     6 × 10⁴
 13                   1 × 10³     2 × 10³     3 × 10³
 14                   1 × 10⁴     1 × 10⁴     1 × 10⁴
 15                   2 × 10⁴     5 × 10’     2 × 10⁴
 16                   5 × 10s     5 × 10³     1 × 10⁴
 17                   5 × 10²     4 × 10²     4 × 10²
 18                   2 × 10³     1 × 10⁴     2 × 10³
 19                   3 × 10³     2 × 10⁴     1 × 10⁴
 20                   4 × 10³     1 × 10³     5 × 10²

HyG5, or each of 20 randomly picked phage from a human genomic library (Lawn et al., 1978), was propagated on bacteria containing piBLUR6, piBLUR11 or piSVA8. piBLUR6 and piBLUR11 were described previously (Neve et al., 1983). piSVA8, an Alu family member from HyG5, was obtained from H.R. Treitsman. The table presents the REP numbers obtained for each of these phages. It also gives the REP numbers obtained with pooled phage encompassing the genomic library.
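The "three orders of magnitude" quoted for HyG5 is the base-10 logarithm of a ratio of two REP numbers. A minimal Python sketch follows; it uses only the HyG5 values from Table II, and the helper name is hypothetical.

```python
import math


def orders_of_magnitude_depressed(rep_matched: float, rep_mismatched: float) -> float:
    """log10 of how far mismatching lowers a REP number relative to a perfect match."""
    if rep_matched <= 0 or rep_mismatched <= 0:
        # A REP number of 0 (e.g. phage 9 in Table II) has no defined logarithm.
        raise ValueError("both REP numbers must be positive")
    return math.log10(rep_matched / rep_mismatched)


# HyG5 in Table II: piSVA8 (derived from HyG5, perfect match) vs. the piBLUR clones.
REP_HYG5_PISVA8 = 5e6
REP_HYG5_PIBLUR = 5e3

print(round(orders_of_magnitude_depressed(REP_HYG5_PISVA8, REP_HYG5_PIBLUR), 3))  # 3.0
```

The function rejects a REP number of 0 rather than returning infinity. A phage with no detectable recombination, such as phage 9, therefore has to be treated separately instead of being averaged in.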
DISCUSSION
(a) Relationship between REP numbers and sequence repetitiveness
Determining a REP number for a human DNA sequence cloned in piVX or piAN gives a useful and rapid approximation of that sequence's repetitiveness in the human genome. In general, the REP number can be used to rapidly categorize a sequence as few copy, low-order repeated or more highly repeated.

One cause of error in the recombination-based assay is the depression of REP numbers due to sequence mismatching. Table II illustrates this: mismatching among Alu family members depressed recombination with a given bacteriophage, HyG5, by three orders of magnitude. The effects of mismatching are greatest among repeated sequence families (Sutton and McCallum, 1971), so they should be greatest on sequences with high REP numbers. This is consistent with the finding that BLUR sequences cloned in piVX give REP numbers of only 10² to 10⁴ against a human genomic library (Table II), even though there are over 10⁵ Alu family members in the human genome. Thus, determining sequence homology with the recombination-based assay is analogous to determining it by nucleic acid hybridization at high stringency. One useful consequence of this stringency concerns the homopolymeric dG:dC tails used to clone fragments in piVX. These tails do not cause adventitious retrieval of phage containing dG:dC-rich stretches, which would interfere with the recombination-based assay. By contrast, such tails can yield nonspecific hybridization when hybridization assays are performed at low stringency (J. Posakony, personal communication). The REP number assay facilitates the isolation of few copy DNA sequences suitable for DNA mapping and polymorphism studies. In particular, rapid separation of these sequences from low-order repetitive DNA sequences is a major advantage of the recombination-based assay over conventional hybridization analyses. At the other end of the repetitive sequence spectrum, REP values of 10³ or more generally indicate the presence of highly repetitive DNA sequences, such as those found in the Alu family (Table II and Fig. 2).

There was evidence of correlation between sequence repetitiveness and REP number as the REP number varied from 10³ to 10’, but the correlation was imperfect. Several factors may explain this inexactness. One is that the recombination-based and hybridization-based assays differ in their sensitivity to mismatch among and within families of repeated DNA sequences. Errors in recombination-based analyses may also result from nonrandom representation of sequences in large genomic libraries. Such nonrandomness can arise from cloning artifacts, from constraints of the cloning methods used to construct a library, or from unequal growth of individual phage during amplification of a library. Using at least 10⁶ phage in the initial plating of these analyses minimizes these effects. Nevertheless, both biological and stochastic factors cause variation during the recombination-based analysis. Stochastic variation should matter less for repeated sequences, because it is dampened over the larger numbers of phage that represent them. For few copy sequences, the REP value was nil for a number of cases, even though Southern blotting confirmed a human insert in the piVX vector (Fig. 3). Stochastic effects, or the inability to clone a specific locus with a given cloning strategy, may explain these nil values. Either these sequences were never present in the original library (Lawn et al., 1978), or they were diluted during multiple amplifications of that library. In either case they were not detected in the analysis as performed (MATERIALS AND METHODS, section c). Alternatively, the REP number of a few copy sequence might increase because of preferential cloning or amplification of that sequence. In practice, these effects were not large enough to interfere with using REP numbers to distinguish few copy from low-order repeated sequences. To date, each of 25 sequences predicted by the recombination-based assay to be few copy hybridized to human DNA on Southern blots in a manner consistent with its REP number.

(b) Distribution of repeated sequences in the genome and in transcribed sequences
Early studies of the complexity of transcribed sequences, using nucleic acid hybridization of RNA
to genomic likely (Klein
DNA,
indicated
to be represented
that most genes were
in “single”
et al., 1974). The design
copy
of these studies
prevented discrimination between “single’‘-copy, few-copy and low-order repeated DNA sequences. Subsequent studies of individual genes using cloned genie probes have indicated that the genomic organization
of
transcribed
complicated:
many
genes
sequences belong
families. These families include evolved
by duplication
is more
to larger
gene
related genes which
events (e.g., the ribosomal
genes and globin genes), some of which have resulted in the creation of pseudogenes (Jacq et al., 1977), processed genes (Nishioka et al., 1980), and mobile elements (Potter et al., 1979; Cameron et al., 1979). The Southern blotting patterns of cDNA clones often contain multiple bands of hybridization which may localize to more than one chromosome, even though the primary gene product maps to a single locus (Daiger et al., 1982). The data in Table I demonstrate*that this complexity of cDNA organization is observed for large numbers of randomly picked cDNA clones: 37% of both liver cytoplasmic cDNA and total lung cDNA clones had REP numbers between IO’ and 103. This frequency of low-order and moderately repeated sequences mirrored that in the total genome, where 3 1% of comparably sized DNA fragments (500 to 1000 bp) had REP numbers in this range. In contrast, the REP values in Table I showed that the cytoplasmic liver cDNA clones lacked highly repetitive sequences which were abundantly
represented
(34%) in genomic
(c) Applications of the recombination-based
assay
The recombination-based assay provides method for determining the repeat sequence ture of large numbers of clones derived from or genomic DNA. In addition to its use
a rapid struccDNA as an
DNA
DNA.
These findings are consistent with previous data (Flamm et al., 1969; Klein et al., 1974; Ryffel and McCarthy, 1974; Crampton et al., 198 1; Jelinek and Schmid, 19X2), and with the hybridizations depicted in Figs. 1 and 2, which demonstrate that Alu family sequences represented in high-molecular-weight nuclear RNA are removed during the processing of cytoplasmic RNA from nuclear precursors. Thus, Hep G-2 hepatoma cytoplasmic poly(A)+ RNA contains many different low-order and moderately repeated sequences (Fig. 3), but lacks highly repeated sequences repeated lo5 or more times in the genome, such as the Alu family or tandemly repeated simple-sequence DNAs (Flamm et al., 1969).
analytical
tool,
the system
allows
for the rapid
sorting of sequences cloned in piVX or piAN into those which are likely to be few copy, low order repeated purified DNA
or more highly repeated. The plaquebacteriophage carrying a large human insert
corresponding
to the insert
miniplasmid is isolated concomitantly. lar advantage of this system is that
in the
A particuit provides
rapid separation between few copy and low-order repeated sequences. Few copy sequences isolated in this manner from both genomic and transcribed DNAs can be used to perform chromosome mapping and locus expansion studies, and to obtain restriction fragment length polymorphisms (Botstein et al., 1980) linked to distinct areas of the genome. The separation of low-order repeated cDNA sequences which is made possible by the recombination-based assay should facilitate analysis of these sequences, which are likely to include gene families and/or transposable elements.
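The sorting step described above can be sketched in code. What follows is a minimal, hypothetical Python illustration, not the authors' procedure. It starts from REP numbers that have already been measured and bins them using assumed cutoffs. The defaults of 10 and 1000 were chosen only because Table I reports REP numbers between 10^1 and 10^3 for low-order and moderately repeated clones. The sketch also adds a Poisson model of library sampling, which is an assumption introduced here. The model illustrates why a nil REP value is more likely for few copy sequences and why stochastic scatter is damped for repeated ones. The function names `classify`, `class_fractions`, `prob_no_recombinant` and `relative_spread` are illustrative only.

```python
"""Sort cloned inserts into repeat classes from their REP numbers.

Hypothetical sketch: the cutoffs and the Poisson sampling model are
illustrative assumptions, not the calibration used in this study.
"""
import math
from collections import Counter
from collections.abc import Iterable

# Hypothetical REP cutoffs. They reuse the 10^1 to 10^3 range that the
# text reports for low-order and moderately repeated clones.
FEW_COPY_MAX = 10.0
LOW_ORDER_MAX = 1000.0


def classify(rep: float, few_max: float = FEW_COPY_MAX,
             low_max: float = LOW_ORDER_MAX) -> str:
    """Assign one clone to a repeat class from its REP number."""
    if rep < 0:
        raise ValueError("a REP number cannot be negative")
    if rep < few_max:
        # Includes nil REP values, which may also reflect loci absent from
        # (or diluted out of) the phage library rather than few copy status.
        return "few copy"
    if rep <= low_max:
        return "low-order/moderately repeated"
    return "highly repeated"


def class_fractions(reps: Iterable[float]) -> dict[str, float]:
    """Fraction of clones falling in each class (a Table I style summary)."""
    counts = Counter(classify(r) for r in reps)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def prob_no_recombinant(expected: float) -> float:
    """Poisson probability that a sequence expected to be carried by
    `expected` phage in the plated sample is carried by none of them."""
    if expected < 0:
        raise ValueError("expected count cannot be negative")
    return math.exp(-expected)


def relative_spread(expected: float) -> float:
    """Poisson coefficient of variation, 1/sqrt(expected): stochastic
    scatter shrinks as the number of phage carrying the sequence grows."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return 1.0 / math.sqrt(expected)


if __name__ == "__main__":
    # Made-up REP values, for illustration only.
    reps = [0, 3, 25, 400, 5000, 20000]
    print(class_fractions(reps))
    for lam in (1.0, 10.0, 100.0):
        print(lam, prob_no_recombinant(lam), relative_spread(lam))
```

Under this assumed model, a sequence carried on average by one plated phage is missed entirely about 37% of the time (e^-1). At a mean of 100 phage, the relative scatter falls to about 10%. This agrees in spirit with the argument about stochastic variation above, but the figures are not measurements from this study.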
ACKNOWLEDGEMENTS
Support was from NIH grants K08 HD 0045301, HD 04807 and HD 06276, from March of Dimes grant 1-825, and from the trustees of Children’s Hospital Medical Center. We thank Brian Seed for discussions and for provision of the recombination miniplasmids piVX and piAN7, T. Maniatis for provision of the human recombinant phage library, D. Woods and E. Prochownik for provision of dC-tailed adult liver cytoplasmic cDNA, G. Goldberger and B. Jackson for provision of Hep G-2 RNA blots, and H.R. Treitsman for provision of PiSVA8 and HyG5.
REFERENCES

Benton, W.D. and Davis, R.W.: Screening lambda-gt recombinant clones by hybridization to single plaques in situ. Science 196 (1977) 180-182.
Botstein, D., White, R.L., Skolnick, M. and Davis, R.W.: Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32 (1980) 314-331.
Brandsma, J. and Miller, G.: Nucleic acid spot hybridization: rapid quantitative screening of lymphoid cell lines for Epstein-Barr viral DNA. Proc. Natl. Acad. Sci. USA 77 (1980) 6851-6855.
Britten, R.J. and Kohne, D.E.: Repeated sequences in DNA. Science 161 (1968) 529-540.
Cameron, J.R., Loh, E.Y. and Davis, R.W.: Evidence for transposition of dispersed repetitive DNA families in yeast. Cell 16 (1979) 739-751.
Casadaban, M.J. and Cohen, S.N.: Analysis of gene control signals by DNA fusion and cloning in Escherichia coli. J. Mol. Biol. 138 (1980) 179-207.
Cox, R.A.: The use of guanidinium chloride in the isolation of nucleic acids, in Grossman, L. and Moldave, K. (Eds.), Methods in Enzymology, Vol. 12. Academic Press, New York, 1968, pp. 120-129.
Crampton, J.M., Davies, K.E. and Knapp, T.F.: The occurrence of families of repeated sequences in a library of cloned DNA from human lymphocytes. Nucl. Acids Res. 9 (1981) 3821-3834.
Daiger, S.P., Wildin, R.S. and Su, T.-S.: Sequences on the human Y chromosome homologous to the autosomal gene for argininosuccinate synthetase. Nature 298 (1982) 682-684.
Deininger, P.L., Jolly, D.J., Rubin, C.M., Friedmann, T. and Schmid, C.W.: Base sequence studies of 300 nucleotide renatured repeated human DNA clones. J. Mol. Biol. 151 (1981) 17-33.
Flamm, W.G., Walker, P.M.B. and McCallum, M.: Some properties of the single strands isolated from the DNA of the nuclear satellite of the mouse (Mus musculus). J. Mol. Biol. 40 (1969) 423-443.
Gusella, J.F., Keys, C., Varsanyi-Breiner, A., Kao, F.-T., Jones, C., Puck, T.T. and Housman, D.: Isolation and localization of DNA segments from specific human chromosomes. Proc. Natl. Acad. Sci. USA 77 (1980) 2829-2833.
Jacq, C., Miller, J.R. and Brownlee, G.G.: A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12 (1977) 109-120.
Jelinek, W.R. and Schmid, C.W.: Repetitive sequences in eukaryotic DNA and their expression. Annu. Rev. Biochem. 51 (1982) 813-844.
Klein, W.H., Murphy, W., Attardi, G., Britten, R.J. and Davidson, E.H.: Distribution of repetitive and nonrepetitive sequence transcripts in HeLa mRNA. Proc. Natl. Acad. Sci. USA 71 (1974) 1785-1789.
Kurnit, D.M., Wentworth, B.M., DeLong, L. and Villa-Komaroff, L.: Construction of cloned human cDNA libraries from RNA of human fetal tissues. Cytogenet. Cell Genet. 34 (1982) 193-203.
Lawn, R.M., Fritsch, E.F., Parker, R.C., Blake, G. and Maniatis, T.: The isolation and characterization of linked delta- and beta-globin genes from a cloned library of human DNA. Cell 15 (1978) 1157-1174.
Marmur, J., Rownd, R. and Schildkraut, C.L.: Denaturation and renaturation of deoxyribonucleic acid, in Davidson, J.N. and Cohn, W.E. (Eds.), Progress in Nucleic Acid Research, Vol. 1. Academic Press, New York, 1961, pp. 231-300.
Maxam, A.M. and Gilbert, W.: Sequencing end-labeled DNA with base-specific chemical cleavages, in Grossman, L. and Moldave, K. (Eds.), Methods in Enzymology, Vol. 65. Academic Press, New York, 1980, pp. 499-560.
Michelson, A.M. and Orkin, S.H.: Characterization of the homopolymer tailing reaction catalyzed by terminal deoxynucleotidyl transferase. J. Biol. Chem. 257 (1982) 14773-14782.
Mindich, L., Cohen, J. and Weisburd, M.: Isolation of nonsense suppressor mutants in Pseudomonas. J. Bacteriol. 126 (1978) 177-182.
Neve, R.L., Bruns, G.A.P., Dryja, T.P. and Kurnit, D.M.: Retrieval of human DNA from rodent-human genomic libraries by a recombination process. Gene 23 (1983) 343-354.
Nishioka, Y., Leder, A. and Leder, P.: Unusual alpha-globin-like gene that has cleanly lost both globin intervening sequences. Proc. Natl. Acad. Sci. USA 77 (1980) 2806-2809.
Potter, S.S., Brorein Jr., W.J., Dunsmuir, P. and Rubin, G.M.: Transposition of elements of the 412, copia and 297 dispersed repeated gene families in Drosophila. Cell 17 (1979) 415-427.
Ryffel, G.U. and McCarthy, B.J.: Polyadenylated RNA complementary to repeated DNA in mouse L-cells. Biochemistry 14 (1975) 1385-1389.
Schmid, C.W. and Deininger, P.L.: Sequence organization of the human genome. Cell 6 (1975) 345-358.
Seed, B.: Purification of genomic sequences from bacteriophage libraries by recombination and selection in vivo. Nucl. Acids Res. 11 (1983) 2427-2445.
Shen, C.-K.J. and Maniatis, T.: The organization of repetitive sequences in a cluster of rabbit beta-like globin genes. Cell 19 (1980) 379-391.
Skinner, D.M.: Satellite DNA's. Bioscience 27 (1977) 790-796.
Southern, E.M.: Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98 (1975) 503-517.
Sutcliffe, J.G., Milner, R.J., Bloom, F.E. and Lerner, R.A.: Common 82-nucleotide sequence unique to brain RNA. Proc. Natl. Acad. Sci. USA 79 (1982) 4942-4946.
Sutton, W.D. and McCallum, M.: Strand mismatching and reassociation of satellite DNA. Nature New Biol. 232 (1971) 83-85.
Tashima, M., Calabretta, B., Torelli, G., Scofield, M., Maizel, A. and Saunders, G.F.: Presence of a highly repetitive and widely dispersed DNA sequence in the human genome. Proc. Natl. Acad. Sci. USA 78 (1981) 1508-1512.
Varley, J.M., Macgregor, H.C. and Erba, H.P.: Satellite DNA is transcribed on lampbrush chromosomes. Nature 283 (1980) 686-688.
Waring, M. and Britten, R.J.: Nucleotide sequence repetition: a rapidly reassociating fraction of mouse DNA. Science 154 (1966) 791-794.
Wetmur, J.G. and Davidson, N.: Kinetics of renaturation of DNA. J. Mol. Biol. 31 (1968) 349-370.
Woods, D.E., Markham, A.F., Ricker, A.T., Goldberger, G. and Colten, H.R.: Isolation of cDNA clones for the human complement protein factor B, a class III major histocompatibility complex gene product. Proc. Natl. Acad. Sci. USA 79 (1982) 5661-5665.
Young, B.D., Hell, A. and Birnie, G.D.: A new estimate of human ribosomal gene number. Biochim. Biophys. Acta 454 (1976) 539-548.

Communicated by N. Fedoroff.