Comparison of sequence repetitiveness of human cDNA and genomic DNA using the miniplasmid vector, piVX


Gene, 23 (1983) 355-367 Elsevier

GEN 00809

Comparison of sequence repetitiveness of human cDNA and genomic DNA using the miniplasmid vector, piVX

(Recombinant DNA; insert-mediated genetic recombination; repeated DNA sequences; human gene mapping)

Rachael L. Neve(a) and David M. Kurnit(a,b)

(a) Division of Clinical Genetics, Children's Hospital Medical Center, Boston, MA 02115, Tel. (617) 135-7240, and (b) Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115 (U.S.A.) Tel. (617) 732-2296

(Received February 2nd, 1983)
(Revision received April 20th, 1983)
(Accepted April 29th, 1983)

SUMMARY

We studied the sequence repetitiveness of human cDNA and genomic DNA fragments inserted in the miniplasmid piVX. Sequence repetitiveness was assayed by the frequency with which a given insert mediated recombination between the chimeric miniplasmid and a recombinant bacteriophage library constructed from large random human genomic fragments. The methodology allows rapid analysis and isolation of sequences of a given copy number in the genome: few (1 to 10 copies), low order-repeated (10 to 100 copies) and more highly repeated (over 100 copies). In a model application of the method, the distribution of these classes of sequences was compared in cDNA and genomic DNA libraries constructed in piVX. The major difference observed between cDNA and genomic DNA repeat structure was the paucity of highly repeated elements in cDNA copies from high-molecular-weight cytoplasmic poly(A)+ RNA.

Abbreviations: bp, base pairs; EtBr, ethidium bromide; kb, kilobase pairs; pfu, plaque-forming units; poly(A)+, polyadenylated; REP number, normalized ratio of pfu on Su⁻ to pfu on Su⁺ hosts (see MATERIALS AND METHODS, section c); SDS, sodium dodecyl sulfate; SSC, 0.15 M NaCl, 0.015 M Na3citrate, pH 7.5; [ ], indicates plasmid-carrier state.

0378-1119/83/$03.00 © 1983 Elsevier Science Publishers B.V.

Although successful nucleic acid hybridization was first demonstrated with an in vivo biological assay (Marmur and Lane, 1960), sequence homology assays determined by hybridization (Rownd et al., 1963) have been performed almost exclusively using physicochemical criteria in vitro. Recently, Seed (1983) developed a methodology that utilizes recombination in vivo to detect nucleotide sequence homology between a DNA fragment cloned in a specialized miniplasmid and a mammalian genomic sequence represented in a recombinant bacteriophage library. The availability of this convenient assay to study nucleotide sequence homology in a biologically relevant manner should facilitate study of aspects of genome evolution that depend on recombination-based processes, including gene duplication, gene conversion, sister chromatid exchange and chromosome translocation. In this report, we detail the usefulness of this recombination-based assay for determining the degree of sequence repetitiveness of a cloned DNA sequence in the genome from which it was derived.

Nucleic acid hybridization studies of bulk genomic DNA from a variety of eukaryotes have permitted the ascertainment of different families of related sequences which may be tandemly clustered or widely scattered, and whose reiteration frequency within the genome can vary from 10¹ to 10⁶ copies (Waring and Britten, 1966; Britten and Kohne, 1968; Wetmur and Davidson, 1968; Skinner, 1977; Schmid and Deininger, 1975; Deininger et al., 1981). Recombinant DNA technology has enabled the study of repeated sequences in eukaryotes to advance from the description of the existence of repeated families to the isolation and analysis of members of these families. Rapid determination of whether an individual cloned sequence has a copy number of at least 10² can be accomplished by hybridizing cloned DNA fixed to filters with radiolabeled total genomic DNA (Gusella et al., 1980; Shen and Maniatis, 1980; Crampton et al., 1981). However, analysis of lower order repeats requires radiolabeling of each individual clone followed by reassociation kinetics (Britten and Kohne, 1968; Wetmur and Davidson, 1968), Southern (1975) blot, or dot blot (Brandsma and Miller, 1980) analyses. We demonstrate that an adaptation of the recombination-based approach of Seed (1983) allows more rapid discrimination of few copy (repeat frequency less than 10¹) from low-order repeated (10¹ to 10²) sequences than is feasible with nucleic acid hybridization technology.

Transcribed sequences represent a specific subset of the genome and include some, but not all, repeated sequence families (Klein et al., 1974; Ryffel and McCarthy, 1975; Crampton et al., 1981). In general, highly repeated, tandemly clustered, simple sequence DNAs are not transcribed (Flamm et al., 1969) except in unusual circumstances (Varley et al., 1980). The interspersed highly repeated sequences of the Alu family are transcribed, either as part of intervening sequences of larger genes by RNA polymerase II, or as 300 bp transcripts by RNA polymerase III (for review, see Jelinek and Schmid, 1982). Although some transcribed sequences comprise 50 to 1000 copies in the genome (Ryffel and McCarthy, 1975; Crampton et al., 1981), their origin is obscure, with the exception of particular gene families such as ribosomal DNA (Young et al., 1976) or some repetitive mobile elements (Potter et al., 1979; Cameron et al., 1979). Transcribed repeated sequences of still lower copy number include gene families with or without pseudogenes (Jacq et al., 1977), processed genes (Nishioka et al., 1980) and organ-specific identification sequences (Sutcliffe et al., 1982). Previous analyses of the sequence complexity of transcribed sequences utilized nucleic acid hybridization (Klein et al., 1974; Ryffel and McCarthy, 1975; Crampton et al., 1981). We confirm and extend the results of these studies by using recombination-based assays to compare the repeat sequence structure of cloned cDNA and genomic DNA fragments. These studies confirm the paucity of highly repeated sequences in high-molecular-weight cytoplasmic poly(A)+ RNA and permit the rapid isolation of individual transcribed sequences of few copy, low-order repeated and more highly repeated copy number, respectively.

MATERIALS AND METHODS

(a) Recombination miniplasmids and bacterial strains

The miniplasmids piVX and piAN were obtained from B. Seed. piAN7, constructed by H. Huang, differs from piVX (Seed, 1983) in the content of the ColE1 replicon and the orientation of the polylinker, which results in a five-fold greater copy number for piAN in the host bacterial cell. These miniplasmids were propagated in MC1061[p3], a Su⁻ strain (Casadaban and Cohen, 1980) which carries plasmid p3 (described in Neve et al., 1983). To detect bacteriophage carrying piVX or piAN7, the lacZ amber mutant LG75 was used as detailed previously (Neve et al., 1983). The Su⁺ strain used in these studies was LE392, which contains a chromosomal supF gene. Details of plating, growth and elution of phage were described previously (Neve et al., 1983).

(b) Construction of genomic and cDNA libraries in recombination miniplasmids

piVX was linearized with PstI and piAN with BamHI. Linearized molecules were isolated following electrophoresis in a 3.5% acrylamide gel, eluted in 0.5 M ammonium acetate buffer (Maxam and Gilbert, 1980), diluted to 0.2 M ammonium acetate and chromatographed on RPCV (Schleicher and Schuell). The DNA was recovered in 0.5 ml and ethanol-precipitated. PstI-digested piVX was dG-tailed (Michelson and Orkin, 1982) with terminal transferase to an average tail length of 15-20 bases. Tailing conditions were determined by digesting PstI-linearized piVX with BamHI, 5'-end labeling at this site, and monitoring the shift in size of the small BamHI-PstI fragment on an acrylamide gel subsequent to a model tailing reaction. Double-stranded dC-tailed cDNA from the cytoplasmic poly(A)+ RNA of adult human liver was a gift of E. Prochownik and D. Woods (Woods et al., 1982). This cDNA was sized to exclude fragments shorter than 500 bp. Total cellular lung fibroblast poly(A)+ RNA was isolated from a 20-week fetus with trisomy 21 (47,XX,+21) by guanidine hydrochloride extraction (Cox, 1968). Fetal material was obtained under a protocol approved by the Institutional Review Board at Brigham and Women's Hospital (Kurnit et al., 1982). Double-stranded cDNA was prepared, sized to exclude fragments shorter than 500 bp and dC-tailed as described (Michelson and Orkin, 1982; Woods et al., 1982; Kurnit et al., 1982). Transformation of MC1061[p3] with dC-cDNA annealed to dG-piVX yielded 60 recombinant colonies/ng of tailed cDNA on plates containing 50 µg/ml kanamycin, 10 µg/ml tetracycline and 12.5 µg/ml ampicillin. Only 3% of colonies represented nonrecombinants without cDNA inserts. BamHI-digested piAN was ligated using T4 DNA ligase to a complete MboI digest of human female DNA (gift of L. Kunkel). The ligated mixture was digested with BamHI to minimize nonrecombinants and used to transform MC1061[p3], yielding 20 colonies/ng of MboI-digested DNA; 15% of colonies were nonrecombinant. The average insert size of both cDNA and genomic DNA libraries ranged from 500 to 1000 bp, as determined by monitoring the size of endonuclease-digested plasmids on an 0.8% agarose gel.

(c) Analysis of recombination frequencies: the REP number

We assayed the sequence repetitiveness in the human genome of individual human DNA fragments by determining the frequency with which a fragment occurred in a recombinant bacteriophage library carrying human genomic DNA inserts (Lawn et al., 1978). When amber mutant bacteriophage are grown in cells harboring piVX or piAN with a human DNA insert, phage which share human DNA sequence homology with that insert can incorporate a copy of piVX or piAN by recombination between the homologous human DNA sequences. Such phage thus acquire the supF gene of piVX or piAN (Seed, 1983) and can grow on Su⁻ hosts. 10⁶ pfu from the human recombinant bacteriophage library (Lawn et al., 1978) were plated on bacteria carrying piVX or piAN with a human DNA insert to be analyzed. Following confluent lysis and elution of the phage, the resulting phage were plated on both Su⁺ (LE392) and Su⁻ (LG75) hosts. The ratio of pfu on LG75 vs. LE392 should reflect the frequency of occurrence of sequences in the bacteriophage library which are homologous to the human DNA cloned in piVX or piAN7. This ratio is termed the REP number. For recombinants in piVX, REP was expressed as the number of pfu on LG75 which resulted after plating an aliquot of phage stock equivalent to that which yielded 5 × 10⁸ pfu on LE392. For recombinants in piAN7, REP was expressed as the number of pfu on LG75 from 10⁸ pfu on LE392. This five-fold difference reflects the increased copy number of piAN vs. piVX, which makes piAN roughly five-fold more efficient than piVX for the same human DNA insert in this recombinational assay (unpublished data of D.M.K.). The REP number was zero in multiple control experiments using piVX or piAN alone without a human insert.

The REP number scale was chosen so that a REP number of one represented "single" copy repetitiveness. This scale was chosen empirically, but is in agreement with the following theoretical considerations: when a bacteriophage carrying DNA homologous to a given insert is propagated on bacteria harboring piVX containing that insert, one out of 10² to 10³ of the resulting phage incorporate piVX by homologous recombination and can therefore grow on LG75 (Seed, 1983; Table II of this paper and unpublished data of D.M.K.). Somewhat under 1 out of 10⁵ phage in a human genomic library would be expected to carry a given single copy fragment of human DNA (Lawn et al., 1978). Thus, it is anticipated that the frequency of recombination of a "single" copy sequence in piVX with phage from a human genomic library would be on the order of 1 out of 10⁸ phage. Reconstruction experiments, in which a bacteriophage carrying a DNA insert homologous to that subcloned in piVX is diluted with large numbers of nonhomologous phage, are in agreement with the above arguments. In these experiments, the proportion of phage which incorporate piVX by recombination depends directly on the proportion of phage that contain the homologous DNA insert (unpublished data of B. Seed and D.M.K.).

RESULTS

(a) Distribution of repeated sequences in genomic DNA

The distribution of repeated sequences among 100 individual clones containing human genomic DNA inserted into piAN was analyzed. The average human DNA insert size was 500 to 1000 bp. In 100 separate experiments, phage from a recombinant bacteriophage library containing 15- to 20-kb DNA inserts (Lawn et al., 1978) were plated on bacteria carrying piAN with a human genomic fragment whose copy number in the genome was to be assayed. Following elution of the propagated phage, the proportion of phage that were viable on Su⁻ hosts was expressed as a REP number (MATERIALS AND METHODS, section c), i.e. the number of viable phage on a Su⁻ host per 10⁸ pfu on a Su⁺ host. The data are summarized in Table I, demonstrating the existence of fragments whose REP numbers ranged over four orders of magnitude.

(b) Distribution of repeated sequences in transcribed DNA sequences (cDNA clones)

The distribution of repeated sequences among 90 individual clones containing human cytoplasmic liver cDNA and 50 clones containing total cellular lung fibroblast cDNA inserted into piVX was determined. The average insert size was 500 to 1000 bp. The data are summarized in Table I, demonstrating the existence of fragments whose REP numbers varied over several orders of magnitude. It is apparent that while both genomic and cDNA libraries contain few copy, low order-repeated and moderately repeated sequences, there is a paucity of highly repeated sequences in the cDNA libraries constructed using high-molecular-weight poly(A)+ RNA as template. The lung cDNA library, which was constructed using total cellular RNA, appears to contain more highly repeated sequences (REP number of 10³ or greater) than the liver cDNA library constructed using cytoplasmic RNA. This finding is in accord with other studies indicating that highly repetitive sequences are removed during processing of nuclear RNA into cytoplasmic RNA (for review, see Jelinek and Schmid, 1982). We confirmed this assertion by hybridizing a pool of four highly repeated Alu family members (Deininger et al., 1981) to filter-bound nuclear and cytoplasmic RNA from a human hepatoma cell line, Hep G-2. As depicted in Fig. 1, hybridization is greatest to high-molecular-weight nuclear RNA and virtually absent in the lane containing cytoplasmic RNA. These results contrast with our findings using HeLa RNA, where cytoplasmic RNA containing highly repetitive sequences was observed (Kurnit et al., 1982).

TABLE I

REP numbers for cDNA and genomic DNA clones

Library        REP number    Group   Percentage
Liver cDNA     0             I       18
               10⁰-10¹       I       44
               10¹-10²       II      28
               10²-10³       II       9
               over 10³      III      1
Lung cDNA      0             I       28
               10⁰-10¹       I       29
               10¹-10²       II      35
               10²-10³       II       2
               over 10³      III      6
Genomic DNA    0             I       15
               10⁰-10¹       I       20
               10¹-10²       II      19
               10²-10³       II      12
               over 10³      III     34

The REP numbers obtained for the clones in each library, after normalizing as described (see MATERIALS AND METHODS, section c), are displayed. Clones showing a REP number of 0 in Table I correspond to clones that contain inserts that are not represented in the library used (Lawn et al., 1978) or have been diluted during amplifications of the library so that they are not discernible in this analysis.
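The arithmetic behind the REP scale (MATERIALS AND METHODS, section c) and the Table I groupings can be checked with a short calculation. The sketch below is illustrative only and rests on several assumptions. The helper names are hypothetical. Plaque counts are treated as exact. The group cut-offs (I, 10 and under; II, over 10 up to 1000; III, over 1000) are taken from the lane groupings of Fig. 2. Recombination and carriage of the insert are treated as independent. A Poisson model, which is not used in the paper, estimates the chance that none of the 10⁶ plated phage carries the insert. The carrier frequency of 1 in 10⁵ is an upper bound, because the text says "somewhat under".

```python
# Sketch of the REP arithmetic in MATERIALS AND METHODS, section c.
# Hypothetical helper names; plaque counts are treated as exact numbers.
import math

# Su+ (LE392) titer that defines the REP scale for each vector.
REFERENCE_PFU = {"piVX": 5e8, "piAN7": 1e8}  # piAN7 is ~5x more efficient


def rep_number(pfu_lg75: float, pfu_le392: float, vector: str) -> float:
    """Plaques on the Su- host (LG75) scaled to the reference Su+ titer."""
    if pfu_le392 <= 0:
        raise ValueError("LE392 titer must be positive")
    return pfu_lg75 * REFERENCE_PFU[vector] / pfu_le392


def rep_group(rep: float) -> str:
    """Assumed cut-offs from the Fig. 2 lanes: I <= 10 < II <= 1000 < III."""
    if rep <= 10:
        return "I"
    if rep <= 1000:
        return "II"
    return "III"


def expected_rep(recombination_freq: float, carrier_freq: float,
                 vector: str = "piVX") -> float:
    """Expected REP if recombination and carriage are independent (assumed)."""
    return recombination_freq * carrier_freq * REFERENCE_PFU[vector]


def prob_no_carrier(phage_plated: float, carrier_freq: float) -> float:
    """Poisson chance that no plated phage carries the insert (assumed model)."""
    return math.exp(-phage_plated * carrier_freq)


# Table I percentages per REP bin: 0, 10^0-10^1, 10^1-10^2, 10^2-10^3, >10^3.
TABLE_I = {
    "Liver cDNA": [18, 44, 28, 9, 1],
    "Lung cDNA": [28, 29, 35, 2, 6],
    "Genomic DNA": [15, 20, 19, 12, 34],
}


def percent_in_bins(library: str, first: int, last: int) -> int:
    """Sum Table I percentages over bins first..last (inclusive)."""
    return sum(TABLE_I[library][first:last + 1])


if __name__ == "__main__":
    # Table II: HyG5 x piSVA8 gave REP 5 x 10^6, i.e. ~1 phage in 10^2 recombined.
    print(5e6 / REFERENCE_PFU["piVX"])
    # "Single" copy: ~1 in 10^3 recombine; at most ~1 in 10^5 phage carry it.
    print(expected_rep(1e-3, 1e-5))
    # With 10^6 phage plated, ~10 carriers are expected for a single copy.
    print(prob_no_carrier(1e6, 1e-5))
    for name in TABLE_I:
        print(name, percent_in_bins(name, 2, 3), percent_in_bins(name, 4, 4))
```

Under these assumptions, the expected value for a single copy insert is a few REP units, which falls in group I. The chance that no carrier is present among 10⁶ plated phage is about e⁻¹⁰, or 4.5 × 10⁻⁵. Summing the Table I bins gives 37%, 37% and 31% of clones between 10¹ and 10³, and 1%, 6% and 34% over 10³, for liver cDNA, lung cDNA and genomic DNA respectively.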

(c) Correlation between the REP number and actual sequence repetitiveness

To validate the utility of the recombination-based analysis by nucleic acid hybridization, we determined the copy number of individual clones which gave different REP numbers. Plasmid DNA mini-preps were prepared from 28 genomic clones and 45 liver cytoplasmic cDNA clones, as well as from a miniplasmid containing a highly repeated Alu family sequence, piBLUR (Neve et al., 1983). These were arranged in decreasing order of REP numbers, electrophoresed on agarose gels, transferred to nitrocellulose filters, and hybridized with radiolabeled total human DNA (Fig. 2). Positive hybridization is anticipated for sequences whose copy number is 10² or higher (Gusella et al., 1980; Shen and Maniatis, 1980; Crampton et al., 1981). Only those clones with REP numbers above 10² showed hybridization. Only two of the cDNA clones showed even faint hybridization, consistent with the finding that the liver cDNA library has few if any highly repeated sequences. Thus, only those sequences with the highest REP numbers were indeed highly or moderately repeated in the human genome by hybridization criteria.

Fig. 1. Representation of Alu family sequences in human liver RNA. Nuclear (lane N) and cytoplasmic (lane C) RNA was prepared from human hepatoma Hep G-2 cells, and 15 µg of each was electrophoresed on a formaldehyde-agarose gel. Following transfer to nitrocellulose paper, the RNA was hybridized with a pool of nick-translated plasmid DNA samples containing the Alu family members BLUR6, BLUR8, BLUR11 and BLUR19. RNA markers of 45S, 28S and 18S are denoted on the right margin.

The size of the human DNA inserts in each of the 28 human genomic clones was determined following linearization of the miniplasmid with EcoRI. DNA insert sizes ranged from a few hundred to one thousand bp in all groups of REP numbers (Table I). All four clones that had inserts larger than 1000 bp contained repetitive elements: the respective REP numbers were 1000, 500, 20 and 13.

We analyzed the correlation of the REP number with sequence repetitiveness at the lower end of the REP scale. Human DNA was digested with EcoRI, electrophoresed, transferred to nitrocellulose, and hybridized with a given radiolabeled plasmid whose REP number was known. Fig. 3 depicts the results, showing that all sequences whose REP number was 3 or fewer were in fact sequences which were present in only a few copies in the genome. To date, all of 25 sequences examined with REP numbers less than 10 have yielded blots with five or fewer bands. Further, the REP numbers of three known few-copy sequences cloned in piVX are all less than 10 (Seed, 1983; unpublished data of D.M.K.). As the REP number increased to between 10 and 25, more complex hybridization patterns were seen, either multiple bands or a single intense band, often with increased lane smear. In general, sequences with REP numbers over 40 hybridized as a smear to the overall EcoRI digest, consistent with a moderate to high degree of sequence complexity in the human genome.

Fig. 2. Comparison between REP numbers and presence of highly and moderately repeated sequences by nucleic acid hybridization. DNA was isolated from mini-preps of 1.5-ml overnight cultures; samples containing 50-100 ng of plasmid DNA using the vector piVX (the yield was five-fold greater using piAN7) were electrophoresed, nicked with short wave UV light in the presence of EtBr, transferred to nitrocellulose filters and hybridized with nick-translated human DNA under conditions described previously. Arrow denotes lane containing DNA from piBLUR 1 (Neve et al., 1983). Lanes denoted I contain DNA from plasmids giving REP numbers of 10 and under; lanes denoted II contain DNA from plasmids giving REP numbers between 10 and 1000; lanes denoted III contain DNA from plasmids giving REP numbers over 1000. The top two panels depict plasmids with human genomic DNA inserts; the bottom panel depicts plasmids with adult human liver cytoplasmic cDNA inserts.

Fig. 3. Comparison between REP numbers and actual sequence complexity. Plasmid DNA was prepared from 3 ml overnight cultures of piVX with adult liver cDNA inserts (top panel) and of piAN with human genomic DNA inserts (bottom panel). For piVX DNA preparations, the plasmid DNA was nick-translated directly. For piAN DNA preparations, the supercoiled plasmid DNA was electrophoresed on a 3.5% polyacrylamide gel, eluted (Maxam and Gilbert, 1980) and nick-translated. In each experiment, EcoRI-digested total human DNA had been transferred to a nitrocellulose filter (Southern, 1975), to which the radiolabeled plasmid was hybridized. Hybridization was performed as described by Neve et al. (1983), with 500 000 Cerenkov cpm/ml of hybridization buffer. Filters were washed in 0.1 × SSC + 0.1% SDS at 50°C. The numbers above each lane denote the REP number for the clone that was radiolabeled.

(d) Analysis of repeated sequence organization on bacteriophage containing transcribed human sequences

One virtue of the recombination-based analysis is that it results in the plaque purification of bacteriophage homologous to the human DNA inserted into piVX. We selected one random phage rescued by each of 46 liver cDNA miniplasmid clones and arrayed them in order of decreasing REP numbers. DNA from each plaque was transferred to nitrocellulose (Benton and Davis, 1977) and hybridized with radiolabeled total human DNA (Fig. 4A). Most, but not all, phage hybridized with the total human DNA probe, as is the case for random phage isolated from this library (Gusella et al., 1980; Tashima et al., 1981). One phage that was rescued by a cDNA clone with a REP number of 650 showed only weak hybridization in this experiment, indicating the absence of a highly repeated Alu family element on this phage. One hundred plaque-purified phage rescued by this cDNA clone were hybridized analogously with human DNA, and most but not all hybridized strongly (Fig. 4B). These results demonstrate that recombinant bacteriophage containing transcribed sequences do not differ grossly from random bacteriophage in their content of highly repeated sequences. Sequences homologous to a moderately repeated cDNA clone can be found on different phage with and without highly repetitive elements. There was no clear-cut correlation between the REP number of cDNA clones and the presence of Alu family members on corresponding phage.

(e) Effects of mismatching on REP number

The Alu family consists of over 10⁵ related but nonidentical 300 bp sequences, scattered about the human genome to the extent that 95% of phage with 15- to 20-kb human DNA inserts will contain at least one member of this family (Gusella et al., 1980; Tashima et al., 1981). We obtained the REP number of individual bacteriophage from a human recombinant phage library (Lawn et al., 1978) that we plated on bacteria harboring piVX containing members of the Alu family. The Alu family members subcloned in piVX included piBLUR and piBLUR 1 (Neve et al., 1983) and piSVA8, an Alu family member subcloned from a recombinant bacteriophage, HyG5 (which was originally isolated in a screen for genomic human γ-globin sequences). The recombinant bacteriophage included HyG5 and 20 plaque-purified bacteriophage picked at random from the human library of Lawn et al. (1978). In each experiment summarized in Table II, 10⁶ pfu of the individual phage were plated on bacteria harboring piBLUR6, piBLUR or piSVA8, respectively. The REP number was then determined as outlined in MATERIALS AND METHODS, section c. Recombination between the Alu family member in piSVA8 and the bacteriophage HyG5 from which it was derived yielded a REP number of 5 × 10⁶, consistent with perfect homology between the DNA sequence shared by the plasmid and phage. In contrast, recombination between HyG5 and the related, but nonidentical, Alu family members in piBLUR and piBLUR 1 was depressed by three orders of magnitude (REP number = 5 × 10³) (Table II). Among the 20 random phage, only one showed no homology with the piBLUR clones, as evidenced by a REP number of 0. The REP numbers for the 19 other phages varied over several orders of magnitude. None of the phages appears to contain perfect matches to any of the Alu family members subcloned in piVX, as all REP numbers obtained were at least two orders of magnitude below the REP number obtained for piSVA8 with HyG5. We have shown in the preceding paper (Neve et al., 1983) that mismatching between rodent and human Alu family members is sufficiently great to inhibit recombination between piBLUR miniplasmids and bacteriophage carrying rodent DNA inserts. In this manner, the effects of mismatching on recombination may be exploited to retrieve bacteriophage carrying human DNA inserts from recombinant bacteriophage libraries constructed using the genomes of human-rodent somatic cell hybrids.

TABLE II

REP numbers of individual bacteriophages obtained on bacteria containing Alu family members inserted in piVX

Bacteriophage HyG5 or each of 20 randomly picked phage from a human recombinant genomic library (Lawn et al., 1978) were propagated on bacteria containing piBLUR6, piBLUR 1 or piSVA8, respectively. piBLUR and piBLUR 1 were described previously (Neve et al., 1983); piSVA8, an Alu family member from HyG5, was obtained from H.R. Treitsman. The table presents the REP numbers obtained for each of these bacteriophages. The REP numbers obtained with pooled phage encompassing the genomic library are given as well.

                      REP numbers with three different plasmids
Bacteriophage         piBLUR     piBLUR 1   piSVA8
HyG5                  5×10³      5×10³      5×10⁶
Genomic library       5×10³      2×10⁴      3×10³
Individual phage
  1                   1×10⁴      2×10⁴      2×10⁴
  2                   2×10⁴      2×10⁴      2×10⁴
  3                   3×10³      5×10³      1×10⁴
  4                   1×10⁴      2×10³      3×10³
  5                   1×10⁴      2×10³      3×10³
  6                   2×10⁴      1×10⁴      4×10⁴
  7                   2×10⁴      5×10³      8×10³
  8                   1×10⁴      2×10⁴      4×10⁴
  9                   0          0          0
 10                   1×10⁴      2×10⁴      8×10³
 11                   1×10⁴      5×10³      2×10³
 12                   2×10⁴      5×10⁴      6×10⁴
 13                   1×10³      2×10³      3×10³
 14                   1×10⁴      1×10⁴      1×10⁴
 15                   2×10⁴      5×10’      2×10⁴
 16                   5×10³      5×10³      1×10⁴
 17                   5×10²      4×10²      4×10²
 18                   2×10³      1×10⁴      2×10³
 19                   3×10³      2×10⁴      1×10⁴
 20                   4×10³      1×10³      5×10²
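The mismatch effect summarized in Table II can be stated as the number of orders of magnitude by which a REP number falls below that of a perfectly matched plasmid and phage pair. The sketch below is an illustration only; the helper name is hypothetical. It takes log10 of the ratio of the two REP numbers and is applied to the HyG5 row of Table II, comparing 5 × 10⁶ with piSVA8 against 5 × 10³ with piBLUR or piBLUR 1. A REP number of 0, meaning no detectable homology, has no finite logarithm, so the helper returns infinity for it.

```python
import math


def orders_below(rep_test: float, rep_perfect: float) -> float:
    """Orders of magnitude by which rep_test falls below a perfect-match REP.

    A REP of 0 (no detectable homology) has no finite log, so it is
    reported as infinity.
    """
    if rep_perfect <= 0:
        raise ValueError("reference REP must be positive")
    if rep_test <= 0:
        return math.inf
    return math.log10(rep_perfect / rep_test)


# HyG5 row of Table II; piSVA8 is the perfectly matched Alu member.
hyg5 = {"piBLUR": 5e3, "piBLUR 1": 5e3, "piSVA8": 5e6}
for plasmid in ("piBLUR", "piBLUR 1"):
    print(plasmid, round(orders_below(hyg5[plasmid], hyg5["piSVA8"]), 2))
```

For both BLUR plasmids the loop prints a depression of 3.0 orders of magnitude, which matches the three-order reduction reported for HyG5 in the text.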

DISCUSSION

(a) Relationship between REP numbers and sequence repetitiveness

The determination of a REP number for a human DNA sequence cloned in piVX or piAN represents a useful and rapid approximation of the sequence repetitiveness of that sequence in the human genome. In general, the REP number can be used to categorize a sequence rapidly as few copy, low-order repeated or more highly repeated.

One cause of error in the recombination-based assay is the depression of REP numbers due to sequence mismatching. This is illustrated in Table II, where mismatching among Alu family members depressed recombination with a given bacteriophage, HyG5, by three orders of magnitude. Since the effects of mismatching are greatest among repeated sequence families (Sutton and McCallum, 1971), these effects would be greatest on sequences with high REP numbers. This is consistent with the finding that BLUR sequences cloned in piVX give REP numbers of only 10² to 10⁴ against a human genomic library (Table II), even though there are over 10⁵ Alu family members in the human genome. Thus, determination of sequence homology using the recombination-based assay is analogous to determination using nucleic acid hybridization at high stringency criteria. One useful aspect of the stringency of this system is that the use of homopolymeric dG:dC tailing to clone fragments in piVX does not result in the adventitious retrieval of phage containing dG:dC-rich stretches, which would interfere with the recombination-based assay. These tails can yield nonspecific hybridization when hybridization assays are performed at low stringency (J. Posakony, personal communication). The REP number assay facilitates the isolation of few copy DNA sequences suitable for DNA mapping and polymorphism studies. In particular, the rapid separation of these sequences from low-order repetitive DNA sequences offers a major advantage of the recombination-based assay over conventional hybridization analyses. At the other end of the repetitive sequence spectrum, REP values of 10³ or more generally indicate the presence of highly repetitive DNA sequences, such as those found in the Alu family (Table II and Fig. 2).

Although there was evidence of correlation between sequence repetitiveness and REP number as the REP number varied from 10³ to 10’, the correlation was imperfect. The inexactness of this correlation may result from several factors, including differential sensitivity of the recombination-based and hybridization-based assays to mismatch among and within families of repeated DNA sequences. Errors in recombination-based analyses may also result from nonrandomness of sequence representation in large genomic libraries. Such nonrandomness can result from cloning artifacts, constraints of the cloning methods used to construct a library, or unequal growth of individual phage during amplification of a library. The use of at least 10⁶ phage in the initial plating of these analyses minimizes these effects. Nevertheless, variation due to both biological and stochastic factors occurs during performance of the recombination-based analysis. Stochastic variations should be less important for sequences that are repeated, as such variability should be dampened over the larger numbers of phage representing repeated sequences. For few copy sequences, stochastic effects or inability to clone a specific locus with a given cloning stratagem explains why the REP value for a number of few copy sequences was nil, even though Southern blotting confirmed the presence of a human insert in the piVX vector (Fig. 3). Either these sequences were never present in the original library (Lawn et al., 1978), or they have been diluted during multiple amplifications of that library so that they were not detected in the analysis as performed (MATERIALS AND METHODS, section c). Alternatively, the REP number of few copy sequences might increase due to preferential cloning or amplification of a given sequence. Fortunately, these effects were not large enough in practice to interfere with use of REP numbers to distinguish few copy from low-order repeated sequences. To date, each of 25 sequences predicted to be few copy from the recombination-based assay hybridized to human DNA on Southern blots in a manner consistent with its REP number.

(b) Distribution of repeated sequences in the genome and in transcribed sequences

Early studies of the complexity of transcribed sequences, using nucleic acid hybridization of RNA to genomic DNA, indicated that most genes were likely to be represented in "single" copy (Klein et al., 1974). The design of these studies prevented discrimination between "single"-copy, few-copy and low-order repeated DNA sequences. Subsequent studies of individual genes using cloned genic probes have indicated that the genomic organization of transcribed sequences is more complicated: many genes belong to larger gene families. These families include related genes which evolved by duplication events (e.g., the ribosomal genes and globin genes), some of which have resulted in the creation of pseudogenes (Jacq et al., 1977), processed genes (Nishioka et al., 1980), and mobile elements (Potter et al., 1979; Cameron et al., 1979). The Southern blotting patterns of cDNA clones often contain multiple bands of hybridization which may localize to more than one chromosome, even though the primary gene product maps to a single locus (Daiger et al., 1982). The data in Table I demonstrate that this complexity of cDNA organization is observed for large numbers of randomly picked cDNA clones: 37% of both liver cytoplasmic cDNA and total lung cDNA clones had REP numbers between 10¹ and 10³. This frequency of low-order and moderately repeated sequences mirrored that in the total genome, where 31% of comparably sized DNA fragments (500 to 1000 bp) had REP numbers in this range. In contrast, the REP values in Table I showed that the cytoplasmic liver cDNA clones lacked the highly repetitive sequences which were abundantly represented (34%) in genomic DNA.

These findings are consistent with previous data (Flamm et al., 1969; Klein et al., 1974; Ryffel and McCarthy, 1975; Crampton et al., 1981; Jelinek and Schmid, 1982), and with the hybridizations depicted in Figs. 1 and 2, which demonstrate that Alu family sequences represented in high-molecular-weight nuclear RNA are removed during the processing of cytoplasmic RNA from nuclear precursors. Thus, Hep G-2 hepatoma cytoplasmic poly(A)+ RNA contains many different low-order and moderately repeated sequences (Fig. 3), but lacks highly repeated sequences present 10⁵ or more times in the genome, such as the Alu family or tandemly repeated simple-sequence DNAs (Flamm et al., 1969).

(c) Applications of the recombination-based assay

The recombination-based assay provides a rapid method for determining the repeat sequence structure of large numbers of clones derived from cDNA or genomic DNA. In addition to its use as an analytical tool, the system allows the rapid sorting of sequences cloned in piVX or piAN into those which are likely to be few copy, low-order repeated or more highly repeated. The plaque-purified bacteriophage carrying a large human insert corresponding to the insert in the miniplasmid is isolated concomitantly. A particular advantage of this system is that it provides rapid separation between few copy and low-order repeated sequences. Few copy sequences isolated in this manner from both genomic and transcribed DNAs can be used to perform chromosome mapping and locus expansion studies, and to obtain restriction fragment length polymorphisms (Botstein et al., 1980) linked to distinct areas of the genome. The separation of low-order repeated cDNA sequences made possible by the recombination-based assay should facilitate analysis of these sequences, which are likely to include gene families and/or transposable elements.
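The sorting of clones by REP number described in this section can be illustrated with a short sketch. It is a minimal illustration only, not the authors' procedure: the class boundaries are assumptions loosely based on the REP range of 10¹ to 10³ that the discussion associates with low-order and moderately repeated sequences, and the function names and REP values are hypothetical (they are not data from Table I).

```python
from collections import Counter

# Hypothetical cut-offs (an assumption for illustration): the text links REP
# numbers between 10^1 and 10^3 to low-order/moderately repeated sequences;
# values below that window are treated here as few copy, values above it as
# highly repeated.
LOW_ORDER_MIN = 10
HIGHLY_REPEATED_MIN = 1000

CLASSES = ("few copy", "low-order/moderately repeated", "highly repeated")


def classify_rep(rep: int) -> str:
    """Assign a clone to a repetitiveness class from its REP number."""
    if rep < 0:
        raise ValueError("REP number cannot be negative")
    if rep < LOW_ORDER_MIN:
        return CLASSES[0]  # includes REP = 0 (few copy sequences not recovered)
    if rep < HIGHLY_REPEATED_MIN:
        return CLASSES[1]
    return CLASSES[2]


def class_fractions(rep_numbers: list[int]) -> dict[str, float]:
    """Return the fraction of clones falling into each repetitiveness class."""
    if not rep_numbers:
        raise ValueError("need at least one REP number")
    counts = Counter(classify_rep(rep) for rep in rep_numbers)
    total = len(rep_numbers)
    return {label: counts[label] / total for label in CLASSES}


# Hypothetical REP numbers for eight clones (invented, not measured values).
example = [0, 3, 25, 140, 600, 2500, 40000, 8]
for label, fraction in class_fractions(example).items():
    print(f"{label}: {fraction:.0%}")
```

Percentages of this kind are how the cDNA and genomic libraries are compared in Table I; in practice the boundaries would have to be checked against Southern blot hybridization, as the text describes for sequences predicted to be few copy.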

ACKNOWLEDGEMENTS

Support was from NIH grants K08 HD 0045301, HD 04807 and HD 06276, from March of Dimes grant 1-825, and from the trustees of Children's Hospital Medical Center. We thank Brian Seed for discussions and for provision of the recombination miniplasmids piVX and piAN7, T. Maniatis for provision of the human recombinant phage library, D. Woods and E. Prochownik for provision of dC-tailed adult liver cytoplasmic cDNA, G. Goldberger and B. Jackson for provision of Hep G-2 RNA blots, and H.R. Treitsman for provision of PiSVA8 and HyG5.

REFERENCES

Benton, W.D. and Davis, R.W.: Screening lambda-gt recombinant clones by hybridization to single plaques in situ. Science 196 (1977) 180-182.

Botstein, D., White, R.L., Skolnick, M. and Davis, R.W.: Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32 (1980) 314-331.

Brandsma, J. and Miller, G.: Nucleic acid spot hybridization: rapid quantitative screening of lymphoid cell lines for Epstein-Barr viral DNA. Proc. Natl. Acad. Sci. USA 77 (1980) 6851-6855.

Britten, R.J. and Kohne, D.E.: Repeated sequences in DNA. Science 161 (1968) 529-540.

Cameron, J.R., Loh, E.Y. and Davis, R.W.: Evidence for transposition of dispersed repetitive DNA families in yeast. Cell 16 (1979) 739-751.

Casadaban, M.J. and Cohen, S.N.: Analysis of gene control signals by DNA fusion and cloning in Escherichia coli. J. Mol. Biol. 138 (1980) 179-207.

Cox, R.A.: The use of guanidinium chloride in the isolation of nucleic acids, in Grossman, L. and Moldave, K. (Eds.), Methods in Enzymology, Vol. 12. Academic Press, New York, 1968, pp. 120-129.

Crampton, J.M., Davies, K.E. and Knapp, T.F.: The occurrence of families of repetitive sequences in a library of cloned cDNA from human lymphocytes. Nucl. Acids Res. 9 (1981) 3821-3834.

Daiger, S.P., Wildin, R.S. and Su, T-S.: Sequences on the human Y chromosome homologous to the autosomal gene for argininosuccinate synthetase. Nature 298 (1982) 682-684.

Deininger, P.L., Jolly, D.J., Rubin, C.M., Friedmann, T. and Schmid, C.W.: Base sequence studies of 300 nucleotide renatured repeated human DNA clones. J. Mol. Biol. 151 (1981) 17-33.

Flamm, W.G., Walker, P.M.B. and McCallum, M.: Some properties of the single strands isolated from the DNA of the nuclear satellite of the mouse (Mus musculus). J. Mol. Biol. 40 (1969) 423-443.

Gusella, J.F., Keys, C., Varsanyi-Breiner, A., Kao, F.-T., Jones, C., Puck, T.T. and Housman, D.: Isolation and localization of DNA segments from specific human chromosomes. Proc. Natl. Acad. Sci. USA 77 (1980) 2829-2833.

Jacq, C., Miller, J.R. and Brownlee, G.G.: A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12 (1977) 109-120.

Jelinek, W.R. and Schmid, C.W.: Repetitive sequences in eukaryotic DNA and their expression. Annu. Rev. Biochem. 51 (1982) 813-844.

Klein, W.H., Murphy, W., Attardi, G., Britten, R.J. and Davidson, E.H.: Distribution of repetitive and nonrepetitive sequence transcripts in HeLa mRNA. Proc. Natl. Acad. Sci. USA 71 (1974) 1785-1789.

Kurnit, D.M., Wentworth, B.M., Komaroff, L. and Villa-Komaroff, L.: Construction of cloned libraries from RNA of human fetal tissues. Cytogenet. Cell Genet. 34 (1982) 193-203.

Lawn, R.M., Fritsch, E.F., Parker, R.C., Blake, G. and Maniatis, T.: The isolation and characterization of linked delta- and beta-globin genes from a cloned library of human DNA. Cell 15 (1978) 1157-1174.

Marmur, J., Rownd, R. and Schildkraut, C.L.: Denaturation and renaturation of deoxyribonucleic acid, in Davidson, J.N. and Cohn, W.E. (Eds.), Progress in Nucleic Acid Research, Vol. 1. Academic Press, New York, 1963, pp. 231-300.

Maxam, A.M. and Gilbert, W.: Sequencing end-labeled DNA with base-specific chemical cleavages, in Grossman, L. and Moldave, K. (Eds.), Methods in Enzymology, Vol. 65. Academic Press, New York, 1980, pp. 499-560.

Michelson, A.M. and Orkin, S.H.: Characterization of the homopolymer tailing reaction catalyzed by terminal deoxynucleotidyl transferase. J. Biol. Chem. 257 (1982) 14773-14782.

Mindich, L., Cohen, J. and Weisburd, M.: Isolation of nonsense suppressor mutants in Pseudomonas. J. Bacteriol. 126 (1976) 177-182.

Neve, R.L., Bruns, G.A.P., Dryja, T.P. and Kurnit, D.M.: Retrieval of human DNA from rodent-human genomic libraries by a recombination process. Gene 23 (1983) 343-354.

Nishioka, Y., Leder, A. and Leder, P.: Unusual alpha-globin-like gene that has cleanly lost both globin intervening sequences. Proc. Natl. Acad. Sci. USA 77 (1980) 2806-2809.

Potter, S.S., Brorein Jr., W.J., Dunsmuir, P. and Rubin, G.M.: Transposition of elements of the 412, copia and 297 dispersed repeated gene families in Drosophila. Cell 17 (1979) 415-427.

Ryffel, G.U. and McCarthy, B.J.: Polyadenylated RNA complementary to repeated DNA in mouse L-cells. Biochemistry 14 (1975) 1385-1389.

Schmid, C.W. and Deininger, P.L.: Sequence organization of the human genome. Cell 6 (1975) 345-358.

Seed, B.: Purification of genomic sequences from bacteriophage libraries by recombination and selection in vivo. Nucl. Acids Res. 11 (1983) 2427-2445.

Shen, C.-K.J. and Maniatis, T.: The organization of repetitive sequences in a cluster of rabbit beta-like globin genes. Cell 19 (1980) 379-391.

Skinner, D.M.: Satellite DNA's. Bioscience 27 (1977) 790-796.

Southern, E.M.: Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98 (1975) 503-517.

Sutcliffe, J.G., Milner, R.J., Bloom, F.E. and Lerner, R.A.: Common 82-nucleotide sequence unique to brain RNA. Proc. Natl. Acad. Sci. USA 79 (1982) 4942-4946.

Sutton, W.D. and McCallum, M.: Strand mismatching and reassociation of satellite DNA. Nature New Biol. 232 (1971) 83-85.

Tashima, M., Calabretta, B., Torelli, G., Scofield, M., Maizel, A. and Saunders, G.F.: Presence of a highly repetitive and widely dispersed DNA sequence in the human genome. Proc. Natl. Acad. Sci. USA 78 (1981) 1508-1512.

Varley, J.M., Macgregor, H.C. and Erba, H.P.: Satellite DNA is transcribed on lampbrush chromosomes. Nature 283 (1980) 686-688.

Waring, M. and Britten, R.J.: Nucleotide sequence repetition: a rapidly reassociating fraction of mouse DNA. Science 154 (1966) 791-794.

Wetmur, J.G. and Davidson, N.: Kinetics of renaturation of DNA. J. Mol. Biol. 31 (1968) 349-370.

Woods, D.E., Markham, A.F., Ricker, A.T., Goldberger, G. and Colten, H.R.: Isolation of cDNA clones for the human complement protein factor B, a class III major histocompatibility gene product. Proc. Natl. Acad. Sci. USA 79 (1982) 5661-5665.

Young, B.D., Hell, A. and Birnie, G.D.: A new estimate of human ribosomal gene number. Biochim. Biophys. Acta 454 (1976) 539-548.

Communicated by N. Fedoroff.