A simple method for identifying the palindromic sequences recognized by restriction endonucleases: the nucleotide sequence of the AvaII site


Gene, 4 (1978) 1--23


© Elsevier/North-Holland Biomedical Press, Amsterdam -- Printed in The Netherlands

A SIMPLE METHOD FOR IDENTIFYING THE PALINDROMIC SEQUENCES RECOGNIZED BY RESTRICTION ENDONUCLEASES: THE NUCLEOTIDE SEQUENCE OF THE AvaII SITE

(DNA sequencing; computer search; tetra-, penta-, hexanucleotides; bacteriophage φX174; SV40)

CAMIL FUCHS

Department of Statistics, University of Wisconsin, Madison, WI 53706 (U.S.A.) and ERIC C. ROSENVOLD, ALIK HONIGMAN* and WACLAW SZYBALSKI**

McArdle Laboratory for Cancer Research, University of Wisconsin, Madison, WI 53706 (U.S.A.)

(Received November 10th, 1977)
(Accepted May 15th, 1978)

SUMMARY

Tables specifying the frequencies, distances between and positions of all possible tetra-, penta- and hexanucleotide palindromes*** in φX174 and SV40 viral DNAs were prepared by a computer search of their base sequences. A simple method based on these tables is described for identifying the sequence recognized by any specific restriction endonuclease. The method requires experimental determination of the number and approximate sizes of the fragments obtained by digestion of φX174 RF and SV40 DNAs. Using this method we identified the sequence for the AvaII restriction endonuclease as

5'-GG(A/T)CC-3'.

INTRODUCTION

Most type II restriction endonucleases recognize short palindromic sequences***, generally four to six nucleotide pairs long (Nathans and Smith, 1975). In addition, a few restriction enzymes recognize a set of related hexanucleotide sequences, some of which are not strictly palindromic. For example, endonuclease AvaI recognizes sequences of the type CYCGRG, where Y and R represent pyrimidines (C or T) and purines (A or G), respectively; thus CTCGAG, CCCGGG, CTCGGG and CCCGAG are all elements of the set CYCGRG. Nucleotide sequences, including those recognized by restriction enzymes, have usually been determined by one of several methods of direct chemical analysis (Kelly and Smith, 1970; Roberts, 1976; Maxam and Gilbert, 1977; Sanger et al., 1977). These powerful methods, however, require a series of biochemical operations, including radioactive end labelling of the DNA fragments. We have developed an alternative and very simple method for identifying the sequence recognized by a restriction enzyme, taking advantage of the fact that the complete base sequences of two small viral DNAs, φX174 and SV40, were recently determined (Sanger et al., 1977; Reddy et al., 1978). The sequence of φX174 has been revised since publication (Sanger, personal communication); the tables in this article are based upon the revised version. In this communication we outline the principle of the procedure and provide the necessary tables for identifying a specific tetra-, penta- or hexanucleotide-pair palindromic sequence recognized by a restriction endonuclease. The tables also list the frequencies, distances between and positions of sets of related hexanucleotide sequences. For the sake of simplicity, in this article we use the term "palindrome" to include sets of related hexanucleotide sequences as well as true palindromes.

* … Hadassah Medical School.
** To whom reprint requests should be addressed.
*** See legend to Table I.

METHODS

The principle of the procedure

Our method for identifying the recognition sequence of a restriction enzyme is based on the principle that the pattern of bands observed following gel electrophoresis of a DNA cleaved by a restriction endonuclease is unique and specific for the nucleotide sequence recognized by that enzyme. The procedure requires only that the number and sizes of fragments produced by cleavage of one or more sequenced DNAs be compared with the frequencies of occurrence of, and the distances between, palindromes in such DNAs. Tables I--IX list the frequencies, distances between and locations of all possible tetra-, penta- and hexanucleotide palindromes in the DNAs of φX174 and SV40.
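The comparison at the heart of the procedure can be sketched in a few lines. The Python below is an illustration, not the authors' program; the function name `pattern_matches` and the 10% size tolerance are assumptions chosen for the example. It asks whether every observed band can be explained by its own predicted fragment, while allowing some predicted fragments to go unseen.

```python
def pattern_matches(observed, predicted, rel_tol=0.10):
    """Return True if each observed band (size in bp) can be assigned its own
    predicted fragment whose size is within rel_tol of the band's size.

    Predicted fragments may be left over: they can be too small to see,
    run off the gel, or comigrate with a fragment of similar size.
    """
    remaining = sorted(predicted, reverse=True)
    for band in sorted(observed, reverse=True):
        for i, size in enumerate(remaining):
            if abs(size - band) <= rel_tol * band:
                # Taking the largest fragment that fits is safe: the tolerance
                # window only moves down as the bands get smaller.
                del remaining[i]
                break
        else:
            return False  # no remaining predicted fragment explains this band
    return True


# Hypothetical sizes: three bands against four predicted fragments,
# one of which (31 bp) is too small to be expected on an agarose gel.
print(pattern_matches([1600, 1500, 1000], [1580, 1520, 1010, 31]))  # True
print(pattern_matches([1600, 1000], [1200, 900]))                    # False
```

The tolerance is a free parameter; in practice it would be set from the accuracy of size estimates on the gel used.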

Computer search for palindromes

As the first step, we used a computer program to search for the locations of all possible tetra-, penta- and hexanucleotide palindromes in φX174 and SV40 DNAs. From this we prepared a series of tables that list the following information for every such palindrome: its frequency of occurrence in each DNA (Table I), the distances between adjacent identical palindromes, i.e. fragment sizes (Tables II--V), and the locations of each palindrome in these two DNAs (Tables VI--IX). The sequences in the tables are ordered according to the frequencies with which they occur. For equal frequencies, tetranucleotides precede pentanucleotides, which in turn precede hexanucleotides. Alphabetic order is used for sequences of equal frequency and length.
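A minimal sketch of such a search, assuming the genome is an uppercase A/C/G/T string numbered from position 1 and treated as circular (both φX174 RF and SV40 DNAs are circular molecules). The function name `palindromic_sites` and the toy sequence are hypothetical; this is not the program the authors used. The frequency of a palindrome is simply the number of positions returned for it.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def palindromic_sites(genome, length):
    """Map each palindromic word of the given length (4, 5 or 6) to its
    1-based start positions on a circular genome, scanning one strand.

    Even lengths: the word must equal its reverse complement.
    Odd lengths: the flanks must be complementary and the central base is
    free, so GGACC and GGTCC both qualify (the tables pool them as GGXCC).
    """
    n = len(genome)
    wrapped = genome + genome[: length - 1]  # lets a site straddle the origin
    half = length // 2
    sites = {}
    for i in range(n):
        word = wrapped[i : i + length]
        if length % 2 == 0:
            is_pal = word == revcomp(word)
        else:
            is_pal = word[:half] == revcomp(word[half + 1 :])
        if is_pal:
            sites.setdefault(word, []).append(i + 1)
    return sites


toy = "AAGAATTCAAGGATCCAA"  # hypothetical 18-bp circular "genome"
print(palindromic_sites(toy, 6))  # {'GAATTC': [3], 'GGATCC': [11]}
```

Scanning a single strand is enough: a palindromic site reads the same on both strands, and for pentanucleotides the other strand only swaps the central base within the same X or Z class.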

The sequences are specified in the 5' to 3' direction; thus the sequence CATATG (Table II, line 7) corresponds to:

5'-CATATG-3'
3'-GTATAC-5'

Special representation is required for the central nucleotide pair in pentanucleotides. Any enzyme that recognizes a palindromic sequence having A in the center (e.g., 5'-TCAGA) would obviously also recognize the same sequence having T in the middle (e.g., 5'-TCTGA), because they represent the two complementary strands of the same double-stranded sequence. The same is true for the central C and G bases. Therefore, we use the symbol X for A or T and the symbol Z for C or G in the middle of pentanucleotides. Since there are enzymes that recognize pentanucleotides regardless of the central base, a new entry with N as the middle symbol was created for this contingency.
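The labelling convention can be written down directly. In this sketch (all function names hypothetical), `duplex` prints both strands as in the CATATG example, `site_class` collapses a pentanucleotide's central base to X or Z, so that TCAGA and TCTGA, the two strands of one site, receive the same label TCXGA, and `n_class` gives the pooled N entry.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex(word):
    """Both strands of a site: top strand 5'->3', bottom strand 3'->5'."""
    return f"5'-{word}-3'\n3'-{word.translate(COMPLEMENT)}-5'"


def site_class(word):
    """Table label for a palindromic site: even-length words unchanged,
    pentanucleotides with X (A or T) or Z (C or G) as the central symbol."""
    if len(word) % 2 == 0:
        return word
    mid = len(word) // 2
    centre = "X" if word[mid] in "AT" else "Z"
    return word[:mid] + centre + word[mid + 1 :]


def n_class(word):
    """The pooled N entry for a pentanucleotide (any central base)."""
    mid = len(word) // 2
    return word[:mid] + "N" + word[mid + 1 :]


print(duplex("CATATG"))
print(revcomp("TCAGA"), site_class("TCAGA"), site_class("TCTGA"))
print(n_class("GGACC"))
```

Because the N entry pools the X and Z entries, its frequency is their sum; for example, GGNCC occurs twice in φX174 (Table II), once as GGXCC and once as GGZCC.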

Using the method

From the locations of palindromic sequences, as given in Tables VI--IX, one can retrieve all information regarding the number and length of the restriction fragments for each recognition sequence, and compare it with the experimental data obtained from the electropherogram of the φX174 and SV40 DNAs cleaved by an enzyme of unknown specificity. However, to simplify this task we prepared Table I, in which the frequencies of given sequences are listed for both viral DNAs, since such information often permits immediate identification of the sequence(s) when only the total number of fragments derived from φX174 RF DNA and from SV40 DNA is known. For instance, when one finds that the enzyme HapII produces five fragments from φX174 RF DNA and one fragment from SV40 DNA, the sequence is likely to be CCGG, in agreement with the published result (Roberts, 1976; 1977; Sanger et al., 1977; Reddy et al., 1978). This result can be confirmed by comparing the measured lengths of the HapII fragments with the distances between adjacent identical palindromes given in Tables II--V (see Table II, line 34 and Table III, line 1 for this example). Finally, the map locations of the observed and predicted cleavage sites can be compared. When agarose or polyacrylamide gel electrophoresis is used, the number of bands observed can be less than the number of DNA fragments produced in a cleavage reaction: small fragments may not bind enough dye to be visible or may electrophorese out of the gel, and fragments of similar size may overlap in the gel. Therefore, it is necessary to take the length of the fragments into consideration when using Table I. For example, if the products of complete digestion give 10 and 5 fragments for φX174 and SV40 DNAs, respectively, the recognition sequence could be GAZTC (see Table I), but it could also be any sequence farther down in Table I that occurs more frequently in one or both DNAs.
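The count-based screening of Table I described above can be sketched as follows. Since both DNAs are circular, n sites give n fragments, and because band counts are only lower bounds, a sequence stays a candidate if it has at least as many sites as bands in each DNA. The five entries below are only those whose counts are quoted in this article (CCGG, GAZTC, GGZCC, GGXCC, TARYTA); the dictionary and function names are hypothetical, and a real search would load all of Table I.

```python
# Site counts (phiX174 RF, SV40) quoted in the text for a few Table I
# entries.  This is an excerpt for illustration, not the full table.
TABLE_I_EXCERPT = {
    "CCGG": (5, 1),
    "GAZTC": (10, 5),
    "GGZCC": (1, 5),
    "GGXCC": (1, 6),
    "TARYTA": (1, 6),
}


def candidates(phi_bands, sv40_bands, table=TABLE_I_EXCERPT):
    """List sequences with at least as many sites as bands seen in each DNA.

    Observed band counts are lower bounds on fragment counts (tiny
    fragments are invisible or run off the gel, and fragments of similar
    size comigrate).  Closest matches, i.e. fewest surplus sites, come first.
    """
    hits = [
        (seq, phi, sv40)
        for seq, (phi, sv40) in table.items()
        if phi >= phi_bands and sv40 >= sv40_bands
    ]
    hits.sort(key=lambda h: (h[1] - phi_bands) + (h[2] - sv40_bands))
    return [seq for seq, _, _ in hits]


print(candidates(5, 1))  # ['CCGG', 'GAZTC']
print(candidates(1, 5))  # ['GGZCC', 'GGXCC', 'TARYTA', 'GAZTC']
```

The ranking only orders the candidates; deciding among them still requires the fragment sizes, as the AvaII example in the Results shows.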
However, the search for the correct sequence is easy because the approximate sizes of the fragments have to agree with the distances between palindromes as given in Tables II--V. Careful inspection of the tables reveals that some recognition sequences are only one or a few nucleotides apart and hence overlap. Such overlapping sites cannot yield all of the fragments that might, at first sight, have been predicted from Table I. For instance, the sequence AAXTT (line 80 in Table II) would theoretically predict five fragments one base pair long and one fragment six base pairs long in addition to 16 longer fragments.
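Predicted fragment sizes follow from the site positions of Tables VI--IX. A sketch, assuming 1-based positions of the 5' ends of the sites on a circular DNA (function names hypothetical): distances shorter than the site itself flag overlapping sites such as the closely spaced AAXTT pairs, and `min_visible` is an assumed gel detection limit, not a value taken from this article.

```python
def fragment_sizes(positions, genome_length):
    """Distances between 5' ends of adjacent sites on a circular DNA,
    shortest first (the convention of Tables II-V)."""
    pos = sorted(positions)
    if not pos:
        return []
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    gaps.append(genome_length - pos[-1] + pos[0])  # piece spanning the origin
    return sorted(gaps)


def sort_fragments(sizes, site_length, min_visible):
    """Split predicted distances into three groups:
    overlapping - shorter than the site, so the two sites overlap and
                  cannot both be cut to release such a 'fragment';
    invisible   - real fragments below the assumed detection limit;
    visible     - fragments expected to appear as bands.
    """
    overlapping = [s for s in sizes if s < site_length]
    invisible = [s for s in sizes if site_length <= s < min_visible]
    visible = [s for s in sizes if s >= min_visible]
    return overlapping, invisible, visible


# One GGACC site at position 5042 in the 5386-bp phiX174 sequence:
print(fragment_sizes([5042], 5386))  # [5386], the full-length linear DNA
# Hypothetical sites on a 1000-bp circle, two of them 1 bp apart:
print(sort_fragments(fragment_sizes([10, 11, 60, 500], 1000), 5, 100))
# ([1], [49], [440, 510])
```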

Limitations of the method

The method as described here would not always permit one to determine the recognition sequence of a restriction enzyme. However, it is very unlikely to give an erroneous answer, provided that both the number and lengths of the fragments and, in some cases, also their locations on the physical map are determined. One reason for failure of the present method in its simple form might be the lack of a recognition site on either SV40 or φX174 DNA. In this case other sequenced DNAs would have to be employed and additional tables prepared. (We plan to prepare such tables for the plasmid pBR322 DNA, recently sequenced by G. Sutcliffe, personal communication.) In some cases enzymes recognize sequences not listed in our tables, e.g., HphI, MboII and a few others (Kleid et al., 1976; Roberts, 1976; 1977). For those cases, the computer method has been expanded to list and locate all the possible tetra-, penta- and hexanucleotide sequences in any regions of the sequenced DNAs. There are 5376 such sequences (4^4 + 4^5 + 4^6 = 256 + 1024 + 4096). As the second step of the expanded method, the program lists as potential candidates all the sequences appearing with prespecified frequencies in the respective regions. Sequences shorter than tetranucleotides or longer than hexanucleotides have not been considered, but the program could easily be expanded to include them. One should also note that this method determines only the recognition sequence and not the position of the cuts within this sequence. In spite of these limitations, the present tables should be quite useful, since about 90% of the restriction enzymes presently characterized recognize palindromes or related sequences of the type listed in the tables, as calculated from the reviews of Roberts (1976, 1977).

RESULTS

Example of determining an unknown recognition sequence

The recognition sequence for the AvaII restriction endonuclease was not determined by conventional sequencing methods (Murray et al., 1976). When φX174 RFI or RFII DNA is cleaved with AvaII and electrophoresed in a 1% agarose gel, a single band is observed at a position corresponding to full-length linear double-stranded φX174 DNA (not shown). For SV40 DNA, at least five fragments are observed under the same conditions (Fig. 2, track 2). In Table I only one sequence (GGZCC) directly corresponds to these results. However, comparison of the observed sizes of the SV40 DNA fragments with the distances between palindromes listed in Table III, line 39 excludes this sequence. Furthermore, close examination of Table III (also Table VII, line 39) shows that two of the GGZCC sequences overlap, and thus only four SV40 DNA fragments would be produced if AvaII cleaved at these loci. The next most likely sequences down in Table I are GGXCC and TARYTA, both of which appear once in φX174 and 6 times in SV40 DNA. The fragment sizes predicted for cleavage at or near GGXCC in SV40 (Table III, line 40) agree perfectly with the sizes of the five experimentally observed bands, whereas the sixth fragment, only 31 base pairs long, would not have been observed in the 1% agarose gel. In contrast, the fragment sizes predicted for cleavage at or near TARYTA in SV40 (Table V, line 42) do not agree with the observed fragment sizes (Fig. 2). Thus the recognition sequence must be GGXCC. Further confirmation of this sequence was obtained by determining the approximate locations of the cuts on the φX174 and SV40 sequence maps.

[Fig. 1: gel photograph; tracks labelled with HaeIII, AvaI, AvaII and combinations of these enzymes.]

Fig. 1. The AvaII cleavage site in φX174. φX174 RF DNA has been cleaved with each of the restriction enzymes or combinations of enzymes indicated at the origin of each track. The HaeIII fragments of φX174 RF DNA were purchased from Bethesda Research Labs Inc., Rockville, MD. Further cleavages with AvaI and/or AvaII were carried out in 25 µl reaction mixtures containing: 15 µl of 50 mM NaCl, 2.5 µl of 10× buffer (0.9 M Tris pH 7.4, 0.1 M MgCl2), 2.5 µl of sterile gelatin solution (1 mg/ml), 1 µg of φX174 RF HaeIII fragments and several units of AvaI and/or AvaII. The reactions were incubated at 37°C for 1 h, then stopped by heating to 70°C for 5 min. A solution of 0.05% bromphenol blue and 50% glycerol (5 µl) was added to each reaction mixture before loading on a 4% polyacrylamide gel (Bio-Rad). Electrophoresis was at 80 V for 2 h at room temperature. The electrophoresis buffer was: 90 mM Tris, 8.9 mM H3BO3, 2 M Na-EDTA, pH 8.3. At the completion of the electrophoretic run, the gel was stained in a solution of ethidium bromide (2 µg/ml), then photographed under short-wave UV illumination. The photograph was taken through an orange filter on Polaroid type 57 film. Endonucleases AvaI and AvaII were purified by a procedure modified from the one described by Murray et al. (1976). The three white dots left of the center of the photograph, between tracks 3 and 4, indicate from right to left (a) the position of HaeIII fragment 9 (118 base pairs), (b) the smaller of the two fragments produced when AvaII cleaved HaeIII fragment 3, and (c) HaeIII fragment 10 (72 base pairs). Direction of migration is from right to left.

In φX174 RF DNA, the single AvaII cut occurs in HaeIII fragment 3 (see Sanger et al., 1977) (compare tracks 4 and 5 in Fig. 1). One of the two unequal pieces produced by AvaII cleavage of fragment 3 migrates between HaeIII fragments 9 (118 base pairs) and 10 (72 base pairs) (see Sanger et al., 1977). By interpolation, this piece is estimated to be 95 base pairs long. Consequently the AvaII cleavage site must be approx. 95 base pairs from one end or the other of HaeIII fragment 3. The sequence CTCGAG, recognized by endonuclease AvaI (Murray et al., 1976; Roberts, 1976; 1977), occurs at position 162, which is also within fragment 3 (Sanger et al., 1977). Cleavage of this fragment by AvaI produces two new pieces, one 600 and the other 272 base pairs long (compare tracks 1 and 2 in Fig. 1). Comparison of the bands in tracks 2 and 3 reveals that AvaII cleaves the 600 base-pair fragment. Therefore, since AvaII cleaves HaeIII fragment 3 about 95 base pairs from the end that is farthest from the AvaI site, the locus for AvaII must be near position 5043. The palindrome GGACC occurs at position 5042 (Table VI, line 2).

[Fig. 2: gel photograph; tracks labelled AvaI, AvaII, AvaII + EcoRI, AvaII + BamHI, AvaII + TaqI and φX174 RF HaeIII.]

Fig. 2. The AvaII cleavage sites in SV40 DNA. The conditions of endonuclease cleavage of λ+ and SV40 DNAs were similar to those described in the legend to Fig. 1, with the following exceptions. The 10× buffer in the AvaII-BamHI double cleavage reaction contained 0.2 M Tris pH 7.5, 70 mM MgCl2, and 20 mM β-mercaptoethanol. The 10× buffer in the AvaII-TaqI double cleavage reaction contained 0.1 M Tris pH 8.4, 60 mM MgCl2, and 60 mM β-mercaptoethanol. The DNA fragments were electrophoresed in a 1% agarose gel (SeaKem) at 50 V for 4 h at room temperature. The electrophoresis buffer was 40 mM Tris pH 8, 20 mM Na-acetate, 26 mM acetic acid and 1 mM Na-EDTA. The λ+ fragments generated by cleavage with AvaI are estimated to range between 15 000 and 1500 base pairs (Rosenvold and Honigman, 1977). The sizes of the four largest φX174 HaeIII fragments are 1353, 1078, 872, and 603 base pairs (Sanger et al., 1977). The white dot near the upper edge of track 3 indicates the position of the smaller fragment produced when EcoRI cleaved SV40 AvaII fragment 3. The SV40 DNA was a gift from Dr. Janet Mertz at the McArdle Laboratory. Endonucleases EcoRI, BamHI and TaqI were purchased from Bethesda Research Laboratories Inc., Rockville, MD.

To map the AvaII cleavage sites in SV40 DNA we performed double cleavage reactions with AvaII and several other enzymes that cut SV40 at known locations (Fig. 2). By comparing the positions of the bands produced by AvaII with the marker fragments in track 6, it may be seen that fragments 1 and 2 are 1500--1600 base pairs in length and fragment 3 is approx. 1000 base pairs long. EcoRI cleaves AvaII fragment 3 into two pieces, one about 750 base pairs and the other about 250 base pairs. Endonuclease BamHI cleaves fragment 2 into 1000 and 500 base-pair fragments. Since the distance between the EcoRI and BamHI sites is 751 base pairs (Reddy et al., 1978), there is only one arrangement of fragments 2 and 3 in which they do not overlap extensively. This arrangement places fragment 3 immediately to the right of fragment 2 and fixes the approximate locations of three adjacent AvaII cleavage sites in SV40 DNA. These sites must be near positions 950, 1950 and 3500. Table VII (line 40) indicates that the sequence GGXCC occurs in SV40 DNA at positions 936, 1931 and 3456, which is in good agreement with the experimental data. Furthermore, examination of the map locations of the remaining GGXCC sites reveals that a 31 base-pair fragment should be produced by cleavages at positions 475 and 506. Such a fragment would not be visible on our agarose gels, but we have in fact seen it on a 4% acrylamide gel. Considering the redundancy of information, we conclude that endonuclease AvaII recognizes the sequence

5'-G↓GACC-3'
3'-CCTG↑G-5'

This result agrees with the experimental results of Murray et al. (1976), who found that the termini of AvaII fragments were 5'-GAC and 5'-GTC, as shown above by the arrows. However, their hexanucleotide sequence assignment cannot be right, since it was based on the erroneous conclusion that the ends are flush and since it predicts incorrect numbers and sizes of the restriction fragments.

DISCUSSION

The method we have described for determining the recognition sequence of a restriction enzyme is general in scope, rapid and technically simple when compared with direct chemical sequencing procedures. The basic substrates, φX174 RF and SV40 DNAs, are commercially available (Bethesda Research Labs, Inc., Rockville, MD) and were successfully used in some of our experi-

TABLE I
FREQUENCIES OF PALINDROMIC SEQUENCES AND SETS OF RELATED HEXANUCLEOTIDE SEQUENCES IN φX174 AND SV40 DNAs

Frequency dx,174 0

0

0

0

0 0 O 0 I

Frequency

SV40 • Sequence 0

I

2

3

5 6 7 8 0

ACATGT ACCGGT ACTAGT AGATCT (a) AGTACT ATATAT ATCGAT CACGTG CCCGGG (b) CGATCG CGCGCG CGGCCG GAGCTC GCTAGC GTATAC GTCGAC (c) TACGTA TATATA TCCGGA TCTAGA (d) TGGCCA (e) TTCGAA RCTAGY RTCGAY YCCGGR AGCGCT GAATTC (f) GATATC GCCGGC GGATCC (g) GGGCCC GGTACC (h) TGATCA (i) RCCGGY YGATCR CCTAGG GCATGC TAGCTA TGTACA GRGCYC RCATGY ATGCAT CAGCTG (j) CCATGG GGRYCC GRATYC RGATCY CCYRGG AAGCTT (k) TRTAYA GATC (l) CCZGG CCGCGG (m) CTCGAG (n) GACGTC GCGCGC

~X174

SV40

Sequence

!

0

GTGCAC TGCGCA AYATRT CGRYCG CRCGYG CYCGRG (o) RTATAY TYCGRA GAYRTC CATATG CTATAG CTGCAG (p) TAATTA CAYRTG CAATTG TTATAA GARYTC GGZCC GGXCC (q) TARYTA AATATT AGGCCT RGGCCY YAGCTR ACGCGT CGTACG GGCGCC TCGCGA ACRYGT GYGCRC TAYRTA YACGTR YGGCCR YTCGAR CTTAAG GCRYGC GTRYAC RGTACY TTGCAA AYGCRT RTGCAY ATTAAT AGYRCT CRATYG CRGCYG YCTAGR ATRYAT TCYRGA TGYRCA CYATRG GGNCC (r) TTTAAA CCXGG (s) AACGTT AYCGRT TRCGYA

! I

I 2

I

)

I ! t

4 5 6

I

7

I

9

2

0

2 2

! 2

2

3

2

4

2 2

5 5

2 2

8 11

2 )

16 0

Frequency

:Frequency

~X174

dXt74

S V 4 0 Sequence

3

0

3

! 2

3 3

3

3 3

4 7

"3

8 9

3

3

II 12

3 4

16 ~)

4 4 4

4 5

4 4 4 5

6 7 I0 1

5

4

6 6 6 6 6 7 7

3 I0 II !7 22 0 I

7 7 7

2 7 13 16 17 25 0

7 7 7 8

3

I 5 6 II 12

YCGCGR YGCGCR GGYRCC TCATGA CYGCRG GRTAYC YATATR CCRYGG YGTACR GTTAAC (t) AYTART TRGCYA ATYRAT AGRYCT CARYTG TTRYAA CTAG YAATTR CCNGG TCZGA CGYRCG RCGCGY CYTARG ACYRGT CTXAG GYATRC GYTARC YTATAR ARTAYT CCGG (u) GYCGRC TAZTA AAATTT CTYRAG AGXCT CTRYAG TYATRA TYTARA CAZTG GRCGYC ARCGYT GCYRGC CRTAYG RAATTY TTYRAA YTTAAR ARGCYT ACXGT ACZGT CGXCG RACGTY RGCGCY (v) TRATYA GTZAC RAGCTY YTGCAR

SVkO Sequence

8 9 9

18 2 3

9 9 9 9 I0 I0 I0 IO !o II II 1! I1 11 II I3 13 13 14 14 14

8 II 12 14 I

14 !4 14 I5 15 I5 16 16 17 17 18 18 I8 I8 19 20 20 2I 21 21 22 22 23 24 25 29 31 34 35 36

5

9 IO 14 5 8 !! !4 18 19 I 7 8 0 9 15

AARYTT TCRYGA TGRYCA TYGCRA GTXAC ARATYT AAYRTT RTTAAY TCGA (w)

GAZTC

CAXTG TGXCA CTZAG GAXTC YCATGR GTAC RATATY TATA GGCC (x) CGZCG GTYRAC (y) TGZCA CGCG (z) TCXGA AAZTT ATZAT 16 AGZCT 19 CTNAG 23 GCXGC (aa) 11 TAXTA 16 ATXAT 25 ACNGT 9 TTZAA 31 CANTG 2 GCZGC 14 GTNAC 2 GCGC (bb) 9 TCNGA 36 TGCA 39 TTXAA 0 ACGT (cc) 1 5 TANTA 19 AGNCT 1 CGNCG 10 GANTC (dd) 22 ATAT 17 CATG 22 AAXTT 18 TGNCA 35 AGCT (ee) 37 AATT 31 ATNAT 25 GCNGC 48 TTNAA 47 TTAA 37 AANTT

Palindromic sequences are grouped in order according to the frequencies with which they occur in φX174 am3cs70 DNA (Sanger et al., 1977 and personal communication). The sequences are arranged into subgroups according to their frequencies of occurrence in SV40 DNA (Reddy et al., 1978). The term palindrome is used here to describe any double-stranded DNA sequence that has twofold rotational symmetry, e.g.:

5'-GAA|TTC-3'          5'-GGTCC-3'
3'-CTT|AAG-5'    or    3'-CCAGG-5'

for palindromes with an even or odd number of base pairs, respectively. In pentanucleotide palindromes, X represents A or T, Z represents C or G, and N represents A, C, G or T. In sets of related hexanucleotide sequences (see also Tables IV, V, VIII and IX), R represents a purine (A or G) and Y a pyrimidine (C or T). Lower-case letters indicate sequences known to be recognized by restriction endonucleases. The enzymes that recognize these sequences are: (a) BglII; (b) SmaI, XmaI; (c) SalI, XamI; (d) XbaI; (e) BalI; (f) EcoRI; (g) BamHI, BstI; (h) KpnI (determined by present method); (i) BclI, CpeI; (j) PvuII, BumI; (k) BbrI, ChuI, HinbIII, HindIII, HsuI; (l) DpnI, DpnII, FnuCI, FnuEI, MboI, MosI, Sau3A; (m) SacII, SstII, TglI; (n) BluI, XhoI, XpaI; (o) AvaI; (p) Bce170, Bsu1247, PstI, SalPI, XmaII; (q) AvaII; (r) AsuI; (s) EcoRII; (t) ApoI, HpaI; (u) HapII, HpaII, MnoI; (v) HaeII, HinHI, NgoI; (w) TaqI; (x) BluII, BspRI, BsuRI, Bsu1076, Bsu1114, FnuDI, HaeIII, HhgI, PalI, SfaI; (y) ChuII, HincII, HindII, MnoII, MnnI; (z) BceR, BpaII, FnuDII, Hin1056I, TacI; (aa) BbvSI; (bb) FnuDIII, HhaI; (cc) SacIII, SstIII; (dd) FnuAI, HhaII, HinfI; (ee) AluI (see Roberts, 1976, 1977, and personal communication). The following endonucleases recognize sequences not included in this table:

Endonuclease

Sequence

Frequency ¢X174

SV40

Mnll

CCTC

34

51

MboII

GAAGA

11

15

HgaI

GACGC

14

0

e

11

spoI

GT( ) ( )AC

2

HphI

GGTGA

9

4
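Two pieces of bookkeeping behind Table I and the expanded search described under Limitations of the method can be sketched together. `expand` turns a degenerate entry (R, Y, X, Z and N as defined in the legend) into its concrete members, and `is_symmetric_set` checks that the set as a whole has twofold symmetry even when, as for the AvaI set CYCGRG, some members are not palindromes themselves. `list_candidates` mimics the expanded method: it runs over all 4^4 + 4^5 + 4^6 = 5376 tetra-, penta- and hexanucleotides and keeps those whose site count on both strands matches a prespecified number in each region, which is how a non-palindromic site such as GAAGA (MboII) has to be counted. All names are hypothetical, and regions are treated as linear stretches of sequence.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Symbols of Table I: R/Y in sets of related hexanucleotides,
# X/Z/N at the centre of pentanucleotides.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "X": "AT", "Z": "CG", "N": "ACGT"}


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def expand(pattern):
    """All concrete sequences denoted by a Table I entry."""
    return ["".join(p) for p in product(*(CODES[c] for c in pattern))]


def is_symmetric_set(pattern):
    """True if the set is closed under reverse complementation."""
    members = set(expand(pattern))
    return all(revcomp(s) in members for s in members)


def all_words(lengths=(4, 5, 6)):
    """Every tetra-, penta- and hexanucleotide over A, C, G, T."""
    return ["".join(p) for k in lengths for p in product("ACGT", repeat=k)]


def kmer_counts(region, lengths=(4, 5, 6)):
    """Top-strand counts of every k-mer in a linear region, for each k."""
    return {k: Counter(region[i:i + k] for i in range(len(region) - k + 1))
            for k in lengths}


def site_count(counts, word):
    """Sites on either strand; a self-complementary word is counted once."""
    c = counts[len(word)]
    rc = revcomp(word)
    return c[word] + (c[rc] if rc != word else 0)


def list_candidates(regions, wanted, lengths=(4, 5, 6)):
    """Words whose site count in each region equals the wanted count.
    A word and its reverse complement describe the same site, so only the
    alphabetically smaller of the two is reported."""
    tables = [kmer_counts(r, lengths) for r in regions]
    found = set()
    for word in all_words(lengths):
        if all(site_count(t, word) == n for t, n in zip(tables, wanted)):
            found.add(min(word, revcomp(word)))
    return sorted(found)


print(len(all_words()))   # 5376
print(expand("CYCGRG"))   # ['CCCGAG', 'CCCGGG', 'CTCGAG', 'CTCGGG']
print(is_symmetric_set("CYCGRG"), is_symmetric_set("GAAGA"))  # True False
```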


TABLE II

DISTANCES BETWEEN ADJACENT PALINDROMIC SEQUENCES IN φX174 DNA

Sequences are numbered (numbers in parentheses) and listed in order according to the frequencies (third column) with which they occur in φX174 am3cs70 DNA (Sanger et al., 1977 and personal communication). The distances between the 5' ends of adjacent sequences are expressed in numbers of nucleotides and listed in order from the shortest to the longest distance.

Frequency 1: (1) CCZGG, (2) GGXCC, (3) GGZCC, (4) AATATT, (5) AGGCCT, (6) CAATTG, (7) CATATG, (8) CCGCGG, (9) CTATAG, (10) CTCGAG, (11) CTGCAG, (12) GACGTC, (13) GCGCGC, (14) GTGCAC, (15) TAATTA, (16) TGCGCA, (17) TTATAA
Frequency 2: (18) CCXGG, (19) GGNCC, (20) ACGCGT, (21) ATTAAT, (22) CGTACG, (23) CTTAAG, (24) GGCGCC, (25) TCGCGA, (26) TTGCAA
Frequency 3: (27) CTAG, (28) CCNGG, (29) AACGTT, (30) GTTAAC, (31) TCATGA
Frequency 4: (32) CTXAG, (33) TCZGA
Frequency 5: (34) CCGG, (35) TAZTA, (36) AAATTT
Frequency 6: (37) AGXCT, (38) CAZTG
Frequency 7: (39) ACXGT
Frequency 8: (40) ACZGT, (41) CGXCG, (42) GTZAC
Frequency 9: (43) GTXAC
Frequency 10: (44) TCGA, (45) CAXTG, (46) CTZAG, (47) GAZTC

~stances 5386 5386 5386 5386 5386 5386 5386 5386 5386 5386 5386 5386 5386 5386 5386 5386 5386 2619 2767 1322 4064 1925 3461 1789 3597 2381 3005 1499 3887 1957 3429 216~ 3222 2661 2,725 771 ;154 3461 700 ~919 2767 295 !650 3441 392 1264 3730 866 i017 3503 358 660 1330 317 915 1818 219 348 374 259 542 563 102 239 326 102 189 276 3372 27 129 459 2377 49 50 292 1479 1828 43 222 270 536 1116,2443 5 76 176 601 629 3063 220 352 542 936 1007 1016 74 106 126 511 817 1139 20 33 54 231 327 404 8 91 182 529 536 851 6 204 302 486 552 965 40 66 66 233 726 766

!

(48)

rGXCA

Distances

101

,I

(49)

OOCC 11

(50)

GTAC 11

(51)

TATA I1

( 5 2 ) GAXTC 1! (53) CQZCG 13 (54) TQZCA 13 (55)

CGCG 14

( 5 6 ) AAZT~ ( 5 7 ) AGZCT (38) ATZAT 14 3038 2336 1697 2748 1943 2079 2029 2E90 575 372 807 1587

(59) CTNAG (60) GCXOC 1 (61)

rCXGA 14

(62)

ACNGT 15

404 1284 302

qsq

389

qq7

633

680

330 495 1787 87 141 11752914 254 434 1096 1405 303 393 9981177 100 1 4 0 1493 1756

(63)

18 80 112 572 590 878 72 118 194 281 310 603 1353 89 138 157 392 472 525 1560 87 115 126 374 411 555 1704 106 118 249 427 462 500 1587 84 1 0 7 " 114 278 285 336 567 1096 1237 48 57 69 160 203 213 1018'1060 1368 19 79 103 156 170 259 695 718 870 :43 55 60 125 371 383 717 739 845 21 38 71 144 150 298 664 775 881 27 78 93 122 186 194 423 665 786 6 10 18 186 302 303 542 927 998 3 q7 77 268 279 372 495 501 596 6 37 91 197 219 302 589 642 813 10 43 49 222 270 292 454 517 536 45 68 143 252 281 290 577 579 584 3 66 71 134 147 197 314 427 483 8 27 80 129 150 182 362 379 5~6 1096 12 69 82 169 292 296 304 350 ~09 1291

ATXAT 15

(64) TAXTA 15 (65) CANTO 16

( 6 6 ) TTZAA 16

120 425 1252 1339 234 271 872 1078 197 645

247 964

214 750

236 814

354 553

417 613

136 382

203 561

108 353

120 609

11~ 127 496 530 1050 87 114 ~01 ~11 1035 95 102 563 584 1000 97 104 322 qO9 1880 38 165 393 486 1012 104 183 389 417 1655 137 1~8 302 572 1331 50 139 302 404 814 1284 165 189 390 406 592 825 78 128 199 296 1175 1668 91 129 254 305 807 851 84 300 507

158 303 760


TABLE II (continued)

Sequence

#

(67') GCZGC 171

( 6 8 ) GT~IAC 17

(69)

OCOC 18

(70)

TOCA 18

(71)

TCNOA 18

(72) TTXAA 18

(73)

(74)

ACGT 19

AONCT 20

(75) TANTA 20

(76)

ATA? ~1

(7'7) CONOO 21

(7'8) OANTC 21

(79)

CAIL'O 22

Sequence

Oistances 3 117 124- 129 : 15:4 1 7 . 0 170 258 286 295 363 459 8 1 4 1129 33 74 106 111 220 230 239 2 4 3 312 330 352 420 719 1016 2 35 54 84 101 123 143 145 201 269 300 305 614 640 1553 21 30 33 63 84 101 106 2q2 367 373 398 501 621 624 969 6 6 9 37 1 17 137 142 193 219 293 302 396 572 81 3 1331 1 1 5 60 121 145 190 197 227 260 qq6 500 623 8O6 963 16 24 27 38 89 142 157 268 282 343 362 365 379 397 749 1036 21 38 57 71 95 102 102 144 189 27'6 298 301 427 563 571 584 ~ 30 36 66 78 98 98 99 161 17'!1 197 197 309 314 427' 1014 21 34 42 4.8 107' 114 114 187 206 237 248 25q 300 333 375 582 879 3 5 11 15 7'3 1 0 7 ' 114 121 203 2 0 7 ' 2 4 7 ' 285 336 354 3 7 ' q 567 1237' 24 40 42 48 66 82 100 118 151 200 249 311 1t17 4 2 7 ' 500 553 7'26 13 61 65 86 88 1 0 7 ' 122 1 2 7 ' 155 218 219 285 320 3211 3116. 3 5 7 ' 629 7'57'

153 262 5O0 126 268 587 93 192 532

_#

(80) AAX?T 22

(81)

£GdCA 23

(82)

AGO? 2Jl

(83)

AATT 25

76 258 519 91 197 525 103 222 516 59 282 371 93 150 423 881 71 147' 199 1668 81 203 295 726

(84) ATNAT 29

(85) GCNGC 31

(86) T?HAA 34

(87)

TTAA 35

(88)

&ANTT 36

31 17'6 302 618 66 140 413 7'13 87' 128 291 601

Di stances 1 6 106 337 873 18 66 120 213 590 9" 42 90 2q7 358 6 37 91 229 3811 3 38 93 143 281 q06 3 77 117 154 172 279 501 1 21 57 82 140 236 382 6 32 49 103 140 212 287 1 6 43 7'8 128 195 324 639

1 39 128 379 921 46 69 120 307 626 21 55 109 254 662 6 39 102 239 520 3 45 97 156 290 423 3 79 12q 154 180 286

1 39 135 519

1 78 238 555

1 101 319 608

48 57 8O 83 132 160 425 q34 952 25 32 78 84 1 19 146 258 275 853 1007 20 21 40 51 107 134 330 338 554 607 12 27 73 75 101 113 .165 186 322 324 466 665 11 36 91 104 129 l q l 169 169 234 258 363 389

62 108 203 467

1 36 60 84 160 2q7 387 21 45 60 105 151 222 348 1 |6 43 87' 135 222. 354

5 37 62 108 188 257 424 21 45 69 114 170 228 395 1 35 53 97' 149 234 357'

12 41 67 121 222 292 867 24 48 76 121 t70 228 453 1 39 55 101 183 2511 363

33 87 2011 337 34 78 167 380 872 3o 85 122 252 390 47 114 153 170 262 417 12 51 72 123 227 304 27 48 83 123 189 238 7'35 1 39 7'1 1 14 188 258 521


TABLE III
DISTANCES BETWEEN ADJACENT PALINDROMIC SEQUENCES IN SV40 DNA

Sequence: (1) CCGG, (2) TCGA, (3) CGNCG, (4) CGZCG, (5) AGCGCT, (6) CTTAAG, (7) GAATTC, (8) GATATC, (9) GCCGGC, (10) GGATCC, (11) GGGCCC, (12) GGTACC, (13) TGATCA, (14) GCGC, (15) GCZGC, (16) CATATG, (17) CCTAGG, (18) CTATAG, (19) CTGCAG, (20) GCATGC, (21) TAATTA, (22) TAGCTA, (23) TCATGA, (24) TGTACA, (25) ACXGT, (26) ATGCAT, (27) CAATTG, (28) CAGCTG, (29) CCATGG, (30) TTATAA, (31) TTGCAA, (32) TAZTA, (33) AAATTT, (34) ATTAAT, (35) GTTAAC, (36) CTXAG, (37) GAXTC, (38) GAZTC, (39) GGZCC, (40) GGXCC, (41) GTZAC, (42) AAGCTT, (43) AATATT, (44) [illegible], (45) [illegible], (46) GTXAC, (47) TGZCA, (48) CAXTG, (49) TCZGA, (50) TCXGA

mstances 226 226 226 226 226 226 226 226 226 226 226 226 226 lt90 546 P18 i17 51 .)!6 55 Z6i |511 |61 ll I 68 55 156 t116 ~27 19 t2 153 ;52 ;28 20 ~61 24 83 1 31 ;80 116 ;58 !15 68 27 133 111 1179 60 9115 27 503 101 1197 111 833 28 291 28 291.

Sequence

#

DistanCes

( 5 1 ) T1'ZAA. ( 5 2 ) GAItTO (53)

rOXCA

(54)

GI'AC

( 5 5 ) GGNCC

4.736 4580 11208 1110,9 5175 4010 5 171 3965 11872 4365 3815 22116 2912 1786 3365 2005 2364 1790 1990 279 11720 1600 3607 1321 3893 1005 1193 2075 565 1999 2010 8119 1543 2206 1067 1992 21117 5511 6611 1097 543 786 958 525 1003 1109 373 539 751 1130 665 995 1168 814

(56)

I'AXTA

(57)

CTAO

( 5 8 ) CTZAO( 5 9 ) GT~&C

2550 2915 2506 3562 1525

(60)

AAZrT

(61)

ATZAT

(62)

EAdI'A

(63)

A~ZC'X' 16

(611) ATXkl' 16

(55)

+ CCNGG 16

(66)

CCX~,G 16

610

(67)

CATO 17

367

(68) TATA 18 (69) TGCA 18

900 111110

4117 526 1101 1169 153 1193 803 111116 33 227 3 1 t 8 3729 237 3811 396 12611 1730 144 65 275 1358 ~2+286 207 +266 417 1526 1 7 7 3 53 229 270 8 8 1 1313 1 3 5 9 102 + " 578/1 102 5-78 1

871 396

439 2711 285 285


30 291 387-" 396 423 511.7 7 96 :117~0 1 186 211 2-11 8:.~ 109 237 525 543 76,6 1085 1830 24 95 162 216 680 689 720 805 8117 988 15 57 111- 226 294 497 5011 534 675 708 1605 1 31 2JI15 367 373 430 489 506 539 665 1580 3 3 54 115 165 237 241 493 558 1543 ~814 76 151 169 3111 357 378 380 381 420 629 854 1:t!17 50 '53 60 99 205 211 216 285 309 310 435 615 936 1442 27 46 144 165 215 276 367 388 405 467 468 599 759 900 12 77 150 154 198 212 281 304 309 324 379 634 6~7 767 778 3 3 3 17 75 84 138 192 2c~6 234 ~49 33"1 801 1170 1700 3 J 3 511 67 ~2 115 149 165 237 1117 1191 493 1126 1811 18 38 5:5 ' 83 198 229 250 293 322 347 375 392 465 638 672 851 116 50 S3 109 121 156 161 173 203 319 3116 354 390 539 805 1391 54 55 61 88 l O l 126 127 200 249 311 369 11114 5:52' 673 823 993 5Jl 55 61 88 101 126 127 200 2119 3++ 369 444 552 673 823 993 16 50 55 66 91 133 177 166 232 284 :321_..3481~ -,~';'O 1124 484 6 2 9 ' 11352 11,+: !1LIT -':19 32 35 37:.+-.:/116 ......: 5 I 6:11 12:9 1,~:I:!:::,: 2.6:1 309 1o1141~I!/i8 ~i :,t:::U~,~:: + "


TABLE IV
DISTANCES BETWEEN ADJACENT ELEMENTS OF SETS OF RELATED HEXANUCLEOTIDE SEQUENCES IN φX174 DNA

See legend to Table II. Each entry represents a set of related hexanucleotide sequences; R and Y indicate purine and pyrimidine, respectively.

Sequence  #  Distances
(1) AYATRT  1  5386
(2) CAYRTG  1  5386
(3) CGRYCG  1  5386
(4) CRCGYG  1  5386
(5) CYCGRG  1  5386
(6) GARYTC  1  5386
(7) GAYRTC  1  5386
(8) RGGCCY  1  5386
(9) RTATAY  1  5386
(10) TARYTA  1  5386
(11) TYCGRA  1  5386
(12) YAGCTR  1  5386
(13) ACRYGT  2  1925 3461
(14) AGYRCT  2  1492 3894
(15) ATRYAT  2  1612 3774
(16) AYGCRT  2  1925 3461
(17) CRATYG  2  902 4484
(18) CRGCYG  2  2636 2750
(19) CYATRG  2  1164 4222
(20) GCRYGC  2  2022 3364
(21) GTRYAC  2  1373 4013
(22) GYGCRC  2  569 4817
(23) RGTACY  2  997 4389
(24) RTGCAY  2  1271 4115
(25) TAYRTA  2  2134 3252
(26) TCYRGA  2  1942 3444
(27) TGYRCA  2  2032 3354
(28) YACGTR  2  1341 4045
(29) YCTAGR  2  1154 4232
(30) YGGCCR  2  552 4834
(31) YTCGAR  2  962 4424
(32) AGRYCT  3  571 2129 2686
(33) ATYRAT  3  477 1789 3120
(34) AYCGRT  3  1114 1555 2717
(35) AYTART  3  339 1450 3597
(36) CARYTG  3  537 1207 3642
(37) CCRYGG  3  1444 1619 2323
(38) CYGCRG  3  894 1969 2523
(39) GGYRCC  3  498 1459 3429
(40) GRTAYC  3  706 757 3923
(41) TRCGYA  3  461 704 4221
(42) TRGCYA  3  1419 1854 2113
(43) TTRYAA  3  166 2559 2661
(44) YAATTR  3  1197 1912 2277
(45) YATATR  3  1484 1822 2080
(46) YCGCGR  3  599 1565 3222
(47) YGCGCR  3  194 616 4576
(48) YGTACR  3  354 2027 3005
(49) ACYRGT  4  675 874 1029 2808
(50) ARTAYT  4  595 1071 1670 2050
(51) CGYRCG  4  16 399 1966 3005
(52) CYTARG  4  534 1019 1499 2334
(53) GYATRC  4  580 729 1524 2553
(54) GYTARC  4  147 392 1264 3583
(55) RCGCGY  4  259 549 1376 3202
(56) YTATAR  4  411 500 2014 2461

Sequence #

Distances

( 57 )CT~nA~, ) cz" :r a .~c, /51 717 7 8 2 (58)ozconC )G~CGRC .5J 3 9 0 : 5 8 8

!TRYA~, ( 5 9 ) 01' n'ZA~, " ~" (60)'rZATRA (51) T'ZTARA (62)ARCG¥T ( 63 ) ARGCYT (6q)CRTA1~G ( 65)GCI'RGC (66)ORCGYC (67)RAATrY (68)TT£RAA

(69)AARI'T r (70)RACOT~' (71)RAGOI'¥ (72)RGCGCX (73)TRATYA (74)~TGCAR

89 6 6 i870 33 2315 50 11579 71 269 1441 7"1 25 15145 71 387 1134 71 252 1003 71 114 1649 7 I 102 1290 7l 22 8 1018 44 I 557 81 27 901 8 78 1007 8 54 783 8 171 683

(75)AA~RTT

8 9

(76)ARAT¥T

9

(77)RTTAA¥

'9

(78)TCR~'GA

9

(79)TOR¥CA ~9 (80)TYGCRA .9 (81)RATAtY 11

(82)¥c&Ton i i • ., . ., (83)~,T'IRAC i3 "

483 :1 0 1 524

825~1135 !927 8 2 9 ~240 .23319 492 1195 1257 866

916 1155

555 1308 1370

284 295 604 777 !716 92 102 423 1329 1870 504 615 655 724 1367 •261 831 935 990 1114 194 302 387 878 1862 185 239 326 554 2690 303 334 350 373 2986 102 103 195 223 1472 2690 78 268 749 871 1077 1415 84 247 276 500 1420 177q 93 123 185 269 1565 2314 21 1 223 420 5110 732 2406 519 552 574 621 991:1097 1 4 3 159. 295 299 4 9 5 705 2935 102 156 239 326 875 1212 2029 45 49 392 557 7 i 4 1202 !720 11, 1 9 5 1228 866 1 0 0 2 : 1 0 0 6 1122 . i17 144 178 246 - q l q 1296 2670 1 0 4 ~175 264 702 738 ~1239 11125 1 1 4 : 1 8 7 237 .381 5497. 5 8 4 689 -966

963 48 307 10 437 24 683 8 948 18 303 i8 720 . 42 489 1148 '86 88 394 468 1:57'7 - "

107 516

285 575

291 999 297

495
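Each R/Y entry in Tables IV, V, VIII and IX stands for several concrete sequences. The minimal Python sketch below assumes nothing beyond the R/Y convention of the legends (the function names are illustrative, not taken from the paper). It expands a set such as AYATRT into its single-strand sequences and groups them into distinct double-stranded sites, pairing each non-palindromic sequence with its reverse complement:

```python
from itertools import product

# Degeneracy used for the sets of related hexanucleotides:
# R = purine (A or G), Y = pyrimidine (C or T).
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a plain A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def expand(pattern: str) -> list[str]:
    """All single-strand sequences matched by a degenerate pattern."""
    return ["".join(p) for p in product(*(DEGENERATE[b] for b in pattern))]


def duplex_sites(pattern: str) -> list[tuple[str, ...]]:
    """Group the expanded sequences into distinct double-stranded sites.

    A palindrome forms a site by itself; a non-palindromic sequence is
    paired with its reverse complement (the other strand of the same duplex).
    """
    seen, sites = set(), []
    for s in expand(pattern):
        if s in seen:
            continue
        rc = revcomp(s)
        seen.update({s, rc})
        sites.append((s,) if s == rc else tuple(sorted((s, rc))))
    return sites


print(duplex_sites("AYATRT"))
# [('ACATAT', 'ATATGT'), ('ACATGT',), ('ATATAT',)]
```

For AYATRT this gives the palindromes ACATGT and ATATAT plus the complementary pair ACATAT/ATATGT: four single-strand sequences, but three double-stranded sites.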


TABLE V DISTANCES BETWEEN ADJACENT ELEMENTS OF SETS OF RELATED HEXANUCLEOTIDE SEQUENCES IN SV40 DNA

See legend to Table III.

0istances 5226~ 5226 5226 5226 5226 5226 5226 5226 1018 4208 1018 4208 1216 4010 55 5171 1482 3744 474 4752 592 4634 55 5171 1651 3575 861 4365 1018 4208 55 1786 227 279 621 1117 275 1241 751 1077 670 1566 55 1786 648 690 12 1321 1411 1706 300 741 109 474 856 1161 77 1369 201 656 38 1039 96 271 9 9 6 334 76 151 55 218 2353 20 667 1480 30 324 2209 209 583 1752 57 134 1611 :55 88 1080 1543 20 29 1538 1963 12~166 1 3 2 0 1987 ,,

[ 'I);ARC~TI 2)GAXRI"C ( 3)GCXRGC ( 4)GGXRCC ( 5)G~CGRC, (6)R~CGG/I ( 7)RGC~C~ ( 8)¥GATCR ( 9)CAYRTG: (IO)CarA~G (11)C~GCRG (12)GCRYGC (13)GI~GC¥C (14)Gara~c (15)GTRYAC (16)RCATG~ (17)RGTAC¥ (18)TCRYGA (19)¥ATAl'R (20)A~GCRT (21)CCR~'GG (22)C¥TARG (23)GGRYCC (24)GRATYC (25)RGAIC~ (26)RTGCA~ (27)-r~RLCA (28)T¥GCRA (29)XGrACH (30)AC~RGT (31)AGXRCT (32)CRA7XG (33)CRGC~G (34)CTXBAG (35)GARZTC' (36)CCZRGG (37)QYATRC (38)TRATYA (39)~CTAGR (40)ATH~Ar

1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 ~ 2 3 3 3 3 3 3 3 3 3 3 4 q q LI ~ ~ 5 ~ 5 5 6

(41)GTEARC

6

(42)TARYIA

5

(43)TCXRGA

6

(44)TGZ'RCA

6

(45)A2'.TAR1

-7

(46)GT'LRAC

7

(47)RAATT'L

~7

Sequence

.

(51)AIYR&T

(53)~CATGR (54)AGRYCT

3385 4720 3488 3710 3398 2990 3385 3888 3893 2109 1852 818 1203 1790 785 1443 846 46 1261 380 318

2333 3825 2006

1990 3584 2656 1450 1383 1473 1117 1032

2563 3T79 2152 3502 1250

891 1067 1101 407

995 1261

670

833 1179

837 1277 1310

369 1067

386

665

690

211

468

520

(49)TRTAZA ' 7 1 : 1 6 7 -::183 207 " " 111 6 9 2'6"6 •

311

543

11324

51 1858

189.

271

51 67 177 722 1604 2166 8

50

493

756 2206 177

279

516 .

212

227

460

632

861 1352 1415

240

:71 ~ 19-

~! 135

(52)CYATRG

849

(50)ztArA:n

Distances 66O

628

(48)TnOCZA:"7 • 26 ~43 I1155 2703

#_

372:374

983

1228

0

14 33 82 227 348 396 479 1605 2042 (55)CARYTG 27 41 298 401 575 622 829 1045 1387 (56)RGGCCY 14 33 227 348 395 479 540 797 2392 (57)YAGCTR 60 81 93 273 368 371 925 1346 1709 (58)AR£A~'T IC 18 27 90 15" 214 475 713 871 12.19 1446 (59)CTRYAG IC 20 44 51 132 220 476 848 975 1152 1308 (60)ARAI~'I' 11 17 26 68 169 359 472 552 565 665 900 1433 (61)RAGCT~ 11 50 87 128 144 253 489 526 612 814 954 1169 (62)TI'R~'AA I1 9 12 19 72 99 149 545 807 901 1249 1364 (63)T~'AI'RA I1 6 6 19 90 121 140 325 549 594 1467 1909 (G4)AAYRTT12 27 45 102 121 153 171 212 391 493 632 1433 1446 (65)YAATIR 12 15 55 78 85 220 247 403 498 690 833 856 1245 (66)'ITGCAR 12 ~ 11 12 38 132 225 257 462 491 551 665 715 1667 (67)TTYRAA 13 11 46 50 71 90 315 318 364 380 411 706 728 1736 (58)RA£Ar¥ 141 25 27 144 153 160 166 184 195 196 493 803 871 . 898 911 12 18 20 54 123 (69)RTTAA'L 141 234 263 268 386 413 616 633 652 1534 (70)ARGC~'T 17 6 12 12 13 14 32 33 188 203 227 237 291 348 384 526 932 1768 (71)TYTARA 17 46 71 75 83 90 169 211 219 315 318 32S 342 364 5O9 664 706 716 (72)AAR~TI' 18 20 63 73 97 165 177 18~ 215 239 309 325 350 352 354 526 528 58~ 665
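The identification procedure amounts to comparing the number and approximate sizes of the fragments observed in φX174 RF and SV40 digests with the distances predicted in the tables. The Python sketch below is only an illustration under stated assumptions: `PREDICTED` is a hypothetical excerpt holding a few entries of Tables IV and V, and the 10% relative size tolerance is an arbitrary choice, not a criterion given by the authors.

```python
# Hypothetical excerpt of predicted fragment sizes (nucleotides) for a few
# R/Y sets: phiX174 values from Table IV, SV40 values from Table V.
PREDICTED = {
    "AYGCRT": {"phiX174": [1925, 3461], "SV40": [55, 1786, 3385]},
    "CAYRTG": {"phiX174": [5386], "SV40": [1018, 4208]},
    "CCRYGG": {"phiX174": [1444, 1619, 2323], "SV40": [227, 279, 4720]},
    "CYGCRG": {"phiX174": [894, 1969, 2523], "SV40": [1216, 4010]},
    "GCRYGC": {"phiX174": [2022, 3364], "SV40": [55, 5171]},
    "GTRYAC": {"phiX174": [1373, 4013], "SV40": [592, 4634]},
    "RGTACY": {"phiX174": [997, 4389], "SV40": [1651, 3575]},
    "YATATR": {"phiX174": [1484, 1822, 2080], "SV40": [1018, 4208]},
}


def sizes_agree(observed, predicted, rel_tol=0.1):
    """True if fragment counts match and each sorted size is within rel_tol."""
    if len(observed) != len(predicted):
        return False
    return all(
        abs(o - p) <= rel_tol * p
        for o, p in zip(sorted(observed), sorted(predicted))
    )


def candidate_sites(observed_by_dna, table=PREDICTED, rel_tol=0.1):
    """Sequence sets whose predicted digests agree with every observed digest."""
    return sorted(
        site
        for site, by_dna in table.items()
        if all(
            sizes_agree(sizes, by_dna[dna], rel_tol)
            for dna, sizes in observed_by_dna.items()
        )
    )


print(candidate_sites({"phiX174": [1930, 3450]}))
# ['AYGCRT', 'GCRYGC']
print(candidate_sites({"phiX174": [1930, 3450], "SV40": [60, 1790, 3380]}))
# ['AYGCRT']
```

With the φX174 digest alone, AYGCRT and GCRYGC both fit two fragments of roughly 1930 and 3450 nucleotides; adding the three-fragment SV40 pattern leaves only AYGCRT, illustrating why digests of both DNAs are used.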


TABLE VI LOCATIONS OF PALINDROMIC SEQUENCES IN φX174 DNA

Palindromic sequences are numbered (first column) and listed in order according to the frequencies (third column) with which they occur in φX174 am3 cs70 DNA (Sanger et al., 1977, and personal communication). The locations of palindromic sequences are expressed in nucleotide numbers starting at the PstI cleavage site and extending in the 5' to 3' direction on the viral (plus) strand. The number gives the location of the 5'-terminal nucleotide of each sequence. In pentanucleotide palindromes, X represents A or T, Z represents C or G, and N represents A, C, G or T.

Sequence (1)

ccz~

(2) ;~zcc (3) ~zcc ( 4)AATAr~ ( 5)AOaCC? ( 7 )CArat; ( 8)CC~C~ ( 9)CTAEA~ (IO)CTCGA3 (ll)C~CA~ (12)GAC~TC (13)~C~C~C (I~)Gr~CAC (15)TAA~TA (16)?~COC~ (I?)r~ATAA (18) ccxo~ (19) ~o~cc (20)AC~C3T (21)A~TAAT (22)C~TAC3 (23)CT?AAG (24)30C0CC (25)rCGC3A (26)T~OCAA (27) CT~G (28) CCa~ (29)AACGT~ (31)rcATOA (32) C~XA~ (3~) ~C~a (34) CCG~ ( 3 5 ) ~Azr~ (35)a~A~rf (37) AOXCZ

(38) CAZ?O (39) aCX~f

(40) ACZ3T (41) CGXC~ (42) GTZAC (~3) OTXAC

(q~)

TC~A

(45) CAXrO

(~7) 3AZTC

# Locations 1 2900 1 5042 1 918 1 1007 1 -4486 I 3939 1 1633 1 2559 1 ~318 1 162 I .~382 1 2782 1 5348 1 ~779 1 '4436 1 155 1 2304 2 881 3500 2 978 50~2 221 2145 71 1 4308 413 2790 ~.914 4013 lo19 2976 2260 4420 2139 4863 3 3136 3907 3o61 3 981 28oo 35oo 3 864 2510 2809 28 1292 5o22 3 33,8 1250 2271 q 1362 2022 2360 3710 38 355 2691 3606 729 1103 2800 3019 85 3qq 907 2850 ;983 2222 2324 2650 G ;530 2402 2678 3253 .~544 782 1241 2048 2177 ~791 7 ;190 1240 1289 2768 ~888 5292 B 1028 169.3 2234 2536 J874 3917 4371 3 543 719 1156 1767 5219 5224 5300 288 1224 2231 2773 q009 4642 5322 399 505" 1644 2461 4374 4885 5215 52891 I 56 110 103 ~, 163 i "r2;, 1123 ~039/~18o

Sequence

rGXCA 1 0

(49)

OOCC 11

(30)

~T&C i l

(51)

TArA i l

(52) ~AXrC 11 (33) COZC3 13 (54) TOZCA 13 (55)

COCO 14

(56)

AAZTT 10

(57) AOZCT 14 (58) aTZAT l q

3.t67 3~92 5300 3002 2204

(59) CT~AG 14 ( 6 0 ) OCXOC 14 ( 6 1 ) TCXOA lii

3060 2758 4830

(62).¢C~OT

15

(63) ATXAr 15

3789 4243 490. 5355

(64) TAXT& 15 ( 6 5 ) CA~T,'

I ~3a 1151 2~66.:3662 409.~

1~,773 42i86-4500.i~53.1 ul 3 5 0 1527, 153311835 JZ342 3307 3 7 0 0 4252 i)] 5 3 286; 352~ 392 11184 ~ 2'577 3443r'3543

# LocaUons

(48)

.5;16.7 2040 4738 1118 3583

(66) ~¢ZA,

2099 ~2589 43q 3128 4948 414 1903 q836 1307 2519 4989 204 2871 5128 1021 2595 4845 255 1963 3461 222 2107 0025 26 2027 4300 054 3196 ~511 387 1743 2701 350 2022 3700 158 1592 4215, 47 2037 4221 1190 2234 3874 63 2169 3612 206 1006

3 321 1573 3928q008 658 978 4206 4487

1693 4590 1172 4758

1711 5170 1775 4876

571 768 906 1431 2150 2795 3187 4747 1394 1805 -1931 2305 2755 3505 4319 4434 310 1897 2015 2264 3377 3731 4148 4701 1135 1413 2798 3365 5227 5311 1273 1330 3023 3092 4070 4273 752 771 2261 2360 0500 5222 397 511 2028 2839 5105 5200 556 1037 3971 4002 0509 0693 484 678 2166 2270 0621 0699 1362 1527 2040 2342 3710 0252 507 620 1639 $294 4398 0677 309 497 257q 3216 5034 5125 1200 1289 2536 2738 3917 4371 600 708 2753"2896 4018 0183 370 qqo 1080 2752

1974 2310 3472 4709 1490 1843 3200 3248 1821 1977 2860 3555 5309 1250 1967 3222 0257 5287 2101 3101 '4063 4213 3277 1060 1650 2392 2714 5360 1533 1836 2380 3307 0735 89'2 1488 3297 3798 5009 1069 2000 3413 4002 5131 1028 1695 2768 3060 4888 5292 998 1 5 7 7 3286 3567 0435 4624 639 710 2886 3083

17 TABLE VI (continued) Sequence #_ Locations (57) GCZGC IT 738 1033 1319 ~- .' 1975 2338 2596 3129:3258,,4387 5 140 53:1-0 " ( 6 8):o~ ~A:c: ,7'! 2 8 8 399 505 • 2231 2461 2773 4248 4374 4642 5289 5322 (69) ~c~c IB 156 688 772 1020 1143 1412 2063 2363 2977 5313 5348 5350 (7o) T,~C,~ 18i 966 1587 1608 2202 2569 3193 3433 3509 3767 4780 4564 5383 (?'I) Fr..~JA 18 38 47 349 1069 240O 2437 3216 3413 3606 5034 512~'5131 (72) rrXAA t8 105 321 328 960 1406 1407 2225 2328 2388 4048 4854 4975 (73) AC~T 19 259 415 787 565 1227 1624 2783 2810 3189 4872 4931 5020 (74) A~ICr 2¢ 454 556 1437 2402 2578 3101 3442 3544 3971 26 4213 4511 4549 85 246 344 (75) EArlIA 639 710 907 2752 2850 2886 ,3566 3993 4307 42 248 282 (?6) A~A? 21 1389 1684 1938 3505 3754 3775 ,4306 4420 4657 i5241 (77) C~CG 21i 543 719 1021 1413 1767 1974 2798 3365 3472 4845 5219 5224 5311 53 204 286 (78) aAN?C 21 392 1118 1184 2264 2677 2877 3543 3683 3731

Sequence 1819 1972 2858 3012 4.846 4970 1224 1644 3789 4009 4885 5215 873 927 1717 1862 3517.3760 1638 2139 3299 3400 4165 4538 355 491 2574 2691 4002 4221

511 1766 4225 5302 1530 3196 4042 4693 374 1006 3083 4310 1008 2520 3817 4990

827 2515 4590 2101 3253 4063 5277 440 1084 3392 4457 1308 2627 4192 5038

1135 2310 4709 5227

1166 2595 4830 5300

5128

(79)

C A ? 3 22 .

I

16 1255 1841 2826 4554

301 38~ 680 1037 1316 1635 1649 1714 2165 2272 2394 2740 2981 3068 3825 3953 4773

AArT 25

(84) AI~AT 29

( 8 5 ) oc~Gc 31

(86) ?I~AA 34

(87)

310 352 18~7 2015 3371 3443 4148 4701

Locations

61 996 2223 2651 5340 (81) TG~CA 23 209 1490 1963 3248 4273 (82) AGcr 24 160 1591 3310 4135 (83)

473 733 2030 2220 2588 3085

#

SO) AAXTT 22

TTAA 35

139 1604 2324 2988 5341 255 1573 2589 3461 4580 169 2444 3388 4393

694 822 957 1983 1984 2222 2325 2331 2650 3861 3900 4821 321 1693 3023 3928 5170 445 2469 3635 4751

5066 5098 5119 62 82 103

745 1604 2422 3940 63 708 1743 2714 3567 4621 158 1033 1819 2858 3297 4677 5310 105 661 1385 2097 2388 3085 4854 29 474

1269 1908 2389 3313 4459 26 (88) AAdT7 36 694 1604 2222 242a 3222 4821 5341

823 1984 2651 4270 387 998 2166 2741 3612 4624 547 1319 1972 3012 3798 4846

957 2223 2989 4437 484 1464 2169 2753 4018 4699 624 1488 1975 3129 4215 4970

327 733 1406 2220 2424 3097 4975 105 712 1293 2031 2540 4048 4854 61 822 1967 2223 2650 3861 5145

328 960 1407 2225 2728 3181 5232 245 733 1338 2220 2888 4260 4975 139 957 1983 2324

1273 1330 1711 1843 3092 3200 4008 4070

1452 2502 3839 4860 5209 140 963 2325 3861 4821 640 1577 2270 2896 4183 5364 738 1592 2338 3258 4387 5049

365 1011 1794 2266 2888 4048 5244 328 961 1407 2226 2915 4309 5023 397 996 1984 2325 2651 2839 3900 4257 5200 5287

1536 3164 3881 4979 694

997 2~31 3900 5341 678 1650 2392 3286 4435 892 1639 2596 3294 4398 5140 473 1093 2030 2328 3028 4472 442 1248

1455 2329 3085 4414 5193 511 1250 2027 2331 2988 4300 5340
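The location tables can in principle be regenerated from a sequence by a direct search. The minimal Python sketch below assumes the degeneracy codes of the legends (R/Y for purine/pyrimidine; X = A or T, Z = C or G, N = any base) and a circular genome. The function name and the toy sequences are hypothetical, and the paper's numbering origins (the PstI site for φX174, the replication origin for SV40) are not modeled.

```python
# Degeneracy codes from the table legends: R/Y (purine/pyrimidine) for the
# hexanucleotide sets, X/Z/N for the pentanucleotide palindromes.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT",
    "X": "AT", "Z": "CG", "N": "ACGT",
}


def find_sites(genome: str, pattern: str) -> list[int]:
    """1-based positions of the 5'-terminal nucleotide of each match.

    The genome is treated as circular, so a site may span the end and the
    start of the string. Only one strand is scanned; this finds every site
    when the pattern set is its own reverse complement, as is the case for
    the palindromic sequences and sets in these tables.
    """
    genome = genome.upper()
    n, k = len(genome), len(pattern)
    wrapped = genome + genome[: k - 1]  # extend by k-1 bases for circularity
    hits = []
    for i in range(n):
        window = wrapped[i : i + k]
        if all(base in CODES[code] for base, code in zip(window, pattern)):
            hits.append(i + 1)
    return hits


print(find_sites("AGGACCTTGGTCCA", "GGXCC"))  # [2, 9]
print(find_sites("CCTTTTTGGA", "GGXCC"))      # [8]: the site spans the junction
```

The pentanucleotide class GGXCC, i.e. GG(A/T)CC, matches both GGACC and GGTCC, which is why a single pattern covers both strands of each site.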

TABLE VII LOCATIONS OF PALINDROMIC SEQUENCES IN SV40 DNA

Palindromic sequences are numbered (first column) and listed in order according to the frequencies (third column) with which they occur in SV40 DNA. The locations of palindromic sequences are expressed in nucleotide numbers following the origin of DNA replication and extending in the 5' to 3' direction on the DNA strand that is complementary to early mRNA (Reddy et al., 1978). Notations are as described in the legend to Table VI.

Sequence (1) CCGG (2) TCGA (3) CGMC~ (4) ~CGZC3 (5) AGCGCT (6) CTTAAG (7) GAATTC (8) GATATC (9) GCCGGC (10) GGATCC (11) GGGCCC (12) GGTACC (13) TGATCA (14) GCGC (15) Gcz~c

# Locations

Sequence

1 1 1 1 I 1 1 1 1 1 1 1 1 2 2

(51) Tr~AA

25~ ~657 262 262

(52) GA~TC

750

1617 1700 686 263 21151 2176 212 2688 261 751 579 5159 (16)CATATG 2 3 7 2 6 4744 (17)CC~AG~ 2 996 5105 (18)CTATAG 2 655 706 (19)CTGCAG 2 1906 3i22 (20)GOArGC" 2 63 118 (21)?AATTA 2 1087 2348 (2~)TAGCTA 2 3343 3697 (23)TCAT~A 2 1830 2691 (2q)TGrACA 2 3493 4904 (25) AGXCT 3 453 2699 2767 (26)ATGCAT 3 61 116 3501 (27)CAATTG 3 589 2595 3451 (28)CAGCTG 3 188 1634 34211 (29)C.CATGG 3 251 473 5198 (30)TTATAA 3 2564 4164 4183 (31)TTGCAA 3 1191 5084 5096 (32) TAZTA q 1128 2133 3326 4279 (33)AAATT~ 4 1314 1866 2531 4541 (34)ATTAAT q 1089 1717 3260 4109 (35)GTTAAC 4 417 437 2584 3651 (36) CTXAG 5 1629 2293 2847 3208 (37) G A K T C 5 2742 3528 4486 4510 (38) GAZTC 5 1657 2766 3291 4294 (39) GGZCC 5 1425 2176 2177 2716 (40) GGXCC 6 4 7 j 506 936 1931 5036 (41) GTZAC ! 6 270 738 784 2342 4596 (q2)AAGCTT 6 964 1411 1626 3394 5089 (q3)AATATT 1 1221 1714 1741 3174 4198 5001 (44)AGGCC¢ T 278 626 659 673 1379 5108 (qS) GArC B 792 2056 2452 2689 4018 4628 4688 (q6) GTXAC 3 1.251 1395 1762 1789 2557 39:15 4191 (47) £GZCA 31 953:1054 1493 3019 3723 3989 4 4 0 6 (48) CAXTG 9 930 1!59 21472 2746 3069 3902 4783 4797 (49) TC,~GA ) 2 0 6 234 525 639 4314 4415 4994 5147 (50) TCXGA 9 206 234 525 6 3 9 4314 4416 4994 5147

(53)

l'~xca

(54)

~rAC

(55) ; ; , l C C (56) TAXrA (57~

CrAG

(58) CtZA~, (59) ~rNAC

(60) AAZTr (61) ITZA£ (62)

rAdrA

(63) A~,ZCl 4305 5053 4377 3089 3456

(64) ATKAT

3156

(55) CC~GG

3920 4045

(56) CCXGG

1152 3634

(67)

CATG

(68)

.eAT&

1954 3226 3016 2956 2956 "

(69) TG~CA

# Locatfons 592 62> 101~ 1309 i956 3026 3413 3936 502,~ 1651 2742 2766 3291 352S ~294 /1377 ~ 8 6 4510 5053 174 269 989 1977 2193 2217 2906 3586 3748 4595 213 888 1385 2~90 3494 3738 38115 3960 4046 4794 4905 475 505 936 1425 19.~1 2175 2177 2715 3089 31156 5036 864 979 1220 1713 1716 3259 3817 4054 4108 4111 4276 997 1377 1528 1597 2073 2498 3352 3566 4023 4401 5030 5106 205 640 1255 1354 1:~9 1775 2711 2164 2915 4417 4726 4776 4836 5146 270 738 . 184 1251 13~5 1762 1789 1954 2342 2557 3156 3915 11191 4596 2,'3 035 !147 601 7~)9 11146 1523 1804 2582 2886 32~0 3360 39~4 4303 5070 858 1199 1215 2386 :~£161 2687 2690 9828 2831 2834 3635 3884 3968 4160 4394 864 979 1128 1220 1713 1716 2133 3259 3325 3917 4054 4108 4111 4276 4279 ~93 148 999 1321 1614 2006 2381 2728 3366 3595 3678 3116 4388 4853 5103 5121 865 1219 2024 2563 2684 3030 3080 3236 3~09 3612 3•73 4163 4209 4528 4591 4700 95 150 276 828 882 1326 1453 2445 2815 30611 3165 3253 3926 3987 4810 5010 95 150 276 82;3 882 1326 1453 21146 2315 3064 3165 3253 3~26 3987 4810 ~010 64 119 252 302 479 1831 2460 2692 <:758 2774 3152 3500 3784 3:~70 4291 4715 5 199 656 707 7 7 1 817 849 1110 2555 3614 3758 3793 4036 4165 4184 4221 4530 11547 4558 469"9 17Zl 269 953 "989 1054 1493 1917 2.193 2217 2906 3019 3226 3585 372~ 3748 3989 4406 ' 4595


TABLE VII (continued)

Sequence # Locat ions (70) GGCC .)49 279 578 627 660 ~74 853 1153 138o 1425 177-2717 ,3089 3119 4780 i 0 9 5 i 5 2 5161 5167 93 148 453 999 1321 (71) A,},qCI 514 2006 2381 2699 2723 167 3365 3595 3678 3716 388 4853 5103 5121 205 640 1255 1354 1559 (72) CT~AO 629 1775 2293 2711 27.64 847 2913 3208 4305 4417 726 4775 4835 5146 687 818 871 1037 1197 (73) ATA1 222 1715 1742 1933 2032 277 3175 3615 3727 3757 046 4199 4548 4745 ~ 7 3 980 5002 365 562 1303 1314 1315 (74) AAXT~ 866 1867 2285 2325 2531 532 2995 3246 3381 3351 069 4092 4211 4541 4542 702 5187 496 905 1356 1434 1557 (75) CAZr( 689 1995 2036 215u 2257 506 2839 2~03 3055 3125 3139 3223 3301 3524 3549 226 4455 258 360 486 531 534 (76) GCXOq 537 ~67 582 535 551 702 720 1459 1633 1905 1154 2379 2570 3423 3426 1752 4230 5045 385 491 519 618 735 (77) AC~G' 807 904 1048 1083 1237 1644 1688 1755 1951 1972 1979 2149 2701 2737 ~840 )548 3705 3842 4150 4169 385 497 519 618 735 ( 7 8 ) ACXO 807 904 1048 1083 1237 1644 1688 1765 1951 1972 1979 2149 2701 2737 2840 3548 3705 3842 4150 4169 258 360 486 531 534 ( 7 9 ) GC~G 537 567 579 582 585 651 702 720 1459 1633 1905 21542379 2570 3423 3426 3762 4230 5045 5159 865 8 6 8 1 1 9 9 1216 1219 (80) A£NA 2024 2386 2461 2563 2684 2687 2690 2828 2831 2834 3030 3080 3236 3409 3612 3635 3713 3884 3958 4160 4163 4209 4394 4528 4591 4700 496 905 930 1159 1356 ( 8 1 ) CA~ 1484 1557 1689 1995 2036 2150 2257 2472 2505 2746

2839 2903 3016 3055 3059 3125 3139 3223 3301 3524 3549 3902- 4226 4455 4783 4797

Sequence

# Locations 3

(32)

~acr

(83)

rJ=a

(34)

AAr£

'/

(85) ,~A,¢Tr

I]

(86) TfXA~ ~,!

(87)

TTAA 4

(88) T ~ A A 4

132) 324 353 530 557 550 710 7 2 - " 965 1015 1-'68 1412 1461 1~4~1:~ 1:~5,-: t,327 1535 1753 1=i04 217o 2191 2441 2569 3~44 333 ~t 3395 5425 3693 3921 4dt~, 4232 4561 5044 50~0 51 '~'' 52 71 117 126 362 581 393 605 119 '13C 192 1291 13j:i 1431 I72 ~ 1907 2125 2185 25U4 251: 2599 3123 350-" 3710 ~ u 3 422:) 4745 47•0 47~0 490J 'IB4Z ~1961 3047 3035 3097 .3113 54 109 353 512 363 390 10~3 1104 1303 1315 1/'01 1367 2235 2325 2349 -~532 2595 2867 294~ 2995 3029 3247 3263 3332 3411 3452 3672 3952 4070 4092 4211 4311 4505 4542 4590 4702 51$3 223 365 43~ 447 562 501 799 I.~03 131'4 1315 1446 1523 18o4 1355 1367 2285 23a5 2331 2532 2382 2836 2 9 ~ 3210 3z45 j360 3381 3851 3994 4069 4692 4211 4303 4541 4542 4702 .~070 5137 491 948 1040 15.~5 1515 1576 1769 1801 2281 2282 2645 2646 2883 ~0'15 3076 3244 3390 3-191 3461 3462 3507 3509 )633 3815 3325 3826 3963 3973 4072 4~440 4520 4537 4564 4555 4915 4975 4975 5065 5056 237 4 1 3 . 433 492 949 1041 1090 1102 1535 1575 1513 1718 1729 1759 1301 2282 2339 2351 25=-5 2645 2383.2~9~ 3015 3219 3244 3261 33~4 3~391 3:;62 3508 3539 3552 3570 38~3 3325 3963 3979 4056 4072 4110 ]1440 4521 4537 4565 4517 4976 5066 491 592 622 '~43 101~ 1040 1309 1535 1515 15'15 1759 1801 1356 2281 2282 2545 2646 28,.'13 ?026 3075 3075 3244 3.190 3391 3413 3461 3462 350• :150-q :~639 3815 3325 3325 3336 3963 3978 4072 4440 4520 4537 4564 4565 4816 4975 4976 5022 5055 5035


TABLE VIII LOCATIONS OF ELEMENTS OF SETS OF RELATED HEXANUCLEOTIDE SEQUENCES IN φX174 DNA

See legend to Table VI. Each entry represents a set of three different sequences in which R and Y indicate purine and pyrimidine, respectively.

# Locations Sequence zl1 92 ( 1 )AYATRT 1683 ( 2 )CAZRTO .,'i6oi ( 3)conlrcG ( q )CRCG'L'G 1977 162 ( 5)CTCGRG ( 6)GARXTC J~176 2782 ( 7)GAZRTC 4486 ( 8)nOOCC~r 35 Oq ( 9)~TATAX ( 10)TARYTA 4436 ( 1 1 )TZCORA 37 (12) YAOCTR 4750 221 2146 (13) ACRYOT (14)AOYRCT 3696 5188 2582 4194 (15)ATiiYAT (16)AYOCBT 221 2146 3939 4841 ( 17 ) CRATXG (18) CROCXG 893 3643 3154 4318 ( 19 ) CXATRO (20)QCRYGC 3326 5348 (21)OTRYAC 7 66 .4779 (22)0¥0CRC 4779 5348 ( 23 ) RGTAC¥ 905 1902 (24)RTGCAY 3508 4779 941 4193 (25)TAYRTA (26)TCXRGA 421 3865 (27)TGYRCA 155 3509 (28)YACGTR 826 4871 (29)¥CTAGR 3906 5060 11205 4757 (30)XOOOOR (31)¥TCGAR 162 1124 (32) AGR¥CT 1800 4486 5057 (33)ATYRAT 711 3831 4308 (34)A~CORT 1934 3048 4603 (35)AYTART 372 711 4308 (36)CARYTG 2195 3402 3939 1415 2859 5182 (37)CCRYGG (38)C~GCRG 890 2859 5382 (39)GG~RCC 1019 2478 2976 (qO)ORTA~C 589 1346 2052 (ql)TRCGYA 155 4376 4837 ( 42 ) TROC~'A 17'38 3157 5011 (43)TTRYAA 2138 2304 4863 (qq)~AATTR 962 3939 4436 (45)~ATATR 1683 3505 4989 (46)XCGCGR 2260 2859 4424 (qT)~GCGCR 155 771 5347 (q8)ZGTACR 413 767 2794 ( 49 ) AC'X'RGT 272 1301 2175 4983 (50)ARTAXT 1007 3057 3652 ~723 (51)00YRC0 413 812 828 2794 (52)C¥TARO 1361 2380 2914 4413 (53)OXATRC 1029 1609 4162 4891 (54)oz'z'A]tc 28 1292 1439 5022 (55)XCOCGZ 221 770 2146 5348 (S6)YTA1'dLR .1393 1804 2304 4318

# • Locations Sequence ( 57 ) c'rYR AO 5 162 987 2914 (58)GYCGRC 5 )-251 26111 3470 (59) c1'nxAo 6 1191 3061 4318 5382 ( 6 0 ) TYA TRA $ 388 1254 2170 4619 (61)TYTARA 6 277 327 851 4355 (62) ARCG~T 7 864 1641 1910 4525 4809 ( 63)ARGCXT 7 454 556 2101 4486 4511 (64)CRTAYG 7 413 1028 1683 q161 q665 (65)GC~'RGC 7 122 1112 1943 3459 4394 (66)ORCOXC 7 717 1019 1133 3363 5225 (67)RAATTY 7 139 693 1983 2650 5340 (68) T I'X RAA 7 327 661 1011 2424 2727 (69)AAR~TT 8 "511 1983 2027 2427 2650 5340 (70)RACOT~ 8 786 864 1765 2809 4224 5301 (71)RAOCT~ 8 168 q4q 1451 3367 3634 413~ (72) ROCOC'l 8 687 872 926 1411 2975 3759 (73)TRATYA 8 1456 1627' 2167 3254 4016 4436 (74)¥TOCAR 8 965 1586 2138 3766 4863 5382 (75)AAYRTT 9 657 705 864 1809 2514 2809 (76)ARATYr 9 1963 2222 2324 3962 5174 5330 (77)RT.rAA¥ 9 28 711 1268 2539 4259 4308 (78)TCR'LGA 9 388 1254 2260 3281 4229 4424 (79)TGRXCA 9 949 1066 1312 2173 2351-3647 (80)T'LGCRA 9 179 899 2138 3686 4424 4688 (81)RATArx 11 41 1007 1388 3774 3816 4305 5240 (82)~CATOR 11 15 300 388 1648 2164 2271 3824 (83)GT~'RAC 13 28 319 654 2349 2511 2590 4200 4812 5022 m

3631 4413 4710 5298 4801 5293 2271 2304 1406 2985 2514 2809 3971 4063 2070 2794 2946 3198 2782 2976 2222 2324 1384 1406 2222 2324 2514 2782 1535 3309 1019 111i2 2390 2601 2201 3192 1007 3108 2650 5340 1292 5022 2271 4652 1615 3655 2242 4863 1937 4419

1502 3087 1337 22?9 1759 2260 2626 4656

679 1254 2739 2825 951 1292 3360 3705


TABLE IX

LOCATIONS OF ELEMENTS OF SETS OF RELATED HEXANUCLEOTIDE SEQUENCES IN SV40 DNA

See legends to Tables VII and VIII.

Sequence  #  Locations
(1) ARCGYT  1  750
(2) GAYRTC  1  686
(3) GCYRGC  1  263
(4) GGYRCC  1  212
(5) GYCGRC  1  263
(6) RCCGGY  1  263
(7) RGCGCY  1  750
(8) YGATCR  1  2688
(9) CAYRTG  2  3726 4744
(10) CRTAYG  2  3726 4744
(11) CYGCRG  2  1906 3122
(12) GCRYGC  2  63 118
(13) GRGCYC  2  694 2176
(14) GRTAYC  2  212 686
(15) GTRYAC  2  3494 4086
(16) RCATGY  2  63 118
(17) RGTACY  2  212 3787
(18) TCRYGA  2  1830 2691
(19) YATATR  2  3726 4744
(20) AYGCRT  3  61 116 3501
(21) CCRYGG  3  251 478 5198
(22) CYTARG  3  996 1617 5105
(23) GGRYCC  3  935 2176 2451
(24) GRATYC  3  1700 2451 3528
(25) RGATCY  3  2451 4017 4687
(26) RTGCAY  3  61 116 3501
(27) TGRYCA  3  1998 2688 3336
(28) TYGCRA  3  1191 5084 5096
(29) YGTACR  3  1384 3493 4904
(30) ACYRGT  4  441 741 1482
(31) AGYRCT  4  750 1568 2042
(32) CRATYG  4  589 2595 3451
(33) CRGCYG  4  188 265 1634
(34) CTYRAG  4  760 961 1617
(35) GARYTC  4  1700 3143 3181
(36) CCYRGG  5  150 996 2446
(37) GYATRC  5  63 72 118
(38) TRATYA  5  1087 2348 2682
(39) YCTAGR  5  996 1376 1527
(40) ATRYAT  6  61 116 2469 4037
(41) GYTARC  6  417 437 1917 4752
(42) TARYTA  6  1087 2348 3343

(43)ZCZeOA ( 411)TOZI~CA (115) A]Fr ART (II6)GIYRAC

Secjuence (51)ATYRAT (52)CZATRG (53)YCAT(}R (54)AGR¥CT (55)CARYTG (56)RG~CC¥ (57)¥AGCTR (58) ARfA~'T (59) CTRYA.G (60)ARA?¥T (61)RAGCT~ (62)rrRYAA (63)TYA I'RA 3334 2151 4654 3424 2402 4270 5009 127 2688 5029 3501

(64)AA~R?T (65)¥AATTR 5105 3906 4161 5105 3819

(67)TTZRAA

2584 3651 (68)RATATY 3667 3697

6 41o4

1662 2245 2454 3287 3957 5136 • 6 1289 2126 2103 3493 4770 4904 • 51 106 1089 1717 3"260

7

(66)YTGCAR

(69)RTrAAY ( 70)ARGC'fT

09 518 9

388 417 437 1975 2215 ,25843651 7 ,1302 13111 1700 1866 2531

7

(71)T¥T&RA

385111541

(11e)T.c. (q9)TRI'AZA (50)ZTATAii

7 11711 3554 7 847 • 4697 7 655 4183

1694 2849 2875-3~343 3697 31193 4O36 4219 J1530 49011 7 0 6 2564 3792 11164 11557

(7'2)AAR~TT

# Locations 8 1089 3260 8 39 2872 8 251 3151 9 278 1379 9 188 2595 9 278 1379 9 188 3343 10 1221 3174 i0 655 1926 11 774 3096 4567 11 352 1498 '5089 I1 1119 3916 5096 11 866 2564 4183 12 1221 3488 4369 12 53 1103 4504 12 361 1906 5084 13 1575 3390 4564 14 686 t714 3174 14 417 2350 3651 17 278 1152 3394 5108 17 348 2645 3507 4975 18 447 1626 3210 4541

1224 3449 251 4476 301 3783 626 2984 589 3424 626 2176 556 3424 1239 4045 706 1970 842 3113

1717 4109 478 5198 478 5198 659 5026 1634 3451 659 2716 649 3697 1714 4198 838 3122 1314 3282

2473 2744 655

706

1830 2691 673 5108 1575 4838 673 5108 709 5043 1741 4911 1686 3598 1866 3641

1152 1973 1152 1634 2960 5001 1906 4573 2531 4541

964 1014 1267 1411 1626 2440 3394 3920 1191 2555 2564 3109 4065 4164 4183 5084 1415 1505 1830 2424 2685 2691 4158 4164 1714 1741 3609 3654 5001 108 511 2348 2595 4589 586 718 2571 3122 5096 2281 2645 3461 3507 4975 5065 870 1036 1741 1937 4045 4198 437 1089 2584 2997 3669 4055 626 659 1379 1411 3920 4852 5120 1064 1406 2856 3075 3825 4489 5-06.5 799 964 18'03 1866 3394 3920 5069 5089

3174 3386 4045 4198 589 1087 3451 3671 729 1"191 4789 5046 3025 3075 3825 3836 1196 2081 5001 1101 3260 4109 673 1614 5089

1221

2276 1717 3383 964 1626 5102

1575 2281 3390 3461 4564 4647 1314 11111 2531 2885 3993 4302


merits. If the number and sizes of the restriction fragments and the locations of the cleavage sites are determined with reasonable accuracy, it is very unlikely that this method would lead to an incorrect recognition sequence for a restriction endonuclease. The fact that the recognition sequences for a few enzymes may not be determinable from the tables we have presented is only a minor disadvantage, since so little time, material and effort is at stake. Furthermore, as new sequenced DNAs become available, the tables can be expanded to include those sequences that are not represented in φX174 or SV40 DNAs.

ACKNOWLEDGEMENTS

This work was supported by a program-project grant from the National Cancer Institute (CA-07175). We are greatly obliged to Dr. F. Sanger of the M.R.C., Cambridge, England, and to Dr. S.M. Weissman of Yale University for supplying us with their unpublished data on the sequences of φX174 and SV40 DNAs, respectively. The authors are grateful to Dr. Andrew Siegel, Dept. of Statistics, and Drs. Janet Mertz and Bill Sugden of the McArdle Laboratory for helpful discussions. We would also like to thank Ms. Helen Pong for valuable computing assistance.

REFERENCES

Kelly, Jr., T.J. and Smith, H.O., A restriction enzyme from Haemophilus influenzae. II. Base sequence of the recognition site, J. Mol. Biol., 51 (1970) 393--409.

Kleid, D., Humayun, Z., Jeffrey, A. and Ptashne, M., Novel properties of a restriction endonuclease isolated from Haemophilus parahaemolyticus, Proc. Natl. Acad. Sci. USA, 73 (1976) 293--297.

Maxam, A.M. and Gilbert, W., A new method for sequencing DNA, Proc. Natl. Acad. Sci. USA, 74 (1977) 560--564.

Murray, K., Hughes, S.G., Brown, J.S. and Bruce, S.A., Isolation and characterization of two sequence-specific endonucleases from Anabaena variabilis, Biochem. J., 159 (1976) 317--322.

Nathans, D. and Smith, H.O., Restriction endonucleases in the analysis and restructuring of DNA molecules, Ann. Rev. Biochem., 44 (1975) 273--293.

Reddy, V.B., Thimmappaya, B., Dhar, R., Subramanian, K.N., Zain, B.S., Pan, J., Ghosh, P.K., Celma, M.L. and Weissman, S.M., The genome of simian virus 40, Science, 200 (1978) 494--502.

Roberts, R.J., Restriction endonucleases, CRC Crit. Rev. Biochem., 4 (1976) 123.

Roberts, R.J., Restriction endonucleases, in Bukhari, A.I., Shapiro, J.A. and Adhya, S.L. (Eds.), DNA Insertion Elements, Plasmids, and Episomes, Cold Spring Harbor, NY, 1977, pp. 757--768.

Rosenvold, E.C. and Honigman, A., Mapping of AvaI and XmaI cleavage sites in bacteriophage λ DNA including a new technique of DNA digestion in agarose gels, Gene, 2 (1977) 273--288.

Sanger, F., Air, G.M., Barrell, B.G., Brown, N.L., Coulson, A.R., Fiddes, J.C., Hutchison III, C.A., Slocombe, P.M. and Smith, M., Nucleotide sequence of bacteriophage φX174 DNA, Nature, 265 (1977) 687--695.

Communicated by Z. Hradecna.


Note added in proofs:

Sequence TTTAAA, which was inadvertently omitted from Tables II, III, VI and VII, generates 2 fragments in φX174 DNA [distances: 1079, 4307 (Table II); locations: 327, 1406 (Table VI)] and 11 fragments in SV40 DNA [distances: 46, 71, 90, 315, 318, 364, 411, 430, 706, 739, 1736 (Table III); locations: 1575, 2281, 2645, 3075, 3390, 3461, 3507, 3825, 4564, 4975, 5065 (Table VII)]. Sequence YTTAAR should be added to Table IV (distances: 366, 561, 624, 739, 884, 1079, 1133), Table V (distances: 28, 42, 46, 71, 90, 147, 159, 168, 252, 318, 364, 430, 535, 664, 711, 1201), Table VIII (locations: 327, 1406, 2030, 2914, 4047, 4413, 4974) and Table IX (locations: 1040, 1575, 1617, 2281, 2645, 3075, 3243, 3390, 3461, 3507, 3825, 4536, 4564, 4816, 4975, 5065). We are greatly obliged to Dr. R. Blakesley, who drew our attention to these omissions and who prepared an analogous computer program as described in Focus, Vol. 1, No. 4 (1978) p. 1.
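The distance lists follow from the location lists by simple arithmetic on a circular molecule. The minimal Python sketch below (the function name is ours, not the authors') takes genome lengths of 5386 nucleotides for φX174 and 5226 for SV40, the totals to which each row of Tables IV and V sums, and reproduces the YTTAAR distances given in the note from the listed locations:

```python
def adjacent_distances(locations, genome_length):
    """Sorted distances between neighbouring sites on a circular DNA.

    k sites give k distances (the k fragments of a complete digest),
    and the distances always add up to the genome length.
    """
    sites = sorted(locations)
    gaps = [b - a for a, b in zip(sites, sites[1:])]
    gaps.append(genome_length - sites[-1] + sites[0])  # gap across the origin
    return sorted(gaps)


PHIX174_LENGTH = 5386
SV40_LENGTH = 5226

# YTTAAR locations listed in the note added in proofs
phix = [327, 1406, 2030, 2914, 4047, 4413, 4974]
sv40 = [1040, 1575, 1617, 2281, 2645, 3075, 3243, 3390, 3461,
        3507, 3825, 4536, 4564, 4816, 4975, 5065]

print(adjacent_distances(phix, PHIX174_LENGTH))
# [366, 561, 624, 739, 884, 1079, 1133]
print(adjacent_distances(sv40, SV40_LENGTH))
# [28, 42, 46, 71, 90, 147, 159, 168, 252, 318, 364, 430, 535, 664, 711, 1201]
```

A single site yields one distance equal to the whole genome, which is why every once-occurring entry of Table IV reads 5386.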