Fractionation and sequences of the large pyrimidine oligonucleotides from bacteriophage fd DNA


J. Mol. Biol. (1972) 64, 87-102

V. LING†

Medical Research Council Laboratory of Molecular Biology Hills Road, Cambridge, England

(Received 1 July 1971)

The method of chemical depurination is specific and quantitative for releasing pyrimidine oligonucleotides from DNA and should prove useful in a general procedure for sequencing DNA. In this report, the fractionation of 32P-labelled pyrimidine oligonucleotides from depurinated fd DNA on a two-dimensional thin-layer system is described. This system resolves the depurination products according to size and base composition. In fd DNA there are eight large polypyrimidine nucleotides of nine base residues or greater, and most of these nucleotides are present in single molar yield. A technique employing partial digestions with spleen and snake venom phosphodiesterase, and fractionation of the partially degraded products with the two-dimensional thin-layer system, has been developed for sequencing these large pyrimidine oligonucleotides. The largest pyrimidine oligonucleotide in fd DNA contains 20 residues and has a sequence of 5'-C-T-T-T-C-T-T-C-C-C-T-T-C-C-T-T-T-C-T-C-3'. None of the oligonucleotides sequenced appears to be derived from the coat protein gene.

† Present address: Division of Biological Research, The Ontario Cancer Institute, and The Department of Medical Biophysics, University of Toronto, Toronto, Canada.

1. Introduction

Techniques worked out in this laboratory by Sanger and co-workers (Sanger, Brownlee & Barrell, 1965; Brownlee, Sanger & Barrell, 1968; Adams, Jeppesen, Sanger & Barrell, 1969; Brownlee & Sanger, 1969; Jeppesen, 1971; Barrell, 1971) have established a relatively rapid method for sequencing RNA molecules; however, as yet there is no general procedure for sequencing DNA. The need to sequence the DNA of an organism is based on the prediction that many important regions, possibly regulatory signals in the DNA, may not be transcribed into RNA and hence the DNA must be examined directly. While, at present, the known enzymes for cleaving DNA are rather unspecific and unsuitable for a general sequencing procedure, the method of chemical depurination by a solution of diphenylamine in formic acid (Burton & Petersen, 1960) has proved to be specific and quantitative for releasing pyrimidine oligonucleotides from DNA (Hall & Sinsheimer, 1963; Černý, Černá & Spencer, 1969; Petersen & Reeves, 1969; Southern, 1970). This method of cleaving DNA could have an important role in a general procedure for sequencing DNA. The potential usefulness of this method has been demonstrated recently by Southern (1970), who was able to deduce a basic repeating sequence of six nucleotides in guinea-pig α-satellite DNA by depurinating the separated strands of DNA and analysing their pyrimidine products.

The depurination products of the DNA of the small bacteriophages φX174 (Hall & Sinsheimer, 1963), S13 (Černý et al., 1969) and f1 (Petersen & Reeves, 1969) have previously been characterized by ion-exchange column chromatography. Much information has been obtained on the relative amounts of the different sizes of polypyrimidine tracts in the DNA of these phages. It was found that, as expected, the small polypyrimidine tracts occurred in the DNA in many molar concentrations while the large polypyrimidine tracts, e.g. ten bases or longer, were present in limited or single molar yield. These large polypyrimidine tracts are of interest since they are unique in the phage and may have special functions.

In this report, I have applied the two-dimensional thin-layer fractionation procedure of Brownlee & Sanger (1969) to separate the depurination products of 32P-labelled fd DNA. This procedure is more rapid and less laborious than column chromatography and gives good separation of pyrimidine oligonucleotides of all sizes up to 20 bases in length. Also, a technique for sequencing large pyrimidine oligonucleotides will be described and the sequences of the large pyrimidine oligonucleotides (9 bases or longer) of fd DNA will be presented.

fd is a well characterized filamentous bacteriophage closely related to f1 and M13. Its DNA is relatively small, being a single-stranded circle with a molecular weight of about 2 × 10^6 daltons (Marvin & Hoffmann-Berling, 1963) and containing probably only eight genes (Marvin & Hohn, 1969). One of the gene products, the major coat protein, contains 49 amino acids and its complete amino-acid sequence has been determined (Asbeck, Beyreuther, Kohler, von Wettstein & Braunitzer, 1969).

2. Materials and Methods

(a) Chemicals and enzymes

All chemicals used were of reagent grade. The enzymes spleen phosphodiesterase, snake venom phosphodiesterase and bacterial alkaline phosphatase (electrophoretically pure) were obtained from Worthington Biochemical Corp. (Freehold, N.J., U.S.A.). The snake venom phosphodiesterase was further purified by passing a 5-mg solution through a Dowex 50 column as described by Laskowski (1966), and the purified enzyme was stored in 0.1 M-Tris·HCl (pH 8.9)-0.01 M-MgCl2 at a concentration of 1 mg protein/ml. The bacterial alkaline phosphatase was stored at 4°C as an ammonium sulphate precipitate. Before use, the required amount of the enzyme was collected by centrifugation (5000 g for 10 min) and the enzyme precipitate redissolved in the same buffer as the snake venom phosphodiesterase solution.

(b) Preparation of 32P-labelled fd DNA

Bacteriophage fd DNA uniformly labelled with [32P]phosphate was prepared in a manner similar to that described by Bretscher (1969) for preparing unlabelled fd DNA. Escherichia coli strains Dl or K38 were grown at 37°C with gentle aeration in 400 ml. of low phosphate medium (containing in 1 l. of solution, 1.5 g KCl, 5.0 g NaCl, 1.0 g NH4Cl, 2.0 g vitamin-free Casamino acids, 2.0 g Bacto-peptone, 4 g glucose and 12 g Tris; the pH was adjusted to 7.4). When the culture gave an absorbance at 560 nm of 0.3, the bacteria were infected with fd phage at a multiplicity of 20 phage/cell in the presence of 2 mM-&Cl,. Then 60 mCi of carrier-free [32P]orthophosphate (Amersham, England) was added and the incubation allowed to continue for 5 hr. The cells were then removed by centrifugation (10,000 g for 16 min) and polyethylene glycol (mol. wt 6000, Koch-Light Laboratories, England) was dissolved in the supernatant to a final concentration of 5%. This solution was kept in the cold room (4°C) overnight. The phage are quantitatively precipitated by this treatment and were collected by centrifugation (12,000 g for 20 min). The phage precipitate was resuspended in 10 ml. of M9 buffer (Levine & Borthwick, 1963) and


PLATE I. (a) Radioautograph of the two-dimensional fractionation of depurinated 32P-labelled fd DNA. As described in Materials and Methods, the first dimension was electrophoresis at pH 3.5 on cellulose acetate and the second dimension was chromatography on a DEAE-cellulose thin-layer plate with a 3% solution of partially hydrolysed yeast RNA. (b) A diagram of (a). Nucleotides are numbered 1 to 43. Pi is free phosphate released by the depurination reaction. B denotes the position of the blue dye marker (xylene cyanol FF).

PLATE II. Radioautograph of the two-dimensional fractionation of partial products of nucleotide 42 resulting from digestion with (a) snake venom phosphodiesterase and (b) spleen phosphodiesterase. The conditions of digestion are given in Materials and Methods. Fractionation was as in Plate I except that electrophoresis in the first dimension was performed at 6000 V for 25 min. B denotes the position of the blue dye marker.

PLATE III. Radioautograph of the two-dimensional fractionation of partial products of nucleotide [number illegible] resulting from digestion with (a) snake venom phosphodiesterase and (b) spleen phosphodiesterase. Conditions are as in Plate II.

PLATE V. Estimation of the size of nucleotide 43. Autoradiograph showing the result of chromatography of various nucleotides on a DEAE-cellulose thin layer with a 3% RNA solution. Nucleotides f, g, h, i and j obtained from Plate IV(a) are products resulting from the partial digestion of nucleotide 43 with snake venom phosphodiesterase. Nucleotides M1 to M4 are dephosphorylated marker nucleotides obtained from depurinated fd DNA and isolated as in Plate I. M1 has a composition of (C,,T,), M2: (C,,T,), M3: (Cs,T,) and M4: (C,,T,). Nucleotides M6 to M9 are similar marker nucleotides obtained from f1 DNA. Marker nucleotide M6 has a composition of (C,,T,), M7: (C,,T,), M8: (C,,Ts) and M9: (C,,T,). The size of each marker nucleotide is indicated.


centrifuged at 12,000 g for 20 min. The supernatant was saved and the pellet re-extracted with another 10 ml. of M9 buffer as before. The M9 buffer extracts were combined and the phage sedimented by centrifugation at 100,000 g for 2.5 hr, after which the phage pellet was dissolved in 4 ml. of SSC (standard saline citrate containing 0.15 M-NaCl-0.015 M-sodium citrate, pH 7.0). To this phage solution was added 18 g CsCl and the phage banded by centrifugation in a cellulose nitrate tube in a swinging-bucket SW50 rotor (Spinco) at 38,000 rev/min for 20 hr at 5°C. The phage band, which was clearly visible, formed about midway in the tube and was collected by puncturing the bottom of the tube. The phage collected in about 0.5 ml. of CsCl solution was diluted to 10 ml. with SSC and sedimented by centrifugation at 100,000 g for 2.5 hr. The phage pellet was dissolved in 1 ml. of 0.1 M-Tris·HCl buffer, pH 8.9, containing 6 mM-EDTA. This was extracted at 70°C for 5 min with an equal volume of water-saturated phenol and allowed to cool to room temperature. After centrifugation to separate the phases, the aqueous layer was removed and the phenol layer re-extracted with another 1 ml. of buffer at room temperature. Both aqueous layers were combined and made 2% with respect to sodium acetate, and the DNA was precipitated with 2.5 vol. of absolute ethanol at -20°C. The precipitate was collected by centrifugation, washed once with cold ethanol and dried in a desiccator under vacuum. The dried precipitate was dissolved in water and stored frozen at -20°C. This procedure usually yielded 1 to 2 mg of DNA with a specific activity of about 1 × 10^6 cts/min/µg.

(c) Depurination of fd DNA

The depurination of fd DNA was performed as described by Burton (1967). About 100 µg of 32P-labelled fd DNA were dissolved in 0.1 ml. water in a small siliconized tube and 0.2 ml. of 3% diphenylamine in 98% formic acid was added. The tube was sealed with Parafilm and incubated at 37°C for 16 hr, after which 0.3 ml. of water was added, and this solution was then extracted 4 times, each time with a fresh 2 ml. of ice-cold ether. The depurinated DNA solution was then dried in a desiccator under vacuum.

(d) Fractionation of depurinated fd DNA

The product of the depurinated fd DNA was fractionated by the two-dimensional thin-layer system of Brownlee & Sanger (1969). The DNA was dissolved in about 6 µl. of electrophoresis buffer (0.9 r+pyridine acetate, pH 3.5, containing 7 M-urea and 5 mM-EDTA) and applied as a small spot near one end of a cellulose acetate strip (Schleicher & Schüll, W. Germany) previously wetted with electrophoresis buffer. The strip was blotted and subjected to electrophoresis at 120 V/cm for 16 min in pyridine acetate, pH 3.5, containing 7 M-urea. The separated nucleotides were then blotted on to a thin-layer plate coated with a mixture of DEAE-cellulose and cellulose (Macherey, Nagel & Co., W. Germany) at a ratio of 1:7.5, the plate was allowed to equilibrate in a 60°C oven for 20 min, the region of the origin was sprayed with distilled water, and ascending chromatography was performed at 60°C by placing the plate in a pre-equilibrated sealed glass tank containing a 3% solution of partially hydrolysed yeast RNA in 7 M-urea. The 3% RNA solution was prepared in the same manner as "homomixture c" of Brownlee & Sanger (1969) except that hydrolysis with 1 N-KOH was performed for 30 min at room temperature before neutralizing to pH 7.5 with HCl. The separated nucleotides were visualized by radioautography (Sanger et al., 1965). Details of the micromethods used for routine fractionation of radioactive nucleotides by high-voltage electrophoresis on paper, and for eluting nucleotides from DEAE-paper or thin-layer plate with triethylamine carbonate, pH 10.0, have all been described previously (Sanger et al., 1965; Brownlee & Sanger, 1969; Barrell, 1971).
(e) Partial digestions of oligonucleotides with spleen and snake venom phosphodiesterase

Since the spleen and snake venom enzymes require a non-phosphorylated terminus in an oligonucleotide as substrate (Razzell & Khorana, 1959, 1961), it was convenient to isolate the large pyrimidine oligonucleotides of fd DNA in the dephosphorylated form. 100 µg of depurinated 32P-labelled fd DNA were treated with 20 µg bacterial alkaline phosphatase at 37°C for 3 hr in 0.1 ml. buffer. At the end of the incubation, the digest was extracted with an equal volume of phenol by vortexing vigorously in a small siliconized tube at room temperature; the aqueous layer was recovered after centrifugation, extracted twice with ether to remove phenol, and dried under vacuum. The dephosphorylated nucleotides were fractionated on the thin-layer system as described above (see Plate I). The pattern produced by the dephosphorylated pyrimidine nucleotides was similar to Plate I except that all the nucleotides migrated further in the chromatography step. The large pyrimidine nucleotides were easily recognized, and the dephosphorylated nucleotides corresponding to nucleotides 36 to 43 of Plate I were isolated from the thin-layer plate, eluted with triethylamine carbonate into small siliconized tubes and dried in a desiccator 3 times, each time re-wetting with water to effect the complete removal of the triethylamine carbonate.

Since the chromatography step of the fractionation procedure was performed with a 3% solution of RNA, each isolated labelled pyrimidine nucleotide contained a substantial amount of unlabelled RNA that was isolated along with it. Generally, for standardizing conditions of enzyme digestion, an isolated oligonucleotide which initially separated as a radioactive spot of 0.5 cm diameter on the thin-layer system was considered to contain about 100 µg of carrier RNA.

Partial digestion of an oligonucleotide with either spleen or snake venom phosphodiesterase was performed in 5 µl. of the appropriate buffer (0.1 M-ammonium acetate (pH 5.7), 0.002 M-EDTA, 0.05% Tween 80 for the spleen enzyme, and 0.1 M-Tris·HCl (pH 8.9), 0.01 M-MgCl2 for the snake venom enzyme) with an enzyme to substrate weight ratio of 1:10. Thus a usual digestion mixture would contain half of the material from an isolated pyrimidine oligonucleotide (containing about 50 µg carrier RNA) and 5 µg of the phosphodiesterase. The digestion mixture was taken up into the tip of a drawn-out capillary tube and incubated at 37°C for 30 min for the spleen and 20 min for the venom enzyme. At the end of the incubation, the digest was immediately spotted on to a cellulose acetate strip pre-wetted with buffer, in preparation for fractionation by electrophoresis. The action of the enzyme was effectively terminated by this treatment since the electrophoresis buffer was at pH 3.5 and contained 7 M-urea. Once the digest solution had soaked into the cellulose acetate, an amount of undigested oligonucleotide equal to about 1/10th or 1/20th of the total radioactivity in the digest was applied on top of the digest. This ensured that some undegraded starting material would be present to serve as a reference point in the fractionation of the digested nucleotides. Fractionation of products resulting from partial digestions with the phosphodiesterase enzymes was performed on the two-dimensional thin-layer system as already described, except that electrophoresis in the first dimension was performed at 120 V/cm for 25 min.

3. Results

(a) Fractionation of fd pyrimidine oligonucleotides

Plate I shows the radioautograph of labelled depurinated fd DNA separated on the two-dimensional thin-layer system. The pyrimidine oligonucleotides have the general formula Pyr(n)p(n+1) (Burton, 1967) and are separated into a very regular pattern of 43 spots. Each nucleotide spot was cut out and eluted with triethylamine carbonate (Brownlee & Sanger, 1969) and analysed for size and composition as described below.

(i) Estimation of size

It has been noted previously in the fractionation of T1-oligonucleotides of R17 RNA with this thin-layer system that the nucleotides were separated largely according to size in the chromatography step, except in cases when the oligonucleotide contained a high proportion of purines (Jeppesen, 1971; G. G. Brownlee, personal communication); in these cases the oligonucleotide migrated much more slowly than expected. Since the depurination product contains only pyrimidines, this anomaly was not present and the pyrimidine oligonucleotides may be expected to fractionate primarily according to size.
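The general formula Pyr(n)p(n+1) quoted above can be illustrated with a short in-silico sketch. This is only an illustration under stated assumptions, not the procedure used in this work: the function names are hypothetical, the input is treated as a linear strand in which every pyrimidine run is bounded by purines (fd DNA is circular), and the example sequence is invented.

```python
import re
from collections import Counter


def depurination_products(seq: str) -> list[str]:
    """Return the pyrimidine tracts left when every purine is removed.

    Assumption: `seq` is a linear single strand written 5'->3' with the
    letters A, C, G and T, and each C/T run is flanked by purines.
    """
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G and T")
    # Each maximal run of C/T corresponds to one released oligonucleotide.
    return re.findall(r"[CT]+", seq)


def phosphate_count(tract: str) -> int:
    """A tract of n pyrimidines carries n - 1 internal + 2 terminal phosphates."""
    return len(tract) + 1


def composition(tract: str) -> tuple[int, int]:
    """Return (number of C, number of T) for a pyrimidine tract."""
    counts = Counter(tract)
    return counts["C"], counts["T"]


if __name__ == "__main__":
    # Invented example sequence, used only to show the bookkeeping.
    for tract in depurination_products("GCTTAGTCCAG"):
        print(tract, phosphate_count(tract), composition(tract))
```

Each released tract of n residues is reported with n + 1 phosphates, which is the property used for the size estimate that follows.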


Since the Burton depurination nucleotides contain both the 3' and 5' phosphate, it was possible to calculate the size of an oligonucleotide by comparing the amount of radioactivity in the terminal phosphates with that in the internal phosphates. For example, a depurination oligonucleotide of n bases will contain n + 1 phosphates, of which two phosphates will be susceptible to bacterial alkaline phosphatase. The phosphatase-treated oligonucleotide will contain n - 1 internal phosphates. Therefore, the size of the oligonucleotide may be calculated as:

$$ n = \frac{\text{radioactivity of phosphatase-treated oligonucleotide}}{\text{radioactivity of phosphatase-released phosphates}} \times 2 + 1. $$
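As a worked illustration of this formula, the following minimal sketch computes n from two scintillation counts. The function name and the example counts are assumptions made for illustration; they are not measurements from this study.

```python
def oligonucleotide_size(treated_counts: float, released_counts: float) -> float:
    """Estimate the number of bases n in a depurination product.

    treated_counts:  radioactivity remaining in the phosphatase-treated
                     oligonucleotide (its n - 1 internal phosphates).
    released_counts: radioactivity of the two terminal phosphates released
                     by alkaline phosphatase.
    """
    if treated_counts < 0 or released_counts <= 0:
        raise ValueError("counts must be non-negative and released_counts > 0")
    # (n - 1) / 2 = treated / released, hence n = treated / released * 2 + 1.
    return treated_counts / released_counts * 2 + 1


if __name__ == "__main__":
    # Hypothetical counts: equal label in internal and released phosphates
    # corresponds to two internal phosphates, i.e. a trinucleotide.
    print(oligonucleotide_size(2000, 2000))
    # No internal phosphate at all corresponds to a mononucleoside diphosphate.
    print(oligonucleotide_size(0, 1500))
```

Because the estimate depends on a ratio of counts, any loss of internal phosphate during isolation lowers the calculated size.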

The oligonucleotides obtained from Plate I were incubated with 10 µg of alkaline phosphatase in 10 µl. of phosphatase buffer (0.1 M-Tris·HCl (pH 8.9), 0.01 M-MgCl2) at 37°C for 2 hr in sealed capillary tubes. The digests were then applied to DEAE-paper and subjected to electrophoresis at pH 1.9 at 2000 V for 2 hr. The separated products were visualized by radioautography, and the radioactivity of the released phosphates and internal phosphates from each nucleotide was determined by cutting out the appropriate areas from the paper and counting in a toluene-based scintillator fluid in a Unilux II scintillation counter (Nuclear-Chicago). The size of each oligonucleotide was calculated as described above and is shown in Table 1.

Nucleotides 1 and 2 (Plate I) yielded only labelled phosphate when treated with alkaline phosphatase and must be mononucleoside diphosphates. In a separate experiment, fd DNA labelled with [3H]thymidine was prepared, depurinated along with 32P-labelled fd DNA, and fractionated as in Plate I. Nucleotide 1 contained both 32P and 3H label, while nucleotide 2 contained only 32P. This confirms that nucleotide 1 is pTp and nucleotide 2 is pCp.

If the chromatography step separates the nucleotides according to size, then it may be predicted that a descending row of oligonucleotides, e.g. nucleotides 2, 4, 7, 11, 16, etc. of Plate I, would progressively increase in size by one base each. Since nucleotide 2 (pCp) contains one base, nucleotides 4, 7, 11 and 16 may be predicted to be two, three, four and five bases long, respectively. From Table 1, it is seen that the calculated sizes of nucleotides 4, 7, 11 and 16, using the formula described above, were respectively 2.0, 3.0, 3.9 and 4.6 bases in length, agreeing well with those predicted and confirming that the pyrimidine oligonucleotides are separated according to size in the chromatography step. It may be noted from Table 1 that as the oligonucleotides got progressively larger, the phosphatase method of size calculation tended to underestimate the size; this may be due to some breakdown of the larger oligonucleotides during isolation. In general, it appears that the size of an oligonucleotide may be accurately predicted by its position relative to other nucleotides on the thin-layer plate.

Nucleotide 43 remained near the origin during the chromatography step and its position was too isolated on the thin-layer system for its size to be deduced accurately. Petersen & Reeves (1969) had reported previously the presence of a large pyrimidine tract about 19 bases long in the DNA of the closely related phage f1. When this large pyrimidine tract was isolated from f1 DNA and co-fractionated on the thin-layer system with nucleotide 43, both nucleotides occupied the same position on the thin-layer plate. Hence, nucleotide 43 is probably close to 19 bases long. A more definitive method for estimating the size of nucleotide 43 will be described in the section dealing with the sequencing of this nucleotide.

TABLE 1

Characterization of the fractionated pyrimidine products in Plate I

Nucleotide   Size: number of residues     Size: number of residues     Composition
             calculated after             expected from position
             phosphatase treatment        on thin-layer plate
 1           †                             1
 2           †                             1
 3           2.0                           2
 4           2.0                           2
 5           2.0                           2
 6           2.8                           3
 7           3.0                           3
 8           2.8                           3
 9           2.9                           3
10           3.6                           4
11           3.9                           4
12           3.6                           4
13           3.9                           4
14           3.8                           4
15           4.6                           5
16           4.6                           5
17           4.9                           5
18           4.9                           5
19           4.4                           6
20           6.2                           6
21           5.3                           6
22           5.7                           6
23           6.3                           6
24           N.A.                          6
25           6.0                           7
26           6.2                           7
27           N.A.                          7
28           N.A.                          7
29           N.A.                          7
30           N.A.                          8
31           N.A.                          8
32           N.A.                          8
33           N.A.                          8
34           N.A.                          8
35           N.A.                          8
36           N.A.                          9
37           N.A.                          9
38           N.A.                          9
39           N.A.                         10
40           N.A.                         10
41           N.A.                         11
42           N.A.                         11
43           N.A.                         20

† No internal phosphate was found, see text.
N.A., not analysed.


(ii) Composition

The composition of the pyrimidine oligonucleotides from Plate I was determined by isolating the dephosphorylated form of the oligonucleotides on DEAE-paper as described above; after elution from the paper (Sanger et al., 1965) they were subjected to complete digestion with snake venom phosphodiesterase. Digestion was performed with 1 µg enzyme in 10 µl. of buffer (0.1 M-Tris·HCl (pH 8.9), 0.01 M-MgCl2) at 37°C for 3 hr and the products were separated by electrophoresis (3000 V for 1 hr) at pH 3.5 on Whatman 540 paper. pC and pT were well separated on this system, and the relative amount of each base present in an oligonucleotide was determined by cutting out the appropriate areas from the paper and determining their radioactivity.

As seen in Table 1, nucleotides 3, 6, 10 and 15 contained only T after complete digestion with snake venom phosphodiesterase and, since they are oligonucleotides of two, three, four and five bases in length, they must be T2, T3, T4 and T5, respectively. Similarly, nucleotides 5, 9 and 14 yielded only C and are, respectively, C2, C3 and C4. Nucleotide 4 contained equal amounts of C and T and, since it is a dinucleotide, it must have a composition of (C, T). Similarly, trinucleotides 7 and 8 contained ratios of C:T of 1:2 and 2:1, respectively, and they must have compositions of (C, T2) and (C2, T). Analysis of the rest of the nucleotides showed that the separation of oligonucleotides by electrophoresis in the first dimension of this thin-layer system is based on the composition of the nucleotide. The nucleotides rich in T migrate faster than those rich in C.

It may be seen that the position of any nucleotide on this fractionation system is determined by its size and composition. A diagram based on the nucleotide pattern of Plate I may be drawn in the form of a grid to illustrate this relationship (Fig. 1). Nucleotides containing the same number of C residues are joined by solid lines and those with the same number of T residues by broken lines. Figure 1 shows that the solid lines joining nucleotides of similar C content lie in approximately parallel rows; similarly, the broken lines joining nucleotides of similar T content lie in rows perpendicular to the C rows. The composition and size of any nucleotide can be predicted by determining on which T row and C row it lies. For example, nucleotide 32 of Plate I lies at the intersection of row C3 and row T5; hence this nucleotide may be predicted to have a composition of (C3, T5). The compositions of other nucleotides can be deduced similarly. Nucleotide 43 was considerably larger than the other nucleotides and was too isolated for its composition to be determined by its relation to the other nucleotides. Complete digestion of this nucleotide with snake venom phosphodiesterase yielded 40% C and 60% T.

(b) Sequences of large pyrimidine oligonucleotides

Partial digestions with spleen and snake venom phosphodiesterase have been used extensively for sequencing RNA (Sanger et al., 1965; Min Jou & Fiers, 1969; Barrell, 1971). These enzymes have also been used for determining short sequences in DNA (Szekely & Sanger, 1969; Murray, 1970; Southern, 1970) and are used in this report for determining the sequences of large pyrimidine oligonucleotides in fd. The general method of sequencing nucleic acid with these exonucleases is to subject an oligonucleotide to digestion with one of these enzymes under conditions in which the nucleotide is only partially degraded. Under ideal conditions, it is possible to obtain a mixture of products differing by one residue and ranging from unchanged

starting material to material completely degraded to mononucleotides. The products are then fractionated and each partially degraded oligonucleotide is analysed to determine what residues have been removed by the action of the enzyme. Since the enzyme attacks from one end of the oligonucleotide, removing one residue at a time, analysis of progressively smaller products allows the construction of the sequence of the oligonucleotide from the end of enzyme attack. The partial digestion with snake venom phosphodiesterase usually yields sequences at the 3' end, and with spleen the 5' end, of an oligonucleotide.

FIG. 1. Diagram based on the pattern of pyrimidine oligonucleotides in Plate I, illustrating the relation between the composition of an oligonucleotide and its position on the thin-layer plate. The position of each nucleotide is represented by ●. Solid lines join nucleotides of similar C content and broken lines join nucleotides of similar T content. Details are described in the text.

(i) Molar yields of the large pyrimidine tracts

In fd DNA there are two pyrimidine tracts of 11 bases in length (nucleotides 41 and 42 of Plate I); these nucleotides are likely to be present in one molar quantity per mole of fd DNA and should have unique sequences. The basis for this assumption is that the probability of a Pur-Pyr11-Pur sequence, which would give rise to a pyrimidine undecanucleotide, is 0.5^13 = 1.2 × 10^-4 (assuming 50% pyrimidine content), and such a sequence would be expected to occur only once in a molecule the size of fd DNA, which contains about 6000 nucleotides.

To calculate the relative molar yields of the large pyrimidine tracts in fd DNA, nucleotides 36 to 43 of Plate I were eluted and their radioactivity determined by scintillation counting. Assigning a value of one mole for nucleotide 42, nucleotides 36, 37, 38, 39, 40, 41 and 43 contained 1.9, 2.0, 0.94, 0.98, 1.1 and 0.4 moles, respectively, relative to nucleotide 42. This suggests that nucleotides 36 and 37 each contained two sequences while the rest (38 to 43) contained unique sequences. The molar yield of nucleotide 43 is considered to be


unity although the calculated value from its radioactivity was 0.4. This low value is probably due to the tendency of this large nucleotide to streak in the electrophoresis step of the fractionation procedure, so that not all of the nucleotide was recovered.

(ii) Sequence of nucleotide 42 (C5, T6)

As already mentioned, the mobility of pyrimidine oligonucleotides in the electrophoresis step of the fractionation procedure is dependent on the relative C and T content, while mobility in the chromatography step is dependent on size. It may be observed in Plate I that the positions on the thin-layer plate of any two pyrimidine oligonucleotides of compositions (Cm-1, Tn) and (Cm, Tn-1) relative to nucleotide (Cm, Tn) (where m and n are integral numbers) may be expressed as "one above to the left" and "one above to the right", respectively. Nucleotide (Cm-1, Tn) migrates faster than (Cm, Tn) by one residue in the chromatography step, hence "one above", and since it contains one C residue less than (Cm, Tn) it migrates faster in the electrophoresis step, hence "to the left". Similarly, nucleotide (Cm, Tn-1), containing one T residue less than (Cm, Tn), migrates faster in the chromatography step but slower in the electrophoresis step than (Cm, Tn), and it may be said to occupy a position relative to (Cm, Tn) of "one above to the right". For example, in Plate I, nucleotides 16 (C, T4) and 17 (C2, T3) occupy positions relative to nucleotide 21 (C2, T4) of "one above to the left" and "one above to the right", respectively.

Plate II(a) shows the radioautograph of the products of nucleotide 42 partially degraded with snake venom phosphodiesterase and separated on the thin-layer system of Plate I. Seven oligonucleotides are observed, nucleotides a to g. Nucleotide a, being the largest, must be the undegraded form of nucleotide 42 with a composition of (C5, T6). When nucleotide a loses one residue from its 3' end by the action of the venom enzyme, it could lose either pC or pT depending on its sequence. Hence, nucleotide b of Plate II(a), being one residue smaller than nucleotide a, must have a composition of either (C4, T6) or (C5, T5). Since nucleotide b clearly occupies a position on the thin-layer plate relative to nucleotide a of "one above to the left", it must differ from a by one C residue, and must have a composition of (C4, T6). Therefore, nucleotide 42 has a C residue at its 3' end. By similar reasoning, nucleotide c of Plate II(a) differs from nucleotide b by a C residue and the dinucleotide sequence at the 3' end of 42 is deduced to be C-C. In general, it is possible to follow from one partially degraded product to the next and decide whether a C or a T residue had been removed by the action of the enzyme. This may be expressed diagrammatically for Plate II(a) as:

    a --pC--> b --pC--> c --pC--> d --pT--> e --pC--> f --pC--> g

where the conversion of one nucleotide to another by the loss of one residue is represented by an arrow and the nature of the residue removed, i.e. either a pC or a pT, is placed above the arrow. The sequence at the 3' end of nucleotide 42 may then be constructed as 5'-----C-C-T-C-C-C-3'. It may also be noted that the composition of nucleotide g of Plate II(a) is T5, and the sequence at the 5' end of nucleotide 42 may be predicted to contain five T's in a row.

Plate II(b) shows the radioautograph of the products resulting from the partial digestion of nucleotide 42 with spleen phosphodiesterase, fractionated on the thin-layer system as in Plate II(a). The spleen enzyme digest yielded six products, nucleotides a to f, and since the position of each partially degraded product relative to the nucleotide one residue larger was "one above to the right", i.e.

    a --Tp--> b --Tp--> c --Tp--> d --Tp--> e --Tp--> f

the sequence at the 5' end of nucleotide 42 was constructed, as predicted, to be 5'-T-T-T-T-T-----3'.

To confirm the deduction of these sequences at the 3' and 5' ends of nucleotide 42, the partial products from Plate II(a) and (b) were further characterized by high-voltage electrophoresis on DEAE-paper at pH 1.9. Figure 2 shows the relative mobility of various marker dephosphorylated pyrimidine oligonucleotides run on DEAE-paper at pH 1.9. The mobility of any T-containing nucleotide in this system is largely regulated by its T content; that is, an increase or decrease of one T residue in an oligonucleotide greatly affects its mobility, whereas a C residue has relatively little effect. Therefore this system is primarily useful for estimating the number of T residues in an oligonucleotide. Nucleotides containing six T residues or more, however, do not move off the origin and this system is not suitable for distinguishing

+Tz)--

(C,T) (C&T1 b&T)

-K,P-=$j-=3---

>(C,T3) =(C2,T$ C3,T3 ---CC*)--

K&T51 Origin

c2*T5

FIG. 2. Diagram illustrating the relative mobility of various dephosphorylated oligonucleotides on DEAE-paper. Electrophoresis was at pH 1.9 at 1200 V for dye masker, B, under these conditions migmted about 30 cm from the origin.

pyrimidine 14 hr. The blue

PYRIMIDINE

SEQUENCES

FROM

fd

DNA

97

nucleotides of greater than six T residues. It may be noted also that there is a separation of some isomers in this system, thus for example, dephosphorylated nucleotides of compositions (C,, TJ and (C, T3) are fractionated as two and three spots respectively. This had been noted previously by Szekely & Sanger (1969). When nucleotides a to d of Plate II(a) were eluted and characterized on this system, each of the nucleotides remained at the origin and must still contain six T residues, the same number as undegraded nucleotide 42 (C,, T,) : therefore, nucleotides a to d of Plate II(a) could only differ by C residues. Nucleotide e of Plate II(a) migrated on DEAE-paper in the position of nucleotides containing five T residues and hence nuoleotide e is smaller than nucleotide cl by one T residue. The positions of nucleotides f and g of Plate II(a) on DEAE-paper indicated that these nucleotides still contained five T residues and their conversion from nucleotide e could only involve C residues. Similar analysis on DEAE-paper of nucleotides a to f of Plate II(b) indicated that a contained six T residues, b five T, c four T, d three T, e two T and f one T residue, and confirmed that in Plate II(b) the conversion of each nucleotide to one smaller involved a T residue. Hence the results obtained on DEAE-paper confirmed the deduced sequences at the 3’ and 5’ end of nucleotide 42. Since the two deduced sequences of nucleotide 42 accounted for all the nuoleotides in 42, the complete sequence of nucleotide 42 may be written and is : 5’ - T - T - T - T - T C - C-T -C -C - C - 3’. This sequence is completely compatible with the composition of nucleotide 42 of (C,, T,). It should be noted also that only one unique sequence was obtained and this supports the initial prediction that nucleotide 42 was non-isomeric and occurred in fd DNA in one molar yield. 
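The reasoning used for nucleotide 42 can be sketched in a few lines of Python. This is only an illustration and not part of the original procedure: the function name `removed_residues` is hypothetical, and the two lists of (C, T) compositions are assumptions reconstructed from the T contents reported above for the partial products of Plate II(a) and (b).

```python
def removed_residues(compositions):
    """Return the residue lost at each step of a partial digestion.

    `compositions` lists the (C, T) counts of successive products, ordered
    from the undegraded oligonucleotide to the smallest product.
    """
    removed = []
    for (c1, t1), (c2, t2) in zip(compositions, compositions[1:]):
        step = (c1 - c2, t1 - t2)
        if step == (1, 0):
            removed.append("C")
        elif step == (0, 1):
            removed.append("T")
        else:
            raise ValueError(f"products do not differ by one residue: {step}")
    return removed


# Assumed compositions of products a to g of Plate II(a) (snake venom digest).
venom_products = [(5, 6), (4, 6), (3, 6), (2, 6), (2, 5), (1, 5), (0, 5)]
# Assumed compositions of products a to f of Plate II(b) (spleen digest).
spleen_products = [(5, 6), (5, 5), (5, 4), (5, 3), (5, 2), (5, 1)]

# The snake venom enzyme removes residues from the 3' end, so the order of
# removal is reversed to read the sequence in the 5' to 3' direction.
three_prime_end = removed_residues(venom_products)[::-1]
# The spleen enzyme removes residues from the 5' end, so the order of
# removal already reads 5' to 3'.
five_prime_end = removed_residues(spleen_products)

# The two ends together account for all eleven residues of nucleotide 42.
sequence = "-".join(five_prime_end + three_prime_end)
print(sequence)
```

Under these assumed compositions the script prints T-T-T-T-T-C-C-T-C-C-C, which contains five C and six T residues.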
The analysis of partial products on DEAE-paper appears to be rapid and complementary to the thin-layer system, and it was routinely used to confirm sequence deductions of the other large pyrimidine nucleotides.

(iii) Sequence of nucleotide 36 (C2, T7)

As already mentioned, both nucleotides 36 and 37 of Plate I occurred in two molar yields relative to nucleotide 42. It was of interest to discover whether the two sequences presumed to be present in nucleotides 36 and 37 were different or identical. Further, it was important to discover whether the techniques so far discussed were adequate for sequencing a mixture of two oligonucleotides with the same base composition. Plate III(a) shows the radioautograph of the thin-layer separation of products from a partial digestion of nucleotide 36 with snake venom phosphodiesterase. It is clear that one fundamental difference between the pattern of the partial products of nucleotide 36 and that of nucleotide 42 (Plate II) is that in 36 the removal of one residue from nucleotide b of Plate III(a) yielded two products, c and c’, and the diagrammatic representation of Plate III(a) contained a branch point, viz.:


Such a branch point indicates the presence of more than one sequence and suggests that the two sequences present in nucleotide 36 are different. The two sequences deduced for the 3’ end of nucleotide 36 are: 5’-----(C)-T-T-T-T-3’ and 5’-----(T)-T-T-C-T-3’.

As can be seen from Plate III(a), nucleotide b is derived from a, and c and c’ from b. Nucleotide d (C2, T4) can only be derived from c (C2, T5) since it could not be converted from c’ (C, T6) by the loss of one residue. Nucleotide d’ (C, T5), however, can be derived from either c or c’, or both. The fact that the quantitation of nucleotide 36 indicated the presence of only two sequences precludes the derivation of nucleotide d’ from both c and c’ in this particular case. One sequence of nucleotide 36 is already accounted for by the above observation that nucleotide d is derived from c; the second sequence must then follow from the derivation of nucleotide d’ from c’. Similar reasoning yields the derivation of nucleotide e from d and e’ from d’. The derivation of nucleotide f’ is ambiguous since it can be derived from either e or e’ or both. To resolve this ambiguity, nucleotides e (C2, T3) and e’ (C, T4) were eluted from the thin-layer plate and subjected to complete digestion with spleen phosphodiesterase to determine the nucleotides at their 3’ termini. Since nucleotides e and e’ are completely dephosphorylated, a complete digestion of these nucleotides with spleen phosphodiesterase would release their 3’ ends as mononucleosides which would not be detected, and the 3’ ends of e and e’ may thus be identified as the nucleotides “missing” from the known compositions of e and e’. By this means, the 3’ end of nucleotide e was found to be C, and that of e’ T, and it was concluded that nucleotide f’ is derived from both e and e’.

The partial spleen digestion of nucleotide 36 also yielded two sequences at the 5’ end (Plate III(b)) and the deduction of these sequences is represented diagrammatically as:

where the derivation of nucleotide e’ may be ambiguous. Examination of the 5’ ends of nucleotides d and d’ by complete digestion of these nucleotides with snake venom phosphodiesterase suggests that nucleotide d contains a C residue and d’ a T residue at the 5’ end, and that nucleotide e’ could be derived from both d and d’. The deduced sequences at the 5’ end of 36 are then: 5’-T-T-T-C-----3’ and 5’-T-C-T-T-----3’.
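The two 5’ ends and the two 3’ ends deduced for nucleotide 36 can be paired in more than one way. The following Python sketch enumerates the pairings; it is an illustration only. The names `candidate_sets`, `FIVE_PRIME_ENDS`, `THREE_PRIME_ENDS` and `COMPOSITION` are hypothetical, the bracketed residues of the 3’ ends are treated as known, and the composition of nucleotide 36 is assumed to be two C and seven T residues, so that a four-residue 5’ end and a five-residue 3’ end fill the whole chain.

```python
from itertools import permutations

FIVE_PRIME_ENDS = ["TTTC", "TCTT"]     # 5'-T-T-T-C--- and 5'-T-C-T-T---
THREE_PRIME_ENDS = ["CTTTT", "TTTCT"]  # ---(C)-T-T-T-T-3' and ---(T)-T-T-C-T-3'
COMPOSITION = {"C": 2, "T": 7}         # assumed composition of nucleotide 36


def candidate_sets(five_ends, three_ends, composition):
    """Pair each 5' end with a different 3' end and keep the pairings in
    which every joined sequence has the required length and composition."""
    length = sum(composition.values())
    sets = []
    for ordering in permutations(three_ends):
        sequences = [head + tail for head, tail in zip(five_ends, ordering)]
        consistent = all(
            len(seq) == length
            and all(seq.count(base) == n for base, n in composition.items())
            for seq in sequences
        )
        if consistent:
            sets.append(sequences)
    return sets


sets = candidate_sets(FIVE_PRIME_ENDS, THREE_PRIME_ENDS, COMPOSITION)
for sequences in sets:
    print(["-".join(seq) for seq in sequences])
```

Both pairings pass the composition check, so the end sequences alone leave two candidate sets and a further experiment is needed to choose between them.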

With the information of the sequences already deduced at the 3’ and 5’ ends of nucleotide 36, it was still not possible unambiguously to assign one set of two unique sequences to nucleotide 36, since two are possible. If one assigns T-C-T-T-T-T-T-C-T as one sequence of nucleotide 36, then the other sequence in the set must be T-T-T-C-C-T-T-T-T. If these two sequences constitute set I, then set II contains two sequences of T-C-T-T-C-T-T-T-T and T-T-T-C-T-T-T-C-T. To decide which set was the correct one, nucleotides c and c’ (Plate III(b)), which were derived from nucleotides containing 5’-sequences of 5’-T-T-T-C-----3’ and 5’-T-C-T-T-----3’, respectively, were eluted from the thin-layer plate and subjected to partial digestion with snake venom phosphodiesterase to determine their sequences at the 3’ end. Nucleotide c contains a sequence of 5’-----T-C-T-3’, and nucleotide c’ 5’-----T-T-3’, at the 3’ end. This confirms set II as the two correct sequences for nucleotide 36. Nucleotide 37 also contained two unique sequences of the same base composition, but it was possible in this case to assign unambiguously one set of sequences with the information obtained from the initial partial digests.

(iv) Sequence of nucleotide 43 (C9, T11)

Initial partial digestions of nucleotide 43 with spleen and snake venom phosphodiesterase under the conditions already described yielded only a limited number of partial products, which did not give enough information to deduce completely the sequence of this nucleotide. A more severe condition of digestion was indicated, and Plate IV(a) shows the partial products of nucleotide 43 produced by digestion with snake venom phosphodiesterase for 40 minutes, twice the usual length of time, at 37°C. Sixteen products (nucleotides a to p) were obtained and a sequence of 15 nucleotides at the 3’ end of nucleotide 43 was deduced to be: 5’-----T-T-C-C-C-T-T-C-C-T-T-T-C-T-C-3’. The partial spleen digest of nucleotide 43 under the usual conditions yielded nine products, nucleotides a to i of Plate IV(b), and a sequence of eight residues could be deduced from the 5’ end to be 5’-C-T-T-T-C-T-T-C-----3’. It is seen that this sequence appears to overlap the sequence at the 3’ end by three nucleotides, T-T-C, denoted by a solid underline. Combining these two sequences, nucleotide 43 may be constructed as a sequence of 20 base residues in length.
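The combination of the two end sequences of nucleotide 43 can be sketched as follows. This is a minimal illustration and not the author's procedure: the function name `merge_with_overlap` is hypothetical, and the sketch simply assumes that the longest suffix of the 5’ sequence that matches a prefix of the 3’ sequence is a genuine overlap, which the end sequences alone cannot prove.

```python
def merge_with_overlap(head, tail):
    """Join `head` and `tail` at the longest suffix of `head` that is also a
    prefix of `tail`. Returns the merged string and the overlap length, or
    (None, 0) if the two strings share no overlap."""
    for k in range(min(len(head), len(tail)), 0, -1):
        if head.endswith(tail[:k]):
            return head + tail[k:], k
    return None, 0


five_prime_end = "CTTTCTTC"          # from the partial spleen digest
three_prime_end = "TTCCCTTCCTTTCTC"  # from the partial snake venom digest

merged, overlap = merge_with_overlap(five_prime_end, three_prime_end)
print("-".join(merged), overlap, len(merged))
```

With a three-residue overlap (T-T-C) the merged chain is 8 + 15 - 3 = 20 residues long; an independent estimate of the chain length is still needed to show that the overlap is real rather than fortuitous.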
Since the size of nucleotide 43 had not been precisely determined, it was not possible to decide whether the sequence of T-T-C was a true overlap or just a fortuitous sequence. To estimate better the size of nucleotide 43, partial products of 43 were spotted in a row at one end of a thin-layer plate and subjected to chromatography along with various known marker dephosphorylated pyrimidine nucleotides. Plate V shows that nucleotides h, i and j, partial products of nucleotide 43 (Plate IV(a)), migrated the same distance as marker nucleotides M6, M7, and both M1 and M8, respectively, and must be oligonucleotides of 13, 12 and 11 bases in length, since this chromatography procedure separates pyrimidine oligonucleotides according to size. If nucleotide h of Plate IV(a) contains thirteen bases, then by counting along the subsequently larger partial products in Plate IV(a), it is seen that nucleotide a (nucleotide 43), the starting material, is twenty bases in length. This suggests that the sequence deduced at the 3’ end of nucleotide 43 does overlap the deduced sequence at the 5’ end, and the complete sequence of nucleotide 43 may be written as: 5’-C-T-T-T-C-T-T-C-C-C-T-T-C-C-T-T-T-C-T-C-3’.

(v) Sequences of nucleotides 37 to 41

The sequences of nucleotides 37 to 41 presented no special difficulty and they were deduced using the methods already described. Table 2 shows the sequences of nucleotides 36 to 43. The sequences deduced by partial snake venom digests are indicated by solid underlines and those by partial spleen digests by broken underlines. Sequences not underlined were deduced from knowledge of the size and composition of the oligonucleotide.

TABLE 2

Nucleotide   Composition   Sequence
36           (C2, T7)      T-C-T-T-C-T-T-T-T; T-T-T-C-T-T-T-C-T
37           (C3, T6)      T-T-C-C-T-T-T-C-T; T-T-T-C-C-T-T-C-T
38           (C4, T5)      C-T-T-C-C-T-C-T-T
39           (C2, T8)      T-T-T-T-T-C-C-T-T-T
40           (C4, T6)      T-C-C-T-T-C-T-C-T-T
41           (C3, T8)      C-C-T-T-T-T-T-T-T-T-C
42           (C5, T6)      T-T-T-T-T-C-C-T-C-C-C
43           (C9, T11)     C-T-T-T-C-T-T-C-C-C-T-T-C-C-T-T-T-C-T-C

Sequences of the large pyrimidine oligonucleotides in fd DNA. A solid underline denotes the sequence deduced by a partial digestion with snake venom phosphodiesterase, and a broken underline digestion with spleen phosphodiesterase.
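As a consistency check on Table 2, the chain length and base composition of each listed sequence can be recomputed directly. The Python sketch below is illustrative only: the dictionary `TABLE_2` is a hand transcription of the sequences in the table, and the names `TABLE_2` and `composition` are hypothetical.

```python
from collections import Counter

# Sequences transcribed from Table 2, keyed by nucleotide number.
TABLE_2 = {
    36: ["T-C-T-T-C-T-T-T-T", "T-T-T-C-T-T-T-C-T"],
    37: ["T-T-C-C-T-T-T-C-T", "T-T-T-C-C-T-T-C-T"],
    38: ["C-T-T-C-C-T-C-T-T"],
    39: ["T-T-T-T-T-C-C-T-T-T"],
    40: ["T-C-C-T-T-C-T-C-T-T"],
    41: ["C-C-T-T-T-T-T-T-T-T-C"],
    42: ["T-T-T-T-T-C-C-T-C-C-C"],
    43: ["C-T-T-T-C-T-T-C-C-C-T-T-C-C-T-T-T-C-T-C"],
}


def composition(sequence):
    """Return the (C, T) residue counts of a hyphen-separated sequence."""
    counts = Counter(sequence.split("-"))
    return counts["C"], counts["T"]


for number, sequences in TABLE_2.items():
    for sequence in sequences:
        c, t = composition(sequence)
        print(f"{number}: (C{c}, T{t}), {c + t} residues")
```

Every sequence should be at least nine residues long, and the two sequences listed for nucleotides 36 and 37 should each give the same composition within their pair, as expected for isomers that migrate as a single spot.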

4. Discussion

Previously, the separation of pyrimidine oligonucleotides from small phage DNA has been performed either on ion-exchange columns (Černý et al., 1969; Hall & Sinsheimer, 1963) or by two-dimensional electrophoresis on DEAE-paper (Szekely & Sanger, 1969; Murray, 1970; Southern, 1970). The column fractionation of pyrimidine oligonucleotides gave good resolution of nucleotides with chain-lengths of up to 19 residues (Petersen & Reeves, 1969), but it was time-consuming and laborious since additional chromatography at a different pH was required to resolve further each pyrimidine isoplith into oligonucleotides of different compositions. The two-dimensional electrophoresis of pyrimidine oligonucleotides on DEAE-paper was rapid and gave good resolution of smaller oligonucleotides. However, large oligonucleotides remained at the origin during the second dimension and were not resolved. For example, the largest pyrimidine oligonucleotides from fd DNA identified on DEAE-paper were octamers of compositions (C,, T4) and (C,, Ts) (Szekely & Sanger, 1969). The fractionation procedure of Brownlee & Sanger (1969) on thin layer has the advantage of being both rapid and able to resolve pyrimidine oligonucleotides of different compositions up to 20 bases in length. It has the further advantage that the pyrimidine nucleotides are fractionated into a very regular pattern (Plate I) such that the size and composition of a nucleotide may be predicted from its position on the thin-layer plate. This fractionation procedure, however, is not capable of separating isomers of the same composition, and most nucleotide spots (Plate I) probably contain several isomeric sequences (Szekely & Sanger, 1969). The exceptions are the large oligonucleotides (nine bases or longer), which occur in molar yields and whose pattern should be characteristic of the phage DNA. This fractionation method could thus serve as a fingerprinting procedure for identifying different phages.
Preliminary experiments comparing the depurination fingerprints of the DNA from the closely related phages fd, f1 and M13 show that these phages give similar patterns of the large pyrimidine tracts, with some differences, while the fingerprint of the DNA from the unrelated phage φX174 was quite different. This will be the subject of a further communication.

One of the purposes of this investigation was to attempt to work toward, at least in part, a general method for sequencing DNA with the enzymes and techniques currently available. Since the Burton depurination of DNA is very specific for cleaving DNA, it should prove generally useful in sequence analysis. Thus a general method for sequencing the pyrimidine products from a Burton depurination of fd DNA is described. This method can easily be used to sequence pyrimidine oligonucleotides of ten residues in length, and even a mixture of two sequences within a nucleotide can be resolved. Very much larger oligonucleotides, such as nucleotide 43 (Plate I) containing 20 residues, require some changes in the enzyme digestion conditions, which could be determined empirically.

While the sequencing procedure described here is somewhat limited in general applicability, being restricted to pyrimidine oligonucleotides, it seems possible that under favourable conditions this method would be capable of sequencing a double-stranded DNA fragment of moderate length. One requirement would be that the two strands of this fragment could be separated for individual analysis of their pyrimidine sequences. Thus, for example, Southern (1970) was able to deduce a short sequence in the satellite DNA in guinea pig.

No striking homologies can be generally detected in the sequences of the large pyrimidine oligonucleotides presented in Table 2, except for nucleotides 39 (C2, T8) and 42 (C5, T6). These nucleotides share a common sequence of eight residues in length at the 5’ end, i.e. 5’-T-T-T-T-T-C-C-T---3’. It is not known whether this sequence is significant for the function of this phage.
One possibility is that the repetition of this sequence may have arisen originally from some duplication of the phage DNA.

Since the amino-acid sequence of the major coat protein is known for fd (Asbeck et al., 1969), it is possible to write a degenerate DNA sequence for the coat protein gene and to decide whether any large pyrimidine oligonucleotides could be derived from it. It appears that the largest polypyrimidine sequence that could arise from the coat protein gene would only be eight residues in length, and this precludes any of the large pyrimidine sequences from Table 2 as part of the coat protein gene.

Of particular interest is the discovery of the very large polypyrimidine tract (nucleotide 43, Plate I) of 20 bases in length in fd DNA. Petersen & Reeves (1969) have also noted the presence of a polypyrimidine tract of similar size in the DNA of the related phage f1. In a DNA molecule of the size of fd, a polypyrimidine tract of this size is unexpected from probability considerations, and it is possible that this sequence of polypyrimidine may serve some special function in the phage DNA. Szybalski, Kubinski & Sheldrick (1966) have suggested, for example, that pyrimidine clusters in DNA may represent the physical starting points for the transcription of an operon.

I thank Dr F. Sanger, in whose laboratory this work was carried out, for stimulating discussions and for his advice in the preparation of this manuscript. I also thank Dr H. D. Robertson for helpful discussions and Mr A. R. Coulson for skilled assistance in some of the experiments. I am supported by a fellowship from the Medical Research Council of Canada.


REFERENCES

Adams, J. M., Jeppesen, P. G. N., Sanger, F. & Barrell, B. G. (1969). Nature, 223, 1009.
Asbeck, F., Beyreuther, K., Köhler, H., von Wettstein, G. & Braunitzer, G. (1969). Hoppe-Seyler’s Z. Physiol. Chem. 350, 1047.
Barrell, B. G. (1971). In Procedures in Nucleic Acid Research, ed. by G. L. Cantoni & D. R. Davis. New York: Harper & Row, in the press.
Bretscher, M. (1969). J. Mol. Biol. 42, 595.
Brownlee, G. G. & Sanger, F. (1969). Europ. J. Biochem. 11, 395.
Brownlee, G. G., Sanger, F. & Barrell, B. G. (1968). J. Mol. Biol. 34, 379.
Burton, K. (1967). In Methods in Enzymology, ed. by L. Grossman & K. Moldave, vol. 12, part A, p. 222. New York & London: Academic Press.
Burton, K. & Petersen, G. B. (1960). Biochem. J. 75, 17.
Černý, R., Černá, E. & Spencer, J. H. (1969). J. Mol. Biol. 46, 145.
Hall, J. B. & Sinsheimer, R. L. (1963). J. Mol. Biol. 6, 115.
Jeppesen, P. G. N. (1971). Biochem. J. 124, 357.
Laskowski, M. (1966). In Procedures in Nucleic Acid Research, ed. by G. L. Cantoni & D. R. Davis, p. 154. New York, Evanston & London: Harper & Row.
Levine, M. & Borthwick, M. (1963). Virology, 21, 668.
Marvin, D. A. & Hoffman-Berling, H. (1963). Nature, 197, 517.
Marvin, D. A. & Hohn, B. (1969). Bact. Rev. 33, 172.
Min Jou, W. & Fiers, W. (1969). J. Mol. Biol. 40, 187.
Murray, K. (1970). Biochem. J. 118, 831.
Petersen, G. B. & Reeves, J. M. (1969). Biochim. biophys. Acta, 179, 510.
Razzell, W. E. & Khorana, H. G. (1959). J. Biol. Chem. 234, 2114.
Razzell, W. E. & Khorana, H. G. (1961). J. Biol. Chem. 236, 1144.
Sanger, F., Brownlee, G. G. & Barrell, B. G. (1965). J. Mol. Biol. 13, 373.
Southern, E. M. (1970). Nature, 227, 794.
Szekely, M. & Sanger, F. (1969). J. Mol. Biol. 43, 607.
Szybalski, W., Kubinski, H. & Sheldrick, P. (1966). Cold Spr. Harb. Symp. Quant. Biol. 31, 123.